[Clamav-devel] clamAV scanning algorithm

2008-12-02 Thread Thomasz Blaszczyk
 Hi,

 I am new to ClamAV and I am just wondering how files are scanned.

 Does it work like:
 1. PE section is taken from file to be scanned
 2. MD5 is calculated
 3. That MD5 is compared to all signatures in ClamAV Database
 4. If it matches, a virus is found.

 I have simplified this, but please let me know whether I am right about
 the above steps for scanning files.

 Regards,
 Tom
___
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net


Re: [Clamav-devel] clamAV scanning algorithm

2008-12-03 Thread Thomasz Blaszczyk
Thank you for the reply,

Török Edwin, very, very good web seminar!

I have 2 more questions:

1) I'd like to measure and compare the performance of the AC and BM algorithms.

clamscan displays a 'Time' in its scan summary. Does this time include
disk access and signature tree building for AC (phase 1) or BM?
I just wonder whether I can use this time or should take my own timestamps.

Time: 2.189 sec (0 m 2 s)

2) I've downloaded the EICAR Test Anti-Virus File and created a 10-byte file
(see logs below). Then I appended EICAR to this file. Why doesn't clamscan
find a signature in this file?

LOGS:
1. Creating a 10-byte file

[EMAIL PROTECTED] ~/projects/aau/virus_scanner/clamav-0.94.1/database $ time dd if=/dev/urandom of=../../testbox/new10bytes.com bs=10 count=1
1+0 records in
1+0 records out
10 bytes (10 B) copied, 4.8609e-05 s, 206 kB/s

real    0m0.001s
user    0m0.000s
sys     0m0.000s

2. The testbox folder contains:

[EMAIL PROTECTED] ~/projects/aau/virus_scanner/testbox $ ls -l
total 8
-rw-r--r-- 1 tomb tomb 68 Dec  3 22:26 eicar.com
-rw-r--r-- 1 tomb tomb 10 Dec  3 22:27 new10bytes.com
[EMAIL PROTECTED] ~/projects/aau/virus_scanner/testbox $ hexdump eicar.com
000 3558 214f 2550 4140 5b50 5c34 5a50 3558
010 2834 5e50 3729 4343 3729 247d 4945 4143
020 2d52 5453 4e41 4144 4452 412d 544e 5649
030 5249 5355 542d 5345 2d54 4946 454c 2421
040 2b48 2a48
044
[EMAIL PROTECTED] ~/projects/aau/virus_scanner/testbox $ hexdump new10bytes.com
000 05b6 1256 0057 d6b2 9740
00a

3. The 68 bytes of EICAR were appended to the end of the randomly generated
new10bytes.com:

[EMAIL PROTECTED] ~/projects/aau/virus_scanner/testbox $ cat eicar.com >> new10bytes.com
[EMAIL PROTECTED] ~/projects/aau/virus_scanner/testbox $ hexdump new10bytes.com
000 05b6 1256 0057 d6b2 9740 3558 214f 2550
010 4140 5b50 5c34 5a50 3558 2834 5e50 3729
020 4343 3729 247d 4945 4143 2d52 5453 4e41
030 4144 4452 412d 544e 5649 5249 5355 542d
040 5345 2d54 4946 454c 2421 2b48 2a48
04e

4. Why is the signature not found in this file?

[EMAIL PROTECTED] ~/projects/aau/virus_scanner/testbox $ clamscan new10bytes.com

new10bytes.com: OK

--- SCAN SUMMARY ---
Known viruses: 455125
Engine version: 0.94.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Time: 2.194 sec (0 m 2 s)

---
Thanks in advance,
Tom


Re: [Clamav-devel] clamAV scanning algorithm

2008-12-06 Thread Thomasz Blaszczyk
Thanks, Joseph, for the answer.

The quote appears too restrictive, as I found that the file can be
longer provided it starts with the EICAR string.

 Any anti-virus product that supports the EICAR test file should
 detect it in any file providing that the file starts with the
 following 68 characters, and is exactly 68 bytes long

 Best Regards,
 Joseph Benden


Re: [Clamav-devel] build debugging ex1.c

2008-12-09 Thread Thomasz Blaszczyk
And there is also 'groot'.

Tom

On Tue, Dec 9, 2008 at 4:51 PM, Thomasz Blaszczyk [EMAIL PROTECTED] wrote:
 Thank you for the answer,

 I have another question. I cannot figure out the meaning of ftonly and troot.
 Could I get some explanation of these 2 variables?

 They are used in matcher.c [code snipped]:

 if(!ftonly && (ret = cli_ac_initdata(&gdata, groot->ac_partsigs,
 groot->ac_lsigs, AC_DEFAULT_TRACKLEN)))
     return ret;

 if(troot) {
     if((ret = cli_ac_initdata(&tdata, troot->ac_partsigs,
     troot->ac_lsigs, AC_DEFAULT_TRACKLEN)))
         return ret;
 }

 Thanks in advance.
 Tom



Re: [Clamav-devel] build debugging ex1.c

2008-12-09 Thread Thomasz Blaszczyk
Thank you for the fast reply,

Sorry for bothering you again. I am missing something in this huge project.
I cannot understand why both functions, cli_ac_scanbuff and
cli_bm_scanbuff, are called in one cli_scandesc() function call.
I have just one signature in the database, and I assumed that the
file would be scanned once, using either AC or BM. But both algorithms are
used. Have a look below:

Here is the output:
--output-
groot->maxpatlen: 24

troot->ac_only IN TROOT!!!1
cli_ac_scanbuff_function_callroot->ac_root6488480
RET IN TROOT!!!0

groot->ac_only IN GROOT!!!0
cli_bm_scanbuff_function_callroot6346288
RET IN GROOT!!!1
end_output-
from this code:
--code--
if(troot) {
    printf("\ntroot->ac_only IN TROOT!!!%d \n", troot->ac_only);
    if(troot->ac_only || (ret = cli_bm_scanbuff(upt, length,
    ctx->virname, troot, offset, ftype, desc)) != CL_VIRUS)
        ret = cli_ac_scanbuff(upt, length, ctx->virname, NULL, NULL,
        troot, &tdata, offset, ftype, desc, ftoffset, acmode, NULL);
    printf("\nRET IN TROOT!!!%d \n", ret);
    if(ret == CL_VIRUS) {
        free(buffer);
        if(!ftonly)
            cli_ac_freedata(&gdata);
        --cut--
        else
            return CL_VIRUS;
    }
}

if(!ftonly) {
    printf("\ngroot->ac_only IN GROOT!!!%d \n", groot->ac_only);
    if(groot->ac_only || (ret = cli_bm_scanbuff(upt, length,
    ctx->virname, groot, offset, ftype, desc)) != CL_VIRUS)
        ret = cli_ac_scanbuff(upt, length, ctx->virname, NULL, NULL,
        groot, &gdata, offset, ftype, desc, ftoffset, acmode, NULL);
    printf("\nRET IN GROOT!!!%d \n", ret);
    if(ret == CL_VIRUS) {
        free(buffer);
        cli_ac_freedata(&gdata);
        --cut--

--end_code--

Maybe there is something magic about groot and troot, but they are just
pointers to struct cli_matcher:

struct cli_matcher *groot = NULL, *troot = NULL;


struct cli_matcher {
    /* Extended Boyer-Moore */
    uint8_t *bm_shift;
    struct cli_bm_patt **bm_suffix;
    struct hashset md5_sizes_hs;
    uint32_t *soff, soff_len; /* for PE section sigs */
    uint32_t bm_patterns;

    /* Extended Aho-Corasick */
    uint32_t ac_partsigs, ac_nodes, ac_patterns, ac_lsigs;
    struct cli_ac_lsig **ac_lsigtable;
    struct cli_ac_node *ac_root, **ac_nodetable;
    struct cli_ac_patt **ac_pattable;
    uint8_t ac_mindepth, ac_maxdepth;

    uint16_t maxpatlen;
    uint8_t ac_only;
};

Am I missing something?
Best Regards,
Tom


On Tue, Dec 9, 2008 at 5:00 PM, Török Edwin [EMAIL PROTECTED] wrote:
 On 2008-12-09 18:51, Thomasz Blaszczyk wrote:
 Thank you for the answer,

 I have another question. I cannot figure out the meaning of ftonly and troot.
 Could I get some explanation of these 2 variables?

 They are used in matcher.c [code snipped]:

 if(!ftonly && (ret = cli_ac_initdata(&gdata, groot->ac_partsigs,
 groot->ac_lsigs, AC_DEFAULT_TRACKLEN)))
     return ret;

 ft stands for filetype.

 if(troot) {
     if((ret = cli_ac_initdata(&tdata, troot->ac_partsigs,
     troot->ac_lsigs, AC_DEFAULT_TRACKLEN)))
         return ret;
 }

 Look at signatures.pdf again: in the .ndb format each pattern has a
 TargetType field, hence a different trie is used for each type.

 As for groot, there is a comment explaining what it is:
 groot = ctx->engine->root[0]; /* generic signatures */


 Best regards,
 --Edwin


Re: [Clamav-devel] build debugging ex1.c

2008-12-09 Thread Thomasz Blaszczyk
Another thing,

If I force troot->ac_only = 0:

if(troot) {
    troot->ac_only = 0;
    printf("\ntroot->ac_only IN TROOT!!!%d \n", troot->ac_only);
    if(troot->ac_only || (ret = cli_bm_scanbuff(upt, length,
    ctx->virname, troot, offset, ftype, desc)) != CL_VIRUS)
        ret = cli_ac_scanbuff(upt, length, ctx->virname, NULL, NULL,
        troot, &tdata, offset, ftype, desc, ftoffset, acmode, NULL);
    printf("\nRET IN TROOT!!!%d \n", ret);
    if(ret == CL_VIRUS) {
--cut--

It doesn't change anything; I get the same results as with troot->ac_only = 1:

-output
troot->ac_only IN TROOT!!!0
cli_ac_scanbuffroot->ac_root6488480
RET IN TROOT!!!0

groot->ac_only IN GROOT!!!0
cli_bm_scanbuffroot6346288
RET IN GROOT!!!1

Inside cli_scanraw function After inst. RET: 1 |
../test/vir/tinyVirus: TinyVirus.UNOFFICIAL FOUND
cl_scandesc function has been called. RET:1
---Extended SCAN SUMMARY ---
Known viruses: 1
Engine version: 0.94.1
Scanned directories: 0
Scanned files: 1
Infected files: 1
Aho-Corasick: ENABLE
Boyer-Moore: DISABLE
Data scanned: 0.00 MB
Time: 0.004 sec (0 m 0 s)
--end-output--


If I force groot->ac_only to 1, I get a segmentation fault (the segfault
happens in another function, cli_scanraw):

---code---
if(!ftonly) {
    groot->ac_only = 1;
    printf("\ngroot->ac_only IN GROOT!!!%d \n", groot->ac_only);
    if(groot->ac_only || (ret = cli_bm_scanbuff(upt, length,
    ctx->virname, groot, offset, ftype, desc)) != CL_VIRUS)
        ret = cli_ac_scanbuff(upt, length, ctx->virname, NULL, NULL,
        groot, &gdata, offset, ftype, desc, ftoffset, acmode, NULL);
    printf("\nRET IN GROOT!!!%d \n", ret);
    if(ret == CL_VIRUS) {
---end-code---

Has anybody tried changing these flags, i.e. swapping the scanning algorithms?

Thanks in advance,
Tom


Re: [Clamav-devel] clamAV scanning algorithm

2008-12-17 Thread Thomasz Blaszczyk
OK, it seems that limits.maxfilesize limits scanning to 10MB, but I am able to
scan files of up to 25MB; see below.
(When I scan a 30MB file, the data scanned is 0. Why is that, when I
am able to scan nearly 25MB?)
Every byte in the sample files is 0xB8.

ls -l
total 60656
-rw-r--r-- 1 root root 16000000 Dec 17 16:08 16MB
-rw-r--r-- 1 root root  2000000 Dec 17 16:07 2MB
-rw-r--r-- 1 root root 32000000 Dec 17 16:08 32MB
-rw-r--r-- 1 root root  4000000 Dec 17 16:08 4MB
-rw-r--r-- 1 root root  8000000 Dec 17 16:08 8MB
-rw-r--r-- 1 root root       27 Dec 17 12:41 database_sig.ndb
drwx------ 2 root root    16384 Dec 17 11:58 lost+found
-rw-r--r-- 1 root root        0 Dec 17 16:38 testbed
cat database.ndb   (only one signature)
TinyVirus:0:*:B89A02C3
2MB: OK

--- SCAN SUMMARY ---
Known viruses: 1
Engine version: 0.94.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 1.91 MB
Time: 0.077 sec (0 m 0 s)
2MB: OK

--- SCAN SUMMARY ---
Known viruses: 1
Engine version: 0.94.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 7.63 MB
Time: 0.309 sec (0 m 0 s)
8MB: OK

--- SCAN SUMMARY ---
Known viruses: 1
Engine version: 0.94.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 15.26 MB
Time: 0.582 sec (0 m 0 s)
16MB: OK

--- SCAN SUMMARY ---
Known viruses: 1
Engine version: 0.94.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 24.79 MB
Time: 0.995 sec (0 m 0 s)
25MB: OK


Re: [Clamav-devel] clamAV scanning algorithm

2008-12-17 Thread Thomasz Blaszczyk
 What kind of data was scanned?
 Was it hand-crafted, automatically generated, or real world files?

I create the files by calling fputc(my_byte) in a loop,
i.e.:
file_builder -n sizeoffile -xB8

So the entire file consists of 0xB8 bytes, and I create 2MB, 4MB, ...
up to 60MB files.

 What is the confidence of the values you measured?
 (I don't see if you've repeated the experiment or not, there is no
 standard deviation, or any other statistical indicator).
Right, there is some deviation (I repeated each measurement 3 times and took
the average), but I will repeat the measurements and calculate the standard deviation.


 I just wonder about BM: it is not as efficient as AC, and I cannot see why
 BM is used for static signatures. AC works better here...

 Or am I missing something?


 I've already answered this:

 if you switch ClamAV to use only AC, you'll notice a significant
 performance improvement, at the expense of increased memory usage for
 the DB.
Right, AC trees are quite large and take a lot of memory.
So BM is only used to save memory? I guess it was implemented first
and some people are still sentimental about this algorithm :)
Since AC works faster and handles wildcards...


 I copied the first 20 signatures from main.ndb and used them for performance
 measurements.


 Results from benchmarks with such a low signature count will be useless
 in practice.

 Hint: larger tries don't fit in L2, and also produce a lot of DTLB misses

Thanks for the hints
Regards,
Tom

 Best regards,
 --Edwin





Re: [Clamav-devel] clamAV scanning algorithm

2008-12-17 Thread Thomasz Blaszczyk
 You might want to scan something resembling a real world file, and I'm
 not saying to use /dev/urandom instead of B8.
 I can think of a much more efficient algorithm to match on B8 bytes...

Oh, yes, there will be several test cases; B8 bytes is only one of them.
There will also be a test case for DNA sequence scanning :)

Cheers!