[Clamav-devel] ClamAV Scanning Algorithm
Hi,

I am a newbie to ClamAV and want to know which scanning algorithm ClamAV currently uses. I would appreciate it if somebody could point me to the best place (maybe an article or a source code file) that discusses it. I read somewhere that it uses the Aho-Corasick algorithm; is it still using it?

Thanks
~Moe
___
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net
Re: [Clamav-devel] clamAV scanning algorithm
Thomasz Blaszczyk in message 'Re: [Clamav-devel] clamAV scanning algorithm' wrote:

>> if you switch ClamAV to use only AC, you'll notice a significant performance improvement, at the expense of increased memory usage for the DB.

> Right, AC trees are quite large and take a lot of memory. So BM is only used to save memory? I guess it was implemented first and some people still feel sentiment for this algorithm :) Since AC works faster and handles wildcards...

If I remember correctly, it was the other way around: AC was implemented first (the version I recall is 0.67; I haven't looked at the code of earlier versions, except some truly historical ones like 0.11 or 0.15), and BM was added later to resolve the memory problems Edwin has mentioned at least a few times.

cheers,
--
main(int a[puts(Michal 'GiM' Spadlinski)]){}
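
For readers unfamiliar with the AC side of this trade-off, here is a minimal Aho-Corasick sketch in Python. It is a textbook illustration only, not ClamAV's matcher; the names build_automaton and search are made up for this example. It does show where the memory goes: every byte of every pattern can add a state with its own transition table.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure and output tables for a set of byte patterns."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognised on entering each state
    for pat in patterns:
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(pat)
    # Breadth-first pass: states directly below the root keep fail = 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            # Inherit matches that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def search(data, automaton):
    """Return (start_offset, pattern) for every match, in one pass over data."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits

automaton = build_automaton([b"he", b"she", b"hers"])
print(search(b"ushers", automaton))
```

All patterns are matched in a single pass over the input, which is why the scan is fast; the cost is the size of the state tables, which grows with the total length of the signatures.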
Re: [Clamav-devel] clamAV scanning algorithm
OK, it seems that limits.maxfilesize limits scanning to 10MB, but I am able to scan files of up to 25MB; see below. (When I scan a 30MB file the data scanned is 0. Why is that? I am able to scan nearly 25MB.)

Every byte in the sample files is 'B8'.

ls -l
total 60656
-rw-r--r-- 1 root root 1600 Dec 17 16:08 16MB
-rw-r--r-- 1 root root 200 Dec 17 16:07 2MB
-rw-r--r-- 1 root root 3200 Dec 17 16:08 32MB
-rw-r--r-- 1 root root 400 Dec 17 16:08 4MB
-rw-r--r-- 1 root root 800 Dec 17 16:08 8MB
-rw-r--r-- 1 root root 27 Dec 17 12:41 database_sig.ndb
drwx-- 2 root root 16384 Dec 17 11:58 lost+found
-rw-r--r-- 1 root root 0 Dec 17 16:38 testbed

cat database.ndb (only one signature)
TinyVirus:0:*:B89A02C3

2MB: OK
--- SCAN SUMMARY ---
Known viruses: 1
Engine version: 0.94.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 1.91 MB
Time: 0.077 sec (0 m 0 s)

2MB: OK
--- SCAN SUMMARY ---
Known viruses: 1
Engine version: 0.94.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 7.63 MB
Time: 0.309 sec (0 m 0 s)

8MB: OK
--- SCAN SUMMARY ---
Known viruses: 1
Engine version: 0.94.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 15.26 MB
Time: 0.582 sec (0 m 0 s)

16MB: OK
--- SCAN SUMMARY ---
Known viruses: 1
Engine version: 0.94.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 24.79 MB
Time: 0.995 sec (0 m 0 s)

25MB: OK
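
As a rough sketch, the single signature line above can be split into its colon-separated fields. The field labels used here (name, target, offset, hex_sig) and the helper names are my own for illustration, and the hex field is treated as a plain byte string, which only works for signatures without wildcards.

```python
from typing import NamedTuple

class NdbSignature(NamedTuple):
    name: str      # label reported on a match
    target: str    # target type field
    offset: str    # "*" in the line from the thread
    hex_sig: str   # hex-encoded pattern

def parse_ndb_line(line):
    """Split one signature line into its first four colon-separated fields."""
    name, target, offset, hex_sig = line.strip().split(":")[:4]
    return NdbSignature(name, target, offset, hex_sig)

def literal_bytes(sig):
    """Decode the hex field; valid only for wildcard-free signatures."""
    return bytes.fromhex(sig.hex_sig)

sig = parse_ndb_line("TinyVirus:0:*:B89A02C3")
pattern = literal_bytes(sig)
sample = b"\xB8" * 1024   # stand-in for a test file made only of B8 bytes
print(sig.name, pattern in sample)
```

A file consisting only of B8 bytes can never contain the four-byte sequence B8 9A 02 C3, so "OK" is the expected result for these samples whatever the size limit does.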
Re: [Clamav-devel] clamAV scanning algorithm
On 2008-12-17 18:37, Thomasz Blaszczyk wrote:

> ok, it seems that limits.maxfilesize limits to 10MB, but I am able to scan up to 25MB files. see below: (when I scan 30MB file the data scanned is 0, Why is like that? and I am able to scan nearly 25MB)

Read the archives of -users. This question has been repeatedly raised there.

--Edwin
Re: [Clamav-devel] clamAV scanning algorithm
On 2008-12-17 20:27, Thomasz Blaszczyk wrote:

> I just got first results here, http://omploader.org/vMTExNA What do you think about them?

What kind of data was scanned? Was it hand-crafted, automatically generated, or real world files? What is the confidence of the values you measured? (I don't see whether you've repeated the experiment or not; there is no standard deviation or any other statistical indicator.)

> Just wonder about BM, it is not as efficient as AC, cannot see the point why BM is used for static signatures. AC works better here... or am I missing something,

I've already answered this: if you switch ClamAV to use only AC, you'll notice a significant performance improvement, at the expense of increased memory usage for the DB.

> I copied the first 20 signatures from main.ndb, and use them for performance measurements.

Results from benchmarks with such a low signature count will be useless in practice. Hint: larger tries don't fit in L2, and also produce a lot of DTLB misses.

Best regards,
--Edwin
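
The request for a statistical indicator can be met with a few lines of Python. This is only a sketch: time_runs and summarize are names invented here, and the timed function is a trivial stand-in for whatever scan is being measured.

```python
import statistics
import time

def time_runs(func, repeats=10):
    """Call func repeatedly and return the wall-clock time of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings

def summarize(timings):
    """Return (mean, sample standard deviation) of the measured timings."""
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return mean, stdev

# Stand-in workload: search a 1 MB buffer of B8 bytes for a 4-byte pattern.
data = b"\xB8" * 1_000_000
mean, stdev = summarize(time_runs(lambda: bytes.fromhex("B89A02C3") in data))
print(f"mean={mean:.6f}s stdev={stdev:.6f}s")
```

Reporting the spread next to the mean makes it possible to tell whether a difference between two matchers is larger than the run-to-run noise.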
Re: [Clamav-devel] clamAV scanning algorithm
> What kind of data was scanned? Was it hand-crafted, automatically generated, or real world files?

I create the files by calling fputc('my_byte') in a loop, i.e.:

file_builder -n sizeoffile -xB8

So the entire file consists of 'B8' bytes, and I create 2MB and 4MB files, up to 60MB files.

> What is the confidence of the values you measured? (I don't see if you've repeated the experiment or not, there is no standard deviation, or any other statistical indicator).

Right, there is some deviation. I repeated each measurement 3 times and took the average, but I will repeat the measurements and calculate the deviation.

>> Just wonder about BM, it is not efficient as AC, cannot see point why BM is used for static signatures. AC works here better... or am I missing something,

> I've already answered this: if you switch ClamAV to use only AC, you'll notice a significant performance improvement, at the expense of increased memory usage for the DB.

Right, AC trees are quite large and take a lot of memory. So BM is only used to save memory? I guess it was implemented first and some people still feel sentiment for this algorithm :) Since AC works faster and handles wildcards...

>> I copied 20 first signature from main.ndb, and use them for performace measurements.

> Results from benchmarks with such a low signature count will be useless in practice. Hint: larger tries don't fit in L2, and also produce a lot of DTLB misses

Thanks for the hints.

Regards,
Tom

> Best regards,
> --Edwin

--
Stay Hungry. Stay Foolish. Steve Jobs
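
The file_builder tool itself is not shown in the thread. The following Python sketch produces the same kind of file, assuming the tool simply writes the given byte value the requested number of times; build_file and its parameters are hypothetical names.

```python
def build_file(path, size, byte_value=0xB8, chunk_size=1 << 20):
    """Write `size` copies of `byte_value` to `path`, one chunk at a time."""
    chunk = bytes([byte_value]) * chunk_size
    remaining = size
    with open(path, "wb") as fh:
        while remaining > 0:
            n = min(remaining, chunk_size)
            fh.write(chunk[:n])
            remaining -= n

# Example: a 2,000,000-byte file of B8 bytes (hypothetical file name).
# build_file("2MB", 2_000_000)
```

Writing in chunks rather than byte by byte keeps generation time negligible, so it does not get mixed up with the scan times being measured.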
Re: [Clamav-devel] clamAV scanning algorithm
On 2008-12-17 21:28, Thomasz Blaszczyk wrote:

>> What kind of data was scanned? Was it hand-crafted, automatically generated, or real world files?

> I create files by calling in loop function: fputc('my_byte') i.e: file_builder -n sizeoffile -xB8 So entire file consists of bytes 'B8' and I create 2MB, 4MB file, up to 60MB files

You might want to scan something resembling a real world file, and I'm not saying to use /dev/urandom instead of B8. I can think of a much more efficient algorithm to match on B8 bytes...

Best regards,
--Edwin
Re: [Clamav-devel] clamAV scanning algorithm
> You might want to scan something resembling a real world file, and I'm not saying to use /dev/urandom instead of B8. I can think of a much more efficient algorithm to match on B8 bytes...

Oh yes, there will be several test cases; B8 bytes is only one of them. There will also be a test case on DNA sequence scanning :)

Cheers!
Re: [Clamav-devel] clamAV scanning algorithm
On 2008-12-17 18:12, Thomasz Blaszczyk wrote:

> Hi, I have noticed a kind of limitation in ClamAV. When scanning one file takes longer than 1 sec, the entire file scan is dropped.

There is no such limitation in ClamAV.

Best regards,
--Edwin
Re: [Clamav-devel] clamAV scanning algorithm
Thanks, Joseph, for the answer.

The quote appears too restrictive, as I found that the file can be longer as long as it starts with the EICAR string.

> Any anti-virus product that supports the EICAR test file should detect it in any file providing that the file starts with the following 68 characters, and is exactly 68 bytes long
>
> Best Regards,
> Joseph Benden
Re: [Clamav-devel] clamAV scanning algorithm
On 2008-12-04 00:41, Thomasz Blaszczyk wrote:

> Thank you for reply, Török Edwin, Very, very good web seminar!

Thanks

> I have 2 more questions: 1) I'd like to measure compare performance of AC BM algorithms. clamscan displays in 'scan summary' a 'time'. Does this time include disc access, signature tree building in AC(phase1) or BM Just wonder If I can use this time or I should figure out new timestamps.

It includes all of the above: it is the time from the launch of clamscan (after options are parsed) until the scan is complete.

Best regards,
--Edwin
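
Because the reported time covers everything from launch to completion, comparing the matchers alone needs separate timestamps around each phase. A minimal Python sketch of that idea follows; load_signatures and scan are hypothetical stand-ins, not ClamAV functions, and timed is a helper invented for this example.

```python
import time

def timed(func, *args):
    """Run func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

def load_signatures(hex_lines):
    """Stand-in for the database loading / structure building phase."""
    return [bytes.fromhex(h) for h in hex_lines]

def scan(data, patterns):
    """Stand-in for the matching phase: naive search per pattern."""
    return [p for p in patterns if p in data]

patterns, load_time = timed(load_signatures, ["B89A02C3"])
hits, scan_time = timed(scan, b"\xB8" * 1_000_000, patterns)
print(f"load={load_time:.6f}s scan={scan_time:.6f}s hits={len(hits)}")
```

Timing the phases separately keeps database loading and disk access out of the number used to compare AC with BM.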
Re: [Clamav-devel] clamAV scanning algorithm
Thank you for the reply, Török Edwin. Very, very good web seminar!

I have 2 more questions:

1) I'd like to measure and compare the performance of the AC and BM algorithms. clamscan displays a 'time' in the 'scan summary'. Does this time include disk access and signature tree building in AC (phase 1) or BM? I just wonder whether I can use this time or whether I should add new timestamps.

Time: 2.189 sec (0 m 2 s)

2) I've downloaded the EICAR anti-virus test file and created a 10-byte file (see logs below). Then I appended EICAR to this file. Why doesn't clamscan find a signature in this file?

LOGS:

1. Creating the 10-byte file

[EMAIL PROTECTED] ~/projects/aau/virus_scanner/clamav-0.94.1/database $ time dd if=/dev/urandom of=../../testbox/new10bytes.com bs=10 count=1
1+0 records in
1+0 records out
10 bytes (10 B) copied, 4.8609e-05 s, 206 kB/s
real 0m0.001s
user 0m0.000s
sys 0m0.000s

2. The testbox folder contains:

[EMAIL PROTECTED] ~/projects/aau/virus_scanner/testbox $ ls -l
total 8
-rw-r--r-- 1 tomb tomb 68 Dec 3 22:26 eicar.com
-rw-r--r-- 1 tomb tomb 10 Dec 3 22:27 new10bytes.com

[EMAIL PROTECTED] ~/projects/aau/virus_scanner/testbox $ hexdump eicar.com
000 3558 214f 2550 4140 5b50 5c34 5a50 3558
010 2834 5e50 3729 4343 3729 247d 4945 4143
020 2d52 5453 4e41 4144 4452 412d 544e 5649
030 5249 5355 542d 5345 2d54 4946 454c 2421
040 2b48 2a48
044

[EMAIL PROTECTED] ~/projects/aau/virus_scanner/testbox $ hexdump new10bytes.com
000 05b6 1256 0057 d6b2 9740
00a

3. The 68 bytes of EICAR have been appended to the end of the randomly generated new10bytes.com

[EMAIL PROTECTED] ~/projects/aau/virus_scanner/testbox $ cat eicar.com new10bytes.com
[EMAIL PROTECTED] ~/projects/aau/virus_scanner/testbox $ hexdump new10bytes.com
000 05b6 1256 0057 d6b2 9740 3558 214f 2550
010 4140 5b50 5c34 5a50 3558 2834 5e50 3729
020 4343 3729 247d 4945 4143 2d52 5453 4e41
030 4144 4452 412d 544e 5649 5249 5355 542d
040 5345 2d54 4946 454c 2421 2b48 2a48
04e

4. Why is the signature not found in this file?

[EMAIL PROTECTED] ~/projects/aau/virus_scanner/testbox $ clamscan new10bytes.com
new10bytes.com: OK
--- SCAN SUMMARY ---
Known viruses: 455125
Engine version: 0.94.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Time: 2.194 sec (0 m 2 s)

Thanks in advance,
Tom
Re: [Clamav-devel] clamAV scanning algorithm
See: http://www.eicar.org/anti_virus_test_file.htm

Specifically:

Any anti-virus product that supports the EICAR test file should detect it in any file providing that the file starts with the following 68 characters, and is exactly 68 bytes long

Best Regards,
Joseph Benden
http://www.ThrallingPenguin.com/

On Dec 3, 2008, at 5:41 PM, Thomasz Blaszczyk wrote:

> 2) I've downloaded Eicar Test Anti-Virus File and crated 10bytes file. (See logs below) Then I've appended Eicar to this file. Why clamscan doesn't find a signature in this file?
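
The rule quoted above ties detection to the start of the file, which would explain the "OK" result once 10 random bytes come first. The Python sketch below contrasts a start-anchored check with an anywhere-in-file check. SIG is a made-up four-byte stand-in rather than the EICAR string, and whether ClamAV anchors its EICAR signature this way is an assumption drawn from the scan result in the thread.

```python
def matches_at_start(data, signature):
    """True only when the data begins with the signature (offset 0)."""
    return data.startswith(signature)

def matches_anywhere(data, signature):
    """True when the signature occurs at any offset in the data."""
    return signature in data

SIG = bytes.fromhex("DEADBEEF")   # hypothetical stand-in pattern
clean_prefix = bytes(10)          # 10 filler bytes, like new10bytes.com
prefixed = clean_prefix + SIG     # signature pushed to offset 10
leading = SIG + clean_prefix      # signature at offset 0, file is longer

print(matches_at_start(prefixed, SIG), matches_anywhere(prefixed, SIG))
print(matches_at_start(leading, SIG))
```

The second case matches the later observation in the thread that a longer file is still detected as long as it starts with the test string.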
[Clamav-devel] clamAV scanning algorithm
Hi,

I am new to ClamAV and I am just wondering how files are scanned. Does it work like this?

1. The PE section is taken from the file to be scanned.
2. Its MD5 is calculated.
3. That MD5 is compared to all signatures in the ClamAV database.
4. If there is a match, a virus is found.

I have simplified this, but please let me know whether the above steps for scanning files are right.

Regards,
Tom
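
The steps in the question can be written down as a short Python sketch. It only illustrates the flow as asked: the thread does not confirm that ClamAV works this way, the replies in it discuss AC and BM pattern matching instead, and the sketch hashes the whole input rather than a PE section. md5_hex, lookup and the known_hashes dictionary are hypothetical.

```python
import hashlib

def md5_hex(data):
    """Return the MD5 digest of the data as a hex string."""
    return hashlib.md5(data).hexdigest()

def lookup(data, known_hashes):
    """Return the name stored for this data's MD5, or None if unknown."""
    return known_hashes.get(md5_hex(data))

# Hypothetical database: digest -> name, built from a known sample.
known_hashes = {md5_hex(b"sample payload"): "Test.Sample"}

print(lookup(b"sample payload", known_hashes))
print(lookup(b"sample payload!", known_hashes))
```

A hash lookup only recognises byte-identical input, since changing a single byte changes the digest; pattern signatures such as the hex strings discussed elsewhere in the thread do not have that restriction.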