Re: [Clamav-devel] Queries on signature database organization/loading
Hi there,

On Tue, 30 Dec 2008 Török Edwin wrote:

> On 2008-12-29 12:53, Babu.N wrote:
>
> > 3. When the signature database is updated, freshclam returns 0. Is
> > there a way to find whether main.cvd is updated or daily.cvd is
> > updated or both?
>
> Yes, you could parse freshclam's logs/stdout, it says one of
> "main.cvd is up to date", "main.cld is up to date", "main.cld updated",
> "main.cvd updated".
> Similarly for daily.cvd/cld.
>
> Or just use sigtool --info to find out the DB version, and compare it
> with the last one.

Check the DNS?

--
73,
Ged.
___
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net
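The log-parsing suggestion above can be sketched as follows. This is only an illustration: it assumes freshclam's output contains the status phrases exactly as quoted in this thread, and the function name `updated_databases` is made up for the example.

```python
import re

# Matches the status phrases quoted in the thread, e.g. "daily.cld updated".
# Assumption: the phrase appears verbatim somewhere on a line of the output.
STATUS_RE = re.compile(r"\b(main|daily)\.(cvd|cld) (is up to date|updated)\b")


def updated_databases(freshclam_output):
    """Return {database name: True if updated, False if already up to date}."""
    status = {}
    for line in freshclam_output.splitlines():
        match = STATUS_RE.search(line)
        if match:
            status[match.group(1)] = match.group(3) == "updated"
    return status


if __name__ == "__main__":
    sample = "main.cvd is up to date\ndaily.cld updated\n"
    print(updated_databases(sample))  # {'main': False, 'daily': True}
```

A database that does not appear in the output at all is simply missing from the result, so the caller can tell "not mentioned" apart from "up to date".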
Re: [Clamav-devel] Queries on signature database organization/loading
On 2008-12-30 13:44, Babu.N wrote:
> Hi Edwin,
>
> Thanks for the response.
>
> Please see inline..
>
> At 05:26 PM 12/29/2008, Török Edwin wrote:
>> On 2008-12-29 12:53, Babu.N wrote:
>>> Hi,
>>>
>>> I am developing SHIM layer for ClamAV to support Freescale pattern
>>> matching hardware. Could you please clarify a few queries:
>>>
>>> 1. Freescale has a pattern matching engine with 64k pattern capacity.
>>
>> How long can the patterns be? Does it support wildcards?
>> Does it support regular expressions?
>
> Yes.

There has to be a limit on the size of a regular expression, or else I
could upload a 2Gb regular expression into it ;)

>>> But ClamAV has approx 169000 signatures. This means hardware engine
>>> will not be able to accommodate all the signatures.
>>
>> What if you combine N patterns into a single regular expression
>> (hardware limits allowing).
>> If there is a match, then you use software to tell which of the N
>> patterns matched.
>
> After hardware reports a match in a combined
> regex, how can software distinguish which sub-regex actually matched?

By matching with a specialized trie for the candidate sub-regexes.

For example, let's assume you combine patterns 1, 74, and 192 into a
single regex for hardware matching. When the hardware reports a match,
in software you only need to try matching with a trie containing
signatures 1, 74, and 192, which should be very fast.

Keep in mind that in a real situation most files you scan are clean,
and you should get matches only when the file is infected. Of course
there are also the on-the-fly filetype signatures (html/pe/sfx), which
tend to match quite often.

But you already speed up the situation a lot if you are able to
determine in hardware that software only needs to match with a trie
that has 4-5 patterns. Of course those tries should be prebuilt.
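A minimal sketch of the two-stage idea described above, under loud assumptions: the signature IDs and patterns are invented for illustration, the "hardware" stage is simulated with one combined software regex per group, and the per-group candidate matcher is a plain list of compiled regexes rather than the prebuilt trie the message recommends.

```python
import re

# Hypothetical signatures (id -> pattern); not real ClamAV signatures.
SIGNATURES = {1: rb"EVIL\d+", 74: rb"BAD[a-z]{3}", 192: rb"TROJ-X", 200: rb"WORM"}

# Hypothetical grouping: each group would occupy one hardware table entry.
GROUPS = {0: [1, 74, 192], 1: [200]}

# Stage 1 stand-in for the hardware: one combined regex per group.
COMBINED = {
    gid: re.compile(b"|".join(b"(?:" + SIGNATURES[sid] + b")" for sid in sids))
    for gid, sids in GROUPS.items()
}

# Stage 2: small per-group matchers, built once up front.
CANDIDATES = {
    gid: [(sid, re.compile(SIGNATURES[sid])) for sid in sids]
    for gid, sids in GROUPS.items()
}


def scan(data):
    """Return the IDs of all signatures that match data."""
    hits = []
    for gid, combined in COMBINED.items():
        if combined.search(data):  # group-level hit: something in here matched
            # Only this group's few candidates are re-checked in software.
            hits.extend(sid for sid, rx in CANDIDATES[gid] if rx.search(data))
    return hits


if __name__ == "__main__":
    print(scan(b"header BADabc trailer"))  # [74]
    print(scan(b"a clean file"))           # []
```

For clean input no group fires, so the second stage never runs, which is the case the message expects to be the common one.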
Also, patterns that are part of logical signatures need special
treatment (you need to count how many times the sub-signatures matched).

> I have gone through the function reload_db. It is
> first freeing the existing signatures (cl_free) &
> then loading the new signatures? Which code path
> should I follow to understand that old signatures
> are not released till the last thread finishes its processing?

cl_engine_free only drops the reference count. The engine is actually
freed only when the refcount reaches zero; otherwise it isn't.

Best regards,
--Edwin
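The reference-counting behaviour described above can be illustrated with a small model. This is not ClamAV's code: the class `Engine` and its methods `addref` and `release` are hypothetical names chosen for the sketch, and "freeing" is represented by setting a flag.

```python
import threading


class Engine:
    """Toy model of a signature set shared between scanning threads."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1  # the reference held by the owner (e.g. the daemon)
        self.freed = False
        self._lock = threading.Lock()

    def addref(self):
        # A scanning thread takes a reference before it starts using the engine.
        with self._lock:
            self.refcount += 1

    def release(self):
        # Dropping a reference frees the data only when nobody is left.
        with self._lock:
            self.refcount -= 1
            if self.refcount == 0:
                self.freed = True  # stands in for releasing the signatures


if __name__ == "__main__":
    old = Engine("old signatures")
    old.addref()      # a scan is in progress
    old.release()     # reload: the owner drops its reference
    print(old.freed)  # False, the scan still holds a reference
    old.release()     # the scan finishes
    print(old.freed)  # True
```

In this model a reload can load the new signatures and drop the owner's reference to the old ones right away, while scans already in flight keep the old set alive until they finish.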
Re: [Clamav-devel] Queries on signature database organization/loading
Hi Edwin,

Thanks for the response.

Please see inline..

At 05:26 PM 12/29/2008, Török Edwin wrote:
>On 2008-12-29 12:53, Babu.N wrote:
> > Hi,
> >
> > I am developing SHIM layer for ClamAV to support Freescale pattern
> > matching hardware. Could you please clarify a few queries:
> >
> > 1. Freescale has a pattern matching engine with 64k pattern capacity.
>
>How long can the patterns be? Does it support wildcards?
>Does it support regular expressions?

Yes.

>Is it faster than a quad-core CPU?

We haven't yet taken performance numbers. But it is supposed to be so.

> > But ClamAV has approx 169000 signatures. This means hardware engine
> > will not be able to accommodate all the signatures.
>
>What if you combine N patterns into a single regular expression
>(hardware limits allowing).
>If there is a match, then you use software to tell which of the N
>patterns matched.

After hardware reports a match in a combined regex, how can software
distinguish which sub-regex actually matched?

> > So we plan to read
> > .db & .ndb files line by line & load as many possible signatures in
> > hardware pattern table & then let the remaining signatures into
> > software data structures.
>
>You can try loading type 0, and type 1 patterns into hardware, those are
>the most time consuming ones.
>
> > Queries:
> > - With the above logic, the signatures in daily.cvd always end
> > up in software data structures. Can we assume that daily.cvd file
> > contains the currently prevalent signatures? If so, does it improve
> > the performance if we store the daily.cvd signatures in hardware tables?
> > - Is main.cvd organized in such a fashion that prevalent
> > signatures are at the top? If not, the concern is that hardware scan
> > hit rate is not as optimal as possible.
>
>There is no particular ordering in the .cvd files. I think new
>signatures are just added to the bottom.
>If your hardware allows regular expressions, load those patterns which
>have a very short static subpattern (2,3,4 bytes).
>
> > 2. In clamd signature reloading process, does it always unload the
> > current signatures & then reload the fresh signatures? Even if only
> > daily.cvd is updated in the freshclam update?
>
>It loads the new signatures, and the old signatures are freed when the
>last thread that was using it
>finishes. It always loads all the databases.

I have gone through the function reload_db. It is first freeing the
existing signatures (cl_free) & then loading the new signatures? Which
code path should I follow to understand that old signatures are not
released till the last thread finishes its processing?

Thanks,
Babu.

> > 3. When the signature database is updated, freshclam returns 0. Is
> > there a way to find whether main.cvd is updated or daily.cvd is
> > updated or both?
>
>Yes, you could parse freshclam's logs/stdout, it says one of
>"main.cvd is up to date", "main.cld is up to date", "main.cld updated",
>"main.cvd updated"
>Similarly for daily.cvd/cld.
>
>Or just use sigtool --info to find out the DB version, and compare with
>last.
>
>Best regards,
>--Edwin
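One way to read the advice about short static subpatterns is as a ranking rule for filling the limited hardware table. The sketch below is an assumption-laden illustration, not ClamAV code: it handles only a simplified subset of the hex signature syntax (`??`, nibble wildcards, `*`, `{n-m}` gaps, and alternation brackets treated as breaks), and the names `longest_static_run` and `partition` are made up for the example.

```python
import re


def longest_static_run(hex_sig):
    """Length in bytes of the longest run of fixed bytes in a hex signature."""
    # Replace {n-m} gaps first so their digits are not mistaken for hex bytes.
    cleaned = re.sub(r"\{[^}]*\}", "*", hex_sig)
    best = run = 0
    # Tokens are byte-sized pairs (hex digits or '?') or single other characters.
    for token in re.findall(r"[0-9a-fA-F?]{2}|.", cleaned):
        if re.fullmatch(r"[0-9a-fA-F]{2}", token):
            run += 1
            best = max(best, run)
        else:
            run = 0  # a wildcard, gap, or bracket ends the static run
    return best


def partition(signatures, capacity):
    """Split (name, hex_sig) pairs into (hardware, software) lists.

    Signatures with the shortest static run are placed in hardware first,
    until the hypothetical table capacity is reached.
    """
    ranked = sorted(signatures, key=lambda sig: longest_static_run(sig[1]))
    return ranked[:capacity], ranked[capacity:]


if __name__ == "__main__":
    sigs = [("A", "aabbccdd"), ("B", "aabb*cc"), ("C", "aa??bb")]
    hardware, software = partition(sigs, capacity=2)
    print([name for name, _ in hardware])  # ['C', 'B']
    print([name for name, _ in software])  # ['A']
```

The capacity is left as a parameter because the thread only says "64k" without giving the exact number of table entries.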