Re: [Clamav-devel] Queries on signature database organization/loading

2008-12-30 Thread G.W. Haywood
Hi there,

On Tue, 30 Dec 2008 Török Edwin wrote:

> On 2008-12-29 12:53, Babu.N wrote:
>
> > 3. When the signature database is updated, freshclam returns 0. Is
> > there a way to find whether main.cvd is updated or daily.cvd is
> > updated or both ?
> >
>
> Yes, you could parse freshclam's logs/stdout, it says one of
> "main.cvd is up to date", "main.cld is up to date", "main.cld updated",
> "main.cvd updated"
> Similarly for daily.cvd/cld.
>
> Or just use sigtool --info to find out the DB version, and compare with
> last.
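
The log-parsing suggestion above can be sketched roughly as follows. This is
only a minimal sketch: it relies solely on the status phrases quoted above
("... is up to date", "... updated"), and the function name and the sample
text are made up for illustration. Real freshclam output may carry extra
details on each line, which the pattern simply ignores.

```python
import re

# Matches the status phrases quoted above, e.g. "daily.cld updated".
# Anything before or after the phrase on the same line is ignored.
STATUS_RE = re.compile(r"\b(main|daily)\.(cvd|cld) (is up to date|updated)\b")

def updated_databases(log_text):
    """Return the set of database names ("main", "daily") reported as updated."""
    updated = set()
    for line in log_text.splitlines():
        m = STATUS_RE.search(line)
        if m and m.group(3) == "updated":
            updated.add(m.group(1))
    return updated

# Hypothetical freshclam output, built only from the phrases quoted above.
sample = "main.cvd is up to date\ndaily.cld updated\n"
print(sorted(updated_databases(sample)))
```

With the sample text this reports only "daily" as updated; an empty result
means neither database changed.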

Check the DNS?
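
A rough sketch of what that could look like, assuming this refers to the
colon-separated version TXT record that freshclam itself queries
(current.cvd.clamav.net, e.g. via "host -t txt current.cvd.clamav.net").
The field layout used here (engine version first, then main and daily
database versions) is an assumption, and the record values and function
names are made up for illustration.

```python
def parse_version_record(txt):
    """Split a colon-separated version record into the fields of interest."""
    fields = txt.strip().strip('"').split(":")
    # Assumed layout: engine version, main DB version, daily DB version, ...
    return {"engine": fields[0], "main": int(fields[1]), "daily": int(fields[2])}

def changed_databases(previous, current):
    """Names of the databases whose version differs between two records."""
    return sorted(db for db in ("main", "daily") if current[db] != previous[db])

# Made-up record values, for illustration only.
before = parse_version_record("0.94.2:49:8800:1230500000:1")
after = parse_version_record("0.94.2:49:8816:1230639542:1")
print(changed_databases(before, after))
```

Comparing the record fetched before and after a freshclam run tells which
database version moved, without touching the log files.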

--

73,
Ged.
___
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net


Re: [Clamav-devel] Queries on signature database organization/loading

2008-12-30 Thread Török Edwin
On 2008-12-30 13:44, Babu.N wrote:
> Hi Edwin,
>
> Thanks for the response.
>
> Please see inline..
>
>
> At 05:26 PM 12/29/2008, Török Edwin wrote:
>   
>> On 2008-12-29 12:53, Babu.N wrote:
>> 
>>> Hi,
>>>
>>> I am developing SHIM layer for ClamAV to support Freescale pattern
>>> matching hardware. Could you please clarify a few queries:
>>>
>>> 1. Freescale has a pattern matching engine with 64k pattern capacity.
>>>
>>>   
>> How long can the patterns be? Does it support wildcards?
>> Does it support regular expressions?
>> 
>
> Yes.
>
>   

There has to be a limit on the size of a regular expression, or else I
could upload a 2 GB regular expression into it ;)


>>> But ClamAV has approx 169000 signatures. This means hardware engine
>>> will not be able to accommodate all the signatures.
>>>   
>> What if you combine N patterns into a single regular expression
>> (hardware limits allowing).
>> If there is a match, then you use software to tell which of the N
>> patterns matched.
>> 
>
> After hardware reports a match in a combined 
> regex, how can software distinguish which sub-regex actually matched ?
>   

By matching with a specialized trie for the candidate sub-regexes.
For example, let's assume you combine patterns 1, 74, and 192 into a
single regex for hardware matching.
When the hardware reports a match, in software you only need to try
matching with a trie containing signatures 1, 74, and 192, which should
be very fast.

Keep in mind that in a real situation most files you scan are clean, and
you should get matches only when the file is infected.
Of course there are also the on-the-fly filetype signatures
(html/pe/sfx), which tend to match quite often.

But you already speed things up a lot if you are able to determine in
hardware that software only needs to match with a trie that has 4-5
patterns.
Of course those tries should be prebuilt.
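
The two-stage idea above can be sketched as follows. This is only an
illustration under stated assumptions: the hardware is stood in for by
Python's re module, the signatures are plain literal byte strings (no
wildcards), and the class names and sample patterns are hypothetical.
Signature ids 1, 74 and 192 are reused from the example above.

```python
import re

class Trie:
    """Byte-wise trie over literal patterns; end nodes store a signature id."""
    def __init__(self):
        self.root = {}

    def add(self, pattern, sig_id):
        node = self.root
        for byte in pattern:
            node = node.setdefault(byte, {})
        node[None] = sig_id  # the None key marks the end of a pattern

    def find(self, data):
        """Return the ids of all patterns occurring anywhere in data."""
        hits = set()
        for start in range(len(data)):
            node = self.root
            for i in range(start, len(data)):
                node = node.get(data[i])
                if node is None:
                    break
                if None in node:
                    hits.add(node[None])
        return hits

class Group:
    """One combined regex (the 'hardware' side) plus a prebuilt trie."""
    def __init__(self, signatures):
        # signatures: dict mapping signature id -> literal bytes pattern
        alternation = b"|".join(re.escape(p) for p in signatures.values())
        self.combined = re.compile(alternation)
        self.trie = Trie()
        for sig_id, pattern in signatures.items():
            self.trie.add(pattern, sig_id)

    def scan(self, data):
        if not self.combined.search(data):   # stand-in for the hardware verdict
            return set()                     # clean: software does no work
        return self.trie.find(data)          # only this group's candidates

group = Group({1: b"EVIL", 74: b"\xde\xad\xbe\xef", 192: b"X5O!"})
print(group.scan(b"clean data"))
print(group.scan(b"..\xde\xad\xbe\xef.."))
```

The trie is built once, when the group is created, so a hardware hit only
costs a walk over a handful of candidate patterns.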

Also patterns that are part of logical signatures need special treatment
(you need to count how many times the sub-signatures matched).
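
A minimal sketch of that bookkeeping, assuming literal sub-signatures and
non-overlapping counting. The rule here is an ordinary Python callable used
for illustration, not ClamAV's logical-signature syntax, and all names are
hypothetical.

```python
def count_matches(data, pattern):
    """Count non-overlapping occurrences of a literal sub-signature."""
    return data.count(pattern)

def evaluate(data, subsigs, condition):
    """Count every sub-signature, then apply the rule to the counts."""
    counts = [count_matches(data, p) for p in subsigs]
    return condition(counts), counts

subsigs = [b"AAA", b"BBB"]

# Hypothetical rule: sub-signature 0 at least twice AND sub-signature 1 once.
def rule(counts):
    return counts[0] >= 2 and counts[1] >= 1

print(evaluate(b"AAA..BBB..AAA", subsigs, rule))
print(evaluate(b"AAA..BBB", subsigs, rule))
```

The point is that a single yes/no hit from hardware is not enough for such
signatures; the per-sub-signature counts have to reach the software side.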

> I have gone through the function reload_db. It is
> first freeing the existing signatures (cl_free) &
> then loading the new signatures. Which code path
> should I follow to understand that old signatures
> are not released till the last thread finishes its processing?
>   

cl_engine_free only drops the engine's reference count. The engine is
actually freed only when the refcount reaches zero; as long as a scanning
thread still holds a reference, it stays in memory.
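
The reference-counting pattern described here can be illustrated like this.
It is a simplified sketch, not libclamav code: the Engine class and its
method names are hypothetical, and "freed" is just a flag standing in for
releasing the signature memory.

```python
import threading

class Engine:
    """Toy reference-counted engine; created with one reference (the owner's)."""
    def __init__(self, name):
        self.name = name
        self.refcount = 1
        self.freed = False
        self._lock = threading.Lock()

    def addref(self):
        """Taken by a scan thread before it starts using the engine."""
        with self._lock:
            self.refcount += 1

    def free(self):
        """Drop one reference; release the data only when none remain."""
        with self._lock:
            self.refcount -= 1
            if self.refcount == 0:
                self.freed = True  # real code would release memory here

old = Engine("old signatures")
old.addref()                      # a scan thread starts using the engine
old.free()                        # reload: the owner drops its reference
print(old.freed)                  # still in use by the scan thread
new = Engine("new signatures")    # the reload then loads a fresh engine
old.free()                        # the scan thread finishes
print(old.freed)
```

The first print shows False and the second True: the "free" during reload
does not release anything while a scan is still running on the old engine.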

Best regards,
--Edwin


Re: [Clamav-devel] Queries on signature database organization/loading

2008-12-30 Thread Babu.N
Hi Edwin,

Thanks for the response.

Please see inline..


At 05:26 PM 12/29/2008, Török Edwin wrote:
>On 2008-12-29 12:53, Babu.N wrote:
> > Hi,
> >
> > I am developing SHIM layer for ClamAV to support Freescale pattern
> > matching hardware. Could you please clarify a few queries:
> >
> > 1. Freescale has a pattern matching engine with 64k pattern capacity.
> >
>
>How long can the patterns be? Does it support wildcards?
>Does it support regular expressions?

Yes.


>Is it faster than a quad-core CPU?

We haven't measured performance yet, but it is expected to be.


> > But ClamAV has approx 169000 signatures. This means hardware engine
> > will not be able to accommodate all the signatures.
>
>What if you combine N patterns into a single regular expression
>(hardware limits allowing).
>If there is a match, then you use software to tell which of the N
>patterns matched.

After hardware reports a match in a combined 
regex, how can software distinguish which sub-regex actually matched ?

> > So we plan to read
> > .db & .ndb files line by line & load as many possible signatures in
> > hardware pattern table & then let the remaining signatures into
> > software data structures.
> >
>
>You can try loading type 0, and type 1 patterns into hardware, those are
>the most time consuming ones.
>
> > Queries:
> >  - With the above logic, the signatures in daily.cvd always end
> > up in software data structures. Can we assume that daily.cvd file
> > contains the currently prevalent signatures ? If so, does it improve
> > the performance if we store the daily.cvd signatures in hardware tables ?
> >  - Is main.cvd organized in such a fashion that prevalent
> > signatures are at the top ? If not, the concern is that hardware scan
> > hit rate is not as optimal as possible.
> >
>
>There is no particular ordering in the .cvd files. I think new
>signatures are just added to the bottom.
>If your hardware allows regular expressions, load those patterns which
>have a very short static subpattern  (2,3,4 bytes).
>
> > 2. In clamd signature reloading process, does it always unload the
> > current signatures & then reload the fresh signatures ? Even if only
> > daily.cvd is updated in the freshclam update ?
> >
>
>It loads the new signatures, and the old signatures are freed when the
>last thread that was using it
>finishes. It always loads all the databases.

I have gone through the function reload_db. It is
first freeing the existing signatures (cl_free) &
then loading the new signatures. Which code path
should I follow to understand that old signatures
are not released till the last thread finishes its processing?


Thanks,
Babu.


> > 3. When the signature database is updated, freshclam returns 0. Is
> > there a way to find whether main.cvd is updated or daily.cvd is
> > updated or both ?
> >
>
>Yes, you could parse freshclam's logs/stdout, it says one of
>"main.cvd is up to date", "main.cld is up to date", "main.cld updated",
>"main.cvd updated"
>Similarly for daily.cvd/cld.
>
>Or just use sigtool --info to find out the DB version, and compare with
>last.
>
>Best regards,
>--Edwin