Re: [Clamav-devel] Queries on signature database organization/loading

2008-12-30 Thread Babu.N
Hi Edwin,

Thanks for the response.

Please see inline.


At 05:26 PM 12/29/2008, Török Edwin wrote:
On 2008-12-29 12:53, Babu.N wrote:
  Hi,
 
  I am developing a shim layer for ClamAV to support Freescale pattern
  matching hardware. Could you please clarify a few queries:
 
  1. Freescale has a pattern matching engine with 64k pattern capacity.
 

How long can the patterns be? Does it support wildcards?
Does it support regular expressions?

Yes.


Is it faster than a quad-core CPU?

We haven't yet taken performance numbers. But it is supposed to be so.


  But ClamAV has approximately 169,000 signatures. This means the hardware
  engine will not be able to accommodate all the signatures.

What if you combine N patterns into a single regular expression
(hardware limits allowing)?
If there is a match, you then use software to tell which of the N
patterns matched.

After the hardware reports a match in a combined regex, how can the
software distinguish which sub-regex actually matched?
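
One possible approach to the question above, as a minimal sketch in Python rather than anything taken from ClamAV or the Freescale engine: keep each sub-pattern compiled separately in software and, when the combined expression fires, retry every sub-pattern at the reported position. The signature names and byte patterns below are made up for illustration, and the sketch assumes the hardware reports the start offset of the match (it may well report something else).

```python
import re

# Hypothetical sub-patterns standing in for N signatures merged into one
# hardware slot. Names and byte patterns are invented for this example.
SUBPATTERNS = {
    "Sig.A": rb"\x4d\x5a.{2}\x90\x00",
    "Sig.B": rb"EICAR-[A-Z]+",
    "Sig.C": rb"\xde\xad\xbe\xef",
}

# What a single hardware slot would hold: one alternation of all sub-patterns.
# re.search on this object plays the role of the hardware in this sketch.
COMBINED = re.compile(
    b"|".join(b"(?:" + p + b")" for p in SUBPATTERNS.values()), re.DOTALL
)

# Software side: every sub-pattern compiled on its own for the recheck.
COMPILED = {name: re.compile(p, re.DOTALL) for name, p in SUBPATTERNS.items()}


def identify(buffer, hw_offset):
    """Return the names of all sub-patterns that match at hw_offset."""
    return [name for name, rx in COMPILED.items() if rx.match(buffer, hw_offset)]


if __name__ == "__main__":
    data = b"xxxx\xde\xad\xbe\xefyyy"
    hit = COMBINED.search(data)          # simulated hardware report
    if hit:
        print(identify(data, hit.start()))
```

The recheck costs at most N software matches per hardware hit, so it only pays off if hardware hits are rare compared with the amount of data scanned.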

  So we plan to read the .db and .ndb files line by line, load as many
  signatures as possible into the hardware pattern table, and then put
  the remaining signatures into software data structures.
 

You can try loading type 0 and type 1 patterns into hardware; those are
the most time-consuming ones.
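
A rough sketch of such a split, in Python. It rests on several assumptions that are not confirmed in this thread: that "type 0" and "type 1" refer to the target-type field of an .ndb line, that .ndb lines are colon-separated with the target type as the second field, and that "64k" means 65536 slots. The function name partition_ndb is hypothetical.

```python
HW_CAPACITY = 64 * 1024  # assumption: "64k pattern capacity" read as 65536 slots


def partition_ndb(lines, capacity=HW_CAPACITY, preferred_targets=("0", "1")):
    """Split .ndb-style lines into (hardware, software) lists.

    Lines whose target-type field is in preferred_targets go to hardware
    until the table is full; everything else stays in software.
    """
    hardware, software = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(":")
        if len(fields) < 4:
            # Unexpected layout: leave it to the software matcher.
            software.append(line)
            continue
        target = fields[1]  # assumed position of the target-type field
        if target in preferred_targets and len(hardware) < capacity:
            hardware.append(line)
        else:
            software.append(line)
    return hardware, software


if __name__ == "__main__":
    sample = ["A:0:*:aabb", "B:4:*:ccdd", "C:1:0:eeff", "D:0:*:0011"]
    hw, sw = partition_ndb(sample, capacity=2)
    print(hw, sw)
```

A real shim would also have to check each pattern against the hardware's limits on length and wildcards before counting it toward the capacity.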

  Queries:
   - With the above logic, the signatures in daily.cvd always end
  up in software data structures. Can we assume that the daily.cvd file
  contains the currently prevalent signatures? If so, would it improve
  performance to store the daily.cvd signatures in the hardware tables?
   - Is main.cvd organized so that prevalent signatures are at the
  top? If not, the concern is that the hardware scan hit rate will not
  be as high as it could be.
 

There is no particular ordering in the .cvd files. I think new
signatures are just added to the bottom.
If your hardware allows regular expressions, load those patterns which
have a very short static subpattern (2, 3, or 4 bytes).

  2. In the clamd signature reloading process, does it always unload the
  current signatures and then reload the fresh signatures? Even if only
  daily.cvd is updated in the freshclam update?
 

It loads the new signatures, and the old signatures are freed when the
last thread that was using them finishes. It always loads all the
databases.

I have gone through the function reload_db. As far as I can see, it
first frees the existing signatures (cl_free) and then loads the new
ones. Which code path should I follow to understand that the old
signatures are not released until the last thread finishes its
processing?
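
The behaviour Edwin describes (old signatures freed only when the last thread using them finishes) is what reference counting gives you. The Python sketch below illustrates that idea only; the class and method names are hypothetical and none of it is taken from the clamd source. Whether cl_free merely drops a reference instead of freeing unconditionally is an assumption to check against the code.

```python
import threading


class Engine:
    """Stand-in for a loaded signature set; not ClamAV's real structure."""

    def __init__(self, version):
        self.version = version
        self.refcount = 1      # reference held by the "current engine" slot
        self.freed = False


class EngineHolder:
    """Hands out the current engine and swaps it on reload."""

    def __init__(self, engine):
        self._lock = threading.Lock()
        self._current = engine

    def acquire(self):
        # A scanning thread takes a reference before it starts a scan.
        with self._lock:
            self._current.refcount += 1
            return self._current

    def release(self, engine):
        # The scanning thread drops its reference when the scan is done.
        with self._lock:
            self._drop(engine)

    def reload(self, new_engine):
        # Install the new engine, then drop only the slot's own reference
        # to the old one; running scans keep the old engine alive.
        with self._lock:
            old, self._current = self._current, new_engine
            self._drop(old)

    @staticmethod
    def _drop(engine):
        engine.refcount -= 1
        if engine.refcount == 0:
            engine.freed = True   # real code would release memory here


if __name__ == "__main__":
    holder = EngineHolder(Engine(1))
    in_use = holder.acquire()      # a scan starts on version 1
    holder.reload(Engine(2))       # reload happens mid-scan
    print(in_use.freed)            # old engine still alive
    holder.release(in_use)         # scan ends
    print(in_use.freed)            # now it is freed
```

In this scheme a "free" call in the reload path and a deferred release are not contradictory: the call only decrements the counter, and the memory goes away when the counter reaches zero.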


Thanks,
Babu.


  3. When the signature database is updated, freshclam returns 0. Is
  there a way to find out whether main.cvd is updated or daily.cvd is
  updated or both?
 

Yes, you could parse freshclam's logs/stdout; it says one of:
main.cvd is up to date, main.cld is up to date, main.cld updated,
main.cvd updated.
Similarly for daily.cvd/cld.

Or just use sigtool --info to find out the DB version, and compare with
last.
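
The version comparison could look roughly like the Python sketch below. It assumes that the sigtool --info output contains a line of the form "Version: <number>", which should be verified against the installed sigtool; the function names and the sample text are hypothetical.

```python
import re


def parse_version(sigtool_info_text):
    """Extract the database version from sigtool --info output.

    Assumes a line such as "Version: 8817"; returns None if none is found.
    """
    m = re.search(r"^Version:\s*(\d+)\s*$", sigtool_info_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def changed_databases(previous, current):
    """Return the names whose version differs from the recorded one."""
    return sorted(
        name for name, ver in current.items() if previous.get(name) != ver
    )


if __name__ == "__main__":
    # Illustrative text only, not real sigtool output.
    sample = "File: daily.cvd\nVersion: 8817\nSignatures: 100\n"
    before = {"main": 49, "daily": 8816}
    after = {"main": 49, "daily": parse_version(sample)}
    print(changed_databases(before, after))
```

Storing the last seen versions (for example in a small state file) after each freshclam run is enough to tell whether main, daily, or both changed.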

Best regards,
--Edwin
___
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net



Re: [Clamav-devel] Queries on signature database organization/loading

2008-12-30 Thread G.W. Haywood
Hi there,

On Tue, 30 Dec 2008 Török Edwin wrote:

 On 2008-12-29 12:53, Babu.N wrote:

  3. When the signature database is updated, freshclam returns 0. Is
  there a way to find out whether main.cvd is updated or daily.cvd is
  updated or both?
 

 Yes, you could parse freshclam's logs/stdout; it says one of:
 main.cvd is up to date, main.cld is up to date, main.cld updated,
 main.cvd updated.
 Similarly for daily.cvd/cld.

 Or just use sigtool --info to find out the DB version, and compare with
 last.

Check the DNS?

--

73,
Ged.


Re: [Clamav-devel] Queries on signature database organization/loading

2008-12-29 Thread Török Edwin
On 2008-12-29 12:53, Babu.N wrote:
 Hi,

 I am developing a shim layer for ClamAV to support Freescale pattern
 matching hardware. Could you please clarify a few queries:

 1. Freescale has a pattern matching engine with 64k pattern capacity. 
   

How long can the patterns be? Does it support wildcards?
Does it support regular expressions?

Is it faster than a quad-core CPU?

 But ClamAV has approximately 169,000 signatures. This means the hardware
 engine will not be able to accommodate all the signatures.

What if you combine N patterns into a single regular expression
(hardware limits allowing)?
If there is a match, you then use software to tell which of the N
patterns matched.

 So we plan to read the .db and .ndb files line by line, load as many
 signatures as possible into the hardware pattern table, and then put
 the remaining signatures into software data structures.
   

You can try loading type 0 and type 1 patterns into hardware; those are
the most time-consuming ones.

 Queries:
  - With the above logic, the signatures in daily.cvd always end
 up in software data structures. Can we assume that the daily.cvd file
 contains the currently prevalent signatures? If so, would it improve
 performance to store the daily.cvd signatures in the hardware tables?
  - Is main.cvd organized so that prevalent signatures are at the
 top? If not, the concern is that the hardware scan hit rate will not
 be as high as it could be.
   

There is no particular ordering in the .cvd files. I think new
signatures are just added to the bottom.
If your hardware allows regular expressions, load those patterns which
have a very short static subpattern (2, 3, or 4 bytes).

 2. In the clamd signature reloading process, does it always unload the
 current signatures and then reload the fresh signatures? Even if only
 daily.cvd is updated in the freshclam update?
   

It loads the new signatures, and the old signatures are freed when the
last thread that was using them finishes. It always loads all the
databases.

 3. When the signature database is updated, freshclam returns 0. Is
 there a way to find out whether main.cvd is updated or daily.cvd is
 updated or both?
   

Yes, you could parse freshclam's logs/stdout; it says one of:
main.cvd is up to date, main.cld is up to date, main.cld updated,
main.cvd updated.
Similarly for daily.cvd/cld.

Or just use sigtool --info to find out the DB version, and compare with
last.

Best regards,
--Edwin