Re: [Clamav-devel] build debugging ex1.c

2008-12-16 Thread Török Edwin
On 2008-12-16 01:13, Thomasz Blaszczyk wrote:
> Hello,
>
> I just reviewed a few multi-pattern string scanning algorithms,
> and there are many multi-pattern variants of Boyer-Moore.
> I am curious whether the one implemented in ClamAV is Boyer-Moore-Horspool,
> the one from the authors of GLIMPSE, Set-wise Boyer-Moore, or
> AC_BM as proposed by Silicon Defense?
> Any hints?
>

It is not AC_BM.
Hint: it uses a hash.
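
As an illustration of what "uses a hash" can mean for a multi-pattern
Boyer-Moore, here is a minimal sketch in the Wu-Manber style: the last three
bytes of the scan window are hashed into a shift table, and patterns are only
compared when the shift is zero. This is not the code from matcher-bm.c; the
hash constants, the table layout, and the names build_shift and scan are
assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK 3                                   /* bytes hashed per block */
#define HASH(a, b, c) (211u * (a) + 37u * (b) + (c))   /* illustrative hash */
#define TABLE_SIZE (HASH(255u, 255u, 255u) + 1u)

static unsigned char shift_table[TABLE_SIZE];

/* Build the shift table from the first `minlen` bytes of every pattern.
 * Requires BLOCK <= minlen <= 255 and every pattern at least minlen long. */
static void build_shift(const char **pats, size_t npats, size_t minlen)
{
    /* Default: a block seen in no pattern allows the largest safe jump. */
    memset(shift_table, (int)(minlen - BLOCK + 1), sizeof(shift_table));
    for (size_t p = 0; p < npats; p++) {
        const unsigned char *s = (const unsigned char *)pats[p];
        for (size_t j = 0; j + BLOCK <= minlen; j++) {
            unsigned h = HASH(s[j], s[j + 1], s[j + 2]);
            /* Distance from this block to the end of the minlen prefix. */
            unsigned char dist = (unsigned char)(minlen - BLOCK - j);
            if (dist < shift_table[h])
                shift_table[h] = dist;   /* collisions keep the smaller shift */
        }
    }
}

/* Return the index of the pattern found at the earliest position in text,
 * or -1 if no pattern occurs. */
static int scan(const char *text, size_t len,
                const char **pats, size_t npats, size_t minlen)
{
    const unsigned char *t = (const unsigned char *)text;
    size_t i = minlen - 1;                       /* last byte of the window */
    while (i < len) {
        unsigned char s = shift_table[HASH(t[i - 2], t[i - 1], t[i])];
        if (s == 0) {
            /* Candidate: verify every pattern at the window start. */
            size_t start = i - (minlen - 1);
            for (size_t p = 0; p < npats; p++) {
                size_t plen = strlen(pats[p]);
                if (start + plen <= len &&
                    memcmp(t + start, pats[p], plen) == 0)
                    return (int)p;
            }
            s = 1;
        }
        i += s;
    }
    return -1;
}
```

With the patterns "virus", "malware" and "trojan" and minlen 5, calling
build_shift() and then scan() on a buffer that contains "trojan" returns
index 2. A real matcher would keep a per-hash list of candidate patterns
instead of comparing all of them.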


   
>> You can't get a BMOnly, because some signatures *require* the AC
>> matcher, such as any signature containing wildcards.
>
> OK, so all signatures without wildcards (static signatures) are handled by
> the BM multi-pattern matcher,
> and the ones with wildcards are handled by AC.

Read cli_parse_add(). Yes, any of the wildcards listed in signatures.pdf
causes the signature to be loaded in the AC trie.

> Is that the rule? Or are there
> other factors as well?
   

But signatures for certain filetypes are always loaded in the AC trie
even if they are static, see cli_mtargets.
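
The two rules above can be summarised in a small sketch. This is not the
logic of cli_parse_add() itself; choose_matcher, the enum, and the exact set
of wildcard characters checked are assumptions for illustration (taken from
the wildcard syntax as I remember it from signatures.pdf).

```c
#include <string.h>

enum matcher { MATCHER_BM, MATCHER_AC };

/* Hypothetical helper: decide which matcher a hex signature is loaded into.
 * `target_ac_only` is nonzero when the signature's target type always uses
 * the AC trie, even for static signatures. */
static enum matcher choose_matcher(const char *hexsig, int target_ac_only)
{
    if (target_ac_only)
        return MATCHER_AC;
    /* Assumed wildcard syntax: "??" or nibble "a?", "*", "{n-m}", "(aa|bb)". */
    if (strpbrk(hexsig, "?*{(") != NULL)
        return MATCHER_AC;
    return MATCHER_BM;                /* plain static hex string */
}
```

For example, choose_matcher("deadbeef", 0) picks BM, while
choose_matcher("dead??beef", 0) and choose_matcher("deadbeef", 1) both pick
AC.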

Have a look at the --debug output, it tells you how many signatures have
been loaded in which trie, and for which filetype.

> How much would BM slow down the scanner for signatures with wildcards? Has
> anyone performed such an analysis?

I don't think wildcards were ever implemented in matcher-bm.c, so I
don't know.
BM is good for longer signatures, and it uses less memory than AC.
However, if you switch ClamAV to use only AC, you'll notice a significant
performance improvement, at the expense of increased memory usage for
the DB.
Since BM is slower, implementing wildcard support in it wouldn't provide any
benefit.

P.S.: If you're running on multicore you can use multiscan to use all of
your cores: clamdscan -m

Some more things to watch out for when benchmarking:
- performance depends very much on what kind of data you scan
(executables, mails, small files, large files)
- there are scalability issues with the Linux kernel (google for
mmap_sem and ClamAV)
- you should have fast disks, so that you're sure you're benchmarking
ClamAV and not your I/O system

Best regards,
--Edwin
___
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net


Re: [Clamav-devel] build debugging ex1.c

2008-12-09 Thread Thomasz Blaszczyk
And there is also 'groot'.

Tom

On Tue, Dec 9, 2008 at 4:51 PM, Thomasz Blaszczyk [EMAIL PROTECTED] wrote:
> Thank you for the answer,
>
> I have another question. I cannot figure out the meaning of ftonly and troot.
> Can I get some explanation of these two variables?
>
> They are used in matcher.c [code snipped]:
>
>     if(!ftonly && (ret = cli_ac_initdata(&gdata, groot->ac_partsigs,
>             groot->ac_lsigs, AC_DEFAULT_TRACKLEN)))
>         return ret;
>
>     if(troot) {
>         if((ret = cli_ac_initdata(&tdata, troot->ac_partsigs,
>                 troot->ac_lsigs, AC_DEFAULT_TRACKLEN)))
>             return ret;
>     }
>
> Thanks in advance.
> Tom



Re: [Clamav-devel] build debugging ex1.c

2008-12-09 Thread Thomasz Blaszczyk
Thank you for the fast reply,

Sorry for bothering you again. I am missing something in this huge project.
I cannot understand why both functions, cli_bm_scanbuff and
cli_ac_scanbuff, are called in one cli_scandesc() function call.
I have just one signature in the database, so to me it seemed obvious that
the file would be scanned once, using either AC or BM. But both algorithms
are used. Have a look below:

Here is the output:
---output---
groot->maxpatlen: 24

troot->ac_only IN TROOT!!!1
cli_ac_scanbuff_function_callroot->ac_root6488480
RET IN TROOT!!!0

groot->ac_only IN GROOT!!!0
cli_bm_scanbuff_function_callroot6346288
RET IN GROOT!!!1
---end-output---
from this code:
---code---
if(troot) {
    printf("\ntroot->ac_only IN TROOT!!!%d \n", troot->ac_only);
    if(troot->ac_only || (ret = cli_bm_scanbuff(upt, length,
            ctx->virname, troot, offset, ftype, desc)) != CL_VIRUS)
        ret = cli_ac_scanbuff(upt, length, ctx->virname, NULL, NULL,
                troot, &tdata, offset, ftype, desc, ftoffset, acmode, NULL);
    printf("\nRET IN TROOT!!!%d \n", ret);
    if(ret == CL_VIRUS) {
        free(buffer);
        if(!ftonly)
            cli_ac_freedata(&gdata);
        --cut---
        else
            return CL_VIRUS;
    }
}

if(!ftonly) {
    printf("\ngroot->ac_only IN GROOT!!!%d \n", groot->ac_only);
    if(groot->ac_only || (ret = cli_bm_scanbuff(upt, length,
            ctx->virname, groot, offset, ftype, desc)) != CL_VIRUS)
        ret = cli_ac_scanbuff(upt, length, ctx->virname, NULL, NULL,
                groot, &gdata, offset, ftype, desc, ftoffset, acmode, NULL);
    printf("\nRET IN GROOT!!!%d \n", ret);
    if(ret == CL_VIRUS) {
        free(buffer);
        cli_ac_freedata(&gdata);
        --cut---

---end-code---

Maybe there is something magic about groot and troot, but they are just
pointers to struct cli_matcher.

struct cli_matcher *groot = NULL, *troot = NULL;


struct cli_matcher {
    /* Extended Boyer-Moore */
    uint8_t *bm_shift;
    struct cli_bm_patt **bm_suffix;
    struct hashset md5_sizes_hs;
    uint32_t *soff, soff_len; /* for PE section sigs */
    uint32_t bm_patterns;

    /* Extended Aho-Corasick */
    uint32_t ac_partsigs, ac_nodes, ac_patterns, ac_lsigs;
    struct cli_ac_lsig **ac_lsigtable;
    struct cli_ac_node *ac_root, **ac_nodetable;
    struct cli_ac_patt **ac_pattable;
    uint8_t ac_mindepth, ac_maxdepth;

    uint16_t maxpatlen;
    uint8_t ac_only;
};

Am I missing something?
Best Regards,
Tom


On Tue, Dec 9, 2008 at 5:00 PM, Török Edwin [EMAIL PROTECTED] wrote:
> On 2008-12-09 18:51, Thomasz Blaszczyk wrote:
>> Thank you for the answer,
>>
>> I have another question. I cannot figure out the meaning of ftonly and troot.
>> Can I get some explanation of these two variables?
>>
>> They are used in matcher.c [code snipped]:
>>
>>     if(!ftonly && (ret = cli_ac_initdata(&gdata, groot->ac_partsigs,
>>             groot->ac_lsigs, AC_DEFAULT_TRACKLEN)))
>>         return ret;
>
> ft stands for filetype.
>
>>     if(troot) {
>>         if((ret = cli_ac_initdata(&tdata, troot->ac_partsigs,
>>                 troot->ac_lsigs, AC_DEFAULT_TRACKLEN)))
>>             return ret;
>>     }
>
> Look at signatures.pdf again: in the .ndb format each pattern has a
> TargetType field, hence a different trie is used for each type.
>
> As for groot, there is a comment explaining what it is:
>     groot = ctx->engine->root[0]; /* generic signatures */
>
> Best regards,
> --Edwin
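
To make the quoted explanation concrete, here is a minimal sketch of how a
generic root and a per-target root could be picked from an array indexed by
target type. Only the "root[0] is generic" part comes from the comment
quoted above; struct matcher, struct engine, NUM_TARGETS and pick_roots are
stand-ins invented for this example, not the real ClamAV types.

```c
#include <stddef.h>

#define NUM_TARGETS 4                 /* illustrative; not ClamAV's real count */

struct matcher { int ac_only; };      /* stand-in for struct cli_matcher */

struct engine { struct matcher *root[NUM_TARGETS]; };

/* Pick the generic root (index 0) and, if one is loaded for this target
 * type, the type-specific root. Target 0 means "no specific type". */
static void pick_roots(const struct engine *e, int target,
                       struct matcher **groot, struct matcher **troot)
{
    *groot = e->root[0];
    *troot = (target > 0 && target < NUM_TARGETS) ? e->root[target] : NULL;
}
```

With such a layout, troot is NULL whenever no signatures were loaded for the
file's target type, which is why the quoted code checks if(troot) before
using it.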


Re: [Clamav-devel] build debugging ex1.c

2008-12-09 Thread Thomasz Blaszczyk
Another thing,

If I force troot->ac_only=0

if(troot) {
    troot->ac_only = 0;
    printf("\ntroot->ac_only IN TROOT!!!%d \n", troot->ac_only);
    if(troot->ac_only || (ret = cli_bm_scanbuff(upt, length,
            ctx->virname, troot, offset, ftype, desc)) != CL_VIRUS)
        ret = cli_ac_scanbuff(upt, length, ctx->virname, NULL, NULL,
                troot, &tdata, offset, ftype, desc, ftoffset, acmode, NULL);
    printf("\nRET IN TROOT!!!%d \n", ret);
    if(ret == CL_VIRUS) {
        --cut---

It doesn't change anything; I get the same results as with troot->ac_only=1:

---output---
troot->ac_only IN TROOT!!!0
cli_ac_scanbuffroot->ac_root6488480
RET IN TROOT!!!0

groot->ac_only IN GROOT!!!0
cli_bm_scanbuffroot6346288
RET IN GROOT!!!1

Inside cli_scanraw function After inst. RET: 1 |
../test/vir/tinyVirus: TinyVirus.UNOFFICIAL FOUND
cl_scandesc function has been called. RET:1
---Extended SCAN SUMMARY ---
Known viruses: 1
Engine version: 0.94.1
Scanned directories: 0
Scanned files: 1
Infected files: 1
Aco-Corasick: ENABLEBoyer-Moore: DISABLEData scanned: 0.00 MB
Time: 0.004 sec (0 m 0 s)
--end-output--


If I force groot->ac_only to 1, I get a segmentation fault (the segfault
occurs in another function, cli_scanraw).

---code---
if(!ftonly) {
    groot->ac_only = 1;
    printf("\ngroot->ac_only IN GROOT!!!%d \n", groot->ac_only);
    if(groot->ac_only || (ret = cli_bm_scanbuff(upt, length,
            ctx->virname, groot, offset, ftype, desc)) != CL_VIRUS)
        ret = cli_ac_scanbuff(upt, length, ctx->virname, NULL, NULL,
                groot, &gdata, offset, ftype, desc, ftoffset, acmode, NULL);
    printf("\nRET IN GROOT!!!%d \n", ret);
    if(ret == CL_VIRUS) {
---end-code---

Has anybody tried changing these flags, or swapping the scanning algorithms?

Thanks in advance,
Tom


Re: [Clamav-devel] build debugging ex1.c

2008-12-09 Thread Török Edwin
On 2008-12-10 01:31, Thomasz Blaszczyk wrote:
> Another thing,
>
> If I force troot->ac_only=0
>
> If I force groot->ac_only to 1, I get a segmentation fault (the segfault
> occurs in another function, cli_scanraw)
   

That is not the way to go. If you want aconly, use the --dev-ac-only
flag, don't forcibly set it.
You can't get a BMOnly, because some signatures *require* the AC
matcher, such as any signature containing wildcards.

On 2008-12-09 23:43, Thomasz Blaszczyk wrote:
> Thank you for the fast reply,
>
> Sorry for bothering you again. I am missing something in this huge project.
> I cannot understand why both functions, cli_bm_scanbuff and
> cli_ac_scanbuff, are called in one cli_scandesc() function call.
> I have just one signature in the database, so to me it seemed obvious that
> the file would be scanned once, using either AC or BM. But both algorithms
> are used. Have a look below:
   

You may have one signature, but keep in mind that there are filetype
signatures also, so you never have just one signature in the trie.
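
For anyone tracing the same code path, the control flow quoted earlier in the
thread can be reduced to the following sketch. The matchers are stubs and the
struct is invented for illustration; only the shape of the condition
(skip BM when ac_only is set, run AC unless BM already reported a virus) and
the !ftonly guard are taken from the quoted matcher.c fragment.

```c
#define CL_CLEAN 0
#define CL_VIRUS 1

/* Stand-in for one matcher root; bm_hit/ac_hit fake the scan results. */
struct root { int ac_only; int bm_hit; int ac_hit; };

static int bm_calls, ac_calls;        /* count how often each stub runs */

static int bm_scan(const struct root *r)
{
    bm_calls++;
    return r->bm_hit ? CL_VIRUS : CL_CLEAN;
}

static int ac_scan(const struct root *r)
{
    ac_calls++;
    return r->ac_hit ? CL_VIRUS : CL_CLEAN;
}

/* Same condition as in the quoted code: BM is skipped for ac_only roots,
 * and AC runs whenever BM did not already find a virus. */
static int scan_root(const struct root *r)
{
    int ret = CL_CLEAN;
    if (r->ac_only || (ret = bm_scan(r)) != CL_VIRUS)
        ret = ac_scan(r);
    return ret;
}

/* Type-specific root first, then the generic root unless ftonly is set. */
static int scan_buffer(const struct root *troot, const struct root *groot,
                       int ftonly)
{
    if (troot && scan_root(troot) == CL_VIRUS)
        return CL_VIRUS;
    if (!ftonly && scan_root(groot) == CL_VIRUS)
        return CL_VIRUS;
    return CL_CLEAN;
}
```

Feeding it an ac_only troot with no hit and a groot whose BM stub reports a
hit reproduces the trace from the earlier mail: one AC call for troot, one BM
call for groot, and a final result of CL_VIRUS.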

Best regards,
--Edwin



Re: [Clamav-devel] build debugging ex1.c

2008-12-07 Thread Török Edwin
On 2008-12-06 20:34, Thomasz Blaszczyk wrote:
> Thanks,
>
> There were many troubles. (I am building the project from files I
> copied from libclamav into my newly created project folder.)
> In many files the line #include <inttypes.h> was missing.
> I guess something is wrong with cltypes.h (#ifndef __CLTYPES_H).
> Please have a look at my compilation errors:
> http://omploader.org/vem1s
> http://omploader.org/vem14
>
> But now I am stuck on this message: mbox.c|4591|error: incompatible
> type for argument 2 of 'connect'|
>
> The argument is: (const struct sockaddr *)sin
>
> Can anyone help with this?
>
> Please have a look at this error here:
> http://omploader.org/venNy

   

You are probably missing clamav-config.h which is generated by configure.

I think copying libclamav into your project is the wrong way to go,
since you'll be missing a lot of stuff created by configure.

Just run ./configure && make && make install, and then link to libclamav by
using -lclamav and including clamav.h.
Or you can link statically (build with ./configure --disable-shared), if
you want to.

Best regards,
--Edwin