Re: [clamav-users] [External] Re: Scan very slow

2019-05-23 Thread Andrew Williams
As of daily-25458, we've updated the Email.Phishing.VOF2 signatures such
that they should have better performance when scanning larger email files.

Specifically, the signatures each had a PCRE component that began by
looking for the string 'filename', and as it turns out, the PCRE library
will begin evaluating the regex more thoroughly each time the first
character in the regex is encountered in a file being scanned.  It also
turns out that RTF files, which get embedded in emails as plain text, can
contain a surprisingly large number of f's.  In an email we were testing
with that had an embedded RTF file, the email was ~13 million bytes in
size, and ~10 million of those were the letter f!  We modified the regex to
begin by looking for a semicolon, which is much less common in RTF files
and is not in the base64 character set.
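
To make the effect above concrete, here is a minimal Python sketch. It is not the actual signature or the PCRE internals: the two patterns and the synthetic `body` are assumptions made up for illustration, and counting occurrences of the first byte is only a rough proxy for how often a matcher would have to start a closer evaluation.

```python
import re

# Synthetic stand-in for an email with an embedded RTF part: a long run of
# 'f' bytes followed by one attachment header. Not taken from a real sample.
body = b"f" * 10_000 + b'Content-Disposition: attachment; filename="invoice.doc"\r\n'

# Hypothetical patterns. The real Email.Phishing.VOF2 regexes are not shown
# in this thread; these only mimic "starts with f" versus "starts with ;".
starts_with_f = re.compile(rb'filename="[^"]+"')
starts_with_semicolon = re.compile(rb';\s*filename="[^"]+"')


def candidate_starts(data: bytes, first_byte: bytes) -> int:
    """Count positions where a pattern beginning with first_byte could start."""
    return data.count(first_byte)


f_candidates = candidate_starts(body, b"f")
semicolon_candidates = candidate_starts(body, b";")

# Both patterns still find the same attachment name.
found_f = starts_with_f.search(body) is not None
found_semicolon = starts_with_semicolon.search(body) is not None

print("candidate starts for 'f':", f_candidates)
print("candidate starts for ';':", semicolon_candidates)
print("both patterns match:", found_f and found_semicolon)
```

With a body dominated by one letter, the pattern that begins with that letter has thousands of places to try, while the one that begins with a semicolon has a single candidate.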

Please let us know if you encounter any other cases of unreasonably slow
scan times, and we will do our best to investigate.  Thank you!

-Andrew

Andrew Williams
Malware Research Team
Cisco Talos

On Wed, Apr 10, 2019 at 8:57 PM Micah Snyder (micasnyd) via clamav-users <
clamav-users@lists.clamav.net> wrote:

> JME,
>
> As you've pointed out, it appears that some signatures containing a PCRE
> regex component are responsible for slow scan times on larger email files.
>
> I did a bunch of profiling similar to what Maarten did earlier in order to
> narrow it down.  I found that Email.Phishing.VOF2 signatures are performing
> slower with the eml sample you sent me.  Email.Phishing.VOF2 signatures
> contain a PCRE regex component to alert on email attachments with specific
> names.  Now that we've determined which signatures are performing slowly in
> these cases, I am hopeful that we will be able to optimize the
> Email.Phishing.VOF2 signatures to improve performance.
>
> I will note that your idea to lower the PCRERecMatchLimit setting to 1
> will effectively neuter all signatures that rely on regexes and so I can't
> recommend this.
>
> Regards,
> Micah
>
>
> On 4/10/19, 12:36 PM, "clamav-users on behalf of JME via clamav-users" <
> clamav-users-boun...@lists.clamav.net on behalf of
> clamav-users@lists.clamav.net> wrote:
>
> Hello,
>
> I managed to significantly reduce the problem of very long scans, which
> took more than 400 sec on some emails. Disabling PhishingSignatures did
> not help, but setting PCRERecMatchLimit to 1 did.
> The PCRE analyses are thus bypassed, but Safebrowsing and the other
> scans continue to work. Is it a mistake to proceed this way?
>
> Regards,
> JME
>
> -Original Message-
> From: clamav-users  On behalf
> of Brent Clark via clamav-users
> Sent: Wednesday, April 10, 2019 12:33
> To: ClamAV users ML 
> Cc: Brent Clark 
> Subject: Re: [clamav-users] [External] Re: Scan very slow
>
> Thanks for doing this.
>
> What I'm getting out of your feedback is that maybe you guys need to
> look at implementing or revisiting your CI process(es).
>
> Before pushing a commit, your CI can run the same test(s) and alert on
> slow or long-running scans.
>
> All of this can be automated and can report on issues.
>
> I highly recommend doing this. I don't think you guys realise how
> many systems are running and dependent on ClamAV. It might also be a good
> time to remind the community and ask them to support and donate to the project.
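
One way to picture the CI check suggested above is a timing guard around a fixed scan job. The Python sketch below is only an illustration: the `timed` and `within_budget` helpers, the 1.5x tolerance, and the sample numbers are assumptions, and nothing here is part of ClamAV or its test suite. In a real pipeline, `task` would be whatever function runs the scanner over a fixed sample set.

```python
import time
from typing import Callable


def timed(task: Callable[[], object]) -> float:
    """Run task once and return the wall-clock time it took, in seconds."""
    start = time.perf_counter()
    task()
    return time.perf_counter() - start


def within_budget(elapsed: float, baseline: float, tolerance: float = 1.5) -> bool:
    """Return True when elapsed is at most tolerance times the baseline."""
    return elapsed <= baseline * tolerance


# Placeholder task; a real check would scan a fixed set of sample files here.
elapsed = timed(lambda: sum(range(1000)))

# Illustrative numbers only: a 210 s baseline and a run that took twice as long.
print("no-op task took", round(elapsed, 6), "s")
print("210 s run against 210 s baseline ok:", within_budget(210.0, 210.0))
print("420 s run against 210 s baseline ok:", within_budget(420.0, 210.0))
```

A job like this would fail the build, or at least raise an alert, whenever a new signature set pushes the measured time past the agreed tolerance.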
>
> HTH
>
> Regards
> Brent
>
> On 2019/04/09 17:58, Maarten Broekman via clamav-users wrote:
> > Clearly the latest daily.cvd is performing better, but the remaining
> > "Phishtank" sigs are _not_ a majority of the slowness.
> >
> > I unpacked the current (?) cvd (ClamAV-VDB:09 Apr 2019 03-53
> > -0400:25414:1548262:63:X:X:raynman:1554796413) and then ran a test
> > scan with each part to see what the load times looked like:
> >
> > daily.cdb  Time: 0.007 sec (0 m 0 s)
> > daily.cfg  Time: 0.004 sec (0 m 0 s)
> > daily.crb  Time: 0.006 sec (0 m 0 s)
> > *daily.cvd  Time: 11.384 sec (0 m 11 s)*
> > daily.fp  Time: 0.009 sec (0 m 0 s)
> > daily.ftm  Time: 0.005 sec (0 m 0 s)
> > daily.hdb  Time: 0.303 sec (0 m 0 s)
> > daily.hdu  Time: 0.006 sec (0 m 0 s)
> > daily.hsb  Time: 1.093 sec (0 m 1 s)
> > daily.hsu  Time: 0.005 sec (0 m 0 s)
> > daily.idb  Time: 0.006 sec (0 m 0 s)
> > *daily.ldb  Time: 5.563 sec (0 m 5 s)*
> > daily.ldu  Time: 0.005 sec (0 m 0 s)
> > daily.mdb  Time: 0.061 sec (0 m 0 s)
> 

Re: [clamav-users] [External] Re: Scan very slow

2019-04-18 Thread Micah Snyder (micasnyd) via clamav-users
Mark, Kevin,

I’m glad to hear that your load and scan times are back down to reasonable 
levels.

We’ll continue to investigate the best, safest way to add the phishing 
detection in a way that is fast and optional.

Regards,
Micah

From: clamav-users on behalf of Mark Allan via clamav-users
Reply-To: ClamAV users ML
Date: Thursday, April 18, 2019 at 6:09 AM
To: ClamAV users ML
Cc: Mark Allan
Subject: Re: [clamav-users] [External] Re: Scan very slow

Fantastic! I can also confirm that scan times are back to normal now - 
more-or-less back to what they were in early February.

The time for one of our FP test volumes which I've been referencing in this 
thread is back down to 3m 30s, and the total time for our full FP test is back 
down from several hours to just 47 minutes.

Thank you!
Mark

On Thu, 18 Apr 2019 at 09:46, Al Varnell via clamav-users
<clamav-users@lists.clamav.net> wrote:
Looks like all Phish.Phishing.REPHISH_ID_... signatures were dropped by 
daily-25423 today.

-Al-


On Apr 17, 2019, at 04:02, Al Varnell <alvarn...@mac.com> wrote:

There are still 2515 "Phish.Phishing.REPHISH_ID_" signatures in daily.ldb

-Al-


On Apr 17, 2019, at 03:36, Maarten Broekman <maarten.broek...@gmail.com> wrote:

Are the "Phish" REPHISH signatures still in the daily or were they removed as 
well? Those were causing part of the issue.


--Maarten

On Wed, Apr 17, 2019 at 5:24 AM Al Varnell via clamav-users
<clamav-users@lists.clamav.net> wrote:
An additional 3968 Phishtank.Phishing.PHISH_ID_??? signatures were dropped 
by daily-25417 on 12 April, and I can't seem to locate any more.

-Al-


On Apr 17, 2019, at 02:01, Mark Allan via clamav-users
<clamav-users@lists.clamav.net> wrote:

Hi Micah,

Sorry to pester you, but have you any update on when the remaining Phishtank 
signatures will be getting removed? It would be really great to get scan times 
properly back to normal.

Best regards
Mark

On Tue, 9 Apr 2019 at 16:32, Micah Snyder (micasnyd) <micas...@cisco.com> wrote:
Mark,

Yes, the plan is still to remove the rest of the Phishtank signatures.  We 
wanted to get things back to relative normal and resolve the immediate crisis.  
We’ll remove the rest of them soon.

Best,
Micah

From: Mark Allan <markjal...@gmail.com>
Date: Tuesday, April 9, 2019 at 6:26 AM
To: "Micah Snyder (micasnyd)" <micas...@cisco.com>
Cc: ClamAV users ML <clamav-users@lists.clamav.net>
Subject: Re: [External] Re: [clamav-users] Scan very slow

The scan times are definitely better than they were - in fact, they're back to 
how they were before last week's inclusion of the Phishtank signatures. They're 
still almost double what they used to be though, and as far as I can see, there 
are still almost 4000 Phishtank signatures in the DB:
$ sigtool --find Phishtank | wc -l
3968

Can I request that those ones also be removed please?

Best regards
Mark

On Sun, 7 Apr 2019 at 14:43, Micah Snyder (micasnyd) <micas...@cisco.com> wrote:
Tim,

There are a couple of ways for users to drop specific categories of signatures 
at this time.  Sadly, they wouldn’t have helped this last week.  These include 
bytecode signatures, PUA (potentially unwanted applications) signatures, 
Email.Phishing and HTML.Phishing signatures, and the Safebrowsing database.

If we had named the Phishtank.Phishing sigs to HTML.Phishing.Phishtank or 
Email.Phishing.Phishtank then they could have been disabled with the clamscan 
option `--phishing-sigs=no` (clamd.conf: `PhishingSignatures no`).
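
The naming point can be illustrated with a small Python sketch. It models only the rule described in this message, namely that signatures named under Email.Phishing or HTML.Phishing fall under the phishing on/off switch. How ClamAV implements that check internally is not shown in this thread, so the prefix test and the sample signature names below are assumptions for illustration.

```python
# Name prefixes that, per the message above, are controlled by the
# phishing-signature option. The trailing dot avoids partial-word matches.
PHISHING_PREFIXES = ("Email.Phishing.", "HTML.Phishing.")


def covered_by_phishing_option(signature_name: str) -> bool:
    """Return True if the name falls under one of the phishing prefixes."""
    return signature_name.startswith(PHISHING_PREFIXES)


# Hypothetical signature names; the numeric suffixes are made up.
names = [
    "Phishtank.Phishing.PHISH_ID_1",
    "Phish.Phishing.REPHISH_ID_1",
    "Email.Phishing.Phishtank.PHISH_ID_1",
    "HTML.Phishing.Phishtank.PHISH_ID_1",
]

coverage = {name: covered_by_phishing_option(name) for name in names}
for name, covered in coverage.items():
    print(name, "->", "can be switched off" if covered else "always loaded")
```

Under this model the names actually shipped (Phishtank.Phishing and Phish.Phishing) fall outside both prefixes, which is why the existing option would not have helped.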

Maybe a better option would be for us to create a new optional database for 
phishing signatures. However, the names for the databases are hardcoded into 
freshclam, so it is non-trivial to add a new database and would require a few 
changes to ClamAV’s code. We have talked about making the databases easier to 
add/remove in the future so users can have more categories to enable/disable. 
In this light, it ties in well with existing plans.

Of note the Phishtank sigs from Friday’s daily were removed yesterday and scan 
times should be back to normal.

Regards,
Micah

From: Tim Hawkins <tim.hawk...@redflaggroup.com>
Date: Friday, April 5, 2019 at 6:06 PM
To: ClamAV users ML <clamav-users@lists.clamav.net>, Mark Allan <markjal...@gmail.com>
Cc: "Micah Snyder (micasnyd)" <micas...@cisco.com>
Subject: Re: [External] Re: [clamav-users] Scan very slow

Hi Micah

Does ClamAV partition the database so that signatures that are mainly
associated with email scanning can be dropped for folks who only need
filesystem scans? None of our systems use email, and we don't make use of the
mailer extension.

Having to load all the email-focused signatures could, as you have observed,
impact performance.
Sent from Nine<http://www.9folders.com/&

Re: [clamav-users] [External] Re: Scan very slow

2019-04-18 Thread Mark Allan via clamav-users
Fantastic! I can also confirm that scan times are back to normal now -
more-or-less back to what they were in early February.

The time for one of our FP test volumes which I've been referencing in this
thread is back down to 3m 30s, and the total time for our *full* FP test is
back down from several hours to just 47 minutes.

Thank you!
Mark

On Thu, 18 Apr 2019 at 09:46, Al Varnell via clamav-users <
clamav-users@lists.clamav.net> wrote:

> Looks like all Phish.Phishing.REPHISH_ID_... signatures were dropped by
> daily-25423 today.
>
> -Al-
>
> On Apr 17, 2019, at 04:02, Al Varnell  wrote:
>
> There are still 2515 "Phish.Phishing.REPHISH_ID_" signatures in
> daily.ldb
>
> -Al-
>
> On Apr 17, 2019, at 03:36, Maarten Broekman 
> wrote:
>
> Are the "Phish" REPHISH signatures still in the daily or were they removed
> as well? Those were causing part of the issue.
>
>
> --Maarten
>
> On Wed, Apr 17, 2019 at 5:24 AM Al Varnell via clamav-users <
> clamav-users@lists.clamav.net> wrote:
>
>> An additional 3968 Phishtank.Phishing.PHISH_ID_??? signatures were
>> dropped by daily-25417 on 12 April, and I can't seem to locate any more.
>>
>> -Al-
>>
>> On Apr 17, 2019, at 02:01, Mark Allan via clamav-users <
>> clamav-users@lists.clamav.net> wrote:
>>
>> Hi Micah,
>>
>> Sorry to pester you, but have you any update on when the remaining
>> Phishtank signatures will be getting removed? It would be really great to
>> get scan times properly back to normal.
>>
>> Best regards
>> Mark
>>
>> On Tue, 9 Apr 2019 at 16:32, Micah Snyder (micasnyd) 
>> wrote:
>>
>>> Mark,
>>>
>>>
>>> Yes, the plan is still to remove the rest of the Phishtank signatures.
>>> We wanted to get things back to relative normal and resolve the immediate
>>> crisis.  We’ll remove the rest of them soon.
>>>
>>>
>>>
>>> Best,
>>>
>>> Micah
>>>
>>>
>>>
>>> *From: *Mark Allan 
>>> *Date: *Tuesday, April 9, 2019 at 6:26 AM
>>> *To: *"Micah Snyder (micasnyd)" 
>>> *Cc: *ClamAV users ML 
>>> *Subject: *Re: [External] Re: [clamav-users] Scan very slow
>>>
>>>
>>>
>>> The scan times are definitely better than they were - in fact, they're
>>> back to how they were before last week's inclusion of the Phishtank
>>> signatures. They're still almost double what they used to be though, and as
>>> far as I can see, there are still almost 4000 Phishtank signatures in the
>>> DB:
>>>
>>> $ sigtool --find Phishtank | wc -l
>>>
>>> 3968
>>>
>>>
>>>
>>> Can I request that those ones also be removed please?
>>>
>>>
>>>
>>> Best regards
>>>
>>> Mark
>>>
>>>
>>>
>>> On Sun, 7 Apr 2019 at 14:43, Micah Snyder (micasnyd) 
>>> wrote:
>>>
>>> Tim,
>>>
>>>
>>>
>>> There are a couple of ways for users to drop specific categories of
>>> signatures at this time.  Sadly, they wouldn’t have helped this last week.
>>> These include bytecode signatures, PUA (potentially unwanted applications)
>>> signatures, Email.Phishing and HTML.Phishing signatures, and the
>>> Safebrowsing database.
>>>
>>>
>>>
>>> If we had named the Phishtank.Phishing sigs to HTML.Phishing.Phishtank
>>> or Email.Phishing.Phishtank then they could have been disabled with the
>>> clamscan option `--phishing-sigs=no` (clamd.conf: `PhishingSignatures no`).
>>>
>>>
>>>
>>> Maybe a better option would be for us to create a new optional database
>>> for phishing signatures. However, the names for the databases are hardcoded
>>> into freshclam, so it is non-trivial to add a new database and would
>>> require a few changes to ClamAV’s code. We have talked about making the
>>> databases easier to add/remove in the future so users can have more
>>> categories to enable/disable. In this light, it ties in well with existing
>>> plans.
>>>
>>>
>>>
>>> Of note the Phishtank sigs from Friday’s daily were removed yesterday
>>> and scan times should be back to normal.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Micah
>>>
>>>
>>>
>>> *From: *Tim Hawkins 
>>> *Date: *Friday, April 5, 2019 at 6:06 PM
>>> *To: *ClamAV users ML , Mark Allan <
>>> markjal...@gmail.com>
>>> *Cc: *"Micah Snyder (micasnyd)" 
>>> *Subject: *Re: [External] Re: [clamav-users] Scan very slow
>>>
>>>
>>>
>>> Hi Micah
>>>
>>>
>>> Does ClamAV partition the database so that signatures that are mainly
>>> associated with email scanning can be dropped for folks who only need
>>> filesystem scans? None of our systems use email, and we don't make use of
>>> the mailer extension.
>>>
>>> Having to load all the email-focused signatures could, as you have
>>> observed, impact performance.
>>>
>>> Sent from Nine 
>>> --
>>>
>>> *From:* "Micah Snyder (micasnyd) via clamav-users" <
>>> clamav-users@lists.clamav.net>
>>> *Sent:* Saturday, April 6, 2019 03:18
>>> *To:* ClamAV users ML; Mark Allan
>>> *Cc:* Micah Snyder (micasnyd)
>>> *Subject:* [External] Re: [clamav-users] Scan very slow
>>>
>>>
>>>
>>> Regarding slow scan times today (and slow scan times in general), it
>>> appears that the signatures we generate based 

Re: [clamav-users] [External] Re: Scan very slow

2019-04-18 Thread Al Varnell via clamav-users
Looks like all Phish.Phishing.REPHISH_ID_... signatures were dropped by 
daily-25423 today.

-Al-

> On Apr 17, 2019, at 04:02, Al Varnell  wrote:
> 
> There are still 2515 "Phish.Phishing.REPHISH_ID_" signatures in daily.ldb
> 
> -Al-
> 
>> On Apr 17, 2019, at 03:36, Maarten Broekman wrote:
>> 
>> Are the "Phish" REPHISH signatures still in the daily or were they removed 
>> as well? Those were causing part of the issue.
>> 
>> 
>> --Maarten
>> 
>> On Wed, Apr 17, 2019 at 5:24 AM Al Varnell via clamav-users
>> <clamav-users@lists.clamav.net> wrote:
>> An additional 3968 Phishtank.Phishing.PHISH_ID_??? signatures were 
>> dropped by daily-25417 on 12 April, and I can't seem to locate any more.
>> 
>> -Al-
>> 
>>> On Apr 17, 2019, at 02:01, Mark Allan via clamav-users
>>> <clamav-users@lists.clamav.net> wrote:
>>> 
>>> Hi Micah,
>>> 
>>> Sorry to pester you, but have you any update on when the remaining 
>>> Phishtank signatures will be getting removed? It would be really great to 
>>> get scan times properly back to normal.
>>> 
>>> Best regards
>>> Mark
>>> 
>>> On Tue, 9 Apr 2019 at 16:32, Micah Snyder (micasnyd) wrote:
>>> Mark,
>>> 
>>> 
>>> Yes, the plan is still to remove the rest of the Phishtank signatures.  We 
>>> wanted to get things back to relative normal and resolve the immediate 
>>> crisis.  We’ll remove the rest of them soon.
>>> 
>>>  
>>> 
>>> Best,
>>> 
>>> Micah  
>>> 
>>>  
>>> 
>>> From: Mark Allan <markjal...@gmail.com>
>>> Date: Tuesday, April 9, 2019 at 6:26 AM
>>> To: "Micah Snyder (micasnyd)"
>>> Cc: ClamAV users ML
>>> Subject: Re: [External] Re: [clamav-users] Scan very slow
>>> 
>>>  
>>> 
>>> The scan times are definitely better than they were - in fact, they're back 
>>> to how they were before last week's inclusion of the Phishtank signatures. 
>>> They're still almost double what they used to be though, and as far as I 
>>> can see, there are still almost 4000 Phishtank signatures in the DB: 
>>> 
>>> $ sigtool --find Phishtank | wc -l
>>> 
>>> 3968
>>> 
>>>  
>>> 
>>> Can I request that those ones also be removed please?
>>> 
>>>  
>>> 
>>> Best regards
>>> 
>>> Mark 
>>> 
>>>  
>>> 
>>> On Sun, 7 Apr 2019 at 14:43, Micah Snyder (micasnyd) wrote:
>>> 
>>> Tim,
>>> 
>>>  
>>> 
>>> There are a couple of ways for users to drop specific categories of 
>>> signatures at this time.  Sadly, they wouldn’t have helped this last week.  
>>> These include bytecode signatures, PUA (potentially unwanted applications) 
>>> signatures, Email.Phishing and HTML.Phishing signatures, and the 
>>> Safebrowsing database. 
>>> 
>>>  
>>> 
>>> If we had named the Phishtank.Phishing sigs to HTML.Phishing.Phishtank or 
>>> Email.Phishing.Phishtank then they could have been disabled with the 
>>> clamscan option `--phishing-sigs=no` (clamd.conf: `PhishingSignatures no`).
>>> 
>>>  
>>> 
>>> Maybe a better option would be for us to create a new optional database for 
>>> phishing signatures. However, the names for the databases are hardcoded 
>>> into freshclam, so it is non-trivial to add a new database and would 
>>> require a few changes to ClamAV’s code. We have talked about making the 
>>> databases easier to add/remove in the future so users can have more 
>>> categories to enable/disable. In this light, it ties in well with existing 
>>> plans.
>>> 
>>>  
>>> 
>>> Of note the Phishtank sigs from Friday’s daily were removed yesterday and 
>>> scan times should be back to normal.
>>> 
>>>  
>>> 
>>> Regards,
>>> 
>>> Micah
>>> 
>>>  
>>> 
>>> From: Tim Hawkins
>>> Date: Friday, April 5, 2019 at 6:06 PM
>>> To: ClamAV users ML, Mark Allan
>>> Cc: "Micah Snyder (micasnyd)"
>>> Subject: Re: [External] Re: [clamav-users] Scan very slow
>>> 
>>>  
>>> 
>>> Hi Micah
>>> 
>>> 
>>> Does ClamAV partition the database so that signatures that are mainly
>>> associated with email scanning can be dropped for folks who only need
>>> filesystem scans? None of our systems use email, and we don't make use of
>>> the mailer extension.
>>>
>>> Having to load all the email-focused signatures could, as you have observed,
>>> impact performance.
>>> 
>>> Sent from Nine 
>>> From: "Micah Snyder (micasnyd) via clamav-users"
>>> <clamav-users@lists.clamav.net>
>>> Sent: Saturday, April 6, 2019 03:18
>>> To: ClamAV users ML; Mark Allan
>>> Cc: Micah Snyder (micasnyd)
>>> Subject: [External] Re: [clamav-users] Scan very slow
>>> 
>>>  
>>> 
>>> Regarding slow scan times today (and slow scan times in general), it 
>>> appears that the signatures we generate based on PhishTank’s feed for 
>>> phishing URLs are 

Re: [clamav-users] [External] Re: Scan very slow

2019-04-17 Thread Micah Snyder (micasnyd) via clamav-users
I do still have an interest in dropping the Phish.Phishing.REPHISH signatures. 
I will look into dropping these as well.

We identified the issue causing the PhishTank and Phish signatures to run 
slowly and how to make them run quickly, but have yet to decide whether or not 
to reintroduce the signatures as a new database, or perhaps under 
Email.Phishing or HTML.Phishing so that they can be enabled/disabled with clamd 
and clamscan configuration options.

I’ll try to see that this is addressed sooner rather than later.

Regards,
Micah

From: clamav-users on behalf of Al Varnell via clamav-users
Reply-To: ClamAV users ML
Date: Wednesday, April 17, 2019 at 7:03 AM
To: "clamav-users@lists.clamav.net"
Cc: Al Varnell
Subject: Re: [clamav-users] [External] Re: Scan very slow

There are still 2515 "Phish.Phishing.REPHISH_ID_" signatures in daily.ldb

-Al-


On Apr 17, 2019, at 03:36, Maarten Broekman <maarten.broek...@gmail.com> wrote:

Are the "Phish" REPHISH signatures still in the daily or were they removed as 
well? Those were causing part of the issue.


--Maarten

On Wed, Apr 17, 2019 at 5:24 AM Al Varnell via clamav-users
<clamav-users@lists.clamav.net> wrote:
An additional 3968 Phishtank.Phishing.PHISH_ID_??? signatures were dropped 
by daily-25417 on 12 April, and I can't seem to locate any more.

-Al-


On Apr 17, 2019, at 02:01, Mark Allan via clamav-users
<clamav-users@lists.clamav.net> wrote:

Hi Micah,

Sorry to pester you, but have you any update on when the remaining Phishtank 
signatures will be getting removed? It would be really great to get scan times 
properly back to normal.

Best regards
Mark

On Tue, 9 Apr 2019 at 16:32, Micah Snyder (micasnyd) <micas...@cisco.com> wrote:
Mark,

Yes, the plan is still to remove the rest of the Phishtank signatures.  We 
wanted to get things back to relative normal and resolve the immediate crisis.  
We’ll remove the rest of them soon.

Best,
Micah

From: Mark Allan <markjal...@gmail.com>
Date: Tuesday, April 9, 2019 at 6:26 AM
To: "Micah Snyder (micasnyd)" <micas...@cisco.com>
Cc: ClamAV users ML <clamav-users@lists.clamav.net>
Subject: Re: [External] Re: [clamav-users] Scan very slow

The scan times are definitely better than they were - in fact, they're back to 
how they were before last week's inclusion of the Phishtank signatures. They're 
still almost double what they used to be though, and as far as I can see, there 
are still almost 4000 Phishtank signatures in the DB:
$ sigtool --find Phishtank | wc -l
3968

Can I request that those ones also be removed please?

Best regards
Mark

On Sun, 7 Apr 2019 at 14:43, Micah Snyder (micasnyd) <micas...@cisco.com> wrote:
Tim,

There are a couple of ways for users to drop specific categories of signatures 
at this time.  Sadly, they wouldn’t have helped this last week.  These include 
bytecode signatures, PUA (potentially unwanted applications) signatures, 
Email.Phishing and HTML.Phishing signatures, and the Safebrowsing database.

If we had named the Phishtank.Phishing sigs to HTML.Phishing.Phishtank or 
Email.Phishing.Phishtank then they could have been disabled with the clamscan 
option `--phishing-sigs=no` (clamd.conf: `PhishingSignatures no`).

Maybe a better option would be for us to create a new optional database for 
phishing signatures. However, the names for the databases are hardcoded into 
freshclam, so it is non-trivial to add a new database and would require a few 
changes to ClamAV’s code. We have talked about making the databases easier to 
add/remove in the future so users can have more categories to enable/disable. 
In this light, it ties in well with existing plans.

Of note the Phishtank sigs from Friday’s daily were removed yesterday and scan 
times should be back to normal.

Regards,
Micah

From: Tim Hawkins <tim.hawk...@redflaggroup.com>
Date: Friday, April 5, 2019 at 6:06 PM
To: ClamAV users ML <clamav-users@lists.clamav.net>, Mark Allan <markjal...@gmail.com>
Cc: "Micah Snyder (micasnyd)" <micas...@cisco.com>
Subject: Re: [External] Re: [clamav-users] Scan very slow

Hi Micah

Does ClamAV partition the database so that signatures that are mainly
associated with email scanning can be dropped for folks who only need
filesystem scans? None of our systems use email, and we don't make use of the
mailer extension.

Having to load all the email-focused signatures could, as you have observed,
impact performance.
Sent from Nine<http://www.9folders.com/>

From: "Micah Snyder (micasnyd) via clamav-users"
<clamav-users@lists.clamav.net>
Sent: Saturday, April 6, 2019 03:18
To: ClamAV users ML; Mark Allan
Cc: Micah Snyder (micasnyd)
Subject: [External] Re: [clamav-users] Scan very slow

Regarding

Re: [clamav-users] [External] Re: Scan very slow

2019-04-17 Thread Maarten Broekman via clamav-users
Gotcha. Those were slowing the scans down more than the 3000-someodd
PhishTank sigs the last time I tested (Apr 9th).

daily_Phish.ldb  Time: 1.612 sec (0 m 1 s)
daily_Phishtank.ldb  Time: 0.146 sec (0 m 0 s)

2515 daily_Phish.ldb
3516 daily_Phishtank.ldb
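
The two pairs of figures above can be combined into a rough per-signature cost. The Python sketch below uses only the numbers quoted in this message. Treating the reported time as evenly spread across the signatures in each file is a simplifying assumption, and the helper name `ms_per_signature` is made up for this example.

```python
# (reported time in seconds, number of signatures), as quoted above.
measurements = {
    "daily_Phish.ldb": (1.612, 2515),
    "daily_Phishtank.ldb": (0.146, 3516),
}


def ms_per_signature(seconds: float, count: int) -> float:
    """Average reported time per signature, in milliseconds."""
    return seconds * 1000.0 / count


per_sig = {
    name: ms_per_signature(seconds, count)
    for name, (seconds, count) in measurements.items()
}

# How many times more expensive an average "Phish" signature is.
ratio = per_sig["daily_Phish.ldb"] / per_sig["daily_Phishtank.ldb"]

for name, cost in per_sig.items():
    print(name, round(cost, 3), "ms per signature")
print("ratio:", round(ratio, 1))
```

On these figures an average REPHISH signature costs roughly fifteen times as much as an average PhishTank one, which fits the observation that the smaller file is the slower of the two.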


On Wed, Apr 17, 2019 at 7:03 AM Al Varnell via clamav-users <
clamav-users@lists.clamav.net> wrote:

> There are still 2515 "Phish.Phishing.REPHISH_ID_" signatures in
> daily.ldb
>
> -Al-
>
> On Apr 17, 2019, at 03:36, Maarten Broekman 
> wrote:
>
> Are the "Phish" REPHISH signatures still in the daily or were they removed
> as well? Those were causing part of the issue.
>
>
> --Maarten
>
> On Wed, Apr 17, 2019 at 5:24 AM Al Varnell via clamav-users <
> clamav-users@lists.clamav.net> wrote:
>
>> An additional 3968 Phishtank.Phishing.PHISH_ID_??? signatures were
>> dropped by daily-25417 on 12 April, and I can't seem to locate any more.
>>
>> -Al-
>>
>> On Apr 17, 2019, at 02:01, Mark Allan via clamav-users <
>> clamav-users@lists.clamav.net> wrote:
>>
>> Hi Micah,
>>
>> Sorry to pester you, but have you any update on when the remaining
>> Phishtank signatures will be getting removed? It would be really great to
>> get scan times properly back to normal.
>>
>> Best regards
>> Mark
>>
>> On Tue, 9 Apr 2019 at 16:32, Micah Snyder (micasnyd) 
>> wrote:
>>
>>> Mark,
>>>
>>>
>>> Yes, the plan is still to remove the rest of the Phishtank signatures.
>>> We wanted to get things back to relative normal and resolve the immediate
>>> crisis.  We’ll remove the rest of them soon.
>>>
>>>
>>>
>>> Best,
>>>
>>> Micah
>>>
>>>
>>>
>>> *From: *Mark Allan 
>>> *Date: *Tuesday, April 9, 2019 at 6:26 AM
>>> *To: *"Micah Snyder (micasnyd)" 
>>> *Cc: *ClamAV users ML 
>>> *Subject: *Re: [External] Re: [clamav-users] Scan very slow
>>>
>>>
>>>
>>> The scan times are definitely better than they were - in fact, they're
>>> back to how they were before last week's inclusion of the Phishtank
>>> signatures. They're still almost double what they used to be though, and as
>>> far as I can see, there are still almost 4000 Phishtank signatures in the
>>> DB:
>>>
>>> $ sigtool --find Phishtank | wc -l
>>>
>>> 3968
>>>
>>>
>>>
>>> Can I request that those ones also be removed please?
>>>
>>>
>>>
>>> Best regards
>>>
>>> Mark
>>>
>>>
>>>
>>> On Sun, 7 Apr 2019 at 14:43, Micah Snyder (micasnyd) 
>>> wrote:
>>>
>>> Tim,
>>>
>>>
>>>
>>> There are a couple of ways for users to drop specific categories of
>>> signatures at this time.  Sadly, they wouldn’t have helped this last week.
>>> These include bytecode signatures, PUA (potentially unwanted applications)
>>> signatures, Email.Phishing and HTML.Phishing signatures, and the
>>> Safebrowsing database.
>>>
>>>
>>>
>>> If we had named the Phishtank.Phishing sigs to HTML.Phishing.Phishtank
>>> or Email.Phishing.Phishtank then they could have been disabled with the
>>> clamscan option `--phishing-sigs=no` (clamd.conf: `PhishingSignatures no`).
>>>
>>>
>>>
>>> Maybe a better option would be for us to create a new optional database
>>> for phishing signatures. However, the names for the databases are hardcoded
>>> into freshclam, so it is non-trivial to add a new database and would
>>> require a few changes to ClamAV’s code. We have talked about making the
>>> databases easier to add/remove in the future so users can have more
>>> categories to enable/disable. In this light, it ties in well with existing
>>> plans.
>>>
>>>
>>>
>>> Of note the Phishtank sigs from Friday’s daily were removed yesterday
>>> and scan times should be back to normal.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Micah
>>>
>>>
>>>
>>> *From: *Tim Hawkins 
>>> *Date: *Friday, April 5, 2019 at 6:06 PM
>>> *To: *ClamAV users ML , Mark Allan <
>>> markjal...@gmail.com>
>>> *Cc: *"Micah Snyder (micasnyd)" 
>>> *Subject: *Re: [External] Re: [clamav-users] Scan very slow
>>>
>>>
>>>
>>> Hi Micah
>>>
>>>
>>> Does ClamAV partition the database so that signatures that are mainly
>>> associated with email scanning can be dropped for folks who only need
>>> filesystem scans? None of our systems use email, and we don't make use of
>>> the mailer extension.
>>>
>>> Having to load all the email-focused signatures could, as you have
>>> observed, impact performance.
>>>
>>> Sent from Nine 
>>> --
>>>
>>> *From:* "Micah Snyder (micasnyd) via clamav-users" <
>>> clamav-users@lists.clamav.net>
>>> *Sent:* Saturday, April 6, 2019 03:18
>>> *To:* ClamAV users ML; Mark Allan
>>> *Cc:* Micah Snyder (micasnyd)
>>> *Subject:* [External] Re: [clamav-users] Scan very slow
>>>
>>>
>>>
>>> Regarding slow scan times today (and slow scan times in general), it
>>> appears that the signatures we generate based on PhishTank’s feed for
>>> phishing URLs are resulting in very slow load and scan times.
>>>
>>>
>>>
>>> Today’s daily update saw 7448 new Phishtank signatures (much higher than
>>> usual) coinciding with the immediate 

Re: [clamav-users] [External] Re: Scan very slow

2019-04-17 Thread Al Varnell via clamav-users
There are still 2515 "Phish.Phishing.REPHISH_ID_" signatures in daily.ldb

-Al-

> On Apr 17, 2019, at 03:36, Maarten Broekman wrote:
> 
> Are the "Phish" REPHISH signatures still in the daily or were they removed as 
> well? Those were causing part of the issue.
> 
> 
> --Maarten
> 
> On Wed, Apr 17, 2019 at 5:24 AM Al Varnell via clamav-users
> <clamav-users@lists.clamav.net> wrote:
> An additional 3968 Phishtank.Phishing.PHISH_ID_??? signatures were 
> dropped by daily-25417 on 12 April, and I can't seem to locate any more.
> 
> -Al-
> 
>> On Apr 17, 2019, at 02:01, Mark Allan via clamav-users
>> <clamav-users@lists.clamav.net> wrote:
>> 
>> Hi Micah,
>> 
>> Sorry to pester you, but have you any update on when the remaining Phishtank 
>> signatures will be getting removed? It would be really great to get scan 
>> times properly back to normal.
>> 
>> Best regards
>> Mark
>> 
>> On Tue, 9 Apr 2019 at 16:32, Micah Snyder (micasnyd) wrote:
>> Mark,
>> 
>> 
>> Yes, the plan is still to remove the rest of the Phishtank signatures.  We 
>> wanted to get things back to relative normal and resolve the immediate 
>> crisis.  We’ll remove the rest of them soon.
>> 
>>  
>> 
>> Best,
>> 
>> Micah  
>> 
>>  
>> 
>> From: Mark Allan <markjal...@gmail.com>
>> Date: Tuesday, April 9, 2019 at 6:26 AM
>> To: "Micah Snyder (micasnyd)"
>> Cc: ClamAV users ML
>> Subject: Re: [External] Re: [clamav-users] Scan very slow
>> 
>>  
>> 
>> The scan times are definitely better than they were - in fact, they're back 
>> to how they were before last week's inclusion of the Phishtank signatures. 
>> They're still almost double what they used to be though, and as far as I can 
>> see, there are still almost 4000 Phishtank signatures in the DB: 
>> 
>> $ sigtool --find Phishtank | wc -l
>> 
>> 3968
>> 
>>  
>> 
>> Can I request that those ones also be removed please?
>> 
>>  
>> 
>> Best regards
>> 
>> Mark 
>> 
>>  
>> 
>> On Sun, 7 Apr 2019 at 14:43, Micah Snyder (micasnyd) wrote:
>> 
>> Tim,
>> 
>>  
>> 
>> There are a couple of ways for users to drop specific categories of 
>> signatures at this time.  Sadly, they wouldn’t have helped this last week.  
>> These include bytecode signatures, PUA (potentially unwanted applications) 
>> signatures, Email.Phishing and HTML.Phishing signatures, and the 
>> Safebrowsing database. 
>> 
>>  
>> 
>> If we had named the Phishtank.Phishing sigs to HTML.Phishing.Phishtank or 
>> Email.Phishing.Phishtank then they could have been disabled with the 
>> clamscan option `--phishing-sigs=no` (clamd.conf: `PhishingSignatures no`).
>> 
>>  
>> 
>> Maybe a better option would be for us to create a new optional database for 
>> phishing signatures. However, the names for the databases are hardcoded into 
>> freshclam, so it is non-trivial to add a new database and would require a 
>> few changes to ClamAV’s code. We have talked about making the databases 
>> easier to add/remove in the future so users can have more categories to 
>> enable/disable. In this light, it ties in well with existing plans.
>> 
>>  
>> 
>> Of note, the Phishtank sigs from Friday’s daily were removed yesterday and 
>> scan times should be back to normal.
>> 
>>  
>> 
>> Regards,
>> 
>> Micah
>> 
>>  
>> 
>> From: Tim Hawkins
>> Date: Friday, April 5, 2019 at 6:06 PM
>> To: ClamAV users ML, Mark Allan
>> Cc: "Micah Snyder (micasnyd)"
>> Subject: Re: [External] Re: [clamav-users] Scan very slow
>> 
>>  
>> 
>> Hi Micah
>> 
>> 
>> Does ClamAV partition the database so that signatures that are mainly 
>> associated with email scanning can be dropped for folks who only need 
>> filesystem scans? None of our systems use email, and we don't make use of 
>> the mailer extension. 
>> 
>> Having to load all the email-focused signatures could, as you have observed, 
>> impact performance.
>> 
>> Sent from Nine 
>> From: "Micah Snyder (micasnyd) via clamav-users" <clamav-users@lists.clamav.net>
>> Sent: Saturday, April 6, 2019 03:18
>> To: ClamAV users ML; Mark Allan
>> Cc: Micah Snyder (micasnyd)
>> Subject: [External] Re: [clamav-users] Scan very slow
>> 
>>  
>> 
>> Regarding slow scan times today (and slow scan times in general), it appears 
>> that the signatures we generate based on PhishTank’s feed for phishing URLs 
>> are resulting in very slow load and scan times.
>> 
>>  
>> 
>> Today’s daily update saw 7448 new Phishtank signatures (much higher than 
>> usual) coinciding with the immediate performance drop for load time and scan 
>> time.  One user reported that the load time today on some of his slower 
>> machines was slow enough to 

Re: [clamav-users] [External] Re: Scan very slow

2019-04-17 Thread Maarten Broekman via clamav-users
Are the "Phish" REPHISH signatures still in the daily or were they removed
as well? Those were causing part of the issue.


--Maarten

On Wed, Apr 17, 2019 at 5:24 AM Al Varnell via clamav-users <
clamav-users@lists.clamav.net> wrote:

> An additional 3968 Phishtank.Phishing.PHISH_ID_??? signatures were
> dropped by daily-25417 on 12 April, and I can't seem to locate any more.
>
> -Al-

Re: [clamav-users] [External] Re: Scan very slow

2019-04-17 Thread Al Varnell via clamav-users
An additional 3968 Phishtank.Phishing.PHISH_ID_??? signatures were dropped 
by daily-25417 on 12 April, and I can't seem to locate any more.

-Al-


Re: [clamav-users] [External] Re: Scan very slow

2019-04-17 Thread Mark Allan via clamav-users
Hi Micah,

Sorry to pester you, but have you any update on when the remaining
Phishtank signatures will be getting removed? It would be really great to
get scan times properly back to normal.

Best regards
Mark

On Tue, 9 Apr 2019 at 16:32, Micah Snyder (micasnyd) 
wrote:

> Mark,
>
>
> Yes, the plan is still to remove the rest of the Phishtank signatures.  We
> wanted to get things back to relative normal and resolve the immediate
> crisis.  We’ll remove the rest of them soon.
>
>
>
> Best,
>
> Micah
>
>
>
> *From: *Mark Allan 
> *Date: *Tuesday, April 9, 2019 at 6:26 AM
> *To: *"Micah Snyder (micasnyd)" 
> *Cc: *ClamAV users ML 
> *Subject: *Re: [External] Re: [clamav-users] Scan very slow
>
>
>
> The scan times are definitely better than they were - in fact, they're
> back to how they were before last week's inclusion of the Phishtank
> signatures. They're still almost double what they used to be though, and as
> far as I can see, there are still almost 4000 Phishtank signatures in the
> DB:
>
> $ sigtool --find Phishtank | wc -l
>
> 3968
>
>
>
> Can I request that those ones also be removed please?
>
>
>
> Best regards
>
> Mark
>
>
>
> On Sun, 7 Apr 2019 at 14:43, Micah Snyder (micasnyd) 
> wrote:
>
> Tim,
>
>
>
> There are a couple of ways for users to drop specific categories of
> signatures at this time.  Sadly, they wouldn’t have helped this last week.
> These include bytecode signatures, PUA (potentially unwanted applications)
> signatures, Email.Phishing and HTML.Phishing signatures, and the
> Safebrowsing database.
>
>
>
> If we had named the Phishtank.Phishing sigs to HTML.Phishing.Phishtank or
> Email.Phishing.Phishtank then they could have been disabled with the
> clamscan option `--phishing-sigs=no` (clamd.conf: `PhishingSignatures no`).
>
>
>
> Maybe a better option would be for us to create a new optional database
> for phishing signatures. However, the names for the databases are hardcoded
> into freshclam, so it is non-trivial to add a new database and would
> require a few changes to ClamAV’s code. We have talked about making the
> databases easier to add/remove in the future so users can have more
> categories to enable/disable. In this light, it ties in well with existing
> plans.
>
>
>
> Of note, the Phishtank sigs from Friday’s daily were removed yesterday and
> scan times should be back to normal.
>
>
>
> Regards,
>
> Micah
>
>
>
> *From: *Tim Hawkins 
> *Date: *Friday, April 5, 2019 at 6:06 PM
> *To: *ClamAV users ML , Mark Allan <
> markjal...@gmail.com>
> *Cc: *"Micah Snyder (micasnyd)" 
> *Subject: *Re: [External] Re: [clamav-users] Scan very slow
>
>
>
> Hi Micah
>
>
> Does ClamAV partition the database so that signatures that are mainly
> associated with email scanning can be dropped for folks who only need
> filesystem scans? None of our systems use email, and we don't make use of
> the mailer extension.
>
> Having to load all the email-focused signatures could, as you have observed,
> impact performance.
>
> Sent from Nine 
> --
>
> *From:* "Micah Snyder (micasnyd) via clamav-users" <
> clamav-users@lists.clamav.net>
> *Sent:* Saturday, April 6, 2019 03:18
> *To:* ClamAV users ML; Mark Allan
> *Cc:* Micah Snyder (micasnyd)
> *Subject:* [External] Re: [clamav-users] Scan very slow
>
>
>
> Regarding slow scan times today (and slow scan times in general), it
> appears that the signatures we generate based on PhishTank’s feed for
> phishing URLs are resulting in very slow load and scan times.
>
>
>
> Today’s daily update saw 7448 new Phishtank signatures (much higher than
> usual) coinciding with the immediate performance drop for load time and
> scan time.  One user reported that the load time today on some of his
> slower machines was slow enough to exceed the timeout for service startup (
> https://bugzilla.clamav.net/show_bug.cgi?id=12317).
>
>
>
> In limited testing on my own machine I saw the following change after
> dropping the Phishtank.Phishing signatures from daily.cvd’s daily.ldb file:
>
>    - Database load time on my laptop went from 75.43203997612 seconds
>      down to 14.859203100204468 seconds
>    - Scan time (for an arbitrary pdf) went from 1.798 sec to 0.644 sec.
>
>
>
> After some discussion between the teams that work on ClamAV and ClamAV
> signature content and deployment, we’ve agreed to drop PhishTank signatures
> from the database until we can determine a way to craft Phishtank
> signatures without incurring such a significant performance hit.
>
>
>
> The daily update tomorrow will have the change.
>
>
>
> -Micah
>
>
>
>
> Micah Snyder
> ClamAV Development
> Talos
> Cisco Systems, Inc.
>
>
>
>
>
>
>
> *From: *clamav-users  on behalf of
> "Micah Snyder (micasnyd) via clamav-users" 
> *Reply-To: *ClamAV users ML 
> *Date: *Friday, April 5, 2019 at 1:08 PM
> *To: *Mark Allan , ClamAV users ML <
> clamav-users@lists.clamav.net>
> *Cc: *"Micah Snyder (micasnyd)" 
> *Subject: *Re: [clamav-users] Scan 

Re: [clamav-users] [External] Re: Scan very slow

2019-04-14 Thread Paul Kosinski via clamav-users
Regexes can be slow or even extremely slow to apply, depending on the
implementation. Backtracking is the worst, perhaps taking exponential
time, but often is cut off by artificial limits.

Does ClamAV perchance precompute Deterministic Finite Automata for the
regexes? These run fast, but take time exponential in regex length to
set up:

  
https://en.wikipedia.org/wiki/Regular_expression#Implementations_and_running_times
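
To make the backtracking concern concrete, here is a minimal Python sketch. It uses Python's own `re` module, which is a backtracking engine but is neither PCRE nor ClamAV's matcher, and the pattern and input sizes are my own assumptions chosen only to provoke the worst case. It is not taken from any ClamAV signature.

```python
import re
import time

# Hypothetical pathological pattern (an assumption for illustration):
# nested quantifiers make a backtracking engine try many different ways
# of splitting the run of 'a' characters before it can report failure.
PATTERN = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Return the seconds spent failing to match 'a' * n followed by 'b'."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = PATTERN.match(subject)
    elapsed = time.perf_counter() - start
    # The trailing 'b' guarantees that the anchored pattern cannot match.
    assert result is None
    return elapsed


if __name__ == "__main__":
    for n in (10, 14, 18):
        print(f"n={n:2d}  {time_failed_match(n):.4f} s")
```

On a backtracking engine the time grows very quickly with each extra character, which is the behaviour that match limits are meant to cut off.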



On Thu, 11 Apr 2019 00:56:04 +
"Micah Snyder \(micasnyd\) via clamav-users"
 wrote:

> JME,
> 
> As you've pointed out, it appears that some signatures containing
> PCRE regex components are responsible for slow scan times on larger
> email files.
> 
> I did a bunch of profiling similar to what Maarten did earlier in
> order to narrow it down.  I found that Email.Phishing.VOF2 signatures
> are performing slower with the eml sample you sent me.
> Email.Phishing.VOF2 signatures contain a PCRE regex component to
> alert on email attachments with specific names.  Now that we've
> determined which signatures are performing slowly in these cases, I
> am hopeful that we will be able to optimize the Email.Phishing.VOF2
> signatures to improve performance.
> 
> I will note that your idea to lower the PCRERecMatchLimit setting to
> 1 will effectively neuter all signatures that rely on regexes and so
> I can't recommend this.
> 
> Regards,
> Micah


___

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml


Re: [clamav-users] [External] Re: Scan very slow

2019-04-12 Thread Micah Snyder (micasnyd) via clamav-users
We don't use the word engine in quite that way with ClamAV, but I think I 
understand your question. 

With regard to the word "engine":
Clamd builds a scanning engine based on the databases and configuration 
options.  The engine is shared by scanning threads.

With regard to clamd's use of multithreading:
Clamd uses multithreading to handle scan requests.  That is to say that 
each scan target will get its own thread.  However, files contained within the 
scan target will be scanned in the same thread as the scan target.  Scans of 
embedded content are invoked as they are identified by the parsers for each 
given file type.  None of these make use of multithreading at this time.
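
As a rough illustration of that threading model (one thread per scan target, embedded content handled in that same thread), here is a small Python sketch. It is only a toy model under my own assumptions: the dictionaries, the `scan` and `scan_targets` names, and the worker count are hypothetical and have nothing to do with clamd's actual code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def scan(item, seen):
    """Record which thread handled `item`, then scan embedded items in the same thread."""
    seen.append((item["name"], threading.get_ident()))
    for child in item.get("embedded", []):
        # Plain recursive call: embedded content does not get a new thread.
        scan(child, seen)
    return seen


def scan_targets(targets, workers=4):
    """Submit each top-level scan target to the pool as its own task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan, target, []) for target in targets]
        return [future.result() for future in futures]


if __name__ == "__main__":
    # Hypothetical scan targets, made up for illustration.
    targets = [
        {"name": "mail.eml", "embedded": [{"name": "attachment.rtf"}]},
        {"name": "archive.zip", "embedded": [{"name": "inner.exe"}]},
    ]
    for result in scan_targets(targets):
        print(result)
```

Each result lists a target and its embedded items with a single thread id, so a large file with a lot of embedded content cannot be sped up by adding threads in this model.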

Regards,
Micah

On 4/11/19, 4:09 PM, "clamav-users on behalf of Paul Kosinski via 
clamav-users"  wrote:

Does clamd use multi-threading for the various "engines" within a
single scan, or only to handle multiple requests from different sources?


On Tue, 9 Apr 2019 21:29:43 +
"Micah Snyder \(micasnyd\) via clamav-users"
 wrote:

> Maarten,
> 
> Your test results are pretty great.  I really like your breakdown of
> the signatures by category.  I will caution that scan times will vary
> quite heavily depending on what you’re scanning, based on Target type
> (https://www.clamav.net/documents/clamav-file-types).
> 
> In addition, it’s important to distinguish between load and scan
> times.  The time reported by clamscan is both load + scan.  If you
> just want scan time, you will want to load the database with clamd
> and then test the scantime with clamdscan.
> 
> Regarding load time vs scantime, all of the signatures must be
> loaded, but depending on the target type of the file being scanned,
> not all of the signatures will be matched against the file.  That is,
> daily_Win.ldb might take the longest to load due to the number of
> signatures or complexity of the signatures but when scanning a PDF,
> they probably won’t impact scan time, as Win signatures are probably
> mostly target type 1 (PE file).
> 
> I’ve spent a bit of time today investigating what I believe is responsible
> for slow load and scan times for the Phishtank sigs.  I had a hunch,
> based on a conversation we saw a while back in the mailing list, that
> the identical beginning for URL-based signatures result in an
> un-balanced and inefficient tree for matching. That is, some 3000
> signatures each began with either:
> 
> 
>   1.  href="http:// (687265663d22687474703a2f2f)
>   2.  HYPERLINK "http:// (48595045524c494e4b2022687474703a2f2f)
>   3.  S/URI/URI(http:// (532f5552492f55524928687474703a2f2f)
> 
> Looking at a few of the Phish.Phishing signatures, these appear to
> have the same issue (href="http:// prefix).  In testing with scan of
> a PDF document, I was able to reduce the scan time from 31.987 sec
> down to 2.632 sec simply by changing the start of the Phishtank
> signatures for the following:
> 
> 
>   1.  href="http://
>       *   from: 687265663d22687474703a2f2f
>       *   to: 687265663d2268747470{3-4}
>   2.  HYPERLINK "http://
>       *   from: 48595045524c494e4b2022687474703a2f2f
>       *   to: 48595045524c494e4b202268747470{3-4}
>   3.  S/URI/URI(http://
>       *   from: 532f5552492f55524928687474703a2f2f
>       *   to: 532f5552492f5552492868747470{3-4}
> 
> This should get the same detection with faster load and scan times,
> and will also match https for better coverage.  To turn lemonade
> into really good lemonade, we may be able to take the above
> optimization and apply it to the Phish.Phishing signatures identified
> by Maarten to reduce scan times further to levels below those before
> the addition of the Phishtank signatures.
> 
> As noted by Maarten as well, the Phish.Phishing sigs are Target type
> 0, whereas we’d split the Phishtank.Phishing signatures up by target
> type to reduce scan times of files where the signatures won’t apply.
> It should also speed things up quite a bit for other file types to
> split those up by Target types.
> 
> Further research into scan time optimization is definitely welcome
> and appreciated.
> 
> Regards,
> Micah


Re: [clamav-users] [External] Re: Scan very slow

2019-04-10 Thread Micah Snyder (micasnyd) via clamav-users
JME,

As you've pointed out, it appears that some signatures containing PCRE regex 
components are responsible for slow scan times on larger email files.

I did a bunch of profiling similar to what Maarten did earlier in order to 
narrow it down.  I found that Email.Phishing.VOF2 signatures are performing 
slower with the eml sample you sent me.  Email.Phishing.VOF2 signatures contain 
a PCRE regex component to alert on email attachments with specific names.  Now 
that we've determined which signatures are performing slowly in these cases, I 
am hopeful that we will be able to optimize the Email.Phishing.VOF2 signatures 
to improve performance.

I will note that your idea to lower the PCRERecMatchLimit setting to 1 will 
effectively neuter all signatures that rely on regexes and so I can't recommend 
this.

Regards,
Micah


On 4/10/19, 12:36 PM, "clamav-users on behalf of JME via clamav-users" 
 wrote:

Hello,

I managed to significantly reduce the problem of very long analysis times (more 
than 400 sec on some emails), not by disabling PhishingSignatures, which did not 
work, but by setting PCRERecMatchLimit to 1.
PCRE analysis is thus bypassed, but SafeBrowsing and the other scans 
continue to work. Is it a mistake to proceed this way?

Regards,
JME


Re: [clamav-users] [External] Re: Scan very slow

2019-04-10 Thread Brent Clark via clamav-users

Thanks for doing this.

What I'm getting out of your feedback is that maybe you guys need to look 
at implementing or revisiting your CI process(es).


Before pushing a commit, your CI can run the same test(s) and alert on 
slow or long-running scans.


All this can be automated and report on issues.

I highly recommend doing this; I don't think you guys realise how many 
systems are running and dependent on ClamAV. It might also be a good time to 
remind the community and ask them to support and donate to the project.


HTH

Regards
Brent

On 2019/04/09 17:58, Maarten Broekman via clamav-users wrote:
Clearly the latest daily.cvd is performing better, but the remaining 
"Phishtank" sigs are _not_ a majority of the slowness.


I unpacked the current (?) cvd (ClamAV-VDB:09 Apr 2019 03-53 
-0400:25414:1548262:63:X:X:raynman:1554796413) and then ran a test scan 
with each part to see what the load times looked like:


daily.cdb  Time: 0.007 sec (0 m 0 s)
daily.cfg  Time: 0.004 sec (0 m 0 s)
daily.crb  Time: 0.006 sec (0 m 0 s)
*daily.cvd  Time: 11.384 sec (0 m 11 s)*
daily.fp  Time: 0.009 sec (0 m 0 s)
daily.ftm  Time: 0.005 sec (0 m 0 s)
daily.hdb  Time: 0.303 sec (0 m 0 s)
daily.hdu  Time: 0.006 sec (0 m 0 s)
daily.hsb  Time: 1.093 sec (0 m 1 s)
daily.hsu  Time: 0.005 sec (0 m 0 s)
daily.idb  Time: 0.006 sec (0 m 0 s)
*daily.ldb  Time: 5.563 sec (0 m 5 s)*
daily.ldu  Time: 0.005 sec (0 m 0 s)
daily.mdb  Time: 0.061 sec (0 m 0 s)
daily.mdu  Time: 0.007 sec (0 m 0 s)
daily.msb  Time: 0.005 sec (0 m 0 s)
daily.msu  Time: 0.005 sec (0 m 0 s)
daily.ndb  Time: 0.017 sec (0 m 0 s)
daily.ndu  Time: 0.005 sec (0 m 0 s)
daily.pdb  Time: 0.010 sec (0 m 0 s)
daily.sfp  Time: 0.006 sec (0 m 0 s)
daily.wdb  Time: 0.014 sec (0 m 0 s)

So, half the run time of a clamscan is from the daily.ldb. To break it 
down further, I split the daily.ldb into "daily_<prefix>.ldb" where 
<prefix> is the first part of the dot-separated signature name.
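
A split like this could be scripted roughly as follows. This is a sketch, not the script actually used: it assumes daily.ldb has already been unpacked, that the signature name is the first semicolon-separated field of each line, and the sample lines are invented placeholders rather than real signatures.

```python
import os
import tempfile
from collections import defaultdict

# Hypothetical signature lines, made up for illustration only.
SAMPLE = [
    "Phish.Phishing.EXAMPLE-1;Target:0;0;616263",
    "Phishtank.Phishing.EXAMPLE-2;Target:3;0;646566",
    "Win.Trojan.EXAMPLE-3;Target:1;0;676869",
    "Phish.Phishing.EXAMPLE-4;Target:0;0;6a6b6c",
]


def split_by_prefix(lines):
    """Group signature lines by the first dot-separated part of their name."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        sig_name = line.split(";", 1)[0]
        prefix = sig_name.split(".", 1)[0]
        groups[prefix].append(line)
    return dict(groups)


def write_groups(groups, out_dir):
    """Write one daily_<prefix>.ldb file per group and return the paths."""
    paths = []
    for prefix, sigs in sorted(groups.items()):
        path = os.path.join(out_dir, f"daily_{prefix}.ldb")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(sigs) + "\n")
        paths.append(path)
    return paths


def demo():
    """Split the sample lines into a temporary directory and list the file names."""
    with tempfile.TemporaryDirectory() as out_dir:
        paths = write_groups(split_by_prefix(SAMPLE), out_dir)
        return [os.path.basename(path) for path in paths]


if __name__ == "__main__":
    print(demo())
```

Each resulting file can then be loaded on its own to time that group of signatures in isolation.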


daily_Andr.ldb  Time: 0.008 sec (0 m 0 s)
daily_Archive.ldb  Time: 0.009 sec (0 m 0 s)
daily_Asp.ldb  Time: 0.004 sec (0 m 0 s)
daily_Doc.ldb  Time: 0.116 sec (0 m 0 s)
daily_Eicar-Test-Signature.ldb  Time: 0.009 sec (0 m 0 s)
daily_Email.ldb  Time: 0.014 sec (0 m 0 s)
daily_Emf.ldb  Time: 0.007 sec (0 m 0 s)
daily_Heuristics.ldb  Time: 0.006 sec (0 m 0 s)
daily_Html.ldb  Time: 0.010 sec (0 m 0 s)
daily_Hwp.ldb  Time: 0.005 sec (0 m 0 s)
daily_Img.ldb  Time: 0.006 sec (0 m 0 s)
daily_Ios.ldb  Time: 0.006 sec (0 m 0 s)
daily_Java.ldb  Time: 0.005 sec (0 m 0 s)
daily_Js.ldb  Time: 0.007 sec (0 m 0 s)
daily_Legacy.ldb  Time: 0.006 sec (0 m 0 s)
daily_Lnk.ldb  Time: 0.005 sec (0 m 0 s)
daily_Mp4.ldb  Time: 0.005 sec (0 m 0 s)
daily_Multios.ldb  Time: 0.005 sec (0 m 0 s)
daily_Osx.ldb  Time: 0.008 sec (0 m 0 s)
daily_Pdf.ldb  Time: 0.007 sec (0 m 0 s)
*daily_Phish.ldb  Time: 1.612 sec (0 m 1 s)*
daily_Phishtank.ldb  Time: 0.146 sec (0 m 0 s)
daily_Php.ldb  Time: 0.006 sec (0 m 0 s)
daily_Ppt.ldb  Time: 0.007 sec (0 m 0 s)
daily_Py.ldb  Time: 0.006 sec (0 m 0 s)
daily_Rtf.ldb  Time: 0.006 sec (0 m 0 s)
daily_Svg.ldb  Time: 0.005 sec (0 m 0 s)
daily_Swf.ldb  Time: 0.007 sec (0 m 0 s)
daily_Ttf.ldb  Time: 0.005 sec (0 m 0 s)
daily_Txt.ldb  Time: 0.009 sec (0 m 0 s)
daily_Unix.ldb  Time: 0.008 sec (0 m 0 s)
daily_Vbs.ldb  Time: 0.009 sec (0 m 0 s)
*daily_Win.ldb  Time: 3.391 sec (0 m 3 s)*
daily_Xls.ldb  Time: 0.009 sec (0 m 0 s)
daily_Xml.ldb  Time: 0.007 sec (0 m 0 s)


"Phish.", not "Phishtank.", and "Win." are the longest run times. 
Looking at the /number/ of signatures in each, the 'Phish.' signatures 
are taking a disproportionate amount of time to load compared to the 
other signatures:


      216 daily_Andr.ldb
        3 daily_Archive.ldb
        1 daily_Asp.ldb
     2096 daily_Doc.ldb
        1 daily_Eicar-Test-Signature.ldb
     1017 daily_Email.ldb
        2 daily_Emf.ldb
        5 daily_Heuristics.ldb
      250 daily_Html.ldb
        1 daily_Hwp.ldb
       15 daily_Img.ldb
        6 daily_Ios.ldb
       16 daily_Java.ldb
       69 daily_Js.ldb
       27 daily_Legacy.ldb
        9 daily_Lnk.ldb
        1 daily_Mp4.ldb
        9 daily_Multios.ldb
      175 daily_Osx.ldb
      132 daily_Pdf.ldb
     2515 daily_Phish.ldb
     3516 daily_Phishtank.ldb
       18 daily_Php.ldb
        5 daily_Ppt.ldb
        3 daily_Py.ldb
       28 daily_Rtf.ldb
        1 daily_Svg.ldb
      103 daily_Swf.ldb
        2 daily_Ttf.ldb
      140 daily_Txt.ldb
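
To put a number on "disproportionate", the load times and signature counts listed above can be combined into a rough per-signature cost. The sketch below uses only figures quoted in this message for a handful of groups. Note that every timing includes a small fixed startup cost (the near-empty groups all take about 0.005 sec), so the figures for small groups are overstated; treat the result as an estimate, not a measurement.

```python
# Load times (seconds) and signature counts copied from the listings above.
LOAD_SECONDS = {
    "Doc": 0.116,
    "Email": 0.014,
    "Html": 0.010,
    "Phish": 1.612,
    "Phishtank": 0.146,
}
SIG_COUNTS = {
    "Doc": 2096,
    "Email": 1017,
    "Html": 250,
    "Phish": 2515,
    "Phishtank": 3516,
}


def ms_per_signature(seconds, counts):
    """Return the approximate load time per signature, in milliseconds."""
    return {name: 1000.0 * seconds[name] / counts[name] for name in seconds}


def ranked(seconds, counts):
    """Return (group, ms per signature) pairs, most expensive first."""
    per_sig = ms_per_signature(seconds, counts)
    return sorted(per_sig.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, cost in ranked(LOAD_SECONDS, SIG_COUNTS):
        print(f"{name:10s} {cost:.4f} ms per signature")
```

With these inputs the "Phish." group comes out more than ten times as expensive per signature as the "Phishtank." group.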

Re: [clamav-users] [External] Re: Scan very slow

2019-04-10 Thread Steve Basford

On 2019-04-09 22:29, Micah Snyder (micasnyd) via clamav-users wrote:

Maarten,


Looking at a few of the Phish.Phishing signatures, these appear to
have the same issue (href="http:// prefix).  In testing with scan of a
PDF document, I was able to reduce the scan time from 31.987 sec down
to 2.632 sec simply by changing the start of the Phishtank signatures
for the following:


Hi Micah,

Just in case this helps, a slow loading db issue a while back:

https://bugzilla.clamav.net/show_bug.cgi?id=11017



--
Cheers,

Steve
Twitter: @sanesecurity

___

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml


Re: [clamav-users] [External] Re: Scan very slow

2019-04-09 Thread Maarten Broekman via clamav-users
Oh, absolutely, Micah. My scan times were negligible as I was scanning a
single PHP file that was 150 bytes or so (opening PHP tag, two lines of
comments, and a call to phpinfo), so those times I gave were entirely load
time.

I'm glad that you found the information helpful.

--Maarten

On Tue, Apr 9, 2019 at 5:29 PM Micah Snyder (micasnyd) 
wrote:

> Maarten,
>
>
>
> Your test results are pretty great.  I really like your breakdown of the
> signatures by category.  I will caution that scan times will vary quite
> heavily depending on what you’re scanning, based on Target type (
> https://www.clamav.net/documents/clamav-file-types).
>
>
>
> In addition, it’s important to distinguish between load and scan times.
> The time reported by clamscan is both load + scan.  If you just want scan
> time, you will want to load the database with clamd and then test the
> scantime with clamdscan.
>
>
>
> Regarding load time vs scantime, all of the signatures must be loaded, but
> depending on the target type of the file being scanned, not all of the
> signatures will be matched against the file.  That is, daily_Win.ldb might
> take the longest to load due to the number of signatures or complexity of
> the signatures but when scanning a PDF, they probably won’t impact scan
> time, as Win signatures are probably mostly target type 1 (PE file).
>
>
>
> I’ve spent a bit of time today investigating what I believe is responsible
> for slow load and scan times for the Phishtank sigs.  I had a hunch, based on
> a conversation we saw a while back in the mailing list, that the identical
> beginning for URL-based signatures results in an unbalanced and inefficient
> tree for matching. That is, some 3000 signatures each began with either:
>
>
> 1. href="http:// (687265663d22687474703a2f2f)
> 2. HYPERLINK "http:// (48595045524c494e4b2022687474703a2f2f)
> 3. S/URI/URI(http:// (532f5552492f55524928687474703a2f2f)
>
>
>
> Looking at a few of the Phish.Phishing signatures, these appear to have
> the same issue (href="http:// prefix).  In testing with scan of a PDF
> document, I was able to reduce the scan time from 31.987 sec down to 2.632
> sec simply by changing the start of the Phishtank signatures for the
> following:
>
>
> 1. href="http://
>    1. from: 687265663d22687474703a2f2f
>    2. to: 687265663d2268747470{3-4}
> 2. HYPERLINK "http://
>    1. from: 48595045524c494e4b2022687474703a2f2f
>    2. to: 48595045524c494e4b202268747470{3-4}
> 3. S/URI/URI(http://
>    1. from: 532f5552492f55524928687474703a2f2f
>    2. to: 532f5552492f5552492868747470{3-4}
>
>
>
> This should get the same detection with a faster load and scan time, and
> will accommodate https for better coverage.  To turn lemonade into
> really good lemonade, we may be able to take the above optimization and
> apply it to the Phish.Phishing signatures identified by Maarten to reduce
> scan times further to levels below those before the addition of the
> Phishtank signatures.
>
>
>
> As noted by Maarten as well, the Phish.Phishing sigs are Target type 0,
> whereas we’d split the Phishtank.Phishing signatures up by target type to
> reduce scan times of files where the signatures won’t apply.  It should
> also speed things up quite a bit for other file types to split those up by
> Target types.
>
>
>
> Further research into scan time optimization is definitely welcome and
> appreciated.
>
>
>
> Regards,
>
> Micah
>
>
>
>
>
> *From: *clamav-users  on behalf of
> Maarten Broekman via clamav-users 
> *Reply-To: *ClamAV users ML 
> *Date: *Tuesday, April 9, 2019 at 12:00 PM
> *To: *ClamAV users ML 
> *Cc: *Maarten Broekman 
> *Subject: *Re: [clamav-users] [External] Re: Scan very slow
>
>
>
> Clearly the latest daily.cvd is performing better, but the remaining
> "Phishtank" sigs are *not* a majority of the slowness.
>
>
>
> I unpacked the current (?) cvd (ClamAV-VDB:09 Apr 2019 03-53
> -0400:25414:1548262:63:X:X:raynman:1554796413) and then ran a test scan
> with each part to see what the load times looked like:
>
> daily.cdb  Time: 0.007 sec (0 m 0 s)
>
> daily.cfg  Time: 0.004 sec (0 m 0 s)
>
> daily.crb  Time: 0.006 sec (0 m 0 s)
>
> *daily.cvd  Time: 11.384 sec (0 m 11 s)*
>
> daily.fp  Time: 0.009 sec (0 m 0 s)
>
> daily.ftm  Time: 0.005 sec (0 m 0 s)
>
> daily.hdb  Time: 0.303 sec (0 m 0 s)
>
> daily.hdu  Time: 0.006 sec (0 m 0 s)
>
> daily.hsb  Time: 1.093 sec (0 m 1 s)
>
> daily.hsu  Time: 0.005 sec (0 m 0 s)
>
> daily.idb  Time: 0.006 sec (0 m 0 s)
>
> *daily.ldb  Ti

Re: [clamav-users] [External] Re: Scan very slow

2019-04-09 Thread Micah Snyder (micasnyd) via clamav-users
Maarten,

Your test results are pretty great.  I really like your breakdown of the 
signatures by category.  I will caution that scan times will vary quite heavily 
depending on what you’re scanning, based on Target type 
(https://www.clamav.net/documents/clamav-file-types).

In addition, it’s important to distinguish between load and scan times.  The 
time reported by clamscan is both load + scan.  If you just want scan time, you 
will want to load the database with clamd and then test the scantime with 
clamdscan.
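
As a rough illustration of that distinction, here is a minimal Python sketch for timing the two commands. The file names `daily.ldb` and `sample.pdf` are placeholders, and the sketch assumes clamd is already running with its database loaded before clamdscan is timed.

```python
import subprocess
import time

# Placeholder arguments; adjust the database and sample paths to the local setup.
# clamscan loads the database on every run, so its time is load + scan.
CLAMSCAN_ARGV = ["clamscan", "-d", "daily.ldb", "sample.pdf"]
# clamdscan hands the file to an already-running clamd, so its time is
# roughly scan time only.
CLAMDSCAN_ARGV = ["clamdscan", "sample.pdf"]


def time_command(argv):
    """Run a command and return (elapsed wall-clock seconds, exit code)."""
    start = time.perf_counter()
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, proc.returncode
```

Calling `time_command(CLAMSCAN_ARGV)` and `time_command(CLAMDSCAN_ARGV)` on the same sample gives a rough load + scan figure versus a scan-only figure; repeating each a few times would smooth out disk-cache effects.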

Regarding load time vs scantime, all of the signatures must be loaded, but 
depending on the target type of the file being scanned, not all of the 
signatures will be matched against the file.  That is, daily_Win.ldb might take 
the longest to load due to the number of signatures or complexity of the 
signatures but when scanning a PDF, they probably won’t impact scan time, as 
Win signatures are probably mostly target type 1 (PE file).

I’ve spent a bit of time today investigating what I believe is responsible for 
slow load and scan times for the Phishtank sigs.  I had a hunch, based on a 
conversation we saw a while back in the mailing list, that the identical 
beginning for URL-based signatures results in an unbalanced and inefficient 
tree for matching. That is, some 3000 signatures each began with either:


  1.  href="http:// (687265663d22687474703a2f2f)
  2.  HYPERLINK "http:// (48595045524c494e4b2022687474703a2f2f)
  3.  S/URI/URI(http:// (532f5552492f55524928687474703a2f2f)
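
To illustrate the hunch only (this is not a description of ClamAV's actual matcher internals), here is a small Python sketch that buckets patterns by their first few bytes. The bucket depth of 3 and the host names are assumptions made up for the example.

```python
from collections import Counter

# Shared leading bytes quoted in the list above: href="http://
SHARED_PREFIX_HEX = "687265663d22687474703a2f2f"


def bucket_sizes(hex_patterns, depth):
    """Count how many patterns fall under each leading `depth`-byte prefix."""
    return Counter(bytes.fromhex(p)[:depth] for p in hex_patterns)


# Hypothetical host names standing in for the unique part of each signature.
hosts = [f"{i:04d}.example.test".encode().hex() for i in range(3000)]

with_shared_prefix = [SHARED_PREFIX_HEX + h for h in hosts]
unique_part_only = hosts

shared = bucket_sizes(with_shared_prefix, depth=3)
unique = bucket_sizes(unique_part_only, depth=3)

# With the shared prefix, every pattern lands in the same bucket.
print(len(shared), max(shared.values()))
# Keyed on the unique part, the patterns spread over many small buckets.
print(len(unique), max(unique.values()))
```

If a matcher indexes patterns by only a short leading prefix, the first case leaves it with one long list to walk on every hit, which is the kind of imbalance suspected here.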

Looking at a few of the Phish.Phishing signatures, these appear to have the 
same issue (href="http:// prefix).  In testing with scan of a PDF document, I 
was able to reduce the scan time from 31.987 sec down to 2.632 sec simply by 
changing the start of the Phishtank signatures for the following:


  1.  href="http://
 *   from: 687265663d22687474703a2f2f
 *   to: 687265663d2268747470{3-4}
  2.  HYPERLINK "http://
 *   from: 48595045524c494e4b2022687474703a2f2f
 *   to: 48595045524c494e4b202268747470{3-4}
  3.  S/URI/URI(http://
 *   from: 532f5552492f55524928687474703a2f2f
 *   to: 532f5552492f5552492868747470{3-4}

This should get the same detection with a faster load and scan time, and will 
accommodate https for better coverage.  To turn lemonade into really good 
lemonade, we may be able to take the above optimization and apply it to the 
Phish.Phishing signatures identified by Maarten to reduce scan times further to 
levels below those before the addition of the Phishtank signatures.
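
The from/to rewrite listed above can be sketched in Python as follows. The helper name `relax_http_prefix` is made up for this example, and it only handles the three prefixes quoted in this message.

```python
HTTP_SCHEME_HEX = "687474703a2f2f"   # "http://"
HTTP_WILDCARD_HEX = "68747470{3-4}"  # "http" followed by 3 to 4 arbitrary bytes

# The parts of the three prefixes that come before "http://".
KNOWN_PREFIXES = [
    "687265663d22",            # href="
    "48595045524c494e4b2022",  # HYPERLINK "
    "532f5552492f55524928",    # S/URI/URI(
]


def relax_http_prefix(subsig):
    """Rewrite a leading <prefix>http:// into <prefix>http{3-4}.

    "://" is 3 bytes and "s://" is 4 bytes, so the {3-4} wildcard covers
    both http and https URLs. Other subsignatures are returned unchanged.
    """
    for prefix in KNOWN_PREFIXES:
        old = prefix + HTTP_SCHEME_HEX
        if subsig.startswith(old):
            return prefix + HTTP_WILDCARD_HEX + subsig[len(old):]
    return subsig
```

For example, `relax_http_prefix("687265663d22687474703a2f2f6578")` returns `687265663d2268747470{3-4}6578`.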

As noted by Maarten as well, the Phish.Phishing sigs are Target type 0, whereas 
we’d split the Phishtank.Phishing signatures up by target type to reduce scan 
times of files where the signatures won’t apply.  It should also speed things 
up quite a bit for other file types to split those up by Target types.
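
As a rough sketch of why the target type matters, the Python below picks out which logical signatures would apply to a file of a given type. The sample signature lines are invented for illustration, and the sketch assumes the `Target:` attribute sits in the second semicolon-separated field of each .ldb line.

```python
# A few target type numbers, not the complete list.
TARGET_NAMES = {0: "any file", 1: "PE", 3: "HTML", 4: "mail", 10: "PDF"}


def target_type(ldb_line):
    """Return the Target number from an .ldb line's second field, or None."""
    fields = ldb_line.strip().split(";")
    if len(fields) < 2:
        return None
    for attr in fields[1].split(","):
        key, _, value = attr.partition(":")
        if key == "Target" and value.isdigit():
            return int(value)
    return None


def applicable(ldb_lines, file_target):
    """Keep signatures that are generic (Target 0) or match the file's type."""
    return [line for line in ldb_lines if target_type(line) in (0, file_target)]


# Invented example lines; not real signatures.
SAMPLE = [
    "Example.Win.Sig-1;Engine:51-255,Target:1;0;4d5a",
    "Example.Phish.Generic-1;Engine:51-255,Target:0;0;687265663d22",
    "Example.Phish.Html-1;Engine:51-255,Target:3;0;687265663d22",
    "Example.Phish.Pdf-1;Engine:51-255,Target:10;0;532f5552492f55524928",
]

# A PDF (target type 10) is checked against the Target 0 and Target 10 lines only.
print(len(applicable(SAMPLE, 10)))
```

The Target 0 line shows up for every file type, which is why a large set of Target 0 signatures costs scan time across the board.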

Further research into scan time optimization is definitely welcome and 
appreciated.

Regards,
Micah


From: clamav-users  on behalf of Maarten 
Broekman via clamav-users 
Reply-To: ClamAV users ML 
Date: Tuesday, April 9, 2019 at 12:00 PM
To: ClamAV users ML 
Cc: Maarten Broekman 
Subject: Re: [clamav-users] [External] Re: Scan very slow

Clearly the latest daily.cvd is performing better, but the remaining 
"Phishtank" sigs are not a majority of the slowness.

I unpacked the current (?) cvd (ClamAV-VDB:09 Apr 2019 03-53 
-0400:25414:1548262:63:X:X:raynman:1554796413) and then ran a test scan with 
each part to see what the load times looked like:
daily.cdb  Time: 0.007 sec (0 m 0 s)
daily.cfg  Time: 0.004 sec (0 m 0 s)
daily.crb  Time: 0.006 sec (0 m 0 s)
daily.cvd  Time: 11.384 sec (0 m 11 s)
daily.fp  Time: 0.009 sec (0 m 0 s)
daily.ftm  Time: 0.005 sec (0 m 0 s)
daily.hdb  Time: 0.303 sec (0 m 0 s)
daily.hdu  Time: 0.006 sec (0 m 0 s)
daily.hsb  Time: 1.093 sec (0 m 1 s)
daily.hsu  Time: 0.005 sec (0 m 0 s)
daily.idb  Time: 0.006 sec (0 m 0 s)
daily.ldb  Time: 5.563 sec (0 m 5 s)
daily.ldu  Time: 0.005 sec (0 m 0 s)
daily.mdb  Time: 0.061 sec (0 m 0 s)
daily.mdu  Time: 0.007 sec (0 m 0 s)
daily.msb  Time: 0.005 sec (0 m 0 s)
daily.msu  Time: 0.005 sec (0 m 0 s)
daily.ndb  Time: 0.017 sec (0 m 0 s)
daily.ndu  Time: 0.005 sec (0 m 0 s)
daily.pdb  Time: 0.010 sec (0 m 0 s)
daily.sfp  Time: 0.006 sec (0 m 0 s)
daily.wdb  Time: 0.014 sec (0 m 0 s)

So, half the run time of a clamscan is from the daily.ldb. To break it down 
farther, I split the daily.ldb into "daily_.ldb" where  is the 
first part of the dot-separated signature name.
daily_Andr.ldb  Time: 0.008 sec (0 m 0 s)
daily_Archive.ldb  Time: 0.009 sec (0 m 0 s)
daily_Asp.ldb  Time: 0.004 sec (0 m 0 s)
daily_Doc.ldb  Time: 0.116 sec (0 m 0 s)
daily_Eicar-Test-Signature.ldb  Time: 0.009 sec (0 m 0 s)
daily_Email.ldb  Time: 0.014 sec (0 m 0 s)
daily_Emf.ldb  Time: 0.007 sec (0 m 0 s)
daily

Re: [clamav-users] [External] Re: Scan very slow

2019-04-09 Thread Maarten Broekman via clamav-users
Clearly the latest daily.cvd is performing better, but the remaining
"Phishtank" sigs are *not* a majority of the slowness.

I unpacked the current (?) cvd (ClamAV-VDB:09 Apr 2019 03-53
-0400:25414:1548262:63:X:X:raynman:1554796413) and then ran a test scan
with each part to see what the load times looked like:

daily.cdb  Time: 0.007 sec (0 m 0 s)
daily.cfg  Time: 0.004 sec (0 m 0 s)
daily.crb  Time: 0.006 sec (0 m 0 s)
*daily.cvd  Time: 11.384 sec (0 m 11 s)*
daily.fp  Time: 0.009 sec (0 m 0 s)
daily.ftm  Time: 0.005 sec (0 m 0 s)
daily.hdb  Time: 0.303 sec (0 m 0 s)
daily.hdu  Time: 0.006 sec (0 m 0 s)
daily.hsb  Time: 1.093 sec (0 m 1 s)
daily.hsu  Time: 0.005 sec (0 m 0 s)
daily.idb  Time: 0.006 sec (0 m 0 s)

*daily.ldb  Time: 5.563 sec (0 m 5 s)*
daily.ldu  Time: 0.005 sec (0 m 0 s)
daily.mdb  Time: 0.061 sec (0 m 0 s)
daily.mdu  Time: 0.007 sec (0 m 0 s)
daily.msb  Time: 0.005 sec (0 m 0 s)
daily.msu  Time: 0.005 sec (0 m 0 s)
daily.ndb  Time: 0.017 sec (0 m 0 s)
daily.ndu  Time: 0.005 sec (0 m 0 s)
daily.pdb  Time: 0.010 sec (0 m 0 s)
daily.sfp  Time: 0.006 sec (0 m 0 s)
daily.wdb  Time: 0.014 sec (0 m 0 s)

So, half the run time of a clamscan is from the daily.ldb. To break it down
farther, I split the daily.ldb into "daily_.ldb" where  is
the first part of the dot-separated signature name.
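
A minimal Python sketch of that kind of split, assuming each .ldb line starts with the signature name followed by a semicolon. The function name and the sample lines are made up for illustration and are not the script actually used.

```python
from collections import defaultdict


def split_by_prefix(ldb_lines):
    """Group .ldb lines by the first dot-separated part of the signature name.

    Returns a dict mapping an output file name such as "daily_Win.ldb"
    to the list of signature lines that belong in it.
    """
    groups = defaultdict(list)
    for line in ldb_lines:
        line = line.strip()
        if not line:
            continue
        name = line.split(";", 1)[0]     # signature name is the first field
        prefix = name.split(".", 1)[0]   # e.g. "Win" from "Win.Dropper.X"
        groups[f"daily_{prefix}.ldb"].append(line)
    return dict(groups)


# Invented example lines; not real signatures.
lines = [
    "Win.Dropper.Example-1;Engine:51-255,Target:1;0;4d5a",
    "Win.Adware.Example-2;Engine:51-255,Target:1;0;4d5a",
    "Phish.Phishing.Example-3;Engine:51-255,Target:0;0;687265663d22",
]
for filename, sigs in sorted(split_by_prefix(lines).items()):
    print(filename, len(sigs))
```

Each group can then be written to its own file and passed to clamscan with `-d` to time it separately.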

daily_Andr.ldb  Time: 0.008 sec (0 m 0 s)
daily_Archive.ldb  Time: 0.009 sec (0 m 0 s)
daily_Asp.ldb  Time: 0.004 sec (0 m 0 s)
daily_Doc.ldb  Time: 0.116 sec (0 m 0 s)
daily_Eicar-Test-Signature.ldb  Time: 0.009 sec (0 m 0 s)
daily_Email.ldb  Time: 0.014 sec (0 m 0 s)
daily_Emf.ldb  Time: 0.007 sec (0 m 0 s)
daily_Heuristics.ldb  Time: 0.006 sec (0 m 0 s)
daily_Html.ldb  Time: 0.010 sec (0 m 0 s)
daily_Hwp.ldb  Time: 0.005 sec (0 m 0 s)
daily_Img.ldb  Time: 0.006 sec (0 m 0 s)
daily_Ios.ldb  Time: 0.006 sec (0 m 0 s)
daily_Java.ldb  Time: 0.005 sec (0 m 0 s)
daily_Js.ldb  Time: 0.007 sec (0 m 0 s)
daily_Legacy.ldb  Time: 0.006 sec (0 m 0 s)
daily_Lnk.ldb  Time: 0.005 sec (0 m 0 s)
daily_Mp4.ldb  Time: 0.005 sec (0 m 0 s)
daily_Multios.ldb  Time: 0.005 sec (0 m 0 s)
daily_Osx.ldb  Time: 0.008 sec (0 m 0 s)
daily_Pdf.ldb  Time: 0.007 sec (0 m 0 s)
*daily_Phish.ldb  Time: 1.612 sec (0 m 1 s)*
daily_Phishtank.ldb  Time: 0.146 sec (0 m 0 s)
daily_Php.ldb  Time: 0.006 sec (0 m 0 s)
daily_Ppt.ldb  Time: 0.007 sec (0 m 0 s)
daily_Py.ldb  Time: 0.006 sec (0 m 0 s)
daily_Rtf.ldb  Time: 0.006 sec (0 m 0 s)
daily_Svg.ldb  Time: 0.005 sec (0 m 0 s)
daily_Swf.ldb  Time: 0.007 sec (0 m 0 s)
daily_Ttf.ldb  Time: 0.005 sec (0 m 0 s)
daily_Txt.ldb  Time: 0.009 sec (0 m 0 s)
daily_Unix.ldb  Time: 0.008 sec (0 m 0 s)
daily_Vbs.ldb  Time: 0.009 sec (0 m 0 s)
*daily_Win.ldb  Time: 3.391 sec (0 m 3 s)*
daily_Xls.ldb  Time: 0.009 sec (0 m 0 s)
daily_Xml.ldb  Time: 0.007 sec (0 m 0 s)


"Phish.", not "Phishtank.", and "Win." are the longest run times. Looking
at the *number* of signatures in each, the 'Phish.' signatures are taking a
disproportionate amount of time to load compared to the other signatures:

  216 daily_Andr.ldb
    3 daily_Archive.ldb
    1 daily_Asp.ldb
 2096 daily_Doc.ldb
    1 daily_Eicar-Test-Signature.ldb
 1017 daily_Email.ldb
    2 daily_Emf.ldb
    5 daily_Heuristics.ldb
  250 daily_Html.ldb
    1 daily_Hwp.ldb
   15 daily_Img.ldb
    6 daily_Ios.ldb
   16 daily_Java.ldb
   69 daily_Js.ldb
   27 daily_Legacy.ldb
    9 daily_Lnk.ldb
    1 daily_Mp4.ldb
    9 daily_Multios.ldb
  175 daily_Osx.ldb
  132 daily_Pdf.ldb
 2515 daily_Phish.ldb
 3516 daily_Phishtank.ldb
   18 daily_Php.ldb
    5 daily_Ppt.ldb
    3 daily_Py.ldb
   28 daily_Rtf.ldb
    1 daily_Svg.ldb
  103 daily_Swf.ldb
    2 daily_Ttf.ldb
  140 daily_Txt.ldb
  222 daily_Unix.ldb
   21 daily_Vbs.ldb
43928 daily_Win.ldb
  165 daily_Xls.ldb
    8 daily_Xml.ldb


From the look of it, "Phish." has those REPHISH signatures. Those
signatures seem to be looking at any file (Target 0) and have subsignatures
that are combined to match depending on which filetype they are 'looking'
for (so, href for HTML files, %PDF, Subtype, and URI objects for PDFs, etc)
as opposed to the remaining Phishtank sigs which seem to have a separate
signature depending on the target type.
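
To put a number on "disproportionate", the timings and signature counts listed above can be combined into a rough per-signature cost. This is only a back-of-the-envelope Python sketch: the times include clamscan's fixed startup overhead, and only a handful of the files are included.

```python
# Times (seconds) and signature counts copied from the listings above.
TIMES = {
    "daily_Doc.ldb": 0.116,
    "daily_Email.ldb": 0.014,
    "daily_Phish.ldb": 1.612,
    "daily_Phishtank.ldb": 0.146,
    "daily_Win.ldb": 3.391,
}
COUNTS = {
    "daily_Doc.ldb": 2096,
    "daily_Email.ldb": 1017,
    "daily_Phish.ldb": 2515,
    "daily_Phishtank.ldb": 3516,
    "daily_Win.ldb": 43928,
}


def ms_per_signature(times, counts):
    """Return milliseconds of run time per signature for each file."""
    return {name: 1000.0 * times[name] / counts[name] for name in times}


costs = ms_per_signature(TIMES, COUNTS)
for name, cost in sorted(costs.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:22s} {cost:.3f} ms/signature")
```

By this measure daily_Phish.ldb comes out at roughly 0.64 ms per signature against roughly 0.08 ms for daily_Win.ldb, so the Win file is slow mainly because of its size, while the Phish file is slow per signature.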

Breaking up daily_Win into its constituent sub-parts doesn't reveal any
particular culprit from just a simple scan timing though...

daily_Win.Adware.ldb  Time: 0.013 sec (0 m 0 s)
daily_Win.Coinminer.ldb  Time: 0.009 sec (0 m 0 s)
daily_Win.Downloader.ldb  Time: 0.035 sec (0 m 0 s)
daily_Win.Dropper.ldb  Time: 0.240 sec (0 m 0 s)
daily_Win.Exploit.ldb  Time: 0.016 sec (0 m 0 s)
daily_Win.Ircbot.ldb  Time: 0.006 sec (0 m 0 s)

Re: [clamav-users] [External] Re: Scan very slow

2019-04-09 Thread Micah Snyder (micasnyd) via clamav-users
Mark,

Yes, the plan is still to remove the rest of the Phishtank signatures.  We 
wanted to get things back to relative normal and resolve the immediate crisis.  
We’ll remove the rest of them soon.

Best,
Micah

From: Mark Allan 
Date: Tuesday, April 9, 2019 at 6:26 AM
To: "Micah Snyder (micasnyd)" 
Cc: ClamAV users ML 
Subject: Re: [External] Re: [clamav-users] Scan very slow

The scan times are definitely better than they were - in fact, they're back to 
how they were before last week's inclusion of the Phishtank signatures. They're 
still almost double what they used to be though, and as far as I can see, there 
are still almost 4000 Phishtank signatures in the DB:
$ sigtool --find Phishtank | wc -l
3968

Can I request that those ones also be removed please?

Best regards
Mark

On Sun, 7 Apr 2019 at 14:43, Micah Snyder (micasnyd) <micas...@cisco.com> wrote:
Tim,

There are a couple of ways for users to drop specific categories of signatures 
at this time.  Sadly, they wouldn’t have helped this last week.  These include 
bytecode signatures, PUA (potentially unwanted applications) signatures, 
Email.Phishing and HTML.Phishing signatures, and the Safebrowsing database.

If we had named the Phishtank.Phishing sigs to HTML.Phishing.Phishtank or 
Email.Phishing.Phishtank then they could have been disabled with the clamscan 
option `--phishing-sigs=no` (clamd.conf: `PhishingSignatures no`).

Maybe a better option would be for us to create a new optional database for 
phishing signatures. However, the names for the databases are hardcoded into 
freshclam, so it is non-trivial to add a new database and would require a few 
changes to ClamAV’s code. We have talked about making the databases easier to 
add/remove in the future so users can have more categories to enable/disable. 
In this light, it ties in well with existing plans.

Of note the Phishtank sigs from Friday’s daily were removed yesterday and scan 
times should be back to normal.

Regards,
Micah

From: Tim Hawkins <tim.hawk...@redflaggroup.com>
Date: Friday, April 5, 2019 at 6:06 PM
To: ClamAV users ML <clamav-users@lists.clamav.net>, Mark Allan <markjal...@gmail.com>
Cc: "Micah Snyder (micasnyd)" <micas...@cisco.com>
Subject: Re: [External] Re: [clamav-users] Scan very slow

Hi Micah

Does clamav partition the database so that signatures that are mainly 
associated with email scanning can be dropped out for folks only needing 
filesystem scans?  None of our systems use email, and we don't make use of the 
mailer extension.

Having to load all the email-focused signatures could, as you have observed, 
impact performance.
Sent from Nine

From: "Micah Snyder (micasnyd) via clamav-users" <clamav-users@lists.clamav.net>
Sent: Saturday, April 6, 2019 03:18
To: ClamAV users ML; Mark Allan
Cc: Micah Snyder (micasnyd)
Subject: [External] Re: [clamav-users] Scan very slow

Regarding slow scan times today (and slow scan times in general), it appears 
that the signatures we generate based on PhishTank’s feed for phishing URLs are 
resulting in very slow load and scan times.

Today’s daily update saw 7448 new Phishtank signatures (much higher than usual) 
coinciding with the immediate performance drop for load time and scan time.  
One user reported that the load time today on some of his slower machines was 
slow enough to exceed the timeout for service startup 
(https://bugzilla.clamav.net/show_bug.cgi?id=12317).

In limited testing on my own machine I saw the following change after dropping 
the Phishtank.Phishing signatures from daily.cvd’s daily.ldb file:

  *   Database load time on my laptop went from 75.43203997612 seconds down to 
14.859203100204468 seconds
  *   Scan time (for an arbitrary pdf) went from 1.798 sec to 0.644 sec.

After some discussion between the teams that work on ClamAV and ClamAV 
signature content and deployment, we’ve agreed to drop PhishTank signatures 
from the database until we can determine a way to craft Phishtank signatures 
without incurring such a significant performance hit.

The daily update tomorrow will have the change.

-Micah


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.



From: clamav-users <clamav-users-boun...@lists.clamav.net> on behalf of "Micah Snyder (micasnyd) via clamav-users" <clamav-users@lists.clamav.net>
Reply-To: ClamAV users ML <clamav-users@lists.clamav.net>
Date: Friday, April 5, 2019 at 1:08 PM
To: Mark Allan <markjal...@gmail.com>, ClamAV users ML <clamav-users@lists.clamav.net>
Cc: "Micah Snyder (micasnyd)" <micas...@cisco.com>
Subject: Re: [clamav-users] Scan very slow

Hi Mark,

Sorry about the delay in responding.  I hadn’t looked at my clamav-users filter 
this morning.  Just investigating now.  Will respond when I know more.

-Micah

From: Mark Allan <markjal...@gmail.com>
Date: Friday, April 5, 2019 at 9:12 AM
To: 

Re: [clamav-users] [External] Re: Scan very slow

2019-04-09 Thread Steve Basford

On 2019-04-09 12:02, Brent Clark via clamav-users wrote:

Can't those be adopted / managed by Sanesecurity?

For all you know, those are already in Sanesecurity.


They are... and have been for quite some time:


"The following databases are distributed by Sanesecurity, but produced 
by Porcupine Signatures"


phishtank.ndb.

Briefly...

Number of sigs in phishtank.ndb: 9,309

eg:

PhishTank.Phishing.6002281, matches:

https://www.phishtank.com/phish_detail.php?phish_id=6002281

So there is likely to be some crossover now that signatures with names like
Phish.Phishing.REPHISH_ID_20190404_67-6931549-0, generated from the PhishTank
feed, are in daily.ldb and daily.ndb.


I'll check back on the thread later.

--
Cheers,

Steve
Twitter: @sanesecurity


Re: [clamav-users] [External] Re: Scan very slow

2019-04-09 Thread Brent Clark via clamav-users

Can't those be adopted / managed by Sanesecurity?

For all you know, those are already in Sanesecurity.

Regards
Brent Clark

On 2019/04/09 12:25, Mark Allan via clamav-users wrote:
The scan times are definitely better than they were - in fact, they're 
back to how they were before last week's inclusion of the Phishtank 
signatures. They're still almost double what they used to be though, and 
as far as I can see, there are still almost 4000 Phishtank signatures in 
the DB:

$ sigtool --find Phishtank | wc -l
     3968

Can I request that those ones also be removed please?

Best regards
Mark

On Sun, 7 Apr 2019 at 14:43, Micah Snyder (micasnyd) wrote:


Tim,

There are a couple of ways for users to drop specific categories of
signatures at this time.  Sadly, they wouldn’t have helped this last
week.  These include bytecode signatures, PUA (potentially unwanted
applications) signatures, Email.Phishing and HTML.Phishing
signatures, and the Safebrowsing database. 

If we had named the Phishtank.Phishing sigs to
HTML.Phishing.Phishtank or Email.Phishing.Phishtank then they could
have been disabled with the clamscan option `--phishing-sigs=no`
(clamd.conf: `PhishingSignatures no`).

Maybe a better option would be for us to create a new optional
database for phishing signatures. However, the names for the
databases are hardcoded into freshclam, so it is non-trivial to add
a new database and would require a few changes to ClamAV’s code. We
have talked about making the databases easier to add/remove in the
future so users can have more categories to enable/disable. In this
light, it ties in well with existing plans.

Of note the Phishtank sigs from Friday’s daily were removed
yesterday and scan times should be back to normal. 

Regards,

Micah

*From: *Tim Hawkins <tim.hawk...@redflaggroup.com>
*Date: *Friday, April 5, 2019 at 6:06 PM
*To: *ClamAV users ML <clamav-users@lists.clamav.net>, Mark Allan <markjal...@gmail.com>
*Cc: *"Micah Snyder (micasnyd)" <micas...@cisco.com>
*Subject: *Re: [External] Re: [clamav-users] Scan very slow

Hi Micah


Does clamav partition the database so that signatures that are
mainly associated with email scanning can be dropped out for folks
only needing filesystem scans?  None of our systems use email, and
we don't make use of the mailer extension.

Having to load all the email-focused signatures could, as you have
observed, impact performance.

Sent from Nine 



*From:* "Micah Snyder (micasnyd) via clamav-users" <clamav-users@lists.clamav.net>
*Sent:* Saturday, April 6, 2019 03:18
*To:* ClamAV users ML; Mark Allan
*Cc:* Micah Snyder (micasnyd)
*Subject:* [External] Re: [clamav-users] Scan very slow

Regarding slow scan times today (and slow scan times in general), it
appears that the signatures we generate based on PhishTank’s feed
for phishing URLs are resulting in very slow load and scan times.



Today’s daily update saw 7448 new Phishtank signatures (much higher
than usual) coinciding with the immediate performance drop for load
time and scan time.  One user reported that the load time today on
some of his slower machines was slow enough to exceed the timeout
for service startup
(https://bugzilla.clamav.net/show_bug.cgi?id=12317).



In limited testing on my own machine I saw the following change
after dropping the Phishtank.Phishing signatures from daily.cvd’s
daily.ldb file:

  * Database load time on my laptop went from 75.43203997612 seconds
down to 14.859203100204468 seconds 
  * Scan time (for an arbitrary pdf) went from 1.798 sec to 0.644
sec.



After some discussion between the teams that work on ClamAV and
ClamAV signature content and deployment, we’ve agreed to drop
PhishTank signatures from the database until we can determine a way
to craft Phishtank signatures without incurring such a significant
performance hit. 



The daily update tomorrow will have the change.



-Micah




Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.







*From: *clamav-users <clamav-users-boun...@lists.clamav.net> on behalf of "Micah Snyder (micasnyd) via clamav-users" <clamav-users@lists.clamav.net>
*Reply-To: *ClamAV users ML <clamav-users@lists.clamav.net>
*Date: *Friday, April 5, 2019 at 1:08 PM
*To: *Mark Allan <markjal...@gmail.com>, ClamAV users 

Re: [clamav-users] [External] Re: Scan very slow

2019-04-09 Thread Mark Allan via clamav-users
The scan times are definitely better than they were - in fact, they're back
to how they were before last week's inclusion of the Phishtank signatures.
They're still almost double what they used to be though, and as far as I
can see, there are still almost 4000 Phishtank signatures in the DB:
$ sigtool --find Phishtank | wc -l
3968

Can I request that those ones also be removed please?

Best regards
Mark

On Sun, 7 Apr 2019 at 14:43, Micah Snyder (micasnyd) 
wrote:

> Tim,
>
>
>
> There are a couple of ways for users to drop specific categories of
> signatures at this time.  Sadly, they wouldn’t have helped this last week.
> These include bytecode signatures, PUA (potentially unwanted applications)
> signatures, Email.Phishing and HTML.Phishing signatures, and the
> Safebrowsing database.
>
>
>
> If we had named the Phishtank.Phishing sigs to HTML.Phishing.Phishtank or
> Email.Phishing.Phishtank then they could have been disabled with the
> clamscan option `--phishing-sigs=no` (clamd.conf: `PhishingSignatures no`).
>
>
>
> Maybe a better option would be for us to create a new optional database
> for phishing signatures. However, the names for the databases are hardcoded
> into freshclam, so it is non-trivial to add a new database and would
> require a few changes to ClamAV’s code. We have talked about making the
> databases easier to add/remove in the future so users can have more
> categories to enable/disable. In this light, it ties in well with existing
> plans.
>
>
>
> Of note the Phishtank sigs from Friday’s daily were removed yesterday and
> scan times should be back to normal.
>
>
>
> Regards,
>
> Micah
>
>
>
> *From: *Tim Hawkins 
> *Date: *Friday, April 5, 2019 at 6:06 PM
> *To: *ClamAV users ML , Mark Allan <
> markjal...@gmail.com>
> *Cc: *"Micah Snyder (micasnyd)" 
> *Subject: *Re: [External] Re: [clamav-users] Scan very slow
>
>
>
> Hi Micah
>
>
> Does clamav partition the database so that signatures that are mainly
> associated with email scanning can be dropped out for folks only needing
> filesystem scans?  None of our systems use email, and we don't make use of
> the mailer extension.
>
> Having to load all the email-focused signatures could, as you have observed,
> impact performance.
>
> Sent from Nine 
> --
>
> *From:* "Micah Snyder (micasnyd) via clamav-users" <
> clamav-users@lists.clamav.net>
> *Sent:* Saturday, April 6, 2019 03:18
> *To:* ClamAV users ML; Mark Allan
> *Cc:* Micah Snyder (micasnyd)
> *Subject:* [External] Re: [clamav-users] Scan very slow
>
>
>
> Regarding slow scan times today (and slow scan times in general), it
> appears that the signatures we generate based on PhishTank’s feed for
> phishing URLs are resulting in very slow load and scan times.
>
>
>
> Today’s daily update saw 7448 new Phishtank signatures (much higher than
> usual) coinciding with the immediate performance drop for load time and
> scan time.  One user reported that the load time today on some of his
> slower machines was slow enough to exceed the timeout for service startup (
> https://bugzilla.clamav.net/show_bug.cgi?id=12317).
>
>
>
> In limited testing on my own machine I saw the following change after
> dropping the Phishtank.Phishing signatures from daily.cvd’s daily.ldb file:
>
> - Database load time on my laptop went from 75.43203997612 seconds
>   down to 14.859203100204468 seconds
> - Scan time (for an arbitrary pdf) went from 1.798 sec to 0.644 sec.
>
>
>
> After some discussion between the teams that work on ClamAV and ClamAV
> signature content and deployment, we’ve agreed to drop PhishTank signatures
> from the database until we can determine a way to craft Phishtank
> signatures without incurring such a significant performance hit.
>
>
>
> The daily update tomorrow will have the change.
>
>
>
> -Micah
>
>
>
>
> Micah Snyder
> ClamAV Development
> Talos
> Cisco Systems, Inc.
>
>
>
>
>
>
>
> *From: *clamav-users  on behalf of
> "Micah Snyder (micasnyd) via clamav-users" 
> *Reply-To: *ClamAV users ML 
> *Date: *Friday, April 5, 2019 at 1:08 PM
> *To: *Mark Allan , ClamAV users ML <
> clamav-users@lists.clamav.net>
> *Cc: *"Micah Snyder (micasnyd)" 
> *Subject: *Re: [clamav-users] Scan very slow
>
>
>
> Hi Mark,
>
>
>
> Sorry about the delay in responding.  I hadn’t looked at my clamav-users
> filter this morning.  Just investigating now.  Will respond when I know
> more.
>
>
>
> -Micah
>
>
>
> *From: *Mark Allan 
> *Date: *Friday, April 5, 2019 at 9:12 AM
> *To: *ClamAV users ML , "Micah Snyder
> (micasnyd)" 
> *Subject: *Re: [clamav-users] Scan very slow
>
>
>
> Also CC'ing Micah directly as the mailing list would appear to be offline
> (at least lists.clamav.net isn't responding to http requests anyway)
>
>
>
> It looks like scan times have gone through the roof. As Oya said, they're
> still considerably higher than they were a couple of months ago, but
> today's scan time is insane.
>
>
>
> Yesterday's scan using
>
> 

Re: [clamav-users] [External] Re: Scan very slow

2019-04-07 Thread Maarten Broekman via clamav-users
Having the Phishtank sigs as an additional optional database would be great
and, from my perspective, well worth the effort since we don't use them.

On Sun, Apr 7, 2019 at 9:44 AM Micah Snyder (micasnyd) via clamav-users <
clamav-users@lists.clamav.net> wrote:

> Tim,
>
>
>
> There are a couple of ways for users to drop specific categories of
> signatures at this time.  Sadly, they wouldn’t have helped this last week.
> These include bytecode signatures, PUA (potentially unwanted applications)
> signatures, Email.Phishing and HTML.Phishing signatures, and the
> Safebrowsing database.
>
>
>
> If we had named the Phishtank.Phishing sigs to HTML.Phishing.Phishtank or
> Email.Phishing.Phishtank then they could have been disabled with the
> clamscan option `--phishing-sigs=no` (clamd.conf: `PhishingSignatures no`).
>
>
>
> Maybe a better option would be for us to create a new optional database
> for phishing signatures. However, the names for the databases are hardcoded
> into freshclam, so it is non-trivial to add a new database and would
> require a few changes to ClamAV’s code. We have talked about making the
> databases easier to add/remove in the future so users can have more
> categories to enable/disable. In this light, it ties in well with existing
> plans.
>
>
>
> Of note the Phishtank sigs from Friday’s daily were removed yesterday and
> scan times should be back to normal.
>
>
>
> Regards,
>
> Micah

Re: [clamav-users] [External] Re: Scan very slow

2019-04-07 Thread Micah Snyder (micasnyd) via clamav-users
Tim,

There are a couple of ways for users to drop specific categories of signatures 
at this time.  Sadly, they wouldn’t have helped this last week.  These include 
bytecode signatures, PUA (potentially unwanted applications) signatures, 
Email.Phishing and HTML.Phishing signatures, and the Safebrowsing database.

If we had named the Phishtank.Phishing sigs HTML.Phishing.Phishtank or 
Email.Phishing.Phishtank, then they could have been disabled with the clamscan 
option `--phishing-sigs=no` (clamd.conf: `PhishingSignatures no`).
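
As a rough illustration, here is a minimal Python sketch of a scan wrapper that passes that option through. The helper name `build_clamscan_command` and the target path are hypothetical; only `--phishing-sigs=no` comes from the text above. The sketch only builds the argument list and does not run a scan.

```python
import shlex


def build_clamscan_command(target, phishing_sigs=True):
    """Return a clamscan argument list for `target` (hypothetical helper).

    When phishing_sigs is False, the --phishing-sigs=no option named above
    is added to the command line.
    """
    cmd = ["clamscan"]
    if not phishing_sigs:
        cmd.append("--phishing-sigs=no")
    cmd.append(target)
    return cmd


if __name__ == "__main__":
    # /srv/data is an illustrative path, not one taken from this thread.
    print(shlex.join(build_clamscan_command("/srv/data", phishing_sigs=False)))
```

As noted above, this switch would not have helped with the Phishtank.Phishing signatures, because their names do not begin with Email.Phishing or HTML.Phishing.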

Maybe a better option would be for us to create a new optional database for 
phishing signatures. However, the names for the databases are hardcoded into 
freshclam, so it is non-trivial to add a new database and would require a few 
changes to ClamAV’s code. We have talked about making the databases easier to 
add/remove in the future so users can have more categories to enable/disable. 
In this light, it ties in well with existing plans.

Of note, the Phishtank sigs from Friday’s daily were removed yesterday, and scan 
times should be back to normal.

Regards,
Micah


Re: [clamav-users] [External] Re: Scan very slow

2019-04-06 Thread Maarten Broekman via clamav-users
Given that the PhishTank signatures, specifically, have been causing the
performance issues, no. It's not unreasonable to want to pull them, and
only them, out. Having them in a separate db file would be highly
beneficial to those of us that don't want or need them at all. Barring
that, having a configuration option to disable them that is separate from
heuristics and safebrowsing would be just as effective.

--Maarten

On Sat, Apr 6, 2019 at 10:43 AM Matus UHLAR - fantomas 
wrote:

> On 05.04.19 22:05, Tim Hawkins wrote:
> >Does clamav partition the database so that signatures that are mainly
> associated with email scanning can be dropped out for folks only needing
> filesystems scans,  none of our systems use email, and we dont make use of
> the mailer extension.
>
> how do you imagine e-mails are scanned, when not as files?
> it's not usually efficient to pass them to clamav through a socket, it's
> better to store them locally and pass a file descriptor...
>
> >Having to load all the email focused signatures could as you have
> observed impact performance.
>
> I doubt so.
>
>
> --
> Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
> Warning: I wish NOT to receive e-mail advertising to this address.
> Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
> On the other hand, you have different fingers.
>
> ___
>
> clamav-users mailing list
> clamav-users@lists.clamav.net
> https://lists.clamav.net/mailman/listinfo/clamav-users
>
>
> Help us build a comprehensive ClamAV guide:
> https://github.com/vrtadmin/clamav-faq
>
> http://www.clamav.net/contact.html#ml
>


Re: [clamav-users] [External] Re: Scan very slow

2019-04-05 Thread Tim Hawkins
Hi Micah

Does ClamAV partition the database so that signatures that are mainly 
associated with email scanning can be dropped for folks who only need 
filesystem scans?  None of our systems use email, and we don't make use of the 
mailer extension.

Having to load all the email-focused signatures could, as you have observed, 
impact performance.

Sent from Nine

From: "Micah Snyder (micasnyd) via clamav-users" 
Sent: Saturday, April 6, 2019 03:18
To: ClamAV users ML; Mark Allan
Cc: Micah Snyder (micasnyd)
Subject: [External] Re: [clamav-users] Scan very slow

Regarding slow scan times today (and slow scan times in general), it appears 
that the signatures we generate based on PhishTank’s feed for phishing URLs are 
resulting in very slow load and scan times.

Today’s daily update saw 7448 new Phishtank signatures (much higher than usual) 
coinciding with the immediate performance drop for load time and scan time.  
One user reported that the load time today on some of his slower machines was 
slow enough to exceed the timeout for service startup 
(https://bugzilla.clamav.net/show_bug.cgi?id=12317).

In limited testing on my own machine I saw the following change after dropping 
the Phishtank.Phishing signatures from daily.cvd’s daily.ldb file:

  *   Database load time on my laptop went from 75.43203997612 seconds down to 
14.859203100204468 seconds
  *   Scan time (for an arbitrary pdf) went from 1.798 sec to 0.644 sec.
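
To put those two measurements in relative terms, a small Python sketch of the arithmetic follows. The numbers are the ones quoted in the list above; the function name is only illustrative.

```python
def speedup(before, after):
    """Return how many times faster `after` is compared with `before`."""
    return before / after


# Timings (in seconds) as reported in the list above.
load_speedup = speedup(75.43203997612, 14.859203100204468)
scan_speedup = speedup(1.798, 0.644)

print(f"database load: {load_speedup:.1f}x faster")
print(f"pdf scan:      {scan_speedup:.1f}x faster")
```

That works out to roughly a 5x faster database load and a 2.8x faster scan on that one machine and one sample file.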

After some discussion between the teams that work on ClamAV and ClamAV 
signature content and deployment, we’ve agreed to drop PhishTank signatures 
from the database until we can determine a way to craft Phishtank signatures 
without incurring such a significant performance hit.
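
For anyone wanting to try a similar comparison locally, here is a rough Python sketch of filtering an already-unpacked daily.ldb. It assumes each line of the file is one logical signature whose name is the text before the first semicolon. The function names and file paths are hypothetical, and this is not the tooling that was used for the measurements above.

```python
def drop_signatures(lines, prefix=b"Phishtank.Phishing"):
    """Yield only the lines whose signature name does not start with `prefix`.

    Assumes one signature per line, with the signature name before the
    first ';' (an assumption about the .ldb layout, see the note above).
    """
    for line in lines:
        name = line.split(b";", 1)[0]
        if not name.startswith(prefix):
            yield line


def filter_ldb(src_path, dst_path, prefix=b"Phishtank.Phishing"):
    """Copy src_path to dst_path, leaving out the matching signatures."""
    # Binary mode keeps every retained line byte-for-byte unchanged.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(drop_signatures(src, prefix))
```

A call such as `filter_ldb("daily.ldb", "daily-nophishtank.ldb")` would write the reduced file next to the original; both file names are placeholders.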

The daily update tomorrow will have the change.

-Micah


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.



From: clamav-users  on behalf of "Micah 
Snyder (micasnyd) via clamav-users" 
Reply-To: ClamAV users ML 
Date: Friday, April 5, 2019 at 1:08 PM
To: Mark Allan , ClamAV users ML 

Cc: "Micah Snyder (micasnyd)" 
Subject: Re: [clamav-users] Scan very slow

Hi Mark,

Sorry about the delay in responding.  I hadn’t looked at my clamav-users filter 
this morning.  Just investigating now.  Will respond when I know more.

-Micah

From: Mark Allan 
Date: Friday, April 5, 2019 at 9:12 AM
To: ClamAV users ML , "Micah Snyder (micasnyd)" 

Subject: Re: [clamav-users] Scan very slow

Also CC'ing Micah directly as the mailing list would appear to be offline (at 
least lists.clamav.net isn't responding to HTTP requests anyway)

It looks like scan times have gone through the roof. As Oya said, they're still 
considerably higher than they were a couple of months ago, but today's scan 
time is insane.

Yesterday's scan using
0.101.2:58:25409:1554370140:1:63:48554:328
took 7m 3s

On the same hardware, scanning the same read-only disk image, with today's scan 
using
0.101.2:58:25410:1554452941:1:63:48557:328
the scan time has jumped to 26m 15s
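
To make the difference between the two runs easier to see, here is a small Python sketch that splits the two version strings and compares them. The field meanings (engine version, main version, daily version, Unix timestamp) are an assumption on my part rather than something stated in this thread, and the remaining fields are left unparsed.

```python
from datetime import datetime, timezone


def parse_version_string(text):
    """Split a colon-separated version string like the two quoted above.

    Field names are assumptions: engine, main, daily, timestamp.
    Everything after the timestamp is kept as-is in "rest".
    """
    engine, main, daily, stamp, *rest = text.split(":")
    return {
        "engine": engine,
        "main": int(main),
        "daily": int(daily),
        "timestamp": datetime.fromtimestamp(int(stamp), tz=timezone.utc),
        "rest": rest,
    }


def duration_seconds(minutes, seconds):
    """Convert a 'Xm Ys' scan time to seconds."""
    return minutes * 60 + seconds


yesterday = parse_version_string("0.101.2:58:25409:1554370140:1:63:48554:328")
today = parse_version_string("0.101.2:58:25410:1554452941:1:63:48557:328")

# Which parsed fields differ between the two runs?
changed = [key for key in yesterday if yesterday[key] != today[key]]

# 7m 3s versus 26m 15s on the same read-only disk image.
slowdown = duration_seconds(26, 15) / duration_seconds(7, 3)

print(changed)
print(f"{slowdown:.1f}x slower")
```

Under those assumptions the engine and main values are identical across the two runs, while the daily value moves from 25409 to 25410, and the scan takes about 3.7 times as long.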

This is the longest it has ever taken to scan this volume (cf my previous email 
of 25th March)

Is there anything that can be excluded?

Best regards
Mark

On Mon, 1 Apr 2019 at 17:11, Micah Snyder (micasnyd) via clamav-users 
<clamav-users@lists.clamav.net> wrote:
Thanks Oya for the update.  We will continue to investigate the signature 
performance issue.

Regards,
Micah

On 3/28/19, 9:50 AM, "clamav-users on behalf of Tsutomu Oyamada" 
<clamav-users-boun...@lists.clamav.net on behalf of oyam...@promark-inc.com> 
wrote:

Hi Micah

It seems that the scanning slowdown has been partly resolved by the CVD
update of the other day.
However, there is still a big discrepancy between the current scan times and
those of a month ago.

Date        Files    Scan time
2019/02/15  2550338  08:53:57
2019/03/15  2612792  19:22:54
2019/03/26  2634489  18:13:56
2019/03/27  2637201  18:10:05
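
Because the file count also changes between runs, throughput is an easier figure to compare than raw scan time. A short Python sketch of that calculation, using only the rows of the table above (the variable and function names are illustrative):

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (date, files scanned, scan time) as listed in the table above.
runs = [
    ("2019/02/15", 2550338, "08:53:57"),
    ("2019/03/15", 2612792, "19:22:54"),
    ("2019/03/26", 2634489, "18:13:56"),
    ("2019/03/27", 2637201, "18:10:05"),
]

rates = {date: files / to_seconds(duration) for date, files, duration in runs}

for date, rate in rates.items():
    print(f"{date}  {rate:.1f} files/s")
```

By this measure the scan rate went from about 80 files per second in mid-February to about 37 in mid-March, and had only recovered to about 40 by the end of March.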

We know that this improvement is due to the contents of the CVD, because
we did not make any changes to the user's system.
We are going to try some tuning of the scan settings.

We would like to know whether there is still room for further improvement
on this slowdown issue.
Thank you in advance for your help.

Best regards,
Oya

On Mon, 25 Mar 2019 15:45:02 +
"Micah Snyder (micasnyd) via clamav-users" 
<clamav-users@lists.clamav.net> wrote:

> Hi Mark, all:
>
> I’m disappointed to hear that it is still slow for you.
>
> We found that the target-type of signatures used for PhishTank.Phishing
> signatures was causing a significant slowdown.  We have dropped them as of
> this past Saturday ( https://lists.gt.net/clamav/virusdb/75279 ) and in the
> last two updates have been re-adding them