You're welcome.
As a final note, speaking from experience again: the rules themselves
can make a huge difference.
A real-world example:
In my setup I have about 1200 mostly simple rules.
By simple I mean they are direct byte-sequence searches with no wildcards
(think a memcmp() kind of search).
Example:
rule CRC_8_TABLE
{
    meta:
        description = "CRC-8 table"
    strings:
$c0 = { 00 57 AE F9 0B 5C A5 F2 16 41 B8 EF 1D 4A B3 E4 2C
7B 82 D5 27 70 89 DE 3A 6D 94 C3 31 66 9F C8 58 0F F6 A1 53 04 FD AA 4E 19
E0 B7 45 12 EB BC 74 23 DA 8D 7F 28 D1 86 62 35 CC 9B 69 3E C7 90
B0 E7 1E 49 BB EC 15 42 A6 F1 08 5F AD FA
03 54 9C CB 32 65 97 C0 39 6E 8A DD 24 73 81 D6 2F 78 E8 BF 46 11 E3 B4 4D
1A FE A9 50 07 F5 A2 5B 0C C4 93 6A 3D CF 98 61 36 D2 85 7C 2B D9 8E 77 20
37 60 99 CE 3C 6B 92 C5 21 76 8F D8 2A 7D
84 D3 1B 4C B5 E2 10 47 BE E9 0D 5A A3 F4 06 51 A8 FF 6F 38 C1 96 64 33 CA
9D 79 2E D7 80 72 25 DC 8B 43 14 ED BA 48 1F E6 B1 55 02 FB AC 5E 09 F0 A7
87 D0 29 7E 8C DB 22 75 91 C6 3F 68 9A CD
34 63 AB FC 05 52 A0 F7 0E 59 BD EA 13 44 B6 E1 18 4F DF 88 71 26 D4 83 7A
2D C9 9E 67 30 C2 95 6C 3B F3 A4 5D 0A F8 AF 56 01 E5 B2 4B 1C EE B9 40 17 }
    condition:
        $c0
}
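
To make the memcmp() comparison concrete, here is a minimal Python sketch of what a rule like this boils down to: a plain byte-sequence search. It is only an illustration, not how libyara works internally (libyara extracts atoms and feeds them to an Aho-Corasick automaton). The names CRC8_PREFIX and find_pattern are made up for this example, and only the first 16 bytes of $c0 are used to keep it short.

```python
# First 16 bytes of the $c0 table above (shortened for the example).
CRC8_PREFIX = bytes.fromhex("00 57 AE F9 0B 5C A5 F2 16 41 B8 EF 1D 4A B3 E4")


def find_pattern(data: bytes, pattern: bytes) -> int:
    """Return the offset of the first exact match, or -1 if there is none."""
    return data.find(pattern)


# Hypothetical buffer: 100 filler bytes, then the table prefix, then padding.
sample = b"\x90" * 100 + CRC8_PREFIX + b"\x00" * 50
offset = find_pattern(sample, CRC8_PREFIX)
print(f"pattern found at offset {offset}")
```

With no wildcards and no regex, each candidate position is just an exact comparison, which is a large part of why rules of this kind stay cheap.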
95% of the rules are like this. Some are large; the CRC32 table, for
example, is 1024 bytes.
The remaining 5% are a bit more complex and require a custom "module" I
added to libyara.
On a very large executable, it takes ~3 seconds to scan through all these
rules.
Now compare this to:
https://github.com/Yara-Rules/rules/blob/master/crypto/crypto_signatures.yar
There are about 125 rules in total.
While some are simple like the example above, the majority are relatively
complex, using regex, etc.
Example:
rule Big_Numbers0
{
    meta:
        author = "_pusher_"
        description = "Looks for big numbers 20:sized"
        date = "2016-07"
    strings:
        $c0 = /[0-9a-fA-F]{20}/ fullword ascii
    condition:
        $c0
}
*In the same setup these 125 rules took 1.45 minutes!*
So even though the more complex set has only ~10% as many rules as the
simple set, it took 28 times longer to scan.
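
As a quick sanity check on those figures, the arithmetic can be worked through in a few lines of Python. This is only a sketch using the numbers quoted above; it assumes "1.45 minutes" means decimal minutes (about 87 seconds), and the helper names are made up for the example.

```python
# Figures quoted above; "1.45 minutes" is read as decimal minutes (an assumption).
SIMPLE_RULES = 1200
SIMPLE_SECONDS = 3.0
COMPLEX_RULES = 125
COMPLEX_SECONDS = 1.45 * 60


def slowdown(fast_seconds: float, slow_seconds: float) -> float:
    """How many times longer the slow run took than the fast run."""
    return slow_seconds / fast_seconds


def seconds_per_rule(total_seconds: float, rule_count: int) -> float:
    """Average scan cost attributed to a single rule."""
    return total_seconds / rule_count


total_ratio = slowdown(SIMPLE_SECONDS, COMPLEX_SECONDS)
per_rule_ratio = slowdown(
    seconds_per_rule(SIMPLE_SECONDS, SIMPLE_RULES),
    seconds_per_rule(COMPLEX_SECONDS, COMPLEX_RULES),
)
print(f"whole-set slowdown: ~{total_ratio:.0f}x")
print(f"per-rule slowdown:  ~{per_rule_ratio:.0f}x")
```

The whole-set ratio lands in the same ballpark as the ~28x quoted above. Averaged per rule the gap is far larger (a few hundred times), although an average like this hides the fact that a handful of rules may account for most of the cost.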
Also note these signatures produced way too many false positives, which could
be a major factor (I have not isolated it further) in why the scan takes so long.
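
To illustrate why a rule like Big_Numbers0 tends to fire on harmless data, here is a small sketch using Python's re module as a rough stand-in for the YARA regex. The "fullword" modifier is approximated with lookarounds, so YARA's exact semantics may differ, and the sample inputs are invented for the example. Whether this is what actually dominates the scan time above has not been measured.

```python
import re

# Rough stand-in for: /[0-9a-fA-F]{20}/ fullword ascii
# "fullword" is approximated by rejecting an alphanumeric neighbour on
# either side of the match; this is an assumption, not YARA's own code.
BIG_NUMBER = re.compile(rb"(?<![0-9A-Za-z])[0-9a-fA-F]{20}(?![0-9A-Za-z])")


def count_hits(data: bytes) -> int:
    """Count non-overlapping matches of the approximated pattern."""
    return len(BIG_NUMBER.findall(data))


# Invented inputs: only the first one looks like a crypto-style constant.
samples = {
    "hex constant": b"key=0123456789abcdef0123;",
    "decimal order id": b"order 12345678901234567890 shipped",
    "plain text": b"nothing interesting here",
}
hits = {name: count_hits(data) for name, data in samples.items()}
for name, count in hits.items():
    print(f"{name}: {count} hit(s)")
```

Any 20-character token made of hex digits qualifies, including a purely decimal ID, so the pattern says very little about cryptography on its own.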
Note: Not knocking YARA in any way.
Is there another public/open-source technology like YARA, or anything else
that is the de facto signature system for malware, etc.?
Not that I know of. It's not perfect, but overall it's great IMHO.
On Monday, March 21, 2022 at 7:03:02 PM UTC-4 Gman wrote:
> Thanks. I believe I have tried all the suggestions already. Magic doesn't
> help: not only is it slow and will run anyway, it's also not supported
> on Windows. You can achieve a similar effect by simply checking for
> specific bytes using the uint notation, but the problem remains that
> the moment you include a module such as "PE" or "DOTNET", that
> module will parse the entire file anyway. This means that the following
> scenario is never optimized:
>
> import "pe"
> rule example
> {
>     condition:
>         filesize <= 0xFF0000 and pe.number_of_sections == 8
> }
>
> You may think that when the filesize (or magic or a single byte check)
> condition is False it will short-circuit and exit. While this is correct,
> the time has already been spent, so you don't get any performance benefit.
> This is because the PE module parsed the entire file anyways!
>
> I have also tried the custom-scanning front-end approach we discussed
> before, having a subset of pre-compiled rules for PNG, another for PE,
> another for DOCX. To my surprise, this resulted in even slower scan times.
> This might be related to the fact that the Aho-Corasick automaton is already
> as good as it gets when all the rules are combined together, which means
> when you split into 3 "scanners/rule sets", you don't get much benefit and,
> on the contrary, you may introduce more overhead in the filetype
> checking/conditional scanning logic.
>
> I would love to hear from the original YARA authors, to at least
> understand if we are incorrectly using YARA or this is indeed a (very
> common?) scenario that is not covered/not optimized for.
>
> Thanks
> On Friday, March 18, 2022 at 9:31:07 PM UTC-7 Joe Neighbor wrote:
>
>> Also looking through the manual again, there is a "magic" module.
>> "The Magic module allows you to identify the type of the file based on
>> the output of file, the standard Unix command."
>> Then apparently you can make rule conditions like magic.mime_type() ==
>> "application/pdf"
>>
>> But still each rule in a set is going to be loaded and compiled
>> regardless like you say.
>> So for max performance potential and control I still think you will want
>> to go with a custom scanning front end.
>>
>> And the sky is the limit here: you could have a central controlling app
>> (again, maybe written in Python) and then send scan jobs off to machines
>> on your LAN, cloud instances, etc.
>> Sure, it's a lot more development work for this kind of scheme though.
>>
>>
>> On Saturday, March 19, 2022 at 12:04:29 AM UTC-4 Joe Neighbor wrote:
>>
>>> Have you considered making a custom version of the console/front end?
>>>
>>> Maybe you can use "Rule tags" (yara manual page 27)?
>>> You could have a "DLL" tag and/or an "EXE" tag; then you can separate them
>>> at least.
>>> Although it says: "Those tags can be used later to filter YARA’s
>>> output". Which might mean they all get scanned anyhow, but then you
>>> filter them in the Yara match callback which is not going to help
>>> performance.
>>>
>>> I know, it really slows down depending on the rules and the size of the
>>> search.
>>>
>>> Here is what I think will be a good solution for you:
>>> 1) Separate your rules into a series of files.
>>> A set of rules for PNG files, DLLs, DOCX, etc.
>>> Compiling is fast, but you could sort and batch your inputs for each
>>> compiled rule set.
>>>
>>>
>>> You might have to tool up a solution to help manage the rules if you
>>> have a lot of them.
>>> Might have to put them into a DB, etc., so that if you change a rule
>>> that needs to be duplicated across different sets, the change can be
>>> applied automatically.
>>> Completely automated, it could help you zero in on bad/slow rules too.
>>> If you need a lot of control, you could do something like putting all
>>> your rules in a JSON format then the YARA rule files can be created on the
>>> fly.
>>> At first, though, I'd just set them all up manually for development.
>>>
>>> 2) In your controlling app (which can still be a console/terminal app) you
>>> first enumerate all the files to scan and put them in buckets.
>>> 3) Then you scan each bucket in bulk with its per-type rule set, only
>>> having to compile once per set.
>>> As you find matches you either dump it to the screen or save them in a
>>> JSON for later processing et al.
>>>
>>> There is: https://github.com/VirusTotal/yara-python
>>> Which I think you can get this sort of setup up and running pretty fast
>>> with Python.
>>> And since yara-python is just a binary wrapper around libyara, you are
>>> still going to get just under C/C++ performance.
>>> The Python part is just mostly to do the file wrangling. Unless you need
>>> additional features.
>>>
>>> On Friday, March 18, 2022 at 12:45:13 PM UTC-4 Gman wrote:
>>>
>>>> Thanks for your insights.
>>>>
>>>> I'm not too concerned about drive I/O overhead (this part I consider
>>>> the benchmark pre-stage, so not measuring it). In my tests, I'm simply
>>>> loading all the files first (e.g. file mapping) and hence the only
>>>> performance bottleneck I observe when profiling is the pure CPU-bound YVM
>>>> execution of the instructions.
>>>>
>>>> One of the things I have noticed is that the way in which YARA is
>>>> designed, it will process each sample entirely regardless of having rules
>>>> that apply to them. I have seen many GitHub threads explaining why this is
>>>> a "long term investment" (as you are likely to have 1 rule that will need
>>>> it anyways), but I don't fully agree with the justification. While there
>>>> are scenarios in which you will have a set of files and a set of rules
>>>> that will mix well together, in my experience this is more of an exception.
>>>> Let me explain with an example:
>>>>
>>>> I have 5000 PE-EXE files to scan, and 300 PE rules for DLLs only. Now
>>>> imagine how the scan goes...
>>>>
>>>> a) The Aho-Corasick tree will fit together 300 rules' atoms. All the PE
>>>> rules start with the "is_pe / is_dll" condition that can quickly exit via
>>>> short-circuit.
>>>> b) YARA will open and fully parse 5k EXE files, regardless of how many
>>>> "potential" rules I have to apply.
>>>> c) Since all my files are EXE and NOT DLL, no PE rule will actually
>>>> match. However, I have wasted a lot of time fully parsing 5k files
>>>> (because the PE module will parse the entire file anyways).
>>>>
>>>> You can quickly observe how the performance drops significantly in such
>>>> scenarios. Now consider that you have an unbalanced mix of files such as
>>>> PE, PNG and DOCX. And then, think about a disproportionate number of rules
>>>> (e.g. 1 PE rule and 100 PNG rules but no DOCX rules). It is clear at this
>>>> point that some rules will "heavily penalize" the overall performance of
>>>> YARA simply because they will unnecessarily "overload" the scan by running
>>>> on "will never match" filetypes (in this example, 100 PNG rules that are
>>>> heavy on strings will unnecessarily run on 20k DOCX files). Even if you
>>>> implement something like the "Magic" module to identify it, YARA will go
>>>> over the file anyways trying to find the atoms.
>>>>
>>>> Real world scenarios are in my experience a lot more like the case I'm
>>>> describing: you have an assorted and unpredictable collection of files to
>>>> be scanned, and it's likely that your rules will only ever matter for a
>>>> small portion of your files. If your collection of files is big and
>>>> "misaligned" with the "filetypes" that your rules are designed for, then
>>>> you have a performance nightmare on your hands.
>>>>
>>>> So I guess I'm describing two different issues or improvement
>>>> opportunities (or asking if there is another way to address the described
>>>> scenarios):
>>>>
>>>> 1) It gives me the impression that YARA is missing an essential
>>>> "selective tree" mechanism. For example, imagine if you could have one
>>>> Aho-Corasick tree for PE files, but a different one for PNG files. That
>>>> would drastically reduce the scan time by simply focusing on a "this makes
>>>> sense to be scanned" selector and skip the rest. Traversing a set of 5
>>>> atoms is not the same as traversing a set of 5000 atoms...
>>>> 2) Alternatively or complementarily, it seems YARA would drastically
>>>> benefit from a "pre-flight" check, for example for PE files. What if all my
>>>> rules are for DLL files but I'm scanning 100 EXE files? Why should I waste
>>>> precious time fully parsing each entire PE file if I could simply check
>>>> the header/first few bytes to determine whether the file is a DLL?
>>>>
>>>> Thanks,
>>>> On Friday, March 11, 2022 at 10:11:03 PM UTC-8 Joe Neighbor wrote:
>>>>
>>>>> In my experience doing mostly in memory YARA scanning:
>>>>> I use up to as many threads as the system has physical cores.
>>>>> I doubt logical HT/SMT cores are going to help much (from the
>>>>> following observations).
>>>>>
>>>>> With mostly simple rules that just do hex (byte run) scanning with no
>>>>> wildcards and a single condition, extra threads only give about a 10%
>>>>> performance increase.
>>>>> This makes sense because the efficient but intensive Aho-Corasick
>>>>> algorithm is memory constrained. The extra threads are just maxing out
>>>>> the memory bandwidth.
>>>>> At this point there is little difference between using one core or 64
>>>>> because memory bandwidth hits its limit.
>>>>>
>>>>> For more complex rule sets, I see about a 20% to 30% performance
>>>>> increase.
>>>>> This makes sense too as now some of the memory bandwidth gets traded
>>>>> with increased compute (from the extra rules logic).
>>>>> Here this work gets distributed to other cores that can simultaneously
>>>>> compute. And of course memory bandwidth still gets maxed out.
>>>>>
>>>>> See:
>>>>> https://github.com/Neo23x0/YARA-Performance-Guidelines/
>>>>>
>>>>> There is a problem with your logic.
>>>>> Caching doesn't eliminate drive I/O overhead. You'd have to preload
>>>>> all of your files into memory for that.
>>>>> And are you sure your OS cache is big enough to hold all of your files
>>>>> to begin with?
>>>>> There is still a lot going on in trips from user mode (UM) to kernel
>>>>> mode (KM) and back again. At minimum, a memcpy() (even if done in DMA
>>>>> hardware) from the OS cache into your process et al.
>>>>> A key thing here is you say "cache". This means "memory". You are
>>>>> competing with the OS's use of its file buffer memory (and CPU cache too)
>>>>> vs the intense Aho-Corasick scanning thread(s).
>>>>>
>>>>> Hopefully someday, a new memory technology will come along that will
>>>>> match the CPU in clock speed (in whole values not just bits)..
>>>>>
>>>>> Reads like you are on the right track though.
>>>>> You have to find where the bottlenecks actually are (using
>>>>> system and process performance tools) and mitigate them the best you can.
>>>>>
>>>>> On Tuesday, March 8, 2022 at 7:05:09 PM UTC-5 Gman wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to get the maximum possible performance out of YARA, and
>>>>>> for that goal I've been studying the code and algorithms to ensure
>>>>>> everything is contemplated:
>>>>>>
>>>>>> 1) My understanding is that the Aho-Corasick algorithm helps build
>>>>>> the Atoms tree to then efficiently apply just the rules that have Atoms
>>>>>> matching the scanned file. This is a great start because not all the
>>>>>> rules will be executed for each file.
>>>>>> 2) I also believe there is a short-circuit logic capability so that
>>>>>> once a condition is not satisfied, the subsequent ones will not even try
>>>>>> to execute.
>>>>>> 3) The -f option (as seen in the command line tool) will also run in
>>>>>> "fast" mode and report the first occurrence, without wasting time on
>>>>>> subsequent checks/rules.
>>>>>> 4) Precompiling rules is a good practice as it saves time, given that
>>>>>> the scanner won't need to compile them before starting a scan.
>>>>>> 5) Writing the rules in smart ways yields better performance,
>>>>>> including: using non-trivial hex sequences, replacing some strings with
>>>>>> hex representations, re-writing regexes to be more efficient, (sorting
>>>>>> the conditions?), etc.
>>>>>> 6) You can run YARA in multi-thread mode. There is a drastic
>>>>>> difference between running with 1 thread vs running with 16 threads
>>>>>> (most likely as it also takes advantage of I/O vs CPU-bound operations).
>>>>>>
>>>>>> With these in mind, I tried to measure the performance of YARA for
>>>>>> scanning a given directory (e.g. containing 10k assorted files) using an
>>>>>> artificial set of 5k, 10k, 20k and even 40k rules. To my surprise, YARA
>>>>>> is quite fast up to 5k rules, and after that performance degrades
>>>>>> drastically (almost in a linear fashion). Note: I run the benchmark
>>>>>> multiple times to eliminate the effect of hard disk I/O (hence, having
>>>>>> everything in cache/memory).
>>>>>>
>>>>>> - Am I missing any possible optimization trick or Best-Known-Method?
>>>>>> - Does YARA suffer from some limitation in terms of performance
>>>>>> related to # of rules or # of files?
>>>>>> - Based on my basic understanding of the source code, the modules
>>>>>> such as "pe" and "dotnet" are actually parsing the entire file (within
>>>>>> the module Load) regardless of the rules actually using these modules.
>>>>>> Let's say a rule just needs to do the check pe.is_pe, do we need to
>>>>>> parse the entire file just for that? Aren't the imported/exported
>>>>>> functions or certificates parsing slowing down the scan unnecessarily?
>>>>>> (I'm not even sure this is the reason for performance degradation, just
>>>>>> a thought).
>>>>>>
>>>>>> Any tip or suggestion is much appreciated, and happy to contribute
>>>>>> back if there is an opportunity to do so.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>