Hey Victor,

Thank you for your answer. In fact, you've solved my problem :)
I wasn't aware that the results from the match function are objects
containing fields like meta, strings, namespace, etc. To be honest,
this is not clearly stated in the documentation (I had to let Python show
me all the fields they contain), but it allows me to match without using
the callback and its related delay, and still have all the info I need!
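
A minimal sketch of this callback-free approach, assuming `rules` is an already compiled yara-python rules object and that each returned match exposes `rule`, `namespace`, `tags` and `meta` attributes. The helper names `summarize_matches` and `scan_file` are hypothetical and only there for illustration:

```python
def summarize_matches(matches):
    """Copy the fields of interest out of each match object into a plain dict."""
    return [
        {
            "rule": m.rule,            # name of the matching rule
            "namespace": m.namespace,  # namespace the rule was compiled under
            "tags": list(m.tags),
            "meta": dict(m.meta),
        }
        for m in matches
    ]


def scan_file(rules, path):
    """Scan one file without a callback (assumes `rules` came from yara.compile)."""
    # match() returns only the rules that matched, so no Python-level callback
    # runs for the (many) rules that did not match.
    return summarize_matches(rules.match(path, fast=True))
```

With real rules this would be called as something like `scan_file(yara.compile(filepaths=filepaths_dict), "/files/sample")`; the path here is only a placeholder.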

Thanks!

On Thursday, May 18, 2017 at 15:20:19 UTC+2, Víctor Manuel Álvarez
García wrote:
>
> Wesley's description of the issue is accurate. If you are only interested
> in the matching rules, you are better off using the results from the "match"
> function instead of the callback. Being able to configure in which cases
> the callback is called would be a nice addition, though.
>
> On Thu, May 18, 2017 at 10:24 AM, <[email protected]> wrote:
>
>> I did the test, and the number of times the callback function gets called
>> is indeed the total number of rules X the total number of files
>> (12345 X 12319). That gives about 152 million function calls, plus the
>> additional data conversions, for only 30 hits. So adding the described
>> functionality would definitely be great! I'll open an issue on GitHub
>> about it.
>> thx
>>
>>
>>
>> On Thursday, May 18, 2017 at 03:56:33 UTC+2, Wesley Shields wrote:
>>>
>>> I think this is expected behavior, though whether it is optimal behavior
>>> is obviously open for debate. ;)
>>>
>>> Here's what I think is happening. Every time you scan a file your 
>>> "yara_callback" function is called once for each rule, even if the rule 
>>> didn't match. So if you scan 10 files with 5 rules your callback will be 
>>> called 50 times, even if none of those 5 rules match. You can check this by 
>>> adding the following right after you compile your rules: 
>>>
>>> print(len([rule for rule in self.rules])) 
>>>
>>> That will print the number of rules that compiled. I'm guessing that if
>>> you take that number and multiply it by the number of files scanned, the
>>> result should equal the number of times your callback was called.
>>>
>>> Now, you could certainly argue that there should be a flag to the match
>>> function which indicates whether you want your callback called for matches,
>>> non-matches or both. That would likely eliminate a lot of extra work
>>> translating things into Python objects. This also explains why the "native"
>>> yara is much faster: it doesn't have to do any of the C-to-Python
>>> conversion for the data, and it also has a "don't show me non-matches" flag
>>> on by default.
>>>
>>> If Victor agrees, you could make this an issue on github so it doesn't 
>>> get lost. I've got some experience in this area and may be able to take it 
>>> on too. 
>>>
>>> -- WXS 
>>>
>>> > On May 17, 2017, at 4:59 PM, [email protected] wrote: 
>>> > 
>>> > The profiling results were not added correctly:
>>> >
>>> > Total time: 928.533 s
>>> > File: /app/filters/yaraPOC.py
>>> > Function: match_rules at line 70
>>> >
>>> > Line #      Hits         Time  Per Hit   % Time  Line Contents
>>> > ==============================================================
>>> >     70                                               @profile
>>> >     71                                               def match_rules(self,file):
>>> >     72                                                   """
>>> >     73                                                   Matches yara rules against the file
>>> >     74                                                   :param file: relative path to the files_folder specified for the YaraFilter
>>> >     75                                                   :return: returns dictionary with matching information
>>> >     76                                                   """
>>> >     77     12319        12086      1.0      0.0          self.matching_results = []
>>> >     78     12319         8847      0.7      0.0          if not self.rules:
>>> >     79                                                       print("Rules not initialised")
>>> >     80                                                       return
>>> >     81     12319         4209      0.3      0.0          try:
>>> >     82     12319    928508227  75372.0    100.0              self.rules.match( str(file),callback=self.yara_callback, fast = True)
>>> >     83
>>> >     84                                                   except Exception as e :
>>> >     85                                                       print("Error occured trying to match yara rules on file " + str(file) + ':' +  str(e))
>>> >
>>> > Total time: 351.386 s
>>> > File: /app/filters/yaraPOC.py
>>> > Function: yara_callback at line 87
>>> >
>>> > Line #      Hits         Time  Per Hit   % Time  Line Contents
>>> > ==============================================================
>>> >     87                                               @profile
>>> >     88                                               def yara_callback(self,matching_data):
>>> >     89                                                   """
>>> >     90                                                   Callback function that gets called for yara rule that matches
>>> >     91                                                   :param matching_data:
>>> >     92                                                   :return:
>>> >     93                                                   """
>>> >     94                                                   # Currently we do not add the strings from the matching rule
>>> >     95 151991822     43182861      0.3     12.3          if matching_data['matches'] :
>>> >     96        27         1777     65.8      0.0              print ('%s matches %s' %(matching_data['rule'],self.current_file))
>>> >     97
>>> >     98 151991822    308201707      2.0     87.7          yara.CALLBACK_CONTINUE
>>> > 
>>> > On Wednesday, May 17, 2017 at 22:56:41 UTC+2, [email protected] wrote:
>>> > Hey Wesley,
>>> > thanks for your reply.
>>> >
>>> > Here's a trimmed-down version of my code; profiling this function gives
>>> > me the same results when applied to the same set of files.
>>> > After the code I've added some profiling results.
>>> > Most of the rules I'm using come from the public repository:
>>> > https://github.com/Yara-Rules/rules
>>> >
>>> > FYI: my yara-python is dynamically linked against libyara from my
>>> > 'native' yara install.
>>> > I did some testing with native yara and there is no comparison in
>>> > speed; it's way faster.
>>> > 
>>> > 
>>> > import yara 
>>> > import os 
>>> > import logging 
>>> > class YaraPOC(): 
>>> >     ALLOWED_EXTENSIONS = (r".yar",r".yara") 
>>> > 
>>> >     def __init__(self): 
>>> >         self.current_file = ""
>>> >         # Set by load_rules(); stays None if compilation fails
>>> >         self.rules = None
>>> > 
>>> >     def walk_directory_tree(self, directory, extension_filter=None, recursive=True):
>>> >         file_list_res = [] 
>>> >         if not recursive: 
>>> >             file_list_res = [os.path.join(directory, f) for f in os.listdir(directory)
>>> >                              if os.path.isfile(os.path.join(directory, f))]
>>> >         else: 
>>> >             for path, subdirs, files in os.walk(directory): 
>>> >                 for name in files: 
>>> >                     file_list_res.append(os.path.join(path, name)) 
>>> > 
>>> >         if extension_filter is not None:
>>> >             file_list_res = [f for f in file_list_res if f.endswith(extension_filter)]
>>> > 
>>> >         return file_list_res 
>>> > 
>>> >     def load_rules(self, rules_folder): 
>>> > 
>>> >         print("Loading yararules from: %s" %rules_folder) 
>>> >         rules_file_list = self.walk_directory_tree(rules_folder, YaraPOC.ALLOWED_EXTENSIONS, recursive=True)
>>> >
>>> >         # For each rule we want the path relative to our main folder to use as a namespace in yara
>>> >         namespaces = []
>>> >         remove_index = rules_folder.rfind(os.sep) + 1
>>> >         # For the namespaces we remove this "prefix" from all our paths, and create a separate list for it
>>> >         for rule in rules_file_list: 
>>> >             namespaces.append(rule[remove_index::]) 
>>> > 
>>> >         filepaths_dict = {} 
>>> >         for indx, namespace in enumerate(namespaces): 
>>> >             filepaths_dict[namespace] = rules_file_list[indx] 
>>> >         try: 
>>> >             self.rules = yara.compile(filepaths=filepaths_dict) 
>>> >         except Exception as e: 
>>> >             print("Compilation error in Yara rules. Are you missing an import?")
>>> >             print(str(e)) 
>>> > 
>>> >         print("Loaded %s Yararules" % str(len(namespaces))) 
>>> > 
>>> > 
>>> >     @profile 
>>> >     def match_rules(self,file): 
>>> >         self.matching_results = [] 
>>> >         if not self.rules: 
>>> >             print("Rules not initialised") 
>>> >             return 
>>> > 
>>> >         self.rules.match(str(file), callback=self.yara_callback, fast=True)
>>> > 
>>> >     @profile 
>>> >     def yara_callback(self,matching_data): 
>>> >         if matching_data['matches'] : 
>>> >             print('%s matches %s' % (matching_data['rule'], self.current_file))
>>> > 
>>> >         # Return the constant explicitly; the bare expression was a no-op
>>> >         return yara.CALLBACK_CONTINUE
>>> > 
>>> > # Entrypoint 
>>> > if __name__ == "__main__": 
>>> >     yaraPoc = YaraPOC() 
>>> >     yaraPoc.load_rules("/rules/yara") 
>>> >     for file in os.listdir("/files"): 
>>> >         yaraPoc.current_file = file 
>>> >         yaraPoc.match_rules("/files/" + str(file)) 
>>>
>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "YARA" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

