I ran the test, and the number of times the callback function gets called is
indeed the total number of rules times the total number of files
(12345 x 12319). That gives about 152 million function calls, plus the
additional data conversions, for only 30 hits. So adding the described
functionality would definitely be great!
I'll open an issue on GitHub about it.
Thanks
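
A minimal sketch of why the call count comes out as rules x files, and what a matches-only switch would save. This is a toy model, not yara-python: `scan_file`, `matches_only`, and the rule predicates are hypothetical stand-ins, assuming an engine that invokes the callback once per rule per file.

```python
# Toy model of a scanner that reports every rule to the callback.
# scan_file and matches_only are hypothetical names, not yara-python API.

def scan_file(data, rules, callback, matches_only=False):
    """Evaluate every rule against data and report results via callback."""
    for name, predicate in rules.items():
        matched = predicate(data)
        if matches_only and not matched:
            continue  # skip building and delivering a non-match result
        callback({"rule": name, "matches": matched})


def count_callbacks(files, rules, matches_only):
    """Return (total callback calls, number of reported matches)."""
    calls = 0
    hits = 0

    def callback(result):
        nonlocal calls, hits
        calls += 1
        if result["matches"]:
            hits += 1

    for data in files:
        scan_file(data, rules, callback, matches_only=matches_only)
    return calls, hits


# 5 toy rules and 10 toy files, mirroring the 5 x 10 example in the thread.
RULES = {
    f"rule_{i}": (lambda data, i=i: f"marker{i}".encode() in data)
    for i in range(5)
}
FILES = [b"plain content"] * 9 + [b"contains marker3 here"]

all_calls, all_hits = count_callbacks(FILES, RULES, matches_only=False)
match_calls, match_hits = count_callbacks(FILES, RULES, matches_only=True)
print(all_calls, all_hits)      # 50 1
print(match_calls, match_hits)  # 1 1
```

With reporting of non-matches enabled, the callback fires 50 times for a single hit; with the hypothetical `matches_only` switch it fires once.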


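For a rough sense of scale, the figures in the line_profiler output quoted further down can be combined as follows. This is only back-of-the-envelope arithmetic on those quoted numbers; the constant names are made up for the example, and the timings include line_profiler's own per-line overhead, so the real callback cost is likely lower.

```python
# Figures copied from the line_profiler output quoted in this thread.
CALLBACK_HITS = 151_991_822   # hits on the first line of yara_callback
FILES_SCANNED = 12_319        # hits on the rules.match(...) line
CALLBACK_TOTAL_S = 351.386    # total time reported for yara_callback
MATCH_TOTAL_S = 928.533       # total time reported for match_rules

rules_per_file = CALLBACK_HITS / FILES_SCANNED
callback_cost_us = CALLBACK_TOTAL_S / CALLBACK_HITS * 1e6
callback_share = CALLBACK_TOTAL_S / MATCH_TOTAL_S

print(f"callbacks per file: {rules_per_file:.0f}")
print(f"cost per callback:  {callback_cost_us:.2f} microseconds")
print(f"share of scan time: {callback_share:.0%}")
```

The quotient works out to 12338 callbacks per file, which is consistent with one callback per compiled rule for every file scanned.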

On Thursday, 18 May 2017 at 03:56:33 UTC+2, Wesley Shields wrote:
>
> I think this is expected behavior, though whether it is optimal behavior
> is obviously open for debate. ;)
>
> Here's what I think is happening. Every time you scan a file your 
> "yara_callback" function is called once for each rule, even if the rule 
> didn't match. So if you scan 10 files with 5 rules your callback will be 
> called 50 times, even if none of those 5 rules match. You can check this by 
> adding the following right after you compile your rules: 
>
> print(len([rule for rule in self.rules])) 
>
> That will print you the number of rules that compiled. I'm guessing if you 
> take that number and multiply it by the number of files scanned they should 
> be equal. 
>
> Now, you could certainly argue that there should be a flag to the match 
> function which indicates if you want your callback called for matches, 
> non-matches or both. That would likely eliminate a lot of extra work to 
> translate things into python objects. This also explains why the "native" 
> yara is much faster. It doesn't have to do any of the C to python 
> conversion for the data and it also has a "don't show me non-matches" flag 
> on by default. 
>
> If Victor agrees, you could make this an issue on github so it doesn't get 
> lost. I've got some experience in this area and may be able to take it on 
> too. 
>
> -- WXS 
>
> > On May 17, 2017, at 4:59 PM, tofba...@gmail.com wrote:
> > 
> > The profiling results were not added correctly:
> >
> > Total time: 928.533 s
> > File: /app/filters/yaraPOC.py
> > Function: match_rules at line 70
> >
> > Line #      Hits         Time  Per Hit   % Time  Line Contents
> > ==============================================================
> >     70                                               @profile
> >     71                                               def match_rules(self,file):
> >     72                                                   """
> >     73                                                   Matches yara rules against the file
> >     74                                                   :param file: relative path to the files_folder specified for the YaraFilter
> >     75                                                   :return: returns dictionary with matching information
> >     76                                                   """
> >     77     12319        12086      1.0      0.0          self.matching_results = []
> >     78     12319         8847      0.7      0.0          if not self.rules:
> >     79                                                       print("Rules not initialised")
> >     80                                                       return
> >     81     12319         4209      0.3      0.0          try:
> >     82     12319    928508227  75372.0    100.0              self.rules.match( str(file),callback=self.yara_callback, fast = True)
> >     83
> >     84                                                   except Exception as e :
> >     85                                                     print("Error occured trying to match yara rules on file " + str(file) + ':' +  str(e))
> > 
> > Total time: 351.386 s
> > File: /app/filters/yaraPOC.py
> > Function: yara_callback at line 87
> >
> > Line #      Hits         Time  Per Hit   % Time  Line Contents
> > ==============================================================
> >     87                                               @profile
> >     88                                               def yara_callback(self,matching_data):
> >     89                                                   """
> >     90                                                   Callback function that gets called for yara rule that matches
> >     91                                                   :param matching_data:
> >     92                                                   :return:
> >     93                                                   """
> >     94                                                   # Currently we do not add the strings from the matching rule
> >     95 151991822     43182861      0.3     12.3          if matching_data['matches'] :
> >     96        27         1777     65.8      0.0              print ('%s matches %s' %(matching_data['rule'],self.current_file))
> >     97
> >     98 151991822    308201707      2.0     87.7          yara.CALLBACK_CONTINUE
> >
> > 
> > On Wednesday, 17 May 2017 at 22:56:41 UTC+2, tofba...@gmail.com wrote:
> > Hey Wesley,
> > thanks for your reply.
> >
> > Here's a trimmed-down version of my code; profiling this function gives
> > me the same results when applied to the same set of files.
> > After the code I've added some profiling results.
> > Most of the rules I'm using come from the public repository:
> > https://github.com/Yara-Rules/rules
> >
> > FYI, my yara-python is dynamically linked against libyara from my
> > 'native' yara install.
> > I did some testing with native yara and there is no comparison in speed;
> > it's way faster.
> >
> > import os
> >
> > import yara
> >
> >
> > class YaraPOC:
> >     ALLOWED_EXTENSIONS = (".yar", ".yara")
> >
> >     def __init__(self):
> >         self.current_file = ""
> >         # Set by load_rules(); stays None if compilation fails
> >         self.rules = None
> >
> >     def walk_directory_tree(self, directory, extension_filter=None, recursive=True):
> >         file_list_res = []
> >         if not recursive:
> >             file_list_res = [os.path.join(directory, f) for f in os.listdir(directory)
> >                              if os.path.isfile(os.path.join(directory, f))]
> >         else:
> >             for path, subdirs, files in os.walk(directory):
> >                 for name in files:
> >                     file_list_res.append(os.path.join(path, name))
> >
> >         if extension_filter is not None:
> >             # str.endswith accepts a tuple of suffixes
> >             file_list_res = [f for f in file_list_res if f.endswith(extension_filter)]
> >
> >         return file_list_res
> >
> >     def load_rules(self, rules_folder):
> >         print("Loading yararules from: %s" % rules_folder)
> >         rules_file_list = self.walk_directory_tree(
> >             rules_folder, YaraPOC.ALLOWED_EXTENSIONS, recursive=True)
> >
> >         # For each rule we want the path relative to our main folder to use
> >         # as a namespace in yara
> >         namespaces = []
> >         remove_index = rules_folder.rfind(os.sep) + 1
> >         # For the namespaces we remove this "prefix" from all our paths
> >         # and create a separate list for it
> >         for rule in rules_file_list:
> >             namespaces.append(rule[remove_index:])
> >
> >         filepaths_dict = {}
> >         for indx, namespace in enumerate(namespaces):
> >             filepaths_dict[namespace] = rules_file_list[indx]
> >         try:
> >             self.rules = yara.compile(filepaths=filepaths_dict)
> >         except Exception as e:
> >             print("Compilation error in Yara rules. Are you missing an import ? ")
> >             print(str(e))
> >
> >         print("Loaded %s Yararules" % str(len(namespaces)))
> >
> >     # `profile` is injected into builtins by kernprof (line_profiler),
> >     # so this script has to be run under kernprof -l
> >     @profile
> >     def match_rules(self, file):
> >         self.matching_results = []
> >         if not self.rules:
> >             print("Rules not initialised")
> >             return
> >
> >         self.rules.match(str(file), callback=self.yara_callback, fast=True)
> >
> >     @profile
> >     def yara_callback(self, matching_data):
> >         if matching_data['matches']:
> >             print('%s matches %s' % (matching_data['rule'], self.current_file))
> >
> >         # Return the constant; as a bare expression it had no effect
> >         return yara.CALLBACK_CONTINUE
> >
> >
> > # Entrypoint
> > if __name__ == "__main__":
> >     yaraPoc = YaraPOC()
> >     yaraPoc.load_rules("/rules/yara")
> >     for file in os.listdir("/files"):
> >         yaraPoc.current_file = file
> >         yaraPoc.match_rules("/files/" + str(file))
> > 
> > 
