[ http://issues.apache.org/jira/browse/NUTCH-60?page=comments#action_12313323 ]
Jerome Charron commented on NUTCH-60:
-------------------------------------

Sami,

* For measuring speed, I simply uncommented some lines marked "used for benchs" in the main method of LanguageIdentifier. Then I ran TestIdentifier on a big set of files using the fileset command-line argument.

* For measuring identification quality, I configured the language identifier plugin with the desired amount of data to analyze and commented out again the lines I had uncommented for the speed measurement. Then I ran the command line with the fileset argument on a large set of documents in a single language, piping the output through grep and wc to get the number of failed identifications:

java org.apache.nutch.analysis.lang.LanguageIdentifier -identifyfileset /somewhere/fr/*.txt | grep -v "identified as fr" | wc -l

Hope this helps. But you are right, a set of scripts could be a good idea.

> Bad language identifier plugin performances
> -------------------------------------------
>
>          Key: NUTCH-60
>          URL: http://issues.apache.org/jira/browse/NUTCH-60
>      Project: Nutch
>         Type: Improvement
>   Components: indexer
>     Reporter: Jerome Charron
>     Priority: Minor
>  Attachments: NUTCH-60-050526.patch, NUTCH-60-050605.patch, NUTCH-60-050607.patch
>
> As reported by Stefan Groschupf
> (http://www.mail-archive.com/[email protected]/msg04090.html)
> the language identifier plugin consumes a lot of processing time.
> Some optimizations and/or configuration options are required.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
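As an illustration only, the grep/wc pipeline used above to count failed identifications can also be written as a small standalone Java filter that additionally reports a failure rate. FailureCounter is a hypothetical helper, not part of Nutch. Like the grep command, it assumes that every correctly identified file produces an output line containing "identified as <lang>". It also assumes that LanguageIdentifier prints exactly one line per file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Nutch): counts failed identifications in the
 * output of "LanguageIdentifier -identifyfileset", equivalent to
 *   grep -v "identified as <lang>" | wc -l
 * and also reports the failure rate over all lines read.
 */
public class FailureCounter {

    /** Counts lines that do NOT contain "identified as <lang>" (same test as grep -v). */
    public static int countFailures(List<String> lines, String expectedLang) {
        String marker = "identified as " + expectedLang;
        int failures = 0;
        for (String line : lines) {
            if (!line.contains(marker)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FailureCounter <expected-lang>  (reads stdin)");
            System.exit(1);
        }
        // Read the identifier's output from stdin, one result per line (assumption).
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        int failures = countFailures(lines, args[0]);
        // As with grep -v | wc -l, any non-result line (e.g. a header) also counts as a failure.
        System.out.println("lines read: " + lines.size());
        System.out.println("failed identifications: " + failures);
        if (!lines.isEmpty()) {
            System.out.printf("failure rate: %.2f%%%n", 100.0 * failures / lines.size());
        }
    }
}
```

Usage, mirroring the command above:

java org.apache.nutch.analysis.lang.LanguageIdentifier -identifyfileset /somewhere/fr/*.txt | java FailureCounter fr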
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
