[ http://issues.apache.org/jira/browse/NUTCH-60?page=all ]
Jerome Charron updated NUTCH-60:
--------------------------------

    Attachment: NUTCH-60-050605.patch

This patch keeps the improvements of the previous one (configuration) and adds some optimizations that reduce processing time by 20% to 70%, depending on the configuration (size of the data to process), with an average gain of 25%.
I will post more detailed benchmark results on the Wiki as soon as possible (http://wiki.apache.org/nutch/LanguageIdentifierBenchs).

> Bad language identifier plugin performances
> -------------------------------------------
>
>          Key: NUTCH-60
>          URL: http://issues.apache.org/jira/browse/NUTCH-60
>      Project: Nutch
>         Type: Improvement
>   Components: indexer
>     Reporter: Jerome Charron
>     Priority: Minor
>  Attachments: NUTCH-60-050526.patch, NUTCH-60-050605.patch
>
> As reported by Stefan Groschupf
> (http://www.mail-archive.com/nutch-developers@lists.sourceforge.net/msg04090.html)
> the language identifier plugin consumes a lot of processing time.
> Some optimizations and/or configuration options are required.