[ http://issues.apache.org/jira/browse/NUTCH-60?page=comments#action_12313323 ]

Jerome Charron commented on NUTCH-60:
-------------------------------------

Sami, 

* To measure speed, I simply uncomment the lines marked "used for benchs" in
the main method of LanguageIdentifier. Then I launch the TestIdentifier on a
large set of files using the fileset command-line argument.

* To measure identification quality, I configure the language identifier
plugin with the desired amount of data to analyze and comment out again the
lines I uncommented for the speed benchmark. Then I run the command line with
the fileset argument on a large set of documents in a single language, piping
the output through grep and wc to count the failed identifications:
java org.apache.nutch.analysis.lang.LanguageIdentifier -identifyfileset \
  /somewhere/fr/*.txt | grep -v "identified as fr" | wc -l
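The benchmark lines in LanguageIdentifier's main method are not shown here, so the following is only a minimal sketch, under that assumption, of the kind of timing loop the speed measurement relies on. The class name IdentifierBenchmark and the stand-in identifier are hypothetical; the real plugin would be plugged in through the Function parameter.

```java
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical timing harness: runs an identifier over a list of texts
 * and reports total and per-document time. The identifier is passed in
 * as a Function so any implementation can be plugged in.
 */
public class IdentifierBenchmark {

    /** Returns the elapsed nanoseconds for identifying every text once. */
    static long timeIdentification(Function<String, String> identifier, List<String> texts) {
        long start = System.nanoTime();
        for (String text : texts) {
            identifier.apply(text);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in that always answers "fr"; NOT Nutch's LanguageIdentifier.
        Function<String, String> standIn = text -> "fr";
        List<String> texts = List.of("le chat est noir", "the cat is black");
        // Warm-up pass to keep one-time costs (class loading, lambda setup) out of the timed run.
        timeIdentification(standIn, texts);
        long nanos = timeIdentification(standIn, texts);
        System.out.printf("%d docs in %.3f ms (%.3f ms/doc)%n",
                texts.size(), nanos / 1e6, nanos / 1e6 / texts.size());
    }
}
```

A single warm-up pass is only a rough precaution; for stable numbers on a big fileset, repeating the timed run several times and comparing the results is safer.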

Hope this helps. But you are right: a set of scripts would be a good idea.
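As a step toward such scripts, the grep/wc pipeline for quality measurement could be wrapped in a small Java filter that also reports the failure rate. This is a hypothetical sketch: FailureCounter is not part of Nutch, and it assumes the fileset run prints one line per document, containing "identified as <lang>" on success. That format is only inferred from the grep pattern.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical filter mirroring `grep -v "identified as <lang>" | wc -l`:
 * counts output lines lacking the expected marker and prints the failure rate.
 * Assumes one output line per document.
 */
public class FailureCounter {

    /** A line counts as a failure when it lacks "identified as <lang>" (same test as grep -v). */
    static long countFailures(List<String> lines, String expectedLang) {
        String marker = "identified as " + expectedLang;
        return lines.stream().filter(line -> !line.contains(marker)).count();
    }

    public static void main(String[] args) throws IOException {
        String expectedLang = args.length > 0 ? args[0] : "fr";
        List<String> lines = new ArrayList<>();
        // Read the identifier's output from standard input, one line at a time.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        long failures = countFailures(lines, expectedLang);
        double rate = lines.isEmpty() ? 0.0 : 100.0 * failures / lines.size();
        System.out.printf("%d of %d documents not identified as %s (%.2f%%)%n",
                failures, lines.size(), expectedLang, rate);
    }
}
```

It would be used in place of grep and wc:
java org.apache.nutch.analysis.lang.LanguageIdentifier -identifyfileset \
  /somewhere/fr/*.txt | java FailureCounter fr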

> Bad language identifier plugin performances
> -------------------------------------------
>
>          Key: NUTCH-60
>          URL: http://issues.apache.org/jira/browse/NUTCH-60
>      Project: Nutch
>         Type: Improvement
>   Components: indexer
>     Reporter: Jerome Charron
>     Priority: Minor
>  Attachments: NUTCH-60-050526.patch, NUTCH-60-050605.patch, 
> NUTCH-60-050607.patch
>
> As reported by Stefan Groschupf 
> (http://www.mail-archive.com/[email protected]/msg04090.html)
>  the language identifier plugin consumes a lot of processing time.
> Some optimizations and/or configuration options are required.

-- 
This message is automatically generated by JIRA.



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
