Without more information, this sounds like the nutch-site.xml used by your Tomcat search webapp is set up to point at the DFS rather than the local file system. Remember that processing jobs run on the DFS, but for searching, the indexes are best moved to the local file system.
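If that's the case, the webapp's nutch-site.xml would want something along these lines (property names from memory, and the path is only an example of wherever you copy the index to on local disk):

  <?xml version="1.0"?>
  <configuration>
    <!-- Have the search webapp read from the local file system,
         not from the DFS where the crawl/index jobs ran. -->
    <property>
      <name>fs.default.name</name>
      <value>local</value>
    </property>
    <!-- Local directory containing the index/ and segments/ produced
         by the crawl (the path here is just an illustration). -->
    <property>
      <name>searcher.dir</name>
      <value>/data/nutch/crawl</value>
    </property>
  </configuration>

With the index and segments copied down to local disk and searcher.dir pointed at them, both the search times and the heap errors should come down from what you're seeing.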
Dennis Kubes

JoostRuiter wrote:
> Hi All,
>
> First off, I'm quite the noob when it comes to Nutch, so don't bash me if
> the following is an enormously stupid question.
>
> We're using Nutch on a P4 Duo Core system (800MHz FSB) with 4GB RAM and a
> 500GB SATA (3Gb/s) HD. We indexed 350,000 pages into 1 segment of 15GB.
>
> Performance is really poor; when we do get search results it takes
> multiple minutes. When the query is longer we get the following:
>
> "java.lang.OutOfMemoryError: Java heap space"
>
> What we have tried to improve on this:
> - Slice the segments into smaller chunks (max 50,000 URLs per segment)
> - Set io.map.index.skip to 8
> - Set indexer.termIndexInterval to 1024
> - Cluster with Hadoop (4 nodes to search)
>
> Any ideas? Missing information? Please let me know, this is my graduation
> internship and I would really like to get a good grade ;)