Without more information, this sounds like the nutch-site.xml used by your Tomcat search webapp is set up to point at the DFS rather than the local file system. Remember that processing jobs run on the DFS, but for searching, the indexes are best moved to the local file system.
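If that's the case, the webapp's nutch-site.xml would want something along these lines (property names from memory, and the path is only an example of wherever you copy the index to on local disk):

  <?xml version="1.0"?>
  <configuration>
    <!-- Have the search webapp read from the local file system,
         not from the DFS where the crawl/index jobs ran. -->
    <property>
      <name>fs.default.name</name>
      <value>local</value>
    </property>
    <!-- Local directory containing the index/ and segments/ produced
         by the crawl (the path here is just an illustration). -->
    <property>
      <name>searcher.dir</name>
      <value>/data/nutch/crawl</value>
    </property>
  </configuration>

With the index and segments copied down to local disk and searcher.dir pointed at them, both the search times and the heap errors should come down from what you're seeing.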
Dennis Kubes

JoostRuiter wrote:
> Hi All,
>
> First off, I'm quite the noob when it comes to Nutch, so don't bash me if
> the following is an enormously stupid question.
>
> We're using Nutch on a P4 Duo Core system (800MHz FSB) with 4GB RAM and a
> 500GB SATA (3Gb/s) HD. We indexed 350,000 pages into 1 segment of 15GB.
>
> Performance is really poor; when we do get search results it takes
> multiple minutes. When the query is longer we get the following:
>
> "java.lang.OutOfMemoryError: Java heap space"
>
> What we have tried to improve on this:
> - Slice the segments into smaller chunks (max 50,000 URLs per segment)
> - Set io.map.index.skip to 8
> - Set indexer.termIndexInterval to 1024
> - Cluster with Hadoop (4 nodes to search)
>
> Any ideas? Missing information? Please let me know, this is my graduation
> internship and I would really like to get a good grade ;)