One more thing... Are you using a distributed index? If so, you don't want to do that; indexes should be local to the machine that serves the searches.
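A minimal sketch of what the search webapp's nutch-site.xml might look like when pointed at a local copy of the index rather than the DFS. The path /data/crawl is a placeholder, and property values vary between Nutch/Hadoop versions, so treat this as illustrative, not definitive:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Make the search webapp read from the local file system,
       not the DFS that the crawl/index jobs ran on. -->
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
  <!-- Directory on this machine holding the crawl output
       (index/, segments/, ...); /data/crawl is a placeholder. -->
  <property>
    <name>searcher.dir</name>
    <value>/data/crawl</value>
  </property>
</configuration>
```

This file goes in the webapp's conf directory (e.g. under WEB-INF/classes), overriding the nutch-default.xml values the searcher would otherwise pick up.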
On 4/23/07, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> Without more information, this sounds like your Tomcat search
> nutch-site.xml file is set up to use the DFS rather than the local file
> system. Remember that processing jobs run on the DFS, but for
> searching, indexes are best moved to the local file system.
>
> Dennis Kubes
>
> JoostRuiter wrote:
> > Hi All,
> >
> > First off, I'm quite the noob when it comes to Nutch, so don't bash me if
> > the following is an enormously stupid question.
> >
> > We're using Nutch on a P4 Core Duo system (800 MHz FSB) with 4 GB RAM and a
> > 500 GB SATA (3 Gb/s) HD. We indexed 350,000 pages into one segment of 15 GB.
> >
> > Performance is really poor; when we do get search results, it takes
> > multiple minutes. When the query is longer, we get the following:
> >
> > "java.lang.OutOfMemoryError: Java heap memory"
> >
> > What we have tried to improve this:
> > - Slice the segments into smaller chunks (max 50,000 URLs per segment)
> > - Set io.map.index.skip to 8
> > - Set indexer.termIndexInterval to 1024
> > - Cluster with Hadoop (4 search nodes)
> >
> > Any ideas? Missing information? Please let me know, this is my graduation
> > internship and I would really like to get a good grade ;)

--
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
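On the OutOfMemoryError itself: independent of the local-vs-DFS issue, the Tomcat JVM may simply be running with its default heap. One common way to raise it is via CATALINA_OPTS in $CATALINA_HOME/bin/setenv.sh, which Tomcat's startup scripts source if it exists. The 1024m figure below is illustrative, not a recommendation; size the heap to the machine's free RAM:

```shell
# In $CATALINA_HOME/bin/setenv.sh (create the file if it is absent).
# Tomcat's catalina.sh sources it on startup, so the searcher JVM
# gets this heap instead of the JVM default.
# 512m/1024m are placeholder figures for a 4 GB machine.
CATALINA_OPTS="-Xms512m -Xmx1024m"
export CATALINA_OPTS
```

After adding this, restart Tomcat so the new JVM options take effect.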