Hello,

From what I've learned, once you have a new segment created and indexed,
you can drop it in "as is" without merging -- you just need to
shut down and restart Tomcat (or try the "touch" trick mentioned a few posts
ago).

Merging segments together will help speed up query response times.
Correct me if I'm wrong, folks, but ideally you have ONE SEGMENT only
per machine for optimal performance.

QUESTION: I have 10 segments, each about 7GB (~700,000 documents).  The
mergesegs tool is taking a LONG time -- 14.5 hours to do the merge, and
it is on its way to over 13 hours to index the new segment.  Any tricks
for speeding this up?  I read about 100M-document Nutch instances and I'm
only at 7M, but the merging and indexing take days -- how long do such
maintenance tasks take with these larger installations?

...
051215 110130  Processed 4600000 records (227.03795 rec/s)
051215 110258  Processed 4620000 records (226.12411 rec/s)
051215 110427  Processed 4640000 records (225.17198 rec/s)
051215 110604  Processed 4660000 records (207.01791 rec/s)
051215 110732  Processed 4680000 records (226.23923 rec/s)
051215 110856  Processed 4700000 records (237.00052 rec/s)
051215 111446  Processed 4720000 records (57.266716 rec/s)
051215 111615  Processed 4740000 records (223.50114 rec/s)
051215 111742  Processed 4760000 records (231.44398 rec/s)
051215 111909  Processed 4780000 records (229.86656 rec/s)
051215 112043  Processed 4800000 records (212.17458 rec/s)
...
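
As a rough sanity check on the log excerpt above, here is a minimal Python
sketch (not part of Nutch) that parses the "Processed N records" lines,
computes the wall-clock rate across the excerpt, and flags batches that ran
far below the median.  The names parse_log, overall_rate and slow_batches are
made up for illustration, and the 7,000,000 total is only an assumption
(10 segments x ~700,000 documents), so the remaining-time figure is an
estimate at best.

```python
import re
from datetime import datetime
from statistics import median

# The log lines quoted above, copied verbatim.
LOG = """\
051215 110130  Processed 4600000 records (227.03795 rec/s)
051215 110258  Processed 4620000 records (226.12411 rec/s)
051215 110427  Processed 4640000 records (225.17198 rec/s)
051215 110604  Processed 4660000 records (207.01791 rec/s)
051215 110732  Processed 4680000 records (226.23923 rec/s)
051215 110856  Processed 4700000 records (237.00052 rec/s)
051215 111446  Processed 4720000 records (57.266716 rec/s)
051215 111615  Processed 4740000 records (223.50114 rec/s)
051215 111742  Processed 4760000 records (231.44398 rec/s)
051215 111909  Processed 4780000 records (229.86656 rec/s)
051215 112043  Processed 4800000 records (212.17458 rec/s)
"""

LINE = re.compile(r"^(\d{6} \d{6})\s+Processed (\d+) records \(([\d.]+) rec/s\)$")


def parse_log(text):
    """Return (timestamp, records_so_far, reported_rate) for each matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            # Timestamps look like yyMMdd HHmmss, e.g. "051215 110130".
            ts = datetime.strptime(m.group(1), "%y%m%d %H%M%S")
            rows.append((ts, int(m.group(2)), float(m.group(3))))
    return rows


def overall_rate(rows):
    """Records per second between the first and last line, by wall clock."""
    (t0, n0, _), (t1, n1, _) = rows[0], rows[-1]
    return (n1 - n0) / (t1 - t0).total_seconds()


def slow_batches(rows, factor=0.5):
    """Lines whose reported rate is below `factor` times the median rate."""
    cutoff = factor * median(rate for _, _, rate in rows)
    return [(ts, rate) for ts, _, rate in rows if rate < cutoff]


rows = parse_log(LOG)
rate = overall_rate(rows)

# ASSUMPTION: the merge has about 7,000,000 records in total.
ASSUMED_TOTAL = 7_000_000
remaining_hours = (ASSUMED_TOTAL - rows[-1][1]) / rate / 3600

print(f"wall-clock rate over excerpt: {rate:.1f} rec/s")
print(f"estimated hours remaining:    {remaining_hours:.1f}")
for ts, r in slow_batches(rows):
    print(f"slow batch at {ts:%H:%M:%S}: {r:.1f} rec/s")
```

On this excerpt the only batch flagged is the one stamped 111446, which
suggests an occasional stall rather than a uniformly slow merge.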

DaveG


-----Original Message-----
From: Arun Kaundal [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 15, 2005 5:24 AM
To: [email protected]
Subject: Re: After mergesegs

Hi there,
   I also want to merge segments based on certain criteria. What do I need
to do so that the newly created segment can be re-indexed and made
searchable? Can you suggest how I can achieve that, and in what order I
need to run the various Nutch tools?
 Thanks in advance...


On 12/9/05, Andy Liu <[EMAIL PROTECTED]> wrote:
>
> I'm not sure exactly what you're asking, but if you're merging segments,
> you'll have to reindex that segment before you can make it searchable.
> "updatedb" doesn't affect the segments.  it just updates the webdb with
> the results of the fetch.  so if you already ran updatedb on the
> pre-merged segments, you don't need to run it again.
>
> On 12/8/05, Goldschmidt, Dave <[EMAIL PROTECTED]> wrote:
> >
> > Hi, just wanted to be sure - after I merge segments via the "mergesegs"
> > tool, I need to use the "updatedb" tool before dropping the new indexes
> > in, correct?
> >
> > And, as just posted, I need to shutdown and restart Tomcat, too, yes?
> >
> > Thanks,
> >
> > DaveG
> >
>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
