Aha, thanks for the clarification!  :-)   The mergesegs command has a -i
option to index the output segment.  Perhaps the SegmentSlicer command
could be modified to optionally index the output segments, too?

New question: aside from slicing URLs by a Perl5 pattern, is there a way
to slice an index by the MD5 hash of the URL?  I'd like to split my big
index into smaller indexes of roughly equal size, with a deterministic
mapping from each URL to its index.  Applying a Perl5 pattern to every
URL also sounds expensive.  :-)
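
To make the idea concrete, here is a rough sketch of the kind of mapping I
have in mind.  It is not an existing Nutch command or API: the class name
UrlHashSlicer, the method sliceFor, and the slice count of 4 are all made
up for illustration.  It hashes the URL with MD5, takes the first four
bytes of the digest as an integer, and reduces that modulo the number of
slices, so the same URL always lands in the same slice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Nutch: maps a URL to a slice number.
public class UrlHashSlicer {

    // Returns a slice number in [0, numSlices) derived from the MD5 of the URL.
    static int sliceFor(String url, int numSlices) {
        if (numSlices <= 0) {
            throw new IllegalArgumentException("numSlices must be positive");
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] d = md5.digest(url.getBytes(StandardCharsets.UTF_8));
            // Pack the first four digest bytes into one int (big-endian).
            int h = ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                  | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
            // floorMod keeps the result non-negative even when h is negative.
            return Math.floorMod(h, numSlices);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        int numSlices = 4; // assumed number of smaller indexes
        String[] urls = {
            "http://example.com/",
            "http://example.com/a.html",
            "http://example.org/b.html"
        };
        for (String url : urls) {
            System.out.println(sliceFor(url, numSlices) + "\t" + url);
        }
    }
}
```

Since MD5 output is close to uniform, the slices should come out roughly
the same size on a large index, and only a hash and a modulo are computed
per URL, with no pattern matching.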

Thanks,
DaveG


-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 20, 2005 10:39 AM
To: [email protected]
Subject: Re: [Nutch-dev] distributed search

Goldschmidt, Dave wrote:

>Hi Rafi,
>
>Not sure if anyone answered this, but I think you're just after the
>segslice command:
>
>$ nutch segslice
>
>  
>

If I understand the original request, that's only half of the answer, 
but the right half.. ;-)

segslice doesn't slice the Lucene indexes, only the segment data. So, 
after you slice the segments you need to re-index them. Sorry.

I believe it's possible to do the index slicing, it's just not 
implemented yet...

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
