Thank you Dave, very helpful.

-Ledio

-----Original Message-----
From: Goldschmidt, Dave [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 20, 2005 7:24 AM
To: [email protected]
Subject: RE: [Nutch-dev] distributed search


Hi Rafi,

Not sure if anyone answered this, but I think you're just after the
segslice command:

$ nutch segslice

SegmentSlicer (-local | -ndfs <namenode:port>) -o outputDir [-max count]
        [-fix] [-nocontent] [-noparsedata] [-noparsetext]
        [-filterUrlBy (+|-)perl5pattern] [-logLevel logLevel]
        (-dir segments | seg1 seg2 ...)
        NOTE: at least one segment dir name is required, or '-dir' option.
              outputDir is always required.
        -o outputDir    output directory for segments
        -max count      (optional) output multiple segments, each with
                        maximum 'count' entries
        -fix            (optional) automatically fix corrupted segments
        -nocontent      (optional) ignore content data
        -noparsedata    (optional) ignore parse_data data
        -noparsetext    (optional) ignore parse_text data
        -filterUrlBy    (optional)
                        Filter entry by matching its url with a perl5 pattern.
                        Prefix '+' means: default to skip, match to save.
                        Prefix '-' means: default to save, match to skip.
                        If no pattern given, no filtering (all are saved).
        -logLevel       (optional) logging level
        -dir segments   directory containing multiple segments
        seg1 seg2 ...   segment directories
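
For the splitting case discussed in this thread, an invocation might look roughly like the sketch below. It uses only flags from the usage text above; the paths, the entry count, and the variable and function names are placeholders I made up for illustration, not values from the thread. It prints the command by default instead of running it, and it does not cover whatever indexing step may be needed afterwards.

```shell
#!/bin/sh
# Sketch only: split existing segments into smaller ones with segslice.
# Every value below is a placeholder (assumption), not taken from the thread.
NUTCH="${NUTCH:-bin/nutch}"                  # nutch launcher script
SEGMENTS_DIR="${SEGMENTS_DIR:-segments}"     # directory containing existing segments
OUTPUT_DIR="${OUTPUT_DIR:-segments-sliced}"  # where the sliced segments are written
MAX_ENTRIES="${MAX_ENTRIES:-500000}"         # maximum entries per output segment

# Run the segslice command, optionally prefixed by the caller's arguments
# (for example `echo`, which turns the call into a dry run).
slice_cmd() {
    "$@" "$NUTCH" segslice -local -o "$OUTPUT_DIR" -max "$MAX_ENTRIES" -dir "$SEGMENTS_DIR"
}

# Dry run by default; set RUN=1 to actually execute segslice.
if [ "${RUN:-0}" = "1" ]; then
    slice_cmd
else
    slice_cmd echo
fi
```

Running the script as-is only echoes the command line, so the arguments can be checked before anything touches the segments. To slice by URL instead of by count, the -max option could be swapped for -filterUrlBy as described in the usage text.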


HTH,
DaveG


-----Original Message-----
From: Ledio Ago [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 19, 2005 8:25 PM
To: [email protected]
Subject: RE: [Nutch-dev] distributed search

Rafi,

Based on what you're saying, this tool splits a fetchlist into several
fetchlists so that we can crawl/fetch the URLs from different fetchers,
right?

If so, that is not what I'm after.  I'm trying to split an existing index
into smaller partitions, so that I can make those partitions searchable
from multiple nutch searchers (distributed search).

Thanks,

Ledio

-----Original Message-----
From: Rafi Iz [mailto:[EMAIL PROTECTED]
Sent: Monday, December 19, 2005 4:49 PM
To: [email protected]
Subject: Re: [Nutch-dev] distributed search



Check the following command:

FetchListTool (-local | -ndfs <namenode:port>) <db> <segment_dir>
[-refetchonly] [-topN N] [-cutoff cutoffscore] [-numFetchers numFetchers]
[-adddays numDays]

This command calls a function named emitMultipleLists, which emits
several fetchlists so that you can fetch across several machines.

e.g.
bin/nutch org.apache.nutch.tools.FetchListTool ......

Rafi


>From: Stefan Groschupf <[EMAIL PROTECTED]>
>Reply-To: [email protected]
>To: [email protected]
>Subject: Re: [Nutch-dev] distributed search
>Date: Tue, 20 Dec 2005 00:38:22 +0100
>
>>By the way, is there an easy way to split the index I already have?
>>I would hate to recrawl all of the 1.9MM URLs again and waste bandwidth.
>
>Well, I do not know of any tool, bundled with nutch or otherwise, that
>does it; maybe there is one.
>But writing a java class that creates two smaller indexes from one large
>one is very easy, an hour of work at most.
>Just check any of the existing lucene tutorials, the lucene javadoc, or
>the lucene book.
>BTW, Erik Hatcher's book "Lucene in Action" is a MUST for all nutch users.
>:-)
>
>Stefan
>




