Thank you Dave, very helpful. -Ledio
-----Original Message-----
From: Goldschmidt, Dave [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 20, 2005 7:24 AM
To: [email protected]
Subject: RE: [Nutch-dev] distributed search

Hi Rafi,

Not sure if anyone answered this, but I think you're just after the
segslice command:

$ nutch segslice
SegmentSlicer (-local | -ndfs <namenode:port>) -o outputDir [-max count]
  [-fix] [-nocontent] [-noparsedata] [-noparsetext]
  [-filterUrlBy (+|-)perl5pattern] [-logLevel logLevel]
  (-dir segments | seg1 seg2 ...)

NOTE: at least one segment dir name is required, or the '-dir' option.
      outputDir is always required.

  -o outputDir    output directory for segments
  -max count      (optional) output multiple segments, each with maximum
                  'count' entries
  -fix            (optional) automatically fix corrupted segments
  -nocontent      (optional) ignore content data
  -noparsedata    (optional) ignore parse_data data
  -noparsetext    (optional) ignore parse_text data
  -filterUrlBy    (optional) filter entry by matching its url with a
                  perl5 pattern. Prefix '+' means: default to skip,
                  match to save. Prefix '-' means: default to save,
                  match to skip. If no pattern given, no filtering
                  (all are saved).
  -logLevel       (optional) logging level
  -dir segments   directory containing multiple segments
  seg1 seg2 ...   segment directories

HTH,
DaveG

-----Original Message-----
From: Ledio Ago [mailto:[EMAIL PROTECTED]
Sent: Monday, December 19, 2005 8:25 PM
To: [email protected]
Subject: RE: [Nutch-dev] distributed search

Rafi,

Based on what you're saying, this tool splits a fetchlist into several
fetchlists so that we can crawl/fetch the URLs from different fetchers,
right? If so, that's not what I'm after. I'm trying to split an existing
index into smaller partitions, so that I can make those partitions
searchable from multiple Nutch searchers: distributed search.
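[Editor's note on the searcher side of what Ledio is after: in the Nutch code of this era, distributed search is wired up (to the best of my reading; details may vary by version) through a search-servers.txt file in the directory named by the searcher.dir property, listing one "host port" pair per line, with each listed host running the DistributedSearch server over its own index partition. A sketch, where the hostnames, port number, and paths are invented for illustration:]

```
# conf/search-servers.txt -- read by the search front end;
# one searcher per line: host port
# (hostnames, port, and paths below are made up for illustration)
search01 9999
search02 9999

# On each node, serve its index partition over that port, e.g.:
#   bin/nutch server 9999 /data/nutch/part1
```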
Thanks,
Ledio

-----Original Message-----
From: Rafi Iz [mailto:[EMAIL PROTECTED]
Sent: Monday, December 19, 2005 4:49 PM
To: [email protected]
Subject: Re: [Nutch-dev] distributed search

Check the following command:

FetchListTool (-local | -ndfs <namenode:port>) <db> <segment_dir>
  [-refetchonly] [-topN N] [-cutoff cutoffscore]
  [-numFetchers numFetchers] [-adddays numDays]

This command calls a function named emitMultipleLists, which emits
several fetchlists so that you can fetch across several machines, e.g.:

bin/nutch org.apache.nutch.tools.FetchListTool ......

Rafi

>From: Stefan Groschupf <[EMAIL PROTECTED]>
>Reply-To: [email protected]
>To: [email protected]
>Subject: Re: [Nutch-dev] distributed search
>Date: Tue, 20 Dec 2005 00:38:22 +0100
>
>>By the way, is there an easy way to split the index I already have?
>>I would hate to recrawl all of the 1.9MM URLs again and waste bandwidth.
>
>Well, I do not know of any tool that comes with Nutch, or another tool
>that does it; maybe there is one. But writing a Java class that creates
>two smaller indexes from one large one is very easy, an hour's work at
>most. Just check any of the existing Lucene tutorials, the Lucene
>javadoc, or the Lucene book. BTW, Erik Hatcher's book "Lucene in Action"
>is a MUST for all Nutch users. :-)
>
>Stefan

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
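[Editor's note: Stefan's suggestion of a small Java class that splits one large Lucene index into smaller ones can be sketched as below. Only the partition arithmetic is runnable as-is; the Lucene calls in the comments (IndexReader.open, IndexReader.delete(int), IndexWriter.optimize()) refer to the Lucene 1.x API that 2005-era Nutch shipped with and are illustrative assumptions, not Nutch code.]

```java
// Sketch: plan a split of one Lucene index of maxDoc documents into
// nParts partitions of contiguous document-ID ranges. The class name
// and all paths are hypothetical.
public class IndexSplitPlan {

    /**
     * Divide doc IDs [0, maxDoc) into nParts contiguous ranges.
     * Returns {start, end} pairs (end exclusive), one per partition.
     */
    public static int[][] ranges(int maxDoc, int nParts) {
        int[][] out = new int[nParts][2];
        int base = maxDoc / nParts;   // minimum docs per partition
        int rem = maxDoc % nParts;    // first 'rem' partitions get one extra
        int start = 0;
        for (int k = 0; k < nParts; k++) {
            int size = base + (k < rem ? 1 : 0);
            out[k][0] = start;
            out[k][1] = start + size;
            start += size;
        }
        return out;
    }

    public static void main(String[] args) {
        // E.g. Ledio's ~1.9MM-document index split four ways.
        // To realize partition k with Lucene 1.x: copy the index
        // directory, open the copy with IndexReader.open(copyPath),
        // delete every doc ID outside [start, end) via delete(i),
        // close the reader, then compact the copy with
        // new IndexWriter(copyPath, analyzer, false).optimize().
        for (int[] r : ranges(1900000, 4)) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

Each partition directory can then be served by its own searcher process for distributed search.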
