You might want to try this, but I am not sure if it works :-) Please make backups first! This is a workaround.
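If it helps, the procedure described in this message could be scripted roughly like the sketch below. I have not run it against a real Nutch install. The function names build_crawl_c and reindex_crawl_c and the NUTCH variable are made up for illustration; the nutch subcommands are simply the ones listed in this message. It also assumes the segment directory names in the two crawls do not collide.

```shell
#!/bin/sh
# Sketch only: combine two crawl directories into a new one, as described
# in this message. build_crawl_c and reindex_crawl_c are hypothetical names.

# build_crawl_c CRAWL_A CRAWL_B CRAWL_C
# Copies the crawldb parts and the segments of A and B into a fresh C.
build_crawl_c() {
    a=$1; b=$2; c=$3
    # Refuse to touch an existing target so nothing is overwritten.
    if [ -e "$c" ]; then
        echo "target $c already exists" >&2
        return 1
    fi
    mkdir -p "$c/crawldb/current" "$c/segments" || return 1
    # A keeps the name part-00000; B's part is renamed to part-00001.
    cp -r "$a/crawldb/current/part-00000" "$c/crawldb/current/part-00000" || return 1
    cp -r "$b/crawldb/current/part-00000" "$c/crawldb/current/part-00001" || return 1
    # Assumption: segment directory names differ between A and B.
    cp -r "$a"/segments/* "$c/segments/" || return 1
    cp -r "$b"/segments/* "$c/segments/" || return 1
}

# reindex_crawl_c CRAWL_C
# Runs the four commands from this message inside C.
# Assumption: NUTCH holds the absolute path of the bin/nutch script.
reindex_crawl_c() {
    c=$1
    if [ -z "$NUTCH" ]; then
        echo "set NUTCH to the absolute path of bin/nutch" >&2
        return 1
    fi
    (
        cd "$c" &&
        "$NUTCH" invertlinks linkdb segments/* &&
        "$NUTCH" index indexes crawldb linkdb segments/* &&
        "$NUTCH" dedup indexes &&
        "$NUTCH" merge index indexes
    )
}
```

Usage would be something like `build_crawl_c CrawlA CrawlB CrawlC`, then `export NUTCH=/path/to/bin/nutch` and `reindex_crawl_c CrawlC`. The copy step refuses to run if CrawlC already exists, so that a second run cannot silently nest directories inside the first copy.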
I assume that you have two working indexes, i.e. "CrawlA" and "CrawlB" (ready to go and working like a charm via the browser :-). I am also taking for granted that all the directories (index, indexes, segments, etc.) are inside "CrawlA" and "CrawlB".

Now make a new directory called "CrawlC", with crawldb/current inside it (run this from the directory that contains CrawlA and CrawlB):

    mkdir -p CrawlC/crawldb/current

Now copy the crawldb parts:

    cp -r CrawlA/crawldb/current/part-00000 CrawlC/crawldb/current/part-00000
    cp -r CrawlB/crawldb/current/part-00000 CrawlC/crawldb/current/part-00001

NOTE the part-00001 in the second command.

Now make a directory "segments" under CrawlC and copy the segments into it:

    mkdir CrawlC/segments
    cp -r CrawlA/segments/* CrawlC/segments/
    cp -r CrawlB/segments/* CrawlC/segments/

Now you should have two directories under CrawlC:

    crawldb
    segments

Proceed with:

    bin/nutch invertlinks linkdb segments/*
    bin/nutch index indexes crawldb linkdb segments/*
    bin/nutch dedup indexes
    bin/nutch merge index indexes

Change your searcher.dir in nutch-site.xml to point at CrawlC and give it a go.

Cheers

On 4/4/06, Olive g <[EMAIL PROTECTED]> wrote:
> We too have deadlines :(.
>
> I would appreciate it very much if someone can provide more insight. Is it a
> bug or a configuration issue? How can we even do incremental crawls on 0.8
> with these issues?
>
> Should I send email to the developer mailing list? Would that help?
>
> Gurus, please help !!!!
>
> >From: "Vertical Search" <[EMAIL PROTECTED]>
> >Reply-To: [email protected]
> >To: [email protected]
> >Subject: Re: Merging indexes -- please help....
> >Date: Tue, 4 Apr 2006 10:11:51 -0500
> >
> >Sorry. I too have faced the same problem. I am in the process of releasing
> >for a demo (management) over this weekend.
> >I will try to work on the merging stuff after that. It is a very important
> >part and I have to get it to work if I am to succeed in adopting Nutch for
> >a vertical domain.
> >Furthermore, I could not get the PruneIndexTool up and running.
> >It asks for a query.
> >I wonder if someone can share the query file or format the
> >tool expects.
> >
> >But it goes without saying, I am very thankful to the folks here for
> >extending their help.
> >
> >Thanks
> >
> >On 4/4/06, Olive g <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi,
> > >
> > > I encountered the same problem on 0.8. See my post
> > > http://www.mail-archive.com/nutch-user%40lucene.apache.org/msg04103.html.
> > > Does anyone have any idea? Is it a bug or a configuration issue? Please
> > > let me know.
> > > Thanks.
> > >
> > > Olive
> > >
> > > >From: "Dan Morrill" <[EMAIL PROTECTED]>
> > > >Reply-To: [email protected]
> > > >To: <[email protected]>
> > > >Subject: RE: Merging indexes -- please help....
> > > >Date: Mon, 3 Apr 2006 05:18:34 -0700
> > > >
> > > >Hi,
> > > >
> > > >I noticed that when I used the drive designation it didn't like that
> > > >(Windows Cygwin environment). If you did
> > > >
> > > >./nutch merge -local /STG1/index /STG1/indexes that may work better,
> > > >let me know.
> > > >
> > > >Cheers/r/dan
> > > >H
> > > >-----Original Message-----
> > > >From: Vertical Search [mailto:[EMAIL PROTECTED]
> > > >Sent: Sunday, April 02, 2006 7:07 PM
> > > >To: [email protected]
> > > >Subject: Re: Merging indexes -- please help....
> > > >
> > > >Okay.
> > > >I had 2 sets of crawls,
> > > >such as E:/STG1 and E:/STG2.
> > > >I used the dedup command to remove duplicates.
> > > >Then the command I used to merge is as follows
> > > >(based on what has been available in the mail archives and the
> > > >responses I got).
> > > >
> > > >First I ran
> > > >
> > > > bin/nutch merge E:/STG1/index E:/STG1/indexes
> > > > bin/nutch merge E:/STG1/index E:/STG2/indexes
> > > >
> > > >In nutch-site.xml I have searcher.dir as E:/STG1
> > > >
> > > >I get absolutely no results... The command console output is as follows.
> > > >Can someone shed some light on this please, ASAP.
> > > >
> > > >INFO: creating new bean
> > > >Apr 2, 2006 8:58:36 PM org.apache.nutch.searcher.NutchBean init
> > > >INFO: opening merged index in E:\Hoodukoo\STG5\index
> > > >Apr 2, 2006 8:58:36 PM org.apache.nutch.searcher.NutchBean init
> > > >INFO: opening segments in E:\Hoodukoo\STG5\segments
> > > >Apr 2, 2006 8:58:36 PM org.apache.hadoop.conf.Configuration getConfResourceAsReader
> > > >INFO: found resource common-terms.utf8 at file:/C:/xampp/tomcat/webapps/hoodukoo/WEB-INF/classes/common-terms.utf8
> > > >Apr 2, 2006 8:58:36 PM org.apache.nutch.searcher.NutchBean init
> > > >INFO: opening linkdb in E:\Hoodukoo\STG5\linkdb
> > > >Apr 2, 2006 8:58:36 PM org.apache.jsp.search_jsp _jspService
> > > >INFO: query request from 127.0.0.1
> > > >Apr 2, 2006 8:58:36 PM org.apache.jsp.search_jsp _jspService
> > > >INFO: query: site
> > > >Apr 2, 2006 8:58:36 PM org.apache.nutch.searcher.NutchBean search
> > > >INFO: searching for 20 raw hits

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
