Actually, I have never done that myself. I guess "nutch updatedb <db> <seg_dir>" should work.
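
A rough, untested sketch of that idea, assuming Nutch 0.7-style commands: "admin <db> -create" to make an empty webdb and "updatedb" to load the merged segment into it. The db path, the NUTCH/DB/SEGMENT variables, and the run and rebuild_webdb helpers are hypothetical names for illustration only. The page-rank calculation and re-indexing steps are not shown.

```shell
#!/bin/sh
# Assumed locations; adjust to your own layout.
NUTCH=bin/nutch
DB=testmerge/db                               # hypothetical path for the new webdb
SEGMENT=testmerge/segments/20051023112848     # merged segment from the thread

# run: print the command, and execute it unless DRY_RUN=1 is set.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

# rebuild_webdb: create an empty webdb, then update it from the merged segment.
rebuild_webdb() {
  run "$NUTCH" admin "$DB" -create         # assumption: creates an empty webdb
  run "$NUTCH" updatedb "$DB" "$SEGMENT"   # the command suggested above
}
```

Calling "DRY_RUN=1 rebuild_webdb" only prints the two commands, which is a cheap way to check the paths before running "rebuild_webdb" for real.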

Andrey

-----Original Message-----
From: AJ Chen [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 25, 2005 4:44 PM
To: [email protected]
Subject: Re: merge indices from multiple webdb


How do you build a new webdb from the merged segment/index? Could you provide
detailed steps for the process you described? Thanks.

AJ

On 10/25/05, Andrey Ilinykh <[EMAIL PROTECTED]> wrote:
>
> If you merge two segments, the page ranks are off. You have to build a new
> webdb, calculate page rank, and then build one more segment again.
>
> Thank you,
> Andrey
>
> -----Original Message-----
> From: AJ Chen [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 25, 2005 2:02 PM
> To: [email protected]
> Subject: Re: merge indices from multiple webdb
>
>
> Thanks so much, Graham. This should do it.
> A related question: After the merge, is it possible to build the new webdb
> as well? The link data for the merged db can differ from that of the two
> original dbs. For accurate page ranking, the link data should be
> updated.
>
> AJ
>
> On 10/25/05, Graham Stead <[EMAIL PROTECTED]> wrote:
> >
> > I am by no means a Nutch expert yet, but this is how I merged two
> > separate segments so I could search through them:
> >
> > Step 1:
> > $ bin/nutch mergesegs -local -o testmerge -i
> > ../crawls/foo/segments/20051018224434/
> > ../crawls/bar/segments/20051018225505/
> > < bunch of stuff happens >
> >
> > This creates a segment 20051023112848 in the testmerge folder. The
> > segment contains a combined index as well as copies of all information
> > from the two input segments.
> >
> > Step 2:
> > This wasn't quite enough to search with, however. I copied the index
> > folder and organized the directories into the same structure as used
> > during a crawl, then was able to run the Tomcat searcher on the new
> > segment.
> >
> > After copying/moving/reorganizing I have:
> >
> > $ ls -l testmerge/
> > total 0
> > drwxrwxrwx+ 2 Oct 23 11:42 index
> > drwxrwxrwx+ 3 Oct 23 11:42 segments
> >
> > $ ls -l testmerge/segments/
> > total 0
> > drwxrwxrwx+ 7 Oct 23 11:28 20051023112848
> >
> >
> > Step 3:
> > Then place this in Tomcat's nutch-site.xml file:
> >
> > <nutch-conf>
> > <property>
> > <name>searcher.dir</name>
> > <value>C:\path_to_testmerge\testmerge</value>
> > </property>
> > </nutch-conf>
> >
> > Run Tomcat and search away.
> >
> > Hope this helps,
> > -Graham
> >
> > > -----Original Message-----
> > > From: AJ Chen [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, October 25, 2005 4:03 PM
> > > To: [email protected]
> > > Subject: merge indices from multiple webdb
> > >
> > > Has anyone merged indices from two separate webdbs? I have two
> > > separate webdbs and need to find a good way to combine them
> > > for unified search.
> > > AJ
> > >
> >
>


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
