Do you have that shell script?

On 3/30/06, Dan Morrill <[EMAIL PROTECTED]> wrote:
> Hi folks,
>
> It worked, it worked great, I made a shell script to do the work for me.
> Thank you, thank you, and again, thank you.
>
> r/d
>
> -----Original Message-----
> From: Dan Morrill [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 30, 2006 5:12 AM
> To: [email protected]
> Subject: RE: Multiple crawls how to get them to work together
>
> Aled,
>
> I'll try that today, excellent, and thanks for the heads up on the db
> directory. I'll let you know how it goes.
>
> r/d
>
>
>
> -----Original Message-----
> From: Aled Jones [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 30, 2006 12:24 AM
> To: [email protected]
> Subject: RE: Multiple crawls how to get them to work together
>
> Hi Dan
>
> I'll presume you've done the crawls already..
>
> Each resulting crawled folder should have 3 folders, db, index and
> segments.
>
> Create your search.dir folder and create a segments folder in that.
>
> Each segments folder in each crawl folder should contain folders with
> timestamps as the names.  Copy the contents of:
>
> crawlA/segments
> crawlB/segments
> crawlC/segments
>
> (i.e. the folders with timestamps as names) into:
>
> search.dir/segments
>
> Next, delete the duplicates from the segments by running the command:
>
> bin/nutch dedup -local search.dir/segments
>
> Then you need to merge the segments to create an index folder, so run
> the command:
>
> bin/nutch merge -local search.dir/index search.dir/segments/*
>
> You should now have two folders in your search.dir:
> search.dir/segments
> search.dir/index
>
> That's all you need for serving pages (db folder is only used when
> fetching).
>
> Now just set the searcher.dir property value in nutch-site.xml to be the
> location of search.dir
>
> That's how I've been doing it, although it may not be the "right" way.
> :-) Hope this helps.
>
> Cheers
> Aled
>
>
> > -----Original Message-----
> > From: Dan Morrill [mailto:[EMAIL PROTECTED]
> > Sent: 29 March 2006 18:06
> > To: [email protected]
> > Cc: [EMAIL PROTECTED]
> > Subject: Multiple crawls how to get them to work together
> >
> > Hi folks,
> >
> >
> >
> > I have 3 crawls, crawlA, crawlB, and crawlC. I would like all
> > of them to be available to the search.jsp page.
> >
> >
> >
> > I went through the site, saw merge, index, make new db, and
> > followed all the directions that I could find, but still no
> > resolution on this one. So what I need are some ideas on
> > where to proceed from here. I intend to have 2 or
> > 3 boxes make a crawl, then somehow merge the crawls together
> > and form a "master" under search.dir. I would also want to
> > update this one on a regular basis.
> >
> >
> >
> > Unfortunately, the instructions to date have all been tried,
> > and have all led to the idea not working. There is also no
> > indexmerger or indexsegments directive in nutch 0.7.1. Any
> > support ideas, direct pointers, or even step-by-step
> > instructions on how to do this would be welcome (outside of
> > what is in the tutorials, because that has been tried already,
> > including support ideas in the user web mail list).
> >
> >
> >
> > Cheers/r/dan
> >
>
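
The steps Aled describes above (copy the timestamp-named segment folders,
dedup, then merge) can be sketched as a shell function. This is only a
rough sketch under assumptions, not the script Dan wrote: the function name
merge_crawls and the NUTCH variable are made up for illustration, and it
assumes Nutch 0.7-style crawl folders and that it is run from the Nutch
install directory so that bin/nutch resolves.

```shell
#!/usr/bin/env bash
# Sketch only: combines the segments of several crawl folders into one
# search directory, using the dedup and merge commands quoted above.
# NUTCH and merge_crawls are hypothetical names chosen for this example.

NUTCH="${NUTCH:-bin/nutch}"   # path to the nutch launcher script (assumed)

merge_crawls() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_crawls SEARCH_DIR CRAWL_DIR..." >&2
        return 1
    fi
    local search_dir="$1"
    shift

    mkdir -p "$search_dir/segments" || return 1

    # Copy every timestamp-named segment folder into SEARCH_DIR/segments.
    local crawl seg
    for crawl in "$@"; do
        for seg in "$crawl"/segments/*/; do
            [ -d "$seg" ] || continue          # skip crawls with no segments
            cp -R "${seg%/}" "$search_dir/segments/" || return 1
        done
    done

    # Delete duplicates across the combined segments.
    "$NUTCH" dedup -local "$search_dir/segments" || return 1

    # Merge the segments to create SEARCH_DIR/index.
    "$NUTCH" merge -local "$search_dir/index" "$search_dir"/segments/* || return 1

    # Last manual step from the instructions above.
    echo "Set searcher.dir in nutch-site.xml to: $search_dir"
}
```

After sourcing the file, a call such as
    merge_crawls search.dir crawlA crawlB crawlC
would follow the same sequence. If two crawls happen to contain a segment
folder with the same timestamp name, the copy step would mix their contents,
so that case would need to be handled separately.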


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
