Can you share the script with everyone?

On 3/31/06, Berlin Brown <[EMAIL PROTECTED]> wrote:
>
> Do you have that shell script?
>
> On 3/30/06, Dan Morrill <[EMAIL PROTECTED]> wrote:
> > Hi folks,
> >
> > It worked, it worked great. I made a shell script to do the work for me.
> > Thank you, thank you, and again, thank you.
> >
> > r/d
> >
> > -----Original Message-----
> > From: Dan Morrill [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, March 30, 2006 5:12 AM
> > To: [email protected]
> > Subject: RE: Multiple crawls how to get them to work together
> >
> > Aled,
> >
> > I'll try that today, excellent, and thanks for the heads up on the db
> > directory. I'll let you know how it goes.
> >
> > r/d
> >
> > -----Original Message-----
> > From: Aled Jones [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, March 30, 2006 12:24 AM
> > To: [email protected]
> > Subject: ATB: Multiple crawls how to get them to work together
> >
> > Hi Dan
> >
> > I'll presume you've done the crawls already.
> >
> > Each resulting crawl folder should have 3 folders: db, index and
> > segments.
> >
> > Create your search.dir folder and create a segments folder in that.
> >
> > Each segments folder in each crawl folder should contain folders with
> > timestamps as the names. Copy the contents of:
> >
> > crawlA/segments
> > crawlB/segments
> > crawlC/segments
> >
> > (i.e. the folders with timestamps as names) into:
> >
> > search.dir/segments
> >
> > Next, delete the duplicates from the segments by running the command:
> >
> > bin/nutch dedup -local search.dir/segments
> >
> > Then you need to merge the segments to create an index folder, so run
> > the command:
> >
> > bin/nutch merge -local search.dir/index search.dir/segments/*
> >
> > You should now have two folders in your search.dir:
> > search.dir/segments
> > search.dir/index
> >
> > That's all you need for serving pages (db folder is only used when
> > fetching).
> >
> > Now just set the searcher.dir property value in nutch-site.xml to be the
> > location of search.dir
> >
> > That's how I've been doing it, although it may not be the "right" way.
> > :-) Hope this helps.
> >
> > Cheers
> > Aled
> >
> > > -----Original Message-----
> > > From: Dan Morrill [mailto:[EMAIL PROTECTED]
> > > Sent: 29 March 2006 18:06
> > > To: [email protected]
> > > Cc: [EMAIL PROTECTED]
> > > Subject: Multiple crawls how to get them to work together
> > >
> > > Hi folks,
> > >
> > > I have 3 crawls, crawlA, crawlB, and crawlC. I would like all
> > > of them to be available to the search.jsp page.
> > >
> > > I went through the site, saw merge, index, make new db, and
> > > followed all the directions that I could find, but still no
> > > resolution on this one. So what I need are some ideas on
> > > where to proceed from here. I intend on having 2 or
> > > 3 boxes make a crawl, then somehow merge the crawls together
> > > and form a "master" under search.dir. I would also want to
> > > update this one on a regular basis.
> > >
> > > Unfortunately, the instructions to date have all been tried,
> > > and have all led to the idea not working. There is also no
> > > indexmerger or indexsegments directive in nutch 0.7.1. Any
> > > support ideas, direct pointers, or even step-by-step
> > > instructions on how to do this would be welcome (outside of
> > > what is in the tutorials, because that has been tried already,
> > > including support ideas in the user web mail list).
> > >
> > > Cheers/r/dan
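
Since Dan's script was never posted in this thread, here is a rough sketch of
what one could look like, following the steps Aled describes above. It is not
Dan's actual script. The function name merge_crawls and the NUTCH variable are
made up for illustration; the dedup and merge invocations are copied verbatim
from Aled's message and are assumed to behave as he describes on Nutch 0.7.x.

```shell
#!/bin/sh
# Sketch only: not Dan's script. merge_crawls and NUTCH are hypothetical names.
# The dedup and merge invocations are taken verbatim from Aled's message;
# run this from the Nutch install directory.

NUTCH="${NUTCH:-bin/nutch}"   # override to point at a different nutch launcher

# merge_crawls SEARCH_DIR CRAWL_DIR...
merge_crawls() {
    search_dir="$1"
    shift

    mkdir -p "$search_dir/segments" || return 1

    # Copy every timestamped segment folder from each crawl.
    for crawl in "$@"; do
        if [ ! -d "$crawl/segments" ]; then
            echo "skipping $crawl: no segments folder" >&2
            continue
        fi
        cp -R "$crawl"/segments/* "$search_dir/segments/" || return 1
    done

    # Delete duplicates across the combined segments.
    $NUTCH dedup -local "$search_dir/segments" || return 1

    # Merge the segments to create search.dir/index.
    $NUTCH merge -local "$search_dir/index" "$search_dir"/segments/* || return 1
}

# Example: merge_crawls search.dir crawlA crawlB crawlC
```

After it finishes, searcher.dir in nutch-site.xml still has to be pointed at
search.dir by hand, as in Aled's last step.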
-- www.babatu.com
