Hello everyone,

I am having a problem implementing cached text for our Nutch search engine. By cached text I mean the ability to store only the text of a website, without any embedded images or CSS files. I have not been able to get this working. I suspect this is because I need to filter out these images at the time that indexing happens in Nutch.
Any help would be really appreciated. I had to reply to this message since my new posts are not going through successfully on the mailing list.

Thanks,
Siddharth

> Date: Fri, 7 Mar 2008 18:10:13 +0000
> From: [EMAIL PROTECTED]
> To: [email protected]
> Subject: Re: merging indexes with nutch
>
> Thanks Tomislav, it worked beautifully.
>
> The other thing I found is that the index was not read by
> nutch because the index.done file was not created (as mentioned in
> http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_merge). So it
> seems like manually adding an empty index.done to that folder would do
> the job as well.
>
> Cheers
>
> On Wed, Mar 5, 2008 at 6:11 PM, Tomislav Poljak <[EMAIL PROTECTED]> wrote:
> > Hi,
> > try this:
> > bin/nutch merge crawl/index crawl/indexes crawl/indexes1
> >
> > where crawl/index (not indexes) should be created by merge and
> > crawl/indexes and crawl/indexes1 are existing indexes for merging. Nutch
> > search web application will use merged index from crawl/index and you
> > should see this in web application log:
> >
> > 2007-09-09 20:30:58,949 INFO searcher.NutchBean - creating new bean
> > 2007-09-09 20:30:59,128 INFO searcher.NutchBean - opening merged index
> > in /home/nutch/test/trunk/crawl/index
> >
> > Hope this helps,
> >
> > Tomislav
> >
> > On Tue, 2008-03-04 at 21:09 +0000, Boris Lau wrote:
> > > Hi all,
> > >
> > > I am having a problem with trying to get my merged index to be
> > > searched by nutch.
> > >
> > > I have used "bin/nutch merge" command to merge 2 indexes into one, but
> > > the nutch web-app would not be able to search the merged index (always
> > > return 0 items). I have examined the index in Luke and everything
> > > seems sane with the index (correct number of merged documents,
> > > segments references are correct, etc.). It is just that the webapp
> > > would give 0 output.
> > >
> > > Is there something that I am missing? Any advice on how I would debug it?
> > >
> > > Many thanks
> > > boris
> > >
> > > p.s. would anybody have any recommendation on an alternative way of
> > > examining index other than using Luke (e.g. command line interface)?
> > > java awt is painfully slow....
