Hello everyone

I am having a problem implementing cached text for our Nutch search engine.
By cached text I mean the ability to store only the text of a website,
without any embedded images or CSS files. I have not been able to get this
working. I think this is because I need to filter out these images while
Nutch is indexing.
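
To make the goal concrete, here is a minimal sketch of the output I am
after. It is plain Python using only the standard library, not Nutch code,
and the names TextOnlyExtractor and cached_text are made up for
illustration. It keeps the visible text of a page and skips the contents of
script and style elements; image and stylesheet tags drop out because they
carry no text of their own.

```python
from html.parser import HTMLParser


class TextOnlyExtractor(HTMLParser):
    """Hypothetical helper: collects visible text from an HTML page."""

    # Elements whose contents are code or styling, not page text.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # <img> and <link> tags are ignored here: they contain no text.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def cached_text(html):
    """Return only the text of an HTML document (illustrative sketch)."""
    parser = TextOnlyExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

What I cannot work out is where the equivalent filtering should happen in
Nutch.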

Any help would be really appreciated. I had to reply to this message because
my new posts are not getting through to the mailing list.

Thanks
Siddharth

> Date: Fri, 7 Mar 2008 18:10:13 +0000
> From: [EMAIL PROTECTED]
> To: [email protected]
> Subject: Re: merging indexes with nutch
> 
> Thanks Tomislav,  It worked beautifully.
> 
> The other solution I found is that the index was not read by Nutch
> because the index.done file was not created (as mentioned in
> http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_merge), so it
> seems that manually adding an empty index.done to that folder would do
> the job as well.
> 
> Cheers
> 
> On Wed, Mar 5, 2008 at 6:11 PM, Tomislav Poljak <[EMAIL PROTECTED]> wrote:
> > Hi,
> >  try this:
> >  bin/nutch merge crawl/index crawl/indexes crawl/indexes1
> >
> >  where crawl/index (not indexes) is the output created by merge, and
> >  crawl/indexes and crawl/indexes1 are the existing indexes to merge. The
> >  Nutch search web application will use the merged index from crawl/index,
> >  and you should see this in the web application log:
> >
> >  2007-09-09 20:30:58,949 INFO  searcher.NutchBean - creating new bean
> >  2007-09-09 20:30:59,128 INFO  searcher.NutchBean - opening merged index in /home/nutch/test/trunk/crawl/index
> >
> >  Hope this helps,
> >
> >  Tomislav
> >
> >
> >
> >
> >  On Tue, 2008-03-04 at 21:09 +0000, Boris Lau wrote:
> >  > Hi all,
> >  >
> >  > I am having a problem with trying to get my merged index to be
> >  > searched by nutch.
> >  >
> >  > I have used "bin/nutch merge" command to merge 2 indexes into one, but
> >  > the nutch web-app would not be able to search the merged index (always
> >  > return 0 items).  I have examined the index in Luke and everything
> >  > seems sane with the index (correct number of merged documents,
> >  > segments references are correct, etc.).  It is just that the webapp
> >  > would give 0 output.
> >  >
> >  > Is there something that I am missing?  Any advice on how I would debug
> >  > it?
> >  >
> >  > Many thanks
> >  > boris
> >  >
> >  > p.s. would anybody have any recommendation on an alternative way of
> >  > examining index other than using Luke (e.g. command line interface)?
> >  > java awt is painfully slow....
> >
> >
