Re: Documents in Nutch

dprantzalos Sat, 24 Sep 2005 10:13:06 -0700

My initial thought was to use Nutch to crawl a site and aggregate all the 
content into a single string (or file), save it to my database, and later use 
Lucene to index it as just another field alongside the merchant's other 
fields. I think that would work, but I don't know if there is a better way.
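
To make the idea concrete, here is a minimal sketch in plain Python. It is not Nutch or Lucene code: the toy inverted index only stands in for Lucene, and the function names, the merchant id, and the sample data are all made up for illustration. It shows one record per merchant, with the crawled page text joined into a single "content" field next to the name and description, so a query can match terms from either place.

```python
from collections import defaultdict

def build_merchant_document(merchant, pages):
    """Combine merchant metadata and crawled page text into one record."""
    return {
        "name": merchant["name"],
        "description": merchant["description"],
        # All crawled page text joined into a single field.
        "content": "\n".join(pages),
    }

def build_index(documents):
    """Toy inverted index (stand-in for Lucene): token -> set of merchant ids."""
    index = defaultdict(set)
    for merchant_id, doc in documents.items():
        for field_text in doc.values():
            for token in field_text.lower().split():
                index[token.strip('.,;:!?"()')].add(merchant_id)
    return index

def search(index, *terms):
    """Return the ids of merchants whose fields contain every term (AND)."""
    hits = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*hits) if hits else set()

# Hypothetical sample data: "sapphire" appears only in the description,
# "beanie baby" only in the crawled page content.
merchant = {
    "name": "XYZ Collectibles",
    "description": "Sapphire jewelry and collectible toys.",
}
pages = [
    "Welcome to our store.",
    "New this week: a rare beanie baby bear.",
]
documents = {"xyzcollectibles": build_merchant_document(merchant, pages)}
index = build_index(documents)

matches = search(index, "sapphire", "beanie", "baby")
```

Because the description and the page content end up in the same record, the merchant matches even though no single page contains all of the terms.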


-jim

-------------- Original message -------------- 

> I'm sorry I partially missed the point. I don't know how the internals 
> of the indexing work, maybe someone else here can give an explanation? 
> 
> [EMAIL PROTECTED] wrote: 
> 
> >But if I do that, then the other fields wouldn't get indexed, would they?
> >What if I wanted to search for the keywords "sapphire" (which might only
> >appear in the general description for the merchant), and "beanie baby"
> >(which might appear in the content on one of their pages)? Wouldn't both
> >need to be indexed?
> >
> >What you're suggesting would only allow me to have a search engine for my
> >pages, correct? I think what I'm asking is: can I use Nutch as a search
> >engine for my pages in addition to some additional metadata about the
> >content, and if so, how? Would I have to tag each of the pages with the
> >metadata? That seems like a lot of redundancy, i.e. if I index 50 pages
> >for merchant www.xyzcollectibles.com, they're all going to have the same
> >name, general description, etc. in their metadata.
> > 
> > 
> > 
> >>Add an unique identifier to the document and use a separate external 
> >>database. 
> >> 
> >> 
>
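
For comparison, the quoted suggestion (a unique identifier on each document plus a separate external database) could look roughly like the sketch below. It uses Python's built-in sqlite3 module as the external database. The table layout, the "xyz" identifier, the page paths, and the hard-coded search hits are assumptions for illustration; in practice the hits would come from the search index.

```python
import sqlite3

# External database holding the merchant metadata exactly once.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE merchants (id TEXT PRIMARY KEY, name TEXT, description TEXT)"
)
conn.execute(
    "INSERT INTO merchants VALUES (?, ?, ?)",
    ("xyz", "XYZ Collectibles", "Sapphire jewelry and collectible toys."),
)

# Hypothetical search hits: each indexed page carries only the merchant id,
# not a copy of the merchant's name and description.
page_hits = [
    {"url": "http://www.xyzcollectibles.com/bears", "merchant_id": "xyz"},
    {"url": "http://www.xyzcollectibles.com/rings", "merchant_id": "xyz"},
]

def attach_merchant(hits, connection):
    """Look up each hit's merchant by id and return enriched copies."""
    enriched = []
    for hit in hits:
        row = connection.execute(
            "SELECT name, description FROM merchants WHERE id = ?",
            (hit["merchant_id"],),
        ).fetchone()
        name, description = row if row else (None, None)
        enriched.append(
            {**hit, "merchant_name": name, "merchant_description": description}
        )
    return enriched

results = attach_merchant(page_hits, conn)
```

This avoids repeating the metadata on all 50 pages, but as the reply above points out, a term that appears only in the merchant description would not be searchable through the page index unless that text is indexed as well.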
