Thanks, Chris, for replying to my question. So I'm thinking about using a CMS:
when somebody publishes a page in the CMS, I would generate a well-structured
XML file and feed that XML to Solr to index the data. Then I can simply do
faceted search using the correct Lucene query format, right? Do you have any
other ideas or comments on my CMS approach?
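To make the publish-time idea concrete, here is a minimal sketch. It assumes the CMS can hand you each published page as a dict of already-separated values. The function name page_to_solr_xml and the field names (id, manufacturer, tags) are hypothetical, and every field you emit would have to be declared in your Solr schema.xml. The function builds Solr's XML update message (<add><doc><field name="...">). List values become repeated <field> elements, which suits multi-valued schema fields:

```python
import xml.etree.ElementTree as ET


def page_to_solr_xml(page: dict) -> str:
    """Build a Solr XML <add><doc>...</doc></add> message for one CMS page.

    Hypothetical helper: each key in `page` becomes its own <field>,
    and list values become repeated fields (multi-valued in the schema).
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in page.items():
        # Treat scalars as single-element lists so both cases share one loop.
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(v)  # ElementTree escapes &, < and > for us
    return ET.tostring(add, encoding="unicode")


print(page_to_solr_xml({"id": "page-42", "manufacturer": "Acme"}))
```

Running it prints <add><doc><field name="id">page-42</field><field name="manufacturer">Acme</field></doc></add>. You would then POST that message to Solr's update handler, followed by a <commit/>. Because the CMS already knows what each value means, you don't have to scrape anything afterwards.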
  Cheers,
  Niraj

Chris Hostetter <[EMAIL PROTECTED]> wrote:
  
: search for documents. I'm planning to use Nutch to crawl that website
: and use Solr to cluster my search results. I tried integrating Nutch
: with Solr following FooFactory.com's blog, but I could not follow
: a few of the steps as I'm very new to both of them. If any of you have
: implemented this, can you please give me suggestions or code snippets so
: that I can implement them to achieve "faceted search"? Any help would
: be appreciated.

I'm not very familiar with the Nutch/Solr hybrid stuff some people have
done, but faceting requires that you have well-structured fields
containing discrete pieces of information ... i.e. if you want to facet
cameras on manufacturer, megapixels, weight, and battery life, you need
separate fields for manufacturer, megapixels, weight, and battery life ...
I'm not sure that Nutch is going to be able to do that for you.
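Once those separate fields exist, a faceted request is just the normal query plus Solr's facet parameters. Here is a rough sketch. It assumes a Solr instance at http://localhost:8983/solr (the example server's default address). The helper name facet_query_url and the field names are hypothetical. It uses facet=true, one facet.field per field to count on, and optional fq filter queries (in Lucene syntax) to drill down into a facet value the user clicked:

```python
from typing import Sequence
from urllib.parse import urlencode


def facet_query_url(base_url: str, query: str, facet_fields: Sequence[str],
                    filters: Sequence[str] = ()) -> str:
    """Build a Solr /select URL that returns facet counts.

    Hypothetical helper: `facet_fields` must be separate, discrete-valued
    fields in the index; `filters` are Lucene-syntax fq clauses such as
    "manufacturer:Acme" used to narrow results to a chosen facet value.
    """
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", f) for f in facet_fields]  # one param per field
    params += [("fq", fq) for fq in filters]              # drill-down filters
    return base_url + "/select?" + urlencode(params)


print(facet_query_url("http://localhost:8983/solr", "camera",
                      ["manufacturer", "megapixels"]))
```

This prints http://localhost:8983/solr/select?q=camera&facet=true&facet.field=manufacturer&facet.field=megapixels. The counts come back per distinct value of each field, which is why lumping everything into one big text field (as a generic crawl would) gives you nothing useful to facet on.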

Extracting structured data out of webpages like that without writing
custom parser code for each website layout is a pretty weighty data
harvesting problem.

-Hoss


