: search for documents. I'm planning to use Nutch to crawl that website
: and use Solr to cluster my search results. I tried integrating Nutch
: with Solr following FooFactory.com's blog ... but I could not follow
: a few of the steps as I'm very new to both of them. If any of you have
: implemented this, can you please give me suggestions or code snippets so
: that I can implement them to achieve "faceted search". Any help would
: be appreciated.
I'm not very familiar with the Nutch/Solr hybrid stuff some people have done, but faceting requires that you have well-structured fields containing discrete pieces of information ... ie: if you want to facet cameras on manufacturer, megapixels, weight, and battery life, you need separate fields for manufacturer, megapixels, weight, and battery life ... I'm not sure that Nutch is going to be able to do that for you. Extracting structured data out of webpages like that without writing custom parser code for each website layout is a pretty weighty data harvesting problem.

-Hoss
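
To make the "separate fields" point concrete, here is a rough sketch of what a facet request could look like once the data really is split into fields. The `facet`, `facet.field`, `rows` and `wt` parameters are standard Solr request parameters; the host, the core name ("cameras") and the field names are just assumptions for illustration, not anything Nutch will set up for you.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and core name; adjust to your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/cameras/select"


def build_facet_query(query, facet_fields):
    """Return a Solr select URL asking for facet counts on each given field."""
    params = [
        ("q", query),
        ("rows", "0"),      # we only want the facet counts, not the documents
        ("facet", "true"),  # turn faceting on
        ("wt", "json"),
    ]
    # facet.field is repeated once per field to facet on, which is why each
    # facetable property has to live in its own field in the index.
    params += [("facet.field", name) for name in facet_fields]
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Field names are assumed to exist in the schema (hypothetical).
url = build_facet_query("*:*", ["manufacturer", "megapixels"])
print(url)
```

If the crawl only gives you one big "content" field per page, there is nothing useful to pass as `facet.field`, which is the real obstacle here.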