OK, nice. So it's possible. Do you think this is a better method than scraping
with a separate tool? It seems to me that it is, because it fits my end state
(Solr faceted search) better and lets me remove layers of complexity.
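
To make the end state concrete, here is a rough sketch of the kind of range-facet
request I have in mind for the "digital camera" example. It only builds the
request URL. The Solr address, the numeric field name `price`, and the bucket
bounds are all assumptions for illustration, and it presumes the custom parse
and indexing filters have already put a price value into each document.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, field="price", start=0, end=1000, gap=100):
    """Build a Solr select URL that requests range facets on one numeric field.

    base_url and field are hypothetical; adjust them to the real schema.
    """
    params = [
        ("q", text),
        ("facet", "true"),
        # Range faceting: one bucket per `gap` between `start` and `end`.
        ("facet.range", field),
        (f"f.{field}.facet.range.start", start),
        (f"f.{field}.facet.range.end", end),
        (f"f.{field}.facet.range.gap", gap),
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local Solr instance; nothing is sent, the URL is only printed.
    print(build_facet_query("http://localhost:8983/solr", "digital camera"))
```

With a gap of 100 this would give buckets such as 100-200 and 200-300, which is
roughly the "Price $100-200" style of facet from my original question. A
resolution facet would need its own field and its own range settings.
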
On Sep 12, 2011 8:03 PM, "Markus Jelsma-2 [via Lucene]" <
[email protected]> wrote:
>
>
> Yes you can. As Ken replied in your Solr thread you must create custom parse
> and indexing filters. The parse filter is needed to extract the information
> and store it in the document and the index filter is used to pass that new
> information to the Solr index.
>
>
> On Monday 12 September 2011 12:55:49 dpt9876 wrote:
>> Hi, the friendly guys at the Solr user group pointed me here.
>>
>> I am wondering if Nutch/Solr will do the following for a project I am
>> working on.
>> I want to create a search engine with facets for potentially hundreds of
>> websites.
>> Similar to say crawling amazon + buy.com + ebay and someone can search
>> these 3 sites from my 1 website.
>> (I realise there are better ways of doing the above example; it's for
>> illustrative purposes).
>> Eventually I would build that search crawl to index say 200 or 1000
>> merchants.
>> Someone would come to my site and search for "digital camera".
>>
>> They would get results from all 3 indexes and hopefully dynamic facets eg
>> Price $100-200
>> Price 200-300
>> Resolution 1mp-2mp
>>
>> etc etc
>>
>> Can this be done on the fly?
>>
>> I ask this because I am currently developing webscrapers to crawl these
>> websites, dump that data into a db, then was thinking of tacking on a solr
>> server to crawl my db.
>>
>> Problem with that approach is that crawling the world's ecommerce sites will
>> take forever, when it seems solr might do that for me? (I have read about
>> multiple indexes etc).
>>
>> Many thanks
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Will-Solr-Nutch-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3329346p3329346.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Will-Solr-Nutch-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3329346p3329454.html
Sent from the Nutch - User mailing list archive at Nabble.com.
