Re: Crawling relation database

Markus Jelsma Tue, 05 Jul 2011 15:55:41 -0700

Hi Lewis,

It sounds to me like you'd be better off using Solr's very advanced 
DataImportHandler [1]. It can (delta) import data from your RDBMS and offers 
a lot of flexibility in how to transform entities.
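
As a rough sketch of how an import gets triggered once a handler is set up: 
DIH is driven over HTTP with a "command" parameter (full-import, delta-import, 
status). The snippet below only builds the request URL. The base URL and the 
"/dataimport" handler path are assumptions (the path is whatever you register 
in solrconfig.xml), and dih_url is a made-up helper name, not part of Solr.

```python
from urllib.parse import urlencode


def dih_url(solr_base, command="delta-import", handler="/dataimport", **params):
    """Build a DataImportHandler request URL.

    solr_base and handler are assumptions about the local setup; the
    handler path is whatever is registered in solrconfig.xml.
    """
    # "command" selects the DIH operation; extra params (e.g. commit) follow it.
    query = {"command": command}
    query.update(params)
    return solr_base.rstrip("/") + handler + "?" + urlencode(query)


if __name__ == "__main__":
    # Hypothetical local Solr instance; fetch this URL to start a delta import.
    print(dih_url("http://localhost:8983/solr", commit="true"))
```

Requesting that URL (with curl, a cron job, whatever) starts a delta import; 
swapping in command="status" lets you poll for progress.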


Besides crawling, you also mention you'd like to push results (of what?) to 
another structured data store. But why would you want that? Handling, 
processing and serving search results is done by Solr (and ES in the future), 
and since our entities are flat (just a document) it makes more sense to me to 
save documents in a document (or column) oriented DB.

[1]: http://wiki.apache.org/solr/DataImportHandler

Cheers,

> Hi,
> 
> I'm curious to hear if anyone has information on configuring Nutch to
> crawl an RDB such as MySQL. In my hypothetical example there are N
> databases residing in various geographically distributed locations. To
> make it a worst-case scenario, say that they are NOT all the same type. I
> wish to use Nutch trunk 2.0 to push the results to some other structured
> data store, which I can then connect to in order to serve search results.
> 
> Does anyone have any information such as an overview of database crawling
> and serving using Nutch? I have been unsuccessful in obtaining info on the
> Web, as query results are ambiguous and usually refer to crawldb or linkdb.
> 
> If I can get this, it would be a really nice entry for inclusion in our wiki.
> 
> Thanks for any suggestions or info.
