Re: Crawling relation database

Markus Jelsma Tue, 05 Jul 2011 16:37:08 -0700

Hi,

About geographical search: Solr will do this for you. It is built in for 3.x+ and 
available through third-party plugins for 1.4.x; the two provide different features. 
In Solr you would not base similarity on geographical data, but instead use spatial 
data to boost textually similar documents, or to filter.


This keeps text similarity intact and offers spatial features on top.
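
To make the boost-versus-filter distinction concrete, here is a minimal sketch that only builds the request URL and does not contact a server. It assumes a Solr 3.x core at a hypothetical base URL with a hypothetical spatial field named "location"; the helper name and the constants in the recip() boost are illustrative choices, not recommendations. The geofilt parser, the geodist() function and the sfield/pt/d parameters are the built-in 3.x spatial pieces.

```python
from urllib.parse import urlencode


def build_spatial_query(base_url, text, lat, lon, radius_km,
                        sfield="location", filter_only=False):
    """Build a Solr select URL that filters or boosts by distance.

    Assumptions: 'location' is a hypothetical spatial field, and the
    core's dismax setup (qf etc.) is configured elsewhere.
    """
    params = {
        "q": text,               # the textual query still drives relevance
        "defType": "dismax",
        "sfield": sfield,        # field holding the document's coordinates
        "pt": f"{lat},{lon}",    # the searcher's position
        "d": radius_km,          # distance in km, used by geofilt
        "wt": "json",
    }
    if filter_only:
        # Hard cut-off: only documents within d km of pt are returned.
        params["fq"] = "{!geofilt}"
    else:
        # Soft preference: nearer documents get a higher additive boost,
        # on top of the unchanged text score.
        params["bf"] = "recip(geodist(),2,200,20)"
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_spatial_query("http://localhost:8983/solr",
                              "office refurbishment", 55.86, -4.25, 10))
```

With filter_only=True the same call returns a URL that restricts results to the radius instead of boosting them; both can of course be combined in one request.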

You'll get more feedback on the Solr list indeed :)

Cheers

> Thanks for this Markus, it had occurred to me that DIH was a very plausible
> solution to progress with. I think you have just confirmed this, due to the
> flexibility it offers amongst other attributes.
> 
> I'm looking at creating a context aware web application which would use
> geographical search to obtain results based on location. This is required
> as the data will contain (amongst others) fields with integer values which
> vary dependent upon a building location cost index. Similarity is directly
> linked through geographical location factor. I wanted to have the data
> stored within the n number of distributed RDB's available in a cloud
> environment which could be searched as opposed to the non-trivial task of
> searching across a fragmented, distributed number of DB's.
> 
> As you mention, it does make more sense to save documents in a doc (or
> column) oriented DB.
> 
> Essentially, using the DIH tool would remove requirement for Nutch?
> 
> I think to progress with this, I'm best moving the thread to Solr-user@ if
> I have further questions.
> 
> Thank you
> 
> On Tue, Jul 5, 2011 at 3:53 PM, Markus Jelsma <[email protected]> wrote:
> > Hi Lewis,
> > 
> > It sounds to me you'd be better off using Solr's very advanced
> > DataImportHandler [1]. It can (delta) import data from your RDBMSs and
> > offers much flexibility on how to transform entities.
> > 
> > Besides crawling you also mention you'd like to push results (of what)
> > to another structured data store. But why would you want that? Handling,
> > processing and serving search results is done by Solr (and ES in the
> > future) and since our entities are flat (just a document) it makes more
> > sense to me to save documents in a document (or column) oriented DB.
> > 
> > [1]: http://wiki.apache.org/solr/DataImportHandler
> > 
> > Cheers,
> > 
> > > Hi,
> > > 
> > > I'm curious to hear if anyone has information for configuring Nutch to
> > > crawl a RDB such as MySQL. In my hypothetical example there are N
> > > number of databases residing in various distributed geographical
> > > locations, to make a worst case scenario, say that they are NOT all
> > > the same type, and I wish to use Nutch trunk 2.0 to push the results
> > > to some other structured data store which I can then connect to to
> > > serve search results.
> > > 
> > > Does anyone have any information such as an overview of database
> > > crawling and serving using Nutch? I have been unsuccessful obtaining
> > > info on the Web as query results are ambiguous and usually refer to
> > > crawldb or linkdb.
> > > 
> > > If I can get this it would be a real nice entry for inclusion in our
> > > wiki.
> > > 
> > > Thanks for any suggestions or info.
