Re: Crawling relation database

lewis john mcgibbney Tue, 05 Jul 2011 16:40:14 -0700

Thanks to you both.

On Tue, Jul 5, 2011 at 4:35 PM, Markus Jelsma <[email protected]> wrote:


> Hi,
>
> About geographical search: Solr will do this for you. It is built in for
> 3.x+ and available through third-party plugins for 1.4.x; the two provide
> different features. In Solr you would not base similarity on geographical
> data; instead you would use spatial data to boost, or filter, textually
> similar documents.
>
> This keeps text similarity intact and offers spatial features on top.
>
> You'll get more feedback on the Solr list indeed :)
>
> Cheers
>
> > Thanks for this Markus, it had occurred to me that DIH was a very
> > plausible solution to progress with. I think you have just confirmed
> > this, given the flexibility it offers amongst other attributes.
> >
> > I'm looking at creating a context-aware web application which would use
> > geographical search to obtain results based on location. This is required
> > as the data will contain (amongst others) fields with integer values which
> > vary depending on a building location cost index. Similarity is directly
> > linked through a geographical location factor. I wanted to have the data
> > stored within the n distributed RDBs available in a cloud environment,
> > which could be searched, as opposed to the non-trivial task of searching
> > across a fragmented, distributed number of DBs.
> >
> > As you mention, it does make more sense to save documents in a doc (or
> > column) oriented DB.
> >
> > Essentially, using the DIH tool would remove the requirement for Nutch?
> >
> > I think to progress with this, I'm best moving the thread to Solr-user@ if
> > I have further questions.
> >
> > Thank you
> >
> > On Tue, Jul 5, 2011 at 3:53 PM, Markus Jelsma <[email protected]> wrote:
> > > Hi Lewis,
> > >
> > > It sounds to me like you'd be better off using Solr's very advanced
> > > DataImportHandler [1]. It can (delta) import data from your RDBMSs and
> > > offers much flexibility in how to transform entities.
> > >
> > > Besides crawling, you also mention you'd like to push results (of what?)
> > > to another structured data store. But why would you want that? Handling,
> > > processing and serving search results is done by Solr (and ES in the
> > > future), and since our entities are flat (just a document) it makes more
> > > sense to me to save documents in a document (or column) oriented DB.
> > >
> > > [1]: http://wiki.apache.org/solr/DataImportHandler
> > >
> > > Cheers,
> > >
> > > > Hi,
> > > >
> > > > I'm curious to hear if anyone has information on configuring Nutch to
> > > > crawl an RDB such as MySQL. In my hypothetical example there are N
> > > > databases residing in various distributed geographical locations. To
> > > > make it a worst-case scenario, say that they are NOT all the same
> > > > type, and I wish to use Nutch trunk 2.0 to push the results to some
> > > > other structured data store, which I can then connect to in order to
> > > > serve search results.
> > > >
> > > > Does anyone have any information such as an overview of database
> > > > crawling and serving using Nutch? I have been unsuccessful obtaining
> > > > info on the Web, as query results are ambiguous and usually refer to
> > > > crawldb or linkdb.
> > > >
> > > > If I can get this it would be a real nice entry for inclusion in our
> > > > wiki.
> > >
> > > > Thanks for any suggestions or info.
>
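
For illustration, Markus's point in the quoted reply (keep text similarity
intact and apply spatial data as a filter or boost on top) could be sketched
as the request parameters below. This is a minimal sketch, not a tested
setup: it assumes Solr 3.1+ spatial search with the edismax parser, and the
field name "location", the base URL and the coordinates are hypothetical.

```python
from urllib.parse import urlencode


def spatial_query_params(text, lat, lon, radius_km, location_field="location"):
    """Build Solr request parameters: a text query plus a spatial filter and boost.

    "location" is a hypothetical field name; it would need to be a spatial
    (lat/lon) field in the Solr schema.
    """
    return {
        "q": text,                          # text relevance stays the main score
        "defType": "edismax",               # assumes the edismax parser is available
        "sfield": location_field,           # spatial field used by geofilt/geodist
        "pt": f"{lat},{lon}",               # centre point as "lat,lon"
        "d": str(radius_km),                # filter radius in kilometres
        "fq": "{!geofilt}",                 # filter: keep only documents within d of pt
        "bf": "recip(geodist(),2,200,20)",  # boost: nearer documents score higher
    }


def spatial_query_url(base_url, text, lat, lon, radius_km, location_field="location"):
    """Return the full select URL for the parameters above."""
    params = spatial_query_params(text, lat, lon, radius_km, location_field)
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical Solr endpoint and query, for illustration only.
    print(spatial_query_url("http://localhost:8983/solr/select",
                            "office refurbishment", 55.86, -4.25, 50))
```

Dropping the "fq" entry keeps all textual matches and only boosts nearby
ones; dropping "bf" filters by distance without changing the ranking.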



-- 
*Lewis*
