I'd think that this would be more a case for the universal exporter (a.k.a.
multiple indexing backends) that we have mentioned several times. The REST API
is more a way of piloting a crawl remotely. It could certainly be twisted
into doing all sorts of things, but I am not sure it would be very
practical when dealing with very large data. Instead, having a pluggable
exporter would allow you to define which backend you want to send the data
to and what transformations to do on the way (e.g. convert to JSON).
Alternatively, a good old custom MapReduce job is the way to go.
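
Just to illustrate what I mean by a pluggable exporter, here is a very rough
sketch. None of these names exist in Nutch at the moment; the Exporter
interface, JsonLineExporter and the "path" config key are purely made up for
the example:

// Hypothetical sketch only: illustrates the "pluggable exporter" idea,
// not an actual Nutch API.
import java.io.IOException;
import java.util.Map;

/** A backend-agnostic exporter: implementations decide where records go
 *  and how they are transformed on the way (e.g. to JSON). */
public interface Exporter {
    void open(Map<String, String> config) throws IOException;
    void write(String url, Map<String, String> fields) throws IOException;
    void close() throws IOException;
}

/** Example implementation that serialises each record as one JSON line. */
class JsonLineExporter implements Exporter {
    private java.io.Writer out;

    @Override
    public void open(Map<String, String> config) throws IOException {
        // "path" is an assumed config key for this sketch
        out = new java.io.FileWriter(config.get("path"));
    }

    @Override
    public void write(String url, Map<String, String> fields) throws IOException {
        StringBuilder json = new StringBuilder("{\"url\":\"")
            .append(escape(url)).append("\"");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            json.append(",\"").append(escape(e.getKey())).append("\":\"")
                .append(escape(e.getValue())).append("\"");
        }
        json.append("}\n");
        out.write(json.toString());
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    private String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}

A Solr backend, an HBase/Gora table, a plain file dump, etc. would then just
be different implementations selected via configuration.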

HTH

Jul

On 10 July 2012 22:42, Lewis John Mcgibbney <[email protected]> wrote:

> Hi,
>
> I am looking to create a dataset for use in an example scenario where
> I want to create all the products you would typically find in the
> online Amazon store, e.g. loads of products with different categories,
> prices, titles, availability, condition, etc. One way I was thinking
> of doing this was using the above API written into Nutch 2.X to get
> the results as JSON; these could then hopefully be loaded into my
> product table in my datastore and we could begin to build up the
> database of products.
>
> Having never used the REST API directly, I wonder if anyone has any
> information on this and whether I can obtain some direction relating
> to producing my crawl results as JSON. I'm also going to look into
> Andrzej's patch in NUTCH-932, so I'll try to update this thread once
> I make some progress with it.
>
> Thanks in advance for any sharing of experiences with this one.
>
> Best
> Lewis
>
> --
> Lewis
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
