I'd think this would be more a case for the universal exporter (a.k.a. multiple indexing backends) that we have mentioned several times. The REST API is more a way of piloting a crawl remotely. It could certainly be twisted into doing all sorts of things, but I am not sure it would be very practical when dealing with very large amounts of data. Instead, a pluggable exporter would let you define which backend to send the data to and what transformations to apply on the way (e.g. convert to JSON). Alternatively, a good old custom MapReduce job is the way to go; a rough sketch of that approach is below.
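To illustrate the MapReduce route, here is a minimal sketch (not Nutch code): a map-only Hadoop job that turns tab-separated crawl dump records into one JSON object per line. The field layout (url, title, price) and the class names are assumptions purely for illustration; in practice you would read the records straight from your Gora-backed store or from whatever dump format you produce.

public class JsonExportJob {

  // Map-only job: each input line (url \t title \t price) becomes one JSON line.
  public static class JsonMapper
      extends org.apache.hadoop.mapreduce.Mapper<
          org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text,
          org.apache.hadoop.io.NullWritable, org.apache.hadoop.io.Text> {

    // Escape the characters that would break a JSON string value.
    private static String esc(String s) {
      return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    @Override
    protected void map(org.apache.hadoop.io.LongWritable key,
                       org.apache.hadoop.io.Text value, Context context)
        throws java.io.IOException, InterruptedException {
      String[] fields = value.toString().split("\t");
      if (fields.length < 3) {
        return; // skip malformed records
      }
      String json = String.format(
          "{\"url\":\"%s\",\"title\":\"%s\",\"price\":\"%s\"}",
          esc(fields[0]), esc(fields[1]), esc(fields[2]));
      context.write(org.apache.hadoop.io.NullWritable.get(),
                    new org.apache.hadoop.io.Text(json));
    }
  }

  public static void main(String[] args) throws Exception {
    org.apache.hadoop.mapreduce.Job job = org.apache.hadoop.mapreduce.Job
        .getInstance(new org.apache.hadoop.conf.Configuration(),
                     "crawl-dump-to-json");
    job.setJarByClass(JsonExportJob.class);
    job.setMapperClass(JsonMapper.class);
    job.setNumReduceTasks(0); // map-only: no aggregation needed
    job.setOutputKeyClass(org.apache.hadoop.io.NullWritable.class);
    job.setOutputValueClass(org.apache.hadoop.io.Text.class);
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat
        .addInputPath(job, new org.apache.hadoop.fs.Path(args[0]));
    org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
        .setOutputPath(job, new org.apache.hadoop.fs.Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The output is one JSON document per line, which is easy to bulk-load into most backends; a pluggable exporter would essentially do the same transformation but push the documents to the backend directly instead of writing them to HDFS.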
HTH

Jul

On 10 July 2012 22:42, Lewis John Mcgibbney <[email protected]> wrote:
> Hi,
>
> I am looking to create a dataset for use in an example scenario where
> I want to create all the products you would typically find in the
> online Amazon store, e.g. loads of products with different categories,
> different prices, titles, availability, condition, etc. One way
> I was thinking of doing this was using the above API written into
> Nutch 2.X to get the results as JSON; these could then hopefully be
> loaded into my product table in my datastore and we could begin to
> build up the database of products.
>
> Having never used the REST API directly, I wonder if anyone has any
> information on this and whether I can obtain some direction relating
> to producing my crawl results as JSON. I'm also going to look into
> Andrzej's patch in NUTCH-932, so I'll try to update this thread
> once I make some progress with it.
>
> Thanks in advance for any sharing of experiences with this one.
>
> Best
> Lewis
>
> --
> Lewis

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

