Thanks Julien,

Looking at Andrzej's comments on the issue, I saw that, as you mention, he has run a full crawl using POST over REST and retrieved the results as JSON, which sounded like an appealing way to get moving.
From my own point of view, Nutch 2.X appears 'closer' to the model required for a multiple-backends implementation, although there is still quite a bit of work to do here. What I am slightly confused about, and what hasn't been mentioned on this particular issue, is whether individual Gora modules would make up part of the stack or whether the abstraction would somehow be written on the Nutch side. Of course, this gets a bit more tricky when we begin thinking about the current 1.X and how to progress with a suitable long-term vision. This is all speculation on my side, so any vision you have to share would be ideal.

Thanks
Lewis

On Wed, Jul 11, 2012 at 8:54 AM, Julien Nioche <[email protected]> wrote:
> I'd think that this would be more a case for the universal exporter (a.k.a
> multiple indexing backends) that we mentioned several times. The REST API
> is more a way of piloting a crawl remotely. It could certainly be twisted
> into doing all sorts of things but I am not sure it would be very
> practical when dealing with very large data. Instead having a pluggable
> exporter would allow you to define what backend you want to send the data
> to and what transformations to do on the way (e.g. convert to JSON).
> Alternatively a good old custom map reduce job based is the way to go.
>
> HTH
>
> Jul
>
> On 10 July 2012 22:42, Lewis John Mcgibbney <[email protected]>wrote:
>
>> Hi,
>>
>> I am looking to create a dataset for use in an example scenario where
>> I want to create all the products you would typically find in the
>> online Amazon store e.g. loads of products with different categories,
>> different prices, titles, availability, condition etc etc etc. One way
>> I was thinking of doing this was using the above API written into
>> Nutch 2.X to get the results as JSON these could then hopefully be
>> loaded into my product table in my datastore and we could begin to
>> build up the database of products.
>>
>> Having never used the REST API directly I wonder if anyone has any
>> information on this and whether I can obtain some direction relating
>> to producing my crawl results as JSON. I'm also going to look into
>> Andrzej's patch in NUTCH-932 also so I'll try to update this thread
>> once I make some progress with it.
>>
>> Thanks in advance for any sharing of experiences with this one.
>>
>> Best
>> Lewis
>>
>> --
>> Lewis
>>
>
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble

--
Lewis

