Thanks Julien

Looking at Andrzej's comments on the issue, I saw that, as you mention,
he has run a full crawl using POST over REST and retrieved the results
as JSON, which sounded like an appealing way to get moving.

From my own point of view, Nutch 2.X appears 'closer' to the model
required for a multiple-backends implementation, although there is
still quite a bit of work to do here. What I am slightly confused
about, and what hasn't been mentioned on this particular issue, is
whether individual Gora modules would make up part of the stack or
whether the abstraction would somehow be written on the Nutch side.
Of course, this gets a bit trickier when we begin thinking about the
current 1.X and how to progress with a suitable long-term vision.
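
To make the question concrete, here is a minimal sketch of the kind of
abstraction I have in mind if it lived on the Nutch side. Every name in
it (ExportBackend, InMemoryBackend, JsonTransform, PluggableExporter,
ExporterDemo) is made up for illustration; none of this is existing
Nutch or Gora API, and the JSON encoding is deliberately naive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical: a destination for exported records (file, index, datastore, ...).
interface ExportBackend {
    void write(String record);
}

// Hypothetical backend that simply collects records in memory.
class InMemoryBackend implements ExportBackend {
    private final List<String> lines = new ArrayList<>();

    @Override
    public void write(String record) {
        lines.add(record);
    }

    List<String> lines() {
        return lines;
    }
}

// Hypothetical transformation: naive JSON for flat string fields.
// It escapes only backslashes and double quotes, so it is not a general encoder.
class JsonTransform {
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append('}').toString();
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}

// Hypothetical exporter: applies the configured transformation, then
// hands the result to whichever backend was plugged in.
class PluggableExporter {
    private final Function<Map<String, String>, String> transform;
    private final ExportBackend backend;

    PluggableExporter(Function<Map<String, String>, String> transform,
                      ExportBackend backend) {
        this.transform = transform;
        this.backend = backend;
    }

    void export(Map<String, String> fields) {
        backend.write(transform.apply(fields));
    }
}

class ExporterDemo {
    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend();
        PluggableExporter exporter =
                new PluggableExporter(JsonTransform::toJson, backend);

        // LinkedHashMap keeps the field order stable in the output.
        Map<String, String> page = new LinkedHashMap<>();
        page.put("title", "Widget");
        page.put("price", "9.99");
        exporter.export(page);

        backend.lines().forEach(System.out::println);
    }
}
```

The point of the sketch is only that the transformation and the backend
are chosen independently of each other; whether the backend side of that
would be a Gora module or something in Nutch itself is exactly what I am
unsure about.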

This is of course all speculation from my side so any vision you have
to share would be ideal.

Thanks

Lewis


On Wed, Jul 11, 2012 at 8:54 AM, Julien Nioche
<[email protected]> wrote:
> I'd think that this would be more a case for the universal exporter (a.k.a.
> multiple indexing backends) that we mentioned several times. The REST API
> is more a way of piloting a crawl remotely. It could certainly be twisted
> into doing all sorts of things, but I am not sure it would be very
> practical when dealing with very large data. Instead, having a pluggable
> exporter would allow you to define what backend you want to send the data
> to and what transformations to do on the way (e.g. convert to JSON).
> Alternatively, a good old custom MapReduce job is the way to go.
>
> HTH
>
> Jul
>
> On 10 July 2012 22:42, Lewis John Mcgibbney <[email protected]> wrote:
>
>> Hi,
>>
>> I am looking to create a dataset for use in an example scenario where
>> I want to create all the products you would typically find in the
>> online Amazon store, e.g. loads of products with different categories,
>> prices, titles, availability, condition, etc. One way I was thinking
>> of doing this was to use the above API written into Nutch 2.X to get
>> the results as JSON. These could then hopefully be loaded into the
>> product table in my datastore, and we could begin to build up the
>> database of products.
>>
>> Having never used the REST API directly, I wonder if anyone has any
>> information on this and whether I can obtain some direction on
>> producing my crawl results as JSON. I'm also going to look into
>> Andrzej's patch in NUTCH-932, so I'll try to update this thread
>> once I make some progress with it.
>>
>> Thanks in advance for any sharing of experiences with this one.
>>
>> Best
>> Lewis
>>
>> --
>> Lewis
>>
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble



-- 
Lewis
