Thanks, Jorge, for your answer. Do you think an injector that accepts API
endpoints in addition to local/HDFS paths would be a good improvement for Nutch?

Regards, Roannel

----- Original Message -----
> From: "Jorge Betancourt" <betancourt.jo...@gmail.com>
> To: "user" <user@nutch.apache.org>
> Sent: Monday, 16 September 2019 13:14:36
> Subject: [MASSMAIL]Re: Injection from webservice

> Hi Roannel,
> 
> The current implementation of the injector only accepts a path (actually an
> org.apache.hadoop.fs.Path). This means that there is no way to feed it a URL
> directly unless you download the content first.
> 
> If you use the REST API, you can send the seed file using the API endpoint.
> Otherwise, you could write your own injector with the proper logic to deal
> with a list of URLs coming from a URL.
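
As a minimal sketch of the "download the content first" route, the snippet below copies a remote seed list into a local directory that can then be handed to the injector. The function name `download_seed_file`, the default file name `seed.txt`, and the 30-second timeout are assumptions made for illustration; none of this is part of Nutch.

```python
import shutil
import urllib.request
from pathlib import Path


def download_seed_file(url: str, seed_dir: str, name: str = "seed.txt") -> Path:
    """Copy the seed list found at `url` into `seed_dir` and return the local path.

    Hypothetical helper, not part of Nutch: it only prepares a local file
    that the stock injector can read.
    """
    target_dir = Path(seed_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    # Stream the response body to disk instead of loading it all into memory.
    with urllib.request.urlopen(url, timeout=30) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

After something like `download_seed_file("http://example.org/seed", "urls")`, the local `urls` directory would be passed to `bin/nutch inject` in place of the remote URL.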
> 
> The REST API implementation just writes the content in the expected format:
> https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/service/resources/SeedResource.java#L92-L113
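
For illustration, here is a rough sketch of what "the expected format" looks like, going by the Injector documentation: a flat text file with one URL per line, optionally followed by tab-separated `key=value` metadata. The helper name `format_seed_lines` and its input shape are assumptions for this example, not Nutch code.

```python
from typing import Iterable, Mapping, Optional, Tuple

# (url, optional metadata) pairs; this shape is an assumption for the sketch.
SeedEntry = Tuple[str, Optional[Mapping[str, str]]]


def format_seed_lines(entries: Iterable[SeedEntry]) -> str:
    """Render seed entries as seed-file text: one URL per line, with any
    metadata appended as tab-separated key=value fields."""
    lines = []
    for url, metadata in entries:
        fields = [url.strip()]
        for key, value in (metadata or {}).items():
            fields.append(f"{key}={value}")
        lines.append("\t".join(fields))
    # Terminate every line with a newline; an empty input gives an empty string.
    return "".join(line + "\n" for line in lines)
```

Writing the returned text to a file inside the seed directory gives the injector something it can read directly.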
> 
> Best Regards,
> Jorge
> 
> On Mon, Sep 16, 2019 at 4:59 PM Roannel Fernandez Hernandez <roan...@uci.cu>
> wrote:
> 
>> Hi folks,
>>
>> Is there any way in Nutch 1.15 to inject a remote seed file (accessible
>> via http or https)?
>>
>> I mean this, for instance:
>>
>> bin/nutch inject crawl http://example.org/seed
>>
>> Regards
>> 1519-2019: 500th anniversary of the Villa de San Cristóbal de La Habana
>> For Havana, the greatest. #Habana500 #UCIxHabana500
>>
