Hi Roannel,

The current implementation of the injector only accepts a path (actually an
org.apache.hadoop.fs.Path). This means there is no way to feed it a URL
directly; you have to download the content first.
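
As a rough illustration of the "download first" step, here is a minimal
sketch in Python. The function name fetch_seed_file, the seeds directory and
the seed.txt file name are all hypothetical choices of mine, not anything
Nutch provides; the only assumption about Nutch is that the seed directory
holds plain text files with one URL per line.

```python
import urllib.request
from pathlib import Path


def fetch_seed_file(url, seed_dir, filename="seed.txt"):
    """Download a remote seed list and store it as a local seed file.

    Hypothetical helper: returns the path of the file written in seed_dir.
    """
    seed_dir = Path(seed_dir)
    seed_dir.mkdir(parents=True, exist_ok=True)

    # Read the remote list; urlopen handles http, https and file URLs.
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")

    # Keep one entry per line, dropping blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    target = seed_dir / filename
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

With the values from your example, calling
fetch_seed_file("http://example.org/seed", "seeds") and then running
bin/nutch inject crawl seeds should give the injector the local path it
expects.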

If you use the REST API, you can send the seed list through the API endpoint.
Otherwise, you could write your own injector with the proper logic to deal
with a list of URLs coming from a URL.

The REST API implementation just writes the content in the expected format:
https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/service/resources/SeedResource.java#L92-L113
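
For the REST route, a client could look roughly like the sketch below. Please
treat the details as assumptions to check against the SeedResource.java
source linked above: I am assuming the endpoint is POST /seed/create, that it
takes a JSON body with "name" and "seedUrls" fields, and that the Nutch
server is reachable at an address such as http://localhost:8081. The function
names are hypothetical.

```python
import json
import urllib.request


def build_seed_payload(name, urls):
    """Build the JSON body for a seed list (field names are assumptions)."""
    return {
        "name": name,
        "seedUrls": [{"url": u} for u in urls],
    }


def post_seed_list(server, name, urls):
    """POST the seed list and return the server's response body as text."""
    body = json.dumps(build_seed_payload(name, urls)).encode("utf-8")
    request = urllib.request.Request(
        server.rstrip("/") + "/seed/create",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

The list of URLs would still have to be fetched from the remote location by
the client before it is posted, since the server side only writes out what it
receives.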

Best Regards,
Jorge

On Mon, Sep 16, 2019 at 4:59 PM Roannel Fernandez Hernandez <roan...@uci.cu>
wrote:

> Hi folks,
>
> Is there any way in Nutch 1.15 to inject a remote seed file (accessible
> via http or https)?
>
> I mean this, for instance:
>
> bin/nutch inject crawl http://example.org/seed
>
> Regards
> 1519-2019: 500th anniversary of the Villa de San Cristóbal de La Habana
> For Havana, the greatest. #Habana500 #UCIxHabana500
>
>
