Or use a scheduled wget job to pull them from the remote server and store
them on a path that Nutch can access locally.
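
A rough sketch of that approach, in case it helps. The URL, the local
directory, the crawldb path and the location of bin/nutch are all
placeholders I made up for illustration, and the function names are just
mine; adjust them for your setup. It downloads to a temporary file first so
a failed or empty download does not clobber the last good seed list.

```shell
#!/bin/sh
# Sketch only: every value below is a placeholder, not a Nutch default.
SEED_URL="${SEED_URL:-http://example.org/seed}"   # remote seed list
SEED_DIR="${SEED_DIR:-urls}"                      # local dir Nutch can read
CRAWLDB="${CRAWLDB:-crawl/crawldb}"               # crawldb to inject into
NUTCH_BIN="${NUTCH_BIN:-bin/nutch}"               # path to the nutch script

# Download the remote seed list into a temporary file and replace the
# local copy only if wget succeeded and the file is non-empty.
fetch_seeds() {
    mkdir -p "$SEED_DIR" || return 1
    tmp=$(mktemp) || return 1
    if wget -q -O "$tmp" "$SEED_URL" && [ -s "$tmp" ]; then
        mv "$tmp" "$SEED_DIR/seed.txt"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Inject the local seed directory into the crawldb.
inject_seeds() {
    "$NUTCH_BIN" inject "$CRAWLDB" "$SEED_DIR"
}
```

Adding `fetch_seeds && inject_seeds` as the last line and running the
script from cron would cover the scheduled part.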

Regards,

Dave Beckstrom
Technical Delivery Manager / Senior Developer
em: dbeckst...@collectivefls.com <aha...@collectivefls.com>
ph: 763.323.3499


On Mon, Sep 16, 2019 at 12:14 PM Jorge Betancourt <
betancourt.jo...@gmail.com> wrote:

> Hi Roannel,
>
> The current implementation of the injector only accepts a path (actually an
> org.apache.hadoop.fs.Path). This means there is no way to feed it a URL
> directly unless you download the content first.
>
> If you use the REST API, you can send the seed file through the API endpoint.
> Otherwise, you could write your own injector with the proper logic to deal
> with a list of URLs fetched from a remote URL.
>
> The REST API implementation just writes the content in the expected format:
> https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/service/resources/SeedResource.java#L92-L113
>
> Best Regards,
> Jorge
>
> On Mon, Sep 16, 2019 at 4:59 PM Roannel Fernandez Hernandez <
> roan...@uci.cu>
> wrote:
>
> > Hi folks,
> >
> > Is there any way in Nutch 1.15 to inject a remote seed file (accessible
> > via http or https)?
> >
> > I mean this, for instance:
> >
> > bin/nutch inject crawl http://example.org/seed
> >
> > Regards
> > 1519-2019: 500th Anniversary of the Town of San Cristóbal de La Habana
> > For Havana, the greatest. #Habana500 #UCIxHabana500
> >
> >
>

-- 
*Fig Leaf Software is now Collective FLS, Inc.*
*Collective FLS, Inc.* 

https://www.collectivefls.com/
