Or use a scheduled wget job to pull the seed files from the remote server and store them in a local directory that Nutch can read.
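A minimal sketch of that download step, written in Python instead of wget. It fetches the remote seed list and atomically replaces a file in a local seed directory. The script name `fetch_seeds.py`, the file name `seed.txt` and all paths are hypothetical placeholders, not part of Nutch.

```python
# fetch_seeds.py: hypothetical wget-style helper that refreshes a local Nutch seed dir.
import os
import sys
import tempfile
import urllib.request


def download_seed_file(url: str, seed_dir: str, filename: str = "seed.txt",
                       timeout: float = 30.0) -> str:
    """Fetch `url` and replace `seed_dir/filename` with its content.

    The body is fully read first, then written to a temporary file in the
    same directory and renamed over the target, so an interrupted write
    never leaves a half-written seed file in place.
    """
    os.makedirs(seed_dir, exist_ok=True)
    target = os.path.join(seed_dir, filename)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    # Dot-prefixed temp name: Hadoop input formats normally skip hidden files.
    fd, tmp_path = tempfile.mkstemp(dir=seed_dir, prefix=".seed-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600 files; make the list readable
        os.replace(tmp_path, target)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
    return target


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 fetch_seeds.py <seed-url> <local-seed-dir>
    print(download_seed_file(sys.argv[1], sys.argv[2]))
```

Scheduled from cron, a hypothetical entry such as `0 * * * * python3 /opt/scripts/fetch_seeds.py http://example.org/seed /data/nutch/seeds` would refresh the directory hourly, and the injector is then pointed at that directory rather than at the URL, e.g. `bin/nutch inject crawl/crawldb /data/nutch/seeds` (paths are illustrative). Writing to a temporary file and renaming it means an inject that runs mid-download should see either the old list or the new one, not a partial file.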
Regards,
Dave Beckstrom
Technical Delivery Manager / Senior Developer
ph: 763.323.3499

On Mon, Sep 16, 2019 at 12:14 PM Jorge Betancourt wrote:
> Hi Roannel,
>
> The current implementation of the injector only accepts a path (actually an
> org.apache.hadoop.fs.Path), which means that there is no way to feed a URL
> directly unless you download the content first.
>
> If you use the REST API, you can send the seed file using the API endpoint.
> Otherwise, you could write your own injector with the proper logic to handle
> a list of URLs fetched from a URL.
>
> The REST API implementation just writes the content in the expected format
> (https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/service/resources/SeedResource.java#L92-L113).
>
> Best Regards,
> Jorge
>
> On Mon, Sep 16, 2019 at 4:59 PM Roannel Fernandez Hernandez wrote:
> > Hi folks,
> >
> > Is there any way in Nutch 1.15 to inject a remote seed file (accessible
> > via http or https)?
> >
> > I mean this, for instance:
> >
> > bin/nutch inject crawl http://example.org/seed
> >
> > Regards
> > 1519-2019: 500th anniversary of the Villa de San Cristóbal de La Habana
> > For Havana, the greatest. #Habana500 #UCIxHabana500

--
*Fig Leaf Software is now Collective FLS, Inc.*

*Collective FLS, Inc.*
https://www.collectivefls.com/
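Jorge's REST API suggestion could be sketched as a small Python client that downloads the remote seed list and hands its URLs to the Nutch server. This is a sketch under assumptions, not a documented client: the `POST /seed/create` path, the `name`/`seedUrls` JSON fields and port 8081 are assumed from the Nutch 1.x REST service and should be checked against SeedResource.java for your version; all helper names are hypothetical.

```python
# Hypothetical client for the Nutch 1.x REST server's seed endpoint.
# ASSUMPTIONS: server started with `bin/nutch startserver` on port 8081, and
# POST /seed/create accepts {"name": ..., "seedUrls": [{"url": ...}]}.
import json
import urllib.request

NUTCH_SERVER = "http://localhost:8081"  # assumed host/port


def parse_seed_list(text: str) -> list:
    """Return the URLs in a seed list: one per line, skipping blanks and '#' comments.

    Nutch seed files may carry tab-separated key=value metadata after the URL;
    this sketch keeps only the URL itself.
    """
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line.split("\t", 1)[0])
    return urls


def build_seed_payload(name: str, urls: list) -> dict:
    """Build the assumed SeedList JSON body for POST /seed/create."""
    return {"name": name, "seedUrls": [{"url": u} for u in urls]}


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a remote seed list and decode it as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def create_seed(remote_seed_url: str, name: str, server: str = NUTCH_SERVER) -> str:
    """Fetch a remote seed list and register it with the Nutch REST server.

    Returns the raw response body, assumed to be the server-side seed path.
    """
    payload = build_seed_payload(name, parse_seed_list(fetch_text(remote_seed_url)))
    req = urllib.request.Request(
        server.rstrip("/") + "/seed/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, `create_seed("http://example.org/seed", "remote-seeds")` would, if the endpoint behaves as assumed, return the seed directory the server wrote, which could then be used as the seed directory of an inject job. Dropping the per-URL metadata keeps the payload minimal; keep those fields if your crawl relies on them.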

