Thanks, Jorge, for your answer. Do you think an injector that accepts local/HDFS paths, and in addition API endpoints, would be a good improvement for Nutch?
Regards,
Roannel

----- Original Message -----
> From: "Jorge Betancourt" <[email protected]>
> To: "user" <[email protected]>
> Sent: Monday, 16 September 2019 13:14:36
> Subject: [MASSMAIL]Re: Injection from webservice
>
> Hi Roannel,
>
> The current implementation of the injector only accepts a path (actually an
> org.apache.hadoop.fs.Path), which means that there is no way to feed a URL
> directly unless you download the content first.
>
> If you use the REST API, you can send the seed file using the API endpoint.
> Otherwise, you could write your own injector with the proper logic to deal
> with a list of URLs fetched from a URL.
>
> The REST API implementation just writes the content in the expected format:
> https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/service/resources/SeedResource.java#L92-L113
>
> Best regards,
> Jorge
>
> On Mon, Sep 16, 2019 at 4:59 PM Roannel Fernandez Hernandez <[email protected]>
> wrote:
>
>> Hi folks,
>>
>> Is there any way in Nutch 1.15 to inject a remote seed file (accessible
>> via http or https)?
>>
>> I mean this, for instance:
>>
>> bin/nutch inject crawl http://example.org/seed
>>
>> Regards
>> 1519-2019: 500th Anniversary of the Villa de San Cristóbal de La Habana
>> For Havana, the greatest. #Habana500 #UCIxHabana500
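For reference, the "download the content first" workaround Jorge describes could look roughly like the standalone Java 11+ program below. It is a minimal sketch, not part of Nutch: the class name `RemoteSeedDownloader`, the output file name `seed.txt`, and the choice to drop blank lines and lines starting with `#` are hypothetical decisions made for illustration. It fetches a remote seed list over HTTP(S) with `java.net.http.HttpClient` and writes it to a local directory, which the existing injector can then read as a normal path.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not a Nutch class): downloads a remote seed list so
// that the standard injector, which only accepts a path, can consume it.
public class RemoteSeedDownloader {

    // Keeps one seed per line. Surrounding whitespace is trimmed, while inner
    // tabs (e.g. URL metadata) are preserved. Blank lines and '#' lines are
    // dropped; this filtering is an assumption of this sketch.
    public static List<String> parseSeeds(String body) {
        return Arrays.stream(body.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .collect(Collectors.toList());
    }

    // Fetches the seed list and writes it to <targetDir>/seed.txt on the local
    // filesystem. Returns the path of the written file.
    public static Path download(URI seedUri, Path targetDir)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(30))
                .build();
        HttpRequest request = HttpRequest.newBuilder(seedUri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status "
                    + response.statusCode() + " for " + seedUri);
        }
        Files.createDirectories(targetDir);
        Path seedFile = targetDir.resolve("seed.txt");
        Files.write(seedFile, parseSeeds(response.body()), StandardCharsets.UTF_8);
        return seedFile;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: RemoteSeedDownloader <seed-url> <target-dir>");
            System.exit(1);
        }
        Path seedFile = download(URI.create(args[0]), Paths.get(args[1]));
        System.out.println("Wrote seeds to " + seedFile);
    }
}
```

After running, for example, `java RemoteSeedDownloader.java http://example.org/seed /tmp/seeds`, the directory can be passed to the injector as usual, e.g. `bin/nutch inject crawl/crawldb /tmp/seeds` (assuming the crawldb lives at `crawl/crawldb`). Because this sketch only writes to the local filesystem, in distributed mode the file would still need to be copied to HDFS (for instance with `hdfs dfs -put`) before injecting.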

