TBH, I'm not entirely sure. Downloading the file can be scripted around without much trouble, and my feeling is that the Injector class already has a good enough scope. There are valid reasons for having a custom injector (reading the seed URLs from a DB comes to mind). When I needed a custom injector, it was for very specific requirements, and it made more sense to write a custom injector than to generate a seed file (this was before we had the REST API, which now provides a nice API around the injector).
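A minimal sketch of that scripting, assuming Java 8+ and a seed list reachable through an http, https or file URL. The class name `RemoteSeedFetcher`, the method `fetchSeedFile` and the `seed.txt` file name are hypothetical and not part of Nutch. The helper only copies the remote file as-is into a local directory, which is the kind of path the stock Injector accepts; it does not validate the downloaded lines.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Nutch: copies a remote seed list into a local
// directory so that the stock Injector, which only reads filesystem paths, can use it.
class RemoteSeedFetcher {

    // Downloads seedUrl into seedDir/fileName (creating seedDir if needed)
    // and returns the path of the written seed file.
    public static Path fetchSeedFile(String seedUrl, Path seedDir, String fileName)
            throws IOException {
        Files.createDirectories(seedDir);
        Path target = seedDir.resolve(fileName);
        // URL.openStream() supports http, https and file URLs out of the box.
        try (InputStream in = URI.create(seedUrl).toURL().openStream()) {
            // Overwrite any seed file left over from a previous run.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: RemoteSeedFetcher <seed-url> <seed-dir>");
            System.exit(1);
        }
        // "seed.txt" is an arbitrary, illustrative file name.
        Path written = fetchSeedFile(args[0], Paths.get(args[1]), "seed.txt");
        System.out.println("Wrote " + written);
    }
}
```

After that, the usual command applies, for example `java RemoteSeedFetcher http://example.org/seed seeds` followed by `bin/nutch inject crawl/crawldb seeds` (the paths are placeholders). On a Hadoop cluster, the seed directory would additionally need to be copied into HDFS, e.g. with `hadoop fs -put`.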
It is a valid point that we don't have an extension point for the Injector logic, which would allow different seed URL providers without developers needing to worry about the specific injection logic. My main concern is whether we want to put this additional complexity into Nutch. Is it really valuable to all of our users to have HTTP/DB/custom injectors available out of the box in a pluggable way? I would love to hear what other people have to say.

Best Regards,
Jorge

On Mon, Sep 16, 2019 at 8:53 PM Roannel Fernandez Hernandez <roan...@uci.cu> wrote:

> Thanks, Jorge, for your answer. Do you think an injector that accepts
> local/HDFS paths and, in addition, API endpoints could be a good improvement
> for Nutch?
>
> Regards, Roannel
>
> ----- Original Message -----
> > From: "Jorge Betancourt" <betancourt.jo...@gmail.com>
> > To: "user" <email@example.com>
> > Sent: Monday, 16 September 2019 13:14:36
> > Subject: [MASSMAIL]Re: Injection from webservice
> >
> > Hi Roannel,
> >
> > The current implementation of the injector only accepts a path (actually an
> > org.apache.hadoop.fs.Path), which means that there is no way to feed a URL
> > directly unless you download the content first.
> >
> > If you use the REST API, you can send the seed file using the API endpoint.
> > Otherwise, you could write your own injector with the proper logic to deal
> > with a list of URLs coming from a URL.
> >
> > The REST API implementation just writes the content in the expected format
> > (https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/service/resources/SeedResource.java#L92-L113).
> >
> > Best Regards,
> > Jorge
> >
> > On Mon, Sep 16, 2019 at 4:59 PM Roannel Fernandez Hernandez <roan...@uci.cu>
> > wrote:
> >
> >> Hi folks,
> >>
> >> Is there any way in Nutch 1.15 to inject a remote seed file (accessible
> >> via http or https)?
> >>
> >> I mean this, for instance:
> >>
> >> bin/nutch inject crawl http://example.org/seed
> >>
> >> Regards
> >> 1519-2019: 500th anniversary of the Villa de San Cristóbal de La Habana
> >> For Havana, the greatest. #Habana500 #UCIxHabana500
> >>