RE: nutch 1.12 How can I force a URL to get re-indexed

2016-10-07 Thread Markus Jelsma
Hello - this sounds fine indeed. But I don't know what happens with the calculation of the next fetch time when adddays is used; I've never tried it. You may want to confirm that the fetch time is not affected by this. Markus -Original message- > From:Sujan Suppala > Sent:

RE: nutch 1.12 How can I force a URL to get re-indexed

2016-10-07 Thread Sujan Suppala
Thanks Markus. I cannot use freegen, as this tool is not available via the REST API. With the combination of the -adddays and -expr options of the generator, I achieved my requirement. Here is what I did: 1. Inject the URLs with some metadata, say pageId= Seed file contains the below entry:

Re: Nutch as a service

2016-10-07 Thread Sachin Shaju
Hi Furkan, I've tried passing null for args. It didn't work either. After investigating the source code of *Fetcher.java*, I've figured out that it looks for the segment in the local path if a segment option is not added. If the segment option is added as a valid segment in HDFS, it will work. I've

Unknown issue in Nutch indexer with REST api

2016-10-07 Thread Sachin Shaju
Hi, I was trying to expose Nutch using REST endpoints and ran into an issue in the indexer phase. I'm using the Elasticsearch index writer to index docs to ES. I've used the $NUTCH_HOME/runtime/deploy/bin/nutch startserver command. While indexing, an unknown exception is thrown. Error:

Re: Issue Crawling Alternate URLs

2016-10-07 Thread Sebastian Nagel
Hi Matthew, afaics, the content delivered to Nutch under the URL http://rssfeeds.azcentral.com/phoenix/asu does not contain the link http://rssfeeds.azcentral.com/phoenix/asu=1 That's the simple answer. What you see in a browser is often not what is delivered from the server to a