Hi Joe,

On Fri, Oct 21, 2016 at 7:34 AM, <[email protected]> wrote:
> From: Joe Adams <[email protected]>
> To: [email protected]
> Cc:
> Date: Fri, 21 Oct 2016 10:34:15 -0400
> Subject: Nutch 2.3.1 elasticsearch tstamp
> I'm working on setting up nutch with elasticsearch and hbase to crawl a
> site and provide a dashboard in kibana for reporting. I have the
> interactions working between the components. I can crawl the site, hbase
> shows all the data, and I can index into elasticsearch. The problem is that
> the tstamp field in elasticsearch shows 1970-01-01T00:00:00.000Z and not
> data related to the fetched time of the page. I also tried adding the
> index-more plugin and that seems to add a 'date' field but this also shows
> up as epoch.
>
> I can't find much searching around the internet. The only thing I can find
> closely related is https://issues.apache.org/jira/browse/NUTCH-2045, but
> that was fixed in 2.3.1 which is the version I'm running.
>

My suggestion would be that, if you are running Nutch 2, you use the current development branch, which is available at https://github.com/apache/nutch/tree/2.x. I say this because we are always fixing bugs there, and using it gives others on this branch a better chance of reproducing your issue. Additionally, it will let you upgrade to ES 2.x via the indexer-elastic2 plugin:
https://github.com/apache/nutch/tree/2.x/src/plugin/indexer-elastic2

>
> Does anyone have any idea why my dates aren't being set properly in my
> elasticsearch index?

Not yet, but I will scope it out.

> The data looks good if I run readdb -url $url.

Thanks for this info.

> Can anyone provide some good advice to troubleshoot this further?
>

Not right now, but can you please log an issue over at Jira and also link it to NUTCH-2045? This would help us track it and, if there is definitely a bug, fix it with a test.

>
> Any help would be appreciated.
>
>
> Versions:
> Nutch 2.3.1
> Elasticsearch 1.7.5
> Gora: 0.6.1
> Hbase: 1.2.3
>

Please note that the supported version of HBase in Nutch 2.3.1 is 0.98.8-hadoop2.
I can say with a high degree of certainty that the HBase support in Nutch 2.3.1 will not be compatible with HBase 1.2.3.

>
> <property>
>   <name>fetcher.server.delay</name>
>   <value>.1</value>
>   <description>Delay between page fetches.</description>
> </property>
>
> <property>
>   <name>fetcher.server.min.delay</name>
>   <value>.1</value>
> </property>
>

You may find that you get access denied, e.g. your IP being blocked by the servers you crawl, with delays this small: .1 seconds between requests allows up to roughly ten requests per second against the same host. This is just a friendly warning!

Please log the issue in Jira and I will try to reproduce.

Thanks
Lewis
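P.S. One quick sanity check while troubleshooting: 1970-01-01T00:00:00.000Z is exactly what a timestamp of 0 milliseconds since the Unix epoch looks like, so my guess (not yet verified against the Nutch code) is that the tstamp value reaching the indexer is 0 or unset, rather than a real date being mis-parsed. The small helper below is a hypothetical sketch (EpochCheck and format() are my own names, not part of Nutch or Elasticsearch); it simply renders epoch milliseconds in the same ISO-8601, millisecond-precision UTC form as the value in your index, so you can compare candidate values by hand.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of Nutch): renders epoch milliseconds in the
// same ISO-8601, millisecond-precision, UTC form as the tstamp value seen in ES.
final class EpochCheck {
    private static final DateTimeFormatter ISO_MILLIS_UTC =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
                    .withZone(ZoneOffset.UTC);

    static String format(long epochMillis) {
        return ISO_MILLIS_UTC.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // A value of 0 produces exactly the string reported in the index.
        System.out.println(format(0L));             // 1970-01-01T00:00:00.000Z
        // A real time in milliseconds (the Date header of your mail).
        System.out.println(format(1477060455000L)); // 2016-10-21T14:34:15.000Z
        // The same instant in seconds, wrongly read as milliseconds.
        System.out.println(format(1477060455L));    // 1970-01-18T02:17:40.455Z
    }
}
```

The last case is why I lean towards "0 or missing": a seconds-versus-milliseconds mix-up would land in mid-January 1970, not on exactly midnight of January 1st. If readdb shows a sensible fetch time but the index gets the first value, the field is probably being lost somewhere on the way to the indexer, which is what I'd like to reproduce once the Jira issue is filed.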

