Thanks for the feedback. I updated to the 2.x branch (2.4-SNAPSHOT) and that seems to have fixed the timestamp issues. I saw in the change log that there was a fix for this exact bug. I didn't realize that the indexer-elastic plugin was deprecated and that you must move to the indexer-elastic2 plugin and Elasticsearch 2.x for compatibility. After sorting all that out, things seem good.
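
A side note for anyone who lands on this thread with the same symptom: 1970-01-01T00:00:00.000Z is simply what an epoch-milliseconds value of 0 looks like once formatted, so the field was most likely being indexed as zero (or left unset) rather than mangled. The short Python sketch below illustrates this. Note that `epoch_millis_to_iso` is a hypothetical helper name of my own, not part of Nutch or Elasticsearch, and it assumes the field holds milliseconds since the epoch.

```python
from datetime import datetime, timedelta, timezone

# Reference point for epoch-based timestamps (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_iso(millis: int) -> str:
    """Format epoch milliseconds as an ISO-8601 UTC string with millisecond precision.

    Hypothetical helper for illustration only; assumes `millis` is
    milliseconds since 1970-01-01T00:00:00Z.
    """
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"


print(epoch_millis_to_iso(0))  # prints 1970-01-01T00:00:00.000Z
```

If a fetch time taken from the crawl data formats to a sensible date with the same helper, that suggests the zero is introduced at indexing time rather than in storage.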

Thanks again!

On Fri, Oct 21, 2016 at 3:04 PM, lewis john mcgibbney <[email protected]> wrote:

> Hi Joe,
>
> On Fri, Oct 21, 2016 at 7:34 AM, <[email protected]> wrote:
>
> > From: Joe Adams <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Fri, 21 Oct 2016 10:34:15 -0400
> > Subject: Nutch 2.3.1 elasticsearch tstamp
> > I'm working on setting up nutch with elasticsearch and hbase to crawl a
> > site and provide a dashboard in kibana for reporting. I have the
> > interactions working between the components. I can crawl the site, hbase
> > shows all the data, and I can index into elasticsearch. The problem is that
> > the tstamp field in elasticsearch shows 1970-01-01T00:00:00.000Z and not
> > data related the fetched time of the page. I also tried adding the
> > index-more plugin and that seems to add a 'date' field but this also shows
> > up as epoch.
> >
> > I can't find much searching around the internet. The only thing I can find
> > closely related is https://issues.apache.org/jira/browse/NUTCH-2045, but
> > that was fixed in 2.3.1 which is the version I'm running.
>
> My suggestion would be, that if you are running Nutch2, then use the
> current development branch which is available at
> https://github.com/apache/nutch/tree/2.x. I say this as we are always
> fixing bugs and it will enable other using this branch a better chance of
> reproducing your issue. Additionally, this will enable you to upgrade to ES
> 2.X as per the indexer-elastic2 plugin
> https://github.com/apache/nutch/tree/2.x/src/plugin/indexer-elastic2
>
> > Does anyone have any idea why my dates aren't being set properly in my
> > elasticsearch index?
>
> Not yet but I will scope it out.
>
> > The data looks good if I run readdb -url $url.
>
> Thanks for this info.
>
> > Can
> > anyone provide some good advice to troubleshoot this further?
>
> Not right now, but can you please log an issue over at Jira and also link
> it to NUTCH-2045? This would help us to track it and fix it with a test if
> there is definitely a bug.
>
> > Any help would be appreciated.
> >
> > Versions:
> > Nutch 2.3.1
> > Elasticsearch 1.7.5
> > Gora: 0.6.1
> > Hbase: 1.2.3
>
> Please note that the supported version of HBase in Nutch2.3.1 is
> 0.98.8-hadoop2. I can most certainly say that HBase support will not be
> compatible with HBase 1.2.3.
>
> > <property>
> > <name>fetcher.server.delay</name>
> > <value>.1</value>
> > <description>Delay between page fetches.</description>
> > </property>
> >
> > <property>
> > <name>fetcher.server.min.delay</name>
> > <value>.1</value>
> > </property>
>
> You may find that you experienced access denied e.g. your IP is being
> blocked from accessing servers at such small delay amounts. This is just a
> friendly warning!
>
> Please log the issue in Jira and I will try to reproduce.
> Thanks
> Lewis

