Thanks for the feedback. I updated to the 2.x branch (2.4-SNAPSHOT) and that
seems to have fixed the timestamp issues. I saw in the change log that
there was a fix for this exact bug. I didn't realize that the
indexer-elastic plugin was deprecated and you must upgrade to
elasticsearch2 for compatibility. After sorting all that out, things seem
good.

Thanks again!

On Fri, Oct 21, 2016 at 3:04 PM, lewis john mcgibbney <[email protected]>
wrote:

> Hi Joe,
>
> On Fri, Oct 21, 2016 at 7:34 AM, <[email protected]>
> wrote:
>
> > From: Joe Adams <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Fri, 21 Oct 2016 10:34:15 -0400
> > Subject: Nutch 2.3.1 elasticsearch tstamp
> > I'm working on setting up nutch with elasticsearch and hbase to crawl a
> > site and provide a dashboard in kibana for reporting. I have the
> > interactions working between the components. I can crawl the site, hbase
> > shows all the data, and I can index into elasticsearch. The problem is that
> > the tstamp field in elasticsearch shows 1970-01-01T00:00:00.000Z and not
> > data related to the fetched time of the page. I also tried adding the
> > index-more plugin and that seems to add a 'date' field but this also shows
> > up as epoch.
> >
> > I can't find much searching around the internet. The only thing I can find
> > closely related is https://issues.apache.org/jira/browse/NUTCH-2045, but
> > that was fixed in 2.3.1 which is the version I'm running.
> >
>
> My suggestion would be that, if you are running Nutch2, you use the
> current development branch, which is available at
> https://github.com/apache/nutch/tree/2.x. I say this as we are always
> fixing bugs, and it will give others using this branch a better chance of
> reproducing your issue. Additionally, this will enable you to upgrade to ES
> 2.X as per the indexer-elastic2 plugin
> https://github.com/apache/nutch/tree/2.x/src/plugin/indexer-elastic2
>
>
> >
> > Does anyone have any idea why my dates aren't being set properly in my
> > elasticsearch index?
>
>
> Not yet but I will scope it out.
>
>
> > The data looks good if I run readdb -url $url.
>
>
> Thanks for this info.
>
>
> > Can
> > anyone provide some good advice to troubleshoot this further?
> >
>
> Not right now, but can you please log an issue over at Jira and also link
> it to NUTCH-2045? This would help us to track it and fix it with a test if
> there is definitely a bug.
>
>
> >
> > Any help would be appreciated.
> >
> >
> > Versions:
> > Nutch 2.3.1
> > Elasticsearch 1.7.5
> > Gora: 0.6.1
> > Hbase: 1.2.3
> >
>
> Please note that the supported version of HBase in Nutch 2.3.1 is
> 0.98.8-hadoop2. I can most certainly say that Nutch 2.3.1's HBase support
> will not be compatible with HBase 1.2.3.
>
>
> >
> > <property>
> >     <name>fetcher.server.delay</name>
> >     <value>.1</value>
> >     <description>Delay between page fetches.</description>
> > </property>
> >
> > <property>
> >     <name>fetcher.server.min.delay</name>
> >     <value>.1</value>
> > </property>
> >
>
> You may find that you experience access-denied errors, e.g. your IP being
> blocked from accessing servers, at such small delay values. This is just a
> friendly warning!
>
> Please log the issue in Jira and I will try to reproduce.
> Thanks
> Lewis
>
