Hi Joe,

On Fri, Oct 21, 2016 at 7:34 AM, <[email protected]> wrote:

> From: Joe Adams <[email protected]>
> To: [email protected]
> Cc:
> Date: Fri, 21 Oct 2016 10:34:15 -0400
> Subject: Nutch 2.3.1 elasticsearch tstamp
> I'm working on setting up nutch with elasticsearch and hbase to crawl a
> site and provide a dashboard in kibana for reporting. I have the
> interactions working between the components. I can crawl the site, hbase
> shows all the data, and I can index into elasticsearch. The problem is that
> the tstamp field in elasticsearch shows 1970-01-01T00:00:00.000Z and not
> data related to the fetched time of the page. I also tried adding the
> index-more plugin and that seems to add a 'date' field but this also shows
> up as epoch.
>
> I can't find much searching around the internet. The only thing I can find
> closely related is https://issues.apache.org/jira/browse/NUTCH-2045, but
> that was fixed in 2.3.1 which is the version I'm running.
>

My suggestion is that, if you are running Nutch 2, you use the
current development branch, which is available at
https://github.com/apache/nutch/tree/2.x. I say this because we are always
fixing bugs there, and it gives others using this branch a better chance of
reproducing your issue. Additionally, it will enable you to upgrade to ES
2.X via the indexer-elastic2 plugin:
https://github.com/apache/nutch/tree/2.x/src/plugin/indexer-elastic2


>
> Does anyone have any idea why my dates aren't being set properly in my
> elasticsearch index?


Not yet but I will scope it out.


> The data looks good if I run readdb -url $url.


Thanks for this info.


> Can
> anyone provide some good advice to troubleshoot this further?
>

Not right now, but can you please log an issue over at Jira and also link
it to NUTCH-2045? This would help us to track it and fix it with a test if
there is definitely a bug.


>
> Any help would be appreciated.
>
>
> Versions:
> Nutch 2.3.1
> Elasticsearch 1.7.5
> Gora: 0.6.1
> Hbase: 1.2.3
>

Please note that the supported version of HBase in Nutch 2.3.1 is
0.98.8-hadoop2. I can say with near certainty that the HBase support in
Nutch 2.3.1 is not compatible with HBase 1.2.3.


>
> <property>
>     <name>fetcher.server.delay</name>
>     <value>.1</value>
>     <description>Delay between page fetches.</description>
> </property>
>
> <property>
>     <name>fetcher.server.min.delay</name>
>     <value>.1</value>
> </property>
>

With delays this small, you may find that you get access denied errors, e.g.
your IP being blocked by the servers you are crawling. This is just a
friendly warning!
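
As a rough sketch only, a more conservative version of those two properties
in nutch-site.xml might look like the following. The values are illustrative
assumptions on my part, not project recommendations, so please tune them to
whatever the site you are crawling can tolerate.

```xml
<!-- Illustrative values only (assumed, not taken from your setup). -->
<property>
    <name>fetcher.server.delay</name>
    <!-- Seconds to wait between successive requests to the same server. -->
    <value>2.0</value>
    <description>Delay between page fetches.</description>
</property>

<property>
    <name>fetcher.server.min.delay</name>
    <!-- Assumed value; kept no larger than fetcher.server.delay above. -->
    <value>1.0</value>
</property>
```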

Please log the issue in Jira and I will try to reproduce.
Thanks
Lewis
