Hey,
I finally solved it! The problem was with my Cassandra cluster. My Hadoop and
Cassandra clusters were in two different datacenters, which caused Cassandra
requests to time out. That meant the generate phase didn’t have any input!
Works like a charm now :)
Regards
--
Manikandan Saravanan
Hey list,
I'm sure this issue was asked several times, but a quick look in the
nutch user archive did not help, so:
Does anyone have documentation on, or has anyone tried, using a browser
(like Chromium) or PhantomJS etc. for fetching web pages?
Due to a heavily loaded javascript site, nutch needs to see the
I'm currently looking at those separately but an integrated option would be
more efficient.
Looking forward to hearing about anyone's experience.
On Sat, Jun 7, 2014 at 6:25 PM, Patrick Kirsch pkir...@zscho.de wrote:
Hey list,
I'm sure this issue was asked several times, but a quick look in the
nutch
Hi Ali,
OK, I will share using my current script.
I sometimes use the -adddays parameter in the nutch generate step to force
recrawling.
Thanks.
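For anyone wondering why -adddays forces recrawling: as far as I understand it, the generate step treats "now" as N days later than it really is, so pages whose next fetch time lies within that window already count as due. Below is a minimal Python sketch of that idea only. The function name is_due and the dates are hypothetical, chosen for illustration, and this is not Nutch's actual code.

```python
from datetime import datetime, timedelta


def is_due(next_fetch_time, now, add_days=0):
    """Return True if a URL would count as due for fetching.

    Assumption: -adddays shifts the reference time forward by
    add_days days before it is compared with the next fetch time.
    """
    effective_now = now + timedelta(days=add_days)
    return next_fetch_time <= effective_now


# Hypothetical example: a page scheduled to be refetched in 20 days.
now = datetime(2014, 6, 7)
next_fetch = now + timedelta(days=20)

print(is_due(next_fetch, now))               # without -adddays
print(is_due(next_fetch, now, add_days=30))  # with -adddays 30
```

With no shift the page is skipped. With a 30-day shift it is selected, because its next fetch time then falls before the shifted reference time.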
On Fri, Jun 6, 2014 at 11:02 PM, Ali Nazemian alinazem...@gmail.com wrote:
Dear Bayu,
Would you please also provide me what procedure you are going to
So you mean the only difference (besides some parameters that should be set in
nutch-site.xml) is using nutch generate -adddays instead of nutch generate?
What about the other parts? Could you please provide a step-by-step guide?
Regards.
On Sat, Jun 7, 2014 at 4:20 PM, Bayu Widyasanyata