Good workaround for timeout?
Good to know! I was definitely exceeding that, so I've changed my properties.
-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Thursday, October 20, 2011 10:00 AM
To: user@nutch.apache.org
Cc: Chip Calhoun
Subject: Re: Good workaround for timeout?
The actual parse that is producing the timeouts happens early in the process.
There are, to my knowledge, no Nutch settings to make this faster or change
its behaviour; it's all about the parser implementation.
Try increasing your parser.timeout setting.
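For reference, a minimal sketch of what that override might look like, assuming it is placed inside the <configuration> element of conf/nutch-site.xml. The value of 120 seconds is only an illustration and does not come from this thread; as far as I know the shipped default in nutch-default.xml is 30 seconds.

```xml
<!-- conf/nutch-site.xml: overrides the value shipped in nutch-default.xml -->
<property>
  <name>parser.timeout</name>
  <!-- Seconds allowed for parsing one document. 120 is an illustrative
       value (an assumption), not a figure recommended in this thread. -->
  <value>120</value>
</property>
```

Setting the value to -1 is documented as disabling the timeout, at the risk of the parse hanging on a very long or corrupted document.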
On Wednesday 26 October 2011 16:45:33
parsing of large XML files (Was RE: Good workaround for timeout?)
Sent: Wednesday, October 19, 2011 4:57 PM
To: user@nutch.apache.org
Cc: Chip Calhoun
Subject: Re: Good workaround for timeout?
I'm using protocol-http, but I removed protocol-httpclient after you
pointed out in another thread that it's broken. Unfortunately I'm not
sure which properties are used by what, and I'm not sure
Integer.MAX_VALUE. I don't know for sure how Hadoop will handle that.
-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, October 19, 2011 4:57 PM
To: user@nutch.apache.org
Cc: Chip Calhoun
Subject: Re: Good workaround for timeout?
What is timing out, the fetch or the parse?
I'm getting a fairly persistent timeout on a particular page. Other,
smaller pages in this folder do fine, but this one times out most of the
time. When it fails, my ParserChecker results look like:
# bin/nutch
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, October 19, 2011 11:08 AM
To: user@nutch.apache.org
Subject: Re: Good workaround for timeout?
What is timing out, the fetch or the parse?
because of a very long or corrupted document.
</description>
</property>
-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, October 19, 2011 11:28 AM
To: user@nutch.apache.org
Subject: Re: Good workaround for timeout?
It is indeed. Tricky.
Are you going