in 1/5 the
time.
--
View this message in context:
http://lucene.472066.n3.nabble.com/ParseSegment-taking-a-long-time-to-finish-tp3758053p3992586.html
Sent from the Nutch - User mailing list archive at Nabble.com.
...@gmail.com
Sent: Mon 02-Jul-2012 22:44
To: user@nutch.apache.org
Subject: RE: ParseSegment taking a long time to finish
I'll run more experiments on that segment. My regex-urlfilter.txt removes
URLs that are 350 characters or longer:
-^.{350,}$
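To illustrate what the rule above matches, here is a minimal Python sketch. It only exercises the pattern itself: Nutch evaluates the rule with Java regular expressions, and the leading "-" marks the rule as an exclusion. The helper name is_too_long and the example.com URLs are made up for the illustration.

```python
import re

# Pattern from the rule "-^.{350,}$", without the leading "-" that marks
# it as an exclusion rule in regex-urlfilter.txt.
LONG_URL = re.compile(r"^.{350,}$")

def is_too_long(url: str) -> bool:
    """Hypothetical helper: True if the pattern matches, so the URL would be excluded."""
    return LONG_URL.search(url) is not None

# "http://example.com/" is 19 characters long.
url_349 = "http://example.com/" + "a" * 330   # 349 characters in total
url_350 = "http://example.com/" + "a" * 331   # 350 characters in total

print(len(url_349), is_too_long(url_349))
print(len(url_350), is_too_long(url_350))
```

The quantifier {350,} means "350 or more", so a URL of exactly 350 characters is already filtered out.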
Any recommendations for max URL char length? or any other
units.
mapred.child.java.opts -Xmx4096m
mapred.tasktracker.map.tasks.maximum 6
mapred.tasktracker.reduce.tasks.maximum 2
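A quick back-of-the-envelope check of these settings, as a sketch only. It assumes every map and reduce slot runs its own child JVM with the -Xmx value from mapred.child.java.opts, and it ignores non-heap memory and the Hadoop daemons themselves. The function name max_child_heap_mb is made up for the illustration.

```python
# Values quoted in the message above.
CHILD_HEAP_MB = 4096   # from mapred.child.java.opts -Xmx4096m
MAP_SLOTS = 6          # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS = 2       # mapred.tasktracker.reduce.tasks.maximum

def max_child_heap_mb(heap_mb: int, map_slots: int, reduce_slots: int) -> int:
    """Hypothetical helper: upper bound on child JVM heap when all slots are busy."""
    return heap_mb * (map_slots + reduce_slots)

total_mb = max_child_heap_mb(CHILD_HEAP_MB, MAP_SLOTS, REDUCE_SLOTS)
print(f"{total_mb} MB ({total_mb // 1024} GB) of heap if all slots are busy")
```

Under that assumption, a smaller per-task heap leaves room for more slots on the same node.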
then
allocate more task slots and have higher throughput.
-Original message-
From: sidbatra siddharthaba...@gmail.com
Sent: Mon 02-Jul-2012 23:02
To: user@nutch.apache.org
Subject: RE: ParseSegment taking a long time to finish
You already have that rule configured?
Yes, it's -^.{350
Thanks a lot Markus. I'll make these changes, re-run and share the result.
running it
local although it shouldn't.
-Original message-
From: sidbatra siddharthaba...@gmail.com
Sent: Mon 02-Jul-2012 23:14
To: user@nutch.apache.org
Subject: RE: ParseSegment taking a long time to finish
Thanks a lot Markus. I'll make these changes, re-run and share the result
That's quite odd. Does the parser spawn multiple threads to optimize parsing
and perhaps one of the threads hangs?
It is very odd, though, that it's just waiting for the next record.
Hi guys. Did you find a solution for this issue?
-Bafflingly-Slow-in-Reduce-Step-with-example-td3988820.html
thanks,
Sid
Hi Magnus,
I'm facing exactly the same issue with Nutch 1.4.
Did you manage to find a solution?
thanks,
Sid
--
Lewis
, 2012 at 9:23 PM, sidbatra siddharthaba...@gmail.com wrote:
Hi Magnus,
I'm facing exactly the same issue with Nutch 1.4.
Did you manage to find a solution?
thanks,
Sid
Hi,
I tried the parsechecker tool and as it turns out it hangs after printing out:
Content Metadata: Vary=Accept-Encoding Date=Thu, 23 Feb 2012 15:27:43
GMT Content-Length=3992 Expires=Thu, 19 Nov 1981 08:52:00 GMT
Content-Encoding=gzip
Set-Cookie=Shoper4Shop=a3ojqpk5ep6opahejfpiv98hf6; path=/
Hi,
According to my logs, a really long time (over 2 hours) elapses between
parsing the last page in a segment and ParseSegment finishing, as
can be seen here:
2012-02-19 00:51:43,471 INFO parse.ParseSegment - Parsing: http://
2012-02-19 03:15:18,604 INFO parse.ParseSegment - ParseSegment:
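For reference, the gap between the two log timestamps quoted above can be worked out with a short Python sketch. The format string assumes the log4j-style "date time,milliseconds" layout seen in those two lines; the helper name elapsed is made up for the illustration.

```python
from datetime import datetime

# Layout of the timestamps in the two log lines: "YYYY-MM-DD HH:MM:SS,mmm".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

def elapsed(start: str, end: str):
    """Hypothetical helper: time between two log timestamps as a timedelta."""
    return (datetime.strptime(end, LOG_TIME_FORMAT)
            - datetime.strptime(start, LOG_TIME_FORMAT))

gap = elapsed("2012-02-19 00:51:43,471", "2012-02-19 03:15:18,604")
print(gap)                  # 2:23:35.133000
print(gap.total_seconds())  # roughly 8615 seconds
```

That is about 2 hours and 24 minutes between the last "Parsing:" line and the line reporting that ParseSegment is done.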
Hi,
Could you also try the parsechecker tool on that last URL? It's
possible that the file has a problem or that this is simply a bug.
Remi
On Sunday, February 19, 2012, Magnús Skúlason magg...@gmail.com wrote:
Hi,
According to my logs, a really long time (over 2 hours) elapses between
parsing the last page in