Re: ParseSegment taking a long time to finish

2012-07-02 Thread sidbatra
://lucene.472066.n3.nabble.com/ParseSegment-taking-a-long-time-to-finish-tp3758053p3992586.html Sent from the Nutch - User mailing list archive at Nabble.com.

RE: ParseSegment taking a long time to finish

2012-07-02 Thread Markus Jelsma
in 1/5 the time. -- View this message in context: http://lucene.472066.n3.nabble.com/ParseSegment-taking-a-long-time-to-finish-tp3758053p3992586.html Sent from the Nutch - User mailing list archive at Nabble.com.

RE: ParseSegment taking a long time to finish

2012-07-02 Thread sidbatra
/ParseSegment-taking-a-long-time-to-finish-tp3758053p3992601.html Sent from the Nutch - User mailing list archive at Nabble.com.

RE: ParseSegment taking a long time to finish

2012-07-02 Thread Markus Jelsma
...@gmail.com Sent: Mon 02-Jul-2012 22:44 To: user@nutch.apache.org Subject: RE: ParseSegment taking a long time to finish I'll run more experiments on that segment. My regex-urlfilter.txt removes urls longer than 350 chars. -^.{350,}$ Any recommendations for max URL char length? or any other
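
For readers following along, here is a minimal Python sketch of what the quoted rule matches. It assumes the leading `-` in regex-urlfilter.txt marks an exclusion and that the rest of the line is an ordinary regular expression. The helper name `is_rejected` and the example.com URLs are made up for illustration and are not part of Nutch. One detail worth noting: `{350,}` matches 350 or more characters, so a URL of exactly 350 characters is dropped too, not only longer ones.

```python
import re

# Pattern part of the rule "-^.{350,}$". The leading "-" (exclude) is not part
# of the regex; it is modelled by the helper below.
LONG_URL = re.compile(r"^.{350,}$")


def is_rejected(url: str) -> bool:
    """Hypothetical helper: True if the exclusion rule would drop this URL."""
    return LONG_URL.search(url) is not None


base = "http://example.com/"  # made-up URL prefix for the demonstration
print(is_rejected(base + "a" * (349 - len(base))))  # 349 chars: kept (False)
print(is_rejected(base + "a" * (350 - len(base))))  # 350 chars: dropped (True)
```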

RE: ParseSegment taking a long time to finish

2012-07-02 Thread sidbatra
units.
mapred.child.java.opts -Xmx4096m
mapred.tasktracker.map.tasks.maximum 6
mapred.tasktracker.reduce.tasks.maximum 2
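
A back-of-the-envelope sketch of what the three settings quoted above can add up to on one node. It assumes every map and reduce slot runs its own child JVM and that each one grows to its full `-Xmx` limit. It ignores non-heap memory and the Hadoop daemons themselves, so treat the result as a rough upper bound on task heap, not a measurement. The function name is made up for this example.

```python
# Values quoted in the message above.
CHILD_HEAP_MB = 4096  # mapred.child.java.opts -Xmx4096m
MAP_SLOTS = 6         # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS = 2      # mapred.tasktracker.reduce.tasks.maximum


def worst_case_heap_mb(heap_mb: int, map_slots: int, reduce_slots: int) -> int:
    """Heap demand if every slot is busy and every child JVM hits its limit."""
    return heap_mb * (map_slots + reduce_slots)


total_mb = worst_case_heap_mb(CHILD_HEAP_MB, MAP_SLOTS, REDUCE_SLOTS)
print(total_mb, "MB =", total_mb / 1024, "GB")  # 32768 MB = 32.0 GB
```

Under these assumptions, a smaller per-task heap would leave room for more task slots within the same memory budget.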

RE: ParseSegment taking a long time to finish

2012-07-02 Thread Markus Jelsma
then allocate more task slots and have higher throughput. -Original message- From:sidbatra siddharthaba...@gmail.com Sent: Mon 02-Jul-2012 23:02 To: user@nutch.apache.org Subject: RE: ParseSegment taking a long time to finish You already have that rule configured? Yes, its-^.{350

RE: ParseSegment taking a long time to finish

2012-07-02 Thread sidbatra
Thanks a lot Markus. I'll make these changes, re-run and share the result.

RE: ParseSegment taking a long time to finish

2012-07-02 Thread Markus Jelsma
running it local although it shouldn't. -Original message- From:sidbatra siddharthaba...@gmail.com Sent: Mon 02-Jul-2012 23:14 To: user@nutch.apache.org Subject: RE: ParseSegment taking a long time to finish Thanks a lot Markus. I'll make these changes, re-run and share the result

RE: ParseSegment taking a long time to finish

2012-07-02 Thread sidbatra
That's quite odd. Does the parser spawn multiple threads to optimize parsing, and perhaps one of the threads hangs? It is very odd, though, that it's just waiting for the next record.

RE: ParseSegment taking a long time to finish

2012-07-02 Thread Markus Jelsma
@nutch.apache.org Subject: RE: ParseSegment taking a long time to finish That's quite odd. Does the parser spawn multiple threads to optimize parsing and perhaps one of the threads hangs? But this is very odd that it's just waiting for the next record. -- View this message in context: http

Re: ParseSegment taking a long time to finish

2012-07-01 Thread mstekel
Hi guys. Did you find a solution for this issue?

Re: ParseSegment taking a long time to finish

2012-06-11 Thread sidbatra
-Bafflingly-Slow-in-Reduce-Step-with-example-td3988820.html thanks, Sid

Re: ParseSegment taking a long time to finish

2012-05-31 Thread sidbatra
Hi Magnus, I'm facing exactly the same issue with Nutch 1.4. Did you manage to find a solution? Thanks, Sid

Re: ParseSegment taking a long time to finish

2012-05-31 Thread Lewis John Mcgibbney
.nabble.com/ParseSegment-taking-a-long-time-to-finish-tp3758053p3987122.html Sent from the Nutch - User mailing list archive at Nabble.com. -- Lewis

Re: ParseSegment taking a long time to finish

2012-05-31 Thread Magnús Skúlason
, 2012 at 9:23 PM, sidbatra siddharthaba...@gmail.com wrote: Hi Magnus, I'm facing exactly the same issue with Nutch 1.4. Did you manage to find a solution? thanks, Sid

Re: ParseSegment taking a long time to finish

2012-02-23 Thread Magnús Skúlason
Hi, I tried the parsechecker tool and as it turns out it hangs after printing out: Content Metadata: Vary=Accept-Encoding Date=Thu, 23 Feb 2012 15:27:43 GMT Content-Length=3992 Expires=Thu, 19 Nov 1981 08:52:00 GMT Content-Encoding=gzip Set-Cookie=Shoper4Shop=a3ojqpk5ep6opahejfpiv98hf6; path=/

ParseSegment taking a long time to finish

2012-02-19 Thread Magnús Skúlason
Hi, According to my logs, a really long time (more than 2 hours) elapses between parsing the last page in a segment and ParseSegment finishing, as can be seen here:
2012-02-19 00:51:43,471 INFO parse.ParseSegment - Parsing: http://
2012-02-19 03:15:18,604 INFO parse.ParseSegment - ParseSegment:
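
The gap between the two log timestamps quoted above can be checked with a short Python sketch. It assumes the timestamps use the format `YYYY-MM-DD HH:MM:SS,mmm` seen in the excerpt. The helper name `elapsed` is made up for this example.

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted log lines.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed(start: str, end: str):
    """Return the time between two log timestamps as a timedelta."""
    return (datetime.strptime(end, LOG_TIME_FORMAT)
            - datetime.strptime(start, LOG_TIME_FORMAT))


# Last "Parsing:" line versus the closing "ParseSegment:" line.
gap = elapsed("2012-02-19 00:51:43,471", "2012-02-19 03:15:18,604")
print(gap)  # 2:23:35.133000
```

That is roughly 2 hours and 24 minutes between the last parsed page and the end of the job.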

Re: ParseSegment taking a long time to finish

2012-02-19 Thread remi tassing
Hi, Could you also try the parsechecker tool on that last URL? It's possible that the file has a problem, or this is simply a bug. Remi On Sunday, February 19, 2012, Magnús Skúlason magg...@gmail.com wrote: Hi, According to my logs a really long time +2 hours elapses between parsing the last page in