Re: mapred crawling exception - Job failed!

2006-01-06 Lukas Vlcek
Huh... is anybody interested in this? Normally I wouldn't be so pushy, but it seems to me that Nutch dies if it encounters a Word document that can't be parsed. This seems like a serious issue to me. Or did I overlook something important/fundamental? Lukas On 1/6/06, Lukas Vlcek [EMAIL PROTECTED] wrote:

Re: mapred crawling exception - Job failed!

2006-01-05 Andrzej Bialecki
Lukas Vlcek wrote: How can I learn that? What I do is run the regular one-step command [/bin/nutch crawl]. In that case your nutch-default.xml / nutch-site.xml decides; there is a boolean option there. If you didn't change it, then it defaults to true (i.e. your fetcher is parsing the
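Andrzej's answer can be checked mechanically: a value in nutch-site.xml overrides the one in nutch-default.xml, and if neither file sets it, the built-in default applies. Below is a minimal Java sketch of that lookup. It assumes the boolean is the `fetcher.parse` property (an assumption; confirm the name in your own nutch-default.xml) and that both files sit in a `conf/` directory under the current working directory. It uses only the JDK's DOM parser and matches any `<property>` element, so it does not depend on the root element's name. It reads only these two files and ignores any other configuration the crawl tool may load.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FetcherParseCheck {
    // Assumed property name; verify it against your own nutch-default.xml.
    static final String PROPERTY = "fetcher.parse";

    /** Returns the value of the named property in the given config XML, or null if it is absent. */
    static String lookup(String xml, String name) throws Exception {
        if (xml == null) return null;
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    /** The site file overrides the default file; if neither sets the property, use the fallback. */
    static boolean resolve(String siteXml, String defaultXml, String name, boolean fallback)
            throws Exception {
        String v = lookup(siteXml, name);
        if (v == null) v = lookup(defaultXml, name);
        return v == null ? fallback : Boolean.parseBoolean(v);
    }

    /** Reads a file as UTF-8 text, or returns null if it does not exist. */
    static String readIfExists(String path) throws Exception {
        File f = new File(path);
        return f.exists() ? new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8) : null;
    }

    public static void main(String[] args) throws Exception {
        String site = readIfExists("conf/nutch-site.xml");
        String dflt = readIfExists("conf/nutch-default.xml");
        boolean parsing = resolve(site, dflt, PROPERTY, true);
        System.out.println(PROPERTY + " = " + parsing
            + (parsing ? " (fetcher parses while fetching)" : " (parsing runs as a separate step)"));
    }
}
```

Run it from the Nutch installation directory. The fallback of `true` mirrors Andrzej's statement that the option defaults to true when unchanged.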

Re: mapred crawling exception - Job failed!

2006-01-05 Lukas Vlcek
Hi, I found the reason for that exception! If you look carefully at my crawl.log, you will notice these lines:
060104 213608 Parsing [http://220.000.000.001/otd_04_Detailed_Design_Document.doc] with [EMAIL PROTECTED]
060104 213609 Unable to successfully parse content
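Lukas's log shows a single .doc file failing to parse. One general way to stop one unparseable document from failing a whole batch is to contain the failure per document and record it as a status, rather than letting the exception escape. The sketch below is not Nutch's code, and the thread does not say how the trunk fix works. The `Parser` interface and `Outcome` class are hypothetical stand-ins for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IsolatedParsing {
    /** Hypothetical parser interface; not Nutch's actual Parser API. */
    interface Parser {
        String parse(String url, byte[] content) throws Exception;
    }

    /** Hypothetical per-document outcome: extracted text on success, an error message on failure. */
    static final class Outcome {
        final String text;   // null when parsing failed
        final String error;  // null when parsing succeeded
        Outcome(String text, String error) { this.text = text; this.error = error; }
        boolean ok() { return error == null; }
    }

    /** Parses every document, recording failures instead of letting one abort the whole batch. */
    static Map<String, Outcome> parseAll(Map<String, byte[]> docs, Parser parser) {
        Map<String, Outcome> results = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : docs.entrySet()) {
            try {
                String text = parser.parse(e.getKey(), e.getValue());
                // A null result is also recorded as a failure, so later stages never receive a null value.
                results.put(e.getKey(), text == null
                    ? new Outcome(null, "parser returned null")
                    : new Outcome(text, null));
            } catch (Exception ex) {
                results.put(e.getKey(), new Outcome(null, String.valueOf(ex.getMessage())));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, byte[]> docs = new LinkedHashMap<>();
        docs.put("http://example.com/a.html", "hello".getBytes());
        docs.put("http://example.com/b.doc", new byte[] {0x00});
        // Toy parser that fails on .doc files, imitating the log message above.
        Parser parser = (url, content) -> {
            if (url.endsWith(".doc")) throw new Exception("Unable to successfully parse content");
            return new String(content);
        };
        for (Map.Entry<String, Outcome> r : parseAll(docs, parser).entrySet()) {
            Outcome o = r.getValue();
            System.out.println(r.getKey() + " -> " + (o.ok() ? "ok" : "failed: " + o.error));
        }
    }
}
```

The design choice is to trade a hard failure for a recorded status. The batch completes, and the failed URLs remain visible for later inspection.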

Re: mapred crawling exception - Job failed!

2006-01-04 Gal Nitzan
Yes, it was fixed. Just update your code from trunk. On Wed, 2006-01-04 at 08:51 +0100, Andrzej Bialecki wrote: Lukas Vlcek wrote: Hi, I am trying to use the latest nutch-trunk version, but I am facing an unexpected "Job failed!" exception. It seems that all the crawling work has already been done

Re: mapred crawling exception - Job failed!

2006-01-04 Lukas Vlcek
Hmmm... If I am reading my local SVN copy correctly, I last updated yesterday, so I have revision 365850 (Update of HTTPClient to v3.0). So this should already be fixed... :-( Andrzej, since you probably made the fix, is there anything special I should check to be sure I have

Re: mapred crawling exception - Job failed!

2006-01-04 Byron Miller
Fixed in the copy I run, as I've been able to get my 100k pages indexed without getting that error. -byron --- Andrzej Bialecki [EMAIL PROTECTED] wrote: Lukas Vlcek wrote: Hi, I am trying to use the latest nutch-trunk version, but I am facing an unexpected "Job failed!" exception. It seems

Re: mapred crawling exception - Job failed!

2006-01-04 Lukas Vlcek
Thanks, guys! I really didn't have the latest copy... L. On 1/4/06, Byron Miller [EMAIL PROTECTED] wrote: Fixed in the copy I run, as I've been able to get my 100k pages indexed without getting that error. -byron --- Andrzej Bialecki [EMAIL PROTECTED] wrote: Lukas Vlcek wrote: Hi,

Re: mapred crawling exception - Job failed!

2006-01-04 Lukas Vlcek
I gave it another try last night and I am still having trouble. This is the very end of my log (the full version is attached), and you can see another nasty exception:
...
060104 213644 map 100%
060104 213645 Optimizing index.
java.lang.NullPointerException: value cannot be null at

Re: mapred crawling exception - Job failed!

2006-01-04 Andrzej Bialecki
Lukas Vlcek wrote: I gave it another try last night and I am still having trouble. This is the very end of my log (the full version is attached), and you can see another nasty exception: Do you use the Fetcher in parsing or non-parsing mode, i.e. do you run a ParseSegment as a separate step? --

mapred crawling exception - Job failed!

2006-01-03 Lukas Vlcek
Hi, I am trying to use the latest nutch-trunk version, but I am facing an unexpected "Job failed!" exception. It seems that all the crawling work has already been done, but some threads are hung, which results in an exception after some timeout. I am not sure whether this is a real Nutch issue or just mine

Re: mapred crawling exception - Job failed!

2006-01-03 Lukas Vlcek
Note: I mistakenly used the nutch-user address as the reply-to value. Feel free to reply to either nutch-dev or nutch-user, as I monitor both of them :-) Anyway, can anybody tell me how I can easily change the reply-to value in Gmail? I struggle with this all the time, especially when replying to multiple

Re: mapred crawling exception - Job failed!

2006-01-03 Andrzej Bialecki
Lukas Vlcek wrote: Hi, I am trying to use the latest nutch-trunk version, but I am facing an unexpected "Job failed!" exception. It seems that all the crawling work has already been done, but some threads are hung, which results in an exception after some timeout. This was fixed (or should be fixed