Huh...
anybody interested in this?
Normally I wouldn't be so pushy, but to me it seems that Nutch dies if it
meets a Word document which can't be parsed. This seems like a serious
issue to me.
Or did I overlook something important/fundamental?
Lukas
On 1/6/06, Lukas Vlcek [EMAIL PROTECTED] wrote:
Lukas Vlcek wrote:
How can I find that out?
What I do is run the regular one-step command [/bin/nutch crawl]
In that case your nutch-default.xml / nutch-site.xml decides, there is a
boolean option there. If you didn't change this, then it defaults to
true (i.e. your fetcher is parsing the
Hi,
I found the reason for that exception!
If you look into my crawl.log carefully, you will notice these lines:
060104 213608 Parsing
[http://220.000.000.001/otd_04_Detailed_Design_Document.doc] with
[EMAIL PROTECTED]
060104 213609 Unable to successfully parse content
Yes, it was fixed. Just update your code from trunk.
On Wed, 2006-01-04 at 08:51 +0100, Andrzej Bialecki wrote:
Lukas Vlcek wrote:
Hi,
I am trying to use the latest nutch-trunk version but I am facing an
unexpected Job failed! exception. It seems that all crawling work
has already been done
Hmmm...
If I am looking correctly into my local SVN copy then I see I last
updated yesterday - thus I have revision 365850 (Update of HTTPClient
to v3.0). So this should be already fixed... :-(
Andrzej, since you probably did the fix, is there anything special I
should check to be sure I have
Fixed in the copy I run, as I've been able to get my
100k pages indexed without getting that error.
-byron
--- Andrzej Bialecki [EMAIL PROTECTED] wrote:
Lukas Vlcek wrote:
Hi,
I am trying to use the latest nutch-trunk version
but I am facing an
unexpected Job failed! exception. It seems
Thanks guys!
I really didn't have the latest copy...
L.
On 1/4/06, Byron Miller [EMAIL PROTECTED] wrote:
Fixed in the copy I run, as I've been able to get my
100k pages indexed without getting that error.
-byron
--- Andrzej Bialecki [EMAIL PROTECTED] wrote:
Lukas Vlcek wrote:
Hi,
I gave it another try last night and I still have trouble.
This is the very end of my log (full version is attached) and you can
see another nasty exception:
...
060104 213644 map 100%
060104 213645 Optimizing index.
java.lang.NullPointerException: value cannot be null
at
Lukas Vlcek wrote:
I gave it another try last night and I still have trouble.
This is the very end of my log (full version is attached) and you can
see another nasty exception:
Do you use the Fetcher in parsing or non-parsing mode, i.e. do you run a
ParseSegment as a separate step?
--
Hi,
I am trying to use the latest nutch-trunk version but I am facing an
unexpected Job failed! exception. It seems that all crawling work
has already been done but some threads are hung, which results in an
exception after some timeout.
I am not sure whether this is a real Nutch issue or just mine
Note: I mistakenly used the nutch-user address as the reply-to value. Feel free
to reply to either nutch-dev or nutch-user as I monitor both of them
:-)
Anyway, can anybody tell me how I can easily change the reply-to value in
Gmail? I am struggling with this all the time, especially when replying
to multiple
Lukas Vlcek wrote:
Hi,
I am trying to use the latest nutch-trunk version but I am facing an
unexpected Job failed! exception. It seems that all crawling work
has already been done but some threads are hung, which results in an
exception after some timeout.
This was fixed (or should be fixed