Hi,

I've spent some time working on this as well. I've just put together a
blog entry addressing the issues I ran into. See
http://techvineyard.blogspot.com/2010/12/build-nutch-20.html

In a nutchsell, I changed three pieces in Gora and Nutch code:
- flush the datastore regularly in the Hadoop RecordWriter (in GoraOutputFormat)
- wait for Hadoop job completion in the Fetcher job
- ensure that the content length limit is not exceeded in the
protocol-http plugin (only needed for the MySQL datastore)
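For anyone curious about the first change, the idea is simply to flush every N records instead of only when the writer is closed. Below is a rough, language-neutral sketch in Python, not the actual Gora patch: the store with put()/flush(), the names BufferedStoreWriter and InMemoryStore, and the flush_every parameter are all hypothetical stand-ins (the real GoraOutputFormat and DataStore are Java and differ in detail).

```python
class InMemoryStore:
    """Hypothetical stand-in datastore: put() buffers rows, flush() persists them."""

    def __init__(self):
        self.buffer = {}
        self.persisted = {}
        self.flush_count = 0

    def put(self, key, value):
        self.buffer[key] = value

    def flush(self):
        self.persisted.update(self.buffer)
        self.buffer.clear()
        self.flush_count += 1


class BufferedStoreWriter:
    """Hypothetical record writer that flushes its store every `flush_every` writes."""

    def __init__(self, store, flush_every=1000):
        if flush_every < 1:
            raise ValueError("flush_every must be >= 1")
        self.store = store
        self.flush_every = flush_every
        self.pending = 0  # records written since the last flush

    def write(self, key, value):
        self.store.put(key, value)
        self.pending += 1
        # Flush periodically so buffered rows cannot pile up without bound
        if self.pending >= self.flush_every:
            self.store.flush()
            self.pending = 0

    def close(self):
        # Final flush for records written since the last periodic flush
        if self.pending:
            self.store.flush()
            self.pending = 0


# Usage: 7 writes with flush_every=3 -> flushes after writes 3 and 6, plus one on close
store = InMemoryStore()
writer = BufferedStoreWriter(store, flush_every=3)
for i in range(7):
    writer.write(f"url{i}", f"page{i}")
writer.close()
```

The periodic flush bounds how much data sits unflushed at any time, and the flush in close() picks up the tail that did not fill a whole batch.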

>> So what am I missing?
>
> I don't know, we need more information. BTW, dev@ list may be more
> appropriate for this discussion.
>


I agree this should not be on the nutch-user list. Post a comment on my
blog entry or reply to my thread
(http://www.mail-archive.com/[email protected]/msg01385.html) on
the dev list!

Alexis.
