Hi, I've spent some time working on this as well. I've just put together a blog entry addressing the issues I ran into. See http://techvineyard.blogspot.com/2010/12/build-nutch-20.html
In a nutchsell, I changed three pieces in the Gora and Nutch code:
- flush the datastore regularly in the Hadoop RecordWriter (in GoraOutputFormat)
- wait for Hadoop job completion in the Fetcher job
- ensure that the content length limit is not exceeded in the protocol-http plugin (only for the MySQL datastore)

>> So what am I missing?
>
> I don't know, we need more information. BTW, dev@ list may be more
> appropriate for this discussion.
>

I agree this should not be on the nutch-user list. Post a comment on my blog entry or reply to my thread (http://www.mail-archive.com/[email protected]/msg01385.html) on the dev list!

Alexis.

