It looks like we should at least give 0.12.1 a try then. The worst case is
that Nutch users have to keep speculative execution disabled if it causes
grief again. If other problems arise, we can simply revert to 0.11.2, which
seems to be stable for all the Nutch operations.
----- Original Message ----
From: Andrzej Bialecki <[EMAIL PROTECTED]>
To: nutch-dev@lucene.apache.org
Sent: Sunday, March 11, 2007 4:34:38 PM
Subject: Hadoop 0.11.2 vs. 0.12.1
Hi all,
After our discussion about which Hadoop release to use for the upcoming
Nutch release, I decided to ask around on the Hadoop mailing list. The
message was clear that we should go with 0.12.1 - see below:
Owen O'Malley wrote:
>
> On Mar 10, 2007, at 12:32 AM, Andrzej Bialecki wrote:
>
>>> I think the experience on big clusters at Yahoo! is that 0.12.1
>>> should be more stable than 0.11.2, but others can confirm that.
>>
>> Hm.. That's not the impression I have from JIRA and the mailing list.
>> My impression is that even though 0.12.1 is more robust in some
>> situations, the significant changes (checksum filesystem, speculative
>> execution, in memory sorting, improved map output handling, etc, etc)
>> made between these releases introduced many subtle bugs which only
>> now start coming into light.
>
> We never upgraded our main clusters to 11.2 because it never
> stabilized to our satisfaction, which is why I was proposing an 11.3.
> However, 12.1 is looking pretty good with the exception of a couple
> of bugs and we decided to hold out for 12.1. At this point, if I was
> going to 11, I'd want a lot of the fixes that have been done in between.
The 0.12.x release has speculative execution turned on by default, but I
remember that there were places in Nutch that would break when using
PhasedFileSystem (which is what Hadoop uses when run in that mode). I'm
afraid there might be other issues here as well - no one has tested Nutch
with 0.12 to be sure that it works OK.
On the other hand, I only tested 0.11.2 in a limited production environment,
so the bugs Owen referred to may still be lurking there and only show up
when you run larger jobs (or different jobs).
What do you think?
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com