Hello everyone,

I am pretty new to Whirr and I would like to share my doubts with you. On my laptop I am using the latest version of CDH available from the repositories, and I expected it to behave the same way in the cloud, since I used the "cloudera recipe" to launch a CDH cluster on EC2 with the m1.large instance type. Here is the Whirr config file: http://pastebin.com/JXHYvMNb

First of all, Whirr seems to install a different version of CDH in the cloud, according to the output of "hadoop version". I don't think this is a big deal, but still...

Secondly, some options have absolutely no effect:
- hadoop-mapreduce.mapred.child.java.opts=-Xmx1600m is basically ignored, even though it is present in mapred-site.xml (on my laptop it works). According to ps aux, the task JVMs still run with -Xmx200m, the default. I also tried to manually edit mapred-site.xml and add mapred.map.child.java.opts=-Xmx1000m, and I got an error saying there was not enough space (4 GB were free).
- hadoop-hdfs.dfs.replication=1 is ignored as well: I see a replication factor of 3 when I move my data from S3 to HDFS.
- hadoop-hdfs.dfs.block.size=134217728: I actually don't know how to check whether this has affected the HDFS config.

I hope someone can shed some light, and maybe share the configuration tweaks he/she is using for m1.large.

Thank you,
Marco Didonna
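For the block size question, one rough way to see whether the recipe's hadoop-* properties at least made it into the generated site files is to read those files back. The Python sketch below assumes Whirr's convention of writing hadoop-common.*, hadoop-hdfs.* and hadoop-mapreduce.* keys into core-site.xml, hdfs-site.xml and mapred-site.xml respectively (consistent with the java opts showing up in mapred-site.xml). The helper names and the sample XML are hypothetical. It only checks file contents, not what the running daemons or the job client actually used, and it converts dfs.block.size from bytes to MiB (134217728 bytes is 128 MiB).

```python
import xml.etree.ElementTree as ET

# Assumed Whirr convention: "hadoop-<group>.<property>" keys in the recipe
# are written into the matching Hadoop site file.
PREFIX_TO_SITE_FILE = {
    "hadoop-common": "core-site.xml",
    "hadoop-hdfs": "hdfs-site.xml",
    "hadoop-mapreduce": "mapred-site.xml",
}


def split_whirr_key(key):
    """Split e.g. 'hadoop-hdfs.dfs.replication' into (site file, property).

    Raises KeyError for a prefix outside the assumed convention.
    """
    prefix, _, prop = key.partition(".")
    return PREFIX_TO_SITE_FILE[prefix], prop


def read_site_xml(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_overrides(whirr_props, site_files):
    """Compare recipe values with what the site files actually contain.

    site_files maps a file name (e.g. 'hdfs-site.xml') to its XML text.
    A missing file or property shows up as actual=None.
    """
    parsed = {name: read_site_xml(text) for name, text in site_files.items()}
    report = {}
    for key, expected in whirr_props.items():
        site, prop = split_whirr_key(key)
        actual = parsed.get(site, {}).get(prop)
        report[key] = (site, prop, expected, actual)
    return report


def block_size_mib(size_bytes):
    """dfs.block.size is given in bytes; convert it to MiB."""
    return size_bytes / (1024 * 1024)


if __name__ == "__main__":
    recipe = {
        "hadoop-mapreduce.mapred.child.java.opts": "-Xmx1600m",
        "hadoop-hdfs.dfs.replication": "1",
        "hadoop-hdfs.dfs.block.size": "134217728",
    }
    # Hypothetical file contents; on a cluster node you would read the real
    # files from the Hadoop configuration directory instead.
    sample_files = {
        "mapred-site.xml": (
            "<configuration><property>"
            "<name>mapred.child.java.opts</name><value>-Xmx1600m</value>"
            "</property></configuration>"
        ),
        "hdfs-site.xml": "<configuration></configuration>",
    }
    for key, (site, prop, expected, actual) in check_overrides(recipe, sample_files).items():
        status = "OK" if actual == expected else "MISSING or DIFFERENT"
        print(f"{site}: {prop} expected={expected} actual={actual} -> {status}")
    print("dfs.block.size =", block_size_mib(134217728), "MiB")
```

If a value shows up in the file but ps aux still reports -Xmx200m, the setting reached the file but evidently not the process that launched the task.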
