The files are also created on the local machine in ~/.whirr/cluster-name/, so it shouldn't be that hard. From my point of view, the only tricky part is matching the Hadoop version.
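As a rough sketch of what that could look like from a remote driver: assuming the Whirr version in use writes a client-side hadoop-site.xml into ~/.whirr/<cluster-name>/ (check what your version actually generates there), the driver can load that file explicitly instead of relying on whatever *-site.xml happens to be on its classpath. The class name, helper name, and cluster name below are made up for illustration.

    import java.io.File;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class WhirrDriverConfig {

        /**
         * Build a Configuration for a driver running off-cluster by loading
         * the client-side config Whirr writes to ~/.whirr/<cluster-name>/.
         * The file name (hadoop-site.xml) is an assumption -- verify it
         * against the contents of that directory for your Whirr version.
         */
        public static Configuration forWhirrCluster(String clusterName) {
            File siteXml = new File(
                    System.getProperty("user.home"),
                    ".whirr/" + clusterName + "/hadoop-site.xml");

            Configuration conf = new Configuration();
            // addResource layers the Whirr-generated values
            // (fs.default.name, mapred.job.tracker, ...) over the
            // in-jar defaults, so the driver talks to the right cluster.
            conf.addResource(new Path(siteXml.getAbsolutePath()));
            return conf;
        }

        public static void main(String[] args) {
            // "myhadoopcluster" is a placeholder cluster name.
            Configuration conf = forWhirrCluster("myhadoopcluster");
            System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        }
    }

This avoids copying the *-site.xml files from the namenode by hand, since Whirr already generates a matching config locally; the Hadoop client jars on the driver's classpath still have to match the cluster's Hadoop version, which is the tricky part mentioned above.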
On Wed, Oct 5, 2011 at 11:01 PM, John Conwell <[email protected]> wrote:

> This whole scenario does bring up the question of how people handle this
> kind of scenario. To me the beauty of Whirr is that it means I can spin up
> and down Hadoop clusters on the fly when my workflow demands it. If a task
> gets queued up that needs MapReduce, I spin up a cluster, solve my problem,
> gather my data, kill the cluster, and the workflow goes on.
>
> But if my workflow requires the contents of three little files located on a
> different machine, in a different cluster, and possibly a different cloud
> vendor, that really puts a damper on the whimsical on-the-flyness of
> creating Hadoop resources only when needed. I'm curious how other people
> are handling this scenario.
>
>
> On Wed, Oct 5, 2011 at 12:45 PM, Andrei Savu <[email protected]> wrote:
>
>> Awesome! I'm glad we figured this out, I was getting worried that we had
>> a critical bug.
>>
>> On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <[email protected]> wrote:
>>
>>> OK... I think I figured it out. This email thread made me take a look at
>>> how I'm kicking off my Hadoop job. My Hadoop driver, the class that links a
>>> bunch of jobs together in a workflow, is on a different machine than the
>>> cluster that Hadoop is running on. This means that when I create a new
>>> Configuration() object, it tries to load the default Hadoop values from
>>> the classpath, but since the driver isn't running on the Hadoop cluster and
>>> doesn't have access to the Hadoop cluster's configuration files, it just uses
>>> the default values... config for suck.
>>>
>>> So I copied the *-site.xml files from my namenode over to the machine my
>>> Hadoop job driver was running from and put them on the classpath, and
>>> shazam... it picked up the Hadoop config that Whirr created for me. Yay!
>>>
>>>
>>> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <[email protected]> wrote:
>>>
>>>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <[email protected]> wrote:
>>>>
>>>>> It looks like Hadoop is reading default configuration values from
>>>>> somewhere and using them, and not reading from
>>>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>>
>>>> If you are running CDH the config files are in:
>>>>
>>>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>>>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>>>
>>>> See
>>>> https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>>>
>>>
>>> --
>>> Thanks,
>>> John C
>
>
> --
> Thanks,
> John C
