Any type should work. We can change it later.

On Thu, Oct 6, 2011 at 12:07 AM, John Conwell <[email protected]> wrote:
> Do you guys want it logged as a bug, feature, improvement? Does it matter?
>
> On Wed, Oct 5, 2011 at 1:32 PM, Andrei Savu <[email protected]> wrote:
>
>> I understand. From my point of view this is a bug we should fix. Can
>> you open an issue?
>>
>> On Wed, Oct 5, 2011 at 11:25 PM, John Conwell <[email protected]> wrote:
>>
>>> I thought about that, but the hadoop-site.xml created by whirr has
>>> some of the info needed; it's not the full set of xml elements that
>>> get written to the *-site.xml files on the hadoop cluster. For
>>> example, whirr sets *mapred.reduce.tasks* based on the number of
>>> task trackers, which is vital for the job configuration to have, but
>>> the hadoop-site.xml doesn't have this value. It only has the core
>>> properties needed to let you use the ssh proxy to interact with the
>>> name node and job tracker.
>>>
>>> On Wed, Oct 5, 2011 at 1:11 PM, Andrei Savu <[email protected]> wrote:
>>>
>>>> The files are also created on the local machine in
>>>> ~/.whirr/cluster-name/, so it shouldn't be that hard. The only
>>>> tricky part, from my point of view, is to match the Hadoop version.
>>>>
>>>> On Wed, Oct 5, 2011 at 11:01 PM, John Conwell <[email protected]> wrote:
>>>>
>>>>> This whole scenario does raise the question of how people handle
>>>>> this situation. To me the beauty of whirr is that I can spin up
>>>>> and tear down hadoop clusters on the fly when my workflow demands
>>>>> it. If a task gets queued up that needs mapreduce, I spin up a
>>>>> cluster, solve my problem, gather my data, kill the cluster, and
>>>>> the workflow goes on.
>>>>>
>>>>> But if my workflow requires the contents of three little files
>>>>> located on a different machine, in a different cluster, and
>>>>> possibly with a different cloud vendor, that really puts a damper
>>>>> on the whimsical on-the-flyness of creating hadoop resources only
>>>>> when needed. I'm curious how other people are handling this
>>>>> scenario.
>>>>>
>>>>> On Wed, Oct 5, 2011 at 12:45 PM, Andrei Savu <[email protected]> wrote:
>>>>>
>>>>>> Awesome! I'm glad we figured this out; I was getting worried that
>>>>>> we had a critical bug.
>>>>>>
>>>>>> On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <[email protected]> wrote:
>>>>>>
>>>>>>> Ok... I think I figured it out. This email thread made me take a
>>>>>>> look at how I'm kicking off my hadoop job. My hadoop driver, the
>>>>>>> class that links a bunch of jobs together in a workflow, is on a
>>>>>>> different machine than the cluster that hadoop is running on.
>>>>>>> This means that when I create a new Configuration() object, it
>>>>>>> tries to load the default hadoop values from the classpath, but
>>>>>>> since the driver isn't running on the hadoop cluster and doesn't
>>>>>>> have access to the hadoop cluster's configuration files, it just
>>>>>>> uses the default values... a config that sucks.
>>>>>>>
>>>>>>> So I copied the *-site.xml files from my namenode over to the
>>>>>>> machine my hadoop job driver was running on, put them on the
>>>>>>> classpath, and shazam... it picked up the hadoop config that
>>>>>>> whirr created for me. Yay!
>>>>>>>
>>>>>>> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> It looks like hadoop is reading default configuration values
>>>>>>>>> from somewhere and using them, and not reading from
>>>>>>>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>>>>>>
>>>>>>>> If you are running CDH the config files are in:
>>>>>>>>
>>>>>>>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>>>>>>>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>>>>>>>
>>>>>>>> See
>>>>>>>> https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>
> --
> Thanks,
> John C
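A minimal sketch in Java of the workaround discussed above: instead of
relying on the driver's classpath, load the site file whirr writes under
~/.whirr/cluster-name/ (or the *-site.xml copies pulled from the namenode)
into the Configuration explicitly. The DriverConfig class and the cluster
name are illustrative assumptions, not part of whirr or Hadoop.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // Hypothetical helper: build a Configuration for a remote whirr
    // cluster from the locally generated site file.
    public class DriverConfig {

        public static Configuration forCluster(String clusterName) {
            Configuration conf = new Configuration();
            String whirrDir = System.getProperty("user.home")
                    + "/.whirr/" + clusterName;
            // hadoop-site.xml carries the core properties (namenode,
            // job tracker, ssh proxy). Without it, new Configuration()
            // falls back to the built-in defaults, as happened here.
            conf.addResource(new Path(whirrDir + "/hadoop-site.xml"));
            return conf;
        }

        public static void main(String[] args) {
            Configuration conf = forCluster("myhadoopcluster");
            System.out.println("mapred.job.tracker = "
                    + conf.get("mapred.job.tracker"));
            // Cluster-side values such as mapred.reduce.tasks are only
            // present if the full *-site.xml files were also copied
            // over from the namenode, per John's fix.
            System.out.println("mapred.reduce.tasks = "
                    + conf.get("mapred.reduce.tasks", "<not set>"));
        }
    }

Any job built from a Configuration loaded this way should then submit to
the whirr-managed cluster rather than falling back to the local defaults.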
