Re: Starting up hadoop cluster programmatically

2011-01-19 Thread Tom White
The JSch source doesn't have any references to proxyHost, so I'm
guessing these properties don't have any effect. However, the JSch
home page does mention connecting through an HTTP proxy. Perhaps you
could ask on the JSch list how to achieve this?
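
For what it's worth, JSch does ship a ProxyHTTP class, so a connection
routed through an HTTP proxy might look something like the sketch below.
This is plain JSch API rather than anything jclouds exposed, so treat it
as an illustration, not a drop-in fix:

import com.jcraft.jsch.JSch;
import com.jcraft.jsch.ProxyHTTP;
import com.jcraft.jsch.Session;

public class ProxiedSshExample {
  public static Session connect(String user, String host,
      String proxyHost, int proxyPort) throws Exception {
    JSch jsch = new JSch();
    // Key or password authentication setup is omitted for brevity.
    Session session = jsch.getSession(user, host, 22);
    // Tunnel the SSH connection through the HTTP proxy (CONNECT method).
    session.setProxy(new ProxyHTTP(proxyHost, proxyPort));
    session.connect();
    return session;
  }
}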

Cheers,
Tom

On Wed, Jan 19, 2011 at 6:40 AM, Andrei Savu savu.and...@gmail.com wrote:
 I don't think it's possible to open SSH connections over an HTTP/HTTPS company
 proxy. I believe you need a SOCKS proxy, but unfortunately, as far as I know,
 jsch does not support proxy connections.

 I believe that you need to find a way to have direct SSH access to the Amazon 
 Cloud.

 Maybe there is a simpler way. You should wait for more replies.

 -original message-
 Subject: Starting up hadoop cluster programmatically
 From: praveen.pe...@nokia.com
 Date: 19/01/2011 16:33

 Hi all,
 I am trying to create a cluster dynamically using a Java program. I wrote a
 program similar to HadoopServiceController in the trunk code. I also added
 proxy information as system properties (since I am inside my company
 network). I saw that the cluster got created but the configuration failed.
 Here is the output. I am sure I am missing the proxy details somewhere else.

 I set proxy details as follows:
 System.setProperty("http.proxyHost", "xxx.xxx.com");
 System.setProperty("http.proxyPort", "");
 System.setProperty("https.proxyHost", "xxx.xxx.com");
 System.setProperty("https.proxyPort", "");

 Here is the output:

 INFO: Starting up cluster...
 Jan 18, 2011 3:29:02 PM 
 org.apache.whirr.cluster.actions.BootstrapClusterAction doAction
 INFO: Bootstrapping cluster
 Jan 18, 2011 3:29:03 PM 
 org.apache.whirr.cluster.actions.BootstrapClusterAction buildTemplate
 INFO: Configuring template
 Jan 18, 2011 3:29:08 PM 
 org.apache.whirr.cluster.actions.BootstrapClusterAction$1 call
 INFO: Starting 1 node(s) with roles [tt, dn]
 Jan 18, 2011 3:29:08 PM 
 org.apache.whirr.cluster.actions.BootstrapClusterAction buildTemplate
 INFO: Configuring template
 Jan 18, 2011 3:29:11 PM 
 org.apache.whirr.cluster.actions.BootstrapClusterAction$1 call
 INFO: Starting 1 node(s) with roles [jt, nn]
  << problem applying options to node(556284):
 org.jclouds.ssh.SshException: root@184.106.155.148:22: Error connecting to 
 session.
        at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:250)
        at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:200)
        at 
 org.jclouds.compute.util.ComputeUtils.runCallablesOnNode(ComputeUtils.java:202)
        at 
 org.jclouds.compute.util.ComputeUtils.runOptionsOnNode(ComputeUtils.java:151)
        at org.jclouds.compute.util.ComputeUtils$1.call(ComputeUtils.java:116)
        at org.jclouds.compute.util.ComputeUtils$1.call(ComputeUtils.java:112)
        at 
 org.jclouds.compute.strategy.impl.EncodeTagIntoNameRunNodesAndAddToSetStrategy$1.call(EncodeTagIntoNameRunNodesAndAddToSetStrategy.java:93)
        at 
 org.jclouds.compute.strategy.impl.EncodeTagIntoNameRunNodesAndAddToSetStrategy$1.call(EncodeTagIntoNameRunNodesAndAddToSetStrategy.java:86)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
 Caused by: com.jcraft.jsch.JSchException: java.net.ConnectException: 
 Connection timed out
        at com.jcraft.jsch.Util.createSocket(Util.java:386)
        at com.jcraft.jsch.Session.connect(Session.java:182)
        at com.jcraft.jsch.Session.connect(Session.java:150)
        at 
 org.jclouds.ssh.jsch.JschSshClient.newSession(JschSshClient.java:245)
        at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:184)
        ... 11 more





Re: Running Mapred jobs after launching cluster

2011-01-27 Thread Tom White
You don't need to add anything to the classpath, but you need to use
the configuration in the org.apache.whirr.service.Cluster object to
populate your Hadoop Configuration object so that your code knows
which cluster to connect to. See the getConfiguration() method in
HadoopServiceController for how to do this.
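
As a rough sketch of the idea (the master host parameters below stand in
for whatever you extract from the Cluster object, and the ports are the
usual Hadoop defaults - both are assumptions here, so check
getConfiguration() for the authoritative version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopyExample {
  public static void copyToCluster(String nameNodeHost, String jobTrackerHost,
      String localFile, String hdfsDir) throws Exception {
    Configuration conf = new Configuration();
    // Without these properties FileSystem.get() falls back to the local
    // filesystem, which is the behaviour described below.
    conf.set("fs.default.name", "hdfs://" + nameNodeHost + ":8020/");
    conf.set("mapred.job.tracker", jobTrackerHost + ":8021");
    FileSystem fs = FileSystem.get(conf);
    fs.copyFromLocalFile(false, true, new Path(localFile), new Path(hdfsDir));
  }
}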

Cheers,
Tom

On Thu, Jan 27, 2011 at 12:21 PM,  praveen.pe...@nokia.com wrote:
 Hello all,
 I wrote a Java class HadoopLauncher that is very similar to
 HadoopServiceController. I was successfully able to launch a cluster
 programmatically from my application using Whirr. Now I want to copy files to
 HDFS and also run a job programmatically.

 When I copy a file to HDFS it's copied to the local file system, not HDFS. Here
 is the code I used:

 Configuration conf = new Configuration();
 FileSystem hdfs = FileSystem.get(conf);
 hdfs.copyFromLocalFile(false, true, new Path(localFilePath), new
 Path(hdfsFileDirectory));

 Do I need to add anything else to the classpath so the Hadoop libraries know
 that they need to talk to the dynamically launched cluster? When running
 Whirr from the command line I know it uses HADOOP_CONF_DIR to find the Hadoop
 config files, but when doing the same from Java I am wondering how to solve
 this issue.

 Praveen




Re: Error while running cassandra using whirr

2011-01-28 Thread Tom White
On Fri, Jan 28, 2011 at 12:53 AM, Ashish paliwalash...@gmail.com wrote:
 Folks,

 I followed the instructions from
 http://www.philwhln.com/quickly-launch-a-cassandra-cluster-on-amazon-ec2

 Using whirr-0.2.0-incubating stable release

I suggest trying with 0.3.0 (out soon, or available from svn now, as
the blog outlines) since the Cassandra code has changed quite a bit
since 0.2.0.


 Instances are launched, but at the end it displays an error while
 connecting to it.

 The following exception is printed:

 Authorizing firewall
 Running configuration script
 Exception in thread "main" java.io.IOException:
 org.jclouds.compute.RunScriptOnNodesException: error runScript on
 filtered nodes options(RunScriptOptions [overridingCredentials=true,
 runAsRoot=true])
 Execution failures:

 0 error[s]
 Node failures:

 1) SshException on node us-east-1/i-17f3497b:
 org.jclouds.ssh.SshException: ec2-user@50.16.165.161:22: Error
 connecting to session.
        at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:250)
        at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:204)
        at 
 org.jclouds.compute.internal.BaseComputeService$4.call(BaseComputeService.java:375)
        at 
 org.jclouds.compute.internal.BaseComputeService$4.call(BaseComputeService.java:364)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
 Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
 Caused by: com.jcraft.jsch.JSchException: Auth fail
        at com.jcraft.jsch.Session.connect(Session.java:452)
        at com.jcraft.jsch.Session.connect(Session.java:150)
        at 
 org.jclouds.ssh.jsch.JschSshClient.newSession(JschSshClient.java:245)
        at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:184)
        ... 7 more


 Hadoop cluster runs fine with the whirr release.

 My configuration file is plain and simple, picked from the blog with the
 real values replaced.

 Am I missing anything out here?

 thanks
 ashish



Re: Running Mapred jobs after launching cluster

2011-01-28 Thread Tom White
On Fri, Jan 28, 2011 at 12:06 PM,  praveen.pe...@nokia.com wrote:
 Thanks Tom. I think I got it working with my own driver so I will go with it 
 for now (unless that proves to be a bad option).

 BTW, could you tell me how to stick with one Hadoop version while launching a
 cluster? I have hadoop-0.20.2 in my classpath but it looks like Whirr gets
 the latest Hadoop from the repository. Since the latest version may be
 different depending on the time, I would like to stick to one version so that
 a Hadoop version mismatch won't happen.

You do need to make sure that the versions are the same. See the
Hadoop integration tests, which specify the version of Hadoop to use
in their POM.
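
For example, something like this in your application's POM (a hypothetical
snippet - adjust the artifact and version to what you actually use) pins
the client-side Hadoop version to match the cluster:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2</version>
</dependency>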


 Also, what jar files are necessary for launching a cluster using Java? Currently
 I have the CLI version of the jar file, but that's way too large since it has
 everything in it.

You need Whirr's core and Hadoop jars, as well as their dependencies.
If you look at the POMs in the source code they will tell you the
dependencies.

Cheers
Tom


 Thanks
 Praveen

 -Original Message-
 From: ext Tom White [mailto:tom.e.wh...@gmail.com]
 Sent: Friday, January 28, 2011 2:12 PM
 To: whirr-user@incubator.apache.org
 Subject: Re: Running Mapred jobs after launching cluster

 On Fri, Jan 28, 2011 at 6:28 AM,  praveen.pe...@nokia.com wrote:
 Thanks Tom. Could you elaborate a little more on the second option?

 What is the HADOOP_CONF_DIR here, after launching the cluster?

 ~/.whirr/cluster-name

 When you said "run in a new process", did you mean using the command-line Whirr tool?

 I meant that you could launch Whirr using the CLI, or Java. Then run the job 
 in another process, with HADOOP_CONF_DIR set.
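
 From Java, that could be as simple as the following sketch (the jar and
 class names are made up for illustration):

 // Run a stock Hadoop job in a child process, pointing it at the
 // configuration directory Whirr wrote out for the cluster.
 public static int runJob() throws Exception {
   ProcessBuilder pb = new ProcessBuilder(
       "hadoop", "jar", "some-job.jar", "org.example.SomeJob");
   pb.environment().put("HADOOP_CONF_DIR",
       System.getProperty("user.home") + "/.whirr/myhadoopcluster");
   return pb.start().waitFor();
 }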

 The MR jobs you are running I assume can be run against an arbitrary cluster, 
 so you should be able to point them at a cluster started by Whirr.

 Tom


 I may finally end up writing my own driver for running external mapred jobs 
 so I can have more control but I was just curious to know if option #2 is 
 better than writing my own driver.

 Praveen

 -Original Message-
 From: ext Tom White [mailto:t...@cloudera.com]
 Sent: Thursday, January 27, 2011 4:01 PM
 To: whirr-user@incubator.apache.org
 Subject: Re: Running Mapred jobs after launching cluster

 If they implement the Tool interface then you can set configuration on them. 
 Failing that you could set HADOOP_CONF_DIR and run them in a new process.
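
 For example (a sketch - the job parameter stands in for any Tool
 implementation, which includes many Mahout drivers):

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;

 // conf is the Configuration populated from the Whirr Cluster object;
 // ToolRunner hands it to the job before running it.
 public static int runJob(Configuration conf, Tool job, String[] args)
     throws Exception {
   return ToolRunner.run(conf, job, args);
 }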

 Cheers,
 Tom

 On Thu, Jan 27, 2011 at 12:52 PM,  praveen.pe...@nokia.com wrote:
 Hmm...
 Some of the MapReduce jobs I am running were written by me, but some are
 in external libraries (e.g. Mahout) which I don't have control over. Since I
 can't modify the code in external libraries, is there any other way to make
 this work?

 Praveen

 -Original Message-
 From: ext Tom White [mailto:tom.e.wh...@gmail.com]
 Sent: Thursday, January 27, 2011 3:42 PM
 To: whirr-user@incubator.apache.org
 Subject: Re: Running Mapred jobs after launching cluster

 You don't need to add anything to the classpath, but you need to use the 
 configuration in the org.apache.whirr.service.Cluster object to populate 
 your Hadoop Configuration object so that your code knows which cluster to 
 connect to. See the getConfiguration() method in HadoopServiceController 
 for how to do this.

 Cheers,
 Tom

 On Thu, Jan 27, 2011 at 12:21 PM,  praveen.pe...@nokia.com wrote:
 Hello all,
 I wrote a Java class HadoopLauncher that is very similar to
 HadoopServiceController. I was successfully able to launch a cluster
 programmatically from my application using Whirr. Now I want to copy
 files to HDFS and also run a job programmatically.

 When I copy a file to HDFS it's copied to the local file system, not HDFS.
 Here is the code I used:

 Configuration conf = new Configuration(); FileSystem hdfs =
 FileSystem.get(conf); hdfs.copyFromLocalFile(false, true, new
 Path(localFilePath), new Path(hdfsFileDirectory));

 Do I need to add anything else to the classpath so the Hadoop libraries
 know that they need to talk to the dynamically launched cluster? When
 running Whirr from the command line I know it uses HADOOP_CONF_DIR to
 find the Hadoop config files, but when doing the same from Java I am
 wondering how to solve this issue.

 Praveen







[ANNOUNCE] Apache Whirr 0.3.0-incubating released

2011-01-30 Thread Tom White
The Apache Whirr team is pleased to announce the release of Whirr
0.3.0-incubating from the Apache Incubator.

Apache Whirr is a set of libraries for running cloud services such as
Apache Hadoop, HBase, ZooKeeper, and Cassandra.

The release is available here:
http://www.apache.org/dyn/closer.cgi/incubator/whirr/

The full change log is available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=1230&version=12315487

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
http://incubator.apache.org/whirr/

Thanks to everyone who contributed to this release.

The Apache Whirr Team


Re: Whirr and HBase

2011-02-01 Thread Tom White
Try removing the CDH lines (the whirr.hadoop-install-runurl and
whirr.hadoop-configure-runurl properties). I don't think that this
combination works yet.

Tom
On Feb 1, 2011 7:53 AM, Paolo Castagna castagna.li...@googlemail.com
wrote:
 Andrei Savu wrote:
 Could you share the recipe? I want to try to replicate the issue on my
computer.

 
 whirr.cluster-name=myhbase
 whirr.instance-templates=1 zk+nn+jt+hbase-master,3 dn+tt+hbase-regionserver
 whirr.hadoop-install-runurl=cloudera/cdh/install
 whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
 whirr.provider=ec2

 whirr.identity=
 whirr.credential=

 # See also: http://aws.amazon.com/ec2/instance-types/
 # t1.micro, m1.small, m1.large, m1.xlarge, m2.xlarge, m2.2xlarge, m2.4xlarge,
 # c1.medium, c1.xlarge, cc1.4xlarge
 # whirr.hardware-id=m1.large
 # Ubuntu 10.04 LTS Lucid. See also: http://alestic.com/
 # whirr.image-id=eu-west-1/ami-0d9ca979
 # If you choose a different location, make sure whirr.image-id is updated too
 # whirr.location-id=eu-west-1

 #whirr.hardware-id=m1.large
 #whirr.location-id=us-east-1
 #whirr.image-id=us-east-1/ami-f8f40591

 whirr.hardware-id=m1.xlarge
 whirr.image-id=us-east-1/ami-da0cf8b3
 whirr.location-id=us-east-1

 whirr.private-key-file=${sys:user.home}/.ssh/whirr
 whirr.public-key-file=${sys:user.home}/.ssh/whirr.pub
 

 I made several attempts (you can see them commented out).
 For the last one, I was using m1.xlarge with us-east-1/ami-da0cf8b3.

 Paolo


 On Tue, Feb 1, 2011 at 5:32 PM, Paolo Castagna
 castagna.li...@googlemail.com wrote:
 Hi,
 I am trying to run a small HBase cluster using Whirr 0.3.0-incubating
 and, since that does not start the HBase master or install Hadoop
 correctly, Whirr from trunk.

 When I run it from trunk with a recipe very similar to the one provided
 in the recipes folder, I see these errors in the whirr.log:


 2011-02-01 15:11:58,484 DEBUG [jclouds.compute] (user thread 9) << stderr
 from runscript as ubuntu@50.16.158.231
 + [[ hbase != \h\b\a\s\e ]]
 + HBASE_HOME=/usr/local/hbase-0.89.20100924
 + HBASE_CONF_DIR=/usr/local/hbase-0.89.20100924/conf
 + update_repo
 + which dpkg
 + sudo apt-get update
 + install_hbase
 + id hadoop
 + useradd hadoop
 useradd: group hadoop exists - if you want to add this user to that group,
 use -g

 [...]

 2011-02-01 15:12:26,370 DEBUG [jclouds.compute] (user thread 2) << stderr
 from computeserv as ubuntu@50.16.158.231
 + HBASE_VERSION=hbase-0.89.20100924
 + [[ hbase != \h\b\a\s\e ]]
 + HBASE_HOME=/usr/local/hbase-0.89.20100924
 + HBASE_CONF_DIR=/usr/local/hbase-0.89.20100924/conf
 + configure_hbase
 + case $CLOUD_PROVIDER in
 + MOUNT=/mnt
 + mkdir -p /mnt/hbase
 + chown hadoop:hadoop /mnt/hbase
 chown: invalid user: `hadoop:hadoop'

 Is this a known problem?

 Paolo








Re: Whirr supports redhat?

2011-02-18 Thread Tom White
I used the API guide and curl to get the information for server and
flavour IDs. There is probably a better way that I don't know about
though.

Cheers,
Tom

On Fri, Feb 18, 2011 at 4:15 PM,  praveen.pe...@nokia.com wrote:
 Thanks Tom. I will let the list know how it goes. Where do I get the 
 information about what the whirr.image-id property should be?

 Praveen

 On Feb 18, 2011, at 7:12 PM, ext Tom White tom.e.wh...@gmail.com wrote:

 Hi Praveen,

 I haven't tried Hadoop on Cloudservers with Redhat, but the scripts do
 support RPM-based systems (like Amazon's Linux AMI). Please let the
 list know if you get it working with this combination and consider
 contributing a configuration recipe.

 Cheers,
 Tom

 On Fri, Feb 18, 2011 at 2:46 PM,  praveen.pe...@nokia.com wrote:
 Does Whirr support spawning a Hadoop cluster on Red Hat on Rackspace? I could
 not find any documentation related to Red Hat. Currently I am using Ubuntu
 and it works great, but we need to use Red Hat ultimately.

 Thanks
 Praveen




Re: image-id to specify a m1.large hardware in EC2

2011-04-13 Thread Tom White
Hi Patricio,

In the past I've used

whirr.hardware-id=m1.large
whirr.image-id=us-east-1/ami-da0cf8b3
whirr.location-id=us-east-1

Hope that helps.

Tom

2011/4/13 Patricio Echagüe patric...@gmail.com:
 Hi all, I need to create m1.large nodes in EC2 and was wondering if
 someone knows what image-id to specify in the properties file for whirr in
 order to do that.

 whirr.image-id=ami-2a1fec43  (apparently the default doesn't work for
 m1.large hardware)


 With the previous default AMI, if I set whirr.hardware-id=m1.large it
 throws an exception saying that it is an incompatible type for the AMI.


 Currently the default is m1.small and when I run the benchmark TestDFSIO,
 all my datanodes are timing out.


 So my thought here is that perhaps a hardware type m1.large or bigger can
 help.


 Any help will be much appreciated


 Thanks


Re: Custom install and config functions when calling Whirr from code

2011-05-26 Thread Tom White
Hi John,

The functions directory itself needs to be on the classpath. You can
achieve this by including it in your application JAR (like the Whirr
service JARs do), or by adding it to the application classpath (like
the bin/whirr script does).
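
For example, a listing of your application JAR might look like this (the
file names are illustrative - the point is that functions/ sits at the
root of the JAR):

$ jar tf my-app.jar
functions/install_cassandra.sh
functions/configure_cassandra.sh
com/example/MyClusterLauncher.class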

Hope that helps.

Cheers,
Tom

On Thu, May 26, 2011 at 3:29 PM, John Conwell j...@iamjohn.me wrote:
 I have customized the install and config functions for cassandra, and put
 these two files in the functions folder, and it works great from the
 command-line whirr utility.
 But when launching a cluster via the Service.launchCluster() method, how do
 you specify a custom function file for either install or configuration?
 Thanks,
 John C



Re: How to use OtherAction?

2011-06-02 Thread Tom White
On Thu, Jun 2, 2011 at 3:06 PM, Andrei Savu savu.and...@gmail.com wrote:
 I understand. Tom should be able to tell us more about the intended
 usage scenario for OtherAction.

The other-action call was just to cover the case of new events being
added that are not explicitly exposed in ClusterActionHandlerSupport. It's
not currently used.


 I just want to add that in 0.5.0 we've added support for remote script
 execution to the CLI:
 https://issues.apache.org/jira/browse/WHIRR-173

 ... and I believe what you need is a similar mechanism available in
 the core API.

By adding a new EXECUTE_ACTION, and extending
ClusterActionHandlerSupport to expose before/afterExecute methods?

Tom


 -- Andrei Savu

 On Fri, Jun 3, 2011 at 12:59 AM, John Conwell j...@iamjohn.me wrote:
 Well, the reason I want to know is twofold.  First, I'm using the whirr
 core API to spin up and provision multiple clusters.  But on one of my
 clusters, I'd need to push out a shell script to execute after the configure
 action happens.  Basically, my code needs to generate a bunch of stuff,
 which it then uses to create the shell script dynamically, and I want to use
 the OtherAction to push that shell script out to the instances in the
 cluster to execute.
 Second, I just like to know how generic extension points work, because I
 like to come up with interesting ways to integrate APIs into what I'm
 working on

 Thanks,
 John
 On Thu, Jun 2, 2011 at 2:48 PM, Andrei Savu savu.and...@gmail.com wrote:

 What's your use case? Maybe there is another way.

 As far as I know no service is using the OtherAction event.

 -- Andrei Savu / andreisavu.ro

 On Fri, Jun 3, 2011 at 12:46 AM, John Conwell j...@iamjohn.me wrote:
  In looking at the ClusterActionHandlerSupport, I notice the before/after
  OtherAction event.  Is this functional?  How can I trigger this event?
 
  --
 
  Thanks,
  John C
 



 --

 Thanks,
 John C




[ANNOUNCE] Apache Whirr 0.5.0-incubating released

2011-06-05 Thread Tom White
The Apache Whirr team is pleased to announce the release of Whirr
0.5.0-incubating from the Apache Incubator.

This is the fifth incubating release of Apache Whirr, a set of libraries for
running cloud services such as Apache Hadoop, HBase, ZooKeeper, and
Cassandra.

The release is available here:
http://www.apache.org/dyn/closer.cgi/incubator/whirr/

The full change log is available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12316248&styleName=Html&projectId=1230

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
http://incubator.apache.org/whirr/

The Apache Whirr Team


Re: set up two cassandra clusters?

2011-06-07 Thread Tom White
Do the clusters have different names? Can you supply the stack trace
you're getting from whirr.log?

Cheers,
Tom

On Tue, Jun 7, 2011 at 4:45 PM, Khanh Nguyen nguyen.h.kh...@gmail.com wrote:
 Hi,

 I want to launch another Cassandra cluster on EC2 but I keep getting
 an exception like this:

 Exception in thread "main" java.lang.IllegalStateException: The
 permission '209.6.54.22/32-1-9160-9160' has already been authorized on
 the specified group

 Essentially, I am trying to launch two clusters. One uses
 RandomPartitioner and the other uses ByteOrderedPartitioner. Thanks.

 Cheers,

 -k



Re: hadoop security and ssh proxy

2011-06-14 Thread Tom White
The proxy is not used for security (which would be better provided by
a firewall), but to make the datanode addresses resolve correctly for
the client. Without the proxy the datanodes return their internal
addresses which are not routable by the client (which runs in an
external network typically).

I agree that it would be better if we could replace the proxy with
something better, such as
https://issues.apache.org/jira/browse/WHIRR-81.

On Tue, Jun 14, 2011 at 9:26 AM, John Conwell j...@iamjohn.me wrote:
 I get the whole "security is a good thing" thing, but could someone give me
 a description as to why, when whirr configures hadoop, it sets up the ssh
 proxy to disallow all comms to the data / task nodes except via the name node
 over the proxy?  If I'm running on EC2, won't correctly setting up security
 groups give me enough security?
 The reason I ask is that I'm using Whirr through its API to
 automate...well...all the cool things whirr does.  But the key point is
 automation.  After a hadoop cluster is up and running I'd like the program
 to kick off a hadoop job, monitor jobs and tasks.  But that means my program
 has to launch hadoop-proxy.sh somehow, capture the PID of the process, kick
 off my hadoop job, then when done, kill the process via the PID.  The whole
 calling a shell script, capturing the PID, persisting it, and killing it all
 through my java automation just seems a bit duct-tape and baling-wire'ish.

You can run the proxy from Java via HadoopProxy, which handles all
these details for you.
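
Something along these lines (a sketch - the package name and lifecycle
methods are from memory, so check the class itself):

import org.apache.whirr.service.Cluster;
import org.apache.whirr.service.ClusterSpec;
import org.apache.whirr.service.hadoop.HadoopProxy;

public class ProxyExample {
  public static void withProxy(ClusterSpec spec, Cluster cluster)
      throws Exception {
    HadoopProxy proxy = new HadoopProxy(spec, cluster);
    proxy.start();
    try {
      // kick off and monitor Hadoop jobs here
    } finally {
      proxy.stop();  // no shell script, PID capture, or manual kill needed
    }
  }
}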


 So I'm trying to figure out why we have the whole hadoop-proxy.sh thing in
 the first place (specifically within the context of EC2)

 --

 Thanks,
 John C


Cheers,
Tom


Re: Is Service.launchCluster thread safe?

2011-06-14 Thread Tom White
On Tue, Jun 14, 2011 at 3:46 PM, John Conwell j...@iamjohn.me wrote:
 So as an FYI, I just tested using the whirr API to start multiple clusters
 at the same time using Futures, and it (seems to) work great.  It really cuts
 down on the time to ramp up a set of clusters (like 4 or more).  Yay

Great. This would be a nice thing to put on the wiki or in the docs as
a usage example. Would you like to do this?

Thanks,
Tom


 On Fri, Jun 10, 2011 at 11:38 AM, Andrei Savu savu.and...@gmail.com wrote:

 John,

 I don't think we've checked this. Could you open an issue? We should at
 least update the docs.

 To be safe you should create multiple instances, one for each thread. I
 don't think you need to worry about the amount of memory used.

 In 0.5.0 we've done some performance improvements that are relevant to
 you.

 Sent from my phone.

 Cheers,

 -- Andrei

 On Jun 10, 2011 8:38 PM, John Conwell j...@iamjohn.me wrote:
  In 0.4.0, is Service.launchCluster(ClusterSpec) thread safe? Through the
  API I'm spinning up 3 clusters, one after another, and I'd like to
  change
  the code to launch them in parallel, but wanted to check if this is
  threadsafe.
 
  --
 
  Thanks,
  John C



 --

 Thanks,
 John C



Re: hadoop security and ssh proxy

2011-06-15 Thread Tom White
On Wed, Jun 15, 2011 at 10:18 AM, John Conwell j...@iamjohn.me wrote:
 Ok, that makes sense.  Thanks for the clarification.  It
 is definitely unwieldy when trying to integrate whirr's API into another API
 to wrap spinning up hadoop clusters, and getting it to work without any
 manual steps.

Agreed, but it is possible - see the Hadoop integration tests which
are an example of spinning up a Hadoop cluster from Java in a
completely automated fashion.

Tom



 On Tue, Jun 14, 2011 at 5:13 PM, Tom White tom.e.wh...@gmail.com wrote:

 The proxy is not used for security (which would be better provided by
 a firewall), but to make the datanode addresses resolve correctly for
 the client. Without the proxy the datanodes return their internal
 addresses which are not routable by the client (which runs in an
 external network typically).

 I agree that it would be better if we could replace the proxy with
 something better, such as
 https://issues.apache.org/jira/browse/WHIRR-81.

 On Tue, Jun 14, 2011 at 9:26 AM, John Conwell j...@iamjohn.me wrote:
  I get the whole "security is a good thing" thing, but could someone give
  me
  a description as to why when whirr configures hadoop it sets up the ssh
  proxy to disallow all comms to the data / task nodes except via the name
  node
  over the proxy?  If I'm running on EC2, won't correctly setting up
  security
  groups give me enough security?
  The reason I ask is that I'm using Whirr through its API to
  automate...well...all the cool things whirr does.  But the key point is
  automation.  After a hadoop cluster is up and running I'd like the
  program
  to kick off a hadoop job, monitor jobs and tasks.  But that means my
  program
  has to launch hadoop-proxy.sh somehow, capture the PID of the process,
  kick
  off my hadoop job, then when done, kill the process via the PID.  The
  whole
  calling a shell script, capturing the PID, persisting it, and killing it
  all
  through my java automation just seems a bit duct-tape and
  bailing-wire'ish.

 You can run the proxy from Java via HadoopProxy, which handles all
 these details for you.

 
  So I'm trying to figure out why we have the whole hadoop-proxy.sh thing
  in
  the first place (specifically within the context of EC2)
 
  --
 
  Thanks,
  John C
 

 Cheers,
 Tom



 --

 Thanks,
 John C



Re: Is Service.launchCluster thread safe?

2011-06-15 Thread Tom White
There's a page at src/site/xdoc/api-guide.xml, which might be a good
place for this.

Thanks!
Tom

On Wed, Jun 15, 2011 at 10:34 AM, John Conwell j...@iamjohn.me wrote:
 Yea, I can do that.  What section would you want it in?  Maybe a new section
 on using whirr via its API?

 On Tue, Jun 14, 2011 at 5:20 PM, Tom White tom.e.wh...@gmail.com wrote:

 On Tue, Jun 14, 2011 at 3:46 PM, John Conwell j...@iamjohn.me wrote:
  So as an FYI, I just tested using the whirr API to start multiple
  clusters
  at the same time using Futures, and it (seems to) work great.  It really
  cuts
  down on the time to ramp up a set of clusters (like 4 or more).  Yay

 Great. This would be a nice thing to put on the wiki or in the docs as
 a usage example. Would you like to do this?

 Thanks,
 Tom

 
  On Fri, Jun 10, 2011 at 11:38 AM, Andrei Savu savu.and...@gmail.com
  wrote:
 
  John,
 
  I don't think we've checked this. Could you open an issue? We should at
  least update the docs.
 
  To be safe you should create multiple instances, one for each thread. I
  don't think you need to worry about the amount of memory used.
 
  In 0.5.0 we've done some performance improvements that are relevant to
  you.
 
  Sent from my phone.
 
  Cheers,
 
  -- Andrei
 
  On Jun 10, 2011 8:38 PM, John Conwell j...@iamjohn.me wrote:
   In 0.4.0, is Service.launchCluster(ClusterSpec) thread safe? Through
   the
   API I'm spinning up 3 clusters, one after another, and I'd like to
   change
   the code to launch them in parallel, but wanted to check if this is
   threadsafe.
  
   --
  
   Thanks,
   John C
 
 
 
  --
 
  Thanks,
  John C
 



 --

 Thanks,
 John C



Re: Execute a cmd via jclouds SshClient that requires sudo privs on VM started by Whirr

2011-06-16 Thread Tom White
You could write your own predicate that does a cast. See
ClusterController.runningInGroup() for something similar.
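
A minimal sketch of what that could look like, assuming Guava's Predicate
and the jclouds metadata types:

import com.google.common.base.Predicate;
import org.jclouds.compute.domain.ComputeMetadata;
import org.jclouds.compute.domain.NodeMetadata;
import org.jclouds.compute.predicates.NodePredicates;

public class SingleNodePredicate {
  public static Predicate<NodeMetadata> withId(final String nodeId) {
    final Predicate<ComputeMetadata> byId = NodePredicates.withIds(nodeId);
    return new Predicate<NodeMetadata>() {
      @Override
      public boolean apply(NodeMetadata node) {
        // NodeMetadata extends ComputeMetadata, so this narrows the type.
        return byId.apply(node);
      }
    };
  }
}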

Cheers,
Tom

On Thu, Jun 16, 2011 at 9:47 AM, John Conwell j...@iamjohn.me wrote:
 Pulled the code from RunScriptCommand as an example, and I think I'm good in
 that respect.
 I'm having issues with runScriptOnNodesMatching() when I want to only run
 the script on one node in the group.  I have the node id of the target node,
 but I'm not sure how to create a predicate that targets just one node based
 on the id.  NodePredicates.withIds() returns a Predicate<ComputeMetadata>,
 but runScriptOnNodesMatching takes a Predicate<NodeMetadata>.
 The jclouds wiki states "Individual commands are executed against a specific
 node's id", but that doesn't really explain how to do this.
 On Wed, Jun 15, 2011 at 1:21 PM, Andrei Savu savu.and...@gmail.com wrote:

 Take a look at the RunScriptCommand class.

 You need to call the function somehow like this:

 StatementBuilder builder = new StatementBuilder();
 builder.addStatements(
  Statements.appendFile("/tmp/my.cfg", lines),
  exec(getFileContent(scriptPath))
 );
 controller.runScriptOnNodesMatching(
  spec, condition, builder);

 -- Andrei

 On Wed, Jun 15, 2011 at 8:38 PM, John Conwell j...@iamjohn.me wrote:
  I looked at computeService.runScriptOnNodesMatching a while ago and
  couldn't make much sense of how to use the API correctly (for uploading
  a script and running it).  Is there a good unit test that shows how to
  do this?
 
  On Tue, Jun 14, 2011 at 2:03 AM, Andrei Savu savu.and...@gmail.com
  wrote:
 
  You could also try to use computeService.runScriptOnNodesMatching and
  upload the file using an AppendFile jclouds statement together with
  the credentials from the cluster spec file.
 
  This approach is similar to what RunScriptCommand is doing.
 
  -- Andrei Savu / andreisavu.ro
 
  On Mon, Jun 13, 2011 at 11:16 PM, John Conwell j...@iamjohn.me wrote:
   the AMI is us-east-1/ami-da0cf8b3
    and the OS is the default that whirr installs, Ubuntu 10.04 methinks.
  
   On Mon, Jun 13, 2011 at 1:13 PM, Andrei Savu savu.and...@gmail.com
   wrote:
  
   That should work. What ami, OS are you using?
  
   On Jun 13, 2011 10:22 PM, John Conwell j...@iamjohn.me wrote:
Which user?  Whirr creates a user called "ubuntu", and it also creates
a user that is specified via whirr.cluster-user.  I tried the user that
is specified via whirr.cluster-user and that didn't work.
   
On Mon, Jun 13, 2011 at 12:16 PM, Andrei Savu
savu.and...@gmail.com
wrote:
   
The user created by Whirr should be able to do sudo without
requesting
a
password.
On Jun 13, 2011 8:55 PM, John Conwell j...@iamjohn.me wrote:
 I've got a cluster that gets started by Whirr.  After it's running, I need
 to create a config file and copy it up to a specific folder on the VM.  I'm
 using the class.method SshClient.put() to copy the file up to the /tmp
 directory on my VM with no security issues.  But then I need to copy the
 file to a different folder, via the SshClient.exec method.  But the cp
 command requires sudo because the user whirr created for me doesn't have
 privs to copy to the required folder.  Also, I can't specify a password
 with the sudo command because the connection was made using x509
 certificates.

 So how can I execute remote ssh commands that require sudo privs?

 --

 Thanks,
 John C
   
   
   
   
--
   
Thanks,
John C
  
  
  
   --
  
   Thanks,
   John C
  
 
 
 
  --
 
  Thanks,
  John C
 



 --

 Thanks,
 John C



Re: Run commands via whirr

2011-08-24 Thread Tom White
Looks great - I've often wanted something like this. I think adding
"whirr run-cmd" would be the way to add this, since then it's
integrated with the whirr command.

Thanks,
Tom

On Wed, Aug 24, 2011 at 2:06 AM, Karel Vervaeke ka...@outerthought.org wrote:
 I'd be happy to do the jira dance.

 On Tue, Aug 23, 2011 at 6:35 PM, Andrei Savu savu.and...@gmail.com wrote:
 Looks good! How about adding this as a script in bin/, e.g. bin/whirr-cmd?

 -- Andrei

 On Tue, Aug 23, 2011 at 8:50 AM, Karel Vervaeke ka...@outerthought.org 
 wrote:
 I got bored with writing little scripts when using whirr run-script, so I
 hacked this up. Maybe it's useful to someone.

 Usage examples:
 whirrcmd --cluster=recipes/mycluster.properties "sudo /usr/bin/jps status"
 whirrcmd --cluster=recipes/mycluster.properties --roles=hbase-master "sudo /usr/bin/jps status"

 Perhaps it'd be better to add it to whirr (e.g. whirr run-script
 --command= or whirr run-cmd ...),
 but I don't know if it's useful enough.

 Here's the ugly bit:

 whirrcmd() {
  local whirr_args
  tmpfile=$(mktemp --suffix .sh)

  whirr_args=("${@:1:$#-1}")
  cmd_arg="${@:$#}"

  cat > $tmpfile <<EOF
 #!/bin/bash

 $cmd_arg
 EOF

  whirr run-script "${whirr_args[@]}" --script=$tmpfile
  rm $tmpfile
 }



 --
 Karel Vervaeke
 http://outerthought.org/
 Open Source Content Applications
 Makers of Kauri, Daisy CMS and Lily





 --
 Karel Vervaeke
 http://outerthought.org/
 Open Source Content Applications
 Makers of Kauri, Daisy CMS and Lily