Hi guys, I'm fairly new to cloud computing and about 2 days into using
CloudMan for Galaxy.

I have set up an m2.4xlarge master node with flexible scaling for up to 4
workers of the m2.xlarge type, with a minimum of one.
I was able to upload 6 samples of paired-end RNA-seq data (12 files), with
gz file sizes of around 5-8 GB.
Grooming the files took about a day, but my previous experience was on an
in-house Galaxy install which was pretty small, so I didn't think anything
of it at the time.
I started 3 TopHat jobs and noticed the UI being a bit sluggish to respond,
so I added a 4th one, hoping it might push the load over the edge and onto
the worker nodes.

Unfortunately, 12 hours later, the TopHat jobs are still running, the master
node is way over 100% load, and the worker is reported idle.
While the cluster log has a few errors in it, it ends by saying that the
worker instance is ready, despite an error (return code 1) when adding it
as an SGE execution host.

Has anyone run into this before, or does anyone have any insight into fixing
it? I'm pasting the lower portion of the status log below.

Thanks very much for any help! (Also, several other emails about cloud
installs were directed to /dev. If that's not the right place for this
question, I apologize and can change to the /user list.)
-- 
Brian Lin
cont...@brian-lin.com
brian....@tufts.edu





13:15:38 - Instance 'i-e2bb0491' reported alive
13:15:38 - Sent master public key to worker instance 'i-e2bb0491'.
13:15:54 - Adding instance i-e2bb0491 as SGE administrative host.
13:16:06 - Adding instance 'i-e2bb0491' to SGE execution host list.
13:16:11 - Process encountered problems adding instance 'i-e2bb0491' as an SGE execution host. Process returned code 1
13:16:26 - Waiting on worker instance 'i-e2bb0491' to configure itself...
13:17:05 - Instance 'i-e2bb0491' (IP: 23.23.24.81) ready
13:42:23 - Rebooting instance i-e2bb0491 (reboot #3).
13:44:27 - Instance 'i-e2bb0491' reported alive
13:44:27 - Sent master public key to worker instance 'i-e2bb0491'.
13:44:56 - Adding instance i-e2bb0491 as SGE administrative host.
13:44:56 - Adding instance 'i-e2bb0491' to SGE execution host list.
13:44:56 - Process encountered problems adding instance 'i-e2bb0491' as an SGE execution host. Process returned code 1
13:44:56 - Waiting on worker instance 'i-e2bb0491' to configure itself...
13:46:09 - Instance 'i-e2bb0491' (IP: 23.23.24.81) ready
15:19:37 - Rebooting instance i-e2bb0491 (reboot #4).
15:21:16 - Instance 'i-e2bb0491' reported alive
15:21:16 - Sent master public key to worker instance 'i-e2bb0491'.
15:21:38 - Adding instance i-e2bb0491 as SGE administrative host.
15:21:44 - Adding instance 'i-e2bb0491' to SGE execution host list.
15:21:48 - Process encountered problems adding instance 'i-e2bb0491' as an SGE execution host. Process returned code 1
15:21:51 - Waiting on worker instance 'i-e2bb0491' to configure itself...
15:22:13 - Instance 'i-e2bb0491' (IP: 23.23.24.81) ready
15:54:32 - Instance i-e2bb0491 not responding after 4 reboots. Terminating instance.
15:54:32 - Terminating instance i-e2bb0491
15:54:35 - Instance 'i-e2bb0491' removed from the internal instance list.
15:56:10 - Adding 1 on-demand instance(s)
15:56:14 - Cannot get cloud instance object without an instance ID?
15:58:26 - Instance 'i-98932deb' reported alive
15:58:26 - Sent master public key to worker instance 'i-98932deb'.
15:59:02 - Adding instance i-98932deb as SGE administrative host.
15:59:09 - Adding instance 'i-98932deb' to SGE execution host list.
15:59:17 - Successfully added instance 'i-98932deb' to SGE
15:59:17 - Waiting on worker instance 'i-98932deb' to configure itself...
15:59:46 - Instance 'i-98932deb' (IP: 54.234.100.22) ready
16:12:47 - Rebooting instance i-98932deb (reboot #1).
16:12:47 - Rebooting instance i-98932deb (reboot #1).
16:12:47 - Rebooting instance i-98932deb (reboot #1).
16:14:06 - Instance 'i-98932deb' reported alive
16:14:06 - Sent master public key to worker instance 'i-98932deb'.
16:14:29 - Adding instance i-98932deb as SGE administrative host.
16:14:29 - Adding instance 'i-98932deb' to SGE execution host list.
16:14:29 - Process encountered problems adding instance 'i-98932deb' as an SGE execution host. Process returned code 1
16:14:29 - Waiting on worker instance 'i-98932deb' to configure itself...
16:14:39 - Instance 'i-98932deb' (IP: 54.234.100.22) ready
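
In case it helps with triaging a log like the one above, here is a minimal
sketch that tallies reboot messages and SGE add outcomes per instance. It
assumes the log lines keep the wording pasted above; the function name
`summarize` is hypothetical and not part of CloudMan. It counts log lines,
so a reboot message that is printed several times is counted several times.

```python
import re
from collections import defaultdict

# Matches instance IDs such as i-e2bb0491 as they appear in the pasted log.
INSTANCE_RE = re.compile(r"\bi-[0-9a-f]+\b")


def summarize(log_text):
    """Count reboot and SGE add messages per instance ID (hypothetical helper)."""
    stats = defaultdict(lambda: {"reboots": 0, "sge_failed": 0, "sge_ok": 0})
    for line in log_text.splitlines():
        match = INSTANCE_RE.search(line)
        if not match:
            # Wrapped continuation lines carry no instance ID; skip them.
            continue
        entry = stats[match.group(0)]
        if "Rebooting instance" in line:
            entry["reboots"] += 1
        elif "Process encountered problems adding instance" in line:
            entry["sge_failed"] += 1
        elif "Successfully added instance" in line:
            entry["sge_ok"] += 1
    return dict(stats)
```

Calling `summarize()` on the text of the status log returns a dictionary
keyed by instance ID, which makes it easier to see which workers never
joined SGE successfully.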