Re: [galaxy-dev] Galaxy Cloudman - How to analyse 1TB data ?

2012-02-21 Thread Wetzels, Yves [JRDBE Extern]
Hi Enis

 

1.   Created an LVM logical volume on two new EBS volumes.

2.   Mounted the logical volume.

3.   Stopped all services (SGE, PostgreSQL, Galaxy).

4.   Copied all data from the /mnt/galaxyData filesystem to the logical volume.

5.   Unmounted /mnt/galaxyData.

6.   Mounted the logical volume at /mnt/galaxyData.

7.   Restarted all services.
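
In shell terms the procedure was roughly the following (a sketch only; the
device names /dev/sdg and /dev/sdh and the volume group/LV names are
placeholders, not necessarily what I used):

    pvcreate /dev/sdg /dev/sdh
    vgcreate galaxyVG /dev/sdg /dev/sdh
    lvcreate -l 100%FREE -n galaxyLV galaxyVG
    mkfs.xfs /dev/galaxyVG/galaxyLV
    mkdir -p /mnt/lvmData
    mount /dev/galaxyVG/galaxyLV /mnt/lvmData
    # services (SGE, PostgreSQL, Galaxy) stopped here via the CloudMan Admin
    rsync -a /mnt/galaxyData/ /mnt/lvmData/
    umount /mnt/lvmData
    umount /mnt/galaxyData
    mount /dev/galaxyVG/galaxyLV /mnt/galaxyData
    # services restarted afterwards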

 

As I mentioned in my previous posts, all seemed to be OK, but I received a

 

WARNING:galaxy.datatypes.registry:Overriding conflicting datatype with
extension 'coverage', using datatype from /mnt/galaxyData/tmp/tmpGx9fsi.

 

while running the Groom tool.

I didn't know what to do at the time and started messing around, removing
tmp files, restarting SGE, and so on.

I later received the same error on a newly created Galaxy CloudMan
instance with a normal (1 TB) galaxyData filesystem.

Greg Von Kuster replied that I had to remove a duplicate entry from the
datatypes_conf.xml file:

 

Hello Yves,

 

You have one or more entries in your datatypes_conf.xml file for a
datatype named 'coverage'. These should be eliminated from your
datatypes_conf.xml file because they are not valid datatypes (unless you
have added proprietary datatypes with this extension to your Galaxy
instance). They were originally in the datatypes_conf.xml.sample file
for datatype indexers, but datatype indexers have been eliminated from
the Galaxy framework because datatype converters do the same thing.

 

Greg Von Kuster
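
For anyone who hits the same warning: something like the following should
locate the stale entries (the Galaxy path shown is the CloudMan default and
an assumption on my part; adjust it if your instance differs):

    grep -n 'extension="coverage"' \
        /mnt/galaxyTools/galaxy-central/datatypes_conf.xml
    # delete or comment out the matching <datatype .../> lines,
    # then restart Galaxy from the CloudMan Admin console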

 

Currently I am running multiple Galaxy CloudMan instances to circumvent
the 1 TB limit.

If I find some time I will redo the exercise with the LVM.

 

Kind Regards
Yves

 


Re: [galaxy-dev] Galaxy Cloudman - How to analyse 1TB data ?

2012-02-19 Thread Enis Afgan
Hi Yves,
When you create the LVM file system, are you composing it from the volume
that already contains the data (i.e., the directory structure and so on
created by CloudMan) and then adding another volume into the LVM, or
starting with 2 new, clean volumes?
Maybe trying again and not messing with SGE at all would at least resolve
the SGE issue; SGE lives on the root file system, so it should be fine as
is. I'd suggest stopping the Galaxy and PostgreSQL services (from the
CloudMan Admin console), then, from the CLI, unmounting the galaxyData file
system and proceeding to create the LVM. Mount the new file system and
ensure the directories and data that were there are still present. Then
start the PostgreSQL and Galaxy services back up. See if it all comes up
fine and, if it does, try adding a worker node.
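
For example, a quick sanity check after the remount (the directory names
are just the CloudMan defaults):

    df -h /mnt/galaxyData    # size should now reflect the LVM volume
    ls /mnt/galaxyData       # original contents (e.g. files/, tmp/) still there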

Currently, CloudMan does not support composing a file system from multiple
volumes, but I would think that as long as you did not restart the cluster
and created the file system manually, things would work fine. I've been
thinking about why you're seeing the described behavior and am not really
sure, so please let me know how the above process works out.



On Thu, Feb 16, 2012 at 7:37 PM, Wetzels, Yves [JRDBE Extern] 
ywet...@its.jnj.com wrote:

 Hi Brad

 I did not restart the master CloudMan node.
 I only restarted the services (Galaxy, PostgreSQL and SGE).
 I do not have these problems without creating the logical volume.

 Kind Regards
 Yves


 Yves;
 I'm hoping Enis can jump in here since he is more familiar with the
 internals of CloudMan and may be able to offer better advice. I can tell
 you what I see from your error messages.

  I used LVM2 to create the logical volume.

 Does this involve stopping and restarting the master CloudMan node? The
 error messages you are seeing look like SGE is missing or not properly
 configured on the master node:

  02/15/2012 11:22:08|  main|domU-12-31-39-0A-62-12|E|error opening file
  /opt/sge/default/common/./sched_configuration for reading: No such
  file or directory
 [...]
  DeniedByDrmException: code 17: error: no suitable queues

 which is causing the job submission to fail since it can't find the SGE
 cluster environment to submit to. The strange thing is that SGE is
 present in /opt on the main EBS store, so I wouldn't expect your
 modified /mnt/galaxyData volume to influence this.
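
 A couple of quick checks on the master would confirm whether the SGE cell
 is intact (I'm assuming SGE_ROOT is /opt/sge, as the log paths suggest):

     . /opt/sge/default/common/settings.sh  # missing file = broken cell
     qconf -sql   # list the configured cluster queues
     qstat -f     # per-queue, per-host status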

 Since starting worker nodes appears to be fine, I'd focus on the main
 instance manipulations you are doing. Perhaps repeating the setup without
 creating the logical volume would show which step causes the problem? That
 could help narrow down the issue and hopefully get you running again.

 Hope this helps,
 Brad



Re: [galaxy-dev] Galaxy Cloudman - How to analyse 1TB data ?

2012-02-14 Thread Brad Chapman

Yves;

 I am currently investigating whether Galaxy CloudMan can help us analyze
 large NGS datasets.

 I was first impressed by the simple setup, the autoscaling and the
 usability of Galaxy CloudMan, but I soon ran into the 1 TB EBS limit.

 I thought I would be clever: I unmounted the /mnt/galaxyData EBS volume,
 created a 2 TB logical volume, and remounted that volume at
 /mnt/galaxyData.

How did you create this volume? I know there are some tricks to get
around the 1 TB limit:

http://alestic.com/2009/06/ec2-ebs-raid
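
The idea there is to stripe several EBS volumes together with software
RAID; a minimal sketch (the device names are placeholders, and note that
CloudMan itself won't manage such a device):

    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdg /dev/sdh
    mkfs.xfs /dev/md0
    mkdir -p /mnt/bigdata
    mount /dev/md0 /mnt/bigdata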

In the screenshot you sent, it looks like CloudMan is a bit confused
about the disk size: the Disk Status panel lists 1.2 TB used out of
668 GB, which might be the source of your problems.

 All is green, as you can see from the picture below, but running a tool
 is not possible; I assume Galaxy is not configured to work with a
 logical volume.

Can you describe what errors you are seeing?

 It is truly a waste to have this fine setup (autoscaling) if there is
 not enough storage.

 Does anybody have experience with this? Any tips or tricks?

The more general answer is that folks do not normally use EBS this way,
since keeping large permanent EBS filesystems around is expensive. S3
stores larger data (individual objects up to 5 TB, with no practical
limit on a bucket) at a more reasonable price. Files are copied from S3
to a transient EBS store, processed, and uploaded back to S3. This isn't
as automated, since the details depend heavily on your workflow and
which files you want to save, but it might be worth exploring in general
when using EC2.
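
In its simplest form that staging pattern looks something like this (the
bucket and file names are hypothetical; I'm showing s3cmd, but any S3
client works):

    s3cmd get s3://my-ngs-bucket/run42/reads.fastq.gz /mnt/transient/
    # ... run the analysis against the transient EBS store ...
    s3cmd put /mnt/transient/results.bam s3://my-ngs-bucket/run42/results.bam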

Hope this helps,
Brad
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/