Re: [galaxy-dev] Reducing costs in Cloud Galaxy

2012-03-27 Thread Greg Edwards
Enis,
Thanks, your instructions below worked ok and I have reduced my 700GB to 1
GB.
No doubt a pile of genome tools don't work now but I don't need them.
I see this is described in the Wiki as well, but it was more concise below,
thanks.
cp -r was fine for copying the remnants of /mnt/galaxyIndices.
Cheers,
Greg E



 2. The vast majority of the storage costs are for the Genome databases in
 the 700GB /mnt/galaxyIndices, which I don't need. Can this be reduced to
 the bare essentials?


 You can do this manually:
 1. Start a new Galaxy cluster (i.e., one you can easily delete later)
 2. ssh into the master instance and delete whatever genomes you don't
 need/want (these are all located under /mnt/galaxyIndices)
 3. Create a new EBS volume of a size that'll fit whatever's left on the
 original volume, attach it and mount it
 4. Copy over the data from the original volume to the new one while
 keeping the directory structure the same (rsync is probably the best tool
 for this)
 5. Unmount & detach the new volume; create a snapshot from it
 6. For the cluster you want to keep around (while it is terminated), edit
 persistent_data.yaml in its bucket on S3 and replace the existing snap ID
 for the galaxyIndices with the snapshot ID you got in the previous step
 7. Start that cluster and you should have a file system from the new
 snapshot mounted.
 8. Terminate & delete the cluster you created in step 1

 If you don't want to have to do this the first time around on your custom
 cluster, you can first try it with another temporary cluster and make sure
 it all works as expected and then move on to the real cluster.
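
 For steps 3-5, a minimal sketch of what that can look like on the master
 instance, assuming the new volume shows up as /dev/xvdf and that the device
 name, file system and mount point below are placeholders to adjust for your
 own setup:

   # check how much is left after pruning genomes (step 2)
   du -sh /mnt/galaxyIndices

   # once the new EBS volume is created and attached (e.g. as /dev/xvdf),
   # put a file system on it and mount it (step 3)
   sudo mkfs.ext3 /dev/xvdf
   sudo mkdir -p /mnt/newIndices
   sudo mount /dev/xvdf /mnt/newIndices

   # copy what's left, preserving ownership, permissions and layout (step 4)
   sudo rsync -av /mnt/galaxyIndices/ /mnt/newIndices/

   # unmount before detaching and snapshotting the volume (step 5)
   sudo umount /mnt/newIndices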

 Best,
 Enis




___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Reducing costs in Cloud Galaxy

2012-03-27 Thread Enis Afgan
Awesome! Also glad to hear your rate of success can be quantified as a 700-fold
improvement. Amazing! :)

Best,
Enis

On Wed, Mar 28, 2012 at 2:57 PM, Greg Edwards gedwar...@gmail.com wrote:

 Enis,
 Thanks, your instructions below worked ok and I have reduced my 700GB to 1
 GB.
 No doubt a pile of genome tools don't work now but I don't need them.
 I see this is described in the Wiki as well, but it was more concise
 below, thanks.
 cp -r was fine for copying the remnants of /mnt/galaxyIndices.
 Cheers,
 Greg E



 2. The vast majority of the storage costs are for the Genome databases in
 the 700GB /mnt/galaxyIndices, which I don't need. Can this be reduced to
 the bare essentials?


 You can do this manually:
 1. Start a new Galaxy cluster (i.e., one you can easily delete later)
 2. ssh into the master instance and delete whatever genomes you don't
 need/want (these are all located under /mnt/galaxyIndices)
 3. Create a new EBS volume of a size that'll fit whatever's left on the
 original volume, attach it and mount it
 4. Copy over the data from the original volume to the new one while
 keeping the directory structure the same (rsync is probably the best tool
 for this)
 5. Unmount & detach the new volume; create a snapshot from it
 6. For the cluster you want to keep around (while it is terminated), edit
 persistent_data.yaml in its bucket on S3 and replace the existing snap ID
 for the galaxyIndices with the snapshot ID you got in the previous step
 7. Start that cluster and you should have a file system from the new
 snapshot mounted.
 8. Terminate & delete the cluster you created in step 1

 If you don't want to have to do this the first time around on your custom
 cluster, you can first try it with another temporary cluster and make sure
 it all works as expected and then move on to the real cluster.

 Best,
 Enis






___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Reducing costs in Cloud Galaxy

2012-03-19 Thread Enis Afgan
Greg,
Regarding the performance of different types of instances, I came across
this and thought you might potentially find it useful:
http://cloudharmony.com/benchmarks

Enis

On Mon, Mar 19, 2012 at 7:49 PM, Greg Edwards gedwar...@gmail.com wrote:

 Enis,

 Thanks. Will try that re the storage.

 Greg E


 On Mon, Mar 19, 2012 at 4:49 PM, Enis Afgan eaf...@emory.edu wrote:

 Hi Greg,

 On Mon, Mar 19, 2012 at 11:01 AM, Greg Edwards gedwar...@gmail.com wrote:

 Hi,

 I've got an implementation of some proteomics tools going well in Galaxy
 on AWS EC2 under Cloudman. Thanks for the help along the way.

 I need to drive the costs down a bit. I'm using an m1.large AMI and it's
 costing about $180 - $200 / month. This is about 55% storage and 45%
 instance costs. That's peanuts in some senses but for now we need to get it
 down so that it comes out of petty cash for the department, while the case
 is proven for its use.

 I have a few questions and would appreciate any insights ..


 1. AWS has just released an m1.medium and m1.small instance type, which
 are 1/2 and 1/4 the cost of m1.large.

 http://aws.amazon.com/ec2/instance-types/
 http://aws.amazon.com/ec2/pricing/

 I tried the m1.small and m1.medium with the latest Cloudman AMI,
 galaxy-cloudman-2011-03-22 (ami-da58aab3).
 All seemed to install OK, but the Tools took up to 30 minutes to start
 execution on m1.medium, and never started on m1.small.

 m1.medium only added about 15% to run times compared with m1.large,
 can't say for m1.small. t1.micro does run (and for free in my Free Tier
 first year) but blows execution times out by a factor of about 3 which is
 too much.

 Has anyone tried these new Instance Types ? (m1.small/medium)

 I have no real experience with these instance types yet either so maybe
 someone else can chime in on this?



 2. The vast majority of the storage costs are for the Genome databases
 in the 700GB /mnt/galaxyIndices, which I don't need. Can this be reduced to
 the bare essentials?


 You can do this manually:
 1. Start a new Galaxy cluster (i.e., one you can easily delete later)
 2. ssh into the master instance and delete whatever genomes you don't
 need/want (these are all located under /mnt/galaxyIndices)
 3. Create a new EBS volume of a size that'll fit whatever's left on the
 original volume, attach it and mount it
 4. Copy over the data from the original volume to the new one while
 keeping the directory structure the same (rsync is probably the best tool
 for this)
 5. Unmount & detach the new volume; create a snapshot from it
 6. For the cluster you want to keep around (while it is terminated), edit
 persistent_data.yaml in its bucket on S3 and replace the existing snap ID
 for the galaxyIndices with the snapshot ID you got in the previous step
 7. Start that cluster and you should have a file system from the new
 snapshot mounted.
 8. Terminate & delete the cluster you created in step 1

 If you don't want to have to do this the first time around on your custom
 cluster, you can first try it with another temporary cluster and make sure
 it all works as expected and then move on to the real cluster.

 Best,
 Enis


 Using m1.small/medium and getting rid of the 700GB would bring my costs
 down to say $50 / month which is ok.


 Thanks !
 Greg E


 --
 Greg Edwards,
 Port Jackson Bioinformatics
 gedwar...@gmail.com







 --
 Greg Edwards,
 Port Jackson Bioinformatics
 gedwar...@gmail.com


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Reducing costs in Cloud Galaxy

2012-03-19 Thread Dave Clements
Hi Enis, Greg,

I've taken content from this email and previous conversations with Enis
and put it in the wiki:

  http://wiki.g2.bx.psu.edu/Admin/Cloud/CapacityPlanning

Please feel free to update/correct/enhance.

Dave C.

On Mon, Mar 19, 2012 at 2:58 PM, Enis Afgan eaf...@emory.edu wrote:

 Greg,
 Regarding the performance of different types of instances, I came across
 this and thought you might potentially find it useful:
 http://cloudharmony.com/benchmarks

 Enis

 On Mon, Mar 19, 2012 at 7:49 PM, Greg Edwards gedwar...@gmail.com wrote:

 Enis,

 Thanks. Will try that re the storage.

 Greg E


 On Mon, Mar 19, 2012 at 4:49 PM, Enis Afgan eaf...@emory.edu wrote:

 Hi Greg,

 On Mon, Mar 19, 2012 at 11:01 AM, Greg Edwards gedwar...@gmail.com wrote:

 Hi,

 I've got an implementation of some proteomics tools going well in
 Galaxy on AWS EC2 under Cloudman. Thanks for the help along the way.

 I need to drive the costs down a bit. I'm using an m1.large AMI and
 it's costing about $180 - $200 / month. This is about 55% storage and 45%
 instance costs. That's peanuts in some senses but for now we need to get it
 down so that it comes out of petty cash for the department, while the case
 is proven for its use.

 I have a few questions and would appreciate any insights ..


 1. AWS has just released an m1.medium and m1.small instance type, which
 are 1/2 and 1/4 the cost of m1.large.

 http://aws.amazon.com/ec2/instance-types/
 http://aws.amazon.com/ec2/pricing/

 I tried the m1.small and m1.medium with the latest Cloudman AMI,
 galaxy-cloudman-2011-03-22 (ami-da58aab3).
 All seemed to install OK, but the Tools took up to 30 minutes to start
 execution on m1.medium, and never started on m1.small.

 m1.medium only added about 15% to run times compared with m1.large,
 can't say for m1.small. t1.micro does run (and for free in my Free Tier
 first year) but blows execution times out by a factor of about 3 which is
 too much.

 Has anyone tried these new Instance Types ? (m1.small/medium)

 I have no real experience with these instance types yet either so maybe
 someone else can chime in on this?



 2. The vast majority of the storage costs are for the Genome databases
 in the 700GB /mnt/galaxyIndices, which I don't need. Can this be reduced to
 the bare essentials?


 You can do this manually:
 1. Start a new Galaxy cluster (i.e., one you can easily delete later)
 2. ssh into the master instance and delete whatever genomes you don't
 need/want (these are all located under /mnt/galaxyIndices)
 3. Create a new EBS volume of a size that'll fit whatever's left on the
 original volume, attach it and mount it
 4. Copy over the data from the original volume to the new one while
 keeping the directory structure the same (rsync is probably the best tool
 for this)
 5. Unmount & detach the new volume; create a snapshot from it
 6. For the cluster you want to keep around (while it is terminated), edit
 persistent_data.yaml in its bucket on S3 and replace the existing snap ID
 for the galaxyIndices with the snapshot ID you got in the previous step
 7. Start that cluster and you should have a file system from the new
 snapshot mounted.
 8. Terminate & delete the cluster you created in step 1

 If you don't want to have to do this the first time around on your
 custom cluster, you can first try it with another temporary cluster and
 make sure it all works as expected and then move on to the real cluster.

 Best,
 Enis


 Using m1.small/medium and getting rid of the 700GB would bring my costs
 down to say $50 / month which is ok.


 Thanks !
 Greg E


 --
 Greg Edwards,
 Port Jackson Bioinformatics
 gedwar...@gmail.com







 --
 Greg Edwards,
 Port Jackson Bioinformatics
 gedwar...@gmail.com







-- 
http://galaxyproject.org/GCC2012 http://galaxyproject.org/wiki/GCC2012
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://galaxyproject.org/wiki/
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Reducing costs in Cloud Galaxy

2012-03-19 Thread Dannon Baker
Just one extra thought on this: if you leave your instance up all the time, it
may be worth looking into having a reserved micro instance up as the front end
(cheap, or free, with your intro tier) with SGE submission disabled. Then,
enable autoscaling (max 1) of m1.large/xlarge instances.
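
One manual way to keep the micro master from executing jobs is a small SGE
tweak; this is only a sketch, assuming a stock SGE setup with the default all.q
queue, run as an SGE admin user with the grid engine environment loaded:

  # disable the all.q queue instance on the master so jobs only run on workers
  qmod -d all.q@$(hostname)

  # re-enable it later if you ever want the master to run jobs again
  qmod -e all.q@$(hostname)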

-Dannon


On Mar 19, 2012, at 7:20 PM, Dave Clements wrote:

 Hi Enis, Greg,
 
 I've taken content from this email and previous conversations with Enis and
 put it in the wiki:
 
   http://wiki.g2.bx.psu.edu/Admin/Cloud/CapacityPlanning
 
 Please feel free to update/correct/enhance.
 
 Dave C.
 
 On Mon, Mar 19, 2012 at 2:58 PM, Enis Afgan eaf...@emory.edu wrote:
 Greg,
 Regarding the performance of different types of instances, I came across this 
 and thought you might potentially find it useful: 
 http://cloudharmony.com/benchmarks
 
 Enis
 
 On Mon, Mar 19, 2012 at 7:49 PM, Greg Edwards gedwar...@gmail.com wrote:
 Enis,
 
 Thanks. Will try that re the storage.
 
 Greg E
 
 
 On Mon, Mar 19, 2012 at 4:49 PM, Enis Afgan eaf...@emory.edu wrote:
 Hi Greg,
 
 On Mon, Mar 19, 2012 at 11:01 AM, Greg Edwards gedwar...@gmail.com wrote:
 Hi,
 
 I've got an implementation of some proteomics tools going well in Galaxy on 
 AWS EC2 under Cloudman. Thanks for the help along the way.
 
 I need to drive the costs down a bit. I'm using an m1.large AMI and it's 
 costing about $180 - $200 / month. This is about 55% storage and 45% instance 
 costs. That's peanuts in some senses but for now we need to get it down so 
 that it comes out of petty cash for the department, while the case is proven 
 for its use.
 
 I have a few questions and would appreciate any insights ..
 
 
 1. AWS has just released an m1.medium and m1.small instance type, which are 
 1/2 and 1/4 the cost of m1.large.   
 
 http://aws.amazon.com/ec2/instance-types/ 
 http://aws.amazon.com/ec2/pricing/
 
 I tried the m1.small and m1.medium with the latest Cloudman AMI   
 galaxy-cloudman-2011-03-22 (ami-da58aab3)
 All seemed to install OK, but the Tools took up to 30 minutes to start
 execution on m1.medium, and never started on m1.small.
 
 m1.medium only added about 15% to run times compared with m1.large, can't say 
 for m1.small. t1.micro does run (and for free in my Free Tier first year) but 
 blows execution times out by a factor of about 3 which is too much.
 
 Has anyone tried these new Instance Types ? (m1.small/medium)
 I have no real experience with these instance types yet either so maybe 
 someone else can chime in on this?  
 
 
 2. The vast majority of the storage costs are for the Genome databases in the
 700GB /mnt/galaxyIndices, which I don't need. Can this be reduced to the bare
 essentials?
 
 You can do this manually: 
 1. Start a new Galaxy cluster (i.e., one you can easily delete later)
 2. ssh into the master instance and delete whatever genomes you don't
 need/want (these are all located under /mnt/galaxyIndices)
 3. Create a new EBS volume of a size that'll fit whatever's left on the
 original volume, attach it and mount it
 4. Copy over the data from the original volume to the new one while keeping
 the directory structure the same (rsync is probably the best tool for this)
 5. Unmount & detach the new volume; create a snapshot from it
 6. For the cluster you want to keep around (while it is terminated), edit
 persistent_data.yaml in its bucket on S3 and replace the existing snap ID
 for the galaxyIndices with the snapshot ID you got in the previous step
 7. Start that cluster and you should have a file system from the new snapshot
 mounted.
 8. Terminate & delete the cluster you created in step 1
 
 If you don't want to have to do this the first time around on your custom 
 cluster, you can first try it with another temporary cluster and make sure it 
 all works as expected and then move on to the real cluster.
 
 Best,
 Enis
 
 Using m1.small/medium and getting rid of the 700GB would bring my costs down
 to say $50 / month which is ok.
 
 
 Thanks !
 Greg E
 
 
 -- 
 Greg Edwards,
 Port Jackson Bioinformatics
 gedwar...@gmail.com
 
 
 
 
 
 
 -- 
 Greg Edwards,
 Port Jackson Bioinformatics
 gedwar...@gmail.com
 
 
 
 
 
 
 -- 
 http://galaxyproject.org/GCC2012
 http://galaxyproject.org/
 http://getgalaxy.org/
 http://usegalaxy.org/
 http://galaxyproject.org/wiki/
 

[galaxy-dev] Reducing costs in Cloud Galaxy

2012-03-18 Thread Greg Edwards
Hi,

I've got an implementation of some proteomics tools going well in Galaxy on
AWS EC2 under Cloudman. Thanks for the help along the way.

I need to drive the costs down a bit. I'm using an m1.large AMI and it's
costing about $180 - $200 / month. This is about 55% storage and 45%
instance costs. That's peanuts in some senses but for now we need to get it
down so that it comes out of petty cash for the department, while the case
is proven for its use.

I have a few questions and would appreciate any insights ..


1. AWS has just released an m1.medium and m1.small instance type, which are
1/2 and 1/4 the cost of m1.large.

http://aws.amazon.com/ec2/instance-types/
http://aws.amazon.com/ec2/pricing/

I tried the m1.small and m1.medium with the latest Cloudman AMI,
galaxy-cloudman-2011-03-22 (ami-da58aab3).
All seemed to install OK, but the Tools took up to 30 minutes to start
execution on m1.medium, and never started on m1.small.

m1.medium only added about 15% to run times compared with m1.large, can't
say for m1.small. t1.micro does run (and for free in my Free Tier first
year) but blows execution times out by a factor of about 3 which is too
much.

Has anyone tried these new Instance Types ? (m1.small/medium)


2. The vast majority of the storage costs are for the Genome databases in
the 700GB /mnt/galaxyIndices, which I don't need. Can this be reduced to
the bare essentials?

Using m1.small/medium and getting rid of the 700GB would bring my costs
down to say $50 / month which is ok.


Thanks !
Greg E


-- 
Greg Edwards,
Port Jackson Bioinformatics
gedwar...@gmail.com
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Reducing costs in Cloud Galaxy

2012-03-18 Thread Enis Afgan
Hi Greg,

On Mon, Mar 19, 2012 at 11:01 AM, Greg Edwards gedwar...@gmail.com wrote:

 Hi,

 I've got an implementation of some proteomics tools going well in Galaxy
 on AWS EC2 under Cloudman. Thanks for the help along the way.

 I need to drive the costs down a bit. I'm using an m1.large AMI and it's
 costing about $180 - $200 / month. This is about 55% storage and 45%
 instance costs. That's peanuts in some senses but for now we need to get it
 down so that it comes out of petty cash for the department, while the case
 is proven for its use.

 I have a few questions and would appreciate any insights ..


 1. AWS has just released an m1.medium and m1.small instance type, which
 are 1/2 and 1/4 the cost of m1.large.

 http://aws.amazon.com/ec2/instance-types/
 http://aws.amazon.com/ec2/pricing/

 I tried the m1.small and m1.medium with the latest Cloudman AMI,
 galaxy-cloudman-2011-03-22 (ami-da58aab3).
 All seemed to install OK, but the Tools took up to 30 minutes to start
 execution on m1.medium, and never started on m1.small.

 m1.medium only added about 15% to run times compared with m1.large, can't
 say for m1.small. t1.micro does run (and for free in my Free Tier first
 year) but blows execution times out by a factor of about 3 which is too
 much.

 Has anyone tried these new Instance Types ? (m1.small/medium)

I have no real experience with these instance types yet either so maybe
someone else can chime in on this?



 2. The vast majority of the storage costs are for the Genome databases in
 the 700GB /mnt/galaxyIndices, which I don't need. Can this be reduced to
 the bare essentials?


You can do this manually:
1. Start a new Galaxy cluster (i.e., one you can easily delete later)
2. ssh into the master instance and delete whatever genomes you don't
need/want (these are all located under /mnt/galaxyIndices)
3. Create a new EBS volume of a size that'll fit whatever's left on the
original volume, attach it and mount it
4. Copy over the data from the original volume to the new one while keeping
the directory structure the same (rsync is probably the best tool for this)
5. Unmount & detach the new volume; create a snapshot from it
6. For the cluster you want to keep around (while it is terminated), edit
persistent_data.yaml in its bucket on S3 and replace the existing snap ID
for the galaxyIndices with the snapshot ID you got in the previous step
7. Start that cluster and you should have a file system from the new
snapshot mounted.
8. Terminate & delete the cluster you created in step 1

If you don't want to have to do this the first time around on your custom
cluster, you can first try it with another temporary cluster and make sure
it all works as expected and then move on to the real cluster.
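
For step 6, a rough sketch of one way to do the snapshot ID swap from the
command line, assuming s3cmd is installed and configured; the bucket name and
snapshot IDs below are placeholders (use the bucket CloudMan created for your
cluster and the snapshot ID from step 5):

  # fetch the cluster's persistent_data.yaml from its S3 bucket
  s3cmd get s3://<your-cluster-bucket>/persistent_data.yaml

  # swap the old galaxyIndices snapshot ID for the new one
  sed -i 's/snap-OLDID/snap-NEWID/' persistent_data.yaml

  # upload the edited file back to the same bucket
  s3cmd put persistent_data.yaml s3://<your-cluster-bucket>/persistent_data.yaml

Downloading the file, editing it by hand, and uploading it back through the AWS
console works just as well.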

Best,
Enis


 Using m1.small/medium and getting rid of the 700GB would bring my costs
 down to say $50 / month which is ok.


 Thanks !
 Greg E


 --
 Greg Edwards,
 Port Jackson Bioinformatics
 gedwar...@gmail.com



___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/