Re: [galaxy-dev] Contributing to genome indexes on rsync server

2013-11-08 Thread Björn Grüning
Hi Jen,

fantastic news!

Thanks a lot!
Bjoern

> Thanks,
> 
> There have been no public data updates since the migration started
> (late last spring we froze the data). But there are some known issues
> and data that is ready to be released, in the process of becoming
> ready, etc. We expect to be able to start working on this again in the
> very near term.
> 
> To start this off, the mm9 bowtie2 indexes were restored this morning:
> https://trello.com/c/SbizUDQt
> 
> And these other finds are great, thanks Bjoern. I will add them to the
> public card. A few errors that were corrected later last spring popped
> out again, but will also be fixed. Small, but will be addressed ASAP.
> 
> Adding in any missing .2bit files in general is on our internal to-do
> list. Older genomes have other inconsistencies that will be addressed.
> The goal is to have the data filled in with complete indexes around
> the end of Nov, then filled in with newly released genomes/more
> variants for important model organisms by the end of the year. All of
> this depends on various factors, but this is where we are shooting
> for.
> 
> Thanks and if anything else off is noted, please feel free to send to
> the list or add to the card. All input is welcome - we want this to be
> a great resource for everyone - time to get back to making that happen
> now that the migration is wrapping up!
> 
> Jen
> Galaxy team
> 
> On 11/8/13 2:00 AM, Bjoern Gruening wrote:
> 
> > Hi,
> > 
> > to chime in on this discussion.
> > 
> > I found some inconsistencies during my rsync endeavor and I'm curious
> > if there is any way to contribute to that service.
> > 
> > --
> > xenTro3 xenTro3 Frog (Xenopus tropicalis):
> > xenTro3 /galaxy/data/xenTro3/seq/xenTro3.fa
> > but only
> > /xenTro3.fa.gz exists.
> > ---
> > ce6 /data/0/ref_genomes/ce6/ce6.2bit is missing from twobit.loc
> > ---
> > ce6 has no .fa file under seq/ but in allfasta.loc there is a
> > reference to it ce6 Caenorhabditis elegans:
> > ce6 /galaxy/data/ce6/seq/ce6.fa
> > ---
> > TAIR9 and TAIR10 are not available via rsync
> > ---
> > Bowtie2 indices are missing for ce6, xenTro3
> > 
> > 
> > Thanks,
> > Bjoern
> > 
> > > Hi Jennifer,
> > > 
> > > Today I was trying to pull some bowtie2 indices from Galaxy rsync server 
> > > for PhiX to run some tests and just got the ones for bowtie1… I'm 
> > > wondering what's the state in regards to this past thread and what we can 
> > > do to help in here.
> > > 
> > > Cheers!
> > > Roman
> > > 
> > > On 7 Mar 2013, at 20:01, Jennifer Jackson wrote:
> > > 
> > > > Hi Brad (and Roman),
> > > > 
> > > > The team has talked about this in detail. There are a few wrinkles with 
> > > > just pulling in indexes - Dan is doing some work that could change this 
> > > > later on, but for now, the rsync will continue to point to the same 
> > > > location as Main's genome data source. This means that there are some 
> > > > limits on what we can do immediately. Setting up a submission pipe is 
> > > > one of them - there just isn't resource to do this right now or a 
> > > > common place distinct from Main to house the data. A few other ideas 
> > > > came up - we can chat later, each had side issues.
> > > > 
> > > > But I saw your tweet and think that it is great that you are pulling 
> > > > CloudBioLinux data from the rsync now, so let's get as much data in 
> > > > common as possible, so you have data to work with near term.
> > > > 
> > > > I am in the process of adding bt2 indexes - some are published to 
> > > > Main/rsync server already and some are not, but more will show up over 
> > > > the next week or so (along with more genomes and other indexes). I'll 
> > > > take a look at what you have and pull/match what I can. Genome sort 
> > > > order and variants are my concerns, both require special handling in 
> > > > processing and .locs. If it takes longer to check, I am just going to 
> > > > create here if I haven't already. The GATK-sort hg19 canonical is 
> > > > already on my list - it needed all indexes, not just bw2. When the next 
> > > > distribution goes out, I'll list what is new on the rsync in the News 
> > > > Brief.
> > > > 
> > > > For the Novoalign indexes, I'm not quite sure what to do about those 
> > > > yet. Or for any indexes associated with tools or genomes not hosted on 
> > > > Main. Do you want to open a card for those and any other cases that are 
> > > > similar? We can discuss a strategy from there, maybe at IUC, if 
> > > > Greg/Dan thinks it is appropriate. Please add me so I can follow.
> > > > 
> > > > I'll be in touch as I go through the data. Thanks for your patience on 
> > > > this!
> > > > 
> > > > Jen
> > > > Galaxy team
> > > > 
> > > > On 2/21/13 12:43 PM, Brad Chapman wrote:
> > > >> Hi all;
> > > >> Is there a way for community members to contribute indexes to the rsync
> > > >> server? This resource is awesome and I'm working on migrating the
> > > >> CloudBioLinux retrieval scripts to use this instead of 

Re: [galaxy-dev] Contributing to genome indexes on rsync server

2013-11-08 Thread Jennifer Jackson

Thanks,

There have been no public data updates since the migration started (late 
last spring we froze the data). But there are some known issues and data 
that is ready to be released, in the process of becoming ready, etc. We 
expect to be able to start working on this again in the very near term.


To start this off, the mm9 bowtie2 indexes were restored this morning:
https://trello.com/c/SbizUDQt

And these other finds are great, thanks Bjoern. I will add them to the 
public card. A few errors that were corrected later last spring popped 
out again, but will also be fixed. Small, but will be addressed ASAP.


Adding in any missing .2bit files in general is on our internal to-do 
list. Older genomes have other inconsistencies that will be addressed. 
The goal is to have the data filled in with complete indexes around the 
end of Nov, then filled in with newly released genomes/more variants for 
important model organisms by the end of the year. All of this depends on 
various factors, but this is where we are shooting for.


Thanks and if anything else off is noted, please feel free to send to 
the list or add to the card. All input is welcome - we want this to be a 
great resource for everyone - time to get back to making that happen now 
that the migration is wrapping up!


Jen
Galaxy team

On 11/8/13 2:00 AM, Bjoern Gruening wrote:

Hi,

to chime in on this discussion.

I found some inconsistencies during my rsync endeavor and I'm curious if 
there is any way to contribute to that service.


--
xenTro3 xenTro3 Frog (Xenopus tropicalis): xenTro3 
/galaxy/data/xenTro3/seq/xenTro3.fa

but only
/xenTro3.fa.gz exists.
---
ce6 /data/0/ref_genomes/ce6/ce6.2bit is missing from twobit.loc
---
ce6 has no .fa file under seq/ but in allfasta.loc there is a 
reference to it ce6 Caenorhabditis elegans: ce6 
/galaxy/data/ce6/seq/ce6.fa

---
TAIR9 and TAIR10 are not available via rsync
---
Bowtie2 indices are missing for ce6, xenTro3


Thanks,
Bjoern


Hi Jennifer,

Today I was trying to pull some bowtie2 indices from Galaxy rsync server for 
PhiX to run some tests and just got the ones for bowtie1… I'm wondering what's 
the state in regards to this past thread and what we can do to help in here.

Cheers!
Roman

On 7 Mar 2013, at 20:01, Jennifer Jackson wrote:

> Hi Brad (and Roman),
>
> The team has talked about this in detail. There are a few wrinkles with just
> pulling in indexes - Dan is doing some work that could change this later on,
> but for now, the rsync will continue to point to the same location as Main's
> genome data source. This means that there are some limits on what we can do
> immediately. Setting up a submission pipe is one of them - there just isn't
> resource to do this right now or a common place distinct from Main to house
> the data. A few other ideas came up - we can chat later, each had side issues.
>
> But I saw your tweet and think that it is great that you are pulling
> CloudBioLinux data from the rsync now, so let's get as much data in common as
> possible, so you have data to work with near term.
>
> I am in the process of adding bt2 indexes - some are published to Main/rsync
> server already and some are not, but more will show up over the next week or
> so (along with more genomes and other indexes). I'll take a look at what you
> have and pull/match what I can. Genome sort order and variants are my
> concerns, both require special handling in processing and .locs. If it takes
> longer to check, I am just going to create here if I haven't already. The
> GATK-sort hg19 canonical is already on my list - it needed all indexes, not
> just bw2. When the next distribution goes out, I'll list what is new on the
> rsync in the News Brief.
>
> For the Novoalign indexes, I'm not quite sure what to do about those yet. Or
> for any indexes associated with tools or genomes not hosted on Main. Do you
> want to open a card for those and any other cases that are similar? We can
> discuss a strategy from there, maybe at IUC, if Greg/Dan thinks it is
> appropriate. Please add me so I can follow.
>
> I'll be in touch as I go through the data. Thanks for your patience on this!
>
> Jen
> Galaxy team
>
> On 2/21/13 12:43 PM, Brad Chapman wrote:
>> Hi all;
>> Is there a way for community members to contribute indexes to the rsync
>> server? This resource is awesome and I'm working on migrating the
>> CloudBioLinux retrieval scripts to use this instead of the custom S3
>> buckets we'd set up previously:
>>
>> https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py
>>
>> It's great to have this as a public shared resource and I'd like to be
>> able to contribute back. From an initial pass, here are the things I'd
>> like to do:
>>
>> - Include bowtie2 indexes for more genomes.
>>
>> - Include novoalign indexes for a number of commonly used genomes.
>>
>> - Clean up hg19 to include a full canonically sorted hg19, with indexes.
>>   Broad has a nice version prepped so GATK will be happy

Re: [galaxy-dev] Contributing to genome indexes on rsync server

2013-11-08 Thread Bjoern Gruening
Hi,

to chime in on this discussion.

I found some inconsistencies during my rsync endeavor and I'm curious if
there is any way to contribute to that service.

--
xenTro3 xenTro3 Frog (Xenopus tropicalis):
xenTro3 /galaxy/data/xenTro3/seq/xenTro3.fa
but only
/xenTro3.fa.gz exists.
---
ce6 /data/0/ref_genomes/ce6/ce6.2bit is missing from twobit.loc
---
ce6 has no .fa file under seq/ but in allfasta.loc there is a reference
to it ce6   Caenorhabditis elegans: ce6 /galaxy/data/ce6/seq/ce6.fa
---
TAIR9 and TAIR10 are not available via rsync
---
Bowtie2 indices are missing for ce6, xenTro3


Thanks,
Bjoern
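
Several of the inconsistencies listed above are .loc entries that point at
files which are not present on disk. A minimal sketch of how such entries
could be detected locally after an rsync pull, assuming a tab-separated .loc
file whose last column is the file path (the layout the allfasta.loc line
quoted above appears to follow). The function name missing_loc_paths is
hypothetical and not part of Galaxy:

```python
import os


def missing_loc_paths(loc_path, path_column=-1):
    """Return (line_number, path) pairs for .loc entries whose file is absent.

    Assumption: fields are tab-separated and the path sits in the last
    column; adjust path_column for .loc files laid out differently.
    """
    missing = []
    with open(loc_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            # Skip blank lines and comment lines.
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.split("\t")
            path = fields[path_column].strip()
            if not os.path.exists(path):
                missing.append((line_number, path))
    return missing
```

Run against a local copy of a .loc file, this would report cases such as a
referenced xenTro3.fa when only xenTro3.fa.gz exists. Paths in the .loc file
would first need to be rewritten to the local mirror location.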


> Hi Jennifer,
> 
> Today I was trying to pull some bowtie2 indices from Galaxy rsync server for 
> PhiX to run some tests and just got the ones for bowtie1… I'm wondering 
> what's the state in regards to this past thread and what we can do to help in 
> here.
> 
> Cheers!
> Roman
> 
> On 7 Mar 2013, at 20:01, Jennifer Jackson wrote:
> 
> > Hi Brad (and Roman),
> > 
> > The team has talked about this in detail. There are a few wrinkles with 
> > just pulling in indexes - Dan is doing some work that could change this 
> > later on, but for now, the rsync will continue to point to the same 
> > location as Main's genome data source. This means that there are some 
> > limits on what we can do immediately. Setting up a submission pipe is one 
> > of them - there just isn't resource to do this right now or a common place 
> > distinct from Main to house the data. A few other ideas came up - we can 
> > chat later, each had side issues.
> > 
> > But I saw your tweet and think that it is great that you are pulling 
> > CloudBioLinux data from the rsync now, so let's get as much data in common 
> > as possible, so you have data to work with near term.
> > 
> > I am in the process of adding bt2 indexes - some are published to 
> > Main/rsync server already and some are not, but more will show up over the 
> > next week or so (along with more genomes and other indexes). I'll take a 
> > look at what you have and pull/match what I can. Genome sort order and 
> > variants are my concerns, both require special handling in processing and 
> > .locs. If it takes longer to check, I am just going to create here if I 
> > haven't already. The GATK-sort hg19 canonical is already on my list - it 
> > needed all indexes, not just bw2. When the next distribution goes out, I'll 
> > list what is new on the rsync in the News Brief.
> > 
> > For the Novoalign indexes, I'm not quite sure what to do about those yet. 
> > Or for any indexes associated with tools or genomes not hosted on Main. Do 
> > you want to open a card for those and any other cases that are similar? We 
> > can discuss a strategy from there, maybe at IUC, if Greg/Dan thinks it is 
> > appropriate. Please add me so I can follow.
> > 
> > I'll be in touch as I go through the data. Thanks for your patience on this!
> > 
> > Jen
> > Galaxy team
> > 
> > On 2/21/13 12:43 PM, Brad Chapman wrote:
> >> Hi all;
> >> Is there a way for community members to contribute indexes to the rsync
> >> server? This resource is awesome and I'm working on migrating the
> >> CloudBioLinux retrieval scripts to use this instead of the custom S3
> >> buckets we'd set up previously:
> >> 
> >> https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py
> >> 
> >> It's great to have this as a public shared resource and I'd like to be
> >> able to contribute back. From an initial pass, here are the things I'd
> >> like to do:
> >> 
> >> - Include bowtie2 indexes for more genomes.
> >> 
> >> - Include novoalign indexes for a number of commonly used genomes.
> >> 
> >> - Clean up hg19 to include a full canonically sorted hg19, with indexes.
> >>   Broad has a nice version prepped so GATK will be happy with it, and
> >>   you need to stick with this ordering if you're ever going to use a
> >>   GATK tool on it. Right now there is a partial hg19canon (without the
> >>   random/haplotype chromosomes) and the structure is a bit complex.
> >> 
> >> What's the best way to contribute these? Right now I have a lot of the
> >> indexes on S3. For instance, the hg19 indexes are here:
> >> 
> >> https://s3.amazonaws.com/biodata/genomes/hg19-bowtie.tar.xz
> >> https://s3.amazonaws.com/biodata/genomes/hg19-bowtie2.tar.xz
> >> https://s3.amazonaws.com/biodata/genomes/hg19-bwa.tar.xz
> >> https://s3.amazonaws.com/biodata/genomes/hg19-novoalign.tar.xz
> >> https://s3.amazonaws.com/biodata/genomes/hg19-seq.tar.xz
> >> https://s3.amazonaws.com/biodata/genomes/hg19-ucsc.tar.xz
> >> 
> >> I'm happy to format these differently or upload somewhere that would
> >> make it easy to include. Thanks again for setting this up, I'm looking
> >> forward to working off a shared repository of data,
> >> Brad
> >> ___
> >> Please keep all replies on the list by using "reply all"
> >> in your mail client.  To manage your subscriptions to this

Re: [galaxy-dev] Contributing to genome indexes on rsync server

2013-10-21 Thread Roman Valls Guimera
Hi Jennifer,

Today I was trying to pull some bowtie2 indices from Galaxy rsync server for 
PhiX to run some tests and just got the ones for bowtie1… I'm wondering what's 
the state in regards to this past thread and what we can do to help in here.

Cheers!
Roman
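
A quick way to tell whether a pulled directory really contains a bowtie2
index, rather than only a bowtie1 index, is to look for the six files that
bowtie2-build writes for a small index. The sketch below assumes that
standard layout; the function name and the "phiX" basename in the usage note
are illustrative only, and large indexes (which use a .bt2l suffix) are not
covered:

```python
import os

# File suffixes bowtie2-build produces for a small (non-"large") index.
BT2_SUFFIXES = (
    ".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2",
    ".rev.1.bt2", ".rev.2.bt2",
)


def missing_bt2_files(index_dir, basename):
    """Return the bowtie2 index file names that are absent from index_dir."""
    return [
        basename + suffix
        for suffix in BT2_SUFFIXES
        if not os.path.isfile(os.path.join(index_dir, basename + suffix))
    ]
```

For example, missing_bt2_files("/some/local/mirror/phiX/bowtie2_index", "phiX")
returns an empty list when the index is complete, and the names of the
missing files otherwise. The directory shown is a placeholder, not the real
layout of the rsync server.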

On 7 Mar 2013, at 20:01, Jennifer Jackson wrote:

> Hi Brad (and Roman),
> 
> The team has talked about this in detail. There are a few wrinkles with just 
> pulling in indexes - Dan is doing some work that could change this later on, 
> but for now, the rsync will continue to point to the same location as Main's 
> genome data source. This means that there are some limits on what we can do 
> immediately. Setting up a submission pipe is one of them - there just isn't 
> resource to do this right now or a common place distinct from Main to house 
> the data. A few other ideas came up - we can chat later, each had side issues.
> 
> But I saw your tweet and think that it is great that you are pulling 
> CloudBioLinux data from the rsync now, so let's get as much data in common as 
> possible, so you have data to work with near term.
> 
> I am in the process of adding bt2 indexes - some are published to Main/rsync 
> server already and some are not, but more will show up over the next week or 
> so (along with more genomes and other indexes). I'll take a look at what you 
> have and pull/match what I can. Genome sort order and variants are my 
> concerns, both require special handling in processing and .locs. If it takes 
> longer to check, I am just going to create here if I haven't already. The 
> GATK-sort hg19 canonical is already on my list - it needed all indexes, not 
> just bw2. When the next distribution goes out, I'll list what is new on the 
> rsync in the News Brief.
> 
> For the Novoalign indexes, I'm not quite sure what to do about those yet. Or 
> for any indexes associated with tools or genomes not hosted on Main. Do you 
> want to open a card for those and any other cases that are similar? We can 
> discuss a strategy from there, maybe at IUC, if Greg/Dan thinks it is 
> appropriate. Please add me so I can follow.
> 
> I'll be in touch as I go through the data. Thanks for your patience on this!
> 
> Jen
> Galaxy team
> 
> On 2/21/13 12:43 PM, Brad Chapman wrote:
>> Hi all;
>> Is there a way for community members to contribute indexes to the rsync
>> server? This resource is awesome and I'm working on migrating the
>> CloudBioLinux retrieval scripts to use this instead of the custom S3
>> buckets we'd set up previously:
>> 
>> https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py
>> 
>> It's great to have this as a public shared resource and I'd like to be
>> able to contribute back. From an initial pass, here are the things I'd
>> like to do:
>> 
>> - Include bowtie2 indexes for more genomes.
>> 
>> - Include novoalign indexes for a number of commonly used genomes.
>> 
>> - Clean up hg19 to include a full canonically sorted hg19, with indexes.
>>   Broad has a nice version prepped so GATK will be happy with it, and
>>   you need to stick with this ordering if you're ever going to use a
>>   GATK tool on it. Right now there is a partial hg19canon (without the
>>   random/haplotype chromosomes) and the structure is a bit complex.
>> 
>> What's the best way to contribute these? Right now I have a lot of the
>> indexes on S3. For instance, the hg19 indexes are here:
>> 
>> https://s3.amazonaws.com/biodata/genomes/hg19-bowtie.tar.xz
>> https://s3.amazonaws.com/biodata/genomes/hg19-bowtie2.tar.xz
>> https://s3.amazonaws.com/biodata/genomes/hg19-bwa.tar.xz
>> https://s3.amazonaws.com/biodata/genomes/hg19-novoalign.tar.xz
>> https://s3.amazonaws.com/biodata/genomes/hg19-seq.tar.xz
>> https://s3.amazonaws.com/biodata/genomes/hg19-ucsc.tar.xz
>> 
>> I'm happy to format these differently or upload somewhere that would
>> make it easy to include. Thanks again for setting this up, I'm looking
>> forward to working off a shared repository of data,
>> Brad
> 
> -- 
> Jennifer Hillman-Jackson
> Galaxy Support and Training
> http://galaxyproject.org


___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/sear

Re: [galaxy-dev] Contributing to genome indexes on rsync server

2013-03-07 Thread Jennifer Jackson

Hi Brad (and Roman),

The team has talked about this in detail. There are a few wrinkles with 
just pulling in indexes - Dan is doing some work that could change this 
later on, but for now, the rsync will continue to point to the same 
location as Main's genome data source. This means that there are some 
limits on what we can do immediately. Setting up a submission pipe is 
one of them - there just isn't resource to do this right now or a common 
place distinct from Main to house the data. A few other ideas came up - 
we can chat later, each had side issues.


But I saw your tweet and think that it is great that you are pulling 
CloudBioLinux data from the rsync now, so let's get as much data in 
common as possible, so you have data to work with near term.


I am in the process of adding bt2 indexes - some are published to 
Main/rsync server already and some are not, but more will show up over 
the next week or so (along with more genomes and other indexes). I'll 
take a look at what you have and pull/match what I can. Genome sort 
order and variants are my concerns, both require special handling in 
processing and .locs. If it takes longer to check, I am just going to 
create here if I haven't already. The GATK-sort hg19 canonical is 
already on my list - it needed all indexes, not just bw2. When the next 
distribution goes out, I'll list what is new on the rsync in the News Brief.


For the Novoalign indexes, I'm not quite sure what to do about those 
yet. Or for any indexes associated with tools or genomes not hosted on 
Main. Do you want to open a card for those and any other cases that are 
similar? We can discuss a strategy from there, maybe at IUC, if Greg/Dan 
thinks it is appropriate. Please add me so I can follow.


I'll be in touch as I go through the data. Thanks for your patience on this!

Jen
Galaxy team

On 2/21/13 12:43 PM, Brad Chapman wrote:

Hi all;
Is there a way for community members to contribute indexes to the rsync
server? This resource is awesome and I'm working on migrating the
CloudBioLinux retrieval scripts to use this instead of the custom S3
buckets we'd set up previously:

https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py

It's great to have this as a public shared resource and I'd like to be
able to contribute back. From an initial pass, here are the things I'd
like to do:

- Include bowtie2 indexes for more genomes.

- Include novoalign indexes for a number of commonly used genomes.

- Clean up hg19 to include a full canonically sorted hg19, with indexes.
   Broad has a nice version prepped so GATK will be happy with it, and
   you need to stick with this ordering if you're ever going to use a
   GATK tool on it. Right now there is a partial hg19canon (without the
   random/haplotype chromosomes) and the structure is a bit complex.
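
Since the point above is that contig ordering has to match what GATK
expects, one simple sanity check before contributing an index is to compare
the contig order of the candidate FASTA with a reference FASTA that is
already known to work. This is only a sketch under that assumption; the
function names are hypothetical and it does not encode any particular
"correct" hg19 order:

```python
def fasta_contig_order(fasta_path):
    """Return contig names in the order they appear in an uncompressed FASTA."""
    names = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # The contig name is the first whitespace-delimited token.
                parts = line[1:].split()
                names.append(parts[0] if parts else "")
    return names


def same_contig_order(fasta_a, fasta_b):
    """True when both FASTA files list the same contigs in the same order."""
    return fasta_contig_order(fasta_a) == fasta_contig_order(fasta_b)
```

Comparing only names and order keeps the check cheap; it says nothing about
whether the sequences themselves are identical.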

What's the best way to contribute these? Right now I have a lot of the
indexes on S3. For instance, the hg19 indexes are here:

https://s3.amazonaws.com/biodata/genomes/hg19-bowtie.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-bowtie2.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-bwa.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-novoalign.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-seq.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-ucsc.tar.xz

I'm happy to format these differently or upload somewhere that would
make it easy to include. Thanks again for setting this up, I'm looking
forward to working off a shared repository of data,
Brad

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org


Re: [galaxy-dev] Contributing to genome indexes on rsync server

2013-03-05 Thread Roman Valls
Hello Brad, Jennifer,

I'm also interested in this initiative since I'm using Brad's code as
a part of a testsuite for a bloom filter:

https://github.com/SciLifeLab/facs/blob/master/facs/utils/galaxy.py

As both of you pointed out, there's a need for some cleanup, there are
some stray files here and there:

-rw-r--r-- 1 vagrant vagrant  34M May 28  2009 .2bit < ???
-rw-r--r-- 1 vagrant vagrant 779M Aug 13  2010 hg19.2bit

In general I think that some good naming conventions in such a good
community resource would be *very* useful.

I will be glad to help! :)

@romanvg on Trello.

Cheers!
Roman
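
Stray files like the nameless ".2bit" in the listing above can be found
mechanically. The sketch below walks a local mirror and reports files whose
entire name is just an extension; the extension set is an illustrative
assumption, not a list taken from the rsync server:

```python
import os

# Illustrative set of extensions; a file named exactly like one of these
# has lost its genome prefix (e.g. ".2bit" instead of "hg19.2bit").
GENOME_EXTENSIONS = frozenset({".2bit", ".fa", ".fa.gz", ".fai"})


def nameless_files(root, extensions=GENOME_EXTENSIONS):
    """Return sorted paths under root whose file name is only an extension."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in extensions:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

A plain name comparison is used rather than os.path.splitext because
splitext treats a leading dot as part of the base name, so ".2bit" would be
reported as having no extension at all.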

On Sat, Feb 23, 2013 at 8:39 PM, Brad Chapman  wrote:
>
> Jen;
> That sounds great, thanks for your enthusiasm and help organizing this.
> I'm @bradchapman on Trello so feel free to add me to the ticket and let
> me know how I can help. I'm happy to set this up however you feel best:
> looking forward to having a shared repository for all this formatted
> genome data. Thanks again,
> Brad
>
>
>> Hi Brad,
>>
>> I really like this idea. I'm not going to open a ticket yet but talk
>> with Dan/team about some options. We have an alternate directory
>> structure modeled from last fall I'd like to get in place before we
>> start something like this (it is not yet implemented, but it or something
>> similar would be required to properly add in the hg19 GATK-sort; it will
>> involve a bit of other data shuffling w/ .loc changes to keep external
>> links functional). There are also some other repo ideas in play.
>>
>> Let me get some internal feedback next week, then I'll start a Trello
>> ticket with some basics from our side that we can use to vet a plan to
>> go forward, assuming the rest of the team likes the idea. I think test
>> cases/index validation would probably be part of this somehow. And
>> certainly some simplification would be welcomed in the more cluttered dirs,
>> if that can be managed while keeping enough around for reproducibility
>> needs.
>>
>> Very topical, thanks for bringing this up and offering to help out! These can be
>> a great deal of work to create and it makes total sense to share. I'll send
>> an update later next week, hopefully with a Trello link so we can get
>> started.
>>
>> Jen
>> Galaxy
>>
>> On 2/21/13 12:43 PM, Brad Chapman wrote:
>>>
>>> Hi all;
>>> Is there a way for community members to contribute indexes to the rsync
>>> server? This resource is awesome and I'm working on migrating the
>>> CloudBioLinux retrieval scripts to use this instead of the custom S3
>>> buckets we'd set up previously:
>>>
>>> https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py
>>>
>>> It's great to have this as a public shared resource and I'd like to be
>>> able to contribute back. From an initial pass, here are the things I'd
>>> like to do:
>>>
>>> - Include bowtie2 indexes for more genomes.
>>>
>>> - Include novoalign indexes for a number of commonly used genomes.
>>>
>>> - Clean up hg19 to include a full canonically sorted hg19, with indexes.
>>>   Broad has a nice version prepped so GATK will be happy with it, and
>>>   you need to stick with this ordering if you're ever going to use a
>>>   GATK tool on it. Right now there is a partial hg19canon (without the
>>>   random/haplotype chromosomes) and the structure is a bit complex.
>>>
>>> What's the best way to contribute these? Right now I have a lot of the
>>> indexes on S3. For instance, the hg19 indexes are here:
>>>
>>> https://s3.amazonaws.com/biodata/genomes/hg19-bowtie.tar.xz
>>> https://s3.amazonaws.com/biodata/genomes/hg19-bowtie2.tar.xz
>>> https://s3.amazonaws.com/biodata/genomes/hg19-bwa.tar.xz
>>> https://s3.amazonaws.com/biodata/genomes/hg19-novoalign.tar.xz
>>> https://s3.amazonaws.com/biodata/genomes/hg19-seq.tar.xz
>>> https://s3.amazonaws.com/biodata/genomes/hg19-ucsc.tar.xz
>>>
>>> I'm happy to format these differently or upload somewhere that would
>>> make it easy to include. Thanks again for setting this up, I'm looking
>>> forward to working off a shared repository of data,
>>> Brad
>>
>> --
>> Jennifer Hillman-Jackson
>> Galaxy Support and Training
>> http://galaxyproject.org

Re: [galaxy-dev] Contributing to genome indexes on rsync server

2013-02-23 Thread Brad Chapman

Jen;
That sounds great, thanks for your enthusiasm and help organizing this.
I'm @bradchapman on Trello so feel free to add me to the ticket and let
me know how I can help. I'm happy to set this up however you feel best:
looking forward to having a shared repository for all this formatted
genome data. Thanks again,
Brad


> Hi Brad,
>
> I really like this idea. I'm not going to open a ticket yet but talk 
> with Dan/team about some options. We have an alternate directory 
> structure modeled from last fall I'd like to get in place before we 
> start something like this (it is not yet implemented, but it or something
> similar would be required to properly add in the hg19 GATK-sort; it will
> involve a bit of other data shuffling w/ .loc changes to keep external
> links functional). There are also some other repo ideas in play.
>
> Let me get some internal feedback next week, then I'll start a Trello
> ticket with some basics from our side that we can use to vet a plan to
> go forward, assuming the rest of the team likes the idea. I think test
> cases/index validation would probably be part of this somehow. And
> certainly some simplification would be welcomed in the more cluttered dirs,
> if that can be managed while keeping enough around for reproducibility
> needs.
>
> Very topical, thanks for bringing this up and offering to help out! These can be
> a great deal of work to create and it makes total sense to share. I'll send
> an update later next week, hopefully with a Trello link so we can get
> started.
>
> Jen
> Galaxy
>
> On 2/21/13 12:43 PM, Brad Chapman wrote:
>>
>> Hi all;
>> Is there a way for community members to contribute indexes to the rsync
>> server? This resource is awesome and I'm working on migrating the
>> CloudBioLinux retrieval scripts to use this instead of the custom S3
>> buckets we'd set up previously:
>>
>> https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py
>>
>> It's great to have this as a public shared resource and I'd like to be
>> able to contribute back. From an initial pass, here are the things I'd
>> like to do:
>>
>> - Include bowtie2 indexes for more genomes.
>>
>> - Include novoalign indexes for a number of commonly used genomes.
>>
>> - Clean up hg19 to include a full canonically sorted hg19, with indexes.
>>   Broad has a nice version prepped so GATK will be happy with it, and
>>   you need to stick with this ordering if you're ever going to use a
>>   GATK tool on it. Right now there is a partial hg19canon (without the
>>   random/haplotype chromosomes) and the structure is a bit complex.
>>
>> What's the best way to contribute these? Right now I have a lot of the
>> indexes on S3. For instance, the hg19 indexes are here:
>>
>> https://s3.amazonaws.com/biodata/genomes/hg19-bowtie.tar.xz
>> https://s3.amazonaws.com/biodata/genomes/hg19-bowtie2.tar.xz
>> https://s3.amazonaws.com/biodata/genomes/hg19-bwa.tar.xz
>> https://s3.amazonaws.com/biodata/genomes/hg19-novoalign.tar.xz
>> https://s3.amazonaws.com/biodata/genomes/hg19-seq.tar.xz
>> https://s3.amazonaws.com/biodata/genomes/hg19-ucsc.tar.xz
>>
>> I'm happy to format these differently or upload somewhere that would
>> make it easy to include. Thanks again for setting this up, I'm looking
>> forward to working off a shared repository of data,
>> Brad
>> ___
>> Please keep all replies on the list by using "reply all"
>> in your mail client.  To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at:
>>
>>http://lists.bx.psu.edu/
>>
>
> -- 
> Jennifer Hillman-Jackson
> Galaxy Support and Training
> http://galaxyproject.org
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Contributing to genome indexes on rsync server

2013-02-22 Thread Jennifer Jackson

Hi Brad,

I really like this idea. I'm not going to open a ticket yet, but will talk 
with Dan/team about some options first. We have an alternate directory 
structure modeled from last fall that I'd like to get in place before we 
start something like this (it is not yet implemented, but it or something 
similar would be required to properly add in the hg19 GATK-sort; it will 
involve a bit of other data shuffling w/ .loc changes to keep external 
links functional). There are also some other repo ideas in play.


Let me get some internal feedback next week, then I'll start a Trello 
ticket with some basics from our side that we can use to vet a plan to 
go forward, assuming the rest of the team likes the idea. I think test 
cases/index validation would probably be part of this somehow. And 
certainly some simplification would be welcomed in the more cluttered dirs, 
if that can be managed while keeping enough around for reproducibility 
needs.
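
As a rough illustration of what a basic index validation step might look
like, here is a minimal sketch that checks whether a directory holds the
full set of files for a standard (small) bowtie2 index. The function name,
the directory layout and the "empty file counts as missing" rule are
assumptions made for the example, not anything the Galaxy team has
described; large indexes (".bt2l") are not covered.

```python
from pathlib import Path

# Suffixes that bowtie2-build writes for a standard (small) index.
# Large indexes use ".bt2l" instead; this sketch does not handle them.
BOWTIE2_SUFFIXES = (
    ".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2",
)


def missing_bowtie2_files(index_dir, basename):
    """Return the names of expected index files that are absent or empty.

    index_dir and basename are hypothetical inputs, e.g. a directory
    containing "mm9.1.bt2" and friends, with basename "mm9".
    """
    index_dir = Path(index_dir)
    missing = []
    for suffix in BOWTIE2_SUFFIXES:
        path = index_dir / (basename + suffix)
        # Treat zero-byte files as missing: they usually mean a failed copy.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path.name)
    return missing
```

An empty return value means all six files are present and non-empty; it
says nothing about whether the index was built from the right FASTA.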


Very topical, thanks for bringing this up and offering to help out! These can be 
a great deal of work to create and it makes total sense to share. I'll send 
an update later next week, hopefully with a Trello link so we can get 
started.


Jen
Galaxy

On 2/21/13 12:43 PM, Brad Chapman wrote:


Hi all;
Is there a way for community members to contribute indexes to the rsync
server? This resource is awesome and I'm working on migrating the
CloudBioLinux retrieval scripts to use this instead of the custom S3
buckets we'd set up previously:

https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py

It's great to have this as a public shared resource and I'd like to be
able to contribute back. From an initial pass, here are the things I'd
like to do:

- Include bowtie2 indexes for more genomes.

- Include novoalign indexes for a number of commonly used genomes.

- Clean up hg19 to include a full canonically sorted hg19, with indexes.
   Broad has a nice version prepped so GATK will be happy with it, and
   you need to stick with this ordering if you're ever going to use a
   GATK tool on it. Right now there is a partial hg19canon (without the
   random/haplotype chromosomes) and the structure is a bit complex.

What's the best way to contribute these? Right now I have a lot of the
indexes on S3. For instance, the hg19 indexes are here:

https://s3.amazonaws.com/biodata/genomes/hg19-bowtie.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-bowtie2.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-bwa.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-novoalign.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-seq.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-ucsc.tar.xz

I'm happy to format these differently or upload somewhere that would
make it easy to include. Thanks again for setting this up, I'm looking
forward to working off a shared repository of data,
Brad

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org


[galaxy-dev] Contributing to genome indexes on rsync server

2013-02-21 Thread Brad Chapman

Hi all;
Is there a way for community members to contribute indexes to the rsync
server? This resource is awesome and I'm working on migrating the
CloudBioLinux retrieval scripts to use this instead of the custom S3
buckets we'd set up previously:

https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py

It's great to have this as a public shared resource and I'd like to be
able to contribute back. From an initial pass, here are the things I'd
like to do:

- Include bowtie2 indexes for more genomes.

- Include novoalign indexes for a number of commonly used genomes.

- Clean up hg19 to include a full canonically sorted hg19, with indexes.
  Broad has a nice version prepped so GATK will be happy with it, and
  you need to stick with this ordering if you're ever going to use a
  GATK tool on it. Right now there is a partial hg19canon (without the
  random/haplotype chromosomes) and the structure is a bit complex.
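
The ordering point can be checked mechanically. The sketch below is only
an illustration, not part of any existing script: it reads contig names
from two samtools-style .fai files (the first tab-separated column of each
line is the sequence name) and reports the first position where the two
orders differ. The function names are hypothetical.

```python
def contig_order(fai_path):
    """Read contig names, in file order, from a samtools .fai index."""
    with open(fai_path) as handle:
        return [
            line.split("\t", 1)[0].rstrip("\n")
            for line in handle
            if line.strip()
        ]


def first_order_mismatch(reference_contigs, candidate_contigs):
    """Return (position, expected, found) at the first difference, or None.

    If one list is a strict prefix of the other, the missing side is
    reported as None.
    """
    pairs = zip(reference_contigs, candidate_contigs)
    for i, (expected, found) in enumerate(pairs):
        if expected != found:
            return (i, expected, found)
    if len(reference_contigs) != len(candidate_contigs):
        i = min(len(reference_contigs), len(candidate_contigs))
        expected = reference_contigs[i] if i < len(reference_contigs) else None
        found = candidate_contigs[i] if i < len(candidate_contigs) else None
        return (i, expected, found)
    return None
```

Comparing a candidate hg19 .fai against the .fai of the Broad-prepared
reference this way would show whether an index set was built from the
same ordering, without assuming what that ordering is.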

What's the best way to contribute these? Right now I have a lot of the
indexes on S3. For instance, the hg19 indexes are here:

https://s3.amazonaws.com/biodata/genomes/hg19-bowtie.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-bowtie2.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-bwa.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-novoalign.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-seq.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-ucsc.tar.xz
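
The six URLs above share one naming pattern, which a retrieval script
could generate rather than hard-code. This is a small sketch under the
assumption that the pattern "<genome>-<index type>.tar.xz" holds; only the
hg19 URLs listed here are confirmed, and whether other genomes or index
types exist in the bucket is not.

```python
# Base URL and index types are taken from the hg19 links listed above.
BASE_URL = "https://s3.amazonaws.com/biodata/genomes"
INDEX_TYPES = ("bowtie", "bowtie2", "bwa", "novoalign", "seq", "ucsc")


def index_urls(genome, index_types=INDEX_TYPES):
    """Build tarball URLs for a genome, assuming the hg19 naming pattern."""
    return [f"{BASE_URL}/{genome}-{kind}.tar.xz" for kind in index_types]


if __name__ == "__main__":
    for url in index_urls("hg19"):
        print(url)
```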

I'm happy to format these differently or upload somewhere that would
make it easy to include. Thanks again for setting this up, I'm looking
forward to working off a shared repository of data,
Brad