On Tue, Sep 6, 2011 at 5:12 PM, Nate Coraor wrote:
> Peter Cock wrote:
>> On Tue, Sep 6, 2011 at 5:00 PM, Nate Coraor wrote:
>> > Peter Cock wrote:
>> >> On Tue, Sep 6, 2011 at 3:24 PM, Nate Coraor wrote:
>> >> > Ideally, there'd just be a column on the dataset table indicating
>> >> > whether t
The use of (unaligned) BAM for readgroups seems like a good idea. At the very
least it prevents inconsistently hacking this information into the FASTQ
descriptor (a common problem with any simple format).
chris
On Sep 8, 2011, at 1:35 PM, Edward Kirton wrote:
> copied from another thread:
>
copied from another thread:
On Thu, Sep 8, 2011 at 7:30 AM, Anton Nekrutenko wrote:
> What we are thinking of lately is switching to unaligned BAM for
> everything. One of the benefits here is the ability to add readgroups from
> day 1 simplifying multisample analyses down the road.
>
this seems
Peter Cock wrote:
> On Tue, Sep 6, 2011 at 5:00 PM, Nate Coraor wrote:
> > Peter Cock wrote:
> >> On Tue, Sep 6, 2011 at 3:24 PM, Nate Coraor wrote:
> >> > Ideally, there'd just be a column on the dataset table indicating
> >> > whether the dataset is compressed or not, and then tools get a new
>
On Tue, Sep 6, 2011 at 5:00 PM, Nate Coraor wrote:
> Peter Cock wrote:
>> On Tue, Sep 6, 2011 at 3:24 PM, Nate Coraor wrote:
>> > Ideally, there'd just be a column on the dataset table indicating
>> > whether the dataset is compressed or not, and then tools get a new
>> > way to indicate whether
Peter Cock wrote:
> On Tue, Sep 6, 2011 at 3:24 PM, Nate Coraor wrote:
> > Edward Kirton wrote:
> >> Peter wrote:
> >> > I wonder if Galaxy would benefit from a new fastqsanger-gzip (etc)
> >> > datatype?
> >> > However this seems generally useful (not just for FASTQ) so perhaps a
> >> > more
>
On Tue, Sep 6, 2011 at 3:24 PM, Nate Coraor wrote:
> Edward Kirton wrote:
>> Peter wrote:
>> > I wonder if Galaxy would benefit from a new fastqsanger-gzip (etc)
>> > datatype?
>> > However this seems generally useful (not just for FASTQ) so perhaps a more
>> > general mechanism would be better w
Edward Kirton wrote:
> > In your position I agree that is a pragmatic choice.
>
> Thanks for helping me muddle through my options.
>
> > You might be able to
> > modify the file upload code to gzip any FASTQ files... that would prevent
> > uncompressed FASTQ getting into new histories.
>
> Right
Probably not, as it is somewhat a competitor of SAM/BAM (a bit broader in
scope, beyond just alignments). As Peter indicated, I know the BioHDF folks
(they are here in town); however, my actual question was whether anyone is
actually using HDF5 or SRA in production? I haven't seen adoption
On Sep 2, 2011, at 8:02 PM, Peter Cock wrote:
> On Fri, Sep 2, 2011 at 9:27 PM, Fields, Christopher J
> wrote:
>> On Sep 2, 2011, at 3:02 PM, Edward Kirton wrote:
>>
What, like a BAM file of unaligned reads? Uses gzip compression, and
tracks the pairing information explicitly :) Some
> In your position I agree that is a pragmatic choice.
Thanks for helping me muddle through my options.
> You might be able to
> modify the file upload code to gzip any FASTQ files... that would prevent
> uncompressed FASTQ getting into new histories.
Right!
> I wonder if Galaxy would benefit f
On Sep 2, 2011, at 8:02 PM, Peter Cock wrote:
> On Fri, Sep 2, 2011 at 9:27 PM, Fields, Christopher J
> wrote:
>> On Sep 2, 2011, at 3:02 PM, Edward Kirton wrote:
>>
What, like a BAM file of unaligned reads? Uses gzip compression, and
tracks the pairing information explicitly :) Some t
On Saturday, September 3, 2011, Edward Kirton wrote:
> of course there is a computational cost to compressing/uncompressing
> files but that's probably better than storing unnecessarily huge
> files. it's a trade-off.
It may still be faster due to less IO, probably depends on your hardware.
> s
>>> i actually think illumina's pipeline produces files in this format
>>>(unaligned-bam) now.
> Oh do they? - that's interesting. Do you have a reference/link?
i caught wind of this at the recent illumina user's conference but i
asked someone in our sequencing team to confirm and he hadn't hear
On Fri, Sep 2, 2011 at 9:27 PM, Fields, Christopher J
wrote:
> On Sep 2, 2011, at 3:02 PM, Edward Kirton wrote:
>
>>> What, like a BAM file of unaligned reads? Uses gzip compression, and
>>> tracks the pairing information explicitly :) Some tools will already take
>>> this as an input format, but
On Sep 2, 2011, at 3:02 PM, Edward Kirton wrote:
>> What, like a BAM file of unaligned reads? Uses gzip compression, and
>> tracks the pairing information explicitly :) Some tools will already take
>> this as an input format, but not all.
>
> ah, yes, precisely. i actually think illumina's pipel
> What, like a BAM file of unaligned reads? Uses gzip compression, and
> tracks the pairing information explicitly :) Some tools will already take
> this as an input format, but not all.
ah, yes, precisely. i actually think illumina's pipeline produces
files in this format now.
wrappers which cre
On Thu, Sep 1, 2011 at 11:02 PM, Edward Kirton wrote:
> Read QC intermediate files account for most of the storage used on our
> galaxy site. And it's a real problem that I must solve soon.
> My first attempt at taming the beast was to try to create a single read QC
> tool that did such things as
Read QC intermediate files account for most of the storage used on our
galaxy site. And it's a real problem that I must solve soon.
My first attempt at taming the beast was to try to create a single read QC
tool that did such things as convert qual encoding, qual-end trimming, etc.
(very basic fun
Hi Patrick,
the issue you are having is partly related to Galaxy's aim of
ensuring reproducible science by saving each intermediate step and its
output files. For example in your current workflow in Galaxy you can
easily do something else with each intermediate file - feed it to a
different tool
I'm not a bioinformaticist or programmer so apologies if this is a silly
question. I've been occasionally running galaxy on my laptop and on the public
server and I love it. The issue that I have is that my workflow requires many
steps (what I do is probably very unusual). Each step creates a ne