Re: [galaxy-user] Problem with bam and/or bai files

2011-10-27 Thread Peter Cock
Sending to galaxy-dev ...

On Thu, Oct 27, 2011 at 5:51 AM, Jim Robinson
jrobi...@broadinstitute.org wrote:

 Hi Mike,

 Someone from the Galaxy team can perhaps give some insight on
 what went wrong; I can comment on the error message from IGV.
 That error is thrown from Picard, and in every case I've investigated
 so far it was traced to a problem with the index.

Useful background re: Error reading bam file. This usually indicates
a problem with the index (bai) file. ArrayIndexOutofBoundsException:
4682 (4682).

 The most common causes are (1) a problem with the sequence
 dictionary in the BAM header itself, specifically incorrect sequence
 lengths,

Any idea what tools produce that kind of thing?

 and (2) indexing an unsorted BAM.  Apparently samtools will
 make invalid indexes from such files without any complaints in
 both cases.  You can even use samtools tview on such files;
 it will happily show you some random region when you query.

That is news to me - I recall samtools index being recommended
as a way to determine whether a BAM file was sorted or not (error on
unsorted, you get an index if it was sorted), and again from
memory this is what Galaxy uses internally as part of preparing
BAM files on upload.

Might this be tied to a specific version of samtools? e.g. a
possible regression?
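
For anyone who wants to test sort order without relying on the indexer's
behaviour, here is a minimal sketch in Python. It assumes the alignments are
available as SAM text with the header included (for example from
"samtools view -h"), and the function name first_unsorted_record is made up
for illustration. It takes the reference order from the @SQ lines and reports
the first record that is out of coordinate order.

```python
def first_unsorted_record(sam_lines):
    """Return the 1-based number of the first alignment record that breaks
    coordinate order, or None if all records are in coordinate order.

    The SO: tag in @HD is deliberately ignored: it only records what the
    writer claimed, not what the file actually contains.
    """
    ref_rank = {}      # reference name -> position in the @SQ header
    prev_key = None
    record_no = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line: remember the order of the reference sequences.
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_rank[field[3:]] = len(ref_rank)
            continue
        record_no += 1
        cols = line.split("\t")
        rname, pos = cols[2], int(cols[3])
        if rname == "*":
            # Reads without a reference go after all placed reads.
            rank = float("inf")
        else:
            # Names missing from the header all share one rank (sketch only).
            rank = ref_rank.get(rname, len(ref_rank))
        key = (rank, pos)
        if prev_key is not None and key < prev_key:
            return record_no
        prev_key = key
    return None
```

A return value of None means the records are in coordinate order relative to
the header; any other value is the first offending record, which suggests the
file should be sorted before it is indexed.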

 I don't see a Sort step in your workflow, maybe that's the problem?

 Please CC me on any reply,  I might miss it in the list.

 Jim

Thanks,

Peter

___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-user] Problem with bam and/or bai files

2011-10-27 Thread Jim Robinson
 It's possible the sorting problem was specific to one version, and 
samtools now gives an error.  The incorrect index caused by bad sequence 
lengths is a recurrent problem, but I do not know what tool produces such 
headers.  Perhaps someone who has experienced this can chime in.
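
A rough way to look for bad sequence lengths is to compare the @SQ lines of
the BAM header with the FASTA index (.fai) of the reference the reads were
aligned to. The sketch below is in Python and rests on assumptions: the header
is available as text (for example from "samtools view -H"), the .fai file has
the sequence name and length in its first two tab-separated columns, and the
function names are invented for illustration.

```python
def parse_sq_lines(header_lines):
    """Map sequence name -> length from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Every header field after the record type has the form TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def parse_fai(fai_lines):
    """Map sequence name -> length from the lines of a FASTA index (.fai)."""
    lengths = {}
    for line in fai_lines:
        if not line.strip():
            continue
        name, length = line.rstrip("\n").split("\t")[:2]
        lengths[name] = int(length)
    return lengths


def compare_dictionaries(header_lengths, reference_lengths):
    """List header sequences that are missing from, or disagree with, the reference."""
    problems = []
    for name, length in header_lengths.items():
        if name not in reference_lengths:
            problems.append(f"{name}: not in reference")
        elif reference_lengths[name] != length:
            problems.append(
                f"{name}: header says {length}, reference says {reference_lengths[name]}"
            )
    return problems
```

An empty list means the header and the reference agree on every sequence named
in the header; anything else points at a sequence name or length worth
checking before the index is trusted.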


I'm not a samtools expert, just sharing my experience of what has caused 
this error in the past.  As a general rule, it does seem that these index 
problems result in errors from Picard (which the GATK uses), while 
samtools can fail silently and sometimes give you an unrelated query 
region.


Jim





Re: [galaxy-user] Problem with bam and/or bai files

2011-10-27 Thread Mike Dufault
Hi all,
 
I appreciate all of the discussion related to this issue. I still don't 
understand why I only see this issue when I choose the hg_g1k_v37 format 
but not when I choose the hg19 format. I realize that I would need to ensure 
that the BAM files are sorted correctly before I enter the GATK pipeline, but 
all of this happens before that process.
 
When my read files are processed through to .bam files using the hg19 format, 
I can view them in IGV without a problem. It is only when I use the hg_g1k_v37 
format that I receive an error from IGV. It seems to me that the process that I 
am using in Galaxy should be identical except for the reference genome format 
(i.e. hg19 or hg_g1k_v37).
 
I am at a loss as to how to proceed. Does anyone have ideas?
 
Thanks,
Mike





Re: [galaxy-user] Problem with bam and/or bai files

2011-10-27 Thread Jennifer Jackson

Hi Mike,

Are you using the Galaxy Main instance at http://usegalaxy.org? If not, 
can you reproduce the problem when using Main?


If you are on Main and want to share a history link with me, I can take a 
look. Use Options -> Share or Publish, generate a link (or add me as a 
share user), and email that back to me directly.


Hopefully we can help,

Jen
Galaxy team

On 10/26/11 9:34 PM, Mike Dufault wrote:

Hello Galaxy Team,
I have been using Galaxy for SNP detection with great success.
Basically, I followed the screen-cast from Anton without any problems.
The only change was to use BWA instead of Bowtie. Until now, I have
always assigned my raw read files to the hg19 format. Now I want to try
the GATK pipeline to analyze my samples, but I am running into a problem
with the bam/bai files.

Here is what I did. I imported my Illumina paired end reads into Galaxy
and assigned them to the hg_g1k_v37 format instead of the hg19 format.
From there, I again followed the exact same process: FastQ Groomer,
Summary Statistics, Boxplots, Align with BWA, filter on SAM, SAM-to-BAM,
generate bai file. I made sure that hg_g1k_v37 was chosen as the format
for all of the steps that required that information.

Everything seemed to run successfully, as all of the boxes turned green.
When I tried to view the bam file in IGV (as a QC step before the GATK
pipeline), I received the following error: Error reading bam file. This
usually indicates a problem with the index (bai) file.
ArrayIndexOutofBoundsException: 4682 (4682).

I did the exact same analysis using the hg19 format and my bam/bai files
worked perfectly fine in the IGV viewer. Can anyone tell me what the
problem is and how to fix it?
Thanks,
Mike Dufault





--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support


Re: [galaxy-user] Problem with bam and/or bai files

2011-10-26 Thread Jim Robinson

 Hi Mike,

Someone from the Galaxy team can perhaps give some insight on what went 
wrong; I can comment on the error message from IGV.  That error is 
thrown from Picard, and in every case I've investigated so far it was 
traced to a problem with the index.  The most common causes are (1) a 
problem with the sequence dictionary in the BAM header itself, 
specifically incorrect sequence lengths, and (2) indexing an unsorted 
BAM.  Apparently samtools will make invalid indexes from such files 
without any complaints in both cases.  You can even use samtools tview on 
such files; it will happily show you some random region when you query.


I don't see a Sort step in your workflow, maybe that's the problem?

Please CC me on any reply,  I might miss it in the list.

Jim





