Re: [galaxy-dev] Error on local galaxy using SAM-to-BAM tool on a cluster

2012-04-12 Thread Louise-Amélie Schmitt

Hi,

I'm having the same issue. Has it been fixed since then?

Thanks,
L-A



___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

   http://lists.bx.psu.edu/


___

Re: [galaxy-dev] Error on local galaxy using SAM-to-BAM tool on a cluster

2011-11-07 Thread Nate Coraor
On Nov 4, 2011, at 1:11 PM, Carlos Borroto wrote:

> Hi,
> 
> Reading a little more about this problem, I see Galaxy uses python
> tempfile library (http://docs.python.org/library/tempfile.html),
> specifically at line 70 in tools/samtools/sam_to_bam.py:
> tmp_dir = tempfile.mkdtemp()
> 
> mkdtemp should honor the TMPDIR, TEMP or TMP environment variables. I
> set up all three of them in "~/.bashrc" with no results. I'm using:
> "default_cluster_job_runner = drmaa://-q all.q -V/"
> 
> With "-V" I was hoping to be able to export all my environment
> variables, which seems to work for everything else, but not for the
> temporary directory variables.
> 
> I ended up hardcoding the "dir" argument, which is not a good workaround,
> as I'm guessing this is not the only tool that will run into this
> problem:
> tmp_dir = tempfile.mkdtemp( dir='/home/cborroto/galaxy_dist/database/tmp')
> 
> Any advice? On a more SGE-related note, is there a way for me to
> debug what environment I'm getting when running Galaxy jobs?

Hi Carlos,

Try submitting an SGE job from the command line and having a look at the 
environment variables set on the execution host.  Most likely, SGE is setting 
its own TMPDIR variable which would be overriding the value set with -V.
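
A minimal sketch of such a check, assuming Python 3 is available on the execution host. The function name temp_environment_report is hypothetical and is not part of Galaxy or SGE; it only prints the three variables discussed above and the directory that the tempfile module resolves in that process.

```python
import os
import tempfile


def temp_environment_report(environ=None):
    """Collect the variables that influence where tempfile puts its files."""
    environ = os.environ if environ is None else environ
    report = {name: environ.get(name) for name in ("TMPDIR", "TEMP", "TMP")}
    # Directory the tempfile module actually resolved in this process.
    report["tempfile.gettempdir()"] = tempfile.gettempdir()
    return report


if __name__ == "__main__":
    for key, value in temp_environment_report().items():
        print(f"{key} = {value}")
```

Running the script once on the submit host and once as a job submitted with the same native options Galaxy uses (here "-q all.q -V") should show whether TMPDIR differs on the node.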

--nate


Re: [galaxy-dev] Error on local galaxy using SAM-to-BAM tool on a cluster

2011-11-04 Thread Carlos Borroto
Hi,

Reading a little more about this problem, I see Galaxy uses python
tempfile library (http://docs.python.org/library/tempfile.html),
specifically at line 70 in tools/samtools/sam_to_bam.py:
tmp_dir = tempfile.mkdtemp()

mkdtemp should honor the TMPDIR, TEMP or TMP environment variables. I
set up all three of them in "~/.bashrc" with no results. I'm using:
"default_cluster_job_runner = drmaa://-q all.q -V/"

With "-V" I was hoping to be able to export all my environment
variables, which seems to work for everything else, but not for the
temporary directory variables.
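
The lookup described above can be checked in isolation. This is a minimal sketch, assuming Python 3; resolve_temp_dir is a hypothetical helper, not something in sam_to_bam.py. It relies on two documented behaviours of the tempfile module: the environment variables TMPDIR, TEMP and TMP are consulted in that order, and the result is cached in tempfile.tempdir after the first call. The environment that matters is the one of the process running the tool on the node, so values from "~/.bashrc" only count if they reach that process.

```python
import os
import tempfile


def resolve_temp_dir(tmpdir_value):
    """Return the directory tempfile picks when TMPDIR is set to tmpdir_value."""
    saved_env = os.environ.get("TMPDIR")
    saved_cache = tempfile.tempdir
    try:
        os.environ["TMPDIR"] = tmpdir_value
        # gettempdir() caches its answer in tempfile.tempdir, so clear the
        # cache to force the environment to be consulted again.
        tempfile.tempdir = None
        return tempfile.gettempdir()
    finally:
        # Restore the process state so the demonstration has no side effects.
        if saved_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = saved_env
        tempfile.tempdir = saved_cache
```

If the directory named by TMPDIR is not writable, tempfile silently moves on to the next candidate, which is one way a configured value can appear to be ignored.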

I ended up hardcoding the "dir" argument, which is not a good workaround,
as I'm guessing this is not the only tool that will run into this
problem:
tmp_dir = tempfile.mkdtemp( dir='/home/cborroto/galaxy_dist/database/tmp')
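
A slightly less brittle variant of this workaround is to pass the directory in rather than hardcode it. This is only a sketch under assumptions: make_work_dir is a hypothetical helper, and how the path (for example the value of new_file_path) would reach the tool script is left open.

```python
import os
import tempfile


def make_work_dir(base_dir=None):
    """Create a private temporary directory, optionally under base_dir.

    base_dir=None keeps the standard behaviour (TMPDIR, TEMP, TMP, then the
    platform default); a path places the directory on that file system.
    """
    if base_dir is not None:
        # Create the parent if needed so mkdtemp does not fail on a fresh install.
        os.makedirs(base_dir, exist_ok=True)
    return tempfile.mkdtemp(dir=base_dir)
```

With no argument the helper behaves exactly like the plain tempfile.mkdtemp() call, so tools that do not need the override are unaffected.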

Any advice? On a more SGE-related note, is there a way for me to
debug what environment I'm getting when running Galaxy jobs?

Thanks,
Carlos


Re: [galaxy-dev] Error on local galaxy using SAM-to-BAM tool on a cluster

2011-11-04 Thread Carlos Borroto
Hi Jen,

Thanks for the quick response. The workaround you describe could work,
but I might run into trouble later on.

My interest is to develop a workflow for GATK, which has very strict
requirements on the input BAM file. One of them is that the sort order
has to be exactly the same as the reference. My reference is not
sorted lexicographically "chr1, chr10, chr11, ...", but instead is
sorted karyotypically "chr1, chr2, ...". I don't think I'll be able to
do this with "Filter and Sort -> Sort". GATK also needs the header for
the @RG tags, which I could resolve by just reintroducing the header
later on, but that would still be cumbersome.
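
The difference between the two orderings can be illustrated in a few lines of Python. This is a sketch only: sort_by_reference is a hypothetical helper and the contig names are illustrative, not taken from an actual reference file.

```python
def sort_by_reference(contigs, reference_order):
    """Order contig names by their position in the reference's contig list."""
    rank = {name: index for index, name in enumerate(reference_order)}
    # Raises KeyError for a contig missing from the reference, which is the
    # safer failure mode when the two must match exactly.
    return sorted(contigs, key=lambda name: rank[name])


contigs = ["chr10", "chr2", "chr1", "chr11"]
reference_order = ["chr1", "chr2", "chr10", "chr11"]  # illustrative only

lexicographic = sorted(contigs)
karyotypic = sort_by_reference(contigs, reference_order)
```

A plain string sort places "chr10" before "chr2", which is why a generic text sort cannot reproduce the reference order.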

I'll work on my galaxy/cluster configuration and see if I can find why
the SAM-to-BAM tool is failing.

Thanks again,
Carlos


Re: [galaxy-dev] Error on local galaxy using SAM-to-BAM tool on a cluster

2011-11-03 Thread Jennifer Jackson

Hello Carlos,

If what you want is a sorted SAM file, then the tool "Filter and Sort -> 
Sort" may be a better choice. A SAM file is a tabular file.


If there is header data at the beginning of the SAM file, it can be
removed before running Sort with the tool "Filter and Sort -> Select"
(with a "not" matching regex). Alternatively, you can choose not to
include header output as a BWA option.
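
Outside Galaxy, the same header removal can be sketched in Python. SAM header lines begin with "@", so this corresponds to a "not matching" filter on the pattern ^@ (the exact pattern is an assumption, not one confirmed for the Select tool). The function name split_sam_header and the sample lines are hypothetical.

```python
def split_sam_header(lines):
    """Separate SAM header lines (starting with '@') from alignment records."""
    header, records = [], []
    for line in lines:
        if line.startswith("@"):
            header.append(line)
        else:
            records.append(line)
    return header, records


sample = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "read2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
header, records = split_sam_header(sample)
```

Keeping the header in a separate list, rather than discarding it, leaves the option of writing it back in front of the sorted records.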


Perhaps this will solve the immediate problem?

Best,

Jen
Galaxy team

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support


[galaxy-dev] Error on local galaxy using SAM-to-BAM tool on a cluster

2011-11-03 Thread Carlos Borroto
Hi,

I'm running into this error:
"Error sorting alignments from (/tmp/5800600.1.all.q/tmpXOc5mD/tmpAZCzt_), "

This happens when using the SAM-to-BAM tool on a locally installed Galaxy
with an SGE cluster. I'm using the latest version of galaxy-dist. I'm
guessing I have a problem with the configuration for the tmp folder. I
have this in "universe_wsgi.ini":
# Temporary files are stored in this directory.
new_file_path = /home/cborroto/galaxy_dist/database/tmp

But I don't see this directory being used, and from the error it looks
like /tmp on the node is used instead. I wonder if this is the problem,
as I don't know whether there is enough space in the local /tmp
directory on the nodes. I ran the same tool on a subset of the same SAM
file and it ran fine.
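
The free space question can be checked directly on a node. This is a minimal sketch, assuming Python 3.3 or later (for shutil.disk_usage); free_gib is a hypothetical helper name.

```python
import shutil
import tempfile


def free_gib(path=None):
    """Return free space, in GiB, on the file system holding path."""
    path = tempfile.gettempdir() if path is None else path
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    print(f"{tempfile.gettempdir()}: {free_gib():.1f} GiB free")
```

Comparing this figure with the size of the full SAM file would show whether the sort could plausibly run out of room in the node-local directory.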

Also, I see this in the description of the tool:
"This tool uses the SAMTools toolkit to produce an indexed BAM file
based on a sorted input SAM file."

But what I actually need is to sort a SAM file output from bwa, and I
haven't found any other way than converting it to BAM. Looking at
"sam_to_bam.py" I see the BAM file will also be sorted. Would it be
wrong to feed an unsorted SAM file into this tool?

Finally, just to be sure there is nothing wrong with the initial SAM
file, I ran "samtools view ..." and "samtools sort ..." on this file
manually outside of Galaxy and it ran fine.

Thanks in advance,
Carlos