Re: [galaxy-dev] Automatically removing items from history

2011-09-08 Thread Dannon Baker
I haven't had a chance to do anything on this yet, but I'll see if I can work 
something out in the near future.

-Dannon

On Sep 7, 2011, at 9:34 PM, Glen Beane wrote:

 
 On Sep 7, 2011, at 8:10 PM, Edward Kirton wrote:
 
 i'm resurrecting this thread to see if there's any more support for the idea 
 of deleting intermediate files in a workflow.  i think this is an important 
 feature to have.  oftentimes a workflow creates many intermediate files no 
 one will ever look at, and leaving it up to the user to clean up their data 
 files is asking too much.  there's another ticket about allowing users to 
 still preview the metadata of deleted workflow history items; the two 
 features would go together nicely.
 
 
 I am _very_ interested in this feature
 
 --
 Glen L. Beane
 Senior Software Engineer
 The Jackson Laboratory
 (207) 288-6153
 
 
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 
  http://lists.bx.psu.edu/



[galaxy-dev] transfer files from remote system to galaxy

2011-09-08 Thread shashi shekhar
Hi,

 In my local instance of Galaxy, I want to add an option to get files from a
remote system into a Galaxy data library. Apart from a URL, is there any
option to get remote files into Galaxy?

 Regards


Re: [galaxy-dev] Tophat non Sanger input

2011-09-08 Thread Anton Nekrutenko
Dear Stephen (and others):

The sole reason for requiring fastq-sanger input to all of our wrappers was to 
force users to run their data through the groomer. It is slow, but it checks 
data consistency in a way that is more robust than just checking 'four lines 
per fastq block' and prevents a lot of problems downstream. Here on Galaxy @ 
Penn State we see a lot of fastq files edited in MS Word and other similar 
horrors. The groomer catches these and keeps users from running into problems 
later on, which also cuts down on the support overhead: investigating why the 
groomer has failed is a lot easier than researching why a particular set of 
polymorphisms derived from a Word-edited fastq file clusters Ukrainians with 
parasitic worms. In addition, even though Illumina did switch to Sanger 
encoding, there is still a lot of old data out there. However, we are open to 
suggestions ... What we are thinking of lately is switching to unaligned BAM 
for everything. One of the benefits here is the ability to add readgroups from 
day 1, simplifying multisample analyses down the road.
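
To make the two ideas above concrete (per-record checks that go beyond counting
four lines per block, and re-encoding old Illumina qualities as Sanger), here is
a minimal sketch. It is not the Galaxy groomer's code. The function names are
hypothetical, and it assumes Phred+64 input (Illumina 1.3 to 1.7) converted to
Phred+33 (Sanger); older Solexa-scaled scores are not handled.

```python
def illumina_to_sanger(quality):
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger)."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # assumption: input uses the Phred+64 offset
        if score < 0:
            raise ValueError("character %r is below the Phred+64 range" % ch)
        converted.append(chr(score + 33))
    return "".join(converted)


def check_record(header, sequence, separator, quality):
    """Basic per-record checks; raises ValueError on the first problem found."""
    if not header.startswith("@"):
        raise ValueError("header line must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("separator line must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
```

A file mangled by a word processor will typically fail one of these checks on
the first damaged record, which is much easier to diagnose than a strange
result several tools later.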

a.


Anton Nekrutenko
http://galaxyproject.org




On Sep 8, 2011, at 10:14 AM, Stephen Taylor wrote:

 On 08/09/2011 14:17, Hans-Rudolf Hotz wrote:
 
 
 On 09/08/2011 09:47 AM, Stephen Taylor wrote:
 On 07/09/2011 20:22, Edward Kirton wrote:
 seems unnecessary since illumina switched over to fastqsanger now.
 
 http://www.illumina.com/truseq/quality_101/quality_scores.ilmn
 
 Eventually...unfortunately we still get a lot of fastqillumina :-(
 
 
 I might be missing your point, but why can't you use the fastq groomer tool?
 
 
 - Duplication of data (disk space usage)
 - Groomer is slow and puts more demands on CPU, whereas the conversion can be 
 done easily on the fly by tophat
 - Consistency (bowtie does it)
 
 From the responses (or lack of :-)) we've been spurred on to change the 
 wrapper. If there is interest we will commit it to the code base when done.
 
 Cheers,
 
 Steve

[galaxy-dev] Server down?

2011-09-08 Thread Crystal Goh

Hi, I am user of Galaxy Test.
 
I wonder whether the Galaxy Test server has been down these past 2 days? My 
Tophat and Cufflinks jobs are taking much longer to run than before.

Thanks.
 
Best regards,
Crystal

Re: [galaxy-dev] cuffcompare wrapper

2011-09-08 Thread Chorny, Ilya
It's not an issue in the tmp dir but in the job_working_directory. I run most 
of those other tools with no problems. I don't think we should make it a 
requirement across the board and I think we can come up with alternative ways 
to clean up the job_working_directory. I am hoping that you could add the 
symlink to the cuffcompare wrapper as it is the only one where the symlink 
causes me a problem as far as I have tested. We don't want to have our code 
base differ too much from galaxy-central.

Thanks,

Ilya


From: Jeremy Goecks [mailto:jeremy.goe...@emory.edu]
Sent: Wednesday, September 07, 2011 6:26 PM
To: Chorny, Ilya
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] cuffcompare wrapper

Ilya,

A search of the Galaxy codebase indicates that thirteen tools use symlinks 
(e.g. GATK, Sicer, Picard, Cuff*, Bowtie), so the changes required to support 
this new code are significant. (Changes would also likely be needed for tools 
in the tool shed.) Also, asking tool wrappers to delete symlinks would be an 
idiosyncratic requirement as tools assume they have a temporary working 
directory at their disposal.

For these reasons, it seems best to have the tool framework clean up symlinks 
as necessary to support the new code.
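
As a rough sketch of what framework-side cleanup could look like: walk the job
working directory and unlink every symbolic link before the directory is handed
back for removal. The function name `remove_symlinks` is hypothetical and this
is not Galaxy code; it only illustrates the idea using standard `os` calls.

```python
import os


def remove_symlinks(working_dir):
    """Unlink every symbolic link under working_dir; return the removed paths."""
    removed = []
    for root, dirs, files in os.walk(working_dir):
        for name in list(dirs) + files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                os.unlink(path)  # removes the link only, never the target
                removed.append(path)
                if name in dirs:
                    dirs.remove(name)  # do not try to descend into it
    return removed
```

Doing this in one place would keep the individual tool wrappers unchanged,
which is the point of handling it in the framework.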

Best,
J.

On Sep 7, 2011, at 2:28 PM, Chorny, Ilya wrote:


Ok, I figured out why you need the symlink.

Can you add an unlink after the process completes?
i.e.

for i, arg in enumerate( args ):
    input_file_name = "./input%i" % ( i+1 )
    os.unlink( input_file_name )

From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Chorny, Ilya
Sent: Wednesday, September 07, 2011 9:18 AM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] cuffcompare wrapper

Hi Jeremy,

The symlink in the cuffcompare wrapper was causing galaxy to crash because I 
run as the actual user and have to chmod the job_working directory at the end 
so Galaxy can clean it up. It turns out the symlink does not seem to be needed. 
Am I missing something? See below.

Your code:
for i, arg in enumerate( args ):
    input_file_name = "./input%i" % ( i+1 )
    os.symlink( arg, input_file_name )
    cmd += "%s " % input_file_name

My code:
for i, arg in enumerate( args ):
    cmd += arg



Ilya Chorny Ph.D.
Bioinformatics Scientist I
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Work: 858.202.4582
Email: icho...@illumina.com
Website: www.illumina.com



Re: [galaxy-dev] macs in galaxy

2011-09-08 Thread Chorny, Ilya
Is there a python script associated with the macs.xml file?

From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Kanwei Li
Sent: Wednesday, August 24, 2011 5:41 PM
To: KOH Jia Yu Jayce
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] macs in galaxy

You can use the MACS wrappers here (for 1.4): 
https://bitbucket.org/cistrome/cistrome-harvard/src/779d208c2cbd/tools/peakcalling/

Until we officially add it to our distribution.

Thanks,

K
On Wed, Aug 24, 2011 at 8:32 PM, KOH Jia Yu Jayce 
ko...@gis.a-star.edu.sg wrote:
Yes. Thank you for your help ☺

-----Original Message-----
From: Kanwei Li [mailto:kan...@gmail.com]
Sent: Wednesday, August 24, 2011 11:28 PM
To: KOH Jia Yu Jayce
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] macs in galaxy

Hi Jayce,

Are you running this on your local instance? It seems you are running MACS 1.4, 
which our wrapper does not support yet, but we are planning to add a wrapper 
for 1.4 soon.

Thanks,

K
On Wed, Aug 24, 2011 at 4:20 AM, KOH Jia Yu Jayce 
ko...@gis.a-star.edu.sg wrote:
In running macs in galaxy, the following error was found

ERROR:root:mfold format error! Your input is '32'. It should be like '10,30'

A format for mfold like 10,30 is expected, but the default value configured in 
the xml remains 32. Will there be an updated version of this xml in the future?

Also, after changing the default mfold value to 10,30, the integer type in the 
param tag for mfold becomes invalid. May I ask what the correct type is for 
the input format 10,30?
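
Since a value like 10,30 is not an integer, it would have to reach the wrapper
as a string and be validated there. The sketch below shows one way to check
such a string before passing it on; `parse_mfold` is a hypothetical helper, not
part of MACS or of any Galaxy wrapper, and the only format it assumes is the
'lower,upper' pair shown in the error message above.

```python
def parse_mfold(value):
    """Parse an mfold string such as '10,30' into a (lower, upper) tuple of ints."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError("mfold should look like '10,30', got %r" % value)
    # int() raises ValueError itself if either half is not a whole number
    lower, upper = (int(part.strip()) for part in parts)
    return lower, upper
```

With this in place, the old default of 32 is rejected up front with a clear
message instead of failing inside MACS.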

Thanks a lot




Re: [galaxy-dev] disk space and file formats

2011-09-08 Thread Fields, Christopher J
The use of (unaligned) BAM for readgroups seems like a good idea.  At the very 
least it prevents inconsistently hacking this information into the FASTQ 
descriptor (a common problem with any simple format).

chris

On Sep 8, 2011, at 1:35 PM, Edward Kirton wrote:

 copied from another thread:
 
 On Thu, Sep 8, 2011 at 7:30 AM, Anton Nekrutenko an...@bx.psu.edu wrote:
 What we are thinking of lately is switching to unaligned BAM for everything. 
 One of the benefits here is the ability to add readgroups from day 1 
 simplifying multisample analyses down the road.
 
 this seems to be the simplest solution; i like it a lot.  really, only the 
 reads need to be compressed, most other outfiles are tiny by comparison, so a 
 more general solution may be overkill.  and if compression of everything is 
 desired, zfs works well -- another of our sites (LANL) uses this and 
 recommended it to me too.  i just haven't been able to convince my own IT 
 people to go this route for technical reasons beyond my attention span.
 
 On Tue, Sep 6, 2011 at 9:05 AM, Peter Cock p.j.a.c...@googlemail.com wrote:
 On Tue, Sep 6, 2011 at 5:00 PM, Nate Coraor n...@bx.psu.edu wrote:
  Peter Cock wrote:
  On Tue, Sep 6, 2011 at 3:24 PM, Nate Coraor n...@bx.psu.edu wrote:
   Ideally, there'd just be a column on the dataset table indicating
   whether the dataset is compressed or not, and then tools get a new
   way to indicate whether they can directly read compressed inputs, or
   whether the input needs to be decompressed first.
  
   --nate
 
  Yes, that's what I was envisioning Nate.
 
  Are there any schemes other than gzip which would make sense?
  Perhaps rather than a boolean column (compressed or not), it
  should specify the kind of compression if any (e.g. gzip).
 
  Makes sense.
 
  We need something which balances compression efficiency (size)
  with decompression speed, while also being widely supported in
  libraries for maximum tool uptake.
 
  Yes, and there's a side effect of allowing this: you may decrease
  efficiency if the tools used downstream all require decompression,
  and you waste a bunch of time decompressing the dataset multiple
  times.
 
 While decompression wastes CPU time and makes things slower,
 there is less data IO from disk (which may be network mounted)
 which makes things faster. So overall, depending on the setup
 and the task at hand, it could be faster.
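
 The scheme discussed above (a per-dataset column naming the compression, if
 any, with decompression done only for tools that cannot read compressed input)
 could be sketched as follows. This is an illustration only: `open_dataset` and
 the `compression` values are hypothetical, and gzip is the only scheme
 included because it is the only one named in the thread.

```python
import gzip

# Maps the value of the hypothetical "compression" column to an opener.
_OPENERS = {None: open, "gzip": gzip.open}


def open_dataset(path, compression=None):
    """Open a dataset for text reading, decompressing on the fly if needed."""
    try:
        opener = _OPENERS[compression]
    except KeyError:
        raise ValueError("unsupported compression: %r" % compression)
    return opener(path, "rt")
```

 A tool that reads through a helper like this never needs a decompressed copy
 on disk, which is where the reduced disk IO mentioned above would come from.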
 
 Is it time to file an issue on bitbucket to track this potential
 enhancement?
 
 Peter
 


Re: [galaxy-dev] galaxy-dev Digest, Vol 63, Issue 8

2011-09-08 Thread Andrew Warren
I would also like to voice my support for this feature. I wrote a wrapper
for bowtie that converts the SAM output to BAM after bowtie is finished just
to avoid the hassle of letting galaxy know that the SAM file existed
(didn't want to run Tophat).
After thinking about how I would go about deleting an existing output, it
occurred to me that a deleting tool would require some extra logic, since
you would probably want to prevent the output port on a workflow node/tool
from being connected to the input of another node if the output is going to
be deleted.
I was wondering if it might make sense to modify the flagged output
feature (the asterisk) of the galaxy tool nodes to delete the non-flagged
outputs instead of just hiding them? Or perhaps just mark them as deleted so
they will be taken care of by the cleanup scripts?
In the same line of thinking, it might make sense to have a flag for the
input ports that specifies that the input will be consumed/deleted after the
tool has successfully run. This would address the case where you want to
use the output of a tool before it is removed.
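
The rule proposed above can be stated as: an output may be marked deleted once
it is not flagged and every step that consumes it has finished successfully.
Here is a minimal sketch of that rule; the data layout and the name
`outputs_to_delete` are made up for illustration and do not reflect Galaxy's
workflow model.

```python
def outputs_to_delete(outputs, finished_steps):
    """Return names of non-flagged outputs whose consumers have all finished.

    outputs: dict of output name -> {"flagged": bool, "consumers": set of step ids}
    finished_steps: set of step ids that completed successfully
    """
    deletable = []
    for name, info in sorted(outputs.items()):
        if info["flagged"]:
            continue  # flagged (asterisk) outputs are always kept
        if info["consumers"] <= finished_steps:
            deletable.append(name)
    return deletable
```

An output with no consumers at all is deletable as soon as it exists, which
matches the case of intermediate files no one will ever look at.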

Cheers,
Andrew


I haven't had a chance to do anything on this yet, but I'll see if I can
work something out in the near future.

-Dannon

On Sep 7, 2011, at 9:34 PM, Glen Beane wrote:


 On Sep 7, 2011, at 8:10 PM, Edward Kirton wrote:

 i'm resurrecting this thread to see if there's any more support for the
idea of deleting intermediate files in a workflow.  i think this is an
important feature to have.  oftentimes a workflow creates many intermediate
files no one will ever look at, and leaving it up to the user to clean up
their data files is asking too much.  there's another ticket about allowing
users to still preview the metadata of deleted workflow history items; the
two features would go together nicely.


 I am _very_ interested in this feature

 --
 Glen L. Beane
 Senior Software Engineer
 The Jackson Laboratory
 (207) 288-6153



Re: [galaxy-dev] cuffcompare wrapper

2011-09-08 Thread Jeremy Goecks
I'm confused. Why would the symlink cause problems for Cuffcompare but not for 
other tools that use symlinks (including Cufflinks and Cuffdiff)?

J.

On Sep 8, 2011, at 1:43 PM, Chorny, Ilya wrote:

 It’s not an issue in the tmp dir but in the job_working_directory. I run most 
 of those other tools with no problems. I don’t think we should make it a 
 requirement across the board and I think we can come up with alternative ways 
 to clean up the job_working_directory. I am hoping that you could add the 
 symlink to the cuffcompare wrapper as it is the only one where the symlink 
 causes me a problem as far as I have tested. We don’t want to have our code 
 base differ too much from galaxy-central.
  
 Thanks,
  
 Ilya
  
  
 From: Jeremy Goecks [mailto:jeremy.goe...@emory.edu] 
 Sent: Wednesday, September 07, 2011 6:26 PM
 To: Chorny, Ilya
 Cc: galaxy-dev@lists.bx.psu.edu
 Subject: Re: [galaxy-dev] cuffcompare wrapper
  
 Ilya,
  
 A search of the Galaxy codebase indicates that thirteen tools use symlinks 
 (e.g. GATK, Sicer, Picard, Cuff*, Bowtie), so the changes required to support 
 this new code are significant. (Changes would also likely be needed for tools 
 in the tool shed.) Also, asking tool wrappers to delete symlinks would be an 
 idiosyncratic requirement as tools assume they have a temporary working 
 directory at their disposal.
  
 For these reasons, it seems best to have the tool framework clean up symlinks 
 as necessary to support the new code.
  
 Best,
 J.
  
 On Sep 7, 2011, at 2:28 PM, Chorny, Ilya wrote:
 
 
 Ok, I figured out why you need the symlink.
  
 Can you add an unlink after the process completes?
 i.e.
  
 for i, arg in enumerate( args ):
     input_file_name = "./input%i" % ( i+1 )
     os.unlink( input_file_name )
  
 From: galaxy-dev-boun...@lists.bx.psu.edu 
 [mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Chorny, Ilya
 Sent: Wednesday, September 07, 2011 9:18 AM
 To: galaxy-dev@lists.bx.psu.edu
 Subject: [galaxy-dev] cuffcompare wrapper
  
 Hi Jeremy,
  
 The symlink in the cuffcompare wrapper was causing galaxy to crash because I 
 run as the actual user and have to chmod the job_working directory at the end 
 so Galaxy can clean it up. It turns out the symlink does not seem to be needed. 
 Am I missing something? See below.
  
 Your code:
 for i, arg in enumerate( args ):
     input_file_name = "./input%i" % ( i+1 )
     os.symlink( arg, input_file_name )
     cmd += "%s " % input_file_name
  
 My code:
 for i, arg in enumerate( args ):
     cmd += arg
  

  
 Ilya Chorny Ph.D.
 Bioinformatics Scientist I
 Illumina, Inc.
 9885 Towne Centre Drive
 San Diego, CA 92121
 Work: 858.202.4582
 Email: icho...@illumina.com
 Website: www.illumina.com
  
  