[galaxy-user] Text Manipulation > Compute c1[1:c1.find("(")] fails

2011-07-20 Thread Robert Curtis Hendrickson
Folks,

I have a column c1 that has entries like GXP_297346(PVALB/human).
I'm trying to use Text Manipulation > Compute to strip off the (...) portion,
leaving only the accession (which can vary in length).

I have tried a variety of things that work in my python command line, but fail 
here, for example:
c1[1:c1.find("(")]
or
c1.split('(')[0]

This gets mangled:
An error occurred running this job: Expression c1__ob__1:c1.find(()__cb__ 
likely invalid.
Or
An error occurred running this job: Expression c1.split(()__ob__0__cb__ 
likely invalid.
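
The mangled expressions in these errors are consistent with a character-substitution pass applied to tool inputs before they reach the tool. The sketch below only illustrates that idea: the MAPPED table is an assumption containing just the two tokens visible in the errors above (__ob__ for "[" and __cb__ for "]"), and sanitize/unsanitize are hypothetical helpers, not Galaxy's actual code.

```python
# Hypothetical illustration of input sanitization by character substitution.
# Only the two mappings visible in the error messages are included.
MAPPED = {"[": "__ob__", "]": "__cb__"}


def sanitize(expression):
    """Replace each mapped character with its placeholder token."""
    return "".join(MAPPED.get(ch, ch) for ch in expression)


def unsanitize(text):
    """Reverse the substitution, as a tool wrapper would have to do."""
    for char, token in MAPPED.items():
        text = text.replace(token, char)
    return text


if __name__ == "__main__":
    print(sanitize("c1[0]"))              # c1__ob__0__cb__
    print(unsanitize("c1__ob__0__cb__"))  # c1[0]
```

If the tool receives the substituted text and does not reverse every mapping, the expression it evaluates is no longer valid Python, which would match the "likely invalid" message.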

Please help. This is driving me crazy.
Searching the list, I find only
http://gmod.827538.n3.nabble.com/inputs-sanitization-tt2664336.html#a2664911 
("Inputs sanitization"), which seems to indicate this is a global mapper that
can only be disabled with dire security consequences, and
http://gmod.827538.n3.nabble.com/substring-sequence-on-coordinate-in-columns-tt3026255.html#a3048100
("substring sequence on coordinate in columns"), which never answers the 
question about how to get Compute to work.

Thanks,
Curtis
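
Independent of the Galaxy error, the string operation itself can be checked in plain Python. A minimal sketch, assuming the accession never contains "("; strip_annotation is a hypothetical helper name. Note that a slice starting at index 1 drops the first character, and that str.find returns -1 when the character is absent, so splitting or partitioning is the safer form.

```python
def strip_annotation(value):
    """Return the text before the first '(' (the whole value if none)."""
    head, _sep, _tail = value.partition("(")
    return head


if __name__ == "__main__":
    entry = "GXP_297346(PVALB/human)"
    print(strip_annotation(entry))    # GXP_297346
    print(entry.split("(")[0])        # GXP_297346
    print(entry[1:entry.find("(")])   # XP_297346 (first character lost)
```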

___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-user] How to re-use a parameter in a workflow?

2011-07-05 Thread Robert Curtis Hendrickson
Galaxy Users,

I have a workflow where I'd like the user to input a value once, say a number 
of nucleotides. That value would then be used as an input parameter to several 
different tasks, for example, to two instances of Operate on Genomic
Intervals > Get flanks, where it would be used for both the offset and the
length of the flanking region(s) in one instance, and its value and its
*negative* would be used in the second instance.

Thus, the user inputs 20, and Get_flanks(20,20) and Get_flanks(-20,20) get 
run.

For this workflow, it's important that those parameters all be of the same 
magnitude, or things will get messy later, so I don't want the user having to 
input them separately, or to have to remember which one gets negated...
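
The requested behaviour amounts to deriving every step's parameters from one user-supplied value. The sketch below only illustrates that derivation; flank_parameters and the (offset, length) tuple layout are assumptions made for illustration, not an existing Galaxy workflow feature.

```python
def flank_parameters(n):
    """Derive (offset, length) pairs for both Get flanks steps from one value."""
    if n <= 0:
        raise ValueError("flank size must be a positive integer")
    return [(n, n), (-n, n)]


if __name__ == "__main__":
    for offset, length in flank_parameters(20):
        print(f"Get_flanks({offset},{length})")
    # Get_flanks(20,20)
    # Get_flanks(-20,20)
```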

All suggestions welcome,
Curtis


Re: [galaxy-user] Error running cufflinks on Galaxy

2011-06-22 Thread Robert Curtis Hendrickson
Jen, 

We are running into the same problem on our local install of galaxy. 
We're running Cufflinks v.1.0.1, on a BAM file (accepted_reads) from TopHat run 
on mm9 based RNAseq data (paired-end 25mer), and pulled down the changes made 
to galaxy last month to support the 1.0.1 version of Cufflinks. 

We think we have mm9 indexes installed locally; we can successfully run 
get_genomic_sequence on mm9 BED files.

Turning off bias correction made no difference.

We also tried rolling back to Cufflinks v0.9.1 (including the Galaxy patch), 
and got the same error:

An error occurred running this job: cufflinks v0.9.1
cufflinks -q --no-update-check -I 50 -F 0.000100 -j 0.000100 -p 4 -N
Error running cufflinks. [Errno 2] No such file or directory: 'transcripts.gtf'

An error occurred running this job: cufflinks v1.0.1
cufflinks -q --no-update-check -I 50 -F 0.000100 -j 0.000100 -p 4 -N
Error running cufflinks. [Errno 2] No such file or directory: 'transcripts.gtf'
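
The [Errno 2] message names the output file rather than the underlying failure, which suggests (as an assumption, not confirmed from the wrapper source) that the wrapper tries to collect transcripts.gtf after cufflinks itself has already failed. A minimal sketch of a more informative pattern: check the executable, run it, and report stderr when the expected output is missing. run_and_collect is a hypothetical helper.

```python
import os
import shutil
import subprocess


def run_and_collect(cmd, expected_output, workdir="."):
    """Run cmd in workdir and return the path of expected_output.

    Raises RuntimeError carrying the command's stderr if the executable is
    missing, the command fails, or the output file was not produced.
    """
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"executable not found on PATH: {cmd[0]}")
    result = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
    path = os.path.join(workdir, expected_output)
    if result.returncode != 0 or not os.path.exists(path):
        raise RuntimeError(
            f"{cmd[0]} exited with {result.returncode}; "
            f"{expected_output} missing or incomplete.\n"
            f"stderr:\n{result.stderr}"
        )
    return path
```

Run with the cufflinks command line shown above and "transcripts.gtf" as the expected output, a helper like this would surface whatever cufflinks wrote to stderr instead of the secondary file-not-found error.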

I can provide a link to the history on our server that you should 
(theoretically) be able to access. 

Regards, 
Curtis



 -Original Message-
 From: galaxy-user-boun...@lists.bx.psu.edu [mailto:galaxy-user-
 boun...@lists.bx.psu.edu] On Behalf Of Jennifer Jackson
 Sent: Friday, June 10, 2011 3:58 PM
 To: David Robinson
 Cc: galaxy-user@lists.bx.psu.edu
 Subject: Re: [galaxy-user] Error running cufflinks on Galaxy
 
 Hello David,
 
 Cufflinks requires locally cached data to perform the Bias Correction
 function.
 
 Without seeing any sample data, a quick guess is that changing the
 option Tool: Cufflinks - Perform Bias Correction: from yes to no in
 that workflow step will probably correct the problem.
 
 Another option is to set the dbkey value in the initial input FASTQ file
 to be a native database (if possible).
 
 Hopefully this helps, but if it does not correct the problem, please share
 a history link with data that demonstrates the problem and I can take a
 closer look (emailing the link to me directly, to maintain data privacy,
 would be fine).
 
 Jen
 Galaxy team
 
 On 6/8/11 12:07 PM, David Robinson wrote:
  Hello,
 
  When I attempt to run cufflinks based on .sam output from bowtie I get
  an error:
 
  An error occurred running this job: cufflinks v1.0.1
  cufflinks -q --no-update-check -I 30 -F 0.05 -j 0.05 -p 8 -b
  /galaxy/data/hg19/sam_index/hg19.fa
  Error running cufflinks. [Errno 2] No such file or directory:
  'transcripts.gtf'
 
  What can I do to get around this problem and run cufflinks?
 
  My workflow is on http://main.g2.bx.psu.edu and can be found here (I ran
  it using a .fastq file):
 
  http://main.g2.bx.psu.edu/u/dgrtwo/w/cufflinks-workflow-imported-from-uploaded-file
 
  Thanks in advance for your help!
 
  -David
 
 
  
 
  David Robinson
  Graduate Student
  Lewis-Sigler Institute for Integrative Genomics
  Carl Icahn Laboratory
  Princeton University
  646-620-6630
 
  
 
 
 
 
 --
 Jennifer Jackson
 http://usegalaxy.org
 http://galaxyproject.org



[galaxy-user] FTP and command line access in Galaxy

2011-06-21 Thread Robert Curtis Hendrickson
Nate,

Galaxy's ability to pull files with user/password from FTP sites as a 
client is great.

However, I need to pull data from an HTTP site at a sequencing center with 
user/password (I already tried to get them to set up an FTP server, with no
luck). Is there any way to do this?

If not, would it be easy to add?

Regards,
Curtis

 Hi Nate,

 We'd like to set up our local Galaxy to be able to import data (fastq
 sequences) from an ftp site (with a username/password) into a user's
 account. Does Galaxy support FTP or would we have to write a wrapper
 script to do it via HTTP and use the standard data connection methods as
 described in http://bitbucket.org/galaxy/galaxy-central/wiki/DataSources?

 Hi Steve,

 You can actually put FTP URLs directly in the URL/paste box on the
 upload form.  With a username and password, the format would be:

 ftp://user:p...@example.org/path/to/file.ext

 I haven't tested that the user/pass bit works, but it should.

Yep. It works. Thanks for the tip!
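
The URL form quoted above embeds the credentials in the URL itself. As a sketch of the two pieces involved, the Python below builds such a URL with the credentials percent-encoded, and builds an HTTP request carrying a Basic authentication header. Whether Galaxy's upload form accepts credentials in an http:// URL is not established in this thread; the host, path and credentials are placeholders, and url_with_credentials and basic_auth_request are hypothetical helper names.

```python
import base64
import urllib.request
from urllib.parse import quote


def url_with_credentials(scheme, host, path, user, password):
    """Build scheme://user:password@host/path with credentials percent-encoded."""
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}{path}"
    )


def basic_auth_request(url, user, password):
    """Return a urllib Request with an HTTP Basic Authorization header set."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    print(url_with_credentials("ftp", "example.org", "/path/to/file.ext",
                               "user", "p@ss"))
    # ftp://user:p%40ss@example.org/path/to/file.ext
    req = basic_auth_request("http://example.org/path/to/file.ext",
                             "user", "secret")
    # urllib.request.urlopen(req) would perform the actual download.
```

Percent-encoding matters when the password contains characters such as "@" or ":", which would otherwise be read as URL delimiters.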


[galaxy-user] can I merge histories?

2011-06-14 Thread Robert Curtis Hendrickson
Folks,

Is there some way I can merge histories?

I ran a workflow on 3 different samples from one history, each time sending 
the output to a different history with the same name. However, Galaxy created 
3 new histories, each with the same name, and I need the data in the same 
history to compare and contrast it.

Thanks,
Curtis


Re: [galaxy-user] UCSC-EMBOSS/fuzznuc-UCSC workflow?

2011-06-07 Thread Robert Curtis Hendrickson
Jen, 

Thanks for all your help. 
Here's the final Galaxy workflow for running fuzznuc on a BED file from the 
UCSC Table Browser, then producing a BED file that you can view in UCSC. 

http://main.g2.bx.psu.edu/u/curtish-uab/w/fuzznucucscbed

I do not include the Get Flank operation in this base workflow, but include a 
note in the description. 
I have not (yet) had time to make the score in the final BED dependent on the 
quality of the match, when mis-matches are allowed, but I hope to come back and 
add that later. 

How does one handle versioning of published workflows? Do I update the 
existing one, or create another with a .v2 name? 

Also, I used several Text Manipulation > Compute steps - is there any way to 
compute more than 1 new column at a time? 


Regards, 
Curtis



 -Original Message-
 From: Jennifer Jackson [mailto:j...@bx.psu.edu]
 Sent: Wednesday, May 18, 2011 11:45 AM
 To: Robert Curtis Hendrickson
 Cc: galaxy-user
 Subject: Re: [galaxy-user] UCSC-EMBOSS/fuzznuc-UCSC workflow?
 
 Hello Curtis,
 
 The BED extraction data can be resolved in Galaxy. Pull out the whole
 gene and then modify the coordinates in Galaxy to be 10k upstream.
 
 To be clear - this coordinate data is going to be used to transform the
 coordinates in your current fuzznuc output that is transcript-based to
 be genome-based. The coordinates are not input for fuzznuc - they are
 used after fuzznuc is run on the fasta file, in order to convert the
 result coordinates only.
 
 This page in the UCSC wiki has a good description of how the UCSC
 coordinates are organized.
 http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
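
The conversion described here can be sketched in a few lines. The sketch assumes BED-style region coordinates (0-based start, end exclusive), match positions that are 1-based and inclusive within the extracted sequence, and that minus-strand sequences were reverse-complemented on extraction; all three are assumptions to verify against the actual UCSC and fuzznuc output. to_genomic is a hypothetical helper.

```python
def to_genomic(region_start, region_end, strand, match_start, match_end):
    """Map a match within an extracted sequence to BED genome coordinates.

    region_start/region_end: BED interval of the extracted sequence
        (0-based start, end exclusive).
    match_start/match_end: 1-based inclusive positions within that sequence.
    Returns (start, end) as a 0-based, end-exclusive genomic interval.
    """
    length = region_end - region_start
    if not 1 <= match_start <= match_end <= length:
        raise ValueError("match lies outside the extracted sequence")
    if strand == "+":
        return region_start + match_start - 1, region_start + match_end
    if strand == "-":
        # Position 1 of a reverse-complemented sequence is the last genomic base.
        return region_end - match_end, region_end - match_start + 1
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    print(to_genomic(1000, 3000, "+", 1, 10))   # (1000, 1010)
    print(to_genomic(1000, 3000, "-", 1, 10))   # (2990, 3000)
```

The returned pair can be written directly into the start and end columns of a BED line alongside the chromosome of the source region.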
 
 The output format for fuzznuc is documented in the tool's help - the
 last line on the tool form has a link.
 
 Hopefully this helps to clear up the suggested processing,
 
 Thanks,
 
 Jen
 Galaxy team
 
 
 
 On 5/17/11 2:08 PM, Robert Curtis Hendrickson wrote:
  Jennifer,
 
  I tried getting data from UCSC as .BED - two issues:
 
  1. Unlike get sequence, I can no longer specify how far upstream I
  want - it's EITHER whole gene (what's the definition of that!!!) OR
  #bp_upstream OR exons OR introns -- with get seq those are not mutually
  exclusive - I happen to want the genomic region (5'UTR, exons, introns,
  3'UTR + 10kbp upstream of 5'UTR).
 
  2. fuzznuc does not recognize BED as a valid input format. So, I can't
  run fuzznuc because my BED file doesn't show up in the pulldown.
 
  Indeed, BED files are just annotation, they don't carry any sequence.
 
  Have I mis-understood your directions?
 
  Regards,
 
  Curtis
 
  -Original Message-
  From: Jennifer Jackson [mailto:j...@bx.psu.edu]
  Sent: Tuesday, May 17, 2011 11:23 AM
  To: Robert Curtis Hendrickson
  Cc: 'galaxy-user@lists.bx.psu.edu'
  Subject: Re: [galaxy-user] UCSC-EMBOSS/fuzznuc-UCSC workflow?
 
  Hello Curtis,
 
  No need to use the fasta headers from your original fasta file.
 
  To obtain the coordinates in BED format: using Get Data - UCSC main
  again to link to the UCSC Table browser, set the same selection criteria
  as for the original fasta sequence, only change the output type to be
  BED (instead of sequence). Once in your Galaxy history, this format
  will be easier to work with.
 
  Best,
  Jen
  Galaxy team
 
  On 5/16/11 9:04 PM, Robert Curtis Hendrickson wrote:
 
   Jennifer,
 
   Thanks for the outline. I'll try that approach.
 
   However, it seems rather painful to have to join the fuzznuc output
   back to the original fasta to get at the header information that really
   should have been passed along. It would seem that there must be a way to
   get the data out of UCSC without that space in the fasta header, so that
   the chromosome genomic location gets correctly preserved in the fuzznuc
   output. Failing that, is there an easy text manipulation that would
   convert that fasta header space to a |?
 
   Regards,
   Curtis
 
   -Original Message-
   From: Jennifer Jackson [mailto:j...@bx.psu.edu]
   Sent: Monday, May 16, 2011 6:50 PM
   To: Robert Curtis Hendrickson
   Cc: 'galaxy-user@lists.bx.psu.edu'
   Subject: Re: [galaxy-user] UCSC-EMBOSS/fuzznuc-UCSC workflow?
 
   Hello Curtis,
 
   The coordinates of your match are with respect to the fasta sequence,
   not with respect to the reference genome. Only data mapped to the
   reference genome can be viewed in the UCSC Browser.
 
   You will need to calculate from the position of the match in the fasta
   sequence back through to the reference genome.
 
   One suggested way to do this:
 
   a) Merge together the original genomic coordinates of the 2kb regions
   with each line of output from fuzznuc. Use the original source fasta
   sequence name as the common key for the merge. If both data are in BED
   format, that would be ideal and make the following steps possible. You
   may need to split the file based on whether

[galaxy-user] UCSC-EMBOSS/fuzznuc-UCSC workflow?

2011-05-13 Thread Robert Curtis Hendrickson
Folks, 

I wanted to scan the 2kb upstream of a list of human gene isoforms for TFBS 
using fuzznuc. So far:
Get Data > UCSC Main > As sequence got me my sequences, and
EMBOSS > fuzznuc ran fine and output the hits.

HOWEVER, fuzznuc lost the genomic position information that UCSC has put after 
a space in the sequence headers of the FASTA file. It only provided offsets 
within the fasta. 
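
One workaround raised elsewhere in this thread is to replace the space in each FASTA header so that the position stays part of the sequence name. A minimal sketch, assuming headers are the lines starting with ">" and that "|" does not otherwise occur in them; the sample header is made up for illustration and join_fasta_headers is a hypothetical helper.

```python
def join_fasta_headers(lines, replacement="|"):
    """Replace spaces in FASTA header lines; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield line.rstrip("\n").replace(" ", replacement) + "\n"
        else:
            yield line


if __name__ == "__main__":
    # Illustrative header only; check the real UCSC header layout first.
    sample = [">seq1 range=chr1:1000-3000 strand=+\n", "ACGTACGT\n"]
    print("".join(join_fasta_headers(sample)), end="")
    # >seq1|range=chr1:1000-3000|strand=+
    # ACGTACGT
```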

http://main.g2.bx.psu.edu/u/curtish-uab/h/ucsc-fuzznuc-ucsc-broken

Thus, when I converted the fuzznuc output back to a BED file and tried to 
visualize the hits in the UCSC browser, it failed with "invalid BED file". 
I tried fuzznuc with output formats seqtable, feattable and gff3, but in all
cases the genomic position was missing and, being a bit of a Galaxy novice, I
couldn't figure out how to get the output back to UCSC to visualize the hits. 

Can anyone tell me how to link up these tools correctly, or share a history 
with some other tool set that accomplishes this goal? 

Regards, 
Curtis

Research Associate
Center for Clinical and Translational Science
University of Alabama at Birmingham
