[galaxy-user] Text Manipulation Compute c1[1:c1.find(()] fails
Folks,

I have a column c1 that has entries like GXP_297346(PVALB/human). I'm trying to use Text Manipulation Compute to strip off the (...) portion, leaving only the accession (which can vary in length). I have tried a variety of things that work at my Python command line but fail here, for example:

    c1[1:c1.find('(')]

or

    c1.split('(')[0]

This gets mangled:

    An error occurred running this job: Expression c1__ob__1:c1.find(()__cb__ likely invalid.

Or

    An error occurred running this job: Expression c1.split(()__ob__0__cb__ likely invalid.

Please help. This is driving me crazy.

Searching the list, I find only two related threads. The first,

    http://gmod.827538.n3.nabble.com/inputs-sanitization-tt2664336.html#a2664911
    Inputs sanitization

seems to indicate this is a global mapper that can only be disabled with dire security consequences. The second,

    http://gmod.827538.n3.nabble.com/substring-sequence-on-coordinate-in-columns-tt3026255.html#a3048100
    substring sequence on coordinate in columns

never answers the question of how to get Compute to work.

Thanks,
Curtis

___
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
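
For reference, here is a minimal plain-Python sketch of the result being asked for. It only illustrates what the expressions do at a Python prompt; it does not work around Galaxy's sanitization of the Compute expression, and the helper name strip_annotation is made up for illustration.

```python
# Plain-Python illustration only; this does NOT bypass Galaxy's input
# sanitization. strip_annotation is a hypothetical helper name.
def strip_annotation(value: str) -> str:
    """Return the text before the first '(' (the accession).

    str.partition returns the whole string when '(' is absent, so values
    without an annotation pass through unchanged.
    """
    return value.partition("(")[0]


c1 = "GXP_297346(PVALB/human)"

print(strip_annotation(c1))    # GXP_297346
print(c1.split("(")[0])        # GXP_297346
print(c1[:c1.find("(")])       # GXP_297346 (slice starts at index 0)
print(c1[1:c1.find("(")])      # XP_297346  (starting at 1 drops the first character)
```

Note that the slice form starting at index 1 loses the leading character of the accession, and that str.find returns -1 when there is no "(", which would silently trim the last character; the partition and split forms avoid both problems.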
[galaxy-user] How to re-use a parameter in a workflow?
Galaxy Users,

I have a workflow where I'd like the user to input a value once, say a number of nucleotides. That value would then be used as an input parameter to several different tasks. For example, it would feed two instances of the Get flanks tool (under Operate on Genomic Intervals): in one instance it would be used for both the offset and the length of the flanking region(s), and in the second instance its value and its *negative* would be used. Thus, the user inputs 20, and Get_flanks(20,20) and Get_flanks(-20,20) get run.

For this workflow, it's important that those parameters all be of the same magnitude, or things will get messy later, so I don't want the user to have to input them separately or to remember which one gets negated.

All suggestions welcome,
Curtis
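
To make the request concrete, here is a minimal Python sketch of the derivation described above: one user-supplied value fans out into both parameter pairs. The names FlankParams and derive_flank_params are hypothetical and are not part of Galaxy; the (offset, length) ordering is assumed from the Get_flanks(20,20) / Get_flanks(-20,20) example.

```python
from typing import NamedTuple, Tuple


class FlankParams(NamedTuple):
    """Hypothetical container for one Get flanks step (not a Galaxy type)."""
    offset: int
    length: int


def derive_flank_params(n: int) -> Tuple[FlankParams, FlankParams]:
    """Derive both parameter pairs from a single user-supplied value.

    The first pair uses n for offset and length; the second negates only
    the offset, so the magnitudes always agree.
    """
    if n <= 0:
        raise ValueError("expected a positive number of nucleotides")
    return FlankParams(offset=n, length=n), FlankParams(offset=-n, length=n)


first, second = derive_flank_params(20)
print(first)   # FlankParams(offset=20, length=20)
print(second)  # FlankParams(offset=-20, length=20)
```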
Re: [galaxy-user] Error running cufflinks on Galaxy
Jen,

We are running into the same problem on our local install of Galaxy. We're running Cufflinks v1.0.1 on a BAM file (accepted_reads) from a TopHat run on mm9-based RNA-seq data (paired-end 25mer), and we pulled down the changes made to Galaxy last month to support the 1.0.1 version of Cufflinks.

We (think) we have mm9 indexes locally installed. We can successfully run get_genomic_sequence on mm9 BED files.

Turning off bias correction made no difference. We also tried rolling back to Cufflinks v0.9.1 (including the Galaxy patch), and got the same error:

    An error occurred running this job:
    cufflinks v0.9.1
    cufflinks -q --no-update-check -I 50 -F 0.000100 -j 0.000100 -p 4 -N
    Error running cufflinks. [Errno 2] No such file or directory: 'transcripts.gtf'

    An error occurred running this job:
    cufflinks v1.0.1
    cufflinks -q --no-update-check -I 50 -F 0.000100 -j 0.000100 -p 4 -N
    Error running cufflinks. [Errno 2] No such file or directory: 'transcripts.gtf'

I can provide a link to the history on our server that you should (theoretically) be able to access.

Regards,
Curtis

-----Original Message-----
From: galaxy-user-boun...@lists.bx.psu.edu [mailto:galaxy-user-boun...@lists.bx.psu.edu] On Behalf Of Jennifer Jackson
Sent: Friday, June 10, 2011 3:58 PM
To: David Robinson
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Error running cufflinks on Galaxy

Hello David,

Cufflinks requires locally cached data to perform the Bias Correction function. Without seeing any sample data, a quick guess is that changing the Cufflinks option "Perform Bias Correction" from "yes" to "no" in that workflow step will probably correct the problem. Another option is to set the dbkey value in the initial input FASTQ file to be a native database (if possible).

Hopefully this helps, but if it does not correct the problem, please share a history link with data that demonstrates the problem and I can take a closer look (emailing the link to me directly, to maintain data privacy, would be fine).

Jen
Galaxy team

On 6/8/11 12:07 PM, David Robinson wrote:

Hello,

When I attempt to run cufflinks based on .sam output from bowtie I get an error:

    An error occurred running this job:
    cufflinks v1.0.1
    cufflinks -q --no-update-check -I 30 -F 0.05 -j 0.05 -p 8 -b /galaxy/data/hg19/sam_index/hg19.fa
    Error running cufflinks. [Errno 2] No such file or directory: 'transcripts.gtf'

What can I do to get around this problem and run cufflinks? My workflow is on http://main.g2.bx.psu.edu and can be found here (I ran it using a .fastq file):

http://main.g2.bx.psu.edu/u/dgrtwo/w/cufflinks-workflow-imported-from-uploaded-file

Thanks in advance for your help!

-David

David Robinson
Graduate Student
Lewis-Sigler Institute for Integrative Genomics
Carl Icahn Laboratory
Princeton University
646-620-6630

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org
[galaxy-user] FTP and command line access in Galaxy
Nate,

Galaxy's ability to pull files from FTP sites as a client, using a username and password, is great. However, I need to pull data from an HTTP site at a sequencing center that requires a username and password (I already tried to get them to set up an FTP server, no luck). Is there any way to do this? If not, would it be easy to add?

Regards,
Curtis

Hi Nate,

We'd like to set up our local Galaxy to be able to import data (fastq sequences) from an ftp site (with a username/password) into a user's account. Does Galaxy support FTP, or would we have to write a wrapper script to do it via HTTP and use the standard data connection methods as described in http://bitbucket.org/galaxy/galaxy-central/wiki/DataSources?

Hi Steve,

You can actually put FTP URLs directly in the URL/paste box on the upload form. With a username and password, the format would be:

    ftp://user:p...@example.org/path/to/file.ext

I haven't tested that the user/pass bit works, but it should.

Yep. It works. Thanks for the tip!
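
As a side note on the URL form quoted above, here is a small Python sketch that builds a URL with embedded credentials, percent-encoding characters such as "@" or ":" that would otherwise break the URL. The helper name with_credentials and the sample credentials are hypothetical. The thread only confirms that the FTP form works in the upload box; whether Galaxy accepts the same form for an http:// URL is an assumption that would need testing.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url: str, user: str, password: str) -> str:
    """Return url with user:password@ inserted before the host.

    Hypothetical helper. User and password are percent-encoded so that
    reserved characters do not get confused with URL delimiters.
    """
    parts = urlsplit(url)
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{parts.hostname}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Made-up credentials, for illustration only.
print(with_credentials("ftp://example.org/path/to/file.ext", "user", "p@ss:word"))
# ftp://user:p%40ss%3Aword@example.org/path/to/file.ext
```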
[galaxy-user] can I merge histories?
Folks,

Is there some way I can merge histories? I ran a workflow on 3 different samples in one history, each time sending the results to a new history and giving it the same name. However, Galaxy created 3 separate histories, each with that name! I need the data in a single history to compare and contrast it.

Thanks,
Curtis
Re: [galaxy-user] UCSC-EMBOSS/fuzznuc-UCSC workflow?
Jen,

Thanks for all your help. Here's the final Galaxy workflow for running fuzznuc on a BED file from the UCSC Table Browser, then producing a BED file that you can view in UCSC.

http://main.g2.bx.psu.edu/u/curtish-uab/w/fuzznucucscbed

I do not include the Get Flank operation in this base workflow, but include a note in the description. I have not (yet) had time to make the score in the final BED dependent on the quality of the match when mismatches are allowed, but I hope to come back and add that later.

How does one handle versioning of published workflows? Do I update the existing one, or create another with a .v2 name?

Also, I used several Text Manipulation Compute steps - is there any way to compute more than 1 new column at a time?

Regards,
Curtis

-----Original Message-----
From: Jennifer Jackson [mailto:j...@bx.psu.edu]
Sent: Wednesday, May 18, 2011 11:45 AM
To: Robert Curtis Hendrickson
Cc: galaxy-user
Subject: Re: [galaxy-user] UCSC-EMBOSS/fuzznuc-UCSC workflow?

Hello Curtis,

The BED extraction issue can be resolved in Galaxy. Pull out the whole gene and then modify the coordinates in Galaxy to be 10k upstream.

To be clear - this coordinate data is going to be used to transform the coordinates in your current fuzznuc output, which is transcript-based, to be genome-based. The coordinates are not input for fuzznuc - they are used after fuzznuc is run on the fasta file, in order to convert the result coordinates only.

This page in the UCSC wiki has a good description of how the UCSC coordinates are organized.

http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms

The output format for fuzznuc is documented in the tool's help - the last line on the tool form has a link.

Hopefully this helps to clear up the suggested processing,

Thanks,
Jen
Galaxy team

On 5/17/11 2:08 PM, Robert Curtis Hendrickson wrote:

Jennifer,

I tried getting data from UCSC as BED - two issues:

1. Unlike get sequence, I can no longer specify how far upstream I want - it's EITHER whole gene (what's the definition of that!!!) OR #bp_upstream OR exons OR introns -- with get seq those are not mutually exclusive - I happen to want the genomic region (5'UTR, exons, introns, 3'UTR + 10kbp upstream of 5'UTR).

2. fuzznuc does not recognize BED as a valid input format. So, I can't run fuzznuc because my BED file doesn't show up in the pulldown. Indeed, BED files are just annotation; they don't carry any sequence.

Have I misunderstood your directions?

Regards,
Curtis

-----Original Message-----
From: Jennifer Jackson [mailto:j...@bx.psu.edu]
Sent: Tuesday, May 17, 2011 11:23 AM
To: Robert Curtis Hendrickson
Cc: 'galaxy-user@lists.bx.psu.edu'
Subject: Re: [galaxy-user] UCSC-EMBOSS/fuzznuc-UCSC workflow?

Hello Curtis,

No need to use the fasta headers from your original fasta file. To obtain the coordinates in BED format: use Get Data - UCSC Main again to link to the UCSC Table Browser, set the same selection criteria as for the original fasta sequence, and only change the output type to be BED (instead of sequence).

Once in your Galaxy history, this format will be easier to work with.

Best,
Jen
Galaxy team

On 5/16/11 9:04 PM, Robert Curtis Hendrickson wrote:

Jennifer,

Thanks for the outline. I'll try that approach. However, it seems rather painful to have to join the fuzznuc output back to the original fasta to get at the header information that really should have been passed along.

It would seem that there must be a way to get the data out of UCSC without that space in the fasta header, so that the chromosome and genomic location get correctly preserved in the fuzznuc output. Failing that, is there an easy text manipulation that would convert that fasta header space to a |?

Regards,
Curtis

-----Original Message-----
From: Jennifer Jackson [mailto:j...@bx.psu.edu]
Sent: Monday, May 16, 2011 6:50 PM
To: Robert Curtis Hendrickson
Cc: 'galaxy-user@lists.bx.psu.edu'
Subject: Re: [galaxy-user] UCSC-EMBOSS/fuzznuc-UCSC workflow?

Hello Curtis,

The coordinates of your match are with respect to the fasta sequence, not with respect to the reference genome. Only data mapped to the reference genome can be viewed in the UCSC Browser.

You will need to calculate from the position of the match in the fasta sequence back through to the reference genome. One suggested way to do this:

a) Merge together the original genomic coordinates of the 2kb regions with each line of output from fuzznuc. Use the original source fasta sequence name as the common key for the merge. If both data are in BED format, that would be ideal and make the following steps possible. You may need to split the file based on whether
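
The coordinate conversion discussed in this thread can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workflow's actual steps: the region is assumed to be given as BED coordinates (0-based start, end exclusive), the match positions are assumed to be 1-based inclusive offsets within the extracted sequence, and sequences from minus-strand regions are assumed to have been reverse-complemented on extraction so that offsets count back from the region end. The function name match_to_genome is hypothetical.

```python
from typing import Tuple


def match_to_genome(region_start: int, region_end: int, strand: str,
                    match_start: int, match_end: int) -> Tuple[int, int]:
    """Convert a sequence-relative match to genome BED coordinates.

    Assumptions (not confirmed by the thread):
      - region_start/region_end are BED style: 0-based, end exclusive
      - match_start/match_end are 1-based inclusive offsets in the
        extracted sequence
      - minus-strand sequences were reverse-complemented on extraction
    Returns (start, end) in BED style on the reference genome.
    """
    if strand == "+":
        return region_start + match_start - 1, region_start + match_end
    if strand == "-":
        return region_end - match_end, region_end - match_start + 1
    raise ValueError("strand must be '+' or '-'")


# Hypothetical 2 kb region chr1:1000-3000 with a 10 bp match at offsets 101..110.
print(match_to_genome(1000, 3000, "+", 101, 110))  # (1100, 1110)
print(match_to_genome(1000, 3000, "-", 101, 110))  # (2890, 2900)
```

The two branches are why the plus- and minus-strand records need different arithmetic; the match length (end minus start in BED terms) is the same in both cases.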
[galaxy-user] UCSC-EMBOSS/fuzznuc-UCSC workflow?
Folks,

I wanted to scan the 2kb upstream of a list of human gene isoforms for TFBS using fuzznuc.

I was able to use Get Data, UCSC Main, "As sequence" to get my sequences. EMBOSS fuzznuc ran fine and output the hits.

HOWEVER, fuzznuc lost the genomic position information that UCSC puts after a space in the sequence headers of the FASTA file. It only provided offsets within the fasta.

http://main.g2.bx.psu.edu/u/curtish-uab/h/ucsc-fuzznuc-ucsc-broken

Thus, when I converted the fuzznuc output back to a BED file and tried to visualize the hits in the UCSC browser, it failed with "invalid BED File".

I tried fuzznuc with output formats seqtable, feattable and gff3, but in all cases the genomic position was missing, and being a bit of a Galaxy novice, I couldn't figure out how to get the output back to UCSC to visualize the hits.

Can anyone tell me how to link up these tools correctly, or share a history with some other tool set that accomplishes this goal?

Regards,
Curtis

Research Associate
Center for Clinical and Translational Science
University of Alabama at Birmingham
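
One text manipulation raised in this thread is replacing the space in each FASTA header with "|" so the position information stays attached to the sequence name. A minimal Python sketch of that idea follows; the function name join_fasta_headers and the sample header are hypothetical, and whether fuzznuc then carries the full header through to its output is not confirmed by the thread.

```python
from typing import Iterable, List


def join_fasta_headers(lines: Iterable[str], sep: str = "|") -> List[str]:
    """Replace spaces in FASTA header lines (those starting with '>') with sep.

    Sequence lines are returned unchanged apart from having their trailing
    newline stripped. Hypothetical helper, for illustration only.
    """
    out = []
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith(">"):
            text = text.replace(" ", sep)
        out.append(text)
    return out


# Made-up header in the "name, space, position" style described above.
sample = [">seq1 range=chr1:1000-3000 strand=+\n", "ACGTACGT\n"]
print(join_fasta_headers(sample))
# ['>seq1|range=chr1:1000-3000|strand=+', 'ACGTACGT']
```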