Re: [galaxy-user] Text Manipulation
Hi Lee,

There are a few options. My guess is that you are working with files like snpXXXCodingDbSnp?

1 - An expression such as this one will extract individual characters from the column, including the commas (treating the column's data like a string):

c5.pop(1) = extracts the second character from the column

2 - Or, if you want to work with entire codons, you could use the tool "Convert delimiters to TAB" to fully expand the data. You might want to combine this option with other Text Manipulation tools, such as "Cut" to get access to specific columns, or "Condense consecutive characters" if the extra trailing commas in some of these datasets create empty columns (after converting to tabs).

Hopefully this helps! If there is more feedback from our developers, we'll post more. Others on the list are also welcome to add comments.

Best,

Jen
Galaxy team

On 8/27/12 9:44 AM, Jennifer Jackson wrote:

Post to mailing list

On 8/26/12 11:38 AM, leemsil...@genepeeks.com wrote:

Hi Jen,

The ability to use Python commands with the Compute tool seems to be a very well-hidden gem in Galaxy. The problem with this nearly unmentioned gem is the sheer frustration felt when something that works on the Python command line fails in the Galaxy Compute tool.

What I would like to do is manipulate a column from a UCSC download that lists two or three codons separated by commas, e.g. AAC,GCG, or GGT,CAC,TAT, . The string-based split command c5.split(",").pop(1) fails on this data because Galaxy assigned it a list type automatically. All of my attempts to change the data type to string have failed.

Any suggestions?

Lee
leemsil...@genepeeks.com

--
Jennifer Jackson
http://galaxyproject.org

___
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client.
For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
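
For readers who want to try the two behaviours from this thread outside Galaxy, here is a small plain-Python sketch. It assumes (the thread does not confirm this) that a column typed as a list reaches the expression as a list of single characters, which would explain why c5.pop(1) returns the second character. The sample value and the helper name codons_from_field are made up for illustration.

```python
# Plain Python only; this is not the Galaxy Compute tool's own code.
field = "AAC,GCG,"  # sample value in the style quoted in the thread

# Assumption: a "list"-typed column behaves like list(field),
# i.e. one element per character, commas included.
as_chars = list(field)
second_char = as_chars.pop(1)  # removes and returns the item at index 1


def codons_from_field(value):
    """Split a comma-separated field into codons, dropping the empty
    pieces left behind by a trailing comma (hypothetical helper)."""
    return [piece.strip() for piece in value.split(",") if piece.strip()]


codons = codons_from_field(field)
print(second_char)  # A
print(codons)       # ['AAC', 'GCG']
```

Whether an expression like this can be entered in the Compute tool depends on how the tool sanitizes and type-casts its input, so treat it as a way to check the logic locally rather than as a ready-made Compute expression.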
Re: [galaxy-user] Text Manipulation Compute c1[1:c1.find(()] fails
Hi Curtis,

To accomplish things like your split command, I have had success with Compute using

c1.split('(').pop(0)

HTH - Russell

Message: 1
Date: Mon, 22 Aug 2011 09:24:36 -0700
From: Jennifer Jackson j...@bx.psu.edu
To: Robert Curtis Hendrickson curt...@uab.edu
Cc: galaxy-user@lists.bx.psu.edu galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Text Manipulation Compute c1[1:c1.find(()] fails
Message-ID: 4e5282c4.3090...@bx.psu.edu
Content-Type: text/plain; charset=windows-1252; format=flowed

Hello Curtis,

There is some more feedback from our developers. In your own instance, this is the recommended change:

Edit - tools/stats/column_maker.py
To add - __ob__ and __cb__ to the mapped_str dict.

Perhaps this will help if you still need a work-around (or help others reading this thread).

Best!

Jen
Galaxy team

On 7/20/11 1:13 PM, Robert Curtis Hendrickson wrote:

Folks,

I have a column c1 that has entries like “GXP_297346(PVALB/human)”. I’m trying to use Text Manipulation Compute to strip off the “(…)” portion, leaving only the accession (which can vary in length). I have tried a variety of things that work in my python command line, but fail here, for example:

c1[1:c1.find("(")]
or
c1.split('(')[0]

This gets mangled:

An error occurred running this job: Expression c1__ob__1:c1.find(()__cb__ likely invalid.
Or
An error occurred running this job: Expression c1.split(()__ob__0__cb__ likely invalid.

Please help. This is driving me crazy. Searching the list, I find only

http://gmod.827538.n3.nabble.com/inputs-sanitization-tt2664336.html#a2664911 “Inputs sanitization”, which seems to indicate this is a global mapper that can only be disabled with dire security consequences.

And

http://gmod.827538.n3.nabble.com/substring-sequence-on-coordinate-in-columns-tt3026255.html#a3048100 “substring sequence on coordinate in columns”, which doesn’t ever answer the question about how to get compute to work.

Thanks,
Curtis

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support
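
To make the mangling in this thread easier to picture, here is a rough plain-Python sketch of the idea behind the developers' suggestion: the square brackets arrive as the placeholder tokens __ob__ and __cb__ (as seen in the error messages), and a lookup table turns them back into characters. The dictionary name token_map and the function restore_brackets are hypothetical; the actual contents of mapped_str in tools/stats/column_maker.py are not shown in this thread.

```python
# Illustrative only: not the actual contents of tools/stats/column_maker.py.
# The thread's error messages show "[" and "]" arriving as __ob__ and __cb__.
token_map = {
    "__ob__": "[",  # open bracket
    "__cb__": "]",  # close bracket
}


def restore_brackets(expression):
    """Replace the placeholder tokens with the characters they stand for."""
    for token, char in token_map.items():
        expression = expression.replace(token, char)
    return expression


mangled = "c1.split('(')__ob__0__cb__"
restored = restore_brackets(mangled)
print(restored)  # c1.split('(')[0]

# Russell's work-around avoids square brackets entirely:
c1 = "GXP_297346(PVALB/human)"
accession = c1.split("(").pop(0)
print(accession)  # GXP_297346
```

The second half shows why the pop(0) form gets through where the [0] form does not: it never needs a character that ends up as a placeholder token.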
Re: [galaxy-user] Text Manipulation Compute c1[1:c1.find(()] fails
Hello Robert,

This tool does sanitize many of the characters required to build this type of expression. Some changes to a few of the Text Manipulation tools have been discussed, but nothing is planned for the near term.

For now, the most expedient solution is for you to download the file, edit it with a text editor (command line or desktop), then reload it.

An updated wrapper that adds parentheses to the list of characters changed to tabs with Convert might be the simplest change. If you make one, please consider adding it to the Tool Shed and sending an email to galaxy-...@bx.psu.edu to let the development community know about it.

Best wishes,

Jen
Galaxy team

On 7/20/11 1:13 PM, Robert Curtis Hendrickson wrote:

Folks,

I have a column c1 that has entries like “GXP_297346(PVALB/human)”. I’m trying to use Text Manipulation Compute to strip off the “(…)” portion, leaving only the accession (which can vary in length). I have tried a variety of things that work in my python command line, but fail here, for example:

c1[1:c1.find("(")]
or
c1.split('(')[0]

This gets mangled:

An error occurred running this job: Expression c1__ob__1:c1.find(()__cb__ likely invalid.
Or
An error occurred running this job: Expression c1.split(()__ob__0__cb__ likely invalid.

Please help. This is driving me crazy. Searching the list, I find only

http://gmod.827538.n3.nabble.com/inputs-sanitization-tt2664336.html#a2664911 “Inputs sanitization”, which seems to indicate this is a global mapper that can only be disabled with dire security consequences.

And

http://gmod.827538.n3.nabble.com/substring-sequence-on-coordinate-in-columns-tt3026255.html#a3048100 “substring sequence on coordinate in columns”, which doesn’t ever answer the question about how to get compute to work.

Thanks,
Curtis

--
Jennifer Jackson
http://usegalaxy.org/
http://galaxyproject.org/
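
For the download, edit, reload route, a short script can stand in for the text editor. The sketch below is one possible way to do the edit in plain Python, assuming a tab-delimited file with the accession in the first column; the function names and the second sample row are invented for illustration.

```python
def strip_parenthetical(value):
    """Return the text before the first "(", or the value unchanged if none."""
    return value.split("(", 1)[0]


def clean_first_column(lines):
    """Yield tab-delimited lines with the "(...)" suffix removed from column 1."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields[0] = strip_parenthetical(fields[0])
        yield "\t".join(fields)


# Two sample rows; the first uses the entry quoted in the thread.
sample = ["GXP_297346(PVALB/human)\t12\n", "GXP_1\t7\n"]
cleaned = list(clean_first_column(sample))
print(cleaned)  # ['GXP_297346\t12', 'GXP_1\t7']
```

To run it on a downloaded dataset, pass an open file object instead of the sample list and write each yielded line, followed by a newline, to a new file before uploading that file back to Galaxy.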
Re: [galaxy-user] Text Manipulation: Filter out duplicates (uniq) from an plain text file ?
On Fri, May 6, 2011 at 3:16 PM, Roman Valls brainst...@nopcode.org wrote:

Well, having similarly basic tools (in Galaxy) that can be performed on the commandline, such as sort or cut I just wondered how come a uniq is not there on the tool panel in some form/name.

Thanks for the feedback Rory !

That's a timely question - I was also looking for something within Galaxy to take a text file and remove duplicate lines.

Peter
Re: [galaxy-user] Text Manipulation: Filter out duplicates (uniq) from an plain text file ?
Hi Peter and Roman,

The "Count" tool under the "Statistics" section provides uniq-like functionality. If you run this tool with all columns selected in the "Count occurrences of values in column(s)" field, your output will contain one line per distinct record, with the 1st column containing the number of occurrences of each record.

Hope this answers your question.

Thanks for using Galaxy,
Guru.

On Fri, May 6, 2011 at 10:22 AM, Peter Cock p.j.a.c...@googlemail.com wrote:

On Fri, May 6, 2011 at 3:16 PM, Roman Valls brainst...@nopcode.org wrote:

Well, having similarly basic tools (in Galaxy) that can be performed on the commandline, such as sort or cut I just wondered how come a uniq is not there on the tool panel in some form/name.

Thanks for the feedback Rory !

That's a timely question - I was also looking for something within Galaxy to take a text file and remove duplicate lines.

Peter

--
Graduate student, Bioinformatics and Genomics
Makova lab/Galaxy team
Penn State University
505 Wartik lab
University Park PA 16802
g...@psu.edu
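
As a point of comparison, the output described for the Count tool (one line per distinct record, with the count in the first column) can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the tool's implementation; the function name count_records and the sample rows are made up, and the ordering of the tool's real output is not stated in this thread.

```python
from collections import Counter


def count_records(lines):
    """Return (count, record) pairs, one per distinct line, in first-seen order."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return [(n, record) for record, n in counts.items()]


records = ["chr1\t100\n", "chr2\t200\n", "chr1\t100\n"]
for n, record in count_records(records):
    # Count first, then the original columns, tab-separated.
    print(f"{n}\t{record}")
```

Dropping the count column from output like this leaves exactly the de-duplicated lines, which is the uniq behaviour asked about earlier in the thread.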
Re: [galaxy-user] Text Manipulation
Hi Peter,

sounds nice, would be a great feature.

For everyone else who is not using a custom server: if you are lucky, you can use the trim tool on tabs to solve your problem. If you want to add text to the beginning or end of a column:

- Use add column and add the text as a new column
- Then use cut to get everything in the right order
- Finally, join the new column with the column you want to add the text to

I hope you get what I mean and that it is helpful to someone. Using these tricks I have created a workflow that serves as my custom SAM to GFF converter. Of course it's terribly inefficient, but it gets the job done.

thx,
Felix

On Tue, Feb 22, 2011 at 2:28 AM, Jennifer Jackson j...@bx.psu.edu wrote:

Hi Felix,

"Text Manipulation - Convert delimiters to TAB" could split one field into more than one, but the delimiter has to be in the list (@ is not).

"Text Manipulation - Cut columns from a table" is similar, but it will not split on a @ either.

"Text Manipulation - Trim leading or trailing characters" could be used for this specific case, since you can trim off the end of a column based on a position (but again, not a specified delimiter). To prep for an entire genome, you would need to break up the starting query so that the chromosome names in any derivative queries are of a consistent length, then merge back together.

Perhaps the @ was just an example and one of these tools will work for you. If you are customizing, additions to the Tool Shed that expand the native tools are always welcome! http://community.g2.bx.psu.edu

I've been planning to write a Galaxy tool to split a column on a given delimiter (e.g. @ for this example, or | for NCBI style identifiers), which would solve this use case nicely. I haven't done it yet though - so if anyone else wants to write such a tool first, please go ahead.

Specifically I would be aiming to expose the Python split and rsplit string method functionality, so the user would have to specify the number of splits (or perhaps more intuitively the number of columns to make) and if it should start on the left (default) or on the right.

Peter
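
The split/rsplit behaviour Peter describes can be sketched in plain Python as follows. This is a guess at what such a tool's core might look like, not an existing Galaxy tool: the function name split_column, its parameters, the padding with empty strings, and the sample values are all assumptions made for illustration.

```python
def split_column(value, delimiter, columns=2, from_right=False):
    """Split value into at most `columns` pieces on `delimiter`.

    Pads with empty strings so every row yields the same number of columns.
    """
    max_splits = columns - 1
    if from_right:
        parts = value.rsplit(delimiter, max_splits)
    else:
        parts = value.split(delimiter, max_splits)
    return parts + [""] * (columns - len(parts))


left = split_column("chr1@100@200", "@")
right = split_column("chr1@100@200", "@", from_right=True)
ncbi = split_column("gi|12345|ref|NM_000001|", "|", columns=3)
print(left)   # ['chr1', '100@200']
print(right)  # ['chr1@100', '200']
print(ncbi)   # ['gi', '12345', 'ref|NM_000001|']
```

Asking for a number of columns rather than a number of splits keeps the output width fixed, which matters for a tabular dataset where every row needs the same number of fields.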