Re: [galaxy-user] Text Manipulation

2012-08-27 Thread Jennifer Jackson

Hi Lee,

There are a few options. My guess is that you are working with files 
like snpXXXCodingDbSnp?


1 - Using an expression such as this one will extract individual 
characters from the column, including the commas (treating the column's 
data like a string)


  c5.pop(1)

  = extracts the second character from the column
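
A minimal sketch in plain Python of why pop(1) returns a single character. It assumes (this is not confirmed anywhere in the thread) that the Compute tool converts a "list"-typed column with Python's built-in list(), which turns a string into a list of its characters. The sample value comes from Lee's message; the variable names are only illustrative.

```python
# Assumption: Galaxy hands a "list"-typed column to the expression as list(text).
raw = "AAC,GCG,"          # sample column value from the thread

c5 = list(raw)            # a list of single characters, commas included
second_char = c5.pop(1)   # removes and returns the item at index 1

# Rejoining the characters gives a string that can be split on commas;
# the trailing comma leaves an empty string, which is filtered out here.
codons = [c for c in "".join(list(raw)).split(",") if c]

print(second_char)
print(codons)
```

Whether the rejoin-and-split form gets through the Compute tool's input sanitizing has not been checked here; the sketch only shows the behaviour of the Python calls themselves.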

2 - Or, if you want to work with the entire codons, you could use the 
tool "Convert delimiters to TAB" to fully expand the data. You might 
want to use other Text Manipulation tools with this option, such as 
"Cut" to get access to specific columns, or "Condense consecutive 
characters" if the extra trailing commas in some of these datasets 
create empty columns (after converting to tabs).
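
For readers who want to see the effect of option 2 outside Galaxy, here is a rough plain-Python equivalent. It is not the code behind the Galaxy tools; the function name expand_codons and the sample row (including the rs123 identifier) are made up for illustration.

```python
import re

def expand_codons(value, delimiter=","):
    """Replace each delimiter with a TAB, then collapse runs of TABs."""
    # Roughly what "Convert delimiters to TAB" does to the text.
    with_tabs = value.replace(delimiter, "\t")
    # Roughly what "Condense consecutive characters" does to repeated TABs.
    return re.sub(r"\t+", "\t", with_tabs)

# Hypothetical row: an identifier column followed by a codon column.
row = "rs123\tAAC,GCG,,"
print(repr(expand_codons(row)))
```

After this step each codon sits in its own tab-separated column, which is what makes a column-based tool such as Cut usable on it.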


Hopefully this helps! If there is more feedback from our developers, 
we'll post more. Others on the list are also welcome to add in comments.


Best,

Jen
Galaxy team

On 8/27/12 9:44 AM, Jennifer Jackson wrote:

Post to mailing list

On 8/26/12 11:38 AM, leemsil...@genepeeks.com wrote:

Hi Jen,

The ability to use Python commands with the Compute tool seems to be a
very well-hidden gem in Galaxy.  The problem with this nearly
unmentioned gem is the sheer frustration felt when something that
works on the Python command line fails on the Galaxy Compute tool.

What I would like to do is manipulate a column from a UCSC download
that lists two or three codons separated by commas, e.g. "AAC,GCG,"
or "GGT, CAC, TAT,". The string-based split command
c5.split(",").pop(1) fails on this data because Galaxy assigned it a
list type automatically. All of my attempts to change the data type
to string have failed.

Any suggestions?

Lee
leemsil...@genepeeks.com


--
Jennifer Jackson
http://galaxyproject.org
___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

 http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

 http://lists.bx.psu.edu/


Re: [galaxy-user] Text Manipulation Compute c1[1:c1.find(()] fails

2011-08-23 Thread Russell Bell
Hi Curtis,


To accomplish things like your split command I have had success with
Compute using

c1.split('(').pop(0)
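
In plain Python the expression above behaves as follows on the sample value from Curtis's message. This is only a sketch of the Python side; it does not reproduce Galaxy's input handling.

```python
# Sample value quoted in the thread.
c1 = "GXP_297346(PVALB/human)"

# split('(') cuts the string at each '(' and pop(0) returns the first piece.
# Using pop(0) instead of [0] avoids square brackets, which the error
# messages in this thread show being rewritten to __ob__ and __cb__.
accession = c1.split('(').pop(0)

print(accession)
```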

HTH

-
Russell

Message: 1
Date: Mon, 22 Aug 2011 09:24:36 -0700
From: Jennifer Jackson j...@bx.psu.edu
To: Robert Curtis Hendrickson curt...@uab.edu
Cc: galaxy-user@lists.bx.psu.edu galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Text Manipulation  Compute 
c1[1:c1.find(()] fails

Hello Curtis,

There is some more feedback from our developers. In your own instance,
this is the recommended change:

Edit -

 tools/stats/column_maker.py

To add -

  __ob__ and __cb__ to the mapped_str dict.


Perhaps this will help if you still need a workaround (or help others
reading this thread).
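
To illustrate the idea behind that change, here is a small self-contained sketch of a token-to-character mapping. The dict below is a hypothetical stand-in: the real contents of mapped_str in tools/stats/column_maker.py are not shown in this thread, and the helper name restore_expression is invented for the example.

```python
# Hypothetical stand-in for the mapped_str dict; only the two tokens named
# in the message above are included.
mapped_str = {
    "__ob__": "[",   # sanitized open bracket
    "__cb__": "]",   # sanitized close bracket
}

def restore_expression(expr, mapping=mapped_str):
    """Replace each sanitized token in expr with its original character."""
    for token, char in mapping.items():
        expr = expr.replace(token, char)
    return expr

print(restore_expression("c1.split('(')__ob__0__cb__"))
```

With the two bracket tokens mapped back, an expression that uses indexing or slicing reaches the evaluator with its square brackets intact.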

Best!

Jen
Galaxy team

 On 7/20/11 1:13 PM, Robert Curtis Hendrickson wrote:
 Folks,

 I have a column c1 that has entries like "GXP_297346(PVALB/human)".

 I'm trying to use Text Manipulation > Compute to strip off the "(...)"
 portion, leaving only the accession (which can vary in length).

 I have tried a variety of things that work in my python command line,
 but fail here, for example:

 c1[1:c1.find("(")]

 or

 c1.split('(')[0]

 This gets mangled:

 An error occurred running this job: Expression
 c1__ob__1:c1.find(()__cb__ likely invalid.

 Or

 An error occurred running this job: Expression
 c1.split(()__ob__0__cb__ likely invalid.

 Please help. This is driving me crazy.

 Searching the list, I find only

 http://gmod.827538.n3.nabble.com/inputs-sanitization-tt2664336.html#a2664911

 "Inputs sanitization" which seems to indicate this is a global mapper
 that can only be disabled with dire security consequences.

 And

 http://gmod.827538.n3.nabble.com/substring-sequence-on-coordinate-in-columns-tt3026255.html#a3048100

 "substring sequence on coordinate in columns" which doesn't ever answer
 the question about how to get compute to work.

 Thanks,

 Curtis





--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support




Re: [galaxy-user] Text Manipulation Compute c1[1:c1.find(()] fails

2011-07-22 Thread Jennifer Jackson

Hello Robert,

This tool does sanitize many of the characters required to build this 
type of expression. Some changes to a few of the Text Manipulation 
tools have been discussed, but nothing is planned for the near term.


For now, the most expedient solution is for you to download the file, 
edit it with a text editor (command-line or desktop), then reload it.
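
If the file is too large to edit by hand, the same offline edit can be scripted. The sketch below assumes a tab-delimited file with the accession in the first column; the function names are invented for the example.

```python
def strip_annotation(field):
    """Return the text before the first '(' (the whole field if there is none)."""
    return field.partition("(")[0]

def clean_first_column(lines):
    """Strip the '(...)' suffix from column 1 of tab-delimited lines."""
    cleaned = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields[0] = strip_annotation(fields[0])
        cleaned.append("\t".join(fields))
    return cleaned

# One sample line built from the value quoted in the thread.
sample = ["GXP_297346(PVALB/human)\t12\n"]
print(clean_first_column(sample))
```

To run it on a downloaded dataset, read the lines with open() and write the returned list to a new file before uploading that file back to Galaxy.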


An updated wrapper that adds parentheses to the list of characters that 
Convert changes to tabs might be the simplest change. If you make one, 
please consider adding it to the Tool Shed and sending an email to 
galaxy-...@bx.psu.edu to let the development community know about it.
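
As a rough idea of the core logic such a wrapper would need, the sketch below turns every parenthesis into a TAB. It is not the Convert tool's actual code; delimiters_to_tabs and its delimiters parameter are hypothetical names.

```python
def delimiters_to_tabs(line, delimiters="()"):
    """Replace every character listed in delimiters with a TAB."""
    table = str.maketrans({ch: "\t" for ch in delimiters})
    return line.translate(table)

print(repr(delimiters_to_tabs("GXP_297346(PVALB/human)")))
```

The closing parenthesis leaves a trailing TAB, so a real tool would probably also want an option to drop empty trailing columns.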


Best wishes,

Jen
Galaxy team


On 7/20/11 1:13 PM, Robert Curtis Hendrickson wrote:

Folks,

I have a column c1 that has entries like “GXP_297346(PVALB/human)”.

I’m trying to use Text Manipulation > Compute to strip off the “(…)”
portion, leaving only the accession (which can vary in length).

I have tried a variety of things that work in my python command line,
but fail here, for example:

c1[1:c1.find("(")]

or

c1.split('(')[0]

This gets mangled:

An error occurred running this job: Expression
c1__ob__1:c1.find(()__cb__ likely invalid.

Or

An error occurred running this job: Expression
c1.split(()__ob__0__cb__ likely invalid.

Please help. This is driving me crazy.

Searching the list, I find only

http://gmod.827538.n3.nabble.com/inputs-sanitization-tt2664336.html#a2664911
“Inputs sanitization” which seems to indicate this is a global mapper
that can only be disabled with dire security consequences.

And

http://gmod.827538.n3.nabble.com/substring-sequence-on-coordinate-in-columns-tt3026255.html#a3048100
“substring sequence on coordinate in columns” which doesn’t ever answer
the question about how to get compute to work.

Thanks,

Curtis





--
Jennifer Jackson
http://usegalaxy.org/
http://galaxyproject.org/


Re: [galaxy-user] Text Manipulation: Filter out duplicates (uniq) from an plain text file ?

2011-05-06 Thread Peter Cock
On Fri, May 6, 2011 at 3:16 PM, Roman Valls brainst...@nopcode.org wrote:
 Well, having similarly basic tools (in Galaxy) that can be performed on
 the commandline, such as sort or cut I just wondered how come a
 uniq is not there on the tool panel in some form/name.

 Thanks for the feedback Rory !

That's a timely question - I was also looking for something within Galaxy
to take a text file and remove duplicate lines.

Peter


Re: [galaxy-user] Text Manipulation: Filter out duplicates (uniq) from an plain text file ?

2011-05-06 Thread Guru Ananda
Hi Peter and Roman,

The "Count" tool under the "Statistics" section provides uniq-like
functionality. If you run this tool and select all columns in the "Count
occurrences of values in column(s)" field, your output will contain one line
per distinct record, with the first column containing the number of
occurrences of each record.
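
The output described above can be pictured with a few lines of plain Python. This is only an illustration of the result, not the Count tool's implementation; count_records is a made-up name.

```python
from collections import Counter

def count_records(lines):
    """Return one 'count<TAB>record' line per distinct input line."""
    counts = Counter(line.rstrip("\n") for line in lines)
    # Counter keeps first-seen order, so records appear in input order.
    return [f"{n}\t{record}" for record, n in counts.items()]

print(count_records(["chr1\t100\n", "chr2\t5\n", "chr1\t100\n"]))
```

Cutting away the first column of such output leaves the de-duplicated records, which is the uniq-like result asked about.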

Hope this answers your question.
Thanks for using Galaxy,
Guru.


On Fri, May 6, 2011 at 10:22 AM, Peter Cock p.j.a.c...@googlemail.com wrote:

 On Fri, May 6, 2011 at 3:16 PM, Roman Valls brainst...@nopcode.org
 wrote:
  Well, having similarly basic tools (in Galaxy) that can be performed on
  the commandline, such as sort or cut I just wondered how come a
  uniq is not there on the tool panel in some form/name.
 
  Thanks for the feedback Rory !

 That's a timely question - I was also looking for something within Galaxy
 to take a text file and remove duplicate lines.

 Peter




-- 
Graduate student, Bioinformatics and Genomics
Makova lab/Galaxy team
Penn State University
505 Wartik lab
University Park PA 16802
g...@psu.edu

Re: [galaxy-user] Text Manipulation

2011-02-22 Thread Felix Hammer
Hi Peter,

sounds nice, would be a great feature. For everyone else who is not using
a custom server: if you are lucky you can use the trim tool on tabs to
solve your problem.

If you want to add text to the beginning or end of a column:
- Use add column and add the text as a new column
- Then use cut to get everything in the right order
- finally join the new column with the column you want to add the text to
I hope you get what I mean and that this is helpful to someone.
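
The net effect of the three steps above, written as plain Python for comparison. This runs outside Galaxy; prefix_column and the sample line are hypothetical.

```python
def prefix_column(line, column, text):
    """Prepend text to one tab-separated column (0-based index)."""
    fields = line.split("\t")
    fields[column] = text + fields[column]
    return "\t".join(fields)

# Example: turn a bare chromosome number into a "chr"-prefixed name.
print(prefix_column("1\t100\t200", 0, "chr"))
```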

Using these tricks I have created a workflow that serves as my custom SAM
to GFF converter. Of course it's terribly inefficient, but it gets the job
done.

thx,
Felix


 On Tue, Feb 22, 2011 at 2:28 AM, Jennifer Jackson j...@bx.psu.edu wrote:
 Hi Felix,

 Text Manipulation - Convert delimiters to TAB could split one field into
 more than one, but the delimiter has to be in the list (@ is not).

 Text Manipulation - Cut columns from a table is similar, but it will not
 split on a @ either.

 Text Manipulation - Trim leading or trailing characters could be used for
 this specific case, since you can trim off the end of a column based on a
 position (but again, not a specified delimiter). To prep for an entire
 genome, you would need to break up the starting query so that the
 chromosome name lengths in any derivative queries are of a consistent
 length, then merge back together.

 Perhaps the @ was just an example and one of these tools will work for
 you. If you are customizing, additions to the Tool Shed that expand the
 native tools are always welcome! http://community.g2.bx.psu.edu


 I've been planning to write a Galaxy tool to split a column on a given
 delimiter (e.g. @ for this example, or | for NCBI style identifiers), which
 would solve this use case nicely. I haven't done it yet though - so if
 anyone else wants to write such a tool first, please go ahead.

 Specifically I would be aiming to expose the Python split and rsplit string
 method functionality, so the user would have to specify the number of
 splits (or perhaps more intuitively the number of columns to make) and
 if it should start on the left (default) or on the right.
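
A minimal sketch of the split/rsplit behaviour described above, using the "number of columns" form of the parameter. The function name, its parameters and the sample value are assumptions made for illustration; no such tool is shown in this thread.

```python
def split_column(value, delimiter, columns, from_right=False):
    """Split value into at most `columns` pieces, from the left or the right."""
    max_splits = columns - 1          # n columns need at most n - 1 splits
    if from_right:
        return value.rsplit(delimiter, max_splits)
    return value.split(delimiter, max_splits)

print(split_column("a|b|c|d", "|", 2))                   # split from the left
print(split_column("a|b|c|d", "|", 2, from_right=True))  # split from the right
```

Starting from the right is useful when only the last field is wanted and the leading part may itself contain the delimiter.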

 Peter


