Re: [galaxy-dev] Select first/last N rows from grouped tabular files (e.g. top BLAST hits)
On Thu, May 19, 2011 at 7:33 PM, madduri wrote:
> I wonder if somebody can give me more context around this issue..
>

On 3rd May I emailed IBI about their Galaxy install and one of the (in-house) tools mentioned on the workflow image here:
https://ibi.uchicago.edu/resources/galaxy/index.html

I recognised the NCBI BLAST+ tools but the "Filter Top Blast Results" tool was new to me, so I asked what it did and whether it or any of the other IBI tools would be available at the Galaxy Tool Shed:
http://community.g2.bx.psu.edu/

I had a reply from Alex Rodriguez (iBi/CI University of Chicago) that they haven't put any of the wrappers on the Galaxy Tool Shed yet as they are still being worked on. The IBI system assigned the number [Galaxy #13918].

This thread "Select first/last N rows from grouped tabular files (e.g. top BLAST hits)" could have similarities to the IBI "Filter Top Blast Results" tool, so I forwarded the email to the IBI galaxy email address to encourage you (e.g. Alex) to comment on the thread. The IBI system assigned the number [Galaxy #14246].

Peter
___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
Re: [galaxy-dev] Select first/last N rows from grouped tabular files (e.g. top BLAST hits)
On Tue, May 17, 2011 at 5:30 PM, Peter Cock wrote:
> Hi all,
>
> I'm wondering if the following task can be done in Galaxy with the
> standard tools. The specific example is selecting the top (e.g. 3)
> match sequences for each blast query, but I see this problem as much
> more general than a "Select top BLAST hits" tool.
>
> ...
>
> Does this make sense? Does it seem like a useful tool to write if
> there isn't anything like this already present? Or might it be simpler
> to just write a "Select top BLAST hits" tool?

While I still think the above task could be useful in general, I am now considering a general "BLAST filter" tool to offer this and some other commonly used filters like a minimum coverage threshold (which is possible with a filter on the extended tabular output, but not trivial).

Peter
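To make the minimum coverage idea concrete, here is a minimal Python sketch of a per-HSP query coverage filter on tab-separated BLAST output. It is only an illustration under stated assumptions, not an existing Galaxy tool: the function names are hypothetical, the query start and end are taken from columns 7 and 8 of the standard 12-column output, and the query length is assumed to be in column 23 (adjust `qlen_col` to wherever `qlen` sits in your extended output). It treats each HSP on its own, so it does not merge several HSPs for the same query and match, which is a large part of why this filter is not trivial.

```python
def hsp_query_coverage(qstart, qend, qlen):
    """Fraction of the query covered by one HSP (1-based, inclusive coordinates)."""
    return (abs(qend - qstart) + 1) / qlen


def filter_min_coverage(lines, min_cov, qstart_col=7, qend_col=8, qlen_col=23):
    """Yield tab-separated lines whose per-HSP query coverage is >= min_cov.

    Column numbers are 1-based. The default for qlen_col is an assumption
    about the extended tabular layout; change it to match your file.
    """
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qstart = int(fields[qstart_col - 1])
        qend = int(fields[qend_col - 1])
        qlen = int(fields[qlen_col - 1])
        if hsp_query_coverage(qstart, qend, qlen) >= min_cov:
            yield line
```

For example, `filter_min_coverage(open("hits.tabular"), 0.8)` would keep only HSPs spanning at least 80% of the query, where `hits.tabular` is a placeholder file name.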
[galaxy-dev] Select first/last N rows from grouped tabular files (e.g. top BLAST hits)
Hi all,

I'm wondering if the following task can be done in Galaxy with the standard tools. The specific example is selecting the top (e.g. 3) match sequences for each blast query, but I see this problem as much more general than a "Select top BLAST hits" tool.

I want to select the first few (e.g. 1) rows of each group in a tabular file, where the grouping criterion is that certain columns are equal (e.g. the first 2).

e.g. Tabular BLAST output has columns of query ID, match ID, etc.

queryA match1 ...
queryA match2 ...
queryA match2 ...
queryA match3 ...
queryA match4 ...
queryA match4 ...
queryA match4 ...
queryB match5 ...
queryB match5 ...
queryC match6 ...
queryC match7 ...

In this example, some of my queries have more than one HSP per match (more than one line with the same first two columns). If I group on the first two columns, the groups are:

queryA match1 ...

queryA match2 ...
queryA match2 ...

queryA match3 ...

queryA match4 ...
queryA match4 ...
queryA match4 ...

queryB match5 ...
queryB match5 ...

queryC match6 ...

queryC match7 ...

If I then take the first row in each group, that gives me just the first HSP for each query+match combination.

queryA match1 ...
queryA match2 ...
queryA match3 ...
queryA match4 ...
queryB match5 ...
queryC match6 ...
queryC match7 ...

If for example I wanted only the top 3 matches for each query, I could run the proposed tool one more time with different settings, this time grouping on the first column only:

queryA match1 ...
queryA match2 ...
queryA match3 ...
queryB match5 ...
queryC match6 ...
queryC match7 ...

I hope I've conveyed the idea here. The existing tools "Select first lines from a dataset" and "Select last lines from a dataset" are related, but do this at the file level.

Does this make sense? Does it seem like a useful tool to write if there isn't anything like this already present? Or might it be simpler to just write a "Select top BLAST hits" tool?
Peter
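The "first N rows per group" operation described in this message could be sketched in Python roughly as follows. This is a minimal illustration, not an existing Galaxy tool: the function name `first_n_per_group` and its parameters are hypothetical, the input is assumed to be tab-separated, and rows belonging to the same group are assumed to be adjacent (as they are in BLAST tabular output, which lists hits query by query).

```python
from itertools import groupby, islice


def first_n_per_group(lines, key_columns=(1,), n=1, sep="\t"):
    """Yield the first n lines of each run of lines sharing the key columns.

    key_columns are 1-based column numbers. Only consecutive lines are
    grouped, so unsorted input would need sorting on the key first.
    """
    def key(line):
        fields = line.rstrip("\n").split(sep)
        return tuple(fields[c - 1] for c in key_columns)

    for _, group in groupby(lines, key=key):
        # islice takes at most n lines; groupby skips the rest of the group.
        yield from islice(group, n)


# The two passes from the example: first HSP per query+match pair,
# then the top 3 matches per query.
rows = [
    "queryA\tmatch1\t...\n", "queryA\tmatch2\t...\n", "queryA\tmatch2\t...\n",
    "queryA\tmatch3\t...\n", "queryA\tmatch4\t...\n", "queryA\tmatch4\t...\n",
    "queryA\tmatch4\t...\n", "queryB\tmatch5\t...\n", "queryB\tmatch5\t...\n",
    "queryC\tmatch6\t...\n", "queryC\tmatch7\t...\n",
]
one_hsp_per_pair = list(first_n_per_group(rows, key_columns=(1, 2), n=1))
top3_per_query = list(first_n_per_group(one_hsp_per_pair, key_columns=(1,), n=3))
```

Selecting the last N rows of each group instead would need the whole group to be read before anything is emitted, for example by collecting it into a bounded `collections.deque`.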