Re: [galaxy-user] Join on genomic interval

2011-08-08 Thread Jennifer Jackson

Hello Seth,

It sounds like some reads may also be joining with more than one exon, 
so there is a many-to-many relationship in the output. This would not be 
uncommon (especially if there are multiple reads per gene cluster) and 
would result in an input read being reported more than once in the output. 
Depending on the data, separating the join into two by strand and/or 
increasing the minimum overlap may be appropriate.
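
To illustrate the many-to-many effect, here is a minimal Python sketch. It is not Galaxy's implementation; the coordinates, the names, and the inner_join helper with its min_overlap parameter are all made up for the example. Each read that overlaps two exons contributes two output rows, so the output can be longer than the list of reads.

```python
# Hypothetical toy data: (chrom, start, end, name), half-open coordinates.
reads = [
    ("chr1", 100, 150, "read1"),  # overlaps exonA and exonB
    ("chr1", 400, 450, "read2"),  # overlaps exonC and exonD
    ("chr1", 900, 950, "read3"),  # overlaps no exon, so it is dropped
]
exons = [
    ("chr1", 90, 200, "exonA"),
    ("chr1", 120, 300, "exonB"),  # overlapping exon, e.g. from another transcript
    ("chr1", 380, 500, "exonC"),
    ("chr1", 380, 460, "exonD"),
]


def inner_join(left, right, min_overlap=1):
    """Illustrative inner join: one output row per overlapping (left, right) pair."""
    joined = []
    for l in left:
        for r in right:
            if l[0] != r[0]:
                continue  # different chromosome, cannot overlap
            overlap = min(l[2], r[2]) - max(l[1], r[1])
            if overlap >= min_overlap:
                joined.append(l + r)
    return joined


rows = inner_join(reads, exons)
stricter = inner_join(reads, exons, min_overlap=40)
reads_in_output = {row[3] for row in rows}

print(len(reads), "reads ->", len(rows), "joined rows")
print("distinct reads in output:", len(reads_in_output))
print("joined rows with min_overlap=40:", len(stricter))
```

Counting distinct reads in the output, rather than output lines, gives the number of reads that hit at least one exon. Raising the required overlap removes the weaker pairings.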


FAQ/Screencast:
http://wiki.g2.bx.psu.edu/Learn/Interval%20Operations

Hopefully this helps,

Jen
Galaxy team

On 8/6/11 6:21 AM, Seth Kasowitz wrote:

Hello,
I am hoping for some clarification on how Join on Genomic Intervals
functions. I have two lists of intervals: mapped reads and a list of
exons. If I join the two (INNER JOIN), I expect multiple reads to join
with the same exon, and see this in the output. What is confusing me is
that some output has more joined intervals returned than were present in
the input reads.
For example: I join 17,000,000 mapped reads with a list of 300,000 exons
and retrieve 21,000,000 joined intervals.

I must be misunderstanding what the function does, and am hoping someone
can explain how the output can have more lines than the reads submitted.

Thank you,
Seth

--
Seth Kasowitz
University of Connecticut
Department of Molecular and Cellular Biology
seth.kasow...@uconn.edu 
Beach Hall Room 335 (6-3580)


___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

   http://lists.bx.psu.edu/


--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support

[galaxy-user] Join on genomic interval

2011-08-06 Thread Seth Kasowitz
Hello,
I am hoping for some clarification on how Join on Genomic Intervals
functions. I have two lists of intervals: mapped reads and a list of exons.
If I join the two (INNER JOIN), I expect multiple reads to join with the
same exon, and see this in the output. What is confusing me is that some
output has more joined intervals returned than were present in the input
reads.
For example: I join 17,000,000 mapped reads with a list of 300,000 exons and
retrieve 21,000,000 joined intervals.

I must be misunderstanding what the function does, and am hoping someone can
explain how the output can have more lines than the reads submitted.

Thank you,
Seth

-- 
Seth Kasowitz
University of Connecticut
Department of Molecular and Cellular Biology
seth.kasow...@uconn.edu
Beach Hall Room 335 (6-3580)