Hi,
No ordering was defined until now for GAlignments. Starting with
GAlignments 1.11.7, the elements of a GAlignments object 'x' are
ordered based on the order of the elements in 'granges(x)', that
is, by chromosome, then by strand, then by start, then by end.
Note that I chose a simple ordering definition for GAlignments objects
but it has the following important caveat: == does NOT look at the
cigar. That means that 2 elements in a GAlignments object are considered
equal even if their cigars are different:
> gal <- GAlignments(Rle(factor("chr1"), 2), c(5L, 5L), c("10M",
"5M2I5M"), Rle(strand("+"), 2))
> gal
GAlignments object with 2 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width
<Rle> <Rle> <character> <integer> <integer> <integer> <integer>
[1] chr1 + 10M 10 5 14 10
[2] chr1 + 5M2I5M 12 5 14 10
njunc
<integer>
[1] 0
[2] 0
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> gal[1] == gal
[1] TRUE TRUE
Another ordering definition would be an ordering that looks at
chromosome/strand/start/end/cigar instead of just
chromosome/strand/start/end (i.e. the cigar is used to break ties when
looking at
chromosome/strand/start/end only leads to a tie). That would address
the above caveat but it feels weird to use lexicographic ordering of
the cigars in order to break ties. So I'm kind of hesitant to do this.
Anyway now <=, <, ==, !=, pcompare(), order(), sort(), rank(), etc...
work.
Still no match(), selfmatch(), duplicated(), or unique() for GAlignments
objects. Note that these things should adhere to the same notion of
equality as == so they would also ignore the cigar right now if we
wanted to have them. Maybe that's another argument in favor of an
ordering based on chromosome/strand/start/end/cigar.
H.
On 01/08/2017 05:00 PM, Dario Strbenac wrote:
Good day,
When sort is used on a GAlignments object, a stack error is shown, no matter
how small the object is.
testAlignments
GAlignments object with 3 alignments and 0 metadata columns:
seqnames strand cigar qwidth
start end width njunc
<Rle> <Rle> <character> <integer> <integer>
<integer> <integer> <integer>
700666F:126:C8768ANXX:3:2204:3175:99484 chr14 + 71S27M 98
18386040 18386066 27 0
700666F:126:C8768ANXX:1:1107:8115:31928 chr14 + 40S60M 100
18915005 18915064 60 0
700666F:126:C8768ANXX:1:2206:7564:34686 chr14 + 40S50M 90
18915005 18915054 50 0
-------
seqinfo: 23 sequences from an unspecified genome
sort(testAlignments)
Error: C stack usage 7970544 is too close to the limit
I use up-to-date packages.
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8
[6] LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C
LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
methods base
other attached packages:
[1] GenomicAlignments_1.10.0 Rsamtools_1.26.1 Biostrings_2.42.1
XVector_0.14.0
[5] SummarizedExperiment_1.4.0 Biobase_2.34.0 GenomicRanges_1.26.2
GenomeInfoDb_1.10.1
[9] IRanges_2.8.1 S4Vectors_0.12.1 BiocGenerics_0.20.0
loaded via a namespace (and not attached):
[1] lattice_0.20-34 bitops_1.0-6 grid_3.3.2 zlibbioc_1.20.0
Matrix_1.2-7.1 BiocParallel_1.8.1
[7] tools_3.3.2
--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel