Re: [Bioc-devel] Best object structure for representing a pairwise genome alignment ?
Hi Charles, Vince, Herve,

A further option would be a DataFrame containing two GRanges columns. This is used, for example, by the plyranges join_* and pair_* functions.

regards,
Paul Harrison

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
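The DataFrame layout suggested in this message can be sketched as follows. This is only a minimal illustration, not code from plyranges: the column names `target`, `query` and `score` are assumptions, the coordinates are the first three blocks of the table in the original post, and storing the score as an integer is also an assumption.

```r
library(GenomicRanges)  # also attaches S4Vectors (DataFrame) and IRanges

# First three alignment blocks from the original post, one GRanges per genome.
target <- GRanges(c("S1:162-550:+", "S1:833-3738:+", "S1:3769-4212:+"))
query  <- GRanges(c("XSR:909374-909853", "XSR:910181-913291",
                    "XSR:913510-913953"))

# One row per alignment block; both genomes sit side by side as columns.
# Column names are illustrative, and the integer score type is assumed.
aln <- DataFrame(target = target, query = query,
                 score = c(861L, 7238L, 1165L))

# Each column is still a full GRanges object.
query_widths <- width(aln$query)

# Row subsetting keeps the two sides of each block together.
high <- aln[aln$score > 1000L, ]
```

Compared with a GRanges carrying the second genome in a metadata column, neither genome is privileged here, which may or may not be what an analysis needs.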
Re: [Bioc-devel] Best object structure for representing a pairwise genome alignment ?
Hi Charles, Vince,

Yes, a PairwiseAlignments object will contain the sequences of the 2 genomes being aligned, so it will be big. This could be mitigated by using one object per chromosome instead of trying to represent the full genome alignment in a single object, but then you lose the ability to represent regions that align across chromosomes.

Other downsides of using PairwiseAlignments are:

- You lose the nice/simple block-to-block mapping that GRangePairs gives you, together with the easy/straightforward way to annotate the links between blocks (via the metadata columns of the GRangePairs).

- A PairwiseAlignments object can only represent replacements and indels, while the block-to-block mapping in a GRangePairs object can support rearrangements (in addition to indels and replacements).

- The GRangePairs approach even allows you to represent a many-to-many relationship between the blocks/regions of the 2 genomes, something that a PairwiseAlignments-based approach cannot do.

So the GRangePairs approach seems more flexible.

Maybe a better way to support an arbitrary relationship between the blocks/regions of the 2 genomes would be to use a 3-slot data structure: 2 slots for 2 GRanges objects defining regions on the 2 genomes + 1 slot for representing the links between the regions defined on each genome (these links could be stored in a Hits object). Note that this is a classic bipartite graph. This would particularly make sense if the mapping between the regions is expected to be many-to-many.

This kind of container would be able to represent a side-by-side comparison of 2 arbitrary genomes in its most general form, not just a pairwise genome alignment, which is more restrictive.

Cheers,
H.
On 9/18/20 02:41, Vincent Carey wrote:
> Starting from
>
> PairwiseAlignments-class package:Biostrings R Documentation
>
> ...
>
> my question would be whether this is a relevant starting place? Clearly
> the focus is not on coordinates, but perhaps a structure that maintains
> genomic content and coordinates together would be of use?
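The 3-slot structure described in the reply above (two GRanges plus a Hits object holding the links) could look roughly like this. It is a sketch under stated assumptions rather than an existing Bioconductor class: the class name `RegionLinks` and its slot names are hypothetical, and the links between the toy regions are invented to show a many-to-many mapping.

```r
library(GenomicRanges)  # also attaches S4Vectors (Hits, DataFrame) and IRanges

# Hypothetical container: class and slot names are illustrative only.
setClass("RegionLinks",
         slots = c(target = "GRanges", query = "GRanges", links = "Hits"))

# Regions defined independently on each genome (toy subset).
target <- GRanges(c("S1:162-550:+", "S1:833-3738:+"))
query  <- GRanges(c("XSR:909374-909853", "XSR:910181-913291"))

# Edges of the bipartite graph. These links are made up for illustration:
# target region 1 is linked to query regions 1 and 2, and query region 2
# is linked to both target regions.
links <- Hits(from = c(1L, 1L, 2L), to = c(1L, 2L, 2L),
              nLnode = length(target), nRnode = length(query))

x <- new("RegionLinks", target = target, query = query, links = links)

# Expand the links into a block-to-block table, one row per link.
pairs <- DataFrame(target = x@target[from(x@links)],
                   query  = x@query[to(x@links)])
```

Annotations on individual links (for example alignment scores) could be carried as metadata columns on the Hits object, in the same way GRangePairs uses its own metadata columns.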
Re: [Bioc-devel] Best object structure for representing a pairwise genome alignment ?
Starting from

    PairwiseAlignments-class     package:Biostrings     R Documentation

    PairwiseAlignments, PairwiseAlignmentsSingleSubject, and
    PairwiseAlignmentsSingleSubjectSummary objects

    Description:

         The ‘PairwiseAlignments’ class is a container for storing a set of
         pairwise alignments.

         The ‘PairwiseAlignmentsSingleSubject’ class is a container for
         storing a set of pairwise alignments with a single subject.

         The ‘PairwiseAlignmentsSingleSubjectSummary’ class is a container
         for storing the summary of a set of pairwise alignments.

    Usage:

         ## Constructors:
         ## When subject is missing, pattern must be of length 2
         ## S4 method for signature 'XString,XString'
         PairwiseAlignments(pattern, subject,
           type = "global", substitutionMatrix = NULL, gapOpening = 0,
           gapExtension = 1)
         ## S4 method for signature 'XStringSet,missing'
         PairwiseAlignments(pattern, subject,
           type = "global", substitutionMatrix = NULL, gapOpening = 0,
           gapExtension = 1)
         ## S4 method for signature 'character,character'
         PairwiseAlignments(pattern, subject,
           type = "global", substitutionMatrix = NULL, gapOpening = 0,
           gapExtension = 1,
           baseClass = "BString")

    ...

my question would be whether this is a relevant starting place? Clearly the focus is not on coordinates, but perhaps a structure that maintains genomic content and coordinates together would be of use?

On Fri, Sep 18, 2020 at 2:49 AM Charles Plessy wrote:
> Dear Bioc developers,
>
> I am currently analysing pairwise genome alignments with Bioconductor,
> and I represent them with a GRanges object of the first genome,
> containing one element by alignment block, and storing the coordinates
> in the other genome in a metadata column containing another GRanges object.
[Bioc-devel] Best object structure for representing a pairwise genome alignment ?
Dear Bioc developers,

I am currently analysing pairwise genome alignments with Bioconductor, and I represent them with a GRanges object of the first genome, containing one element per alignment block, and storing the coordinates in the other genome in a metadata column containing another GRanges object.

Something like this:

GRanges object with 36582 ranges and 2 metadata columns:
          seqnames      ranges strand | score                query
      [1]       S1     162-550      + |   861    XSR:909374-909853
      [2]       S1    833-3738      + |  7238    XSR:910181-913291
      [3]       S1   3769-4212      + |  1165    XSR:913510-913953
      [4]       S1   4246-4381      + |   359    XSR:914134-914275
      [5]       S1   4532-5990      + |  2977 chr2:6694031-6695569
      ...      ...         ...    ... .   ...                  ...
  [36578]      S99 17228-17759      - |   793 chr1:2375870-2376379
  [36579]      S99 16417-16935      - |   632 chr1:2376612-2377077
  [36580]      S99 12370-12759      - |   773 chr1:2379949-2380343
  [36581]      S99   5270-5384      - |   295   chr1:843397-843511
  [36582]      S99   1949-3053      - |  2105   chr1:845358-846326
  ---

Using "Pairwise genome alignment" as a keyword in a search engine, I found that the CNEr package does something similar, although it uses a dedicated "GRangePairs" object for the purpose.

Before I start to invest time in either direction, I wanted to check on this mailing list whether other solutions already exist, in particular ones closer to the core packages?

Have a nice day,

Charles

-- 
Charles Plessy - - ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ - - charles.ple...@oist.jp
Okinawa Institute of Science and Technology Graduate University
Staff scientist in the Luscombe Unit - ~ - https://groups.oist.jp/grsu
Toots from work - ~ ~~ ~ - https://mastodon.technology/@charles_plessy
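For readers who want to reproduce the layout described in this post, a GRanges object with a nested GRanges metadata column can be built roughly as below. This is a sketch, not the poster's actual code: only the first three blocks of the table are used, and the integer type of `score` is an assumption, since the column types are not visible in the printed output.

```r
library(GenomicRanges)

# Alignment blocks on the first genome (first three rows of the table).
gr <- GRanges(c("S1:162-550:+", "S1:833-3738:+", "S1:3769-4212:+"))

# Metadata columns: the score and the matching ranges on the other genome.
# The integer score type is assumed.
gr$score <- c(861L, 7238L, 1165L)
gr$query <- GRanges(c("XSR:909374-909853", "XSR:910181-913291",
                      "XSR:913510-913953"))

# Subsetting the outer object carries the paired query ranges along.
top <- gr[gr$score > 1000L]

# Swapping the roles of the two genomes takes a few explicit steps,
# because the first genome is the primary coordinate system.
swapped <- top$query
swapped$score <- top$score
swapped$query <- granges(top)  # granges() drops the metadata columns
```

The asymmetry visible in the last step is one of the practical differences between this layout and a dedicated pair container such as GRangePairs.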