Re: [Bioc-devel] Best object structure for representing a pairwise genome alignment ?

2020-09-27 Thread Paul Harrison via Bioc-devel
Hi Charles, Vince, Herve,

A further option would be a DataFrame containing two GRanges columns. This
is used for example by the plyranges join_* and pair_* functions.
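
A minimal sketch of that layout, assuming the GenomicRanges package (which attaches S4Vectors and its DataFrame class). The column names target and query, the integer score column, and the two example blocks (borrowed from the first rows of the original post) are illustrative assumptions, not something plyranges prescribes:

```r
library(GenomicRanges)  # also attaches S4Vectors, which provides DataFrame

# Hypothetical blocks, reusing two coordinate pairs from the thread's example.
target <- GRanges(c("S1:162-550:+", "S1:833-3738:+"))
query  <- GRanges(c("XSR:909374-909853", "XSR:910181-913291"))

# Start from an ordinary column, then attach the two GRanges as columns.
aln <- DataFrame(score = c(861L, 7238L))
aln$target <- target
aln$query  <- query

# Each row is one aligned block; either column can be pulled back out
# as a regular GRanges object.
width(aln$target)
width(aln$query)
```

Neither genome is privileged in this layout: both sets of coordinates are plain columns, and further per-block annotations can be added as extra columns.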

regards,
Paul Harrison

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Best object structure for representing a pairwise genome alignment ?

2020-09-21 Thread Pages, Herve
Hi Charles, Vince,

Yes, a PairwiseAlignments object will contain the sequences of the 2
genomes being aligned, so it will be big. This could be mitigated by
using one object per chromosome instead of trying to represent the full
genome alignment in a single object, but then you lose the ability to
represent regions that align across chromosomes.

Other downsides of using PairwiseAlignments are:
- You lose the simple block-to-block mapping that GRangePairs gives
you, together with the straightforward way to annotate the links
between blocks (via the metadata columns of the GRangePairs object).
- A PairwiseAlignments object can only represent replacements and
indels, while the block-to-block mapping in a GRangePairs object can
also support rearrangements.
- The GRangePairs approach even allows you to represent a many-to-many
relationship between the blocks/regions of the 2 genomes, something that
a PairwiseAlignments-based approach cannot do.

So the GRangePairs approach seems more flexible.

Maybe a better way to support an arbitrary relationship between the
blocks/regions of the 2 genomes would be to use a 3-slot data structure:
2 slots for 2 GRanges objects defining regions on the 2 genomes + 1 slot
for representing the links between the regions defined on each genome
(these links could be stored in a Hits object). Note that this is a
classic bipartite graph. It would particularly make sense if the mapping
between the regions is expected to be many-to-many. This kind of
container would be able to represent a side-by-side comparison of 2
arbitrary genomes in its most general form, not just a pairwise genome
alignment, which is more restrictive.
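
A rough sketch of such a 3-slot container, assuming GenomicRanges and the Hits class from S4Vectors. The class name GenomeLinks, its slot names, the example regions, and the link scores are all hypothetical and only illustrate the idea; no existing package is implied to define this class:

```r
library(methods)
library(GenomicRanges)  # attaches S4Vectors, which provides Hits

# Hypothetical container: class and slot names are illustrative only.
setClass("GenomeLinks",
    representation(regions1 = "GRanges", regions2 = "GRanges", links = "Hits"),
    validity = function(object) {
        ok <- nLnode(object@links) == length(object@regions1) &&
              nRnode(object@links) == length(object@regions2)
        if (ok) TRUE else "node counts in 'links' must match the region counts"
    }
)

regions1 <- GRanges(c("S1:162-550:+", "S1:833-3738:+"))
regions2 <- GRanges(c("XSR:909374-909853", "XSR:910181-913291",
                      "chr2:6694031-6695569"))

# Many-to-many: region 1 of genome 1 links to regions 1 and 3 of genome 2,
# and region 3 of genome 2 is also linked from region 2 of genome 1.
links <- Hits(from = c(1L, 1L, 2L), to = c(1L, 3L, 3L),
              nLnode = length(regions1), nRnode = length(regions2))
mcols(links)$score <- c(861L, 120L, 2977L)  # made-up annotation on each link

x <- new("GenomeLinks", regions1 = regions1, regions2 = regions2,
         links = links)

# Follow the links to recover the paired regions, one element per edge.
x@regions1[from(x@links)]
x@regions2[to(x@links)]
```

Regions are stored once per genome and the edges live in the Hits object, so a region involved in several links is not duplicated, and annotations on the links stay separate from annotations on the regions.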

Cheers,
H.


Re: [Bioc-devel] Best object structure for representing a pairwise genome alignment ?

2020-09-18 Thread Vincent Carey
Starting from

PairwiseAlignments-class  package:Biostrings   R Documentation

PairwiseAlignments, PairwiseAlignmentsSingleSubject, and
PairwiseAlignmentsSingleSubjectSummary objects

Description:

 The ‘PairwiseAlignments’ class is a container for storing a set of
 pairwise alignments.

 The ‘PairwiseAlignmentsSingleSubject’ class is a container for
 storing a set of pairwise alignments with a single subject.

 The ‘PairwiseAlignmentsSingleSubjectSummary’ class is a container
 for storing the summary of a set of pairwise alignments.

Usage:

 ## Constructors:
 ## When subject is missing, pattern must be of length 2
 ## S4 method for signature 'XString,XString'
 PairwiseAlignments(pattern, subject,
   type = "global", substitutionMatrix = NULL, gapOpening = 0,
   gapExtension = 1)
 ## S4 method for signature 'XStringSet,missing'
 PairwiseAlignments(pattern, subject,
   type = "global", substitutionMatrix = NULL, gapOpening = 0,
   gapExtension = 1)
 ## S4 method for signature 'character,character'
 PairwiseAlignments(pattern, subject,
   type = "global", substitutionMatrix = NULL, gapOpening = 0,
   gapExtension = 1,
   baseClass = "BString")

...

My question would be whether this is a relevant starting place. Clearly
the focus is not on coordinates, but perhaps a structure that maintains
genomic content and coordinates together would be of use?



[Bioc-devel] Best object structure for representing a pairwise genome alignment ?

2020-09-18 Thread Charles Plessy

Dear Bioc developers,

I am currently analysing pairwise genome alignments with Bioconductor,
and I represent them with a GRanges object of the first genome,
containing one element per alignment block, and storing the coordinates
in the other genome in a metadata column containing another GRanges object.


Something like this.

GRanges object with 36582 ranges and 2 metadata columns:
          seqnames      ranges strand | score                query
      [1]       S1     162-550      + |   861    XSR:909374-909853
      [2]       S1    833-3738      + |  7238    XSR:910181-913291
      [3]       S1   3769-4212      + |  1165    XSR:913510-913953
      [4]       S1   4246-4381      + |   359    XSR:914134-914275
      [5]       S1   4532-5990      + |  2977 chr2:6694031-6695569
      ...      ...         ...    ... .   ...                  ...
  [36578]      S99 17228-17759      - |   793 chr1:2375870-2376379
  [36579]      S99 16417-16935      - |   632 chr1:2376612-2377077
  [36580]      S99 12370-12759      - |   773 chr1:2379949-2380343
  [36581]      S99   5270-5384      - |   295   chr1:843397-843511
  [36582]      S99   1949-3053      - |  2105   chr1:845358-846326
  ---
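
An object of this shape could be built along the following lines. This is only a sketch that assumes the GenomicRanges package and uses the first two rows shown above; storing score as an integer is an assumption, since the column type is not visible in the printout:

```r
library(GenomicRanges)

# First genome: one range per alignment block (first two rows of the example).
gr <- GRanges(c("S1:162-550:+", "S1:833-3738:+"))

# Metadata columns: the alignment score and the matching range in the
# other genome, itself stored as a GRanges object.
gr$score <- c(861L, 7238L)
gr$query <- GRanges(c("XSR:909374-909853", "XSR:910181-913291"))

# Usual GRanges operations act on the first genome; the coordinates in the
# second genome travel along as metadata.
sub <- gr[gr$score > 1000]
sub$query
```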

Using "Pairwise genome alignment" as a keyword in a search engine, I
found that the CNEr package does something similar, although it uses a
dedicated "GRangePairs" object for the purpose.

Before I start to invest time in either direction, I wanted to check on
this mailing list whether other solutions already exist, in particular
closer to the core packages.


Have a nice day,

Charles

--
Charles Plessy - - ~ ~ ~ ~ ~  ~ ~ ~ ~ ~ - - charles.ple...@oist.jp
Okinawa  Institute  of  Science  and  Technology  Graduate  University
Staff scientist in the Luscombe Unit - ~ - https://groups.oist.jp/grsu
Toots from work - ~ ~~ ~ - https://mastodon.technology/@charles_plessy
