Re: [Bioc-devel] version increments for unchanged packages

2015-06-11 Thread Hervé Pagès

On 06/11/2015 11:50 AM, Hervé Pagès wrote:

Hi Stephanie,

On 06/11/2015 11:33 AM, Gabe Becker wrote:

Stephanie,

As far as I know, it is so that package versions are unique to specific
releases of bioconductor. This has the benefits of

1. providing assurances that that particular version of a package is
tested and confirmed to work within that release, and
2. enforces that the source code/data for a particular version of a
package appears only and exactly once within the Bioconductor SVN
structure.


Together, these prevent there being ambiguity when a package needs to be
updated/fixed in the context of a particular release, both in terms of
what
version needs to be fixed and where that fix needs to be applied.


Exactly.

One more thing: because the exact version of R that is used to build
binary packages is not necessarily the same between 2 versions of
Bioconductor, we would end up with binary packages that are different
(and possibly not interchangeable) but impossible to distinguish based
on their name if we were not bumping versions (e.g. both would be
GWASdata_1.4.0.zip). Could make troubleshooting nightmarish.


I forgot that we don't build binaries for data experiment packages
anymore. Because these packages don't need compilation, the source
package can be installed on any platform without the need of any extra
tool (e.g. Rtools or Xcode).

That argument still applies for software packages though...

H.



Hope that makes sense,
H.



Best,
~G

On Thu, Jun 11, 2015 at 10:40 AM, Stephanie M. Gogarten 
sdmor...@u.washington.edu wrote:


Why is it that packages with no changes still get new version numbers at
every release? For example, my experiment data package GWASdata has not
changed since the last release, but the version was bumped from 1.4.0 to
1.6.0. I imagine most users expect that a change in version number
indicates some change in content.

Stephanie

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel









--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-05 Thread Hervé Pagès

I also think that we're heading towards something like that.

So genome(gr) - hg19 would:

  (a) Add any missing information to the seqinfo.
  (b) Sort the seqlevels in canonical order.
  (c) Change the seqlevels style to UCSC style if they are not.

The 3 tasks are orthogonal. I guess most of the times people want
an easy way to perform them all at once.

We could easily support (a) and (b). This assumes that the current
seqlevels are already valid hg19 seqlevels:

  si1 - Seqinfo(c(chrX, chrUn_gl000249, chr2, chr6_cox_hap2))
  gr1 - GRanges(seqinfo=si1)
  hg19_si - Seqinfo(genome=hg19)

  ## (a):
  seqinfo(gr1) - merge(seqinfo(gr1), hg19_si)[seqlevels(gr1)]
  seqinfo(gr1)
  # Seqinfo object with 4 sequences (1 circular) from hg19 genome:
  #   seqnames   seqlengths isCircular genome
  #   chrX155270560  FALSE   hg19
  #   chrUn_gl000249  38502  FALSE   hg19
  #   chr2243199373  FALSE   hg19
  #   chr6_cox_hap2 4795371  FALSE   hg19

  ## (b):
  seqlevels(gr1) - intersect(seqlevels(hg19_si), seqlevels(gr1))
  seqinfo(gr1)
  # Seqinfo object with 4 sequences (1 circular) from hg19 genome:
  #   seqnames   seqlengths isCircular genome
  #   chr2243199373  FALSE   hg19
  #   chrX155270560  FALSE   hg19
  #   chr6_cox_hap2 4795371  FALSE   hg19
  #   chrUn_gl000249  38502  FALSE   hg19

(c) is harder because seqlevelsStyle() doesn't know how to rename
scaffolds yet:

  si2 - Seqinfo(c(X, HSCHRUN_RANDOM_CTG42, 2, 
HSCHR6_MHC_COX_CTG1))

  gr2 - GRanges(seqinfo=si2)

  seqlevelsStyle(gr2)
  # [1] NCBI

  seqlevelsStyle(gr2) - UCSC
  seqlevels(gr2)
  # [1] chrX HSCHRUN_RANDOM_CTG42 chr2 


  # [4] HSCHR6_MHC_COX_CTG1

So we need to work on this.

I'm not sure about using genome(gr) - hg19 for this. Right now
it sets the genome column of the seqinfo with the supplied string
and nothing else. Aren't there valid use cases for this? What about
using seqinfo(gr) - hg19 instead? It kind of suggests that the
whole seqinfo component actually gets filled.

H.

On 06/04/2015 06:30 PM, Tim Triche, Jr. wrote:

that's kind of always been my goal...


Statistics is the grammar of science.
Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science

On Thu, Jun 4, 2015 at 6:29 PM, Michael Lawrence
lawrence.mich...@gene.com mailto:lawrence.mich...@gene.com wrote:

Maybe this could eventually support setting the seqinfo with:

genome(gr) - hg19

Or is that being too clever?

On Thu, Jun 4, 2015 at 4:28 PM, Hervé Pagès hpa...@fredhutch.org
mailto:hpa...@fredhutch.org wrote:
  Hi,
 
  FWIW I started to work on supporting quick generation of a standalone
  Seqinfo object via Seqinfo(genome=hg38) in GenomeInfoDb.
 
  It already supports hg38, hg19, hg18, panTro4, panTro3, panTro2,
  bosTau8, bosTau7, bosTau6, canFam3, canFam2, canFam1, musFur1, mm10,
  mm9, mm8, susScr3, susScr2, rn6, rheMac3, rheMac2, galGal4, galGal3,
  gasAcu1, danRer7, apiMel2, dm6, dm3, ce10, ce6, ce4, ce2, sacCer3,
  and sacCer2. I'll add more.
 
  See ?Seqinfo for some examples.
 
  Right now it fetches the information from internet every time you
  call it but maybe we should just store that information in the
  GenomeInfoDb package as Tim suggested?
 
  H.
 
 
  On 06/03/2015 12:54 PM, Tim Triche, Jr. wrote:
 
  That would be perfect actually.  And it would radically reduce 
  modularize maintenance.  Maybe that's the best way to go after
all.  Quite
  sensible.
 
  --t
 
  On Jun 3, 2015, at 12:46 PM, Vincent Carey
st...@channing.harvard.edu mailto:st...@channing.harvard.edu
  wrote:
 
  It really isn't hard to have multiple OrganismDb packages in
place -- the
  process of making new ones is documented and was given as an
exercise in
  the EdX course.  I don't know if we want to institutionalize it and
  distribute such -- I think we might, so that there would be
Hs19, Hs38,
  mm9, etc. packages.  They have very little content, they just
coordinate
  interactions with packages that you'll already have.
 
  On Wed, Jun 3, 2015 at 3:26 PM, Tim Triche, Jr.
tim.tri...@gmail.com mailto:tim.tri...@gmail.com
  wrote:
 
  Right, I typically do that too, and if you're working on human
data it
  isn't a big deal.  What makes things a lot more of a drag is
when you
  work
  on e.g. mouse data (mm9 vs mm10, aka GRCm37 vs GRCm38) where
  Mus.musculus
  is essentially a build ahead of Homo.sapiens.
 
  R seqinfo(Homo.sapiens)
  Seqinfo object with 93 sequences (1 circular) from hg19 genome
 
  R seqinfo(Mus.musculus)
  Seqinfo object with 66 sequences (1 circular) from mm10 genome:
 
  It's not as explicit as directly assigning the seqinfo from a
genome

Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-05 Thread Hervé Pagès

On 06/05/2015 11:43 AM, Michael Lawrence wrote:

On Thu, Jun 4, 2015 at 11:48 PM, Hervé Pagès hpa...@fredhutch.org wrote:

I also think that we're heading towards something like that.

So genome(gr) - hg19 would:

   (a) Add any missing information to the seqinfo.
   (b) Sort the seqlevels in canonical order.
   (c) Change the seqlevels style to UCSC style if they are not.

The 3 tasks are orthogonal. I guess most of the times people want
an easy way to perform them all at once.

We could easily support (a) and (b). This assumes that the current
seqlevels are already valid hg19 seqlevels:

   si1 - Seqinfo(c(chrX, chrUn_gl000249, chr2, chr6_cox_hap2))
   gr1 - GRanges(seqinfo=si1)
   hg19_si - Seqinfo(genome=hg19)

   ## (a):
   seqinfo(gr1) - merge(seqinfo(gr1), hg19_si)[seqlevels(gr1)]
   seqinfo(gr1)
   # Seqinfo object with 4 sequences (1 circular) from hg19 genome:
   #   seqnames   seqlengths isCircular genome
   #   chrX155270560  FALSE   hg19
   #   chrUn_gl000249  38502  FALSE   hg19
   #   chr2243199373  FALSE   hg19
   #   chr6_cox_hap2 4795371  FALSE   hg19

   ## (b):
   seqlevels(gr1) - intersect(seqlevels(hg19_si), seqlevels(gr1))
   seqinfo(gr1)
   # Seqinfo object with 4 sequences (1 circular) from hg19 genome:
   #   seqnames   seqlengths isCircular genome
   #   chr2243199373  FALSE   hg19
   #   chrX155270560  FALSE   hg19
   #   chr6_cox_hap2 4795371  FALSE   hg19
   #   chrUn_gl000249  38502  FALSE   hg19

(c) is harder because seqlevelsStyle() doesn't know how to rename
scaffolds yet:

   si2 - Seqinfo(c(X, HSCHRUN_RANDOM_CTG42, 2, HSCHR6_MHC_COX_CTG1))
   gr2 - GRanges(seqinfo=si2)

   seqlevelsStyle(gr2)
   # [1] NCBI

   seqlevelsStyle(gr2) - UCSC
   seqlevels(gr2)
   # [1] chrX HSCHRUN_RANDOM_CTG42 chr2
   # [4] HSCHR6_MHC_COX_CTG1

So we need to work on this.

I'm not sure about using genome(gr) - hg19 for this. Right now
it sets the genome column of the seqinfo with the supplied string
and nothing else. Aren't there valid use cases for this?


Not sure. People would almost always want the seqname style and order
to be consistent with the given genome.


Agreed but my worry is that when they don't, then they would be left
with no way to just set the genome column.

H.




What about
using seqinfo(gr) - hg19 instead? It kind of suggests that the
whole seqinfo component actually gets filled.



Yea, but genome is so intuitive compared to seqinfo.




H.

On 06/04/2015 06:30 PM, Tim Triche, Jr. wrote:


that's kind of always been my goal...


Statistics is the grammar of science.
Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science

On Thu, Jun 4, 2015 at 6:29 PM, Michael Lawrence
lawrence.mich...@gene.com mailto:lawrence.mich...@gene.com wrote:

 Maybe this could eventually support setting the seqinfo with:

 genome(gr) - hg19

 Or is that being too clever?

 On Thu, Jun 4, 2015 at 4:28 PM, Hervé Pagès hpa...@fredhutch.org
 mailto:hpa...@fredhutch.org wrote:
   Hi,
  
   FWIW I started to work on supporting quick generation of a
standalone
   Seqinfo object via Seqinfo(genome=hg38) in GenomeInfoDb.
  
   It already supports hg38, hg19, hg18, panTro4, panTro3, panTro2,
   bosTau8, bosTau7, bosTau6, canFam3, canFam2, canFam1, musFur1,
mm10,
   mm9, mm8, susScr3, susScr2, rn6, rheMac3, rheMac2, galGal4,
galGal3,
   gasAcu1, danRer7, apiMel2, dm6, dm3, ce10, ce6, ce4, ce2, sacCer3,
   and sacCer2. I'll add more.
  
   See ?Seqinfo for some examples.
  
   Right now it fetches the information from internet every time you
   call it but maybe we should just store that information in the
   GenomeInfoDb package as Tim suggested?
  
   H.
  
  
   On 06/03/2015 12:54 PM, Tim Triche, Jr. wrote:
  
   That would be perfect actually.  And it would radically reduce 
   modularize maintenance.  Maybe that's the best way to go after
 all.  Quite
   sensible.
  
   --t
  
   On Jun 3, 2015, at 12:46 PM, Vincent Carey
 st...@channing.harvard.edu mailto:st...@channing.harvard.edu
   wrote:
  
   It really isn't hard to have multiple OrganismDb packages in
 place -- the
   process of making new ones is documented and was given as an
 exercise in
   the EdX course.  I don't know if we want to institutionalize it
and
   distribute such -- I think we might, so that there would be
 Hs19, Hs38,
   mm9, etc. packages.  They have very little content, they just
 coordinate
   interactions with packages that you'll already have.
  
   On Wed, Jun 3, 2015 at 3:26 PM, Tim Triche, Jr.
 tim.tri...@gmail.com mailto:tim.tri...@gmail.com

   wrote:
  
   Right, I typically do that too, and if you're working on human
 data it
   isn't a big deal.  What makes

Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-05 Thread Hervé Pagès

On 06/05/2015 01:19 PM, Michael Lawrence wrote:

To support the multi-genome case, one could set the genome as a
vector, one value for each seqname, and it would fix the
style/seqlength per seqname. It could sort by the combination of
seqname and species. Presumably it would do nothing for unknown
genomes.

But I agree that a standardizeSeqinfo() that amounts to genome(x) -
genome(x) would make sense.

I don't think people sort too often by seqnames (except to the
natural ordering),


That's what sort(), order(), rank() do by default: they sort by seqnames
first, then by start, then by end, and finally by strand.


but I could be wrong. I do sympathize though with
the need for a low-level accessor. At least one would want a parameter
for disabling the standardization.


Ok. So the candidates are:

 (a) standardizeSeqinfo(gr) - hg19

 (b) gr - standardizeSeqinfo(gr, hg19)

 (c) standardizeGenome(gr) - hg19

 (d) gr - standardizeGenome(gr, hg19)

 (e) seqinfo(gr) - hg19

Is there a risk of confusion with keepStandardChromosomes where
standard means a very different thing? I'll add 2 more:

 (f) normalizeSeqinfo(gr) - hg19

 (g) gr - normalizeSeqinfo(gr, hg19)

Anyway, we're not here yet. As pointed in an earlier post, there are
still some missing pieces to complete the puzzle.

Thanks,
H.



On Fri, Jun 5, 2015 at 12:54 PM, Kasper Daniel Hansen
kasperdanielhan...@gmail.com wrote:

In WGBS we frequently sequence a human with spikein from the lambda genome.
In this case, most of the chromosomes of the Granges are from human, except
one.  This is a usecase where genome(GR) is not constant.  I suggest, partly
for compatibility, to keep genome, but perhaps do something like
   standardizeGenome()
or something like this.

I would indeed love, love, love a function which just cleans it up.

Kasper

On Fri, Jun 5, 2015 at 2:51 PM, Gabe Becker becker.g...@gene.com wrote:


Herve,

This is probably a naive question, but what usecases are there for
creating
an object with the wrong seqinfo for its genome?

~G

On Fri, Jun 5, 2015 at 11:43 AM, Michael Lawrence
lawrence.mich...@gene.com

wrote:



On Thu, Jun 4, 2015 at 11:48 PM, Hervé Pagès hpa...@fredhutch.org
wrote:

I also think that we're heading towards something like that.

So genome(gr) - hg19 would:

   (a) Add any missing information to the seqinfo.
   (b) Sort the seqlevels in canonical order.
   (c) Change the seqlevels style to UCSC style if they are not.

The 3 tasks are orthogonal. I guess most of the times people want
an easy way to perform them all at once.

We could easily support (a) and (b). This assumes that the current
seqlevels are already valid hg19 seqlevels:

   si1 - Seqinfo(c(chrX, chrUn_gl000249, chr2, chr6_cox_hap2))
   gr1 - GRanges(seqinfo=si1)
   hg19_si - Seqinfo(genome=hg19)

   ## (a):
   seqinfo(gr1) - merge(seqinfo(gr1), hg19_si)[seqlevels(gr1)]
   seqinfo(gr1)
   # Seqinfo object with 4 sequences (1 circular) from hg19 genome:
   #   seqnames   seqlengths isCircular genome
   #   chrX155270560  FALSE   hg19
   #   chrUn_gl000249  38502  FALSE   hg19
   #   chr2243199373  FALSE   hg19
   #   chr6_cox_hap2 4795371  FALSE   hg19

   ## (b):
   seqlevels(gr1) - intersect(seqlevels(hg19_si), seqlevels(gr1))
   seqinfo(gr1)
   # Seqinfo object with 4 sequences (1 circular) from hg19 genome:
   #   seqnames   seqlengths isCircular genome
   #   chr2243199373  FALSE   hg19
   #   chrX155270560  FALSE   hg19
   #   chr6_cox_hap2 4795371  FALSE   hg19
   #   chrUn_gl000249  38502  FALSE   hg19

(c) is harder because seqlevelsStyle() doesn't know how to rename
scaffolds yet:

   si2 - Seqinfo(c(X, HSCHRUN_RANDOM_CTG42, 2,

HSCHR6_MHC_COX_CTG1))

   gr2 - GRanges(seqinfo=si2)

   seqlevelsStyle(gr2)
   # [1] NCBI

   seqlevelsStyle(gr2) - UCSC
   seqlevels(gr2)
   # [1] chrX HSCHRUN_RANDOM_CTG42 chr2
   # [4] HSCHR6_MHC_COX_CTG1

So we need to work on this.

I'm not sure about using genome(gr) - hg19 for this. Right now
it sets the genome column of the seqinfo with the supplied string
and nothing else. Aren't there valid use cases for this?


Not sure. People would almost always want the seqname style and order
to be consistent with the given genome.


What about
using seqinfo(gr) - hg19 instead? It kind of suggests that the
whole seqinfo component actually gets filled.



Yea, but genome is so intuitive compared to seqinfo.




H.

On 06/04/2015 06:30 PM, Tim Triche, Jr. wrote:


that's kind of always been my goal...


Statistics is the grammar of science.
Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science

On Thu, Jun 4, 2015 at 6:29 PM, Michael Lawrence
lawrence.mich...@gene.com mailto:lawrence.mich...@gene.com wrote:

 Maybe this could eventually support setting the seqinfo with:

 genome(gr) - hg19

 Or is that being too clever?

 On Thu, Jun 4, 2015 at 4:28 PM, Hervé Pagès hpa

Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-05 Thread Hervé Pagès

On 06/05/2015 01:39 PM, Tim Triche, Jr. wrote:

how about just

gr - addSeqinfo(gr, hg19)


mmh, we don't really add a seqinfo. We transform the existing one.
This transformation can be called standardization, or normalization,
or... but I wouldn't call it addition.

H.



Statistics is the grammar of science.
Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science

On Fri, Jun 5, 2015 at 1:36 PM, Hervé Pagès hpa...@fredhutch.org
mailto:hpa...@fredhutch.org wrote:

On 06/05/2015 01:19 PM, Michael Lawrence wrote:

To support the multi-genome case, one could set the genome as a
vector, one value for each seqname, and it would fix the
style/seqlength per seqname. It could sort by the combination of
seqname and species. Presumably it would do nothing for unknown
genomes.

But I agree that a standardizeSeqinfo() that amounts to
genome(x) -
genome(x) would make sense.

I don't think people sort too often by seqnames (except to the
natural ordering),


That's what sort(), order(), rank() do by default: they sort by seqnames
first, then by start, then by end, and finally by strand.

but I could be wrong. I do sympathize though with
the need for a low-level accessor. At least one would want a
parameter
for disabling the standardization.


Ok. So the candidates are:

  (a) standardizeSeqinfo(gr) - hg19

  (b) gr - standardizeSeqinfo(gr, hg19)

  (c) standardizeGenome(gr) - hg19

  (d) gr - standardizeGenome(gr, hg19)

  (e) seqinfo(gr) - hg19

Is there a risk of confusion with keepStandardChromosomes where
standard means a very different thing? I'll add 2 more:

  (f) normalizeSeqinfo(gr) - hg19

  (g) gr - normalizeSeqinfo(gr, hg19)

Anyway, we're not here yet. As pointed in an earlier post, there are
still some missing pieces to complete the puzzle.

Thanks,
H.


On Fri, Jun 5, 2015 at 12:54 PM, Kasper Daniel Hansen
kasperdanielhan...@gmail.com
mailto:kasperdanielhan...@gmail.com wrote:

In WGBS we frequently sequence a human with spikein from the
lambda genome.
In this case, most of the chromosomes of the Granges are
from human, except
one.  This is a usecase where genome(GR) is not constant.  I
suggest, partly
for compatibility, to keep genome, but perhaps do something like
standardizeGenome()
or something like this.

I would indeed love, love, love a function which just cleans
it up.

Kasper

On Fri, Jun 5, 2015 at 2:51 PM, Gabe Becker
becker.g...@gene.com mailto:becker.g...@gene.com wrote:


Herve,

This is probably a naive question, but what usecases are
there for
creating
an object with the wrong seqinfo for its genome?

~G

On Fri, Jun 5, 2015 at 11:43 AM, Michael Lawrence
lawrence.mich...@gene.com
mailto:lawrence.mich...@gene.com

wrote:


On Thu, Jun 4, 2015 at 11:48 PM, Hervé Pagès
hpa...@fredhutch.org mailto:hpa...@fredhutch.org
wrote:

I also think that we're heading towards
something like that.

So genome(gr) - hg19 would:

(a) Add any missing information to the seqinfo.
(b) Sort the seqlevels in canonical order.
(c) Change the seqlevels style to UCSC style
if they are not.

The 3 tasks are orthogonal. I guess most of the
times people want
an easy way to perform them all at once.

We could easily support (a) and (b). This
assumes that the current
seqlevels are already valid hg19 seqlevels:

si1 - Seqinfo(c(chrX, chrUn_gl000249,
chr2, chr6_cox_hap2))
gr1 - GRanges(seqinfo=si1)
hg19_si - Seqinfo(genome=hg19)

## (a):
seqinfo(gr1) - merge(seqinfo(gr1),
hg19_si)[seqlevels(gr1)]
seqinfo(gr1)
# Seqinfo object with 4 sequences (1
circular) from hg19 genome:
#   seqnames   seqlengths isCircular genome
#   chrX155270560  FALSE   hg19
#   chrUn_gl000249  38502  FALSE   hg19
#   chr2

Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-05 Thread Hervé Pagès

On 06/05/2015 01:48 PM, Gabe Becker wrote:

I dunno, standardizeSeqInfo just seems really long for a function name
users are going to have to call.

At the risk of annoying Herve further, what about

gr - castSeqInfo(gr, gh19)


gr!



?

~G

On Fri, Jun 5, 2015 at 1:46 PM, Tim Triche, Jr. tim.tri...@gmail.com
mailto:tim.tri...@gmail.com wrote:

maybe standardizeSeqinfo or fixSeqinfo is clearer after all

Statistics is the grammar of science.
Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science

On Fri, Jun 5, 2015 at 1:41 PM, Gabe Becker becker.g...@gene.com
mailto:becker.g...@gene.com wrote:



On Fri, Jun 5, 2015 at 1:39 PM, Tim Triche, Jr.
tim.tri...@gmail.com mailto:tim.tri...@gmail.com wrote:

how about just

gr - addSeqinfo(gr, hg19)


Add sounds like it's, well, adding rather than replacing (Which
it sometimes would do.

gr - fixSeqInfo(gr, hg19)

instead?

~G


--
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research





--
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-04 Thread Hervé Pagès
] BSgenome.Dmelanogaster.UCSC.dm6

BSgenome.Drerio.UCSC.danRer5


[15] BSgenome.Drerio.UCSC.danRer6

BSgenome.Drerio.UCSC.danRer7


[17] BSgenome.Ecoli.NCBI.20080805
BSgenome.Gaculeatus.UCSC.gasAcu1
[19] BSgenome.Ggallus.UCSC.galGal3

BSgenome.Ggallus.UCSC.galGal4


[21] BSgenome.Hsapiens.NCBI.GRCh38  BSgenome.Hsapiens.UCSC.hg17

[23] BSgenome.Hsapiens.UCSC.hg18BSgenome.Hsapiens.UCSC.hg19

[25] BSgenome.Hsapiens.UCSC.hg38
BSgenome.Mfascicularis.NCBI.5.0
[27] BSgenome.Mfuro.UCSC.musFur1

BSgenome.Mmulatta.UCSC.rheMac2


[29] BSgenome.Mmulatta.UCSC.rheMac3

BSgenome.Mmusculus.UCSC.mm10


[31] BSgenome.Mmusculus.UCSC.mm8BSgenome.Mmusculus.UCSC.mm9

[33] BSgenome.Ptroglodytes.UCSC.panTro2
BSgenome.Ptroglodytes.UCSC.panTro3
[35] BSgenome.Rnorvegicus.UCSC.rn4

BSgenome.Rnorvegicus.UCSC.rn5


[37] BSgenome.Rnorvegicus.UCSC.rn6
BSgenome.Scerevisiae.UCSC.sacCer1
[39] BSgenome.Scerevisiae.UCSC.sacCer2
BSgenome.Scerevisiae.UCSC.sacCer3
[41] BSgenome.Sscrofa.UCSC.susScr3

BSgenome.Tguttata.UCSC.taeGut1





Am I insane for suggesting this?  It would make things a little easier

for

rtracklayer, most SummarizedExperiment and SE-derived objects, blah,

blah,

blah...


Best,

--t




Statistics is the grammar of science.
Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Rd] \alias{} -- rather \concept{} for conceptual links to help pages

2015-05-18 Thread Hervé Pagès

Hi Martin,

On 05/18/2015 05:14 AM, Martin Maechler wrote:

 From R-help, subject  Variable number of loops
I've opened a new thread, moving from R-help to R-devel ..


Jim Lemon drjimle...@gmail.com
 on Sun, 17 May 2015 09:19:06 +1000 writes:


  Hi all, Given the number of help requests that involve
  permutations/combinations, and the less than obvious
  naming of the expand.grid function, perhaps adding an
  alias such as permute.elements or combine.elements
  might ease the tasks of both searchers and those offering
  help. Neither of the above names appear to be used at
  present.

  Jim


Using \alias{} is not a very good thing here, since as you know they
are *key*s that must remain unique if possible and they can be
linked to -- which I think would not be helpful for  'expand.grid'.


It seems to me that Jim was maybe suggesting to define an alias for the
expand.grid function i.e. something like:

  permute.elements - expand.grid

or

  combine.elements - expand.grid

as a way to address the less than obvious naming of the expand.grid
function. But maybe I misunderstood...

Cheers,
H.



Rather, for quite a few years now, we have had \concept{} for
adding search keywords, i.e., things that
help.search()  and hence ??topic  will find.

The other advantage of \concept{} is that you can use short
phrases, i.e.,

\concept{all variable combinations}

would be possible here.

(Better wording proposals for this specific case are welcome! --
  maybe privately).

Martin Maechler, ETH Zurich


  On Sun, May 17, 2015 at 5:54 AM, Bert Gunter
  gunter.ber...@gene.com wrote:
  1. Please always reply to the list unless there is a
  compelling reason to keep the discussion private. You
  will have a better chance of getting something useful
  that way.
 
  2. I don't know what you mean by I don't have a fixed
  number of variables. You have to specify at least the
  number of variables and how many levels each has in order
  to work out what you requested, which is **NOT** the
  number of permutations but the number of combinations
  AFAICS, which is exactly what expand.grid will give you.
 
  3. Maybe what you're looking for is the ... arguments in
  function calls, which would be used along the lines of:
 
  myfun - function( x,y,...)  { ## some code combs -
  expand.grid(...)  ## some more code }
 
  Any good R tutorial will tell you about this if this is
  unfamiliar.
 
  4. Another possibility might be to deliver a list of
  named variables as an argument and then use do.call, e.g.
 
  myfun - (x,y, alist) { ## some code combs -
  do.call(expand.grid, alist) ## some more code }
 
  ?do.call and/or a tutorial for details.
 
  5. Otherwise, maybe someone else can figure out what
  you're looking for.
 
  Cheers, Bert
 
 
 
  Bert Gunter Genentech Nonclinical Biostatistics (650)
  467-7374
 
  Data is not information. Information is not
  knowledge. And knowledge is certainly not wisdom.
  Clifford Stoll
 
 
 
 
  On Sat, May 16, 2015 at 11:16 AM, WRAY NICHOLAS
  nicholas.w...@ntlworld.com wrote:
  I might be but doesn't expand.grid need a defined and
  listed number of inputs?  The problem I'm having is that
  the number of variables is not fixed, so I'm not sure
  whether I can reference the variable number of variables
  by using a vector -- haven't had time to try yet But
  thanks anyway Nick Wray
 
  On 16 May 2015 at 14:28, Bert Gunter
  gunter.ber...@gene.com wrote:
 
  Are you trying to reinvent ?expand.grid ?
 
  -- Bert
 
  Bert Gunter Genentech Nonclinical Biostatistics (650)
  467-7374
 
  Data is not information. Information is not
  knowledge. And knowledge is certainly not wisdom.
  Clifford Stoll
 
  [...]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unexpected failure when calling new() with unnamed arg and

2015-05-13 Thread Hervé Pagès

Thanks Martin for looking into this.  H.

On 05/13/2015 03:57 AM, Martin Maechler wrote:

Hervé Pagès hpa...@fredhutch.org
 on Tue, 12 May 2015 15:18:42 -0700 writes:



Hi,



The man page for new() suggests that if 'a' is an object with slots
slot1 and slot2 and C is a class that extends the class of 'a',
then the 2 following calls should be equivalent:



new(C, a, ...)
new(C, slot1=a@slot1, slot2=a@slot2, ...)



This is generally the case but I just ran into a situation where it's
not. In the following example the former fails while the latter works:



setClass(A, representation(slot1=numeric, slot2=logical))
setClass(B, contains=A, representation(design=formula))
setClass(C, contains=B)
a - new(A, slot1=77, slot2=TRUE)



new(C, a, design=x ~ y)  # fails
new(C, slot1=a@slot1, slot2=a@slot2, design=x ~ y)  # works



Note that new(B, a, design=x ~ y) works so the 3-level class
hierarchy is really needed in order to reproduce.



Probably related to this, I also noted that new(B) and/or new(C)
return invalid objects:



c - new(C)



validObject(c)
# Error in validObject(c) :
#  invalid class “C” object: invalid object for slot design
#  in class C: got class S4, should be or extend class formula



is(c@design, formula)
# [1] FALSE



class(c@design)
# [1] S4



Note that 'c' can be fixed:



c@design - formula(NULL)



validObject(c)
# [1] TRUE



Maybe something that the default initialize method should take care
of?



Another singularity that is maybe at the root of all of this is that
the formula S4 class is virtual:



showClass(formula)
# Virtual Class formula [package methods]
#
# Slots:
#
# Name:   .S3Class
# Class: character
#
# Extends: oldClass



so a bare call to new(formula) fails:



new(formula)
# Error in new(formula) :
#   trying to generate an object from a virtual class (formula)



Shouldn't new(formula) just return an empty S3 formula (like
formula(NULL) does), in the same way that new(integer) returns
an empty ordinary integer vector?


Interesting .. and at least worth following.

One problem and historical reason for the current setup seems
that the formula S3 class is not from 'base' but 'stats' :

R's source, src/library/methods/R/BasicClasses.R,
lines 524 ff has the following comment block

|  .OldClassesPrototypes is a list of S3 classes for which prototype
|  objects are known  reasonable.  The classes will reappear in
|  .OldClassesList, but will have been initialized first in
|  .InitBasicClasses.  NB:  the methods package will NOT set up
|  prototypes for S3 classes except those in package base and for ts
|  (and would rather not do those either).  The package that owns the
|  S3 class should have code to call setOldClass in its
|  initialization.

So, when John Chambers wrote this, he envisioned that the
'stats' package would do the correct thing for its own classes.
OTOH, as history went, the stats package was never allowed to
depend on methods.
There are many other S3 classes from 'stats' which also end up
similarly, being defined via  setOldClass() and that itself
produces a VIRTUAL class.
Also, another part of the (R source) comment above is no longer
quite accurate, e.g., data.frame is in .OldClassesPrototypes
but not in .OldClassesList ...

As I do agree that formula is much more basic than these other classes,
I'm currently looking at tweaks to the methods (and stats) code,
to get this to work as indeed - you mentioned above -  we
already allow empty S3 formula objects anyway.

... half an hour later : Indeed, I've been able to use the above information
such that  new(formula) and new(formula, y ~ x)
work.

However, your code above now --- with my changes --- would  fail :

   setClass(A, representation(slot1=numeric, slot2=logical))
   setClass(B, contains=A, representation(design=formula))
   setClass(C, contains=B)
  Error in reconcilePropertiesAndPrototype(name, slots, prototype, 
superClasses,  :
B is not eligible to be the data part of another class (must be a basic 
class or a virtual class with no slots)
  

So, I am not yet committing my changes to R-devel.
Hopefully more on this, later today.

Martin Maechler,
ETH Zurich



Thanks,
H.



   sessionInfo()
R version 3.2.0 Patched (2015-04-17 r68202)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS



--
Hervé Pagès
Fred Hutchinson Cancer Research Center


  [..]




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Unexpected failure when calling new() with unnamed arg and

2015-05-12 Thread Hervé Pagès

Hi,

The man page for new() suggests that if 'a' is an object with slots
slot1 and slot2 and C is a class that extends the class of 'a',
then the 2 following calls should be equivalent:

  new(C, a, ...)
  new(C, slot1=a@slot1, slot2=a@slot2, ...)

This is generally the case but I just ran into a situation where it's
not. In the following example the former fails while the latter works:

  setClass(A, representation(slot1=numeric, slot2=logical))
  setClass(B, contains=A, representation(design=formula))
  setClass(C, contains=B)
  a - new(A, slot1=77, slot2=TRUE)

  new(C, a, design=x ~ y)  # fails
  new(C, slot1=a@slot1, slot2=a@slot2, design=x ~ y)  # works

Note that new(B, a, design=x ~ y) works so the 3-level class
hierarchy is really needed in order to reproduce.

Probably related to this, I also noted that new(B) and/or new(C)
return invalid objects:

  c - new(C)

  validObject(c)
  # Error in validObject(c) :
  #  invalid class “C” object: invalid object for slot design
  #  in class C: got class S4, should be or extend class formula

  is(c@design, formula)
  # [1] FALSE

  class(c@design)
  # [1] S4

Note that 'c' can be fixed:

  c@design - formula(NULL)

  validObject(c)
  # [1] TRUE

Maybe something that the default initialize method should take care
of?

Another singularity that is maybe at the root of all of this is that
the formula S4 class is virtual:

  showClass(formula)
  # Virtual Class formula [package methods]
  #
  # Slots:
  #
  # Name:   .S3Class
  # Class: character
  #
  # Extends: oldClass

so a bare call to new(formula) fails:

  new(formula)
  # Error in new(formula) :
  #   trying to generate an object from a virtual class (formula)

Shouldn't new(formula) just return an empty S3 formula (like
formula(NULL) does), in the same way that new(integer) returns
an empty ordinary integer vector?

Thanks,
H.

 sessionInfo()
R version 3.2.0 Patched (2015-04-17 r68202)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] exptData(SummarizedExperiment)

2015-05-12 Thread Hervé Pagès

SummarizedExperiment was just an example. I agree it can be a
little challenging for end users to know where to find a particular
functionality but I'm not sure about using meta packages to address
that. At least I feel we should probably avoid creating new meta
packages out of the blue, with arbitrary limits and possibly endless
discussions about what exactly goes in them. Also I don't think there
is a single core but rather several domain-specific cores.

What about using the existing workflow packages instead?
A workflow package (like the variants package here
http://bioconductor.org/help/workflows/variants/)
covers a specific domain and loading it should load the core
for that domain. Plus the user gets a great vignette as a bonus
to get started so it's not just an empty shell.

There are probably some shortcomings with workflow packages
that would need to be addressed before they can serve as
convenient meta packages though e.g. they're treated too
differently from other BioC packages (e.g. they're not available
via biocLite() and don't show up under the biocViews tree here
http://bioconductor.org/packages/release/BiocViews.html).
Nothing that seems impossible to address though...

H.


On 05/12/2015 03:22 PM, Michael Lawrence wrote:

It's more general than SummarizedExperiment. I think people would
appreciate a simple way to load the core, without having to remember,
for example, that VCF reading is in VariantAnnotation.

On Mon, May 11, 2015 at 9:51 PM, Hervé Pagès hpa...@fredhutch.org
mailto:hpa...@fredhutch.org wrote:

Hi Michael,

On 05/11/2015 05:35 PM, Michael Lawrence wrote:

Splitting stuff into different packages is good for modularity, but
tough on the mind of the user. What about having some sort of meta
package that simply loads the core infrastructure packages? Named
something simple like Genomics or GenomicsCore.


Don't know if we need this. For example, for all the
SummarizedExperiment use cases I ran into, the end-user generally
only needs to load the corresponding high-level package (DESeq2,
VariantAnnotation, minfi, GenomicAlignments, etc...) and that takes
care of loading all the low-level infrastructure packages.

H.


On Mon, May 11, 2015 at 5:10 PM, Hervé Pagès
hpa...@fredhutch.org mailto:hpa...@fredhutch.org
mailto:hpa...@fredhutch.org mailto:hpa...@fredhutch.org wrote:

 Hi Tim,

 The SummarizedExperiment class is being replaced with the
 RangedSummarizedExperiment class from the new
SummarizedExperiment
 package. This is a work-in-progress and the name and internal
 representation of the RangedSummarizedExperiment class are not
 finalized yet. The main goal for now is to move all the
 SummarizedExperiment stuff from GenomicRanges to its own
package.

 Anyway, metadata() is the replacement for exptData() on
 RangedSummarizedExperiment objects. It's on my list to add
 an exptData method for backward compatibility.

 Cheers,
 H.


 On 05/11/2015 04:37 PM, Tim Triche, Jr. wrote:

 who determined that breaking this would be a good idea?!?

 R ?SummarizedExperiment
 Help on topic 'SummarizedExperiment' was found in the
following
 packages:

 Package   Library
 GenomicRanges
   /home/tim/R/x86_64-pc-linux-gnu-library/3.2
 SummarizedExperiment

/home/tim/R/x86_64-pc-linux-gnu-library/3.2

R nrows - 200; ncols - 6
 Rcounts - matrix(runif(nrows * ncols, 1,
1e4), nrows)
 RrowRanges - GRanges(rep(c(chr1, chr2),
c(50, 150)),
 +   IRanges(floor(runif(200,
1e5, 1e6)),
 width=100),
 +   strand=sample(c(+, -),
200, TRUE))
 RcolData - DataFrame(Treatment=rep(c(ChIP,
Input), 3),
 + row.names=LETTERS[1:6])
 Rsset -
 SummarizedExperiment(assays=SimpleList(counts=counts),
 +   rowRanges=rowRanges,
colData=colData)
 Rsset
 class: RangedSummarizedExperiment
 dim: 200 6
 metadata(0):
 assays(1): counts
 rownames: NULL
 rowRanges metadata column names(0):
 colnames(6): A B ... E F
 colData names(1): Treatment
 RassayNames(sset)
 [1] counts
 Rassays(sset) - endoapply(assays(sset), asinh)
 Rhead

Re: [Bioc-devel] 'memory not mapped' in trimLRpatterns

2015-05-06 Thread Hervé Pagès

Hi Michael,

I finally got to fix this in Biostrings 2.36.1 (release) and 2.37.2
(devel). Both should become available via biocLite() on Friday around
11am (Seattle time). Thanks for your patience.

Cheers,
H.


On 04/30/2015 10:50 AM, Hervé Pagès wrote:

Hi Michael,

Thanks for the reminder. I must confess this slipped out of my radar.
I'll look into it today and will let you know.

H.

On 04/30/2015 08:42 AM, Michael Stadler wrote:

Hi Herve,

I stumbled again over the 'memory not mapped' issue in trimLRpatterns
using updated versions of R and BioC-devel. I guess it does not hit
people very often, but I would highly appreciate if it could be fixed.

Many thanks,
Michael

PS: I can reproduce the issue using the code below under:
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

locale:
[1] C

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
  [1] ShortRead_1.27.1GenomicAlignments_1.5.1 Rsamtools_1.21.3
  [4] GenomicRanges_1.21.5GenomeInfoDb_1.5.1  BiocParallel_1.3.4
  [7] Biostrings_2.37.0   XVector_0.9.0   IRanges_2.3.6
[10] S4Vectors_0.7.0 BiocGenerics_0.15.0 RColorBrewer_1.1-2

loaded via a namespace (and not attached):
  [1] lattice_0.20-31  bitops_1.0-6 grid_3.2.0
  [4] futile.options_1.0.0 zlibbioc_1.15.0  hwriter_1.3.2
  [7] latticeExtra_0.6-26  futile.logger_1.4.1  lambda.r_1.1.7
[10] Biobase_2.29.0


On 25.04.2014 13:11, Hervé Pagès wrote:

Hi Michael,

Thanks for the report. I'll look into this.

H.

On 04/22/2014 08:29 AM, Michael Stadler wrote:

Dear Herve,

We are hitting a 'memory not mapped' problem when using trimLRpatterns
as detailed below. I did not manage to reproduce it with few sequences,
so I have to refer to a publicly available sequence file with many
reads. Even then, it occasionally runs through without problems.

Also, our use-case may not be typical and be part of the problem -
maybe
the solution will be to change our use of trimLRpatterns.

Here is some code to illustrate/reproduce the problem:

library(Biostrings)
library(ShortRead)

Rpat - TGGAATTCTCGGGTGCCAAGG
maxRmm - rep(0:2, c(6,3,nchar(Rpat)-9))

fq1 - DNAStringSet(c(TGGAATTCTCGGGTGCCAAGG,
TGGAATTCTCGGGTGCCAAGG))

# the second read is not trimmed because it runs through the adaptor
trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
#  A DNAStringSet instance of length 2
#width seq
#[1] 4 
#[2]28 TGGAATTCTCGGGTGCCAAGGTTT

# as a workaround, we pad the adaptor with Ns and
# increase the mismatch tolerance
numNs - 90
maxRmm - c(maxRmm, 1:numNs+max(maxRmm))
Rpat - paste(c(Rpat, rep(N, numNs)), collapse=)

# now, also the second read gets trimmed
trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
#  A DNAStringSet instance of length 2
#width seq
#[1] 4 
#[2] 4 

# to trigger the segmentation fault, many reads are needed
download.file(ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz;,


ERR000916_1.fastq.gz)
fq2 - readFastq(ERR000916_1.fastq.gz)

fq3 - trimLRPatterns(subject=fq2, Rpattern=Rpat, max.Rmismatch=maxRmm)
# *** caught segfault ***
#address 0x7f5109be4fed, cause 'memory not mapped'
#
#Traceback:
# 1: .Call(.NAME, ..., PACKAGE = PACKAGE)
# 2: .Call2(XStringSet_vmatch_pattern_at, pattern, subject, at,
at.type, max.mismatch, min.mismatch, with.indels, fixed, ans.type,
auto.reduce.pattern, PACKAGE = Biostrings)
# 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch,
min.mismatch, with.indels, fixed, .to.ans.type(follow.index),
auto.reduce.pattern)
# 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
   ending.at = subject_width, max.mismatch = max.Rmismatch,
with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
# 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
   ending.at = subject_width, max.mismatch = max.Rmismatch,
with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
# 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels,
   Rfixed)
# 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject,
max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed,
Rfixed, ranges)
# 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
 max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,
ranges
= TRUE)
# 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
 max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,
ranges
= TRUE)
#10: eval(expr, envir, enclos)
#11: eval(call, sys.frame(sys.parent()))
#12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch,
max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges =
TRUE)
#13: trimLRPatterns(subject = fq2, Rpattern = Rpat

Re: [Rd] Shouldn't vector indexing with negative out-of-range index give an error?

2015-05-06 Thread Hervé Pagès
  much on purpose.
 
  Making it a *warning* instead of the original error
  may have been both more cautious and more helpful for
  detecting programming errors.
 
  OTOH, John Chambers, the father of S and hence grandfather of R,
  may have had good reasons why it seemed more logical to silently
  ignore such out of bound negative indices:
  One could argue that
 
  x[-5]  means  leave away the 5-th element of x
 
  and if there is no 5-th element of x, leaving it away should be a no-op.
 
  After all this musing and history detection, my gut decision
  would be to only change the documentation which Ross forgot to change.
 
  But of course, it may be interesting to hear other programmeR's 
feedback on this.
 
  Martin



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-27 Thread Hervé Pagès

On 04/27/2015 02:15 PM, Michael Lawrence wrote:

It would be nice to have a single function call that would hide these
details. It could probably be made more efficient also by avoiding
multiple matching, unnecessary revmap lists, etc. tableAsGRanges() is
not a good name but it conveys what I mean (does that make it actually
good?).


There is nothing specific to GRanges here. We're just reporting the
frequency of unique elements in a metadata column so this belongs to
the extended Vector API in the same way that findMatches/countMatches
do.

H.



On Mon, Apr 27, 2015 at 12:23 PM, Hervé Pagès hpa...@fredhutch.org
mailto:hpa...@fredhutch.org wrote:

On 04/24/2015 11:41 AM, Michael Lawrence wrote:

Taking this a bit off topic but it would be nice if we could get the
GRanges equivalent of as.data.frame(table(x)), i.e., unique(x)
with a
count mcol. Should be easy to support but what should the API be
like?


This was actually the motivating use case for introducing
findMatches/countMatches a couple of years ago:

   ux - unique(x)
   mcols(ux)$Freq - countMatches(ux, x)

Don't know what a good API would be to make this even more
straightforward though. Maybe via some extra argument to unique()
e.g. 'with.freq'? This is kind of similar to the 'with.revmap'
argument of reduce(). Note that unique() could also support the
'with.revmap' arg. Once it does, the 'with.freq' arg can also
be implemented by just calling elementLengths() on the revmap
metadata column.

H.


On Fri, Apr 24, 2015 at 10:54 AM, Hervé Pagès
hpa...@fredhutch.org mailto:hpa...@fredhutch.org
mailto:hpa...@fredhutch.org mailto:hpa...@fredhutch.org wrote:

 On 04/24/2015 10:18 AM, Michael Lawrence wrote:

 It is a great idea, but I'm not sure I would use it to
implement
 table(). Allocating those strings will be costly. Don't we
 already have
 the 4-way int hash? Of course, my intuition might be
completely
 off here.


 It does use the 4-way int hash internally. as.character()
is only used
 at the very-end to stick the names on the returned table
object.

 H.



 On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès
 hpa...@fredhutch.org mailto:hpa...@fredhutch.org
mailto:hpa...@fredhutch.org mailto:hpa...@fredhutch.org
 mailto:hpa...@fredhutch.org
mailto:hpa...@fredhutch.org mailto:hpa...@fredhutch.org
mailto:hpa...@fredhutch.org wrote:

  Hi Pete,

  Excellent idea. That will make things like table()
work
 out-of-the-box
  on GenomicRanges objects. I'll add that.

  Thanks,
  H.



  On 04/24/2015 09:43 AM, Peter Haverty wrote:

  Would people be interested in having this:

  setMethod(as.character, GenomicRanges,
  function(x) {
  paste0(seqnames(x), :,
start(x), -,
 end(x))
  })

  ?

  I find myself doing that a lot to make unique
names or for
  output that
  goes to collaborators.  I suppose we might
want to tack
 on the
  strand if it
  isn't *.  I have some code for going the other
 direction too,
  if there is
  interest.



  Pete

  
  Peter M. Haverty, Ph.D.
  Genentech, Inc.
phave...@gene.com mailto:phave...@gene.com
mailto:phave...@gene.com mailto:phave...@gene.com
 mailto:phave...@gene.com mailto:phave...@gene.com
mailto:phave...@gene.com mailto:phave...@gene.com

   [[alternative HTML version deleted]]

  ___
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
mailto:Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
 mailto:Bioc-devel@r-project.org
mailto:Bioc-devel@r-project.org
mailto:Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
  mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


  --
  Hervé Pagès

  Program in Computational Biology
  Division of Public Health Sciences

Re: [Bioc-devel] Biobase: Imports repeats Depends

2015-04-27 Thread Hervé Pagès

Hi Henrik,

I know it's not necessary and I know that the WRE manual and
'R CMD check' don't like this but I like to list in Imports what
I import and to list in Depends what I want to see attached to
the search() path. For the same reason that I like to explicitly
export the generic functions that I define in my packages even
if exporting the methods defined on these generics is enough
(because it has the side effect to automatically export the
generic).

More generally it's about expressing intentions directly versus
expressing them in an indirect manner (e.g. by relying on side
effects). I think the latter is wrong. Hope that makes sense.

Cheers,
H.


On 04/25/2015 01:12 PM, Henrik Bengtsson wrote:

It seems unnecessary that BiocGenerics have the same package under
Imports as under Depends.  The former can be dropped.


packageDescription(BiocGenerics)

Package: BiocGenerics
Title: S4 generic functions for Bioconductor
Description: S4 generic functions needed by many Bioconductor packages.
Version: 0.14.0
Author: The Bioconductor Dev Team
Maintainer: Bioconductor Package Maintainer
 maintai...@bioconductor.org
biocViews: Infrastructure
Depends: methods, utils, graphics, stats, parallel
Imports: methods, utils, graphics, stats, parallel
Suggests: Biobase, S4Vectors, IRanges, GenomicRanges, AnnotationDbi,
 oligoClasses, oligo, affyPLM, flowClust, affy, DESeq2, MSnbase,
 annotate, RUnit
License: Artistic-2.0
Collate: S3-classes-as-S4-classes.R normarg-utils.R update-utils.R
 .
NeedsCompilation: no
Packaged: 2015-04-17 03:42:27 UTC; biocbuild
Built: R 3.3.0; ; 2015-04-25 20:08:25 UTC; windows

/Henrik

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-27 Thread Hervé Pagès

On 04/24/2015 11:41 AM, Michael Lawrence wrote:

Taking this a bit off topic but it would be nice if we could get the
GRanges equivalent of as.data.frame(table(x)), i.e., unique(x) with a
count mcol. Should be easy to support but what should the API be like?


This was actually the motivating use case for introducing
findMatches/countMatches a couple of years ago:

  ux - unique(x)
  mcols(ux)$Freq - countMatches(ux, x)

Don't know what a good API would be to make this even more
straightforward though. Maybe via some extra argument to unique()
e.g. 'with.freq'? This is kind of similar to the 'with.revmap'
argument of reduce(). Note that unique() could also support the
'with.revmap' arg. Once it does, the 'with.freq' arg can also
be implemented by just calling elementLengths() on the revmap
metadata column.

H.



On Fri, Apr 24, 2015 at 10:54 AM, Hervé Pagès hpa...@fredhutch.org
mailto:hpa...@fredhutch.org wrote:

On 04/24/2015 10:18 AM, Michael Lawrence wrote:

It is a great idea, but I'm not sure I would use it to implement
table(). Allocating those strings will be costly. Don't we
already have
the 4-way int hash? Of course, my intuition might be completely
off here.


It does use the 4-way int hash internally. as.character() is only used
at the very-end to stick the names on the returned table object.

H.



On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès
hpa...@fredhutch.org mailto:hpa...@fredhutch.org
mailto:hpa...@fredhutch.org mailto:hpa...@fredhutch.org wrote:

 Hi Pete,

 Excellent idea. That will make things like table() work
out-of-the-box
 on GenomicRanges objects. I'll add that.

 Thanks,
 H.



 On 04/24/2015 09:43 AM, Peter Haverty wrote:

 Would people be interested in having this:

 setMethod(as.character, GenomicRanges,
 function(x) {
 paste0(seqnames(x), :, start(x), -,
end(x))
 })

 ?

 I find myself doing that a lot to make unique names or for
 output that
 goes to collaborators.  I suppose we might want to tack
on the
 strand if it
 isn't *.  I have some code for going the other
direction too,
 if there is
 interest.



 Pete

 
 Peter M. Haverty, Ph.D.
 Genentech, Inc.
phave...@gene.com mailto:phave...@gene.com
mailto:phave...@gene.com mailto:phave...@gene.com

  [[alternative HTML version deleted]]

 ___
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
mailto:Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
 mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


 --
 Hervé Pagès

 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M1-B514
 P.O. Box 19024
 Seattle, WA 98109-1024

 E-mail: hpa...@fredhutch.org mailto:hpa...@fredhutch.org
mailto:hpa...@fredhutch.org mailto:hpa...@fredhutch.org
 Phone: (206) 667-5791 tel:%28206%29%20667-5791
tel:%28206%29%20667-5791
 Fax: (206) 667-1319 tel:%28206%29%20667-1319
tel:%28206%29%20667-1319


 ___
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
mailto:Bioc-devel@r-project.org
mailto:Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org mailto:hpa...@fredhutch.org
Phone: (206) 667-5791 tel:%28206%29%20667-5791
Fax: (206) 667-1319 tel:%28206%29%20667-1319




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Hervé Pagès

On 04/24/2015 11:08 AM, Peter Haverty wrote:

Good catch. We'll want the strand in case we need to go back to a GRanges.
I would make the strand addition optional with the default of FALSE.
It's nice to have a column of strings you can paste right into a genome
browser (sorry Michael :-) ).  I often pass my bench collaborators a
spreadsheet with such a column.


as.character(unstrand(gr)) ?

3 reasons I'm not too keen about 'ignore.strand=TRUE' being the default:

(1) Many functions and methods in GenomicRanges/GenomicAlignments
have an 'ignore.strand' argument. For consistency, the default
value has been set to FALSE everywhere. Note that this was done
even if this default doesn't reflect the most common use case
(e.g. summarizeOverlaps).

(2) I think it's good to have the default behavior of as.character()
allow going back and forth between GRanges and character vector
without losing the strand information.

(3) The table method for Vector would break if as.character was
ignoring the strand by default. Can be worked-around by
implementing a method for GenomicRanges objects but...

Hope that makes sense.

H.




Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com mailto:phave...@gene.com

On Fri, Apr 24, 2015 at 10:50 AM, Hervé Pagès hpa...@fredhutch.org
mailto:hpa...@fredhutch.org wrote:

On 04/24/2015 10:21 AM, Michael Lawrence wrote:

Sorry, one more concern, if you're thinking of using as a range
key, you
will need the strand, but many use cases might not want the
strand on
there. Like for pasting into a genome browser.


What about appending the strand only for GRanges objects that
have at least one range that is not on *?

setMethod(as.character, GenomicRanges,
 function(x)
 {
 if (length(x) == 0L)
 return(character(0))
 ans - paste0(seqnames(x), :, start(x), -, end(x))
 if (any(strand(x) != *))
   ans - paste0(ans, :, strand(x))
 ans
 }
)

  as.character(gr)
  [1] chr1:1-10  chr2:2-10  chr2:3-10  chr2:4-10  chr1:5-10
  [6] chr1:6-10  chr3:7-10  chr3:8-10  chr3:9-10  chr3:10-10

  strand(gr)[2:3] - c(-, +)
  as.character(gr)
  [1] chr1:1-10:*  chr2:2-10:-  chr2:3-10:+  chr2:4-10:*
chr1:5-10:*
  [6] chr1:6-10:*  chr3:7-10:*  chr3:8-10:*  chr3:9-10:*
chr3:10-10:*

H.


On Fri, Apr 24, 2015 at 10:18 AM, Michael Lawrence
micha...@gene.com mailto:micha...@gene.com
mailto:micha...@gene.com mailto:micha...@gene.com wrote:

 It is a great idea, but I'm not sure I would use it to
implement
 table(). Allocating those strings will be costly. Don't we
already
 have the 4-way int hash? Of course, my intuition might be
completely
 off here.


 On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès
hpa...@fredhutch.org mailto:hpa...@fredhutch.org
 mailto:hpa...@fredhutch.org
mailto:hpa...@fredhutch.org wrote:

 Hi Pete,

 Excellent idea. That will make things like table() work
 out-of-the-box
 on GenomicRanges objects. I'll add that.

 Thanks,
 H.



 On 04/24/2015 09:43 AM, Peter Haverty wrote:

 Would people be interested in having this:

 setMethod(as.character, GenomicRanges,
 function(x) {
 paste0(seqnames(x), :, start(x),
-, end(x))
 })

 ?

 I find myself doing that a lot to make unique names
or for
 output that
 goes to collaborators.  I suppose we might want to
tack on
 the strand if it
 isn't *.  I have some code for going the other
direction
 too, if there is
 interest.



 Pete

 
 Peter M. Haverty, Ph.D.
 Genentech, Inc.
phave...@gene.com mailto:phave...@gene.com
mailto:phave...@gene.com mailto:phave...@gene.com

  [[alternative HTML version deleted]]

 ___
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
mailto:Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
 mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


 --
 Hervé Pagès

 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer

Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Hervé Pagès

On 04/24/2015 10:18 AM, Michael Lawrence wrote:

It is a great idea, but I'm not sure I would use it to implement
table(). Allocating those strings will be costly. Don't we already have
the 4-way int hash? Of course, my intuition might be completely off here.


It does use the 4-way int hash internally. as.character() is only used
at the very-end to stick the names on the returned table object.

H.




On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès hpa...@fredhutch.org
mailto:hpa...@fredhutch.org wrote:

Hi Pete,

Excellent idea. That will make things like table() work out-of-the-box
on GenomicRanges objects. I'll add that.

Thanks,
H.



On 04/24/2015 09:43 AM, Peter Haverty wrote:

Would people be interested in having this:

setMethod(as.character, GenomicRanges,
function(x) {
paste0(seqnames(x), :, start(x), -, end(x))
})

?

I find myself doing that a lot to make unique names or for
output that
goes to collaborators.  I suppose we might want to tack on the
strand if it
isn't *.  I have some code for going the other direction too,
if there is
interest.



Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com mailto:phave...@gene.com

 [[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org mailto:hpa...@fredhutch.org
Phone: (206) 667-5791 tel:%28206%29%20667-5791
Fax: (206) 667-1319 tel:%28206%29%20667-1319


___
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Error on Windows when selecting a single chromosome from a BSgenome

2015-04-21 Thread Hervé Pagès

Hi Bernat,

Sorry for the delay in getting back to you. Like Dan, I cannot
reproduce this either. Without more input from you (e.g. sessionInfo()),
there is not much we can do.

Note that an easy way to get the coordinates of the masked regions in
a data.frame is with something like this:

  path - system.file(extdata, chr1.masks.rda,
package=BSgenome.Hsapiens.UCSC.hg19.masked)
  load(path)
  as.data.frame(chr1.masks[[AGAPS]])

This is for the assembly gaps on chr1. You can do the same for any
of the 3 other types of masks (intra-contig ambiguities, RepeatMasker,
Tandem Repeats Finder) on any other chromosome/scaffold.

Cheers,
H.


On 03/20/2015 02:50 PM, Dan Tenenbaum wrote:



- Original Message -

From: Bernat Gel b...@imppc.org
To: bioc-devel@r-project.org
Sent: Friday, March 20, 2015 8:06:59 AM
Subject: [Bioc-devel] Error on Windows when selecting a single chromosome   
from a BSgenome

Hi all,

We are trying to get our first package (regioneR) included into
Bioconductor. A couple of months ago it passed all the automatic
checks
but now, without code changes from our part fails on windows.

I've been digging and it seems the problem arises when selecting a
single chromosome from a BSgenome object.

library(BSgenome.Hsapiens.UCSC.hg19.masked)
BSgenome.Hsapiens.UCSC.hg19.masked[[chr1]]

This works in linux and mac but in windows it kills R without any
error
message. Any idea of what's going on?

Our final aim when doing that is to have a data.frame with the
genomic
coordinates of the masked regions, so if there's an alternative
approach
we could use it as a workaround.



I can't reproduce this. Can you send the output of sessionInfo() (run after 
loading BSgenome.Hsapiens.UCSC.hg19.masked but before the line that causes the 
crash)?

Dan



Thanks a lot!

Bernat

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] coverage(gr, weight='score') does not work when score(gr) is an Rle

2015-04-20 Thread Hervé Pagès

Hi,

On 04/17/2015 09:42 AM, Hervé Pagès wrote:

Hi,

I think we should just expand the Rle internally. That will produce
a numeric vector of the length of the GRanges i.e. it will be the
same size as the start and end components of the GRanges object itself.
No big deal at all.

I'll make that change.


Done in GenomicRanges 1.20.2 (release) and 1.21.1 (devel).

Cheers,
H.



H.

On 04/17/2015 09:00 AM, Michael Lawrence wrote:

Ideally it should be supported, but it would take some work as the
coverage
stuff is all in C. Could you give more details on your use case? For
example, if you already have a range for every position on the
chromosome,
you could just extract the score column. I'm guessing it's more
complicated
than that. If the zeros are the problem, you could just subset the
GRanges
to remove the ranges with zero score, and then coerce the score to
numeric
before calling coverage.

Michael

2015-04-17 8:00 GMT-07:00 Philip Lijnzaad p.lijnz...@umcutrecht.nl:


Dear all,  I'm puzzled by the following behaviour:

Given

 n - 10
 gr - GRanges(seqnames=Rle('A', n),
   ranges=IRanges(1:n, width=1),
   score=Rle(5,n))
If I do

 coverage(gr,weight='score')

I get

 Error in .normarg_shift_or_weight(weight, weight, x) :
   'weight' must be a numeric vector, a single string, or a
list-like
object

Surely 'score' should be allowed to be an Rle? Especially given the
fact that the return value of coverage(x,weight=score) when score is
plain numeric vector is always an Rle ! Is this the expected behaviour?
If so, I would argue that violates the principle of least suprise :-)

The background to this is that I do numerical analysis on derived
numerical data along my chromosomes. It contains many
contiguous zeroes so it would be wasteful to cast
everything down using as.numeric().

This is R version 3.01 on x86_64 Linux, Bioconductor version 2.13,


package.version(IRanges)

[1] 1.20.7

package.version(GenomicRanges)

[1] 1.14.4

Regards,


Philip


--
Philip Lijnzaad, PhD
Molecular Cancer Research
University Medical Center (UMC), Utrecht
Stratenum room 2.211
IM: plijnz...@jabber.org , philip.lijnz...@gmail.com
P.O. Box 85060, 3508 AB Utrecht
(Universiteitsweg 100, 3584 CG Utrecht)
The Netherlands
tel: +31 (0)8875 68464


--


De informatie opgenomen in dit bericht kan vertrouwelijk zijn en is
uitsluitend bestemd voor de geadresseerde. Indien u dit bericht
onterecht
ontvangt, wordt u verzocht de inhoud niet te gebruiken en de afzender
direct
te informeren door het bericht te retourneren. Het Universitair Medisch
Centrum Utrecht is een publiekrechtelijke rechtspersoon in de zin van de
W.H.W.
(Wet Hoger Onderwijs en Wetenschappelijk Onderzoek) en staat
geregistreerd
bij
de Kamer van Koophandel voor Midden-Nederland onder nr. 30244197.

Denk s.v.p aan het milieu voor u deze e-mail afdrukt.


--


This message may contain confidential information and ...{{dropped:10}}


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel





--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] coverage(gr, weight='score') does not work when score(gr) is an Rle

2015-04-17 Thread Hervé Pagès

On 04/17/2015 10:00 AM, Michael Lawrence wrote:

Is that the case here? He has an Rle as an mcol in the GRanges, so in
general expanding it will not align with the other components.


Not sure what you mean. Can you give an example?

H.



On Fri, Apr 17, 2015 at 9:42 AM, Hervé Pagès hpa...@fredhutch.org
mailto:hpa...@fredhutch.org wrote:

Hi,

I think we should just expand the Rle internally. That will produce
a numeric vector of the length of the GRanges i.e. it will be the
same size as the start and end components of the GRanges object itself.
No big deal at all.

I'll make that change.

H.


On 04/17/2015 09:00 AM, Michael Lawrence wrote:

Ideally it should be supported, but it would take some work as
the coverage
stuff is all in C. Could you give more details on your use case? For
example, if you already have a range for every position on the
chromosome,
you could just extract the score column. I'm guessing it's more
complicated
than that. If the zeros are the problem, you could just subset
the GRanges
to remove the ranges with zero score, and then coerce the score
to numeric
before calling coverage.

Michael

2015-04-17 8:00 GMT-07:00 Philip Lijnzaad
p.lijnz...@umcutrecht.nl mailto:p.lijnz...@umcutrecht.nl:

Dear all,  I'm puzzled by the following behaviour:

Given

  n - 10
  gr - GRanges(seqnames=Rle('A', n),
ranges=IRanges(1:n, width=1),
score=Rle(5,n))
If I do

  coverage(gr,weight='score')

I get

  Error in .normarg_shift_or_weight(weight, weight, x) :
'weight' must be a numeric vector, a single string,
or a list-like
object

Surely 'score' should be allowed to be an Rle? Especially
given the
fact that the return value of coverage(x,weight=score)
when score is
plain numeric vector is always an Rle ! Is this the expected
behaviour?
If so, I would argue that violates the principle of least
suprise :-)

The background to this is that I do numerical analysis on
derived
numerical data along my chromosomes. It contains many
contiguous zeroes so it would be wasteful to cast
everything down using as.numeric().

This is R version 3.01 on x86_64 Linux, Bioconductor version
2.13,

package.version(IRanges)

[1] 1.20.7

package.version(GenomicRanges)

[1] 1.14.4

Regards,


Philip


--
Philip Lijnzaad, PhD
Molecular Cancer Research
University Medical Center (UMC), Utrecht
Stratenum room 2.211
IM: plijnz...@jabber.org mailto:plijnz...@jabber.org ,
philip.lijnz...@gmail.com mailto:philip.lijnz...@gmail.com
P.O. Box 85060, 3508 AB Utrecht
(Universiteitsweg 100, 3584 CG Utrecht)
The Netherlands
tel: +31 (0)8875 68464 tel:%2B31%20%280%298875%2068464



--

De informatie opgenomen in dit bericht kan vertrouwelijk
zijn en is
uitsluitend bestemd voor de geadresseerde. Indien u dit
bericht onterecht
ontvangt, wordt u verzocht de inhoud niet te gebruiken en de
afzender
direct
te informeren door het bericht te retourneren. Het
Universitair Medisch
Centrum Utrecht is een publiekrechtelijke rechtspersoon in
de zin van de
W.H.W.
(Wet Hoger Onderwijs en Wetenschappelijk Onderzoek) en staat
geregistreerd
bij
de Kamer van Koophandel voor Midden-Nederland onder nr.
30244197.

Denk s.v.p aan het milieu voor u deze e-mail afdrukt.



--

This message may contain confidential information and
...{{dropped:10}}


___
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org mailto:hpa...@fredhutch.org
Phone: (206) 667-5791 tel:%28206%29%20667-5791
Fax: (206) 667-1319 tel

Re: [Rd] behavior of as.integer(5000000000)

2015-04-17 Thread Hervé Pagès



On 04/17/2015 08:24 AM, Martin Maechler wrote:

Martin Maechler maech...@lynne.stat.math.ethz.ch
 on Fri, 17 Apr 2015 15:49:35 +0200 writes:



Hervé Pagès hpa...@fredhutch.org
 on Mon, 13 Apr 2015 23:36:14 -0700 writes:


  On 04/13/2015 11:32 PM, Martin Maechler wrote:
 
  Hi,
   as.integer(50)
  [1] 2147483647
  Warning message:
  inaccurate integer conversion in coercion
 
   as.integer(-50)
  [1] NA
  Warning message:
  inaccurate integer conversion in coercion
 
  Is this a bug or a feature? The man page suggests it's the
  latter:
 
  I think you mean the former, a bug.
 
  and I agree entirely, see the following   2 x 2  comparison :
 
   N - 5 * 8^-(0:7)
   as.integer(N)
  [1] NA NA NA NA 1220703125  152587890  
 190734862384185
  Warning message:
  NAs introduced by coercion
   as.integer(-N)
  [1]  NA  NA  NA  NA -1220703125  
-152587890   -19073486
  [8]-2384185
  Warning message:
  NAs introduced by coercion
   as.integer(as.character(N))
  [1] 2147483647 2147483647 2147483647 2147483647 1220703125  152587890  
 190734862384185
  Warning message:
  inaccurate integer conversion in coercion
   as.integer(as.character(-N))
  [1]  NA  NA  NA  NA -1220703125  
-152587890   -19073486
  [8]-2384185
  Warning message:
  inaccurate integer conversion in coercion
 
 
 
  ‘as.integer’ attempts to coerce its argument to be of integer
  type.  The answer will be ‘NA’ unless the coercion succeeds.
 
  even though someone could always argue that coercion of 50
  succeeded (for some definition of succeed).
 
  Also is there any reason why the warning message is different than
  with:
 
   as.integer(-50)
  [1] NA
  Warning message:
  NAs introduced by coercion
 
  In the case of as.integer(-50), it's not really that the
  conversion was inaccurate, it's a little bit worse than that. And
  knowing that NAs where introduced by coercion is important.
 
  Yes.
  The message is less a problem than the bug, but I agree we
  should try to improve it.

  Sounds good. Thanks Martin,

  I've committed a change to R-devel now, such that also this case
  returns NA with a warning, actually for the moment with both the
  old warning and the   'NAs introduced by coercion' warning.
  The nice thing about the old warning is that it explicitly
  mentions integer coercion.

  I currently think we should keep that property, and I'd propose
  to completely drop the
  inaccurate integer conversion in coercion
  warning (it is not used anywhere else currently) and replace it
  in this and other as.integer(.) cases with

  'NAs introduced by integer coercion'

  (or something similar. ... improvements / proposals are welcome).

Replying to myself:

I've found

  'NAs introduced by coercion to integer range'


I like that we see coercion *to* integer instead of just
integer coercion because the former indicates the direction
of the coercion. I'm not that convinced with the range thing
though. I think

  as.integer(c(78, a34, -50))

should emit only one warning and not try to categorize the
reasons for getting an NA.

Thanks,
H.



to be even more on spot, and so will commit it for today.
Of course, amendment proposals are still welcome.

Martin



  BTW, the fact that as.integer(-50) did produce an NA
  instead of -2147483647 so it would have been compatible with 
as.integer(50)
  was just another coincidence, namely that we currently code NA_integer_
  by INT_MIN (for 32 bit integers, INT_MIN = 2147483648 = 2^31)
  [[but your C code must not rely on that, it is an implementation 
detail!]]

  Martin



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] coverage(gr, weight='score') does not work when score(gr) is an Rle

2015-04-17 Thread Hervé Pagès

Hi,

I think we should just expand the Rle internally. That will produce
a numeric vector of the length of the GRanges i.e. it will be the
same size as the start and end components of the GRanges object itself.
No big deal at all.

I'll make that change.

H.

On 04/17/2015 09:00 AM, Michael Lawrence wrote:

Ideally it should be supported, but it would take some work as the coverage
stuff is all in C. Could you give more details on your use case? For
example, if you already have a range for every position on the chromosome,
you could just extract the score column. I'm guessing it's more complicated
than that. If the zeros are the problem, you could just subset the GRanges
to remove the ranges with zero score, and then coerce the score to numeric
before calling coverage.

Michael

2015-04-17 8:00 GMT-07:00 Philip Lijnzaad p.lijnz...@umcutrecht.nl:


Dear all,  I'm puzzled by the following behaviour:

Given

 n - 10
 gr - GRanges(seqnames=Rle('A', n),
   ranges=IRanges(1:n, width=1),
   score=Rle(5,n))
If I do

 coverage(gr,weight='score')

I get

 Error in .normarg_shift_or_weight(weight, weight, x) :
   'weight' must be a numeric vector, a single string, or a list-like
object

Surely 'score' should be allowed to be an Rle? Especially given the
fact that the return value of coverage(x,weight=score) when score is
plain numeric vector is always an Rle ! Is this the expected behaviour?
If so, I would argue that violates the principle of least suprise :-)

The background to this is that I do numerical analysis on derived
numerical data along my chromosomes. It contains many
contiguous zeroes so it would be wasteful to cast
everything down using as.numeric().

This is R version 3.01 on x86_64 Linux, Bioconductor version 2.13,


package.version(IRanges)

[1] 1.20.7

package.version(GenomicRanges)

[1] 1.14.4

Regards,


Philip


--
Philip Lijnzaad, PhD
Molecular Cancer Research
University Medical Center (UMC), Utrecht
Stratenum room 2.211
IM: plijnz...@jabber.org , philip.lijnz...@gmail.com
P.O. Box 85060, 3508 AB Utrecht
(Universiteitsweg 100, 3584 CG Utrecht)
The Netherlands
tel: +31 (0)8875 68464


--

De informatie opgenomen in dit bericht kan vertrouwelijk zijn en is
uitsluitend bestemd voor de geadresseerde. Indien u dit bericht onterecht
ontvangt, wordt u verzocht de inhoud niet te gebruiken en de afzender
direct
te informeren door het bericht te retourneren. Het Universitair Medisch
Centrum Utrecht is een publiekrechtelijke rechtspersoon in de zin van de
W.H.W.
(Wet Hoger Onderwijs en Wetenschappelijk Onderzoek) en staat geregistreerd
bij
de Kamer van Koophandel voor Midden-Nederland onder nr. 30244197.

Denk s.v.p aan het milieu voor u deze e-mail afdrukt.


--

This message may contain confidential information and ...{{dropped:10}}


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Rd] behavior of as.integer(5000000000)

2015-04-17 Thread Hervé Pagès

Hi Martin,

On 04/17/2015 06:49 AM, Martin Maechler wrote:

Hervé Pagès hpa...@fredhutch.org
 on Mon, 13 Apr 2015 23:36:14 -0700 writes:


  On 04/13/2015 11:32 PM, Martin Maechler wrote:
 
  Hi,
   as.integer(50)
  [1] 2147483647
  Warning message:
  inaccurate integer conversion in coercion
 
   as.integer(-50)
  [1] NA
  Warning message:
  inaccurate integer conversion in coercion
 
  Is this a bug or a feature? The man page suggests it's the
  latter:
 
  I think you mean the former, a bug.
 
  and I agree entirely, see the following   2 x 2  comparison :
 
   N - 5 * 8^-(0:7)
   as.integer(N)
  [1] NA NA NA NA 1220703125  152587890   
190734862384185
  Warning message:
  NAs introduced by coercion
   as.integer(-N)
  [1]  NA  NA  NA  NA -1220703125  
-152587890   -19073486
  [8]-2384185
  Warning message:
  NAs introduced by coercion
   as.integer(as.character(N))
  [1] 2147483647 2147483647 2147483647 2147483647 1220703125  152587890   
190734862384185
  Warning message:
  inaccurate integer conversion in coercion
   as.integer(as.character(-N))
  [1]  NA  NA  NA  NA -1220703125  
-152587890   -19073486
  [8]-2384185
  Warning message:
  inaccurate integer conversion in coercion
 
 
 
  ‘as.integer’ attempts to coerce its argument to be of integer
  type.  The answer will be ‘NA’ unless the coercion succeeds.
 
  even though someone could always argue that coercion of 50
  succeeded (for some definition of succeed).
 
  Also is there any reason why the warning message is different than
  with:
 
   as.integer(-50)
  [1] NA
  Warning message:
  NAs introduced by coercion
 
  In the case of as.integer(-50), it's not really that the
  conversion was inaccurate, it's a little bit worse than that. And
  knowing that NAs where introduced by coercion is important.
 
  Yes.
  The message is less a problem than the bug, but I agree we
  should try to improve it.

  Sounds good. Thanks Martin,

I've committed a change to R-devel now, such that also this case
returns NA with a warning, actually for the moment with both the
old warning and the   'NAs introduced by coercion' warning.
The nice thing about the old warning is that it explicitly
mentions integer coercion.

I currently think we should keep that property, and I'd propose
to completely drop the
inaccurate integer conversion in coercion
warning (it is not used anywhere else currently) and replace it
in this and other as.integer(.) cases with

   'NAs introduced by integer coercion'

(or something similar. ... improvements / proposals are welcome).


Thanks. That's a much better warning message.



BTW, the fact that as.integer(-50) did produce an NA
instead of -2147483647 so it would have been compatible with 
as.integer(50)
was just another coincidence, namely that we currently code NA_integer_
by INT_MIN (for 32 bit integers, INT_MIN = 2147483648 = 2^31)
[[but your C code must not rely on that, it is an implementation detail!]]


Yeah, I suspected that.

Thanks for the fix.

H.



Martin



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] truncated warning messages

2015-04-17 Thread Hervé Pagès

Hi,

I was installing hundreds of packages on a machine with a single call to
install.packages() and after a long time the call to install.packages()
finally returned with the following warnings and errors:

Warning messages:
1: packages ‘hgu133aprobe’, ‘hgu95av2.db’, ‘BSgenome.Celegans.UCSC.ce2’, 
‘BSgenome.Mmusculus.UCSC.mm10’, 
‘BSgenome.Dmelanogaster.UCSC.dm3.masked’, 
‘BSgenome.Hsapiens.UCSC.hg18.masked’, 
‘BSgenome.Hsapiens.UCSC.hg19.masked’, 
‘BSgenome.Mmusculus.UCSC.mm9.masked’, 
‘BSgenome.Scerevisiae.UCSC.sacCer2’, ‘BSgenome.Hsapiens.NCBI.GRCh38’, 
‘BSgenome.Rnorvegicus.UCSC.rn5’, ‘drosophila2probe’, ‘hgu95av2probe’, 
‘hgu95av2cdf’, ‘SNPlocs.Hsapiens.dbSNP.20100427’, 
‘SNPlocs.Hsapiens.dbSNP.20101109’, ‘SNPlocs.Hsapiens.dbSNP.20110815’, 
‘SNPlocs.Hsapiens.dbSNP141.GRCh38’, 
‘XtraSNPlocs.Hsapiens.dbSNP141.GRCh38’, ‘org.Sc.sgd.db’, ‘org.Mm.eg.db’, 
‘org.At.tair.db’, ‘hom.Hs.inp.db’, ‘GO.db’, 
‘TxDb.Hsapiens.UCSC.hg18.knownGene’, 
‘TxDb.Hsapiens.UCSC.hg19.knownGene’, 
‘TxDb.Hsapiens.UCSC.hg38.knownGene’, 
‘TxDb.Mmusculus.UCSC.mm9.knownGene’, 
‘TxDb.Mmusculus.UCSC.mm10.knownGene’, ‘TxDb.Dmelanogaster.UCSC. [... 
truncated]

2: In install.packages(pkgs = doing, lib = lib, ...) :
  installation of package ‘humanStemCell’ had non-zero exit status
3: In install.packages(pkgs = doing, lib = lib, ...) :
  installation of package ‘AnnotationForge’ had non-zero exit status

As you can see the warning message is truncated so I can't see what
happened to these packages. What about having a message like

  package(s) not available: pkg1, pkg2, pkg3, etc...

so when it's truncated we don't loose the important part of the story?

I guess the same kind of change could be made to the other warning
messages, and, more generally, to any warning/error message obtained
by injecting an arbitrary (and potentially long) list of strings.

Thanks,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] behavior of as.integer(5000000000)

2015-04-14 Thread Hervé Pagès

On 04/13/2015 11:32 PM, Martin Maechler wrote:



Hi,
 as.integer(50)
[1] 2147483647
Warning message:
inaccurate integer conversion in coercion



 as.integer(-50)
[1] NA
Warning message:
inaccurate integer conversion in coercion



Is this a bug or a feature? The man page suggests it's the
latter:


I think you mean the former, a bug.

and I agree entirely, see the following   2 x 2  comparison :

N - 5 * 8^-(0:7)
as.integer(N)
   [1] NA NA NA NA 1220703125  152587890   
190734862384185
   Warning message:
   NAs introduced by coercion
as.integer(-N)
   [1]  NA  NA  NA  NA -1220703125  -152587890  
 -19073486
   [8]-2384185
   Warning message:
   NAs introduced by coercion
as.integer(as.character(N))
   [1] 2147483647 2147483647 2147483647 2147483647 1220703125  152587890   
190734862384185
   Warning message:
   inaccurate integer conversion in coercion
as.integer(as.character(-N))
   [1]  NA  NA  NA  NA -1220703125  -152587890  
 -19073486
   [8]-2384185
   Warning message:
   inaccurate integer conversion in coercion




‘as.integer’ attempts to coerce its argument to be of integer
type.  The answer will be ‘NA’ unless the coercion succeeds.



even though someone could always argue that coercion of 50
succeeded (for some definition of succeed).



Also is there any reason why the warning message is different than
with:



 as.integer(-50)
[1] NA
Warning message:
NAs introduced by coercion



In the case of as.integer(-50), it's not really that the
conversion was inaccurate, it's a little bit worse than that. And
knowing that NAs where introduced by coercion is important.


Yes.
The message is less a problem than the bug, but I agree we
should try to improve it.


Sounds good. Thanks Martin,

H.



Martin



--
Hervé Pagès
...


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] behavior of as.integer(5000000000)

2015-04-13 Thread Hervé Pagès

Hi,

   as.integer(50)
  [1] 2147483647
  Warning message:
  inaccurate integer conversion in coercion

   as.integer(-50)
  [1] NA
  Warning message:
  inaccurate integer conversion in coercion

Is this a bug or a feature? The man page suggests it's the latter:

  ‘as.integer’ attempts to coerce its argument to be of integer
  type.  The answer will be ‘NA’ unless the coercion succeeds.

even though someone could always argue that coercion of 50
succeeded (for some definition of succeed).

Also is there any reason why the warning message is different than
with:

   as.integer(-50)
  [1] NA
  Warning message:
  NAs introduced by coercion

In the case of as.integer(-50), it's not really that the
conversion was inaccurate, it's a little bit worse than that. And
knowing that NAs where introduced by coercion is important.

Thanks,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] Suggestion: export function fetchChromLengthsFromEnsembl in GenomicFeatures

2015-04-12 Thread Hervé Pagès

Hi jo,

getChromInfoFromBiomart() is a higher level function that is
exported, documented, and intended to be used by the user for
that, rather than internal helpers fetchChromLengthsFromEnsembl()
and fetchChromLengthsFromEnsemblPlants().

However, after some discussions here with Marc and Sonali, my
understanding is that the GRanges objects made from the Ensembl
GTF files are going to have their seqinfo updated soon so you
won't need that.

H.

On 04/11/2015 12:37 PM, Rainer Johannes wrote:

I wanted to ask whether it would be possible to export the functions 
fetchChromLengthsFromEnsembl and fetchChromLengthsFromEnsemblPlants in 
GenomicFeatures, as I find these methods quite useful to retrieve chromosome lengths.

best, jo

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] New package: coMET

2015-04-08 Thread Hervé Pagès

Hi Tiphaine,

As you're probably already aware, a change in the latest release
of Ensembl breaks its vignette:


http://bioconductor.org/checkResults/3.1/bioc-LATEST/coMET/zin2-buildsrc.html

On D-day, only packages that pass build/check will be made available
to BioC 3.1 users via biocLite(). So please fix coMET asap so it can
make it into BioC 3.1.

Thanks,
H.


On 03/23/2015 12:23 PM, Martin, Tiphaine wrote:

Hi all,

I'm pleased to announce the availability of the coMET package 
(http://bioconductor.org/packages/devel/bioc/html/coMET.html , 
https://github.com/TiphaineCMartin/coMET), which visualises regional 
epigenome-wide association scan (EWAS) results and DNA co-methylation patterns. 
The R package is designed for human epigenetic data, but can also be applied to 
genomic and functional genomic datasets in any species such as gene expression.

coMET exists also in a Shiny-based web application 
(http://www.epigen.kcl.ac.uk/comet), which allows users to run a pre-formated 
version (currently only for human genome).
It is possible also to extract the correlations between omic features with 
their pvalues.

Please let me know of any issues you encounter.

Thanks,
Tiphaine Martin


Tiphaine Martin  I   PhD Student   I  TwinsUK

Department of Twin Research, Kings College London,

St Thomas' Hospital Campus, 3rd Floor South Wing Block D

Westminster Bridge Road, London SE1 7EH, UK



Email : tiphaine.mar...@kcl.ac.ukmailto:tiphaine.mar...@kcl.ac.uk

Direct: +44 (0) 207 188 1508

Fax: +44 (0) 207 188 6761


London Microbiome Meeting, 28th of May 2015: http://londonmicrobiome.org/

Mathematical and Quantitative Genomics conference, 29th of May: 
http://quantgen.soc.srcf.net/qg15/

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] issue about S4 slot has a dist object.

2015-04-01 Thread Hervé Pagès

On 04/01/2015 05:05 PM, Michael Lawrence wrote:

So this explains why I wasn't able to figure out how that package was
importing graph, and, yes, I also thought it was strange that graph did
not export it. The methods package explicitly conditions on the warn
level, so it is apparently intentional. It just looks in the global
class table for duplicates, so it does not pay attention to the
namespace. It's not clear to what extent the methods package assumes
that there are no duplicates in the class table; probably too much work
to fix.

As for sharing class definitions, perhaps the methods package should
define it. It already defines classes from the stats package, like
aov. We could start moving more stuff from BiocGenerics to methods.


Sounds good. The more upstream these class definitions are the better.

Just moved setOldClass(dist) from graph to BiocGenerics 0.13.11 and
exported the dist class.

H.





On Wed, Apr 1, 2015 at 4:29 PM, Martin Morgan mtmor...@fredhutch.org
mailto:mtmor...@fredhutch.org wrote:

On 04/01/2015 03:52 PM, Hervé Pagès wrote:

Hi,

In the same way that we avoid having 2 packages define the same
S4 generic function by moving the shared generic definitions to
BiocGenerics, it seems that we should also avoid having 2 packages
call setOldClass on the same S3 class. Like with S4 generic
functions,
we've already started to do this by putting some setOldClass
statements in BiocGenerics (e.g. we've done it for the 'connection'
classes 'file', 'url', 'gzfile', 'bzfile', etc..., see
class?gzfile).
So if nobody objects we'll do this for the 'dist' class too.

Then you won't need to use setOldClass in your cogena package
Zhilong.
You'll just need to make sure that you import BiocGenerics.


This sounds like an ok work-around to me.

For the hard-core...

One thing is that this is not seen when the package is loaded by
itself, e.g.,

  biocLite(zhilongjia/cogena)
  library(cogena)
 

only when loaded by BiocCheck (on the source directory)

  BiocCheck::BiocCheck(cogena)
* This is BiocCheck, version 1.3.13.
* BiocCheck is a work in progress. Output and severity of issues may
   change.
* Installing package...
Note: the specification for S3 class dist in package 'cogena'
seems equivalent to one from package 'graph': not turning on
duplicate class definitions for this class.
^C

This is because BiocCheck (indirectly?) Imports: graph. But the old
class definition seems to 'leak' (even though graph is not on the
search path, and the dist old class is not exported from graph, and
BiocCheck doesn't import the non-exported dist class, and cogena
doesn't Depend or Import graph!)

Also of interest perhaps is that the Note is only printed when
warn=1 (which BiocCheck also uses)

(new R session)

  requireNamespace(graph)
Loading required namespace: graph
  requireNamespace(cogena)
Loading required namespace: cogena
  q()

(new R session:)

  options(warn=1)
  requireNamespace(graph)
Loading required namespace: graph
  requireNamespace(cogena)
Loading required namespace: cogena

Note: the specification for S3 class dist in package 'cogena'
seems equivalent to one from package 'graph': not turning on
duplicate class definitions for this class.
 



Cheers,
H.


On 04/01/2015 03:28 PM, Michael Lawrence wrote:

Using setOldClass is generally fine. In this case, the graph
package is
already defining the dist class, so you could just import
that. The graph
package might have to export it though.

On Wed, Apr 1, 2015 at 3:15 PM, Zhilong Jia
zhilong...@gmail.com mailto:zhilong...@gmail.com wrote:

Hi,

Here is the package.
(https://tracker.bioconductor.__org/issue1204
https://tracker.bioconductor.org/issue1204 or
https://github.com/zhilongjia/__cogena
https://github.com/zhilongjia/cogena; ). When I
biocCheck it, there is a
note.

Note: the specification for S3 class “dist” in package
‘cogena’ seems
equivalent to one from package ‘graph’: not turning on
duplicate class
definitions for this class.


In the source code, there are two R files are related
with this issue,
cogena_class.R
and

https://github.com/__zhilongjia/cogena/blob/master/__R/cogena_class.R

https://github.com/zhilongjia/cogena/blob/master/R/cogena_class.R
dist_class.R

https://github.com/__zhilongjia/cogena/blob/master/__R

Re: [Bioc-devel] request to create BSgenome Bos_taurus_UMD3.1.1

2015-03-27 Thread Hervé Pagès

So I went ahead and did BSgenome.Btaurus.UCSC.bosTau8. It should
become available via biocLite() in the next couple of hours (in
BioC devel only).

H.

On 03/26/2015 12:54 AM, Hervé Pagès wrote:

Hi Byungkuk,

On 03/25/2015 10:17 PM, Byungkuk Min wrote:

Dear Dr.Pages,

I've been trying to build a BSgenome for Bos_taurus_UMD3.1.1 from NCBI
http://www.ncbi.nlm.nih.gov/assembly/GCF_03055.6/.
But I'm getting the same error message...


library(BSgenome)
forgeBSgenomeDataPkg(/PATH/TO/genome.fa)

Error: Line starting '1 ...' is malformed!

I don't have much knowledge of R, so creating a custom BSgenome seems out
of my league.

I would like to kindly request the bovine BSgenome package.


Do you really need *that* particular assembly (GCF_03055.6).
Otherwise, there are already some bovine BSgenome packages available:

library(BSgenome)
grep(Btaurus, available.genomes(), value=TRUE)
   [1] BSgenome.Btaurus.UCSC.bosTau3
   [2] BSgenome.Btaurus.UCSC.bosTau3.masked
   [3] BSgenome.Btaurus.UCSC.bosTau4
   [4] BSgenome.Btaurus.UCSC.bosTau4.masked
   [5] BSgenome.Btaurus.UCSC.bosTau6
   [6] BSgenome.Btaurus.UCSC.bosTau6.masked

We don't have bosTau8 yet, which is the latest bovine assembly available
at UCSC (they added it in June 2014) but I could add it. Note that
despite its name (also Bos_taurus_UMD_3.1.1), bosTau8 is not the same
assembly as the one you picked up on NCBI. Yours is:

   http://www.ncbi.nlm.nih.gov/assembly/GCF_03055.6

(the latest, from 2014/11/25), but bosTau8 is:

   http://www.ncbi.nlm.nih.gov/assembly/GCF_03055.5

(much older, from 2009/12/01)

Anyway, if we ignore chrM and the thousands of scaffolds that are
included in these assemblies, the sequences of the main chromosomes
(i.e. chr1 to chr29 + chrX) are exactly the same in the 2 assemblies.

So maybe the BSgenome package for bosTau8 will do?

Note that the main advantage of making a BSgenome package for a
UCSC assembly (instead of using an NCBI assembly) is that the BSgenome
object then interoperates nicely with the TxDb object that one can
easily make from one of the UCSC tracks available for that assembly
(using the makeTxDbFromUCSC() function from the GenomicFeatures
package).

H.



Thank you!




*Byungkuk Min*
Ph.D Candidate
University of Science  Technology, http://ust.ac.kr
Korea Research Institute of Bioscience and Biotechnology
http://www.kribb.re.kr
125 Gwahangno, Yuseong-Gu, Daejeon, Korea (305-806)
Laboratory : 042-860-4423
FAX : 042-860-4608
Cell phone : 010-5209-7377

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel





--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] zero-width ranges representing insertions

2015-03-17 Thread Hervé Pagès



On 03/16/2015 07:36 PM, Valerie Obenchain wrote:

On 03/16/2015 05:31 PM, Hervé Pagès wrote:

On 03/16/2015 04:06 PM, Michael Lawrence wrote:



On Mon, Mar 16, 2015 at 3:12 PM, Robert Castelo robert.cast...@upf.edu
mailto:robert.cast...@upf.edu wrote:

+1 IMO BioC could adopt the zero-width ranges representation for
insertions, adapting readVcf(), writeVcf(), XtraSNPlocs.*, etc., to
deal with each corresponding beast, be VCF, dbSNP or the like. Who
knows, VCF could also change their representation in the future and
it'll be a headache to update the affected packages if we decide to
keep using its insertion representation internally to store variant
ranges in BioC.


That would break just about every tool, so let's hope not. There's a
bunch of code on top of Bioc that currently depends on the current
representation. For example, zero width ranges do not overlap anything,
so they need special treatment to e.g. detect whether an insertion falls
within a gene. There are real benefits to keeping the representation of
indels consistent with the rest of the field (VCF). There was much
thought put into this.


Note that findOverlaps() now handles zero-width ranges.



predictCoding and locateVariants had to work around the fact that
findOverlaps didn't work on zero-length ranges. Now that we can handle
zero-length overlaps this code should be updated (yes, my fault for not
updating).

How predictCoding and locateVariants currently handle insertions can be
modified. The current behavior (bug) should not have no bearing on how
we want to represent indels in the VCF class in general.

I agree with Michael that much thought went into making the VCF class
consistent with the VCF specs. While information has been added to the
file format over the years I am not aware of any changes related to the
positional representation of the variant (ie, it has been stable) and
think it unlikely this would change in the future.


Just to clarify, my point was that the ranges of the insertions as
currently reported in VCF and VRanges objects cannot be used as-is
in findOverlaps() to detect whether an insertion falls within a gene
or where is falls exactly within a gene. So these ranges need special
treatment like zero-width ranges do.

H.



Val




Straight use of findOverlaps() on the ranges of a VCF object leads to
some subtle problems on insertions. For example predictCoding() (which
I guess uses findOverlaps() internally) reports strange things for
these 2 insertions (1 right before and 1 right after the stop codon):

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb - TxDb.Hsapiens.UCSC.hg19.knownGene

  cdsBy(txdb, use.names=TRUE)$uc002wcw.3
GRanges object with 2 ranges and 3 metadata columns:
   seqnames ranges strand |cds_idcds_name exon_rank
  Rle  IRanges  Rle | integer character integer
   [1]chr20 [68351, 68408]  + |206100NA 1
   [2]chr20 [76646, 77058]  + |206101NA 2
   ---
   seqinfo: 93 sequences (1 circular) from hg19 genome


library(VariantAnnotation)
  rowRanges(vcf)  # hand-made VCF
GRanges object with 2 ranges and 5 metadata columns:
 seqnames ranges strand | paramRangeID
Rle  IRanges  Rle | factor
   ins before stop codonchr20 [77055, 77055]  * | NA
ins after stop codonchr20 [77058, 77058]  * | NA
REFALT  QUAL
FILTER
 DNAStringSet DNAStringSetList numeric
character
   ins before stop codon  T TG70
PASS
ins after stop codon  A AG70
PASS
   ---
   seqinfo: 1 sequence from hg19 genome

Calling predictCoding():

  library(BSgenome.Hsapiens.UCSC.hg19)
  predictCoding(vcf, txdb, Hsapiens)
GRanges object with 2 ranges and 17 metadata columns:
 seqnames ranges strand | paramRangeID
Rle  IRanges  Rle | factor
   ins before stop codonchr20 [77055, 77055]  + | NA
ins after stop codonchr20 [77058, 77058]  + | NA
REFALT  QUAL
FILTER
 DNAStringSet DNAStringSetList numeric
character
   ins before stop codon  T TG70
PASS
ins after stop codon  A AG70
PASS
  varAllele CDSLOCPROTEINLOC
QUERYID
 DNAStringSet  IRanges IntegerList
integer
   ins before stop codon TG [468, 468]
156 1
ins after stop codon AG [471, 471]
157 2
TXID CDSID  GENEID
CONSEQUENCE
 character IntegerList character
factor
   ins before stop codon   70477

Re: [Bioc-devel] zero-width ranges representing insertions

2015-03-16 Thread Hervé Pagès

On 03/16/2015 04:06 PM, Michael Lawrence wrote:



On Mon, Mar 16, 2015 at 3:12 PM, Robert Castelo robert.cast...@upf.edu
mailto:robert.cast...@upf.edu wrote:

+1 IMO BioC could adopt the zero-width ranges representation for
insertions, adapting readVcf(), writeVcf(), XtraSNPlocs.*, etc., to
deal with each corresponding beast, be VCF, dbSNP or the like. Who
knows, VCF could also change their representation in the future and
it'll be a headache to update the affected packages if we decide to
keep using its insertion representation internally to store variant
ranges in BioC.


That would break just about every tool, so let's hope not. There's a
bunch of code on top of Bioc that currently depends on the current
representation. For example, zero width ranges do not overlap anything,
so they need special treatment to e.g. detect whether an insertion falls
within a gene. There are real benefits to keeping the representation of
indels consistent with the rest of the field (VCF). There was much
thought put into this.


Note that findOverlaps() now handles zero-width ranges.

Straight use of findOverlaps() on the ranges of a VCF object leads to
some subtle problems on insertions. For example predictCoding() (which
I guess uses findOverlaps() internally) reports strange things for
these 2 insertions (1 right before and 1 right after the stop codon):

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb - TxDb.Hsapiens.UCSC.hg19.knownGene

 cdsBy(txdb, use.names=TRUE)$uc002wcw.3
GRanges object with 2 ranges and 3 metadata columns:
  seqnames ranges strand |cds_idcds_name exon_rank
 Rle  IRanges  Rle | integer character integer
  [1]chr20 [68351, 68408]  + |206100NA 1
  [2]chr20 [76646, 77058]  + |206101NA 2
  ---
  seqinfo: 93 sequences (1 circular) from hg19 genome


library(VariantAnnotation)
 rowRanges(vcf)  # hand-made VCF
GRanges object with 2 ranges and 5 metadata columns:
seqnames ranges strand | paramRangeID
   Rle  IRanges  Rle | factor
  ins before stop codonchr20 [77055, 77055]  * | NA
   ins after stop codonchr20 [77058, 77058]  * | NA
   REFALT  QUAL 
  FILTER
DNAStringSet DNAStringSetList numeric 
character
  ins before stop codon  T TG70 
PASS
   ins after stop codon  A AG70 
PASS

  ---
  seqinfo: 1 sequence from hg19 genome

Calling predictCoding():

 library(BSgenome.Hsapiens.UCSC.hg19)
 predictCoding(vcf, txdb, Hsapiens)
GRanges object with 2 ranges and 17 metadata columns:
seqnames ranges strand | paramRangeID
   Rle  IRanges  Rle | factor
  ins before stop codonchr20 [77055, 77055]  + | NA
   ins after stop codonchr20 [77058, 77058]  + | NA
   REFALT  QUAL 
  FILTER
DNAStringSet DNAStringSetList numeric 
character
  ins before stop codon  T TG70 
PASS
   ins after stop codon  A AG70 
PASS

 varAllele CDSLOCPROTEINLOC   QUERYID
DNAStringSet  IRanges IntegerList integer
  ins before stop codon TG [468, 468]   156 1
   ins after stop codon AG [471, 471]   157 2
   TXID CDSID  GENEID CONSEQUENCE
character IntegerList characterfactor
  ins before stop codon   70477245938  frameshift
   ins after stop codon   70477245938  frameshift
  REFCODON   VARCODON REFAA
DNAStringSet DNAStringSet AAStringSet
  ins before stop codonAATAAT
   ins after stop codonTAATAA
VARAA
AAStringSet
  ins before stop codon
   ins after stop codon
  ---
  seqinfo: 1 sequence from hg19 genome

PROTEINLOC, REFCODON, VARCODON, and CONSEQUENCE don't seem quite right
to me. Could be that my hand-made vcf is missing some important data
needeed by predictCoding() though...

H.




robert.

On 3/16/15 9:00 PM, Kasper Daniel Hansen wrote:
 There would be a LOT of value  in having core packages use exactly the
  same representation; I don't have any opinion about which one is
  best.
 
  Kasper
 
  On Mon, Mar 16, 2015 at 3:38 PM, Hervé Pagès
hpa...@fredhutch.org mailto:hpa...@fredhutch.org
  mailto:hpa...@fredhutch.org mailto:hpa...@fredhutch.org wrote:
 
  On 03/16/2015 09:22

Re: [Bioc-devel] zero-width ranges representing insertions

2015-03-16 Thread Hervé Pagès

Using a GAlignments object to show 3 different representations of the
same insertion (three nucleotides (TTT) inserted between positions 10
and 11):

  library(GenomicAlignments)
  gal - GAlignments(Rle(chr1, 3), pos=c(10L, 10L, 11L),
 cigar=c(1M3I1M, 1M3I, 3I),
 strand=Rle(strand(+), 3),
 ref=c(-, A, ),
 alt=c(TTT, ATTT, TTT),
 who=c(dbSNP/HGVS, VariantAnnotation/VCF,
   XtraSNPlocs/UCSC))

   gal
  GAlignments object with 3 alignments and 3 metadata columns:
seqnames strand   cigarqwidth start   end width
   Rle  Rle character integer integer integer integer
[1] chr1  +  1M3I1M 51011 2
[2] chr1  +1M3I 41010 1
[3] chr1  +  3I 31110 0
njunc | ref alt   who
integer | character character   character
[1] 0 |   - TTTdbSNP/HGVS
[2] 0 |   AATTT VariantAnnotation/VCF
[3] 0 | TTT  XtraSNPlocs/UCSC
---
seqinfo: 1 sequence from an unspecified genome; no seqlengths

In the context of a GAlignments object the alt allele can be seen
as the sequence of the query (SEQ field in BAM). So we expect the
number of nucleotides in the alt allele to match the qwidth:

   nchar(mcols(gal)$alt) == qwidth(gal)
  [1] FALSE  TRUE  TRUE

Not true for the dbSNP/HGVS representation.

The range of the insertion on the reference is:

   as(gal, GRanges)
  GRanges object with 3 ranges and 3 metadata columns:
seqnamesranges strand | ref alt 
   who
   Rle IRanges  Rle | character character 
character
[1] chr1  [10, 11]  + |   - TTT 
dbSNP/HGVS
[2] chr1  [10, 10]  + |   AATTT 
VariantAnnotation/VCF
[3] chr1  [11, 10]  + | TTT 
XtraSNPlocs/UCSC

---
seqinfo: 1 sequence from an unspecified genome; no seqlengths

We expect its width to be the same as the nb of nucleotides in the
ref alleles:

   width(gal) == nchar(mcols(gal)$ref)
  [1] FALSE  TRUE  TRUE

Again, not true for the dbSNP/HGVS representation.

H.


On 03/16/2015 03:12 PM, Robert Castelo wrote:

+1 IMO BioC could adopt the zero-width ranges representation for
insertions, adapting readVcf(), writeVcf(), XtraSNPlocs.*, etc., to deal
with each corresponding beast, be VCF, dbSNP or the like. Who knows, VCF
could also change their representation in the future and it'll be a
headache to update the affected packages if we decide to keep using its
insertion representation internally to store variant ranges in BioC.

robert.

On 3/16/15 9:00 PM, Kasper Daniel Hansen wrote:

There would be a LOT of value  in having core packages use exactly the

  same representation; I don't have any opinion about which one is
  best.
 
  Kasper
 
  On Mon, Mar 16, 2015 at 3:38 PM, Hervé Pagès hpa...@fredhutch.org
  mailto:hpa...@fredhutch.org wrote:
 
  On 03/16/2015 09:22 AM, Michael Lawrence wrote:
 
  Yes, I think it would make sense for the Xtra package to follow the
  established convention in VariantAnnotation/VCF.
 
 
  I don't know. I agree it would be nice to use a more consistent
  representation of insertions across the software but I'm not
  convinced we should necessarily follow the VCF way, which is to
  include the base before the event in the ref and alt alleles as well
  as in the reported range.
 
  Note that there doesn't seem to be any consensus in the broader
  Bioinformatics community either. For example dbSNP and HGVS report
  the range that corresponds to the 2 flanking nucleotides but they
  don't include these nucleotides in the ref or alt alleles. VCF does
  the same as GFF3 which says start equals end and the implied site is
  to the right of the indicated base except that VCF wants to treat
  events that occur at position 1 in a special way. In that case VCF
  says the base *after* the event should be included (seems like the
  VCF authors want to avoid both: empty ranges and also ranges that
  start at POS 0). BTW it doesn't seem that
  VariantAnnotation::isInsertion() is aware of that special treatment.
 
  UCSC uses a zero-width range, and so does the XtraSNPlocs.*
  packages. The advantage of this representation is its simplicity.
  There is no special cases. It also reflects the notion that an
  insertion is a replacement of an empty string with a non-empty
  string. No need to include flanking nucleotides in the representation
  (which is artificial and distorts the real alt allele).
 
 
  H.
 
 
  On Mon, Mar 16, 2015 at 9:16 AM, Robert Castelo
  robert.cast...@upf.edu mailto:robert.cast...@upf.edu wrote:
 
  dear devel people, specially Val and Michael

Re: [Bioc-devel] zero-width ranges representing insertions

2015-03-16 Thread Hervé Pagès
]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Submitting a new package

2015-03-13 Thread Hervé Pagès

Hi Dan,

On 03/13/2015 12:04 PM, Dan Tenenbaum wrote:



- Original Message -

From: Charles Determan Jr deter...@umn.edu
To: Dan Tenenbaum dtene...@fredhutch.org
Cc: Bioc-devel@r-project.org
Sent: Friday, March 13, 2015 12:04:12 PM
Subject: Re: [Bioc-devel] Submitting a new package


Thanks Dan, could you confirm that the email statement ( packages
NEAR bioconductor POINT org) means packa...@bioconductor.org ? I
don't recognize the NEAR notation.



Yes. Spammers are not supposed to recognize it either, which is why we didn't 
just say packa...@bioconductor.org.


I think Charles means why NEAR when we normally use AT for this?

Thanks,
H.



Dan




Charles


On Fri, Mar 13, 2015 at 1:59 PM, Dan Tenenbaum 
dtene...@fredhutch.org  wrote:




- Original Message -

From: Charles Determan Jr  deter...@umn.edu 
To: Bioc-devel@r-project.org
Sent: Friday, March 13, 2015 11:57:24 AM
Subject: [Bioc-devel] Submitting a new package

I have finished a new package recently that I would like to submit
to
bioconductor. I have read through the package guidelines and I
believe I
understand the package submission process but embarrassingly I
cannot
figure out how to actually submit the package? Could someone kindly
explain where I actually submit the package for review?



See the very bottom of
http://bioconductor.org/developers/package-submission/

Dan




Regards,

--
Dr. Charles Determan, PhD
Integrated Biosciences

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel






--


Dr. Charles Determan, PhD
Integrated Biosciences



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Package build error for windows system

2015-03-10 Thread Hervé Pagès

Hi Liu,

On 03/10/2015 10:08 AM, Liu, Yuanhang wrote:

Hello, everyone,


I recently submitted my first R package. Everything checks fine on my
local machine. However, when I upload the tarball to the single package
builder at bioconductor.org, it gave me the following error on windows
system:


* creating vignettes ...Warning: running command
'E:/packagebuilder/R/bin/x64/Rscript --vanilla --default-packages= -e
tools::buildVignettes(dir = '.', tangle = TRUE)' had status 5
ERROR
Error in withCallingHandlers(tryCatch(evalq((function (i) :
object '.rcpp_warning_recorder' not found


Don't worry about this one. The problem is probably on the build machine
side (in the R it uses). Please ignore.




The name of the windows build system is :


moscato2 Windows Server 2008 R2 Enterprise SP1 (64-bit)/x64​


What does this mean? I would really appreciate it if someone can help me
with this.


I also got a warning on one linux system (morelia Mac OS X Mavericks
(10.9.5)/x86_64):


* checking loading without being on the library search path ... WARNING

Loading required package: IRanges
Error: package BiocGenerics 0.13.4 is loaded, but = 0.13.6 is required by 
IRanges
In addition: Warning message:
version 0.13.6 of BiocGenerics masked by 0.13.4 in 
/private/tmp/Rtmp8nLYMq/RLIBS_9d027c2fae3
Execution halted


mmh... again something doesn't look quite right on this build machine. 
Please ignore.




Then I checked my version of BiocGenerics, which is 0.12.1, the latest version.


This is the current release version. However remember that for testing
before package submission you should be using BioC devel (BioC 3.1, 
which requires R 3.2), not BioC release (BioC 3.0). That's what the

automatic single package builder uses.
The latest version of BiocGenerics in devel is 0.13.6.

Cheers,
H.


How come the system require 0.13.6? I also checked IRanges manual, which it says 
it depends on BiocGenerics(= 0.11.3). Would anybody tell me what does this 
mean? Thanks in advance!

Regards
Yuanhang Liu
PhD Candidate
Bioinformatics and Computational Biology laboratory
Cellular  Structural Biology, Department of Epidemiology  Biostatistics
UT Health Science Center at San Antonio
email: liu...@uthscsa.educel: 210-784-082​8


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] CRAN package with Bioconductor dependencies

2015-03-04 Thread Hervé Pagès

Hi Gordon,

On 03/04/2015 02:12 PM, Gordon Brown wrote:

Hi,


Hi, Bioc-devel folks,

Is there an accepted way to migrate a package from CRAN to BioC?
  CRAN's policy says, The package�s license must give the right for
CRAN to distribute the package in perpetuity

Is it enough to request that CRAN archive the package, then submit it
as a new package to BioC?  I own a CRAN package (msarc) that
probably should have been in BioC from the start, but isn't.
  Suggestions?



I can't speak to the CRAN policy side of things, but several packages have 
migrated from CRAN to Bioconductor.
At a minimum, you need to make sure that the version in Bioconductor is higher 
than the version in CRAN, but you should also definitely ask CRAN to archive 
the package. The way Bioconductor works, your package will first be available 
only in the devel version, but will then become available in the release 
version after our next release. Currently we're scheduled to release 3.1 on 
April 17 and the deadline for package submissions for this release is March 27.

Dan


Thanks, Dan.   I'll check with the CRAN folks to see what they say, and pass 
along anything significant regarding policy etc.  It's not a crisis if there's 
a lag between getting it archived at CRAN and into the release version of 
Bioconductor.


The CRAN folks will ask you to proceed the other way around. First
submit your package to Bioconductor. Only after we release (in April), 
ask them to remove and archive your package. It makes sense since this

ensures continuous availability of the package.

Cheers,
H.



Cheers,

  - Gord
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Rd] [R] Why does R replace all row values with NAs

2015-03-03 Thread Hervé Pagès
 where you indexed using
   -which(x$c=6) is a bad idea:  if none of the entries were 6 or more,
   this would be indexing with an empty vector, and you'd get nothing, 
not
   everything.
  
   Duncan Murdoch
  
  
  
   On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch
   murdoch.dun...@gmail.com wrote:
   On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote:
   I know how to get the output I need, but I would benefit from an
   explanation why R behaves the way it does.
  
   # I have a data frame x:
   x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
   x
   # I want to toss rows in x that contain values =6. But I don't 
want
   to toss my NAs there.
  
   subset(x,c6) # Works correctly, but removes NAs in c, understand 
why
   x[which(x$c6),] # Works correctly, but removes NAs in c, 
understand
  why
   x[-which(x$c=6),] # output I need
  
   # Here is my question: why does the following line replace the 
values
   of all rows that contain an NA # in x$c with NAs?
  
   x[x$c6,]  # Leaves rows with c=NA, but makes the whole row an NA.
  Why???
   x[(x$c6) | is.na(x$c),] # output I need - I have to be
  super-explicit
  
   Thank you very much!
  
   Most of your examples (except the ones using which()) are doing 
logical
   indexing.  In logical indexing, TRUE keeps a line, FALSE drops the
  line,
   and NA returns NA.  Since x$c  6 is NA if x$c is NA, you get the
   third kind of indexing.
  
   Your last example works because in the cases where x$c is NA, it
   evaluates NA | TRUE, and that evaluates to TRUE.  In the cases where
  x$c
   is not NA, you get x$c  6 | FALSE, and that's the same as x$c  6,
   which will be either TRUE or FALSE.
  
   Duncan Murdoch
  
  
  
  
  
 
 
 
  --
  Dimitri Liakhovitski
 
  __
  r-h...@r-project.org mailing list -- To UNSUBSCRIBE and more, see
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

  [[alternative HTML version deleted]]

  __
  r-h...@r-project.org mailing list -- To UNSUBSCRIBE and more, see
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] Why does R replace all row values with NAs

2015-03-03 Thread Hervé Pagès



On 03/03/2015 02:17 PM, Gabriel Becker wrote:

Stephanie,

Actually, it's as.logical that isn't preserving matrix dimensions,
because it coerces to a logical vector:

  x - matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
  dim(as.logical(x))


It's true, as.logical() doesn't help here but Stephanie is right, %in%
does not preserve the dimensions either:

 dim(x %in% 1:5)
NULL

That's because match() itself doesn't preserve the dimensions:

 dim(match(x, 1:5))
NULL

So maybe my fast is.true() should be:

is.true - function(x)
{
  ans - as.logical(x) %in% TRUE
  if (is.null(dim(x))) {
names(ans) - names(x)
  } else {
dim(ans) - dim(x)
dimnames(ans) - dimnames(x)
  }
  ans
}

or something like that...

H.


NULL

~G

On Tue, Mar 3, 2015 at 2:09 PM, Stephanie M. Gogarten
sdmor...@u.washington.edu mailto:sdmor...@u.washington.edu wrote:



On 3/3/15 1:26 PM, Hervé Pagès wrote:



On 03/03/2015 02:28 AM, Martin Maechler wrote:

Diverted from R-help :
 as it gets into musing about new R language primitives

William Dunlap wdun...@tibco.com
mailto:wdun...@tibco.com
  on Fri, 27 Feb 2015 08:04:36 -0800
writes:


   You could define functions like

   is.true - function(x) !is.na http://is.na(x)  x
   is.false - function(x) !is.na http://is.na(x)  !x

   and use them in your selections.  E.g.,
   x -
data.frame(a=1:10,b=2:11,c=c(__1,NA,3,NA,5,NA,7,NA,NA,10))
   x[is.true(x$c = 6), ]
   a  b  c
   7   7  8  7
   10 10 11 10

   Bill Dunlap
   TIBCO Software
   wdunlap tibco.com http://tibco.com

Yes; the Matrix package has had these

is0  - function(x) !is.na http://is.na(x)  x == 0
isN0 - function(x) is.na http://is.na(x) | x != 0
is1  - function(x) !is.na http://is.na(x)  x   # also ==
isTRUE componentwise


Note that using %in% to block propagation of NAs is about 2x faster:

   x - sample(c(NA_integer_, 1:1), 50, replace=TRUE)
   microbenchmark(as.logical(x) %in% TRUE, !is.na
http://is.na(x)  x)
Unit: milliseconds
  expr   minlq  mean
  medianuq
   as.logical(x) %in% TRUE  6.034744  6.264382  6.999083
6.29488  6.346028
 !is.na http://is.na(x)  x 11.202808 11.402437
11.469101 11.44848 11.517576
max neval
40.36472 100 tel:40.36472%20%20%20100
   11.90916   100


Unfortunately %in% does not preserve matrix dimensions:

  x - matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE),
nrow=50)
  dim(x)
[1] 50 10
  dim(!is.na http://is.na(x)  x)
[1] 50 10
  dim(as.logical(x) %in% TRUE)
NULL

Stephanie






namespace hidden for a while  [note the comment of the last
one!]
and using them for readibility in its own code.

Maybe we should (again) consider providing some versions of
these with R ?

The Matrix package also has had fast

allFalse - all0 - function(x) .Call(R_all0, x)
anyFalse - any0 - function(x) .Call(R_any0, x)
##
## anyFalse - function(x) isTRUE(any(!x)) ## ~= any0
## any0 - function(x) isTRUE(any(x == 0))  ## ~=
anyFalse

namespace hidden as well, already, which probably could also be
brought to base R.

One big reason to *not* go there (to internal C code) at all
with R is
that
S3 and S4 dispatch for '==' ('!=', etc, the 'Compare' group
generics)
and 'is.na http://is.na() have been known and package
writers have
programmed methods for these.
To ensure that S3 and S4 dispatch works correctly also inside
such new internals is much less easily achieved, and so
such a C-based internal function  is0() would no longer be
equivalent with!is.na http://is.na(x)  x == 0
as soon as 'x' is an object with a '==', 'Compare' and/or
an is.na http://is.na()
method.


Excellent point. Thank you! It really makes a big difference for
developers who maintain a complex hierarchy of S4 classes and
methods,
when functions like is.true, anyFalse, etc..., which can be
expressed in
terms of more basic operations like ==, !=, !, is.na
http://is.na, etc..., just work
out-of-the-box on objects for which these basic operations

Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-03-03 Thread Hervé Pagès

On 03/03/2015 03:06 PM, Peter Haverty wrote:

I'd like to see a basic class that takes a DataFrame and a sub-class that
takes a GRanges.


Yes.


I still think GRanges should be a subclass of DataFrame,
which would make this easy, but I don't seem to be winning that argument.


Just impossible. As Michael mentioned back in November, they have
conflicting APIs.



While the hood is up, can we try some different names?
SummarizedExperiment never seemed like a great fit to me because it doesn't
necessarily contain experiments or summaries thereof.  It's a collection of
like-sized rectangular things with metadata on the two dimensions.  Maybe
the name could reflect what it holds rather than a common use case?
AnnotatedMatrixList?


We actually need 2 names: 1 for the parent class, 1 for the child. I'm
starting to think that introducing 2 new names would maybe make the
migration a little bit easier, especially since the plan is to move the
refactored SummarizedExperiment to its own package. With 2 new names
we can start the new package, implement the 2 new classes in it, and
have the old SummarizedExperiment (in GenomicRanges) and the 2 new
classes peacefully cohabit during the time of the migration.

Cheers,
H.



  Anyway, I'm excited to see a version on the way that takes a DataFrame as
rowData.  I'm glad you guys are working on that.

Regards,

Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Tue, Mar 3, 2015 at 2:57 PM, Michael Lawrence lawrence.mich...@gene.com
wrote:


Seems like rowData could be made to work universallly through coercion.
rowRanges would not, however, and one would like a convenient mechanism to
condition on whether range information is available. One way is to
introduce a new class and rely on dispatch. But that adds complexity.

On Tue, Mar 3, 2015 at 2:44 PM, Gabe Becker becker.g...@gene.com wrote:


Jim et al.,

Why have two accessors (rowRanges, rowData), each of which are less
flexible than the underlying structure and thus will fail (return NULL?

or

GRanges()/DataFrame() ?) in some proportion of valid objects?

~G

On Tue, Mar 3, 2015 at 2:37 PM, Jim Hester james.f.hes...@gmail.com
wrote:


Motivated by the discussion thread from November (

https://stat.ethz.ch/

pipermail/bioc-devel/2014-November/006686.html) the Bioconductor core

team

is planning on making changes to the SummarizedExperiment class.  Our

end

goal is to allow the @rowData slot to become more flexible and hold

either

a DataFrame or GRanges type object.

To this end we have currently deprecated the current rowData accessor

in

favor of a rowRanges accessor.  This change has resulted in a few

broken

builds in devel, which we are in the process of fixing now.  We will
contact any package authors directly if needed for this migration.

The rowData accessor will be deprecated in this release, however

eventually

the plan is to re-purpose this function to serve as an accessor for
DataFrame data on the rows.

Please let us know if you have any questions with the above and if you

need

any assistance with the transition.

 [[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel





--
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research

 [[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



 [[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Rd] R-devel does not update the C++ returned variables

2015-03-02 Thread Hervé Pagès

On 03/02/2015 01:00 PM, Hervé Pagès wrote:

Hi,

On 03/02/2015 12:18 PM, Dénes Tóth wrote:



On 03/02/2015 04:37 PM, Martin Maechler wrote:



On 2 March 2015 at 09:09, Duncan Murdoch wrote:
| I generally recommend that people use Rcpp, which hides a lot of the
| details.  It will generate your .Call calls for you, and generate the
| C++ code that receives them; you just need to think about the real
| problem, not the interface.  It has its own learning curve, but I
think
| it is easier than using the low-level code that you need to work
with .Call.



Thanks for that vote, and I second that.



And these days the learning is a lot flatter than it was a decade ago:



R Rcpp::cppFunction(NumericVector doubleThis(NumericVector x) {
return(2*x); })
R doubleThis(c(1,2,3,21,-4))
[1]  2  4  6 42 -8
R



That defined, compiled, loaded and run/illustrated a simple function.



Dirk


Indeed impressive,  ... and it also works with integer vectors
something also not 100% trivial when working with compiled code.

When testing that, I've went a step further:

## now test:
require(microbenchmark)
i - 1:10


Note that the relative speed of the algorithms also depends on the size
of the input vector. i + i becomes the winner for longer vectors (e.g. i
- 1:1e6), but a proper Rcpp version is still approximately twice as
fast.


The difference in speed is probably due to the fact that R does safe
arithmetic. C or C++ do not:

doubleThisInt(i)
   [1]  2147483642  2147483644  2147483646  NA -2147483646
-2147483644

2L * i
   [1] 2147483642 2147483644 2147483646 NA NA NA
   Warning message:
   In 2L * i : NAs produced by integer overflow


That was with

  i - as.integer(2^30-4) + 1:6

Cheers,
H.



H.



Rcpp::cppFunction(NumericVector doubleThisNum(NumericVector x) {
return(2*x); })
Rcpp::cppFunction(IntegerVector doubleThisInt(IntegerVector x) {
return(2*x); })
i - 1:1e6
mb - microbenchmark::microbenchmark(doubleThisNum(i), doubleThisInt(i),
i*2, 2*i, i*2L, 2L*i, i+i, times=100)
plot(mb, log=y, notch=TRUE)



(mb - microbenchmark(doubleThis(i), i*2, 2*i, i*2L, 2L*i, i+i,
times=2^12))
## Lynne (i7; FC 20), R Under development ... (2015-03-02 r67924):
## Unit: nanoseconds
##   expr min  lq  mean median   uq   max neval cld
##  doubleThis(i) 762 985 1319.5974   1124 1338 17831  4096   b
##  i * 2 124 151  258.4419164  221 4  4096  a
##  2 * i 127 154  266.4707169  216 20213  4096  a
## i * 2L 143 164  250.6057181  234 16863  4096  a
## 2L * i 144 177  269.5015193  237 16119  4096  a
##  i + i 152 183  272.6179199  243 10434  4096  a

plot(mb, log=y, notch=TRUE)
## hmm, looks like even the simple arithm. differ slightly ...
##
## == zoom in:
plot(mb, log=y, notch=TRUE, ylim = c(150,300))

dev.copy(png, file=mbenchm-doubling.png)
dev.off() # [ - why do I need this here for png ??? ]
##-- see the appended *png graphic

Those who've learnt EDA or otherwise about boxplot notches, will
know that they provide somewhat informal but robust pairwise tests on
approximate 5% level.
 From these, one *could* - possibly wrongly - conclude that
'i * 2' is significantly faster than both 'i * 2L' and also
'i + i'  which I find astonishing, given that  i is integer here...

Probably no reason for deep thoughts here, but if someone is
enticed, this maybe slightly interesting to read.

Martin Maechler, ETH Zurich



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R-devel does not update the C++ returned variables

2015-03-02 Thread Hervé Pagès

Hi,

On 03/02/2015 12:18 PM, Dénes Tóth wrote:



On 03/02/2015 04:37 PM, Martin Maechler wrote:



On 2 March 2015 at 09:09, Duncan Murdoch wrote:
| I generally recommend that people use Rcpp, which hides a lot of the
| details.  It will generate your .Call calls for you, and generate the
| C++ code that receives them; you just need to think about the real
| problem, not the interface.  It has its own learning curve, but I
think
| it is easier than using the low-level code that you need to work
with .Call.



Thanks for that vote, and I second that.



And these days the learning is a lot flatter than it was a decade ago:



R Rcpp::cppFunction(NumericVector doubleThis(NumericVector x) {
return(2*x); })
R doubleThis(c(1,2,3,21,-4))
[1]  2  4  6 42 -8
R



That defined, compiled, loaded and run/illustrated a simple function.



Dirk


Indeed impressive,  ... and it also works with integer vectors
something also not 100% trivial when working with compiled code.

When testing that, I've went a step further:

## now test:
require(microbenchmark)
i - 1:10


Note that the relative speed of the algorithms also depends on the size
of the input vector. i + i becomes the winner for longer vectors (e.g. i
- 1:1e6), but a proper Rcpp version is still approximately twice as fast.


The difference in speed is probably due to the fact that R does safe
arithmetic. C or C++ do not:

   doubleThisInt(i)
  [1]  2147483642  2147483644  2147483646  NA -2147483646 
-2147483644


   2L * i
  [1] 2147483642 2147483644 2147483646 NA NA NA
  Warning message:
  In 2L * i : NAs produced by integer overflow

H.



Rcpp::cppFunction(NumericVector doubleThisNum(NumericVector x) {
return(2*x); })
Rcpp::cppFunction(IntegerVector doubleThisInt(IntegerVector x) {
return(2*x); })
i - 1:1e6
mb - microbenchmark::microbenchmark(doubleThisNum(i), doubleThisInt(i),
i*2, 2*i, i*2L, 2L*i, i+i, times=100)
plot(mb, log=y, notch=TRUE)



(mb - microbenchmark(doubleThis(i), i*2, 2*i, i*2L, 2L*i, i+i,
times=2^12))
## Lynne (i7; FC 20), R Under development ... (2015-03-02 r67924):
## Unit: nanoseconds
##   expr min  lq  mean median   uq   max neval cld
##  doubleThis(i) 762 985 1319.5974   1124 1338 17831  4096   b
##  i * 2 124 151  258.4419164  221 4  4096  a
##  2 * i 127 154  266.4707169  216 20213  4096  a
## i * 2L 143 164  250.6057181  234 16863  4096  a
## 2L * i 144 177  269.5015193  237 16119  4096  a
##  i + i 152 183  272.6179199  243 10434  4096  a

plot(mb, log=y, notch=TRUE)
## hmm, looks like even the simple arithm. differ slightly ...
##
## == zoom in:
plot(mb, log=y, notch=TRUE, ylim = c(150,300))

dev.copy(png, file=mbenchm-doubling.png)
dev.off() # [ - why do I need this here for png ??? ]
##-- see the appended *png graphic

Those who've learnt EDA or otherwise about boxplot notches, will
know that they provide somewhat informal but robust pairwise tests on
approximate 5% level.
 From these, one *could* - possibly wrongly - conclude that
'i * 2' is significantly faster than both 'i * 2L' and also
'i + i'  which I find astonishing, given that  i is integer here...

Probably no reason for deep thoughts here, but if someone is
enticed, this maybe slightly interesting to read.

Martin Maechler, ETH Zurich



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] 100k SVN commits...

2015-02-27 Thread Hervé Pagès

On 02/27/2015 03:28 PM, Martin Morgan wrote:

I noticed that we're at r99955 in svn; who will be the lucky person
making the 100,000th commit?


depends how many zeroes after the one you're gonna put on the
check... 5?

;-)

H.



Martin


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BamTallyParam argument 'which'

2015-02-25 Thread Hervé Pagès

FWIW, and after checking how pileup() handles this, I thought I should
report:

  library(GenomicAlignments)
  bamfile - system.file(extdata, ex1.bam, package=Rsamtools)

Then:

   stackStringsFromBam(bamfile, param=seq1:1-6)
A DNAStringSet instance of length 4
  width seq
  [1] 6 CACTAG
  [2] 6 ++CTAG
  [3] 6 AG
  [4] 6 +G

   which - GRanges(seq1, IRanges(c(1, 5, 1), c(2, 6, 3)))
   pileup(bamfile, scanBamParam=ScanBamParam(which=which))
seqnames pos strand nucleotide count which_label
  1 seq1   1  +  C 1seq1:1-2
  2 seq1   2  +  A 1seq1:1-2
  3 seq1   5  +  A 3seq1:5-6
  4 seq1   6  +  G 4seq1:5-6
  5 seq1   1  +  C 1seq1:1-3
  6 seq1   2  +  A 1seq1:1-3
  7 seq1   3  +  C 2seq1:1-3

This output looks clean to me: (a) Nucleotide positions that belong
to more than 1 range in 'which' are treated as distinct positions,
and (b) reads that overlap with 2 non-overlapping ranges (e.g.
read #1 and ranges #1 and #2) only contribute at most once at each
position covered by these ranges.

H.


On 02/25/2015 01:21 PM, Hervé Pagès wrote:

Hi,

On 02/25/2015 06:37 AM, Gabe Becker wrote:

I think we need to be a little careful of trying to know the users
intentions better than they do here.  Reduce is a (very) easy operation
to do on a GRanges, so if the user didn't, are we really safe assuming
they meant to when passing the GRanges as a which?

I would argue for the samtools way, not because samtools does it
(though consistency is good) but because it allows the user to do more
things, while not making it that painful to do the thing that they might
want most often.

I agree with Michael that an additional argument might be a good middle
ground.


Maybe another good middle ground is to issue a warning when 'which'
contains overlapping ranges. The warning could suggest to the user
to reduce() the ranges first. Maybe the warning should also point
out that reducing doesn't completely eliminate the risk of selecting
the same record more than once (at least that's the case for the
readGAlignment* functions, but I don't know if it holds for
BamTallyParam). The risk is actually higher with paired-end reads
where the same pair is selected once for each range in 'which' that
overlaps with at least one of the 2 mates in the pair.

But as already discussed here (or maybe it was on the old bioconductor
list, don't remember), better solutions to the duplicated selection
problem could be achieved via one of the following:

   (1) Expose some sort of unique id for the records in a BAM file.
   I agree with Michael that duplicated selection is incompatible
   with summarization tools like BamTallyParam or pileup(). Having
   access to a unique id would completely solve the problem.

   (2) Introduce an argument similar to 'which' but with a slightly
   different semantic e.g. select records that *start* in one of
   the specified ranges (for single-end reads), or select pairs of
   records for which the *first* mate starts in one of the specified
   ranges.
   Advantages of this semantic:

 (a) If the ranges don't overlap, then no duplicates.

 (b) It can be used in the context of tiling the genome and
 processing the reads by tile. This plays well with
 parallelization (the semantic of 'which' does not).

   Inconvenient: the arbitrary nature of the selection criteria
   and its lack of symmetry is incompatible with summarization
   tools like BamTallyParam or pileup().

Note that (1) and (2) are not exclusive.

Cheers,
H.



~G

On Tue, Feb 24, 2015 at 7:40 PM, Leonardo Collado Torres
lcoll...@jhu.edu mailto:lcoll...@jhu.edu wrote:

Related to my post on a separate thread

(https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html),
I think that if 'which' is not being reduced by default, a simple
example showing the effects of this could be included in the
functions
that have such an argument. Also note that 'reducing' could lead to
unintended results.

For example, in the help page for GenomicAlignments::readGAlignments,
after the 'gal4' example it would be nice to add something like this:


## Note that if overlapping ranges are provided in 'which'
## reads could be selected more than once. This would artificually
## increase the coverage or affect other downstream results.
## If you 'reduce' the ranges, reads that originally overlapped
## two disjoint segments will be included.

which_dups - RangesList(seq1=rep(IRanges(1000, 2000), 2),
 seq2=IRanges(c(100, 1000), c(1000, 2000)))
param_dups - ScanBamParam(which=which_dups)
param_reduced - ScanBamParam(which=reduce(which_dups))
gal4_dups - readGAlignments(bamfile, param=param_dups)
gal4_reduced

Re: [Bioc-devel] BamTallyParam argument 'which'

2015-02-25 Thread Hervé Pagès
@r-project.org mailto:Bioc-devel@r-project.org
mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel




--
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BamTallyParam argument 'which'

2015-02-23 Thread Hervé Pagès

On 02/23/2015 11:05 AM, Leonard Goldstein wrote:

Hi Michael and Thomas,

I ran into the same problem in the past (i.e. when I started working
with functions like scanBam I expected them not to return the same
alignment multiple times)

One thing to consider might be that returning alignments multiple
times is consistent with the behavior of the samtools view command.
Quoting from the samtools manual:

“Important note: when multiple regions are given, some alignments may
be output multiple times if they overlap more than one of the
specified regions.”


Thanks Leonard for pointing this out. This is indeed the reason why all
the functions in Rsamtools and GenomicAlignments that take a 'which'
argument (via a ScanBamParam object) treat it the samtools way, that
is, as a vector of meaningful loci.

Most of them track the index of the individual loci via a which_label
metadata column. See for example Rsamtools::pileup() and all the
readGAlignment*() functions in the GenomicAlignments package.
FWIW the man page for the readGAlignment*() functions contains the
following note:

 Note that a given record is loaded one time for each region it
 belongs to (this is a scanBam() feature, readGAlignments()
 is based on scanBam()).

but maybe this should be emphasized a little bit more.

Cheers,
H.



Maybe there is an argument for keeping things consistent with
samtools? As you said, if documented properly, the user can decide
whether to reduce regions specified in which or not.

Leonard


On Mon, Feb 23, 2015 at 10:52 AM, Michael Lawrence
lawrence.mich...@gene.com wrote:

We should at leaast try to avoid surprising the user. Seems like most
people expect which to be a simple restriction, so I think for now I will
just reduce the which, and if someone has a use case for separate queries,
we can address it in the future.

On Mon, Feb 23, 2015 at 10:41 AM, Thomas Sandmann sandmann.tho...@gene.com
wrote:


Personally, I don't have a use case with meaningful loci worth tracking,
so keeping it simple would work for me.

Incidentally, would it be good to deal with the 'which' parameter in a
consistent way across different methods ? I just saw this recent post on
the mailing list in which a used got confused by duplicate counts returned
after passing 'which' to scanBamParam:

https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html


---

Thomas Sandmann, PhD
Computational biologist

Genentech, Inc.
1 DNA Way
South San Francisco, CA 94080
USA

Phone: +1 650 225 6273
Fax: +1 650 225 5389
Email: sandmann.tho...@gene.com

If a man will begin with certainties, he shall end in doubts; but if he
will be content to begin with doubts he shall end in certainties. -- Sir
Francis Bacon


On Mon, Feb 23, 2015 at 10:37 AM, Michael Lawrence 
lawrence.mich...@gene.com wrote:


We just have to decide which is the more useful interpretation of which
-- as a simple restriction, or as a vector of meaningful locii, which will
be analyzed individually? I would actually favor the first one (the same as
yours), just because it's simpler. To keep track of the query ranges, we
would need to add a new column to the returned object, which will more
often than not just be clutter. I guess we could introduce a new parameter,
reduceWhich which defaults to TRUE and reduces the which. If FALSE, it
instead adds the column mapping back to the original which ranges.


On Sun, Feb 22, 2015 at 2:36 PM, Thomas Sandmann 
sandmann.tho...@gene.com wrote:


Hi Michael,

ah, I see. I hadn't realized that returning the pileups separately for
each region could be a desired feature, but that makes sense. I agree, as
it is easy for the user to 'reduce' the ranges beforehand your first option
(e.g. returning the ID of the range) would be more flexible.

Perhaps you would consider adding a sentence to the documentation of
'which' on BamTallyParam's help page explaining that users might want to
'reduce' their ranges beforehand if they are only interested in a single
tally for each base ?

Thanks a lot !
Thomas








 [[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Fwd: readGAlignmentsFromBam function

2015-02-19 Thread Hervé Pagès

Hi Karolis,

Please use the bioc-devel mailing list (to which you should
be subscribed) to ask this type of question.

If you manually run the example in your package (using BioC devel,
which requires R-3.2), you'll get:

   prebs_values - calc_prebs(bam_files, manufacturer_cdf_mapping)
  Inferred name for CDF package: HGU133Plus2_mapping.txt - hgu133plus2cdf


  [1] Finished: 
/home/hpages/R/R-3.2.r67773/library/prebsdata/sample_bam_files/input1.bam
  [1] Finished: 
/home/hpages/R/R-3.2.r67773/library/prebsdata/sample_bam_files/input2.bam

  Note: Some probe IDs contain duplicates.

  Estimated values for Bayesian prior:

  Alpha=1e-04

  Beta=0.0136640101662362

  Note: 41457 probe sequences are missing in _mapping.txt file.

  Normalizing
  Calculating Expression
  Warning messages:
  1: 'readGAlignmentsFromBam' is deprecated.
  Use 'readGAlignments' instead.
  See help(Deprecated)
  2: 'readGAlignmentsFromBam' is deprecated.
  Use 'readGAlignments' instead.
  See help(Deprecated)

Hope this helps. Too bad 'R CMD check' truncates the deprecation
message :-/

Cheers,
H.


On 02/19/2015 01:15 PM, Michael Lawrence wrote:

-- Forwarded message --
From: Karolis Uziela karolis.uzi...@scilifelab.se
Date: Feb 19, 2015 9:22 AM
Subject: readGAlignmentsFromBam function
To: Michael Lawrence lawrence.mich...@gene.com
Cc: bioconduc...@r-project.org bioconduc...@r-project.org

Hi,

I have looked at R check results for the devel version of Bioconductor and
I noticed that the package I developed (prebs) gives this warning:

Warning: 'readGAlignmentsFromBam' is deprecated.

What is the new function that should be used instead of
readGAlignmentsFromBam (GenomicAlignments package)? Do the changes
affect readGAlignmentPairs function, too?

Regards,
Karolis



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Masked version of BSgenome.Hsapiens.NCBI.GRCh38?

2015-02-08 Thread Hervé Pagès

Hi Ulrich,

I was not sure about how much demand there is for the masked BSgenome
packages in general so I was just waiting for someone to ask. Note that
the masks are typically generated from data available at UCSC so it
sounds that it's time to make BSgenome.Hsapiens.NCBI.hg38 and
BSgenome.Hsapiens.NCBI.hg38.masked available.

I'll prepare the 2 packages in the next couple of weeks and post back
here when they are ready for download.

Cheers,
H.

On 02/06/2015 06:13 AM, Ulrich Bodenhofer wrote:

Hi,

The latest human genome build GRCh38 has been around in Bioconductor for
some while (package BSgenome.Hsapiens.NCBI.GRCh38). As far as I can
tell, however, there is currently no package that provides easy access
to masked/unmasked regions in the genome (like there is a .masked
version that wraps BSgenome objects into MaskedBSgenome objects for many
other genomes, e.g. there is BSgenome.Hsapiens.UCSC.hg19.masked for
hg19). Here is my question: does anybody have plans to include  a
package BSgenome.Hsapiens.NCBI.GRCh38.masked (or under a different name)
into Bioconductor 3.1? At least I could not find anything in the current
development branch.

Thanks and best regards,
Ulrich



*Dr. Ulrich Bodenhofer*
Associate Professor
Institute of Bioinformatics

*Johannes Kepler University*
Altenberger Str. 69
4040 Linz, Austria

Tel. +43 732 2468 4526
Fax +43 732 2468 4539
bodenho...@bioinf.jku.at mailto:bodenho...@bioinf.jku.at
http://www.bioinf.jku.at/ http://www.bioinf.jku.at

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Rd] :: and ::: as .Primitives?

2015-01-23 Thread Hervé Pagès

Hi,

On 01/23/2015 07:01 AM, luke-tier...@uiowa.edu wrote:

On Thu, 22 Jan 2015, Michael Lawrence wrote:


On Thu, Jan 22, 2015 at 11:44 AM,  luke-tier...@uiowa.edu wrote:


For default methods there ought to be a way to create those so the
default method is computed at creation or load time and stored in an
environment.


We had considered that, but we thought the definition of the function
would be easier to interpret if it explicitly specified the namespace,
instead of using tricks with environments. The same applies for
memoizing the lookup in front of a loop.


interpret in what sense (human reader or R interpreter)? In either
case I'm not convinced.


From a developer perspective, especially when debugging, when we do
selectMethod(match, ...) and it turns out that this returns the
default method, it's good to see:

  Method Definition (Class derivedDefaultMethod):

  function (x, table, nomatch = NA_integer_, incomparables = NULL,
  ...)
  base::match(x, table, nomatch = nomatch, incomparables = incomparables,
  ...)
  environment: namespace:BiocGenerics

  Signatures:
  x   table
  target  DataFrame ANY
  defined ANY   ANY

rather than some obscure/uninformative body. I hope we can keep that.




The implementation of these functions is almost simpler in C than it
is in R, so there is relatively little risk to this change. But I
agree the benefits are also somewhat minor.


I don't disagree, but it remains that even calling the C version has
costs that should not need to be paid. But maybe we can leave that to
the compiler/byte code engine. Optimizing references to symbols
resolved statically to name spaces and imports is on the to do list,
and with a little care that mechanism should work for foo::bar uses as
well.


That would be great. Thanks!

H.



Best,

luke




For other cases if I want to use foo::bar many times, say
in a loop, I would do

foo_bar - foo::bar

and use foo_bar, or something along those lines.

When :: and ::: were introduce they were intended primarily for
reflection and debugging, so speed was not an issue. ::: is still
really only reliably usable that way, and making it faster may just
encourage bad practice. :: is different and there are good arguments
for using it in code, but I'm not yet seeing good arguments for use in
ways that would be performance-critical, but I'm happy to be convinced
otherwise. If there is a need for a faster :: then going to a
SPECIALSXP is fine; it would also be good to make the byte code
compiler aware of it, and possibly to work on ways to improve the
performance further e.g. through cacheing.

Best,

luke


On Thu, 22 Jan 2015, Peter Haverty wrote:



Hi all,

When S4 methods are defined on base function (say, match), the
function becomes a method with the body base::match(x,y). A call to
such a function often spends more time doing :: than in the function
itself.  I always assumed that :: was a very low-level thing, but it
turns out to be a plain old function defined in base/R/namespace.R.
What would you all think about making :: and ::: .Primitives?  I
have submitted some examples, timings, and a patch to the R bug
tracker (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16134).
I'd be very interested to hear your thoughts on the matter.

Regards,
Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Hervé Pagès

Hi guys,

Indeed, the Hits object returned by findOverlaps() is not fully
sorted anymore. Now it's sorted by query hit *only* and not by query
hit *and* subject hit. Fully sorting a big Hits object has a high
cost, both in terms of time and memory footprint. The partial
sorting is *much* cheaper: it's done using a tabulated sorting
algo implemented in C that works in linear time.

The partial sorting is important: it allows a very common
transformation like as(hits, List) to be super fast. But the
full sorting was overkill and generally not needed. Also note that
the full sorting was never enforced via the validity method for
Hits objects (and t(hits) was breaking that order in BioC  3.1).
Now the validity method for Hits enforces the partial sorting and
t(hits) preserves it.

There were only 3 or 4 packages that broke in devel because of
that change (typically the change broke their unit tests). I fixed
them (except Repitools, but it's still on my list). The fix is easy:
if having the hits fully sorted matters, just use sort() on the Hits
object. The man page for ?findOverlaps will soon be updated to
reflect these changes.

Cheers,
H.


On 01/15/2015 06:42 AM, Kasper Daniel Hansen wrote:

Has it ever been documented that the return object is sorted in a specific
way?  I just want to make sure we think about whether that is something we
want to enforce giving the possibility of using a different algorithm in
the future.

We could also address this by implementing (perhaps it already exists) a
sort() method for the return object.  That would still break existing code
though.

Best,
Kasper

On Wed, Jan 14, 2015 at 11:13 PM, Michael Lawrence 
lawrence.mich...@gene.com wrote:


I bet there is a lot of code that depends on having the hits (conveniently)
ordered by query,subject index, so we should try to restore the previous
behavior.

On Wed, Jan 14, 2015 at 8:00 PM, Dario Strbenac 
dstr7...@uni.sydney.edu.au
wrote:


Hello,

For an identical query, the matrix results are in a different order.
Consider the subject hits of the last two rows :


mapping# R Under development (unstable) (2015-01-13 r67453) and

IRanges 2.1.35
  queryHits subjectHits
[1,] 1   1
[2,] 1   4
[3,] 2   2
[4,] 4   1
[5,] 4   4
[6,] 6   7
[7,] 6   6


mapping# R Under development (unstable) (2015-01-13 r67453) and

IRanges 2.0.1
  queryHits subjectHits
[1,] 1   1
[2,] 1   4
[3,] 2   2
[4,] 4   1
[5,] 4   4
[6,] 6   6
[7,] 6   7

This causes some values to be extracted in a different order by our
annotationLookup function, and causes an error for the development

version

of Repitools on a test case which uses all.equal to compare a list to a
correct list, but not for the release version which uses the release
version of IRanges. Should I update the test case to have a new expected
result, or is this new characteristic of findOverlaps likely to revert to
the previous output soon ?

The two sets of intervals to produce this result are anno and probesGR,
defined in the tests.R file in the Repitools package.

--
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



 [[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Hervé Pagès

Hi Michael,

On 01/15/2015 11:59 AM, Michael Lawrence wrote:

My concern is mostly in user code not seen in Bioc svn.


I understand but the fate of that code is to get out of sync
sooner or later. And sooner rather than later if it relies on
undocumented behavior.


But perhaps the
partial sorting (by query) is sufficient for many of those.


It seems to be sufficient for more than 99.5% of the packages in
BioC svn :-)

Note that keeping Hits objects partially sorted instead of fully
sorted not only speeds up findOverlaps() but also basic operations
on Hits objects like union(), t(), etc...

Since we are on it, I should also mention that new in BioC 3.1 is a
Hits() constructor function which takes care of partially sorting the
hits, selectHits() for selecting hits in the same way the 'select'
arg of findOverlaps() does, and all the comparison operations (==, =,
order, sort, rank, etc..., see ?`Hits-comparison` in S4Vectors).

Cheers,
H.



On Thu, Jan 15, 2015 at 11:34 AM, Hervé Pagès hpa...@fredhutch.org
mailto:hpa...@fredhutch.org wrote:

Hi guys,

Indeed, the Hits object returned by findOverlaps() is not fully
sorted anymore. Now it's sorted by query hit *only* and not by query
hit *and* subject hit. Fully sorting a big Hits object has a high
cost, both in terms of time and memory footprint. The partial
sorting is *much* cheaper: it's done using a tabulated sorting
algo implemented in C that works in linear time.

The partial sorting is important: it allows a very common
transformation like as(hits, List) to be super fast. But the
full sorting was overkill and generally not needed. Also note that
the full sorting was never enforced via the validity method for
Hits objects (and t(hits) was breaking that order in BioC  3.1).
Now the validity method for Hits enforces the partial sorting and
t(hits) preserves it.

There were only 3 or 4 packages that broke in devel because of
that change (typically the change broke their unit tests). I fixed
them (except Repitools, but it's still on my list). The fix is easy:
if having the hits fully sorted matters, just use sort() on the Hits
object. The man page for ?findOverlaps will soon be updated to
reflect these changes.

Cheers,
H.



On 01/15/2015 06:42 AM, Kasper Daniel Hansen wrote:

Has it ever been documented that the return object is sorted in
a specific
way?  I just want to make sure we think about whether that is
something we
want to enforce giving the possibility of using a different
algorithm in
the future.

We could also address this by implementing (perhaps it already
exists) a
sort() method for the return object.  That would still break
existing code
though.

Best,
Kasper

On Wed, Jan 14, 2015 at 11:13 PM, Michael Lawrence 
lawrence.mich...@gene.com mailto:lawrence.mich...@gene.com wrote:

I bet there is a lot of code that depends on having the hits
(conveniently)
ordered by query,subject index, so we should try to restore
the previous
behavior.

On Wed, Jan 14, 2015 at 8:00 PM, Dario Strbenac 
dstr7...@uni.sydney.edu.au mailto:dstr7...@uni.sydney.edu.au
wrote:

Hello,

For an identical query, the matrix results are in a
different order.
Consider the subject hits of the last two rows :

mapping# R Under development (unstable)
(2015-01-13 r67453) and

IRanges 2.1.35
   queryHits subjectHits
[1,] 1   1
[2,] 1   4
[3,] 2   2
[4,] 4   1
[5,] 4   4
[6,] 6   7
[7,] 6   6

mapping# R Under development (unstable)
(2015-01-13 r67453) and

IRanges 2.0.1
   queryHits subjectHits
[1,] 1   1
[2,] 1   4
[3,] 2   2
[4,] 4   1
[5,] 4   4
[6,] 6   6
[7,] 6   7

This causes some values to be extracted in a different
order by our
annotationLookup function, and causes an error for the
development

version

of Repitools on a test case which uses all.equal to
compare a list to a
correct list, but not for the release version which uses
the release

Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

2015-01-09 Thread Hervé Pagès

Hi,

On 01/08/2015 07:02 AM, Martin Maechler wrote:



Adding an optional argument to get (and mget) like
val - get(name, where, ..., value.if.not.found=NULL )   (*)



would be useful for many.  HOWEVER, it is possible that there could be
some confusion here: (*) can give a NULL because either x exists and
has value NULL, or because x doesn't exist.   If that matters, the user
would need to be careful about specifying a value.if.not.found that cannot
be confused with a valid value of x.


Exactly -- well, of course: That problem { NULL can be the legit value of what 
you
want to get() } was the only reason to have a 'value.if.not' argument at all.

Note that this is not about a universal replacement of
the  if(exists(..)) { .. get(..) } idiom,


FWIW, if(exists(..)) { x - get(..) } is not safe because it's not
atomic. I've seen situations where exists() returns TRUE but then
get() fails to find the symbol (even if called immediately after
exists()).

After scratching my head for a while I found out that the symbol was
removed by some finalizer function defined somewhere (not on the 
environment exists() and gets() were looking at, of course). And

since garbage collection is triggered between the moment exists() sees
the symbol and get() tries to get it, the finalizer was executed and
the symbol removed.

After that, I started to systematically use x - try(get(...)) instead
(which is atomic).

Cheers,
H.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Hervé Pagès

On 01/08/2015 01:30 PM, peter dalgaard wrote:

If you look at the definition of %in%, you'll find that it is implemented using 
match, so if we did as you suggest, I give it about three days before someone 
suggests to inline the function call...


But you wouldn't bet money on that right? Because you know you would
loose.


Readability of source code is not usually our prime concern.


Don't sacrifice readability if you do not have a good reason for it.
What's your reason here? Are you seriously suggesting that inlining
makes a significant difference? As Michael pointed out, the expensive
operation here is the hashing. But sadly some people like inlining and
want to use it everywhere: it's easy and they feel good about it, even
if it hurts readability and maintainability (if you use x %in% y
instead of the inlined version, the day someone changes the
implementation of x %in% y for something faster, or fixes a bug
in it, your code will automatically benefit, right now it won't).

More simply put: good readability generally leads to better code.



The  idea does have some merit, though.

Apropos, why is there no setcontains()?


Wait... shouldn't everybody use all(match(x, y, nomatch = 0L)  0L) ?

H.



-pd


On 06 Jan 2015, at 22:02 , Hervé Pagès hpa...@fredhutch.org wrote:

Hi,

Current implementation:

setequal - function (x, y)
{
  x - as.vector(x)
  y - as.vector(y)
  all(c(match(x, y, 0L)  0L, match(y, x, 0L)  0L))
}

First what about replacing 'match(x, y, 0L)  0L' and 'match(y, x, 0L)  0L'
with 'x %in% y' and 'y %in% x', respectively. They're strictly
equivalent but the latter form is a lot more readable than the former
(isn't this the raison d'être of %in%?):

setequal - function (x, y)
{
  x - as.vector(x)
  y - as.vector(y)
  all(c(x %in% y, y %in% x))
}

Furthermore, replacing 'all(c(x %in% y, y %in x))' with
'all(x %in% y)  all(y %in% x)' improves readability even more and,
more importantly, reduces memory footprint significantly on big vectors
(e.g. by 15% on integer vectors with 15M elements):

setequal - function (x, y)
{
  x - as.vector(x)
  y - as.vector(y)
  all(x %in% y)  all(y %in% x)
}

It also seems to speed up things a little bit (not in a significant
way though).

Cheers,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-06 Thread Hervé Pagès

Hi,

Current implementation:

  setequal - function (x, y)
  {
x - as.vector(x)
y - as.vector(y)
all(c(match(x, y, 0L)  0L, match(y, x, 0L)  0L))
  }

First what about replacing 'match(x, y, 0L)  0L' and 'match(y, x, 0L)  0L'
with 'x %in% y' and 'y %in% x', respectively. They're strictly
equivalent but the latter form is a lot more readable than the former
(isn't this the raison d'être of %in%?):

  setequal - function (x, y)
  {
x - as.vector(x)
y - as.vector(y)
all(c(x %in% y, y %in% x))
  }

Furthermore, replacing 'all(c(x %in% y, y %in x))' with
'all(x %in% y)  all(y %in% x)' improves readability even more and,
more importantly, reduces memory footprint significantly on big vectors
(e.g. by 15% on integer vectors with 15M elements):

  setequal - function (x, y)
  {
x - as.vector(x)
y - as.vector(y)
all(x %in% y)  all(y %in% x)
  }

It also seems to speed up things a little bit (not in a significant
way though).

Cheers,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] Use Imports instead of Depends in the DESCRIPTION files of bioconductor packages

2014-12-31 Thread Hervé Pagès

Hi Gordon,

My guess is that it has to do with how many symbols get exported.
For example on my machine, doing library(limma) in a fresh session
takes 0.261s and triggers export of 292 symbols (as reported by
ls(..., all.names=TRUE)). Doing library(GenomicRanges) in a fresh
session takes 2.724s and triggers export of 1581 symbols (counting
the symbols exported by all the packages that get loaded).

Michael it's great to hear that somebody is working on speeding up
the code in charge of this.

Happy New Year everybody!
H.


On 12/31/2014 06:07 PM, Gordon K Smyth wrote:

Hi Michael,

What aspect of the methods package causes the slowness?

There are many packages (limma for one) that depend on methods but load
quickly.

Regards
Gordon



Date: Wed, 31 Dec 2014 09:17:01 -0800
From: Michael Lawrence lawrence.mich...@gene.com
To: Peng Yu pengyu...@gmail.com
Cc: Bioconductor Package Maintainer maintai...@bioconductor.org,
bioc-devel@r-project.org bioc-devel@r-project.org
Subject: Re: [Bioc-devel] [devteam-bioc] Use Imports instead of
Depends in the DESCRIPTION files of bioconductor packages.

The slowness is due to the methods package. We're working on it.

Michael

On Wed, Dec 31, 2014 at 8:47 AM, Peng Yu pengyu...@gmail.com wrote:


On Wed, Dec 31, 2014 at 9:41 AM, Martin Morgan mtmor...@fredhutch.org
wrote:

On 12/24/2014 07:31 PM, Maintainer wrote:


Hi,

Many bioconductor packages Depends on other packages but not Imports
other packages. (e.g., IRanges Depends on BiocGenerics.) Imports is
usually preferred to Depends.




http://stackoverflow.com/questions/8637993/better-explanation-of-when-to-use-imports-depends


http://obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/

Could the unnecessary Depends be forced to be replaced by Imports?
This should improve the package load time significantly.



R package symbols and other objects are collated at build time into a

'name

space'. When used,

- Import: loads the name space from disk.
- Depends: loads the name space from disk, and attaches it to the

search()

path.

Attaching is very inexpensive compared to loading, so there is no speed
improvement gained by Import'ing instead of Depend'ing.


Yes. For example, changing Depends to Imports does not improve the
package load time much.

But loading a package in 4 sec seems to be too long.


 system.time(suppressPackageStartupMessages(library(MBASED)))

   user  system elapsed
  4.404   0.100   4.553

For example, it only takes 10% of the time to load ggplot2. It seems
that many bioconductor packages have similar problems.


system.time(suppressPackageStartupMessages(library(ggplot2)))

   user  system elapsed
  0.394   0.036   0.460


The main reason to Depend: on a package is because the symbols
defined by
the package are needed by the end-user. Import'ing a package is

appropriate

when the package provides functionality only relevant to the package

author.

What causes the load time to be too long? Is it because exporting too
many functions from all dependent packages to the global namespace?


There are likely to be specific packages that mis-use Depends; packages

such

as IRanges, GenomicRanges, etc use Depends: as intended, to  provide
functions that are useful to the end user.

Maintainers are certainly encouraged to think carefully about adding
packages providing functionality irrelevant to the end-user to the

Depends:

field. The codetoolsBioC package (available from svn, see
http://bioconductor.org/developers/how-to/source-control/) provides
some
mostly reliable hints to package authors about correctly formulating a
NAMESPACE file to facilitate using Imports: instead of Depends:.

General questions about Bioconductor packages should be addressed to
the
support forum https://support.bioconductor.org.

Questions about Bioconductor development (such as this) should be

addressed

to the bioc-devel mailing list (subscription required)
https://stat.ethz.ch/mailman/listinfo/bioc-devel.

I have cc'd the bioc-devel mailing list; I hope that is ok.




--
Regards,
Peng


__
The information in this email is confidential and inte...{{dropped:20}}


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] bamFlagAsBitMatrix error (?bug)

2014-12-14 Thread Hervé Pagès

Hi,

On 12/14/2014 01:29 PM, Valerie Obenchain wrote:

This should be fixed now (thanks Martin).

We recently deprecated the flag 'isNotPrimaryRead' in favor of
'isNotPrimaryAlignment' (Rsamtools devel).


The new name is actually isSecondaryAlignment.

H.


This affected
GenomicAlignments and others ...

~/b/Rpacks$ grep -lr isNotPrimaryRead *
CoverageView/R/cov.matrix.R
CoverageView/R/cov.interval.R
gage/vignettes/RNA-seqWorkflow.Rnw
hiReadsProcessor/R/hiReadsProcessor.R
QDNAseq/man/binReadCounts.Rd
QDNAseq/R/binReadCounts.R
Rsamtools/NEWS
Rsamtools/man/ScanBamParam-class.Rd
Rsamtools/R/methods-ScanBamParam.R
SplicingGraphs/inst/unitTests/test_countReads-methods.R
SplicingGraphs/inst/scripts/TSPC-utils.R
SplicingGraphs/vignettes/SplicingGraphs.Rnw
SplicingGraphs/man/assignReads.Rd
SplicingGraphs/man/txpath-methods.Rd
systemPipeR/R/utilities.R


These packages should be clear (of this error) on tomorrow's builds.

Thanks.
Val

On 12/14/2014 09:15 AM, Valerie Obenchain wrote:

Hi Sean,

Yes, we are aware of the problem, thanks. Hopefully this will be
resolved for tomorrows builds ... we'll post back when it's fixed.

Val

On 12/14/2014 08:33 AM, Sean Davis wrote:

Hi, Martin, Val, and Herve.

This looks like a little problem with the bitnames in
Rsamtools/GenomicAlignments.  Perhaps this is related to some bitnames
being deprecated?

Thanks,
Sean



library(GenomicAlignments)
sbp = ScanBamParam(which=cds[1:100],flag=scanBamFlag(isDuplicate =
FALSE))
x = readGAlignmentPairs(LOCALBAMS[180],param=sbp)

Error in bamFlagAsBitMatrix(flag1, bitnames = isNotPrimaryRead) :
   invalid bitname(s): isNotPrimaryRead

traceback()

8: stop(invalid bitname(s): , in1string)
7: bamFlagAsBitMatrix(flag1, bitnames = isNotPrimaryRead)
6: .make_GAlignmentPairs_from_GAlignments(gal, use.mcols = use.mcols)
5: readGAlignmentPairsFromBam(bam, character(), use.names = use.names,
param = param, with.which_label = with.which_label)
4: readGAlignmentPairsFromBam(bam, character(), use.names = use.names,
param = param, with.which_label = with.which_label)
3: readGAlignmentPairsFromBam(file = file, use.names = use.names,
...)
2: readGAlignmentPairsFromBam(file = file, use.names = use.names,
...)
1: readGAlignmentPairs(LOCALBAMS[180], param = sbp)

sessionInfo()

R Under development (unstable) (2014-11-18 r66997)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats4parallel  stats graphics  grDevices datasets  utils
methods   base

other attached packages:
  [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.7
  AnnotationDbi_1.29.11
  [4] GenomicAlignments_1.3.14stringr_0.6.2
   plotrix_3.5-10
  [7] pd.huex.1.0.st.v2_3.10.0RSQLite_1.0.0
   DBI_0.3.1
[10] limma_3.23.2oligo_1.31.0
  Biobase_2.27.0
[13] oligoClasses_1.29.3 knitr_1.8
   VariantAnnotation_1.13.19
[16] Rsamtools_1.19.15   Biostrings_2.35.7
   XVector_0.7.3
[19] GenomicRanges_1.19.21   GenomeInfoDb_1.3.7
  IRanges_2.1.33
[22] S4Vectors_0.5.14BiocGenerics_0.13.3
   BiocInstaller_1.17.1

loaded via a namespace (and not attached):
  [1] BBmisc_1.8BSgenome_1.35.8   BatchJobs_1.5
BiocParallel_1.1.9RCurl_1.95-4.5XML_3.98-1.1
  [7] affxparser_1.39.3-1   affyio_1.35.0 base64enc_0.1-2
biomaRt_2.23.5bit_1.1-12bitops_1.0-6
[13] brew_1.0-6checkmate_1.5.1   codetools_0.2-9
digest_0.6.6  evaluate_0.5.5fail_1.2
[19] ff_2.2-13 foreach_1.4.2 formatR_1.0
htmltools_0.2.6   iterators_1.0.7   knitrBootstrap_1.0.0
[25] markdown_0.7.4preprocessCore_1.29.0 rmarkdown_0.3.3
rtracklayer_1.27.6sendmailR_1.2-1   splines_3.2.0
[31] tools_3.2.0   yaml_2.1.13   zlibbioc_1.13.0

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Error when reading BAM files with readGAlignmentPairs

2014-12-12 Thread Hervé Pagès

Hi Leonard,

You can also see this error on the build report:


http://bioconductor.org/checkResults/3.1/bioc-LATEST/GenomicAlignments/zin2-buildsrc.html

This is a known issue that is the consequence of a recent change in
Rsamtools. We'll sort it out.

Thanks for your patience,
H.


On 12/12/2014 12:51 PM, Leonard Goldstein wrote:

Dear Bioc developers,

There seems to be an issue with GenomicAlignments or dependent
packages as I can no longer read BAM files with readGAlignmentPairs
that could be read without problems previously (see example below). I
assume this is a general problem but can provide test files if
necessary.

Leonard



readGAlignmentPairs(file)

Error in .Call2(Integer_explode_bits, x, bitpos, PACKAGE = S4Vectors) :
   'bitpos' must contain values = 1


sessionInfo()

R Under development (unstable) (2014-11-04 r66932)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] GenomicAlignments_1.3.14 Rsamtools_1.19.14Biostrings_2.35.6
[4] XVector_0.7.3GenomicRanges_1.19.21GenomeInfoDb_1.3.7
[7] IRanges_2.1.32   S4Vectors_0.5.14 BiocGenerics_0.13.3

loaded via a namespace (and not attached):
  [1] base64enc_0.1-2BatchJobs_1.5  BBmisc_1.8 
BiocParallel_1.1.9
  [5] bitops_1.0-6   brew_1.0-6 checkmate_1.5.0codetools_0.2-9
  [9] DBI_0.3.1  digest_0.6.6   fail_1.2   foreach_1.4.2
[13] iterators_1.0.7RSQLite_1.0.0  sendmailR_1.2-1stringr_0.6.2
[17] tools_3.2.0zlibbioc_1.13.0




___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet

2014-11-26 Thread Hervé Pagès
/bioc-devel


--
Laurent Gatto
http://cpu.sysbiol.cam.ac.uk/

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



 [[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[[alternative HTML version deleted]]
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Can't seem to use useDevel(); biocLite() for BioC 3.1

2014-11-04 Thread Hervé Pagès

Hi,

On 11/02/2014 10:58 PM, Dan Tenenbaum wrote:



- Original Message -

From: Leonardo Collado Torres lcoll...@jhu.edu
To: Kasper Daniel Hansen kasperdanielhan...@gmail.com
Cc: bioc-devel@r-project.org
Sent: Sunday, November 2, 2014 9:26:25 PM
Subject: Re: [Bioc-devel] Can't seem to use useDevel(); biocLite() for BioC 3.1

Thanks, I incorrectly thought that I was using the same R version as
the Bioc-build machines since the r66923 part matched. Now I
realize
that this tag is used in all of the newest R builds at
http://r.research.att.com/.



It's admittedly confusing;


Agreed. But the error message issued by useDevel() in the current
release adds to the confusion:

   library(BiocInstaller)
  Bioconductor version 3.0 (BiocInstaller 1.16.0), ?biocLite for help
   useDevel()
  Error: 'devel' version already in use

What about something like:

  Error: you need R 3.2 to run BioC 'devel' version

Thanks,
H.


the question of whether to use R-devel with Bioc-devel has a different answer 
every six months, but the answer can always be found here:

http://www.bioconductor.org/developers/how-to/useDevel/

Also, when looking at the devel build report, you should probably focus more on the 
version portion of the R version column than on the SVN revision number.

Dan



On Sun, Nov 2, 2014 at 9:57 PM, Kasper Daniel Hansen
kasperdanielhan...@gmail.com wrote:

You should not be using R-3.1.2 patched with the current devel
version of
Bioconductor; use R-devel.

Best,
Kasper

On Sun, Nov 2, 2014 at 9:32 PM, Leonardo Collado Torres
lcoll...@jhu.edu
wrote:


Hi,

I can't seem to install devel packages via biocLite() and I wonder
if
something is broken or if I'm missing something. For example, take
GenomeInfoDb which is at 1.3.6 and is passing all checks.

The usual code using a fresh R 3.1.2-patched install isn't working
as
it downloads the latest release version (1.2.2):


source(http://bioconductor.org/biocLite.R;)

Bioconductor version 3.0 (BiocInstaller 1.16.0), ?biocLite for
help
## Was expecting 3.1 here

useDevel()

Error: 'devel' version already in use

biocLite('GenomeInfoDb')

BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.0 (BiocInstaller 1.16.0), R version
3.1.2.
Installing package(s) 'GenomeInfoDb'
trying URL
'http://bioconductor.org/packages/3.0/bioc/bin/macosx/contrib/3.1/GenomeInfoDb_1.2.2.tgz'
Content type 'application/x-gzip' length 404120 bytes (394 KB)
opened URL
==
downloaded 394 KB


The downloaded binary packages are in

/var/folders/cx/n9s558kx6fb7jf5z_pgszgb8gn/T//Rtmpxp29xo/downloaded_packages


sessionInfo()

R version 3.1.2 Patched (2014-11-01 r66923)
## Note that it matches the current R version used by the
Bioc-devel
build machines
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
   base

other attached packages:
[1] BiocInstaller_1.16.0 colorout_1.0-2

loaded via a namespace (and not attached):
[1] tools_3.1.2



For my computer it's not problem because I can download the pkg
via
svn and install locally. But it breaks my tests in TravisCI which
relies on biocLite().

Cheers,
Leo

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel





___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Rd] ScalarLogical and setAttrib

2014-11-01 Thread Hervé Pagès

Hi,

The problem is better illustrated with:

  library(inline)
  test2 - cfunction(body = '
SEXP success = PROTECT(ScalarLogical(0));
setAttrib(success, install(foo), mkString(bar));
UNPROTECT(1);
return success;
  ')
  test3 - cfunction(body = '
SEXP success = PROTECT(ScalarLogical(0));
LOGICAL(success)[0] = 1;
UNPROTECT(1);
return success;
  ')

Then:

   test2()
  [1] FALSE
  attr(,foo)
  [1] bar

   test3()
  [1] TRUE
  attr(,foo)
  [1] bar

Looks like someone assumed that the SEXP returned by ScalarLogical()
would never be touched before being returned to R (which is probably
true 99% of the time but not 100%).

Cheers,
H.


On 10/31/2014 03:58 PM, Jeroen Ooms wrote:

Is it expected that attributes set on a LGLSXP created by
ScalarLogical will apply to all future objects created by
ScalarLogical as well? For example: the 'test1' function below returns
FALSE and 'test2' returns FALSE with an attribute:

   library(inline)
   test1 - cfunction(body = 'return ScalarLogical(0);')
   test2 - cfunction(body = '
 SEXP success = PROTECT(ScalarLogical(0));
 setAttrib(success, install(foo), mkString(bar));
 UNPROTECT(1);
 return success;
   ')

However after running test2(), then test1() will also return the attribute:

test1()
   [1] FALSE
test2()
   [1] FALSE
   attr(,foo)
   [1] bar
test1()
   [1] FALSE
   attr(,foo)
   [1] bar

It seems like ScalarLogical returns a singleton object, which is not
the case for ScalarInteger or ScalarReal. I am currently working
around this using duplicate(ScalarLogical(0)), but was quite surprised
by this behavior of ScalarLogical.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ScalarLogical and setAttrib

2014-11-01 Thread Hervé Pagès

Hi,

The problem is better illustrated with:

  library(inline)
  test2 - cfunction(body = '
SEXP success = PROTECT(ScalarLogical(0));
setAttrib(success, install(foo), mkString(bar));
UNPROTECT(1);
return success;
  ')
  test3 - cfunction(body = '
SEXP success = PROTECT(ScalarLogical(0));
LOGICAL(success)[0] = 1;
UNPROTECT(1);
return success;
  ')

Then:

   test2()
  [1] FALSE
  attr(,foo)
  [1] bar

   test3()
  [1] TRUE
  attr(,foo)
  [1] bar

Looks like someone assumed that the SEXP returned by ScalarLogical()
would never be touched before being returned to R (which is probably
true 99% of the time but not 100%).

Cheers,
H.


On 10/31/2014 03:58 PM, Jeroen Ooms wrote:

Is it expected that attributes set on a LGLSXP created by
ScalarLogical will apply to all future objects created by
ScalarLogical as well? For example: the 'test1' function below returns
FALSE and 'test2' returns FALSE with an attribute:

   library(inline)
   test1 - cfunction(body = 'return ScalarLogical(0);')
   test2 - cfunction(body = '
 SEXP success = PROTECT(ScalarLogical(0));
 setAttrib(success, install(foo), mkString(bar));
 UNPROTECT(1);
 return success;
   ')

However after running test2(), then test1() will also return the attribute:

test1()
   [1] FALSE
test2()
   [1] FALSE
   attr(,foo)
   [1] bar
test1()
   [1] FALSE
   attr(,foo)
   [1] bar

It seems like ScalarLogical returns a singleton object, which is not
the case for ScalarInteger or ScalarReal. I am currently working
around this using duplicate(ScalarLogical(0)), but was quite surprised
by this behavior of ScalarLogical.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] getting a listing of all src packages

2014-10-30 Thread Hervé Pagès

Hi Darin,

We're grateful that you're willing to do this!

Please be aware that a particular version of BioC is aimed
to be used with a particular version of R only. E.g. the current
release (BioC 3.0) is aimed to be used with R-3.1.1. Also some
BioC packages depend on particular versions of CRAN packages.
Only the CRAN packages used by our build system have been tested
to be compatible with BioC 3.0:

  http://bioconductor.org/checkResults/3.0/bioc-LATEST/

You can see the list of CRAN packages used by the latest builds
by clicking on the link under 'Installed pkgs' in the top table.

Note that, because these builds are currently live (they run every
night and will stop in April 2015), the CRAN packages they use are
always the latest packages available on CRAN.

Cheers,
H.


On 10/30/2014 06:22 AM, Sean Davis wrote:

Hi, Darin.

You might find these instructions useful:

http://www.bioconductor.org/about/mirrors/mirror-how-to/

Basically, you can use rsync to get the repository and keep it up-to-date.

Sean

On Thu, Oct 30, 2014 at 9:17 AM, Darin Perusich da...@darins.net wrote:


Hello All,

Would it be possible to get a listing of all package sources from
/packages/release/bioc/src/contrib or make the location browseable?
I'm trying to download all the bioconductor tarball's so I can create
rpm packages of them for inclusion in the openSUSE Build Service. I've
already done this for all the CRAN packages and would like to include
bioconductor packages as well.

https://build.opensuse.org/project/show/devel:languages:R:released
https://build.opensuse.org/project/show/devel:languages:R:CRAN-packages

--
Later,
Darin

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] depends on packages providing classes

2014-10-28 Thread Hervé Pagès
 typically be of great utility.

Given your list above, I would think that depending on GenomicRanges
would
often
be sufficient, and IRanges/S4Vectors would not require dependency
assertion.  I would
think that GenomeInfoDb should be a voluntary attachment for a specific
session.

These are just my guesses -- I doubt there will be complete consensus,
but
I have
started to think very critically about using Depends, and I think it is
better when its
use is minimized.



That of course leads to the R CMD check NOTE on depending on too many
packages I guess I should ignore that one.

Best,
Kasper

 [[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



 [[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel





--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099






[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] depends on packages providing classes

2014-10-28 Thread Hervé Pagès



On 10/28/2014 12:42 PM, Vincent Carey wrote:



On Tue, Oct 28, 2014 at 2:29 PM, Hervé Pagès hpa...@fredhutch.org
mailto:hpa...@fredhutch.org wrote:

Hi,

On 10/28/2014 08:48 AM, Vincent Carey wrote:

On Tue, Oct 28, 2014 at 11:23 AM, Kasper Daniel Hansen 
kasperdanielhan...@gmail.com
mailto:kasperdanielhan...@gmail.com wrote:

Well, first I want to make sure that there is not something
special
regarding S4 methods and classes. I have a feeling that they
are a special
case.

Second, while I agree with Jim's general opinion, it is a
little bit
different when I have return objects which are defined in
other packages.
If I don't depend on this other package, the user is hosed
wrt. the return
object, unless I manually export all classes from this other


In what sense?  If you return an instance of GRanges, certain
things can be
done
even if GenomicRanges is not attached.


Yes certain things maybe, but it's hard to predict which ones.

  You can get values of slots, for
example.

With the following little package

%vjcair cat foo/NAMESPACE

importFrom(IRanges, IRanges)

importClassesFrom(__GenomicRanges, GRanges)

importFrom(GenomicRanges, GRanges)

export(myfun)



%vjcair cat foo/DESCRIPTION

Package: foo

Title: foo

Version: 0.0.0

Author: VJ Carey st...@channing.harvard.edu
mailto:st...@channing.harvard.edu

Description:

Suggests:

Depends:

Imports: GenomicRanges

Maintainer: VJ Carey st...@channing.harvard.edu
mailto:st...@channing.harvard.edu

License: Private

LazyLoad: yes



%vjcair cat foo/R/*

myfun = function(seqnames=1, ranges=IRanges(1,2), ...)

 GRanges(seqnames=seqnames, ranges=ranges, ...)


The following works:


library(foo)


x = myfun()


x


GRanges object with 1 range and 0 metadata columns:

seqnamesranges strand

   Rle IRanges  Rle

[1]1[1, 2]  *

---

seqinfo: 1 sequence from an unspecified genome; no seqlengths


So the show method works, even though I have not touched it.  (I
did not

expect it to work, in fact.)


Exactly. Let's call it luck ;-)

  Additionally, I can get access to slots.


The end user should never try to access slots directly but use getters
and setters instead. And most getters and setters for GRanges objects
are defined and documented in the GenomicRanges package. Those that are
not are defined in packages that GenomicRanges depends on.

  But
ranges()

fails.  If I, the user, want to use it, I need to arrange for that.


IMO if your package returns a GRanges object to the user, then the user
should be able to access the man page for GRanges objects with ?GRanges.


Oddly enough, that seems to be incorrect.  I added a man page to foo
that has
a \link[GenomicRanges]{GRanges-class}.  I ran help.start and the cross
reference
from my man page succeeds.  Furthermore with the sessionInfo below, ?GRanges
succeeds at the CLI.


Did you try to run example(GRanges)? I'm not sure that will work.

For example after I do library(rtracklayer), I can indeed do
?DNAStringSet at the command line (I'm surprised this works), but
then example(DNAStringSet) fails:

   example(DNAStringSet)
  Warning message:
  In example(DNAStringSet) : no help found for ‘DNAStringSet’

I'm also surprised this is just a warning but that's another story...

H.


 I am not trying to defend the NOTE but the
principle of minimizing
Depends declarations needs to be considered critically, and I am just
exploring the space.

  ?GRanges  # it worked as usual in the tty

  sessionInfo()

R version 3.1.1 (2014-07-10)

Platform: x86_64-apple-darwin13.1.0 (64-bit)


locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8


attached base packages:

[1] stats graphics  grDevices datasets  utils tools methods

[8] base


other attached packages:

[1] foo_0.0.0rmarkdown_0.3.8  knitr_1.6

[4] weaver_1.31.0codetools_0.2-9  digest_0.6.4

[7] BiocInstaller_1.16.0


loaded via a namespace (and not attached):

  [1] BiocGenerics_0.11.5   evaluate_0.5.5formatR_1.0

  [4] GenomeInfoDb_1.1.26   GenomicRanges_1.17.48 htmltools_0.2.6

  [7] IRanges_1.99.32   parallel_3.1.1S4Vectors_0.2.8

[10] stats4_3.1.1  stringr_0.6.2 XVector_0.5.8

And that works only if the GenomicRanges package is attached. Attaching
GenomicRanges will also attach other packages that GenomicRanges depends

Re: [Rd] No error when assigning values to an empty vector/matrix/array

2014-10-24 Thread Hervé Pagès

Hi Henrik,

On 10/23/2014 08:10 PM, Henrik Bengtsson wrote:

Assigning one or more values to a vector/matrix/array x for which
length(x) == 0 gives no error, e.g.


x - integer(0)
x[] - 1:2



x - matrix(nrow=0, ncol=1)
x[] - 1:2
x[,1] - 1:2



x - array(dim=c(0,1,1))
x[] - 1:2
x[,1,1] - 1:2


whereas


x - integer(1)
x[] - 1:2

Warning message:
In x[] - 1:2 :
   number of items to replace is not a multiple of replacement length

x - matrix(nrow=1, ncol=1)
x[] - 1:2

Warning message:
In x[] - 1:2 :
   number of items to replace is not a multiple of replacement length

x[,1] - 1:2

Error in x[, 1] - 1:2 :
   number of items to replace is not a multiple of replacement length

x - array(dim=c(1,1,1))
x[] - 1:2

Warning message:
In x[] - 1:2 :
   number of items to replace is not a multiple of replacement length

x[,1,1] - 1:2

Error in x[, 1, 1] - 1:2 :
   number of items to replace is not a multiple of replacement length

Is this intended by design or is it a bug that should be reported?


Since [- supports truncating of the right value, why an exception
should be made when the left vector has length 0?

Also note that these warnings or errors are complaining that the number
of items to replace (left length) is not a multiple of replacement
length (right length). This suggests that when the left length is a
multiple of the right length, everything is fine.
And this is actually the case when the left length is 0. Because
0 is a multiple of anything. So in that case, the right value is
truncated to length 0 and no warning is issued. Makes sense to me.

Cheers,
H.



/Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] No error when assigning values to an empty vector/matrix/array

2014-10-24 Thread Hervé Pagès

Hi,

On 10/24/2014 06:58 AM, S Ellison wrote:

Also note that these warnings or errors are complaining that the
number of items to replace (left length) is not a multiple of
replacement length (right length). This suggests that when the left
length is a multiple of the right length, everything is fine.
And this is actually the case when the left length is 0. Because
0 is a multiple of anything. So in that case, the right value is
truncated to length 0 and no warning is issued. Makes sense to me.


Thanks Hervé, you gave the perfect explanation/rationale for this being
consistent.


This explains why a check for exact multiple of replacement length does not 
trigger a warning, but surely that is not sensible in the length 0 case. In all 
other cases, this check warns when there will be truncation of the replacement, 
and that seems to me the sensible intent of the check. A silent truncation to 
nothing is surely not the intended behaviour.


Yes truncation should not be silent. But truncation to length zero
should not be seen as a special case either. What would be more
sensible is that [- recognizes the 2 distinct situations that
deserve a warning:
  1. Truncation (i.e. when left length is  right length).
  2. Left length is  right length AND left length is not a multiple
 of right length.
Then the warning we get should be clear about which situation was
detected. So we would get a sensible warning all the time, even
when left length is 0.

H.


I can't help feeling that the 'check for multiple of length' was a neat 
portmanteau check for several possible problems when recycling is allowed, but 
that the possibility of assigning to a length 0 object was not considered.

I'd suggest logging it as an issue to for R-core to at least look at and either 
to fix or to at least warn of in documentation.

S Ellison




***
This email and any attachments are confidential. Any u...{{dropped:26}}


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] Documentation on how to update a BioC experimental package?

2014-10-23 Thread Hervé Pagès

Hi Henrik,

FWIW there is some additional information here:

  https://stat.ethz.ch/pipermail/bioc-devel/2014-January/005156.html

In particular the section starting at One important difference with
the svn software repository provides some useful hints on how to
commit changes to the data (see step 4.).

H.

On 10/23/2014 07:46 PM, Dan Tenenbaum wrote:



- Original Message -

From: Vincent Carey st...@channing.harvard.edu
To: Henrik Bengtsson h...@biostat.ucsf.edu
Cc: bioC-devel bioc-de...@stat.math.ethz.ch
Sent: Thursday, October 23, 2014 7:45:13 PM
Subject: Re: [Bioc-devel] Documentation on how to update a BioC experimental
package?

There is a README.txt in the pkgs folder.

I will attach it.  I think this is accurate, but there may be
something
else on the site.



See the Experiment Data Packages section of

http://www.bioconductor.org/developers/how-to/source-control/

Dan





On Thu, Oct 23, 2014 at 10:23 PM, Henrik Bengtsson
h...@biostat.ucsf.edu
wrote:


It's been a while since I worked with experimental packages.  Where
can I find documentation on how to (Subversion) update our
AffymetrixDataTestFiles package with additional data files?  All I
know is that the SVN repository only contains a stub of the package
and
http://www.bioconductor.org/developers/package-guidelines/#package-types
provides little information and basically only point to the devel
mailing list.

Thanks,

Henrik

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Why be default 'D' is not dropped from coverage?

2014-10-20 Thread Hervé Pagès

Hi Leo,

On 10/18/2014 10:50 AM, Leonardo Collado Torres wrote:

Hi,

A collaborator of mine is working on a new software and we while we were
doing a sanity check to compare the base-level coverage from BAM files
and bigWig files generated from his software we realized that by default
bases corresponding to a 'D' on the CIGAR string get counted when
reading coverage from BAM files.

That is:

  getMethod('coverage', 'GAlignments')
Method Definition:

function (x, shift = 0L, width = NULL, weight = 1L, ...)
{
 .local - function (x, shift = 0L, width = NULL, weight = 1L,
 method = c(auto, sort, hash), drop.D.ranges = FALSE)
 coverage(grglist(x, drop.D.ranges = drop.D.ranges), shift = shift,
 width = width, weight = weight, method = method)
 .local(x, shift, width, weight, ...)
}
environment: namespace:GenomicAlignments

Signatures:
 x
target  GAlignments
defined GAlignments
  packageVersion('GenomicAlignments')
[1] ‘1.2.0’


You can see that this is default elsewhere, for example:

  extractAlignmentRangesOnReference
function (cigar, pos = 1L, drop.D.ranges = FALSE, f = NULL)
{
 if (!isTRUEorFALSE(drop.D.ranges))
 stop('drop.D.ranges' must be TRUE or FALSE)
 if (drop.D.ranges) {
 ops - c(M, =, X, I)
 }
 else {
 ops - c(M, =, X, I, D)
 }
 cigarRangesAlongReferenceSpace(cigar, flag = NULL, pos = pos,
 f = f, ops = ops, drop.empty.ranges = FALSE, reduce.ranges = TRUE)
}
environment: namespace:GenomicAlignments


What is the rationale for setting `drop.D.ranges` by default to FALSE?


Because last time I checked (but that was 3 or 4 years ago), that's what
Samtools pileup was doing.

Cheers,
H.



Thanks,
Leo




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] runnable examples for internal function ?

2014-10-16 Thread Hervé Pagès

Hi Tiphaine,

On 10/15/2014 06:30 AM, Martin, Tiphaine wrote:


Hi Herve,

Thank you for your email. very useful.
But I updated my NAMESPACE in order to list only functions that I want to 
export.
Even if I did that and if in addition, I keep different man files for functions 
that I do not want to export, I have this error message.
But if I removed them, I don't have error messages.
I kept different man files for functions that I do not want to export to help 
my colleagues. It seems that currently, it is not recommended in Bioconductor.


By colleagues I guess you mean co-developers, not users of
your package right?

Yes it seems reasonable to expect 'R CMD BiocCheck' to only check
\value and \example for man pages that document exported things.
We should fix that.

Note that it's questionable whether it's a good idea or not to keep
around man pages for non exported things. IMO it's better to document
internal helpers using in-source comments. And since you already
have those in-source comments in place anyway (because you're
using roxygen2), all you need to do is find a way to block
roxygen2 from generating the man pages for these internal helpers.
Co-developers of your package will still be able to see useful
information and that information will be placed where developers
expect it to be: in the source code itself.

My 2 cents.

Cheers,
H.



thank you for your help,

Tiphaine

From: Hervé Pagès hpa...@fhcrc.org
Sent: 14 October 2014 03:02
To: Martin, Tiphaine; Michael Lawrence; Dan Tenenbaum
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] runnable examples for internal function ?

Hi Tiphaine,

On 10/13/2014 04:31 PM, Martin, Tiphaine wrote:

Hi,


I wrote a list of functions used in my package called coMET. I decided with my 
colleagues to try to publish it in Bioconductor. Currently, it has not been yet 
submitted to bioconductor because I would like to be sure that it satisfies all 
your guidelines.
A list of functions is useful only internally. I saw that there are two methods to remove 
the function from the package index and to disable some of their automated tests: the 
first method is to put a dot as first letter, the second method is to put the 
internal keyword 
(http://cran.r-project.org/web/packages/roxygen2/vignettes/rdkeywords.html).


That link is to the documentation of the roxygen2 package. Are you
using roxygen2 to develop your package? You didn't say so.
You say that you saw that prefixing the function name with a dot
or putting the 'internal' keyword in the man page will 'remove the
function from the package index and disable some of its automated
tests'. I guess that's something you saw in the roxygen2 documentation
right? I don't use roxygene2 myself so I don't know what's the roxygen2
way to control what's exposed or not to the user. However I know many
BioC developers use it for their package so maybe they'll be able to
provide some good advice here.

Keep in mind that using a tool like roxygen2 adds an extra layer of
convenience when developing your package but, unfortunately, it doesn't
completely exempt you from learning and understanding some basic
concepts like NAMESPACE, man pages aliases, keywords, etc...
The ultimate reference for these things is still the Writing R
Extensions manual:

http://cran.r-project.org/doc/manuals/r-release/R-exts.html

So, and FWIW, below I'll describe quickly how you need to proceed when
you don't use a fancy tool like roxygen2 to automatically generate
the NAMESPACE and man pages in your package:

1) The real true way to not expose a function to the user is to not
 export it. What one exports from a package is controlled via
 the NAMESPACE file. So first you need to learn about how to use
 the NAMESPACE file to control exactly what you want to expose to
 the user. See Writing R Extensions manual for the details.

2) Every symbol that is exported needs to be documented, that is, there
 must be a man page with an alias for this symbol. And of course
 opening the man page at the R prompt with ?symbol should display
 useful information about that symbol.
 'R CMD check' is the tool that will check that every exported
 symbol is documented. It will also perform many checks on the
 man pages to make sure that they are formatted properly.

3) As Dan said previously, any function that is exported also needs
 to have runnable examples and a \value section in its man page.
 Note that this is a Bioconductor requirement, 'R CMD check'
 doesn't check that. 'R CMD BiocCheck' is the tool that will
 perform this check and any other Bioconductor specific check.


I decided to use the second method to reduce to rewrite now all my package.


Note that, for plain package development (i.e. without using a
a fancy tool like roxygen2), using the internal keyword in a man
page has no effect on whether the function is exported or not.




When I run the new

Re: [Bioc-devel] warning when merging disjoint sets of seqlevels

2014-10-13 Thread Hervé Pagès

Hi Michael,

On 10/12/2014 02:02 PM, Michael Lawrence wrote:

This recently became a warning, and I am not sure why. Yes, in the overlap
case, that might be something to worry about. But a perfectly reasonable
use case of merge,Seqinfo is to merge two disjoint sets of seqlevels. Now
we're forced to use suppressWarnings() for that.


For that use case, nothing has changed, we've always had a warning and
we still have it.

With BioC 2.14:

   merge(Seqinfo(chr1), Seqinfo(chr2))
  Seqinfo of length 2
  seqnames seqlengths isCircular genome
  chr1 NA NA   NA
  chr2 NA NA   NA
  Warning message:
  In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chr1
- in 'y': chr2
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).

With BioC 3.0:

   merge(Seqinfo(chr1), Seqinfo(chr2))
  Seqinfo object with 2 sequences from an unspecified genome; no 
seqlengths:

seqnames seqlengths isCircular genome
chr1 NA NA   NA
chr2 NA NA   NA
  Warning message:
  In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)

It's just that the warning is different. The intention was to make
the new warning more to the point.

H.



Michael

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] warning when merging disjoint sets of seqlevels

2014-10-13 Thread Hervé Pagès

On 10/13/2014 07:45 AM, Michael Lawrence wrote:

But the behavior has changed for this case:

merge(Seqinfo(chr1), Seqinfo())

Before it was each has seqlevels not in the other but now it is these
two sets are disjoint. Subtle difference, but I think it's important?


What has changed is that now we get the warning for the edge-case where
one of the 2 Seqinfo objects is empty. In that particular case the
warning is arguably not useful so I just removed it.

H.




On Sun, Oct 12, 2014 at 11:46 PM, Hervé Pagès hpa...@fhcrc.org
mailto:hpa...@fhcrc.org wrote:

Hi Michael,


On 10/12/2014 02:02 PM, Michael Lawrence wrote:

This recently became a warning, and I am not sure why. Yes, in
the overlap
case, that might be something to worry about. But a perfectly
reasonable
use case of merge,Seqinfo is to merge two disjoint sets of
seqlevels. Now
we're forced to use suppressWarnings() for that.


For that use case, nothing has changed, we've always had a warning and
we still have it.

With BioC 2.14:

merge(Seqinfo(chr1), Seqinfo(chr2))
   Seqinfo of length 2
   seqnames seqlengths isCircular genome
   chr1 NA NA   NA
   chr2 NA NA   NA
   Warning message:
   In .Seqinfo.mergexy(x, y) :
 Each of the 2 combined objects has sequence levels not in the
other:
 - in 'x': chr1
 - in 'y': chr2
 Make sure to always combine/compare objects based on the same
reference
 genome (use suppressWarnings() to suppress this warning).

With BioC 3.0:

merge(Seqinfo(chr1), Seqinfo(chr2))
   Seqinfo object with 2 sequences from an unspecified genome; no
seqlengths:
 seqnames seqlengths isCircular genome
 chr1 NA NA   NA
 chr2 NA NA   NA
   Warning message:
   In .Seqinfo.mergexy(x, y) :
 The 2 combined objects have no sequence levels in common. (Use
 suppressWarnings() to suppress this warning.)

It's just that the warning is different. The intention was to make
the new warning more to the point.

H.


Michael

 [[alternative HTML version deleted]]

_
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org mailto:hpa...@fhcrc.org
Phone: (206) 667-5791 tel:%28206%29%20667-5791
Fax: (206) 667-1319 tel:%28206%29%20667-1319




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] runnable examples for internal function ?

2014-10-13 Thread Hervé Pagès
 it 
to avoid listing functions in the index (RGtk2 has thousands of functions). But 
in general, it's not a good idea to be exporting things that are truly 
internal. Perhaps internal methods on exported generics, but even then, it's 
probably best to keep the aliases with the generic, or something.


On Mon, Oct 13, 2014 at 1:33 PM, Dan Tenenbaum 
dtene...@fhcrc.orgmailto:dtene...@fhcrc.org wrote:


- Original Message -

From: Tiphaine Martin 
tiphaine.mar...@kcl.ac.ukmailto:tiphaine.mar...@kcl.ac.uk
To: bioc-devel@r-project.orgmailto:bioc-devel@r-project.org
Sent: Monday, October 13, 2014 1:29:08 PM
Subject: [Bioc-devel] runnable examples for internal function ?

Hi,


I would like to know if I need to add a runnable example for each
function that has keyword either internal or not.



I don't know what you mean by this, but maybe I should.



I ask that because with BiocCheck, version 1.0.2, I had a message for
function with keyword internal such as 

Checking exported objects have runnable examples...
 * CONSIDER: Adding runnable examples to the following man pages
   which document exported objects:



If the function is exported, it must have a runnable example.



and now, I have an error message with BiocCheck, version 1.1.18

* Checking exported objects have runnable examples...
Error in if (line == ## No test:  || insideDontTest || line == ##
End(No test)) { :
   missing value where TRUE/FALSE needed
Calls: Anonymous ... checkExportsAreDocumented -
doesManPageHaveRunnableExample - removeDontTest
Execution halted



Can you file an issue at https://github.com/Bioconductor/BiocCheck/issues and 
include the name of the package?
I will get to it as soon as I can.

Dan




Regards,

Tiphaine




Tiphaine Martin
PhD Research Student | King's College
The Department of Twin Research  Genetic Epidemiology | Genetics 
Molecular Medicine Division
St Thomas' Hospital
4th Floor, Block D, South Wing
SE1 7EH, London
United Kingdom

email : tiphaine.mar...@kcl.ac.ukmailto:tiphaine.mar...@kcl.ac.uk
Fax: +44 (0) 207 188 6761tel:%2B44%20%280%29%20207%20188%206761

   [[alternative HTML version deleted]]

___
Bioc-devel@r-project.orgmailto:Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



___
Bioc-devel@r-project.orgmailto:Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] NAMESPACE question

2014-10-09 Thread Hervé Pagès
.


Is there a reproducible example? I see in your code there are several
places where you require() or library() various packages. I think one of
these Depends: on GenomicRanges, and the messages you see are the effect
of moving GenomicRanges from 'loaded' to 'attached'. You can see the
effect with

library(qpgraph)
sessionInfo() ## GenomicRanges loaded but not attached
library(GenomicRanges) ## information about the package being attached

Probably in your code you do not actually want to require() ad hoc
packages and influence the user search path (and implicitly rely on
search path order for correct functionality), but rather to
requireNamespace(foo); foo::fun(...) (or possibly loadNamespace()).

Complicated!

Martin



robert.

On 10/07/2014 05:05 PM, Michael Lawrence wrote:

Does that happen with the other methods or just [? As a last resort,
you could just drop the import (because [ is a primitive, it should
just work).

On Tue, Oct 7, 2014 at 3:08 AM, Robert Castelo robert.cast...@upf.edu
mailto:robert.cast...@upf.edu wrote:

hi Martin,

On 10/06/2014 07:24 PM, Martin Morgan wrote:
[...]

There are two 'as.vector' generics, one defined in Matrix and one in
BiocGenerics (and made available via IRanges). These generics have
different methods

 showMethods(Matrix::as.vector)
Function: as.vector (package base)
x=abIndex, mode=ANY
x=abIndex, mode=character
x=ANY, mode=ANY
x=dgCMatrix, mode=missing
x=dgeMatrix, mode=missing
x=diagonalMatrix, mode=missing
x=dsCMatrix, mode=missing
x=ldenseMatrix, mode=missing
x=Matrix, mode=missing
x=ndenseMatrix, mode=missing
x=sparseVector, mode=character
x=sparseVector, mode=missing

 showMethods(BiocGenerics::as.__vector)
Function: as.vector (package BiocGenerics)
x=ANY
x=AtomicList
x=Rle
x=XDouble
x=XInteger
x=XRaw
x=XString
x=XStringSet

so it's important that your code clearly distinguish between
generics.
One possibility is to remove importMethodsFrom(IRanges,
as.vector) from
the NAMESPACE, and explicitly use IRanges::as.vector(...) in
your code.


ok, i've done this as it is the easiest at the moment to meet the
release schedule. i guess that in the future i should try to avoid
using the '::' operator by importing exclusively what is needed from
each package.

codetoolsBioC::__writeNamespaceImports(__qpgraph) might
provide you with
some guidance (it's not 100% reliable; available via svn at

https://hedgehog.fhcrc.org/__bioconductor/trunk/madman/__Rpacks/codetoolsBioC



https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/codetoolsBioC)


about what functionality is being imported.


thanks for the heads up about codetoolsBioC, i've tried it out and
seen that some of the suggested imports are not necessary but some
others i was really missing them (which makes me wonder how was it
possible that he package did not break at those points).

one further question related to NAMESPACE. i subset GRanges objects
in the package via the '[' operator, i've included this into the
NAMESPACE file as:

importMethodsFrom(__GenomicRanges,
c, cbind, rbind,
mcols-, start, end, strand, sort,
[, [-, [[, [[-, $, $-)

however, when the package reaches a subset operation x[i] with x
being a GRanges object, an entire package loading sequence starts:

Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’
[... etc ...]

which may look a bit odd to the user. for every other imported
method the package uses them silently without loading the
corresponding package, am i importing '[' for GRanges objects from
the wrong package? is there a way to import '[' so that my package
can use it without triggering that package loading sequence?


thanks again!
robert.


_
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org mailing
list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
https://stat.ethz.ch/mailman/listinfo/bioc-devel














--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] a warning we may not need any more?

2014-10-09 Thread Hervé Pagès

Hi Vince,

Good idea. I made that change in GenomeInfoDb 1.1.26:

  library(GenomeInfoDb)
  x - Seqinfo(seqnames=c(chr1, chr2, chr3, chrM),
   seqlengths=c(100, 200, NA, 15),
   isCircular=c(NA, FALSE, FALSE, TRUE),
   genome=toy)

  y - Seqinfo(seqnames=c(chr3, chr4, chrM),
   seqlengths=c(300, NA, 15))

  merge(x, y)  # warning

  genome(y) - genome(x)[[1]]
  merge(x, y)  # no warning

Cheers,
H.

On 10/09/2014 12:04 PM, Vincent Carey wrote:

*Warning message:*

*In .Seqinfo.mergexy(x, y) :*

*  Each of the 2 combined objects has sequence levels not in the other:*

*  - in 'x': chr4, chr16, chr10, chr11, chr18, chr20, chr22*

*  - in 'y': chrY*

*  Make sure to always combine/compare objects based on the same reference*

*  genome (use suppressWarnings() to suppress this warning).*


*It seems to me that if we can check equality of the assigned genomes, we
should*

*not issue this warning.*

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] S4Vectors unit tests

2014-10-07 Thread Hervé Pagès

On 10/07/2014 10:35 AM, Michael Lawrence wrote:

Looks like we still have to move over the DataFrame, etc tests from IRanges?


Yes this and other leftovers.

H.



Michael

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] minor suggestion for BiocStyle?

2014-10-02 Thread Hervé Pagès
://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BSgenome forge file input file restrictions

2014-09-29 Thread Hervé Pagès

Hi Florian,

True. These restrictions don't make much sense these days anymore!
Some of them are gone in the devel version of BSgenome. The
BSgenomeForge vignette in devel now says:

  The sequence data must be in a single twoBit file (e.g. musFur1.2bit)
  or in a collection of FASTA files (possibly gzip-compressed).

I guess I should also support a single FASTA file.

H.

On 09/29/2014 01:36 AM, Hahne, Florian wrote:

Hi all,
I was wondering whether some of the rather arbitrary restrictions on input 
files for the process of forging as new Bsgenome package could be liftet. In 
particular:

Why do we need all chromosomes in individual files? Couldn�t the function be 
smart enough to just extract the relevant bits from a single file containing 
all chromosomes? Or even from several such files?

Why are gzipped files not allowed? Pretty much all tools in Biostrings seem to 
be able to deal with gzipped fasta files these days.

Thanks,
Florian

[[alternative HTML version deleted]]



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Importing summary from AnnotationDbi, Category, GOstats

2014-09-23 Thread Hervé Pagès

Hi Jim,

On 09/23/2014 09:42 AM, James W. MacDonald wrote:

There is an issue on the support site having to do with the inability to
import the summary function from the namespaces from the packages listed in
the subject line. I see the same problem/errors with my affycoretools
package.

When you load ReportingTools, you get the following warnings:

Warning messages:
1: In namespaceImportMethods(ns, loadNamespace(j - imp[[1L]], c(lib.loc,  :
   No generic function found corresponding to requested imported methods for
summary from package AnnotationDbi (malformed exports?)
2: In namespaceImportMethods(ns, loadNamespace(j - imp[[1L]], c(lib.loc,  :
   No generic function found corresponding to requested imported methods for
summary from package Category (malformed exports?)
3: In namespaceImportMethods(ns, loadNamespace(j - imp[[1L]], c(lib.loc,  :
   No generic function found corresponding to requested imported methods for
summary from package GOstats (malformed exports?)


Is this because of the following changes?


r94092 | hpa...@fhcrc.org | 2014-09-12 18:02:34 -0700 (Fri, 12 Sep 2014) |
7 lines
Changed paths:
M /trunk/madman/Rpacks/AnnotationDbi/DESCRIPTION
M /trunk/madman/Rpacks/AnnotationDbi/NAMESPACE
M /trunk/madman/Rpacks/AnnotationDbi/R/createAnnObjs-utils.R
M /trunk/madman/Rpacks/AnnotationDbi/R/createAnnObjs.NCBIORG_DBs.R
M /trunk/madman/Rpacks/AnnotationDbi/R/methods-Inparanoid.R
M /trunk/madman/Rpacks/AnnotationDbi/R/methods-Inparanoid8.R
M /trunk/madman/Rpacks/AnnotationDbi/R/methods-geneCentricDbs.R

- Drop dependency on IRanges (stuff from IRanges needed by AnnotationDbi is
   now in S4Vectors).
- Add dependency on stats4 and import summary() from it. This is an S4
   generic and AnnotationDbi defines and exports S4 methods for it.
- Address the A package almost never needs to use ::: for its own objects
   NOTE from 'R CMD check'.

which included

Index: NAMESPACE
===
--- NAMESPACE (revision 94000)
+++ NAMESPACE (working copy)
@@ -1,11 +1,11 @@
  import(methods)
  import(utils)
+importFrom(stats4, summary)
  import(Biobase)
  import(DBI)
  import(RSQLite)
  import(BiocGenerics)
-importFrom(S4Vectors, isTRUEorFALSE, isSingleString, metadata)
-importFrom(IRanges, elementLengths)
+importFrom(S4Vectors, isTRUEorFALSE, isSingleString, metadata,
elementLengths)

  importFrom(GenomeInfoDb, genomeStyles, extractSeqlevels, mapSeqlevels)
  exportClasses(
@@ -169,9 +169,6 @@
  intraIDMapper,
  idConverter,

-#Needs to be exported from RSQLite
-summary,
-
  ## AnnotationDb
  metadata,

so it appears summary is no longer exported?


Yes, no need to export summary from AnnotationDbi because summary() is a
generic defined in the stats4 package. Before I made that change,
AnnotationDbi was implicitly promoting base::summary() to an S4 generic
which was then a different generic from stats4::summary(). Having these
2  distinct summary() generics was causing the usual troubles when a
user had AnnotationDbi and stats4 in their search path. Also it was
breaking the attract package in some obscure way.

The warnings you get when you load ReportingTools will hopefully
go away if you reinstall the package. Let me know if it doesn't.

Hope this helps,
H.



Best,

Jim






--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] need for consistent coordinate mapping API

2014-09-19 Thread Hervé Pagès

Hi,

It's an interesting problem. Right now mapCoords() has some
limitations. For example I can use it to map from reference sequence
to read cycle but not the other way around. Or from reference genome
to transcriptome but not the other way around (this reverse mapping
is actually what low level util transcriptLocs2RefLocs() does).
Ideally we should be able to easily go back and forth between 2
coordinate spaces.

Also we need to think about how more complex use cases (like Laurent's
one) could be handled by mapCoords(). Not clear. We might need to
change mapCoords's design a little bit, or maybe we'll need something
else.

H.

On 09/19/2014 10:40 AM, Laurent Gatto wrote:


On 19 September 2014 18:07, Michael Lawrence wrote:


Hi guys,

This is the problem of mapping back and forth between coordinate spaces,
such as between genomic and transcript space. I think there was some
progress this release cycle (introduction of mapCoords generic, etc), but I
think there is yet more to do. For example, transcriptLocs2RefLocs could be
given a ranges-based wrapper that conforms to the mapCoords API somehow.
Could we please put this on the TODO list of someone (in Seattle) for the
next release cycle?


And I would be very interested in (and slowly working towards)
generalising this to proteomics data. For now, there is a rather long
description in [1], but eventually, it should be standardised. I was not
aware of mapCoords and will read about it.

Laurent

[1] 
http://bioconductor.org/packages/devel/bioc/vignettes/Pbase/inst/doc/mapping.html


Michael

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] some turbulences due to IRanges/S4Vectors split

2014-09-11 Thread Hervé Pagès

Hi developers,

I recently moved SimpleList and DataFrame from IRanges to S4Vectors.
That breaks a number of packages as you can see on today's build/check
report:

  http://bioconductor.org/checkResults/3.0/bioc-LATEST/

Will fix today. Sorry for the inconvenience.

Cheers,
H.

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-09 Thread Hervé Pagès

OK so let's go for a 1 line summarization of the seqinfo.
Vince is that OK if we keep this at the bottom of the object?
That way it will always be visible, even when the object requires
more than 1 screen to display (e.g. when it has a lot of metadata
cols). Will look something like:

   gr
  GRanges with 3 ranges and 0 metadata columns:
seqnames   ranges strand
   RleIRanges  Rle
[1]chr14 [19069583, 19069654]  +
[2]chr14 [19363738, 19363809]  +
[3]chr14 [19363755, 19363826]  -
[4]chr14 [19369799, 19369870]  +
  seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10); no 
seqlengths


Thanks,
H.

On 09/09/2014 06:38 AM, Michael Lawrence wrote:

Agreed, that looks a lot nicer.

On Tue, Sep 9, 2014 at 4:42 AM, Martin Morgan mtmor...@fhcrc.org wrote:


On 09/09/2014 04:02 AM, Michael Lawrence wrote:


I'm in favor of this display. The seqinfo output at the bottom has always
been annoying (over-emphasized).



the fact that the lengths are 'NA' can be a helpful prompt to do something
about it, e.g., add seqinfo when inputting the data. Also they are helpful
when one is told that seqlengths are incompatible during, e.g.,
findOverlaps. But I like the idea of less but more informative display of
seqinfo, along the lines suggested by Vince.

seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10); 60 'NA'
seqlengths

Martin



On Mon, Sep 8, 2014 at 10:08 PM, Vincent Carey 
st...@channing.harvard.edu
wrote:




On Tue, Sep 9, 2014 at 12:30 AM, Hervé Pagès hpa...@fhcrc.org wrote:

  On 09/08/2014 06:42 PM, Michael Lawrence wrote:


  Instead of printing out multiple lines of a table that is rarely of

interest, could we develop Peter's idea toward something like:

hg19:chr1  hg19:chr2 ...
[lengths ...]

Not sure what condensed notation would be useful for circularity.



I don't know either. I'm worried that this would make the seqinfo
stuff look like a named vector and that the user would expect
hg19:chr1, hg19:chr2, etc... to be valid names.

With the table-like layout, some screen real estate can always be
saved by printing less lines:


  What I had in mind was



  gr

GRanges with 3 ranges and 0 metadata columns:

 genome: hg19


   seqnames   ranges strand

 RleIRanges  Rle
  [1]chr14 [19069583, 19069654]  +
  [2]chr14 [19363738, 19363809]  +
  [3]chr14 [19363755, 19363826]  -
  [4]chr14 [19369799, 19369870]  +




you could then probably dispense with the seqlengths.  i have
never found them too useful except as a key to the  genome.

if there are multiple genomes, we have something like

genomes: hg19, mm9

the point is to make it prominent, particularly at a time of transition.



  --- seqinfo: 60 seqlevels (2 circulars) on 2 genomes (hg19, mm10) ---

  seqlevelsseqlengths isCircular
genome
  chr1  249250621   NA
  hg19
  chr10 135534747   NA
  hg19
  ... ......
...
  chrX  155270560   NA
  hg19
  chrY   59373566   NA
  hg19

I agree that the exact content of the seqinfo table itself is rarely
of interest so printing only 3 or 4 lines is OK. IMO it's important
to make the user aware of the existence of this hidden table and to
display it like what it really is (i.e. a table). Also displaying the
column names is a well established tradition and serves the purpose
of providing a quick summary of the accessors that are available to
access those fields.

H.




On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey hic...@wehi.edu.au
mailto:hic...@wehi.edu.au wrote:

  Perhaps it might be useful to have some way of highlighting if any
  of the chromosomes are circular or highlighting if there are
  multiple genomes present? Otherwise this information might be
hidden
  in the …

  Cheers,
  Pete


  On 09/09/2014, at 9:44 AM, Hervé Pagès hpa...@fhcrc.org
  mailto:hpa...@fhcrc.org wrote:

On 09/08/2014 02:28 PM, Peter Hickey wrote:
Just a vote for still allowing for multiple genomes in a
Seqinfo
  object (in a GRanges object). My use case is in
bisulfite-sequencing
  experiments where there is often a spike-in of a lambda phage
genome
  along with the genome of interest (human or mouse). It's often
  useful to keep all data from a single library together in the same
  objet but process according to genome(x) for each seqlevel.
   
Note taken. Thanks Pete! It's always great to know about
concrete
use
cases.
   
   
FWIW, I like Vincent's proposal of
selectSome(unique(genome(x)))
  in the show method.
   
Or what about displaying

Re: [Bioc-devel] writeVcf performance

2014-09-09 Thread Hervé Pagès
mailto:Bioc-devel@r-project.org

  mailto:Bioc-devel@r-project.
mailto:Bioc-devel@r-project.org
 mailto:Bioc-devel@r-project.__org
mailto:Bioc-devel@r-project.org

  mailto:Bioc-devel@r-project
mailto:Bioc-devel@r-project.
 mailto:Bioc-devel@r-project
mailto:Bioc-devel@r-project.__org

  mailto:Bioc-devel@r-project.
mailto:Bioc-devel@r-project.org
 mailto:Bioc-devel@r-project.__org
mailto:Bioc-devel@r-project.org mailing
  list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
https://stat.ethz.ch/mailman/__listinfo/bioc-devel

https://stat.ethz.ch/mailman/__listinfo/bioc-devel
https://stat.ethz.ch/mailman/listinfo/bioc-devel


https://stat.ethz.ch/mailman/__listinfo/bioc-devel
https://stat.ethz.ch/mailman/listinfo/bioc-devel

https://stat.ethz.ch/mailman/listinfo/bioc-devel
https://stat.ethz.ch/mailman/__listinfo/bioc-devel





https://stat.ethz.ch/mailman/__listinfo/bioc-devel
https://stat.ethz.ch/mailman/listinfo/bioc-devel

https://stat.ethz.ch/mailman/listinfo/bioc-devel
https://stat.ethz.ch/mailman/__listinfo/bioc-devel


https://stat.ethz.ch/mailman/listinfo/bioc-devel
https://stat.ethz.ch/mailman/__listinfo/bioc-devel

https://stat.ethz.ch/mailman/__listinfo/bioc-devel
https://stat.ethz.ch/mailman/listinfo/bioc-devel




  --
  Computational
Biologist
  Genentech Research





   [[alternative HTML
version
deleted]]



_
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
mailto:Bioc-devel@r-project.__org
mailto:Bioc-devel@r-project.org
 mailto:Bioc-devel@r-project.
mailto:Bioc-devel@r-project.org
 mailto:Bioc-devel@r-project.__org
mailto:Bioc-devel@r-project.org
  mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
https://stat.ethz.ch/mailman/listinfo/bioc-devel

https://stat.ethz.ch/mailman/listinfo/bioc-devel
https://stat.ethz.ch/mailman/__listinfo/bioc-devel


https://stat.ethz.ch/mailman/listinfo/bioc-devel
https://stat.ethz.ch/mailman/__listinfo/bioc-devel

https://stat.ethz.ch/mailman/__listinfo/bioc-devel
https://stat.ethz.ch/mailman/listinfo/bioc-devel



  --
  Computational Biology / Fred
Hutchinson Cancer
 Research Center
  1100 Fairview Ave. N.
  PO Box 19024 Seattle, WA 98109

  Location: Arnold Building M1 B861
  Phone: (206) 667-2793
tel:%28206%29%20667-2793
 tel:%28206%29%20667-2793
tel:%28206%29%20667-2793






  --
  Computational Biologist
  Genentech Research




 --
 Computational Biologist
 Genentech Research



 --
 Computational Biology / Fred Hutchinson Cancer
Research Center
 1100 Fairview Ave. N.
 PO Box 19024 Seattle, WA 98109

 Location: Arnold Building M1 B861
 Phone: (206) 667-2793 tel:%28206%29%20667-2793
tel:%28206%29%20667-2793





--
Computational Biologist
Genentech Research


_
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
https://stat.ethz.ch/mailman/listinfo/bioc-devel





--
Computational Biologist
Genentech Research


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health

Re: [Bioc-devel] writeVcf performance

2014-09-09 Thread Hervé Pagès

Ah, ok. I should have 'svn up' and re-tried 'R CMD check' before
reporting this. Thanks and sorry for the noise.

H.

On 09/09/2014 03:09 PM, Valerie Obenchain wrote:

Hi Herve,

This unit test passes in VA 1.11.30 (the current version in svn). It was
related to writeVcf(), not the IRanges/S4Vector stuff. My fault, not yours.

Val

On 09/09/2014 02:47 PM, Hervé Pagès wrote:

Hi Val,

On 09/09/2014 02:12 PM, Valerie Obenchain wrote:

Writing 'list' data has been fixed in 1.11.30. fyi, Herve is in the
process of moving SimpleList and DataFrame from IRanges to S4Vectors;
finished up today I think.


I fixed VariantAnnotation's NAMESPACE this morning but 'R CMD check'
failed for me because of an unit test error in test_VRanges_vcf().
Here is how to quickly reproduce:

   library(VariantAnnotation)
   library(RUnit)

source(path/to/VariantAnnotation/inst/unitTests/test_VRanges-class.R)

   dest - tempfile()
   vr - make_TARGET_VRanges_vcf()
   writeVcf(vr, dest)
   vcf - readVcf(dest, genome = hg19)
   perm - c(1, 7, 8, 4, 2, 10)
   vcf.vr - as(vcf, VRanges)[perm]
   genome(vr) - hg19
   checkIdenticalVCF(vr, vcf.vr)  # Error in checkIdentical(orig, vcf) :
FALSE

Hard for me to tell whether this is related to DataFrame moving from
IRanges to S4Vectors or to a regression in writeVcf(). Do you think
you can have a look? Thanks for the help and sorry for the trouble.

H.


Anyhow, if you get VariantAnnotation from svn
you'll need to update S4Vectors, IRanges and GenomicRanges (and maybe
rtracklayer).

I'm working on the 'chunking' approach next. It looks like we can still
gain from adding that.

Valerie

On 09/08/2014 12:00 PM, Valerie Obenchain wrote:

fyi Martin found a bug with the treatment of list data (ie, Number =
'.') in the header. Working on a fix ...

Val


On 09/08/2014 08:43 AM, Gabe Becker wrote:

Val,

That is great. I'll check this out and test it on our end.

~G

On Mon, Sep 8, 2014 at 8:38 AM, Valerie Obenchain voben...@fhcrc.org
mailto:voben...@fhcrc.org wrote:

The new writeVcf code is in 1.11.28.

Using the illumina file you suggested, geno fields only, writing
now
takes about 17 minutes.

  hdr
class: VCFHeader
samples(1): NA12877
meta(6): fileformat ApplyRecalibration ... reference source
fixed(1): FILTER
info(22): AC AF ... culprit set
geno(8): GT GQX ... PL VF

  param = ScanVcfParam(info=NA)
  vcf = readVcf(fl, , param=param)
  dim(vcf)
[1] 516127621

  system.time(writeVcf(vcf, out.vcf))
 user   system  elapsed
  971.0326.568 1004.593

In 1.11.28, parsing of geno data was moved to C. If this didn't
speed things up enough we were planning to implement 'chunking'
through the VCF and/or move the parsing of info to C, however, it
looks like geno was the bottleneck.

I've tested a number of samples/fields combinations in files
with =
.5 million rows and the improvement over writeVcf() in release is
~ 90%.

Valerie




On 09/04/14 15:28, Valerie Obenchain wrote:

Thanks Gabe. I should have something for you on Monday.

Val


On 09/04/2014 01:56 PM, Gabe Becker wrote:

Val and Martin,

Apologies for the delay.

We realized that the Illumina platinum genome vcf files
make
a good test
case, assuming you strip out all the info (info=NA when
reading it into
R) stuff.


ftp://platgene:G3n3s4me@ussd-__ftp.illumina.com/NA12877_S1.__genome.vcf.gz





ftp://platgene:g3n3s...@ussd-ftp.illumina.com/NA12877_S1.genome.vcf.gz


took about ~4.2 hrs to write out, and is about 1.5x the
size
of the
files we are actually dealing with (~50M ranges vs our
~30M).

Looking forward a new vastly improved writeVcf :).

~G


On Tue, Sep 2, 2014 at 1:53 PM, Michael Lawrence
lawrence.mich...@gene.com
mailto:lawrence.mich...@gene.com
mailto:lawrence.michael@gene.__com
mailto:lawrence.mich...@gene.com wrote:

 Yes, it's very clear that the scaling is non-linear,
and Gabe has
 been experimenting with a chunk-wise + parallel
algorithm.
 Unfortunately there is some frustrating overhead with
the
 parallelism. But I'm glad Val is arriving at
something
quicker.

 Michael


 On Tue, Sep 2, 2014 at 1:33 PM, Martin Morgan
mtmor...@fhcrc.org mailto:mtmor...@fhcrc.org
 mailto:mtmor...@fhcrc.org
mailto:mtmor...@fhcrc.org wrote:

 On 08/27/2014 11:56 AM, Gabe Becker wrote:

 The profiling I attached in my previous email
is for 24 geno
 fields, as I said,
 but our typical usecase involves only ~4-6
fields

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Hervé Pagès

Hi Vince,

Yes it would make sense to have the show method report the genome
when genome(x) contains a unique non-NA value. I think the main
use case for having the genome defined at the sequence level instead
of the whole object level is metagenomics. Maybe Michael has some other
good use cases to share since IIRC he requested the addition of the
genome field a couple of years ago and made the case for having it
defined at the sequence level.

Cheers,
H.

On 09/08/2014 07:21 AM, Vincent Carey wrote:

For GRanges x, my naive expectation is that genome(x) returns a length-

one tag identifying the genome to which chromosomal coordinates

correspond.  The genome() method seems to have sequence-specific

semantics, which makes sense, but when we identify sequence

with chromosome, it seems too complicated.  Is there a use case for

a GRanges with sequences from several different genomes?


One reason I am inquiring is that I feel it would be nice to have the
GRanges show() method report, prominently, the genome in use (or NA

if unspecified).  This could be accomplished by reporting
unique(genome(x)), and perhaps that would be satisfactory.

after example(genome) :


seqinfo(txdb)


Seqinfo of length 15

seqnames seqlengths isCircular genome

CH2L   23011544  FALSEdm3

CH2R   21146708  FALSEdm3

CH3L   24543557  FALSEdm3

CH3R   27905053  FALSEdm3

CH4 1351857  FALSEdm3

... .........

CH3LHet 2555491  FALSEdm3

CH3RHet 2517507  FALSEdm3

CHXHet   204112  FALSEdm3

CHYHet   347038  FALSEdm3

CHUextra   29004656  FALSEdm3


genome(seqinfo(txdb))


 CH2L CH2R CH3L CH3R  CH4  CHX  CHUM

dm3dm3dm3dm3dm3dm3dm3dm3

  CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet CHUextra

dm3dm3dm3dm3dm3dm3dm3

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Hervé Pagès

On 09/08/2014 11:41 AM, Michael Lawrence wrote:

I might have requested the genome annotation, but I'm pretty sure it
wasn't me who decide on tracking it on a per-sequence basis.


OK, maybe. Don't trust my memory too much on this. No regrets though.
I think it was the right thing to do ;-) Just because the SAM/BAM
format does it is a good enough reason for us to do it too. According
to the SAM Spec, the header can look like:

  @HD   VN:1.3  SO:coordinate
  @SQ   SN:chr1_hg19LN:45   AS:hg19
  @SQ   SN:chr1_mm10LN:42   AS:mm10

The only problem is that seqinfo(BamFile(test.bam)) seems to ignore
the AS tag (genome assembly identifier) at the moment:

   seqinfo(BamFile(test.bam))
  Seqinfo of length 2
  seqnames seqlengths isCircular genome
  chr1_hg1945 NA   NA
  chr1_mm1042 NA   NA

Hopefully that can be addressed. But that's a separate issue...

Cheers,
H.



I could
imagine use cases for that though, e.g., when diagnosing sequencing
contamination (like human vs. mouse). But most other tools and file
formats expect a single genome per track, so, for example, rtracklayer
has an internal function singleGenome() to take care of this.

On Mon, Sep 8, 2014 at 10:50 AM, Hervé Pagès hpa...@fhcrc.org
mailto:hpa...@fhcrc.org wrote:

Hi Vince,

Yes it would make sense to have the show method report the genome
when genome(x) contains a unique non-NA value. I think the main
use case for having the genome defined at the sequence level instead
of the whole object level is metagenomics. Maybe Michael has some other
good use cases to share since IIRC he requested the addition of the
genome field a couple of years ago and made the case for having it
defined at the sequence level.

Cheers,
H.


On 09/08/2014 07:21 AM, Vincent Carey wrote:

For GRanges x, my naive expectation is that genome(x) returns a
length-

one tag identifying the genome to which chromosomal coordinates

correspond.  The genome() method seems to have sequence-specific

semantics, which makes sense, but when we identify sequence

with chromosome, it seems too complicated.  Is there a use case for

a GRanges with sequences from several different genomes?


One reason I am inquiring is that I feel it would be nice to
have the
GRanges show() method report, prominently, the genome in use (or NA

if unspecified).  This could be accomplished by reporting
unique(genome(x)), and perhaps that would be satisfactory.

after example(genome) :

seqinfo(txdb)


Seqinfo of length 15

seqnames seqlengths isCircular genome

CH2L   23011544  FALSEdm3

CH2R   21146708  FALSEdm3

CH3L   24543557  FALSEdm3

CH3R   27905053  FALSEdm3

CH4 1351857  FALSEdm3

... .........

CH3LHet 2555491  FALSEdm3

CH3RHet 2517507  FALSEdm3

CHXHet   204112  FALSEdm3

CHYHet   347038  FALSEdm3

CHUextra   29004656  FALSEdm3

genome(seqinfo(txdb))


  CH2L CH2R CH3L CH3R  CH4  CHX
CHUM

 dm3dm3dm3dm3dm3dm3
dm3dm3

   CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet CHUextra

 dm3dm3dm3dm3dm3dm3dm3

 [[alternative HTML version deleted]]

_
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org
mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org mailto:hpa...@fhcrc.org
Phone: (206) 667-5791 tel:%28206%29%20667-5791
Fax: (206) 667-1319 tel:%28206%29%20667-1319


_
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
https://stat.ethz.ch/mailman/listinfo/bioc-devel




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Hervé Pagès

On 09/08/2014 02:28 PM, Peter Hickey wrote:

Just a vote for still allowing for multiple genomes in a Seqinfo object (in a 
GRanges object). My use case is in bisulfite-sequencing experiments where there 
is often a spike-in of a lambda phage genome along with the genome of interest 
(human or mouse). It's often useful to keep all data from a single library 
together in the same objet but process according to genome(x) for each seqlevel.


Note taken. Thanks Pete! It's always great to know about concrete use
cases.



FWIW, I like Vincent's proposal of selectSome(unique(genome(x))) in the show 
method.


Or what about displaying the genome next to the seqlevel it's
associated with? Like e.g.:

   gr
  GRanges with 3 ranges and 0 metadata columns:
seqnames   ranges strand
   RleIRanges  Rle
[1]chr14 [19069583, 19069654]  +
[2]chr14 [19363738, 19363809]  +
[3]chr14 [19363755, 19363826]  -
[4]chr14 [19369799, 19369870]  +
---
seqinfo:
  seqlevels seqlengths isCircular genome
  chr1   249250621   NA   hg19
  chr10  135534747   NA   hg19
  chr11  135006516   NA   hg19
  ...  .........
  chrUn_gl000249 38502   NA   hg19
  chrX   155270560   NA   hg19
  chrY59373566   NA   hg19

That way, we also raise awareness about the isCircular field.
The current choice to only display the seqlengths pre-dates the
existence of the seqinfo slot but might be a little bit misleading
those days since it only exposes some arbitrary seqinfo fields.

H.



Cheers,
Pete



I might have requested the genome annotation, but I'm pretty sure it wasn't
me who decide on tracking it on a per-sequence basis. I could imagine use
cases for that though, e.g., when diagnosing sequencing contamination (like
human vs. mouse). But most other tools and file formats expect a single
genome per track, so, for example, rtracklayer has an internal function
singleGenome() to take care of this.

On Mon, Sep 8, 2014 at 10:50 AM, Herv? Pag?s hpa...@fhcrc.org wrote:


Hi Vince,

Yes it would make sense to have the show method report the genome
when genome(x) contains a unique non-NA value. I think the main
use case for having the genome defined at the sequence level instead
of the whole object level is metagenomics. Maybe Michael has some other
good use cases to share since IIRC he requested the addition of the
genome field a couple of years ago and made the case for having it
defined at the sequence level.

Cheers,
H.


On 09/08/2014 07:21 AM, Vincent Carey wrote:


For GRanges x, my naive expectation is that genome(x) returns a length-

one tag identifying the genome to which chromosomal coordinates

correspond.  The genome() method seems to have sequence-specific

semantics, which makes sense, but when we identify sequence

with chromosome, it seems too complicated.  Is there a use case for

a GRanges with sequences from several different genomes?


One reason I am inquiring is that I feel it would be nice to have the
GRanges show() method report, prominently, the genome in use (or NA

if unspecified).  This could be accomplished by reporting
unique(genome(x)), and perhaps that would be satisfactory.

after example(genome) :

seqinfo(txdb)




Seqinfo of length 15

seqnames seqlengths isCircular genome

CH2L   23011544  FALSEdm3

CH2R   21146708  FALSEdm3

CH3L   24543557  FALSEdm3

CH3R   27905053  FALSEdm3

CH4 1351857  FALSEdm3

... .........

CH3LHet 2555491  FALSEdm3

CH3RHet 2517507  FALSEdm3

CHXHet   204112  FALSEdm3

CHYHet   347038  FALSEdm3

CHUextra   29004656  FALSEdm3

genome(seqinfo(txdb))




 CH2L CH2R CH3L CH3R  CH4  CHX  CHUM

dm3dm3dm3dm3dm3dm3dm3dm3

  CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet CHUextra

dm3dm3dm3dm3dm3dm3dm3

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Herv? Pag?s

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel




Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, 

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Hervé Pagès

On 09/08/2014 06:42 PM, Michael Lawrence wrote:

Instead of printing out multiple lines of a table that is rarely of
interest, could we develop Peter's idea toward something like:

hg19:chr1  hg19:chr2 ...
[lengths ...]

Not sure what condensed notation would be useful for circularity.


I don't know either. I'm worried that this would make the seqinfo
stuff look like a named vector and that the user would expect
hg19:chr1, hg19:chr2, etc... to be valid names.

With the table-like layout, some screen real estate can always be
saved by printing less lines:

   gr
  GRanges with 3 ranges and 0 metadata columns:
seqnames   ranges strand
   RleIRanges  Rle
[1]chr14 [19069583, 19069654]  +
[2]chr14 [19363738, 19363809]  +
[3]chr14 [19363755, 19363826]  -
[4]chr14 [19369799, 19369870]  +
--- seqinfo: 60 seqlevels (2 circulars) on 2 genomes (hg19, mm10) ---
seqlevelsseqlengths isCircular genome
chr1  249250621   NA   hg19
chr10 135534747   NA   hg19
... .........
chrX  155270560   NA   hg19
chrY   59373566   NA   hg19

I agree that the exact content of the seqinfo table itself is rarely
of interest so printing only 3 or 4 lines is OK. IMO it's important
to make the user aware of the existence of this hidden table and to
display it like what it really is (i.e. a table). Also displaying the
column names is a well established tradition and serves the purpose
of providing a quick summary of the accessors that are available to
access those fields.

H.




On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey hic...@wehi.edu.au
mailto:hic...@wehi.edu.au wrote:

Perhaps it might be useful to have some way of highlighting if any
of the chromosomes are circular or highlighting if there are
multiple genomes present? Otherwise this information might be hidden
in the …

Cheers,
Pete


On 09/09/2014, at 9:44 AM, Hervé Pagès hpa...@fhcrc.org
mailto:hpa...@fhcrc.org wrote:

  On 09/08/2014 02:28 PM, Peter Hickey wrote:
  Just a vote for still allowing for multiple genomes in a Seqinfo
object (in a GRanges object). My use case is in bisulfite-sequencing
experiments where there is often a spike-in of a lambda phage genome
along with the genome of interest (human or mouse). It's often
useful to keep all data from a single library together in the same
objet but process according to genome(x) for each seqlevel.
 
  Note taken. Thanks Pete! It's always great to know about concrete use
  cases.
 
 
  FWIW, I like Vincent's proposal of selectSome(unique(genome(x)))
in the show method.
 
  Or what about displaying the genome next to the seqlevel it's
  associated with? Like e.g.:
 
gr
   GRanges with 3 ranges and 0 metadata columns:
 seqnames   ranges strand
RleIRanges  Rle
 [1]chr14 [19069583, 19069654]  +
 [2]chr14 [19363738, 19363809]  +
 [3]chr14 [19363755, 19363826]  -
 [4]chr14 [19369799, 19369870]  +
 ---
 seqinfo:
   seqlevels seqlengths isCircular genome
   chr1   249250621   NA   hg19
   chr10  135534747   NA   hg19
   chr11  135006516   NA   hg19
   ...  .........
   chrUn_gl000249 38502   NA   hg19
   chrX   155270560   NA   hg19
   chrY59373566   NA   hg19
 
  That way, we also raise awareness about the isCircular field.
  The current choice to only display the seqlengths pre-dates the
  existence of the seqinfo slot but might be a little bit misleading
  those days since it only exposes some arbitrary seqinfo fields.
 
  H.
 
 
  Cheers,
  Pete
 
 
  I might have requested the genome annotation, but I'm pretty
sure it wasn't
  me who decide on tracking it on a per-sequence basis. I could
imagine use
  cases for that though, e.g., when diagnosing sequencing
contamination (like
  human vs. mouse). But most other tools and file formats expect
a single
  genome per track, so, for example, rtracklayer has an
internal function
  singleGenome() to take care of this.
 
  On Mon, Sep 8, 2014 at 10:50 AM, Herv? Pag?s hpa...@fhcrc.org
mailto:hpa...@fhcrc.org wrote:
 
  Hi Vince,
 
  Yes it would make sense to have the show method report the
genome
  when genome(x) contains a unique

[Bioc-devel] bug when coercing from list to SimpleList

2014-09-05 Thread Hervé Pagès

Hi Michael,

I found the following bug when coercing a list to a SimpleList
with IRanges devel (not with IRanges release):

  library(IRanges)
  x - list(a=matrix(rep(a, 6), nrow=3),
b=array(rep(b, 24), dim=c(3,4,2)))

Then:

   sapply(as(x, SimpleList), class)
 ab
  matrix matrix

   lapply(as(x, SimpleList), dim)
  $a
  [1] 3 2

  $b
  [1] 24  1

The array was turned into a matrix!

Note that the SimpleList() constructor behaves as expected:

   sapply(SimpleList(x), class)
 ab
  matrix  array

   lapply(SimpleList(x), dim)
  $a
  [1] 3 2

  $b
  [1] 3 4 2

Do you think you can have a look?

Thanks,
H.

 sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] IRanges_1.99.25 S4Vectors_0.1.5 BiocGenerics_0.11.4

loaded via a namespace (and not attached):
[1] stats4_3.1.0

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Mistake in extractTranscriptSeqs() {from: GenomicFeatures} when transcripts are on minus strand

2014-08-26 Thread Hervé Pagès
...AGGACGACAGCAGCAGCAGCGCGTCTCCTTCAG
  [2]   147 
GGAGCTCCCAGGGAAGTGGTTGATCCGGTG...TGGAGAAGCAGCAGCAGAAGGAGCAGGAGCAAG
  [3]69 
TGAGAGCCACGAGCCAAGGTGGGCACTTGATGTCGGATCTCTTCAACAAGCTGGTCATGAGGCGCAAGG
  [4]   468 
GCATCTCTGGGAAAGGACCTCTGGTGAGGG...TGCAATAAAGGATCTCTAGCTGTGCAGGA


Now with extractTranscriptSeqs():

   tx_seq - extractTranscriptSeqs(genome, transcripts[uc009vis.3])
   tx_seq
A DNAStringSet instance of length 1
  width seq   names 


  [1]   843 ACCTACAAGATTACTAACA...AAAGGATCTCTAGCTGTGCAGGA uc009vis.3

'tx_seq' contains the 4 sequences returned by getSeq() and concatenated:

   tx_seq == unlist(exon_seqs)
  [1] TRUE

Using UCSC online query tools will give you the same sequence:

  http://genome.ucsc.edu/cgi-bin/hgGene?db=hg19hgg_gene=uc009vis.3

(Click on Genomic Sequence, then check 5' UTR Exons, CDS Exons, and
3' UTR Exons. No introns, no promoter/upstream, no downstream
sequences.)

Hope this helps clarifying things.

H.




sessionInfo()

R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] grDevices datasets  grid  parallel  stats graphics  utils 
methods   base

other attached packages:
  [1] GenomicFeatures_1.17.14AnnotationDbi_1.27.9   
Biobase_2.25.0 BSgenome.Hsapiens.UCSC.hg19_1.3.99
  [5] BSgenome_1.33.8Biostrings_2.33.13 
XVector_0.5.7  spliceR_1.7.1
  [9] plyr_1.8.1 RColorBrewer_1.0-5 
VennDiagram_1.6.7  cummeRbund_2.7.2
[13] Gviz_1.9.11rtracklayer_1.25.13
GenomicRanges_1.17.28  GenomeInfoDb_1.1.18
[17] IRanges_1.99.24S4Vectors_0.1.2
fastcluster_1.1.13 reshape2_1.4
[21] ggplot2_1.0.0  RSQLite_0.11.4 
DBI_0.2-7  BiocGenerics_0.11.4




___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] subset fails on a DataFrame with 0 cols

2014-07-29 Thread Hervé Pagès

On 07/28/2014 07:52 PM, Michael Lawrence wrote:

Thanks, should be fixed in devel S4Vectors.


Indeed. Thanks!

H.




On Mon, Jul 28, 2014 at 6:46 PM, Hervé Pagès hpa...@fhcrc.org
mailto:hpa...@fhcrc.org wrote:

Hi Michael,

Works if the DataFrame has columns:

   library(IRanges)
   DF1 - DataFrame(aa=letters[1:4])
   DF0 - DF1[0]

Then:

DF1
   DataFrame with 4 rows and 1 column
  aa
 character
   1   a
   2   b
   3   c
   4   d

DF0
   DataFrame with 4 rows and 0 columns

subset(DF1, c(FALSE, TRUE))
   DataFrame with 2 rows and 1 column
  aa
 character
   1   b
   2   d

OK. But:

subset(DF0, c(FALSE, TRUE))
   Error in NSBS(i, x, exact = exact, upperBoundIsStrict =
!allow.append) :
 subscript is a logical vector with out-of-bounds TRUE values

This currently causes subset() to fail on a GRanges object with no
metadata cols.

Thanks,
H.


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org mailto:hpa...@fhcrc.org
Phone: (206) 667-5791 tel:%28206%29%20667-5791
Fax: (206) 667-1319 tel:%28206%29%20667-1319

_
Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
https://stat.ethz.ch/mailman/listinfo/bioc-devel




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Distinction between release and devel package websites

2014-07-24 Thread Hervé Pagès

Hi Henrik,

On 07/24/2014 06:03 AM, Henrik Bengtsson wrote:

On Thu, Jul 24, 2014 at 3:16 AM, Hervé Pagès hpa...@fhcrc.org wrote:

Hi Andrzej,


On 07/22/2014 02:28 PM, Andrzej Oleś wrote:


Hi Hervé,

thank you for the demo! Yes, this is definitely much more clear than
just a different color. Maybe we could first implement this idea on
the build/check report websites and see how the uptake will be? I
always keep getting confused by the colors which keep changing with
every release cycle anyway...



Done:

   http://bioconductor.org/checkResults/3.0/bioc-LATEST/index.html

http://bioconductor.org/checkResults/3.0/data-experiment-LATEST/index.html

Will revert back if there is too much complaining about this...


Thanks for all updates. The latter hurts my eyes though -


I didn't modify the data experiment report to use a less bright logo
yet. It should auto-update the next time the report is generated though.


may I
instead suggest to use a left/right side bar with text Bioconductor
developer version (3.0) written sideways.


We need to deal with a space problem. The report is already too wide.


Alternatively, add
developer and release tag after each package's version, e.g.
affycompData 1.2.0 (BioC release 2.14) and affycompData 1.3.0 (BioC
developer 3.0) (possible on a separate line).


Space problem again.



BTW, the BioC version is nowhere to be seen on the current check
results pages; it's only from the URL you can infer this.


It's in the title!



While at it, it would be great to have links on the check pages that
quickly takes to the corresponding devel/release versions.


The check page for devel has links to the devel landing pages and the
check page for release has links to the release release landing pages.
Just click on the name of the package and it will take you to the
corresponding landing page.


 Also, if
one could jump directly to a particular package, that would also be
great, e.g. 
http://bioconductor.org/checkResults/3.0/data-experiment-LATEST/index.html#gageData


http://bioconductor.org/checkResults/3.0/bioc-LATEST/aroma.light/

Cheers,
H.



Thxs,

Henrik



Cheers,

H.



Cheers,
Andrzej

On Tue, Jul 22, 2014 at 10:04 PM, Hervé Pagès hpa...@fhcrc.org wrote:


Hi Andrzej,


On 07/22/2014 10:14 AM, Andrzej Oleś wrote:



Hi all,

I think having links is useful, e.g. for someone who uses BioC release
but wants to install by hand a particular package from the devel
branch.

Distinct colors between release and devel make sense only if one
understands their meaning, which in the end might prove not to be very
useful.




I was thinking of something like this:

http://www.bioconductor.org/checkResults/3.0/data-experiment-LATEST/

Just a demo. This will be reset to the usual background tomorrow.

Cheers,
H.



I would rather recommend emphasizing the distinction between
release and devel in clear text across the package landing page,
possibly in multiple places, e.g. somewhere close to the actual
package version number; for instance, add the word devel after the
version number with a tooltip which will give some explanation/warning
that this is not the stable release version.

The concept of a notification box is far from ideal because it tends
to be annoing to the user and once dismissed 'forever' the user won't
be warned in the future.

I think that the actual problem arises from the fact that the release
landing pages are not clearly prioritized over the devel ones. Maybe
this could be  addressed by preventing the devel pages from being
harvested by google? It could make also sense to emphasize (bold face,
color, ...) the package release landing page on the result list
returned by the search engine on the BioC website. Currently, the
results for release and devel differ only in their relative path,
which can be easily overlooked, and both say Package Home, see
example below:

Bioconductor - DESeq2 - /packages/release/bioc/html/DESeq2.html
 Bioconductor - DESeq2 Home
Bioconductor - DESeq2 - /packages/devel/bioc/html/DESeq2.html
 Bioconductor - DESeq2 Home


Cheers,
Andrzej

On Tue, Jul 22, 2014 at 6:26 PM, James W. MacDonald jmac...@uw.edu
wrote:



Given that we have an ongoing problem with people inadvisedly clicking
and
installing things, is there a good rationale for allowing them to do
so?

Perhaps the landing page for each package should be stripped of links
and
replaced with some indication of the availability for each package on
the
various operating systems. There could also be a note indicating that
people
can install using biocLite().

Best,

Jim




On 7/22/2014 11:48 AM, Dan Tenenbaum wrote:




Seems like there are two problems, first that the release and devel
pages
look similar, but more importantly that people are downloading and
installing from the package pages when they should be using
biocLite().

I am open to the suggestions for making the release/devel pages look
more
different from each other, but I think something needs

Re: [Bioc-devel] A new way to manage side-by-side BioC release and Devel installs

2014-07-24 Thread Hervé Pagès

Hi Gabe,

Thanks for the heads up, sounds promising.

Note that one major difficulty of switching between BioC release
and devel is that between October and April it also requires to switch
between R release and devel. How is that handled in switchr?

Thanks,
H.


On 07/24/2014 08:42 AM, Gabe Becker wrote:

Hey all,

One of the things that has come up from the recent Release/Devel
distinction thread on this list is that people don't consider there to be
an easy way of handling both at the same time.

I'd like to offer an alternative based on the switchr
https://github.com/gmbecker/switchr package I'm developing.

switchr is designed to allow seamless switching between distinct sets of
installed packages. It also has build in support for the BioC release/devel
distinction, like so:


* library(switchr)*
*switchTo(BiocDevel)*

trying URL '
http://www.bioconductor.org/packages/3.0/bioc/bin/windows/contrib/3.1/BiocInstaller_1.15.5.zip
'
Content type 'application/zip' length 109769 bytes (107 Kb)
opened URL
downloaded 107 Kb

package 'BiocInstaller' successfully unpacked and MD5 sums checked
Switched to the 'BioC_3.0' computing environment. 31 packages are currently
available. Packages installed in your site library ARE suppressed.
  To switch back to your previous environment type switchBack()

* BiocInstaller::biocVersion()*

[1] '3.0'

* switchBack()*

Reverted to the 'original' computing environment. 40 packages are currently
available. Packages installed in your site library ARE NOT suppressed.
  To switch back to your previous environment type switchBack()

* BiocInstaller::biocVersion()*

[1] '2.14'

Note that this only works if you have an R version able to install the
devel version of the BiocInstaller package, but that will be true of any
solution using BioC Devel. I'm working on making the failure more graceful
when this is not the case, and on writing/updating the documentation more
generally.

switchr is more widely useful than this, which I will be talking about
publicly soon, but since it is so on-topic I figured I'd give this list a
heads-up on this aspect of it.

Please feel free to try switchr out, any feedback is appreciated.
~G



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Distinction between release and devel package websites

2014-07-22 Thread Hervé Pagès

Hi Andrzej,

On 07/22/2014 10:14 AM, Andrzej Oleś wrote:

Hi all,

I think having links is useful, e.g. for someone who uses BioC release
but wants to install by hand a particular package from the devel
branch.

Distinct colors between release and devel make sense only if one
understands their meaning, which in the end might prove not to be very
useful.


I was thinking of something like this:

  http://www.bioconductor.org/checkResults/3.0/data-experiment-LATEST/

Just a demo. This will be reset to the usual background tomorrow.

Cheers,
H.


I would rather recommend emphasizing the distinction between
release and devel in clear text across the package landing page,
possibly in multiple places, e.g. somewhere close to the actual
package version number; for instance, add the word devel after the
version number with a tooltip which will give some explanation/warning
that this is not the stable release version.

The concept of a notification box is far from ideal because it tends
to be annoing to the user and once dismissed 'forever' the user won't
be warned in the future.

I think that the actual problem arises from the fact that the release
landing pages are not clearly prioritized over the devel ones. Maybe
this could be  addressed by preventing the devel pages from being
harvested by google? It could make also sense to emphasize (bold face,
color, ...) the package release landing page on the result list
returned by the search engine on the BioC website. Currently, the
results for release and devel differ only in their relative path,
which can be easily overlooked, and both say Package Home, see
example below:

Bioconductor - DESeq2 - /packages/release/bioc/html/DESeq2.html
   Bioconductor - DESeq2 Home
Bioconductor - DESeq2 - /packages/devel/bioc/html/DESeq2.html
   Bioconductor - DESeq2 Home


Cheers,
Andrzej

On Tue, Jul 22, 2014 at 6:26 PM, James W. MacDonald jmac...@uw.edu wrote:

Given that we have an ongoing problem with people inadvisedly clicking and
installing things, is there a good rationale for allowing them to do so?

Perhaps the landing page for each package should be stripped of links and
replaced with some indication of the availability for each package on the
various operating systems. There could also be a note indicating that people
can install using biocLite().

Best,

Jim




On 7/22/2014 11:48 AM, Dan Tenenbaum wrote:


Seems like there are two problems, first that the release and devel pages
look similar, but more importantly that people are downloading and
installing from the package pages when they should be using biocLite().

I am open to the suggestions for making the release/devel pages look more
different from each other, but I think something needs to be done about the
second problem as well. Perhaps a popup that comes up when you click on a
package tarball saying The best way to install this is with biocLite(); are
you sure you want to download it?

Whatever we do probably won't happen until after BioC2014.

Dan


- Original Message -


From: Julian Gehring julian.gehr...@embl.de
To: Hervé Pagès hpa...@fhcrc.org, Michael Lawrence
lawrence.mich...@gene.com, Vincent Carey
st...@channing.harvard.edu
Cc: bioc-devel@r-project.org
Sent: Tuesday, July 22, 2014 8:07:29 AM
Subject: Re: [Bioc-devel] Distinction between release and devel package
websites

Hi,

Tooltips that appear while hovering over selected links are easy to
miss.  This alone will likely not be clear enough.  We should convey
the
information that the entire website presents a different version of
the
package.

The idea of a notification box that can be made visible by the
individual user seems tempting.  One can combine this with an
optional
cookie, to remember the state between browser sessions.

Changing the layout of the devel page itself will also be helpful to
make the distinction more pronounced.  Hopefully we could approach
this
in a way that maintains the nice design of the bioc website.

Best
Julian


On 21.07.2014 21:50, Hervé Pagès wrote:


Hi,

In addition to these suggestions, how about using a special
background
color for package landing pages in devel?

Cheers,
H.

On 07/21/2014 07:32 PM, Michael Lawrence wrote:


Or an unobtrusive notification box that drops down from the top
of the
page, saying something like this is devel; there would be a
dismiss
button and a checkbox for whether to show again. The user is free
to
simply
ignore it and proceed as normal.




On Mon, Jul 21, 2014 at 7:10 PM, Vincent Carey
st...@channing.harvard.edu
wrote:


how about a tooltip that reads installation via biocLite() is
the
recommended approach to Bioconductor software
acquisition, other approaches may lead to inconsistent
package-sets
that
appears when a reader hovers over a tarball.  i would imagine
that
this is
how the wrong package gets installed, by manually using an
inappropriate
tarball.

wrong documentation is not so easy but the doc on the devel
branch might
have a different tooltip

Re: [Bioc-devel] Distinction between release and devel package websites

2014-07-21 Thread Hervé Pagès

Hi,

In addition to these suggestions, how about using a special background
color for package landing pages in devel?

Cheers,
H.

On 07/21/2014 07:32 PM, Michael Lawrence wrote:

Or an unobtrusive notification box that drops down from the top of the
page, saying something like this is devel; there would be a dismiss
button and a checkbox for whether to show again. The user is free to simply
ignore it and proceed as normal.




On Mon, Jul 21, 2014 at 7:10 PM, Vincent Carey st...@channing.harvard.edu
wrote:


how about a tooltip that reads installation via biocLite() is the
recommended approach to Bioconductor software
acquisition, other approaches may lead to inconsistent package-sets that
appears when a reader hovers over a tarball.  i would imagine that this is
how the wrong package gets installed, by manually using an inappropriate
tarball.

wrong documentation is not so easy but the doc on the devel branch might
have a different tooltip cautioning the readers to be sure they want to
read the doc on the devel version.



On Mon, Jul 21, 2014 at 9:39 PM, Julian Gehring julian.gehr...@embl.de
wrote:


Hi,

Can we make the package websites for the devel and release version of a
package more distinguishable?

To elaborate on this: In the past, I have seen several users having
problems with using bioconductor because they ended up on the wrong page
(mostly the devel page when they would have needed the release).  This
resulted in getting the wrong documentation or installing the wrong
package.  The pages are well designed, and there is no reason to change
this.  However, the websites for the devel and release version of a

package

look almost identical, and that these two get confused seems to happen to
many users (me included).

If you search for a package within the bioc website, the release version
always comes first in the search results.  If you are coming from the
outside (e.g. google), this may not be the case.  In fact, googling a few
packages names often returned only the devel page in the top 10 search
results.

What are the feelings regarding this? We could add a header section on

the

devel page that states that this is an unstable version not meant to be
used in production settings, and provide a link to the respective release
version?

Best wishes
Julian

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



 [[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] ggbio 1.13.11 fails to load due to namespace change in IRanges

2014-07-16 Thread Hervé Pagès
   colorspace_1.2-4 DBI_0.2-7
[16] dichromat_2.0-0  digest_0.6.4 fail_1.2
   foreach_1.4.2Formula_1.1-2
[21] GenomeInfoDb_1.1.12  GenomicAlignments_1.1.21
GenomicFeatures_1.17.12  GenomicRanges_1.17.24GGally_0.4.6
[26] grid_3.1.0   gridExtra_0.9.1  gtable_0.1.2
   Hmisc_3.14-4 IRanges_1.99.22
[31] iterators_1.0.7  lattice_0.20-29
latticeExtra_0.6-26  MASS_7.3-33  munsell_0.4.2
[36] plyr_1.8.1   proto_0.3-10
RColorBrewer_1.0-5   Rcpp_0.11.2  RCurl_1.95-4.1
[41] reshape_0.8.5reshape2_1.4
Rsamtools_1.17.31RSQLite_0.11.4   rtracklayer_1.25.13
[46] S4Vectors_0.1.2  scales_0.2.4 sendmailR_1.1-2
   splines_3.1.0stats4_3.1.0
[51] stringr_0.6.2survival_2.37-7  tools_3.1.0
   XML_3.98-1.1 XVector_0.5.7
[56] zlibbioc_1.11.1



packageVersion(ggbio)


[1] ‘1.13.11’





___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



[[alternative HTML version deleted]]



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Confusing namespace issue with IRanges 1.99.17

2014-07-08 Thread Hervé Pagès
 parallel_3.1.0
S4Vectors_0.1.0 stats4_3.1.0tools_3.1.0

foo ## Load data
foo load(~/Desktop/DF.Rdata)

foo ## Run function
foo result - foo(DF)
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] fooPkg_0.0.1

loaded via a namespace (and not attached):
[1] BiocGenerics_0.11.3 IRanges_1.99.17 parallel_3.1.0
S4Vectors_0.1.0 stats4_3.1.0tools_3.1.0
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

  clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
  parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

  xtabs

The following objects are masked from ‘package:base’:

  anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
do.call, duplicated, eval, evalq, Filter, Find, get,
  intersect, is.unsorted, lapply, Map, mapply, match, mget, order,
paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
  rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table,
tapply, union, unique, unlist

R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets
methods   base

other attached packages:
[1] IRanges_1.99.17 S4Vectors_0.1.0 BiocGenerics_0.11.3
fooPkg_0.0.1

loaded via a namespace (and not attached):
[1] stats4_3.1.0 tools_3.1.0







The same thing happens with the following setup:

R version 3.1.1 RC (2014-07-07 r66083)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
   [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices datasets  utils methods
[8] base

other attached packages:
[1] IRanges_1.99.17 S4Vectors_0.1.0 BiocGenerics_0.11.3
[4] fooPkg_0.0.1colorout_1.0-2

loaded via a namespace (and not attached):
[1] stats4_3.1.1 tools_3.1.1

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



[[alternative HTML version deleted]]



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] [BioC] granges() method for GenomicRanges objects akin to ranges()...

2014-07-08 Thread Hervé Pagès

Hi,

On 05/13/2014 01:15 AM, Julian Gehring wrote:

Hi,

In summary, would it be feasible to add to 'GenomicRanges'?

1) A 'granges(x, use.mcols=FALSE, ...)' method with signature 'GRanges'
that converts to a 'GRanges' object and optionally drops the mcols (if
'use.mcols' is TRUE)


Will do.



2) A 'dropMcols' or 'dropmcols' method with signature 'GRanges' that is
a wrapper for
   mcols(x) - NULL


How about setMcols(), which is more general than dropmcols()?

Thanks,
H.



If I can be of help in providing a patch for this, please let me know.

Best wishes
Julian



On 05.05.2014 23:29, Hervé Pagès wrote:

On 05/05/2014 02:12 PM, Cook, Malcolm wrote:

On 05/05/2014 01:00 PM, Cook, Malcolm wrote:

   Wondering,
  
   Is it too off the beaten track to expect
  
   `mcols-`(x,NULL)
  
  args(`mcols-`)
 function (x, ..., value)
  
  Arguments after the ellipsis must be named:
  
 `mcols-`(x, value=NULL)

Herve - Great - of course - so - does this not provide the means
requested by the original poster?


I think Tim also wanted 'x' to be downgraded to a GRanges instance,
like Julian's grangesPlain() does. We could use granges() for that.

Deciding of an idiom that can be used inline for just dropping the
mcols would be good too. `mcols-`(x, value=NULL) is a little bit
tricky, ugly, and error prone as you noticed. These are probably
enough reasons for not choosing it as *the* idiom. Its only advantage
is that it doesn't introduce a new symbol.

H.



  
  Nothing we can do about this.
  
  Cheers,
  H.
  
  
   to work?
  
   hint: it does not
  
 -Original Message-
 From: bioc-devel-boun...@r-project.org
[mailto:bioc-devel-boun...@r-project.org] On Behalf Of Hervé Pagès
 Sent: Monday, May 05, 2014 1:28 PM
 To: Kasper Daniel Hansen; Michael Lawrence
 Cc: Johnston, Jeffrey; ttri...@usc.edu;
bioc-devel@r-project.org; bioconduc...@r-project.org
 Subject: Re: [Bioc-devel] [BioC] granges() method for
GenomicRanges objects akin to ranges()...
 
 Hi,
 
 I have no problem using granges() for that. Just to clarify:
(a) it would propagate the names()
(b) it would drop the metadata()
(c) the mcols() would propagate only if 'use.mcols=TRUE' was
specified ('use.mcols' is FALSE by default)
(d) it would return a GRanges *instance* i.e. input object
'x'
would be downgraded to GRanges if it extends GRanges
 
 @Kasper: granges() on SummarizedExperiment ignores the
'use.mcols'
 arg and always propagates the mcols. Alternatively you can use
rowData()
 which also propagates the mcols. granges() is actually just an
alias
 for rowData() on SummarizedExperiment objects.
 
 H.
 
 
 On 05/05/2014 10:31 AM, Kasper Daniel Hansen wrote:
  I agree with Michael on this.
 
  I can see why, in some usage cases, granges() is convenient
to have with
  use.mcols=FALSE (which seems to have been added in the
latest release).
But in my usage of granges(), where I call granges() on
objects like
  SummarizedExperiments and friends, I have been expecting
granges() to give
  me the GRange component of the object.  Not a crippled
version of the
  GRange component.
 
  This is - to me - very counter intuitive and I wish I had
seen this
  earlier.  It is particular frustrating that this default is
part of the
  generic.
 
  Best,
  Kasper
 
 
  On Mon, May 5, 2014 at 12:11 PM, Michael Lawrence
lawrence.mich...@gene.com
  wrote:
 
  In my opinion, granges() is not very clear as to the
intent. The mcols are
  part of the GRanges, so why would calling granges() drop
them? I think we
  want something similar to unclass(), unname(), etc. This
why I suggested
  dropmcols().
 
 
 
 
  On Mon, May 5, 2014 at 8:17 AM, Tim Triche, Jr.
tim.tri...@gmail.com
  wrote:
 
  That's exactly what I was after -- the generic is already
defined, so why
  not use it?
 
  --t
 
  On May 5, 2014, at 7:42 AM, Julian Gehring
julian.gehr...@embl.de
  wrote:
 
  Hi,
 
  On 05.05.2014 16:22, Martin Morgan wrote:
  generalize as setMcols, like setNames? setMcols(x, NULL)
 
  How about Tim's original suggestion, to add a 'granges'
method that
  works on a 'GRanges' input?  The current definition
 
  granges(x, use.mcols=FALSE, ...)
 
  seem suited for this.
 
  Best wishes
  Julian
 
 
   [[alternative HTML version deleted]]
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
 
  [[alternative HTML version deleted]]
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] [BioC] granges() method for GenomicRanges objects akin to ranges()...

2014-07-08 Thread Hervé Pagès

On 07/08/2014 11:29 AM, Michael Lawrence wrote:

On Tue, Jul 8, 2014 at 10:36 AM, Julian Gehring julian.gehr...@embl.de
wrote:


Hi Herve,


  2) A 'dropMcols' or 'dropmcols' method with signature 'GRanges' that is

a wrapper for
mcols(x) - NULL



How about setMcols(), which is more general than dropmcols()?



Do you mean a function like:

   setMcols - function(x, value = NULL) {
 mcols(x) = value
 return(x)
   }

I'd be fine with this.  However, some argued before that setting to NULL
may be counterintuitive for non-advanced users.



Probably best to have both setMcols and dropMcols.


OK. Let's go for both.

Thanks,
H.






Best wishes
Julian


___
Bioconductor mailing list
bioconduc...@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.
science.biology.informatics.conductor



[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] makeTranscriptDbFrom... AnnotationHub

2014-07-08 Thread Hervé Pagès

Hi Michael,

On 07/08/2014 12:11 PM, Michael Love wrote:

The recent TranscriptDb thread reminded me of a question: are there
plans (or am I missing the function) to easily get a TranscriptDb out
of the AnnotationHub objects? It would be great to have a preprocessed
Ensembl txdb like we have for UCSC.


I think the 1st thing we should do is have a 
makeTranscriptDbFromGRanges() function. It should not be too hard

because we already have the code :) Marc wrote it. But it's currently
part of the makeTranscriptDbFromGFF() function. Roughly speaking this
function does 2 things: (1) import the GFF or GTF file as a GRanges
object, then (2) turn that GRanges object into a TranscriptDb object.
So we should move the code that does (2) into a separate function,
the makeTranscriptDbFromGRanges() function, and have
makeTranscriptDbFromGFF() call it internally.

Then you could call makeTranscriptDbFromGRanges() on any of these
GFF- or GTF-based GRanges objects you get from AnnotationHub.

We'll work on this soon and announce here when it becomes available.

Cheers,
H.




ah - AnnotationHub()
gr - 
ah$ensembl.release.73.gtf.homo_sapiens.Homo_sapiens.GRCh37.73.gtf_0.0.1.RData
gr

GRanges with 2268089 ranges and 12 metadata columns:
 seqnames ranges strand   | source
Rle  IRanges  Rle   |   factor
 [1]1 [11869, 12227]  +   |   processed_transcript
 [2]1 [12613, 12721]  +   |   processed_transcript
 [3]1 [13221, 14409]  +   |   processed_transcript
 [4]1 [11872, 12227]  +   | unprocessed_pseudogene
 [5]1 [12613, 12721]  +   | unprocessed_pseudogene
 ...  ......... ......
   [2268085]   MT [14747, 15887]  +   | protein_coding
   [2268086]   MT [14747, 15887]  +   | protein_coding
   [2268087]   MT [14747, 14749]  +   | protein_coding
   [2268088]   MT [15888, 15953]  +   |Mt_tRNA
   [2268089]   MT [15956, 16023]  -   |Mt_tRNA
type score phase gene_id   transcript_id
factor numeric integer character character
 [1]exon  NA  NA ENSG0223972 ENST0456328
 [2]exon  NA  NA ENSG0223972 ENST0456328
 [3]exon  NA  NA ENSG0223972 ENST0456328
 [4]exon  NA  NA ENSG0223972 ENST0515242
 [5]exon  NA  NA ENSG0223972 ENST0515242
 ... ...   ...   ... ... ...
   [2268085]exon  NA  NA ENSG0198727 ENST0361789
   [2268086] CDS  NA 0 ENSG0198727 ENST0361789
   [2268087] start_codon  NA 0 ENSG0198727 ENST0361789
   [2268088]exon  NA  NA ENSG0210195 ENST0387460
   [2268089]exon  NA  NA ENSG0210196 ENST0387461
 exon_number   gene_name   gene_biotype transcript_name
   numeric charactercharacter character
 [1]   1 DDX11L1 pseudogene DDX11L1-002
 [2]   2 DDX11L1 pseudogene DDX11L1-002
 [3]   3 DDX11L1 pseudogene DDX11L1-002
 [4]   1 DDX11L1 pseudogene DDX11L1-201
 [5]   2 DDX11L1 pseudogene DDX11L1-201
 ... ... ...... ...
   [2268085]   1  MT-CYB protein_coding  MT-CYB-201
   [2268086]   1  MT-CYB protein_coding  MT-CYB-201
   [2268087]   1  MT-CYB protein_coding  MT-CYB-201
   [2268088]   1   MT-TTMt_tRNA   MT-TT-201
   [2268089]   1   MT-TPMt_tRNA   MT-TP-201
 exon_id  protein_id
 character character
 [1] ENSE2234944NA
 [2] ENSE3582793NA
 [3] ENSE2312635NA
 [4] ENSE2234632NA
 [5] ENSE3608237NA
 ... ... ...
   [2268085] ENSE1436074NA
   [2268086]NA ENSP0354554
   [2268087]NANA
   [2268088] ENSE1544475NA
   [2268089] ENSE1544473NA
   ---
   seqlengths:
  1   2 ...  MT
 NA  NA ...  NA

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O

<    4   5   6   7   8   9   10   11   12   13   >