Re: [Bioc-devel] [BioC] GTF file error when using easyRNAseq

2013-11-16 Thread Nicolas Delhomme
Hej Martin!

That is indeed extremely useful. I’ll add that to my vignette as a source for 
annotation. One thing that comes immediately to mind is to have AnnotationHub 
and GenomicFeatures interact. I could imagine it would be useful to have e.g. a 
makeTranscriptDbFromAnnotationHub function. I admit I haven’t thought about this 
in detail, but I already can think of use cases where such a function would 
come in handy.

Nico

---
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delho...@embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---





On 15 Nov 2013, at 20:13, Martin Morgan mtmor...@fhcrc.org wrote:

 On 11/15/2013 10:22 AM, Michael Lawrence wrote:
 Doesn't look like genomeIntervals has any C code (?), so a performance
 comparison would be interesting. rtracklayer jumps through all sorts of
 hoops to handle obscure things like URL encoding in GFF3. The code in
 genomeIntervals seems more streamlined.
 
 Wanted to mention, and it would be good to know if this was not helpful at 
 all, that the Ensembl gtf files are available through AnnotationHub as 
 GRanges objects
 
  > library(AnnotationHub)
  > hub = AnnotationHub()
  > hub$ensembl.release.73.<tab>
  hub$ensembl.release.73.fasta. ... [378]
  hub$ensembl.release.73.gtf. ... [63]
  > xx = hub$ensembl.release.73.gtf.gallus_gallus.Gallus_gallus.Galgal4.73.gtf_0.0.1.RData
  > xx
  GRanges with 381368 ranges and 12 metadata columns:
        seqnames       ranges strand   |         source     type
           <Rle>    <IRanges>  <Rle>   |       <factor> <factor>
    [1]        1 [1735, 2449]      +   | protein_coding     exon
    [2]        1 [2379, 2449]      +   | protein_coding      CDS
            score     phase        gene_id  transcript_id
        <numeric> <integer>    <character>    <character>
    [1]        NA        NA ENSGALG0009771 ENSGALT0015891
    [2]        NA         0 ENSGALG0009771 ENSGALT0015891
        exon_number   gene_biotype        exon_id     protein_id
          <numeric>    <character>    <character>    <character>
    [1]           1 protein_coding ENSGALE0301221             NA
    [2]           1 protein_coding             NA ENSGALP0015874
          gene_name transcript_name
        <character>     <character>
    [1]          NA              NA
    [2]          NA              NA
  [ reached getOption("max.print") -- omitted 9 rows ]
  ---
  seqlengths:
                1              2 ... AADN03010940.1
               NA             NA ...             NA
 
 Martin
 
 
 
 
 
 
 
 On Fri, Nov 15, 2013 at 10:14 AM, Nicolas Delhomme delho...@embl.de wrote:
 
 Took that thread to the devel list, just feels more appropriate with
 regards to the content.
 
 I already have that on my TODO list :-). This is not up-to-date, i.e. I
 haven’t done the comparison in ~2 years, but last time I did,
 genomeIntervals attribute parsing was faster than the rtracklayer
 equivalent. I suppose that’s because it is already implemented in C in
 genomeIntervals. As said, I don’t have any actual comparative numbers;
 still, you might want to have a look at the genomeIntervals code. As I
 don’t think that genomeIntervals gets as much exposure as rtracklayer
 does, many more people would benefit from an equivalent rtracklayer
 implementation. If you’re interested, I could do a performance comparison
 - based on my usual use case - between both packages.
 
 Nico
 
 
 
 
 
 
 On 15 Nov 2013, at 18:58, Michael Lawrence lawrence.mich...@gene.com
 wrote:
 
 It might be worth taking a look at rtracklayer and the TranscriptDb
 stuff in GenomicFeatures. It could save you time, and if you notice any
 deficiencies in rtracklayer, it would help me. For example, if the
 attribute parsing is a bottleneck, I can push it down to C.
 
 Michael
 
 On Fri, Nov 15, 2013 at 8:23 AM, Nicolas Delhomme delho...@embl.de
 wrote:
 Hej Michael,
 
 Good question really. I have a number of reasons for this:
 
 1) I’ve been using the genomeIntervals readGff3 function for that - for
 years now - and I’ve always been satisfied by its performance, especially
 when parsing the gff/gtf ninth column. The parseGffAttribute and
 getGffAttribute functions are extremely convenient. I honestly haven’t
 checked if there was any recent development in rtracklayer /
 GenomicFeatures similar to 

Re: [Rd] Linking to native routines in other packages

2013-11-16 Thread Romain Francois

On 16/11/2013 at 11:02, Romain Francois wrote:

Hello,

I'm currently working on making Rcpp use the feature described here more:
http://cran.r-project.org/doc/manuals/R-exts.html#Linking-to-native-routines-in-other-packages


To give more context, Rcpp has for a long time built what we called the
Rcpp user library, i.e. a library we could link against using the
linker. We were then producing the appropriate linker flags with
Rcpp:::LdFlags(), ...

Now, I'm moving away from this and the intention is that a package using
Rcpp would only have to use

LinkingTo: Rcpp

This sets the -I flag as before to find headers from Rcpp, but this also
now takes advantage of what is described in writing R extensions:
http://cran.r-project.org/doc/manuals/R-exts.html#Linking-to-native-routines-in-other-packages


I'm doing this in a way that, when we are not compiling Rcpp, for
example the type2name function is defined in Rcpp's headers as an
inline function that grabs the registered function from Rcpp.

inline const char* type2name(SEXP x){
    typedef const char* (*Fun)(SEXP) ;
    // look up the routine that Rcpp registered under the name "type2name"
    static Fun fun = (Fun) R_GetCCallable( "Rcpp", "type2name") ;
    return fun(x) ;
}


This works great.


Now for the question. The documentation says:

It must also specify ‘Imports’ or ‘Depends’ of those packages, for they
have to be loaded prior to this one (so the path to their compiled code
has been registered).

Indeed if I don't have Depends or Imports, the R_init_Rcpp is not
called, and so the function is not registered.

But now if I do Depends: Rcpp or Imports: Rcpp for the sole purpose of
this LinkingTo mechanism, I'm getting

* checking dependencies in R code ... NOTE
Namespace in Imports field not imported from: ‘Rcpp’
   All declared Imports should be used.
See the information on DESCRIPTION files in the chapter ‘Creating R
packages’ of the ‘Writing R Extensions’ manual.

It would be simple enough to require of our users that they have
Imports: Rcpp and import(Rcpp) or less in their NAMESPACE, but I was
wondering if we could make this more transparent, i.e. having LinkingTo:
Rcpp mean loading Rcpp.
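
As a rough illustration of the setup described above, the following R sketch writes the two metadata files for a hypothetical client package called "mypkg" (the package name, version and title are assumptions made up for this example). It only shows which fields would have to agree; it does not build or check a package.

```r
# Scratch directory for a hypothetical client package "mypkg".
pkg_dir <- file.path(tempdir(), "mypkg")
dir.create(pkg_dir, showWarnings = FALSE)

# DESCRIPTION: LinkingTo gives access to Rcpp's headers at compile time,
# Imports makes sure Rcpp is loaded (and its routines registered) first.
writeLines(c(
  "Package: mypkg",
  "Version: 0.1.0",
  "Title: Hypothetical Client of Rcpp",
  "Imports: Rcpp",
  "LinkingTo: Rcpp"
), file.path(pkg_dir, "DESCRIPTION"))

# NAMESPACE: importing from Rcpp is what avoids the
# "Namespace in Imports field not imported from" NOTE.
# A narrower importFrom() of a single symbol would be the "or less" variant.
writeLines("import(Rcpp)", file.path(pkg_dir, "NAMESPACE"))

# Read the files back to confirm what was written.
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
ns <- parse(file.path(pkg_dir, "NAMESPACE"))
desc[, c("Imports", "LinkingTo")]
```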

I'm also curious about this sentence from the doc:

In the future R may provide some automated tools to simplify exporting
larger numbers of routines.

Is there a draft of something ?

Romain



For details on how we will be using LinkingTo, please see:

https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/routines.h

where, depending on the context:
- when we are compiling Rcpp, we just have a declaration of the
function, which is then defined in any of our .cpp files.
- when we are using Rcpp from another package, we are retrieving the
function

https://github.com/RcppCore/Rcpp/blob/master/src/Rcpp_init.cpp

where the functions are registered with the RCPP_REGISTER macro.

This way of using it moves all the logic to the package exposing its
functions. I find this nicer to use than other tactics I've seen, such
as the sub technique from Matrix ...


Typo alert. Of course here I meant the stub technique.

Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] serialization for external pointers

2013-11-16 Thread Romain Francois

Hello,

Is there any recipe to handle serialization / deserialization of 
external pointers?


I'm thinking about something similar in spirit to the way we handle 
finalization of external pointers.


Currently, if we create an external pointer, save the session, quit R, 
then load the session, we get a null pointer.


One way I'm thinking of is to have an environment in the protected 
part of the xp, then have an active binding there, since apparently 
active bindings:

 - have their getter called during serialization
 - lose their activeness when reloaded:

$ R
[...]
> f <- local( {
+ x <- 1
+ function(v) {
+    if (missing(v))
+        cat("get\n")
+    else {
+        cat("set\n")
+        x <<- v
+    }
+    x
+ }
+ })
> makeActiveBinding("fred", f, .GlobalEnv)
> bindingIsActive("fred", .GlobalEnv)
[1] TRUE

> q("yes")
get
get


romain@naxos /tmp $ R
[..]
> fred
[1] 1
> bindingIsActive("fred", .GlobalEnv)
[1] FALSE

Is this possible ? Is there any other hook to handle serialization, 
unserialization of external pointers ?


Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30



Re: [Rd] serialization for external pointers

2013-11-16 Thread Romain Francois

On 16/11/2013 at 14:30, Romain Francois wrote:

Hello,

Is there any recipe to handle serialization / deserialization of
external pointers?

I'm thinking about something similar in spirit to the way we handle
finalization of external pointers.

Currently, if we create an external pointer, save the session, quit R,
then load the session, we get a null pointer.

One way I'm thinking of is to have an environment in the protected
part of the xp, then have an active binding there, since apparently
active bindings:
  - have their getter called during serialization
  - lose their activeness when reloaded:


This will not work. Apparently the active feature is kept on other 
environments:


$ R
[...]
> f <- local( {
+ x <- 1
+ function(v) {
+    if (missing(v))
+        cat("get\n")
+    else {
+        cat("set\n")
+        x <<- v
+    }
+    x
+ }
+ })
> makeActiveBinding("fred", f, .GlobalEnv)
> bindingIsActive("fred", .GlobalEnv)
[1] TRUE

> e <- new.env()
> makeActiveBinding("fred", f, e)
> bindingIsActive("fred", e)
[1] TRUE

> q()
Save workspace image? [y/n/c]: y
get
get

Then:

$ R
[...]
> e
<environment: 0x104c56f78>
> bindingIsActive("fred", .GlobalEnv)
[1] FALSE
> bindingIsActive("fred", e)
[1] TRUE

Is this the expected behavior ?
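
The observation for the non-global environment should be reproducible without restarting R, on the assumption that round-tripping the environment through serialize()/unserialize() mirrors what happens to `e` when the workspace is saved and reloaded. A minimal sketch (the getter here is a simplified stand-in for `f` above):

```r
# An environment holding one active binding.
e <- new.env()
makeActiveBinding("fred", function(v) 1, e)

# Round-trip the whole environment through R's serialization.
e2 <- unserialize(serialize(e, connection = NULL))

# Compare the binding before and after the round trip.
before <- bindingIsActive("fred", e)
after  <- bindingIsActive("fred", e2)
c(before = before, after = after)
```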

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30



Re: [Rd] serialization for external pointers

2013-11-16 Thread Simon Urbanek
On Nov 16, 2013, at 8:30 AM, Romain Francois rom...@r-enthusiasts.com wrote:

 Hello,
 
 Is there any recipe to handle serialization / deserialization of external 
 pointers?
 
 I'm thinking about something similar in spirit to the way we handle 
 finalization of external pointers.
 

See refhook in serialize/unserialize.
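
A minimal sketch of the refhook mechanism, assuming a null external pointer created with new("externalptr") as a stand-in for one produced by compiled code. The names `write_hook` and `read_hook` and the key "demo-handle" are made up for illustration; a real package would encode enough information in the key to rebuild the native object.

```r
library(methods)

# Stand-in external pointer; a real one would come from compiled code.
xp <- new("externalptr")
obj <- list(handle = xp, n = 3L)

# Called for reference objects met while writing. Returning a character
# vector replaces the object in the stream; returning NULL means "not mine".
write_hook <- function(ref) {
  if (typeof(ref) == "externalptr") "demo-handle" else NULL
}

# Called with each character vector produced by write_hook; whatever it
# returns takes the place of the original reference object.
read_hook <- function(key) {
  paste0("restored:", key)
}

bytes <- serialize(obj, connection = NULL, refhook = write_hook)
back  <- unserialize(bytes, refhook = read_hook)
str(back)
```

Here the read hook returns a plain string only to make the substitution visible; in practice it would return a freshly created external pointer.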


 Currently, if we create an external pointer, save the session, quit R, then 
 load the session, we get a null pointer.
 

Saving the session is a different story than serialization, because it also 
involves loading the right code in addition to the data and you cannot use 
refhooks. I recall discussing this with Luke a few years ago, and the main 
problem is that R has no way of knowing what is needed to call the particular 
hooks, because the package is not even loaded, so it cannot install hooks. 
AFAIR the net result was that the proper way to do it is to handle the NULL 
pointer when you first access it, restoring the object based on additional 
info that you store in the object. E.g., in rJava we store the Java 
serialization of the object so it can be restored on first use in the new 
session.

One piece that is still missing is the ability to set a hook to update the saved 
object on save() - since we don’t necessarily want to add the extra information 
every time the object is created or the serialization may become stale over 
time. That would still be very useful to add …

Cheers,
Simon

