Re: [Bioc-devel] [BioC] GTF file error when using easyRNAseq
Hi Martin!

That is indeed extremely useful. I’ll add that to my vignette as a source for annotation. One thing that comes immediately to mind is to have AnnotationHub and GenomicFeatures interact. I could imagine it would be useful to have, e.g., a makeTranscriptDbFromAnnotationHub function. I admit I haven’t thought about this in detail, but I can already think of use cases where such a function would come in handy.

Nico

---
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310
Email: nicolas.delho...@embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---

On 15 Nov 2013, at 20:13, Martin Morgan mtmor...@fhcrc.org wrote:

On 11/15/2013 10:22 AM, Michael Lawrence wrote:

Doesn't look like genomeIntervals has any C code (?), so a performance comparison would be interesting. rtracklayer jumps through all sorts of hoops to handle obscure things like URL encoding in GFF3. The code in genomeIntervals seems more streamlined.

Wanted to mention (and it would be good to know if this was not helpful at all) that the Ensembl GTF files are available through AnnotationHub as GRanges objects:

    library(AnnotationHub)
    hub = AnnotationHub()
    hub$ensembl.release.73.<tab>
    hub$ensembl.release.73.fasta. ...
    [378] hub$ensembl.release.73.gtf. ...
    [63]
    xx = hub$ensembl.release.73.gtf.gallus_gallus.Gallus_gallus.Galgal4.73.gtf_0.0.1.RData
    xx
    GRanges with 381368 ranges and 12 metadata columns:
          seqnames       ranges strand |         source     type
             <Rle>    <IRanges>  <Rle> |       <factor> <factor>
      [1]        1 [1735, 2449]      + | protein_coding     exon
      [2]        1 [2379, 2449]      + | protein_coding      CDS
              score     phase        gene_id  transcript_id
          <numeric> <integer>    <character>    <character>
      [1]        NA        NA ENSGALG0009771 ENSGALT0015891
      [2]        NA         0 ENSGALG0009771 ENSGALT0015891
          exon_number   gene_biotype        exon_id     protein_id
            <numeric>    <character>    <character>    <character>
      [1]           1 protein_coding ENSGALE0301221             NA
      [2]           1 protein_coding             NA ENSGALP0015874
            gene_name transcript_name
          <character>     <character>
      [1]          NA              NA
      [2]          NA              NA
      [ reached getOption("max.print") -- omitted 9 rows ]
      ---
      seqlengths:
                    1              2 ... AADN03010940.1
                   NA             NA ...             NA

Martin

On Fri, Nov 15, 2013 at 10:14 AM, Nicolas Delhomme delho...@embl.de wrote:

Took that thread to the devel list; it just feels more appropriate with regard to the content.

I already have that on my TODO list :-). This is not up to date, i.e. I haven’t done the comparison in ~2 years, but the last time I did, genomeIntervals attribute parsing was faster than the rtracklayer equivalent. I suppose that’s because it is already implemented in C in genomeIntervals. As I said, I don’t have any actual comparative numbers, but you might still want to have a look at the genomeIntervals code. As I don’t think genomeIntervals gets as much exposure as rtracklayer does, many more people would benefit from an equivalent rtracklayer implementation. If you’re interested, I could do a performance comparison between the two packages, based on my usual use case.
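Since much of this exchange is about how fast the ninth (attribute) column of a GTF file can be parsed, here is a minimal standalone C++ sketch of that step, to make concrete what "pushing attribute parsing down to C" involves. It is not the genomeIntervals or rtracklayer code: `parse_gtf_attributes` is a hypothetical name, the sketch assumes GTF-style `key "value";` pairs, and it does not handle GFF3-style `key=value` pairs or their URL (percent) encoding.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: split a GTF ninth column such as
//   gene_id "ENSGALG0009771"; transcript_id "ENSGALT0015891"; exon_number 1;
// into key -> value pairs. Quotes around values are stripped; unquoted
// values are kept as-is. A key repeated on one line keeps its last value.
std::map<std::string, std::string> parse_gtf_attributes(const std::string& col) {
    std::map<std::string, std::string> out;
    const std::size_t n = col.size();
    std::size_t i = 0;
    auto is_space = [&](std::size_t k) {
        return std::isspace(static_cast<unsigned char>(col[k])) != 0;
    };
    while (i < n) {
        // Skip whitespace and ';' separators between pairs.
        while (i < n && (is_space(i) || col[i] == ';')) ++i;
        if (i >= n) break;
        // The key runs until the next whitespace or ';'.
        const std::size_t key_start = i;
        while (i < n && !is_space(i) && col[i] != ';') ++i;
        const std::string key = col.substr(key_start, i - key_start);
        while (i < n && is_space(i)) ++i;
        std::string value;
        if (i < n && col[i] == '"') {
            // Quoted value: read up to the closing quote (or end of input).
            std::size_t close = col.find('"', i + 1);
            if (close == std::string::npos) close = n;
            value = col.substr(i + 1, close - i - 1);
            i = close + 1;
        } else {
            // Unquoted value: read up to the next ';' and trim the right side.
            const std::size_t v_start = i;
            while (i < n && col[i] != ';') ++i;
            value = col.substr(v_start, i - v_start);
            while (!value.empty() &&
                   std::isspace(static_cast<unsigned char>(value.back()))) {
                value.pop_back();
            }
        }
        out[key] = value;
    }
    return out;
}
```

Fed a column such as `gene_id "ENSGALG0009771"; transcript_id "ENSGALT0015891"; exon_number 1;`, it returns three entries. A `std::map` keeps the sketch short; a parser feeding GRanges metadata columns would more likely append each key's value to a per-column vector in a single pass.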
Nico

On 15 Nov 2013, at 18:58, Michael Lawrence lawrence.mich...@gene.com wrote:

It might be worth taking a look at rtracklayer and the TranscriptDb stuff in GenomicFeatures. It could save you time, and if you notice any deficiencies in rtracklayer, it would help me. For example, if the attribute parsing is a bottleneck, I can push it down to C.

Michael

On Fri, Nov 15, 2013 at 8:23 AM, Nicolas Delhomme delho...@embl.de wrote:

Hi Michael,

Good question, really. I have a number of reasons for this:

1) I’ve been using the genomeIntervals readGff3 function for that for years now, and I’ve always been satisfied with its performance, especially when parsing the gff/gtf ninth column. The parseGffAttribute and getGffAttribute functions are extremely convenient. I honestly haven’t checked if there was any recent development in rtracklayer / GenomicFeatures similar to
Re: [Rd] Linking to native routines in other packages
On 16/11/2013 11:02, Romain Francois wrote:

Hello,

I'm currently working on making Rcpp make more use of the feature described here: http://cran.r-project.org/doc/manuals/R-exts.html#Linking-to-native-routines-in-other-packages

To give more context, Rcpp has for a long time built what we called the Rcpp user library, i.e. a library that packages could link against using the linker. We then produced the appropriate linker flags with Rcpp:::LdFlags(), ...

Now I'm moving away from this, and the intention is that a package using Rcpp would only have to use

    LinkingTo: Rcpp

This sets the -I flag as before to find headers from Rcpp, but it now also takes advantage of what is described in Writing R Extensions: http://cran.r-project.org/doc/manuals/R-exts.html#Linking-to-native-routines-in-other-packages

I'm doing this in such a way that, when we are not compiling Rcpp itself, the type2name function (for example) is defined in Rcpp's headers as an inline function that grabs the registered function from Rcpp:

    // R_GetCCallable() is declared in R_ext/Rdynload.h
    inline const char* type2name(SEXP x){
        typedef const char* (*Fun)(SEXP) ;
        // looked up once, then cached in the function-local static
        static Fun fun = (Fun) R_GetCCallable("Rcpp", "type2name") ;
        return fun(x) ;
    }

This works great.

Now for the question. The documentation says:

    It must also specify ‘Imports’ or ‘Depends’ of those packages, for they have to be loaded prior to this one (so the path to their compiled code has been registered).

Indeed, if I don't have Depends or Imports, R_init_Rcpp is not called, and so the function is not registered. But if I add Depends: Rcpp or Imports: Rcpp for the sole purpose of this LinkingTo mechanism, I get:

    * checking dependencies in R code ... NOTE
    Namespace in Imports field not imported from: ‘Rcpp’
      All declared Imports should be used.
    See the information on DESCRIPTION files in the chapter ‘Creating R packages’ of the ‘Writing R Extensions’ manual.
It would be simple enough to require our users to have Imports: Rcpp and import(Rcpp) (or a narrower import) in their NAMESPACE, but I was wondering whether we could make this more transparent, i.e. have LinkingTo: Rcpp imply loading Rcpp.

I'm also curious about this sentence from the documentation: "In the future R may provide some automated tools to simplify exporting larger numbers of routines." Is there a draft of something?

Romain

For details on how we will be using LinkingTo, please see:

https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/routines.h

where, depending on the context:
- when we are compiling Rcpp, we just have a declaration of the function, which is then defined in one of our .cpp files;
- when we are using Rcpp from another package, we retrieve the registered function.

https://github.com/RcppCore/Rcpp/blob/master/src/Rcpp_init.cpp

is where the functions are registered with the RCPP_REGISTER macro.

This way of using it moves all the logic to the package exposing its functions. I find this nicer to use than other tactics I've seen, such as the sub technique from Matrix ...

Typo alert. Of course, here I meant the stub technique.

Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
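The LinkingTo mechanism described above amounts to a registry keyed by (package, routine name): the exporting package registers function pointers in its init routine, and client code looks a pointer up once and caches it. The standalone C++ sketch below mimics that flow without R, so it can be compiled on its own. In real packages the calls are `R_RegisterCCallable()` and `R_GetCCallable()` from `R_ext/Rdynload.h`; `register_callable`, `get_callable`, `init_mypkg`, `add_one`, and the `std::map` registry here are hypothetical stand-ins. It also illustrates why the exporting package must be loaded first: the lookup only succeeds once its init routine has run.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Generic function-pointer type, playing the role of R's DL_FUNC.
using generic_fn = void (*)();
using registry_key = std::pair<std::string, std::string>;  // (package, routine)

// Hypothetical process-wide registry standing in for R's internal table.
std::map<registry_key, generic_fn>& callable_registry() {
    static std::map<registry_key, generic_fn> registry;
    return registry;
}

// Stand-in for R_RegisterCCallable(package, name, fptr).
void register_callable(const std::string& pkg, const std::string& name, generic_fn fn) {
    callable_registry()[registry_key{pkg, name}] = fn;
}

// Stand-in for R_GetCCallable(package, name). Returns nullptr when nothing
// has been registered; the real R function signals an error instead.
generic_fn get_callable(const std::string& pkg, const std::string& name) {
    auto& reg = callable_registry();
    auto it = reg.find(registry_key{pkg, name});
    return it == reg.end() ? nullptr : it->second;
}

// --- exporting package: compiled into "mypkg" itself ---
int add_one_impl(int x) { return x + 1; }

// Analogue of R_init_mypkg(): runs only when the package is loaded.
void init_mypkg() {
    register_callable("mypkg", "add_one", reinterpret_cast<generic_fn>(&add_one_impl));
}

// --- client package: header-only inline wrapper, like Rcpp's type2name ---
inline int add_one(int x) {
    using Fun = int (*)(int);
    // Looked up once, on the first call, and cached in a function-local static.
    static Fun fun = [] {
        generic_fn raw = get_callable("mypkg", "add_one");
        assert(raw != nullptr && "mypkg must be loaded (initialised) first");
        return reinterpret_cast<Fun>(raw);
    }();
    return fun(x);
}
```

Calling `add_one(41)` after `init_mypkg()` returns 42; calling it before would trip the assertion, which is the standalone counterpart of R_init_Rcpp never having run because Rcpp was neither imported nor depended upon.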
[Rd] serialization for external pointers
Hello,

Is there any recipe to handle serialization/deserialization of external pointers? I'm thinking about something similar in spirit to the way we handle finalization of external pointers.

Currently, if we create an external pointer, save the session, quit R, and then load the session, we get a null pointer.

One way I'm thinking of is to have an environment in the protected part of the external pointer, with an active binding in it, since apparently active bindings:
- are evaluated ("get") during serialization;
- lose their activeness when reloaded:

    $ R
    [...]
    > f <- local( {
    +     x <- 1
    +     function(v) {
    +         if (missing(v))
    +             cat("get\n")
    +         else {
    +             cat("set\n")
    +             x <<- v
    +         }
    +         x
    +     }
    + })
    > makeActiveBinding("fred", f, .GlobalEnv)
    > bindingIsActive("fred", .GlobalEnv)
    [1] TRUE
    > q("yes")
    get
    get
    romain@naxos /tmp $ R
    [...]
    > fred
    [1] 1
    > bindingIsActive("fred", .GlobalEnv)
    [1] FALSE

Is this possible? Is there any other hook to handle serialization/unserialization of external pointers?

Romain
Re: [Rd] serialization for external pointers
On 16/11/2013 14:30, Romain Francois wrote:

Hello,

Is there any recipe to handle serialization/deserialization of external pointers? I'm thinking about something similar in spirit to the way we handle finalization of external pointers.

Currently, if we create an external pointer, save the session, quit R, and then load the session, we get a null pointer.

One way I'm thinking of is to have an environment in the protected part of the external pointer, with an active binding in it, since apparently active bindings:
- are evaluated ("get") during serialization;
- lose their activeness when reloaded:

This will not work. Apparently the active feature is kept on other environments:

    $ R
    [...]
    > f <- local( {
    +     x <- 1
    +     function(v) {
    +         if (missing(v))
    +             cat("get\n")
    +         else {
    +             cat("set\n")
    +             x <<- v
    +         }
    +         x
    +     }
    + })
    > makeActiveBinding("fred", f, .GlobalEnv)
    > bindingIsActive("fred", .GlobalEnv)
    [1] TRUE
    > e <- new.env()
    > makeActiveBinding("fred", f, e)
    > bindingIsActive("fred", e)
    [1] TRUE
    > q()
    Save workspace image? [y/n/c]: y
    get
    get

Then:

    $ R
    [...]
    > e
    <environment: 0x104c56f78>
    > bindingIsActive("fred", .GlobalEnv)
    [1] FALSE
    > bindingIsActive("fred", e)
    [1] TRUE

Is this the expected behavior?
Re: [Rd] serialization for external pointers
On Nov 16, 2013, at 8:30 AM, Romain Francois rom...@r-enthusiasts.com wrote:

Hello,

Is there any recipe to handle serialization/deserialization of external pointers? I'm thinking about something similar in spirit to the way we handle finalization of external pointers.

See refhook in serialize/unserialize.

Currently, if we create an external pointer, save the session, quit R, and then load the session, we get a null pointer.

Saving the session is a different story from serialization, because it also involves loading the right code in addition to the data, and you cannot use refhooks. I recall discussing this with Luke a few years ago, and the main problem is that R has no way of knowing what is needed to call the particular hooks: the package is not even loaded, so it cannot install hooks. AFAIR the net result was that the proper way to do it is to handle the NULL-pointer case when you first access the pointer, restoring the object from additional info that you store in the object. E.g., in rJava we store the Java serialization of the object so it can be restored on first use in the new session. One piece that is still missing is the ability to set a hook to update the saved object on save(), since we don’t necessarily want to add the extra information every time the object is created, and a serialization taken at creation time may become stale. That would still be very useful to add …

Cheers,
Simon

One way I'm thinking of is to have an environment in the protected part of the external pointer, with an active binding in it, since apparently active bindings:
- are evaluated ("get") during serialization;
- lose their activeness when reloaded:

    $ R
    [...]
    > f <- local( {
    +     x <- 1
    +     function(v) {
    +         if (missing(v))
    +             cat("get\n")
    +         else {
    +             cat("set\n")
    +             x <<- v
    +         }
    +         x
    +     }
    + })
    > makeActiveBinding("fred", f, .GlobalEnv)
    > bindingIsActive("fred", .GlobalEnv)
    [1] TRUE
    > q("yes")
    get
    get
    romain@naxos /tmp $ R
    [...]
    > fred
    [1] 1
    > bindingIsActive("fred", .GlobalEnv)
    [1] FALSE

Is this possible?
Is there any other hook to handle serialization/unserialization of external pointers?

Romain
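Simon's suggestion, treating a NULL external pointer as the signal to rebuild the native object from extra state stored alongside it, can be sketched in standalone C++ as below. This is a hypothetical illustration, not rJava's code: `LazyHandle`, `Counter`, and the string-based `serialize`/`deserialize` pair are assumptions standing in for an R object that holds an external pointer plus some persistent representation of the native object (a Java serialization, in rJava's case). The `reloaded()` member only simulates what R does on save and reload, where the external pointer's address comes back as NULL while ordinary data survives.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical native object that the external pointer would wrap.
struct Counter {
    long value = 0;
    std::string serialize() const { return std::to_string(value); }
    static std::unique_ptr<Counter> deserialize(const std::string& s) {
        auto c = std::make_unique<Counter>();
        c->value = std::stol(s);
        return c;
    }
};

// Models an R object holding an external pointer (ptr_) plus a persistent
// copy of the native state (saved_) that survives a save/reload cycle.
class LazyHandle {
public:
    explicit LazyHandle(std::unique_ptr<Counter> obj) : ptr_(std::move(obj)) { snapshot(); }

    // Every access goes through get(): a null pointer means "fresh session",
    // so the native object is rebuilt from the stored serialization.
    Counter& get() {
        if (!ptr_) ptr_ = Counter::deserialize(saved_);
        return *ptr_;
    }

    // Refresh the persistent copy. In R this would ideally run from a hook
    // on save(), which the thread notes does not exist.
    void snapshot() {
        if (ptr_) saved_ = ptr_->serialize();
    }

    // Simulated save-and-reload: the persistent copy survives, the pointer does not.
    LazyHandle reloaded() const { return LazyHandle(saved_); }

private:
    explicit LazyHandle(std::string saved) : saved_(std::move(saved)) {}
    std::unique_ptr<Counter> ptr_;
    std::string saved_;
};
```

The staleness problem Simon mentions shows up directly here: if `snapshot()` is only called at creation time, changes made to the native object afterwards are lost on reload, so the persistent copy has to be refreshed at some point before the session is saved.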