Re: [Bioc-devel] [BioC] GTF file error when using easyRNAseq
Took that thread to the devel list; it just feels more appropriate given the content.

I already have that on my TODO list :-). This is not up-to-date, i.e. I haven’t done the comparison in ~2 years, but last time I did, genomeIntervals attribute parsing was faster than the rtracklayer equivalent. I suppose that’s because it is already implemented in C in genomeIntervals. As I said, I don’t have any actual comparative numbers, but you might still want to have a look at the genomeIntervals code. As I don’t think genomeIntervals gets as much exposure as rtracklayer does, many more people would benefit from an equivalent rtracklayer implementation.

If you’re interested, I could do a performance comparison - based on my usual use case - between both packages.

Nico

---
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310
Email: nicolas.delho...@embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---

On 15 Nov 2013, at 18:58, Michael Lawrence lawrence.mich...@gene.com wrote:

It might be worth taking a look at rtracklayer and the TranscriptDb stuff in GenomicFeatures. It could save you time, and if you notice any deficiencies in rtracklayer, it would help me. For example, if the attribute parsing is a bottleneck, I can push it down to C.

Michael

On Fri, Nov 15, 2013 at 8:23 AM, Nicolas Delhomme delho...@embl.de wrote:

Hej Michael,

Good question really. I have a number of reasons for this:

1) I’ve been using the genomeIntervals readGff3 function for that - for years now - and I’ve always been satisfied by its performance, especially when parsing the gff/gtf ninth column. The parseGffAttribute and getGffAttribute functions are extremely convenient. I honestly haven’t checked whether there has been any recent development in rtracklayer / GenomicFeatures similar to these functions. If there has not, I think they would be a great addition to either package.

2) As you might guess, it’s essentially historical: back when I started that package in 2009, there was not today’s fantastic set of packages.

3) As you painfully know, there are about as many gff formats as there are gff files, and because my package is a pipeline I really want to make sure that its output is consistent; hence I have strict requirements with regard to the gff/gtf format I accept. This means that time and again I have to make slight adjustments, but I prefer that over outputting garbage.

4) RNA-Seq analyses are filled with pitfalls, hence I think it is essential that users understand the data formats they handle and what these analyses are actually about. I don’t want them to use my package as they would use a black box.

5) It’s educational. There’s a vignette that describes how to parse and convert gff/gtf annotation into the minimal gff/gtf formatted file that would suit my package.

Well, I suppose it’s more than you asked for, but here are my reasons ;-) You’re welcome to comment, and I’d be happy to look again at rtracklayer (been through GenomicFeatures recently and I like it a lot) if you would advise me to.

Have a nice WE,

Cheers,

Nico

On 15 Nov 2013, at 12:44, Michael Lawrence lawrence.mich...@gene.com wrote:

Why not use rtracklayer / GenomicFeatures for parsing GTF? That format is tough; no reason for everyone to take it on by themselves.

On Fri, Nov 15, 2013 at 2:40 AM, Nicolas Delhomme delho...@embl.de wrote:

Hej Natalia!

There were a number of lines in that particular gtf that violated the assumptions I had about EnsEMBL gtf. Not all the fields in the attributes' column were always set, and one of the gene names had a space character in it. I’ve made the parsing of gtf file annotation more flexible/lenient, and that should resolve the particular issue you had. The changes should propagate to Bioc in ~2 days with easyRNASeq version 1.8.2.

Rather than using the geneModel, whose implementation is old and has gotten slow because of changes in the underlying architecture, I prefer an approach where I 1) filter the gtf / gff annotation file for only those lines I’m interested in (e.g. of type exon, mRNA and gene for a gff file) and 2) collapse every exon of a gene into what I now call a “synthetic transcript”. The reason for changing the naming from geneModel to synthetic transcript is that “gene model” has a different meaning depending on the
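The collapsing in step 2) above can be sketched in base R. This is a minimal illustration under stated assumptions, not easyRNASeq's implementation: the exon lines are assumed to sit in a data.frame with hypothetical columns gene_id, type, start and end (1-based, closed coordinates), each gene lies on a single chromosome and strand, and overlapping or touching exons of a gene are merged so that every genomic position is counted once. Within Bioconductor the same idea would more likely be expressed with the reduce() method for a GRangesList of exons split by gene.

```r
# Minimal sketch, not easyRNASeq's implementation: merge all exons of each
# gene into a "synthetic transcript". Assumes a data.frame with hypothetical
# columns gene_id, start, end (1-based, closed coordinates), with every gene
# on a single chromosome and strand.
synthetic_transcripts <- function(exons) {
  per_gene <- lapply(split(exons, exons$gene_id, drop = TRUE), function(g) {
    g <- g[order(g$start, g$end), ]
    starts <- g$start[1]
    ends <- g$end[1]
    for (k in seq_len(nrow(g))[-1]) {
      last <- length(ends)
      if (g$start[k] <= ends[last] + 1) {
        # Overlapping or touching the current block: extend it if needed
        ends[last] <- max(ends[last], g$end[k])
      } else {
        # A gap (intron): open a new block
        starts <- c(starts, g$start[k])
        ends <- c(ends, g$end[k])
      }
    }
    data.frame(gene_id = g$gene_id[1], start = starts, end = ends,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, per_gene)
  rownames(out) <- NULL
  out
}

# Step 1) filter annotation rows to exons, step 2) collapse them per gene
gtf <- data.frame(gene_id = c("g1", "g1", "g1", "g1", "g2"),
                  type    = c("gene", "exon", "exon", "exon", "exon"),
                  start   = c(100, 100, 150, 400, 10),
                  end     = c(500, 200, 250, 500, 20),
                  stringsAsFactors = FALSE)
synthetic_transcripts(gtf[gtf$type == "exon", ])
```

Here gene g1 ends up with two blocks (100-250 and 400-500) and g2 with one. Merging touching exons mirrors the default behaviour of reduce(); a stricter rule would drop the "+ 1".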
Re: [Bioc-devel] [BioC] GTF file error when using easyRNAseq
Doesn't look like genomeIntervals has any C code (?), so a performance comparison would be interesting. rtracklayer jumps through all sorts of hoops to handle obscure things like URL encoding in GFF3. The code in genomeIntervals seems more streamlined.
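Michael's point about URL encoding in GFF3 can be made concrete. In GFF3 the ninth column holds tag=value pairs separated by ";", multiple values are separated by ",", and reserved characters inside a value are percent-encoded (for example %3B for ";"). The following is a minimal base-R sketch with a hypothetical helper name, not the code of rtracklayer or genomeIntervals: it splits on the literal separators first and only then decodes each value with utils::URLdecode(), so that an encoded separator survives as part of its value.

```r
# Minimal sketch (hypothetical helper, not rtracklayer's or genomeIntervals'
# code): parse one GFF3 ninth-column string into a named list of values.
# Assumes GFF3 conventions: tag=value pairs separated by ";", multiple values
# separated by ",", reserved characters percent-encoded (e.g. %3B for ";").
parse_gff3_attributes <- function(attr) {
  pairs <- trimws(strsplit(attr, ";", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]
  tags <- sub("=.*$", "", pairs)
  raw_values <- sub("^[^=]*=", "", pairs)
  # Split on literal "," first and decode afterwards, so an encoded
  # comma (%2C) or semicolon (%3B) stays inside its value
  values <- lapply(strsplit(raw_values, ",", fixed = TRUE), function(v)
    vapply(v, utils::URLdecode, character(1), USE.NAMES = FALSE))
  names(values) <- tags
  values
}

a <- parse_gff3_attributes("ID=gene1;Name=abc%3Bdef;Alias=x,y%2Cz")
a$Name   # "abc;def"
a$Alias  # "x" "y,z"
```

Decoding before splitting would be the tempting shortcut, but it would turn %3B and %2C back into separators and corrupt the values, which is one of the "hoops" a robust GFF3 parser has to jump through.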
Re: [Bioc-devel] [BioC] GTF file error when using easyRNAseq
On 11/15/2013 10:22 AM, Michael Lawrence wrote:
> Doesn't look like genomeIntervals has any C code (?), so a performance
> comparison would be interesting. rtracklayer jumps through all sorts of
> hoops to handle obscure things like URL encoding in GFF3. The code in
> genomeIntervals seems more streamlined.

Wanted to mention, and it would be good to know if this was not helpful at all, that the Ensembl gtf files are available through AnnotationHub as GRanges objects:

  library(AnnotationHub)
  hub = AnnotationHub()
  hub$ensembl.release.73.<tab>
  hub$ensembl.release.73.fasta. ... [378]
  hub$ensembl.release.73.gtf.   ... [63]
  xx = hub$ensembl.release.73.gtf.gallus_gallus.Gallus_gallus.Galgal4.73.gtf_0.0.1.RData
  xx
  GRanges with 381368 ranges and 12 metadata columns:
          seqnames       ranges strand |         source     type
             <Rle>    <IRanges>  <Rle> |       <factor> <factor>
      [1]        1 [1735, 2449]      + | protein_coding     exon
      [2]        1 [2379, 2449]      + | protein_coding      CDS
              score     phase        gene_id  transcript_id
          <numeric> <integer>    <character>    <character>
      [1]        NA        NA ENSGALG0009771 ENSGALT0015891
      [2]        NA         0 ENSGALG0009771 ENSGALT0015891
          exon_number   gene_biotype        exon_id     protein_id
            <numeric>    <character>    <character>    <character>
      [1]           1 protein_coding ENSGALE0301221             NA
      [2]           1 protein_coding             NA ENSGALP0015874
            gene_name transcript_name
          <character>     <character>
      [1]          NA              NA
      [2]          NA              NA
    [ reached getOption("max.print") -- omitted 9 rows ]
    ---
    seqlengths:
                   1              2 ... AADN03010940.1
                  NA             NA ...             NA

Martin
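The metadata columns of the GRanges above (gene_id, transcript_id, gene_name and so on) come from the GTF ninth column, which uses key "value"; pairs rather than the GFF3 key=value form. For anyone parsing that column by hand, here is a minimal, lenient base-R sketch. It is a hypothetical helper, not genomeIntervals' parseGffAttribute, and it tolerates the two problems reported earlier in the thread: attributes missing from some lines become NA, and quoted values containing spaces are kept intact. It assumes no ";" appears inside a quoted value.

```r
# Minimal lenient sketch (hypothetical helper, not genomeIntervals'
# parseGffAttribute): extract selected keys from GTF ninth-column strings of
# the form  key "value"; key "value";  into a character matrix.
# Keys missing from a line become NA; quoted values may contain spaces.
# Assumes no ";" occurs inside a quoted value.
parse_gtf_attributes <- function(attr, keys) {
  out <- matrix(NA_character_, nrow = length(attr), ncol = length(keys),
                dimnames = list(NULL, keys))
  fields <- strsplit(attr, ";", fixed = TRUE)
  for (i in seq_along(fields)) {
    f <- trimws(fields[[i]])
    f <- f[nzchar(f)]
    # Key = text before the first space; value = everything after it
    key <- sub(" .*$", "", f)
    value <- ifelse(grepl(" ", f, fixed = TRUE),
                    sub("^[^ ]+ +", "", f), NA_character_)
    # Strip the surrounding double quotes, if any
    value <- gsub('^"|"$', "", value)
    hit <- match(keys, key)
    out[i, !is.na(hit)] <- value[hit[!is.na(hit)]]
  }
  out
}

x <- c('gene_id "G1"; transcript_id "T1"; gene_name "my gene";',
       'gene_id "G2"; transcript_id "T2";')
parse_gtf_attributes(x, c("gene_id", "gene_name"))
```

Splitting each field at its first space only, instead of on every space, is what keeps a value such as "my gene" in one piece; the second line simply gets NA for gene_name instead of raising an error.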
[Rd] Trouble running Rtools31 on Wine
Hi

An attempt to use R and Rtools in Wine fails; see the bug report to Wine:

http://bugs.winehq.org/show_bug.cgi?id=34865

The people there say that Rtools uses an outdated Cygwin DLL with a custom patch. Is there any chance we can upgrade our Cygwin DLL to a supported upstream version? Thanks.

Cheers

Kirill

R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R CMD SHLIB error bad value (core2) for -mtune= switch
I am getting the following error. Please help if this problem has been resolved:

  $ make
        0 [main] make 10092 stdio_init: couldn't make stderr distinct from stdout
  R CMD SHLIB src/C/util.c src/C/factor_model_util.c src/C/pagerank.c src/C/hierarchical.c src/C/factor_model_multicontext.c src/C/factor_model_util2.cpp -o lib/c_funcs.so
  Segmentation fault

Pranav Waila
pranav.wa...@gmail.com
Research Scholar
DST - Centre for Interdisciplinary Mathematical Sciences
Faculty of Science
Banaras Hindu University, Varanasi-221005
--
View this message in context: http://r.789695.n4.nabble.com/R-CMD-SHLIB-error-bad-value-core2-for-mtune-switch-tp4645928p4680510.html
Sent from the R devel mailing list archive at Nabble.com.
Re: [Rd] R CMD SHLIB error bad value (core2) for -mtune= switch
No information has been given about where this error occurred. What were you doing?

Simon
Re: [Rd] Trouble running Rtools31 on Wine
On 13-11-13 5:38 AM, Kirill Müller wrote:
> Hi
>
> An attempt to use R and Rtools in Wine fails, see the bug report to Wine:
> http://bugs.winehq.org/show_bug.cgi?id=34865
>
> The people there say that Rtools uses an outdated Cygwin DLL with a custom
> patch. Is there any chance we can upgrade our Cygwin DLL to a supported
> upstream version? Thanks.

Rtools doesn't use any special Cygwin DLLs. The one in the current distribution is rather old, so if it doesn't work in Wine, just go to the Cygwin site and install a newer Cygwin version. (Not sure if this is easy in Wine, but you could do it on a Windows machine and copy the files over.)

Duncan Murdoch