Re: [Bioc-devel] [BioC] GTF file error when using easyRNAseq

2013-11-15 Thread Nicolas Delhomme
I took this thread to the devel list; it feels more appropriate given the 
content.

I already have that on my TODO list :-). My information is not up to date, 
i.e. I haven’t done the comparison in ~2 years, but last time I did, 
genomeIntervals attribute parsing was faster than the rtracklayer equivalent. 
I suppose that’s because it is already implemented in C in genomeIntervals. As 
I said, I don’t have any actual comparative numbers; still, you might want to 
have a look at the genomeIntervals code. As I don’t think that genomeIntervals 
gets as much exposure as rtracklayer does, many more people would benefit from 
an equivalent rtracklayer implementation. If you’re interested, I could do a 
performance comparison between both packages, based on my usual use case.

Nico

---
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delho...@embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---





On 15 Nov 2013, at 18:58, Michael Lawrence lawrence.mich...@gene.com wrote:

 It might be worth taking a look at rtracklayer and the TranscriptDb stuff in 
 GenomicFeatures. It could save you time, and if you notice any deficiencies 
 in rtracklayer, it would help me. For example, if the attribute parsing is a 
 bottleneck, I can push it down to C.
 
 Michael
 
 On Fri, Nov 15, 2013 at 8:23 AM, Nicolas Delhomme delho...@embl.de wrote:
 Hej Michael,
 
 Good question really. I have a number of reasons for this:
 
 1) I’ve been using the genomeIntervals readGff3 function for that, for years 
 now, and I’ve always been satisfied with its performance, especially when 
 parsing the gff/gtf ninth column. The parseGffAttribute and getGffAttribute 
 functions are extremely convenient. I honestly haven’t checked whether there 
 has been any recent development in rtracklayer / GenomicFeatures similar to 
 these functions. If there has not, I think they would be a great addition to 
 either package.
 
 2) As you might guess, it’s essentially historical: back when I started that 
 package in 2009, today’s fantastic set of packages did not exist.
 
 3) As you painfully know, there are about as many gff formats as there are gff 
 files, and because my package is a pipeline I really want to make sure that 
 its output is consistent; hence I have strict requirements with regard to 
 the gff/gtf format I accept. This means that, time and again, I have to make 
 slight adjustments, but I prefer that over outputting garbage.
 
 4) RNA-Seq analyses are filled with pitfalls, hence I think it is essential 
 that users understand the data formats they handle and actually what these 
 analyses are all about. I don’t want them to use my package as they would use 
 a black box.
 
 5) It’s educational. There’s a vignette that describes how to parse and 
 convert a gff/gtf annotation into the minimal gff/gtf formatted file that 
 suits my package.
 
 Well, I suppose it’s more than you asked for, but here are my reasons ;-) 
 You’re welcome to comment, and I’d be happy to look again at rtracklayer (I’ve 
 been through GenomicFeatures recently and I like it a lot) if you advise me 
 to.
 
 Have a nice WE,
 
 Cheers,
 
 Nico
 
 
 
 
 
 
 
 On 15 Nov 2013, at 12:44, Michael Lawrence lawrence.mich...@gene.com wrote:
 
  Why not use rtracklayer / GenomicFeatures for parsing GTF? That format is 
  tough; no reason for everyone to take it on by themselves.
 
 
 
 
  On Fri, Nov 15, 2013 at 2:40 AM, Nicolas Delhomme delho...@embl.de wrote:
  Hej Natalia!
 
  There were a number of lines in that particular gtf that violated the 
  assumptions I had about EnsEMBL gtf files. Not all the fields in the 
  attribute column were always set, and one of the gene names had a space 
  character in it. I’ve made the parsing of gtf annotation files more 
  flexible/lenient, and that should resolve the particular issue you had. The 
  changes should propagate to Bioc in ~2 days with easyRNASeq version 1.8.2.
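
To make the two problems above concrete (attribute fields that are not always set, and a gene name containing a space), here is a minimal base R sketch of a lenient parser for the ninth column. It is only an illustration under stated assumptions, not the code that went into easyRNASeq 1.8.2: the function name parseGtfAttributes and the sample attribute strings are made up for this example, and it assumes that values never contain a ";".

```r
# Hypothetical helper for illustration only; it is not part of easyRNASeq,
# genomeIntervals or rtracklayer.
# Parses GTF attribute strings (ninth column) into a character matrix with
# one row per input line and one column per requested key.
parseGtfAttributes <- function(attrs, keys) {
  # Split each attribute string into its 'key "value"' fields.
  # Assumption: values never contain a ";".
  fields <- strsplit(attrs, ";", fixed = TRUE)
  res <- vapply(fields, function(f) {
    f <- trimws(f)
    f <- f[nzchar(f)]
    # The key is everything before the first run of whitespace; the value is
    # the remainder, so values that contain spaces are kept whole.
    k <- sub("^(\\S+)\\s+.*$", "\\1", f)
    v <- sub("^\\S+\\s+", "", f)
    v <- gsub('^"|"$', "", v)
    # A key that is absent from a line yields NA rather than an error.
    unname(v[match(keys, k)])
  }, character(length(keys)))
  matrix(res, ncol = length(keys), byrow = TRUE,
         dimnames = list(NULL, keys))
}

# Made-up example lines: the first has a gene name with a space, the second
# has no gene_name field at all.
attrs <- c(
  'gene_id "G1"; transcript_id "T1"; gene_name "my gene";',
  'gene_id "G2"; transcript_id "T2";'
)
parsed <- parseGtfAttributes(attrs, c("gene_id", "gene_name"))
print(parsed)
```

Returning NA for a missing key keeps the result rectangular, so downstream code can decide explicitly whether to reject or keep such lines.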
 
  Rather than using the geneModel, whose implementation is old and has become 
  slow because of changes in the underlying architecture, I prefer an 
  approach where I
  1) filter the gtf / gff annotation file for only those lines I’m interested 
  in (e.g. of type exon, mRNA and gene for a gff file)
  2) collapse every exon of a gene into what I now call a “synthetic 
  transcript”. The reason for changing the naming from geneModel to synthetic 
  transcript is that “gene model” has a different meaning depending on the 

Re: [Bioc-devel] [BioC] GTF file error when using easyRNAseq

2013-11-15 Thread Michael Lawrence
Doesn't look like genomeIntervals has any C code (?), so a performance
comparison would be interesting. rtracklayer jumps through all sorts of
hoops to handle obscure things like URL encoding in GFF3. The code in
genomeIntervals seems more streamlined.
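
To illustrate why the URL encoding matters, here is a minimal base R sketch. It is not the rtracklayer code; the function name parseGff3Attributes and the sample string are made up for the example. In GFF3 a literal ";" or "=" inside a value is percent-encoded, so the decoding has to happen after the string has been split into fields.

```r
# Hypothetical helper for illustration only; not the rtracklayer or
# genomeIntervals implementation.
# Parses one GFF3 attribute string into a named character vector.
parseGff3Attributes <- function(attr) {
  # Split into key=value fields on the literal ";" separators.
  fields <- strsplit(attr, ";", fixed = TRUE)[[1]]
  fields <- fields[nzchar(fields)]
  # Split each field on its first "=" only.
  keys <- sub("=.*$", "", fields)
  values <- sub("^[^=]*=", "", fields)
  # Decode percent-escapes last, so that an encoded ";" (%3B) or "=" (%3D)
  # inside a value cannot be mistaken for a separator.
  values <- vapply(values, utils::URLdecode, character(1), USE.NAMES = FALSE)
  stats::setNames(values, keys)
}

# Made-up attribute string with an encoded ";" and an encoded "=".
x <- parseGff3Attributes("ID=gene1;Name=my%3Bgene;Note=a%3Db")
print(x)
```

Decoding every value one by one like this is exactly the kind of extra work that can slow a parser down, which is one reason an actual timing comparison would be informative.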







Re: [Bioc-devel] [BioC] GTF file error when using easyRNAseq

2013-11-15 Thread Martin Morgan

On 11/15/2013 10:22 AM, Michael Lawrence wrote:

Doesn't look like genomeIntervals has any C code (?), so a performance
comparison would be interesting. rtracklayer jumps through all sorts of
hoops to handle obscure things like URL encoding in GFF3. The code in
genomeIntervals seems more streamlined.


I wanted to mention, and it would be good to know if this is not helpful at 
all, that the Ensembl gtf files are available through AnnotationHub as GRanges 
objects:


> library(AnnotationHub)
> hub = AnnotationHub()
> hub$ensembl.release.73.<tab>
hub$ensembl.release.73.fasta. ... [378]
hub$ensembl.release.73.gtf. ... [63]
> xx = hub$ensembl.release.73.gtf.gallus_gallus.Gallus_gallus.Galgal4.73.gtf_0.0.1.RData
> xx
GRanges with 381368 ranges and 12 metadata columns:
      seqnames       ranges strand |         source     type
         <Rle>    <IRanges>  <Rle> |       <factor> <factor>
  [1]        1 [1735, 2449]      + | protein_coding     exon
  [2]        1 [2379, 2449]      + | protein_coding      CDS
          score     phase        gene_id  transcript_id
      <numeric> <integer>    <character>    <character>
  [1]        NA        NA ENSGALG0009771 ENSGALT0015891
  [2]        NA         0 ENSGALG0009771 ENSGALT0015891
      exon_number   gene_biotype        exon_id     protein_id
        <numeric>    <character>    <character>    <character>
  [1]           1 protein_coding ENSGALE0301221             NA
  [2]           1 protein_coding             NA ENSGALP0015874
        gene_name transcript_name
      <character>     <character>
  [1]          NA              NA
  [2]          NA              NA
  [ reached getOption("max.print") -- omitted 9 rows ]
  ---
  seqlengths:
                1              2 ... AADN03010940.1
               NA             NA ... NA

Martin









[Rd] Trouble running Rtools31 on Wine

2013-11-15 Thread Kirill Müller
Hi

An attempt to use R and Rtools in Wine fails, see the bug report to Wine:

http://bugs.winehq.org/show_bug.cgi?id=34865

The people there say that Rtools uses an outdated Cygwin DLL with a 
custom patch. Is there any chance we can upgrade our Cygwin DLL to a 
supported upstream version? Thanks.


Cheers

Kirill


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD SHLIB error bad value (core2) for -mtune= switch

2013-11-15 Thread pranav.waila
I am getting the following error. Please let me know if this problem has 
already been resolved:

$ make
  0 [main] make 10092 stdio_init: couldn't make stderr distinct from stdout
R CMD SHLIB src/C/util.c src/C/factor_model_util.c src/C/pagerank.c src/C/hierarchical.c src/C/factor_model_multicontext.c src/C/factor_model_util2.cpp -o lib/c_funcs.so
Segmentation fault



-
Pranav Waila
pranav.wa...@gmail.com
Research Scholar
DST - Centre for Interdisciplinary Mathematical Sciences
Faculty of Science
Banaras Hindu University,Varanasi-221005
--
View this message in context: 
http://r.789695.n4.nabble.com/R-CMD-SHLIB-error-bad-value-core2-for-mtune-switch-tp4645928p4680510.html
Sent from the R devel mailing list archive at Nabble.com.



Re: [Rd] R CMD SHLIB error bad value (core2) for -mtune= switch

2013-11-15 Thread Simon Zehnder
You have not given any information about where this error occurred. What were 
you doing?

Simon 


Re: [Rd] Trouble running Rtools31 on Wine

2013-11-15 Thread Duncan Murdoch

On 13-11-13 5:38 AM, Kirill Müller wrote:

Hi

An attempt to use R and Rtools in Wine fails, see the bug report to Wine:

http://bugs.winehq.org/show_bug.cgi?id=34865

The people there say that Rtools uses an outdated Cygwin DLL with a
custom patch. Is there any chance we can upgrade our Cygwin DLL to a
supported upstream version? Thanks.


Rtools doesn't use any special Cygwin dlls.  The one in the current 
distribution is rather old, so if it doesn't work in Wine, just go to 
the Cygwin site and install a newer Cygwin version.  (Not sure if this 
is easy in Wine, but you could do it on a Windows machine, and copy the 
files over.)


Duncan Murdoch
