Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-11 Thread Rainer Johannes
Update: this approach is based on Martin's suggestion, and I think it is ideal for EnsDb or any Ensembl-based annotation package. There is no need to install an explicit genome package; instead, the correct genomic sequence is loaded dynamically (and eventually cached) from Ensembl via AnnotationHub. That way c

Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-10 Thread Martin Morgan
On 06/10/2015 01:11 AM, Rainer Johannes wrote: Dear Martin, the AnnotationHub approach looks awesome! However, it does not work for me; I always get an error: library(AnnotationHub) library(Rsamtools) library(GenomicFeatures) ah <- AnnotationHub() ## somehow I don't see DNA sequences f

Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-10 Thread Rainer Johannes
Update: something seems strange with the index. When I load the ensembl-77 DNA for mouse I get two files, the FASTA file and the index, and I can extract the sequences, while in the example below I got only the FASTA file. On 10 Jun 2015, at 10:11, Rainer Johannes mailto:johannes.rai...@eur

Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-10 Thread Rainer Johannes
Dear Ludwig, On 10 Jun 2015, at 10:29, Ludwig Geistlinger <ludwig.geistlin...@bio.ifi.lmu.de> wrote: Dear Johannes, one follow-up question/comment on the EnsDb packages: The reason they escaped my notice (and may thus escape that of others as well) is that I expected such packages to be nam

Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-10 Thread Ludwig Geistlinger
Dear Johannes, one follow-up question/comment on the EnsDb packages: The reason they escaped my notice (and may thus escape that of others as well) is that I expected such packages to be named "^TxDb...". What actually argues against sticking to existing Bioc vocabulary and naming e.g. EnsDb.Hsapiens.v

Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-10 Thread Rainer Johannes
Dear Martin, the AnnotationHub approach looks awesome! However, it does not work for me; I always get an error: library(AnnotationHub) library(Rsamtools) library(GenomicFeatures) ah <- AnnotationHub() ## somehow I don't see DNA sequences for release-80... thus using 75 query(ah, c("Taki

Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-09 Thread Martin Morgan
On 06/08/2015 11:43 PM, Rainer Johannes wrote: Dear Robert and Ludwig, the EnsDb packages provide all the gene, transcript, etc. annotations for all genes defined in the Ensembl database (for a given species and Ensembl release). Apart from the column/attribute "entrezid" that is stored in the internal

Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-09 Thread Rainer Johannes
Dear Ludwig, On 09 Jun 2015, at 10:46, Ludwig Geistlinger <ludwig.geistlin...@bio.ifi.lmu.de> wrote: Dear Johannes, Thx for providing the great EnsDb packages! One question: As of now, I am able to choose between TxDb and EnsDb for genomic coordinates of genomic features such as genes

Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-09 Thread Ludwig Geistlinger
Dear Johannes, Thx for providing the great EnsDb packages! One question: As of now, I am able to choose between TxDb and EnsDb for genomic coordinates of genomic features such as genes, transcripts, and exons. For the sequences themselves I need the corresponding BSgenome package. While it is e

Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-08 Thread Rainer Johannes
Dear Robert and Ludwig, the EnsDb packages provide all the gene, transcript, etc. annotations for all genes defined in the Ensembl database (for a given species and Ensembl release). Apart from the column/attribute "entrezid" that is stored in the internal database, there is, however, no link to NCBI or

Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-03 Thread Robert M. Flight
Ludwig, If you do this search on the UCSC genome browser (which this annotation package is built from), you will see that the longest variant is what is shown http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg38&position=brca1&hgt.positionInput=brca1&hgt.suggestTrack=knownGene&S

[Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-03 Thread Ludwig Geistlinger
Dear Bioc annotation team, Querying TxDb.Hsapiens.UCSC.hg38.knownGene for gene coordinates, e.g. for BRCA1; ENSG00000012048; entrez:672 via > genes(TxDb.Hsapiens.UCSC.hg38.knownGene, vals=list(gene_id="672")) gives me: GRanges object with 1 range and 1 metadata column: seqnames