Dear Arek and Arnaud,

I am cc'ing Rob Nash from SGD who has provided the information below regarding how to retrieve information on yeast 3' UTRs. Is this information sufficient or would you still need Rob and his colleagues to make their data available through the BioMart interface? If the latter is the case, would you mind initiating a discussion with SGD?

Rob's comment:

>>>"http://www.yeastgenome.org/cgi-bin/reference/reference.pl?dbid=S000129301

Yassour M, Kaplan T, Fraser HB, Levin JZ, Pfiffner J, Adiconis X, Schroth G, Luo S, Khrebtukova I, Gnirke A, Nusbaum C, Thompson DA, Friedman N, Regev A (2009) Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc Natl Acad Sci U S A 106(9):3264-9 PMID: 19208812

This data was generated using RNA-seq and an algorithm to construct a transcript catalog. To look at the data go to GBrowse: http://browse.yeastgenome.org/fgb2/gbrowse/scgenome/ and go to the 'Select Tracks' tab and under 'Gene Structure' click the 'All on' box next to 'UTRs', then click on the 'Back to Browser' button at the bottom to browse the data.

Normally, you would be able to download the sequence from within GBrowse by selecting the tracks, clicking the floppy disk icon and then selecting FASTA output, but currently that feature isn't working. In the interim there is a possible solution to this problem that a bioinformatics analyst within our group proposed you try. Here are the steps she suggested you follow to get the sequence:

1. Download the appropriate tracks from our download site in BED format (http://www.yeastgenome.org/download-data/published-datasets-directory)
2. Grep or otherwise filter for those entries corresponding to the 3'UTRs.
3. Go to UCSC and click on the table browser option (http://genome.ucsc.edu/cgi-bin/hgTables?org=human)
4. Select clade "other", genome "S. cerevisiae" and assembly "April 2011"
5. Upload the BED files individually by clicking "Add custom tracks"
6. Select the output format to sequence and click on get output.

I hope this allows you to get the UTR sequences of interest."<<<

Thanks,
Claudio

________________________________________
From: Arek Kasprzyk [[email protected]]
Sent: Tuesday, February 07, 2012 6:48 AM
To: Claudio Joazeiro
Cc: Arnaud Kerhornou; [email protected]; Paul Kersey
Subject: Re: [BioMart Users] Yeast 3'UTRs

Dear Claudio,

We at BioMart do not host any data ourselves. We rely on the instances of BioMart set up by third parties. It would probably be best to ask the SGD folks whether they plan to make their data available through the BioMart interface in the future, so that those annotations could become available to the BioMart community. Failing that, perhaps Ensembl Genomes is planning to have those annotations? (I am cc'ing Paul Kersey, who may want to comment on that.)

a

On Thu, Feb 2, 2012 at 10:02 AM, Claudio Joazeiro <[email protected]> wrote:

Dear Arnaud,

Thank you for the prompt response. Is there interest on BioMart's part in having yeast UTR information to provide through its portal? If so, I am certain SGD can help, since that annotation is available. I can help mediate an introduction if you would like.

Regarding the length of flanking sequences, I realize that I can select any number. The problem is that yeast 3' UTRs have variable lengths, so the output for any given specified number would not be accurate for every gene.

Regards,
Claudio

________________________________________
From: Arnaud Kerhornou [[email protected]]
Sent: Thursday, February 02, 2012 3:03 AM
To: Claudio Joazeiro
Cc: [email protected]
Subject: Re: [BioMart Users] Yeast 3'UTRs

Hi Claudio,

There is no UTR information held in Ensembl for S. cerevisiae. Our data come from SGD GFF3 flat files, and I don't think they contain UTR information.

Re. the length of the flanking sequences, you can specify any length you wish in the filter page.

Regards,
Arnaud

On 02/02/2012 05:32, Claudio Joazeiro wrote:
> To whom it may concern:
>
> We are having a problem with the 3' UTR setting of the BioMart Central Portal
> interface. It feels like a bug in the underlying database, so we are hoping
> you would be able to diagnose/fix it.
>
> We need to retrieve the 3' UTRs of yeast genes. I have tried to do this in a
> couple of ways:
>
> TRIAL 1:
>
> DATABASE: Ensembl
> Datasets: S cerevisiae genes
> Sequences: 3' UTR
> Upstream Flank: (blank)
> Downstream Flank: (blank)
> Filters: (default)
> Header: Ensembl Gene ID and Associated Gene Name
>
> The result I got was: “Sequence unavailable” for all genes
>
> Then I attempted TRIAL 2:
>
> DATABASE: Ensembl
> Datasets: S cerevisiae genes
> Sequences: Flank-coding region (not ideal, though, as this is expected to
> yield longer sequences than those we're looking for)
> Upstream Flank: (blank)
> Downstream Flank: 100 (this is also a problem because although the median
> length of yeast 3' UTRs is 104 bp, they can be as long as ~1,000 bp, so we
> would be missing sequences)
> Filters: (default)
> Header: Ensembl Gene ID and Associated Gene Name
>
> This works, but with the above caveats.
>
> Thanks in advance for your help.
>
> Sincerely,
>
> Claudio Joazeiro, Ph.D.
> Assistant Professor
> Department of Cell Biology
> The Scripps Research Institute
> CB-163
> 10550 N Torrey Pines Rd
> La Jolla, CA 92037
>
> Phone: (858) 784-7570
> Fax: (858) 784-9779
>
> _______________________________________________
> Users mailing list
> [email protected]
> https://lists.biomart.org/mailman/listinfo/users

--
Arek Kasprzyk, MD, MSc, PhD
BioMart Project Lead
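
For readers who would rather script the workaround Rob describes (filter the BED tracks for 3'UTR entries, then fetch the sequence for each interval), here is a minimal Python sketch of the same idea done locally rather than through the UCSC table browser. It is an illustration under stated assumptions, not SGD's or UCSC's method: it assumes the feature name sits in the fourth BED column and contains the text "3'UTR" (the real SGD files may label features differently), it assumes the genome is already loaded as a dictionary of chromosome name to sequence, and all function names, feature names, and the toy data are made up for the example. BED coordinates are 0-based with an exclusive end, which maps directly onto Python slicing.

```python
from typing import Dict, Iterable, List, NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive (BED convention)
    name: str
    strand: str


def parse_bed(lines: Iterable[str]) -> List[BedRecord]:
    """Parse tab-separated BED lines, skipping comments and track headers."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        name = fields[3] if len(fields) > 3 else ""
        strand = fields[5] if len(fields) > 5 else "+"
        records.append(
            BedRecord(fields[0], int(fields[1]), int(fields[2]), name, strand)
        )
    return records


def filter_three_prime_utrs(records: List[BedRecord],
                            marker: str = "3'UTR") -> List[BedRecord]:
    """Keep records whose name contains the marker (ASSUMED labelling)."""
    return [r for r in records if marker.lower() in r.name.lower()]


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def extract_sequence(record: BedRecord, genome: Dict[str, str]) -> str:
    """Slice the interval; reverse-complement it for minus-strand features."""
    seq = genome[record.chrom][record.start:record.end]
    return reverse_complement(seq) if record.strand == "-" else seq


def to_fasta(records: List[BedRecord], genome: Dict[str, str]) -> str:
    """Render the selected intervals as FASTA text."""
    return "".join(
        f">{r.name} {r.chrom}:{r.start}-{r.end}({r.strand})\n"
        f"{extract_sequence(r, genome)}\n"
        for r in records
    )


# Toy data, invented purely for illustration.
toy_genome = {"chrI": "ACGTACGTAAGGCCTT"}
toy_bed = [
    "chrI\t0\t4\tgeneA_5'UTR\t0\t+",
    "chrI\t4\t10\tgeneA_3'UTR\t0\t+",
    "chrI\t10\t16\tgeneB_3'UTR\t0\t-",
]

utrs = filter_three_prime_utrs(parse_bed(toy_bed))
print(to_fasta(utrs, toy_genome))
```

Because each interval carries its own start and end, this approach avoids the fixed-flank problem raised earlier in the thread: every 3' UTR is returned at its annotated length rather than at a single chosen number of bases. The coordinates in the BED file must refer to the same genome assembly as the sequence they are applied to.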
