Hi,

I have spoken with others in the group, and there are no plans to add UTR 
annotations to the SGD reference GFF file or database in the near future. 
Generating integrated sets for many of these data types (UTRs, transcripts, 
uORFs, etc.) requires significant curation effort. Much of our effort in this 
regard is currently focused on getting all of this data into GBrowse for our 
users. In many cases several studies provide similar sets of data, and 
deciding which set or which specific annotations are best will require 
significant effort and expertise in this rapidly moving field.

In the interim, we provided Claudio with instructions on how to get the 
sequences using UCSC. I am not sure whether that did not work, whether he is 
more familiar with BioMart, or whether he was looking for a curated set. We 
are currently looking into ways to allow SGD users to retrieve the sequences 
of these annotated features with GBrowse or YeastMine.

The BED files are available via our downloads site 
(http://www.yeastgenome.org/download-data/published-datasets-directory) should 
you wish to pursue this avenue.
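
For anyone taking the BED route, the filtering and sequence-extraction steps 
can also be done locally. The following is only a rough sketch under stated 
assumptions, not SGD tooling: the "3UTR" label in the name column, the 
feature names, and the toy genome are all hypothetical, and the real BED 
files may label their features differently. It assumes standard BED columns 
(chrom, start, end, name, score, strand) and a genome already loaded into a 
dict of chromosome name to sequence.

```python
# Sketch only: the "3UTR" label and the sample records are assumptions,
# not taken from the actual SGD BED files.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_bed(lines):
    """Yield (chrom, start, end, name, strand) from BED text lines."""
    for line in lines:
        line = line.strip()
        # Skip blanks and the header lines that BED files may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else ""
        strand = fields[5] if len(fields) > 5 else "+"
        yield chrom, start, end, name, strand


def select_features(records, label):
    """Keep only records whose name column contains `label`."""
    return [rec for rec in records if label in rec[3]]


def feature_sequence(genome, record):
    """Return the sequence of one BED record.

    BED starts are 0-based and ends are exclusive, which matches Python
    slicing. Minus-strand features are reverse-complemented.
    """
    chrom, start, end, _name, strand = record
    seq = genome[chrom][start:end]
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


if __name__ == "__main__":
    genome = {"chrI": "ATGCCCGGGTTTAACCCC"}  # toy sequence, not real data
    bed = [
        "track name=example",
        "chrI\t0\t3\tgeneA_5UTR\t0\t+",
        "chrI\t9\t15\tgeneA_3UTR\t0\t-",
    ]
    for rec in select_features(parse_bed(bed), "3UTR"):
        print(rec[3], feature_sequence(genome, rec))
```

The strand handling matters here: 3' UTRs of genes on the minus strand need 
to be reverse-complemented to read 5' to 3' along the transcript.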

All the best,
Rob




On Feb 8, 2012, at 7:13 AM, Arnaud Kerhornou wrote:

> Dear Rob,
> 
> Is there any plan at SGD to integrate these UTR annotations into the 
> reference gene model set?
> Then it will be straightforward for us to get them into Ensembl and BioMart 
> from the SGD GFF3 files, as they will be part of the gene models.
> 
> The other option for us is to add the BED files to the Ensembl browser.
> That shouldn't be an issue, and they get displayed on the location view of 
> the chromosomes, but I don't know how users will be able to extract UTR 
> sequences from these Ensembl tracks.
> 
> Arnaud
> 
> On 08/02/2012 05:23, Claudio Joazeiro wrote:
>> Dear Arek and Arnaud,
>> 
>> I am cc'ing Rob Nash from SGD who has provided the information below 
>> regarding how to retrieve information on yeast 3' UTRs. Is this information 
>> sufficient or would you still need Rob and his colleagues to make their data 
>> available through the BioMart interface? If the latter is the case, would 
>> you mind initiating a discussion with SGD?
>> 
>> Rob's comment:
>> 
>>>>> "http://www.yeastgenome.org/cgi-bin/reference/reference.pl?dbid=S000129301
>> Yassour M, Kaplan T, Fraser HB, Levin JZ, Pfiffner J, Adiconis X, Schroth G, 
>> Luo S, Khrebtukova I, Gnirke A, Nusbaum C, Thompson DA, Friedman N, Regev A  
>> (2009) Ab initio construction of a eukaryotic transcriptome by massively 
>> parallel mRNA sequencing.
>> Proc Natl Acad Sci U S A 106(9):3264-9
>> PMID: 19208812
>> 
>> This data was generated using RNA-seq and an algorithm to construct a 
>> transcript catalog.
>> 
>> To look at the data go to GBrowse:
>> 
>> http://browse.yeastgenome.org/fgb2/gbrowse/scgenome/
>> 
>> and go to the 'Select Tracks' tab and under 'Gene Structure' click the 'All 
>> on' box next to 'UTRs', then click on the 'Back to Browser' button at the 
>> bottom to browse the data.
>> 
>> Normally, you would be able to download the sequence from within GBrowse by 
>> selecting the tracks, clicking the floppy disk icon and then selecting FASTA 
>> output, but currently that feature isn't working. In the interim there is a 
>> possible solution to this problem that a bioinformatics analyst within our 
>> group proposed you try. Here are the steps she suggested you follow to get 
>> the sequence:
>> 
>> 1. Download the appropriate tracks from our download site in BED format 
>> (http://www.yeastgenome.org/download-data/published-datasets-directory)
>> 2. Grep or otherwise filter for those entries corresponding to the 3'UTRs.
>> 3. Go to UCSC and click on the table browser option 
>> (http://genome.ucsc.edu/cgi-bin/hgTables?org=human)
>> 4. Select clade "other", genome "S. cerevisiae" and assembly "April 2011"
>> 5. Upload the BED files individually by clicking "Add custom tracks"
>> 6. Set the output format to "sequence" and click "get output".
>> 
>> I hope this allows you to get the UTR sequences of interest."<<<
>> 
>> Thanks,
>> Claudio
>> 
>> ________________________________________
>> From: Arek Kasprzyk [[email protected]]
>> Sent: Tuesday, February 07, 2012 6:48 AM
>> To: Claudio Joazeiro
>> Cc: Arnaud Kerhornou; [email protected]; Paul Kersey
>> Subject: Re: [BioMart Users] Yeast 3'UTRs
>> 
>> Dear Claudio,
>> 
>> We at BioMart do not host any data ourselves. We rely on the instances of 
>> BioMart set up by third parties. It would probably be best to ask the SGD 
>> folks if they plan to make their data available through the BioMart 
>> interface in the future so those annotations could become available to the 
>> BioMart community. Failing that, perhaps Ensembl Genomes is planning to 
>> have those annotations? (I am cc'ing Paul Kersey, who may want to comment 
>> on that.)
>> 
>> a
>> 
>> On Thu, Feb 2, 2012 at 10:02 AM, Claudio Joazeiro <[email protected]> wrote:
>> 
>> Dear Arnaud,
>> 
>> Thank you for the prompt response. Is there interest on BioMart's part in 
>> having yeast UTR information to provide through its portal? If so, I am 
>> certain SGD can help, since that annotation is available. I can help 
>> mediate an introduction if you would like.
>> 
>> Regarding the length of flanking sequences, I realize that I can select any 
>> number. The problem is that yeast 3' UTRs have variable lengths so the 
>> output for any given specified number would not be accurate for every gene.
>> 
>> Regards,
>> Claudio
>> ________________________________________
>> From: Arnaud Kerhornou [[email protected]]
>> Sent: Thursday, February 02, 2012 3:03 AM
>> To: Claudio Joazeiro
>> Cc: [email protected]
>> Subject: Re: [BioMart Users] Yeast 3'UTRs
>> 
>> Hi Claudio,
>> 
>> There is no UTR information held in Ensembl for S. cerevisiae.
>> Our data come from SGD GFF3 flat files, and I don't think they contain
>> UTR information.
>> 
>> Re. the length of the flanking sequences, you can specify any length you
>> wish in the filter page.
>> 
>> Regards,
>> Arnaud
>> 
>> On 02/02/2012 05:32, Claudio Joazeiro wrote:
>>> To whom it may concern:
>>> 
>>> We are having a problem with the 3' UTR setting of the BioMart Central 
>>> Portal interface. It feels like a bug in the underlying database, so we are 
>>> hoping you would be able to diagnose/fix it.
>>> 
>>> We need to retrieve the 3' UTRs of yeast genes. I have tried to do this in 
>>> a couple of ways:
>>> 
>>> TRIAL 1:
>>> 
>>> DATABASE: Ensembl
>>> Datasets: S cerevisiae genes
>>> Sequences: 3' UTR
>>> Upstream Flank: (blank)
>>> Downstream Flank: (blank)
>>> Filters: (default)
>>> Header: Ensembl Gene ID and Associated Gene Name
>>> 
>>> The result I got was: “Sequence unavailable” for all genes
>>> 
>>> Then I attempted TRIAL 2:
>>> 
>>> DATABASE: Ensembl
>>> Datasets: S cerevisiae genes
>>> Sequences: Flank-coding region (not ideal, though, as this is expected to 
>>> yield longer sequences than those we're looking for)
>>> Upstream Flank: (blank)
>>> Downstream Flank: 100 (this is also a problem because although the median 
>>> length of yeast 3' UTRs is 104 bp, they can be as long as ~1,000 bp, so we 
>>> would be missing sequences)
>>> Filters: (default)
>>> Header: Ensembl Gene ID and Associated Gene Name
>>> 
>>> This works, but with the above caveats.
>>> 
>>> Thanks in advance for your help.
>>> 
>>> Sincerely,
>>> 
>>> Claudio Joazeiro, Ph.D.
>>> Assistant Professor
>>> Department of Cell Biology
>>> The Scripps Research Institute
>>> CB-163
>>> 10550 N Torrey Pines Rd
>>> La Jolla, CA  92037
>>> 
>>> Phone: (858) 784-7570
>>> Fax: (858) 784-9779
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Users mailing list
>>> [email protected]
>>> https://lists.biomart.org/mailman/listinfo/users
>>> 
>> 
>> 
>> 
>> --
>> 
>> Arek Kasprzyk, MD, MSc, PhD
>> BioMart Project Lead
>> 
> 

Rob Nash, Ph.D.
Senior Scientific Curator
Saccharomyces Genome Database
Department of Genetics
1501 California Ave, Rm 2C412A
Palo Alto, CA  94304-5577 USA
phone: (650) 723-6425  fax: (650) 725-1534
email: [email protected]
http://www.yeastgenome.org/



