Junjun,
We have modified our configuration to more closely match the
Ensembl gene dataset config, but this hasn't changed our original
problem (splitting of sequence into multiple entries in results
files). We did manage to (finally) completely disable batch queuing,
and that does make the problem go away.
Our question is: was there a known bug with batch queuing in rc6?
We are still using rc6 because we had to make some modifications to
allow multiple genomes to be accessible in a single mart.
thanks,
-David Goodstein
JGI
On Jul 14, 2011, at 9:10 AM, Junjun Zhang wrote:
For the 'coding' sequence type, the exportable should have orderBy set
to 'transcript_id,exon_rank', similar to what the Ensembl gene dataset shows.
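To illustrate why the sort key could matter, here is a minimal Python sketch. It assumes, purely for illustration, that the exporter starts a new entry whenever the transcript ID changes between adjacent rows; the rows and the function name `merge_consecutive` are hypothetical and are not taken from BioMart's code.

```python
from itertools import groupby

# Hypothetical (transcript_id, exon_rank) rows; not real Phytozome data.
rows = [("T1", 1), ("T2", 1), ("T1", 2), ("T2", 2), ("T1", 3)]


def merge_consecutive(sorted_rows):
    """Merge runs of adjacent rows that share a transcript ID.

    Assumption: this mimics an exporter that opens a new entry
    each time the transcript ID differs from the previous row.
    """
    return [
        (tid, [rank for _, rank in group])
        for tid, group in groupby(sorted_rows, key=lambda r: r[0])
    ]


# Ordering by exon_rank alone interleaves the two transcripts,
# so each transcript is broken into several entries.
by_rank = merge_consecutive(sorted(rows, key=lambda r: r[1]))

# Ordering by transcript_id, then exon_rank keeps each transcript
# contiguous, so it ends up in exactly one entry.
by_tid_rank = merge_consecutive(sorted(rows, key=lambda r: (r[0], r[1])))

print(len(by_rank), len(by_tid_rank))
```

Under that assumption, any ordering that does not keep a transcript's rows adjacent can yield more than one entry for that transcript.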
From: jzhang <[email protected]>
Date: Thu, 14 Jul 2011 12:05:04 -0400
To: "[email protected]" <[email protected]>
Subject: [BioMart Users] FW: trouble with sequence download
Forgot to send this to the list.
From: jzhang <[email protected]>
Date: Thu, 14 Jul 2011 12:01:37 -0400
To: Joni Fazo <[email protected]>
Subject: Re: [BioMart Users] trouble with sequence download
Hi Joni,
After looking at the configuration file for the phytozome dataset at:
http://www.phytozome.net/biomart/martservice?type=configuration&dataset=phytozome
it seems to me there might be a problem with the 'Exportable'
setting in the phytozome mart. orderBy="exon_rank" may not be correct;
it should be ordered by transcript ID. You can look at how this is
set up in Ensembl gene mart at: http://www.biomart.org/biomart/martservice?type=configuration&dataset=hsapiens_gene_ensembl
(in the page search for: internalName="cdna" linkName="cdna")
You can also connect to the Ensembl mart db using MartEditor to examine
the settings more closely:
Hope this helps!
Junjun
From: Joni Fazo <[email protected]>
Date: Wed, 13 Jul 2011 13:15:28 -0400
To: jzhang <[email protected]>
Subject: Re: [BioMart Users] trouble with sequence download
Hi Junjun,
Please go to: http://www.phytozome.net/biomart/martview
To download all the CDS for one genome please follow these steps:
1) Select the dataset "Phytozome 7.0 Genomes"
2) For Filters select the Organism "Arabidopsis thaliana"
3) For Attributes select "Sequences"
4) Select the radio button "Coding Sequences"
5) As well as the default check boxes, also select "Exon CDS
Start" and "Exon CDS End"
6) Click the Results button
7) Then select "Export all results to compressed web file"
The resulting file will have the CDS sequence split for the
following 5 transcripts (so 10 entries total):
AT1G31930.2, AT3G02530.1, AT3G54470.1, AT4G24270.2, AT5G61910.4
All other transcripts in the file will just have one entry.
If you follow the above steps but add the additional filter of
the transcript names, the resulting file will have the CDS for
the transcripts in just 5 entries, which I believe is correct.
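A quick way to list the affected transcripts in a downloaded file is sketched below in Python. It assumes the transcript name is the last '|'-separated field of each FASTA header, as in the headers quoted in the original report; the function name `split_transcripts` and the example coordinates are hypothetical.

```python
from collections import Counter


def split_transcripts(fasta_lines):
    """Return transcript names that occur in more than one FASTA header.

    Assumption: the transcript name is the last '|'-separated field
    of each header line.
    """
    counts = Counter(
        line.strip().rsplit("|", 1)[-1]
        for line in fasta_lines
        if line.startswith(">")
    )
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up example: one transcript split across two entries,
# and a placeholder transcript with a single entry.
example = [
    ">1;2|3;4|AT1G31930|AT1G31930.2",
    "ATG",
    ">5|6|AT1G31930|AT1G31930.2",
    "CCC",
    ">7|8|GENE_X|GENE_X.1",
    "GGG",
]
print(split_transcripts(example))
```

For a real export, the same function can be given an open file handle in place of the list, since it only iterates over lines.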
Thanks in advance for your help,
Joni
On Wed, Jul 13, 2011 at 7:13 AM, Junjun Zhang <[email protected]> wrote:
Hi Joni,
Is it possible to provide us with the URL where we can reproduce and
test the problem below?
Thanks,
Junjun
From: Joni Fazo <[email protected]>
Date: Tue, 12 Jul 2011 16:23:50 -0400
To: "[email protected]" <[email protected]>
Subject: [BioMart Users] trouble with sequence download
Hello,
My name is Joni Fazo and I am trying to troubleshoot an error
in our BioMart configuration for http://phytozome.net.
Our issue is that the sequence data for some transcripts is
split into multiple entries when all CDS for a given genome are
requested.
If the user requests the CDS for the individual transcripts,
the sequence is presented as one entry. Listed below is an
example of the FASTA headers for one such transcript
(AT1G31930.2):
Split entries (generated when all CDS for the genome are
requested from BioMart):
>11466849;11465832|11466986;11466755|AT1G31930|AT1G31930.2
>11468117;11467266;11467524;11468367;11467780;11467074|11468289;11467445;11467698;11468961;11468036;11467178|AT1G31930|AT1G31930.2
Single entry (generated when just the CDS for AT1G31930.2 is
requested from BioMart):
>11468117;11467266;11467524;11468367;11467780;11466849;11467074;11465832|11468289;11467445;11467698;11468961;11468036;11466986;11467178;11466755|AT1G31930|AT1G31930.2
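The headers above can be compared programmatically. The Python sketch below assumes the header fields are, in order, exon CDS start, exon CDS end, gene name, and transcript name, with ';' between per-exon values; that field order is inferred from the example and is not confirmed by the BioMart configuration. The function name `parse_header` is hypothetical.

```python
def parse_header(header):
    """Split a FASTA header into (starts, ends, gene, transcript).

    Assumption: fields are '|'-separated in the order
    exon CDS start, exon CDS end, gene name, transcript name.
    """
    starts, ends, gene, transcript = header.lstrip(">").strip().split("|")
    return starts.split(";"), ends.split(";"), gene, transcript


# Headers copied from the report above, each on a single line.
split_headers = [
    ">11466849;11465832|11466986;11466755|AT1G31930|AT1G31930.2",
    ">11468117;11467266;11467524;11468367;11467780;11467074|"
    "11468289;11467445;11467698;11468961;11468036;11467178|"
    "AT1G31930|AT1G31930.2",
]
single_header = (
    ">11468117;11467266;11467524;11468367;11467780;11466849;11467074;11465832|"
    "11468289;11467445;11467698;11468961;11468036;11466986;11467178;11466755|"
    "AT1G31930|AT1G31930.2"
)

# Collect (start, end) pairs from the two split entries.
split_pairs = set()
for header in split_headers:
    starts, ends, _, _ = parse_header(header)
    split_pairs.update(zip(starts, ends))

# Collect (start, end) pairs from the single entry.
starts, ends, _, _ = parse_header(single_header)
single_pairs = set(zip(starts, ends))

# True when the split entries together hold the same exons
# as the single entry.
same_exons = split_pairs == single_pairs
print(same_exons)
```

For this transcript the two split entries together contain the same eight exon coordinate pairs as the single entry, so no exon is lost or duplicated; the exons are only distributed across two records.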
Has anyone encountered this issue or something similar?
Best regards,
Joni Fazo
Joint Genome Institute / Lawrence Berkeley National Lab
http://www.phytozome.net
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
David M. Goodstein, Ph.D.
Phytozome Group Lead
Plant and Computational Genomics Group
Joint Genome Institute - U.S. Dept. of Energy
Center for Integrative Genomics - UC Berkeley