I'm having some issues retrieving 5' UTR and cDNA sequences using BioMart's Perl
API (version 0.7). Symptoms include:
- I get 5' UTRs returned for some transcripts that don't have a 5' UTR at all
(e.g. ENST00000366808)
- Some of the returned 5' UTRs contain the 5' UTR but don't stop there. After
the 5' UTR sequence, the output continues with sequence that occurs further
downstream in the cDNA (e.g. ENST00000447632)
- A small set of transcripts show up more than once in both the 5' UTR and cDNA
results (oddly, with different sequences), even though unique filtering is
turned on (e.g. ENST00000445545, ENST00000362061, ENST00000447632 ...)
Some of the returned 5' UTRs are indeed correct. However, many are wrong.
Interestingly, if I download the sequences through the web browser, for example
by going to ensembl.org, I don't get the problematic UTRs/cDNAs. That leads me
to believe there may be something wrong with the BioMart Perl API, with how my
system is set up, or with the way I'm calling the API.
I have tried the following:
- Installed BioMart 0.7 on both MacOS and RHEL 6.1 (same problems on both
OSes)
- Tried both the BioMart & Ensembl URL repositories (same problems on both)
- Tried ENSEMBL63 and ENSEMBL65 on both of the above (same problems)
- Tried exporting to HTML (instead of FASTA), with the same issues.
I would appreciate any help. Here are the repository (registry) file and the
Perl code, in case they help.
Ensembl65 rep:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MartRegistry>
<MartRegistry>
  <MartURLLocation database="ensembl_mart_65" default="1"
      displayName="Ensembl Genes 65" host="www.ensembl.org"
      includeDatasets="" martUser="" name="ENSEMBL_MART_ENSEMBL"
      path="/biomart/martservice" port="80"
      serverVirtualSchema="default" visible="1" />
  <MartURLLocation database="sequence_mart_65" default=""
      displayName="Sequence" host="www.ensembl.org"
      includeDatasets="" martUser="" name="ENSEMBL_MART_SEQUENCE"
      path="/biomart/martservice" port="80"
      serverVirtualSchema="default" visible="" />
</MartRegistry>
Perl code:
use strict;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;
my $confFile = "theaboverepfile.xml";
my $action   = 'cached';
my $initializer = BioMart::Initializer->new(
    'registryFile' => $confFile,
    'action'       => $action,
);
my $registry = $initializer->getRegistry;
my $query = BioMart::Query->new(
    'registry'          => $registry,
    'virtualSchemaName' => 'default',
);
$query->setDataset("hsapiens_gene_ensembl");
$query->addFilter("with_protein_id", ["Only"]);
$query->addFilter("chromosome_name", ["1"]);
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("chromosome_name");
$query->addAttribute("strand");
$query->addAttribute("external_gene_id");
$query->addAttribute("5utr");
$query->formatter("FASTA");
my $query_runner = BioMart::QueryRunner->new();
$query_runner->uniqueRowsOnly(1);
$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
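For anyone wanting to check for the duplicate-transcript symptom in their own output, a minimal sketch along the following lines might help. It is not part of the script above: the helper name find_conflicting_ids is made up for illustration, and it assumes that each FASTA header line contains the Ensembl transcript ID (ENST followed by digits) somewhere in it. It reports transcripts that appear with more than one distinct sequence.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioMart): given FASTA text, return a hash
# ref mapping transcript ID => number of distinct sequences, restricted to
# IDs that occur with more than one distinct sequence.
sub find_conflicting_ids {
    my ($fasta) = @_;
    my (%seqs_for, $id);
    for my $line (split /\n/, $fasta) {
        $line =~ s/\s+$//;          # drop trailing whitespace / CR
        next if $line eq '';
        if ($line =~ /^>/) {
            # Assumption: the header contains an ENST... transcript ID.
            ($id) = $line =~ /(ENST\d+)/;
            push @{ $seqs_for{$id} }, '' if defined $id;
        }
        elsif (defined $id) {
            # Sequence lines are appended to the most recent record.
            $seqs_for{$id}[-1] .= $line;
        }
    }
    my %conflicts;
    for my $tid (keys %seqs_for) {
        my %distinct = map { $_ => 1 } @{ $seqs_for{$tid} };
        $conflicts{$tid} = scalar keys %distinct if keys(%distinct) > 1;
    }
    return \%conflicts;
}
```

To use it, save the FASTA output of the query to a file, read the file into a string, pass that string to the helper, and print the keys of the returned hash.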
Thanks in advance.