Hello,
I'm trying the new BioMart interface and I have a couple of comments.
I'm trying to download protein sequences for Homo sapiens genes on
GRCh37.p3.
In the results I find something like this for multiple genes:
ENSG00000000003|ENSP00000362111
MASPSRRLQTKPVITCFKSVLLIYTFIFWITGVILLAVGIWGKVSLENYFSLLNEKATNV
PFVLIATGTVIILLGTFGCFATCRASAWMLKLYAMFLTLVFLVELVAAIVGFVFRHEIKN
SFKNNYEKALKQYNSTGDYRSHAVDKIQNTLHCCGVTDYRDWTDTNYYSEKGFPKSCCKL
EDCTPQRDADKVNNEGCFIKVMTIIESEMGVVAGISFGVACFQLIGIFLAYCLSRAITNN
QYEIV*
ENSG00000000003|ENSP00000409517
MASPSRRLQTKPVITCFKSVLLIYTFIFWITGVILLAVGIWGKVSLENYFSLLNEKATFG
CFATCRASAWMLKLYAMFLTLVFLVELVAAIVGFVFRHEIKNSFKNNYEKALKQYNSTGD
YRSHAVDKIQNTLHCCGVTDYRDWTDTNYYSEKGFPKSCCKLEDCTPQRDADKVNNEGCF
IKVMTIIESEMGVVAGISFGVACFQLIGIFLAYCLSRAITNNQYEIV*
ENSG00000000003|
Sequence unavailable
ENSG00000000003|
Sequence unavailable
This makes me think there is a problem when joining the tables, so that
rows with an empty protein id end up as extra rows in the result.
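
As a client-side workaround until this is looked at, here is a minimal
sketch of how I would filter the downloaded text. It assumes a header is
any line that starts with ">" or contains "|" (the headers pasted above
have no leading ">"), and that a missing sequence is reported as the
literal line "Sequence unavailable". The function names parse_records and
keep_valid are my own, not part of BioMart.

```python
def parse_records(text):
    """Yield (gene_id, protein_id, sequence) tuples from FASTA-like text.

    Assumption: a header line starts with ">" or contains "|";
    every other non-empty line belongs to the current record.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">") or "|" in line:
            if header is not None:
                yield header + ("".join(chunks),)
            gene_id, _, protein_id = line.lstrip(">").partition("|")
            header, chunks = (gene_id, protein_id), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header + ("".join(chunks),)


def keep_valid(records):
    """Drop records without a protein id or sequence, and exact duplicates."""
    seen = set()
    kept = []
    for gene_id, protein_id, seq in records:
        # Rows such as "ENSG00000000003|" followed by "Sequence unavailable"
        if not protein_id or seq == "Sequence unavailable":
            continue
        key = (gene_id, protein_id)
        if key in seen:
            continue
        seen.add(key)
        kept.append((gene_id, protein_id, seq))
    return kept
```

Calling keep_valid(parse_records(text)) on the downloaded text keeps one
entry per gene id|protein id pair, which is also what I would expect the
unique option to do.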
Also, I would like to know whether the unique option from the previous
version is still needed to show only one result per gene id|protein id
pair.
Also, these queries tend to be quite long. As the application works now, I
can download the file as txt, but it is extremely slow, and I guess some
downloads may end up incomplete with no way of knowing whether the
file is actually complete. I mean that a timeout may produce an
incomplete file that looks fine, and there would be no way to prove
otherwise from the data alone. I think being able to get the file
as a gz-compressed file is highly desirable, either directly from the
service or through a link sent by email.
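
To illustrate what I mean by being able to verify a download, here is a
rough sketch of the check I would like to run. It assumes the service
appends a completion marker as the last line of the results. As far as I
know the classic martservice does this with "[success]" when the query
sets completionStamp="1", but I do not know whether the new interface
does, so the marker is only a default parameter here. The helper names
are my own.

```python
import gzip
import shutil


def is_complete(path, marker="[success]"):
    """Return True if the last non-empty line of the file equals marker.

    Assumption: the server writes the marker only after the final record.
    """
    last = ""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                last = line.strip()
    return last == marker


def gzip_file(src, dst):
    """Compress src into dst with gzip, streaming so large files fit in memory."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

With something like this, a truncated download fails the is_complete
check instead of silently looking fine, and the verified file can be
compressed locally if the service cannot send it as gz.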
Best regards,
J
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users