Hello,

I'm trying out the new BioMart interface and I have a couple of comments.

I'm trying to download protein sequences for Homo sapiens genes (assembly GRCh37.p3).

In the results I find something like this for multiple genes:

ENSG00000000003|ENSP00000362111
MASPSRRLQTKPVITCFKSVLLIYTFIFWITGVILLAVGIWGKVSLENYFSLLNEKATNV
PFVLIATGTVIILLGTFGCFATCRASAWMLKLYAMFLTLVFLVELVAAIVGFVFRHEIKN
SFKNNYEKALKQYNSTGDYRSHAVDKIQNTLHCCGVTDYRDWTDTNYYSEKGFPKSCCKL
EDCTPQRDADKVNNEGCFIKVMTIIESEMGVVAGISFGVACFQLIGIFLAYCLSRAITNN
QYEIV*

ENSG00000000003|ENSP00000409517
MASPSRRLQTKPVITCFKSVLLIYTFIFWITGVILLAVGIWGKVSLENYFSLLNEKATFG
CFATCRASAWMLKLYAMFLTLVFLVELVAAIVGFVFRHEIKNSFKNNYEKALKQYNSTGD
YRSHAVDKIQNTLHCCGVTDYRDWTDTNYYSEKGFPKSCCKLEDCTPQRDADKVNNEGCF
IKVMTIIESEMGVVAGISFGVACFQLIGIFLAYCLSRAITNNQYEIV*

ENSG00000000003|
Sequence unavailable

ENSG00000000003|
Sequence unavailable

This makes me think there is a problem in how the tables are joined: gene entries with an empty protein ID are producing extra rows in the result.
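
Until the join is fixed, one client-side workaround is to drop the records whose header has an empty protein ID. Below is a minimal Python sketch, assuming the download is standard FASTA (each header line starts with `>` and has the form `gene_id|protein_id`). `parse_fasta` and `drop_unavailable` are hypothetical helpers, not part of BioMart.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Assumes standard FASTA: header lines start with '>'.
    """
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_lines)
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def drop_unavailable(records):
    """Keep only records that have a protein ID and an actual sequence."""
    kept = []
    for header, seq in records:
        _gene_id, _, protein_id = header.partition("|")
        if protein_id and seq != "Sequence unavailable":
            kept.append((header, seq))
    return kept
```

For example, `drop_unavailable(parse_fasta(open("results.fa").read()))` would keep the two `ENSG00000000003|ENSP...` records from the excerpt and discard the two `ENSG00000000003|` rows (the file name is only illustrative).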

I would also like to know whether the unique option from the previous version is still needed to show only one result per gene ID|protein ID pair.
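
If that option is no longer available, a similar effect can be had after downloading. Below is a minimal sketch, assuming the records have already been parsed into `(header, sequence)` pairs. `unique_records` is a hypothetical helper that keeps the first record for each `gene_id|protein_id` header and preserves the original order.

```python
def unique_records(records):
    """Remove records whose gene_id|protein_id header was already seen.

    Keeps the first occurrence of each header, in the original order.
    """
    seen = set()
    result = []
    for header, seq in records:
        if header not in seen:
            seen.add(header)
            result.append((header, seq))
    return result
```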

Also, these queries tend to take a long time. As the application works now, I can download the file as TXT, but it is extremely slow, and I suspect some downloads may end up incomplete with no way of knowing whether the file is actually complete. A timeout could produce a truncated file that looks fine, and nothing in the data itself would reveal the problem. The option to get the file gzip-compressed, either directly from the service or via a link sent by email, would be highly desirable.
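
A gzip file would also address the completeness concern. Each gzip member ends with a CRC-32 and length trailer, so a truncated download fails to decompress instead of silently looking complete. Below is a minimal Python sketch using only the standard library. `gzip_is_complete` is a hypothetical helper, not something BioMart provides.

```python
import gzip
import zlib


def gzip_is_complete(path):
    """Return True if the whole gzip file decompresses and its trailer checks out.

    A truncated file raises EOFError; corrupt data or a CRC/length mismatch
    raises gzip.BadGzipFile (an OSError subclass) or zlib.error.
    """
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(1 << 20):  # read in 1 MiB chunks to the end
                pass
        return True
    except (EOFError, OSError, zlib.error):
        return False
```

A plain TXT file cut off at the same point would give no such signal.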

Best regards,

J
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users