I second Syed's opinion. The virtual issue can be found here:

http://www.oxfordjournals.org/our_journals/databa/biomart_virtual_issue.html


There is also a constantly updated community page, if this is of any help:

http://www.biomart.org/community.html


a


On Mon, Apr 16, 2012 at 7:00 PM, Syed Haider <[email protected]> wrote:

> Hi Andreas,
>
> many thanks for your thorough investigation. Just to point out that
> BioMart itself does not do any caching; however, the database backend's
> caching does kick in for identical queries, hence the difference in
> response time when the same query is repeated. We can therefore infer that
> the delay is not due to a slow internet connection but to the source
> locations (machines) where the different databases are hosted. So you would
> benefit from moving these locally, but the gain will be on the DB end, not
> the internet. The final decision is yours, as you might want to find the
> appropriate balance between maintenance and benefits :) For source data,
> Ensembl databases are fairly straightforward to dump from their FTP site.
> For others, please contact the relevant people from this table of contents:
>
> http://database.oxfordjournals.org/content/2011.toc
>
> Best,
> Syed
>
>
>
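The cold-versus-warm comparison discussed in this thread can be reproduced with a small timing script. The following is only a minimal sketch: the function names `fetch` and `time_repeats` are illustrative and not part of BioMart, and it assumes that a marked drop in elapsed time between the first and later requests points at backend caching rather than at the network.

```python
import time
import urllib.request
from typing import Callable, List


def fetch(url: str, timeout: float = 60.0) -> bytes:
    """Download `url` and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def time_repeats(url: str, repeats: int = 3,
                 fetcher: Callable[[str], bytes] = fetch) -> List[float]:
    """Request `url` several times in a row.

    Returns the wall-clock seconds taken by each successive request, so the
    first (cold) request can be compared with the later (warm) ones.
    `fetcher` is injectable so the timing logic can be exercised offline.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetcher(url)
        timings.append(time.perf_counter() - start)
    return timings
```

Calling `time_repeats(url)` with a full martservice query URL returns one elapsed time per request. Running it from two different sites a few minutes apart, as described in the thread, would help separate a shared server-side cache from anything local to one machine.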
>
> On 15/04/2012 17:59, andreas H wrote:
>
>> Hi Syed,
>>
>> using the webservices API I get more or less the same behaviour:
>>
>> $bash-prompt > time wget
>> 'http://www.biomart.org/biomart/martservice?query=<?xml version="1.0"
>> encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName = "default"
>> formatter = "TSV" header = "0" uniqueRows = "0" count = ""
>> datasetConfigVersion = "0.8"><Dataset name="hsapiens_gene_ensembl"
>> config="gene_ensembl_ap"><Filter name="hgnc_symbol"
>> value="RAN"/><Attribute name="go_id"/></Dataset></Query>' -O result.txt
>>
>> --17:37:45--
>> http://www.biomart.org/biomart/martservice?query=%3C?xml%20version=%221.0%22%20encoding=%22UTF-8%22?%3E%3C!DOCTYPE%20Query%3E%3CQuery%20virtualSchemaName%20=%20%22default%22%20formatter%20=%20%22TSV%22%20header%20=%20%220%22%20uniqueRows%20=%20%220%22%20count%20=%20%22%22%20datasetConfigVersion%20=%20%220.8%22%3E%3CDataset%20name=%22hsapiens_gene_ensembl%22%20config=%22gene_ensembl_ap%22%3E%3CFilter%20name=%22hgnc_symbol%22%20value=%22RAN%22/%3E%3CAttribute%20name=%22go_id%22/%3E%3C/Dataset%3E%3C/Query%3E
>>
>> => `result.txt'
>> Resolving www.biomart.org... 206.108.121.49
>> Connecting to www.biomart.org|206.108.121.49|:80... connected.
>> HTTP request sent, awaiting response... 200 OK
>> Length: unspecified [text/plain]
>>
>> [ <=> ] 1,815 --.--K/s
>>
>> 17:38:03 (48.08 MB/s) - `result.txt' saved [1815]
>>
>>
>> real 0m17.942s
>> user 0m0.000s
>> sys 0m0.030s
>>
>> Watching wget's output, the longest time is spent here:
>> Resolving www.biomart.org... 206.108.121.49
>> Connecting to www.biomart.org|206.108.121.49|:80... connected.
>> HTTP request sent, awaiting response...
>>
>>
>> Doing it a second time results in a much shorter wait (half a second to
>> 2 seconds).
>>
>> I have tried this from two machines belonging to two different
>> organisations in London. The speed of the connections I assume is quite
>> good.
>>
>> These may be useful:
>> 1) Running, say, a RAS query from organisation A takes 17 seconds.
>> After a few minutes, running the same query from organisation B takes
>> less than 2 seconds.
>> Running a new, say LOX, query from B takes 17 seconds. After a few minutes,
>> doing the same from A takes 2 seconds. So I guess some cache at biomart
>> short-circuits a lot of the query/processing time *at biomart*. Which I
>> think means that network speed is OK but processing/querying is not
>> (though even half a second could be considered too much by some
>> standards).
>>
>> 2) Replacing 'go_id' as the attribute name with 'ensembl_gene_id' results
>> in half-second queries from the first time. Replacing it with
>> 'ensembl_peptide_id' takes 3 seconds the first time, half a second
>> afterwards.
>>
>>
>> If you can point me to the URL of an online biomart query form, I will
>> try that too.
>>
>>
>> Thanks,
>>
>> Andreas
>>
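The query XML passed to wget in the message above can also be assembled programmatically, which makes it easier to swap the attribute (for example 'go_id' for 'ensembl_gene_id') when comparing response times. This is a sketch only: the helper names `build_query_xml` and `build_query_url` are made up for illustration, the endpoint and the attribute values are copied from the wget call in this thread, and whether that endpoint is still reachable is not something the sketch checks.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Endpoint as used in the wget call quoted in this thread.
MARTSERVICE = "http://www.biomart.org/biomart/martservice"


def build_query_xml(dataset, filter_name, filter_value, attributes,
                    config="gene_ensembl_ap"):
    """Build a martservice query document for one dataset and one filter."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "0",
        "count": "",
        "datasetConfigVersion": "0.8",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "config": config})
    ET.SubElement(ds, "Filter", {"name": filter_name, "value": filter_value})
    for attribute in attributes:
        ET.SubElement(ds, "Attribute", {"name": attribute})
    body = ET.tostring(query, encoding="unicode")
    # Prepend the same prolog that the hand-written query uses.
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def build_query_url(xml, endpoint=MARTSERVICE):
    """Percent-encode the XML into the `query` parameter of the endpoint."""
    return endpoint + "?" + urllib.parse.urlencode({"query": xml})
```

For instance, `build_query_url(build_query_xml("hsapiens_gene_ensembl", "hgnc_symbol", "RAN", ["go_id"]))` yields a URL equivalent in content to the one used above, with the encoding handled by the standard library instead of by hand.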
>> On 13/04/2012 23:01, Syed Haider wrote:
>>
>>> Hi Andreas,
>>>
>>> I wonder if you see the same response time when you connect to the
>>> website directly or use the webservice API. I am just trying to separate
>>> the R-specific from the biomart-specific response time, assuming that your
>>> institutional internet speed is good enough. I suggest this because, if
>>> the response time can be improved, it could save you a lot of time and
>>> the hassle of managing these databases locally.
>>>
>>> Best,
>>> Syed
>>>
>>> On 12/04/2012 15:27, andreas H wrote:
>>>
>>>> Hi Syed,
>>>> thanks for your reply,
>>>>
>>>> speed, bandwidth are the reasons.
>>>>
>>>> e.g. **unless I am doing something wrong**
>>>>
>>>> right now using R and biomaRt (with factory settings) :
>>>>
>>>> require(biomaRt)
>>>> a_mart=useMart(biomart="ensembl",dataset='hsapiens_gene_ensembl')
>>>> system.time({getBM(attributes=c('go_id'),filters='hgnc_symbol',values=c("KRAS"),
>>>> mart=a_mart) })
>>>> # user system elapsed
>>>> # 0.007 0.001 23.218
>>>>
>>>> which is too slow. Admittedly, the second time this runs *within the
>>>> same R session*, it drops down to 3 seconds, but this is not a likely
>>>> scenario.
>>>>
>>>> Thanks,
>>>> Andreas
>>>>
>>>> On 12/04/2012 14:59, Syed Haider wrote:
>>>>
>>>>> Hi Andreas,
>>>>>
>>>>> Before getting into the details of database updates, I wonder why you
>>>>> would prefer to have the databases downloaded locally as opposed to using
>>>>> them wherever they are hosted. BioMart is a system that enables data
>>>>> integration by means of federation. Hence, it kind of defeats the purpose
>>>>> if one downloads everything locally. Moreover, if you have the data
>>>>> locally, you run into the rather bigger problem of updates for each of the
>>>>> sources you downloaded, as all of them have their own major/minor release
>>>>> cycles etc. I might be missing something important w.r.t. your
>>>>> requirements, hence I ask :)
>>>>>
>>>>> Best,
>>>>> Syed
>>>>>
>>>>> On 12/04/2012 14:10, andreas H wrote:
>>>>>
>>>>>> Hello there,
>>>>>>
>>>>>> After days of searching around I can't find a simple step-by-step guide
>>>>>> on how to install biomart locally, including databases. Could you point
>>>>>> me to such a document, if what I want to do is possible at all?
>>>>>>
>>>>>> I have also read the biomart 0.8 install manual. I have managed to set
>>>>>> up a webserver and I can make it read data from a local source, but I
>>>>>> can't find how to automatically download and install these sources
>>>>>> locally.
>>>>>>
>>>>>> If you can contribute more, then preferably:
>>>>>> 1) I would like to know how to select a database and transfer it to my
>>>>>> own local MySQL server, without having to go to each vendor's website,
>>>>>> download their files and install them into my local DB. I mean, is there
>>>>>> a click-and-install feature in *mart-configurator* or something? I
>>>>>> visualise this as: I give my local MySQL password and the databases are
>>>>>> installed there automatically.
>>>>>>
>>>>>> 2) I would like to install only a small subset of databases locally -
>>>>>> the ones that I use a lot.
>>>>>>
>>>>>> 3) Ideally, I would like to update the locally installed biomart
>>>>>> databases with a few clicks, every few weeks/months.
>>>>>>
>>>>>> My aim right now is to convert between ensembl, uniprot, hugo protein
>>>>>> IDs and get GO terms for each protein.
>>>>>>
>>>>>> Thanking you in advance,
>>>>>> andreas
>>>>>>
>>>>>>
>>>>
>>>>
>> The Institute of Cancer Research: Royal Cancer Hospital, a charitable
>> Company Limited by Guarantee, Registered in England under Company No.
>> 534147 with its Registered Office at 123 Old Brompton Road, London SW7
>> 3RP.
>>
> _______________________________________________
> Users mailing list
> [email protected]
> https://lists.biomart.org/mailman/listinfo/users
>



-- 

Arek Kasprzyk, MD, MSc, PhD
BioMart Project Lead