I second Syed's opinion. The virtual issue can be found here: http://www.oxfordjournals.org/our_journals/databa/biomart_virtual_issue.html
There is also a constantly updated community page, if this is of any help:
http://www.biomart.org/community.html

a

On Mon, Apr 16, 2012 at 7:00 PM, Syed Haider <[email protected]> wrote:

> Hi Andreas,
>
> many thanks for your thorough investigation. Just to point out that
> biomart does not do the caching; however, the database backend caching does
> kick in for identical queries, hence the difference in response time when
> repeating the same query. Therefore, we can infer that it's not internet
> slowness; it's the source locations (machines) where the different
> databases are hosted that are taking the time. So you would benefit by moving
> these locally, but the gain is going to be on the DB end - not internet. The final
> decision is yours, as you might want to find the appropriate balance between
> maintenance and benefits :) For source data, Ensembl databases are fairly
> straightforward to dump from their ftp. For others, please contact the
> relevant guys from this table of contents:
>
> http://database.oxfordjournals.org/content/2011.toc
>
> Best,
> Syed
>
> On 15/04/2012 17:59, andreas H wrote:
>
>> Hi Syed,
>>
>> using webservices API I get more or less the same behaviour:
>>
>> $bash-prompt > time wget
>> 'http://www.biomart.org/biomart/martservice?query=<?xml version="1.0"
>> encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName = "default"
>> formatter = "TSV" header = "0" uniqueRows = "0" count = ""
>> datasetConfigVersion = "0.8"><Dataset name="hsapiens_gene_ensembl"
>> config="gene_ensembl_ap"><Filter name="hgnc_symbol"
>> value="RAN"/><Attribute name="go_id"/></Dataset></Query>' -O result.txt
>>
>> --17:37:45--
>> http://www.biomart.org/biomart/martservice?query=%3C?xml%20version=%221.0%22%20encoding=%22UTF-8%22?%3E%3C!DOCTYPE%20Query%3E%3CQuery%20virtualSchemaName%20=%20%22default%22%20formatter%20=%20%22TSV%22%20header%20=%20%220%22%20uniqueRows%20=%20%220%22%20count%20=%20%22%22%20datasetConfigVersion%20=%20%220.8%22%3E%3CDataset%20name=%22hsapiens_gene_ensembl%22%20config=%22gene_ensembl_ap%22%3E%3CFilter%20name=%22hgnc_symbol%22%20value=%22RAN%22/%3E%3CAttribute%20name=%22go_id%22/%3E%3C/Dataset%3E%3C/Query%3E
>>
>> => `result.txt'
>> Resolving www.biomart.org... 206.108.121.49
>> Connecting to www.biomart.org|206.108.121.49|:80... connected.
>> HTTP request sent, awaiting response... 200 OK
>> Length: unspecified [text/plain]
>>
>> [ <=> ] 1,815 --.--K/s
>>
>> 17:38:03 (48.08 MB/s) - `result.txt' saved [1815]
>>
>>
>> real 0m17.942s
>> user 0m0.000s
>> sys 0m0.030s
>>
>> By watching wget's output, the longest time is spent here:
>> Resolving www.biomart.org... 206.108.121.49
>> Connecting to www.biomart.org|206.108.121.49|:80... connected.
>> HTTP request sent, awaiting response...
>>
>>
>> Doing it a second time results in only a few seconds' wait time (half a
>> second to 2 seconds).
>>
>> I have tried this from two machines belonging to two different
>> organisations in London. The speed of the connections I assume is quite
>> good.
>>
>> These may be useful:
>> 1) Running say RAS query from organisation A takes 17 seconds.
>> After a few minutes, running the same query from organisation B takes
>> less than 2 seconds.
>> Running new, say LOX, query from B takes 17 seconds.
>> After a few minutes, doing the same from A takes 2 seconds. So I guess some
>> cache at biomart short-circuits a lot of query/processing time *at biomart*.
>> Which I think means that network speed is OK but processing/querying is not
>> (though even half a second could be considered too much by some
>> standards).
>>
>> 2) Replacing 'go_id' as attribute name by 'ensembl_gene_id' results in
>> half-second queries from the first time. Replacing it by 'ensembl_peptide_id'
>> takes 3 seconds the first time, half a second afterwards.
>>
>>
>> If you can point me to the URL of an online biomart query form, I will
>> try that too.
>>
>>
>> Thanks,
>>
>> Andreas
>>
>> On 13/04/2012 23:01, Syed Haider wrote:
>>
>>> Hi Andreas,
>>>
>>> I wonder if you see the same response time when you connect to the
>>> website directly or using webservice API. Just trying to isolate the
>>> R-specific and biomart-specific response time. I am assuming that your
>>> institutional internet speed is good enough. I suggest the above because
>>> if the response time could be improved, it can save you a lot of time
>>> and the hassle of managing these databases locally.
>>>
>>> Best,
>>> Syed
>>>
>>> On 12/04/2012 15:27, andreas H wrote:
>>>
>>>> Hi Syed,
>>>> thanks for your reply,
>>>>
>>>> speed, bandwidth are the reasons.
>>>>
>>>> e.g. **unless I am doing something wrong**
>>>>
>>>> right now using R and biomaRt (with factory settings) :
>>>>
>>>> require(biomaRt)
>>>> a_mart=useMart(biomart="ensembl",dataset='hsapiens_gene_ensembl')
>>>> system.time({getBM(attributes=c('go_id'),filters='hgnc_symbol',values=c("KRAS"),
>>>> mart=a_mart) })
>>>> # user system elapsed
>>>> # 0.007 0.001 23.218
>>>>
>>>> which is too slow. Admittedly, the second time this runs *within the
>>>> same R session*, it drops down to 3 seconds, but this is not a likely
>>>> scenario.
>>>>
>>>> Thanks,
>>>> Andreas
>>>>
>>>> On 12/04/2012 14:59, Syed Haider wrote:
>>>>
>>>>> Hi Andreas,
>>>>>
>>>>> Before getting into the details of database updates, I wonder why you
>>>>> would prefer to have databases downloaded locally as opposed to using them
>>>>> wherever they are hosted. BioMart is a system that enables data integration
>>>>> by means of federation. Hence, it kind of defeats the purpose if one
>>>>> downloads everything locally. Moreover, if you have data locally, you
>>>>> enter into a rather bigger problem of updates for each of the sources
>>>>> you downloaded, as all of them have their own major/minor release cycles
>>>>> etc.
>>>>> I might be missing something important w.r.t. your requirements, hence I
>>>>> ask :)
>>>>>
>>>>> Best,
>>>>> Syed
>>>>>
>>>>> On 12/04/2012 14:10, andreas H wrote:
>>>>>
>>>>>> Hello there,
>>>>>>
>>>>>> After days of searching around I can't find a simple step-by-step
>>>>>> guide on how to install biomart locally including databases. Could you
>>>>>> point me to such a document if what I want to do is possible at all?
>>>>>>
>>>>>> I have also read the biomart 0.8 install manual, I have managed to set
>>>>>> up a webserver, I can make it read data from a local source but I can't
>>>>>> find how to automatically download and install these sources locally.
>>>>>>
>>>>>> If you can contribute more, then preferably:
>>>>>> 1) I would like to know how to select a database and transfer it
>>>>>> locally to my own, local mysql server, without having to go to each
>>>>>> vendor's website, download their files and install them to my local db.
>>>>>> I mean, is there a click-and-install feature in *mart-configurator* or
>>>>>> something? I visualise this as: I give my local mysql password and the
>>>>>> databases are installed automatically there.
>>>>>>
>>>>>> 2) I would like to install only a small subset of databases locally -
>>>>>> the ones that I use a lot.
>>>>>>
>>>>>> 3) Ideally, I would like with a few clicks to update the biomart
>>>>>> databases which are installed locally, every few weeks/months.
>>>>>>
>>>>>> My aim right now is to convert between ensembl, uniprot, hugo protein
>>>>>> IDs and get GO terms for each protein.
>>>>>>
>>>>>> Thanking you in advance,
>>>>>> andreas
>>>>>>
>>
>> The Institute of Cancer Research: Royal Cancer Hospital, a charitable
>> Company Limited by Guarantee, Registered in England under Company No.
>> 534147 with its Registered Office at 123 Old Brompton Road, London SW7
>> 3RP.
>>
>> This e-mail message is confidential and for use by the addressee only.
>> If the message is received by anyone other than the addressee, please
>> return the message to the sender by replying to it and then delete the
>> message from your computer and network.
>>
> _______________________________________________
> Users mailing list
> [email protected]
> https://lists.biomart.org/mailman/listinfo/users
>

--
Arek Kasprzyk, MD, MSc, PhD
BioMart Project Lead
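For anyone wanting to repeat the first-call versus repeat-call comparison discussed in the thread, here is a minimal sketch in Python. It rebuilds the martservice query from the wget example and times repeated calls. The endpoint, dataset, config, filter and attribute names are copied from the thread; the helper names (build_query_xml, build_query_url, fetch, time_calls) are hypothetical, and the endpoint may not respond today the way it did in 2012.

```python
import time
import urllib.parse
import urllib.request
from xml.sax.saxutils import quoteattr

# Endpoint copied from the wget command in the thread (may no longer be live).
MARTSERVICE_URL = "http://www.biomart.org/biomart/martservice"


def build_query_xml(dataset, filter_name, filter_value, attributes, config=None):
    """Build a query document shaped like the one in the thread (hypothetical helper)."""
    dataset_attrs = f"name={quoteattr(dataset)}"
    if config is not None:
        dataset_attrs += f" config={quoteattr(config)}"
    attribute_xml = "".join(f"<Attribute name={quoteattr(a)}/>" for a in attributes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="0" '
        'uniqueRows="0" count="" datasetConfigVersion="0.8">'
        f"<Dataset {dataset_attrs}>"
        f"<Filter name={quoteattr(filter_name)} value={quoteattr(filter_value)}/>"
        f"{attribute_xml}</Dataset></Query>"
    )


def build_query_url(query_xml, base_url=MARTSERVICE_URL):
    """Percent-encode the query document into the ?query= parameter."""
    encoded = urllib.parse.urlencode({"query": query_xml}, quote_via=urllib.parse.quote)
    return f"{base_url}?{encoded}"


def fetch(url, timeout=60):
    """Download the response body; this is the only function that touches the network."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def time_calls(call, repeats=2):
    """Run call() `repeats` times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    xml = build_query_xml(
        "hsapiens_gene_ensembl", "hgnc_symbol", "RAN", ["go_id"],
        config="gene_ensembl_ap",
    )
    # Only prints the URL; no request is sent here.
    print(build_query_url(xml))
```

To compare a first and a repeated request, something like time_calls(lambda: fetch(url), repeats=2) returns the two elapsed times. If the first is much larger than the second, that would be consistent with the backend caching Syed describes rather than with network latency.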
