Re: [Wikidata-l] How can I increase the throughput of ProteinBoxBot?

2014-10-18 Thread zehetner
Great work, Andra! For the Ensembl-related properties (Gene ID, Transcript ID, etc.), would it be possible to record the version of Ensembl from which these IDs were extracted (perhaps by adding a qualifier to the ID value), as Entrez seems to provide this information? Between Ensembl versions these IDs can
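A minimal sketch of what a qualifier on an ID value could look like in the Wikibase claim JSON. The property IDs `P_ENSEMBL_GENE_ID` and `P_SOURCE_VERSION`, the helper names, and the sample values are hypothetical placeholders, not real Wikidata properties or data; whether such a qualifier property exists is not stated in the thread.

```python
import copy


def string_snak(property_id, value):
    """Build a string-valued snak in the Wikibase JSON layout."""
    return {
        "snaktype": "value",
        "property": property_id,
        "datavalue": {"value": value, "type": "string"},
    }


def claim_with_version(id_property, id_value, version_property, version):
    """Return a statement whose ID value carries a version qualifier.

    Both property IDs are supplied by the caller; the ones used in the
    example below are placeholders only.
    """
    return {
        "mainsnak": string_snak(id_property, id_value),
        "type": "statement",
        "rank": "normal",
        # Qualifiers are grouped by property ID, one list of snaks each.
        "qualifiers": {
            version_property: [string_snak(version_property, version)],
        },
    }


# Placeholder values for illustration only.
example = copy.deepcopy(
    claim_with_version(
        "P_ENSEMBL_GENE_ID", "EXAMPLE_GENE_ID", "P_SOURCE_VERSION", "EXAMPLE_RELEASE"
    )
)
```

A claim built this way can be placed in the `claims` list of an entity edit, so the version travels with the identifier rather than living in a separate statement.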

Re: [Wikidata-l] How can I increase the throughput of ProteinBoxBot?

2014-10-18 Thread Ricordisamoa
On 03/10/2014 22:31, Legoktm wrote: On 9/30/14 10:00 AM, Andra Waagmeester wrote: Would it be acceptable to increase the throughput by running multiple instances of ProteinBoxBot in parallel? If so, what would be an acceptable number of parallel instances of a bot to run? We can run

Re: [Wikidata-l] How can I increase the throughput of ProteinBoxBot?

2014-10-17 Thread Andra Waagmeester
The suggestion to use wbeditentity was great. It took me some time to get used to that call, but I finally managed, and the speed-up was substantial. So substantial that we also finished including the mouse genome yesterday. It only took 2 days to complete, in contrast to the weeks with the human

Re: [Wikidata-l] How can I increase the throughput of ProteinBoxBot?

2014-10-03 Thread Legoktm
On 9/30/14 10:00 AM, Andra Waagmeester wrote: Would it be acceptable to increase the throughput by running multiple instances of ProteinBoxBot in parallel? If so, what would be an acceptable number of parallel instances of a bot to run? We can run multiple instances from different geographical

Re: [Wikidata-l] How can I increase the throughput of ProteinBoxBot?

2014-10-01 Thread Katie Filbert
On Wed, Oct 1, 2014 at 10:56 AM, Andra Waagmeester an...@micelio.be wrote: There are about 47000 genes. In the first step the bot checks whether an entry already exists and, if not, a new entry is made. Subsequently three claims are added (Entrez Gene ID (P351)

Re: [Wikidata-l] How can I increase the throughput of ProteinBoxBot?

2014-10-01 Thread Jeroen De Dauw
Hey, Currently each entity creation and each subsequent claim is a separate API call. So using wbeditentity will probably result in an improvement. Thanks for the suggestion. I second that suggestion. It should definitely not take 2 weeks or more to add a mere 50k items. In case your bot is PHP
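The idea of folding item creation and its claims into one wbeditentity request can be sketched as follows. This is only an illustration under stated assumptions: the helper names, the label, the ID value, and the token are hypothetical; P351 (Entrez Gene ID) is the only property taken from the thread. The sketch builds the request parameters and does not send anything.

```python
import json


def string_claim(property_id, value):
    """Build one string-valued statement in the Wikibase JSON layout."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {"value": value, "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }


def build_wbeditentity_params(label, claims, token, language="en"):
    """Return POST parameters that create an item and all its claims at once.

    The whole entity goes into the JSON-encoded `data` parameter, so a
    single API call replaces one call per item plus one per claim.
    """
    data = {
        "labels": {language: {"language": language, "value": label}},
        "claims": claims,
    }
    return {
        "action": "wbeditentity",
        "new": "item",
        "data": json.dumps(data),
        "token": token,  # edit token obtained from the API beforehand
        "bot": 1,
        "format": "json",
    }


# Placeholder label, ID value, and token for illustration only.
params = build_wbeditentity_params(
    "example gene",
    [string_claim("P351", "12345")],
    token="EXAMPLE_TOKEN",
)
```

With three claims per gene, this turns four requests per item into one; the parameters would then be posted to the wiki's `api.php` endpoint with whatever HTTP client the bot already uses.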

[Wikidata-l] How can I increase the throughput of ProteinBoxBot?

2014-09-30 Thread Andra Waagmeester
Hi All, I have joined the development team of the ProteinBoxBot (https://www.wikidata.org/wiki/User:ProteinBoxBot). Our goal is to make Wikidata the canonical resource for referencing and translating identifiers for genes and proteins from different species. Currently adding all genes

Re: [Wikidata-l] How can I increase the throughput of ProteinBoxBot?

2014-09-30 Thread Denny Vrandečić
That's very cool! To get an idea, how big is your dataset? On Tue Sep 30 2014 at 12:06:56 PM Daniel Kinzler daniel.kinz...@wikimedia.de wrote: What makes it so slow? Note that you can use wbeditentity to perform complex edits with a single API call. It's not as straightforward to use as,