There are about 47,000 genes. In the first step the bot checks whether an entry
already exists and, if not, creates a new one. Subsequently three claims
are added (Entrez Gene ID (P351)
<https://www.wikidata.org/wiki/Property:P351#top>, found in taxon (P703)
<https://www.wikidata.org/wiki/Property:P703#top>, and subclass of (P279)), as
well as synonyms. Currently this process takes a week to complete. In a
second phase, identifiers for each gene are obtained and added as respective
claims. Typically this ranges from 1 claim per property up to 20 claims
per property (a rough estimate). This bot has been running for 2
weeks and currently covers 74% of all genes.
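The per-claim flow described above can be sketched roughly as follows (a minimal Python sketch; the helper names, example QIDs, and exact call sequence are illustrative assumptions, not the bot's actual code):

```python
def wbcreateclaim_params(qid, prop, value_json):
    """Parameters for one wbcreateclaim API call; in the current setup,
    every claim is its own POST to https://www.wikidata.org/w/api.php."""
    return {
        "action": "wbcreateclaim",
        "entity": qid,            # the gene item, e.g. "Q14819852"
        "property": prop,
        "snaktype": "value",
        "value": value_json,      # JSON-encoded datavalue
        "format": "json",
    }

def initial_claim_calls(qid, entrez_id, taxon_qid, subclass_qid):
    """One parameter set per claim: P351 (string), P703 and P279 (items).
    Each entry corresponds to a separate API round trip."""
    def item_value(q):
        # wikibase-entityid values are passed as a JSON object string
        return '{"entity-type":"item","numeric-id":%d}' % int(q.lstrip("Q"))
    return [
        wbcreateclaim_params(qid, "P351", '"%s"' % entrez_id),
        wbcreateclaim_params(qid, "P703", item_value(taxon_qid)),
        wbcreateclaim_params(qid, "P279", item_value(subclass_qid)),
    ]

# e.g. initial_claim_calls("Q14819852", "1017", "Q15978631", "Q7187")
# yields three parameter sets, i.e. three separate write requests
# on top of the initial entity-creation call.
```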


Currently the entity creation and each subsequent claim are separate API calls,
so using wbeditentity will probably result in an improvement. Thanks
for the suggestion.
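For illustration, a single wbeditentity call could create the item together with all three initial claims. A minimal sketch of the `data` payload, assuming the standard Wikibase statement JSON (the label and QIDs are example values, not taken from the bot):

```python
import json

def build_gene_entity(label, entrez_id, taxon_qid, subclass_qid):
    """Build the `data` JSON for one wbeditentity call that creates the
    item together with its Entrez Gene ID (P351), found in taxon (P703),
    and subclass of (P279) claims in a single edit."""
    def statement(prop, datavalue):
        return {"mainsnak": {"snaktype": "value", "property": prop,
                             "datavalue": datavalue},
                "type": "statement", "rank": "normal"}
    def item_value(q):
        return {"type": "wikibase-entityid",
                "value": {"entity-type": "item",
                          "numeric-id": int(q.lstrip("Q"))}}
    return {
        "labels": {"en": {"language": "en", "value": label}},
        "claims": [
            statement("P351", {"type": "string", "value": entrez_id}),
            statement("P703", item_value(taxon_qid)),
            statement("P279", item_value(subclass_qid)),
        ],
    }

# Example payload (CDK2, Entrez 1017, taxon Homo sapiens, subclass of gene):
payload = json.dumps(build_gene_entity("CDK2", "1017", "Q15978631", "Q7187"))

# One authenticated POST then replaces the create call plus three claim calls:
#   params = {"action": "wbeditentity", "new": "item", "data": payload,
#             "token": csrf_token, "format": "json"}
```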

Cheers, Andra




On Tue, Sep 30, 2014 at 9:05 PM, Daniel Kinzler <daniel.kinz...@wikimedia.de
> wrote:

> What makes it so slow?
>
> Note that you can use wbeditentity to perform complex edits with a single
> api
> call. It's not as straightforward to use as, say, wbcreateclaim, but much
> more
> powerful and efficient.
>
> -- daniel
>
> Am 30.09.2014 19:00, schrieb Andra Waagmeester:
> > Hi All,
> >
> >       I have joined the development team of the ProteinBoxBot
> > (https://www.wikidata.org/wiki/User:ProteinBoxBot) . Our goal is to make
> > Wikidata the canonical resource for referencing and translating
> identifiers for
> > genes and proteins from different species.
> >
> > Currently adding all genes from the human genome and their related
> identifiers
> > to Wikidata takes more than a month to complete. With the objective to
> add other
> > species, as well as having frequent updates for each of the genomes, it
> would be
> > convenient if we could increase this throughput.
> >
> > Would it be accepted if we increase the throughput by running multiple
> instances
> > of ProteinBoxBot in parallel? If so, what would be an accepted number of
> > parallel instances of a bot to run? We can run multiple instances from
> different
> > geographical locations if necessary.
> >
> > Kind regards,
> >
> >
> > Andra
> >
> >
> >
> >
> > _______________________________________________
> > Wikidata-l mailing list
> > Wikidata-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata-l
> >
>
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
>
