Hello, I'll answer inline in the body of both emails. Thank you so much for your help!
> On 4 Feb 2019 at 17:14, Strainu <[email protected]> wrote:
>
> Not sure about this, but you might consider using low-level API
> functions directly or even crafting your API calls by hand. That kind
> of defies the purpose of using pwb, but oh well...

=> I see. I think I'll try figuring it out with pywikibot first, for simplicity's sake. If I can't find a good enough solution with pwb, I may try that.

> This sounds like a great job for a SPARQL query
> (see https://query.wikidata.org for the public endpoint for Wikidata).
> Is it feasible to add such an interface to your instance?

=> Yes, I'll plug in a SPARQL endpoint soon. I assume that kind of request is fast, so this is definitely something I'll try!

--

> On 4 Feb 2019 at 17:59, Pellegrino Prevete <[email protected]> wrote:
>
> On Mon, 4 Feb 2019 15:36:08 +0100,
> Kévin Bois <[email protected]> wrote:
>
>> Hello,
>>
>> I'm trying to write a pywikibot script which reads and creates items /
>> properties on my Wikibase instance. Following pieces of tutorials and
>> script examples, I managed to write something that works.
>>
>> 1/ The idea is to read a CSV file and create an item with its
>> properties for each line. So I have to loop over thousands of lines,
>> creating an item and multiple associated claims per line, and it takes
>> quite some time to do so (at least 1 hour to create 1000 items). I
>> guess it's because for each line I create a new entity and new claims,
>> which means multiple requests per line. Some pseudo-code I use in my
>> script: to create a new item, I use repo.editEntity({}, {},
>> summary='new item'), assuming repo = site.data_repository(). To create
>> a new claim, I use self.user_add_claim_unless_exists(item, claim),
>> assuming my bot inherits from WikidataBot.
>>
>> Is there a better way to optimize that kind of bulk import?
>> --
>>
>> 2/ I kind of have the same problem if I want to check whether an item
>> already exists, because first I need to get all existing items and
>> check if they are in my CSV or not. (The CSV does not contain QIDs,
>> but it does contain a "custom" ID I've created and added as a property
>> to each item.)
>>
>> --
>>
>> I hope I was clear enough; any relevant example, idea, or advice would
>> be much appreciated. Bear in mind I'm a beginner with the whole
>> ecosystem, so I'm open to any recommendation. Thanks!
>> _______________________________________________
>> pywikibot mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/pywikibot
>
> I do not know if this message will be delivered. I hope so.
>
> About the first question, I think you can split the workload
> among different Python threads.

=> That sounds awesome, I'll look into that.

> About the second, could you generate the QID with an injective function
> from your ID? Then you would just have to execute the function on your
> ID and check whether the corresponding QID exists.

=> It sounds like what I had in mind, but I'm not sure I understood correctly what you mean. To expand on what I wanted to do: before adding anything with the script, I wanted to build a big mapping (in a Python dictionary) from my custom ID to its corresponding QID, something like id_mapping = {custom_id1: QID1, custom_id2: QID2, ...}. Then I could easily look into that dictionary when needed, before actually adding an item. This is why I'm trying to retrieve all existing items as a first step.

> Pellegrino
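Once the SPARQL endpoint mentioned above is in place, that mapping can be filled with a single query such as SELECT ?item ?customId WHERE { ?item wdt:P1 ?customId } (here P1 stands in for the custom-ID property, an assumption). The result parsing is plain Python; a sketch, assuming the standard SPARQL 1.1 JSON results format (pywikibot's pywikibot.data.sparql module can run the query itself):

```python
def build_id_mapping(sparql_json):
    """Turn SPARQL JSON results for (?item ?customId) into a
    {custom_id: QID} dictionary for fast existence checks."""
    mapping = {}
    for binding in sparql_json["results"]["bindings"]:
        # ?item comes back as a full entity URI; keep only the trailing QID.
        qid = binding["item"]["value"].rsplit("/", 1)[-1]
        mapping[binding["customId"]["value"]] = qid
    return mapping


# Example payload shaped like the SPARQL 1.1 Query Results JSON Format
# (URI and property are hypothetical):
sample = {
    "results": {"bindings": [
        {"item": {"value": "http://example.org/entity/Q42"},
         "customId": {"value": "custom_id1"}},
    ]}
}
id_mapping = build_id_mapping(sample)
# id_mapping == {"custom_id1": "Q42"}
```

With the dictionary built once up front, checking whether a CSV row already has an item is an O(1) lookup instead of a scan over all existing items.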
