Hello, I'll answer inline in the body of both emails. Thank you so much for your help!
> On 4 Feb 2019 at 17:14, Strainu <[email protected]> wrote:
>
> Not sure about this, but you might consider using low-level API
> functions directly or even crafting your API calls by hand. That kind
> of defies the purpose of using pwb, but oh well...

=> I see. I think I'll try figuring it out with pywikibot first, for simplicity's sake. If I can't find a good enough solution with pwb, I may try that.

> This sounds like a great job for a SPARQL query
> (see https://query.wikidata.org for the public endpoint for Wikidata).
> Is it feasible to add such an interface to your instance?

=> Yes, I'll plug in a SPARQL endpoint soon. I assume that kind of request is fast, so this is definitely something I'll try!

--

> On 4 Feb 2019 at 17:59, Pellegrino Prevete <[email protected]> wrote:
>
> On Mon, 4 Feb 2019 15:36:08 +0100,
> Kévin Bois <[email protected]> wrote:
>
>> Hello,
>>
>> I'm trying to write a pywikibot script which reads and creates items /
>> properties on my Wikibase instance. Following pieces of tutorials and
>> script examples, I managed to write something that works.
>>
>> 1/ The idea is to read a CSV file and create an item with its
>> properties for each line. So I have to loop over thousands of lines,
>> creating an item and multiple associated claims per line, and it takes
>> quite some time to do so (at least 1 hour to create 1000 items). I
>> guess it's because for each line I create a new entity and new claims,
>> which means multiple requests per line. Some pseudo-code I use in my
>> script: to create a new item, I use repo.editEntity({}, {},
>> summary='new item'), assuming repo = site.data_repository(). To create
>> a new claim, I use self.user_add_claim_unless_exists(item, claim),
>> assuming my bot inherits from WikidataBot.
>>
>> Is there a better way to optimize that kind of bulk import?
>> --
>>
>> 2/ I kind of have the same problem if I want to check whether an item
>> already exists, because first I need to get all existing items and
>> check if they are in my CSV or not. (The CSV does not contain QIDs,
>> but it does contain a "custom" ID I've created and added as a property
>> to each item.)
>>
>> --
>>
>> I hope I was clear enough; any relevant example, idea, or advice would
>> be much appreciated. Bear in mind I'm a beginner with the whole
>> ecosystem, so I'm open to any recommendation. Thanks!
>> _______________________________________________
>> pywikibot mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/pywikibot
>
> I do not know if this message will be delivered. I hope so.
>
> About the first question, I think you can split the workload
> among different Python threads.

=> That sounds awesome, I'll look into that.

> About the second, could you generate the QID with an injective function
> from your ID? Then you would just have to execute the function on your
> ID and check whether the corresponding QID exists.

=> It sounds like what I had in mind, but I'm not sure I understood correctly what you mean. To expand on what I wanted to do: before adding anything with the script, I wanted to build a big mapping (in a Python dictionary) from my custom ID to its corresponding QID, something like id_mapping = {custom_id1: QID1, custom_id2: QID2, ...}. Then I could easily look into that dictionary when needed, before actually adding an item. This is why I'm trying to retrieve all existing items as a first step.

> Pellegrino
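Once the SPARQL endpoint mentioned above is in place, that mapping can be filled with a single query such as SELECT ?item ?customId WHERE { ?item wdt:P1 ?customId } (here P1 stands in for the custom-ID property, an assumption). The result parsing is plain Python; a sketch, assuming the standard SPARQL 1.1 JSON results format (pywikibot's pywikibot.data.sparql module can run the query itself):

```python
def build_id_mapping(sparql_json):
    """Turn SPARQL JSON results for (?item ?customId) into a
    {custom_id: QID} dictionary for fast existence checks."""
    mapping = {}
    for binding in sparql_json["results"]["bindings"]:
        # ?item comes back as a full entity URI; keep only the trailing QID.
        qid = binding["item"]["value"].rsplit("/", 1)[-1]
        mapping[binding["customId"]["value"]] = qid
    return mapping


# Example payload shaped like the SPARQL 1.1 Query Results JSON Format
# (URI and property are hypothetical):
sample = {
    "results": {"bindings": [
        {"item": {"value": "http://example.org/entity/Q42"},
         "customId": {"value": "custom_id1"}},
    ]}
}
id_mapping = build_id_mapping(sample)
# id_mapping == {"custom_id1": "Q42"}
```

With the dictionary built once up front, checking whether a CSV row already has an item is an O(1) lookup instead of a scan over all existing items.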
