[Wikidata-bugs] [Maniphest] [Commented On] T185589: Repeating blank node ids in Wikidata entity RDF dumps

2018-04-02 gerritbot
gerritbot added a comment. Change 423470 merged by jenkins-bot: [mediawiki/extensions/Wikibase@master] Add integration test for dumpRdf.php --part-id https://gerrit.wikimedia.org/r/423470
TASK DETAIL: https://phabricator.wikimedia.org/T185589

2018-04-02 gerritbot
gerritbot added a comment. Change 413290 merged by jenkins-bot: [mediawiki/extensions/Wikibase@master] Make bnodelabeler be aware of shards https://gerrit.wikimedia.org/r/413290

2018-04-02 gerritbot
gerritbot added a comment. Change 423470 had a related patch set uploaded (by Hoo man; owner: Hoo man): [mediawiki/extensions/Wikibase@master] Add integration test for dumpRdf.php --part-id https://gerrit.wikimedia.org/r/423470

2018-03-21 gerritbot
gerritbot added a comment. Change 421101 merged by jenkins-bot: [mediawiki/core@master] Update purtle to 1.0.7 https://gerrit.wikimedia.org/r/421101

2018-03-21 gerritbot
gerritbot added a comment. Change 421101 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [mediawiki/core@master] Update purtle to 1.0.7 https://gerrit.wikimedia.org/r/421101

2018-03-20 Smalyshev
Smalyshev added a comment. @thiemowmde Packagist packages are not auto-updated from Wikimedia GitHub, afaik. But I took care of it.

2018-03-20 gerritbot
gerritbot added a comment. Change 420636 merged by jenkins-bot: [purtle@master] Update release notes for version 1.0.7 https://gerrit.wikimedia.org/r/420636

2018-03-20 thiemowmde
thiemowmde added a comment. @Smalyshev, I am an administrator on a lot of Packagist packages, but not on https://packagist.org/packages/wikimedia/purtle. In theory, the only thing you need to do is tag a new v1.0.7 release via git. Packagist should pick this up. If it does not, we might need to

2018-03-19 gerritbot
gerritbot added a comment. Change 420636 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [purtle@master] Update release notes https://gerrit.wikimedia.org/r/420636

2018-03-19 gerritbot
gerritbot added a comment. Change 413288 merged by jenkins-bot: [purtle@master] Add ability to set bnode labeler to writer & factory https://gerrit.wikimedia.org/r/413288

2018-02-21 gerritbot
gerritbot added a comment. Change 413290 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [mediawiki/extensions/Wikibase@master] Make bnodelabeler be aware of shards https://gerrit.wikimedia.org/r/413290

2018-02-21 gerritbot
gerritbot added a comment. Change 413288 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [purtle@master] Add ability to set bnode labaler https://gerrit.wikimedia.org/r/413288

2018-02-21 Smalyshev
Smalyshev added a comment. With UUID, the problem is that it'd be very hard to test, I'm afraid. However, we could just set the prefix as genid{$shard}- instead of just genid, and that should work, I think. Ah, I guess this is still a problem when joining dump file shards into a single file. Yes, exactly.
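A minimal Python sketch of the per-shard prefix idea in the comment above, assuming a labeler that simply emits a fixed prefix followed by an incrementing counter. `BlankNodeLabeler` and `labels_for_shard` are hypothetical names for illustration, not purtle's PHP API. It shows why a plain "genid" prefix repeats ids once shard outputs are concatenated, and how putting the shard number into the prefix keeps them apart:

```python
import itertools


class BlankNodeLabeler:
    """Hypothetical labeler: a fixed prefix plus a per-instance counter.
    Not purtle's actual BNodeLabeler API."""

    def __init__(self, prefix: str = "genid") -> None:
        self.prefix = prefix
        self._counter = itertools.count(1)

    def next_label(self) -> str:
        # Each call yields prefix + next integer, e.g. "genid1", "genid2", ...
        return f"{self.prefix}{next(self._counter)}"


def labels_for_shard(prefix: str, n: int) -> list[str]:
    # Each shard gets its own fresh labeler, so its counter restarts at 1.
    labeler = BlankNodeLabeler(prefix)
    return [labeler.next_label() for _ in range(n)]


# Plain "genid" prefix: both shards restart at 1, so the joined output repeats ids.
shared = labels_for_shard("genid", 3) + labels_for_shard("genid", 3)

# Shard-aware prefix genid{shard}-: the shard number separates the label spaces.
per_shard = labels_for_shard("genid0-", 3) + labels_for_shard("genid1-", 3)

print(len(shared), len(set(shared)))        # 6 3
print(len(per_shard), len(set(per_shard)))  # 6 6
```

The trailing "-" in the prefix matters: without a separator, shard 1's 11th label and shard 11's 1st label would both come out as "genid111".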

2018-01-23 daniel
daniel added a comment. I think the simplest way to fix this is to add a UUID (or something similar) to the ID prefix (currently, it's just "genid"). BNodeLabeler has a parameter for the prefix in the constructor. RdfWriterFactory will have to be changed to optionally know and set a BNodeLabeler
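A rough Python sketch of the UUID-prefix suggestion, assuming the labeler is just a prefix plus a counter and that a random UUID4 hex string is the "something similar" that makes the prefix unique per run. `PrefixedLabeler` and `make_labeler` are hypothetical stand-ins, not the real BNodeLabeler or RdfWriterFactory:

```python
import itertools
import uuid


class PrefixedLabeler:
    """Hypothetical labeler: prefix + counter. Not purtle's BNodeLabeler."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._counter = itertools.count(1)

    def next_label(self) -> str:
        return f"{self.prefix}{next(self._counter)}"


def make_labeler() -> PrefixedLabeler:
    # Assumption: "genid" + a random UUID4 hex string + "-" as the unique prefix.
    return PrefixedLabeler(f"genid{uuid.uuid4().hex}-")


# Two independently created labelers (e.g. two shards or two runs).
a, b = make_labeler(), make_labeler()
first_a, first_b = a.next_label(), b.next_label()
print(first_a != first_b)  # True, barring a (practically impossible) UUID collision
```

Because the prefix is random, the labels differ on every run, so a test can only check their shape (prefix and counter suffix), not exact values.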

2018-01-23 Smalyshev
Smalyshev added a comment. Yes, this looks like a problem. We should be using separate IDs for separate bnodes. Probably should have some kind of initializer for shards that guarantees the spaces are not intersecting. Maybe using multipliers - i.e. the ID would be N * shardCount + shardNumber, so
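The multiplier scheme can be sketched in Python as follows, assuming 0-based shard numbers and a per-shard counter N that starts at 0; `shard_ids` is a hypothetical helper, not code from Wikibase:

```python
def shard_ids(shard_number: int, shard_count: int, n: int) -> list[int]:
    """First n blank node ids for one shard, using id = N * shardCount + shardNumber
    with N = 0, 1, 2, ... (assumed 0-based)."""
    if not 0 <= shard_number < shard_count:
        raise ValueError("shard_number must be in [0, shard_count)")
    return [k * shard_count + shard_number for k in range(n)]


shard_count = 4
all_ids = [i for s in range(shard_count) for i in shard_ids(s, shard_count, 5)]

# Every id is congruent to its shard number modulo shard_count,
# so ids from different shards can never coincide.
print(len(all_ids) == len(set(all_ids)))  # True
print(shard_ids(1, 4, 3))  # [1, 5, 9]
```

The catch is that all shards must agree on shardCount in advance, which a prefix-based scheme does not require.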

2018-01-23 hoo
hoo added a comment. @Lucas_Werkmeister_WMDE Brought this up on https://gerrit.wikimedia.org/r/405739, where I'm working on splitting the dumping up even further (so blank node ids would repeat even faster/more often).