[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Smalyshev added a comment. I believe 1.5% slowdown is acceptable for the functionality. I will review the code soon and add my comments.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
WMDE-leszek added a comment. Thanks @Ladsgroup! I've copied over the table to the task description, so it is better visible. I've also added links to the actual patches tested, and added a column with some ratio measuring somehow how slower was the patched code compared to master. @daniel, @Smalyshev: do you think there is enough data to make a decision if we go with any of the patches, or do we try some other approach/further optimizations? If any of patches was fine in general, I'd abandon the other one, and ask you for the actual code review then!TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: WMDE-leszekCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Ladsgroup added a comment. I did a table of running it on master, the old patch and the alternative one. NameFirst runSecond runThird runAverage Master19.9021.09121.09620.696 Old patch23.30123.15622.93123.129 Alternative patch20.96520.71621.33721.006 TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LadsgroupCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Ladsgroup added a comment. With the alternative approach: Without it (on master): root@federated-wikis:/var/www/html# time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m21.666s user 0m18.016s sys 0m0.376s root@federated-wikis:/var/www/html# time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m21.303s user 0m17.436s sys 0m0.504s root@federated-wikis:/var/www/html# time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m20.083s user 0m16.468s sys 0m0.352s Patches I cherry-picked: commit 40df4ad514a32c15b8b84e4e204a8f5edf3b7b20 Author: Leszek Manicki Date: Wed Apr 5 11:45:12 2017 +0200 [WIP] Consider foreign entities in RDF mapping Namespaces used in clauses related to foreign entities now contain the repository name (prefix) of the repository the entity belongs to. Repository prefix is also dropped from the "local name" of the foreign entity. DO NOT MERGE before T162371 is resolved! This is I6acb00b23425bdce4dd03227415e593595b3c1cc without the cache in RdfVocabulary. Change-Id: I1fe6c6ba92d5992d1b6d1802a20fe8b47033443a commit b63f84d779115e1cd849a81c90873eff678ca345 Author: Leszek Manicki Date: Thu Nov 2 13:39:28 2017 +0100 Use DataModel with more efficient EntityId::getRepositoryName and ::getLocalPart Change-Id: I19a9cf6f5c9b1fd5c3e1a9b2bf77a36c3d1d4739 commit e2069fbc8deaf1fad735b217c4d112daca137b54 Author: Thiemo Mttig Date: Thu Nov 9 18:17:12 2017 +0100 Reformatting in preparation for foreign entities in RDF mapping This is some very basic reformating, only done to make the diffs in I6acb00b much easier to review. Bug: T161592 Change-Id: I9aff0eb930cd755582505b641385932ef9217c36 commit 22cf861cb9e49928876b007540a3f89650c1d30c Merge: b824931 b9b1ec8 Author: jenkins-bot Date: Tue Nov 14 08:23:05 2017 + Merge "Add noop hint for Descriptions too" With it: root@federated-wikis:/var/www/html# time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m19.684s user 0m15.984s sys 0m0.372s root@federated-wikis:/var/www/html# time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m20.478s user 0m16.812s sys 0m0.380s root@federated-wikis:/var/www/html# time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m20.139s user 0m16.560s sys 0m0.328sTASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LadsgroupCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
WMDE-leszek added a comment. Thanks @daniel and @Smalyshev for the feedback so far. Submitted https://github.com/wmde/WikibaseDataModel/pull/767 to get those changes made. Feel free to review and comment wherever you find applicable.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: WMDE-leszekCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Smalyshev added a comment. The benchmark looks pretty good. I'll review the patches a bit later (I have a cold right now and it's hard for me to concentrate) and if the code is OK I see no reason not to merge it. We could also do another xhprof run before that to see if there are any other things we could improve.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
daniel added a comment. @WMDE-leszek The approach looks fine, and should help. The question is if it's sufficient.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: danielCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
WMDE-leszek added a comment. Thanks @Ladsgroup for running another round of benchmark. To make it clear what has been actually benchmarked. I've update the same patch that has been subject to analysis before, i.e. https://gerrit.wikimedia.org/r/#/c/345332/22. The patch itself has not really been changed, I've just updated it based the current state of the master. Additionally, based on some comments @Smalyshev made, I've made some changes to several methods of EntityId (i.e. I didn't really make any changes to the Wikibase patch). Namely, I've changed EntityId::getRepositoryName and EntityId::getLocalPart so that they're not doing splitting etc of the ID serialization but just generate those parts once. The actual change (which is the only thing changed for the whole new benchmark run) is https://github.com/wmde/WikibaseDataModel/commit/81f4186ea81f890769931322f4b36a3f6a0ae68d. Other issues pointed in how some things are done in EntityId class are still there. These would require some more of fundamental change to make, so I have decided to try avoid them if possible. @Smalyshev @daniel any comments, concerns about the new attempt? If the path seems fine, and the performance of the new code acceptable, I'ld turn https://github.com/wmde/WikibaseDataModel/commit/81f4186ea81f890769931322f4b36a3f6a0ae68d into actual pull request to Data Model library, and once it's in, https://gerrit.wikimedia.org/r/#/c/345332 could possibly be merged. I guess it might be still interesting to look at what profiler says on the changed DataModel code. I believe @Ladsgroup could provide it if this would be of help.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: WMDE-leszekCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Ladsgroup added a comment. I did another round of benchmarking with @WMDE-leszek's new patch and the result is like this: Without it: root@federated-wikis:/var/www/html# time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m24.914s user 0m20.672s sys 0m0.396s root@federated-wikis:/var/www/html# time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m25.464s user 0m20.988s sys 0m0.456s root@federated-wikis:/var/www/html# time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m25.086s user 0m20.960s sys 0m0.392s With: root@federated-wikis:/var/www/html# time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m25.333s user 0m21.568s sys 0m0.384s root@federated-wikis:/var/www/html# time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m25.793s user 0m22.044s sys 0m0.424s root@federated-wikis:/var/www/html# time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m25.904s user 0m21.940s sys 0m0.412s I think this is bearable given the advantages it has.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LadsgroupCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Smalyshev added a comment. I've introduced a cache to reduce number of calls to EntityId::getLocalPart and EntityId::getRepositoryName. I think we also need to take a good look why splitSerialization is getting called so much. We do not need to do any transformations to the IDs we get from the database - we get Q123, we add prefix to it and we output wd:Q123 or whatever. That should not require calling tons of functions. Yet we do. And we obsessively validate IDs known to be valid. I think we have a case of over-abstraction there or maybe using the same API for clean and dirty input, which is sub-optimal from performance point of view.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
WMDE-leszek added a comment. Thanks @Smalyshev for looking into this! I've just submitted a new patchset of https://gerrit.wikimedia.org/r/#/c/345332 where I've introduced a cache to reduce number of calls to EntityId::getLocalPart and EntityId::getRepositoryName. As I am really not an expert with the matter, if you had any suggestion what would be a good default value for the cache size (i.e. how many IDs should be cached), that would be appreciated. I guess a thing to consider would be to have a mildly reasonable "general" default (say 1K, 10K, etc), and set it to a bigger number in settings for wikidata(s) so e.g. properties would most likely be all cached. With some sane default we could run the benchmark again and see how this helped. If you have any other suggestion, also please say, and I'll try to address them. Also, if this makes any sense, the code could possibly also changed so that properties are cached separately, so we're sure each property ID is always processed only a single time.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: WMDE-leszekCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Smalyshev added a comment. Some analysis: Master run: https://items-repo.wmflabs.org/xhprof_html/?run=598b8f9c66fab&source=dumpRdf&sort=excl_wt Patched run: https://items-repo.wmflabs.org/xhprof_html/?run=598b8d44818ce&source=dumpRdf&sort=excl_wt We can see that one of the most expensive functions there is Wikibase\DataModel\Entity\EntityId::splitSerialization. In fact, it is super-expensive even in master run, being only second to actual SQL queries, but we get almost 3x calls for it in the patch run. So, the immediate suggested action would be to see whether we can eliminate additional or even better, all calls to splitSerialization or severely reduce them. Most of the calls to splitSerialization and all additional ones come from: EntityId::getLocalPart EntityId::getRepositoryName In the master, top ones are: DispatchingEntityIdParser::parse PrefixMappingEntityIdParser::parse Together, they are 60% of the calls. EntityId::getLocalPart mainly comes from RdfVocabulary::getEntityLName, and EntityId::getRepositoryName from various snak builders. Surprisingly, even though the result of these for every given ID should be static, both call child functions. I think it should be an easy fix to convert both of those function calls to either public variables, of if this is too offensive to good design, functions returning such variables directly, without additional calls. I also note we construct way more ID objects that there presumably are different IDs (e.g. PropertyId ctor is called 34,892 times even though there are about 3K properties in Wikidata altogether) so I wonder if caching id->object may be helpful. Caching all IDs may be too much (even though well within range of any of our 128G servers :) but at least LRU of reasonable size or something? Not sure how easy it would be to add - this cache is useless for most web workloads probably but would be useful for dump. Maybe we could inject something into WikibaseLib.entitytypes.php? It won't save too much - ItemId ctor is 3.2% and PropertyId ctor is 2.8%, but even 6% gain is not too bad. Looking into splitSerialization, we see: Call to assertValidSerialization - we validate serialization format on the IDs we take from database, 4,665,466 times for 7K ids. That means we validate each ID over 600 times. Even if there were a case for validating IDs which we take from DB, which I don't see, there is certainly no case for doing it 600+ times. This should be eliminated. Validation takes 1/4 of time consumed by the function. 4 array functions - explode, array_pop, array_shift, implode. They don't take much time, together being about 5%, but I wonder if it could be simpler. Maybe not, but worth spending some thought. normalizeIdSerialization - this takes 10% of the time, and is completely unnecessary, as far as I can see, for IDs coming from DB. Maybe we should have separate code paths for "dirty" IDs - which come from user, etc. and have all validation and transformation and what not - and for "clean" IDs which come from trusted source so we don't have to spend so much cycles on validating clean data? I am not sure I fully appreciate how IDs work there yet, so we may find further ways to optimize this, let's discuss it. But the above I think are the most immediate thingsTASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Ladsgroup added a comment. It's amazing. Thank you!TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LadsgroupCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Smalyshev added a comment. GUI seems to work now, see https://items-repo.wmflabs.org/xhprof_html/index.php?run=5984fc915da57&source=dumpRdf new runs should be added to https://items-repo.wmflabs.org/xhprof_html/ as they happen. On the filesystem they reside on /var/tmp/xhprof, feel free to clean up those that aren't needed anymore. Didn't analyze the profile itself yet, probably can get to it next week.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Ladsgroup, SmalyshevCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Ladsgroup added a comment. @Smalyshev: I installed XhGui (also with mongodb) but still I get this strange error PHP Fatal error: Class 'XHProfRuns_Default' not found in /var/www/html/extensions/Wikibase/repo/maintenance/dumpRdf.php on line 139. I couldn't find any place to fix it.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LadsgroupCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
daniel added a comment. In T162371#3487544, @Ladsgroup wrote: @daniel: The paste file is 35K lines and it's bringing my laptop to its knees :( Can you edit your comment and make it link? done. sorry!TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Ladsgroup, danielCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Ladsgroup added a comment. @daniel: The paste file is 35K lines and it's bringing my laptop to its knees :( Can you edit your comment and make it link? @Smalyshev: Yeah, in ssh federated-wikis.eqiad.wmflabs file /var/www/html/extensions/Wikibase/repo/maintenance/dumpRdf.php I added the lines for profiling, I tried "XHProfRuns_Default" but didn't work out. I commented out those parts. Sorry, I just heard about XHPorfiling in this phab card.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LadsgroupCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Smalyshev added a comment. xhprof without GUI is kinda hard to comprehend... where it is installed? I could try to make the GUI work.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Ladsgroup, SmalyshevCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
daniel added a comment. To be honest, I have no idea how to make sense of the raw data, and I can't find a visualizytion tool that would take it as input. @Smalyshev, do you have an idea?TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Ladsgroup, danielCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Ladsgroup added a comment. I sent you the raw report, Couldn't make the GUI to work.TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LadsgroupCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
daniel added a comment. @Ladsgroup thank you! So, this looks like a 10-15% increase. Not too horrible, but significant. It would be good to learn what exactly is causing it. Can you run this though xhprof and see what the bottle necks as in the methods that were changed by the patch?TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Ladsgroup, danielCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Ladsgroup added a comment. Things to remember: In props-repo, properties are local and items are coming from the other repo and vice versa. Database of both of them lives in item-repos instance for simplicity reason, one of the reasons that it takes some time is that it tries to read a remote instances IMO. TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LadsgroupCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Ladsgroup added a comment. First run on props-repo with the patch: ladsgroup@federated-wikis2:/var/www/html$ time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7034 entities. real 0m50.007s user 0m30.688s sys 0m0.512s Second run: ladsgroup@federated-wikis2:/var/www/html$ time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... real 0m44.122s user 0m27.664s sys 0m0.480s Third run: ladsgroup@federated-wikis2:/var/www/html$ time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... real 0m41.044s user 0m25.968s sys 0m0.448sTASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LadsgroupCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Ladsgroup added a comment. First run on the props-repo.wmflabs.org without the patch: ladsgroup@federated-wikis2:/var/www/html$ time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7034 entities. real 0m40.751s user 0m23.796s sys 0m0.616s Second run: ladsgroup@federated-wikis2:/var/www/html$ time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... real 0m39.491s user 0m23.088s sys 0m0.564s Third run: ladsgroup@federated-wikis2:/var/www/html$ time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... real 0m42.495s user 0m25.188s sys 0m0.524sTASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LadsgroupCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Ladsgroup added a comment. First run on the items-repo with this patch: ladsgroup@federated-wikis:/var/www/html$ time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... real 0m32.192s user 0m26.328s sys 0m0.584s Second run: ladsgroup@federated-wikis:/var/www/html$ time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m33.269s user 0m27.200s sys 0m0.668s Third run: ladsgroup@federated-wikis:/var/www/html$ time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... real 0m35.599s user 0m29.300s sys 0m0.748sTASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LadsgroupCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Ladsgroup added a comment. First run on items-repo (items-repo.wmflabs.org) without this patch: ladsgroup@federated-wikis:/var/www/html$ time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... Processed 7042 entities. real 0m30.012s user 0m23.724s sys 0m0.788s Second run: ladsgroup@federated-wikis:/var/www/html$ time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 ... real 0m27.564s user 0m22.016s sys 0m0.652s Third run: ladsgroup@federated-wikis:/var/www/html$ time php extensions/Wikibase/repo/maintenance/dumpRdf.php > /dev/null Dumping shard 0/1 real 0m29.570s user 0m23.360s sys 0m0.684sTASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LadsgroupCc: hoo, Ladsgroup, PokestarFan, Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T162371: Benchmark RDF dump with foreign ID mapping
Smalyshev added a comment. Did this performance test happen?TASK DETAILhttps://phabricator.wikimedia.org/T162371EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: hoo, SmalyshevCc: Lucas_Werkmeister_WMDE, Smalyshev, daniel, WMDE-leszek, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs