dr0ptp4kt added a comment.
Here are the data produced by IGUANA once piped through the CSV utility introduced in https://gitlab.wikimedia.org/repos/search-platform/IGUANA/-/merge_requests/3/diffs, using a command of the following form. (For the attentive reader: I had to rename the originally produced files to have an `.nt` extension so that the underlying Jena libraries would not throw an exception.)

`/usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -cp iguana-3.3.3.jar org.aksw.iguana.rp.analysis.TabularTransform -e result.003.nt > result.003.execution.csv`

| run | endpointLabel | taskStartDate | successfullQueries | successfullQueriesQPH | avgqps | queryMixesPH |
| ---------------- | ------------------- | ------------------------ | ------------------ | --------------------- | ------ | ------------ |
| non-randomized 1 | baseline | 2024-01-25T22:18:57.753Z | 15538 | 17512.446990539123 | 32.60590311357346 | 0.9895715087607575 |
| non-randomized 1 | wikidata_main_graph | 2024-01-25T23:19:56.948Z | 16773 | 19125.484555828807 | 33.86191297163505 | 1.0807190233276154 |
| non-randomized 2 | baseline | 2024-01-26T01:47:41.634Z | 15893 | 17955.609618256018 | 32.97284513274341 | 1.0146131897076351 |
| non-randomized 2 | wikidata_main_graph | 2024-01-26T02:48:41.047Z | 16780 | 19145.810254441058 | 34.085209300591515 | 1.0818675625496446 |
| randomized 1 | baseline | 2024-01-26T16:51:54.091Z | 15180 | 17068.107622599186 | 32.88786330044905 | 0.9644633340452725 |
| randomized 1 | wikidata_main_graph | 2024-01-26T17:52:52.903Z | 15929 | 17969.809300477013 | 33.91560037068121 | 1.0154155676372838 |
| randomized 2 | baseline | 2024-01-26T19:37:30.811Z | 15211 | 17054.882354485933 | 33.00710905229813 | 0.9637160170924978 |
| randomized 2 | wikidata_main_graph | 2024-01-26T20:38:29.989Z | 16084 | 18210.142239149543 | 34.14020362715409 | 1.0289960015341326 |

Keep in mind that a delay between queries was introduced in the configuration for these "stress tests" (a "stress test" here means that the execution of the
queries goes continuously for the specified time interval at its concurrency and delay spec). This was to more closely model what a somewhat busy, but not completely saturated, WDQS node might experience, although we should be mindful that the server specs are a bit different between these test servers and the WDQS hosts used for serving end user WDQS production requests.

When interpreting a value like `avgqps`, remember that this is akin to what might happen if queries were executed serially without delay, if it were possible to hold JVM performance constant for such request patterns. Do note that this is generally not possible to guarantee, so caveats abound; in other words, it's entirely possible that `avgqps` could degrade in reality.

The `successfullQueriesQPH` metric is probably the most interesting one. It's suggestive of about a 5%-10% speed advantage for the smaller "main" graph versus a fully populated "full" graph for this query mix when conditions model a somewhat busy WDQS node (again, remember that server spec is a bit different between the SUT and production nodes, so there is a caveat).

Additional basic summary statistics on the data from per-query CSV exports (using the `-q` flag) against the `.nt` files are to come. Note that in Andrea's previous analysis these sorts of statistics (as well as some tweaks to get somewhat finer precision via `BigDecimal` instead of `Double` types) were incorporated directly into the Java source of IGUANA - see https://github.com/dice-group/IGUANA/compare/main...AndreaWesterinen:IGUANA:main for changes up to June 13, 2022 against the current main branch of IGUANA. N.b. to future readers: you may need to re-correlate the code changes when IGUANA upstream changes. I opted to make fewer changes to our fork (i.e., I didn't merge Andrea's fork into our fork, even if there is some dependency similarity in the POMs), as this data can be determined with Spark summary stat calls.
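For anyone wanting to re-derive the 5%-10% figure, here is a minimal Python sketch. It only restates the `successfullQueriesQPH` values from the table above; the names `QPH`, `relative_gain`, and `gains` are made up for illustration and are not part of IGUANA or the CSV utility.

```python
# successfullQueriesQPH values copied from the table above,
# stored as (baseline, wikidata_main_graph) per run.
QPH = {
    "non-randomized 1": (17512.446990539123, 19125.484555828807),
    "non-randomized 2": (17955.609618256018, 19145.810254441058),
    "randomized 1": (17068.107622599186, 17969.809300477013),
    "randomized 2": (17054.882354485933, 18210.142239149543),
}


def relative_gain(baseline: float, main_graph: float) -> float:
    """Percentage increase of main_graph QPH over baseline QPH."""
    return (main_graph - baseline) / baseline * 100.0


# Hypothetical helper structure: run label -> percentage gain.
gains = {run: relative_gain(b, m) for run, (b, m) in QPH.items()}

for run, gain in gains.items():
    print(f"{run}: main graph QPH is {gain:.1f}% higher than baseline")
```

Each run's gain lands between 5% and 10%, with the non-randomized runs showing the larger spread. This is plain arithmetic on four pairs of runs, so it says nothing about variance or significance.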
We may be interested in taking some of the enhancement opportunities forward to IGUANA upstream should we see the need for more IGUANA work later, but then again we may not, as our needs are narrower.

TASK DETAIL
https://phabricator.wikimedia.org/T355037