dr0ptp4kt added a comment.

  Here are the data produced by IGUANA once piped through the CSV utility 
introduced in 
https://gitlab.wikimedia.org/repos/search-platform/IGUANA/-/merge_requests/3/diffs
 using a command of the following form. (For the attentive reader: I had to 
rename the originally produced files to have an `.nt` extension so that the 
underlying Jena libraries would not throw an exception.)
  
  `/usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -cp iguana-3.3.3.jar org.aksw.iguana.rp.analysis.TabularTransform -e result.003.nt > result.003.execution.csv`
  
  | run              | endpointLabel       | taskStartDate            | successfullQueries | successfullQueriesQPH | avgqps             | queryMixesPH       |
  | ---------------- | ------------------- | ------------------------ | ------------------ | --------------------- | ------------------ | ------------------ |
  | non-randomized 1 | baseline            | 2024-01-25T22:18:57.753Z | 15538              | 17512.446990539123    | 32.60590311357346  | 0.9895715087607575 |
  | non-randomized 1 | wikidata_main_graph | 2024-01-25T23:19:56.948Z | 16773              | 19125.484555828807    | 33.86191297163505  | 1.0807190233276154 |
  | non-randomized 2 | baseline            | 2024-01-26T01:47:41.634Z | 15893              | 17955.609618256018    | 32.97284513274341  | 1.0146131897076351 |
  | non-randomized 2 | wikidata_main_graph | 2024-01-26T02:48:41.047Z | 16780              | 19145.810254441058    | 34.085209300591515 | 1.0818675625496446 |
  | randomized 1     | baseline            | 2024-01-26T16:51:54.091Z | 15180              | 17068.107622599186    | 32.88786330044905  | 0.9644633340452725 |
  | randomized 1     | wikidata_main_graph | 2024-01-26T17:52:52.903Z | 15929              | 17969.809300477013    | 33.91560037068121  | 1.0154155676372838 |
  | randomized 2     | baseline            | 2024-01-26T19:37:30.811Z | 15211              | 17054.882354485933    | 33.00710905229813  | 0.9637160170924978 |
  | randomized 2     | wikidata_main_graph | 2024-01-26T20:38:29.989Z | 16084              | 18210.142239149543    | 34.14020362715409  | 1.0289960015341326 |
  
  Keep in mind that a delay between queries was introduced in the configuration 
for these "stress tests" (a "stress test" here means that the queries execute 
continuously for the specified time interval at the configured concurrency and 
delay). This was to more closely model what a somewhat busy, but not completely 
saturated, WDQS node might experience, although we should be mindful that the 
server specs differ a bit between these test servers and the WDQS hosts that 
serve end-user production requests. When interpreting a value like `avgqps`, 
remember that it is akin to what might happen if queries were executed serially 
without delay, assuming JVM performance could be held constant for such request 
patterns. That generally cannot be guaranteed, so caveats abound; in other 
words, it's entirely possible that `avgqps` could degrade in reality.
  
  The `successfulQueriesQPH` metric is probably the most interesting one. It 
suggests roughly a 5%-10% speed advantage for the smaller "main" graph over a 
fully populated "full" graph for this query mix when conditions model a 
somewhat busy WDQS node (again, the server spec differs a bit between the SUT 
and production nodes, so there is a caveat). Additional basic summary 
statistics on the data from per-query CSV exports (using the `-q` flag) against 
the `.nt` files are to come.
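
  As a quick check on the 5%-10% figure, here is a minimal Python sketch that 
recomputes the relative difference in the QPH column for each run. The values 
are copied from the table above; the `qph` dictionary and the 
`relative_gain_pct` helper are illustrative names only and are not part of 
IGUANA or the CSV utility.

```python
# Successful queries per hour, copied from the table above.
# Each entry is (baseline, wikidata_main_graph) for one run.
qph = {
    "non-randomized 1": (17512.446990539123, 19125.484555828807),
    "non-randomized 2": (17955.609618256018, 19145.810254441058),
    "randomized 1": (17068.107622599186, 17969.809300477013),
    "randomized 2": (17054.882354485933, 18210.142239149543),
}


def relative_gain_pct(baseline: float, main_graph: float) -> float:
    """Percentage by which main_graph exceeds baseline (hypothetical helper)."""
    return (main_graph - baseline) / baseline * 100.0


gains = {run: relative_gain_pct(b, m) for run, (b, m) in qph.items()}

for run, gain in gains.items():
    print(f"{run}: main graph QPH is {gain:.2f}% above baseline")
```

  Across the four runs the difference falls between about 5.3% and 9.2%, 
with the smallest gap in "randomized 1" and the largest in "non-randomized 1".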
  
  Note that in Andrea's previous analysis these sorts of statistics (as well as 
some tweaks to get somewhat finer precision via `BigDecimal` instead of 
`Double` types) were incorporated directly into the Java source of IGUANA. See 
https://github.com/dice-group/IGUANA/compare/main...AndreaWesterinen:IGUANA:main
 for changes up to June 13, 2022 against the current main branch of IGUANA. 
N.b. to future readers: you may need to re-correlate the code changes when 
IGUANA upstream changes. I opted to make fewer changes to our fork (i.e., I 
didn't merge Andrea's fork into ours, even if there is some dependency 
similarity in the POMs), as these data can be determined with Spark summary 
stat calls. We may be interested in taking some of the enhancement 
opportunities forward to IGUANA upstream should we see the need for more IGUANA 
work later, but then again we may not, as our needs are narrower.

TASK DETAIL
  https://phabricator.wikimedia.org/T355037

To: dr0ptp4kt
Cc: dr0ptp4kt, dcausse, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331