dr0ptp4kt added a comment.
Here's the output from the latest run based upon a larger set of queries from
a random sample of WDQS queries.
$ /usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -cp iguana-3.3.3.jar org.aksw.iguana.rp.analysis.TabularTransform -e result.nt > result.execution.csv
$ cut -f1,3,5,6,7,9 -d"," result.execution.csv | sed 's/,/|/g'
| endpointLabel | taskStartDate | successfullQueries | successfullQueriesQPH | avgqps | queryMixesPH |
| ------------------- | ------------------------ | ------------------ | --------------------- | ------------------ | ------------------ |
| baseline | 2024-01-31T23:20:44.567Z | 319857 | 136612.71246575614 | 18.83670491311007 | 1.732300885924224 |
| wikidata_main_graph | 2024-02-01T04:23:01.613Z | 331473 | 147674.12233239523 | 19.55930142298825 | 1.8725637484770261 |
Here's the screen capture from Grafana.
F41740308: Screenshot 2024-02-01 at 10.17.28 AM.png
<https://phabricator.wikimedia.org/F41740308>
The `wikidata_main_graph` window completed more queries despite an apparent bout of increased failing queries (the climb began at about 0915 UTC), followed by a large garbage collection beginning about 5 minutes later (the GC started at about 0920 UTC and actually continued well past the `wikidata_main_graph` window's closure at 2024-02-01T09:23:55.639Z). This isn't especially significant, as it constitutes only about 1.5%-3.0% of the `wikidata_main_graph` window depending on how one measures it, and I wouldn't necessarily read anything into whether such GCs would recur under the same conditions, but I wanted to note it nonetheless.
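For a rough sense of how a 1.5%-3.0% range falls out, here's my own back-of-the-envelope arithmetic (not from the benchmark output itself), taking the 5-hour window, the ~0915 UTC start of the failing-query climb, the ~0920 UTC GC start, and the 09:23:55 window closure:

```python
# Rough reconstruction of the "about 1.5%-3.0% of the window" figure.
# Window is 5 hours (300 minutes), closing at 09:23:55 UTC; failing-query
# climb began ~09:15, GC began ~09:20 (and continued past closure).
window_min = 5 * 60
failing_span_min = 8 + 55 / 60   # 09:15:00 -> 09:23:55
gc_span_min = 3 + 55 / 60        # 09:20:00 -> 09:23:55

print(round(100 * failing_span_min / window_min, 1))  # ~3.0 (%)
print(round(100 * gc_span_min / window_min, 1))       # ~1.3 (%)
```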
To repeat the verbiage from the earlier runs...
> Following below are "per-query" summary stats. I actually just put this together by bringing CSV data into Google Sheets for now - all of the columns are calculated upon the "per-query" rows (but you'll see how the Mean corresponds basically with the value calculated up above). The underlying CSV data don't bear actual queries (the .nt files from which they're generated do), ...
The CSV data were generated with the following command:
`/usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -cp iguana-3.3.3.jar org.aksw.iguana.rp.analysis.TabularTransform -q result.nt > result.query.csv`
| Run | Endpoint Label | Mean | Median | Standard Deviation | Max (fastest) | 99% (very fast) | 0.95 | 0.75 | 0.5 | 0.25 | 1% (pretty slow) | Total w/ success |
| ------------ | ------------------- | ---- | ------ | ------------------ | ------------- | --------------- | ---- | ---- | ---- | ---- | ---------------- | ---------------- |
| randomized 1 | baseline | 18.8367049131101 | 14.6999663404689 | 16.3589173757083 | 127.433177227691 | 59.009472115968 | 50.5734395961334 | 30.3470335487675 | 14.6999663404689 | 4.97164300568995 | 0 | 319857 |
| randomized 1 | wikidata_main_graph | 19.5593014229883 | 16.0982853987134 | 16.5098295290687 | 121.141149629509 | 58.9613256488317 | 51.0426872548935 | 31.751311031492 | 16.0982853987134 | 5.37249826361878 | 0 | 331473 |
Although the max and 99th-percentile queries were ever so slightly faster on the baseline "full" graph, things were generally faster on the non-scholarly "main" graph. The performance difference is clear but not dramatic.
Here's the content of `wdqs-split-test-randomized-2024-01-31.yml`, with comments removed for brevity. The main differences in this configuration file from the earlier presented one are the five hours allowed per graph, to accommodate a larger query mix, and the updated filename pointing to the larger query mix drawn from the random sample of queries.
datasets:
  - name: "split"
connections:
  - name: "baseline"
    endpoint: "https://wdqs1022.eqiad.wmnet/sparql"
  - name: "wikidata_main_graph"
    endpoint: "https://wdqs1024.eqiad.wmnet/sparql"
tasks:
  - className: "org.aksw.iguana.cc.tasks.impl.Stresstest"
    configuration:
      timeLimit: 18000000
      warmup:
        timeLimit: 30000
        workers:
          - threads: 4
            className: "SPARQLWorker"
            queriesFile: "queries_for_performance_file_renamed_randomized_2024_01_31.txt"
            timeOut: 5000
      queryHandler:
        className: "DelimInstancesQueryHandler"
        configuration:
          delim: "### BENCH DELIMITER ###"
      workers:
        - threads: 4
          className: "SPARQLWorker"
          queriesFile: "queries_for_performance_file_renamed_randomized_2024_01_31.txt"
          timeOut: 60000
          parameterName: "query"
          gaussianLatency: 100
metrics:
  - className: "QMPH"
  - className: "QPS"
  - className: "NoQPH"
  - className: "AvgQPS"
  - className: "NoQ"
storages:
  - className: "NTFileStorage"
    configuration:
      fileName: result.nt
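A quick unit check on the time settings above, assuming (as the "five hours per graph" remark implies) that Iguana's `timeLimit` and `timeOut` values are milliseconds:

```python
# Convert the Iguana config's millisecond values into human units.
time_limit_ms = 18_000_000   # per-task timeLimit
warmup_ms = 30_000           # warmup timeLimit
query_timeout_ms = 60_000    # measured workers' per-query timeOut

print(time_limit_ms / 3_600_000, "hours per endpoint")   # 5.0
print(warmup_ms / 1_000, "seconds of warmup")            # 30.0
print(query_timeout_ms / 1_000, "second query timeout")  # 60.0
```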
TASK DETAIL
https://phabricator.wikimedia.org/T355037