dr0ptp4kt added a comment.
For the first pass, an hour-long test is being run from `stat1006` with the
following configuration, stored in the config file `wdqs-split-test.yml`:
  datasets:
    - name: "split"
  connections:
    - name: "baseline"
      endpoint: "https://wdqs1022.eqiad.wmnet/sparql"
    - name: "wikidata_main_graph"
      endpoint: "https://wdqs1024.eqiad.wmnet/sparql"
  tasks:
    - className: "org.aksw.iguana.cc.tasks.impl.Stresstest"
      configuration:
        timeLimit: 3600000
        warmup:
          timeLimit: 30000
          workers:
            - threads: 4
              className: "SPARQLWorker"
              queriesFile: "queries_for_performance.txt"
              timeOut: 5000
        queryHandler:
          className: "DelimInstancesQueryHandler"
          configuration:
            delim: "### BENCH DELIMITER ###"
        workers:
          - threads: 4
            className: "SPARQLWorker"
            queriesFile: "queries_for_performance.txt"
            timeOut: 60000
            parameterName: "query"
            gaussianLatency: 100
  metrics:
    - className: "QMPH"
    - className: "QPS"
    - className: "NoQPH"
    - className: "AvgQPS"
    - className: "NoQ"
  storages:
    - className: "NTFileStorage"
      configuration:
        fileName: result.nt
`queries_for_performance.txt` was generated with the following basic code,
which selects queries that succeeded against both the full graph and the main
(non-scholarly) graph and returned matching results (same result size and same
reordered hash). This reduces garbage input and gives somewhat better control
over the parameters of the test.
  scala> val wikidata = spark.read.parquet("hdfs:///user/dcausse/T352538_wdqs_graph_split_eval/wikidata_classified.parquet")
  scala> val full = spark.read.parquet("hdfs:///user/dcausse/T352538_wdqs_graph_split_eval/full_classified.parquet")
  scala> val joined5 = wikidata.as("w").join(full.as("f")).where("w.id = f.id and w.success = true and w.success = f.success and w.resultSize = f.resultSize and w.reorderedHash = f.reorderedHash").select(concat(col("w.query"), lit("\n### BENCH DELIMITER ###"))).distinct
  scala> joined5.repartition(1).write.option("compression", "none").text("queries_for_performance_2024_01_25.txt")
  scala> :quit
  $ hdfs dfs -copyToLocal hdfs://analytics-hadoop/user/dr0ptp4kt/queries_for_performance_2024_01_25.txt/part-00000-6b8caed3-3a4d-4cb2-bf74-6bbcd7af0478-c000.txt ./queries_for_performance.txt
  $ /usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -jar iguana-3.3.3.jar wdqs-split-test.yml
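Before launching a run, it may be worth a quick sanity check that the copied query file actually contains the delimiter the config expects. The following is only a sketch: `count_queries` is a hypothetical helper (not part of IGUANA or the steps above), and it assumes each query is followed by a line consisting solely of the delimiter, which is what the `concat(..., lit("\n### BENCH DELIMITER ###"))` export should produce.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of IGUANA): count the queries
# in a delimiter-separated query file by counting delimiter-only lines.
count_queries() {
  file="$1"
  # Default delimiter matches the `delim` value in wdqs-split-test.yml.
  delim="${2:-### BENCH DELIMITER ###}"
  # -F: treat the delimiter as a fixed string, -x: match whole lines only,
  # -c: print the number of matching lines instead of the lines themselves.
  grep -F -x -c -- "$delim" "$file"
}
```

For example, `count_queries ./queries_for_performance.txt` prints the number of delimiter lines, which should equal the number of distinct queries written by the Spark job. A count of 0 would suggest the delimiter in the file does not match the one in the config.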
The IGUANA build is based on
https://gitlab.wikimedia.org/repos/search-platform/IGUANA/-/merge_requests/4 .
TASK DETAIL
https://phabricator.wikimedia.org/T355037