dr0ptp4kt added a comment.
Now, the screenshot from the randomized-order queries. I'll run it one more time to check that comparable output is achieved. Those results were produced with the following. This latest output file has been moved to `result.nt.003`.

  scala> val joined6 = wikidata.as("w").join(full.as("f")).where("w.id = f.id and w.success = true and w.success = f.success and w.resultSize = f.resultSize and w.reorderedHash = f.reorderedHash").select(concat(col("w.query"), lit("\n### BENCH DELIMITER ###"))).distinct.sample(withReplacement=false, fraction=1.0, seed=42)
  scala> joined6.count // matches same as joined5.count
  scala> joined6.repartition(1).write.option("compression", "none").text("queries_for_performance_randomized_2024_01_26.txt")
  scala> :quit
  $ hdfs dfs -copyToLocal hdfs://analytics-hadoop/user/dr0ptp4kt/queries_for_performance_randomized_2024_01_26.txt/part-00000-131df78f-da7a-4ffc-aad4-9874342165ca-c000.txt ./queries_for_performance_randomized.txt
  $ sha1sum queries_for_performance.txt queries_for_performance_randomized.txt
  $ # they're different
  $ diff queries_for_performance.txt queries_for_performance_randomized.txt | wc -l
  $ # they're very different
  $ cp wdqs-split-test.yml wdqs-split-test-randomized.yml
  $ # changed pointers to query file to be queries_for_performance_randomized.txt
  $ bash start-iguana.sh wdqs-split-test-randomized.yml
  $ mv result.nt result.nt.003

TASK DETAIL
  https://phabricator.wikimedia.org/T355037
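The `sha1sum` and `diff` checks in the comment show that the two query files differ, but not that they hold the same queries in a different order. A minimal sketch of that extra check follows. It assumes each query ends with a line consisting solely of `### BENCH DELIMITER ###` and that `sha1sum` is available. The function names `flatten_queries` and `same_query_set` are hypothetical helpers, not part of the original workflow.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original comment).

# Print one line per query: the lines of each multi-line query are joined
# with the ASCII record-separator character (octal 036) so that whole
# queries can be sorted and compared as single lines.
# Assumption: a line equal to the delimiter terminates each query.
flatten_queries() {
  awk -v delim='### BENCH DELIMITER ###' '
    $0 == delim { print rec; rec = ""; next }
    { rec = rec $0 "\036" }
    END { if (rec != "") print rec }
  ' "$1"
}

# Succeed (exit status 0) when both files contain the same multiset of
# queries, regardless of the order in which the queries appear.
same_query_set() {
  a=$(flatten_queries "$1" | LC_ALL=C sort | sha1sum)
  b=$(flatten_queries "$2" | LC_ALL=C sort | sha1sum)
  [ "$a" = "$b" ]
}
```

With these functions sourced, `same_query_set queries_for_performance.txt queries_for_performance_randomized.txt && echo "same queries, different order"` would report whether only the ordering changed between the two runs.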