dr0ptp4kt added a comment.

  Now, the screenshot from the randomized-order queries. I'll run this one more 
time to confirm that the output is comparable. The results were produced with 
the following commands; the latest output file has been moved to 
`result.nt.003`.
  
    scala> val joined6 = wikidata.as("w").join(full.as("f")).where("w.id = f.id and w.success = true and w.success = f.success and w.resultSize = f.resultSize and w.reorderedHash = f.reorderedHash").select(concat(col("w.query"), lit("\n### BENCH DELIMITER ###"))).distinct.sample(withReplacement=false, fraction=1.0, seed=42)
    scala> joined6.count // same value as joined5.count
    scala> joined6.repartition(1).write.option("compression", "none").text("queries_for_performance_randomized_2024_01_26.txt")
    scala> :quit
    $ hdfs dfs -copyToLocal hdfs://analytics-hadoop/user/dr0ptp4kt/queries_for_performance_randomized_2024_01_26.txt/part-00000-131df78f-da7a-4ffc-aad4-9874342165ca-c000.txt ./queries_for_performance_randomized.txt
    $ sha1sum queries_for_performance.txt queries_for_performance_randomized.txt
    $ # they're different
    $ diff queries_for_performance.txt queries_for_performance_randomized.txt | wc -l
    $ # they're very different
    $ cp wdqs-split-test.yml wdqs-split-test-randomized.yml
    $ # changed pointers to query file to be queries_for_performance_randomized.txt
    $ bash start-iguana.sh wdqs-split-test-randomized.yml
    $ mv result.nt result.nt.003
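
  The `sha1sum` and `diff` checks above show that the two query files differ, 
but not that they hold the same content. A minimal sketch of a complementary 
check follows, assuming a GNU userland (`cmp`, `sort`, `sha1sum`); 
`same_lines_different_order` is a hypothetical helper name and was not part of 
the original run. Because each query spans several lines, this compares the 
files line by line after sorting, which is a necessary but not sufficient 
condition for both files holding the same set of queries.

```shell
#!/usr/bin/env bash
# Sketch only: same_lines_different_order is a hypothetical helper,
# not something used in the original benchmark run.
# Prints one verdict: "identical", "same lines, different order",
# or "different content".
same_lines_different_order() {
  local a="$1" b="$2"

  # Byte-for-byte identical files mean the order was not changed at all.
  if cmp -s "$a" "$b"; then
    echo "identical"
    return 1
  fi

  # Sort both files with a fixed locale and compare checksums of the result;
  # equal checksums mean both files hold the same multiset of lines.
  local sum_a sum_b
  sum_a=$(LC_ALL=C sort "$a" | sha1sum | cut -d' ' -f1)
  sum_b=$(LC_ALL=C sort "$b" | sha1sum | cut -d' ' -f1)

  if [ "$sum_a" = "$sum_b" ]; then
    echo "same lines, different order"
    return 0
  fi

  echo "different content"
  return 2
}
```

  For example, `same_lines_different_order queries_for_performance.txt 
queries_for_performance_randomized.txt` would print one of the three verdicts.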

TASK DETAIL
  https://phabricator.wikimedia.org/T355037

To: dr0ptp4kt
Cc: dr0ptp4kt, dcausse, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331