AndrewTavis_WMDE added a comment.

  That's what we were thinking too, @dcausse :) I'm realizing, though, that
where I had the `.distinct()` was incorrect. I think it should go at the end of
the full definition of the PySpark df, like this:
  
    # Got rid of the sa_and_... because it was getting too verbose
    df_sasc_ids = (
        df_wikidata_rdf.select(col("subject").alias("distinct_sasc_qids"))
        .where(col("predicate") == P31_DIRECT_URL)
        .where(col("object").isin(sasc_qids))
        .alias("df_sasc_ids")
    ).distinct()
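
  For intuition about why the `.distinct()` is needed at all, here is a rough
plain-Python sketch (not the actual Spark job). The sample triples, the
stand-in predicate values and the helper `select_sasc_subjects` are all made up
for illustration. The assumption is that an item can have more than one P31
value in `sasc_qids`, so its subject would come out of the two filters more
than once:

```python
# Hypothetical stand-ins for the real values used in the Spark job.
P31_DIRECT_URL = "P31"
sasc_qids = {"class_x", "class_y"}

# (subject, predicate, object) triples; made-up sample data.
triples = [
    ("item_a", "P31", "class_x"),
    ("item_a", "P31", "class_y"),   # same subject, second matching class
    ("item_b", "P31", "class_x"),
    ("item_b", "P279", "class_y"),  # wrong predicate, filtered out
    ("item_c", "P31", "class_z"),   # class not in sasc_qids, filtered out
]


def select_sasc_subjects(rows, p31, qids):
    """Mimic select(subject) + the two where() filters, without de-duplication."""
    return [s for (s, p, o) in rows if p == p31 and o in qids]


# Subjects after filtering only: item_a appears twice.
matches = select_sasc_subjects(triples, P31_DIRECT_URL, sasc_qids)

# De-duplicating at the very end, as with the trailing .distinct().
distinct_matches = sorted(set(matches))

print(matches)
print(distinct_matches)
```

  Here `item_a` matches twice after the filters and is collapsed to a single
entry only by the final de-duplication step.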
  
  Thanks for checking in!

TASK DETAIL
  https://phabricator.wikimedia.org/T342123

To: AndrewTavis_WMDE
Cc: dcausse, Lydia_Pintscher, dr0ptp4kt, Aklapper, Manuel, 
Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331