mpopov added a comment.

  > are most people at WMF writing spark pythonically and not with queries?
  
  I guess it depends on who you talk to and what they're doing. All of the data
  scientists/analysts I work with use the Spark SQL engine and write HiveQL
  queries, often because `hive.run` is too slow. Occasionally I see dot notation
  for advanced PySpark usage (e.g. Morten's survey aggregation data pipeline
  <https://github.com/nettrom/Growth-welcomesurvey-2018/blob/master/T275172_survey_aggregation.ipynb>).
  
  I suspect dot-notation Spark usage is more common among software engineers.

TASK DETAIL
  https://phabricator.wikimedia.org/T342111

To: AndrewTavis_WMDE, mpopov
Cc: mpopov, JAllemandou, Lydia_Pintscher, dcausse, Gehel, dr0ptp4kt, 
AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org