EBernhardson added a comment.

  In T342416#9101474 <https://phabricator.wikimedia.org/T342416#9101474>, 
@JAllemandou wrote:
  
  > In T342416#9091146 <https://phabricator.wikimedia.org/T342416#9091146>, 
@EBernhardson wrote:
  >
  >> Similarly, we have other jobs that still run today and emit world-readable 
dumps without explicitly setting the umask. What is causing the difference?
  >>
  >>   drwxrwxr-x   /wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230716
  >>   drwxrwxr-x   /wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230723
  >>   drwxrwxr-x   /wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230730
  >>   drwxrwxr-x   /wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230806
  >
  > My guess is that those are still generated by a Hive job. Hive and Spark 
handle permissions differently when writing files: Spark applies the 
configured umask, while Hive reproduces the parent directory's permission 
pattern. I'd be interested to know whether my guess is correct :)
  
  These are both generated by Spark. The RDF is imported by a Scala 
application while the cirrus dump is imported by PySpark, but both should be 
using the same underlying implementation. Both applications use 
`df.write.insertInto(table_name)` to have Spark write the actual output, so 
I'm a bit surprised they end up with different permissions.
  
  I suppose it's not super important why the cirrus dump is world readable; 
being readable is fine. It just hints to me that there is something I don't 
understand about how HDFS, Spark, and permissions interact here.
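  
  For reference, a minimal sketch (plain Python, not taken from either job) of 
how a umask maps to the mode of a newly created directory. It assumes the 
directory is requested with mode 777 and the umask is then masked out, which 
is my understanding of how HDFS applies `fs.permissions.umask-mode`. The 
helper name `dir_mode_for_umask` is made up for illustration.

```python
import stat


def dir_mode_for_umask(umask: int) -> str:
    """Return the ls-style mode string for a directory created under `umask`.

    Assumption: the directory is requested with mode 0o777 and the umask
    bits are then cleared from it.
    """
    mode = 0o777 & ~umask
    # S_IFDIR marks the entry as a directory so the string starts with "d".
    return stat.filemode(stat.S_IFDIR | mode)


if __name__ == "__main__":
    # Compare a few common umask values side by side.
    for umask in (0o002, 0o022, 0o027):
        print(f"umask {umask:03o} -> {dir_mode_for_umask(umask)}")
```

  Under that assumption, `drwxrwxr-x` is what a umask of 002 would produce, 
while 022 would drop group write and 027 would also drop world read. This 
does not tell us whether the cirrus directories got their mode from a umask 
or from somewhere else; it only shows which umask would be consistent with 
the listing.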

TASK DETAIL
  https://phabricator.wikimedia.org/T342416

To: EBernhardson
Cc: dcausse, BTullis, AndrewTavis_WMDE, Aklapper, JAllemandou, 
Danny_Benjafield_WMDE, Mohamed-Awnallah, Astuthiodit_1, AWesterinen, lbowmaker, 
karapayneWMDE, Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331