JAllemandou added a comment.

  In T342416#9091146 <https://phabricator.wikimedia.org/T342416#9091146>, 
@EBernhardson wrote:
  
  > I looked into these, the attached patch should fix it but it leaves an open 
question (@JAllemandou):
  >
  > The `core-site.xml`, along with puppet which writes it out, has the default 
umask of 027 since at least 2021, which prevents world readability. So why do 
we have the following permissions for historical dumps:
  >
  >   drwxr-xr-x   /wmf/data/discovery/wikidata/rdf/date=20230710
  >   drwxr-xr-x   /wmf/data/discovery/wikidata/rdf/date=20230716
  >   drwxr-xr-x   /wmf/data/discovery/wikidata/rdf/date=20230717
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230723
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230724
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230730
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230731
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230806
  
  The world-readable changes were made manually by me to unblock 
@AndrewTavis_WMDE. I logged the change in the analytics IRC channel but didn't 
ping the search IRC channel. I should have, apologies for that :)
  
  > Similarly we have other jobs that still run today and emit world readable 
dumps without explicitly setting the umask, what is causing the difference?
  >
  >   drwxrwxr-x   /wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230716
  >   drwxrwxr-x   /wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230723
  >   drwxrwxr-x   /wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230730
  >   drwxrwxr-x   /wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230806
  
  My guess is that those are still generated by a Hive job. Hive and Spark 
behave differently with regard to permissions when generating files: Spark 
applies the configured umask, while Hive reproduces the parent directory's 
permission pattern. I'd be interested to know whether my guess is correct :)
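
  For reference, here is a small sketch of the umask arithmetic behind the 
listings above. It only illustrates how a umask maps to a directory mode 
string; it does not touch HDFS, and the helper name `dir_mode_string` is made 
up for this example. It assumes directories are requested with the usual 0777 
base mode before the umask is applied.

```python
import stat


def dir_mode_string(umask: int, requested: int = 0o777) -> str:
    """Return the ls-style mode string for a directory created under `umask`.

    Hypothetical helper for illustration: the resulting permission bits are
    the requested bits with every bit set in the umask cleared.
    """
    mode = requested & ~umask
    # S_IFDIR marks the entry as a directory so filemode() emits the leading 'd'.
    return stat.filemode(stat.S_IFDIR | mode)


if __name__ == "__main__":
    # 027 is the umask quoted from core-site.xml; 022 and 002 are shown only
    # because they would produce the other two patterns seen in the listings.
    for umask in (0o027, 0o022, 0o002):
        print(f"umask {umask:03o} -> {dir_mode_string(umask)}")
```

  With the 027 umask this gives `drwxr-x---`, which matches the dumps from 
20230723 onwards. The `drwxrwxr-x` pattern on the cirrus index snapshots 
cannot come out of a 027 umask, which is consistent with the idea that 
something other than the configured umask determines those permissions.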

TASK DETAIL
  https://phabricator.wikimedia.org/T342416

To: EBernhardson, JAllemandou
Cc: dcausse, BTullis, AndrewTavis_WMDE, Aklapper, JAllemandou, 
Danny_Benjafield_WMDE, Mohamed-Awnallah, Astuthiodit_1, AWesterinen, lbowmaker, 
karapayneWMDE, Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331