Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/204#issuecomment-39242381
  
    Thanks @pwendell.  I'm going to try to look at this in more detail in the 
next day or so.
    
    The MapReduce history server would be one thing to compare to.  It has one 
directory (done_intermediate) with the sticky bit set, where users write their 
history files with permissions specified by the user (generally restrictive).  
The history server runs as a superuser and copies the history files from 
done_intermediate to a done directory that is more restrictive, so the world 
can't read or write to it.  The history server serves up the files and 
restricts access based on ACLs.
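    
    As a rough illustration of that layout, here is a minimal Python sketch on a local filesystem. It is not how MapReduce or Spark actually create these directories; the helper names (make_history_dirs, mode_of) and the 0770 mode for the done directory are assumptions chosen only to show "sticky and world-writable" versus "closed to the world".

```python
import os
import stat
import tempfile


def make_history_dirs(root):
    """Create a sticky, world-writable intermediate dir and a restricted done dir."""
    intermediate = os.path.join(root, "done_intermediate")
    done = os.path.join(root, "done")
    os.makedirs(intermediate, exist_ok=True)
    os.makedirs(done, exist_ok=True)
    # chmod explicitly: a mode passed to makedirs would be filtered by the umask.
    # 1777 = rwxrwxrwt: anyone can create files, but the sticky bit stops
    # users from deleting or renaming files they do not own.
    os.chmod(intermediate, 0o1777)
    # 0770 is an illustrative choice: owner and group only, nothing for the world.
    os.chmod(done, 0o770)
    return intermediate, done


def mode_of(path):
    """Return the permission bits of path, including the sticky bit."""
    return stat.S_IMODE(os.stat(path).st_mode)


# Demo in a throwaway directory.
root = tempfile.mkdtemp()
intermediate, done = make_history_dirs(root)
```

    On HDFS the same idea would be expressed with the filesystem's own permission commands rather than os.chmod; the point is only that the two directories carry different modes.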
    
    The important thing is that we make it so it can be secured and document 
how users do that.  If it's a matter of manually creating some directories and 
setting permissions, I think that is fine for now.  If Spark is creating 
directories, we need to make sure it does the right thing or has configs so 
that admins can have it set the permissions appropriately.
    
    Is there any infrastructure in place to manage/delete the log files?  If 
you are running thousands of applications a day the logs can add up pretty 
quickly.
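    
    For a sense of what the simplest form of such cleanup could look like, here is a hypothetical age-based purge in Python. Nothing here is an existing Spark feature; purge_old_logs, the one-week retention, and the app-old/app-new names are all made up for illustration, and it assumes logs live as top-level files or directories on a local filesystem.

```python
import os
import shutil
import tempfile
import time


def purge_old_logs(log_dir, max_age_seconds, now=None):
    """Delete top-level entries of log_dir last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    # Snapshot the listing first so we do not delete while iterating.
    with os.scandir(log_dir) as it:
        entries = list(it)
    removed = []
    for entry in entries:
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age <= max_age_seconds:
            continue
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            os.remove(entry.path)
        removed.append(entry.name)
    return sorted(removed)


# Demo: one log backdated by two weeks, one fresh log, one-week retention.
log_dir = tempfile.mkdtemp()
for name in ("app-old", "app-new"):
    open(os.path.join(log_dir, name), "w").close()
week = 7 * 24 * 3600
two_weeks_ago = time.time() - 2 * week
os.utime(os.path.join(log_dir, "app-old"), (two_weeks_ago, two_weeks_ago))

removed = purge_old_logs(log_dir, max_age_seconds=week)
remaining = sorted(os.listdir(log_dir))
```

    A real solution would also need to avoid deleting logs of applications that are still running, which modification time alone does not guarantee.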
    
    Can we add docs about the history server?  
    
    This is probably a separate JIRA, but it would be nice to clarify the 
documentation of the config spark.eventLog.dir to indicate whether it can point 
to HDFS or other filesystems.


