[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855707#comment-13855707 ]
Robert Joseph Evans commented on YARN-321:
------------------------------------------

The way it currently works is based on group permissions on a directory (this is from memory from a while ago, so I could be off on a few things). In HDFS, when you create a file, the group of the file is the group of the directory the file is in, similar to the setgid bit on a directory in Linux. When an MR job completes, it copies its history log file, along with a few other files, to a drop-box-like location called intermediate done, and atomically renames it from a temp name to the final name. The directory is world writable, but it is readable only by a special group that the history server is a part of and general users are not.

The history server then wakes up periodically and scans that directory for new files. When it sees new files, it moves them to a final location that is owned by the headless history server user. If a query comes in for a job that the history server is not aware of, it also scans the intermediate done directory before failing.

Reading history data is done through RPC to the history server, or through the web interface, including the RESTful APIs. There is no supported way for an app to read history data directly through the file system.

I hope this helps.

> Generic application history service
> -----------------------------------
>
>                 Key: YARN-321
>                 URL: https://issues.apache.org/jira/browse/YARN-321
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Luke Lu
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java
>
>
> The MapReduce job history server currently needs to be deployed as a trusted server in sync with the MapReduce runtime. Every new application would need a similar application history server.
> Having to deploy O(T*V) trusted servers (where T is the number of application types and V is the number of application versions) is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as JSON (or binary Avro). I propose that we create only one trusted application history server, which can also have a generic UI (displaying JSON as a tree of strings). Specific applications/versions can deploy untrusted webapps (a la AMs) to query the application history server and interpret the JSON for their specific UI and/or analytics.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
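The intermediate-done hand-off described in the comment can be sketched roughly as follows. This is a minimal illustration, not the MapReduce implementation: it assumes a local POSIX file system stands in for HDFS, and the names `publish`, `HistoryServer`, and the `.tmp`/`.jhist` suffixes are hypothetical. It does not model the group permissions or the headless server user; it only shows why the temp-name-then-atomic-rename step lets the server scan safely, and how the scan-on-miss lookup behaves.

```python
import os
import tempfile

# Hypothetical suffixes: files still being written vs. finished history files.
TMP_SUFFIX = ".tmp"
FINAL_SUFFIX = ".jhist"


def publish(intermediate_done, job_id, history_json):
    """Job side: write under a temp name, then atomically rename to the final name."""
    tmp = os.path.join(intermediate_done, job_id + TMP_SUFFIX)
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(history_json)
    final = os.path.join(intermediate_done, job_id + FINAL_SUFFIX)
    # os.replace is atomic on POSIX when both paths are on the same file system,
    # so a scanner sees either no final file or the complete one, never a partial one.
    os.replace(tmp, final)
    return final


class HistoryServer:
    """Hypothetical server side: periodic scan plus scan-on-miss lookup."""

    def __init__(self, intermediate_done, done):
        self.intermediate_done = intermediate_done
        self.done = done
        self.known = {}  # job_id -> path in the server-owned done directory

    def scan(self):
        """One pass of the periodic scan: move finished files into the done directory."""
        # Snapshot the listing first; files with TMP_SUFFIX are skipped as still in progress.
        finished = [n for n in os.listdir(self.intermediate_done) if n.endswith(FINAL_SUFFIX)]
        for name in finished:
            target = os.path.join(self.done, name)
            os.replace(os.path.join(self.intermediate_done, name), target)
            self.known[name[: -len(FINAL_SUFFIX)]] = target

    def find(self, job_id):
        """Look up a job; on a miss, scan the intermediate directory once before failing."""
        if job_id not in self.known:
            self.scan()
        return self.known.get(job_id)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    inter = os.path.join(root, "intermediate-done")
    done = os.path.join(root, "done")
    os.makedirs(inter)
    os.makedirs(done)
    server = HistoryServer(inter, done)
    publish(inter, "job_1", '{"state": "SUCCEEDED"}')
    # No periodic scan has run yet, so this lookup triggers an on-demand scan.
    print(server.find("job_1") == os.path.join(done, "job_1" + FINAL_SUFFIX))
    print(server.find("job_2"))
```

The on-demand scan in `find` mirrors the comment's point that a query for an unknown job checks intermediate done before failing, so a job that finished just before the query is still found even if the periodic scan has not yet run.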
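The generic UI proposed in the issue, which displays JSON as a tree of strings, might look something like the sketch below. The function name `json_tree_lines` and the indentation format are illustrative assumptions, not part of any Hadoop API; the point is only that a single trusted server can render any application's JSON history without knowing its schema.

```python
import json


def json_tree_lines(value, label="root", depth=0):
    """Hypothetical helper: render parsed JSON as an indented tree of strings."""
    pad = "  " * depth
    if isinstance(value, dict):
        # Objects become a labeled node whose children are the keys, in document order.
        lines = [f"{pad}{label}"]
        for key, child in value.items():
            lines.extend(json_tree_lines(child, str(key), depth + 1))
        return lines
    if isinstance(value, list):
        # Arrays become a labeled node whose children are labeled by index.
        lines = [f"{pad}{label}"]
        for i, child in enumerate(value):
            lines.extend(json_tree_lines(child, f"[{i}]", depth + 1))
        return lines
    # Scalars are leaves, printed in JSON notation (strings quoted, None as null).
    return [f"{pad}{label}: {json.dumps(value)}"]


if __name__ == "__main__":
    doc = json.loads('{"jobId": "job_1", "counters": {"maps": 2}, "tasks": ["t0", "t1"]}')
    print("\n".join(json_tree_lines(doc)))
```

An application-specific webapp would instead fetch the same JSON and interpret fields such as counters or tasks for its own UI; the generic tree view is just the schema-agnostic fallback.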