[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855707#comment-13855707
 ] 

Robert Joseph Evans commented on YARN-321:
------------------------------------------

The way it currently works is based on group permissions on a directory 
(this is from memory from a while ago, so I could be off on a few things).  In 
HDFS, when you create a file, its group is the group of the directory that 
contains it, similar to the setgid bit on a directory in Linux.  When 
an MR job completes, it copies its history log file, along with a few other 
files, to a drop-box-like location called intermediate done, writing each file 
under a temp name and atomically renaming it to the final name.  The directory 
is world writable, but readable only by a special group that the history server 
belongs to and general users do not.  The history server wakes up periodically 
and scans that directory for new files; when it sees new files, it moves them 
to a final location that is owned by the headless history server user.  If a 
query comes in for a job that the history server is not aware of, it will also 
scan the intermediate done directory before failing.
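
To make the flow above concrete, here is a rough sketch of the same drop-box 
pattern using plain java.nio on a local file system.  It is not the history 
server code and does not use the Hadoop FileSystem API; the class name, method 
names, and the ".tmp" suffix are all made up for illustration, and the 
permission setup (world writable, group readable) is left out entirely.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a local-file-system analog of the intermediate-done flow.
class DropBoxSketch {
    // Hypothetical suffix marking a file that is still being written.
    static final String TMP_SUFFIX = ".tmp";

    // Job side: write under a temp name, then atomically rename to the final
    // name so a scanner never picks up a partially written file.
    static Path publish(Path intermediateDir, String name, String contents) throws IOException {
        Path tmp = intermediateDir.resolve(name + TMP_SUFFIX);
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        return Files.move(tmp, intermediateDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    // Server side: one scan pass. Moves every finished file into doneDir and
    // returns the names that were moved.
    static List<String> scanOnce(Path intermediateDir, Path doneDir) throws IOException {
        List<Path> ready = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(intermediateDir)) {
            for (Path p : entries) {
                boolean isTemp = p.getFileName().toString().endsWith(TMP_SUFFIX);
                if (!isTemp && Files.isRegularFile(p)) {
                    ready.add(p);
                }
            }
        }
        List<String> moved = new ArrayList<>();
        for (Path p : ready) {
            String name = p.getFileName().toString();
            Files.move(p, doneDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
            moved.add(name);
        }
        return moved;
    }

    // Query path: on a miss in doneDir, rescan the intermediate directory
    // once before giving up. Returns null if the file is still not found.
    static Path lookup(Path intermediateDir, Path doneDir, String name) throws IOException {
        Path target = doneDir.resolve(name);
        if (!Files.exists(target)) {
            scanOnce(intermediateDir, doneDir);
        }
        return Files.exists(target) ? target : null;
    }

    public static void main(String[] args) throws IOException {
        // Both directories live under one temp root so the moves stay on one file system.
        Path root = Files.createTempDirectory("history-sketch");
        Path intermediate = Files.createDirectory(root.resolve("intermediate_done"));
        Path done = Files.createDirectory(root.resolve("done"));

        publish(intermediate, "job_1.jhist", "history for job 1");
        System.out.println("moved: " + scanOnce(intermediate, done));

        publish(intermediate, "job_2.jhist", "history for job 2");
        System.out.println("found: " + lookup(intermediate, done, "job_2.jhist"));
    }
}
```

The temp-name-then-rename step is what lets the scanner skip locking: a file 
either shows up under its final name complete, or not at all.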

Reading history data is done through RPC to the history server, or through the 
web interface, including RESTful APIs.  There is no supported way for an app to 
read history data directly through the file system.  I hope this helps.

> Generic application history service
> -----------------------------------
>
>                 Key: YARN-321
>                 URL: https://issues.apache.org/jira/browse/YARN-321
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Luke Lu
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
> Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java
>
>
> The mapreduce job history server currently needs to be deployed as a trusted 
> server in sync with the mapreduce runtime. Every new application would need a 
> similar application history server. Having to deploy O(T*V) (where T is the 
> number of application types and V is the number of application versions) trusted 
> servers is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and 
> history data into a particular directory for later serving. Job history data 
> is already stored as json (or binary avro). I propose that we create only one 
> trusted application history server, which can have a generic UI (display json 
> as a tree of strings) as well. Specific application/version can deploy 
> untrusted webapps (a la AMs) to query the application history server and 
> interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
