[
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903589#comment-13903589
]
Patrick Wendell commented on YARN-1530:
---------------------------------------
Hey,
Thanks for the explanation! To make sure I understand how this would all work,
let me walk through an example.
For the Spark UI we are currently implementing the ability to serialize and
write events to HDFS, then load them later from a history server that can
render the UI for jobs that are finished. AFAIK this is basically how MapReduce
works as well (?)
Suppose users have set up a YARN cluster and configured event ingestion into
this shared store. Then Spark would need two things to integrate with it:
1. Be able to represent our events in JSON and hook into whatever source the
user has set up for ingestion (Flume, HDFS, etc.).
2. Be able to render our history timeline UI by reading event data from this
store.
Correct?
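To make the two points concrete, here is a rough sketch of how I picture them
(in Python for brevity). Everything in it is an assumption for illustration:
the TimelineEvent record, its field names, and both helper functions are made
up and are not taken from YARN, Spark, or the design doc.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, Iterable, List


@dataclass
class TimelineEvent:
    """Hypothetical generic event record; field names are illustrative only."""
    framework: str        # e.g. "spark" or "mapreduce"
    app_id: str           # application the event belongs to
    event_type: str       # framework-specific event name
    timestamp_ms: int     # event time in milliseconds
    payload: Dict[str, Any] = field(default_factory=dict)  # opaque per-framework data


def to_json_line(event: TimelineEvent) -> str:
    """Point 1: serialize an event as one JSON line for whatever ingestion path is used."""
    return json.dumps(asdict(event), sort_keys=True)


def load_events(lines: Iterable[str], app_id: str) -> List[TimelineEvent]:
    """Point 2: read events back and return one application's events in time order."""
    events = [TimelineEvent(**json.loads(line)) for line in lines if line.strip()]
    return sorted((e for e in events if e.app_id == app_id),
                  key=lambda e: e.timestamp_ms)
```

In this picture the store only understands the generic envelope, while the
payload stays opaque and is interpreted by each framework's own UI code.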
The benefit would be that if users set up something fancy like Flume, they
could leverage the same infrastructure for Spark as for other applications,
since there is a shared event model. They would also benefit from the faster,
indexed serving offered by this application when rendering the "history" UI.
Is that the main idea? I'm just trying to figure out what redundant work is
saved by having a generic framework, since each application writes its own UI
and has its own event model. From what I can tell, the benefit is that a
shared ingestion and serving infrastructure can be used.
> [Umbrella] Store, manage and serve per-framework application-timeline data
> --------------------------------------------------------------------------
>
> Key: YARN-1530
> URL: https://issues.apache.org/jira/browse/YARN-1530
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Vinod Kumar Vavilapalli
> Attachments: application timeline design-20140108.pdf, application
> timeline design-20140116.pdf, application timeline design-20140130.pdf,
> application timeline design-20140210.pdf
>
>
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store and serve per-framework
> data all by itself as YARN doesn't have a common solution. This JIRA attempts
> to solve the storage, management and serving of per-framework data from
> various applications, both running and finished. The aim is to change YARN to
> collect and store data in a generic manner with plugin points for frameworks
> to do their own thing w.r.t interpretation and serving.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)