[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

Vinod Kumar Vavilapalli (JIRA) Wed, 05 Feb 2014 21:02:50 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893038#comment-13893038
 ]


Vinod Kumar Vavilapalli commented on YARN-1530:
-----------------------------------------------

Thanks for your thoughts, Patrick!

bq. My biggest concern with this design is the notion of sending live data to a 
single node rather than writing through HDFS.
From the client's point of view (AMs and containers), this is really an 
implementation detail and is part of the event-aggregation system that I 
referred to in the document. I've seen implementations of at least a couple of 
these aggregation systems, and after getting enough site-specific requests to 
be able to use Flume/Kafka/simple web-service/HDFS/HBase, I decided to bake in 
some sort of pluggability here. It is entirely conceivable to do what you are 
describing.

I thought I had mentioned event throughput. We do care about it for the 
sake of applications like Storm, TEZ (and now Spark) that push out an order of 
magnitude more information than today's MR. We are pursuing different 
implementations, the first of which is most likely going to be HBase. We can 
optionally do an HBase-based implementation without a lot of effort. In fact 
that is exactly what the generic history service (YARN-321) does and we are 
thinking of retrofitting that into this abstraction.

In sum, REST is the user API and there is a different abstraction for 
event-aggregation. With this, I can see an HDFS-bus implementation that does 
what you want.
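
The split between the user-facing API and the event-aggregation abstraction can be sketched roughly as below. This is only an illustration under stated assumptions, not the design in the attached document: `EventSink`, `InMemorySink`, `FileSink` and `serve_events` are hypothetical names, and a local directory stands in for an HDFS path.

```python
from abc import ABC, abstractmethod
import json
import os


class EventSink(ABC):
    """Hypothetical plug-in point: where aggregated events are stored."""

    @abstractmethod
    def put(self, entity_id: str, event: dict) -> None: ...

    @abstractmethod
    def get(self, entity_id: str) -> list: ...


class InMemorySink(EventSink):
    """Keeps events in a dict; stands in for a single-node store."""

    def __init__(self):
        self._events = {}

    def put(self, entity_id, event):
        self._events.setdefault(entity_id, []).append(event)

    def get(self, entity_id):
        return list(self._events.get(entity_id, []))


class FileSink(EventSink):
    """Appends one JSON line per event to a per-entity file.

    Assumption: a local directory plays the role of an HDFS path.
    """

    def __init__(self, root):
        self._root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, entity_id):
        return os.path.join(self._root, entity_id + ".jsonl")

    def put(self, entity_id, event):
        with open(self._path(entity_id), "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def get(self, entity_id):
        path = self._path(entity_id)
        if not os.path.exists(path):
            return []
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


def serve_events(sink: EventSink, entity_id: str) -> str:
    """Body a REST handler could return; it never sees which sink is used."""
    return json.dumps({"entity": entity_id, "events": sink.get(entity_id)})
```

Swapping `InMemorySink` for `FileSink` leaves the output of `serve_events` unchanged, which is the sense in which the storage path is an implementation detail from the client's side.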

bq. if we wanted to write an “approved” UI that would be served from within the 
same JVM, what would be the interface between that UI and the indexing service?
Same JVM == AM? In any case, the service is agnostic to where you run the UI 
code.

bq. what is the security reason why YARN can't link to a framework-specific UI?
I should add more clarity there, perhaps. The fundamental problem is that any 
user can write a YARN app and host his/her own UI. References to these UIs 
eventually land on the YARN consoles (RM/AHS etc.) and can be used by malicious 
users to steal others' credentials by XSS and/or by simple, unnoticeable 
redirection. That is why today we proxy all application UIs through a central 
proxy-server and ask users not to click on any link that doesn't go through 
this proxy. Framework-specific UIs for serving history also fit the same 
pattern.
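
To make the "only follow links that go through the proxy" rule concrete, here is a minimal sketch of such a check. It is not the YARN proxy-server's logic: `is_proxied`, the host name and the `/proxy/` path used in the example are assumptions chosen for illustration, and a real check would also have to deal with percent-encoded path tricks.

```python
from urllib.parse import urlsplit


def is_proxied(link: str, proxy_base: str) -> bool:
    """True only if `link` shares the proxy's scheme and host and sits under its path."""
    target, proxy = urlsplit(link), urlsplit(proxy_base)
    # Reject non-web schemes such as javascript: outright.
    if target.scheme not in ("http", "https"):
        return False
    # Comparing the whole netloc also rejects user@host tricks.
    if (target.scheme, target.netloc.lower()) != (proxy.scheme, proxy.netloc.lower()):
        return False
    # Refuse plain ".." segments that would climb out of the proxy path.
    if ".." in target.path.split("/"):
        return False
    prefix = proxy.path.rstrip("/") + "/"
    return target.path.startswith(prefix)
```

With a hypothetical proxy base of `http://rm.example.com:8088/proxy/`, a link such as `http://rm.example.com:8088/proxy/application_1_0001/` passes, while the same path on any other host is rejected.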

Let me know if the above makes sense.

That said, I'd like to see what can be done here so as to bring Spark on board 
with benefits for both projects.
bq. So it’s unlikely we’d ever add this indexing service as a dependency in the 
way we architect our UI persistence.
If your UI can be written so the presentation layer is separated from the 
information provider services (which you may want to do anyway) and the 
interaction is through REST, I can totally imagine being able to reuse your UI 
code with and without using this YARN-specific service. I can even think of 
putting this out of YARN - it doesn't necessarily belong to YARN core - so that 
you can use it in isolation.
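
One way to picture that separation is a presentation function that only ever talks to a provider callable. The sketch below is illustrative only: `render_summary`, `rest_provider`, `local_provider` and the `/apps/<id>/events` URL layout are made-up names, not part of Spark or of the proposed YARN service, and the HTTP call is passed in as a function so the example runs without a network.

```python
import json
from typing import Callable, Dict, List

# A provider maps an application id to a list of event dicts.
Provider = Callable[[str], List[Dict]]


def render_summary(app_id: str, fetch: Provider) -> str:
    """Presentation layer: formats events without knowing their source."""
    events = fetch(app_id)
    lines = [f"Application {app_id}: {len(events)} event(s)"]
    lines += [f"  {e['ts']} {e['type']}" for e in events]
    return "\n".join(lines)


def rest_provider(base_url: str, http_get: Callable[[str], str]) -> Provider:
    """Provider backed by a REST service (hypothetical URL layout)."""
    def fetch(app_id: str) -> List[Dict]:
        return json.loads(http_get(f"{base_url}/apps/{app_id}/events"))
    return fetch


def local_provider(store: Dict[str, List[Dict]]) -> Provider:
    """Provider backed by the framework's own in-process data."""
    return lambda app_id: store.get(app_id, [])
```

The same `render_summary` works with either provider, so the UI code would not need to depend on the indexing service itself.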

The overarching theme is to do whatever it takes to not duplicate this same 
effort (the collection of all main problems-to-solve in the document) in each 
of the individual projects like Spark, Storm, TEZ etc.

> [Umbrella] Store, manage and serve per-framework application-timeline data
> --------------------------------------------------------------------------
>
>                 Key: YARN-1530
>                 URL: https://issues.apache.org/jira/browse/YARN-1530
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>         Attachments: application timeline design-20140108.pdf, application 
> timeline design-20140116.pdf, application timeline design-20140130.pdf
>
>
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store and serve per-framework 
> data all by itself as YARN doesn't have a common solution. This JIRA attempts 
> to solve the storage, management and serving of per-framework data from 
> various applications, both running and finished. The aim is to change YARN to 
> collect and store data in a generic manner with plugin points for frameworks 
> to do their own thing w.r.t interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
