Junping Du commented on YARN-3045:
Thanks [~sjlee0] and [~Naganarasimha] for the quick replies.
bq. If these events are attributes of applications, then they should be on the
application entities. If I want to find out all events for some application,
then I should be able to query only the application entity and get all events.
Some of these events are related to both the application and the NodeManager.
We can claim that they belong to the application, but some events are too
detailed for the application and could be of more interest to YARN daemons. I
understand that our design is more application-centric now, but it should be
generic enough to store/retrieve YARN daemon-centric entities later. Anyway,
until NM/RM come on board as first-class consumers of ATSv2, I am fine
with making them application events.
bq. The need to have NodeManagerEntity is something different IMO. Note that
today there are challenges in emitting data without any application context
(e.g. node manager's configuration) as we discussed a few times. If we need to
support that, that needs a different discussion.
I see. I remember seeing a JIRA about getting rid of the application context,
but I cannot find it now. If we don't have one, how about moving this
discussion to YARN-3959? The original scope of that JIRA is application-related
configuration only, but we could extend it to include daemon configuration if
necessary.
bq. my assumption was that the sync/async distinction from the client
perspective mapped to whether the writer may be flushed or not. If not, then we
need to support a 2x2 matrix of possibilities: sync put w/ flush, sync put w/o
flush, async put w/ flush, and async put w/o flush. I thought it would be a
simplifying assumption to align those dimensions.
I think we can simplify the 2x2 matrix by omitting the case of sync put w/o
flush, as I cannot think of a valid case where an ack from TimelineCollector
without a flush would help. The remaining three cases sound solid to me. To let
TimelineCollector identify flush strategies for async calls, we may need to set
a severity on the entities being put and configure TimelineCollector to flush
only entities above a specific severity, just as log levels work.
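To make the severity idea concrete, here is a rough sketch of how the three remaining cases could fall out of one threshold check. All names in it (FlushPolicySketch, Severity, PutMode, SketchCollector) are hypothetical and only for illustration; this is not the existing TimelineCollector API, and treating "at or above the threshold" as the flush condition is an assumption borrowed from how log levels usually behave.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushPolicySketch {
  /** Hypothetical severity levels, ordered from least to most severe. */
  public enum Severity { DEBUG, INFO, WARN, CRITICAL }

  /** The three cases kept from the 2x2 matrix (sync put w/o flush is omitted). */
  public enum PutMode { SYNC_WITH_FLUSH, ASYNC_WITH_FLUSH, ASYNC_WITHOUT_FLUSH }

  /** Stand-in for a collector; it only tracks what is buffered vs. flushed. */
  public static class SketchCollector {
    private final Severity flushThreshold;
    private final List<String> buffered = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>();

    public SketchCollector(Severity flushThreshold) {
      this.flushThreshold = flushThreshold;
    }

    /**
     * Buffers the entity and decides whether to flush.
     * A sync put always flushes; an async put flushes only when the
     * severity is at or above the configured threshold (assumption).
     */
    public PutMode put(String entityId, Severity severity, boolean sync) {
      buffered.add(entityId);
      boolean flush = sync || severity.compareTo(flushThreshold) >= 0;
      if (flush) {
        // Flushing writes out everything buffered so far, not just this entity.
        flushed.addAll(buffered);
        buffered.clear();
      }
      if (sync) {
        return PutMode.SYNC_WITH_FLUSH;
      }
      return flush ? PutMode.ASYNC_WITH_FLUSH : PutMode.ASYNC_WITHOUT_FLUSH;
    }

    public int bufferedCount() { return buffered.size(); }

    public int flushedCount() { return flushed.size(); }
  }

  public static void main(String[] args) {
    SketchCollector collector = new SketchCollector(Severity.WARN);
    System.out.println(collector.put("container-metric", Severity.INFO, false));
    System.out.println(collector.put("container-failed", Severity.CRITICAL, false));
    System.out.println(collector.put("app-finished", Severity.INFO, true));
  }
}
```

With something like this, the client would not have to choose a flush strategy per call; it would only tag the entity, and the collector's configured threshold would decide.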
bq. I was under the impression that YARN-3367 is only for invoking REST calls
in nonblocking way and thus avoiding threads in the clients. Is it also related
to flush when called only putEntities and not on putEntitiesAsync?
You are right that the goal of YARN-3367 is to get rid of blocking calls when
putting entities, no matter whether the caller uses putEntities() or something
else. putEntitiesAsync() is exactly what we need, and it should be rare to use
putEntities() once we have putEntitiesAsync(), except when client logic relies on return
bq. I see currently "async" parameter as part of REST request is ignored now,
so i thought based on this param we may need to further flush the writer or is
your thoughts similar to support 2*2 matrix as Sangjin was informing?
Actually, from my comments above, I would prefer the (2*2 - 1) approach. :) To
speed up this JIRA's progress, I am fine with continuing to ignore the
sync/async parameter and doing everything async for now, leaving it to a
dedicated JIRA to figure out.
I will look at the latest patch soon.
> [Event producers] Implement NM writing container lifecycle events to ATS
> Key: YARN-3045
> URL: https://issues.apache.org/jira/browse/YARN-3045
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Naganarasimha G R
> Attachments: YARN-3045-YARN-2928.002.patch,
> YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch,
> YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch,
> YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch,
> Per design in YARN-2928, implement NM writing container lifecycle events and
> container system metrics to ATS.