[
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran updated YARN-4696:
---------------------------------
Attachment: YARN-4696-001.patch
Patch -001: the things I had to do to get my (external, Spark integration) test
closer to working.
These are a combination of things that are absolutely needed (disabling RM,
flushing on close()), generally better (exception handling), and needed to
debug what's going on (all the improved logging).
# RM integration can be disabled; the timeline store then uses only file
modification times as a liveness test. This includes null checks around uses
of {{yarnClient}}.
# I took the opportunity to clean up service shutdown in the process.
# YARN-4695 recommendations: all worker threads unwrap exceptions and, if they
are interrupted exceptions, skip the stack trace.
# Better logging at debug level (including the number of scanned apps).
# {{TimelineWriter}} doesn't rewrap IOEs in IOEs, and wraps an interrupted
exception in an {{InterruptedIOException}}.
# {{FileSystemTimelineWriter.close()}} does a {{flush()}}, which stops any
last events getting lost.
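
To illustrate the exception-handling items in the list above, here is a minimal sketch of the translation logic. It is not the code in the patch: the class {{ExceptionTranslation}} and its methods are hypothetical names, and it assumes the wrapper to unwrap is {{java.util.concurrent.ExecutionException}}.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.ExecutionException;

/** Hypothetical helper for illustration only; not taken from YARN-4696-001.patch. */
final class ExceptionTranslation {

  private ExceptionTranslation() {
  }

  /**
   * Convert a failure into an IOException without nesting one IOE in another.
   * An InterruptedException becomes an InterruptedIOException.
   */
  static IOException toIOException(Throwable t) {
    // Unwrap the wrapper that executor futures put around the real failure.
    Throwable cause = t;
    while (cause instanceof ExecutionException && cause.getCause() != null) {
      cause = cause.getCause();
    }
    if (cause instanceof IOException) {
      // Already an IOE (including InterruptedIOException): return it as-is.
      return (IOException) cause;
    }
    if (cause instanceof InterruptedException) {
      InterruptedIOException iioe = new InterruptedIOException(cause.toString());
      iioe.initCause(cause);
      return iioe;
    }
    return new IOException(cause);
  }

  /** True if the failure is an interruption, so a log call can omit the stack trace. */
  static boolean isInterruption(Throwable t) {
    return t instanceof InterruptedException
        || t instanceof InterruptedIOException;
  }
}
```

A worker thread could call {{isInterruption()}} on the unwrapped exception and log only its message when it returns true, keeping the full stack trace for every other failure.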
There are tests, but not here. Look in
https://github.com/steveloughran/spark-timeline-integration
> EntityGroupFSTimelineStore to work in the absence of an RM
> ----------------------------------------------------------
>
> Key: YARN-4696
> URL: https://issues.apache.org/jira/browse/YARN-4696
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Attachments: YARN-4696-001.patch
>
>
> {{EntityGroupFSTimelineStore}} now depends on an RM being up and running, and
> on the configuration pointing to it. This is a new change, and it impacts
> testing, where you have historically been able to test without an RM running.
> The sole purpose of the probe is to automatically determine if an app is
> running; it falls back to "unknown" if not. If the RM connection were
> optional, the "unknown" codepath could be called directly, relying on age of
> file as a metric of completion.
> Options
> # add a flag to disable RM connect
> # skip automatically if RM not defined/set to 0.0.0.0
> # disable retries on yarn client IPC; if it fails, tag app as unknown.
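
A minimal sketch of the fallback described in the issue text above, assuming a flag that disables the RM probe and a configurable window within which a recently modified file counts as a live app. All names here ({{AppLiveness}}, {{resolve}}, the window parameter) are hypothetical and are not taken from {{EntityGroupFSTimelineStore}}; the RM probe is modelled as a supplier that returns empty for "unknown".

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch for illustration only; not the store's actual code. */
final class AppLiveness {

  enum State { ACTIVE, COMPLETED }

  private AppLiveness() {
  }

  /** The "unknown" codepath: judge liveness purely from the age of the file. */
  static State fromModificationTime(long lastModifiedMillis, long nowMillis,
      Duration activeWindow) {
    long age = nowMillis - lastModifiedMillis;
    return age <= activeWindow.toMillis() ? State.ACTIVE : State.COMPLETED;
  }

  /**
   * Ask the RM only when it is enabled; an empty answer means "unknown".
   * In every other case fall back to the modification-time test.
   */
  static State resolve(boolean rmEnabled, Supplier<Optional<State>> rmProbe,
      long lastModifiedMillis, long nowMillis, Duration activeWindow) {
    if (rmEnabled) {
      Optional<State> fromRm = rmProbe.get();
      if (fromRm.isPresent()) {
        return fromRm.get();
      }
    }
    return fromModificationTime(lastModifiedMillis, nowMillis, activeWindow);
  }
}
```

With the flag off, the probe supplier is never invoked, so a test can run with no RM configured at all.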