[
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran updated YARN-4696:
---------------------------------
Attachment: YARN-4696-006.patch
Patch 006; ongoing (and currently unsuccessful) attempt to use file:// as a
destination for timeline entities
* some better logging of read problems to differentiate empty file from missing
file.
* add cleanup of TimelineDataManager in try-with-resources
* explicitly throw an FNFE if the active dir isn't found (rather than a generic
IOE)
* the constant
{{FileSystemTimelineWriter.TIMELINE_SERVICE_ENTITYFILE_FS_SUPPORT_APPEND}} is
public, so that you can turn off append support. I know we want a proper API
here (HADOOP-9565), but it's not done yet: a flag is all you have. Making the
constant public will make it easier to track down use in future.
* includes YARN-4716; flush() interface. This propagates all the way down to
the FS API (good), but as file:// is a CRC filesystem, flush/hflush doesn't
actually work (it buffers until a CRC-block of data is ready). And there's no
way to turn off that feature via a config option.
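To make the flush problem in the last bullet concrete, here is a toy model of a writer that only releases data in whole checksum chunks. It is a sketch, not the Hadoop code: the class {{ChunkBufferedWriter}}, its method names and the chunk size passed in are all hypothetical, and the real checksummed local filesystem is more involved.

```java
import java.io.ByteArrayOutputStream;

/**
 * Toy model (hypothetical, not Hadoop code) of a checksummed writer:
 * bytes become visible to readers only when a full chunk has accumulated,
 * or when the writer is closed.
 */
final class ChunkBufferedWriter {
  // Stands in for the bytes a reader of the file can actually see.
  private final ByteArrayOutputStream visible = new ByteArrayOutputStream();
  // Pending bytes that have not yet formed a complete chunk.
  private final byte[] chunk;
  private int used = 0;

  ChunkBufferedWriter(int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    this.chunk = new byte[chunkSize];
  }

  void write(byte[] data) {
    for (byte b : data) {
      chunk[used++] = b;
      if (used == chunk.length) {
        spill(); // a full chunk is ready, so it can be written out
      }
    }
  }

  /** Models the reported behaviour: a partial chunk stays buffered. */
  void flush() {
    // intentionally a no-op in this model
  }

  /** Closing writes whatever is left, even a partial chunk. */
  void close() {
    spill();
  }

  /** Length of the data a concurrent reader would see. */
  long visibleLength() {
    return visible.size();
  }

  private void spill() {
    visible.write(chunk, 0, used);
    used = 0;
  }
}
```

In this model, writing 100 bytes with a 512-byte chunk and then calling {{flush()}} leaves {{visibleLength()}} at 0 until the chunk fills or the writer is closed, which matches the "complete apps are fine, incomplete apps look empty" symptom.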
What I'm seeing, then, is that when an app completes, its changes are picked up
fine. But incomplete apps aren't; instead the scanner sees a 0-byte file
and skips it, which isn't useful at all.
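For reference, the missing-versus-empty distinction the new logging draws could look roughly like the following. This is only a sketch against the local filesystem using {{java.io.File}}; the class {{EntityLogProbe}} and its {{State}} enum are hypothetical names, and the patch itself works through the Hadoop FS API rather than this code.

```java
import java.io.File;

/** Hypothetical helper, not part of the patch: classify an entity log file. */
final class EntityLogProbe {
  enum State { MISSING, EMPTY, HAS_DATA }

  private EntityLogProbe() {
  }

  /**
   * Distinguish a file that does not exist from one that exists but
   * currently reports zero bytes. The two checks are not atomic, so the
   * answer is only a snapshot of a file that may still be changing.
   */
  static State classify(File log) {
    if (!log.exists()) {
      return State.MISSING;
    }
    return log.length() == 0 ? State.EMPTY : State.HAS_DATA;
  }

  /** Log-friendly description, so the two cases are not reported alike. */
  static String describe(File log) {
    switch (classify(log)) {
      case MISSING:
        return "missing file: " + log;
      case EMPTY:
        return "empty (0-byte) file, possibly unflushed: " + log;
      default:
        return "file with " + log.length() + " bytes: " + log;
    }
  }
}
```

A scanner using something like this could treat {{EMPTY}} as "retry later" for an app that may still be running, rather than skipping the file outright.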
I suspect the issue here is HDFS vs file:// filesystem behaviours, something I
could fix by moving to miniHDFS. My fear here is that people may want to use
file:// or similar FS in production, and what we have today doesn't work.
> EntityGroupFSTimelineStore to work in the absence of an RM
> ----------------------------------------------------------
>
> Key: YARN-4696
> URL: https://issues.apache.org/jira/browse/YARN-4696
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: YARN-4696-001.patch, YARN-4696-002.patch,
> YARN-4696-003.patch, YARN-4696-005.patch, YARN-4696-006.patch
>
>
> {{EntityGroupFSTimelineStore}} now depends on an RM being up and running, and on
> the configuration pointing to it. This is a new change, and impacts testing where
> you have historically been able to test without an RM running.
> The sole purpose of the probe is to automatically determine if an app is
> running; it falls back to "unknown" if not. If the RM connection was
> optional, the "unknown" codepath could be called directly, relying on age of
> file as a metric of completion
> Options
> # add a flag to disable RM connect
> # skip automatically if RM not defined/set to 0.0.0.0
> # disable retries on yarn client IPC; if it fails, tag app as unknown.