[ 
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4696:
---------------------------------
    Attachment: YARN-4696-006.patch

Patch 006; ongoing (and currently unsuccessful) attempt to use file:// as a 
destination for timeline entities

* better logging of read problems, to differentiate an empty file from a 
missing file.
* add cleanup of TimelineDataManager in try-with-resources
* explicitly throw an FNFE if the active dir isn't found (rather than a generic 
IOE)
* the constant 
{{FileSystemTimelineWriter.TIMELINE_SERVICE_ENTITYFILE_FS_SUPPORT_APPEND}} is 
public, so that you can turn off append support. I know we want a proper API 
here (HADOOP-9565), but it's not done yet: a flag is all you have. Making the 
constant public will make it easier to track down use in future.
* includes YARN-4716; flush() interface. This propagates all the way down to 
the FS API (good), but as file:// is a CRC filesystem, flush/hflush doesn't 
actually work (it buffers until a CRC-block of data is ready). And there's no 
way to turn off that feature via a config option.
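
To illustrate the first and third items in the list above, here is a minimal sketch of the missing-vs-empty distinction and the explicit FNFE. It is not code from the patch: it uses plain java.nio against the local filesystem rather than the Hadoop FileSystem API, and the names {{TimelineFileProbe}}, {{FileStatus}}, {{probe}} and {{requireActiveDir}} are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration only; not part of the patch. */
class TimelineFileProbe {

  enum FileStatus { MISSING, EMPTY, HAS_DATA }

  /** Classify a log file so "missing" and "empty" can be logged differently. */
  static FileStatus probe(Path file) throws IOException {
    if (!Files.exists(file)) {
      return FileStatus.MISSING;
    }
    return Files.size(file) == 0 ? FileStatus.EMPTY : FileStatus.HAS_DATA;
  }

  /** Throw FileNotFoundException (a subclass of IOException) if the dir is absent. */
  static Path requireActiveDir(Path activeDir) throws FileNotFoundException {
    if (!Files.isDirectory(activeDir)) {
      throw new FileNotFoundException("Active directory not found: " + activeDir);
    }
    return activeDir;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("active");
    Path empty = Files.createFile(dir.resolve("entitylog-empty"));
    System.out.println(probe(empty));
    System.out.println(probe(dir.resolve("entitylog-missing")));
    try {
      requireActiveDir(dir.resolve("no-such-dir"));
    } catch (FileNotFoundException e) {
      System.out.println("FNFE: " + e.getMessage());
    }
  }
}
```

Because FileNotFoundException extends IOException, existing callers that catch IOException keep working, while new callers can catch the narrower type.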

What I'm seeing, then, is that when an app completes, its changes are picked up 
fine. Incomplete apps aren't: the scanner sees a 0-byte file and skips it, 
which isn't useful at all.
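
A simplified model of that symptom, assuming the buffering behaves as described above (data is held until a full chunk is ready, and flush() does not push a partial chunk). This is not Hadoop code; {{ChunkBufferedStream}} and the 512-byte chunk size are made up for illustration, and a ByteArrayOutputStream stands in for the file a reader would see.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Simplified model, not Hadoop code: only whole chunks reach the sink before close(). */
class ChunkBufferedStream extends OutputStream {
  private final OutputStream sink;
  private final byte[] chunk;
  private int used;

  ChunkBufferedStream(OutputStream sink, int chunkSize) {
    this.sink = sink;
    this.chunk = new byte[chunkSize];
  }

  @Override
  public void write(int b) throws IOException {
    chunk[used++] = (byte) b;
    if (used == chunk.length) {
      // A full chunk is ready, so it can be written out.
      sink.write(chunk, 0, used);
      used = 0;
    }
  }

  /** The partial chunk stays buffered; flush() only flushes what was already written. */
  @Override
  public void flush() throws IOException {
    sink.flush();
  }

  /** Only close() writes out the trailing partial chunk. */
  @Override
  public void close() throws IOException {
    sink.write(chunk, 0, used);
    used = 0;
    sink.close();
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    ChunkBufferedStream out = new ChunkBufferedStream(file, 512);
    out.write("entity-1".getBytes(StandardCharsets.UTF_8));
    out.flush();
    System.out.println("visible after flush: " + file.size()); // 0
    out.close();
    System.out.println("visible after close: " + file.size()); // 8
  }
}
```

In this model a scanner reading between flush() and close() sees zero bytes, which matches the 0-byte file seen for incomplete apps.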

I suspect the issue here is the difference between HDFS and file:// filesystem 
behaviour, something I could fix by moving to miniHDFS. My fear is that people 
may want to use file:// or a similar FS in production, and what we have today 
doesn't work.

> EntityGroupFSTimelineStore to work in the absence of an RM
> ----------------------------------------------------------
>
>                 Key: YARN-4696
>                 URL: https://issues.apache.org/jira/browse/YARN-4696
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: YARN-4696-001.patch, YARN-4696-002.patch, 
> YARN-4696-003.patch, YARN-4696-005.patch, YARN-4696-006.patch
>
>
> {{EntityGroupFSTimelineStore}} now depends on an RM being up and running, and 
> on the configuration pointing to it. This is a new change, and impacts testing 
> where you have historically been able to test without an RM running.
> The sole purpose of the probe is to automatically determine if an app is 
> running; it falls back to "unknown" if not. If the RM connection was 
> optional, the "unknown" codepath could be called directly, relying on age of 
> file as a metric of completion
> Options
> # add a flag to disable RM connect
> # skip automatically if RM not defined/set to 0.0.0.0
> # disable retries on yarn client IPC; if it fails, tag app as unknown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
