[ https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated YARN-4696:
---------------------------------
    Attachment: YARN-4696-006.patch

Patch 006; ongoing (and currently unsuccessful) attempt to use file:// as a destination for timeline entities

* some better logging of read problems to differentiate empty file from missing file.
* add cleanup of TimelineDataManager in try-with-resources
* explicitly throw an FNFE if the active dir isn't found (rather than a generic IOE)
* the constant {{FileSystemTimelineWriter.TIMELINE_SERVICE_ENTITYFILE_FS_SUPPORT_APPEND}} is public, so that you can turn off append support. I know we want a proper API here (HADOOP-9565), but it's not done yet: a flag is all you have. Making the constant public will make it easier to track down use in future.
* includes YARN-4716; flush() interface. This propagates all the way down to the FS API (good), but as file:// is a CRC filesystem, flush/hflush doesn't actually work (it buffers until a CRC-block of data is ready). And there's no way to turn off that feature via a config option.

What I'm seeing then is that when an app completes, its changes are picked up fine. But incomplete apps aren't; instead the scanner sees a 0-byte file and skips it. Which isn't that useful at all.

I suspect the issue here is hdfs vs file filesystem behaviours, something I could fix by moving to miniHDFS. My fear here is that people may want to use file:// or similar FS in production, and what we have today doesn't work.
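
For illustration only, here is a minimal sketch of the two read-side checks listed above: telling an empty file from a missing one, and failing with an FNFE rather than a generic IOE when the active dir is absent. It uses plain java.nio against the local filesystem instead of the Hadoop FS API, and the names {{LogFileProbe}}, {{probe}} and {{requireActiveDir}} are hypothetical; none of this is taken from the patch.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Hypothetical helper, not part of the patch: classifies a log file before it is scanned. */
final class LogFileProbe {

    /** Outcome of probing a single entity log file. */
    enum Status { MISSING, EMPTY, HAS_DATA }

    private LogFileProbe() {
    }

    /** Throws FileNotFoundException, rather than a generic IOException, if the directory is absent. */
    static void requireActiveDir(Path activeDir) throws FileNotFoundException {
        if (!Files.isDirectory(activeDir)) {
            throw new FileNotFoundException("Active directory not found: " + activeDir);
        }
    }

    /** Reports MISSING and EMPTY separately so that logs can tell the two cases apart. */
    static Status probe(Path logFile) throws IOException {
        try {
            // Files.size() throws NoSuchFileException when the path does not exist.
            return Files.size(logFile) == 0 ? Status.EMPTY : Status.HAS_DATA;
        } catch (NoSuchFileException e) {
            return Status.MISSING;
        }
    }
}
```

On a filesystem where flush() only buffers, a log that is still being written would show up here as EMPTY, which matches the 0-byte file the scanner is skipping.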

> EntityGroupFSTimelineStore to work in the absence of an RM
> ----------------------------------------------------------
>
>                 Key: YARN-4696
>                 URL: https://issues.apache.org/jira/browse/YARN-4696
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: YARN-4696-001.patch, YARN-4696-002.patch,
> YARN-4696-003.patch, YARN-4696-005.patch, YARN-4696-006.patch
>
>
> {{EntityGroupFSTimelineStore}} now depends on an RM being up and running; the
> configuration pointing to it. This is a new change, and impacts testing where
> you have historically been able to test without an RM running.
> The sole purpose of the probe is to automatically determine if an app is
> running; it falls back to "unknown" if not. If the RM connection was
> optional, the "unknown" codepath could be called directly, relying on age of
> file as a metric of completion
> Options
> # add a flag to disable RM connect
> # skip automatically if RM not defined/set to 0.0.0.0
> # disable retries on yarn client IPC; if it fails, tag app as unknown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
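
As a rough sketch of the second option in the issue description (skip the RM probe when no RM is defined, then rely on the age of the file), something like the following could work. Everything here is an assumption made for illustration: the class {{AppStateFallback}}, both method names, the threshold parameter, and the host:port parsing are hypothetical and are not the {{EntityGroupFSTimelineStore}} code.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of an RM-optional app state check; not the actual store code. */
final class AppStateFallback {

    enum AppState { ACTIVE, COMPLETED, UNKNOWN }

    private AppStateFallback() {
    }

    /** Treats an unset, blank, or 0.0.0.0 address as "no RM defined" (assumes host[:port] form). */
    static boolean rmConfigured(String rmAddress) {
        if (rmAddress == null || rmAddress.trim().isEmpty()) {
            return false;
        }
        String host = rmAddress.trim();
        int colon = host.lastIndexOf(':');
        if (colon >= 0) {
            host = host.substring(0, colon); // drop the port
        }
        return !host.equals("0.0.0.0");
    }

    /**
     * Without an RM to ask, the app stays UNKNOWN while its log is still recent,
     * and is assumed COMPLETED once the log has been idle longer than the threshold.
     */
    static AppState stateWithoutRm(Instant lastModified, Instant now, Duration idleThreshold) {
        Duration idle = Duration.between(lastModified, now);
        return idle.compareTo(idleThreshold) > 0 ? AppState.COMPLETED : AppState.UNKNOWN;
    }
}
```

A caller would consult {{rmConfigured}} once at startup and only build the YARN client when it returns true; otherwise every app goes through {{stateWithoutRm}}.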