[
https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhijie Shen updated YARN-975:
-----------------------------
Attachment: YARN-975.11.patch
Thanks for the comments, [~vinodkv]! I've updated the patch accordingly.
bq. HDFS jar is not needed as a test-dependency in
hadoop-yarn-server-applicationhistoryservice/pom.xml
Removed
bq. Wrap new
ApplicationStartDataPBImpl(ApplicationStartDataProto.parseFrom(entry.value)))
into a method in ApplicationStartData. Similarly others.
Added a set of such methods
bq. getApplicationAttempts(): If there is no history-file, we should throw a
valid exception?
After some discussion, the decision was to throw an exception whenever the
history file doesn't exist or is still being written, regardless of whether
the user is getting an app, attempt, or container.
bq. finishDtata: Typo
Fixed
bq. We need to limit this (configurable?) and queue any more writes into a
limited number of threads. Can do in a follow up JIRA, please file one.
Will file the ticket once this ticket is done.
bq. you can write a complex key which has an ApplicationId and the start
marker and convert them to bytes when storing via a getBytes() method.
bq. Similarly for ApplicationAttempt and Container suffixes.
I made a compromise here. Instead of concatenating the ID and the suffix, I
created a structure:
{code}
class HistoryDataKey {
  String id;
  String suffix;
}
{code}
and changed the nested tFile reader/writer accordingly. This has already
simplified the code and made it more readable. However, I chose not to put
ApplicationId/ApplicationAttemptId/ContainerId directly into the structure,
because that would make the tFile reader more complex, given that the three
types of ID are written to the same file in an unknown order. Moreover, it is
inefficient to filter by ID patterns (e.g. all ApplicationIds start with
"application_").
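For illustration only, a two-field key like the one above could be converted
to and from bytes roughly as follows. This is a minimal sketch, not the code in
the patch: the class name HistoryDataKeySketch, the toBytes()/fromBytes()
methods, and the "_start" suffix value are assumptions, and it uses plain
java.io rather than the tFile reader/writer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of a key made of a string ID and a suffix.
final class HistoryDataKeySketch {
  final String id;      // string form of an app/attempt/container ID
  final String suffix;  // marker for the record type (values are assumed)

  HistoryDataKeySketch(String id, String suffix) {
    this.id = id;
    this.suffix = suffix;
  }

  // Writes both fields length-prefixed, so the reader never has to split
  // a concatenated string to recover the ID and the suffix.
  byte[] toBytes() {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      out.writeUTF(id);
      out.writeUTF(suffix);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  // Reads the fields back in the same order they were written.
  static HistoryDataKeySketch fromBytes(byte[] bytes) {
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(bytes))) {
      String id = in.readUTF();
      String suffix = in.readUTF();
      return new HistoryDataKeySketch(id, suffix);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the ID stays a plain string here, the same key type works for all
three kinds of ID without the reader needing to know which kind comes next.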
bq. When a HistoryFile exists, HistoryFileWriter should open it in append mode.
Changed accordingly
bq. In both the reader and the writer, you should use IOUtils.cleanup() instead
of explicitly calling close on each stream yourselves everywhere.
Used IOUtils.cleanup() to simplify the IO close code.
bq. Don't think we should do this. Any retries should be inside
FileSystemHistoryStore. We should close the writer in a finally block
Moved it into the finally block
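As a rough illustration of the close-in-finally pattern discussed in the last
two points, here is a minimal sketch. It does not call Hadoop's IOUtils; the
cleanup() helper below is a hypothetical stand-in written with plain java.io,
and the names CleanupSketch and writeAndClose are assumptions, not code from
the patch.

```java
import java.io.Closeable;
import java.io.IOException;

final class CleanupSketch {
  // Hypothetical stand-in for a cleanup helper: closes every non-null
  // resource and ignores failures raised by close() itself.
  static void cleanup(Closeable... closeables) {
    for (Closeable c : closeables) {
      if (c == null) {
        continue;
      }
      try {
        c.close();
      } catch (IOException e) {
        // Ignored in this sketch; a real helper would probably log it.
      }
    }
  }

  // Runs the write and always closes the writer, even if the write throws.
  static void writeAndClose(Closeable writer, Runnable write) {
    try {
      write.run();
    } finally {
      cleanup(writer);
    }
  }
}
```

With this shape, callers don't need a separate close() call on each stream,
and a failed write still releases the writer.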
bq. Dismantle retriveStartFinishData() into two methods - one for start and one
for finish.
After refactoring the code, the start and finish cases share most of the code
path, so I chose not to split them. Please see if you're fine with it. On
the other hand, I split mergeHistoryData for the start and finish cases, as
they share little code.
bq. TestApplicationHistoryStore was renamed in YARN-956, please update the patch
It was up-to-date in the previous patch.
bq. Test: A single file will only have data about a single application. So
testWriteHistoryData() should not have multiple applications. Similarly
ApplicationAttempt finish to follow after container-finish.
The test case was probably a bit misleading. A single file contains exactly
1 application; however, I tested by creating 5 applications repeatedly.
I've refactored the test code to be more readable.
bq. Test: We should NOT have this dependency. Java 7 reorders tests in some
cases.
Removed the order dependency.
> Add a file-system implementation for history-storage
> ----------------------------------------------------
>
> Key: YARN-975
> URL: https://issues.apache.org/jira/browse/YARN-975
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Attachments: YARN-975.10.patch, YARN-975.11.patch, YARN-975.1.patch,
> YARN-975.2.patch, YARN-975.3.patch, YARN-975.4.patch, YARN-975.5.patch,
> YARN-975.6.patch, YARN-975.7.patch, YARN-975.8.patch, YARN-975.9.patch
>
>
> HDFS implementation should be a standard persistence strategy of history
> storage
--
This message was sent by Atlassian JIRA
(v6.1#6144)