[jira] [Updated] (YARN-975) Add a file-system implementation for history-storage

Zhijie Shen (JIRA) Mon, 28 Oct 2013 18:19:45 -0700

     [ 
https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Zhijie Shen updated YARN-975:
-----------------------------

    Attachment: YARN-975.11.patch

Thanks for the comments, [~vinodkv]! I've updated the patch accordingly.

bq. HDFS jar is not needed as a test-dependency in 
hadoop-yarn-server-applicationhistoryservice/pom.xml

Removed

bq. Wrap new 
ApplicationStartDataPBImpl(ApplicationStartDataProto.parseFrom(entry.value))) 
into a method in ApplicationStartData. Similarly others.

Added a set of such methods

bq. getApplicationAttempts(): If there is no history-file, we should throw a 
valid exception?

After some discussion, we decided to throw an exception whenever the history 
file doesn't exist or is still being written, regardless of whether the user is 
requesting an application, attempt, or container.

bq. finishDtata: Typo

Fixed

bq. We need to limit this (configurable?) and queue any more writes into a 
limited number of threads. Can do in a follow up JIRA, please file one.

Will file the follow-up ticket once this one is done.

bq.  you can write a complex key which has an ApplicationId and the start 
marker and convert them to bytes when storing via a getBytes() method.
bq. Similarly for ApplicationAttempt and Container suffixes.
I compromised here. Instead of concatenating the ID and the suffix, I created 
a structure:
{code}
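class HistoryDataKey {
  // String form of the application/attempt/container ID
  String id;
  // Marker that distinguishes the kind of record stored under this ID
  String suffix;
}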
{code}
and changed the nested tFile reader/writer accordingly. This has already 
simplified the code and made it more readable. However, I chose not to put 
ApplicationId/ApplicationAttemptId/ContainerId directly into the structure, 
because that would make the tFile reader more complex, given that three types 
of ID are written to the same file in an unknown order. Moreover, it would be 
inefficient to filter by ID patterns (e.g. all ApplicationIds start with 
"application_").
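
For illustration only, a minimal sketch of how a two-field key like this could 
round-trip through bytes. It is not the code in the patch: the class name 
HistoryDataKeySketch, the methods toBytes()/fromBytes(), the length-prefixed 
UTF encoding, and the "_start" suffix value are all assumptions made for the 
example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the key described above; not the patch's class.
class HistoryDataKeySketch {
  final String id;      // string form of an application/attempt/container ID
  final String suffix;  // hypothetical marker such as "_start" or "_finish"

  HistoryDataKeySketch(String id, String suffix) {
    this.id = id;
    this.suffix = suffix;
  }

  // Write both fields with length prefixes so they can be separated again
  // without relying on a delimiter inside the ID.
  byte[] toBytes() {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      out.writeUTF(id);
      out.writeUTF(suffix);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  // Read the two fields back in the order they were written.
  static HistoryDataKeySketch fromBytes(byte[] bytes) {
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(bytes))) {
      String id = in.readUTF();
      String suffix = in.readUTF();
      return new HistoryDataKeySketch(id, suffix);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Keeping the ID as a plain string means the reader does not need to know which 
of the three ID types it is decoding before it has read the key.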

bq. When a HistoryFile exists, HistoryFileWriter should open it in append mode.
Changed accordingly

bq. In both the reader and the writer, you should use IOUtils.cleanup() instead 
of explicitly calling close on each stream yourselves everywhere.

Used IOUtils.cleanup() to simplify the IO close code.

bq. Don't think we should do this. Any retries should be inside 
FileSystemHistoryStore. We should close the writer in a finally block

Moved it into the finally block.
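
For illustration only, a rough sketch of the close-in-finally pattern discussed 
in the last two points, using plain java.io rather than Hadoop classes. The 
closeQuietly() helper is a hypothetical stand-in written in the spirit of 
IOUtils.cleanup(), and writeRecord() is likewise invented for the example; 
neither is taken from the patch.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Writer;

class CloseInFinallySketch {
  // Hypothetical helper: close every non-null stream and swallow any
  // IOException, so a failure while closing never masks the original error.
  static void closeQuietly(Closeable... streams) {
    for (Closeable c : streams) {
      if (c == null) {
        continue;
      }
      try {
        c.close();
      } catch (IOException e) {
        // A real helper would log this instead of ignoring it.
      }
    }
  }

  // Writes one record; the writer is released whether or not the write fails.
  static boolean writeRecord(Writer writer, String record) {
    try {
      writer.write(record);
      return true;
    } catch (IOException e) {
      return false;
    } finally {
      closeQuietly(writer);
    }
  }
}
```

With the close in the finally block, the caller does not need its own 
per-stream close() calls on either the success or the failure path.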

bq. Dismantle retriveStartFinishData() into two methods - one for start and one 
for finish.

After refactoring the code, the start and finish cases share most of the code 
path, so I chose not to split them. Please see if you're fine with that. On 
the other hand, I split mergeHistoryData for the start and finish cases, as 
they share very little code.

bq. TestApplicationHistoryStore was renamed in YARN-956, please update the patch

It was up-to-date in the previous patch.

bq. Test: A single file will only have data about a single application. So 
testWriteHistoryData() should not have multiple applications. Similarly 
ApplicationAttempt finish to follow after container-finish.

The test case was probably a bit misleading. A single file contains exactly 
one application; however, I tested by creating 5 applications one after 
another. I've refactored the test code to make it more readable.

bq. Test: We should NOT have this dependency. Java 7 reorders tests in some 
cases.

Removed the order dependency.



> Add a file-system implementation for history-storage
> ----------------------------------------------------
>
>                 Key: YARN-975
>                 URL: https://issues.apache.org/jira/browse/YARN-975
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-975.10.patch, YARN-975.11.patch, YARN-975.1.patch, 
> YARN-975.2.patch, YARN-975.3.patch, YARN-975.4.patch, YARN-975.5.patch, 
> YARN-975.6.patch, YARN-975.7.patch, YARN-975.8.patch, YARN-975.9.patch
>
>
> HDFS implementation should be a standard persistence strategy of history 
> storage



--
This message was sent by Atlassian JIRA
(v6.1#6144)
