[ 
https://issues.apache.org/jira/browse/YARN-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847836#comment-15847836
 ] 

Daniel Templeton commented on YARN-6125:
----------------------------------------

Thanks for the patch.  Some comments:

* In the property, let's use "limit" instead of "capacity."  It sounds firmer. 
:)
* Instead of {{DEFAULT_APP_ATTEMPT_DIAGNOSTICS_CAPACITY_BYTES}}, I think 
{{DEFAULT_APP_ATTEMPT_DIAGNOSTICS_CAPACITY_KB}} might be better.  Then you can 
default to 64 instead of 65536.
* I can't comment on the POM change.  Anyone else want to comment?
* {{Lists.newLinkedList()}} is a Java 6 thing.  Just use the {{LinkedList}} 
constructor directly.
* I don't think we need the default constructor.
* You should do parameter validation in {{RMAppAttemptImpl}} before passing the 
capacity to the constructor.  You can offer a more informative error message 
that way.
* {{ensureNull()}} should probably be {{ensureNotNull()}}.
* I don't see any good reason to have {{inputLength}} be final.
* Your checks at the beginning of {{cutAtLeast()}}, {{checkAndCut()}}, and 
append will bring down the RM if they're violated.  That's bad.  You should 
probably log and ignore instead, probably at the level of the RM, instead of in 
the appender.  Except for {{checkAndCut()}}, where you should either trim down 
the new message to fit or allow the oversized message to be appended.  Probably 
the former.
* Your last append should probably just call the first append, and the first 
one should probably call the second one.
* Just leave out the javadoc on the overridden methods instead of stubbing it 
out.
* The description in {{yarn-default.xml}}, the description should maybe be 
"Defines the maximum capacity of the diagnostic message for each application 
attempt, in bytes.  When using ZooKeeper to store application state behavior, 
it's important to limit the size of the diagnostic messages to prevent YARN 
from overwhelming ZooKeeper.  In cases where 
yarn.resourcemanager.state-store.max-completed-applications is set to a large 
number, it may be desirable to reduce the value of this property to limit the 
total data stored."
* No need for the empty {{setup()}} in the test.
* {{initWithPositiveCapacitySuccess()}} should assert something. :)


> The application attempt's diagnostic message should have a maximum size
> -----------------------------------------------------------------------
>
>                 Key: YARN-6125
>                 URL: https://issues.apache.org/jira/browse/YARN-6125
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Daniel Templeton
>            Assignee: Andras Piros
>            Priority: Critical
>             Fix For: 3.0.0-alpha3
>
>         Attachments: YARN-6125.000.patch, YARN-6125.001.patch, 
> YARN-6125.002.patch
>
>
> We've found through experience that the diagnostic message can grow 
> unbounded.  I've seen attempts that have diagnostic messages over 1MB.  Since 
> the message is stored in the state store, it's a bad idea to allow the 
> message to grow unbounded.  Instead, there should be a property that sets a 
> maximum size on the message.
> I suspect that some of the ZK state store issues we've seen in the past were 
> due to the size of the diagnostic messages and not to the size of the 
> classpath, as is the current prevailing opinion.
> An open question is how best to prune the message once it grows too large.  
> Should we
> # truncate the tail,
> # truncate the head,
> # truncate the middle,
> # add another property to make the behavior selectable, or
> # none of the above?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to