[ https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581419#comment-16581419 ]

Jason Lowe commented on YARN-8242:
----------------------------------

Thanks for updating the patch!

bq. The problem/issue that I faced with that is that seeking/skipping to the 
next user entry in the localization state is complex, as we do not know who 
the next user is or how much information (key/values) is associated with a 
respective user without iterating.

Rather than a full re-iteration, we can seek to a key that we know is after a 
user's localization entries but necessarily before any other user's entry.  
Seeking is very fast and done all the time during recovery, so it would be much 
faster than iterating.  For example, userA's private localization entries will 
have a key prefix of "Localization/private/userA/" and have entries with a 
prefix of either "Localization/private/userA/filecache/" or 
"Localization/private/userA/appcache/".  If we seek to a key that occurs 
lexicographically after those prefixes, like "Localization/private/userA/zzz", 
then we will have an iterator starting after the localization records for userA 
but necessarily before any user that occurs after userA lexicographically.  
That avoids the double-iteration performance problem and does not rely on 
approaches that would require the previous user's iterator to be fully 
consumed in order to function properly.
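
For concreteness, a minimal sketch of that seek (assuming the LeveldbIterator 
wrapper and the JniDBFactory.bytes helper the state store already uses; the 
method name is illustrative, not from the patch):
{code}
  // Position the iterator just past all of this user's private localization
  // entries: "zzz" sorts after both the "appcache/" and "filecache/"
  // sub-prefixes but before the entries of any user that sorts after this one.
  private void seekPastUserEntries(LeveldbIterator iter, String user) {
    iter.seek(bytes("Localization/private/" + user + "/zzz"));
  }
{code}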

bq. So, reading LocalResourceTrackerState might require two different keys.

Yes, one way to solve that is to have two iterators for the two payloads, one 
for completed resources and one for started resources.  We know the prefix to 
seek to on each one, so they are easy to set up.
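
A rough sketch of that shape (the prefix variables below are placeholders for 
whatever completed/started key prefixes the tracker state actually uses):
{code}
  // Two independent iterators over the same tracker's records, each seeked
  // to the prefix for the payload it will return.
  LeveldbIterator completedIter = new LeveldbIterator(db);
  completedIter.seek(bytes(completedEntriesPrefix));
  LeveldbIterator startedIter = new LeveldbIterator(db);
  startedIter.seek(bytes(startedEntriesPrefix));
{code}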

It's a bit trickier to do the full iteration for localized resource state, but 
it should be possible.  I would be fine with punting that to a followup JIRA 
since this current work is still a significant improvement over the old method 
of loading everything at once.

Other comments on the patch:

getLeveldbIterator calls constructors and methods that can throw DBException, 
which is a runtime exception.  Those need to be caught and translated to 
IOException, as was done with iterators before this patch.
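
Something along these lines would preserve that behavior (the body and 
signature here are just a sketch, not the patch code):
{code}
  private LeveldbIterator getLeveldbIterator(String keyPrefix)
      throws IOException {
    try {
      LeveldbIterator it = new LeveldbIterator(db);
      it.seek(bytes(keyPrefix));
      return it;
    } catch (DBException e) {
      // DBException is a RuntimeException; surface it as a checked IOException
      // like the rest of the state store code does.
      throw new IOException(e);
    }
  }
{code}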

Some lines were reformatted to split else blocks onto separate lines and to 
remove spaces before opening braces, which is inconsistent with the coding 
style.  New methods and conditionals were added without whitespace between the 
parameter list and the opening brace.  Checkstyle currently passes without 
flagging these, otherwise I would expect it to complain.

typo: getConstainerStateIterator

Rather than redundantly re-parsing a container ID from the key, it would be 
cleaner and more intuitive to have RecoveredContainerState track the container 
ID.  RecoveredContainerState didn't need to explicitly track it before since it 
was always paired with a container ID in a map, but now that we're returning a 
series of objects via an iterator it makes sense to move that key into the 
value object, in this case the RecoveredContainerState.
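
A minimal sketch of that (only the new field and getter are shown; whether the 
ID is set via the constructor or a setter is an implementation detail, and the 
existing recovered fields stay as they are):
{code}
  public static class RecoveredContainerState {
    // Carry the container ID in the recovered state so callers of the
    // iterator do not have to re-parse it from the leveldb key.
    private final ContainerId containerId;

    RecoveredContainerState(ContainerId containerId) {
      this.containerId = containerId;
    }

    public ContainerId getContainerId() {
      return containerId;
    }
    // ...existing recovered fields (status, exit code, diagnostics, etc.)
  }
{code}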

This comment was not addressed, intentional?
bq. Nit: RCSIterator would be more readable as ContainerStateIterator, e.g.: 
getContainerStateIterator instead of getRCSIterator. Similar comments for the 
other acronym iterator classes.

getNextRecoveredLocalizationEntry implies it could be called for all types of 
localization entries but it only works for private resources.  The name should 
reflect that or it could simply be pulled into RURIterator#getNextItem directly.

getMasterKey is more complicated than it needs to be.  No iterator is needed, 
since we can look up keys in the database directly, e.g.:
{code}
  private MasterKey getMasterKey(String dbKey) throws IOException {
    byte[] data = db.get(bytes(dbKey));
    if (data == null || data.length == 0) {
      return null;
    }
    return parseMasterKey(data);
  }
{code}

The synchronization on the various load methods for the memory state store is 
a false promise of safety, as they return iterators that can access state 
asynchronously with other state store operations.  For real safety here they 
would need to return an iterator over a copy of the underlying state rather 
than an iterator over the state directly.  The leveldb store is async-safe, 
but the memory store is not.
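
For example, the memory store could hand back an iterator over a snapshot 
(field and method names here are illustrative, not the actual ones):
{code}
  public synchronized Iterator<RecoveredContainerState> getContainerStateIterator() {
    // Copy the current values so the returned iterator never observes
    // concurrent mutations of the live map.
    return new ArrayList<>(containerStates.values()).iterator();
  }
{code}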

Why does TestNMLeveldbStateStoreService#loadContainersState explicitly check 
for and skip recovered containers without a start request?  Isn't it the job of 
the iterator to not return those types of entries?



> YARN NM: OOM error while reading back the state store on recovery
> -----------------------------------------------------------------
>
>                 Key: YARN-8242
>                 URL: https://issues.apache.org/jira/browse/YARN-8242
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 2.6.0, 2.9.0, 2.6.5, 2.8.3, 3.1.0, 2.7.6, 3.0.2
>            Reporter: Kanwaljeet Sachdev
>            Assignee: Pradeep Ambati
>            Priority: Critical
>         Attachments: YARN-8242.001.patch, YARN-8242.002.patch, 
> YARN-8242.003.patch, YARN-8242.004.patch, YARN-8242.005.patch
>
>
> On startup the NM reads its state store and builds a list of applications in 
> the state store to process. If the number of applications in the state store 
> is large and they have a lot of "state" attached to them, the NM can run OOM 
> and never get to the point where it can start processing the recovery.
> Since it never starts the recovery, there is no way for the NM to ever pass 
> this point. It will require a change in heap size to get the NM started.
>  
> Following is the stack trace
> {code:java}
> at java.lang.OutOfMemoryError.<init> (OutOfMemoryError.java:48)
> at com.google.protobuf.ByteString.copyFrom (ByteString.java:192)
> at com.google.protobuf.CodedInputStream.readBytes (CodedInputStream.java:324)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47069)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47014)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47102)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47097)
> at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:41016)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:40942)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41080)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41075)
> at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24517)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24464)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24568)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24563)
> at com.google.protobuf.AbstractParser.parsePartialFrom (AbstractParser.java:141)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:176)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:188)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:193)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:49)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.parseFrom (YarnServiceProtos.java:24739)
> at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState (NMLeveldbStateStoreService.java:217)
> at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState (NMLeveldbStateStoreService.java:170)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover (ContainerManagerImpl.java:253)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit (ContainerManagerImpl.java:237)
> at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
> at org.apache.hadoop.service.CompositeService.serviceInit (CompositeService.java:107)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit (NodeManager.java:255)
> at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager (NodeManager.java:474)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main (NodeManager.java:521)
> {code}


