[ 
https://issues.apache.org/jira/browse/YARN-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928843#comment-15928843
 ] 

Jason Lowe commented on YARN-6354:
----------------------------------

Sample stacktrace:
{noformat}
2017-03-16 15:17:26,616 INFO  [main] service.AbstractService 
(AbstractService.java:noteFailure(272)) - Service ResourceManager failed in 
state STARTED; cause: java.lang.StringIndexOutOfBoundsException: String index 
out of range: -17
java.lang.StringIndexOutOfBoundsException: String index out of range: -17
        at java.lang.String.substring(String.java:1931)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore.loadReservationState(LeveldbRMStateStore.java:289)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore.loadState(LeveldbRMStateStore.java:274)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:690)
        at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1097)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1137)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1133)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1936)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1133)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1173)
        at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1338)
{noformat}

This was broken by YARN-3736.  The recovery code is seeking to the 
RM_RESERVATION_KEY_PREFIX but failing to verify that the keys it sees in the 
loop actually have that key prefix.  Here's the relevant code:
{code}
      iter = new LeveldbIterator(db);
      iter.seek(bytes(RM_RESERVATION_KEY_PREFIX));
      while (iter.hasNext()) {
        Entry<byte[],byte[]> entry = iter.next();
        String key = asString(entry.getKey());

        String planReservationString =
            key.substring(RM_RESERVATION_KEY_PREFIX.length());
        String[] parts = planReservationString.split(SEPARATOR);
        if (parts.length != 2) {
          LOG.warn("Incorrect reservation state key " + key);
          continue;
        }
{code}

The only way to terminate this loop is when the iterator runs out of keys, 
therefore the iteration loop will scan through *all* the keys in the database 
starting at the reservation key to the end.  If any key encountered is too 
short then we'll get the out of bounds exception when we try to do the 
substring.  

Pinging [~adhoot] and [~asuresh] who were involved in YARN-3736.

> RM fails to upgrade to 2.8 with leveldb state store
> ---------------------------------------------------
>
>                 Key: YARN-6354
>                 URL: https://issues.apache.org/jira/browse/YARN-6354
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Jason Lowe
>            Priority: Critical
>
> When trying to upgrade an RM to 2.8 it fails with a 
> StringIndexOutOfBoundsException trying to load reservation state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to