Jason Lowe updated YARN-1339:

    Attachment: YARN-1339v6.patch

Thanks for the review, Junping!

bq. Shall we add "if (stateStore.canRecover())" so that we only do record work 
when stateStore in levelDB?

Fixed. I added an early-out for this in the recordDeletionTaskInStateStore 

bq. Does it necessary for turning set into an array here? I only see 
iteratoring elements later. If so, it could be simpler to leave it as set.

Theoretically yes.  We need to return a copy of the successors because the set 
is protected by synchronization on the FileDeletionTask.  If we return it 
directly then we could corrupt it via concurrency.  In practice this probably 
doesn't occur, but for maintenance sake I played it on the safe side.  I don't 
think this is going to be a performance problem because in deletion tasks 
typically don't have very many successors.  I suppose we could use a 
ConcurrentMap instead of a HashSet and expose direct access to it, but that 
seemed like overkill -- there might be a lot of deletion tasks lingering around 
and concurrent maps are more memory-intensive.  Or we could use a 
SynchronizedSet and make sure the iterating code locks it appropriately.  Let 
me know if something needs to change here.

bq. May be we should quickly return from recover method if state is empty for 
cases that NM get first-time start or no scheduled fileDeletionTask during NM 

The method is already going to be pretty quick if there aren't any deletion 
tasks recovered.  In that case all it will do is allocate a hash map and hash 
set, iterate over the empty list of tasks, iterate over the empty hash map and 
return.  Given the recover method is only called on startup once and isn't that 
expensive in the nothing-to-do case, an early out seems like an unnecessary 
optimization -- or am I missing something?

bq. I think this is the same with basePaths.add(new Path(basedir)). Also I 
think some path.toUri().toString() can be replaced with path.toString() 

Fixed.  I vaguely recall having some issues using the raw paths on an early 
prototype of the code, but it seems to be working fine without explicit URI 

bq. May be we should throw exceptions in all three methods to keep consistent?

The point of the null state store is to silently eat attempts to store.  It 
only throws for recovery methods to note to developers that there is no way to 
recover from this state store.  If we put exceptions in all the store methods 
then we'll have to sprinkle canRecover() checks throughout the code which I'd 
rather avoid if possible.  NullRMStateStore behaves the same way as do the 
other store methods in the null state store, so in that sense it's consistent.

> Recover DeletionService state upon nodemanager restart
> ------------------------------------------------------
>                 Key: YARN-1339
>                 URL: https://issues.apache.org/jira/browse/YARN-1339
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1339.patch, YARN-1339v2.patch, 
> YARN-1339v3-and-YARN-1987.patch, YARN-1339v4.patch, YARN-1339v5.patch, 
> YARN-1339v6.patch

This message was sent by Atlassian JIRA

Reply via email to