[ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481362#comment-14481362
 ] 

Jason Lowe commented on YARN-3448:
----------------------------------

Thanks for the patch, Jonathan.  Interesting approach, and this should 
drastically improve performance for retention processing.  Some comments on the 
patch so far:

I think the code would be easier to follow if we didn't abuse Map.Entry as a 
pair class to associate a WriteBatch with the corresponding DB.  Creating a 
custom utility class to associate these would make the code a lot more readable 
than always needing to deduce that getKey() is a database and getValue() is a 
WriteBatch.
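
For example, a small holder class along these lines (just a sketch; the name 
and accessors are only a suggestion):
{code}
/** Associates a rolling leveldb instance with its pending write batch. */
private static class RollingWriteBatch {
  private final DB db;
  private final WriteBatch writeBatch;

  RollingWriteBatch(DB db, WriteBatch writeBatch) {
    this.db = db;
    this.writeBatch = writeBatch;
  }

  DB getDB() {
    return db;
  }

  WriteBatch getWriteBatch() {
    return writeBatch;
  }
}
{code}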

The underlying database throws runtime exceptions, and the existing leveldb 
store translates these to IOExceptions.  I think we want to do the same here.  
For example, put has a try..finally block with no catch clauses, yet the method 
is declared as not throwing exceptions like IOException.  Arguably it should 
throw IOException when the database has an error.
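
Roughly the pattern the existing store uses, e.g. (sketch only):
{code}
try {
  // ... leveldb reads/writes ...
} catch (DBException e) {
  // translate the leveldb runtime exception into a checked IOException
  throw new IOException(e);
} finally {
  IOUtils.cleanup(LOG, writeBatch);
}
{code}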

The original leveldb code had locking around entities but I don't see it here.  
Since updating entities often involves a read/modify/write operation on the 
database, are we sure it's OK to remove that synchronization?

computeCheckMillis says it needs to be called synchronously, but it looks like 
it can be called without a lock via a number of routes, e.g.:
put -> putIndex -> computeCurrentCheckMillis -> computeCheckMillis
put -> putEntities -> computeCurrentCheckMillis -> computeCheckMillis

These should probably be debug statements; otherwise I think they could be 
quite spammy in the server log.  Also, the latter one will always be followed 
by the former because of the loop, so it may not be that useful in practice, 
even at the debug level (see the sketch after the quoted lines).
{code}
+        LOG.info("Trying the  db" + db);
...
+        LOG.info("Trying the previous db" + db);
{code}
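
e.g. guarded and at debug level (sketch):
{code}
if (LOG.isDebugEnabled()) {
  LOG.debug("Trying db " + db);
}
{code}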

This will NPE on entityUpdate if db is null, even though the code explicitly 
checks for that possibility (a possible guard is sketched after the quoted 
code):
{code}
+      Map.Entry<DB, WriteBatch> entityUpdate = entityUpdates.get(roundedStartTime);
+      if (entityUpdate == null) {
+        DB db = entitydb.getDBForStartTime(startAndInsertTime.startTime);
+        if (db != null) {
+          WriteBatch writeBatch = db.createWriteBatch();
+          entityUpdate = new AbstractMap.SimpleImmutableEntry<DB, WriteBatch>(db, writeBatch);
+          entityUpdates.put(roundedStartTime, entityUpdate);
+        };
+      }
+      WriteBatch writeBatch = entityUpdate.getValue();
{code}
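
If db can legitimately be null here (e.g. the start time is older than any 
remaining rolling db), the null case needs to be handled before entityUpdate is 
dereferenced.  A sketch, keeping the Map.Entry form just to stay close to the 
patch:
{code}
Map.Entry<DB, WriteBatch> entityUpdate = entityUpdates.get(roundedStartTime);
if (entityUpdate == null) {
  DB db = entitydb.getDBForStartTime(startAndInsertTime.startTime);
  if (db == null) {
    // no rolling db covers this start time; skip (or record a TimelinePutError)
    return;
  }
  entityUpdate = new AbstractMap.SimpleImmutableEntry<DB, WriteBatch>(
      db, db.createWriteBatch());
  entityUpdates.put(roundedStartTime, entityUpdate);
}
WriteBatch writeBatch = entityUpdate.getValue();
{code}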

In the following code we look up relatedEntityUpdate but, after checking 
whether it is null, never use it again.  I think we're supposed to be setting 
up relatedEntityUpdate in that block when it is null rather than re-assigning 
entityUpdate.  Then, after the null check, we should be using 
relatedEntityUpdate rather than entityUpdate to get the proper write batch 
(see the sketch after the quoted code).
{code}
+            Map.Entry<DB, WriteBatch> relatedEntityUpdate = entityUpdates.get(relatedRoundedStartTime);
+            if (relatedEntityUpdate == null) {
+              DB db = entitydb.getDBForStartTime(relatedStartTimeLong);
+              if (db != null) {
+                WriteBatch relatedWriteBatch = db.createWriteBatch();
+                entityUpdate = new AbstractMap.SimpleImmutableEntry<DB, WriteBatch>(
+                    db, relatedWriteBatch);
+                entityUpdates.put(relatedRoundedStartTime, entityUpdate);
+              }
+              ;
+            }
+            WriteBatch relatedWriteBatch = entityUpdate.getValue();
{code}
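
i.e. something like (sketch; the db == null case would still need the same 
handling as noted above):
{code}
Map.Entry<DB, WriteBatch> relatedEntityUpdate =
    entityUpdates.get(relatedRoundedStartTime);
if (relatedEntityUpdate == null) {
  DB db = entitydb.getDBForStartTime(relatedStartTimeLong);
  if (db != null) {
    relatedEntityUpdate = new AbstractMap.SimpleImmutableEntry<DB, WriteBatch>(
        db, db.createWriteBatch());
    entityUpdates.put(relatedRoundedStartTime, relatedEntityUpdate);
  }
}
WriteBatch relatedWriteBatch = relatedEntityUpdate.getValue();
{code}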

This code is commented out.  Should it have been deleted, or is there something 
left to do here with respect to related entities?
{code}
+    /*
+    for (EntityIdentifier relatedEntity : relatedEntitiesWithoutStartTimes) {
+      try {
+        StartAndInsertTime relatedEntityStartAndInsertTime =
+            getAndSetStartTime(relatedEntity.getId(), relatedEntity.getType(),
+            readReverseOrderedLong(revStartTime, 0), null);
+        if (relatedEntityStartAndInsertTime == null) {
+          throw new IOException("Error setting start time for related entity");
+        }
+        byte[] relatedEntityStartTime = writeReverseOrderedLong(
+            relatedEntityStartAndInsertTime.startTime);
+          // This is the new entity, the domain should be the same
+        byte[] key = createDomainIdKey(relatedEntity.getId(),
+            relatedEntity.getType(), relatedEntityStartTime);
+        writeBatch.put(key, entity.getDomainId().getBytes());
+        ++putCount;
+        writeBatch.put(createRelatedEntityKey(relatedEntity.getId(),
+            relatedEntity.getType(), relatedEntityStartTime,
+            entity.getEntityId(), entity.getEntityType()), EMPTY_BYTES);
+        ++putCount;
+        writeBatch.put(createEntityMarkerKey(relatedEntity.getId(),
+            relatedEntity.getType(), relatedEntityStartTime),
+            writeReverseOrderedLong(relatedEntityStartAndInsertTime
+                .insertTime));
+        ++putCount;
+      } catch (IOException e) {
+        LOG.error("Error putting related entity " + relatedEntity.getId() +
+            " of type " + relatedEntity.getType() + " for entity " +
+            entity.getEntityId() + " of type " + entity.getEntityType(), e);
+        TimelinePutError error = new TimelinePutError();
+        error.setEntityId(entity.getEntityId());
+        error.setEntityType(entity.getEntityType());
+        error.setErrorCode(TimelinePutError.IO_EXCEPTION);
+        response.addError(error);
+      }
+    }
+    */
{code}

Similar NPE potential on indexUpdate if db is null:
{code}
+      Map.Entry<DB, WriteBatch> indexUpdate = indexUpdates.get(roundedStartTime);
+      if (indexUpdate == null) {
+        DB db = indexdb.getDBForStartTime(startAndInsertTime.startTime);
+        if (db != null) {
+          WriteBatch writeBatch = db.createWriteBatch();
+        indexUpdate = new AbstractMap.SimpleImmutableEntry<DB, WriteBatch>(db, writeBatch);
+          indexUpdates.put(roundedStartTime, indexUpdate);
+        };
+      }
+      WriteBatch writeBatch = indexUpdate.getValue();
{code}

getAndSetStartTime and checkStartTimeInDb comments refer to obtaining a lock on 
the entity but no entity-level locking appears to be used.

Nit: put and putWithNoDomainId should be refactored to call a parameterized 
form of the common code since they are so similar.  That would also help fix 
the inconsistency where one logs at debug and the other at info.
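
Something along these lines (names and parameters are only illustrative):
{code}
// shared implementation; pick one consistent log level here
private void putEntity(TimelineEntity entity, TimelinePutResponse response,
    boolean allowEmptyDomainId) {
  // ... common body ...
}

public void put(TimelineEntity entity, TimelinePutResponse response) {
  putEntity(entity, response, false);
}

public void putWithNoDomainId(TimelineEntity entity, TimelinePutResponse response) {
  putEntity(entity, response, true);
}
{code}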

Nit: there are some extraneous semicolons in the patch, either after braces or 
on lines by themselves.

> Add Rolling Time To Lives Level DB Plugin Capabilities
> ------------------------------------------------------
>
>                 Key: YARN-3448
>                 URL: https://issues.apache.org/jira/browse/YARN-3448
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: YARN-3448.1.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities one record at a time. An exclusive write lock is 
> held during the entire deletion phase, which in practice can be hours. If we 
> are to relax some of the consistency constraints, other performance-enhancing 
> techniques can be employed to maximize throughput and minimize locking time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize read cache effectiveness based on its unique usage pattern. With 5 
> separate databases, each lookup is much faster. It can also help with I/O to 
> have the entity and index databases on separate disks.
> Rolling DBs for the entity and index DBs. 99.9% of the data is in these two 
> sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We can 
> replace record-at-a-time DB removal with file system removal if we create a 
> rolling set of databases that age out and can be efficiently removed. To do 
> this we must place a constraint to always put an entity's events into its 
> correct rolling db instance based on start time. This allows us to stitch the 
> data back together while reading and to do artificial paging.
> Relax the synchronous write constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes, which can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes in a way that 
> trends towards sequential write performance rather than random write 
> performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
