narendly commented on a change in pull request #506: Increase parallelism for 
ZkBucketDataAccessor
URL: https://github.com/apache/helix/pull/506#discussion_r334762342
 
 

 ##########
 File path: 
helix-core/src/main/java/org/apache/helix/manager/zk/ZkBucketDataAccessor.java
 ##########
 @@ -85,103 +87,121 @@ public Object deserialize(byte[] data) throws 
ZkMarshallingError {
         return data;
       }
     });
-    _zkBaseDataAccessor = new ZkBaseDataAccessor(_zkClient);
-
-    // TODO: Optimize serialization with Jackson
-    // TODO: Or use a better binary serialization protocol
-    // TODO: Consider making this also binary
-    // TODO: Consider an async write for the metadata as well
-    _znRecordClient = SharedZkClientFactory.getInstance()
-        .buildZkClient(new HelixZkClient.ZkConnectionConfig(zkAddr));
-    _znRecordBaseDataAccessor = new ZkBaseDataAccessor<>(_znRecordClient);
-    _znRecordClient.setZkSerializer(new ZNRecordSerializer());
-
+    _zkBaseDataAccessor = new ZkBaseDataAccessor<>(_zkClient);
     _zkSerializer = new ZNRecordJacksonSerializer();
     _bucketSize = bucketSize;
-    _numVersions = numVersions;
+    _versionTTL = versionTTL;
   }
 
   /**
    * Constructor that uses a default bucket size.
    * @param zkAddr
    */
   public ZkBucketDataAccessor(String zkAddr) {
-    this(zkAddr, DEFAULT_BUCKET_SIZE, DEFAULT_NUM_VERSIONS);
+    this(zkAddr, DEFAULT_BUCKET_SIZE, DEFAULT_VERSION_TTL);
   }
 
   @Override
   public <T extends HelixProperty> boolean compressedBucketWrite(String path, 
T value)
       throws IOException {
-    // Take the ZNrecord and serialize it (get byte[])
+    final long timestamp = System.currentTimeMillis();
+
+    DataUpdater<byte[]> lastWriteVersionUpdater = dataInZk -> {
+      if (dataInZk == null || dataInZk.length == 0) {
+        // No last write version exists, so start with 0
+        return ("0_" + timestamp).getBytes();
 
 Review comment:
   It's possible if we consider cases where there could be multiple writers. Of 
course, in the case where there's one writer, using timestamp would work and 
would simplify things. But the purpose of this change is to increase 
parallelism, and that means we want to allow multiple writers (whether they are 
in the same data centers or not) as well.
   
   Even if we had multiple writers in the same data center, NTP usually doesn't 
guarantee strict ordering of timestamps either.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org

Reply via email to