RE: Use compression to store data in ZK
I like this idea, but we would still need to support bucketizing either way because we cannot guarantee that the compressed version will be compact enough for every use case. What types of compression schemes are you planning to support?

Date: Sat, 7 Mar 2015 22:30:15 -0800
Subject: Use compression to store data in ZK
From: g.kish...@gmail.com
To: dev@helix.apache.org

Hi,

Currently we have bucketing as one of the options when the number of partitions is large. We have a couple of bugs in the handling of bucketized resources (one of them is fatal). One of the reasons we split the znode is that we use JSON to store the data in the ZNode. While JSON is good for debugging, it is space-inefficient. A better option before going to bucketing is to support compression of the Ideal State, Current State and External View. This also gives good performance. I plan to add this support and make it configurable.

Feedback/suggestions welcome.

Thanks,
Kishore G
Review Request 31835: [HELIX-572] Fixing External View update logic for bucketized resource
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31835/ --- Review request for helix. Bugs: HELIX-572 Repository: helix-git Description --- commit 6aae15d77ce123b7dc83bc39fccd5c7c210bd972 Author: Kishore Gopalakrishna g.kish...@gmail.com Date: Sun Mar 8 14:20:08 2015 -0700 [HELIX-572] Fixing External View update logic for bucketized resource :100644 100644 169c993... 8c9fc8d... M helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java :100644 100644 207a318... 2b5e2bc... M helix-core/src/test/java/org/apache/helix/integration/TestBucketizedResource.java :100755 100755 82cbcf9... e869f25... M hpost-review.sh Diffs - helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java 169c993 helix-core/src/test/java/org/apache/helix/integration/TestBucketizedResource.java 207a318 hpost-review.sh 82cbcf9 Diff: https://reviews.apache.org/r/31835/diff/ Testing --- Thanks, Kishore Gopalakrishna
Re: Review Request 31832: [HELIX-572] Fixing External View update logic for bucketized resource
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31832/#review75644 --- Ship it! helix-core/src/test/java/org/apache/helix/integration/TestBucketizedResource.java https://reviews.apache.org/r/31832/#comment122854 Remove indent - Kanak Biscuitwala On March 8, 2015, 12:19 a.m., Kishore Gopalakrishna wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31832/ --- (Updated March 8, 2015, 12:19 a.m.) Review request for helix. Repository: helix-git Description --- commit 1cd1327f45d5bda9e6ee8d371353e43a65cae743 Author: Kishore Gopalakrishna g.kish...@gmail.com Date: Sat Mar 7 23:45:45 2015 -0800 [HELIX-572] Fixing External View update logic for bucketized resource :100644 100644 38c1417... 358971d... M helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java :100644 100644 7234658... f83d14e... M helix-core/src/test/java/org/apache/helix/integration/TestBucketizedResource.jav Diffs - helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java 38c1417 helix-core/src/test/java/org/apache/helix/integration/TestBucketizedResource.java 7234658 Diff: https://reviews.apache.org/r/31832/diff/ Testing --- Added the check for version of external view in the TestBucketizedResource integration test case Thanks, Kishore Gopalakrishna
Re: Use compression to store data in ZK
Yeah, we still need to support it, but we can go a long way without bucketing if we compress the data. We know we can support 1k partitions with raw JSON and no bucketing. By adding compression, we can probably go up to 10k partitions per resource without bucketing (this still needs to be validated).

I plan to use GZIP to compress/uncompress. Let me know if there is something better.

This is what I am planning to do. We have a common ZNRecordSerializer to serialize/deserialize the data. We can simply check for an enableCompression entry in the simpleFields and, if it is true, apply compression. On deserializing, we can check for the GZIP magic header and, if it matches, automatically decompress the data. The advantage of this is that we don't have to change the API of ZNRecordSerializer or how it is set in various places.

When a resource is created with compression turned on, we set enableCompression=true in the IdealState. This takes care of compressing the IdealState. We then have to copy this flag when creating the Current State and the External View. We should be able to carry it over to the External View, since the controller creates it. For the CurrentState it's not straightforward, since it is created by the participants and they don't read the IdealState. We can punt on the current state, hoping that the size of the current state is inversely proportional to the number of nodes in the system, and that if there is a large number of partitions, the number of nodes will also be large (this is not necessarily true). The other option is to set enableCompression=true the first time the CurrentState ZNode is created by the participant.

Let me know what you think.

On Sun, Mar 8, 2015 at 11:09 AM, Kanak Biscuitwala kana...@hotmail.com wrote:

I like this idea, but we would still need to support bucketizing either way because we cannot guarantee that the compressed version will be compact enough for every use case. What types of compression schemes are you planning to support?
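The serializer plan described in this message (GZIP on write, magic-header check on read) can be sketched roughly as follows. This is a minimal illustration, not the code from the HELIX-573 patch: the class name `GzipSketch` and its method names are hypothetical, and only `java.util.zip` from the JDK is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper (not part of Helix): GZIP on write, magic-header sniffing on read. */
final class GzipSketch {
  private GzipSketch() {}

  /** Compresses already-serialized bytes with GZIP. */
  static byte[] compress(byte[] raw) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
      gzip.write(raw);
    }
    return out.toByteArray();
  }

  /** Returns true if the payload starts with the two GZIP magic bytes 0x1f 0x8b. */
  static boolean isGzip(byte[] bytes) {
    if (bytes == null || bytes.length < 2) {
      return false;
    }
    // GZIP_MAGIC is stored little-endian: low byte first.
    int header = (bytes[0] & 0xff) | ((bytes[1] & 0xff) << 8);
    return header == GZIPInputStream.GZIP_MAGIC;
  }

  /** Decompresses only when the magic header is present; otherwise returns the input unchanged. */
  static byte[] decompressIfNeeded(byte[] bytes) throws IOException {
    if (!isGzip(bytes)) {
      return bytes;
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
      byte[] buffer = new byte[4096];
      int n;
      while ((n = gzip.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
    }
    return out.toByteArray();
  }
}
```

Because a JSON object starts with `{` (0x7b) rather than 0x1f, uncompressed records pass through `decompressIfNeeded` untouched, which is what lets old and new znodes coexist without an API change.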
Review Request 31832: [HELIX-572] Fixing External View update logic for bucketized resource
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31832/ --- Review request for helix. Repository: helix-git Description --- commit 1cd1327f45d5bda9e6ee8d371353e43a65cae743 Author: Kishore Gopalakrishna g.kish...@gmail.com Date: Sat Mar 7 23:45:45 2015 -0800 [HELIX-572] Fixing External View update logic for bucketized resource :100644 100644 38c1417... 358971d... M helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java :100644 100644 7234658... f83d14e... M helix-core/src/test/java/org/apache/helix/integration/TestBucketizedResource.jav Diffs - helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java 38c1417 helix-core/src/test/java/org/apache/helix/integration/TestBucketizedResource.java 7234658 Diff: https://reviews.apache.org/r/31832/diff/ Testing --- Added the check for version of external view in the TestBucketizedResource integration test case Thanks, Kishore Gopalakrishna
[jira] [Created] (HELIX-573) Add support to compress/uncompress data on ZK
kishore gopalakrishna created HELIX-573:
---

Summary: Add support to compress/uncompress data on ZK
Key: HELIX-573
URL: https://issues.apache.org/jira/browse/HELIX-573
Project: Apache Helix
Issue Type: Improvement
Reporter: kishore gopalakrishna

Currently we have bucketing as one of the options when the number of partitions is large. We have a couple of bugs in the handling of bucketized resources (one of them is fatal). One of the reasons we split the znode is that we use JSON to store the data in the ZNode. While JSON is good for debugging, it is space-inefficient. A better option before going to bucketing is to support compression of the Ideal State, Current State and External View. This also gives good performance.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HELIX-573) Add support to compress/uncompress data on ZK
[ https://issues.apache.org/jira/browse/HELIX-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352346#comment-14352346 ]

kishore gopalakrishna commented on HELIX-573:
---

From Kanak's email: I like this idea, but we would still need to support bucketizing either way because we cannot guarantee that the compressed version will be compact enough for every use case. What types of compression schemes are you planning to support?

Add support to compress/uncompress data on ZK
---
Key: HELIX-573
URL: https://issues.apache.org/jira/browse/HELIX-573
Project: Apache Helix
Issue Type: Improvement
Reporter: kishore gopalakrishna
Assignee: kishore gopalakrishna
Review Request 31836: [HELIX-573] Add support to automatically compress/uncompress data in Zookeeper
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31836/ --- Review request for helix. Repository: helix-git Description --- commit 1ef44c2e9a132df3513a51e3a8dac658236a2263 Author: Kishore Gopalakrishna g.kish...@gmail.com Date: Sun Mar 8 16:40:29 2015 -0700 [HELIX-573] Add support to automatically compress/uncompress data in Zookeeper :100644 100644 4419fdd... 1f34529... M helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordSerializer.java :100644 100644 2d7cb3c... 26d7e2b... M helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordStreamingSerializer.java :00 100644 000... 90c1e8e... A helix-core/src/test/java/org/apache/helix/manager/zk/TestZNRecordSerializer.java :100644 100644 e4b0b25... 95064f8... M helix-core/src/test/java/org/apache/helix/manager/zk/TestZNRecordStreamingSerializer.java :100755 100755 e869f25... 99ef81c... M hpost-review.sh Diffs - helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java 169c993 helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordSerializer.java 4419fdd helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordStreamingSerializer.java 2d7cb3c helix-core/src/test/java/org/apache/helix/integration/TestBucketizedResource.java 207a318 helix-core/src/test/java/org/apache/helix/manager/zk/TestZNRecordSerializer.java PRE-CREATION helix-core/src/test/java/org/apache/helix/manager/zk/TestZNRecordStreamingSerializer.java e4b0b25 hpost-review.sh 82cbcf9 Diff: https://reviews.apache.org/r/31836/diff/ Testing --- Added basic test for compress/uncompress Thanks, Kishore Gopalakrishna
Re: Review Request 31836: [HELIX-573] Add support to automatically compress/uncompress data in Zookeeper
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31836/#review75661 ---

helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordSerializer.java
https://reviews.apache.org/r/31836/#comment122869
As currently written, compression will be the default because valueOf returns true if the string is null. I'm not sure if you want that; at the very least, it should be made explicit, i.e.:
```
if (record.getBooleanField(enableCompression, defaultValue)) {
```

helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordSerializer.java
https://reviews.apache.org/r/31836/#comment122870
Lines 91 and 92 can be combined into 1 line.

helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordSerializer.java
https://reviews.apache.org/r/31836/#comment122868
This log message won't be very useful if the bytes are compressed.

helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordStreamingSerializer.java
https://reviews.apache.org/r/31836/#comment122876
Same comment as above. Use getBooleanField. Also, consider putting this code in a common area since it's copy-pasted from the other serializer class.

helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordStreamingSerializer.java
https://reviews.apache.org/r/31836/#comment122877
Same comment about code duplication.

helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordStreamingSerializer.java
https://reviews.apache.org/r/31836/#comment122878
This method too.

- Kanak Biscuitwala

On March 8, 2015, 4:48 p.m., Kishore Gopalakrishna wrote:

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31836/ --- (Updated March 8, 2015, 4:48 p.m.) Review request for helix. Repository: helix-git Description --- commit 1ef44c2e9a132df3513a51e3a8dac658236a2263 Author: Kishore Gopalakrishna g.kish...@gmail.com Date: Sun Mar 8 16:40:29 2015 -0700 [HELIX-573] Add support to automatically compress/uncompress data in Zookeeper :100644 100644 4419fdd...
1f34529... M helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordSerializer.java :100644 100644 2d7cb3c... 26d7e2b... M helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordStreamingSerializer.java :00 100644 000... 90c1e8e... A helix-core/src/test/java/org/apache/helix/manager/zk/TestZNRecordSerializer.java :100644 100644 e4b0b25... 95064f8... M helix-core/src/test/java/org/apache/helix/manager/zk/TestZNRecordStreamingSerializer.java :100755 100755 e869f25... 99ef81c... M hpost-review.sh Diffs - helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java 169c993 helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordSerializer.java 4419fdd helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordStreamingSerializer.java 2d7cb3c helix-core/src/test/java/org/apache/helix/integration/TestBucketizedResource.java 207a318 helix-core/src/test/java/org/apache/helix/manager/zk/TestZNRecordSerializer.java PRE-CREATION helix-core/src/test/java/org/apache/helix/manager/zk/TestZNRecordStreamingSerializer.java e4b0b25 hpost-review.sh 82cbcf9 Diff: https://reviews.apache.org/r/31836/diff/ Testing --- Added basic test for compress/uncompress Thanks, Kishore Gopalakrishna
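Two of the review comments above (make the default explicit, and share the logic between the two serializers) could be addressed with something along these lines. This is only a sketch under assumptions: `CompressionFlag` and `isEnabled` are hypothetical names, and a plain `Map<String, String>` stands in for the record's simpleFields so the example has no Helix dependency. For reference, `java.lang.Boolean.valueOf(String)` itself returns false for a null argument, so which default the patch ends up with depends on the exact call it uses; passing the default explicitly sidesteps the question.

```java
import java.util.Map;

/** Hypothetical shared helper so both serializers read the flag the same way. */
final class CompressionFlag {
  /** Field name taken from the mailing-list proposal. */
  static final String ENABLE_COMPRESSION = "enableCompression";

  private CompressionFlag() {}

  /**
   * Returns the flag from simpleFields, or defaultValue when the field is absent.
   * The caller states the default explicitly instead of relying on how null is parsed.
   */
  static boolean isEnabled(Map<String, String> simpleFields, boolean defaultValue) {
    String value = simpleFields == null ? null : simpleFields.get(ENABLE_COMPRESSION);
    if (value == null) {
      return defaultValue;
    }
    return Boolean.parseBoolean(value.trim());
  }
}
```

Both serializer classes could then call `CompressionFlag.isEnabled(fields, false)`, which keeps compression opt-in and removes the copy-pasted check.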