[jira] [Commented] (CASSANDRA-1735) Using MessagePack for reducing data size

2011-08-14 Thread Parlo Mendez (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084871#comment-13084871
 ] 

Parlo Mendez commented on CASSANDRA-1735:
-

The last post was some time ago. What is the current status of the MessagePack 
implementation in Cassandra? I think it would be very nice to have.

Parlo

 Using MessagePack for reducing data size
 

 Key: CASSANDRA-1735
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1735
 Project: Cassandra
  Issue Type: New Feature
  Components: API
Affects Versions: 0.7 beta 3
 Environment: Fedora11,  JDK1.6.0_20
Reporter: Muga Nishizawa
 Attachments: 
 0001-implement-a-Cassandra-RPC-part-with-MessagePack.patch, 
 dependency_libs.zip


 To improve Cassandra performance, I implemented the Cassandra RPC layer with 
 MessagePack.  The implementation details are attached as a patch.  The patch 
 works on Cassandra 0.7.0-beta3.  Please check it.  
 MessagePack is a cross-language object serialization library like 
 Thrift and Protocol Buffers, but it is much faster, smaller, and easier to 
 implement.  MessagePack reduces serialization cost and data size on the 
 network and on disk.  
 MessagePack resources:
 * website: http://msgpack.org/
   This website compares MessagePack, Thrift and JSON.  
 * design details: 
 http://redmine.msgpack.org/projects/msgpack/wiki/FormatDesign
 * source code: https://github.com/msgpack/msgpack/
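
To make the size claim concrete, here is a minimal sketch of a hand-written encoder for a very small subset of the MessagePack format (booleans, integers 0..127, strings up to 31 bytes, maps up to 15 entries). It is not the msgpack library and not part of the attached patch; the function name `pack` and the sample object are chosen only for illustration.

```python
import json


def pack(obj):
    """Encode a tiny subset of MessagePack (illustrative only)."""
    # Booleans are checked first because bool is a subclass of int in Python.
    if obj is True:
        return b"\xc3"
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        # Positive fixint: the value itself is the single encoded byte.
        return bytes([obj])
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            # Short string: one header byte 0xa0 | length, then the bytes.
            return bytes([0xA0 | len(raw)]) + raw
    if isinstance(obj, dict) and len(obj) <= 15:
        # Small map: one header byte 0x80 | entry count, then key/value pairs.
        out = bytes([0x80 | len(obj)])
        for key, value in obj.items():
            out += pack(key) + pack(value)
        return out
    raise ValueError("type or size not covered by this sketch: %r" % (obj,))


sample = {"compact": True, "schema": 0}
packed = pack(sample)
json_bytes = json.dumps(sample, separators=(",", ":")).encode("utf-8")

print(len(packed), "bytes as MessagePack:", packed.hex())
print(len(json_bytes), "bytes as compact JSON")
```

For this sample the MessagePack form is 18 bytes against 27 bytes of compact JSON, because type and length share one header byte and field names carry no quotes or separators.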
 The performance of the data serialization library is one of the most important 
 issues when developing a distributed database in Java.  If serialization is 
 slow, it significantly reduces overall database performance, and Java's GC 
 also runs more often.  Cassandra has this problem as well.  
 To reduce the data size on the network between a client and Cassandra, I 
 prototyped an implementation of the Cassandra RPC layer with MessagePack and 
 MessagePack-RPC.  The implementation is very simple.  MessagePack-RPC can 
 reuse the existing Thrift-based CassandraServer 
 (org.apache.cassandra.thrift.CassandraServer)
 while adopting MessagePack's communication protocol and data serialization.  
 Major features of MessagePack-RPC are 
 * Asynchronous RPC
 * Parallel Pipelining
 * Connection pooling
 * Delayed return
 * Event-driven I/O
 * more details: http://redmine.msgpack.org/projects/msgpack/wiki/RPCDesign
 * source code: https://github.com/msgpack/msgpack-rpc/
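
As an illustration of how asynchronous calls and pipelining fit together: MessagePack-RPC frames a request as the array [type, msgid, method, params] and a response as [type, msgid, error, result], so a reply can be matched to its call by msgid even when replies arrive out of order. The sketch below models only that bookkeeping with plain Python lists. It does no serialization or network I/O, and the class name `PendingCalls` and the method names "get" and "insert" are hypothetical, not taken from the patch.

```python
REQUEST, RESPONSE = 0, 1  # message type tags used by MessagePack-RPC


def make_request(msgid, method, params):
    """Build a request message: [type, msgid, method, params]."""
    return [REQUEST, msgid, method, params]


def make_response(msgid, error, result):
    """Build a response message: [type, msgid, error, result]."""
    return [RESPONSE, msgid, error, result]


class PendingCalls:
    """Tracks in-flight calls so several requests can share one connection."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}  # msgid -> method name of the outstanding call

    def call(self, method, params):
        msgid = self._next_id
        self._next_id += 1
        self._pending[msgid] = method
        return make_request(msgid, method, params)

    def on_response(self, message):
        kind, msgid, error, result = message
        if kind != RESPONSE:
            raise ValueError("not a response message")
        method = self._pending.pop(msgid)  # KeyError if msgid is unknown
        if error is not None:
            raise RuntimeError("%s failed: %s" % (method, error))
        return method, result


calls = PendingCalls()
first = calls.call("get", ["key1"])               # sent without waiting
second = calls.call("insert", ["key2", "value"])  # pipelined behind it

# The server may answer the second call before the first one.
late_first = calls.on_response(make_response(1, None, "ok"))
then_second = calls.on_response(make_response(0, None, "value1"))
print(late_first, then_second)
```

Because the msgid, not the arrival order, identifies the call, the client never has to block on one request before sending the next.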
 The attached patch includes a ring cache program for MessagePack and its test 
 program.  
 You can check the behavior of the Cassandra RPC with MessagePack.  
 Thanks in advance, 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (CASSANDRA-1735) Using MessagePack for reducing data size

2011-01-30 Thread Muga Nishizawa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988655#comment-12988655
 ] 

Muga Nishizawa commented on CASSANDRA-1735:
---

Hi T Jake Luciani,

I would like to let you know that we have resolved the license issues with 
MessagePack.

As you pointed out earlier, MessagePack used to require XNIO (LGPL) for network 
communication.  We have replaced XNIO with Apache MINA (Apache License) in 
MessagePack.  Javassist, which was the other issue, is dual-licensed (LGPL and 
MPL), and other Apache products use it under the MPL.  

So we believe that the license-related issues are resolved for now.

Please check the URLs below for more details.  
https://github.com/msgpack/msgpack/
https://github.com/msgpack/msgpack-rpc/ 





[jira] Commented: (CASSANDRA-1735) Using MessagePack for reducing data size

2010-11-29 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12964798#action_12964798
 ] 

T Jake Luciani commented on CASSANDRA-1735:
---

It appears msgpack requires Javassist and XNIO, both of which are LGPL.

This means we can't include msgpack support in our distribution; see 
http://www.apache.org/legal/3party.html


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1735) Using MessagePack for reducing data size

2010-11-22 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934520#action_12934520
 ] 

Jonathan Ellis commented on CASSANDRA-1735:
---

Gary wrote some performance tests in CASSANDRA-1765 and saw MessagePack 
performance worse than Thrift.  Is something wrong with his code?




[jira] Commented: (CASSANDRA-1735) Using MessagePack for reducing data size

2010-11-16 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932756#action_12932756
 ] 

Terje Marthinussen commented on CASSANDRA-1735:
---

I am very curious how the serialization in MessagePack would compare with the 
serialization used on the data side of Cassandra (SSTables), and how we could 
benefit from having the same serialization in both places.

Does anyone have any thoughts?






[jira] Commented: (CASSANDRA-1735) Using MessagePack for reducing data size

2010-11-16 Thread Muga Nishizawa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932822#action_12932822
 ] 

Muga Nishizawa commented on CASSANDRA-1735:
---

Jonathan,

Thanks for your response.

> What kind of performance improvement do you see with this patch?

This patch offers the following performance improvements:
* Reduced serialization cost and data size
* Increased throughput between clients and a Cassandra node

I have measured the performance of MessagePack from both viewpoints, 
serialization cost and throughput.  The details are below.

== Reduction of serialization cost and the data size ==

(Summary)
In the test below, MessagePack reduced serialization cost and data size more 
than the other serialization libraries did.  

(Test environment)
I used jvm-serializers, a well-known benchmark, and compared MessagePack with 
Protocol Buffers, Thrift, and Avro.  The machine used for this benchmark has a 
2GHz Core2 Duo and 1GB of RAM.

(Results)
           create    ser  +same  deser  +shal  +deep  total  size  +dfl
protobuf      683   6016   2973   3338   3454   3759   9775   239   149
thrift        572   6287   5565   3479   3616   3770  10057   349   197
msgpack       291   4935   4750   3468   3545   3708   8748   236   150
avro         2698   6409   3623   7480   9301  10481  16890   221   133

(Comments)
It might be better to compare serialization cost using Cassandra objects such 
as a Column object.  But such objects and their sizes vary by user, so they are 
not suitable for a general comparison of serialization cost.  According to the 
above results, MessagePack's serialized data is slightly larger than Avro's.  
But MessagePack has a significantly lower serialization cost than Avro and 
Thrift.  
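
To put numbers on that comparison, the following sketch computes how much lower MessagePack's figures are than each other library's, using the ser, +deep, and size columns from the results above. The dictionary layout and the helper name `percent_lower` are only for illustration.

```python
# Values copied from the jvm-serializers results above:
# (ser, +deep deserialization, serialized size)
results = {
    "protobuf": (6016, 3759, 239),
    "thrift": (6287, 3770, 349),
    "msgpack": (4935, 3708, 236),
    "avro": (6409, 10481, 221),
}


def percent_lower(baseline, value):
    """How much lower `value` is than `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


msgpack_ser, msgpack_deep, msgpack_size = results["msgpack"]
savings = {}
for name, (ser, deep, size) in results.items():
    if name == "msgpack":
        continue
    savings[name] = (
        round(percent_lower(ser, msgpack_ser), 1),
        round(percent_lower(deep, msgpack_deep), 1),
        round(percent_lower(size, msgpack_size), 1),
    )
    print(name, "-> ser %.1f%%, deser(+deep) %.1f%%, size %.1f%%" % savings[name])
```

A negative size figure for Avro means MessagePack's output is larger there, which matches the comment above.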

== Increasing throughput ==

(Summary)
I compared the MessagePack-based RPC of Cassandra with the Thrift-based one.  
Random read throughput of the MessagePack-based RPC is 15% higher than that of 
Thrift, and random write throughput is 21% higher.  

(Test environment)
In this evaluation, a single Cassandra node ran standalone on a machine with a 
2GHz Core2 Duo and 1GB of RAM.  Client programs ran on two machines, each with 
a 2GHz Core2 Duo and 1GB of RAM.  The client program was based on the ring 
cache.  It created 100 threads per JVM on each machine and accessed the 
Cassandra node through the ring cache.  

(Results)
* Thrift based RPC part of Cassandra
  * Random read: 5,200 query/sec.
  * Random write: 11,200 query/sec.
* MessagePack based RPC part of Cassandra
  * Random read: 6,000 query/sec.
  * Random write: 13,600 query/sec.
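
As a quick check of the percentages quoted in the summary, this sketch recomputes the relative gains from the query rates listed above. The function name `gain_percent` is only for illustration.

```python
# Query rates (queries per second) from the results above.
thrift = {"random read": 5200, "random write": 11200}
msgpack = {"random read": 6000, "random write": 13600}


def gain_percent(before, after):
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


gains = {op: gain_percent(thrift[op], msgpack[op]) for op in thrift}
for op, gain in gains.items():
    print("%s: +%.1f%%" % (op, gain))
```

This gives roughly +15.4% for random reads and +21.4% for random writes, in line with the 15% and 21% stated in the summary.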

(Comments)
I measured the maximum throughput of random access (read/write) after 100 small 
items had been stored in the Cassandra node.  I did this so that the Cassandra 
node would be CPU-bound.  If the node were bound by disk I/O, I could not 
properly evaluate the maximum throughput of the RPC layer.  

I did not directly measure the amount of data transferred over the network 
during the evaluation.  But based on the jvm-serializers results, I believe 
that MessagePack-based Cassandra would transfer less data than the Thrift-based 
version.  



[jira] Commented: (CASSANDRA-1735) Using MessagePack for reducing data size

2010-11-14 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931844#action_12931844
 ] 

Jonathan Ellis commented on CASSANDRA-1735:
---

Thanks, this is exciting!

What kind of performance improvement do you see with this patch?
