Re: Memory footprint of Calliope: Spark -> Cassandra writes

2014-06-17 Thread tj opensource
Gerard, we haven't benchmarked Calliope against a driver. Calliope builds on C* Thrift (and the latest build on the DS driver), so its performance for simple writes will be similar to that of any existing driver. But that is not the use case for Calliope. It is built to be used from Spark

Re: Memory footprint of Calliope: Spark -> Cassandra writes

2014-06-17 Thread Andrew Ash
Gerard, Strings in particular are very inefficient because they're stored in a two-byte format by the JVM. If you use the Kryo serializer together with StorageLevel.MEMORY_ONLY_SER, then Kryo stores Strings in UTF-8, which for ASCII-like strings will take half the space. Andrew On Tue, Jun 17, 20
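
To make the "half the space" point concrete, here is a rough sketch in Python (not JVM or Spark code) that compares the encoded size of a string in a two-byte-per-character format (UTF-16) with its size in UTF-8. The sample record is made up for illustration, and the sketch only measures raw encoded bytes; it does not account for JVM object headers or Kryo's own framing.

```python
# Illustration only: compares raw encoded sizes, not actual JVM heap usage.

def utf16_bytes(s: str) -> int:
    """Size of s in a two-byte format (UTF-16, big-endian, no BOM)."""
    return len(s.encode("utf-16-be"))

def utf8_bytes(s: str) -> int:
    """Size of s when encoded as UTF-8."""
    return len(s.encode("utf-8"))

# Hypothetical ASCII-only record, invented for this example.
sample = "sensor-0001,2014-06-17T12:00:00Z,42.5"

if __name__ == "__main__":
    print("UTF-16 bytes:", utf16_bytes(sample))
    print("UTF-8 bytes: ", utf8_bytes(sample))
    print("ratio:", utf16_bytes(sample) / utf8_bytes(sample))
```

For ASCII-only input the ratio is exactly 2; strings with many non-ASCII characters save less, since those characters need two or more bytes in UTF-8 as well.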

Re: Memory footprint of Calliope: Spark -> Cassandra writes

2014-06-17 Thread Gerard Maas
Hi Rohit, Thanks a lot for looking at this. The intention of calculating the data upfront is to benchmark only the time it takes to store the records (in records/sec), eliminating the generation factor from it (which will be different in the real scenario, reading from HDFS). I used a profiler today and indeed it's

Memory footprint of Calliope: Spark -> Cassandra writes

2014-06-16 Thread Gerard Maas
Hi, I've been doing some testing with Calliope as a way to do batch loads from Spark into Cassandra. My initial results are promising on the performance side, but worrisome on the memory footprint side. I'm generating N records of about 50 bytes each and using the UPDATE mutator to insert them int