When the data is streamed into the cluster by the bulk loader it is compressed 
on the receiving end (if the target CF has compression enabled).
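
Until that is resolved, the rebuildsstables workaround described below can be 
scripted across the cluster. A rough sketch (the hostnames, keyspace, and 
column family names here are all assumptions, not from this thread):

```shell
# Sketch only: node list, keyspace, and CF names below are hypothetical.
KEYSPACE="MyKeyspace"        # hypothetical keyspace name
CF="MyColumnFamily"         # hypothetical column family name
NODES="node1 node2 node3"   # hypothetical node hostnames

# rebuildsstables rewrites each node's existing SSTables, applying the CF's
# current compression settings. It must run on every node; doing one node at
# a time limits the extra I/O load on the cluster.
for host in $NODES; do
  echo "nodetool -h $host rebuildsstables $KEYSPACE $CF"   # drop echo to run
done
```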

If you are able to reproduce this, can you create a ticket at 
https://issues.apache.org/jira/browse/CASSANDRA ? 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/06/2012, at 10:00 PM, Andy Cobley wrote:

> My (limited) experience of moving from 0.8 to 1.0 is that you do have to use 
> rebuildsstables. I'm guessing BulkLoading is bypassing the compression?
> 
> Andy
> 
> On 28 Jun 2012, at 10:53, jmodha wrote:
> 
>> Hi,
>> 
>> We are migrating our Cassandra cluster from v1.0.3 to v1.1.1; the data is
>> migrated using SSTableLoader into an empty Cassandra cluster.
>> 
>> The data in the source cluster (v1.0.3) is uncompressed and the target
>> cluster (1.1.1) has the column family created with compression turned on.
>> 
>> What we are seeing is that once the data has been loaded into the target
>> cluster, the size is similar to the data in the source cluster. Our
>> expectation is that since we have turned on compression in the target
>> cluster, the amount of data would be reduced.
>> 
>> We have tried running the "rebuildsstables" nodetool command on a node after
>> data has been loaded and we do indeed see a huge reduction in size e.g. from
>> 30GB to 10GB for a given column family. We were hoping to see this at the
>> point of loading the data in via the SSTableLoader.
>> 
>> Is this behaviour expected? 
>> 
>> Do we need to run the rebuildsstables command on all nodes to actually
>> compress the data after it has been streamed in?
>> 
>> Thanks.
>> 
>> --
>> View this message in context: 
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/BulkLoading-SSTables-and-compression-tp7580849.html
>> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
>> Nabble.com.
> 
> 
> The University of Dundee is a Scottish Registered Charity, No. SC015096.
> 
> 
