Re: 0.8.1 Vs 1.0.7

2012-03-19 Thread Chris Goffinet
When creating a new CF, compression is now in fact enabled by default.







Re: 0.8.1 Vs 1.0.7

2012-03-17 Thread R. Verlangen
Check your log for messages about rebuilding indices: that might grow your
dataset somewhat.

One thing is for sure: the data import removed all the crap that had
accumulated in the 0.8.1 cluster (duplicates, tombstones, etc.). The decrease
is fairly dramatic but not illogical at all.




RE: 0.8.1 Vs 1.0.7

2012-03-16 Thread Jeremiah Jordan
I would guess more aggressive compaction settings. Did you update rows or
insert some twice?
If you run a major compaction a couple of times on the 0.8.1 cluster, does the
data size get smaller?

You can use the describe command to check if compression got turned on.

-Jeremiah


From: Ravikumar Govindarajan [ravikumar.govindara...@gmail.com]
Sent: Thursday, March 15, 2012 4:41 AM
To: user@cassandra.apache.org
Subject: 0.8.1 Vs 1.0.7

Hi,

I ran some data import tests for Cassandra 0.8.1 and 1.0.7. The results were a
little bit surprising:

0.8.1, SimpleStrategy, Rep_Factor=3, QUORUM writes, RP, SimpleSnitch

XXX.XXX.XXX.A  datacenter1 rack1   Up Normal  140.61 GB   12.50%
XXX.XXX.XXX.B  datacenter1 rack1   Up Normal  139.92 GB   12.50%
XXX.XXX.XXX.C  datacenter1 rack1   Up Normal  138.81 GB   12.50%
XXX.XXX.XXX.D  datacenter1 rack1   Up Normal  139.78 GB   12.50%
XXX.XXX.XXX.E  datacenter1 rack1   Up Normal  137.44 GB   12.50%
XXX.XXX.XXX.F  datacenter1 rack1   Up Normal  138.48 GB   12.50%
XXX.XXX.XXX.G  datacenter1 rack1   Up Normal  140.52 GB   12.50%
XXX.XXX.XXX.H  datacenter1 rack1   Up Normal  145.24 GB   12.50%

1.0.7, NTS, Rep_Factor{DC1:3, DC2:2}, LOCAL_QUORUM writes, RP, PropertyFileSnitch
[DC2 machines yet to join ring]

XXX.XXX.XXX.A  DC1 RAC1   Up Normal  48.72 GB   12.50%
XXX.XXX.XXX.B  DC1 RAC1   Up Normal  51.23 GB   12.50%
XXX.XXX.XXX.C  DC1 RAC1   Up Normal  52.4 GB    12.50%
XXX.XXX.XXX.D  DC1 RAC1   Up Normal  49.64 GB   12.50%
XXX.XXX.XXX.E  DC1 RAC1   Up Normal  48.5 GB    12.50%
XXX.XXX.XXX.F  DC1 RAC1   Up Normal  53.38 GB   12.50%
XXX.XXX.XXX.G  DC1 RAC1   Up Normal  51.11 GB   12.50%
XXX.XXX.XXX.H  DC1 RAC1   Up Normal  53.36 GB   12.50%

There seems to be a roughly 3X savings in size for the same dataset running on
1.0.7. I have not enabled compression for any of the CFs. Will it be enabled by
default when creating a new CF in 1.0.7? cassandra.yaml is also mostly identical.

Thanks and Regards,
Ravi
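
As a quick sanity check on the "3X" figure quoted in this thread, the per-node
loads from the two ring listings can simply be summed and divided. This is only
a sketch: the variable names are made up for illustration, and it assumes the
two listings are directly comparable (same dataset, three replicas in the one
datacenter that is up in both cases).

```python
# Per-node load in GB, copied from the two ring listings in this thread.
load_081 = [140.61, 139.92, 138.81, 139.78, 137.44, 138.48, 140.52, 145.24]
load_107 = [48.72, 51.23, 52.4, 49.64, 48.5, 53.38, 51.11, 53.36]

# Cluster-wide totals and the size ratio between the two versions.
total_081 = sum(load_081)
total_107 = sum(load_107)
ratio = total_081 / total_107

print(f"0.8.1 total: {total_081:.2f} GB")
print(f"1.0.7 total: {total_107:.2f} GB")
print(f"ratio: {ratio:.2f}x")
```

By these numbers the 1.0.7 cluster holds somewhat under a third of the 0.8.1
data, closer to 2.7x than a full 3x. The arithmetic says nothing about whether
compression, compaction, or dropped duplicates and tombstones account for the
difference.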