It seems to be normal for data size to explode during repair. In our case, we 
have a node of around 200G with RF=3; during repair it grows to as much as 300G. 
We are using LCS, which creates more than 5000 compaction tasks and takes more 
than a day to finish. We are on 1.1.6.

There is a parallel LCS feature in 1.2 that is supposed to speed up LCS. Let 
us know how it goes for you, since you are using LCS on 1.2.

There are also a few JIRAs related to this issue:

https://issues.apache.org/jira/browse/CASSANDRA-2698
https://issues.apache.org/jira/browse/CASSANDRA-3721


Thanks.
-Wei

----- Original Message -----
From: "aaron morton" <aa...@thelastpickle.com>
To: user@cassandra.apache.org
Sent: Wednesday, March 6, 2013 8:29:16 AM
Subject: Re: should I file a bug report on this or is this normal?



15. Size of nreldata is now 220K... it has exploded in size!!!!!! 
This may be explained by fragmentation in the sstables, which compaction would 
eventually resolve. 


During repair, the data came from multiple nodes and created multiple sstables 
for each CF. Streaming copies part of an SSTable on the source and creates a new 
SSTable on the destination. This pattern is different from all writes for a CF 
going to the same sstable when it is flushed. 


To compare apples to apples, run a major compaction after the initial data load 
and again after the repair. 
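A toy model may help picture this. The sketch below is hypothetical, not Cassandra's code: it treats an sstable as a map from row key to a (timestamp, value) pair and assumes that two replicas each stream an overlapping set of rows to the repaired node, each stream landing in its own sstable. Until compaction merges them, the overlapping rows are stored twice, which is why comparing sizes only makes sense after a major compaction on both sides.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model, not Cassandra's implementation: an sstable maps a
# row key to (write_timestamp, value). Real sstables also carry columns,
# tombstones, an index, and a bloom filter.
SSTable = Dict[str, Tuple[int, str]]


def merge(sstables: List[SSTable]) -> SSTable:
    """Reconcile rows roughly as compaction does (simplified to whole rows):
    keep the version with the highest timestamp for each key."""
    merged: SSTable = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return merged


def stored_rows(sstables: List[SSTable]) -> int:
    """Row copies on disk before compaction, duplicates included."""
    return sum(len(table) for table in sstables)


# Initial load: one flush writes every row exactly once.
flushed = [{f"row{i}": (1, "v") for i in range(10)}]

# Repair (assumed): two replicas stream overlapping rows, one sstable each.
streamed = [
    {f"row{i}": (1, "v") for i in range(0, 7)},   # from replica A
    {f"row{i}": (1, "v") for i in range(3, 10)},  # from replica B
]

print(stored_rows(flushed))             # 10
print(stored_rows(streamed))            # 14: row3..row6 are stored twice
print(stored_rows([merge(streamed)]))   # 10 after a major compaction
```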



1. Why is the bloom filter for level 5 a total of 3856 bytes for 29118 (large to 
small) bytes of data, while in the initial data it was 2192 bytes for 
43038 (small to large) bytes of data? 
The size of the BF depends on the number of rows and the false positive rate, 
not on the size of the -Data.db component on disk. 
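To make the point concrete, here is a sketch using the textbook Bloom filter sizing formula. It is the general formula, not Cassandra's own sizing code (which differs in detail), so it will not reproduce the exact Filter.db byte counts in this thread; it only shows which inputs matter. The function names are hypothetical.

```python
import math


def bloom_filter_bits(num_keys: int, fp_rate: float) -> int:
    """Textbook optimal Bloom filter size, m = -n ln(p) / (ln 2)^2 bits.

    Only the key count n and the false-positive rate p appear; the size of
    the data the keys point at never enters the formula.
    """
    if num_keys <= 0:
        return 0
    return math.ceil(-num_keys * math.log(fp_rate) / math.log(2) ** 2)


def optimal_hash_count(num_keys: int, num_bits: int) -> int:
    """Hash function count that minimizes false positives: k = (m/n) ln 2."""
    return max(1, round(num_bits / num_keys * math.log(2)))


for keys in (1_000, 2_000):
    bits = bloom_filter_bits(keys, 0.01)
    print(keys, "keys ->", bits, "bits,", optimal_hash_count(keys, bits), "hashes")
```

Because only the key count and the false-positive rate enter, a -Data.db file with many small rows can carry a larger filter than a bigger file with fewer, wider rows.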



2. Why are there 3 levels? With such a small set of data, I would think it would 
flush one data file like the original data, but instead there are 3 files. 
See above. 


Cheers 








----------------- 
Aaron Morton 
Freelance Cassandra Developer 
New Zealand 


@aaronmorton 
http://www.thelastpickle.com 


On 6/03/2013, at 6:40 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote: 


I ran a pretty solid QA test (clean data from scratch) on version 1.2.2. 

My test was as follows: 

1. Start up a 4-node Cassandra cluster 
2. Populate with initial test data (no other data is added to the system after 
this point!!!) 
3. Run nodetool drain on every node (moves data from the commit log to sstables) 
4. Stop and restart the Cassandra cluster so it is running again 
5. Size of the nreldata CF folder is 128kB 
6. Go to node 3, run a snapshot, and mv the snapshots directory OUT of nreldata 
7. Size of the nreldata CF folder is 128kB 
8. On node 3, run nodetool drain 
9. Size of the nreldata CF folder is still 128kB 
10. Stop the Cassandra node 
11. rm <keyspace>/nreldata/*.db 
12. Size of the nreldata CF folder is 8kB (odd for an empty folder, but OK) 
13. Start Cassandra 
14. nodetool repair databus5 nreldata 
15. Size of nreldata is now 220K... it has exploded in size!!!!!! 
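The node-3 part of this procedure (steps 5 to 15) can be scripted so it is repeatable. This is a sketch, not Dean's actual script: the data directory, the snapshot backup location, and the `service cassandra stop/start` commands are assumptions that depend on the install, and `node3_steps` and `run` are hypothetical helpers. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List

KEYSPACE = "databus5"
CF = "nreldata"
# Hypothetical locations: adjust to data_file_directories and a scratch dir.
CF_DIR = f"/var/lib/cassandra/data/{KEYSPACE}/{CF}"
SNAPSHOT_BACKUP = "/tmp/nreldata-snapshots"


def node3_steps() -> List[List[str]]:
    """Commands for steps 5-15 on node 3 (steps 1-4 are cluster-wide)."""
    size = ["du", "-sh", CF_DIR]
    return [
        size,                                            # 5
        ["nodetool", "snapshot", KEYSPACE],              # 6
        ["mv", f"{CF_DIR}/snapshots", SNAPSHOT_BACKUP],  # 6
        size,                                            # 7
        ["nodetool", "drain"],                           # 8
        size,                                            # 9
        ["sudo", "service", "cassandra", "stop"],        # 10 (hypothetical)
        ["sh", "-c", f"rm {CF_DIR}/*.db"],               # 11: the glob needs a shell
        size,                                            # 12
        ["sudo", "service", "cassandra", "start"],       # 13 (hypothetical)
        # 14: in practice, wait until the node is up before repairing.
        ["nodetool", "repair", KEYSPACE, CF],
        size,                                            # 15
    ]


def run(steps: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for cmd in steps:
        line = shlex.join(cmd)
        print(line)
        printed.append(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return printed


if __name__ == "__main__":
    run(node3_steps())
```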

I ran this QA test because we see data size explosion in production as well (I 
can't be 100% sure it is the same thing, though, as the above is such a small 
data set). Would leveled compaction be a bit more stable in terms of size ratios 
and such? 

QUESTIONS 

1. Why is the bloom filter for level 5 a total of 3856 bytes for 29118 (large to 
small) bytes of data, while in the initial data it was 2192 bytes for 
43038 (small to large) bytes of data? 
2. Why are there 3 levels? With such a small set of data, I would think it would 
flush one data file like the original data, but instead there are 3 files. 

My files after repair have levels 5, 6, and 7. My files before deletion of the 
CF have just level 1. After repair, the files are: 
-rw-rw-r--. 1 cassandra cassandra 54 Mar 6 07:18 databus5-nreldata-ib-5-CompressionInfo.db 
-rw-rw-r--. 1 cassandra cassandra 29118 Mar 6 07:18 databus5-nreldata-ib-5-Data.db 
-rw-rw-r--. 1 cassandra cassandra 3856 Mar 6 07:18 databus5-nreldata-ib-5-Filter.db 
-rw-rw-r--. 1 cassandra cassandra 37000 Mar 6 07:18 databus5-nreldata-ib-5-Index.db 
-rw-rw-r--. 1 cassandra cassandra 4772 Mar 6 07:18 databus5-nreldata-ib-5-Statistics.db 
-rw-rw-r--. 1 cassandra cassandra 383 Mar 6 07:18 databus5-nreldata-ib-5-Summary.db 
-rw-rw-r--. 1 cassandra cassandra 79 Mar 6 07:18 databus5-nreldata-ib-5-TOC.txt 
-rw-rw-r--. 1 cassandra cassandra 46 Mar 6 07:18 databus5-nreldata-ib-6-CompressionInfo.db 
-rw-rw-r--. 1 cassandra cassandra 14271 Mar 6 07:18 databus5-nreldata-ib-6-Data.db 
-rw-rw-r--. 1 cassandra cassandra 816 Mar 6 07:18 databus5-nreldata-ib-6-Filter.db 
-rw-rw-r--. 1 cassandra cassandra 18248 Mar 6 07:18 databus5-nreldata-ib-6-Index.db 
-rw-rw-r--. 1 cassandra cassandra 4756 Mar 6 07:18 databus5-nreldata-ib-6-Statistics.db 
-rw-rw-r--. 1 cassandra cassandra 230 Mar 6 07:18 databus5-nreldata-ib-6-Summary.db 
-rw-rw-r--. 1 cassandra cassandra 79 Mar 6 07:18 databus5-nreldata-ib-6-TOC.txt 
-rw-rw-r--. 1 cassandra cassandra 46 Mar 6 07:18 databus5-nreldata-ib-7-CompressionInfo.db 
-rw-rw-r--. 1 cassandra cassandra 14271 Mar 6 07:18 databus5-nreldata-ib-7-Data.db 
-rw-rw-r--. 1 cassandra cassandra 816 Mar 6 07:18 databus5-nreldata-ib-7-Filter.db 
-rw-rw-r--. 1 cassandra cassandra 18248 Mar 6 07:18 databus5-nreldata-ib-7-Index.db 
-rw-rw-r--. 1 cassandra cassandra 4756 Mar 6 07:18 databus5-nreldata-ib-7-Statistics.db 
-rw-rw-r--. 1 cassandra cassandra 230 Mar 6 07:18 databus5-nreldata-ib-7-Summary.db 
-rw-rw-r--. 1 cassandra cassandra 79 Mar 6 07:18 databus5-nreldata-ib-7-TOC.txt 
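The filenames may be easier to read with their fields split out. As far as I know, in 1.2-era names such as databus5-nreldata-ib-5-Data.db the integer is the sstable's generation number, which increases for every new sstable written for the column family (by flush, compaction, or streaming); the leveled-compaction level is not part of the filename. The sketch below splits a name into its parts under that assumed layout; `SSTableName` and `parse_sstable_name` are hypothetical helpers, not Cassandra APIs.

```python
import re
from typing import NamedTuple


class SSTableName(NamedTuple):
    keyspace: str
    column_family: str
    version: str      # on-disk format version, "ib" in this thread
    generation: int   # counter for each new sstable in the CF; not the LCS level
    component: str    # Data.db, Index.db, Filter.db, TOC.txt, ...


# Assumed 1.2-era layout: <keyspace>-<cf>-<version>-<generation>-<component>
_NAME = re.compile(r"^(\w+)-(\w+)-([a-z]+)-(\d+)-(.+)$")


def parse_sstable_name(filename: str) -> SSTableName:
    """Split an sstable component filename into its fields."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"not an sstable component name: {filename!r}")
    ks, cf, version, generation, component = match.groups()
    return SSTableName(ks, cf, version, int(generation), component)


after_repair = [
    "databus5-nreldata-ib-5-Data.db",
    "databus5-nreldata-ib-6-Data.db",
    "databus5-nreldata-ib-7-Data.db",
]
print(sorted({parse_sstable_name(n).generation for n in after_repair}))  # [5, 6, 7]
```

Read this way, the repair left three sstables (generations 5, 6 and 7) where the initial load had one, which fits the explanation that streamed data arrives as separate sstables.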

Before-repair files (from my snapshot, which I moved out of the directory so 
Cassandra no longer had it): 
-rw-rw-r--. 1 cassandra cassandra 62 Mar 6 07:11 databus5-nreldata-ib-1-CompressionInfo.db 
-rw-rw-r--. 1 cassandra cassandra 43038 Mar 6 07:11 databus5-nreldata-ib-1-Data.db 
-rw-rw-r--. 1 cassandra cassandra 2192 Mar 6 07:11 databus5-nreldata-ib-1-Filter.db 
-rw-rw-r--. 1 cassandra cassandra 55248 Mar 6 07:11 databus5-nreldata-ib-1-Index.db 
-rw-rw-r--. 1 cassandra cassandra 4756 Mar 6 07:11 databus5-nreldata-ib-1-Statistics.db 
-rw-rw-r--. 1 cassandra cassandra 499 Mar 6 07:11 databus5-nreldata-ib-1-Summary.db 
-rw-rw-r--. 1 cassandra cassandra 79 Mar 6 07:11 databus5-nreldata-ib-1-TOC.txt 
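It may also help to separate what `du` reports from how much data the files hold: `du` counts allocated blocks, not the sum of file sizes. Assuming a 4 KiB filesystem block size and one block for the directory itself (both assumptions about this filesystem), the byte sizes in the two listings reproduce the 128kB and 220K figures from steps 5 and 15. `allocated_kib` is a hypothetical helper.

```python
BLOCK_SIZE = 4096  # assumed filesystem block size (a common ext4 default)


def allocated_kib(file_sizes, dir_blocks=1):
    """Rough `du -s` in KiB: round each file up to whole blocks and add the
    blocks held by the directory itself (assumed to be one)."""
    blocks = sum((size + BLOCK_SIZE - 1) // BLOCK_SIZE for size in file_sizes)
    return (blocks + dir_blocks) * BLOCK_SIZE // 1024


# Byte sizes from the ls listings (7 components per sstable).
before = [62, 43038, 2192, 55248, 4756, 499, 79]        # generation 1
after = (
    [54, 29118, 3856, 37000, 4772, 383, 79]            # generation 5
    + [46, 14271, 816, 18248, 4756, 230, 79]           # generation 6
    + [46, 14271, 816, 18248, 4756, 230, 79]           # generation 7
)

print(sum(before), allocated_kib(before))  # 105874 128
print(sum(after), allocated_kib(after))    # 152154 220
```

So the files themselves grew from 105,874 to 152,154 bytes (the -Data.db components from 43,038 to 57,660), while rounding 21 small files up to whole blocks makes the `du` figure grow faster. The same model puts a directory that still holds only the 79-byte -TOC.txt, which `rm <keyspace>/nreldata/*.db` does not match, at 8 KiB, consistent with step 12.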

Thanks, 
Dean 