RE: compaction throughput rate not even close to 16MB

2013-04-25 Thread Viktor Jevdokimov
Our experience with compactions shows that the more columns there are to merge
for the same row, the more CPU it takes.

For example, we tested and chose between 2 data models with supercolumns (we
still need supercolumns since composite columns lack some functionality):
  1. supercolumns with many columns
  2. supercolumns with one column (the columns from model 1 merged into one
     blob value)
We found that compaction of model 2 runs 4 times faster.

The same holds for regular column families.
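
As a rough illustration of why the per-column merge is the expensive part, here is a toy Python sketch. It is not Cassandra code: the function name `merge_row` and the dict-based row layout are made up for this example, and "newest timestamp wins" is a simplification of what a real compaction does. It merges two versions of the same row and counts the timestamp comparisons, so a row stored as one blob column costs one comparison while a row with many columns costs one per column.

```python
def merge_row(versions):
    """Merge several versions of one row; return (merged_row, comparisons).

    Each version maps column name -> (timestamp, value). For every column
    the newest timestamp wins (a simplification, assumed for this sketch).
    """
    merged = {}
    comparisons = 0
    for version in versions:
        for name, (ts, value) in version.items():
            if name in merged:
                comparisons += 1
                if ts <= merged[name][0]:
                    continue  # the stored value is newer or equal, keep it
            merged[name] = (ts, value)
    return merged, comparisons


# Model 1: 100 small columns per row, with the row present in two SSTables.
many_old = {f"c{i}": (1, "old") for i in range(100)}
many_new = {f"c{i}": (2, "new") for i in range(100)}
merged_many, cost_many = merge_row([many_old, many_new])

# Model 2: the same data packed into a single blob column.
blob_old = {"blob": (1, "old" * 100)}
blob_new = {"blob": (2, "new" * 100)}
merged_blob, cost_blob = merge_row([blob_old, blob_new])

print(cost_many, cost_blob)
```

The counts only show how the merge work scales with the number of columns. They do not reproduce the 4x figure above, which is a measurement on a real cluster.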





Best regards / Pagarbiai

Viktor Jevdokimov

compaction throughput rate not even close to 16MB

2013-04-24 Thread Hiller, Dean
I was wondering about the compaction throughput setting.  I never see ours get
even close to 16 MB/s, and I thought this is supposed to throttle compaction,
right?  Judging from our logs, ours is constantly below 3 MB/s, or do I have
this totally wrong?  How can I see the real throughput so that I can understand
how to throttle it when I need to?

94,940,780 bytes to 95,346,024 (~100% of original) in 38,438ms = 2.365603MB/s.  
2,350,114 total rows, 2,350,022 unique.  Row merge counts were {1:2349930, 
2:92, }

Thanks,
Dean
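
One way to see the real throughput is to recompute it from the log line itself. The following is a minimal Python sketch, assuming the log format is exactly the one quoted above; the helper names `to_int` and `parse_compaction_line` are made up for this example. The logged rate appears to be the output size in bytes divided by 1024 * 1024 and by the elapsed seconds, and the row merge counts can be cross-checked against the row totals.

```python
import re

LOG_LINE = (
    "94,940,780 bytes to 95,346,024 (~100% of original) in 38,438ms = "
    "2.365603MB/s.  2,350,114 total rows, 2,350,022 unique.  "
    "Row merge counts were {1:2349930, 2:92, }"
)


def to_int(text):
    """Convert a number with thousands separators, e.g. '38,438', to int."""
    return int(text.replace(",", ""))


def parse_compaction_line(line):
    """Extract sizes, duration and row counts from one compaction log line."""
    sizes = re.search(r"([\d,]+) bytes to ([\d,]+) .*? in ([\d,]+)ms", line)
    rows = re.search(r"([\d,]+) total rows, ([\d,]+) unique", line)
    merges = re.search(r"Row merge counts were \{(.*?)\}", line)
    counts = {
        int(k): int(v) for k, v in re.findall(r"(\d+):(\d+)", merges.group(1))
    }
    return {
        "bytes_in": to_int(sizes.group(1)),
        "bytes_out": to_int(sizes.group(2)),
        "seconds": to_int(sizes.group(3)) / 1000.0,
        "total_rows": to_int(rows.group(1)),
        "unique_rows": to_int(rows.group(2)),
        "merge_counts": counts,
    }


info = parse_compaction_line(LOG_LINE)

# Throughput based on the bytes written, in units of 1024 * 1024 bytes.
mb_per_s = info["bytes_out"] / (1024 * 1024) / info["seconds"]

# A merge count {n: k} means k output rows were each merged from n SSTables.
rows_from_merges = sum(n * k for n, k in info["merge_counts"].items())
unique_from_merges = sum(info["merge_counts"].values())

print(f"throughput: {mb_per_s:.6f} MB/s")
print(f"share of a 16 MB/s cap: {mb_per_s / 16:.1%}")
print(rows_from_merges == info["total_rows"])
print(unique_from_merges == info["unique_rows"])
```

For this line the result is about 2.37 MB/s, in agreement with the logged figure and far below the 16 MB/s cap, and almost every row comes from a single SSTable (only 92 rows were merged from two).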





Re: compaction throughput rate not even close to 16MB

2013-04-24 Thread Edward Capriolo
I have noticed the same. I think in the real world your compaction
throughput is limited by other things. If I had to speculate, I would say
that compaction can remove expired tombstones; however, doing this requires
bloom filter checks, etc.

I think that setting is more important with multithreaded compaction
and/or more compaction slots. In those cases it may actually throttle
something.
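
The point that the setting is a cap rather than a target can be put as a toy model in Python. This is only a sketch under stated assumptions, not how Cassandra implements throttling: the function `effective_throughput` is hypothetical, it assumes the cap applies to the total across all concurrent compactions, that every compaction would run at the same unthrottled rate, and that a cap of 0 means throttling is disabled.

```python
def effective_throughput(natural_mb_s, cap_mb_s, concurrent=1):
    """Total compaction rate in MB/s under a simple cap (toy model).

    natural_mb_s: rate one compaction reaches when it is not throttled
    cap_mb_s:     configured cap; 0 is assumed to disable throttling
    concurrent:   number of compactions running at the same time
    """
    unthrottled = natural_mb_s * concurrent
    if cap_mb_s <= 0:
        return unthrottled
    return min(unthrottled, cap_mb_s)


# One compaction at about 2.37 MB/s never reaches a 16 MB/s cap.
print(effective_throughput(2.37, 16))

# Eight such compactions together would exceed the cap, so it takes effect.
print(effective_throughput(2.37, 16, concurrent=8))
```

In this model a single slow compaction is unaffected by the cap, and the throttle only starts to matter once several compactions run at the same time.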








Re: compaction throughput rate not even close to 16MB

2013-04-24 Thread Robert Coli
On Wed, Apr 24, 2013 at 1:33 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
> I think that setting is more important with multi threaded compaction and/or
> more compaction slots. In those cases it may actually throttle something.

Or if you're simultaneously doing a repair, which does a validation
compaction, which will (should?) also be subject to the throttle?

=Rob


Re: compaction throughput rate not even close to 16MB

2013-04-24 Thread Hiller, Dean
Thanks much!!!  Better to hear at least one other person sees the same thing 
;).  Sometimes these posts just go silent.

Dean







Re: compaction throughput rate not even close to 16MB

2013-04-24 Thread Wei Zhu
Same here. We disabled the throttling, and our disk and CPU usage are both low
(10%), yet LCS compaction still takes hours to finish after a repair. For this
cluster, we don't delete any data, so we can rule out tombstones. I'm not sure
what is holding compaction back. My observation is that throughput is lower
for LCS compactions that involve a large number of SSTables (we set the SSTable
size too small, at 10 MB, and sometimes one compaction involves up to 10 GB of
data = 1000 SSTables). So my theory is that opening and closing file handles
has a substantial impact on the throughput.

By the way, we are on SSD.

-Wei

