Re: Adding New Node Issue

2015-04-23 Thread Andrei Ivanov
Thomas,

From our experience, C* almost always degrades quite a bit when we bootstrap
new nodes - no idea why; we were never able to get any help or hints. And we
never reach anywhere close to 200Mbps, though we do see higher CPU usage.

Actually, there is another way of adding nodes, I guess: start the new node
w/o auto bootstrap and initiate a rebuild. But this approach is not
completely flawless.
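(For anyone wanting to try that route - a minimal sketch of what I mean,
assuming C* 2.0.x and the usual file locations; names are placeholders:

  # cassandra.yaml on the new node, before its first start:
  auto_bootstrap: false

  # once the node is up and in gossip, stream its ranges to it
  # from an existing data center:
  nodetool rebuild <existing-dc-name>

The flaw I mean: until the rebuild finishes, the node already owns its token
ranges but does not yet hold the data, so reads routed to it can come back
empty or stale.)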

Andrei.

On Thu, Apr 23, 2015 at 11:50 PM, Thomas Miller 
wrote:

> Andrei,
>
>
>
> I did not see that bug report. Thanks for the heads up on that.
>
>
>
> I am thinking that this is still not the issue, though, since if that were
> the case then I should be seeing higher than 200Mbps on that interface. I
> am able to see via my Zabbix monitoring software that the two streaming
> nodes never get over 200Mbps. If this bug were affecting us I should see
> those interfaces getting hammered, right?
>
>
>
> Thanks,
>
> Thomas Miller
>
>
>
> *From:* Andrei Ivanov [mailto:aiva...@iponweb.net]
> *Sent:* Thursday, April 23, 2015 4:40 PM
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: Adding New Node Issue
>
>
>
> Thomas, just in case you missed it, there is a bug with the throughput
> setting prior to 2.0.13; here is the link:
>
> https://issues.apache.org/jira/browse/CASSANDRA-8852
>
>
>
> So, it may happen that you are setting it to 1600 megabytes.
>
>
>
> Andrei
>
>
>
> On Thu, Apr 23, 2015 at 11:22 PM, Ali Akhtar  wrote:
>
> What version are you running?
>
>
>
> On Fri, Apr 24, 2015 at 12:51 AM, Thomas Miller 
> wrote:
>
> Jeff,
>
>
>
> Thanks for the response. I had come across that as a possible solution
> previously but there are discrepancies that would lead me to think that
> that is not the issue.
>
>
>
> It appears our stream throughput is currently set to 200Mbps but unless
> the Cassandra service shares that same throughput limitation to serve its
> data also, it does not seem like 200Mbps bandwidth usage would overwhelm
> the nodes. The 200Mbps bandwidth usage is only on two of the four nodes
> when adding the new node. It seems like the other two nodes should be able
> to handle requests still. When my backups run at night they hit around
> 300Mbps bandwidth usage and we have no timeouts at all.
>
>
>
> Then there is the question of why, when we stopped the Cassandra service
> on the joining node, the timeouts did not stop? Opscenter did not show that
> node anymore and “nodetool status” verified that. We were thinking that
> maybe gossip caused the existing nodes to think that there was still a node
> joining but since the new node was shutdown it was not actually joining,
> but that is not confirmed.
>
>
>
>
>
> Thanks,
>
> Thomas Miller
>
>
>
> *From:* Jeff Ferland [mailto:j...@tubularlabs.com]
> *Sent:* Thursday, April 23, 2015 2:46 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Adding New Node Issue
>
>
>
> Sounds to me like your stream throughput value is too high. `nodetool
> getstreamthroughput` and `nodetool setstreamthroughput` will update this
> value live. Limit it to something lower so that the system isn’t overloaded
> by streaming. The bottleneck that slows things down is most likely disk or
> network.
>
>
>
> On Apr 23, 2015, at 11:18 AM, Thomas Miller  wrote:
>
>
>
> Hello,
>
>
>
> Yesterday we ran into a serious issue while joining a new node to our
> existing 4 node Cassandra cluster (version 2.0.7). The average node data
> size is 152 GB with a replication factor of 3. The node was prepped just
> like the following document describes -
> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
> .
>
>
>
> When I started the new node, Opscenter showed the node as “Active –
> Joining” but we immediately began getting timeouts on our websites because
> lookups were taking too long. On the 4 existing nodes the network interface
> showed about 200Mbps being used, the CPU never went over 20% and the memory
> usage barely changed.
>
>
>
> The question I have is, does adding a new node cause some sort of
> throttling that would affect our webservers from being able to function as
> normal? The only thing that we can think of that might have had some effect
> was that a repair was just finishing on one of the nodes when the new node
> was added. The repair ended up finishing while the new node was in the
> joining state but the timeouts did not go away afterwards.
>
>
>
> Our impatience got the better of us so we ended up stopping the Cassandra
> service on the new node because it appeared, at the time, to have stalled
> out in t

Re: Adding New Node Issue

2015-04-23 Thread Andrei Ivanov
Thomas, just in case you missed it, there is a bug with the throughput
setting prior to 2.0.13; here is the link:
https://issues.apache.org/jira/browse/CASSANDRA-8852

So, it may happen that you are setting it to 1600 megabytes.
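If you want to rule it out, the effective value can be checked and capped at
runtime; a sketch (assuming 2.0.x defaults, value in megabits per second):

  nodetool getstreamthroughput
  nodetool setstreamthroughput 25   # applied live, no restart needed

  # permanent equivalent in cassandra.yaml (takes effect on restart):
  stream_throughput_outbound_megabits_per_sec: 25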

Andrei

On Thu, Apr 23, 2015 at 11:22 PM, Ali Akhtar  wrote:

> What version are you running?
>
> On Fri, Apr 24, 2015 at 12:51 AM, Thomas Miller 
> wrote:
>
>> Jeff,
>>
>>
>>
>> Thanks for the response. I had come across that as a possible solution
>> previously but there are discrepancies that would lead me to think that
>> that is not the issue.
>>
>>
>>
>> It appears our stream throughput is currently set to 200Mbps but unless
>> the Cassandra service shares that same throughput limitation to serve its
>> data also, it does not seem like 200Mbps bandwidth usage would overwhelm
>> the nodes. The 200Mbps bandwidth usage is only on two of the four nodes
>> when adding the new node. It seems like the other two nodes should be able
>> to handle requests still. When my backups run at night they hit around
>> 300Mbps bandwidth usage and we have no timeouts at all.
>>
>>
>>
>> Then there is the question of why, when we stopped the Cassandra service
>> on the joining node, the timeouts did not stop? Opscenter did not show that
>> node anymore and “nodetool status” verified that. We were thinking that
>> maybe gossip caused the existing nodes to think that there was still a node
>> joining but since the new node was shutdown it was not actually joining,
>> but that is not confirmed.
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Thomas Miller
>>
>>
>>
>> *From:* Jeff Ferland [mailto:j...@tubularlabs.com]
>> *Sent:* Thursday, April 23, 2015 2:46 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Adding New Node Issue
>>
>>
>>
>> Sounds to me like your stream throughput value is too high. `nodetool
>> getstreamthroughput` and `nodetool setstreamthroughput` will update this
>> value live. Limit it to something lower so that the system isn’t overloaded
>> by streaming. The bottleneck that slows things down is most likely disk or
>> network.
>>
>>
>>
>> On Apr 23, 2015, at 11:18 AM, Thomas Miller 
>> wrote:
>>
>>
>>
>> Hello,
>>
>>
>>
>> Yesterday we ran into a serious issue while joining a new node to our
>> existing 4 node Cassandra cluster (version 2.0.7). The average node data
>> size is 152 GB with a replication factor of 3. The node was prepped just
>> like the following document describes -
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
>> .
>>
>>
>>
>> When I started the new node, Opscenter showed the node as “Active –
>> Joining” but we immediately began getting timeouts on our websites because
>> lookups were taking too long. On the 4 existing nodes the network interface
>> showed about 200Mbps being used, the CPU never went over 20% and the memory
>> usage barely changed.
>>
>>
>>
>> The question I have is, does adding a new node cause some sort of
>> throttling that would affect our webservers from being able to function as
>> normal? The only thing that we can think of that might have had some effect
>> was that a repair was just finishing on one of the nodes when the new node
>> was added. The repair ended up finishing while the new node was in the
>> joining state but the timeouts did not go away afterwards.
>>
>>
>>
>> Our impatience got the better of us so we ended up stopping the Cassandra
>> service on the new node because it appeared, at the time, to have stalled
>> out in the joining state and nothing more was being streamed to it. But
>> even stopping it did not allow the cluster to resume its normal operation
>> and we were still getting timeouts. We tried rebooting our web servers and
>> then our 4 existing Cassandra servers but none of it worked.
>>
>>
>>
>> We never saw any errors/exceptions in the Cassandra and system logs at
>> all. It completely mystified us why there would be no errors/exceptions
>> unless this was working as intended.
>>
>>
>>
>> We ended up getting it working by adding the new node again and just
>> letting it go until it finally finished joining, and everything magically
>> started working again. We noticed that towards the end it was barely
>> streaming anything (Opscenter was not showing any running streams); by
>> checking the size of the data directory we saw it growing and shrinking
>> only ever so slightly.
>>
>>
>>
>> We have to add one more new node and then decommission two of the
>> existing nodes so we can perform some hardware maintenance on the server
>> those two existing nodes are on, but we are hesitant to try this again
>> without scheduling a maintenance window for this node add and
>> decommissioning process.
>>
>>
>>
>> So to reiterate what I am asking, does adding a node cause the cluster to
>> be unusable/timeout? Also, can we expect the decommissioning of the other
>> two nodes to cause the same type of downtimes since they have to stream
>> their content out to the other nodes in

Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

2015-04-23 Thread Andrei Ivanov
Just in case it helps - we are running C* with sstable sizes of something
like 2.5 TB and ~4TB/node. No evident problems except the time it takes to
compact.
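(Side note, since the tombstone_threshold sub-property is being discussed
below - a hedged CQL sketch of how it is typically set per table; keyspace
and table names are placeholders:

  ALTER TABLE ks.cf
    WITH compaction = { 'class': 'SizeTieredCompactionStrategy',
                        'tombstone_threshold': '0.2',
                        'tombstone_compaction_interval': '86400' };

The threshold is the ratio of droppable tombstones above which a single
sstable becomes eligible for a self-compaction.)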

Andrei.

On Wed, Apr 22, 2015 at 5:36 PM, Anuj Wadehra 
wrote:

> Thanks Robert!!
>
> The JIRA was very helpful in understanding how the tombstone threshold is
> implemented. The ticket also says that running major compaction weekly is
> an alternative. What I actually want to understand: if I run major
> compaction on a CF with 500 GB of data, a single giant file is created. Do
> you see any problems with Cassandra processing such a huge file? Is there
> any max sstable size beyond which performance etc. degrades? What are the
> implications?
>
>
> Thanks
> Anuj Wadehra
>
> Sent from Yahoo Mail on Android
> 
> --
>   *From*:"Robert Coli" 
> *Date*:Fri, 17 Apr, 2015 at 10:55 pm
> *Subject*:Re: Drawbacks of Major Compaction now that Automatic Tombstone
> Compaction Exists
>
> On Tue, Apr 14, 2015 at 8:29 PM, Anuj Wadehra 
> wrote:
>
>> By automatic tombstone compaction, I am referring to tombstone_threshold
>> sub property under compaction strategy in CQL. It is 0.2 by default. So
>> what I understand from the Datastax documentation is that even if a sstable
>> does not find sstables of similar size (STCS) , an automatic tombstone
>> compaction will trigger on sstable when 20% data is tombstone. This
>> compaction works on single sstable only.
>>
>
> Overall system behavior is discussed here :
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-6654?focusedCommentId=13914587&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13914587
>
> They are talking about LCS, but the principles apply, with an overlay of
> how STCS behaves.
>
> =Rob
>
>


Sstables remain after compaction (C* 2.0.13)

2015-04-07 Thread Andrei Ivanov
Hi all,

I know there was a thread with the same topic a while ago, but my problem
is that I'm seeing exactly the same behavior with C* 2.0.13: compacted
sstables remain on disk after compaction for a long time (say ~24 hours -
we never waited longer than that). Those sstables are removed upon restart,
but restarting nodes all the time doesn't look like a super cool idea.

Any hints? Ideas?

Don't know if it makes any difference - we have pretty large nodes
(~4TB/node) with STCS.
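(One quick check that might help narrow it down - whether the process still
holds the compacted files open; a diagnostic sketch only, assuming a single
Cassandra JVM per host:

  lsof -p "$(pgrep -f CassandraDaemon)" | grep -- '-Data.db'

If the supposedly compacted generations still show up there, the files are
being kept alive by open references rather than by a failed delete.)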

Thanks in advance, Andrei.


Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Ah, clear then. SSD usage imposes a different bias in terms of costs;-)

On Tue, Nov 25, 2014 at 9:48 PM, Nikolai Grigoriev  wrote:
> Andrei,
>
> Oh, yes, I have scanned the top of your previous email but overlooked the
> last part.
>
> I am using SSDs so I prefer to put extra work to keep my system performing
> and save expensive disk space. So far I've been able to size the system more
> or less correctly so these LCS limitations do not cause too much troubles.
> But I do keep the CF "sharding" option as backup - for me it will be
> relatively easy to implement it.
>
>
> On Tue, Nov 25, 2014 at 1:25 PM, Andrei Ivanov  wrote:
>>
>> Nikolai,
>>
>> Just in case you've missed my comment in the thread (guess you have) -
>> increasing sstable size does nothing (in our case at least). That is,
>> it's not worse but the load pattern is still the same - doing nothing
>> most of the time. So, I switched to STCS and we will have to live with
>> extra storage cost - storage is way cheaper than cpu etc anyhow:-)
>>
>> On Tue, Nov 25, 2014 at 5:53 PM, Nikolai Grigoriev 
>> wrote:
>> > Hi Jean-Armel,
>> >
>> > I am using latest and greatest DSE 4.5.2 (4.5.3 in another cluster but
>> > there
>> > are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra
>> > 2.0.10.
>> >
>> > I have about 1,8Tb of data per node now in total, which falls into that
>> > range.
>> >
>> > As I said, it is really a problem with large amount of data in a single
>> > CF,
>> > not total amount of data. Quite often the nodes are idle yet having
>> > quite a
>> > bit of pending compactions. I have discussed it with other members of C*
>> > community and DataStax guys and, they have confirmed my observation.
>> >
>> > I believe that increasing the sstable size won't help at all and
>> > probably
>> > will make the things worse - everything else being equal, of course. But
>> > I
>> > would like to hear from Andrei when he is done with his test.
>> >
>> > Regarding the last statement - yes, C* clearly likes many small servers
>> > more
>> > than fewer large ones. But it is all relative - and can be all
>> > recalculated
>> > to $$$ :) C* is all about partitioning of everything - storage,
>> > traffic...Less data per node and more nodes give you lower latency,
>> > lower
>> > heap usage etc, etc. I think I have learned this with my project.
>> > Somewhat
>> > hard way but still, nothing is better than the personal experience :)
>> >
>> > On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce 
>> > wrote:
>> >>
>> >> Hi Andrei, Hi Nicolai,
>> >>
>> >> Which version of C* are you using ?
>> >>
>> >> There are some recommendations about the max storage per node :
>> >>
>> >> http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
>> >>
>> >> "For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
>> >> handle 10x
>> >> (3-5TB)".
>> >>
>> >> I have the feeling that those recommendations are sensitive according
>> >> many
>> >> criteria such as :
>> >> - your hardware
>> >> - the compaction strategy
>> >> - ...
>> >>
>> >> It looks that LCS lower those limitations.
>> >>
>> >> Increasing the size of sstables might help if you have enough CPU and
>> >> you
>> >> can put more load on your I/O system (@Andrei, I am interested by the
>> >> results of your  experimentation about large sstable files)
>> >>
>> >> From my point of view, there are some usage patterns where it is better
>> >> to
>> >> have many small servers than a few large servers. Probably, it is
>> >> better to
>> >> have many small servers if you need LCS for large tables.
>> >>
>> >> Just my 2 cents.
>> >>
>> >> Jean-Armel
>> >>
>> >> 2014-11-24 19:56 GMT+01:00 Robert Coli :
>> >>>
>> >>> On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev
>> >>> 
>> >>> wrote:
>> >>>>
>> >>>> One of the obvious recommendations I have received was to run more
>> >>>> than
>> >>>> one instance of C* per host. Makes sense - it will reduce the amount
>> >>>> of data
>> >>>> per node and will make better use of the resources.
>> >>>
>> >>>
>> >>> This is usually a Bad Idea to do in production.
>> >>>
>> >>> =Rob
>> >>>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Nikolai Grigoriev
>> > (514) 772-5178
>
>
>
>
> --
> Nikolai Grigoriev
> (514) 772-5178


Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Nikolai,

Just in case you've missed my comment in the thread (guess you have) -
increasing sstable size does nothing (in our case at least). That is,
it's not worse but the load pattern is still the same - doing nothing
most of the time. So, I switched to STCS and we will have to live with
extra storage cost - storage is way cheaper than cpu etc anyhow:-)
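(For completeness, the switch itself was just a change of the compaction
class via CQL; a sketch with placeholder names:

  ALTER TABLE ks.cf
    WITH compaction = { 'class': 'SizeTieredCompactionStrategy' };

Existing sstables are then gradually reorganized under the new strategy.)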

On Tue, Nov 25, 2014 at 5:53 PM, Nikolai Grigoriev  wrote:
> Hi Jean-Armel,
>
> I am using latest and greatest DSE 4.5.2 (4.5.3 in another cluster but there
> are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra 2.0.10.
>
> I have about 1,8Tb of data per node now in total, which falls into that
> range.
>
> As I said, it is really a problem with large amount of data in a single CF,
> not total amount of data. Quite often the nodes are idle yet having quite a
> bit of pending compactions. I have discussed it with other members of C*
> community and DataStax guys and, they have confirmed my observation.
>
> I believe that increasing the sstable size won't help at all and probably
> will make the things worse - everything else being equal, of course. But I
> would like to hear from Andrei when he is done with his test.
>
> Regarding the last statement - yes, C* clearly likes many small servers more
> than fewer large ones. But it is all relative - and can be all recalculated
> to $$$ :) C* is all about partitioning of everything - storage,
> traffic...Less data per node and more nodes give you lower latency, lower
> heap usage etc, etc. I think I have learned this with my project. Somewhat
> hard way but still, nothing is better than the personal experience :)
>
> On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce  wrote:
>>
>> Hi Andrei, Hi Nicolai,
>>
>> Which version of C* are you using ?
>>
>> There are some recommendations about the max storage per node :
>> http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
>>
>> "For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
>> handle 10x
>> (3-5TB)".
>>
>> I have the feeling that those recommendations are sensitive according many
>> criteria such as :
>> - your hardware
>> - the compaction strategy
>> - ...
>>
>> It looks that LCS lower those limitations.
>>
>> Increasing the size of sstables might help if you have enough CPU and you
>> can put more load on your I/O system (@Andrei, I am interested by the
>> results of your  experimentation about large sstable files)
>>
>> From my point of view, there are some usage patterns where it is better to
>> have many small servers than a few large servers. Probably, it is better to
>> have many small servers if you need LCS for large tables.
>>
>> Just my 2 cents.
>>
>> Jean-Armel
>>
>> 2014-11-24 19:56 GMT+01:00 Robert Coli :
>>>
>>> On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev 
>>> wrote:

 One of the obvious recommendations I have received was to run more than
 one instance of C* per host. Makes sense - it will reduce the amount of 
 data
 per node and will make better use of the resources.
>>>
>>>
>>> This is usually a Bad Idea to do in production.
>>>
>>> =Rob
>>>
>>
>>
>
>
>
> --
> Nikolai Grigoriev
> (514) 772-5178


Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Yep, Marcus, I know. It's mainly a question of the cost of those extra 2x
disks, you know. Our "final" setup will be more like 30TB, so doubling
it is still some cost. But I guess we will have to live with it.

On Tue, Nov 25, 2014 at 1:26 PM, Marcus Eriksson  wrote:
> If you are that write-heavy you should definitely go with STCS, LCS
> optimizes for reads by doing more compactions
>
> /Marcus
>
> On Tue, Nov 25, 2014 at 11:22 AM, Andrei Ivanov  wrote:
>>
>> Hi Jean-Armel, Nikolai,
>>
>> 1. Increasing sstable size doesn't work (well, I think, unless we
>> "overscale" - add more nodes than really necessary, which is
>> prohibitive for us in a way). Essentially there is no change.  I gave
>> up and will go for STCS;-(
>> 2. We use 2.0.11 as of now
>> 3. We are running on EC2 c3.8xlarge instances with EBS volumes for data
>> (GP SSD)
>>
>> Jean-Armel, I believe that what you say about many small instances is
>> absolutely true. But it is not good in our case - we write a lot and
>> almost never read what we've written. That is, we want to be able to
>> read everything, but in reality we hardly read 1%, I think. This
>> implies that smaller instances are of no use in terms of read
>> performance for us. And generally instances/CPU/RAM are more expensive
>> than storage. So, we really would like to have instances with large
>> storage.
>>
>> Andrei.
>>
>>
>>
>>
>>
>> On Tue, Nov 25, 2014 at 11:23 AM, Jean-Armel Luce 
>> wrote:
>> > Hi Andrei, Hi Nicolai,
>> >
>> > Which version of C* are you using ?
>> >
>> > There are some recommendations about the max storage per node :
>> >
>> > http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
>> >
>> > "For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
>> > handle
>> > 10x
>> > (3-5TB)".
>> >
>> > I have the feeling that those recommendations are sensitive according
>> > many
>> > criteria such as :
>> > - your hardware
>> > - the compaction strategy
>> > - ...
>> >
>> > It looks that LCS lower those limitations.
>> >
>> > Increasing the size of sstables might help if you have enough CPU and
>> > you
>> > can put more load on your I/O system (@Andrei, I am interested by the
>> > results of your  experimentation about large sstable files)
>> >
>> > From my point of view, there are some usage patterns where it is better
>> > to
>> > have many small servers than a few large servers. Probably, it is better
>> > to
>> > have many small servers if you need LCS for large tables.
>> >
>> > Just my 2 cents.
>> >
>> > Jean-Armel
>> >
>> > 2014-11-24 19:56 GMT+01:00 Robert Coli :
>> >>
>> >> On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev
>> >> 
>> >> wrote:
>> >>>
>> >>> One of the obvious recommendations I have received was to run more
>> >>> than
>> >>> one instance of C* per host. Makes sense - it will reduce the amount
>> >>> of data
>> >>> per node and will make better use of the resources.
>> >>
>> >>
>> >> This is usually a Bad Idea to do in production.
>> >>
>> >> =Rob
>> >>
>> >
>> >
>
>


Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Hi Jean-Armel, Nikolai,

1. Increasing sstable size doesn't work (well, I think, unless we
"overscale" - add more nodes than really necessary, which is
prohibitive for us in a way). Essentially there is no change.  I gave
up and will go for STCS;-(
2. We use 2.0.11 as of now
3. We are running on EC2 c3.8xlarge instances with EBS volumes for data (GP SSD)

Jean-Armel, I believe that what you say about many small instances is
absolutely true. But it is not good in our case - we write a lot and
almost never read what we've written. That is, we want to be able to
read everything, but in reality we hardly read 1%, I think. This
implies that smaller instances are of no use in terms of read
performance for us. And generally instances/CPU/RAM are more expensive
than storage. So, we really would like to have instances with large
storage.

Andrei.





On Tue, Nov 25, 2014 at 11:23 AM, Jean-Armel Luce  wrote:
> Hi Andrei, Hi Nicolai,
>
> Which version of C* are you using ?
>
> There are some recommendations about the max storage per node :
> http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
>
> "For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to handle
> 10x
> (3-5TB)".
>
> I have the feeling that those recommendations are sensitive according many
> criteria such as :
> - your hardware
> - the compaction strategy
> - ...
>
> It looks that LCS lower those limitations.
>
> Increasing the size of sstables might help if you have enough CPU and you
> can put more load on your I/O system (@Andrei, I am interested by the
> results of your  experimentation about large sstable files)
>
> From my point of view, there are some usage patterns where it is better to
> have many small servers than a few large servers. Probably, it is better to
> have many small servers if you need LCS for large tables.
>
> Just my 2 cents.
>
> Jean-Armel
>
> 2014-11-24 19:56 GMT+01:00 Robert Coli :
>>
>> On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev 
>> wrote:
>>>
>>> One of the obvious recommendations I have received was to run more than
>>> one instance of C* per host. Makes sense - it will reduce the amount of data
>>> per node and will make better use of the resources.
>>
>>
>> This is usually a Bad Idea to do in production.
>>
>> =Rob
>>
>
>


Re: Compaction Strategy guidance

2014-11-24 Thread Andrei Ivanov
OK, let's see - my cluster is recompacting now;-) I will let you know
if this helps

On Mon, Nov 24, 2014 at 5:48 PM, Nikolai Grigoriev  wrote:
> I was thinking about that option and I would be curious to find out how
> this change helps you. I suspected that increasing sstable size won't help
> too much because the compaction throughput (per task/thread) is still the
> same. So, it will simply take 4x longer to finish a compaction task. It is
> possible that because of that the CPU will be under-used for even longer.
>
> My data model, unfortunately, requires this amount of data. And I suspect
> that regardless of how it is organized I won't be able to optimize it - I do
> need these rows to be in one row so I can read them quickly.
>
> One of the obvious recommendations I have received was to run more than one
> instance of C* per host. Makes sense - it will reduce the amount of data per
> node and will make better use of the resources. I would go for it myself,
> but it may be a challenge for the people in operations. Without a VM this
> would be more tricky for them to operate such a thing and I do not want any
> VMs there.
>
> Another option is to probably simply shard my data between several identical
> tables in the same keyspace. I could also think about different keyspaces
> but I prefer not to spread the data for the same logical "tenant" across
> multiple keyspaces. Use my primary key's hash and then simply do something
> like mod 4 and add this to the table name :) This would effectively reduce
> the number of sstables and amount of data per table (CF). I kind of like
> this idea more - yes, a bit more challenge at coding level but obvious
> benefits without extra operational complexity.
>
>
> On Mon, Nov 24, 2014 at 9:32 AM, Andrei Ivanov  wrote:
>>
>> Nikolai,
>>
>> This is more or less what I'm seeing on my cluster then. Trying to
>> switch to bigger sstables right now (1Gb)
>>
>> On Mon, Nov 24, 2014 at 5:18 PM, Nikolai Grigoriev 
>> wrote:
>> > Andrei,
>> >
>> > Oh, Monday mornings...Tb :)
>> >
>> > On Mon, Nov 24, 2014 at 9:12 AM, Andrei Ivanov 
>> > wrote:
>> >>
>> >> Nikolai,
>> >>
>> >> Are you sure about 1.26Gb? Like it doesn't look right - 5195 tables
>> >> with 256Mb table size...
>> >>
>> >> Andrei
>> >>
>> >> On Mon, Nov 24, 2014 at 5:09 PM, Nikolai Grigoriev
>> >> 
>> >> wrote:
>> >> > Jean-Armel,
>> >> >
>> >> > I have only two large tables, the rest is super-small. In the test
>> >> > cluster
>> >> > of 15 nodes the largest table has about 110M rows. Its total size is
>> >> > about
>> >> > 1,26Gb per node (total disk space used per node for that CF). It's
>> >> > got
>> >> > about
>> >> > 5K sstables per node - the sstable size is 256Mb. cfstats on a
>> >> > "healthy"
>> >> > node look like this:
>> >> >
>> >> > Read Count: 8973748
>> >> > Read Latency: 16.130059053251774 ms.
>> >> > Write Count: 32099455
>> >> > Write Latency: 1.6124713938912671 ms.
>> >> > Pending Tasks: 0
>> >> > Table: wm_contacts
>> >> > SSTable count: 5195
>> >> > SSTables in each level: [27/4, 11/10, 104/100, 1053/1000,
>> >> > 4000,
>> >> > 0,
>> >> > 0, 0, 0]
>> >> > Space used (live), bytes: 1266060391852
>> >> > Space used (total), bytes: 1266144170869
>> >> > SSTable Compression Ratio: 0.32604853410787327
>> >> > Number of keys (estimate): 25696000
>> >> > Memtable cell count: 71402
>> >> > Memtable data size, bytes: 26938402
>> >> > Memtable switch count: 9489
>> >> > Local read count: 8973748
>> >> > Local read latency: 17.696 ms
>> >> > Local write count: 32099471
>> >> > Local write latency: 1.732 ms
>> >> > Pending tasks: 0
>> >> > Bloom filter false positives: 32248
>> >> > Bloom filter false ratio: 0.50685
>> >> > Bloom filter space used, bytes: 20744432
>> >> > Compacted partition minimum bytes: 104
>> >> > Compacted partition maximum bytes

Re: Compaction Strategy guidance

2014-11-24 Thread Andrei Ivanov
Nikolai,

This is more or less what I'm seeing on my cluster then. Trying to
switch to bigger sstables right now (1Gb)
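(Concretely, "bigger sstables" here just means bumping the LCS sub-property;
a sketch with placeholder names - note that the new size only applies to
sstables written after the change:

  ALTER TABLE ks.cf
    WITH compaction = { 'class': 'LeveledCompactionStrategy',
                        'sstable_size_in_mb': '1024' };
)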

On Mon, Nov 24, 2014 at 5:18 PM, Nikolai Grigoriev  wrote:
> Andrei,
>
> Oh, Monday mornings...Tb :)
>
> On Mon, Nov 24, 2014 at 9:12 AM, Andrei Ivanov  wrote:
>>
>> Nikolai,
>>
>> Are you sure about 1.26Gb? Like it doesn't look right - 5195 tables
>> with 256Mb table size...
>>
>> Andrei
>>
>> On Mon, Nov 24, 2014 at 5:09 PM, Nikolai Grigoriev 
>> wrote:
>> > Jean-Armel,
>> >
>> > I have only two large tables, the rest is super-small. In the test
>> > cluster
>> > of 15 nodes the largest table has about 110M rows. Its total size is
>> > about
>> > 1,26Gb per node (total disk space used per node for that CF). It's got
>> > about
>> > 5K sstables per node - the sstable size is 256Mb. cfstats on a "healthy"
>> > node look like this:
>> >
>> > Read Count: 8973748
>> > Read Latency: 16.130059053251774 ms.
>> > Write Count: 32099455
>> > Write Latency: 1.6124713938912671 ms.
>> > Pending Tasks: 0
>> > Table: wm_contacts
>> > SSTable count: 5195
>> > SSTables in each level: [27/4, 11/10, 104/100, 1053/1000, 4000,
>> > 0,
>> > 0, 0, 0]
>> > Space used (live), bytes: 1266060391852
>> > Space used (total), bytes: 1266144170869
>> > SSTable Compression Ratio: 0.32604853410787327
>> > Number of keys (estimate): 25696000
>> > Memtable cell count: 71402
>> > Memtable data size, bytes: 26938402
>> > Memtable switch count: 9489
>> > Local read count: 8973748
>> > Local read latency: 17.696 ms
>> > Local write count: 32099471
>> > Local write latency: 1.732 ms
>> > Pending tasks: 0
>> > Bloom filter false positives: 32248
>> > Bloom filter false ratio: 0.50685
>> > Bloom filter space used, bytes: 20744432
>> > Compacted partition minimum bytes: 104
>> > Compacted partition maximum bytes: 3379391
>> > Compacted partition mean bytes: 172660
>> > Average live cells per slice (last five minutes): 495.0
>> > Average tombstones per slice (last five minutes): 0.0
>> >
>> > Another table of similar structure (same number of rows) is about 4x
>> > times
>> > smaller. That table does not suffer from those issues - it compacts well
>> > and
>> > efficiently.
>> >
>> > On Mon, Nov 24, 2014 at 2:30 AM, Jean-Armel Luce 
>> > wrote:
>> >>
>> >> Hi Nikolai,
>> >>
>> >> Please could you clarify a little bit what you call "a large amount of
>> >> data" ?
>> >>
>> >> How many tables ?
>> >> How many rows in your largest table ?
>> >> How many GB in your largest table ?
>> >> How many GB per node ?
>> >>
>> >> Thanks.
>> >>
>> >>
>> >>
>> >> 2014-11-24 8:27 GMT+01:00 Jean-Armel Luce :
>> >>>
>> >>> Hi Nikolai,
>> >>>
>> >>> Thanks for those informations.
>> >>>
>> >>> Please could you clarify a little bit what you call "
>> >>>
>> >>> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev :
>> >>>>
>> >>>> Just to clarify - when I was talking about the large amount of data I
>> >>>> really meant large amount of data per node in a single CF (table).
>> >>>> LCS does
>> >>>> not seem to like it when it gets thousands of sstables (makes 4-5
>> >>>> levels).
>> >>>>
>> >>>> When bootstrapping a new node you'd better enable that option from
>> >>>> CASSANDRA-6621 (the one that disables STCS in L0). But it will still
>> >>>> be a
>> >>>> mess - I have a node that I have bootstrapped ~2 weeks ago. Initially
>> >>>> it had
>> >>>> 7,5K pending compactions, now it has almost stabilized at 4,6K. Does
>> >>>> not go
>> >>>> down. Number of sstables at L0  is over 11K and it is slowly slowly
>> >>>> building
>> >>>> upper levels. Total number of sstables is 4x the normal amount. 

Re: Compaction Strategy guidance

2014-11-24 Thread Andrei Ivanov
Nikolai,

Are you sure about 1.26Gb? Like it doesn't look right - 5195 sstables
with a 256Mb sstable size...
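(A quick back-of-the-envelope check, which is what makes me doubt the
number: 5195 sstables x 256 MB is roughly 1330 GB, i.e. about 1.3 TB, which
also matches the reported "Space used (total)" of 1266144170869 bytes
(~1.27 TB). So I guess it should read Tb rather than Gb.)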

Andrei

On Mon, Nov 24, 2014 at 5:09 PM, Nikolai Grigoriev  wrote:
> Jean-Armel,
>
> I have only two large tables, the rest is super-small. In the test cluster
> of 15 nodes the largest table has about 110M rows. Its total size is about
> 1,26Gb per node (total disk space used per node for that CF). It's got about
> 5K sstables per node - the sstable size is 256Mb. cfstats on a "healthy"
> node look like this:
>
> Read Count: 8973748
> Read Latency: 16.130059053251774 ms.
> Write Count: 32099455
> Write Latency: 1.6124713938912671 ms.
> Pending Tasks: 0
> Table: wm_contacts
> SSTable count: 5195
> SSTables in each level: [27/4, 11/10, 104/100, 1053/1000, 4000, 0,
> 0, 0, 0]
> Space used (live), bytes: 1266060391852
> Space used (total), bytes: 1266144170869
> SSTable Compression Ratio: 0.32604853410787327
> Number of keys (estimate): 25696000
> Memtable cell count: 71402
> Memtable data size, bytes: 26938402
> Memtable switch count: 9489
> Local read count: 8973748
> Local read latency: 17.696 ms
> Local write count: 32099471
> Local write latency: 1.732 ms
> Pending tasks: 0
> Bloom filter false positives: 32248
> Bloom filter false ratio: 0.50685
> Bloom filter space used, bytes: 20744432
> Compacted partition minimum bytes: 104
> Compacted partition maximum bytes: 3379391
> Compacted partition mean bytes: 172660
> Average live cells per slice (last five minutes): 495.0
> Average tombstones per slice (last five minutes): 0.0
>
> Another table of similar structure (same number of rows) is about 4x times
> smaller. That table does not suffer from those issues - it compacts well and
> efficiently.
>
> On Mon, Nov 24, 2014 at 2:30 AM, Jean-Armel Luce  wrote:
>>
>> Hi Nikolai,
>>
>> Please could you clarify a little bit what you call "a large amount of
>> data" ?
>>
>> How many tables ?
>> How many rows in your largest table ?
>> How many GB in your largest table ?
>> How many GB per node ?
>>
>> Thanks.
>>
>>
>>
>> 2014-11-24 8:27 GMT+01:00 Jean-Armel Luce :
>>>
>>> Hi Nikolai,
>>>
>>> Thanks for those informations.
>>>
>>> Please could you clarify a little bit what you call "
>>>
>>> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev :
>>>>
>>>> Just to clarify - when I was talking about the large amount of data I
>>>> really meant large amount of data per node in a single CF (table). LCS does
>>>> not seem to like it when it gets thousands of sstables (makes 4-5 levels).
>>>>
>>>> When bootstrapping a new node you'd better enable that option from
>>>> CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
>>>> mess - I have a node that I have bootstrapped ~2 weeks ago. Initially it 
>>>> had
>>>> 7,5K pending compactions, now it has almost stabilized at 4,6K. Does not go
>>>> down. Number of sstables at L0  is over 11K and it is slowly slowly 
>>>> building
>>>> upper levels. Total number of sstables is 4x the normal amount. Now I am 
>>>> not
>>>> entirely sure if this node will ever get back to normal life. And believe 
>>>> me
>>>> - this is not because of I/O, I have SSDs everywhere and 16 physical cores.
>>>> This machine is barely using 1-3 cores at most of the time. The problem is
>>>> that allowing STCS fallback is not a good option either - it will quickly
>>>> result in a few 200Gb+ sstables in my configuration and then these sstables
>>>> will never be compacted. Plus, it will require close to 2x disk space on
>>>> EVERY disk in my JBOD configuration...this will kill the node sooner or
>>>> later. This is all because all sstables after bootstrap end at L0 and then
>>>> the process slowly slowly moves them to other levels. If you have write
>>>> traffic to that CF then the number of sstables and L0 will grow quickly -
>>>> like it happens in my case now.
>>>>
>>>> Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301
>>>> is implemented it may be better.
>>>>
>>>>
>>>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov 
>>>> wrote:
>>>

Re: Compaction Strategy guidance

2014-11-23 Thread Andrei Ivanov
Jean-Armel,

I have the same problem/state as Nikolai. Here are my stats:
~ 1 table
~ 10B records
~ 2TB/node x 6 nodes

Nikolai,
I'm sort of wondering if switching to some larger sstable_size_in_mb
(say 4096 or 8192 or something) with LCS may be a solution, even if
not absolutely permanent?
As for huge sstables, I do already have some 400-500GB ones. The only
way I can think of to compact them in the future is to offline-split
them at some point. Does it make sense?
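(By offline split I mean something like the sstablesplit tool that ships
with C* - a sketch, assuming the node is stopped first; the path and target
size are placeholders:

  # with the node down:
  tools/bin/sstablesplit --size 256 /var/lib/cassandra/data/<ks>/<cf>/<ks>-<cf>-jb-12345-Data.db
  # the originals are snapshotted unless --no-snapshot is passed;
  # restart the node afterwards

The tool's location may differ per package - it may live in bin/ instead of
tools/bin/.)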

(I'm still doing a test drive and really need to understand how we are
going to handle that in production)

Andrei.



On Mon, Nov 24, 2014 at 10:30 AM, Jean-Armel Luce  wrote:
> Hi Nikolai,
>
> Please could you clarify a little bit what you call "a large amount of data"
> ?
>
> How many tables ?
> How many rows in your largest table ?
> How many GB in your largest table ?
> How many GB per node ?
>
> Thanks.
>
>
>
> 2014-11-24 8:27 GMT+01:00 Jean-Armel Luce :
>>
>> Hi Nikolai,
>>
>> Thanks for those informations.
>>
>> Please could you clarify a little bit what you call "
>>
>> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev :
>>>
>>> Just to clarify - when I was talking about the large amount of data I
>>> really meant large amount of data per node in a single CF (table). LCS does
>>> not seem to like it when it gets thousands of sstables (makes 4-5 levels).
>>>
>>> When bootstrapping a new node you'd better enable that option from
>>> CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
>>> mess - I have a node that I have bootstrapped ~2 weeks ago. Initially it had
>>> 7,5K pending compactions, now it has almost stabilized at 4,6K. Does not go
>>> down. Number of sstables at L0  is over 11K and it is slowly slowly building
>>> upper levels. Total number of sstables is 4x the normal amount. Now I am not
>>> entirely sure if this node will ever get back to normal life. And believe me
>>> - this is not because of I/O, I have SSDs everywhere and 16 physical cores.
>>> This machine is barely using 1-3 cores at most of the time. The problem is
>>> that allowing STCS fallback is not a good option either - it will quickly
>>> result in a few 200Gb+ sstables in my configuration and then these sstables
>>> will never be compacted. Plus, it will require close to 2x disk space on
>>> EVERY disk in my JBOD configuration...this will kill the node sooner or
>>> later. This is all because all sstables after bootstrap end at L0 and then
>>> the process slowly slowly moves them to other levels. If you have write
>>> traffic to that CF then the number of sstables and L0 will grow quickly -
>>> like it happens in my case now.
>>>
>>> Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301
>>> is implemented it may be better.
>>>
>>>
>>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov 
>>> wrote:
>>>>
>>>> Stephane,
>>>>
>>>> We are having a somewhat similar C* load profile. Hence some comments
>>>> in addition to Nikolai's answer.
>>>> 1. Fallback to STCS - you can disable it actually
>>>> 2. Based on our experience, if you have a lot of data per node, LCS
>>>> may work just fine. That is, till the moment you decide to join
>>>> another node - chances are that the newly added node will not be able
>>>> to compact what it gets from old nodes. In your case, if you switch
>>>> strategy the same thing may happen. This is all due to limitations
>>>> mentioned by Nikolai.
>>>>
>>>> Andrei,
>>>>
>>>>
>>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. 
>>>> wrote:
>>>> > ABUSE
>>>> >
>>>> >
>>>> >
>>>> > I DO NOT WANT ANY MORE MAILS, I AM FROM MEXICO
>>>> >
>>>> >
>>>> >
>>>> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
>>>> > Sent: Saturday, 22 November 2014, 07:13 p.m.
>>>> > To: user@cassandra.apache.org
>>>> > Subject: Re: Compaction Strategy guidance
>>>> > Importance: High
>>>> >
>>>> >
>>>> >
>>>> > Stephane,
>>>> >
>>>> > As everything good, LCS comes at certain price.
>>>> >
>>>> > LCS will put most load on you I/O system (if you use spindles - you
>>>> > may need
>>>> > to b

Re: Compaction Strategy guidance

2014-11-23 Thread Andrei Ivanov
Stephane,

We are having a somewhat similar C* load profile. Hence some comments
in addition to Nikolai's answer.
1. Fallback to STCS - you can disable it actually
2. Based on our experience, if you have a lot of data per node, LCS
may work just fine. That is, till the moment you decide to join
another node - chances are that the newly added node will not be able
to compact what it gets from old nodes. In your case, if you switch
strategy the same thing may happen. This is all due to limitations
mentioned by Nikolai.

Andrei.


On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G.  wrote:
> ABUSE
>
>
>
> I DO NOT WANT ANY MORE MAILS, I AM FROM MEXICO
>
>
>
> From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
> Sent: Saturday, 22 November 2014, 07:13 p.m.
> To: user@cassandra.apache.org
> Subject: Re: Compaction Strategy guidance
> Importance: High
>
>
>
> Stephane,
>
> As everything good, LCS comes at certain price.
>
> LCS will put most load on you I/O system (if you use spindles - you may need
> to be careful about that) and on CPU. Also LCS (by default) may fall back to
> STCS if it is falling behind (which is very possible with heavy writing
> activity) and this will result in higher disk space usage. Also LCS has
> certain limitation I have discovered lately. Sometimes LCS may not be able
> to use all your node's resources (algorithm limitations) and this reduces
> the overall compaction throughput. This may happen if you have a large
> column family with lots of data per node. STCS won't have this limitation.
>
>
>
> By the way, the primary goal of LCS is to reduce the number of sstables C*
> has to look at to find your data. With LCS properly functioning this number
> will be most likely between something like 1 and 3 for most of the reads.
> But if you do few reads and not concerned about the latency today, most
> likely LCS may only save you some disk space.
>
>
>
> On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay 
> wrote:
>
> Hi there,
>
>
>
> use case:
>
>
>
> - Heavy write app, few reads.
>
> - Lots of updates of rows / columns.
>
> - Current performance is fine, for both writes and reads..
>
> - Currently using SizedCompactionStrategy
>
>
>
> We're trying to limit the amount of storage used during compaction. Should
> we switch to LeveledCompactionStrategy?
>
>
>
> Thanks
>
>
>
>
> --
>
> Nikolai Grigoriev
> (514) 772-5178


Re: LCS: sstables grow larger

2014-11-18 Thread Andrei Ivanov
Amazing how I missed the -Dcassandra.disable_stcs_in_l0=true option -
I have had the LeveledManifest source open the whole day ;-)
On Tue, Nov 18, 2014 at 9:15 PM, Andrei Ivanov  wrote:
> Thanks a lot for your support, Marcus - that is useful beyond all
> recognition!;-) And I will try #6621 right away.
>
> Sincerely, Andrei.
>
> On Tue, Nov 18, 2014 at 8:50 PM, Marcus Eriksson  wrote:
>> you should stick to as small nodes as possible yes :)
>>
>> There are a few relevant tickets related to bootstrap and LCS:
>> https://issues.apache.org/jira/browse/CASSANDRA-6621 - startup with
>> -Dcassandra.disable_stcs_in_l0=true to not do STCS in L0
>> https://issues.apache.org/jira/browse/CASSANDRA-7460 - (3.0) send source
>> sstable level when bootstrapping
>>
>> On Tue, Nov 18, 2014 at 3:33 PM, Andrei Ivanov  wrote:
>>>
>>> OK, got it.
>>>
>>> Actually, my problem is not that we constantly having many files at
>>> L0. Normally, quite a few of them - that is, nodes are managing to
>>> compact incoming writes in a timely manner.
>>>
>>> But it looks like when we join a new node, it receives tons of files
>>> from existing nodes (and they end up at L0, right?) and that seems to
>>> be where our problems start. In practice, in what I call the "old"
>>> cluster, compaction became a problem at ~2TB nodes. (You, know, we are
>>> trying to save something on HW - we are running on EC2 with EBS
>>> volumes)
>>>
>>> Do I get it right that we'd better stick to smaller nodes?
>>>
>>>
>>>
>>> On Tue, Nov 18, 2014 at 5:20 PM, Marcus Eriksson 
>>> wrote:
>>> > No, they will get compacted into smaller sstables in L1+ eventually
>>> > (once
>>> > you have less than 32 sstables in L0, an ordinary L0 -> L1 compaction
>>> > will
>>> > happen)
>>> >
>>> > But, if you consistently get many files in L0 it means that compaction
>>> > is
>>> > not keeping up with your inserts and you should probably expand your
>>> > cluster
>>> > (or consider going back to SizeTieredCompactionStrategy for the tables
>>> > that
>>> > take that many writes)
>>> >
>>> > /Marcus
>>> >
>>> > On Tue, Nov 18, 2014 at 2:49 PM, Andrei Ivanov 
>>> > wrote:
>>> >>
>>> >> Marcus, thanks a lot! It explains a lot - those huge tables are
>>> >> indeed at L0.
>>> >>
>>> >> It seems that they start to appear as a result of some "massive"
>>> >> operations (join, repair, rebuild). What's their fate in the future?
>>> >> Will they continue to propagate like this through levels? Is there
>>> >> anything that can be done to avoid/solve/prevent this?
>>> >>
>>> >> My fears here are around a feeling that those big tables (like in my
>>> >> "old" cluster) will be hardly compactable in the future...
>>> >>
>>> >> Sincerely, Andrei.
>>> >>
>>> >> On Tue, Nov 18, 2014 at 4:27 PM, Marcus Eriksson 
>>> >> wrote:
>>> >> > I suspect they are getting size tiered in L0 - if you have too many
>>> >> > sstables
>>> >> > in L0, we will do size tiered compaction on sstables in L0 to improve
>>> >> > performance
>>> >> >
>>> >> > Use tools/bin/sstablemetadata to get the level for those sstables, if
>>> >> > they
>>> >> > are in L0, that is probably the reason.
>>> >> >
>>> >> > /Marcus
>>> >> >
>>> >> > On Tue, Nov 18, 2014 at 2:06 PM, Andrei Ivanov 
>>> >> > wrote:
>>> >> >>
>>> >> >> Dear all,
>>> >> >>
>>> >> >> I have the following problem:
>>> >> >> - C* 2.0.11
>>> >> >> - LCS with default 160MB
>>> >> >> - Compacted partition maximum bytes: 785939 (for cf/table xxx.xxx)
>>> >> >> - Compacted partition mean bytes: 6750 (for cf/table xxx.xxx)
>>> >> >>
>>> >> >> I would expect the sstables to be of +- maximum 160MB. Despite this
>>> >> >> I
>>> >> >> see files like:
>>> >> >> 192M Nov 18 13:00 xxx-xxx-jb-15580-Data.db
>>> >> >> or
>>> >> >> 631M Nov 18 13:03 xxx-xxx-jb-15583-Data.db
>>> >> >>
>>> >> >> Am I missing something? What could be the reason? (Actually this is
>>> >> >> a
>>> >> >> "fresh" cluster - on an "old" one I'm seeing 500GB sstables). I'm
>>> >> >> getting really desperate in my attempt to understand what's going
>>> >> >> on.
>>> >> >>
>>> >> >> Thanks in advance Andrei.
>>> >> >
>>> >> >
>>> >
>>> >
>>
>>


Re: LCS: sstables grow larger

2014-11-18 Thread Andrei Ivanov
Thanks a lot for your support, Marcus - that is useful beyond all
recognition!;-) And I will try #6621 right away.

Sincerely, Andrei.

On Tue, Nov 18, 2014 at 8:50 PM, Marcus Eriksson  wrote:
> you should stick to as small nodes as possible yes :)
>
> There are a few relevant tickets related to bootstrap and LCS:
> https://issues.apache.org/jira/browse/CASSANDRA-6621 - startup with
> -Dcassandra.disable_stcs_in_l0=true to not do STCS in L0
> https://issues.apache.org/jira/browse/CASSANDRA-7460 - (3.0) send source
> sstable level when bootstrapping
>
> On Tue, Nov 18, 2014 at 3:33 PM, Andrei Ivanov  wrote:
>>
>> OK, got it.
>>
>> Actually, my problem is not that we constantly having many files at
>> L0. Normally, quite a few of them - that is, nodes are managing to
>> compact incoming writes in a timely manner.
>>
>> But it looks like when we join a new node, it receives tons of files
>> from existing nodes (and they end up at L0, right?) and that seems to
>> be where our problems start. In practice, in what I call the "old"
>> cluster, compaction became a problem at ~2TB nodes. (You, know, we are
>> trying to save something on HW - we are running on EC2 with EBS
>> volumes)
>>
>> Do I get it right that we'd better stick to smaller nodes?
>>
>>
>>
>> On Tue, Nov 18, 2014 at 5:20 PM, Marcus Eriksson 
>> wrote:
>> > No, they will get compacted into smaller sstables in L1+ eventually
>> > (once
>> > you have less than 32 sstables in L0, an ordinary L0 -> L1 compaction
>> > will
>> > happen)
>> >
>> > But, if you consistently get many files in L0 it means that compaction
>> > is
>> > not keeping up with your inserts and you should probably expand your
>> > cluster
>> > (or consider going back to SizeTieredCompactionStrategy for the tables
>> > that
>> > take that many writes)
>> >
>> > /Marcus
>> >
>> > On Tue, Nov 18, 2014 at 2:49 PM, Andrei Ivanov 
>> > wrote:
>> >>
>> >> Marcus, thanks a lot! It explains a lot - those huge tables are
>> >> indeed at L0.
>> >>
>> >> It seems that they start to appear as a result of some "massive"
>> >> operations (join, repair, rebuild). What's their fate in the future?
>> >> Will they continue to propagate like this through levels? Is there
>> >> anything that can be done to avoid/solve/prevent this?
>> >>
>> >> My fears here are around a feeling that those big tables (like in my
>> >> "old" cluster) will be hardly compactable in the future...
>> >>
>> >> Sincerely, Andrei.
>> >>
>> >> On Tue, Nov 18, 2014 at 4:27 PM, Marcus Eriksson 
>> >> wrote:
>> >> > I suspect they are getting size tiered in L0 - if you have too many
>> >> > sstables
>> >> > in L0, we will do size tiered compaction on sstables in L0 to improve
>> >> > performance
>> >> >
>> >> > Use tools/bin/sstablemetadata to get the level for those sstables, if
>> >> > they
>> >> > are in L0, that is probably the reason.
>> >> >
>> >> > /Marcus
>> >> >
>> >> > On Tue, Nov 18, 2014 at 2:06 PM, Andrei Ivanov 
>> >> > wrote:
>> >> >>
>> >> >> Dear all,
>> >> >>
>> >> >> I have the following problem:
>> >> >> - C* 2.0.11
>> >> >> - LCS with default 160MB
>> >> >> - Compacted partition maximum bytes: 785939 (for cf/table xxx.xxx)
>> >> >> - Compacted partition mean bytes: 6750 (for cf/table xxx.xxx)
>> >> >>
>> >> >> I would expect the sstables to be of +- maximum 160MB. Despite this
>> >> >> I
>> >> >> see files like:
>> >> >> 192M Nov 18 13:00 xxx-xxx-jb-15580-Data.db
>> >> >> or
>> >> >> 631M Nov 18 13:03 xxx-xxx-jb-15583-Data.db
>> >> >>
>> >> >> Am I missing something? What could be the reason? (Actually this is
>> >> >> a
>> >> >> "fresh" cluster - on an "old" one I'm seeing 500GB sstables). I'm
>> >> >> getting really desperate in my attempt to understand what's going
>> >> >> on.
>> >> >>
>> >> >> Thanks in advance Andrei.
>> >> >
>> >> >
>> >
>> >
>
>


Re: LCS: sstables grow larger

2014-11-18 Thread Andrei Ivanov
OK, got it.

Actually, my problem is not that we constantly have many files at
L0. Normally there are only a few of them - that is, the nodes are
managing to compact incoming writes in a timely manner.

But it looks like when we join a new node, it receives tons of files
from existing nodes (and they end up at L0, right?) and that seems to
be where our problems start. In practice, in what I call the "old"
cluster, compaction became a problem at ~2TB nodes. (You know, we are
trying to save something on HW - we are running on EC2 with EBS
volumes.)

Do I get it right that we'd better stick to smaller nodes?



On Tue, Nov 18, 2014 at 5:20 PM, Marcus Eriksson  wrote:
> No, they will get compacted into smaller sstables in L1+ eventually (once
> you have less than 32 sstables in L0, an ordinary L0 -> L1 compaction will
> happen)
>
> But, if you consistently get many files in L0 it means that compaction is
> not keeping up with your inserts and you should probably expand your cluster
> (or consider going back to SizeTieredCompactionStrategy for the tables that
> take that many writes)
>
> /Marcus
>
> On Tue, Nov 18, 2014 at 2:49 PM, Andrei Ivanov  wrote:
>>
>> Marcus, thanks a lot! It explains a lot - those huge tables are indeed
>> at L0.
>>
>> It seems that they start to appear as a result of some "massive"
>> operations (join, repair, rebuild). What's their fate in the future?
>> Will they continue to propagate like this through levels? Is there
>> anything that can be done to avoid/solve/prevent this?
>>
>> My fears here are around a feeling that those big tables (like in my
>> "old" cluster) will be hardly compactable in the future...
>>
>> Sincerely, Andrei.
>>
>> On Tue, Nov 18, 2014 at 4:27 PM, Marcus Eriksson 
>> wrote:
>> > I suspect they are getting size tiered in L0 - if you have too many
>> > sstables
>> > in L0, we will do size tiered compaction on sstables in L0 to improve
>> > performance
>> >
>> > Use tools/bin/sstablemetadata to get the level for those sstables, if
>> > they
>> > are in L0, that is probably the reason.
>> >
>> > /Marcus
>> >
>> > On Tue, Nov 18, 2014 at 2:06 PM, Andrei Ivanov 
>> > wrote:
>> >>
>> >> Dear all,
>> >>
>> >> I have the following problem:
>> >> - C* 2.0.11
>> >> - LCS with default 160MB
>> >> - Compacted partition maximum bytes: 785939 (for cf/table xxx.xxx)
>> >> - Compacted partition mean bytes: 6750 (for cf/table xxx.xxx)
>> >>
>> >> I would expect the sstables to be of +- maximum 160MB. Despite this I
>> >> see files like:
>> >> 192M Nov 18 13:00 xxx-xxx-jb-15580-Data.db
>> >> or
>> >> 631M Nov 18 13:03 xxx-xxx-jb-15583-Data.db
>> >>
>> >> Am I missing something? What could be the reason? (Actually this is a
>> >> "fresh" cluster - on an "old" one I'm seeing 500GB sstables). I'm
>> >> getting really desperate in my attempt to understand what's going on.
>> >>
>> >> Thanks in advance Andrei.
>> >
>> >
>
>


Re: LCS: sstables grow larger

2014-11-18 Thread Andrei Ivanov
Marcus, thanks a lot! It explains a lot - those huge tables are indeed at L0.
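(For the record, this is how I checked it, per Marcus's hint below - the
tool prints the level among other metadata; the path is a placeholder:

  tools/bin/sstablemetadata /var/lib/cassandra/data/<ks>/<cf>/xxx-xxx-jb-15583-Data.db | grep -i level
)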

It seems that they start to appear as a result of some "massive"
operations (join, repair, rebuild). What's their fate in the future?
Will they continue to propagate like this through levels? Is there
anything that can be done to avoid/solve/prevent this?

My fear here is that those big sstables (like the ones in my
"old" cluster) will be hard to compact in the future...

Sincerely, Andrei.

On Tue, Nov 18, 2014 at 4:27 PM, Marcus Eriksson  wrote:
> I suspect they are getting size tiered in L0 - if you have too many sstables
> in L0, we will do size tiered compaction on sstables in L0 to improve
> performance
>
> Use tools/bin/sstablemetadata to get the level for those sstables, if they
> are in L0, that is probably the reason.
>
> /Marcus
>
> On Tue, Nov 18, 2014 at 2:06 PM, Andrei Ivanov  wrote:
>>
>> Dear all,
>>
>> I have the following problem:
>> - C* 2.0.11
>> - LCS with default 160MB
>> - Compacted partition maximum bytes: 785939 (for cf/table xxx.xxx)
>> - Compacted partition mean bytes: 6750 (for cf/table xxx.xxx)
>>
>> I would expect the sstables to be of +- maximum 160MB. Despite this I
>> see files like:
>> 192M Nov 18 13:00 xxx-xxx-jb-15580-Data.db
>> or
>> 631M Nov 18 13:03 xxx-xxx-jb-15583-Data.db
>>
>> Am I missing something? What could be the reason? (Actually this is a
>> "fresh" cluster - on an "old" one I'm seeing 500GB sstables). I'm
>> getting really desperate in my attempt to understand what's going on.
>>
>> Thanks in advance Andrei.
>
>


LCS: sstables grow larger

2014-11-18 Thread Andrei Ivanov
Dear all,

I have the following problem:
- C* 2.0.11
- LCS with default 160MB
- Compacted partition maximum bytes: 785939 (for cf/table xxx.xxx)
- Compacted partition mean bytes: 6750 (for cf/table xxx.xxx)

I would expect the sstables to be at most roughly 160MB. Despite this I
see files like:
192M Nov 18 13:00 xxx-xxx-jb-15580-Data.db
or
631M Nov 18 13:03 xxx-xxx-jb-15583-Data.db

Am I missing something? What could be the reason? (Actually this is a
"fresh" cluster - on an "old" one I'm seeing 500GB sstables). I'm
getting really desperate in my attempt to understand what's going on.

Thanks in advance Andrei.