Re: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Prasenjit Sarkar
Jeff,

I don't think you can push the topic of usability back to developers by
asking them to open JIRAs. It is incumbent upon the technical leaders of the
Cassandra community to take the initiative in this regard. We can argue
back and forth on the dynamics of open source projects, but the usability
concerns of Cassandra are a reality that cannot be ignored.

Prasenjit

PS My views, not those of my employer

On Tue, Feb 20, 2018 at 10:22 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> If you watch this video through you'll see why usability is so important.
> You can't ignore usability issues.
>
> Cassandra does not exist in a vacuum.  The competitors are world class.
>
> The video is on the New Cassandra API for Azure Cosmos DB:
> https://www.youtube.com/watch?v=1Sf4McGN1AQ
>
> Kenneth Brotman
>
> -Original Message-
> From: Daniel Hölbling-Inzko [mailto:daniel.hoelbling-in...@bitmovin.com]
> Sent: Tuesday, February 20, 2018 1:28 AM
> To: user@cassandra.apache.org; James Briggs
> Cc: d...@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> Hi,
>
> I have to add my own two cents here as the main thing that keeps me from
> really running Cassandra is the amount of pain running it incurs.
> Not so much because it's actually painful but because the tools are so
> different and the documentation and best practices are scattered across a
> dozen outdated DataStax articles, this mailing list, etc. We've been
> hesitant (although our use case is perfect for using Cassandra) to deploy
> Cassandra to any critical systems as even after a year of running it we
> still don't have the operational experience to confidently run critical
> systems with it.
>
> Simple things like a foolproof / safe cluster-wide S3 backup (like
> Elasticsearch has) would, for example, solve a TON of issues for new
> people. I don't need it auto-scheduled or something, but having to
> configure cron jobs across the whole cluster is a pain in the ass for small
> teams.
> To be honest, even the way snapshots are done right now is already super
> painful. Every other system I have operated so far will just create one backup
> folder I can export; in C* the backup is scattered across a bunch of
> different keyspace folders etc. Needless to say, it took a while until
> I trusted my backup scripts fully.
>
> And especially for a database, I believe backup/restore needs to be a
> non-issue that's documented front and center. If not, smaller teams just
> don't have the resources to dedicate to learning and building the tools
> around it.
>
> Now that the team is getting larger we could spare the resources to
> operate these things, but switching from a well-understood RDBMS schema to
> Cassandra is now incredibly hard and will probably take years.
>
> greetings Daniel
>
> On Tue, 20 Feb 2018 at 05:56 James Briggs 
> wrote:
>
> > Kenneth:
> >
> > What you said is not wrong.
> >
> > Vertica and Riak are examples of distributed databases that don't
> > require hand-holding.
> >
> > Cassandra is for Java-programmer DIYers, or more often Datastax
> > clients, at this point.
> > Thanks, James.
> >
> > --
> > *From:* Kenneth Brotman 
> > *To:* user@cassandra.apache.org
> > *Cc:* d...@cassandra.apache.org
> > *Sent:* Monday, February 19, 2018 4:56 PM
> >
> > *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
> >
> > Jeff, you helped me figure out what I was missing.  It just took me a
> > day to digest what you wrote.  I’m coming over from another type of
> > engineering.  I didn’t know and it’s not really documented.  Cassandra
> > runs in a data center.  Nowadays that means the nodes are going to be
> > in managed containers, Docker containers managed by Kubernetes,
> > Mesos or something, and for that reason anyone operating Cassandra in a
> > real world setting would not encounter the issues I raised in the way I
> described.
> >
> > Shouldn’t the architectural diagrams people reference indicate that in
> > some way?  That would have helped me.
> >
> > Kenneth Brotman
> >
> > *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com]
> > *Sent:* Monday, February 19, 2018 10:43 AM
> > *To:* 'user@cassandra.apache.org'
> > *Cc:* 'd...@cassandra.apache.org'
> > *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
> >
> > Well said.  Very fair.  I wouldn’t mind hearing from others still.
> > You’re a good guy!
> >
> > Kenneth Brotman
> >
> > *From:* Jeff Jirsa [mailto:jji...@gmail.com ]
> > *Sent:* Monday, February 19, 2018 9:10 AM
> > *To:* cassandra
> > *Cc:* Cassandra DEV
> > *Subject:* Re: Cassandra Needs to Grow Up by Version Five!
> >
> > There are a lot of things below I disagree with, but it's ok. I
> > convinced myself not to nit-pick every point.
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-13971 has some of
> > Stefan's work with cert management
> >
> > Beyond that, I encourage 

Re: LEAK DETECTED while minor compaction

2018-02-20 Thread Jeff Jirsa
Your bloom filter settings look broken. Did you set the FP ratio to 0? If so, 
that’s a bad idea and we should have stopped you from doing it.
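For context on the numbers in the question below: the tablestats report roughly 17 GB of off-heap bloom filter memory for an estimated 6.6 billion keys, which is what a very low false-positive target produces. As a rough illustration only (this uses the standard optimal bloom filter sizing formula, m = -n·ln(p) / (ln 2)² bits, not Cassandra's exact implementation), the memory cost grows sharply as the target FP chance shrinks:

```python
import math

def bloom_filter_bytes(num_keys: int, fp_chance: float) -> int:
    """Approximate memory for an optimally sized bloom filter:
    m = -n * ln(p) / (ln 2)^2 bits, returned as bytes."""
    if not 0 < fp_chance < 1:
        raise ValueError("fp_chance must be in (0, 1)")
    bits = -num_keys * math.log(fp_chance) / (math.log(2) ** 2)
    return int(bits / 8)

# "Number of keys (estimate)" from the tablestats in the question below.
keys = 6_614_724_684
for p in (0.1, 0.01, 0.001):
    print(f"fp_chance={p}: ~{bloom_filter_bytes(keys, p) / 2**30:.1f} GiB")
```

In practice, raising the table's `bloom_filter_fp_chance` (e.g. `ALTER TABLE ... WITH bloom_filter_fp_chance = 0.01;`) trades a few extra disk seeks per read for dramatically less off-heap memory.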


-- 
Jeff Jirsa


> On Feb 20, 2018, at 11:01 PM, Дарья Меленцова  wrote:
> 
> Hello.
> 
> Could you help me with a LEAK DETECTED error during minor compaction?
> 
> There is a table with a lot of small records, 6.6*10^9 (mapping
> (eventId, boxId) -> cellId).
> Minor compaction starts and then fails at 99% done with an error:
> 
> Stacktrace
> ERROR [Reference-Reaper:1] 2018-02-05 10:06:17,032 Ref.java:207 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@1ca1bf87) to class
> org.apache.cassandra.io.util.MmappedSegmentedFile$Cleanup@308695651:/storage1/cassandra_events/data/EventsKeyspace/PerBoxEventSeriesEvents-41847c3049a211e6af50b9221207cca8/tmplink-lb-102593-big-Index.db
> was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2018-02-05 10:06:17,033 Ref.java:207 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@1659d4f7) to class
> org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@1398495320:[Memory@[0..dc),
> Memory@[0..898)] was not released before the reference was garbage
> collected
> ERROR [Reference-Reaper:1] 2018-02-05 10:06:17,033 Ref.java:207 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@42978833) to class
> org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@1648504648:[[OffHeapBitSet]]
> was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2018-02-05 10:06:17,033 Ref.java:207 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@3a64a19b) to class
> org.apache.cassandra.io.sstable.format.SSTableReader$DescriptorTypeTidy@863282967:/storage1/cassandra_events/data/EventsKeyspace/PerBoxEventSeriesEvents-41847c3049a211e6af50b9221207cca8/tmplink-lb-102593-big
> was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2018-02-05 10:06:17,033 Ref.java:207 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@4ddc775a) to class
> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Cleanup@1041709510:/storage1/cassandra_events/data/EventsKeyspace/PerBoxEventSeriesEvents-41847c3049a211e6af50b9221207cca8/tmplink-lb-102593-big-Data.db
> was not released before the reference was garbage collected
> 
> I have tried increasing the max heap size (8GB -> 16GB), but got the same error.
> How can I resolve the issue?
> 
> 
> Cassandra parameters and the problem table
> Cassandra v 2.2.9
> MAX_HEAP_SIZE="16G"
> java version "1.8.0_121"
> 
> compaction = {'min_threshold': '4', 'enabled': 'True', 'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32'}
> compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.SnappyCompressor'}
> 
> nodetool tablestats
> Read Count: 1454605
>Read Latency: 2.0174777647540054 ms.
>Write Count: 12034909
>Write Latency: 0.044917336558174224 ms.
>Pending Flushes: 0
>Table: PerBoxEventSeriesEventIds
>SSTable count: 20
>Space used (live): 885969527458
>Space used (total): 885981801994
>Space used by snapshots (total): 0
>Off heap memory used (total): 19706226232
>SSTable Compression Ratio: 0.5722091068132875
>Number of keys (estimate): 6614724684
>Memtable cell count: 375796
>Memtable data size: 31073510
>Memtable off heap memory used: 0
>Memtable switch count: 30
>Local read count: 1454605
>Local read latency: NaN ms
>Local write count: 12034909
>Local write latency: NaN ms
>Pending flushes: 0
>Bloom filter false positives: 0
>Bloom filter false ratio: 0.0
>Bloom filter space used: -4075791744
>Bloom filter off heap memory used: 17399044576
>Index summary off heap memory used: 2091833184
>Compression metadata off heap memory used: 215348472
>Compacted partition minimum bytes: 104
>Compacted partition maximum bytes: 149
>Compacted partition mean bytes: 149
>Average live cells per slice (last five minutes): NaN
>Maximum live cells per slice (last five minutes): 0
>Average tombstones per slice (last five minutes): NaN
>Maximum tombstones per slice (last five minutes): 0
> 
> Thank You
> Darya Melentsova
> 
> email: ifire...@gmail.com
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> 

LEAK DETECTED while minor compaction

2018-02-20 Thread Дарья Меленцова
Hello.

Could you help me with a LEAK DETECTED error during minor compaction?

There is a table with a lot of small records, 6.6*10^9 (mapping
(eventId, boxId) -> cellId).
Minor compaction starts and then fails at 99% done with an error:

Stacktrace
ERROR [Reference-Reaper:1] 2018-02-05 10:06:17,032 Ref.java:207 - LEAK
DETECTED: a reference
(org.apache.cassandra.utils.concurrent.Ref$State@1ca1bf87) to class
org.apache.cassandra.io.util.MmappedSegmentedFile$Cleanup@308695651:/storage1/cassandra_events/data/EventsKeyspace/PerBoxEventSeriesEvents-41847c3049a211e6af50b9221207cca8/tmplink-lb-102593-big-Index.db
was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2018-02-05 10:06:17,033 Ref.java:207 - LEAK
DETECTED: a reference
(org.apache.cassandra.utils.concurrent.Ref$State@1659d4f7) to class
org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@1398495320:[Memory@[0..dc),
Memory@[0..898)] was not released before the reference was garbage
collected
ERROR [Reference-Reaper:1] 2018-02-05 10:06:17,033 Ref.java:207 - LEAK
DETECTED: a reference
(org.apache.cassandra.utils.concurrent.Ref$State@42978833) to class
org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@1648504648:[[OffHeapBitSet]]
was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2018-02-05 10:06:17,033 Ref.java:207 - LEAK
DETECTED: a reference
(org.apache.cassandra.utils.concurrent.Ref$State@3a64a19b) to class
org.apache.cassandra.io.sstable.format.SSTableReader$DescriptorTypeTidy@863282967:/storage1/cassandra_events/data/EventsKeyspace/PerBoxEventSeriesEvents-41847c3049a211e6af50b9221207cca8/tmplink-lb-102593-big
was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2018-02-05 10:06:17,033 Ref.java:207 - LEAK
DETECTED: a reference
(org.apache.cassandra.utils.concurrent.Ref$State@4ddc775a) to class
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Cleanup@1041709510:/storage1/cassandra_events/data/EventsKeyspace/PerBoxEventSeriesEvents-41847c3049a211e6af50b9221207cca8/tmplink-lb-102593-big-Data.db
was not released before the reference was garbage collected

I have tried increasing the max heap size (8GB -> 16GB), but got the same error.
How can I resolve the issue?


Cassandra parameters and the problem table
Cassandra v 2.2.9
MAX_HEAP_SIZE="16G"
java version "1.8.0_121"

compaction = {'min_threshold': '4', 'enabled': 'True', 'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32'}
compression = {'sstable_compression':
'org.apache.cassandra.io.compress.SnappyCompressor'}

nodetool tablestats
Read Count: 1454605
Read Latency: 2.0174777647540054 ms.
Write Count: 12034909
Write Latency: 0.044917336558174224 ms.
Pending Flushes: 0
Table: PerBoxEventSeriesEventIds
SSTable count: 20
Space used (live): 885969527458
Space used (total): 885981801994
Space used by snapshots (total): 0
Off heap memory used (total): 19706226232
SSTable Compression Ratio: 0.5722091068132875
Number of keys (estimate): 6614724684
Memtable cell count: 375796
Memtable data size: 31073510
Memtable off heap memory used: 0
Memtable switch count: 30
Local read count: 1454605
Local read latency: NaN ms
Local write count: 12034909
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: -4075791744
Bloom filter off heap memory used: 17399044576
Index summary off heap memory used: 2091833184
Compression metadata off heap memory used: 215348472
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 149
Compacted partition mean bytes: 149
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0

Thank You
Darya Melentsova

email: ifire...@gmail.com




RE: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Kenneth Brotman
If you watch this video through you'll see why usability is so important.  You 
can't ignore usability issues.  

Cassandra does not exist in a vacuum.  The competitors are world class.  

The video is on the New Cassandra API for Azure Cosmos DB:
https://www.youtube.com/watch?v=1Sf4McGN1AQ

Kenneth Brotman

-Original Message-
From: Daniel Hölbling-Inzko [mailto:daniel.hoelbling-in...@bitmovin.com] 
Sent: Tuesday, February 20, 2018 1:28 AM
To: user@cassandra.apache.org; James Briggs
Cc: d...@cassandra.apache.org
Subject: Re: Cassandra Needs to Grow Up by Version Five!

Hi,

I have to add my own two cents here as the main thing that keeps me from really 
running Cassandra is the amount of pain running it incurs.
Not so much because it's actually painful but because the tools are so 
different and the documentation and best practices are scattered across a dozen 
outdated DataStax articles, this mailing list, etc. We've been hesitant 
(although our use case is perfect for using Cassandra) to deploy Cassandra to 
any critical systems as even after a year of running it we still don't have the 
operational experience to confidently run critical systems with it.

Simple things like a foolproof / safe cluster-wide S3 backup (like 
Elasticsearch has) would, for example, solve a TON of issues for new people. I 
don't need it auto-scheduled or something, but having to configure cron jobs 
across the whole cluster is a pain in the ass for small teams.
To be honest, even the way snapshots are done right now is already super 
painful. Every other system I have operated so far will just create one backup 
folder I can export; in C* the backup is scattered across a bunch of different 
keyspace folders etc. Needless to say, it took a while until I trusted my 
backup scripts fully.

And especially for a database, I believe backup/restore needs to be a non-issue 
that's documented front and center. If not, smaller teams just don't have the 
resources to dedicate to learning and building the tools around it.
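For illustration, here is a minimal sketch of the "one backup folder" idea Daniel asks for: after `nodetool snapshot -t <tag>`, collect every table's snapshot directory from the standard data-directory layout (`data/<keyspace>/<table>/snapshots/<tag>`) into a single tree that can then be shipped to S3. The helper below is hypothetical and assumes that layout:

```python
import os
import shutil

def gather_snapshot(data_dir: str, tag: str, dest: str) -> list:
    """Copy every '<keyspace>/<table>/snapshots/<tag>' directory under
    data_dir into one consolidated backup tree at dest.
    Hypothetical helper -- run after 'nodetool snapshot -t <tag>'."""
    copied = []
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table in sorted(os.listdir(ks_path)):
            snap = os.path.join(ks_path, table, "snapshots", tag)
            if os.path.isdir(snap):
                target = os.path.join(dest, keyspace, table)
                shutil.copytree(snap, target)  # creates parent dirs
                copied.append(target)
    return copied
```

A real backup tool would stream or hard-link the files rather than copy them, and would also capture the schema and token ring, but consolidating everything under one root is the part that removes most of the per-keyspace scripting pain described above.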

Now that the team is getting larger we could spare the resources to operate 
these things, but switching from a well-understood RDBMS schema to Cassandra is 
now incredibly hard and will probably take years.

greetings Daniel

On Tue, 20 Feb 2018 at 05:56 James Briggs 
wrote:

> Kenneth:
>
> What you said is not wrong.
>
> Vertica and Riak are examples of distributed databases that don't 
> require hand-holding.
>
> Cassandra is for Java-programmer DIYers, or more often Datastax 
> clients, at this point.
> Thanks, James.
>
> --
> *From:* Kenneth Brotman 
> *To:* user@cassandra.apache.org
> *Cc:* d...@cassandra.apache.org
> *Sent:* Monday, February 19, 2018 4:56 PM
>
> *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
>
> Jeff, you helped me figure out what I was missing.  It just took me a 
> day to digest what you wrote.  I’m coming over from another type of 
> engineering.  I didn’t know and it’s not really documented.  Cassandra 
> runs in a data center.  Nowadays that means the nodes are going to be 
> in managed containers, Docker containers managed by Kubernetes, 
> Mesos or something, and for that reason anyone operating Cassandra in a 
> real world setting would not encounter the issues I raised in the way I 
> described.
>
> Shouldn’t the architectural diagrams people reference indicate that in 
> some way?  That would have helped me.
>
> Kenneth Brotman
>
> *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com]
> *Sent:* Monday, February 19, 2018 10:43 AM
> *To:* 'user@cassandra.apache.org'
> *Cc:* 'd...@cassandra.apache.org'
> *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
>
> Well said.  Very fair.  I wouldn’t mind hearing from others still.
> You’re a good guy!
>
> Kenneth Brotman
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com ]
> *Sent:* Monday, February 19, 2018 9:10 AM
> *To:* cassandra
> *Cc:* Cassandra DEV
> *Subject:* Re: Cassandra Needs to Grow Up by Version Five!
>
> There are a lot of things below I disagree with, but it's ok. I 
> convinced myself not to nit-pick every point.
>
> https://issues.apache.org/jira/browse/CASSANDRA-13971 has some of 
> Stefan's work with cert management
>
> Beyond that, I encourage you to do what Michael suggested: open JIRAs 
> for things you care strongly about, work on them if you have time. 
> Sometime this year we'll schedule an NGCC (Next Generation Cassandra 
> Conference) where we talk about future project work and direction. I 
> encourage you to attend if you're able (I encourage anyone who cares 
> about the direction of Cassandra to attend; it'll probably be either 
> free or very low cost, just to cover a venue and some food). If 
> nothing else, you'll meet some of the teams who are working on the 
> project, and learn why they've selected the projects on which they're 
> working. You'll have an opportunity to pitch 

Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-20 Thread Jeff Jirsa
You add the nodes with rf=0 so there’s no streaming, then bump it to rf=1 and 
run repair, then rf=2 and run repair, then rf=3 and run repair, then you either 
change the app to use local quorum in the new dc, or reverse the process by 
decreasing the rf in the original dc by 1 at a time
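Jeff's procedure can be laid out as a sequence of ALTER KEYSPACE bumps with a full repair after each one. The keyspace and DC names below are placeholders, and the exact repair flags depend on your Cassandra version; treat this as an outline of the ramp, not a drop-in script:

```python
def rf_ramp_steps(keyspace: str, old_dc: str, new_dc: str, target_rf: int = 3):
    """Generate the ALTER KEYSPACE / repair sequence for ramping a new DC:
    raise the new DC's RF one step at a time, repairing after each bump.
    Sketch only -- run the CQL via cqlsh and the repair via nodetool."""
    steps = []
    for rf in range(1, target_rf + 1):
        steps.append(
            f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', "
            f"'{old_dc}': {target_rf}, '{new_dc}': {rf}}};"
        )
        steps.append(f"nodetool repair -full {keyspace}  # after RF -> {rf}")
    return steps

for step in rf_ramp_steps("myks", "dc1", "dc2"):
    print(step)
```

Once the new DC is at the target RF and repaired, switch the application to LOCAL_QUORUM against the new DC, then reverse the ramp on the old DC.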

-- 
Jeff Jirsa


> On Feb 20, 2018, at 8:51 PM, Kyrylo Lebediev  wrote:
> 
> I'd say, "add new DC, then remove old DC" approach is more risky especially 
> if they use QUORUM CL (in this case they will need to change CL to 
> LOCAL_QUORUM, otherwise they'll run into a lot of blocking read repairs).
> Also, if there is a chance to get rid of streaming, it is worth doing, as 
> direct data copy (not by means of C*) is usually more effective and less 
> troublesome.
> 
> Regards,
> Kyrill
> 
> 
> From: Nitan Kainth 
> Sent: Wednesday, February 21, 2018 1:04:05 AM
> To: user@cassandra.apache.org
> Subject: Re: Best approach to Replace existing 8 smaller nodes in production 
> cluster with New 8 nodes that are bigger in capacity, without a downtime
> 
> You can also create a new DC and then terminate old one.
> 
> Sent from my iPhone
> 
>> On Feb 20, 2018, at 2:49 PM, Kyrylo Lebediev  
>> wrote:
>> 
>> Hi,
>> Consider using this approach, replacing nodes one by one: 
>> https://mrcalonso.com/2016/01/26/cassandra-instantaneous-in-place-node-replacement/
>> 
>> Regards,
>> Kyrill
>> 
>> 
>> From: Leena Ghatpande 
>> Sent: Tuesday, February 20, 2018 10:24:24 PM
>> To: user@cassandra.apache.org
>> Subject: Best approach to Replace existing 8 smaller nodes in production 
>> cluster with New 8 nodes that are bigger in capacity, without a downtime
>> 
>> Best approach to replace the existing 8 smaller nodes in the production 
>> cluster with 8 new nodes that are bigger in capacity, without downtime
>> 
>> We have 4 nodes each in 2 DC, and we want to replace these 8 nodes with new 
>> 8 nodes that are bigger in capacity in terms of RAM,CPU and Diskspace 
>> without a downtime.
>> The RF is currently set to 3, and we have 2 large tables with up to 70 
>> million rows
>> 
>> What would be the best approach to implement this
>>    - Add 1 New Node and Decommission 1 Old node at a time?
>>- Add all New nodes to the cluster, and then decommission old nodes ?
>>If we do this, can we still keep the RF=3 while we have 16 nodes at a 
>> point in the cluster before we start decommissioning?
>>   - How long do we wait between adding a node or decommissioning one to ensure 
>> the process is complete before we proceed?
>>   - Any tool that we can use to monitor if the add/decommission of a node is done 
>> before we proceed to next
>> 
>> Any other suggestion?
>> 
>> 
>> 
> 
> 
> 
> 




Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-20 Thread Kyrylo Lebediev
I'd say, "add new DC, then remove old DC" approach is more risky especially if 
they use QUORUM CL (in this case they will need to change CL to LOCAL_QUORUM, 
otherwise they'll run into a lot of blocking read repairs).
Also, if there is a chance to get rid of streaming, it is worth doing, as 
direct data copy (not by means of C*) is usually more effective and less 
troublesome.

Regards,
Kyrill


From: Nitan Kainth 
Sent: Wednesday, February 21, 2018 1:04:05 AM
To: user@cassandra.apache.org
Subject: Re: Best approach to Replace existing 8 smaller nodes in production 
cluster with New 8 nodes that are bigger in capacity, without a downtime

You can also create a new DC and then terminate old one.

Sent from my iPhone

> On Feb 20, 2018, at 2:49 PM, Kyrylo Lebediev  wrote:
>
> Hi,
> Consider using this approach, replacing nodes one by one: 
> https://mrcalonso.com/2016/01/26/cassandra-instantaneous-in-place-node-replacement/
>
> Regards,
> Kyrill
>
> 
> From: Leena Ghatpande 
> Sent: Tuesday, February 20, 2018 10:24:24 PM
> To: user@cassandra.apache.org
> Subject: Best approach to Replace existing 8 smaller nodes in production 
> cluster with New 8 nodes that are bigger in capacity, without a downtime
>
> Best approach to replace the existing 8 smaller nodes in the production 
> cluster with 8 new nodes that are bigger in capacity, without downtime
>
> We have 4 nodes each in 2 DC, and we want to replace these 8 nodes with new 8 
> nodes that are bigger in capacity in terms of RAM,CPU and Diskspace without a 
> downtime.
> The RF is currently set to 3, and we have 2 large tables with up to 70 
> million rows
>
> What would be the best approach to implement this
> - Add 1 New Node and Decommission 1 Old node at a time?
> - Add all New nodes to the cluster, and then decommission old nodes ?
> If we do this, can we still keep the RF=3 while we have 16 nodes at a 
> point in the cluster before we start decommissioning?
>    - How long do we wait between adding a node or decommissioning one to ensure 
> the process is complete before we proceed?
>    - Any tool that we can use to monitor if the add/decommission of a node is done 
> before we proceed to next
>
> Any other suggestion?
>
>
>






Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-20 Thread kurt greaves
Probably a lot of work, but it would be incredibly useful for vnodes if
flushing were range-aware (to be used with RangeAwareCompactionStrategy).
The writers are already range aware for JBOD, but that's not terribly
valuable ATM.

On 20 February 2018 at 21:57, Jeff Jirsa  wrote:

> There are some arguments to be made that the flush should consider
> compaction strategy - would allow a big flush to respect LCS file sizes or
> break into smaller pieces to try to minimize range overlaps going from L0
> into L1, for example.
>
> I have no idea how much work would be involved, but may be worthwhile.
>
>
> --
> Jeff Jirsa
>
>
> On Feb 20,  2018, at 1:26 PM, Jon Haddad  wrote:
>
> The file format is independent from compaction.  A compaction strategy
> only selects sstables to be compacted; that’s its only job.  It could have
> side effects, like generating other files, but any decent compaction
> strategy will account for the fact that those other files don’t exist.
>
> I wrote a blog post a few months ago going over some of the nuance of
> compaction you might find informative: http://thelastpickle.com/blog/2017/
> 03/16/compaction-nuance.html
>
> This is also the wrong mailing list, please direct future user questions
> to the user list.  The dev list is for development of Cassandra itself.
>
> Jon
>
> On Feb 20, 2018, at 1:10 PM, Carl Mueller 
> wrote:
>
> When memtables/CommitLogs are flushed to disk/sstable, does the sstable go
> through sstable organization specific to each compaction strategy, or is
> the sstable creation the same for all compaction strategies, and it is up to the
> compaction strategy to recompact the sstable if desired?
>
>
>


Re: vnode random token assignment and replicated data antipatterns

2018-02-20 Thread kurt greaves
>
> Outside of rack awareness, would the next primary ranges take the replica
> ranges?

Yes.


Re: Performance Of IN Queries On Wide Rows

2018-02-20 Thread Eric Stevens
Someone can correct me if I'm wrong, but I believe if you do a large IN()
on a single partition's clustering keys, all the reads are going to be served
by a single replica.  With many concurrent individual equality statements,
by contrast, you can get the performance gain of leaning on several replicas
for parallelism.
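The fan-out Eric describes can be sketched client-side. The `execute` callable below is a stand-in for a real driver session (with the DataStax Python driver you would typically use prepared statements and `session.execute_async` rather than a thread pool, but the shape is the same):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_individually(execute, keys, max_workers=8):
    """Issue one single-key statement per clustering key, concurrently,
    instead of one large IN(...) query. 'execute' stands in for a driver
    session call; results come back in the same order as 'keys'."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(execute, keys))

# Fake 'session' for illustration: pretend each key resolves to a row.
rows = fetch_individually(lambda k: {"key": k, "value": k * 2}, [1, 2, 3])
print(rows)
```

The trade-off is more round trips and coordinator work per statement, so whether this beats a single IN depends on key count, replica count, and consistency level; it is worth measuring both ways, as Gareth's observation below suggests.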

On Tue, Feb 20, 2018 at 11:43 AM Gareth Collins 
wrote:

> Hello,
>
> When querying large wide rows for multiple specific values is it
> better to do separate queries for each value...or do it with one query
> and an "IN"? I am using Cassandra 2.1.14
>
> I am asking because I had changed my app to use 'IN' queries and it
> **appears** to be slower rather than faster. I had assumed that the
> "IN" query should be faster...as I assumed it only needs to go down
> the read path once (i.e. row cache -> memtable -> key cache -> bloom
> filter -> index summary -> index -> compaction -> sstable) rather than
> once for each entry? Or are there some additional caveats that I
> should be aware of for 'IN' query performance (e.g. ordering of 'IN'
> query entries, closeness of 'IN' query values in the SSTable etc.)?
>
> thanks in advance,
> Gareth Collins
>
>
>


Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-20 Thread Nitan Kainth
You can also create a new DC and then terminate old one.

Sent from my iPhone

> On Feb 20, 2018, at 2:49 PM, Kyrylo Lebediev  wrote:
> 
> Hi,
> Consider using this approach, replacing nodes one by one: 
> https://mrcalonso.com/2016/01/26/cassandra-instantaneous-in-place-node-replacement/
> 
> Regards,
> Kyrill
> 
> 
> From: Leena Ghatpande 
> Sent: Tuesday, February 20, 2018 10:24:24 PM
> To: user@cassandra.apache.org
> Subject: Best approach to Replace existing 8 smaller nodes in production 
> cluster with New 8 nodes that are bigger in capacity, without a downtime
> 
> Best approach to replace the existing 8 smaller nodes in the production 
> cluster with 8 new nodes that are bigger in capacity, without downtime
> 
> We have 4 nodes each in 2 DC, and we want to replace these 8 nodes with new 8 
> nodes that are bigger in capacity in terms of RAM,CPU and Diskspace without a 
> downtime.
> The RF is currently set to 3, and we have 2 large tables with up to 70 
> million rows
> 
> What would be the best approach to implement this
> - Add 1 New Node and Decommission 1 Old node at a time?
> - Add all New nodes to the cluster, and then decommission old nodes ?
> If we do this, can we still keep the RF=3 while we have 16 nodes at a 
> point in the cluster before we start decommissioning?
>    - How long do we wait between adding a node or decommissioning one to ensure 
> the process is complete before we proceed?
>    - Any tool that we can use to monitor if the add/decommission of a node is done 
> before we proceed to next
> 
> Any other suggestion?
> 
> 
> 




Installing the common service to start cassandrea

2018-02-20 Thread Jeff Hechter


Hi,

I have Cassandra running on my machine (Windows). I have downloaded
commons-daemon-1.1.0-bin-windows.zip and extracted it to
cassandra\bin\daemon. I successfully created the service using
cassandra.bat -install.

When I go to start the service I get the error below. When I start it from the
command line it works fine. Any idea where I can update the location of the
IBM JRE?

The description for Event ID 2 from source IBM Java cannot be found. Either
the component that raises this event is not installed on your local
computer or the installation is corrupted. You can install or repair the
component on the local computer.


Thank You
Jeff Hechter

Scrum Master - Spectrum Control Install Development

Phone: 1-520-799-5146
Email : jhech...@us.ibm.com



Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-20 Thread Jeff Jirsa
There are some arguments to be made that the flush should consider compaction 
strategy - it would allow a big flush to respect LCS file sizes or break into 
smaller pieces to try to minimize range overlaps going from L0 into L1, for 
example.

I have no idea how much work would be involved, but may be worthwhile.
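As a toy illustration of the idea (not how Cassandra's flush writer actually works), splitting one large flush into LCS-sized pieces might look like this, using LCS's default `sstable_size_in_mb` of 160:

```python
def split_flush(total_bytes: int, target_sstable_bytes: int = 160 * 2**20):
    """Toy illustration: break one big flush into LCS-sized pieces
    (160 MiB is LCS's default sstable_size_in_mb).
    Returns the byte size of each resulting piece."""
    if total_bytes <= 0:
        return []
    full, rest = divmod(total_bytes, target_sstable_bytes)
    sizes = [target_sstable_bytes] * full
    if rest:
        sizes.append(rest)
    return sizes

print(split_flush(500 * 2**20))  # a 500 MiB flush
```

The real work would be in making the flush writer partition the memtable by token range or target size while preserving sort order, which is why this is an open design question rather than a settings tweak.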


-- 
Jeff Jirsa


> On Feb 20,  2018, at 1:26 PM, Jon Haddad  wrote:
> 
> The file format is independent from compaction.  A compaction strategy only 
> selects sstables to be compacted; that’s its only job.  It could have side 
> effects, like generating other files, but any decent compaction strategy will 
> account for the fact that those other files don’t exist. 
> 
> I wrote a blog post a few months ago going over some of the nuance of 
> compaction you might find informative: 
> http://thelastpickle.com/blog/2017/03/16/compaction-nuance.html
> 
> This is also the wrong mailing list, please direct future user questions to 
> the user list.  The dev list is for development of Cassandra itself.
> 
> Jon
> 
>> On Feb 20, 2018, at 1:10 PM, Carl Mueller  
>> wrote:
>> 
>> When memtables/CommitLogs are flushed to disk/sstable, does the sstable go
>> through sstable organization specific to each compaction strategy, or is
>> the sstable creation the same for all compactionstrats and it is up to the
>> compaction strategy to recompact the sstable if desired?
> 


Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-20 Thread Jon Haddad
The file format is independent from compaction.  A compaction strategy only 
selects sstables to be compacted; that’s its only job.  It could have side 
effects, like generating other files, but any decent compaction strategy will 
account for the fact that those other files don’t exist. 

I wrote a blog post a few months ago going over some of the nuances of 
compaction you might find informative: 
http://thelastpickle.com/blog/2017/03/16/compaction-nuance.html 


This is also the wrong mailing list, please direct future user questions to the 
user list.  The dev list is for development of Cassandra itself.

Jon
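Jon's point - that a strategy's only job is choosing which sstables to merge next - can be illustrated with a toy size-tiered-style selector. This is plain Python with made-up names for illustration only; Cassandra's real strategies are Java classes with far more logic.

```python
# Toy illustration: a compaction strategy just picks which sstables to merge;
# it does not define the file format. Simplified size-tiered-style grouping.
def pick_compaction_candidates(sstable_sizes, min_threshold=4, bucket_ratio=1.5):
    """Group sstables of similar size; return the first bucket big enough to compact."""
    buckets = []
    for size in sorted(sstable_sizes):
        for bucket in buckets:
            if size <= bucket[0] * bucket_ratio:  # close enough in size to merge
                bucket.append(size)
                break
        else:
            buckets.append([size])                # start a new size bucket
    for bucket in buckets:
        if len(bucket) >= min_threshold:          # enough similar files to bother
            return bucket
    return []

print(pick_compaction_candidates([10, 11, 12, 13, 400, 900]))  # → [10, 11, 12, 13]
```

Whatever the strategy returns, the merged output is written in the same sstable format as a memtable flush - which is exactly why the format is independent of the strategy.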

> On Feb 20, 2018, at 1:10 PM, Carl Mueller  
> wrote:
> 
> When memtables/CommitLogs are flushed to disk/sstable, does the sstable go
> through sstable organization specific to each compaction strategy, or is
> the sstable creation the same for all compactionstrats and it is up to the
> compaction strategy to recompact the sstable if desired?



Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?

2018-02-20 Thread Jürgen Albersdorfer
We archive data in order to make assumptions on it in the future. So, yes, we 
expect to grow continuously. In the meantime I have learned to go for predictable 
growth per partition rather than unpredictably large partitions. So today we 
are growing by 250,000,000 records per day going into a single table, and heading 
toward about 100 times that number this year. A partition will grow by one 
record a day, which should give us good horizontal scalability, but means 
250,000,000 to 25,000,000,000 partitions. I hope these numbers shouldn't make me 
feel uncomfortable :)
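A quick back-of-envelope check of those figures (all numbers taken from the message above; "one record per partition per day" means the partition count tracks the daily record volume):

```python
# Sanity check: 250M records/day now, ~100x by year end,
# one record per partition per day -> partition count == daily record count.
records_per_day_now = 250_000_000
records_per_day_target = 100 * records_per_day_now

partitions_now = records_per_day_now        # each daily record lands in its own partition
partitions_target = records_per_day_target

print(f"{partitions_now:,} -> {partitions_target:,} partitions")
# → 250,000,000 -> 25,000,000,000 partitions
```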

Sent from my iPhone

> On 20.02.2018 at 21:39, Jeff Jirsa wrote:
> 
> At a past job, we set the limit at around 60 hosts per cluster - anything 
> bigger than that got single token. Anything smaller, and we'd just tolerate 
> the inconveniences of vnodes. But that was before the new vnode token 
> allocation went into 3.0, and really assumed things that may not be true for 
> you (it was a cluster that started at 60 hosts and grew up to 480 in steps, 
> so we'd want to grow quickly - having single token allowed us to grow from 
> 60-120 in 2 days, and then 120-180 in 2 days, and so on).
> 
> Are you always going to be growing, or is it a short/temporary thing?
> There are users of vnodes (at big, public companies) that go up into the 
> hundreds of nodes.
> 
> Most people running cassandra start sharding clusters rather than going past 
> a thousand or so nodes - I know there's at least one person I talked to in 
> IRC with a 1700 host cluster, but that'd be beyond what I'd ever do 
> personally.
> 
> 
> 
>> On Tue, Feb 20, 2018 at 12:34 PM, Jürgen Albersdorfer 
>>  wrote:
>> Thanks Jeff,
>> your answer is really not what I expected to learn - it means more 
>> manual work as soon as we start really using C*. But I‘m happy to be able 
>> to learn it now and still have time to learn the necessary skills and ask 
>> the right questions about how to correctly drive big data with C* before we 
>> actually start using it, and I‘m glad to have people like you around caring 
>> about these questions. Thanks. This still convinces me I bet on the 
>> right horse, even if it might become a rough ride.
>> 
>> By the way, is it possible to migrate toward smaller token ranges? What 
>> is the recommended way of doing so? And at which number of nodes is the 
>> typical ‚break even‘?
>> 
>> Sent from my iPhone
>> 
>>> On 20.02.2018 at 21:05, Jeff Jirsa wrote:
>>> 
>>> The scenario you describe is the typical point where people move away from 
>>> vnodes and towards single-token-per-node (or a much smaller number of 
>>> vnodes).
>>> 
>>> The default setting puts you in a situation where virtually all hosts are 
>>> adjacent/neighbors to all others (at least until you're way into the 
>>> hundreds of hosts), which means you'll stream from nearly all hosts. If you 
>>> drop the number of vnodes from ~256 to ~4 or ~8 or ~16, you'll see the 
>>> number of streams drop as well.
>>> 
>>> Many people with "large" clusters statically allocate tokens to make it 
>>> predictable - if you have a single token per host, you can add multiple 
>>> hosts at a time, each streaming from a small number of neighbors, without 
>>> overlap.
>>> 
>>> It takes a bit more tooling (or manual token calculation) outside of 
>>> cassandra, but works well in practice for "large" clusters.
>>> 
>>> 
>>> 
>>> 
 On Tue, Feb 20, 2018 at 4:42 AM, Jürgen Albersdorfer 
  wrote:
 Hi, I'm wondering whether it is possible - and whether it would make sense - 
 to limit concurrent streaming when joining a new node to the cluster.
 
 I'm currently operating a 15-Node C* Cluster (V 3.11.1) and joining 
 another Node every day.
 The 'nodetool netstats' shows it always streams data from all other nodes.
 
 How far will this scale? What happens when I have hundreds or even 
 thousands of nodes?
 
 Does anyone have experience with such a situation?
 
 Thanks, and regards
 Jürgen
>>> 
> 


Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-20 Thread Kyrylo Lebediev
Hi,
Consider using this approach, replacing nodes one by one: 
https://mrcalonso.com/2016/01/26/cassandra-instantaneous-in-place-node-replacement/

Regards,
Kyrill


From: Leena Ghatpande 
Sent: Tuesday, February 20, 2018 10:24:24 PM
To: user@cassandra.apache.org
Subject: Best approach to Replace existing 8 smaller nodes in production 
cluster with New 8 nodes that are bigger in capacity, without a downtime

Best approach to replace the existing 8 smaller nodes in a production cluster with 
8 new nodes that are bigger in capacity, without downtime

We have 4 nodes each in 2 DC, and we want to replace these 8 nodes with new 8 
nodes that are bigger in capacity in terms of RAM,CPU and Diskspace without a 
downtime.
The RF is set to 3 currently, and we have 2 large tables with up to 70 million 
rows.

What would be the best approach to implement this?
 - Add 1 new node and decommission 1 old node at a time?
 - Add all new nodes to the cluster, and then decommission the old nodes?
 If we do this, can we still keep RF=3 while we have 16 nodes in the 
cluster, before we start decommissioning?
- How long do we wait between adding or decommissioning a node, to ensure 
the process is complete before we proceed?
- Is there any tool we can use to check that the add/decommission is done 
before we proceed to the next node?

Any other suggestion?





Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?

2018-02-20 Thread Jeff Jirsa
At a past job, we set the limit at around 60 hosts per cluster - anything
bigger than that got single token. Anything smaller, and we'd just tolerate
the inconveniences of vnodes. But that was before the new vnode token
allocation went into 3.0, and really assumed things that may not be true
for you (it was a cluster that started at 60 hosts and grew up to 480 in
steps, so we'd want to grow quickly - having single token allowed us to
grow from 60-120 in 2 days, and then 120-180 in 2 days, and so on).

Are you always going to be growing, or is it a short/temporary thing?
There are users of vnodes (at big, public companies) that go up into the
hundreds of nodes.

Most people running cassandra start sharding clusters rather than going
past a thousand or so nodes - I know there's at least one person I talked
to in IRC with a 1700 host cluster, but that'd be beyond what I'd ever do
personally.



On Tue, Feb 20, 2018 at 12:34 PM, Jürgen Albersdorfer <
jalbersdor...@gmail.com> wrote:

> Thanks Jeff,
> your answer is really not what I expected to learn - it means more
> manual work as soon as we start really using C*. But I‘m happy to be able
> to learn it now and still have time to learn the necessary skills and ask
> the right questions about how to correctly drive big data with C* before we
> actually start using it, and I‘m glad to have people like you around caring
> about these questions. Thanks. This still convinces me I bet on the
> right horse, even if it might become a rough ride.
>
> By the way, is it possible to migrate toward smaller token ranges?
> What is the recommended way of doing so? And at which number of nodes is
> the typical ‚break even‘?
>
> Sent from my iPhone
>
> On 20.02.2018 at 21:05, Jeff Jirsa wrote:
>
> The scenario you describe is the typical point where people move away from
> vnodes and towards single-token-per-node (or a much smaller number of
> vnodes).
>
> The default setting puts you in a situation where virtually all hosts are
> adjacent/neighbors to all others (at least until you're way into the
> hundreds of hosts), which means you'll stream from nearly all hosts. If you
> drop the number of vnodes from ~256 to ~4 or ~8 or ~16, you'll see the
> number of streams drop as well.
>
> Many people with "large" clusters statically allocate tokens to make it
> predictable - if you have a single token per host, you can add multiple
> hosts at a time, each streaming from a small number of neighbors, without
> overlap.
>
> It takes a bit more tooling (or manual token calculation) outside of
> cassandra, but works well in practice for "large" clusters.
>
>
>
>
> On Tue, Feb 20, 2018 at 4:42 AM, Jürgen Albersdorfer <
> jalbersdor...@gmail.com> wrote:
>
>> Hi, I'm wondering whether it is possible - and whether it would make sense -
>> to limit concurrent streaming when joining a new node to the cluster.
>>
>> I'm currently operating a 15-Node C* Cluster (V 3.11.1) and joining
>> another Node every day.
>> The 'nodetool netstats' shows it always streams data from all other nodes.
>>
>> How far will this scale? What happens when I have hundreds or even
>> thousands of nodes?
>>
>> Does anyone have experience with such a situation?
>>
>> Thanks, and regards
>> Jürgen
>>
>
>


Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?

2018-02-20 Thread Jürgen Albersdorfer
Thanks Jeff,
your answer is really not what I expected to learn - it means more manual 
work as soon as we start really using C*. But I‘m happy to be able to learn it 
now and still have time to learn the necessary skills and ask the right 
questions about how to correctly drive big data with C* before we actually 
start using it, and I‘m glad to have people like you around caring about these 
questions. Thanks. This still convinces me I bet on the right horse, even 
if it might become a rough ride.

By the way, is it possible to migrate toward smaller token ranges? What is 
the recommended way of doing so? And at which number of nodes is the typical 
‚break even‘?

Sent from my iPhone

> On 20.02.2018 at 21:05, Jeff Jirsa wrote:
> 
> The scenario you describe is the typical point where people move away from 
> vnodes and towards single-token-per-node (or a much smaller number of vnodes).
> 
> The default setting puts you in a situation where virtually all hosts are 
> adjacent/neighbors to all others (at least until you're way into the hundreds 
> of hosts), which means you'll stream from nearly all hosts. If you drop the 
> number of vnodes from ~256 to ~4 or ~8 or ~16, you'll see the number of 
> streams drop as well.
> 
> Many people with "large" clusters statically allocate tokens to make it 
> predictable - if you have a single token per host, you can add multiple hosts 
> at a time, each streaming from a small number of neighbors, without overlap.
> 
> It takes a bit more tooling (or manual token calculation) outside of 
> cassandra, but works well in practice for "large" clusters.
> 
> 
> 
> 
>> On Tue, Feb 20, 2018 at 4:42 AM, Jürgen Albersdorfer 
>>  wrote:
>> Hi, I'm wondering whether it is possible - and whether it would make sense - 
>> to limit concurrent streaming when joining a new node to the cluster.
>> 
>> I'm currently operating a 15-Node C* Cluster (V 3.11.1) and joining another 
>> Node every day.
>> The 'nodetool netstats' shows it always streams data from all other nodes.
>> 
>> How far will this scale? What happens when I have hundreds or even 
>> thousands of nodes?
>> 
>> Does anyone have experience with such a situation?
>> 
>> Thanks, and regards
>> Jürgen
> 


Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-20 Thread Leena Ghatpande
Best approach to replace the existing 8 smaller nodes in a production cluster with 
8 new nodes that are bigger in capacity, without downtime

We have 4 nodes each in 2 DC, and we want to replace these 8 nodes with new 8 
nodes that are bigger in capacity in terms of RAM,CPU and Diskspace without a 
downtime.
The RF is set to 3 currently, and we have 2 large tables with up to 70 million 
rows.

What would be the best approach to implement this?
 - Add 1 new node and decommission 1 old node at a time?
 - Add all new nodes to the cluster, and then decommission the old nodes?
 If we do this, can we still keep RF=3 while we have 16 nodes in the 
cluster, before we start decommissioning?
- How long do we wait between adding or decommissioning a node, to ensure 
the process is complete before we proceed?
- Is there any tool we can use to check that the add/decommission is done 
before we proceed to the next node?

Any other suggestion?



Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?

2018-02-20 Thread Jeff Jirsa
The scenario you describe is the typical point where people move away from
vnodes and towards single-token-per-node (or a much smaller number of
vnodes).

The default setting puts you in a situation where virtually all hosts are
adjacent/neighbors to all others (at least until you're way into the
hundreds of hosts), which means you'll stream from nearly all hosts. If you
drop the number of vnodes from ~256 to ~4 or ~8 or ~16, you'll see the
number of streams drop as well.

Many people with "large" clusters statically allocate tokens to make it
predictable - if you have a single token per host, you can add multiple
hosts at a time, each streaming from a small number of neighbors, without
overlap.

It takes a bit more tooling (or manual token calculation) outside of
cassandra, but works well in practice for "large" clusters.
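The effect Jeff describes can be sketched with a toy ring simulation. This only counts primary-range owners whose ranges a joining node's random tokens would split (real streaming also involves replicas, and this is random placement, not the 3.0+ token allocator), but the trend it shows is the point: many vnodes means nearly every host is a stream source, few vnodes means only a handful.

```python
import random

# Toy model: how many existing hosts own the ranges that a joining
# node's randomly generated tokens fall into?
def distinct_stream_sources(num_hosts, vnodes_per_host, seed=42):
    rng = random.Random(seed)
    ring = sorted((rng.random(), host)
                  for host in range(num_hosts)
                  for _ in range(vnodes_per_host))
    new_tokens = [rng.random() for _ in range(vnodes_per_host)]
    owners = set()
    for t in new_tokens:
        # The owner of the range containing t is the next vnode clockwise.
        owner = next((h for tok, h in ring if tok >= t), ring[0][1])
        owners.add(owner)
    return len(owners)

for v in (256, 16, 4, 1):
    print(v, "vnodes ->", distinct_stream_sources(60, v), "stream sources")
```

With 60 hosts and 256 vnodes, essentially every host ends up streaming; with 1-4 tokens the joining node touches only a few neighbors, which is what makes parallel single-token expansion predictable.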




On Tue, Feb 20, 2018 at 4:42 AM, Jürgen Albersdorfer <
jalbersdor...@gmail.com> wrote:

> Hi, I'm wondering whether it is possible - and whether it would make sense -
> to limit concurrent streaming when joining a new node to the cluster.
>
> I'm currently operating a 15-Node C* Cluster (V 3.11.1) and joining
> another Node every day.
> The 'nodetool netstats' shows it always streams data from all other nodes.
>
> How far will this scale? What happens when I have hundreds or even
> thousands of nodes?
>
> Does anyone have experience with such a situation?
>
> Thanks, and regards
> Jürgen
>


Performance Of IN Queries On Wide Rows

2018-02-20 Thread Gareth Collins
Hello,

When querying large wide rows for multiple specific values, is it
better to do separate queries for each value... or to do it with one query
and an "IN"? I am using Cassandra 2.1.14.

I am asking because I had changed my app to use 'IN' queries and it
**appears** to be slower rather than faster. I had assumed that the
"IN" query should be faster...as I assumed it only needs to go down
the read path once (i.e. row cache -> memtable -> key cache -> bloom
filter -> index summary -> index -> compaction -> sstable) rather than
once for each entry? Or are there some additional caveats that I
should be aware of for 'IN' query performance (e.g. ordering of 'IN'
query entries, closeness of 'IN' query values in the SSTable etc.)?

thanks in advance,
Gareth Collins




Re: Cassandra counter readtimeout error

2018-02-20 Thread Carl Mueller
How "hot" are your partition keys in these counters?

I would think, theoretically, if specific partition keys are getting
thousands of counter increments/mutations updates, then compaction won't
"compact" those together into the final value, and you'll start
experiencing the problems people get with rows with thousands of tombstones.

So if you had an event 'birthdaypartyattendance'

and you had 1110 separate updates doing +1s/+2s/+3s to the attendance count
for that event (what a bday party!), then when you go to select the
final attendance value, many of those increments may still be on other
nodes and not fully replicated, so it will have to read 1110 cells and
accumulate them into the final value. When replication has completed and
compaction runs, it should amalgamate those. QUORUM writes will help
ensure the counter mutations are written to the proper number of nodes,
with the usual three-node wait overhead.
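The read-cost difference in that example can be shown with a very loose model (this deliberately ignores how counter shards are actually stored internally; it just contrasts "sum many unmerged cells" with "read one merged value"):

```python
# Toy model: before compaction/replication collapse them, a counter read
# must visit every unmerged increment cell; afterwards it reads one value.
increments = [1] * 500 + [2] * 400 + [3] * 210   # 1110 separate +1/+2/+3 updates

cells_read_before_merge = len(increments)        # work a pre-compaction read does
final_value = sum(increments)                    # what a compacted read returns

print(cells_read_before_merge, "cells ->", final_value, "attendees")
```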

DISCLAIMER: I don't have working knowledge of the code in distributed
counters. I just know they are a really hard problem and don't work great
in 2.x. As said, 3.x seems to be a lot better.

On Mon, Feb 19, 2018 at 10:43 AM, Alain RODRIGUEZ 
wrote:

> Hi Javier,
>
> Glad to hear it is solved now. Cassandra 3.11.1 should be a more stable
> version and 3.11 a better series.
>
> Excuse my misunderstanding, your table seems to be better designed than
> thought.
>
> Welcome to the Apache Cassandra community!
>
> C*heers ;-)
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
> 2018-02-19 9:31 GMT+00:00 Javier Pareja :
>
>> Hi,
>>
>> Thank you for your reply.
>>
>> As I was bothered by this problem, last night I upgraded the cluster to
>> version 3.11.1 and everything is working now. As far as I can tell the
>> counter table can be read now. I will be doing more testing today with this
>> version but it is looking good.
>>
>> To answer your questions:
>> - I might not have explained the table definition very well but the table
>> does not have 6 partitions, but 6 partition keys. There are thousands of
>> partitions in that table, a combination of all those partition keys. I also
>> made sure that the partitions remained small when designing the table.
>> - I also enabled tracing in the CQLSH but it showed nothing when querying
>> this row. It however did when querying other tables...
>>
>> Thanks again for your reply!! I am very excited to be part of the
>> Cassandra user base.
>>
>> Javier
>>
>>
>>
>> F Javier Pareja
>>
>> On Mon, Feb 19, 2018 at 8:08 AM, Alain RODRIGUEZ 
>> wrote:
>>
>>>
>>> Hello,
>>>
>>> This table has 6 partition keys, 4 primary keys and 5 counters.
>>>
>>>
>>> I think the root issue is this ^. There might be some inefficiencies or
>>> issues with counters, but this design makes Cassandra relatively
>>> inefficient in most cases, whether using standard columns or counters.
>>>
>>> Cassandra data is supposed to be well distributed for a maximal
>>> efficiency. With only 6 partitions, if you have 6+ nodes, there is 100%
>>> chances that the load is fairly imbalanced. If you have less nodes, it's
still probably poorly balanced. Ideally Cassandra reads from a small number of
sstables, in parallel across many nodes, to split the work and
make queries efficient; but in this case Cassandra is most probably reading huge
partitions from one node. When the size of the request is too
>>> big it can timeout. I am not sure how pagination works with counters, but I
>>> believe even if pagination is working, at some point, you are just reading
>>> too much (or too inefficiently) and the timeout is reached.
>>>
>>> I imagined it worked well for a while as counters are very small columns
>>> / tables compared to any event data but at some point you might have
>>> reached a 'physical' limit, because you are pulling *all* the information
>>> you need from one partition (and probably many SSTables)
>>>
>>> Is there really no other way to design this use case?
>>>
>>> When data starts to be inserted, I can query the counters correctly from
 that particular row but after a few minutes updating the table with
 thousands of events, I get a read timeout every time

>>>
>>> Troubleshot:
>>> - Use tracing to understand what takes so long with your queries
>>> - Check for warnings / errors in the logs. Cassandra tends to complain when it
>>> is unhappy with its configuration. There is a lot of interesting detail there,
>>> and it's been a while since I last had a failure with no relevant information in the logs.
>>> - Check SSTable per read and other read performances for this counter
>>> table. Using some monitoring could make the reason of this timeout obvious.
>>> If you use Datadog for example, I guess that a quick look at the "Read
>>> Path" Dashboard would help. If you are using 

Re: vnode random token assignment and replicated data antipatterns

2018-02-20 Thread Carl Mueller
Ahhh, the topology strategy does that.

But if one were to maintain the same rack topology and was adding nodes
just within the racks... Hm, might not be possible with new nodes. Although
AWS "racks" are at the availability zone level IIRC, so that would be doable.

Outside of rack awareness, would the next primary ranges take the replica
ranges?

On Tue, Feb 20, 2018 at 11:45 AM, Jon Haddad  wrote:

> That’s why you use NTS + a snitch; it picks replicas based on rack
> awareness.
>
>
> On Feb 20, 2018, at 9:33 AM, Carl Mueller 
> wrote:
>
> So in theory, one could double a cluster by:
>
> 1) moving snapshots of each node to a new node.
> 2) for each snapshot moved, figure out the primary range of the new node
> by taking the old node's primary range token and calculating the midpoint
> value between that and the next primary range start token
> 3) the RFs should be preserved since the snapshot have a replicated set of
> data for the old primary range, the next primary has a RF already, and so
> does the n+1 primary range already
>
> data distribution will be the same as the old primary range distribution.
>
> Then nodetool cleanup and repair would get rid of old data ranges not needed
> anymore.
>
> In practice, is this possible? I have heard Priam can double clusters and
> they do not use vnodes. I am assuming they do a similar approach but they
> only have to calculate single tokens?
>
> On Tue, Feb 20, 2018 at 11:21 AM, Carl Mueller <
> carl.muel...@smartthings.com> wrote:
>
>> As I understand it: Replicas of data are replicated to the next primary
>> range owner.
>>
>> As tokens are randomly generated (at least in 2.1.x that I am on), can't
>> we have this situation:
>>
>> Say we have RF3, but the tokens happen to line up where:
>>
>> NodeA handles 0-10
>> NodeB handles  11-20
>> NodeA handles 21-30
>> NodeB handles 31-40
>> NodeC handles 40-50
>>
>> The key aspect of that is that the random assignment of primary range
>> vnode tokens has resulted in NodeA and NodeB being the primaries for four
>> adjacent primary ranges.
>>
>> IF RF is replicated by going to the next adjacent nodes in the primary
>> range, and we are, say RF3, then B will have a replica of A, and then the
>> THIRD REPLICA IS BACK ON A.
>>
>> Is the RF distribution durable to this by ignoring the reappearance of A
>> and then cycling through until a unique node (NodeC) is encountered, and
>> then that becomes the third replica?
>>
>>
>>
>>
>
>


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Russell Bateman
I ask Cassandra to be a database that is high-performance and highly 
scalable with no single point of failure. Anything "cool" added 
beyond that must come only as a separate, optional ring around Cassandra 
and must not get in the way of my usage.


Yes, I would like some help with some of what's listed here, but you 
should understand that most shops adopting Cassandra are already going 
to have DevOps/database management personnel, expertise, methods, 
protocols and, in some instances, tools already in place. Even the small 
shop I work in has guys saddled with taking care of Cassandra (I'm a 
developer and not one of these guys) who seem not to share these 
concerns because they've already got it covered (like the specific YAML 
configuration complaint).


If there were an option or two I'd like to see, one would be the ability 
to duplicate data centers exactly (as part of what we stipulate when 
creating our KEYSPACE), but this is probably something I want because of 
what we were doing up until or what we wanted when we adopted Cassandra 
for our future product direction. I would also like to see an option in 
Cassandra configuration for absolutely locking out access to certain 
commands (like DROP TABLE, DROP INDEX, and DELETE).


From my point of view as a developer, I've had to do many of these 
things also for MongoDB, PostgreSQL, MySQL and other databases over my 
career.


I'm not criticizing these concerns and suggestions. I'm just pointing 
out that, in my opinion, not everything said here is in the realm of, 
"duh, Cassandra needs to grow up."


There's so much right about Cassandra, from the great, unequaled 
technology to the very liberal licensing model without which I could not 
be here.


Russ Bateman


On 02/18/2018 10:39 PM, Kenneth Brotman wrote:


Cassandra feels like an unfinished program to me.  The problem is not 
that it’s open source or cutting edge.  It’s an open source cutting 
edge program that lacks some of its basic functionality.  We are all 
stuck addressing fundamental mechanical tasks for Cassandra because 
the basic code that would do that part has not been contributed yet.


Ease of use issues need to be given much more attention.  For an 
administrator, the ease of use of Cassandra is very poor.


Furthermore, currently Cassandra is an idiot.  We have to do 
everything for Cassandra. Contrast that with the fact that we are in 
the dawn of artificial intelligence.


Software exists to automate tasks for humans, not mechanize humans to 
administer tasks for a database.  I’m an engineering type.  My job is 
to apply science and technology to solve real world problems.  And 
that’s where I need an organization’s I.T. talent to focus; not in 
crank starting an unfinished database.


For example, I should be able to go to any node, replace the 
Cassandra.yaml file and have a prompt on the display ask me if I want 
to update all the yaml files across the cluster.  I shouldn’t have to 
manually modify yaml files on each node or have to create a script for 
some third party automation tool to do it.


I should not have to turn off service, clear directories, restart 
service in coordination with the other nodes.  It’s already a computer 
system.  It can do those things on its own.


How about read repair?  First, there is something wrong with the name.  
Maybe it should be called Consistency Repair.  An administrator 
shouldn’t have to do anything.  It should be a behavior of Cassandra 
that is programmed in. It should consider the GC setting of each node, 
calculate how often it has to run repair, when it should run it so all 
the nodes aren’t trying at the same time and when other circumstances 
indicate it should also run it.
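The scheduling behavior suggested above can be sketched in a few lines. This is a toy illustration of the idea only (names and the safety factor are made up, not a Cassandra feature): derive a repair interval from gc_grace_seconds and stagger start times so nodes don't all repair at once.

```python
# Toy sketch: repair each node well within gc_grace_seconds, with start
# offsets staggered across the cluster so repairs don't overlap.
def repair_schedule(gc_grace_seconds, num_nodes, safety_factor=2):
    interval = gc_grace_seconds // safety_factor   # finish well within gc_grace
    stagger = interval // num_nodes                # spread nodes across the window
    return interval, [n * stagger for n in range(num_nodes)]

interval, offsets = repair_schedule(864_000, 6)    # gc_grace default: 10 days
print(interval, offsets)
```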


Certificate management should be automated.

Cluster wide management should be a big theme in any next major 
release. What is a major release?  How many major releases could a 
program have before all the coding for basic stuff like installation, 
configuration and maintenance is included!


Finish the basic coding of Cassandra, make it easy to use for 
administrators, make it smart, add cluster wide management.  Keep 
Cassandra competitive or it will soon be the old Model T we all 
remember fondly.


I ask the Committee to compile a list of all such items, make a plan, 
and commit to including the completed and tested code as part of major 
release 5.0.  I further ask that release 4.0 not be delayed and then 
there be an unusually short skip to version 5.0.


Kenneth Brotman





Re: vnode random token assignment and replicated data antipatterns

2018-02-20 Thread Jon Haddad
That’s why you use NTS + a snitch; it picks replicas based on rack awareness.

> On Feb 20, 2018, at 9:33 AM, Carl Mueller  
> wrote:
> 
> So in theory, one could double a cluster by:
> 
> 1) moving snapshots of each node to a new node.
> 2) for each snapshot moved, figure out the primary range of the new node by 
> taking the old node's primary range token and calculating the midpoint value 
> between that and the next primary range start token
> 3) the RFs should be preserved since the snapshot have a replicated set of 
> data for the old primary range, the next primary has a RF already, and so 
> does the n+1 primary range already
> 
> data distribution will be the same as the old primary range distribution.
> 
> Then nodetool cleanup and repair would get rid of old data ranges not needed 
> anymore.
> 
> In practice, is this possible? I have heard Priam can double clusters and 
> they do not use vnodes. I am assuming they do a similar approach but they 
> only have to calculate single tokens?
> 
> On Tue, Feb 20, 2018 at 11:21 AM, Carl Mueller wrote:
> As I understand it: Replicas of data are replicated to the next primary range 
> owner. 
> 
> As tokens are randomly generated (at least in 2.1.x that I am on), can't we 
> have this situation:
> 
> Say we have RF3, but the tokens happen to line up where:
> 
> NodeA handles 0-10
> NodeB handles  11-20
> NodeA handles 21-30
> NodeB handles 31-40
> NodeC handles 40-50
> 
> The key aspect of that is that the random assignment of primary range vnode 
> tokens has resulted in NodeA and NodeB being the primaries for four adjacent 
> primary ranges. 
> 
> IF RF is replicated by going to the next adjacent nodes in the primary range, 
> and we are, say RF3, then B will have a replica of A, and then the THIRD 
> REPLICA IS BACK ON A. 
> 
> Is the RF distribution durable to this by ignoring the reappearance of A and 
> then cycling through until a unique node (NodeC) is encountered, and then 
> that becomes the third replica?
> 
> 
> 
> 



Re: vnode random token assignment and replicated data antipatterns

2018-02-20 Thread Carl Mueller
So in theory, one could double a cluster by:

1) moving snapshots of each node to a new node.
2) for each snapshot moved, figure out the primary range of the new node by
taking the old node's primary range token and calculating the midpoint
value between that and the next primary range start token
3) the RFs should be preserved since the snapshot have a replicated set of
data for the old primary range, the next primary has a RF already, and so
does the n+1 primary range already

data distribution will be the same as the old primary range distribution.

Then nodetool cleanup and repair would get rid of old data ranges not needed
anymore.

In practice, is this possible? I have heard Priam can double clusters and
they do not use vnodes. I am assuming they do a similar approach but they
only have to calculate single tokens?
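Step 2 of the scheme above - the midpoint between consecutive primary-range start tokens - is straightforward to compute if you account for wraparound. A sketch, assuming Murmur3's signed 64-bit token space (illustration only, not Priam's actual code):

```python
# Midpoint between each node token and the next token clockwise on the ring,
# handling the wrap from the highest token back to the lowest.
RING = 2**64

def midpoints(tokens):
    us = sorted(t % RING for t in tokens)           # map to unsigned ring positions
    mids = []
    for i, u in enumerate(us):
        nxt = us[(i + 1) % len(us)]
        span = (nxt - u) % RING                     # clockwise distance, wrapping
        m = (u + span // 2) % RING
        mids.append(m - RING if m >= 2**63 else m)  # back to signed tokens
    return mids

print(midpoints([-100, 0, 100]))  # → [50, -9223372036854775808, -50]
```

A new node placed at each midpoint splits every old primary range in half, which is the token arithmetic the doubling scheme relies on.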

On Tue, Feb 20, 2018 at 11:21 AM, Carl Mueller  wrote:

> As I understand it: Replicas of data are replicated to the next primary
> range owner.
>
> As tokens are randomly generated (at least in 2.1.x that I am on), can't
> we have this situation:
>
> Say we have RF3, but the tokens happen to line up where:
>
> NodeA handles 0-10
> NodeB handles 11-20
> NodeA handles 21-30
> NodeB handles 31-40
> NodeC handles 41-50
>
> The key aspect of that is that the random assignment of primary range
> vnode tokens has resulted in NodeA and NodeB being the primaries for four
> adjacent primary ranges.
>
> IF RF is replicated by going to the next adjacent nodes in the primary
> range, and we are, say RF3, then B will have a replica of A, and then the
> THIRD REPLICA IS BACK ON A.
>
> Is the RF distribution durable to this by ignoring the reappearance of A
> and then cycling through until a unique node (NodeC) is encountered, and
> then that becomes the third replica?
>
>
>
>


vnode random token assignment and replicated data antipatterns

2018-02-20 Thread Carl Mueller
As I understand it: Replicas of data are replicated to the next primary
range owner.

As tokens are randomly generated (at least in 2.1.x that I am on), can't we
have this situation:

Say we have RF3, but the tokens happen to line up where:

NodeA handles 0-10
NodeB handles 11-20
NodeA handles 21-30
NodeB handles 31-40
NodeC handles 41-50

The key aspect of that is that the random assignment of primary range vnode
tokens has resulted in NodeA and NodeB being the primaries for four
adjacent primary ranges.

IF RF is replicated by going to the next adjacent nodes in the primary
range, and we are, say RF3, then B will have a replica of A, and then the
THIRD REPLICA IS BACK ON A.

Is the RF distribution durable to this by ignoring the reappearance of A
and then cycling through until a unique node (NodeC) is encountered, and
then that becomes the third replica?
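As I understand it, the answer is yes: replica placement walks the ring clockwise and skips an endpoint that is already in the replica set, so the third replica lands on NodeC rather than back on NodeA. A toy sketch of that walk (illustrative only, not the actual SimpleStrategy code; the ring below mirrors the example in the question):

```python
def replicas_for(ring_owners, start_index, rf):
    """ring_owners: the node owning each consecutive token range, in ring order.
    Walk clockwise from start_index, skipping nodes already chosen."""
    distinct = len(set(ring_owners))
    replicas = []
    i = start_index
    while len(replicas) < min(rf, distinct):
        node = ring_owners[i % len(ring_owners)]
        if node not in replicas:        # ignore the reappearance of a node
            replicas.append(node)
        i += 1
    return replicas

# The ring from the example: A, B, A, B, C own consecutive primary ranges.
print(replicas_for(["A", "B", "A", "B", "C"], 0, 3))  # ['A', 'B', 'C']
```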


Re: Rapid scaleup of cassandra nodes with snapshots and initial_token in the yaml

2018-02-20 Thread Carl Mueller
Ok, so vnode tokens are random assignments under normal circumstances (I'm on
2.1.x; I'm assuming a derivative approach was in the works that would avoid
some hot-node aspects of random primary range assignment for new nodes once
you had one or two or three in a cluster).

So... couldn't I just "engineer" the token assignments to be new primary
ranges that derive from the replica I am pulling the sstables from - say,
take the primary range of the previous node's tokens and just take the
midpoint of its range? If we stand up enough nodes, the implicit hot-range
problem here is mitigated.

The sstables can have data in them that is outside the primary range,
correct? Nodetool cleanup can get rid of data it doesn't need.

That leaves replicated ranges. Is there any means of replicated-range
distribution that isn't the node with the next range slice in the overall
primary range? If we are splitting the old node's primary range, then the
replicas would travel with it and the new node would instantly become a
replica of the old node; the next primary ranges also have the replicas.


On Fri, Feb 16, 2018 at 3:58 PM, Carl Mueller 
wrote:

> Thanks. Yeah, it appears this would only be doable if we didn't have
> vnodes and used old single token clusters. I guess Priam has something
> where you increase the cluster by whole number multiples. Then there's the
> issue of doing quorum read/writes if there suddenly is a new replica range
> with grey-area ownership/responsibility for the range, like where
> LOCAL_QUORUM becomes a bit ill-defined if more than one node is being added
> to a cluster.
>
> I guess the only way that would work is if the nodes were some multiple of
> the vnode count and vnodes distributed themselves consistently, so that
> expansions of RF multiples might be consistent and precomputable for
> responsible ranges.
>
> I will read that talk.
>
> On Thu, Feb 15, 2018 at 7:39 PM, kurt greaves 
> wrote:
>
>> Ben did a talk
>> 
>> that might have some useful information. It's much more complicated with
>> vnodes though and I doubt you'll be able to get it to be as rapid as you'd
>> want.
>>
>> sets up schema to match
>>
>> This shouldn't be necessary. You'd just join the node as usual but with
>> auto_bootstrap: false and let the schema be propagated.
>>
>> Is there an issue if the vnodes tokens for two nodes are identical? Do
>>> they have to be distinct for each node?
>>
>> Yeah. This is annoying I know. The new node will take over the tokens of
>> the old node, which you don't want.
>>
>>
>>> Basically, I was wondering if we just use this to double the number of
>>> nodes with identical copies of the node data via snapshots, and then later
>>> on cassandra can pare down which nodes own which data.
>>
>> There wouldn't be much point to adding nodes with the same (or almost the
>> same) tokens. That would just be shifting load. You'd essentially need a
>> very smart allocation algorithm to come up with good token ranges, but then
>> you still have the problem of tracking down the relevant SSTables from the
>> nodes. Basically, bootstrap does this for you ATM and only streams the
>> relevant sections of SSTables for the new node. If you were doing it from
>> backups/snapshots you'd need to either do the same thing (eek) or copy all
>> the SSTables from all the relevant nodes.
>>
>> With single token nodes this becomes much easier. You can likely get away
>> with only copying around double/triple the data (depending on how you add
>> tokens to the ring and RF and node count).
>>
>> I'll just put it out there that C* is a database and really isn't
>> designed to be rapidly scalable. If you're going to try, be prepared to
>> invest A LOT of time into it.
>> ​
>>
>
>


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Carl Mueller
I think what is really necessary is providing table-level recipes for
storing data. We need a lot of real world examples and the resulting
schema, compaction strategies, and tunings that were performed for them.
Right now I don't see such crucial cookbook data in the project.

AI is a bit ridiculous, we'd need to AI a collection of big data systems,
and then cassandra would need to have an entirely separate AI system
ingesting ALL THE DATA that comes into the already Big Data system in order
to... what... what would we have the AI do? Restructure schemas? Switch
compaction strategies? Add/subtract nodes? Increase/decrease RF? Those are
all insane things to allocate to AI approaches which are not transparent to
the factors and processing that led to the conclusions. The best we could
hope for are recommendations.

On Tue, Feb 20, 2018 at 5:39 AM, Kyrylo Lebediev 
wrote:

> Agree with you, Daniel, regarding gaps in documentation.
>
>
> ---
>
> At the same time I disagree with the folks who are complaining in this
> thread about some functionality like 'advanced backup' etc. being missing
> out of the box.
>
> We all live in the time where there are literally tons of open-source
> tools (automation, monitoring) and languages are available, also there are
> some really powerful SaaS solutions on the market which support C*
> (Datadog, for instance).
>
>
> For example, while C* provides basic building blocks for anti-entropy
> repairs [I mean basic usage of 'nodetool repair' is not suitable for
> large production clusters], Reaper (many thanks to Spotify and
> TheLastPickle!) which uses this basic functionality solves the  task very
> well for real-world C* setups.
>
>
> Something is missing / could be improved in your opinion? We're in the era
> of open source: create your own tool, let's say for C* backup automation
> using EBS snapshots, and upload it to GitHub.
>
>
> C* is a DB-engine, not a fully-automated self-contained suite.
> End-users are able to work on automation of routine [3rd party projects],
> meanwhile C* contributors may focus on core functionality.
>
> --
>
> Going back to the documentation topic, as far as I understand, DataStax is
> no longer the main C* contributor and is focused on its own C*-based
> proprietary software [correct me, somebody, if I'm wrong].
>
> This has led us to the situation when development of C* is progressing (as
> far as I understand, work is done mainly by some large C* users having
> enough resources to contribute to the C* project to get the features they
> need), but there is no single company which has taken over actualization of
> C* documentation / Wiki.
>
> Honestly, even DataStax's documentation is  too concise and  is missing a
> lot of important details.
>
> [BTW, I've just taken a look at https://cassandra.apache.org/doc/latest/
> and it looks not that 'bad': despite the TODOs it contains a lot of
> valuable information]
>
>
> So, I feel the C* community has to join efforts on enriching the existing
> documentation / resurrecting the Wiki [where howtos and information about
> 3rd-party automations and integrations can be placed].
>
> By the Community I mean all of us including myself.
>
>
>
> Regards,
>
> Kyrill
> --
> *From:* Daniel Hölbling-Inzko 
> *Sent:* Tuesday, February 20, 2018 11:28:13 AM
> *To:* user@cassandra.apache.org; James Briggs
>
> *Cc:* d...@cassandra.apache.org
> *Subject:* Re: Cassandra Needs to Grow Up by Version Five!
>
> Hi,
>
> I have to add my own two cents here as the main thing that keeps me from
> really running Cassandra is the amount of pain running it incurs.
> Not so much because it's actually painful but because the tools are so
> different and the documentation and best practices are scattered across a
> dozen outdated DataStax articles and this mailing list etc.. We've been
> hesitant (although our use case is perfect for using Cassandra) to deploy
> Cassandra to any critical systems as even after a year of running it we
> still don't have the operational experience to confidently run critical
> systems with it.
>
> Simple things like a foolproof / safe cluster-wide S3 Backup (like
> Elasticsearch has it) would for example solve a TON of issues for new
> people. I don't need it auto-scheduled or something, but having to
> configure cron jobs across the whole cluster is a pain in the ass for small
> teams.
> To be honest, even the way snapshots are done right now is already super
> painful. Every other system I operated so far will just create one backup
> folder I can export, in C* the Backup is scattered across a bunch of
> different Keyspace folders etc.. needless to say that it took a while until
> I trusted my backup scripts fully.
>
> And especially for a Database I believe Backup/Restore needs to be a
> non-issue that's documented front and center. If not smaller teams just
> don't have the resources to dedicate to learning and building the tools
> 

Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?

2018-02-20 Thread Nicolas Guyomar
Yes, you are right: it limits how much data a node will send in total while
streaming data (repair, bootstrap etc.) to other nodes, so that it does not
affect this node's performance.

Bootstrapping is initiated by the bootstrapping node itself, which
determines, based on its tokens, which nodes to ask data from; it then
computes its "streaming plan" and initializes every session at the same time
(look at org.apache.cassandra.streaming.StreamResultFuture#init).

But I can see in the code that connections for sessions are made
sequentially in a fixed-size pool executor limited by the node's number of
processors, so, if I understand the code correctly, this might be the limit
you are looking for.

You should therefore have a limited number of ongoing streaming sessions at
any given time because of that limit, but I have to admit that because of
all the async methods/callbacks in the code I might be wrong :(
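The bounding effect of a fixed-size pool can be sketched in a few lines (illustrative Python, not Cassandra's actual Java classes; `open_session` is a hypothetical stand-in for connecting to a peer and streaming):

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor

peak = 0     # highest number of simultaneously "open" sessions observed
active = 0
lock = threading.Lock()

def open_session(peer):
    """Stand-in for connecting to `peer` and streaming data from it."""
    global peak, active
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.001)            # pretend to stream
    with lock:
        active -= 1

workers = os.cpu_count() or 1    # pool sized by processor count, as described
with ThreadPoolExecutor(max_workers=workers) as pool:
    for peer in ["node%d" % i for i in range(100)]:  # 100 sessions submitted
        pool.submit(open_session, peer)

# However many sessions were submitted, no more than `workers` ran at once.
print(peak <= workers)  # True
```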


On 20 February 2018 at 14:08, Jürgen Albersdorfer 
wrote:

> Hi Nicolas,
> I have seen that ' stream_throughput_outbound_megabits_per_sec', but
> afaik this limits what each node will provide at a maximum.
> What I'm more concerned about is the vast number of connections to handle and
> the concurrent threads, of which at least two get started for every single
> streaming connection.
> I'm a former Java developer and I know that threads are expensive even
> just sitting around. The JVM does fine handling some hundreds of
> threads, but what about some thousands?
>
> And about the cleanup - we are currently massively scaling out a startup
> database. I thought of doing cleanup after that ramp-up phase when scaling
> slows down to maybe a node every 3 to 7 days on average.
> So for now I want to add 32 machines one after the other and then take care of
> the cleanup afterwards one by one.
>
> regards,
> Jürgen
>
> 2018-02-20 13:56 GMT+01:00 Nicolas Guyomar :
>
>> Hi Jurgen,
>>
>> stream_throughput_outbound_megabits_per_sec is the "given total
>> throughput in Mbps", so it does limit the "concurrent throughput" IMHO,
>> is it not what you are looking for?
>>
>> The only limits I can think of are:
>> - the number of connections between every node and the one bootstrapping
>> - the number of pending compactions (especially if you have lots of
>> keyspaces/tables) that could lead to some JVM problems, maybe?
>>
>> Anyway, because while bootstrapping, a node is not accepting reads,
>> configuration like compactionthroughput, concurrentcompactor and
>> streamingthroughput can be set on the fly using nodetool, so you can
>> quickly adjust them.
>>
>> Out of curiosity, do you run "nodetool cleanup" in parallel on every
>> node left after a bootstrap, or do you spread the "cleanup load"? I have
>> not yet seen someone adding a node every day like this ;) have fun!
>>
>>
>>
>> On 20 February 2018 at 13:42, Jürgen Albersdorfer <
>> jalbersdor...@gmail.com> wrote:
>>
>>> Hi, I'm wondering if it is possible resp. would it make sense to limit
>>> concurrent streaming when joining a new node to cluster.
>>>
>>> I'm currently operating a 15-Node C* Cluster (V 3.11.1) and joining
>>> another Node every day.
>>> The 'nodetool netstats' shows it always streams data from all other
>>> nodes.
>>>
>>> How far will this scale? - What happens when I have hundreds or even
>>> thousands of nodes?
>>>
>>> Has anyone experience with such a Situation?
>>>
>>> Thanks, and regards
>>> Jürgen
>>>
>>
>>
>


Save the date: ApacheCon North America, September 24-27 in Montréal

2018-02-20 Thread Rich Bowen

Dear Apache Enthusiast,

(You’re receiving this message because you’re subscribed to a user@ or 
dev@ list of one or more Apache Software Foundation projects.)


We’re pleased to announce the upcoming ApacheCon [1] in Montréal, 
September 24-27. This event is all about you — the Apache project community.


We’ll have four tracks of technical content this time, as well as lots 
of opportunities to connect with your project community, hack on the 
code, and learn about other related (and unrelated!) projects across the 
foundation.


The Call For Papers (CFP) [2] and registration are now open. Register 
early to take advantage of the early bird prices and secure your place 
at the event hotel.


Important dates
March 30: CFP closes
April 20: CFP notifications sent
August 24: Hotel room block closes (please do not wait until the last
minute)


Follow @ApacheCon on Twitter to be the first to hear announcements about 
keynotes, the schedule, evening events, and everything you can expect to 
see at the event.


See you in Montréal!

Sincerely, Rich Bowen, V.P. Events,
on behalf of the entire ApacheCon team

[1] http://www.apachecon.com/acna18
[2] https://cfp.apachecon.com/conference.html?apachecon-north-america-2018

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?

2018-02-20 Thread Jürgen Albersdorfer
Hi Nicolas,
I have seen that ' stream_throughput_outbound_megabits_per_sec', but afaik
this limits what each node will provide at a maximum.
What I'm more concerned about is the vast number of connections to handle and
the concurrent threads, of which at least two get started for every single
streaming connection.
I'm a former Java developer and I know that threads are expensive even
just sitting around. The JVM does fine handling some hundreds of
threads, but what about some thousands?

And about the cleanup - we are currently massively scaling out a startup
database. I thought of doing cleanup after that ramp-up phase when scaling
slows down to maybe a node every 3 to 7 days on average.
So for now I want to add 32 machines one after the other and then take care of
the cleanup afterwards one by one.

regards,
Jürgen

2018-02-20 13:56 GMT+01:00 Nicolas Guyomar :

> Hi Jurgen,
>
> stream_throughput_outbound_megabits_per_sec is the "given total
> throughput in Mbps", so it does limit the "concurrent throughput" IMHO,
> is it not what you are looking for?
>
> The only limits I can think of are:
> - the number of connections between every node and the one bootstrapping
> - the number of pending compactions (especially if you have lots of
> keyspaces/tables) that could lead to some JVM problems, maybe?
>
> Anyway, because while bootstrapping, a node is not accepting reads,
> configuration like compactionthroughput, concurrentcompactor and
> streamingthroughput can be set on the fly using nodetool, so you can
> quickly adjust them.
>
> Out of curiosity, do you run "nodetool cleanup" in parallel on every node
> left after a bootstrap, or do you spread the "cleanup load"? I have not
> yet seen someone adding a node every day like this ;) have fun!
>
>
>
> On 20 February 2018 at 13:42, Jürgen Albersdorfer  > wrote:
>
>> Hi, I'm wondering if it is possible resp. would it make sense to limit
>> concurrent streaming when joining a new node to cluster.
>>
>> I'm currently operating a 15-Node C* Cluster (V 3.11.1) and joining
>> another Node every day.
>> The 'nodetool netstats' shows it always streams data from all other nodes.
>>
>> How far will this scale? - What happens when I have hundreds or even
>> thousands of nodes?
>>
>> Has anyone experience with such a Situation?
>>
>> Thanks, and regards
>> Jürgen
>>
>
>


Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?

2018-02-20 Thread Nicolas Guyomar
Hi Jurgen,

stream_throughput_outbound_megabits_per_sec is the "given total throughput
in Mbps", so it does limit the "concurrent throughput" IMHO, is it not what
you are looking for?

The only limits I can think of are:
- the number of connections between every node and the one bootstrapping
- the number of pending compactions (especially if you have lots of
keyspaces/tables) that could lead to some JVM problems, maybe?

Anyway, because a node is not accepting reads while bootstrapping,
configuration like compactionthroughput, concurrentcompactor and
streamingthroughput can be set on the fly using nodetool, so you can
quickly adjust them.

Out of curiosity, do you run "nodetool cleanup" in parallel on every node
left after a bootstrap, or do you spread the "cleanup load"? I have not
yet seen someone adding a node every day like this ;) have fun!



On 20 February 2018 at 13:42, Jürgen Albersdorfer 
wrote:

> Hi, I'm wondering if it is possible resp. would it make sense to limit
> concurrent streaming when joining a new node to cluster.
>
> I'm currently operating a 15-Node C* Cluster (V 3.11.1) and joining
> another Node every day.
> The 'nodetool netstats' shows it always streams data from all other nodes.
>
> How far will this scale? - What happens when I have hundreds or even
> thousands of nodes?
>
> Has anyone experience with such a Situation?
>
> Thanks, and regards
> Jürgen
>


Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?

2018-02-20 Thread Jürgen Albersdorfer
Hi, I'm wondering whether it is possible, or whether it would make sense, to
limit concurrent streaming when joining a new node to the cluster.

I'm currently operating a 15-node C* cluster (v3.11.1) and joining another
node every day.
'nodetool netstats' shows it always streams data from all other nodes.

How far will this scale? What happens when I have hundreds or even
thousands of nodes?

Does anyone have experience with such a situation?

Thanks, and regards
Jürgen


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Kyrylo Lebediev
Agree with you, Daniel, regarding gaps in documentation.


---

At the same time I disagree with the folks who are complaining in this thread 
about some functionality like 'advanced backup' etc. being missing out of the box.

We all live in a time when there are literally tons of open-source tools 
(automation, monitoring) and languages available, and there are also some really 
powerful SaaS solutions on the market which support C* (Datadog, for 
instance).


For example, while C* provides basic building blocks for anti-entropy repairs 
[I mean basic usage of 'nodetool repair' is not suitable for large production 
clusters], Reaper (many thanks to Spotify and TheLastPickle!), which uses this 
basic functionality, solves the task very well for real-world C* setups.


Something is missing / could be improved in your opinion? We're in the era of 
open source: create your own tool, let's say for C* backup automation using 
EBS snapshots, and upload it to GitHub.


C* is a DB-engine, not a fully-automated self-contained suite.
End-users are able to work on automating routine tasks [3rd-party projects], 
while C* contributors may focus on core functionality.

--

Going back to the documentation topic, as far as I understand, DataStax is no 
longer the main C* contributor and is focused on its own C*-based proprietary 
software [correct me, somebody, if I'm wrong].

This has led us to a situation where development of C* is progressing (as far 
as I understand, work is done mainly by some large C* users having enough 
resources to contribute to the C* project to get the features they need), but 
there is no single company which has taken over keeping the C* documentation / 
Wiki up to date.

Honestly, even DataStax's documentation is too concise and is missing a lot 
of important details.

[BTW, I've just taken a look at https://cassandra.apache.org/doc/latest/ and it 
looks not that 'bad': despite the TODOs it contains a lot of valuable 
information]


So, I feel the C* community has to join efforts on enriching the existing 
documentation / resurrecting the Wiki [where howtos and information about 
3rd-party automations and integrations can be placed].

By the Community I mean all of us including myself.



Regards,

Kyrill


From: Daniel Hölbling-Inzko 
Sent: Tuesday, February 20, 2018 11:28:13 AM
To: user@cassandra.apache.org; James Briggs
Cc: d...@cassandra.apache.org
Subject: Re: Cassandra Needs to Grow Up by Version Five!

Hi,

I have to add my own two cents here as the main thing that keeps me from really 
running Cassandra is the amount of pain running it incurs.
Not so much because it's actually painful but because the tools are so 
different and the documentation and best practices are scattered across a dozen 
outdated DataStax articles and this mailing list etc.. We've been hesitant 
(although our use case is perfect for using Cassandra) to deploy Cassandra to 
any critical systems as even after a year of running it we still don't have the 
operational experience to confidently run critical systems with it.

Simple things like a foolproof / safe cluster-wide S3 Backup (like 
Elasticsearch has it) would for example solve a TON of issues for new people. I 
don't need it auto-scheduled or something, but having to configure cron jobs 
across the whole cluster is a pain in the ass for small teams.
To be honest, even the way snapshots are done right now is already super 
painful. Every other system I operated so far will just create one backup 
folder I can export, in C* the Backup is scattered across a bunch of different 
Keyspace folders etc.. needless to say that it took a while until I trusted my 
backup scripts fully.

And especially for a Database I believe Backup/Restore needs to be a non-issue 
that's documented front and center. If not smaller teams just don't have the 
resources to dedicate to learning and building the tools around it.

Now that the team is getting larger we could spare the resources to operate 
these things, but switching from a well-understood RDBMs schema to Cassandra is 
now incredibly hard and will probably take years.

greetings Daniel

On Tue, 20 Feb 2018 at 05:56 James Briggs  
wrote:
Kenneth:

What you said is not wrong.

Vertica and Riak are examples of distributed databases that don't require 
hand-holding.

Cassandra is for Java-programmer DIYers, or more often Datastax clients, at 
this point.
Thanks, James.


From: Kenneth Brotman 
To: user@cassandra.apache.org
Cc: d...@cassandra.apache.org
Sent: Monday, February 19, 2018 4:56 PM

Subject: RE: Cassandra Needs to Grow Up by Version Five!

Jeff, you helped me figure out what I was missing.  It just took me a day to 
digest what you wrote.  I’m coming over from another type of engineering.  I 
didn’t know and it’s not 

Re: Right sizing Cassandra data nodes

2018-02-20 Thread Rahul Singh
Node density is active data managed in the cluster divided by the number of 
active nodes. E.g. if you have 500 TB of active data under management, then you 
would need 250-500 nodes to get optimum performance. It also depends on how 
much memory is on the boxes and whether you are using SSD drives. SSD doesn't 
replace memory, but it doesn't hurt.
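As a back-of-envelope check of that rule of thumb (the 500 TB figure comes from the example above; the 1-2 TB-per-node densities are the ones mentioned in this thread):

```python
def nodes_needed(active_tb, density_tb_per_node):
    """Ceiling of active data divided by per-node density."""
    return -(-active_tb // density_tb_per_node)   # ceiling division

print(nodes_needed(500, 2))  # 250 nodes at 2 TB/node
print(nodes_needed(500, 1))  # 500 nodes at 1 TB/node
```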

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 19, 2018, 5:55 PM -0500, Charulata Sharma (charshar) 
, wrote:
> Thanks for the response Rahul. I did not understand the “node density” point.
>
> Charu
>
> From: Rahul Singh 
> Reply-To: "user@cassandra.apache.org" 
> Date: Monday, February 19, 2018 at 12:32 PM
> To: "user@cassandra.apache.org" 
> Subject: Re: Right sizing Cassandra data nodes
>
> 1. I would keep OpsCenter on a different cluster. Why unnecessarily put traffic 
> and computing for OpsCenter data on a real business data cluster?
> 2. Don't put more than 1-2 TB per node. Maybe 3 TB. As node density 
> increases, it creates more replication, read repairs, etc., and more memory 
> usage for doing the compactions.
> 3. You can have as many snapshots as you want as long as you have them on 
> another disk, or even move them to a SAN / NAS. All you may care about is the 
> most recent snapshot on the physical machine / disks on a live node.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Feb 19, 2018, 3:08 PM -0500, Charulata Sharma (charshar) 
> , wrote:
>
> > Hi All,
> >
> > Looking for some insight into how application data archive and purge is 
> > carried out for C* database. Are there standard guidelines on calculating 
> > the amount of space that can be used for storing data in a specific node.
> >
> > Some pointers that I got while researching are;
> >
> > -  Allocate 50% space for compaction, e.g. if data size is 50GB 
> > then allocate 25GB for compaction.
> > -  Snapshot strategy. If old snapshots are present, then they 
> > occupy the disk space.
> > -  Allocate some percentage of storage (  ) for system tables 
> > and OpsCenter tables ?
> >
> > We have a scenario where certain transaction data needs to be archived 
> > based on business rules and some purged, so before deciding on an A 
> > strategy, I am trying to analyze
> > how much transactional data can be stored given the current node capacity. 
> > I also found out that the space available metric shown in Opscenter is not 
> > very reliable because it doesn’t show
> > the snapshot space. In our case, we have a huge snapshot size. For some 
> > unexplained reason, we seem to be taking snapshots of our data every hour 
> > and purging them only after 7 days.
> >
> >
> > Thanks,
> > Charu
> > Cisco Systems.
> >
> >
> >


Re: newbie , to use cassandra when query is arbitrary?

2018-02-20 Thread Rahul Singh
Technically no. Cassandra is a NoSQL database. It is a columnar store - and so 
it's not a set of relations that can be arbitrarily queried. The SSTable 
structure is built for heavy writes and specific, partition-keyed queries. If 
you need the ability to run arbitrary queries, you are using the wrong database 
and need to lean on a real index that you build through your own tables or 
secondary indices, or store the index in a real sparse-matrix-style index like 
Lucene - as implemented in Elassandra or DSE SolR. I believe Stratio also has a 
Lucene-based secondary index for that purpose.

120 GB isn't a lot of data, and you could actually store the whole database in 
memory in a relational DB. I would say that it's "tiny" compared to real big 
uses of Cassandra. Properly optimized, if your data or data-distribution needs 
don't grow to web scale, you could achieve what you need in other systems.

I would ask myself the following questions:

1. Will I need to scale my database to thousands or millions of operations per 
second, and/or do I anticipate it growing to where the data cannot fit on one 
computer's disk or memory?

2. Will I need to synchronize data across data centers, both physical and 
logical, and have the need from #1?

3. Do I need Cassandra, or do I want Cassandra? Those who need Cassandra badly 
say yes to 1 and 2. Everyone else wants it to be cool.


You can always use Cassandra for both heavy reads and heavy writes and then 
leverage index technology like Lucene to help when doing arbitrary queries. Or 
you can use something else like MySQL / MariaDB and then replicate the data 
through a CQRS architecture to have a highly available database for read 
purposes only.


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 19, 2018, 9:44 PM -0500, Rajesh Kishore , wrote:
> Hi Rahul,
>
> I cannot confirm the size wrt Cassandra, but usually in Berkeley DB, for 10M 
> records, it takes around 120 GB. Any operation takes hardly 2 to 3 ms when the 
> query is performed on an indexed attribute.
>
> Usually 10 to 12 indexed columns are the OOTB behaviour, but one can configure 
> any attribute to be indexed on the fly. The main issue is: what should be the 
> strategy for partitioning the records if the query is not fixed?
>
>
> Regards,
> Rajesh
>
> > On Tue, Feb 20, 2018 at 2:09 AM, Rahul Singh  
> > wrote:
> > > What is the data size in TB / Gb and what what is the Operations Per 
> > > second for read and write.
> > > Cassandra is both for high volume and high velocity for read and write.
> > >
> > > How many of the columns need to be indexed? You may find that doing a 
> > > secondary index is helpful or looking to Elassandra / DSE SolR if your 
> > > queries need to be on arbitrary columns across those hundred.
> > >
> > > --
> > > Rahul Singh
> > > rahul.si...@anant.us
> > >
> > > Anant Corporation
> > >
> > > On Feb 19, 2018, 11:31 AM -0500, Rajesh Kishore 
> > > , wrote:
> > > > It can be a minimum of 20M up to 10 billion
> > > >
> > > > Each entry can contain up to 100 columns
> > > >
> > > > Rajesh
> > > >
> > > > > On 19 Feb 2018 9:02 p.m., "Rahul Singh" 
> > > > >  wrote:
> > > > > > How much data do you need to store and what is the frequency of 
> > > > > > reads and writes.
> > > > > >
> > > > > > --
> > > > > > Rahul Singh
> > > > > > rahul.si...@anant.us
> > > > > >
> > > > > > Anant Corporation
> > > > > >
> > > > > > On Feb 19, 2018, 3:44 AM -0500, Rajesh Kishore 
> > > > > > , wrote:
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I am a newbie to Cassandra world, got some understanding of the 
> > > > > > > product.
> > > > > > > I have a application (which is kind of datastore) for other 
> > > > > > > applications, the user queries are not fixed i.e the queries can 
> > > > > > > come with any attributes.
> > > > > > > In this case, is it recommended to use cassandra ? What benefits 
> > > > > > > we can get ?
> > > > > > >
> > > > > > > Background - The application currently  using berkely db for 
> > > > > > > maintaining entries, we are trying to evaluate if other backend 
> > > > > > > can fit with the requirement we have.
> > > > > > >
> > > > > > > Now, if we want to use Cassandra, I broadly see one table that
> > > > > > > would contain all the entries. The question is: what would be
> > > > > > > the correct partitioning strategy?
> > > > > > > The entity is:
> > > > > > > Entry {
> > > > > > > id varchar,
> > > > > > > objectclasses list
> > > > > > > sn
> > > > > > > cn
> > > > > > > ...
> > > > > > > ...
> > > > > > > }
> > > > > > >
> > > > > > > and a query can be anything, like:
> > > > > > > a) get all entries where sn=*
> > > > > > > b) get all entries where sn=A and cn=b
> > > > > > > c) get all entries where sn=A OR objectclass contains person
> > > > > > > ...
> > > > > > > 
> > > > > > >
> > > > > > > Please advise.
> > 
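To ground the thread above: a minimal, illustrative CQL sketch of the Entry table Rajesh describes, with a secondary index on one attribute as Rahul suggests. The partition key choice and index name here are assumptions for illustration only; for truly arbitrary predicates (wildcards, OR conditions), a search integration such as Elassandra or DSE Search is usually the better fit.

```sql
-- Illustrative only: partitioning by id supports lookups by id,
-- not arbitrary predicates across all columns.
CREATE TABLE entries (
    id            text PRIMARY KEY,
    objectclasses list<text>,
    sn            text,
    cn            text
    -- ... remaining attributes
);

-- A secondary index enables single-column equality predicates
-- (e.g. WHERE sn = 'A'), but wildcard and OR queries still
-- require a search layer (Elassandra, DSE Search).
CREATE INDEX entries_sn_idx ON entries (sn);
```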

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Daniel Hölbling-Inzko
Hi,

I have to add my own two cents here, as the main thing that keeps me from
really running Cassandra is the amount of pain running it incurs.
Not so much because it's actually painful, but because the tools are so
different and the documentation and best practices are scattered across a
dozen outdated DataStax articles, this mailing list, and so on. We've been
hesitant (although our use case is perfect for Cassandra) to deploy
Cassandra to any critical systems; even after a year of running it, we
still don't have the operational experience to confidently run critical
systems with it.

Simple things like a foolproof / safe cluster-wide S3 backup (like
Elasticsearch has) would, for example, solve a TON of issues for new
people. I don't need it auto-scheduled or anything, but having to
configure cron jobs across the whole cluster is a pain in the ass for small
teams.
To be honest, even the way snapshots are done right now is already super
painful. Every other system I have operated so far just creates one backup
folder I can export; in C* the backup is scattered across a bunch of
different keyspace folders. Needless to say, it took a while until
I trusted my backup scripts fully.
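The scattering Daniel describes comes from Cassandra's on-disk snapshot layout, `<data_dir>/<keyspace>/<table>/snapshots/<tag>/`, which a backup script has to walk in full. A minimal sketch (the function name is invented for illustration; it assumes that standard directory layout) that collects one snapshot's files into a single manifest a script could then upload:

```python
import os

def collect_snapshot_files(data_dir, tag):
    """Gather every file belonging to snapshot `tag` across all
    keyspace/table directories under a Cassandra data directory.

    Snapshots are laid out as:
        <data_dir>/<keyspace>/<table>/snapshots/<tag>/<sstable files>
    so a backup has to walk every keyspace folder to find them all.
    """
    found = []
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table in sorted(os.listdir(ks_path)):
            snap_dir = os.path.join(ks_path, table, "snapshots", tag)
            if os.path.isdir(snap_dir):
                for fname in sorted(os.listdir(snap_dir)):
                    found.append(os.path.join(snap_dir, fname))
    return found
```

The resulting list is exactly the set of files a `cron`-driven script would push to S3 per node, which is the kind of glue code each small team currently ends up writing for itself.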

And especially for a database, I believe backup/restore needs to be a
non-issue that's documented front and center. If it isn't, smaller teams
just don't have the resources to dedicate to learning and building the
tools around it.

Now that the team is getting larger, we could spare the resources to operate
these things, but switching from a well-understood RDBMS schema to
Cassandra is now incredibly hard and will probably take years.

greetings Daniel

On Tue, 20 Feb 2018 at 05:56 James Briggs 
wrote:

> Kenneth:
>
> What you said is not wrong.
>
> Vertica and Riak are examples of distributed databases that don't require
> hand-holding.
>
> Cassandra is for Java-programmer DIYers, or more often Datastax clients,
> at this point.
> Thanks, James.
>
> --
> *From:* Kenneth Brotman 
> *To:* user@cassandra.apache.org
> *Cc:* d...@cassandra.apache.org
> *Sent:* Monday, February 19, 2018 4:56 PM
>
> *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
>
> Jeff, you helped me figure out what I was missing.  It just took me a day
> to digest what you wrote.  I’m coming over from another type of
> engineering.  I didn’t know, and it’s not really documented: Cassandra runs
> in a data center.  Nowadays that means the nodes are going to be in managed
> containers (Docker containers managed by Kubernetes, Mesos, or something
> similar), and for that reason anyone operating Cassandra in a real-world
> setting would not encounter the issues I raised in the way I described.
>
> Shouldn’t the architectural diagrams people reference indicate that in
> some way?  That would have helped me.
>
> Kenneth Brotman
>
> *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com]
> *Sent:* Monday, February 19, 2018 10:43 AM
> *To:* 'user@cassandra.apache.org'
> *Cc:* 'd...@cassandra.apache.org'
> *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
>
> Well said.  Very fair.  I wouldn’t mind hearing from others still.  You’re
> a good guy!
>
> Kenneth Brotman
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* Monday, February 19, 2018 9:10 AM
> *To:* cassandra
> *Cc:* Cassandra DEV
> *Subject:* Re: Cassandra Needs to Grow Up by Version Five!
>
> There are a lot of things below I disagree with, but that's OK; I convinced
> myself not to nit-pick every point.
>
> https://issues.apache.org/jira/browse/CASSANDRA-13971 has some of
> Stefan's work with cert management
>
> Beyond that, I encourage you to do what Michael suggested: open JIRAs for
> things you care strongly about, and work on them if you have time. Sometime
> this year we'll schedule an NGCC (Next Generation Cassandra Conference)
> where we talk about future project work and direction; I encourage you to
> attend if you're able (I encourage anyone who cares about the direction of
> Cassandra to attend; it'll probably be either free or very low cost, just to
> cover a venue and some food). If nothing else, you'll meet some of the
> teams who are working on the project, and learn why they've selected the
> projects on which they're working. You'll have an opportunity to pitch your
> vision, and maybe you can talk some folks into helping out.
>
> - Jeff
>
>
>
>
> On Mon, Feb 19, 2018 at 1:01 AM, Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
> Comments inline
>
> >-Original Message-
> >From: Jeff Jirsa [mailto:jji...@gmail.com]
> >Sent: Sunday, February 18, 2018 10:58 PM
> >To: user@cassandra.apache.org
> >Cc: d...@cassandra.apache.org
> >Subject: Re: Cassandra Needs to Grow Up by Version Five!
> >
> >Comments inline
> >
> >
> >> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman <
> kenbrot...@yahoo.com.INVALID> wrote:
> >>
> > >Cassandra feels like an unfinished program to me. The