Counter table in Cassandra

2019-05-28 Thread Garvit Sharma
Hi,

I am using counter tables in Cassandra, and I want to understand how concurrent updates to a counter table are handled.

More than one thread is responsible for updating the counter for a partition key, and multiple threads can also update the counter for the same key.

When more than one thread updates the counter for the same key, how does Cassandra handle the race condition?

UPDATE cycling.popular_count
 SET popularity = popularity + 1
 WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
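
As a rough sketch of that scenario (my own illustration rather than the original application code; the contact point and the cycling.popular_count schema are assumed), several threads incrementing the same counter row through the DataStax Python driver might look like this:

from threading import Thread
from uuid import UUID
from cassandra.cluster import Cluster

# Assumed schema:
#   CREATE TABLE cycling.popular_count (id uuid PRIMARY KEY, popularity counter);
cluster = Cluster(['127.0.0.1'])          # assumed contact point
session = cluster.connect('cycling')
increment = session.prepare(
    "UPDATE popular_count SET popularity = popularity + 1 WHERE id = ?")
row_id = UUID('6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47')

def bump(times):
    # Each thread sends independent +1 increments for the same partition key.
    for _ in range(times):
        session.execute(increment, [row_id])

threads = [Thread(target=bump, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With 4 threads x 100 increments, popularity should end up 400 higher:
print(session.execute(
    "SELECT popularity FROM popular_count WHERE id = %s", [row_id]).one())
cluster.shutdown()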


Are there overheads of using counter tables?
Are there alternatives to counter tables?

Thanks,
-- 

Garvit Sharma
github.com/garvitlnmiit/

No Body is a Scholar by birth, its only hard work and strong determination
that makes him master.


Sstableloader

2019-05-28 Thread Rahul Reddy
Hello,

Does sstableloader work between DataStax and Apache Cassandra? I'm trying to
migrate from DSE 5.0.7 to Apache Cassandra 3.11.1.


Re: Can sstable corruption cause schema mismatch?

2019-05-28 Thread Nitan Kainth
Thank you Alain.

Nodetool describecluster shows some nodes as unreachable, with different output from each node:
Node 1 can see all 4 nodes up.
Node 2 says node 4 and node 5 are unreachable.
Node 3 complains about node 2 and node 1.

Nodetool status shows all nodes up, and reads and writes are working for most operations.

Network looks good. Any other ideas?


Regards,
Nitan
Cell: 510 449 9629

> On May 28, 2019, at 11:21 AM, Alain RODRIGUEZ  wrote:
> 
> Hello Nitan,
> 
>> 1. Can sstable corruption in application tables cause schema mismatch?
> 
> I would say it should not. I could imagine it happening if the corruption hits some 
> 'system' keyspace sstable. If not, I don't see how corrupted data can impact the 
> schema on the node.
> 
>> 2. Do we need to disable repair while adding storage while Cassandra is down?
> 
> I think you don't have to, but it's a good idea.
> Repairs will fail for as long as a node that should be involved is down (I think 
> there is an option to change that behaviour now).
> Anyway, stopping repair and restarting it once all nodes are back up probably gives 
> you better understanding/control of what's going on. It also reduces the load 
> during times of trouble or maintenance, when the cluster is somewhat weaker.
> 
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
> 
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> 
> 
>> On Tue, 28 May 2019 at 17:13, Nitan Kainth wrote:
>> Hi,
>> 
>> Two questions:
>> 1. Can sstable corruption in application tables cause schema mismatch?
>> 2. Do we need to disable repair while adding storage while Cassandra is down?
>> 
>> 
>> Regards,
>> Nitan
>> Cell: 510 449 9629


Re: Can sstable corruption cause schema mismatch?

2019-05-28 Thread Alain RODRIGUEZ
Hello Nitan,

> 1. Can sstable corruption in application tables cause schema mismatch?

I would say it should not. I could imagine it happening if the corruption hits some
'system' keyspace sstable. If not, I don't see how corrupted data can impact the
schema on the node.


> 2. Do we need to disable repair while adding storage while Cassandra is down?

I think you don't have to, but it's a good idea.
Repairs will fail for as long as a node that should be involved is down (I think
there is an option to change that behaviour now).
Anyway, stopping repair and restarting it once all nodes are back up probably
gives you better understanding/control of what's going on. It also reduces the
load during times of trouble or maintenance, when the cluster is somewhat weaker.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



On Tue, 28 May 2019 at 17:13, Nitan Kainth wrote:

> Hi,
>
> Two questions:
> 1. Can sstable corruption in application tables cause schema mismatch?
> 2. Do we need to disable repair while adding storage while Cassandra is
> down?
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>


Can sstable corruption cause schema mismatch?

2019-05-28 Thread Nitan Kainth
Hi,

Two questions:
1. Can sstable corruption in application tables cause schema mismatch?
2. Do we need to disable repair while adding storage while Cassandra is down?


Regards,
Nitan
Cell: 510 449 9629

Unsubscribe

2019-05-28 Thread Steve Luo
Unsubscribe


RE: [EXTERNAL] Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread Durity, Sean R
This may sound a bit harsh, but I teach my developers that if they are trying 
to use ALLOW FILTERING – they are doing it wrong! We often choose Cassandra for 
its high availability and scalability characteristics. We love no downtime. 
ALLOW FILTERING is breaking the rules of availability and scalability.

Look at the full text of the error (not just the ending):
Bad Request: Cannot execute this query as it might involve data filtering and 
thus may have unpredictable performance. If you want to execute this query 
despite the performance unpredictability, use ALLOW FILTERING.
It is being polite, but it does warn you that performance is unpredictable. I 
can predict this: allow filtering will not scale. It won’t scale to large 
numbers of nodes (with small tables) or to large numbers of rows (regardless of 
node count). If you ignore the admittedly too polite warning, Cassandra will 
try to answer your query. It does it with a brute force, scan everything 
approach on all nodes (because you didn’t give it any partitions to target 
directly). That gets expensive and dangerous quickly. And, yes, it can endanger 
the whole cluster.
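
To make the difference concrete, here is a minimal sketch (keyspace, table and column names are invented for illustration) contrasting a partition-targeted read, which only touches the replicas that own that partition, with an ALLOW FILTERING scan, which forces every node to brute-force its data:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])          # assumed contact point
session = cluster.connect('my_ks')        # hypothetical keyspace

# Targeted read: the partition key is in the WHERE clause, so the coordinator
# talks only to the replicas that own partition 'user123'.
session.execute(
    "SELECT * FROM events WHERE user_id = %s", ('user123',))

# Cluster-wide scan: no partition key, so Cassandra refuses the query unless
# ALLOW FILTERING is added, and then has to scan data on every node.
session.execute(
    "SELECT * FROM events WHERE status = %s ALLOW FILTERING", ('FAILED',))

cluster.shutdown()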

As an administrator, I do think that Cassandra should be able to protect itself 
better, perhaps by allowing the administrator to disallow those queries entirely. 
It does at least warn you.


From: Attila Wind 
Sent: Tuesday, May 28, 2019 4:47 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Select in allow filtering stalls whole cluster. How to 
prevent such behavior?


Hi Shalom,

Thanks for your notes! So you also experienced this thing... fine

Then maybe the best rules to follow are these:
a) never(!) run an "ALLOW FILTERING" query on a production cluster
b) if you need these queries, build a test cluster (somehow) and mirror the data 
(somehow), OR add denormalized tables (write + code complexity overhead) to 
fulfill those queries

Can we agree on this one maybe as a "good to follow" policy?

In our case, luckily, users = developers always, so I can expect them to be aware 
of the consequences of a particular query.
We also have test data fully mirrored into a test cluster, so running those 
queries on the test system is possible.
Plus, if for whatever reason we really need to run such a query in prod, I can 
simply instruct them to test the query in the test system first.

cheers
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 28. 8:59, shalom sagges wrote:
Hi Attila,

I'm definitely no guru, but I've experienced several cases where people at my 
company used allow filtering and caused major performance issues.
As data size increases, the impact will be stronger. If you have large 
partitions, performance will decrease.
GC can be affected, and if GC stops the world for too long, too many times, you 
will feel it.

I sincerely believe the best way would be to educate the users and remodel the 
data. Perhaps you need to denormalize your tables or at least use secondary 
indices (I prefer to keep it as simple as possible and denormalize).
If it's a cluster for analytics, perhaps you need to build a designated cluster 
only for that so if something does break or get too pressured, normal 
activities wouldn't be affected, but there are pros and cons for that idea too.

Hope this helps.

Regards,


On Tue, May 28, 2019 at 9:43 AM Attila Wind 
 wrote:

Hi Gurus,

Looks like we stopped this thread. However, I would be very much curious about 
answers regarding b) ...

Anyone any comments on that?
I do see this as a potential production outage risk now... Especially as we are 
planning to run analysis queries by hand exactly like that over the cluster...

thanks!
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 23. 11:42, shalom sagges wrote:
a) Interesting... But only in case you do not provide partitioning key right? 
(so IN() is for partitioning key?)

I think you should ask yourself a different question. Why am I using ALLOW 
FILTERING in the first place? What happens if I remove it from the query?
I prefer to denormalize the data to multiple tables or at least create an index 
on the requested column (preferably queried together with a known partition 
key).


b) Still does not explain or justify "all 8 nodes to halt and unresponsiveness 
to external requests" behavior... Even if servers are busy with the request 
seriously becoming non-responsive...?

I think it can justify the unresponsiveness. 

RE: [EXTERNAL] Re: Python driver concistency problem

2019-05-28 Thread Durity, Sean R
This is a stretch, but are you using authentication and/or authorization? In my 
understanding, the queries executed on your behalf for authentication and/or 
authorization are usually done at LOCAL_ONE (or QUORUM for the cassandra user), but 
maybe something has changed in the security setup? Are any UDTs or triggers 
involved in the query? To me, your error seems more like a query being executed 
"for you" rather than your actual query.
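
For reference, a minimal sketch (credentials and contact point are invented) of how authentication is typically wired up in the DataStax Python driver; the driver runs the internal auth queries on your behalf, separately from the statements you execute:

from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

# Hypothetical credentials; with authentication enabled, the driver performs
# the login/permission lookups for you when the session is created.
auth = PlainTextAuthProvider(username='app_user', password='app_password')
cluster = Cluster(['127.0.0.1'], auth_provider=auth)
session = cluster.connect()

# Your own statements are executed afterwards with their own consistency level.
session.execute("SELECT release_version FROM system.local")
cluster.shutdown()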


Sean Durity


From: Vlad 
Sent: Wednesday, May 22, 2019 6:53 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Python driver concistency problem

That's the issue - I do not use consistency ALL. I set QUORUM or ONE but it 
still performs with ALL.

On Wednesday, May 22, 2019 12:42 PM, shalom sagges <shalomsag...@gmail.com> wrote:

In a lot of cases, the issue is with the data model.
Can you describe the table?
Can you provide the query you use to retrieve the data?
What's the load on your cluster?
Are there lots of tombstones?

You can set the consistency level to ONE, just to check if you get responses. 
Although normally I would never use ALL unless I run a DDL command.
I prefer local_quorum if I want my consistency to be strong while keeping 
Cassandra's high availability.
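
For reference, a minimal sketch of the usual ways to pin the consistency level with the DataStax Python driver (keyspace and table names are made up; this is not the original code):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_ks')        # hypothetical keyspace

# Per-statement consistency:
stmt = SimpleStatement(
    "SELECT * FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)
session.execute(stmt, ('user123',))

# Session-wide default for plain string queries:
session.default_consistency_level = ConsistencyLevel.ONE
session.execute("SELECT * FROM users WHERE user_id = %s", ('user123',))

cluster.shutdown()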

Regards,


Usage of IN at creating materialized view

2019-05-28 Thread Alptug Tokgoz
Hello, 


I have struggled with the following problem for the last couple of days. First of 
all, I am using cqlsh 5.0.1, Cassandra 3.11.1 and CQL spec 3.4.4.


What I am trying to do is create a materialized view named income_periods from a 
table named income. The problem is, I am able to create the materialized view 
without an error, but it does not behave as expected: I cannot get the rows for 
item IN ('BFG','GEL'). My code is here:
CREATE MATERIALIZED VIEW income_periods AS
   SELECT symbol, period, item FROM income
   WHERE period IS NOT NULL AND symbol IS NOT NULL AND item IN ('BFG', 'GEL')
   PRIMARY KEY (symbol, period, item);

Here is the log:

INFO  [Native-Transport-Requests-3] 2019-05-27 13:07:50,121 
MigrationManager.java:369 - Create new view: 
org.apache.cassandra.config.ViewDefinition@4501ac19[ksName=mtxfundamentals,viewName=income_periods,baseTableId=0375b280-42e1-11e8-871f-dfe7f527f8be,baseTableName=income,includeAllColumns=false,whereClause=period
 IS NOT NULL AND symbol IS NOT NULL AND item IN ('BFG', 
'GEL'),metadata=org.apache.cassandra.config.CFMetaData@365eed1c[cfId=6e008d90-8080-11e9-b077-ebc67cf7305e,ksName=mtxfundamentals,cfName=income_periods,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 
'ALL', 'rows_per_partition' : 'NONE'},
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={min_threshold=4, max_threshold=32}}, 
compression=org.apache.cassandra.schema.CompressionParams@fcf27507, 
extensions={}, 
cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.Int32Type, 
org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[] | 
[]],partitionKeyColumns=[symbol],clusteringColumns=[period, 
item],keyValidator=org.apache.cassandra.db.marshal.UTF8Type,columnMetadata=[period,
 symbol, item],droppedColumns={},triggers=[],indexes=[]]]
INFO  [MigrationStage:1] 2019-05-27 13:07:50,312 ColumnFamilyStore.java:408 - 
Initializing mtxfundamentals.income_periods
WARN  [CompactionExecutor:5838] 2019-05-27 13:07:50,328 ViewBuilder.java:189 - 
Materialized View failed to complete, sleeping 5 minutes before restarting
org.apache.cassandra.exceptions.InvalidRequestException: IN restrictions are 
not supported on indexed columns
    at 
org.apache.cassandra.cql3.statements.RequestValidations.invalidRequest(RequestValidations.java:199)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
    at 
org.apache.cassandra.cql3.restrictions.SingleColumnRestriction$INRestriction.addRowFilterTo(SingleColumnRestriction.java:222)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
    at 
org.apache.cassandra.cql3.restrictions.ClusteringColumnRestrictions.addRowFilterTo(ClusteringColumnRestrictions.java:212)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
    at 
org.apache.cassandra.cql3.restrictions.StatementRestrictions.getRowFilter(StatementRestrictions.java:626)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
    at 
org.apache.cassandra.cql3.statements.SelectStatement.getRowFilter(SelectStatement.java:776)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
    at 
org.apache.cassandra.cql3.statements.SelectStatement.getRangeCommand(SelectStatement.java:587)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
    at 
org.apache.cassandra.cql3.statements.SelectStatement.getQuery(SelectStatement.java:305)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
    at 
org.apache.cassandra.cql3.statements.SelectStatement.getQuery(SelectStatement.java:295)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.db.view.View.getReadQuery(View.java:185) 
~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.db.view.ViewBuilder.buildKey(ViewBuilder.java:75) 
~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.db.view.ViewBuilder.run(ViewBuilder.java:158) 
~[apache-cassandra-3.11.1.jar:3.11.1]
    at 
org.apache.cassandra.db.compaction.CompactionManager$16.run(CompactionManager.java:1719)
 [apache-cassandra-3.11.1.jar:3.11.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_102]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_102]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_102]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_102]
    at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
 [apache-cassandra-3.11.1.jar:3.11.1]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]


I also tried working with just the SELECT statement, as follows, and I was able to 
select the rows whose item column is 'BFG' or 'GEL'.

   SELECT * 

Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread Attila Wind

Hi Shalom,

Thanks for your notes! So you also experienced this thing... fine

Then maybe the best rules to follow are these:
a) never(!) run an "ALLOW FILTERING" query on a production cluster
b) if you need these queries, build a test cluster (somehow) and mirror 
the data (somehow), OR add denormalized tables (write + code complexity 
overhead) to fulfill those queries


Can we agree on this one maybe as a "good to follow" policy?

In our case, luckily, users = developers always, so I can expect them 
to be aware of the consequences of a particular query.
We also have test data fully mirrored into a test cluster, so running 
those queries on the test system is possible.
Plus, if for whatever reason we really need to run such a query in 
prod, I can simply instruct them to test the query in the test system 
first.


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 28. 8:59, shalom sagges wrote:

Hi Attila,

I'm definitely no guru, but I've experienced several cases where 
people at my company used allow filtering and caused major performance 
issues.
As data size increases, the impact will be stronger. If you have large 
partitions, performance will decrease.
GC can be affected, and if GC stops the world for too long, too many 
times, you will feel it.


I sincerely believe the best way would be to educate the users and 
remodel the data. Perhaps you need to denormalize your tables or at 
least use secondary indices (I prefer to keep it as simple as possible 
and denormalize).
If it's a cluster for analytics, perhaps you need to build a 
designated cluster only for that so if something does break or get too 
pressured, normal activities wouldn't be affected, but there are pros 
and cons for that idea too.


Hope this helps.

Regards,


On Tue, May 28, 2019 at 9:43 AM Attila Wind  
wrote:


Hi Gurus,

Looks like we stopped this thread. However, I would be very much curious
about answers regarding b) ...

Anyone any comments on that?
I do see this as a potential production outage risk now...
Especially as we are planning to run analysis queries by hand
exactly like that over the cluster...

thanks!

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 23. 11:42, shalom sagges wrote:

a) Interesting... But only in case you do not provide
partitioning key right? (so IN() is for partitioning key?)

I think you should ask yourself a different question. Why am I
using ALLOW FILTERING in the first place? What happens if I
remove it from the query?
I prefer to denormalize the data to multiple tables or at least
create an index on the requested column (preferably queried
together with a known partition key).

b) Still does not explain or justify "all 8 nodes to halt and
unresponsiveness to external requests" behavior... Even if
servers are busy with the request seriously becoming
non-responsive...?

I think it can justify the unresponsiveness. When using ALLOW
FILTERING, you are doing something like a full table scan in a
relational database.

There is a lot of information on the internet regarding this
subject such as

https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/

Hope this helps.

Regards,


On Thu, May 23, 2019 at 7:33 AM Attila Wind
  wrote:

Hi,

"When you run a query with allow filtering, Cassandra doesn't
know where the data is located, so it has to go node by node,
searching for the requested data."

a) Interesting... But only in case you do not provide
partitioning key right? (so IN() is for partitioning key?)

b) Still does not explain or justify "all 8 nodes to halt and
unresponsiveness to external requests" behavior... Even if
servers are busy with the request seriously becoming
non-responsive...?

cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 23. 0:37, shalom sagges wrote:

Hi Vsevolod,

1) Why such behavior? I thought any given SELECT request is
handled by a limited subset of C* nodes and not by all of
them, as per connection consistency/table replication
settings, in case.
When you run a query with allow filtering, Cassandra doesn't
know where the data is located, so it has to go node by
node, searching for the requested data.

2) Is it possible to forbid ALLOW FILTERING flag for given
users/groups?
I'm not familiar with such a flag. In my case, I just try to
educate the R teams.

Regards,

On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov <vsfilare...@gmail.com> wrote:

Hello everyone,

We have an 8 node C* 

Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread shalom sagges
Hi Attila,

I'm definitely no guru, but I've experienced several cases where people at
my company used allow filtering and caused major performance issues.
As data size increases, the impact will be stronger. If you have large
partitions, performance will decrease.
GC can be affected, and if GC stops the world for too long, too many times,
you will feel it.

I sincerely believe the best way would be to educate the users and remodel
the data. Perhaps you need to denormalize your tables or at least use
secondary indices (I prefer to keep it as simple as possible and
denormalize).
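
As a concrete illustration of that denormalization (table and column names are invented, and the single-status partition is kept naive for brevity; in practice you would likely bucket it), the same event can be written to two tables so that each query pattern reads by its own partition key:

from uuid import uuid1
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_ks')        # hypothetical keyspace

session.execute("""
    CREATE TABLE IF NOT EXISTS events_by_user (
        user_id text, event_id timeuuid, status text,
        PRIMARY KEY (user_id, event_id))""")
session.execute("""
    CREATE TABLE IF NOT EXISTS events_by_status (
        status text, event_id timeuuid, user_id text,
        PRIMARY KEY (status, event_id))""")

insert_by_user = session.prepare(
    "INSERT INTO events_by_user (user_id, event_id, status) VALUES (?, ?, ?)")
insert_by_status = session.prepare(
    "INSERT INTO events_by_status (status, event_id, user_id) VALUES (?, ?, ?)")

def record_event(user_id, event_id, status):
    # Write the same event into both tables so each access pattern can query
    # by its own partition key instead of filtering.
    batch = BatchStatement()
    batch.add(insert_by_user, (user_id, event_id, status))
    batch.add(insert_by_status, (status, event_id, user_id))
    session.execute(batch)

record_event('user123', uuid1(), 'FAILED')

# "All failed events" becomes a single-partition read, no ALLOW FILTERING:
rows = session.execute(
    "SELECT * FROM events_by_status WHERE status = %s", ('FAILED',))
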
If it's a cluster for analytics, perhaps you need to build a designated
cluster only for that so if something does break or get too pressured,
normal activities wouldn't be affected, but there are pros and cons for
that idea too.

Hope this helps.

Regards,


On Tue, May 28, 2019 at 9:43 AM Attila Wind  wrote:

> Hi Gurus,
>
> Looks like we stopped this thread. However, I would be very much curious about
> answers regarding b) ...
>
> Anyone any comments on that?
> I do see this as a potential production outage risk now... Especially as
> we are planning to run analysis queries by hand exactly like that over the
> cluster...
>
> thanks!
> Attila Wind
>
> http://www.linkedin.com/in/attilaw
> Mobile: +36 31 7811355
>
>
> On 2019. 05. 23. 11:42, shalom sagges wrote:
>
> a) Interesting... But only in case you do not provide partitioning key
> right? (so IN() is for partitioning key?)
>
> I think you should ask yourself a different question. Why am I using ALLOW
> FILTERING in the first place? What happens if I remove it from the query?
> I prefer to denormalize the data to multiple tables or at least create an
> index on the requested column (preferably queried together with a known
> partition key).
>
> b) Still does not explain or justify "all 8 nodes to halt and
> unresponsiveness to external requests" behavior... Even if servers are busy
> with the request seriously becoming non-responsive...?
>
> I think it can justify the unresponsiveness. When using ALLOW FILTERING,
> you are doing something like a full table scan in a relational database.
>
> There is a lot of information on the internet regarding this subject such
> as
> https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/
>
> Hope this helps.
>
> Regards,
>
> On Thu, May 23, 2019 at 7:33 AM Attila Wind 
>  wrote:
>
>> Hi,
>>
>> "When you run a query with allow filtering, Cassandra doesn't know where
>> the data is located, so it has to go node by node, searching for the
>> requested data."
>>
>> a) Interesting... But only in case you do not provide partitioning key
>> right? (so IN() is for partitioning key?)
>>
>> b) Still does not explain or justify "all 8 nodes to halt and
>> unresponsiveness to external requests" behavior... Even if servers are busy
>> with the request seriously becoming non-responsive...?
>>
>> cheers
>> Attila Wind
>>
>> http://www.linkedin.com/in/attilaw
>> Mobile: +36 31 7811355
>>
>>
>> On 2019. 05. 23. 0:37, shalom sagges wrote:
>>
>> Hi Vsevolod,
>>
>> 1) Why such behavior? I thought any given SELECT request is handled by a
>> limited subset of C* nodes and not by all of them, as per connection
>> consistency/table replication settings, in case.
>> When you run a query with allow filtering, Cassandra doesn't know where
>> the data is located, so it has to go node by node, searching for the
>> requested data.
>>
>> 2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?
>> I'm not familiar with such a flag. In my case, I just try to educate the
>> R teams.
>>
>> Regards,
>>
>> On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov 
>> wrote:
>>
>>> Hello everyone,
>>>
>>> We have an 8 node C* cluster with large volume of unbalanced data. Usual
>>> per-partition selects work somewhat fine, and are processed by limited
>>> number of nodes, but if user issues SELECT WHERE IN () ALLOW FILTERING,
>>> such command stalls all 8 nodes to halt and unresponsiveness to external
>>> requests while disk IO jumps to 100% across whole cluster. In several
>>> minutes all nodes seem to finish processing the request and cluster goes
>>> back to being responsive. Replication level across whole data is 3.
>>>
>>> 1) Why such behavior? I thought any given SELECT request is handled by a
>>> limited subset of C* nodes and not by all of them, as per connection
>>> consistency/table replication settings, in case.
>>>
>>> 2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?
>>>
>>> Thank you all very much in advance,
>>> Vsevolod Filaretov.
>>>
>>


Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread Attila Wind

Hi Gurus,

Looks like we stopped this thread. However, I would be very much curious 
about answers regarding b) ...


Anyone any comments on that?
I do see this as a potential production outage risk now... Especially as 
we are planning to run analysis queries by hand exactly like that over 
the cluster...


thanks!

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 23. 11:42, shalom sagges wrote:
a) Interesting... But only in case you do not provide partitioning key 
right? (so IN() is for partitioning key?)


I think you should ask yourself a different question. Why am I using 
ALLOW FILTERING in the first place? What happens if I remove it from 
the query?
I prefer to denormalize the data to multiple tables or at least create 
an index on the requested column (preferably queried together with a 
known partition key).


b) Still does not explain or justify "all 8 nodes to halt and 
unresponsiveness to external requests" behavior... Even if servers are 
busy with the request seriously becoming non-responsive...?


I think it can justify the unresponsiveness. When using ALLOW 
FILTERING, you are doing something like a full table scan in a 
relational database.


There is a lot of information on the internet regarding this subject 
such as 
https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/


Hope this helps.

Regards,


On Thu, May 23, 2019 at 7:33 AM Attila Wind  
wrote:


Hi,

"When you run a query with allow filtering, Cassandra doesn't know
where the data is located, so it has to go node by node, searching
for the requested data."

a) Interesting... But only in case you do not provide partitioning
key right? (so IN() is for partitioning key?)

b) Still does not explain or justify "all 8 nodes to halt and
unresponsiveness to external requests" behavior... Even if servers
are busy with the request seriously becoming non-responsive...?

cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 23. 0:37, shalom sagges wrote:

Hi Vsevolod,

1) Why such behavior? I thought any given SELECT request is
handled by a limited subset of C* nodes and not by all of them,
as per connection consistency/table replication settings, in case.
When you run a query with allow filtering, Cassandra doesn't know
where the data is located, so it has to go node by node,
searching for the requested data.

2) Is it possible to forbid ALLOW FILTERING flag for given
users/groups?
I'm not familiar with such a flag. In my case, I just try to
educate the R teams.

Regards,

On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov <vsfilare...@gmail.com> wrote:

Hello everyone,

We have an 8 node C* cluster with large volume of unbalanced
data. Usual per-partition selects work somewhat fine, and are
processed by limited number of nodes, but if user issues
SELECT WHERE IN () ALLOW FILTERING, such command stalls all 8
nodes to halt and unresponsiveness to external requests while
disk IO jumps to 100% across whole cluster. In several
minutes all nodes seem to finish processing the request and
cluster goes back to being responsive. Replication level
across whole data is 3.

1) Why such behavior? I thought any given SELECT request is
handled by a limited subset of C* nodes and not by all of
them, as per connection consistency/table replication
settings, in case.

2) Is it possible to forbid ALLOW FILTERING flag for given
users/groups?

Thank you all very much in advance,
Vsevolod Filaretov.



RE: CassKop : a Cassandra operator for Kubernetes developed by Orange

2019-05-28 Thread Per Otterström
Thanks for open-sourcing your work! Looks very promising.

/pelle

From: sebastien.allam...@orange.com 
Sent: 27 May 2019 08:13
To: user@cassandra.apache.org; attila.wind@swf.technology
Subject: Re: CassKop : a Cassandra operator for Kubernetes developed by Orange

Hello Atilla,

The CassKop project was started a year ago internally at Orange with a small 
team. After discussing with the community, we decided to move it to GitHub under an 
open source Apache 2 licence.

The single commit on GitHub doesn't reflect the real project history.

We are really open to cooperating; our goal is to have the best solution for 
production use.
Sébastien

On 25 May 2019 at 11:54, Attila Wind <attilaw@swf.technology> wrote:

Maybe my understanding is wrong and I am not really a "deployment guru", but it 
looks to me like

Orange (https://github.com/Orange-OpenSource/cassandra-k8s-operator, 1 
contributor and 1 commit as of 2019-05-24)
and sky-uk/cassandra-operator (https://github.com/sky-uk/cassandra-operator, 
in alpha phase and not recommended for production, 3 contributors, 24 
commits between 2019.03.25 and 2019.05.21, 32 issues)
are developing something I could use in my OWN(!) Kubernetes-based solution 
(even on premise if I want, or whatever).
They are both open source. Right?

While
Datastax and Instaclustr are commercial players and offer the solution in a 
tightly-coupled way with Cloud only
(I just took a quick look on Instaclustr but could not even figure out pricing 
info for this service... probably I am lame... or not? :-))

So this looks to me a nice competition...
What do I miss?

ps.: maybe Orange and sky-uk/cassandra-operator guys should cooperate..?? 
Others are clearly building business around it

cheers
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 24. 20:36, John Sanda wrote:
There is also
https://github.com/sky-uk/cassandra-operator

On Fri, May 24, 2019 at 2:34 PM Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
Fantastic! Now there are three teams making k8s operators for C*: Datastax, 
Instaclustr, and now Orange.

rahul.xavier.si...@gmail.com

http://cassandra.link

I'm speaking at #DataStaxAccelerate, the world's premiere #ApacheCassandra 
conference, and I want to see you there! Use my code Singh50 for 50% off your 
registration. www.datastax.com/accelerate


On Fri, May 24, 2019 at 9:07 AM Jean-Armel Luce <jaluc...@gmail.com> wrote:
Hi folks,

We are excited to announce that CassKop, a Cassandra operator for Kubernetes 
developed by Orange teams, is now ready for beta testing.

CassKop works as a usual K8S controller (reconcile the real state with a 
desired state) and automates the Cassandra operations through JMX. All the 
operations are launched by calling standard K8S APIs (kubectl apply ...) or by 
using a K8S plugin (kubectl casskop ...).

CassKop is developed in GO, based on CoreOS operator-sdk framework.
Main features already available :
- deploying a rack aware cluster (or AZ aware cluster)
- scaling up & down (including cleanups)
- setting and modifying configuration parameters (C* and JVM parameters)
- adding / removing a datacenter in Cassandra (all datacenters must be in the 
same region)
- rebuilding nodes
- removing node or replacing node (in case of hardware failure)
- upgrading C* or Java versions (including upgradesstables)
- monitoring (using Prometheus/Grafana)
- ...

By using local and persistent volumes, it is possible to handle failures or 
stop/start nodes for maintenance operations with no transfer of data between 
nodes.
Moreover, we can deploy cassandra-reaper in K8S and use it for scheduling 
repair sessions.
For now, we can deploy a C* cluster only as a mono-region cluster. We will work 
over the next weeks to be able to deploy a C* cluster as a multi-region 
cluster.

Still in the roadmap :
- Network encryption
- Monitoring (exporting logs and metrics)
- backup & restore
- multi-regions support

We'd be interested to have you try this and hear what you think!

Please read the description and installation instructions on 
https://github.com/Orange-OpenSource/cassandra-k8s-operator.
For a quick start, you can also follow this step by step guide : 
https://orange-opensource.github.io/cassandra-k8s-operator/index.html?slides=Slides-CassKop-demo.md#1


The CassKop Team
--

- John
