[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift
[ https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253959#comment-15253959 ] Jack Krupansky commented on CASSANDRA-11547:

It would be nice to have three distinct layers of defense against clock drift:

1. An external monitoring service that alerts users when the clocks on a cluster may be drifting, plus a super-alert when any clock in the cluster gets too far out of range. The hope is to catch and correct clock drift before the cluster gets into trouble.
2. A warning from Cassandra itself if a node's clock gets more than a minor threshold out of sync with the majority of the cluster.
3. A strong warning, or even a freeze, if a node's clock is more than a major threshold out of sync with the majority of the cluster.

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jason Brown
> Assignee: Jason Brown
> Priority: Minor
> Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a simple way to check whether this occurs, we can run a background thread that wakes up every n seconds, reads the system clock, and checks whether n seconds have indeed passed.
> * If the clock's current time is less than the last recorded time (captured n seconds in the past), we know the clock has jumped backward.
> * If fewer than n seconds have elapsed, we know the system clock is running slow or has moved backward (by a value less than n).
> * If (n + a small offset) seconds have elapsed, we can assume we are within an acceptable window of clock movement. Reasons for including an offset: the clock-checking thread might not have been scheduled on time, garbage collection, and so on.
> * If the elapsed time is greater than (n + a small offset) seconds, we can assume the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment a metric that the user's monitoring systems can trigger/alert on.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
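The four cases in the proposal above can be sketched as a small classifier plus a background loop. This is a Python illustration only (Cassandra's actual implementation would be Java, and all names here are hypothetical):

```python
import threading
import time

def classify_drift(elapsed, interval, tolerance):
    """Classify one wake-up of the checker thread, per the four cases:
    elapsed   = wall-clock seconds observed since the last check
    interval  = n, the intended sleep
    tolerance = allowance for late scheduling, GC pauses, etc."""
    if elapsed < 0:
        return "jumped-backward"     # clock is before the last recorded time
    if elapsed < interval:
        return "slow-or-moved-back"  # fewer than n seconds appeared to pass
    if elapsed <= interval + tolerance:
        return "ok"                  # within the acceptable window
    return "jumped-forward"          # well over n seconds appeared to pass

def clock_check_loop(stop, interval=10.0, tolerance=1.0, report=print):
    """Background loop: record the clock, sleep n seconds, re-read, classify.
    `stop` is a threading.Event; Event.wait doubles as the interruptible sleep."""
    last = time.time()
    while not stop.wait(interval):
        now = time.time()
        verdict = classify_drift(now - last, interval, tolerance)
        if verdict != "ok":
            report("clock drift detected: %s (elapsed=%.3fs)" % (verdict, now - last))
        last = now
```

In the unhappy cases the real thread would log and increment a metric rather than print, per the description above.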
[jira] [Comment Edited] (CASSANDRA-11566) read time out when do count(*)
[ https://issues.apache.org/jira/browse/CASSANDRA-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248694#comment-15248694 ] Jack Krupansky edited comment on CASSANDRA-11566 at 4/19/16 9:59 PM:

I suspect that this timeout is simply because cqlsh is set to only allow 10 seconds for a request by default. Try setting the request timeout to some largish number, like 2000 (seconds), using the {{--request-timeout}} command line option for cqlsh:

{code}
cqlsh --request-timeout=2000 ...
{code}

To be clear, even if setting a longer timeout works, it is not advisable to perform such a slow and resource-intensive operation on a production cluster unless absolutely necessary.

was (Author: jkrupan):
I suspect that this timeout is simply because cqlsh is set to only allow 10 seconds for a request by default. Try setting the request timeout to some largish number, like 2000 (seconds), using the {{--request-timeout}} command line option for cqlsh:

{code}
cqlsh --request-timeout=2000 ...
{code}

> read time out when do count(*) > -- > > Key: CASSANDRA-11566 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11566 > Project: Cassandra > Issue Type: Bug > Environment: staging >Reporter: nizar > Fix For: 3.3 > > > Hello I using Cassandra Datastax 3.3, I keep getting read time out even if I > set the limit to 1, it would make sense if the limit is high number .. > However only limit 1 and still timing out sounds odd? > [cqlsh 5.0.1 | Cassandra 3.3 | CQL spec 3.4.0 | Native protocol v4] > cqlsh:test> select count(*) from test.my_view where s_id=?
and flag=false > limit 1; > OperationTimedOut: errors={}, last_host= > my key look like this : > CREATE MATERIALIZED VIEW test.my_view AS > SELECT * > FROM table_name > WHERE id IS NOT NULL AND processed IS NOT NULL AND time IS NOT NULL AND id > IS NOT NULL > PRIMARY KEY ( ( s_id, flag ), time, id ) > WITH CLUSTERING ORDER BY ( time ASC ); > I have 5 nodes with replica 3 > CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', > 'dc': '3'} AND durable_writes = true; > Below was the result for nodetoolcfstats > Keyspace: test > Read Count: 128770 > Read Latency: 1.42208769123243 ms. > Write Count: 0 > Write Latency: NaN ms. > Pending Flushes: 0 > Table: tableName > SSTable count: 3 > Space used (live): 280777032 > Space used (total): 280777032 > Space used by snapshots (total): 0 > Off heap memory used (total): 2850227 > SSTable Compression Ratio: 0.24706731995327527 > Number of keys (estimate): 1277211 > Memtable cell count: 0 > Memtable data size: 0 > Memtable off heap memory used: 0 > Memtable switch count: 0 > Local read count: 3 > Local read latency: 0.396 ms > Local write count: 0 > Local write latency: NaN ms > Pending flushes: 0 > Bloom filter false positives: 0 > Bloom filter false ratio: 0.0 > Bloom filter space used: 1589848 > Bloom filter off heap memory used: 1589824 > Index summary off heap memory used: 1195691 > Compression metadata off heap memory used: 64712 > Compacted partition minimum bytes: 311 > Compacted partition maximum bytes: 535 > Compacted partition mean bytes: 458 > Average live cells per slice (last five minutes): 102.92671205446536 > Maximum live cells per slice (last five minutes): 103 > Average tombstones per slice (last five minutes): 1.0 > Maximum tombstones per slice (last five minutes): 1 > Table: my_view > SSTable count: 4 > Space used (live): 126114270 > Space used (total): 126114270 > Space used by snapshots (total): 0 > Off heap memory used (total): 91588 > SSTable Compression Ratio: 0.1652453778228639 > 
Number of keys (estimate): 8 > Memtable cell count: 0 > Memtable data size: 0 > Memtable off heap memory used: 0 > Memtable switch count: 0 > Local read count: 128767 > Local read latency: 1.590 ms > Local write count: 0 > Local write latency: NaN ms > Pending flushes: 0 > Bloom filter false positives: 0 > Bloom filter false ratio: 0.0 > Bloom filter space used: 96 > Bloom filter off heap memory used: 64 > Index summary off heap memory used: 140 > Compression metadata off heap memory used: 91384 > Compacted partition minimum bytes: 3974 > Compacted partition maximum bytes: 386857368 > Compacted partition mean bytes: 26034715 > Average live cells per slice (last five minutes): 102.99462595230145 > Maximum live cells per slice (last five minutes): 103 > Average tombstones per slice (last five minutes): 1.0 > Maximum tombstones per slice (last five minutes): 1 > Thank you. > Nizar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11566) read time out when do count(*)
[ https://issues.apache.org/jira/browse/CASSANDRA-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248742#comment-15248742 ] Jack Krupansky commented on CASSANDRA-11566:

This issue may also be considered a duplicate of CASSANDRA-9051. For reference, setting the {{--request-timeout}} parameter on the command line and the {{request_timeout}} option in the {{\[connection]}} section of the {{cqlshrc}} file are documented here: http://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlsh.html
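Per the cqlsh reference linked in the comment above, the same timeout can be made persistent in the {{cqlshrc}} file rather than passed on every invocation. A minimal fragment (path shown is the conventional default location):

```ini
; ~/.cassandra/cqlshrc -- persistent equivalent of --request-timeout=2000
[connection]
request_timeout = 2000
```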
[jira] [Commented] (CASSANDRA-11566) read time out when do count(*)
[ https://issues.apache.org/jira/browse/CASSANDRA-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248694#comment-15248694 ] Jack Krupansky commented on CASSANDRA-11566:

I suspect that this timeout is simply because cqlsh is set to only allow 10 seconds for a request by default. Try setting the request timeout to some largish number, like 2000 (seconds), using the {{--request-timeout}} command line option for cqlsh:

{code}
cqlsh --request-timeout=2000 ...
{code}
[jira] [Commented] (CASSANDRA-11566) read time out when do count(*)
[ https://issues.apache.org/jira/browse/CASSANDRA-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245719#comment-15245719 ] Jack Krupansky commented on CASSANDRA-11566:

COUNT(\*) should only be used on relatively small tables, relatively narrow token ranges, or relatively narrow slices of wide rows - probably no more than thousands (or maybe only hundreds) of rows, depending on your data and your hardware.

Your table may have only 8 "keys", but that means 8 partition keys, not primary keys. Your table is 120 MB, which is not large as tables go, but may in fact be large enough for the count operation to fail to complete in a small amount of time. Try doing a count for each of the partition keys in that table. Maybe one of the rows is very wide and is causing performance to bog down.

(FWIW, if you need to write "(\*)" in Jira, you need to escape the \* with a backslash, as in "(\*)".)
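The "relatively narrow token ranges" advice above is the standard workaround for count timeouts: split the token ring into subranges and issue one small count per subrange. A minimal sketch of the splitting arithmetic, assuming the default Murmur3Partitioner token space (function name is illustrative):

```python
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1  # Murmur3Partitioner token space

def token_subranges(n):
    """Split the full token ring into n contiguous (start, end] subranges,
    so a client can issue one small count() per subrange instead of a
    single huge full-table count."""
    width = (MAX_TOKEN - MIN_TOKEN) // n
    edges = [MIN_TOKEN + i * width for i in range(n)] + [MAX_TOKEN]
    return list(zip(edges[:-1], edges[1:]))
```

Each (lo, hi] pair would then back a query of the form SELECT count(\*) FROM t WHERE token(pk) > lo AND token(pk) <= hi, with the per-range counts summed client-side.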
[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
[ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235141#comment-15235141 ] Jack Krupansky commented on CASSANDRA-9754:
---

Any idea how a new wide partition will perform relative to the same amount of data and the same number of clustering rows divided into bucketed partitions? For example, a single 1 GB wide partition vs. ten 100 MB partitions (same partition key plus a 0-9 bucket number) vs. a hundred 10 MB partitions (0-99 bucket number), for two access patterns: 1) random access of a row or short slice, and 2) a full bulk read of the 1 GB of data, one moderate slice at a time.

Or maybe the question is equivalent to asking what the cost is to access the last row of the 1 GB partition vs. the last row of the tenth or hundredth bucket of the bucketed equivalent.

No precision required. Just inquiring whether we can get rid of bucketing as a preferred data modeling strategy, at least for the common use cases where the sum of the buckets is roughly 2 GB or less. The bucketing approach does have the side effect of distributing the buckets around the cluster, which could be a good thing - or maybe not.

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Michael Kjellman
> Priority: Minor
>
> Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo and its ByteBuffers. This is especially bad in endpoints with large CQL partitions. If a CQL partition is, say, 6.4 GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for GC. Can this be improved by not creating so many objects?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
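The bucketing strategy being questioned above is typically implemented by folding a bucket number into the partition key. A minimal sketch of the hash-based variant (all names hypothetical; time-based bucketing, which keeps slices contiguous, is the common alternative):

```python
import hashlib

def bucketed_key(logical_key, clustering_value, num_buckets=10):
    """Map one logical wide partition onto num_buckets physical partitions:
    the composite partition key becomes (logical_key, bucket). A point read
    of a known row hashes to exactly one bucket; a full scan of the logical
    partition must fan out over all num_buckets buckets."""
    digest = hashlib.md5(repr(clustering_value).encode()).digest()
    bucket = digest[0] % num_buckets
    return (logical_key, bucket)
```

A side effect noted in the comment: the buckets land on different replicas around the cluster, since each composite key hashes to its own token.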
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231487#comment-15231487 ] Jack Krupansky commented on CASSANDRA-8844: --- Since this new feature has evolved significantly since the original description, is there a good summary available for the current form of the feature? Not like full doc or the internal implementation details, but a concise summary at the user level, like where the CDC data will be stored, its format, how to retrieve it, and potential performance impact, both in terms of amount of CPU time required and additional memory required if CDC is enabled. Thanks. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]), propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. 
> # To provide a mechanism for implementing good and reliable (deliver-at-least-once with possible mechanisms for deliver-exactly-once) continuous semi-realtime feeds of mutations going into a Cassandra cluster.
> # To eliminate the developmental and operational burden on users so that they don't have to do dual writes to other systems.
> # For users that are currently doing batch export from a Cassandra system, give them the opportunity to make that realtime with a minimum of coding.
> h2. The mechanism:
> We propose a durable logging mechanism that functions similar to a commitlog, with the following nuances:
> - Takes place on every node, not just the coordinator, so RF number of copies are logged.
> - Separate log per table.
> - Per-table configuration. Only tables that are specified as CDC_LOG would do any logging.
> - Per DC. We are trying to keep the complexity to a minimum to make this an easy enhancement, but most likely use cases would prefer to only implement CDC logging in one (or a subset) of the DCs that are being replicated to.
> - In the critical path of ConsistencyLevel acknowledgment. Just as with the commitlog, failure to write to the CDC log should fail that node's write. If that means the requested consistency level was not met, then clients *should* experience UnavailableExceptions.
> - Be written in a row-centric manner such that it is easy for consumers to reconstitute rows atomically.
> - Written in a simple format designed to be consumed *directly* by daemons written in non-JVM languages.
> h2. Nice-to-haves
> I strongly suspect that the following features will be asked for, but I also believe that they can be deferred for a subsequent release, and to gauge actual interest.
> - Multiple logs per table. This would make it easy to have multiple "subscribers" to a single table's changes. A workaround would be to create a forking daemon listener, but that's not a great answer.
> - Log filtering. Being able to apply filters, including UDF-based filters, would make Cassandra a much more versatile feeder into other systems, and again, reduce complexity that would otherwise need to be built into the daemons.
> h2. Format and Consumption
> - Cassandra would only write to the CDC log, and never delete from it.
> - Cleaning up consumed logfiles would be the client daemon's responsibility.
> - Logfile size should probably be configurable.
> - Logfiles should be named with a predictable naming schema, making it trivial to process them in order.
> - Daemons should be able to checkpoint their work, and resume from where they left off. This means they would have to leave some file artifact in the CDC log's directory.
> - A sophisticated daemon should be able to be written that could
> -- Catch up, in written-order,
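The consumption model proposed above (predictably named logfiles, client-side cleanup, checkpoint artifacts left in the CDC directory) can be sketched as a single catch-up pass. Everything here is illustrative of the proposal, not an implemented API; filenames and the checkpoint format are assumptions:

```python
import os

def consume_cdc(log_dir, handle):
    """One catch-up pass of a CDC consumer daemon. Assumes logfiles sort
    lexicographically in write order and a 'checkpoint' marker file in the
    same directory records the last fully consumed logfile."""
    ckpt_path = os.path.join(log_dir, "checkpoint")
    last = ""
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            last = f.read().strip()
    for name in sorted(os.listdir(log_dir)):
        if name == "checkpoint" or name <= last:
            continue                      # already consumed (or the marker itself)
        with open(os.path.join(log_dir, name), "rb") as f:
            handle(f.read())              # deliver-at-least-once: repeats possible after a crash
        with open(ckpt_path, "w") as f:
            f.write(name)                 # checkpoint only after successful handling
```

Checkpointing after handling (not before) is what makes this deliver-at-least-once rather than at-most-once: a crash between the two steps replays the last file.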
[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216937#comment-15216937 ] Jack Krupansky commented on CASSANDRA-11383: +1 for using [~jrwest]'s most recent two comments here as the source for the doc changes that I myself was referring to here. > Avoid index segment stitching in RAM which lead to OOM on big SSTable files > > > Key: CASSANDRA-11383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11383 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: C* 3.4 >Reporter: DOAN DuyHai >Assignee: Jordan West > Labels: sasi > Fix For: 3.5 > > Attachments: CASSANDRA-11383.patch, > SASI_Index_build_LCS_1G_Max_SSTable_Size_logs.tar.gz, > new_system_log_CMS_8GB_OOM.log, system.log_sasi_build_oom > > > 13 bare metal machines > - 6 cores CPU (12 HT) > - 64Gb RAM > - 4 SSD in RAID0 > JVM settings: > - G1 GC > - Xms32G, Xmx32G > Data set: > - ≈ 100Gb/per node > - 1.3 Tb cluster-wide > - ≈ 20Gb for all SASI indices > C* settings: > - concurrent_compactors: 1 > - compaction_throughput_mb_per_sec: 256 > - memtable_heap_space_in_mb: 2048 > - memtable_offheap_space_in_mb: 2048 > I created 9 SASI indices > - 8 indices with text field, NonTokenizingAnalyser, PREFIX mode, > case-insensitive > - 1 index with numeric field, SPARSE mode > After a while, the nodes just gone OOM. > I attach log files. You can see a lot of GC happening while index segments > are flush to disk. At some point the node OOM ... > /cc [~xedin] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216763#comment-15216763 ] Jack Krupansky commented on CASSANDRA-11383:

Thanks, [~jrwest]. I think that I finally don't have any additional questions! BTW, the DataStax Distribution of Cassandra (DDC) for 3.4 is out now, so the DataStax Cassandra doc has been updated for 3.4, including SASI:
https://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndexConcept.html
https://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/refCreateSASIIndex.html

That happened four days ago, so maybe some of our recent discussion since then should get cycled into the doc - for example, your comments about range queries on SPARSE data. I'll ping docs to alert them to the discussion here, but you guys are free to highlight whatever info you think users should know about.
[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216233#comment-15216233 ] Jack Krupansky commented on CASSANDRA-11383:

Thanks, [~jrwest] and [~doanduyhai]. I think I finally have the SASI terminology down now - SPARSE mode means that the index is sparse (few index entries per indexed column value) while the column data is dense (many distinct values), and non-SPARSE (a.k.a. PREFIX) mode, the default, supports any cardinality of data, especially the low-cardinality data that SPARSE mode does not support.

Maybe that leaves one last question: whether non-SPARSE (PREFIX) mode is considered advisable for high-cardinality column data, where SPARSE mode is nominally the better choice. Maybe that is strictly a matter of whether the prefix/LIKE feature is to be utilized - if so, then PREFIX mode is required, but if not, SPARSE mode sounds like the better choice. I don't have a handle on the internal index structures to know whether that is absolutely the case - that a PREFIX index over high-cardinality data would necessarily be larger and/or slower than a SPARSE index over the same data. I would expect so, but it would be good to have that confirmed.
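For reference, the mode being discussed is chosen per index at creation time via the index options. A hedged illustration with hypothetical table and column names (the SASIIndex class name is real; syntax per the DataStax SASI reference linked earlier in this thread):

```sql
-- High-cardinality numeric column: SPARSE (few rows per indexed value)
CREATE CUSTOM INDEX ON readings (reading_id)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'SPARSE'};

-- Default non-SPARSE mode, required for prefix/LIKE matching
CREATE CUSTOM INDEX ON readings (vendor)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'PREFIX'};
```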
[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216168#comment-15216168 ] Jack Krupansky commented on CASSANDRA-11383: 1. Was the conclusion that a SPARSE SASI index would work well even for low cardinality data (as in the original reported case, for period_end_month_int), or was there some application-level change required to adapt to a SASI change as well? 2. Is it now official that a non-SPARSE SASI index (e.g., PREFIX) can be used for non-TEXT data (int in particular), at least for the case of exact match lookup?
[jira] [Commented] (CASSANDRA-11448) Running OOS should trigger the disk failure policy
[ https://issues.apache.org/jira/browse/CASSANDRA-11448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215912#comment-15215912 ] Jack Krupansky commented on CASSANDRA-11448: Curious that we haven't had an acronym for Out Of Space in more common usage. In fact, this is the first time I've seen it. OOM is so common and so obvious, but OOS seems so foreign. Maybe that's because disk drives are so big these days that most people will now no longer come close to... running OOS on an HDD. SSD changes that with the (currently) much smaller drive size. > Running OOS should trigger the disk failure policy > -- > > Key: CASSANDRA-11448 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11448 > Project: Cassandra > Issue Type: Bug >Reporter: Brandon Williams >Assignee: Branimir Lambov > Fix For: 2.1.x, 2.2.x, 3.0.x > > > Currently when you run OOS, this happens: > {noformat} > ERROR [MemtableFlushWriter:8561] 2016-03-28 01:17:37,047 > CassandraDaemon.java:229 - Exception in thread > Thread[MemtableFlushWriter:8561,5,main] java.lang.RuntimeException: > Insufficient disk space to write 48 bytes > at > org.apache.cassandra.io.util.DiskAwareRunnable.getWriteDirectory(DiskAwareRunnable.java:29) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:332) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) > ~[guava-16.0.1.jar:na] > at > org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1120) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_66] > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > ~[na:1.8.0_66] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66] > {noformat} > Now your flush writer is dead and postflush tasks build up forever. Instead > we should throw FSWE and trigger the failure policy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202141#comment-15202141 ] Jack Krupansky commented on CASSANDRA-11383: bq. recreated them one by one but with no avail, it eventually OOM after a while But are you waiting for each to finish its build before proceeding to the next? I mean, can even one index alone complete a build? Or, can you create the first 2 or 3 and let them run in parallel to completion before proceeding to the next. Maybe there is some practical limit to how many indexes you can build in parallel before the rate of garbage generation exceeds the rate of GC with all of this going on in parallel. > SASI index build leads to massive OOM > - > > Key: CASSANDRA-11383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11383 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: C* 3.4 >Reporter: DOAN DuyHai > Attachments: CASSANDRA-11383.patch, new_system_log_CMS_8GB_OOM.log, > system.log_sasi_build_oom > > > 13 bare metal machines > - 6 cores CPU (12 HT) > - 64Gb RAM > - 4 SSD in RAID0 > JVM settings: > - G1 GC > - Xms32G, Xmx32G > Data set: > - ≈ 100Gb/per node > - 1.3 Tb cluster-wide > - ≈ 20Gb for all SASI indices > C* settings: > - concurrent_compactors: 1 > - compaction_throughput_mb_per_sec: 256 > - memtable_heap_space_in_mb: 2048 > - memtable_offheap_space_in_mb: 2048 > I created 9 SASI indices > - 8 indices with text field, NonTokenizingAnalyser, PREFIX mode, > case-insensitive > - 1 index with numeric field, SPARSE mode > After a while, the nodes just gone OOM. > I attach log files. You can see a lot of GC happening while index segments > are flush to disk. At some point the node OOM ... > /cc [~xedin] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202080#comment-15202080 ] Jack Krupansky commented on CASSANDRA-11383: 1. How large are each of the text fields being indexed? Are they fairly short or are some quite long (and not tokenized, either)? I'm wondering if maybe a wide column is causing difficulty. 2. Does OOM occur if SASI indexes are created one at a time - serially, waiting for the full index to build before moving on to the next? 3. Do you need a 32G heap to build just one index? I cringe when I see a heap larger than 14G. See if you can get a single SASI index build to work in 10-12G or less.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202876#comment-15202876 ] Jack Krupansky commented on CASSANDRA-11383: The int field could easily be made a text field if that would make SASI work better (you can even do prefix query by year then.) Point 1 is precisely what SASI SPARSE is designed for. It is also what Materialized Views (formerly Global Indexes) are for, and MV may be the better fit since it eliminates the need to scan multiple nodes: the rows get collected under the new partition key, which can include the indexed data value. You're using cardinality backwards - it is supposed to be a measure of the number of distinct values in a column, not the number of rows containing each value. See: https://en.wikipedia.org/wiki/Cardinality_%28SQL_statements%29. Granted, in ERD cardinality is the count of rows in a second table for each column value in a given table (one to n, n to one, etc.), but in the context of an index only one table is involved - you could consider the index itself to be a table, but that would be a little odd. In any case, it is best to stick with the standard SQL meaning of the cardinality of data values in a column. So, to be clear, an email address is high cardinality and gender is low cardinality. And the end-of-month int field is low cardinality, or not dense in the original SASI doc terminology. 
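To illustrate the SQL sense of the term with a toy sketch (the column data here is made up for illustration; this is not Cassandra code):

```python
# Cardinality in the SQL sense: the number of distinct values in a
# column, independent of how many rows carry each value.
emails = ["a@x.com", "b@x.com", "c@x.com", "d@x.com"]  # hypothetical column data
genders = ["m", "f", "f", "m", "m", "f"]

def cardinality(column):
    """Count distinct values in a column."""
    return len(set(column))

print(cardinality(emails))   # 4 - high relative to row count
print(cardinality(genders))  # 2 - low cardinality
```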
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202877#comment-15202877 ] Jack Krupansky commented on CASSANDRA-11383: Sorry for any extra noise I may have generated here - [~xedin] has the info he needs without me.
[jira] [Comment Edited] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202816#comment-15202816 ] Jack Krupansky edited comment on CASSANDRA-11383 at 3/19/16 3:52 PM: - The terminology is a bit confusing here - everybody understands what a sparse matrix is, but exactly what constitutes sparseness in a column is very unclear. What is clear is that the cardinality (number of distinct values) is low for that int field. A naive person (okay... me) would have thought that sparse data meant few distinct values, which is what the int field is (36 distinct values). I decided to check the doc to see what it says about SPARSE, but discovered that the doc doesn't exist yet in the main Cassandra doc - I sent a message to d...@datastax.com about that (turns out, they sync the doc to the DataStax Distribution of Cassandra (DDC), and DDC 3.4 is not out yet, coming soon.) So I went back to the original, pre-integration doc (https://github.com/xedin/sasi) and see that there is a separate, non-integrated doc for SASI in the Cassandra source tree - https://github.com/apache/cassandra/blob/trunk/doc/SASI.md - which makes clear that "SPARSE, which is meant to improve performance of querying large, dense number ranges like timestamps for data inserted every millisecond." Oops... SPARSE=dense, but in any case SPARSE is designed for high cardinality of distinct values, which the int field is clearly not. I would argue that SASI should give a strongly-worded warning if the column data for a SPARSE index has low cardinality - a low number of distinct column values and a high number of index values per column value.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202816#comment-15202816 ] Jack Krupansky commented on CASSANDRA-11383: The terminology is a bit confusing here - everybody understands what a sparse matrix is, but exactly what constitutes sparseness in a column is very unclear. What is clear is that the cardinality (number of distinct values) is low for that int field. A naive person (okay... me) would have thought that sparse data meant few distinct values, which is what the int field is (36 distinct values). I decided to check the doc to see what it says about SPARSE, but discovered that the doc doesn't exist yet in the main Cassandra doc - I sent a message to d...@datastax.com about that. So I went back to the original, pre-integration doc (https://github.com/xedin/sasi) and see that there is a separate, non-integrated doc for SASI in the Cassandra source tree - https://github.com/apache/cassandra/blob/trunk/doc/SASI.md - which makes clear that "SPARSE, which is meant to improve performance of querying large, dense number ranges like timestamps for data inserted every millisecond." Oops... SPARSE=dense, but in any case SPARSE is designed for high cardinality of distinct values, which the int field is clearly not. I would argue that SASI should give a strongly-worded warning if the column data for a SPARSE index has low cardinality - a low number of distinct column values and a high number of index values per column value. 
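A minimal sketch of what such a warning heuristic might look like (the function name and the entries-per-value threshold are illustrative assumptions, not actual SASI code; SASI's own docs describe SPARSE as expecting only a handful of rows per indexed value):

```python
# Hypothetical sketch of the suggested low-cardinality warning; not
# actual SASI code. SPARSE mode assumes only a few rows per indexed
# value, so many entries per value indicates a poor fit.
def sparse_mode_warning(total_rows, distinct_values, max_entries_per_value=5):
    """Return True when a SPARSE index looks inappropriate: the average
    number of index entries per distinct value exceeds the threshold."""
    entries_per_value = total_rows / distinct_values
    return entries_per_value > max_entries_per_value

# period_end_month_int: ~3.4 billion rows over 36 distinct month values
print(sparse_mode_warning(3_400_000_000, 36))  # True - warn the user
```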
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202196#comment-15202196 ] Jack Krupansky commented on CASSANDRA-11383: Just to make sure I understand what's going on... 1. The first index is for the territory_code column, whose values are simple 2-character country codes from allCountries, which has 8 entries, with 'FR' repeated 3 times in that list of 8 country codes. 2. How many rows are generated per machine - is it 100 * 40,000,000 = 4 billion? 3. That means the SASI index will have six unique index values, each with roughly 4 billion / 8 = 500 million rows, correct? (Actually, 5 of the 6 unique values will have 500 million rows and the 6th will have 1.5 billion rows, 3 times 500 million.) Sounds like a great stress test for SASI! 4. That's just for the territory_code column. 5. Some of the columns have only 2 unique values, like commercial_offer_code. That would mean 2 billion rows for each indexed unique value. An even more excellent stress test for SASI!
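The arithmetic in points 1-3 can be checked with a quick sketch (the non-'FR' country codes below are placeholders, since the original list isn't given):

```python
# Back-of-the-envelope check of the row counts above. 'FR' appears 3
# times in the 8-entry list, so only 6 values are unique.
all_countries = ["FR", "FR", "FR", "DE", "GB", "ES", "IT", "US"]  # placeholder codes

total_rows = 100 * 40_000_000                             # 4 billion rows
rows_per_entry = total_rows // len(all_countries)         # 500 million per list entry
rows_for_fr = rows_per_entry * all_countries.count("FR")  # 1.5 billion for 'FR'

print(len(set(all_countries)))  # 6 unique index values
print(rows_per_entry)           # 500000000
print(rows_for_fr)              # 1500000000
```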
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202241#comment-15202241 ] Jack Krupansky commented on CASSANDRA-11383: What's the table schema? Is period_end_month_int text or int? period_end_month_int has 3 years times 12 months = 36 unique values, so 3.4 billion / 36 = 94.44 million rows for each indexed unique value.
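The per-value estimate above, recomputed as a quick sketch:

```python
# 3 years of monthly values -> 36 distinct ints; ~3.4 billion rows
# spread evenly gives roughly 94.44 million rows per indexed value.
distinct_months = 3 * 12
rows_per_value = 3_400_000_000 / distinct_months

print(distinct_months)                  # 36
print(round(rows_per_value / 1e6, 2))   # 94.44 (millions of rows)
```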
[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
[ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189407#comment-15189407 ] Jack Krupansky commented on CASSANDRA-9754: --- Is this issue still considered a Minor priority? Seems like a bigger deal to me. +1 for making it a Major priority - unless there is a longer list of even bigger fish in the queue. Just today there is a user on the list struggling with time series data and really not wanting to have to split a partition that he needs to be able to scan. Of course, scanning a super-wide partition will still be a very bad idea anyway, but at least narrower scans would still be workable with this improvement in place. Is this a 3.x improvement or 4.x or beyond? +1 for 3.x (3.6? 3.8?). > Make index info heap friendly for large CQL partitions > -- > > Key: CASSANDRA-9754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9754 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Michael Kjellman >Priority: Minor > > Looking at a heap dump of 2.0 cluster, I found that majority of the objects > are IndexInfo and its ByteBuffers. This is specially bad in endpoints with > large CQL partitions. If a CQL partition is say 6,4GB, it will have 100K > IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for > GC. Can this be improved by not creating so many objects? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11169) [sasi] exception thrown when trying to index row with index on set
[ https://issues.apache.org/jira/browse/CASSANDRA-11169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151289#comment-15151289 ] Jack Krupansky commented on CASSANDRA-11169: To be clear, "Fixed" means that a plain English error is given for the CQL statement rather than a nasty-looking exception. Is it still the intent to eventually/soon implement indexing of the column values of collection columns? Is there a Jira for that? Is it like a 3.x improvement or more like 4.x? > [sasi] exception thrown when trying to index row with index on set > > > Key: CASSANDRA-11169 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11169 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Jon Haddad >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I have a brand new cluster, built off 1944bf507d66b5c103c136319caeb4a9e3767a69 > I created a new table with a set, then a SASI index on the set. I > tried to insert a row with a set, Cassandra throws an exception and becomes > unavailable. > {code} > cqlsh> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > cqlsh> use test; > cqlsh:test> create table a (id int PRIMARY KEY , s set<text> ); > cqlsh:test> create CUSTOM INDEX on a(s) USING > 'org.apache.cassandra.index.sasi.SASIIndex'; > cqlsh:test> insert into a (id, s) values (1, {'jon', 'haddad'}); > WriteTimeout: code=1100 [Coordinator node timed out waiting for replica > nodes' responses] message="Operation timed out - received only 0 responses." 
> info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} > {code} > Cassandra stacktrace: > {code} > java.lang.AssertionError: null > at org.apache.cassandra.db.rows.BTreeRow.getCell(BTreeRow.java:212) > ~[main/:na] > at > org.apache.cassandra.index.sasi.conf.ColumnIndex.getValueOf(ColumnIndex.java:194) > ~[main/:na] > at > org.apache.cassandra.index.sasi.conf.ColumnIndex.index(ColumnIndex.java:95) > ~[main/:na] > at > org.apache.cassandra.index.sasi.SASIIndex$1.insertRow(SASIIndex.java:247) > ~[main/:na] > at > org.apache.cassandra.index.SecondaryIndexManager$WriteTimeTransaction.onInserted(SecondaryIndexManager.java:808) > ~[main/:na] > at > org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:335) > ~[main/:na] > at > org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:295) > ~[main/:na] > at org.apache.cassandra.utils.btree.BTree.buildInternal(BTree.java:136) > ~[main/:na] > at org.apache.cassandra.utils.btree.BTree.build(BTree.java:118) > ~[main/:na] > at org.apache.cassandra.utils.btree.BTree.update(BTree.java:177) > ~[main/:na] > at > org.apache.cassandra.db.partitions.AtomicBTreePartition.addAllWithSizeDelta(AtomicBTreePartition.java:156) > ~[main/:na] > at org.apache.cassandra.db.Memtable.put(Memtable.java:244) ~[main/:na] > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1216) > ~[main/:na] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:531) ~[main/:na] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:399) ~[main/:na] > at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:202) > ~[main/:na] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[main/:na] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:228) ~[main/:na] > at > org.apache.cassandra.service.StorageProxy$$Lambda$201/413275033.run(Unknown > Source) ~[na:na] > at > 
org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1343) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2520) > ~[main/:na] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_45] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > ~[main/:na] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) > [main/:na] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [main/:na] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11145) Materialized View throws error if Map type is in base table
[ https://issues.apache.org/jira/browse/CASSANDRA-11145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140331#comment-15140331 ] Jack Krupansky commented on CASSANDRA-11145: Sounds like a dup of CASSANDRA-11069. The good news is that there is a workaround: "all collection columns must be selected in a materialised view" - make sure to explicitly list each collection column from the base table in the MV SELECT. You can still use "*" to get all columns from the base, but also need to add the collection column names. Kind of surprised that this bug didn't have a priority for 3.3. > Materialized View throws error if Map type is in base table > --- > > Key: CASSANDRA-11145 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11145 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Patrick McFadin >Priority: Critical > > Using the following test setup: > {code}CREATE TABLE test ( > a int PRIMARY KEY, > b text, > c map<text, text> > ); > CREATE MATERIALIZED VIEW test_mv AS > SELECT a, b > FROM test > WHERE a IS NOT NULL AND b IS NOT NULL > PRIMARY KEY(b, a); > {code} > When inserting data to the base table: > {code} > INSERT INTO test (a,b,c) > VALUES(1, 'b', {'c':'c'}); > {code} > The insert will fail and a stack trace is generated in the logs: > {code} > ERROR [SharedPool-Worker-2] 2016-02-10 05:25:05,957 StorageProxy.java:1339 - > Failed to apply mutation locally : {} > java.lang.IllegalStateException: [ColumnDefinition{name=c, > type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), > kind=REGULAR, position=-1}] is not a subset of [] > at > org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:532) > ~[main/:na] > at > org.apache.cassandra.db.Columns$Serializer.serializedSubsetSize(Columns.java:484) > ~[main/:na] > at > org.apache.cassandra.db.rows.UnfilteredSerializer.serializedRowBodySize(UnfilteredSerializer.java:277) > 
~[main/:na] > at > org.apache.cassandra.db.rows.UnfilteredSerializer.serializedSize(UnfilteredSerializer.java:249) > ~[main/:na] > at > org.apache.cassandra.db.rows.UnfilteredSerializer.serializedSize(UnfilteredSerializer.java:236) > ~[main/:na] > at > org.apache.cassandra.db.rows.UnfilteredSerializer.serializedSize(UnfilteredSerializer.java:229) > ~[main/:na] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serializedSize(UnfilteredRowIteratorSerializer.java:171) > ~[main/:na] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.serializedSize(PartitionUpdate.java:716) > ~[main/:na] > at > org.apache.cassandra.db.Mutation$MutationSerializer.serializedSize(Mutation.java:372) > ~[main/:na] > at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:262) > ~[main/:na] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:498) ~[main/:na] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:399) ~[main/:na] > at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:202) > ~[main/:na] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[main/:na] > at > org.apache.cassandra.service.StorageProxy.mutateMV(StorageProxy.java:748) > ~[main/:na] > at > org.apache.cassandra.db.view.ViewManager.pushViewReplicaUpdates(ViewManager.java:149) > ~[main/:na] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:516) ~[main/:na] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:399) ~[main/:na] > at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:202) > ~[main/:na] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[main/:na] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:228) ~[main/:na] > at > org.apache.cassandra.service.StorageProxy$$Lambda$197/1675816556.run(Unknown > Source) ~[na:na] > at > org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1333) > ~[main/:na] > at > 
org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2510) > [main/:na] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_45] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > [main/:na] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) > [main/:na] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [main/:na] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45] >
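Applying the workaround from the comment above to the reporter's schema, the view would explicitly select the map column as well. A sketch only; the map's element types are inferred from the MapType(UTF8Type, UTF8Type) in the stack trace:

```sql
-- Workaround sketch: name every collection column of the base table in the
-- MV's SELECT list, even though c is not part of the view's primary key.
CREATE MATERIALIZED VIEW test_mv AS
    SELECT a, b, c
    FROM test
    WHERE a IS NOT NULL AND b IS NOT NULL
    PRIMARY KEY (b, a);
```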
[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
[ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139712#comment-15139712 ] Jack Krupansky commented on CASSANDRA-9754: --- bq. large CQL partitions (4GB,75GB,etc) What is the intended target/sweet spot for large partitions... 1GB, 2GB, 4GB, 8GB, 10GB, 15GB, 16GB, or... what? Will random access to larger partitions create any significant heap/off-heap memory demand, or will heap/memory simply become the total rows accessed regardless of how they might be bucketed into partitions? Will we be able to tell people that bucketing of partitions is now never needed, or will there now just be a larger bucket size, like 4GB/partition rather than the 10MB or 50MB or 100MB that some of us recommend today? > Make index info heap friendly for large CQL partitions > -- > > Key: CASSANDRA-9754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9754 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Michael Kjellman >Priority: Minor > > Looking at a heap dump of 2.0 cluster, I found that majority of the objects > are IndexInfo and its ByteBuffers. This is specially bad in endpoints with > large CQL partitions. If a CQL partition is say 6.4GB, it will have 100K > IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for > GC. Can this be improved by not creating so many objects? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132534#comment-15132534 ] Jack Krupansky commented on CASSANDRA-11067: Clarification question about SASI itself (as opposed to Cassandra syntax/semantics): If the column is tokenized, is the original raw literal text for each column also still available for indexing or are only the tokenized/analyzed terms indexed? > Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
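For reference, the kind of tokenized SASI column the question is about is declared roughly as follows. Table and column names are illustrative, and the option set is a sketch of what SASI's StandardAnalyzer accepts:

```sql
CREATE TABLE articles (id int PRIMARY KEY, body text);

-- The analyzer splits body into terms at index time; whether the raw,
-- untokenized literal is also kept in the index is exactly the question above.
CREATE CUSTOM INDEX articles_body_idx ON articles (body)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'mode': 'CONTAINS',
    'analyzed': 'true',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
    'tokenization_normalize_lowercase': 'true'
};
```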
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132531#comment-15132531 ] Jack Krupansky commented on CASSANDRA-11067: Clarification question: Will SASI apply the analyzer to the LIKE string? Then... what will happen if that analysis produces more than one term? In Solr land that is expected and the semantics is phrase query. What will SASI do? Will it be an error or be treated as a list of AND terms? > Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132517#comment-15132517 ] Jack Krupansky commented on CASSANDRA-11067: For reference, over in Solr land users constantly struggle with how to combine exact and partial matching - sometimes they want an absolute literal match for the full field/column, sometimes a wildcard on that full field, sometimes keyword tokenization, sometimes wildcard on tokenized terms, sometimes phrases of tokenized terms, and sometimes phrases from the full literal string. Unfortunately, Solr doesn't have a direct answer for that, so people are forced to copy the field (typically with a copyField directive) so that one field is the literal string and the other is the tokenized field. That gives them complete control at query time, so q=name_literal:Joe would only match when the full name is Joe while q=name_tokenized:joe would match for any name with joe. Similarly, q=name_lit:Jo* would only match names with Jo as a prefix, while q=name_tok:jo* would match Joe Smith as well as Bill Johnson. The user might also opt to copy to yet a third field which is tokenized but with the so-called keyword tokenizer, which permits the string to be normalized but not broken into tokens. The common case is to lower-case, but other common cases would be to eliminate punctuation, replace certain prefixes and suffixes, or whatever. The real point there is that "exact" match is still a range of possibilities. One of the issues here for Cassandra is whether you really want to combine these two separate exactness semantics that Solr keeps separate. 
> Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10368) Support Restricting non-PK Cols in Materialized View Select Statements
[ https://issues.apache.org/jira/browse/CASSANDRA-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122565#comment-15122565 ] Jack Krupansky commented on CASSANDRA-10368: I just stumbled on this issue as kind of a loose end. Is there any intent to support this feature any time soon, assuming that the implementation is not a big deal? > Support Restricting non-PK Cols in Materialized View Select Statements > -- > > Key: CASSANDRA-10368 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10368 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Tyler Hobbs >Priority: Minor > Fix For: 3.x > > > CASSANDRA-9664 allows materialized views to restrict primary key columns in > the select statement. Due to CASSANDRA-10261, the patch did not include > support for restricting non-PK columns. Now that the timestamp issue has > been resolved, we can add support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11075) Consider making SASI the default index implementation
[ https://issues.apache.org/jira/browse/CASSANDRA-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121779#comment-15121779 ] Jack Krupansky commented on CASSANDRA-11075: bq. A good start would probably be run all our dtest and utests on a version where SASI is hard-coded as default. Maybe it would make sense to introduce a config setting to select the default indexing. I presume that it would mean the default mode would be SPARSE, which may make sense for the traditional use cases of Cassandra secondary indexes - cardinality is not too high and not too low. Syntax-wise, OPTIONS can only be specified when USING is specified. That would only be an issue if there weren't keywords for all the SASI options. I vaguely recall [~jbellis] objecting some time ago in some completely unrelated context about cluttering up the CQL syntax with lots of keywords for options, so it might make sense to loosen up the CREATE INDEX syntax to allow WITH OPTIONS even when a class is not specified. The mode might make sense as a keyword, but then we get to the analyzer class and case sensitivity and the keyword clutter would start getting out of hand. > Consider making SASI the default index implementation > - > > Key: CASSANDRA-11075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11075 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Pavel Yaskevich > > We now have 2 secondary index implementation in tree: the old native ones and > SASI. Moving forward, that feels like one too much to maintain, especially > since it seems that SASI is an overall better implementation. > So we should gather enough data to decide if SASI is indeed always better (or > at least sufficiently better than we're convinced no-one would want to stick > with the native implementation), and if that's the case, we should consider > making it the default (and ultimately get rid of the current implementation). 
> So first, we should at least: > # double check that SASI handles all cases that the native implementation > handles. A good start would probably be run all our dtest and utests on a > version where SASI is hard-coded as default. > # compare the performance of SASI and native indexes. In particular our > native indexes, in all their weaknesses, have the advantage of not doing a > read-before-write. Haven't looked at SASI much so I don't know if it's the > case but anyway, we need numbers on both reads and writes. > Once we have that, if we do decide to make SASI the default, then we need to > figure out what is the upgrade path (and whether we add extra syntax for SASI > specific options). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
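The syntax point in the comment above, sketched concretely (table and index names are illustrative; note that WITH OPTIONS is currently tied to the USING clause):

```sql
-- Native secondary index today: no class name, and no WITH OPTIONS permitted.
CREATE INDEX users_state_idx ON users (state);

-- SASI today: options such as mode are only reachable through an explicit
-- USING clause, which is what the comment suggests loosening.
CREATE CUSTOM INDEX users_state_sasi ON users (state)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'SPARSE'};
```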
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119654#comment-15119654 ] Jack Krupansky commented on CASSANDRA-11067: Thanks, [~slebresne], for opening that separate issue. My apologies for taking advantage of the vague and general wording of the title/summary of this particular Jira. I had considered making my suggestions on the original ticket, but didn't when I saw that it was already "closed" and this one is suggestively labeled "Improve SASI syntax" (rather than "Restore = semantics for SASI".) Again, sorry for the distraction from getting SASI done for 3.4 ASAP. > Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117501#comment-15117501 ] Jack Krupansky commented on CASSANDRA-11067: Awesome! Watch out, SQL! One more nit... The fact that a SASI index needs to be "CUSTOM" and an explicit class name is needed feels a little hokey to me. Is there a longer-term plan to fully integrate SASI so it is a first-class feature rather than simply an add-on? In fact, is there any reason not to make it the default secondary indexing (other than the fact that it is new and experimental and unproven in the real world yet)? Having the mode be a keyword rather than all this extra lexical distraction would feel better to me. But if this is billed as experimental in 3.4, maybe there is no real harm in deferring first-class status until a future feature release. Still, it would be nice to be able to say CREATE PREFIX INDEX or CREATE SUFFIX INDEX or CREATE SPARSE INDEX. > Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10699) Make schema alterations strongly consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116229#comment-15116229 ] Jack Krupansky commented on CASSANDRA-10699: Will resolution of this ticket enable concurrent clients to successfully perform CREATE TABLE IF NOT EXISTS? Or will that still be problematic? I just want to know if this is the ticket to point people to for concurrent CREATE TABLE IF NOT EXISTS issues. In the mean time, should we update the doc to effectively say that concurrent CREATE TABLE IF NOT EXISTS is not supported and that it is the responsibility of the user to absolutely refrain from attempting any potentially concurrent attempts to CREATE TABLE IF NOT EXISTS for a given table? A related doc issue is how the user can tell that the CREATE TABLE has successfully completed around the ring. IOW, if cqlsh returns success, is the table really created on all nodes? Is a nodetool tablestats a reliable check - if all nodes are listed then the CREATE TABLE has succeeded/completed on all nodes? > Make schema alterations strongly consistent > --- > > Key: CASSANDRA-10699 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10699 > Project: Cassandra > Issue Type: Sub-task >Reporter: Aleksey Yeschenko > Fix For: 3.x > > > Schema changes do not necessarily commute. This has been the case before > CASSANDRA-5202, but now is particularly problematic. > We should employ a strongly consistent protocol instead of relying on > marshalling {{Mutation}} objects with schema changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
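For clarity, the racy pattern the comment above asks about is nothing more exotic than this (schema illustrative):

```sql
-- Until schema alterations are strongly consistent, two clients issuing this
-- statement concurrently can still race and leave the cluster in schema
-- disagreement, so DDL is best serialized through a single client.
CREATE TABLE IF NOT EXISTS events (
    id uuid PRIMARY KEY,
    payload text
);
```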
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115501#comment-15115501 ] Jack Krupansky commented on CASSANDRA-11067: How about a severely restricted LIKE that only permits patterns ending with a "%" for prefix query (LIKE 'J%'), with a "%" at either end for contains (LIKE '%abc%'), or starting with a "%" for suffix query (LIKE '%smith')? Then it would be fully compatible with SQL. In any case, "=" would then attempt an exact match using the SASI index? That would allow both exact and inexact matching for each column using a single index. If we can't have this restricted LIKE, descriptive keyword operators like SUFFIX and PREFIX would seem desirable. Could the existing CONTAINS operator also be used? They would also handle the case where the prefix/suffix/contains string is a parameter - otherwise the user has to do a messy concat. > Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Sam Tunnicliffe > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
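The three restricted pattern shapes proposed in the comment above would look like this in queries (table and column names illustrative; at the time of this thread LIKE was still only a proposal for SASI):

```sql
SELECT * FROM users WHERE last_name LIKE 'Smi%';    -- prefix match
SELECT * FROM users WHERE last_name LIKE '%mit%';   -- contains match
SELECT * FROM users WHERE last_name LIKE '%smith';  -- suffix match

-- while = would then mean an exact match served by the same SASI index:
SELECT * FROM users WHERE last_name = 'Smith';
```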
[jira] [Commented] (CASSANDRA-10937) OOM on multiple nodes on write load (v. 3.0.0), problem also present on DSE-4.8.3, but there it survives more time
[ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114614#comment-15114614 ] Jack Krupansky commented on CASSANDRA-10937: bq. Number of keys (estimate): 10142095 That indicates that you have over 99% of your data on a single node, which is a slam-dunk antipattern. Check the numbers to make sure what you posted are valid, and if so, you'll need to redesign your partition key to distribute the data to more partition keys so that they get assigned to other nodes. And if your client is sending INSERT requests to the various nodes of your cluster, five of them will have to forward those requests to that one node. You need to get this resolved before attempting anything else. Was this with RF=1? Presumably since those INSERTS are not being replicated to another node, or else the key count would have been roughly comparable on that other node. > OOM on multiple nodes on write load (v. 3.0.0), problem also present on > DSE-4.8.3, but there it survives more time > -- > > Key: CASSANDRA-10937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10937 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra : 3.0.0 > Installed as open archive, no connection to any OS specific installer. > Java: > Java(TM) SE Runtime Environment (build 1.8.0_65-b17) > OS : > Linux version 2.6.32-431.el6.x86_64 > (mockbu...@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red > Hat 4.4.7-4) (GCC) ) #1 SMP Sun Nov 10 22:19:54 EST 2013 > We have: > 8 guests ( Linux OS as above) on 2 (VMWare managed) physical hosts. Each > physical host keeps 4 guests. > Physical host parameters(shared by all 4 guests): > Model: HP ProLiant DL380 Gen9 > Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz > 46 logical processors. > Hyperthreading - enabled > Each guest assigned to have: > 1 disk 300 Gb for seq. log (NOT SSD) > 1 disk 4T for data (NOT SSD) > 11 CPU cores > Disks are local, not shared. 
> Memory on each host - 24 Gb total. > 8 (or 6, tested both) Gb - cassandra heap > (lshw and cpuinfo attached in file test2.rar) >Reporter: Peter Kovgan >Priority: Critical > Attachments: cassandra-to-jack-krupansky.docx, gc-stat.txt, > more-logs.rar, some-heap-stats.rar, test2.rar, test3.rar, test4.rar, > test5.rar, test_2.1.rar, test_2.1_logs_older.rar, > test_2.1_restart_attempt_log.rar > > > 8 cassandra nodes. > Load test started with 4 clients(different and not equal machines), each > running 1000 threads. > Each thread assigned in round-robin way to run one of 4 different inserts. > Consistency->ONE. > I attach the full CQL schema of tables and the query of insert. > Replication factor - 2: > create keyspace OBLREPOSITORY_NY with replication = > {'class':'NetworkTopologyStrategy','NY':2}; > Initiall throughput is: > 215.000 inserts /sec > or > 54Mb/sec, considering single insert size a bit larger than 256byte. > Data: > all fields(5-6) are short strings, except one is BLOB of 256 bytes. > After about a 2-3 hours of work, I was forced to increase timeout from 2000 > to 5000ms, for some requests failed for short timeout. > Later on(after aprox. 12 hous of work) OOM happens on multiple nodes. > (all failed nodes logs attached) > I attach also java load client and instructions how set-up and use > it.(test2.rar) > Update: > Later on test repeated with lesser load (10 mes/sec) with more relaxed > CPU (idle 25%), with only 2 test clients, but anyway test failed. > Update: > DSE-4.8.3 also failed on OOM (3 nodes from 8), but here it survived 48 hours, > not 10-12. > Attachments: > test2.rar -contains most of material > more-logs.rar - contains additional nodes logs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
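The partition-key redesign suggested in the comment above usually means introducing a synthetic bucket component so writes spread across many partitions. A hedged sketch only; the names and bucket count are illustrative, not taken from the reporter's schema:

```sql
-- Writers compute bucket = hash(source_id) % 16 (or round-robin), so the
-- rows for one logical source land on 16 partitions spread around the ring
-- instead of piling onto a single node.
CREATE TABLE messages (
    source_id text,
    bucket int,
    ts timeuuid,
    payload blob,
    PRIMARY KEY ((source_id, bucket), ts)
);
```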
[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113836#comment-15113836 ] Jack Krupansky edited comment on CASSANDRA-10661 at 1/23/16 6:57 PM: - Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc. Or, can I indeed have two indexes on a single column, one a traditional exact match, and one a prefix match. Hmmm... in which case, which gets used if I just specify a column name? CREATE INDEX first_name_full ON mytable (first_name)... CREATE CUSTOM INDEX first_name_prefix ON mytable (first_name)... (I may be confused here - can you specify an index name in place of a column name in a relation in a SELECT/WHERE clause (SELECT... WHERE... first_name_exact = 'Joe')? I don't see any doc/spec that indicates that you can. I'm not sure why I thought that you could. But I don't see any code that detects and fails on this case at CREATE INDEX time. The code checks for "everything but name" rather than detecting two non-keys/values indexes on the same column.) It would be good to have an example that illustrates this. In fact, I would argue that first and last names are perfect examples of where you really do need to query on both exact match and partial match. In fact, I'm not sure I can think of any examples of non-tokenized text fields where you don't want to reserve the ability to find an exact match even if you do need partial matches for some queries. Will SPARSE mode in fact give me an exact match? (Sounds like it.) 
In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.) Are there any use cases of traditional Cassandra indexes which shouldn't almost automatically be converted to SPARSE. After all, the current recommended best practice is to avoid secondary indexes where the column cardinality is either very high or very low, which seems to be a match for SPARSE, although the precise meaning of SPARSE is still a bit fuzzy for me. Maybe, for the first_name use case I mentioned the user would be better off with a first_name Materialized View using first_name in the PK instead of the SPARSE SASI index. In fact, by placing first_name in the partition key of the MV I could assure that all base table rows with the same first name would be on the same node. If all of that is true, we will need to give users some decent guidance on when to use SPARSE SASI vs. MV (vs. classic secondary... or even DSE Search.)
was (Author: jkrupan): Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc. Or, can I indeed have two indexes on a single column, one a traditional exact match, and one a prefix match. Hmmm... in which case, which gets used if I just specify a column name? CREATE INDEX first_name_full ON table CREATE CUSTOM INDEX first_name_prefix ... It would be good to have an example that illustrates this. In fact, I would argue that first and last names are perfect examples of where you really do need to query on both exact match and partial match. In fact, I'm not sure I can think of any examples of non-tokenized text fields where you don't want to reserve the ability to find an exact match even if you do need partial matches for some queries. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.) Are there any use cases of traditional Cassandra indexes which shouldn't almost automatically be converted to SPARSE. After all, the current recommended best practice is to avoid secondary indexes where the column cardinality is either very high or very low, which seems to be a match for SPARSE, although the precise meaning of SPARSE is still a bit fuzzy for me. 
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113836#comment-15113836 ] Jack Krupansky commented on CASSANDRA-10661: Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc. Or, can I indeed have two indexes on a single column, one a traditional exact match, and one a prefix match. Hmmm... in which case, which gets used if I just specify a column name? CREATE INDEX first_name_full ON table CREATE CUSTOM INDEX first_name_prefix ... It would be good to have an example that illustrates this. In fact, I would argue that first and last names are perfect examples of where you really do need to query on both exact match and partial match. In fact, I'm not sure I can think of any examples of non-tokenized text fields where you don't want to reserve the ability to find an exact match even if you do need partial matches for some queries. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.) Are there any use cases of traditional Cassandra indexes which shouldn't almost automatically be converted to SPARSE. After all, the current recommended best practice is to avoid secondary indexes where the column cardinality is either very high or very low, which seems to be a match for SPARSE, although the precise meaning of SPARSE is still a bit fuzzy for me. 
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113816#comment-15113816 ] Jack Krupansky commented on CASSANDRA-10661: So is this stuff actually ready to release? I mean, consistent with the new philosophy that "trunk is always releasable"? IOW, if it does get committed, it will be in 3.4 no matter what? I only ask because it just seemed that there was stuff in flux fairly recently (a couple of days ago), suggesting it wasn't quite baked enough to be considered "releasable". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9472) Reintroduce off heap memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112647#comment-15112647 ] Jack Krupansky commented on CASSANDRA-9472: --- A couple quick questions: 1. Does this Jira move memtables "entirely" offheap, or just "partially"? (Back in July the discussion was that fully offheap was too large an effort.) 2. Is there still an "arena" allocation onheap? 3. What ballpark fraction of a typical Cassandra heap is consumed by memtables - 80%, more, less? 4. Does moving memtables offheap get Cassandra to the point where a default JVM heap allocation is sufficient? If not, please be sure to offer new recommended best practice guidance as to how to estimate heap requirements when memtables are offheap. 5. What heuristic rule/threshold is used to determine how much of system memory can be consumed by offheap memtables? Is that limit user-controllable by a (documented) configuration setting? 6. Are offheap memtables an optional configuration setting, or hardwired? 7. Is this coming soon, like 3.4, or is it still a ways off? > Reintroduce off heap memtables > -- > > Key: CASSANDRA-9472 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9472 > Project: Cassandra > Issue Type: Improvement >Reporter: Benedict >Assignee: Benedict > Fix For: 3.x > > > CASSANDRA-8099 removes off heap memtables. We should reintroduce them ASAP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
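On questions 5 and 6 above: in the 2.1-era implementation that CASSANDRA-8099 removed, off-heap memtables were an optional, user-controllable setting in cassandra.yaml, with an off-heap cap separate from the on-heap cap. A sketch of those knobs (names as of the 2.1-era yaml; whether the reintroduction keeps them is exactly what this ticket will determine):

```yaml
# Where memtable cells live: heap_buffers (default, fully on-heap),
# offheap_buffers (cell values off-heap), or offheap_objects
# (entire cells off-heap). The off-heap modes are opt-in, not hardwired.
memtable_allocation_type: offheap_objects

# Independent caps for the on-heap and off-heap portions of memtables;
# each defaults to 1/4 of the JVM heap size when left unset.
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
```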
[jira] [Commented] (CASSANDRA-10937) OOM on multiple nodes on write load (v. 3.0.0), problem also present on DSE-4.8.3, but there it survives more time
[ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110798#comment-15110798 ] Jack Krupansky commented on CASSANDRA-10937: A few more questions: 1. When nodes do crash, what happens when you restart them? Do they immediately crash again, or run for many hours? 2. Is it just a single node crashing, or do all the nodes fail around the same time, like falling dominoes? Just to be clear, the fact that the cluster seemed fine for 48 hours does not tell us whether it might have been near the edge of failing for quite some time; maybe the precise pattern of load just statistically became the straw that broke the camel's back at that moment. That's why it's important to know what happened after you restarted and resumed the test following the crash at 48 hours. If it really was a resource leak, then reducing the heap would make the failure occur sooner. Determine what the minimal heap size is to run the test at all - set it low enough so the test won't run even for a minute, then increase the heap so it does run, then decrease it by less than you increased it - a binary search for the exact heap size that is needed for the test to run even for a few minutes or an hour. At least then you would have an easy-to-reproduce test case. So if you can tune the heap so that the test can run successfully for say 10 minutes before reliably hitting the OOM, then you can see how much you need to reduce the load (throttling the app) to be able to run without hitting OOM. I'm not saying that there is absolutely no chance that there is a resource leak, just that there are still a lot of open questions to answer about usage before we can leap to that conclusion. Ultimately, we do have to have a reliable repro test case before anything can be done.
In any case, at least at this stage it seems clear that you probably do need a much larger cluster (more nodes with less load on each node). Yes, it's unfortunate that Cassandra won't give you a nice clean message saying so, but that ultimate requirement remains unchanged - pending answers to all of the open questions. > OOM on multiple nodes on write load (v. 3.0.0), problem also present on > DSE-4.8.3, but there it survives more time > -- > > Key: CASSANDRA-10937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10937 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra : 3.0.0 > Installed as open archive, no connection to any OS specific installer. > Java: > Java(TM) SE Runtime Environment (build 1.8.0_65-b17) > OS : > Linux version 2.6.32-431.el6.x86_64 > (mockbu...@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red > Hat 4.4.7-4) (GCC) ) #1 SMP Sun Nov 10 22:19:54 EST 2013 > We have: > 8 guests ( Linux OS as above) on 2 (VMWare managed) physical hosts. Each > physical host keeps 4 guests. > Physical host parameters(shared by all 4 guests): > Model: HP ProLiant DL380 Gen9 > Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz > 46 logical processors. > Hyperthreading - enabled > Each guest assigned to have: > 1 disk 300 Gb for seq. log (NOT SSD) > 1 disk 4T for data (NOT SSD) > 11 CPU cores > Disks are local, not shared. > Memory on each host - 24 Gb total. > 8 (or 6, tested both) Gb - cassandra heap > (lshw and cpuinfo attached in file test2.rar) >Reporter: Peter Kovgan >Priority: Critical > Attachments: cassandra-to-jack-krupansky.docx, gc-stat.txt, > more-logs.rar, some-heap-stats.rar, test2.rar, test3.rar, test4.rar, > test5.rar, test_2.1.rar, test_2.1_logs_older.rar, > test_2.1_restart_attempt_log.rar > > > 8 cassandra nodes. > Load test started with 4 clients(different and not equal machines), each > running 1000 threads. > Each thread assigned in round-robin way to run one of 4 different inserts. > Consistency->ONE. 
> I attach the full CQL schema of tables and the insert query. > Replication factor - 2: > create keyspace OBLREPOSITORY_NY with replication = > {'class':'NetworkTopologyStrategy','NY':2}; > Initial throughput is: > 215,000 inserts/sec > or > 54Mb/sec, considering a single insert size a bit larger than 256 bytes. > Data: > all fields(5-6) are short strings, except one is a BLOB of 256 bytes. > After about 2-3 hours of work, I was forced to increase the timeout from 2000 > to 5000ms, as some requests failed due to the short timeout. > Later on (after approx. 12 hours of work) OOM happens on multiple nodes. > (all failed nodes' logs attached) > I also attach the java load client and instructions on how to set up and use > it. (test2.rar) > Update: > Later the test was repeated with lesser load (10 mes/sec) with more relaxed > CPU (idle 25%), with only 2 test c
[jira] [Commented] (CASSANDRA-10937) OOM on multiple nodes on write load (v. 3.0.0), problem also present on DSE-4.8.3, but there it survives more time
[ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110761#comment-15110761 ] Jack Krupansky commented on CASSANDRA-10937: Sorry, [~tierhetze], but as a matter of policy I don't download or read doc/docx files. Please post the essential text here. > OOM on multiple nodes on write load (v. 3.0.0), problem also present on > DSE-4.8.3, but there it survives more time > -- > > Key: CASSANDRA-10937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10937 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra : 3.0.0 > Installed as open archive, no connection to any OS specific installer. > Java: > Java(TM) SE Runtime Environment (build 1.8.0_65-b17) > OS : > Linux version 2.6.32-431.el6.x86_64 > (mockbu...@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red > Hat 4.4.7-4) (GCC) ) #1 SMP Sun Nov 10 22:19:54 EST 2013 > We have: > 8 guests ( Linux OS as above) on 2 (VMWare managed) physical hosts. Each > physical host keeps 4 guests. > Physical host parameters(shared by all 4 guests): > Model: HP ProLiant DL380 Gen9 > Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz > 46 logical processors. > Hyperthreading - enabled > Each guest assigned to have: > 1 disk 300 Gb for seq. log (NOT SSD) > 1 disk 4T for data (NOT SSD) > 11 CPU cores > Disks are local, not shared. > Memory on each host - 24 Gb total. > 8 (or 6, tested both) Gb - cassandra heap > (lshw and cpuinfo attached in file test2.rar) >Reporter: Peter Kovgan >Priority: Critical > Attachments: cassandra-to-jack-krupansky.docx, gc-stat.txt, > more-logs.rar, some-heap-stats.rar, test2.rar, test3.rar, test4.rar, > test5.rar, test_2.1.rar, test_2.1_logs_older.rar, > test_2.1_restart_attempt_log.rar > > > 8 cassandra nodes. > Load test started with 4 clients(different and not equal machines), each > running 1000 threads. > Each thread assigned in round-robin way to run one of 4 different inserts. 
> Consistency->ONE. > I attach the full CQL schema of tables and the insert query. > Replication factor - 2: > create keyspace OBLREPOSITORY_NY with replication = > {'class':'NetworkTopologyStrategy','NY':2}; > Initial throughput is: > 215,000 inserts/sec > or > 54Mb/sec, considering a single insert size a bit larger than 256 bytes. > Data: > all fields(5-6) are short strings, except one is a BLOB of 256 bytes. > After about 2-3 hours of work, I was forced to increase the timeout from 2000 > to 5000ms, as some requests failed due to the short timeout. > Later on (after approx. 12 hours of work) OOM happens on multiple nodes. > (all failed nodes' logs attached) > I also attach the java load client and instructions on how to set up and use > it. (test2.rar) > Update: > Later the test was repeated with lesser load (10 mes/sec) with more relaxed > CPU (idle 25%), with only 2 test clients, but the test failed anyway. > Update: > DSE-4.8.3 also failed on OOM (3 nodes from 8), but here it survived 48 hours, > not 10-12. > Attachments: > test2.rar - contains most of the material > more-logs.rar - contains additional nodes' logs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
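The heap-size bisection suggested a couple of messages up can be sketched in a few lines. This is only an illustration of the search logic; `test_runs` is a hypothetical placeholder predicate standing in for "start a node with this heap and see whether the load test survives, say, 10 minutes":

```python
def find_min_heap_mb(test_runs, lo_mb=512, hi_mb=8192):
    """Binary-search the smallest heap size (in MB) at which the load
    test survives, per the procedure described in the thread.

    test_runs(heap_mb) -> bool is a placeholder: True if a node started
    with that heap survives the chosen test window.
    Precondition: the test fails at lo_mb and passes at hi_mb.
    """
    while hi_mb - lo_mb > 64:          # stop at a 64 MB resolution
        mid_mb = (lo_mb + hi_mb) // 2
        if test_runs(mid_mb):
            hi_mb = mid_mb             # passed: try a smaller heap
        else:
            lo_mb = mid_mb             # failed: need a bigger heap
    return hi_mb                       # smallest known-good heap size
```

Once the minimal viable heap is pinned down this way, the same bisection idea can be applied to client load (requests/sec) at a fixed heap to find the throughput ceiling.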
[jira] [Comment Edited] (CASSANDRA-10937) OOM on multiple nodes on write load (v. 3.0.0), problem also present on DSE-4.8.3, but there it survives more time
[ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105495#comment-15105495 ] Jack Krupansky edited comment on CASSANDRA-10937 at 1/18/16 5:16 PM: - I still don't see any reason to believe that there is a bug here; the primary issue is that you are overloading the cluster. Sure, Cassandra should do a better job of shedding/failing excessive incoming requests, and there is an open Jira ticket to add just such a feature, but even with that new feature, the net effect will be the same - it will still be up to the application and operations to properly size the cluster and throttle application load before it gets to Cassandra. OOM is not typically an indication of a software bug. Sure, sometimes code has memory leaks, but with a highly dynamic system such as Cassandra, it typically means either a misconfigured JVM or just very heavy load. Sometimes OOM simply means that there is a lot of background processing going on (like compactions or hinted handoff) that is having trouble keeping up with incoming requests. Sometimes OOM occurs because you have too large a heap, which defers GC, but then GC takes too long and further incoming requests simply generate more pressure on the heap faster than that massive GC can deal with it. It is indeed tricky to make sure the JVM has enough heap but not too much. DSE typically runs with a larger heap by default. You can try increasing your heap to 10 or 12 GB. But if you make the heap too big, the big GC can bite you as described above. In that case, the heap needs to be reduced. Typically you don't need a heap smaller than 8 GB. If OOM occurs with an 8 GB heap, it typically means the load on that node is simply too heavy. 
Be sure to review the reasonable recommendations in this blog post: http://www.datastax.com/dev/blog/how-not-to-benchmark-cassandra A few questions that will help us better understand what you are really trying to do: 1. How much reading are you doing, and when, relative to writes? 2. Are you doing any updates or deletes? (These cause compaction, which can fall behind your write/update load.) 3. How much data is on the cluster (rows)? 4. How many tables? 5. What RF? RF=3 would be the recommendation, but if you have a heavy read load you may need RF=5, although heavy load usually means you just need a lot more nodes so that the fraction of incoming requests going to a particular node is dramatically reduced. RF>3 is only needed if there is high load for each particular row or partition. 6. Have you tested using cassandra-stress? That's the gold standard around here. 7. Are your clients using token-aware routing? (Otherwise a write must be bounced from the coordinating node to the node owning the token for the partition key.) 8. Are you using batches for your writes? If so, do all the writes in one batch have the same partition key? (If not, this adds more network hops.) 9. What expectations did you have as to how many writes/reads a given number of nodes should be able to handle? 
[jira] [Commented] (CASSANDRA-10937) OOM on multiple nodes on write load (v. 3.0.0), problem also present on DSE-4.8.3, but there it survives more time
[ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105495#comment-15105495 ] Jack Krupansky commented on CASSANDRA-10937: I still don't see any reason to believe that there is a bug here; the primary issue is that you are overloading the cluster. Sure, Cassandra should do a better job of shedding/failing excessive incoming requests, and there is an open Jira ticket to add just such a feature, but even with that new feature, the net effect will be the same - it will still be up to the application and operations to properly size the cluster and throttle application load before it gets to Cassandra. OOM is not typically an indication of a software bug. Sure, sometimes code has memory leaks, but with a highly dynamic system such as Cassandra, it typically means either a misconfigured JVM or just very heavy load. Sometimes OOM simply means that there is a lot of background processing going on (like compactions or hinted handoff) that is having trouble keeping up with incoming requests. Sometimes OOM occurs because you have too large a heap, which defers GC, but then GC takes too long and further incoming requests simply generate more pressure on the heap faster than that massive GC can deal with it. It is indeed tricky to make sure the JVM has enough heap but not too much. DSE typically runs with a larger heap by default. You can try increasing your heap to 10 or 12 GB. But if you make the heap too big, the big GC can bite you as described above. In that case, the heap needs to be reduced. Typically you don't need a heap smaller than 8 GB. If OOM occurs with an 8 GB heap, it typically means the load on that node is simply too heavy. Be sure to review the reasonable recommendations in this blog post: http://www.datastax.com/dev/blog/how-not-to-benchmark-cassandra A few questions that will help us better understand what you are really trying to do: 1. 
How much reading are you doing, and when, relative to writes? 2. Are you doing any updates or deletes? (These cause compaction, which can fall behind your write/update load.) 3. How much data is on the cluster (rows)? 4. How many tables? 5. What RF? RF=3 would be the recommendation, but if you have a heavy read load you may need RF=5. 6. Have you tested using cassandra-stress? That's the gold standard around here. 7. Are your clients using token-aware routing? (Otherwise a write must be bounced from the coordinating node to the node owning the token for the partition key.) 8. Are you using batches for your writes? If so, do all the writes in one batch have the same partition key? (If not, this adds more network hops.) > OOM on multiple nodes on write load (v. 3.0.0), problem also present on > DSE-4.8.3, but there it survives more time > -- > > Key: CASSANDRA-10937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10937 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra : 3.0.0 > Installed as open archive, no connection to any OS specific installer. > Java: > Java(TM) SE Runtime Environment (build 1.8.0_65-b17) > OS : > Linux version 2.6.32-431.el6.x86_64 > (mockbu...@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red > Hat 4.4.7-4) (GCC) ) #1 SMP Sun Nov 10 22:19:54 EST 2013 > We have: > 8 guests ( Linux OS as above) on 2 (VMWare managed) physical hosts. Each > physical host keeps 4 guests. > Physical host parameters(shared by all 4 guests): > Model: HP ProLiant DL380 Gen9 > Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz > 46 logical processors. > Hyperthreading - enabled > Each guest assigned to have: > 1 disk 300 Gb for seq. log (NOT SSD) > 1 disk 4T for data (NOT SSD) > 11 CPU cores > Disks are local, not shared. > Memory on each host - 24 Gb total. 
> 8 (or 6, tested both) Gb - cassandra heap > (lshw and cpuinfo attached in file test2.rar) >Reporter: Peter Kovgan >Priority: Critical > Attachments: gc-stat.txt, more-logs.rar, some-heap-stats.rar, > test2.rar, test3.rar, test4.rar, test5.rar, test_2.1.rar, > test_2.1_logs_older.rar, test_2.1_restart_attempt_log.rar > > > 8 cassandra nodes. > Load test started with 4 clients(different and not equal machines), each > running 1000 threads. > Each thread assigned in round-robin way to run one of 4 different inserts. > Consistency->ONE. > I attach the full CQL schema of tables and the query of insert. > Replication factor - 2: > create keyspace OBLREPOSITORY_NY with replication = > {'class':'NetworkTopologyStrategy','NY':2}; > Initiall throughput is: > 215.000 inserts /sec > or > 54Mb/sec, considering single insert size a bit larger than 256byte. > Data: > all fields(5-6) are short strings, e
[jira] [Commented] (CASSANDRA-10922) Inconsistent query results
[ https://issues.apache.org/jira/browse/CASSANDRA-10922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102056#comment-15102056 ] Jack Krupansky commented on CASSANDRA-10922: Normally this kind of investigation should be pursued on the user list before assuming that there is an actual bug, but now that we are here... 1. Describe how you got to this situation - did you upgrade the old cluster or copy and import sstables, or... exactly what? 2. Provide the schema. I'm anxious to know the type of that column and exactly why you are using hex. 3. Create a dummy table with exactly the same schema and compose INSERT statements that insert the data you are querying. Does the query work fine on that dummy table? Post that schema, INSERTs, SELECTs, and output here. 4. Create that same dummy table in a single-node test cluster running C* 2.2.3, execute those dummy INSERTs, see that the query works the way it used to, "upgrade" that test database to C* 3.0.2 the same way you did your main cluster, and see if the query fails in the way you have reported. If it doesn't... then nobody here will have much to go on. In any case, be sure to post the exact steps you used. That's a lot of work to do, but start by posting the schema and the output of the query that shows both rows. > Inconsistent query results > -- > > Key: CASSANDRA-10922 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10922 > Project: Cassandra > Issue Type: Bug >Reporter: Maxim Podkolzine >Priority: Critical > > I have a DB created with Cassandra 2.2.3. And currently I'm running it by > Cassandra 3.0.2. > The value of a particular cell is returned depending on the query I run (in > cqlsh): > - returned when iterate all columns, i.e. 
> SELECT value FROM "3xupsource".Content WHERE databaseid=0x2112 LIMIT 2 > (I can see the columns 0x and 0x0100 there, the values seem > correct) > - not returned when I specify a particular column > SELECT value FROM "3xupsource".Content WHERE databaseid=0x2112 AND > columnid=0x0100 > Other queries like SELECT value FROM "3xupsource".Content WHERE > databaseid=0x2112 AND columnid=0x work consistently. > There is nothing in Cassandra error log, so it does not look like a > corruption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
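To make step 3 of the request above concrete, here is the kind of minimal repro script being asked for. The schema is hypothetical - inferred only from the queries in the report (a blob partition key databaseid and a blob clustering column columnid) - so the reporter would need to substitute the real schema and data:

```sql
-- Hypothetical schema, inferred from the reported queries.
CREATE TABLE "3xupsource".Content (
    databaseid blob,
    columnid   blob,
    value      blob,
    PRIMARY KEY (databaseid, columnid)
);

-- Insert a row like the one that goes missing (value is a placeholder).
INSERT INTO "3xupsource".Content (databaseid, columnid, value)
VALUES (0x2112, 0x0100, 0xCAFE);

-- Both of these should return that row; per the report, only the first does.
SELECT value FROM "3xupsource".Content WHERE databaseid = 0x2112 LIMIT 2;
SELECT value FROM "3xupsource".Content
 WHERE databaseid = 0x2112 AND columnid = 0x0100;
```

If the same script shows the row under 2.2.3 but loses it after the same upgrade path to 3.0.2, that is a self-contained test case this ticket can act on.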
[jira] [Commented] (CASSANDRA-10528) Proposal: Integrate RxJava
[ https://issues.apache.org/jira/browse/CASSANDRA-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096425#comment-15096425 ] Jack Krupansky commented on CASSANDRA-10528: Pardon my interruption of the relevant discussion flow here, but can somebody point me to a deeper discussion of exactly how TPC applies to more complex requests? I mean, I can fully grasp TPC for the common use case where the client is doing a point query (exact row PK specified) with a token-aware driver and the row is fully in a memtable - a reasonably direct control flow path - but what about all the semi-common access patterns that are not so direct, namely anything that would require I/O or a network hop or is fairly CPU-intensive for a non-trivial amount of time? The simplest example being a non-token-aware query that the coordinator node has to send to another node. Is this thread/core completely tied up while waiting for the remote response across the network? IOW, is the redesign for a 100% pure-TPC architecture, or is it for a hybrid, with TPC only for some use cases and then SEDA (queuing) when the control flow path is no longer direct and fast? And what of requests that become I/O intensive, such as sstables when there has been heavy updating and compaction has fallen behind (maybe because it doesn't have enough threads)? And then there are scan-intensive operations that are just going to take a long time. Wouldn't it be architecturally better to break them into chunks such that each chunk gets TPC treatment while the overall aggregation gets queue/SEDA treatment, so that such resource-intensive operations don't interfere with the higher-volume, lower-latency point queries that TPC does better with? And then there are scatter-gather type queries (especially DSE Search/Solr/Lucene) which have a much greater network latency factor. 
First, tying up a full thread/core while these requests are mostly sitting idle waiting for the network seems excessive. Second, a queued/SEDA model that supports chunking/partitioning the overall request down to more atomic (TPC) requests so that they can run in parallel on multiple threads/cores would seem highly desirable. In short, will SEDA be completely gone or just TPC added for the cases where it is most relevant? Thanks! In any case, it's great to see architectural progress on this front. > Proposal: Integrate RxJava > -- > > Key: CASSANDRA-10528 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10528 > Project: Cassandra > Issue Type: Improvement >Reporter: T Jake Luciani >Assignee: T Jake Luciani > Fix For: 3.x > > Attachments: rxjava-stress.png > > > The purpose of this ticket is to discuss the merits of integrating the > [RxJava|https://github.com/ReactiveX/RxJava] framework into C*. Enabling us > to incrementally make the internals of C* async and move away from SEDA to a > more modern thread per core architecture. > Related tickets: >* CASSANDRA-8520 >* CASSANDRA-8457 >* CASSANDRA-5239 >* CASSANDRA-7040 >* CASSANDRA-5863 >* CASSANDRA-6696 >* CASSANDRA-7392 > My *primary* goals in raising this issue are to provide a way of: > * *Incrementally* making the backend async > * Avoiding code complexity/readability issues > * Avoiding NIH where possible > * Building on an extendable library > My *non*-goals in raising this issue are: > >* Rewrite the entire database in one big bang >* Write our own async api/framework > > - > I've attempted to integrate RxJava a while back and found it not ready mainly > due to our lack of lambda support. Now with Java 8 I've found it very > enjoyable and have not hit any performance issues. A gentle introduction to > RxJava is [here|http://blog.danlew.net/2014/09/15/grokking-rxjava-part-1/] as > well as their > [wiki|https://github.com/ReactiveX/RxJava/wiki/Additional-Reading]. 
The > primary concept of RX is the > [Observable|http://reactivex.io/documentation/observable.html] which is > essentially a stream of stuff you can subscribe to and act on, chain, etc. > This is quite similar to [Java 8 streams > api|http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html] > (or I should say streams api is similar to it). The difference is java 8 > streams can't be used for asynchronous events while RxJava can. > Another improvement since I last tried integrating RxJava is the completion > of CASSANDRA-8099 which provides a very iterable/incremental approach to > our storage engine. *Iterators and Observables are well paired conceptually > so morphing our current Storage engine to be async is much simpler now.* > In an e
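The pull-versus-push distinction the comment draws between iterators/streams and Observables can be sketched in a few lines. This is an illustrative Python analogue, not RxJava itself; the `Observable` class and its methods here are simplified stand-ins (real Rx adds schedulers, error channels, and backpressure):

```python
# Minimal push-based "observable" vs. a pull-based iterator.

class Observable:
    def __init__(self, source):
        self._source = source          # callable that accepts an emit() function

    def subscribe(self, on_next):
        self._source(on_next)          # the producer pushes items to the subscriber

    def map(self, fn):
        # Chain a transformation, the way Rx operators compose.
        return Observable(lambda emit: self._source(lambda x: emit(fn(x))))

# Pull model: the consumer drives iteration.
pulled = [x * 2 for x in iter([1, 2, 3])]

# Push model: the producer drives; consumers just react when items arrive.
pushed = []
Observable(lambda emit: [emit(x) for x in [1, 2, 3]]) \
    .map(lambda x: x * 2) \
    .subscribe(pushed.append)

assert pulled == pushed == [2, 4, 6]
```

The symmetry between the two loops is why iterators and Observables pair well conceptually: the same chain of transformations works whether the consumer pulls synchronously or the producer pushes asynchronously.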
[jira] [Commented] (CASSANDRA-10985) OOM during bulk read(slice query) operation
[ https://issues.apache.org/jira/browse/CASSANDRA-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090139#comment-15090139 ] Jack Krupansky commented on CASSANDRA-10985: How big a slice are you trying to read? I'd recommend no more than 5K columns in a single request and issue multiple requests. Very large operations are an anti-pattern even if they do manage to sort of work. Was this working before for you and suddenly stopped working or was this the first time you tried a slice of this size? You're dealing with Thrift, so don't expect too much support. > OOM during bulk read(slice query) operation > --- > > Key: CASSANDRA-10985 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10985 > Project: Cassandra > Issue Type: Bug > Components: Observability > Environment: OS : Linux 6.5 > RAM : 126GB > assign heap size: 8GB >Reporter: sumit thakur > > The thread java.lang.Thread @ 0x55000a4f0 Thrift:6 keeps local variables with > total size 16,214,953,728 (98.23%) bytes. > The memory is accumulated in one instance of "java.lang.Thread" loaded by > "". > The stacktrace of this Thread is available. See stacktrace. 
> Keywords > java.lang.Thread > -- > Trace: > Thrift:6 > at java.lang.OutOfMemoryError.()V (OutOfMemoryError.java:48) > at > org.apache.cassandra.utils.ByteBufferUtil.read(Ljava/io/DataInput;I)Ljava/nio/ByteBuffer; > (ByteBufferUtil.java:401) > at > org.apache.cassandra.utils.ByteBufferUtil.readWithVIntLength(Lorg/apache/cassandra/io/util/DataInputPlus;)Ljava/nio/ByteBuffer; > (ByteBufferUtil.java:339) > at > org.apache.cassandra.db.marshal.AbstractType.readValue(Lorg/apache/cassandra/io/util/DataInputPlus;)Ljava/nio/ByteBuffer; > (AbstractType.java:391) > at > org.apache.cassandra.db.rows.BufferCell$Serializer.deserialize(Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/LivenessInfo;Lorg/apache/cassandra/config/ColumnDefinition;Lorg/apache/cassandra/db/SerializationHeader;Lorg/apache/cassandra/db/rows/SerializationHelper;)Lorg/apache/cassandra/db/rows/Cell; > (BufferCell.java:298) > at > org.apache.cassandra.db.rows.UnfilteredSerializer.readSimpleColumn(Lorg/apache/cassandra/config/ColumnDefinition;Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/SerializationHeader;Lorg/apache/cassandra/db/rows/SerializationHelper;Lorg/apache/cassandra/db/rows/Row$Builder;Lorg/apache/cassandra/db/LivenessInfo;)V > (UnfilteredSerializer.java:453) > at > org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/SerializationHeader;Lorg/apache/cassandra/db/rows/SerializationHelper;IILorg/apache/cassandra/db/rows/Row$Builder;)Lorg/apache/cassandra/db/rows/Row; > (UnfilteredSerializer.java:431) > at > org.apache.cassandra.db.rows.UnfilteredSerializer.deserialize(Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/SerializationHeader;Lorg/apache/cassandra/db/rows/SerializationHelper;Lorg/apache/cassandra/db/rows/Row$Builder;)Lorg/apache/cassandra/db/rows/Unfiltered; > (UnfilteredSerializer.java:360) > at > 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext()Lorg/apache/cassandra/db/rows/Unfiltered; > (UnfilteredRowIteratorSerializer.java:217) > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext()Ljava/lang/Object; > (UnfilteredRowIteratorSerializer.java:210) > at org.apache.cassandra.utils.AbstractIterator.hasNext()Z > (AbstractIterator.java:47) > at org.apache.cassandra.db.transform.BaseRows.hasNext()Z (BaseRows.java:108) > at > org.apache.cassandra.db.LegacyLayout$3.computeNext()Lorg/apache/cassandra/db/LegacyLayout$LegacyCell; > (LegacyLayout.java:658) > at org.apache.cassandra.db.LegacyLayout$3.computeNext()Ljava/lang/Object; > (LegacyLayout.java:640) > at org.apache.cassandra.utils.AbstractIterator.hasNext()Z > (AbstractIterator.java:47) > at > org.apache.cassandra.thrift.CassandraServer.thriftifyColumns(Lorg/apache/cassandra/config/CFMetaData;Ljava/util/Iterator;)Ljava/util/List; > (CassandraServer.java:112) > at > org.apache.cassandra.thrift.CassandraServer.thriftifyPartition(Lorg/apache/cassandra/db/rows/RowIterator;ZZI)Ljava/util/List; > (CassandraServer.java:250) > at > org.apache.cassandra.thrift.CassandraServer.getSlice(Ljava/util/List;ZILorg/apache/cassandra/db/ConsistencyLevel;Lorg/apache/cassandra/service/ClientState;)Ljava/util/Map; > (CassandraServer.java:270) > at > org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(Ljava/lang/String;Ljava/util/List;Lorg/apache/cassandra/thrift/ColumnParent;ILorg/apache/cassandra/thrift/Sli
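The advice in the comment above — cap each slice at roughly 5K columns and issue multiple requests — amounts to simple client-side pagination. The sketch below is an illustrative Python helper, not the Thrift or driver API; `fetch_page` and its parameters are hypothetical stand-ins for whatever bounded slice call the client exposes:

```python
def read_in_pages(fetch_page, page_size=5000):
    """Read a large slice as a series of bounded requests.

    fetch_page(start, count) stands in for a driver slice call; it must
    return at most `count` items in order, starting at offset `start`.
    """
    start = 0
    while True:
        page = fetch_page(start, page_size)
        if not page:
            break
        yield from page
        if len(page) < page_size:
            break                      # short page means no more data
        start += len(page)

# Simulated backend with 12,345 "columns": three bounded requests instead
# of one 12K-item read that can blow out the coordinator's heap.
data = list(range(12345))
calls = []
def fetch(start, count):
    calls.append((start, count))
    return data[start:start + count]

result = list(read_in_pages(fetch))
assert result == data
assert len(calls) == 3
```

Each request now holds at most `page_size` cells in memory on the server, which is the point of the anti-pattern warning: the OOM in the stack trace comes from materializing the entire slice at once.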
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646738#comment-14646738 ] Jack Krupansky commented on CASSANDRA-6477: --- The CQL.textile for MV still shows parentheses around the selection list, which is not the case in SELECT. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 alpha 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9927) Security for MaterializedViews
[ https://issues.apache.org/jira/browse/CASSANDRA-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646736#comment-14646736 ] Jack Krupansky commented on CASSANDRA-9927: --- The CQL.textile for MV still shows parentheses being required around the selection list, which is not the case in SELECT. > Security for MaterializedViews > -- > > Key: CASSANDRA-9927 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9927 > Project: Cassandra > Issue Type: Task >Reporter: T Jake Luciani > Fix For: 3.0 beta 1 > > > We need to think about how to handle security wrt materialized views. Since > they are based on a source table we should possibly inherit the same security > model as that table. > However I can see cases where users would want to create different security > auth for different views. esp once we have CASSANDRA-9664 and users can > filter out sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (CASSANDRA-9927) Security for MaterializedViews
[ https://issues.apache.org/jira/browse/CASSANDRA-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Krupansky updated CASSANDRA-9927: -- Comment: was deleted (was: The CQL.textile for MV still shows parentheses being required around the selection list, which is not the case in SELECT.) > Security for MaterializedViews > -- > > Key: CASSANDRA-9927 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9927 > Project: Cassandra > Issue Type: Task >Reporter: T Jake Luciani > Fix For: 3.0 beta 1 > > > We need to think about how to handle security wrt materialized views. Since > they are based on a source table we should possibly inherit the same security > model as that table. > However I can see cases where users would want to create different security > auth for different views. esp once we have CASSANDRA-9664 and users can > filter out sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632464#comment-14632464 ] Jack Krupansky commented on CASSANDRA-6477: --- Are there any significant advantages or disadvantages of using an MV as a pure global index - no data columns other than the primary key columns? Consider the use case of large customer data rows with customer id as the primary key, and you wish to log in by any of customer id, user id, email address, social security number, full name and age or birth date, and name alone, but you really want to simply immediately map any of those alternative logins to the customer id so that the main customer data tables can be accessed directly rather than having all of the data replicated in a bunch of MVs. So, each of the four MVs would not need any non-PK data columns per se, since the base table PK is (must be, right?) in the MV PK, I think. Does this make sense? Would there be any special efficiency (or inefficiency) to having essentially empty partitions? For example: {code} CREATE TABLE cust (id text, email text, ssn text, name text, address text, zip text, birth timestamp, data map<text, text>, pwd text, PRIMARY KEY (id)); CREATE MATERIALIZED VIEW email AS SELECT id,email FROM cust PRIMARY KEY (email, id); CREATE MATERIALIZED VIEW ssn AS SELECT id,ssn FROM cust PRIMARY KEY (ssn, id); CREATE MATERIALIZED VIEW name AS SELECT id,name FROM cust PRIMARY KEY (name, id); CREATE MATERIALIZED VIEW name_zip_birth AS SELECT id,name,zip,birth FROM cust PRIMARY KEY ((name,zip,birth), id); {code} Incidentally, the lookup by name alone would not necessarily be unique - it might not be for an end-user login per se but for a customer service agent who would view the list and then ask the customer some questions to narrow down which specific customer they are. Does this specific use case represent what might be considered a best practice use of MVs? 
If not, why not or what improvements could be made? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
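The lookup pattern described above — use each view purely as an index that maps an alternate login key back to the customer id, then read the base table directly — can be mocked up with plain dictionaries. This is an illustrative Python sketch of the data flow only, not Cassandra itself:

```python
# Base "table": the full customer row, keyed by customer id.
cust = {
    "c1": {"email": "a@example.com", "ssn": "111", "name": "Ann Lee"},
    "c2": {"email": "b@example.com", "ssn": "222", "name": "Ann Lee"},
}

# Index-only "views": alternate key -> set of matching customer ids.
# Like the MVs in the comment, they carry no payload beyond the base-table PK.
by_email = {}
by_name = {}
for cid, row in cust.items():
    by_email.setdefault(row["email"], set()).add(cid)
    by_name.setdefault(row["name"], set()).add(cid)

def login(key, index):
    """Resolve an alternate key to customer ids, then hit the base table."""
    return [cust[cid] for cid in sorted(index.get(key, ()))]

# Email is unique, so login resolves to exactly one customer.
assert login("a@example.com", by_email) == [cust["c1"]]
# Name alone need not be unique: an agent gets the candidate list instead.
assert len(login("Ann Lee", by_name)) == 2
```

The two-step read (small index partition, then a point read on the base table) is exactly the trade the comment is asking about: less data duplicated into the views, at the cost of a second round trip.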
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631496#comment-14631496 ] Jack Krupansky commented on CASSANDRA-6477: --- bq. multiple MVs being updated It would be good to get a handle on what the scalability of MVs per base table is in terms of recommended best practice. Hundreds? Thousands? A few dozen? Maybe just a handful, like 5 or 10 or a dozen? I hate it when a feature like this gets implemented without scalability in mind and then some poor/idiot user comes along and tries a use case which is way out of line with the implemented architecture but we provide no guidance as to what the practical limits really are (e.g., number of tables - thousands vs. hundreds.) It seems to me that the primary use case is for query tables, where an app might typically have a handful of queries and probably not more than a small number of dozens in even extreme cases. In any case, it would be great to be clear about the design limit for number of MVs per base table - and to make sure some testing gets done to assure that the number is practical. And by design limit I don't mean a hard limit where more will cause an explicit error, but where performance is considered acceptable. Are the MV updates occurring in parallel with each other, or are they serial? How many MVs could a base table have before the MV updates effectively become serialized? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. 
However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
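The parallel-versus-serial question at the end of the comment above can be made concrete with a toy fan-out. This is an illustrative Python sketch of why the answer matters for MV-count scalability; it does not depict Cassandra's actual MV write path:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def update_view(view_id, delay=0.05):
    time.sleep(delay)          # stand-in for one view update's write latency
    return view_id

views = list(range(8))

# Serial fan-out: total latency grows linearly with the number of views.
t0 = time.perf_counter()
serial = [update_view(v) for v in views]
serial_time = time.perf_counter() - t0

# Parallel fan-out: latency approaches that of the slowest single update.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(views)) as pool:
    parallel = list(pool.map(update_view, views))
parallel_time = time.perf_counter() - t0

assert serial == parallel == views
assert parallel_time < serial_time
```

If updates are serialized, the practical MV-per-table limit is set by per-write latency times view count; if they run in parallel, it is set instead by available threads and write bandwidth — which is precisely the "how many MVs before updates effectively become serialized" question.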
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629847#comment-14629847 ] Jack Krupansky commented on CASSANDRA-6477: --- bq. force users to include _all_ columns from the original PK in the MV PK. I don't follow the rationale and that seems over-limiting. For example, if my base table was id, name, and address, with id as the PK, I couldn't have an MV with just name or address as the PK key according to this requirement, right? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628189#comment-14628189 ] Jack Krupansky commented on CASSANDRA-6477: --- What CL will apply when MV rows are deleted on TTL expiration? Presumably each of the replicas of the base table will have its TTL expiration triggering roughly at the same time, each local change presumably triggering a delete of the MV, but the MV has replicas as well. Maybe ANY is reasonable for CL for MV update on TTL since the app is not performing an explicit operation with explicit expectations. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623370#comment-14623370 ] Jack Krupansky commented on CASSANDRA-6477: --- bq. we allow <= 1 non-pk column in a MV partition key however the error message we log on multiple attempts reads "Cannot include non-primary key column '%s' in materialized view partition key". We should log that <= 1 are allowed instead. Wow, is that really true? It sounds like a crippling restriction. Is that simply a short-term expediency for the initial release or a hard-core long-term restriction? Just as an example if I had a table with name, address and id, with id as the primary key, I couldn't have an MV with just name or just address or just name and address as the partition key, right? In particular, this restriction seems to preclude pure inverted index MVs - where the non-key content of the row is used to index the key for the row. Still waiting to read an updated CQL spec - especially any such limitations. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618763#comment-14618763 ] Jack Krupansky edited comment on CASSANDRA-6477 at 7/8/15 3:22 PM: --- I don't see updated CQL.textile for CREATE MV on the branch. Coming soon? Also, the comment for CREATE MV in CQL.g does not quite match the actual syntax: 1. Missing the IF NOT EXISTS clause. 2. Has parentheses around the list, but SELECT does not have that. 3. Unclear whether AS or functions are supported in the column name list, but selectStatement would certainly allow that. 4. Has FROM (), which should be FROM , I think. was (Author: jkrupan): I don't see updated CQL.textile for CREATE MV on the branch. Coming soon? Also, the comment for CREATE MV in CQL.g does not quite match the actual syntax: 1. Missing the IF NOT EXISTS clause. 2. Has parentheses around the list, but SELECT does not have that. 3. Unclear whether AS on functions are supported in the column name list, but selectStatement would certainly allow that. 4. Has FROM (), which should be FROM , I think. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618763#comment-14618763 ] Jack Krupansky commented on CASSANDRA-6477: --- I don't see updated CQL.textile for CREATE MV on the branch. Coming soon? Also, the comment for CREATE MV in CQL.g does not quite match the actual syntax: 1. Missing the IF NOT EXISTS clause. 2. Has parentheses around the list, but SELECT does not have that. 3. Unclear whether AS or functions are supported in the column name list, but selectStatement would certainly allow that. 4. Has FROM (), which should be FROM , I think. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616866#comment-14616866 ] Jack Krupansky commented on CASSANDRA-6477: --- 1. Are MV updates still eventually consistent (not guaranteed)? 2. Is there any way for the app to assure that the MV updates have been completed to some desired CL? 3. Will a repair to the base table assure that all MVs are consistent? 4. Can a single MV be repaired to assure that it is consistent? (Especially since the data for a MV on a node will be derived from data on other nodes due to differences in the partition keys.) Great to see such an exciting new feature take shape! > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6977) attempting to create 10K column families fails with 100 node cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561752#comment-14561752 ] Jack Krupansky commented on CASSANDRA-6977: --- [~jasonstack], this issue was resolved as a duplicate of CASSANDRA-7444 which notes: {quote} The patch should change it from linear wrt the total number of tables in the schema, to linear wrt the number of tables in a keyspace. So if you are creating 1000s of tables in a single keyspace we expect no change at all. {quote} > attempting to create 10K column families fails with 100 node cluster > > > Key: CASSANDRA-6977 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6977 > Project: Cassandra > Issue Type: Bug > Environment: 100 nodes, Ubuntu 12.04.3 LTS, AWS m1.large instances >Reporter: Daniel Meyer >Assignee: Rocco Varela >Priority: Minor > Fix For: 2.1.1 > > Attachments: 100_nodes_all_data.png, all_data_5_nodes.png, > keyspace_create.py, logs.tar, tpstats.txt, visualvm_tracer_data.csv > > > During this test we are attempting to create a total of 1K keyspaces with 10 > column families each to bring the total column families to 10K. With a 5 > node cluster this operation can be completed; however, it fails with 100 > nodes. Please see the two charts. For the 5 node case the time required to > create each keyspace and subsequent 10 column families increases linearly > until the number of keyspaces is 1K. For a 100 node cluster there is a > sudden increase in latency between 450 keyspaces and 550 keyspaces. The test > ends when the test script times out. After the test script times out it is > impossible to reconnect to the cluster with the datastax python driver > because it cannot connect to the host: > cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', > {'10.199.5.98': OperationTimedOut()} > It was found that running the following stress command does work from the > same machine the test script runs on. 
> cassandra-stress -d 10.199.5.98 -l 2 -e QUORUM -L3 -b -o INSERT > It should be noted that this test was initially done with DSE 4.0 and c* > version 2.0.5.24 and in that case it was not possible to run stress against > the cluster even locally on a node due to not finding the host. > Attached are system logs from one of the nodes, charts showing schema > creation latency for 5 and 100 node clusters and virtualvm tracer data for > cpu, memory, num_threads and gc runs, tpstat output and the test script. > The test script was on an m1.large aws instance outside of the cluster under > test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556055#comment-14556055 ] Jack Krupansky commented on CASSANDRA-6477: --- 1. Has a decision been made on refresh modes? It sounds like the focus is on "always consistent", as opposed to manual refresh or one-time without refresh or on some time interval, but is that simply the default, preferred refresh mode, or the only mode that will be available (initially)? 2. What happens if an MV is created for a base table that is already populated? Will the operation block while all existing data is propagated to the MV, or will that propagation happen in the background (in which case, is there a way to monitor its status and completion?), or is that not supported (initially)? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9435) Documetation bug
[ https://issues.apache.org/jira/browse/CASSANDRA-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552493#comment-14552493 ] Jack Krupansky commented on CASSANDRA-9435: --- All of the doc references have a capital C in EC2 in the snitch class names, which the source code for the classes has as Ec2, with an uncapitalized c. Damn case-sensitivity! For example, http://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureSnitchesAbout_c.html http://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architectureSnitchEC2_t.html The EC2 multi-region snitch has the same issue. > Documetation bug > > > Key: CASSANDRA-9435 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9435 > Project: Cassandra > Issue Type: Bug > Components: Documentation & website > Environment: Debian 7 >Reporter: jincer >Priority: Minor > Fix For: 2.1.5 > > > Hello, you have some inaccuracy at docs on your website. > When I try to change snitch from default to EC2Snitch (endpoint_snitch: > EC2Snitch) I get the message that " Unable to find snitch class > 'org.apache.cassandra.locator.EC2Snitch' " . > Only when I change "endpoint_snitch: EC2Snitch" to "endpoint_snitch: > Ec2Snitch" does it start. > Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541814#comment-14541814 ] Jack Krupansky commented on CASSANDRA-6477: --- When exactly would population of the MV occur? What refresh options would initially be supported? Would population/refresh begin instantly when the MV is created, by default, or would an explicit command be required to begin population? Earlier I linked to the Oracle doc on MV, so a comparison to Oracle for refresh options might be nice, especially for users migrating from Oracle. Where would the state of refresh be stored, and how can a user monitor it? On each node of the base table? PostgreSQL doesn't seem to have as many options: http://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html With RF>1, which of the nodes containing a given token would push an update to the MV? All of them? Presumably the push can be token-aware, so that each push only goes to RF=n nodes based on the PK of the MV insert row. Would a consistency level be warranted for the push? Would there be hints as well? And repair of an MV if the rate of updates of the base table overwhelms the update bandwidth of the (many) MVs for the base table? Any thoughts on throttling of the flow of updates from other nodes so that population of a MV does not overwhelm or interfere with normal cluster operation? What default, and what override? What would be a reasonable default, and what would be best practice advice for a maximum? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.x > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. 
However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540714#comment-14540714 ] Jack Krupansky commented on CASSANDRA-6477: --- Back to the original description, will the revised MV purpose address the high cardinality issue? That may depend on what guidance the spec offers for how data modelers should set up the primary key columns in terms of partition (or routing!) columns vs. clustering columns. Is the basic concept that although the selected rows might be scattered across multiple nodes in the base table, the goal is that they would cluster together on a single node for the MV table based on careful specification of partition key columns in the MV? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.x > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540699#comment-14540699 ] Jack Krupansky commented on CASSANDRA-6477: --- Is it fair to say that the primary technique for using this feature is to have one base table and n views of that table, each with a different selection of the base columns as the primary key of the view, with all rows selected, possibly projected differently but with different keys? Would it also be sensible to select a subset of rows? Although that might confuse some users who might think it would give them sophisticated ad hoc queries when in fact the query column values are fixed. For example, select all rows for a specific state. In this way, it doesn't offer what a global index would offer. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.x > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
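For concreteness, a sketch of the re-keying idea discussed above, using the CREATE MATERIALIZED VIEW syntax that eventually shipped in Cassandra 3.0 (the table and column names here are hypothetical, not from the ticket):

```sql
-- Hypothetical base table, keyed by user id.
CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text
);

-- The same rows re-keyed by state: all rows for a given state now share
-- one partition (one replica set) instead of being scattered cluster-wide.
-- Cassandra 3.0 requires every view primary-key column to carry an
-- IS NOT NULL restriction; filtering to a fixed value (e.g. one specific
-- state) was not supported in the initial implementation.
CREATE MATERIALIZED VIEW users_by_state AS
    SELECT * FROM users
    WHERE state IS NOT NULL AND id IS NOT NULL
    PRIMARY KEY (state, id);
```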
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540664#comment-14540664 ] Jack Krupansky commented on CASSANDRA-6477: --- Still waiting for an updated description for the ticket. In particular, what specific use cases is this feature designed to handle well, and is it definitively expert-only, or will there be use cases that are safe for normal users. The key thing (ha ha!) is whether this feature will provide capabilities to make it much easier for people to migrate from SQL to Cassandra in terms of the denormalization process, and do it in a way that people can pick up easily in Data Modeling 101 training. A couple of examples would help a lot - like test cases. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.x > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Global indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504086#comment-14504086 ] Jack Krupansky commented on CASSANDRA-6477: --- It would be helpful if someone were to update the description and primary use case(s) for this feature. My understanding of the original use case was to avoid the fan out from the coordinator node on an indexed query - the global index would contain the partition keys for matched rows so that only the node(s) containing those partition key(s) would be needed. So, my question at this stage is whether the intention is that the initial cut of MV would include a focus on that performance optimization use case, or merely focus on the increased general flexibility of MV instead. Would the initial implementation of MV even necessarily use a GI? Would local vs. global index be an option to be specified? Also, whether it is GI or MV, what guidance will the spec, doc, and training give users as to its performance and scalability? My concern with GI was that it works well for small to medium-sized clusters, but not with very large clusters. So, what would the largest cluster that a user could use a GI for? And also how many GI's make sense. For example, with 1 billion rows per node, and 50 nodes, and a GI on 10 columns, that would be... 1B * 50 * 10 = 500 billion index entries on each node, right? Seems like a bit much for a JVM heap or even off-heap memory. Maybe 500M * 20 * 4 = 40 billion index entries per node would be a wiser upper limit, and even that may be a bit extreme. > Global indexes > -- > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. 
However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
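One way to sanity-check the arithmetic in the comment above (a sketch using the comment's hypothetical figures): 1B rows per node across 50 nodes with a GI on 10 columns yields 500 billion index entries cluster-wide; if the global index is itself distributed across those same 50 nodes, each node's share is closer to 10 billion entries than 500 billion.

```python
# Back-of-the-envelope global-index sizing, using the hypothetical
# figures from the comment above.
rows_per_node = 1_000_000_000   # 1B rows per node
nodes = 50
indexed_columns = 10

total_rows = rows_per_node * nodes                # 50 billion rows in the cluster
total_entries = total_rows * indexed_columns      # index entries cluster-wide
entries_per_node = total_entries // nodes         # share per node if the GI is distributed

print(total_entries)     # 500 billion cluster-wide
print(entries_per_node)  # 10 billion per node
```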
[jira] [Commented] (CASSANDRA-6477) Global indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503936#comment-14503936 ] Jack Krupansky commented on CASSANDRA-6477: --- Oracle has lots of options for the REFRESH clause of the CREATE MATERIALIZED VIEW statement: http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6002.htm Notes on that syntax: http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6002.htm#i2064161 Full MV syntax: http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6002.htm You can request that a materialized view be automatically refreshed when the base tables are updated using the "REFRESH ON COMMIT" option. The update transaction pauses while the views are updated - "Specify ON COMMIT to indicate that a fast refresh is to occur whenever the database commits a transaction that operates on a master table of the materialized view. This clause may increase the time taken to complete the commit, because the database performs the refresh operation as part of the commit process." You can also refresh on time intervals, on demand, or no refresh ever. Originally MV was known as SNAPSHOT - a one-time snapshot of a view of the base tables/query. Oracle has a FAST refresh, which depends on a MATERIALIZED VIEW LOG, which must be created for the base table(s). Otherwise a COMPLETE refresh is required. > Global indexes > -- > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
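A sketch of the Oracle constructs being referenced (table and column names are made up; this is illustrative only, since Oracle imposes extensive additional restrictions, such as including rowids or counts in the select list, before a view is actually fast-refreshable):

```sql
-- Materialized view log on the master table; FAST refresh depends on it.
CREATE MATERIALIZED VIEW LOG ON orders WITH ROWID;

-- Fast refresh performed as part of every commit against the master
-- table; as quoted above, this lengthens the commit itself.
CREATE MATERIALIZED VIEW eu_orders
    REFRESH FAST ON COMMIT
    AS SELECT * FROM orders WHERE region = 'EU';

-- Other REFRESH variants mentioned above:
--   REFRESH COMPLETE ON DEMAND          (manual, full rebuild)
--   REFRESH FORCE START WITH SYSDATE
--     NEXT SYSDATE + 1/24               (interval-based, here hourly)
--   NEVER REFRESH                       (one-time snapshot)
```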
[jira] [Commented] (CASSANDRA-6477) Global indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503273#comment-14503273 ] Jack Krupansky commented on CASSANDRA-6477: --- Why not call the feature "high cardinality index" since that's the use case it is focused on, right? My personal preference would be to have a "cardinality" option clause with option values like "low", "medium", "high", and "unique". The default being "low". A global index would be implied for "high" and "unique" cardinality. > Global indexes > -- > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8889) CQL spec is missing doc for support of bind variables for LIMIT, TTL, and TIMESTAMP
[ https://issues.apache.org/jira/browse/CASSANDRA-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345711#comment-14345711 ] Jack Krupansky commented on CASSANDRA-8889: --- Thanks. The change for the special variable names looks fine, but the grammar for LIMIT, TTL, and TIMESTAMP still says "<integer>" - it needs to be "( <integer> | <variable> )". > CQL spec is missing doc for support of bind variables for LIMIT, TTL, and > TIMESTAMP > --- > > Key: CASSANDRA-8889 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8889 > Project: Cassandra > Issue Type: Bug > Components: Documentation & website >Reporter: Jack Krupansky >Assignee: Tyler Hobbs >Priority: Minor > > CASSANDRA-4450 added the ability to specify a bind variable for the integer > value of a LIMIT, TTL, or TIMESTAMP option, but the CQL spec has not been > updated to reflect this enhancement. > Also, the special predefined bind variable names are not documented in the > CQL spec: "[limit]", "[ttl]", and "[timestamp]". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
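For reference, a sketch of the CASSANDRA-4450 feature the grammar should cover (table and column names hypothetical): a bind marker may stand in for each of these integer options in a prepared statement.

```sql
-- Prepared statements may bind the LIMIT, TTL, and TIMESTAMP values:
SELECT * FROM events WHERE id = ? LIMIT ?;

INSERT INTO events (id, payload) VALUES (?, ?)
    USING TTL ? AND TIMESTAMP ?;

-- In the prepared-statement metadata the server names these markers
-- "[limit]", "[ttl]", and "[timestamp]", per the ticket description.
```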
[jira] [Created] (CASSANDRA-8889) CQL spec is missing doc for support of bind variables for LIMIT, TTL, and TIMESTAMP
Jack Krupansky created CASSANDRA-8889: - Summary: CQL spec is missing doc for support of bind variables for LIMIT, TTL, and TIMESTAMP Key: CASSANDRA-8889 URL: https://issues.apache.org/jira/browse/CASSANDRA-8889 Project: Cassandra Issue Type: Bug Components: Documentation & website Reporter: Jack Krupansky Priority: Minor CASSANDRA-4450 added the ability to specify a bind variable for the integer value of a LIMIT, TTL, or TIMESTAMP option, but the CQL spec has not been updated to reflect this enhancement. Also, the special predefined bind variable names are not documented in the CQL spec: "[limit]", "[ttl]", and "[timestamp]". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8814) Formatting of code blocks in CQL doc in github is a little messed up
[ https://issues.apache.org/jira/browse/CASSANDRA-8814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Krupansky updated CASSANDRA-8814: -- Description: Although the html version of the CQL doc on the website looks fine, the textile conversion of the source files in github looks a little messed up. In particular, the "p." paragraph directives that terminate "bc.." block code directives are not properly recognized and then the following text gets subsumed into the code block. The directives look fine, as per my read of the textile doc, but it appears that the textile converter used by github requires that there be a blank line before the "p." directive to end the code block. It also requires a space after the dot for "p. ". If you go to the github pages for the CQL doc for trunk, 2.1, and 2.0, you will see stray "p." directives as well as "\_\_Sample\_\_" text in the code blocks, but only where the syntax code block was multiple lines. This is not a problem where the "bc." directive is used with a single dot for a single line, as opposed to the "bc.." directive used with a double dot for a block of lines. Or in the case of the CREATE KEYSPACE section you see all of the notes crammed into what should be the "Sample" box. See: https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.1.2/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.0.11/doc/cql3/CQL.textile This problem ("p." not recognized to terminate a code block unless followed by a space and preceded by a blank line) actually occurs for the interactive textile formatter as well: http://txstyle.org/doc/4/block-code was: Although the html version of the CQL doc on the website looks fine, the textile conversion of the source files in github looks a little messed up. In particular, the "p." paragraph directives that terminate "bc.." 
block code directives are not properly recognized and then the following text gets subsumed into the code block. The directives look fine, as per my read of the textile doc, but it appears that the textile converter used by github requires that there be a blank line before the "p." directive to end the code block. It also requires a space after the dot for "p. ". If you go to the github pages for the CQL doc for trunk, 2.1, and 2.0, you will see stray "p." directives as well as "\_\_Sample\_\_" text in the code blocks, but only where the syntax code block was multiple lines. This is not a problem where the "bc." directive is used with a single dot for a single line, as opposed to the "bc.." directive used with a double dot for a block of lines. Or in the case of the CREATE KEYSPACE section you see all of the notes crammed into what should be the "Sample" box. See: https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.1.2/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.0.11/doc/cql3/CQL.textile This problem ("p." not recognized to termined a code block unless followed by a space and preceded by a blank line) actually occurs for the interactive textile formatter as well: http://txstyle.org/doc/4/block-code > Formatting of code blocks in CQL doc in github is a little messed up > > > Key: CASSANDRA-8814 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8814 > Project: Cassandra > Issue Type: Task > Components: Documentation & website >Reporter: Jack Krupansky >Priority: Minor > > Although the html version of the CQL doc on the website looks fine, the > textile conversion of the source files in github looks a little messed up. In > particular, the "p." paragraph directives that terminate "bc.." block code > directives are not properly recognized and then the following text gets > subsumed into the code block. 
The directives look fine, as per my read of the > textile doc, but it appears that the textile converter used by github > requires that there be a blank line before the "p." directive to end the code > block. It also requires a space after the dot for "p. ". > If you go to the github pages for the CQL doc for trunk, 2.1, and 2.0, you > will see stray "p." directives as well as "\_\_Sample\_\_" text in the code > blocks, but only where the syntax code block was multiple lines. This is not > a problem where the "bc." directive is used with a single dot for a single > line, as opposed to the "bc.." directive used with a double dot for a block > of lines. Or in the case of the CREATE KEYSPACE section you see all of the > notes crammed into what should be the "Sample" box. > See: > https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile > https://github.co
[jira] [Created] (CASSANDRA-8814) Formatting of code blocks in CQL doc in github is a little messed up
Jack Krupansky created CASSANDRA-8814: - Summary: Formatting of code blocks in CQL doc in github is a little messed up Key: CASSANDRA-8814 URL: https://issues.apache.org/jira/browse/CASSANDRA-8814 Project: Cassandra Issue Type: Task Components: Documentation & website Reporter: Jack Krupansky Priority: Minor Although the html version of the CQL doc on the website looks fine, the textile conversion of the source files in github looks a little messed up. In particular, the "p." paragraph directives that terminate "bc.." block code directives are not properly recognized and then the following text gets subsumed into the code block. The directives look fine, as per my read of the textile doc, but it appears that the textile converter used by github requires that there be a blank line before the "p." directive to end the code block. It also requires a space after the dot for "p. ". If you go to the github pages for the CQL doc for trunk, 2.1, and 2.0, you will see stray "p." directives as well as "\_\_Sample\_\_" text in the code blocks, but only where the syntax code block was multiple lines. This is not a problem where the "bc." directive is used with a single dot for a single line, as opposed to the "bc.." directive used with a double dot for a block of lines. Or in the case of the CREATE KEYSPACE section you see all of the notes crammed into what should be the "Sample" box. See: https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.1.2/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.0.11/doc/cql3/CQL.textile This problem ("p." not recognized to terminate a code block unless followed by a space and preceded by a blank line) actually occurs for the interactive textile formatter as well: http://txstyle.org/doc/4/block-code -- This message was sent by Atlassian JIRA (v6.3.4#6332)
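A minimal textile snippet illustrating the workaround implied above (the content is hypothetical; what matters is the blank line before the terminating directive and the space after the dot):

```textile
bc.. CREATE KEYSPACE ks
  WITH replication = {'class': 'SimpleStrategy'};

p. __Sample__: this paragraph is only recognized as ending the "bc.." block
by github's converter because a blank line precedes "p." and a space
follows the dot.
```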
[jira] [Commented] (CASSANDRA-8135) documentation missing for CONTAINS keyword
[ https://issues.apache.org/jira/browse/CASSANDRA-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322240#comment-14322240 ] Jack Krupansky commented on CASSANDRA-8135: --- That web page appears to be the 2.0 doc, while CONTAINS is in 2.1. The 2.0 doc: https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile The 2.1 doc: https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile And of course you can always consult the DataStax CQL doc: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html Although it does seem odd that DataStax has that as "3.1" even though the CQL doc for 2.1 should be CQL 3.2.0. Hey, [~thobbs], since you did some of the recent edits to the CQL spec, I'm curious what the process is for deciding which CQL version doc should be posted on the Apache Cassandra site in that doc directory. I would think that CQL doc for both 2.0 and 2.1 should be published. Maybe more to the point, the doc/spec for CQL 3.1.7 and 3.2.0 should be published since both C* 2.0 and 2.1 are commonly used. And DataStax should also have doc for both CQL 3.1.7 and 3.2.0 - I mean, having doc for CONTAINS doesn't help a DSE customer since DSE doesn't support C* 2.1 yet. > documentation missing for CONTAINS keyword > -- > > Key: CASSANDRA-8135 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8135 > Project: Cassandra > Issue Type: New Feature > Components: Documentation & website >Reporter: Jon Haddad > > the contains keyword was covered in this blog entry > http://www.datastax.com/dev/blog/cql-in-2-1 but is missing from the > documentation https://cassandra.apache.org/doc/cql3/CQL.html#collections -- This message was sent by Atlassian JIRA (v6.3.4#6332)
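For anyone landing here, a sketch of the 2.1 feature whose documentation is at issue (hypothetical table; CONTAINS queries a secondary index on a collection column):

```sql
CREATE TABLE products (
    id int PRIMARY KEY,
    tags set<text>
);

-- Indexing the collection enables CONTAINS queries (C* 2.1 / CQL 3.1+):
CREATE INDEX ON products (tags);

SELECT * FROM products WHERE tags CONTAINS 'sale';
```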
[jira] [Created] (CASSANDRA-8538) Clarify default write time timestamp for CQL insert and update
Jack Krupansky created CASSANDRA-8538: - Summary: Clarify default write time timestamp for CQL insert and update Key: CASSANDRA-8538 URL: https://issues.apache.org/jira/browse/CASSANDRA-8538 Project: Cassandra Issue Type: Improvement Components: Documentation & website Reporter: Jack Krupansky Priority: Minor The current CQL spec (and downstream doc) says that the default timestamp for write time for CQL inserts and updates is "the current time of the insertion", but that is somewhat vague and non-specific. In particular, is that the time when the coordinator node parses the CQL statement, or... when the row is inserted or updated on the target nodes, or... something else? In particular, if the coordinator doesn't own the token of the primary key, will the owning node set the write time or does the coordinator node do that? Obviously the application can set an explicit TIMESTAMP, but this issue is concerned with the default if that explicit option is not used. Also, will all replicas of the insert or update share the precisely same write time, or will they reflect the actual time when each particular replica row is inserted or updated on each of the replica nodes? Finally, if a batch statement is used to insert or update multiple rows, will they all share the same write time (e.g., the time the batch statement was parsed) or when each replica row is actually inserted or updated on the target (if the coordinator node does not own the token of the partition key) or replica nodes? It would also be helpful if the tracing option was specific as to which time is the official write time for the insert or update. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
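For contrast with the default-timestamp ambiguity described above, a sketch of the explicit-override path the ticket mentions (hypothetical table; by CQL convention the timestamp is microseconds since the epoch):

```sql
-- Explicit client-supplied write time, sidestepping the ambiguity:
INSERT INTO events (id, payload) VALUES (1, 'x')
    USING TIMESTAMP 1419120000000000;

-- Inspect the write time the cluster actually recorded for a column:
SELECT WRITETIME(payload) FROM events WHERE id = 1;
```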
[jira] [Commented] (CASSANDRA-7769) Implement pg-style dollar syntax for string constants
[ https://issues.apache.org/jira/browse/CASSANDRA-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121468#comment-14121468 ] Jack Krupansky commented on CASSANDRA-7769: --- Those patterns don't seem to recognize empty string sequences or non-name sequences for the delimiter marker. The PG rules allow both. Or even single letter sequences, for that matter. Or... upper case. It would be good to list a set of test use cases, which can also be included in doc. > Implement pg-style dollar syntax for string constants > - > > Key: CASSANDRA-7769 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7769 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Robert Stupp > Fix For: 3.0 > > Attachments: 7769.txt, 7769v2.txt > > > Follow-up of CASSANDRA-7740: > {{$function$...$function$}} in addition to string style variant. > See also > http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING -- This message was sent by Atlassian JIRA (v6.3.4#6332)
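For illustration, the pg-style quoting under discussion as it appears in later Cassandra releases, plus the PG delimiter variants the comment asks about (table name hypothetical):

```sql
-- Dollar quoting avoids escaping embedded single quotes:
SELECT * FROM users WHERE name = $$O'Brien$$;

-- PostgreSQL additionally allows a tag between the dollar signs,
-- including single-letter and upper-case tags (the variants whose
-- support the comment questions):
--   $q$can't$q$
--   $X$don't$X$
```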
[jira] [Commented] (CASSANDRA-7447) New sstable format with support for columnar layout
[ https://issues.apache.org/jira/browse/CASSANDRA-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118109#comment-14118109 ] Jack Krupansky commented on CASSANDRA-7447: --- Thanks, [~benedict], so Just to paraphrase, "columnar" is referring to the CQL use of "clustering columns" - multiple/many CQL rows per partition, and "row-oriented" is referring to a primary key consisting of only partition key columns with no clustering columns, so that "row-oriented" means only a single CQL row per partition, right? One clarification, does the delta encoding require that each CQL row have only one column, so that each adjacent "cell" is for the same CQL column, or... is delta-coding effective when the CQL row has a sequence of columns that map to a repeating sequence of adjacent cells, but the cells for a particular CQL column are never immediately adjacent in the partition? > New sstable format with support for columnar layout > --- > > Key: CASSANDRA-7447 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7447 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Benedict > Labels: performance, storage > Fix For: 3.0 > > Attachments: ngcc-storage.odp > > > h2. Storage Format Proposal > C* has come a long way over the past few years, and unfortunately our storage > format hasn't kept pace with the data models we are now encouraging people to > utilise. This ticket proposes a collections of storage primitives that can be > combined to serve these data models more optimally. > It would probably help to first state the data model at the most abstract > level. We have a fixed three-tier structure: We have the partition key, the > clustering columns, and the data columns. Each have their own characteristics > and so require their own specialised treatment. 
> I should note that these changes will necessarily be delivered in stages, and > that we will be making some assumptions about what the most useful features > to support initially will be. Any features not supported will require > sticking with the old format until we extend support to all C* functionality. > h3. Partition Key > * This really has two components: the partition, and the value. Although the > partition is primarily used to distribute across nodes, it can also be used > to optimise lookups for a given key within a node > * Generally partitioning is by hash, and for the moment I want to focus this > ticket on the assumption that this is the case > * Given this, it makes sense to optimise our storage format to permit O(1) > searching of a given partition. It may be possible to achieve this with > little overhead based on the fact we store the hashes in order and know they > are approximately randomly distributed, as this effectively forms an > immutable contiguous split-ordered list (see Shalev/Shavit, or > CASSANDRA-7282), so we only need to store an amount of data based on how > imperfectly distributed the hashes are, or at worst a single value per block. > * This should completely obviate the need for a separate key-cache, which > will be relegated to supporting the old storage format only > h3. 
Primary Key / Clustering Columns > * Given we have a hierarchical data model, I propose the use of a > cache-oblivious trie > * The main advantage of the trie is that it is extremely compact and > _supports optimally efficient merges with other tries_ so that we can support > more efficient reads when multiple sstables are touched > * The trie will be preceded by a small amount of related data; the full > partition key, a timestamp epoch (for offset-encoding timestamps) and any > other partition level optimisation data, such as (potentially) a min/max > timestamp to abort merges earlier > * Initially I propose to limit the trie to byte-order comparable data types > only (the number of which we can expand through translations of the important > types that are not currently) > * Crucially the trie will also encapsulate any range tombstones, so that > these are merged early in the process and avoids re-iterating the same data > * Results in true bidirectional streaming without having to read entire range > into memory > h3. Values > There are generally two approaches to storing rows of data: columnar, or > row-oriented. The above two data structures can be combined with a value > storage scheme that is based on either. However, given the current model we > have of reading large 64Kb blocks for any read, I am inclined to focus on > columnar support first, as this delivers order-of-magnitude benefits to those > users with the correct workload, while for most workloads our 64Kb blocks are > large enough to store row-orie
[jira] [Comment Edited] (CASSANDRA-7855) Genralize use of IN for compound partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117360#comment-14117360 ] Jack Krupansky edited comment on CASSANDRA-7855 at 9/1/14 1:34 PM: --- bq. not necessary a better idea than parallelizing queries server side Server side? Or should that be *client* side? Typo: "necessary" s.b. "necessarily". And later in the description "give" s.b. "given". And earlier in the description "later" s.b. "latter". And even earlier, "compount" s.b. "compound" and "only support to have a IN" s.b. "only support an IN". was (Author: jkrupan): bq. not necessary a better idea than parallelizing queries server side Server side? Or should that be CLIENT side? Typo: "necessary" s.b. "necessarily". And later in the description "give" s.b. "given". And earlier in the description "later" s.b. "latter". And even earlier, "compount" s.b. "compound" and "only support to have a IN" s.b. "only support an IN". > Genralize use of IN for compound partition keys > --- > > Key: CASSANDRA-7855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7855 > Project: Cassandra > Issue Type: Bug >Reporter: Sylvain Lebresne >Priority: Minor > Labels: cql > Fix For: 2.0.11 > > > When you have a compount partition key, we currently only support to have a > {{IN}} on the last column of that partition key. So given: > {noformat} > CREATE TABLE foo ( > k1 int, > k2 int, > v int, > PRIMARY KEY ((k1, k2)) > ) > {noformat} > we allow > {noformat} > SELECT * FROM foo WHERE k1 = 0 AND k2 IN (1, 2) > {noformat} > but not > {noformat} > SELECT * FROM foo WHERE k1 IN (0, 1) AND k2 IN (1, 2) > {noformat} > There is no particular reason for us not supporting the later (to the best of > my knowledge) since it's reasonably straighforward, so we should fix it. 
> I'll note that using {{IN}} on a partition key is not necessarily a better > idea than parallelizing queries server client side so this syntax, when > introduced, should probably be used sparingly, but given we do support IN on > partition keys, I see no reason not to extend it to compound PK properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7447) New sstable format with support for columnar layout
[ https://issues.apache.org/jira/browse/CASSANDRA-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117375#comment-14117375 ] Jack Krupansky commented on CASSANDRA-7447: --- I'm also interested in the impact of the proposal on storage and performance on: 1. Secondary indexes - either native Cassandra, manually maintained index tables, or even DSE/Solr indexing. Would it make them faster, more compact, less needed, or no net impact? 2. Filtering - Would it enable more/faster filtering, discourage it, or no net change? > New sstable format with support for columnar layout > --- > > Key: CASSANDRA-7447 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7447 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Benedict > Labels: performance, storage > Fix For: 3.0 > > Attachments: ngcc-storage.odp > > > h2. Storage Format Proposal > C* has come a long way over the past few years, and unfortunately our storage > format hasn't kept pace with the data models we are now encouraging people to > utilise. This ticket proposes a collections of storage primitives that can be > combined to serve these data models more optimally. > It would probably help to first state the data model at the most abstract > level. We have a fixed three-tier structure: We have the partition key, the > clustering columns, and the data columns. Each have their own characteristics > and so require their own specialised treatment. > I should note that these changes will necessarily be delivered in stages, and > that we will be making some assumptions about what the most useful features > to support initially will be. Any features not supported will require > sticking with the old format until we extend support to all C* functionality. > h3. Partition Key > * This really has two components: the partition, and the value. 
Although the > partition is primarily used to distribute across nodes, it can also be used > to optimise lookups for a given key within a node > * Generally partitioning is by hash, and for the moment I want to focus this > ticket on the assumption that this is the case > * Given this, it makes sense to optimise our storage format to permit O(1) > searching of a given partition. It may be possible to achieve this with > little overhead based on the fact we store the hashes in order and know they > are approximately randomly distributed, as this effectively forms an > immutable contiguous split-ordered list (see Shalev/Shavit, or > CASSANDRA-7282), so we only need to store an amount of data based on how > imperfectly distributed the hashes are, or at worst a single value per block. > * This should completely obviate the need for a separate key-cache, which > will be relegated to supporting the old storage format only > h3. Primary Key / Clustering Columns > * Given we have a hierarchical data model, I propose the use of a > cache-oblivious trie > * The main advantage of the trie is that it is extremely compact and > _supports optimally efficient merges with other tries_ so that we can support > more efficient reads when multiple sstables are touched > * The trie will be preceded by a small amount of related data; the full > partition key, a timestamp epoch (for offset-encoding timestamps) and any > other partition level optimisation data, such as (potentially) a min/max > timestamp to abort merges earlier > * Initially I propose to limit the trie to byte-order comparable data types > only (the number of which we can expand through translations of the important > types that are not currently) > * Crucially the trie will also encapsulate any range tombstones, so that > these are merged early in the process and avoids re-iterating the same data > * Results in true bidirectional streaming without having to read entire range > into memory > h3. 
Values > There are generally two approaches to storing rows of data: columnar, or > row-oriented. The above two data structures can be combined with a value > storage scheme that is based on either. However, given the current model we > have of reading large 64Kb blocks for any read, I am inclined to focus on > columnar support first, as this delivers order-of-magnitude benefits to those > users with the correct workload, while for most workloads our 64Kb blocks are > large enough to store row-oriented data in a column-oriented fashion without > any performance degradation (I'm happy to consign very large row support to > phase 2). > Since we will most likely target both behaviours eventually, I am currently > inclined to suggest that static columns, sets and maps be targeted for a > row-oriented release, as they don't naturally fit in a columna
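The columnar-vs-row trade-off described in the proposal can be made concrete with a few lines. This is a toy model, not Cassandra code: it only shows how the same rows serialize under each layout, and why a columnar layout keeps each column's values contiguous (better compression, and readers can skip columns they do not need).

```python
# Toy illustration of row-oriented vs column-oriented value storage.
rows = [
    {"id": 1, "name": "jane", "score": 10},
    {"id": 2, "name": "john", "score": 20},
    {"id": 3, "name": "mary", "score": 30},
]

def row_oriented(rows):
    # Values interleaved row by row, as in a classic row store.
    return [v for r in rows for v in (r["id"], r["name"], r["score"])]

def column_oriented(rows):
    # One contiguous list per column; same-typed values sit together.
    return {col: [r[col] for r in rows] for col in ("id", "name", "score")}

assert row_oriented(rows) == [1, "jane", 10, 2, "john", 20, 3, "mary", 30]
assert column_oriented(rows)["score"] == [10, 20, 30]
```

A scan over only `score` touches one contiguous list in the columnar layout, while the row layout forces the reader past every `id` and `name` as well.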
[jira] [Commented] (CASSANDRA-7447) New sstable format with support for columnar layout
[ https://issues.apache.org/jira/browse/CASSANDRA-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117373#comment-14117373 ] Jack Krupansky commented on CASSANDRA-7447: --- Can you guys provide two very simple table definitions and corresponding CQL queries that exemplify row vs. columnar storage and processing optimality? IOW, the two key test cases that would confirm the extent to which the goals of this issue are met, although that might include a narrative about how much updating and deletions are impacting storage and performance as well.
[jira] [Commented] (CASSANDRA-7855) Genralize use of IN for compound partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117363#comment-14117363 ] Jack Krupansky commented on CASSANDRA-7855: --- And terminology - this improvement is for "composite partition keys" - "compound" refers to "primary keys" with clustering columns rather than just the partition key portion of the primary key. > Genralize use of IN for compound partition keys > --- > > Key: CASSANDRA-7855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7855 > Project: Cassandra > Issue Type: Bug >Reporter: Sylvain Lebresne >Priority: Minor > Labels: cql > Fix For: 2.0.11 > > > When you have a compount partition key, we currently only support to have a > {{IN}} on the last column of that partition key. So given: > {noformat} > CREATE TABLE foo ( > k1 int, > k2 int, > v int, > PRIMARY KEY ((k1, k2)) > ) > {noformat} > we allow > {noformat} > SELECT * FROM foo WHERE k1 = 0 AND k2 IN (1, 2) > {noformat} > but not > {noformat} > SELECT * FROM foo WHERE k1 IN (0, 1) AND k2 IN (1, 2) > {noformat} > There is no particular reason for us not supporting the later (to the best of > my knowledge) since it's reasonably straighforward, so we should fix it. > I'll note that using {{IN}} on a partition key is not necessary a better idea > than parallelizing queries server side so this syntax, when introduced, > should probably be used sparingly, but give we do support IN on partition > keys, I see no reason not to extend it to compound PK properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7855) Genralize use of IN for compound partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117360#comment-14117360 ] Jack Krupansky commented on CASSANDRA-7855: --- bq. not necessary a better idea than parallelizing queries server side Server side? Or should that be CLIENT side? Typo: "necessary" s.b. "necessarily". And later in the description "give" s.b. "given". And earlier in the description "later" s.b. "latter". And even earlier, "compount" s.b. "compound" and "only support to have a IN" s.b. "only support an IN". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
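A note on the semantics being requested: generalizing {{IN}} to every column of a composite partition key means the query targets the cross product of the IN values, one concrete partition per combination. A toy sketch of that expansion (illustrative only, not driver or server code):

```python
# Expanding IN restrictions on a composite partition key (k1, k2) into the
# concrete partitions they address. Each tuple is one partition lookup.
from itertools import product

def expand_in_restrictions(*in_values):
    """Return every concrete partition key targeted by the IN clauses."""
    return list(product(*in_values))

# SELECT * FROM foo WHERE k1 IN (0, 1) AND k2 IN (1, 2)
keys = expand_in_restrictions((0, 1), (1, 2))
assert keys == [(0, 1), (0, 2), (1, 1), (1, 2)]
```

The example from the description thus expands to four partitions, which is also why the ticket cautions that parallelizing individual queries client side may be preferable to a large IN.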
[jira] [Commented] (CASSANDRA-7813) Decide how to deal with conflict between native and user-defined functions
[ https://issues.apache.org/jira/browse/CASSANDRA-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108411#comment-14108411 ] Jack Krupansky commented on CASSANDRA-7813: --- +0 for requiring an empty namespace. It assures that a conflict with native functions is avoided, but... adds to the burden of people migrating from RDBMS SQL and makes CQL doc and training materials look overly complex. A top goal should be to make CQL look a lot easier and more approachable, not more complicated. CQL doesn't have to 100% mimic SQL, but at least learn from it and avoid reinventing the wheel unless there is a clear and compelling benefit to a wide range of developers. If CQL is going to detour from SQL for some reason, at least clearly refer to the specific SQL rule and the specific rationale for doing so. Being "better" than SQL is a positive, but merely being "different" is a distinct negative, IMHO. I would be +1 for ALLOWING an empty namespace to override native functions as long as it is not absolutely required. Compromise: produce a semi-noisy WARNING whenever an unqualified UDF is used, and then let the developer set an option to disable that warning. That way, newbie app developers will at least start out with the caution to be careful with unqualified UDF references, but still be able to write clean and simple CQL. Another possibility is to suggest a naming convention, such as an application prefix such as "u_" or "u." which would be unlikely in any future CQL native functions. The fact that CQL does not have a stable collection of native functions is a distinct negative for the project - makes it seem rather immature compared to RDBMS SQL. Maybe... come up with a roadmap for obvious future enhancements and then reserve those names, or at least give a noisy warning that the names can be overridden, but at the risk of future upgrade incompatibility. 
> Decide how to deal with conflict between native and user-defined functions > -- > > Key: CASSANDRA-7813 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7813 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne > Labels: cql > Fix For: 3.0 > > > We have a bunch of native/hardcoded functions (now(), dateOf(), ...) and in > 3.0, users will be able to define new functions. Now, there is a very high > chance that we will provide more native functions over time (to be clear, I'm > not particularly for adding native functions for allthethings just because we > can, but it's clear that we should ultimately provide more than what we > have). Which begs the question: how do we want to deal with the problem of > adding a native function potentially breaking a previously defined > user-defined function? > A priori I see the following options (maybe there are more?): > # don't do anything specific, hoping that it won't happen often and consider > it a user problem if it does. > # reserve a big number of names that we're hoping will cover all future need. > # make native functions and user-defined functions syntactically distinct so it > cannot happen. > I'm not a huge fan of solution 1). Solution 2) is actually what we did for > UDT but I think it's somewhat less practical here: there are only so many types > that it makes sense to provide natively and so it wasn't too hard to come up > with a reasonably small list of type names to reserve just in case. This > feels a lot harder for functions to me. > Which leaves solution 3). Since we already have the concept of namespaces for > functions, a simple idea would be to force user functions to have a namespace. > We could even allow that namespace to be empty as long as we force the > namespace separator (so we'd allow {{bar::foo}} and {{::foo}} for user > functions, but *not* {{foo}} which would be reserved for native functions). -- This message was sent by Atlassian JIRA (v6.2#6252)
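Option 3 from the description can be sketched as a small name resolver. Everything here (the tables, the {{resolve}} function) is a hypothetical illustration, not Cassandra's implementation: bare names resolve only to native functions, while user functions always carry the {{::}} separator, so no future native function can shadow them.

```python
# Toy resolver for the "force a namespace separator on UDFs" proposal.
NATIVE_FUNCTIONS = {"now", "dateof"}            # bare names, reserved
USER_FUNCTIONS = {("", "foo"), ("bar", "foo")}  # (namespace, name) pairs

def resolve(name):
    if "::" in name:
        # Namespaced (possibly with an empty namespace): always a UDF.
        ns, _, fn = name.partition("::")
        if (ns, fn) in USER_FUNCTIONS:
            return ("user", ns, fn)
        raise LookupError(f"unknown user function {name!r}")
    # Bare name: only native functions live here.
    if name in NATIVE_FUNCTIONS:
        return ("native", None, name)
    raise LookupError(f"unknown native function {name!r}")

assert resolve("now") == ("native", None, "now")
assert resolve("::foo") == ("user", "", "foo")
assert resolve("bar::foo") == ("user", "bar", "foo")
```

Because the two lookup spaces never overlap, adding a new native function can never break an existing `ns::fn` or `::fn` definition.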
[jira] [Commented] (CASSANDRA-7740) Parsing of UDF body is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092746#comment-14092746 ] Jack Krupansky commented on CASSANDRA-7740: --- PostgreSQL-style dollar-quoted string constants would be nice. See: http://www.postgresql.org/docs/8.2/static/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING But as [~slebresne] suggested, that should be a global feature of string constants, not limited to this feature. > Parsing of UDF body is broken > - > > Key: CASSANDRA-7740 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7740 > Project: Cassandra > Issue Type: Bug >Reporter: Sylvain Lebresne >Assignee: Robert Stupp > > The parsing of the function body introduced by CASSANDRA-7395 is somewhat broken. > It blindly parses everything up to {{END_BODY}}, which has two problems: > # it parses the function body as if it were part of the CQL syntax, so anything > that doesn't happen to be a valid CQL token won't even parse. > # something like > {noformat} > CREATE FUNCTION foo() RETURNS text LANGUAGE JAVA BODY return "END_BODY"; > END_BODY; > {noformat} > will not parse correctly. > I don't think we can accept random syntax like that. A better solution (which > is the one PostgreSQL uses) is to pass the function body as a normal string. > And in fact I'd be in favor of reusing PostgreSQL syntax (because why not), > that is to have: > {noformat} > CREATE FUNCTION foo() RETURNS text LANGUAGE JAVA AS 'return "END_BODY"'; > {noformat} > One minor annoyance might be, for certain languages, the necessity to double > every quote inside the string. But in a separate ticket we could introduce > the PostgreSQL solution of adding an [alternate syntax for string > constants|http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING]. -- This message was sent by Atlassian JIRA (v6.2#6252)
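The dollar-quoting idea referenced above can be illustrated with a simplified extractor. The regex here is a toy (real PostgreSQL lexing is more involved), but it shows why a body containing quotes, or even the literal text END_BODY, needs no escaping between matching $tag$ markers:

```python
# Toy extractor for PostgreSQL-style dollar-quoted string constants:
# everything between a $tag$ marker and its matching $tag$ is taken verbatim.
import re

def parse_dollar_quoted(text):
    """Return the body of the first dollar-quoted string in `text`, or None."""
    m = re.search(r"\$([A-Za-z_]*)\$(.*?)\$\1\$", text, re.DOTALL)
    return m.group(2) if m else None

stmt = 'CREATE FUNCTION foo() RETURNS text LANGUAGE java AS $body$return "END_BODY";$body$;'
assert parse_dollar_quoted(stmt) == 'return "END_BODY";'
assert parse_dollar_quoted("no dollar quotes here") is None
```

The backreference `\1` is what makes the closing marker have to match the opening tag, so a function body can even contain other dollar-quoted strings as long as their tags differ.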
[jira] [Commented] (CASSANDRA-6377) ALLOW FILTERING should allow seq scan filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087826#comment-14087826 ] Jack Krupansky commented on CASSANDRA-6377: --- [~ztyx], here's the relevant restriction from the DataStax CQL doc: bq. The WHERE clause is composed of conditions on the columns that are part of the primary key or are indexed. See: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html And the same restriction from the project spec for CQL3: bq. The <where-clause> specifies which rows must be queried. It is composed of relations on the columns that are part of the PRIMARY KEY and/or have a secondary index defined on them. See: https://cassandra.apache.org/doc/cql3/CQL.html Was part of either of those two statements perhaps worded too vaguely, or is the issue how you could have found those statements more easily? Improving doc usability is a priority. > ALLOW FILTERING should allow seq scan filtering > --- > > Key: CASSANDRA-6377 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6377 > Project: Cassandra > Issue Type: Bug > Components: API >Reporter: Jonathan Ellis >Assignee: Sylvain Lebresne > Labels: cql > Fix For: 3.0 > > > CREATE TABLE emp_table2 ( > empID int PRIMARY KEY, > firstname text, > lastname text, > b_mon text, > b_day text, > b_yr text > ); > INSERT INTO emp_table2 (empID,firstname,lastname,b_mon,b_day,b_yr) >VALUES (100,'jane','doe','oct','31','1980'); > INSERT INTO emp_table2 (empID,firstname,lastname,b_mon,b_day,b_yr) >VALUES (101,'john','smith','jan','01','1981'); > INSERT INTO emp_table2 (empID,firstname,lastname,b_mon,b_day,b_yr) >VALUES (102,'mary','jones','apr','15','1982'); > INSERT INTO emp_table2 (empID,firstname,lastname,b_mon,b_day,b_yr) >VALUES (103,'tim','best','oct','25','1982'); > > SELECT b_mon,b_day,b_yr,firstname,lastname FROM emp_table2 > WHERE b_mon='oct' ALLOW FILTERING; > Bad Request: No indexed columns present in by-columns clause 
with Equal > operator -- This message was sent by Atlassian JIRA (v6.2#6252)
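What the ticket's ALLOW FILTERING request opts into is a full sequential scan, since b_mon is neither part of the primary key nor indexed. A toy model of the example table (plain Python, not Cassandra's read path) makes the cost explicit:

```python
# Toy model of the ticket's emp_table2: filtering on an unindexed column
# means examining every row, which is why it must be opted into explicitly.
rows = [
    (100, "jane", "doe", "oct"),
    (101, "john", "smith", "jan"),
    (102, "mary", "jones", "apr"),
    (103, "tim", "best", "oct"),
]

def seq_scan_filter(rows, month):
    # O(n): every row is touched, unlike a primary-key or index lookup.
    return [r for r in rows if r[3] == month]

assert [r[0] for r in seq_scan_filter(rows, "oct")] == [100, 103]
```

An index on b_mon would instead jump straight to the matching rows; ALLOW FILTERING is the user asserting that the linear cost is acceptable.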
[jira] [Commented] (CASSANDRA-7683) Always allow CREATE TABLE IF NOT EXISTS if it exists
[ https://issues.apache.org/jira/browse/CASSANDRA-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085517#comment-14085517 ] Jack Krupansky commented on CASSANDRA-7683: --- Could you at least update the CQL3 spec to indicate the actual semantics, as I have suggested? > Always allow CREATE TABLE IF NOT EXISTS if it exists > > > Key: CASSANDRA-7683 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7683 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Jens Rantil >Priority: Minor > > Background: I have a table that I'd like to make sure exists when I boot up > my application. To make the life easier for our developers I execute an > `ALTER TABLE IF EXISTS`. > In production I am using user based authorization and for security reasons > regular production users are not allowed to CREATE TABLEs. > Problem: When a user without CREATE permission executes `ALTER TABLE IF > EXISTS` for a table that already exists, the command fails telling me the > user is not allowed to execute `CREATE TABLE`. It feels kinda ridiculous that > this fails when I'm not actually creating the table. > Proposal: That the permission check only should be done if the table is only > actually to be created. > Workaround: Right now, I have a boolean that checks if in production and in > that case don't try to create the table. Another approach would be to > manually check if the table exists. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7683) Always allow CREATE TABLE IF NOT EXISTS if it exists
[ https://issues.apache.org/jira/browse/CASSANDRA-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084605#comment-14084605 ] Jack Krupansky commented on CASSANDRA-7683: --- bq. I execute an `ALTER TABLE IF EXISTS`. Ummm... there's no such command, at least in the CQL3 spec! I suspect that you simply meant "CREATE TABLE IF NOT EXISTS". Assuming that, I think the CQL3 spec suggests that you should indeed be able to do what you suggest - or the spec needs to be revised to specifically disallow it: {code} Attempting to create an already existing table will return an error unless the IF NOT EXISTS option is used. If it is used, the statement will be a no-op if the table already exists. {code} So, unless somebody wants to propose changing that second sentence to "If it is used, the statement will be a no-op if the table already exists, unless the user does not have CREATE permission, in which case the request will return an error", the Wish should be considered reasonable. Personally, this one seems to be in a very gray area - fielder's choice, flip a coin. Maybe the proper argument to make here is that the user wishes to have a single script that can be used by a range of users and for completeness includes the CREATE TABLE so it can be used for initial as well as incremental operations. In that context it would make sense, but... I may be reading too much into the users' intentions! -- This message was sent by Atlassian JIRA (v6.2#6252)
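The ticket's proposal amounts to reordering two checks: test for table existence before testing CREATE permission, so the statement is a true no-op when the table is already there. A hypothetical sketch (all names illustrative, not Cassandra internals):

```python
# Toy model of CREATE TABLE IF NOT EXISTS with the proposed check ordering:
# existence first, permission second.
class AuthorizationError(Exception):
    pass

def create_table_if_not_exists(schema, table, user_can_create):
    if table in schema:
        return "no-op"          # table exists: no CREATE permission needed
    if not user_can_create:
        raise AuthorizationError("CREATE permission required")
    schema.add(table)
    return "created"

schema = {"messages"}
# The scenario from the report: an unprivileged user, table already present.
assert create_table_if_not_exists(schema, "messages", user_can_create=False) == "no-op"
assert create_table_if_not_exists(schema, "events", user_can_create=True) == "created"
```

Swapping the two `if` blocks reproduces the current behavior the reporter complains about: the unprivileged user gets an error even though nothing would be created.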
[jira] [Commented] (CASSANDRA-7395) Support for pure user-defined functions (UDF)
[ https://issues.apache.org/jira/browse/CASSANDRA-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084005#comment-14084005 ] Jack Krupansky commented on CASSANDRA-7395: --- In RDBMS terminology, these are strictly "single-row functions", as opposed to "aggregate functions", correct? Not that aggregate UDFs wouldn't be useful, but they are more complex. See: http://docs.oracle.com/database/121/SQLRF/functions002.htm > Support for pure user-defined functions (UDF) > - > > Key: CASSANDRA-7395 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7395 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Robert Stupp > Labels: cql > Fix For: 3.0 > > Attachments: 7395-dtest.txt, 7395.txt, udf-create-syntax.png, > udf-drop-syntax.png > > > We have some tickets for various aspects of UDF (CASSANDRA-4914, > CASSANDRA-5970, CASSANDRA-4998) but they all suffer from various degrees of > ocean-boiling. > Let's start with something simple: allowing pure user-defined functions in > the SELECT clause of a CQL query. That's it. > By "pure" I mean, must depend only on the input parameters. No side effects. > No exposure to C* internals. Column values in, result out. > http://en.wikipedia.org/wiki/Pure_function -- This message was sent by Atlassian JIRA (v6.2#6252)
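The single-row vs. aggregate distinction raised in the comment can be made concrete: a pure single-row function maps each input row to one output independently, while an aggregate folds a whole column into one value. A plain-Python illustration, not the proposed CQL syntax:

```python
# Pure single-row function vs aggregate function, in the ticket's sense.
import math

def sin_of(value):
    # Single-row ("pure"): output depends only on this row's input,
    # no side effects, no access to engine internals.
    return math.sin(value)

def avg(values):
    # Aggregate: consumes every row of the column to produce one value.
    return sum(values) / len(values)

column = [0.0, 1.0, 2.0]
per_row = [sin_of(v) for v in column]   # one output per input row
assert len(per_row) == len(column)
assert avg(column) == 1.0
```

The ticket scopes itself to the first kind only: column values in, result out, per row.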
[jira] [Commented] (CASSANDRA-7654) CQL INSERT improvement
[ https://issues.apache.org/jira/browse/CASSANDRA-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081542#comment-14081542 ] Jack Krupansky commented on CASSANDRA-7654: --- As far as the cqlsh angle, copying from a CSV file might be a lot more convenient anyway, if inserting more than just a very few rows. > CQL INSERT improvement > -- > > Key: CASSANDRA-7654 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7654 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Priority: Minor > > It would be nice to be able to add multiple rows using a single {{INSERT}}. > Restricted to the same partition. > For example: > Current behaviour: > {noformat} > INSERT INTO comp_key (key, num_val) > VALUES ('foo', 1, 41); > INSERT INTO comp_key (key, num_val) > VALUES ('foo', 2, 42); > {noformat} > Wanted behaviour: > {noformat} > INSERT INTO comp_key (key, num_val) > VALUES > ('foo', 1, 41), > ('foo', 2, 42), > ('foo', 3, 42), > ('foo', 4, 42); > {noformat} > Assumed table def: > {noformat} > CREATE TABLE comp_key ( > key TEXT, > clust INT, > num_val DECIMAL, > PRIMARY KEY ( key, clust ) > ); > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7654) CQL INSERT improvement
[ https://issues.apache.org/jira/browse/CASSANDRA-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081511#comment-14081511 ] Jack Krupansky commented on CASSANDRA-7654: --- This appears to be the duplicate - CASSANDRA-5959 - "CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)", which was in turn resolved as a duplicate of CASSANDRA-4693 - "CQL Protocol should allow multiple PreparedStatements to be atomically executed", which is for the feature that [~tjake] referenced. It used a slightly different syntax, factoring out the partition key: {code} insert into results (row_id, (index,value)) values ((0,text0), (1,text1), (2,text2), ..., (N,textN)); {code} Which highlights the fact that the example in this issue did not even have the partition key specified in either the primary key or the insert column list. For convenient (future) reference, the batch prepared statement replaces: {code} PreparedStatement ps = session.prepare( "BEGIN BATCH" + " INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?);" + " INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?);" + " INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?);" + "APPLY BATCH" ); session.execute(ps.bind(uid, mid1, title1, body1, uid, mid2, title2, body2, uid, mid3, title3, body3)); {code} with {code} PreparedStatement ps = session.prepare("INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?)"); BatchStatement batch = new BatchStatement(); batch.add(ps.bind(uid, mid1, title1, body1)); batch.add(ps.bind(uid, mid2, title2, body2)); batch.add(ps.bind(uid, mid3, title3, body3)); session.execute(batch); {code} Granted, that doesn't help with cqlsh. It also doesn't help with DevCenter either, right? 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7654) CQL INSERT improvement
[ https://issues.apache.org/jira/browse/CASSANDRA-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080766#comment-14080766 ] Jack Krupansky commented on CASSANDRA-7654: --- 1. For doc purposes, what is the rationale for restricting the insert to a single partition? 2. Will subsequent inserts occur if any of the inserts fail due to consistency or for any other reason? 3. Can the app assume that the inserts will be attempted in parallel? 4. Will the driver route the insert to a node that owns that partition key? 4a. Should all of the inserts really be routed to the same node, or distributed according to RF? (Driver question.) 5. Is it also proposed to enhance the driver to support such a "batch" insertion of rows? -- This message was sent by Atlassian JIRA (v6.2#6252)
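Question 1 above (the single-partition restriction) implies a validation step: every VALUES tuple must share the same partition key, which is also what lets the whole batch be routed to one replica set. A hypothetical sketch of that check (illustrative names, not the proposed implementation):

```python
# Toy validation for a multi-row INSERT restricted to one partition.
def multi_row_insert(partition_key_index, value_tuples):
    """Group the VALUES tuples into one single-partition batch, or fail."""
    keys = {t[partition_key_index] for t in value_tuples}
    if len(keys) != 1:
        raise ValueError("all rows must target the same partition")
    return {"partition": keys.pop(), "rows": value_tuples}

# The ticket's wanted behaviour: four rows, all under partition key 'foo'.
batch = multi_row_insert(0, [("foo", 1, 41), ("foo", 2, 42),
                             ("foo", 3, 42), ("foo", 4, 42)])
assert batch["partition"] == "foo"
assert len(batch["rows"]) == 4
```

Mixing partition keys (say `('foo', ...)` and `('bar', ...)`) would fail the check, which is the point: a cross-partition batch is a different, costlier operation.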
[jira] [Commented] (CASSANDRA-7642) Adaptive Consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079392#comment-14079392 ] Jack Krupansky commented on CASSANDRA-7642: --- Maybe this ends up being a doc issue - detailing best practice for achieving adaptive consistency. And just better doc for consistency in general. I mean, as more and more lower-skilled RDBMS ACID-heads get sucked into NoSQL Cassandra, understanding and managing consistency is only going to become a bigger and bigger issue. > Adaptive Consistency > > > Key: CASSANDRA-7642 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7642 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Rustam Aliyev > Fix For: 3.0 > > > h4. Problem > At minimum, the application requires a consistency level of X, which must be a fault > tolerant CL. However, when there is no failure it would be advantageous to > use the stronger consistency Y (Y>X). > h4. Suggestion > The application defines minimum (X) and maximum (Y) consistency levels. C* can > apply adaptive consistency logic to use Y whenever possible and downgrade to > X when a failure occurs. > The implementation should not negatively impact performance. Therefore, state has > to be maintained globally (not per request). > h4. Example > {{MIN_CL=LOCAL_QUORUM}} > {{MAX_CL=EACH_QUORUM}} > h4. Use Case > Consider a case where the user wants to maximize their uptime and consistency. > They are designing a system using C* where transactions are read/written with > LOCAL_QUORUM and distributed across 2 DCs. Occasional inconsistencies between > DCs can be tolerated. R/W with LOCAL_QUORUM is satisfactory in most of the > cases. > The application requires new transactions to be read back right after they were > generated. Write and read could be done through different DCs (no > stickiness). In some cases when the user writes into DC1 and reads immediately > from DC2, replication delay may cause problems. 
> The transaction won't show up on a read in DC2; the user will retry and create a duplicate transaction. Occasional duplicates are fine and the goal is to minimize the number of dups.
> Therefore, we want to perform writes with stronger consistency (EACH_QUORUM) whenever possible without compromising on availability. Using adaptive consistency they should be able to define:
> {{Read CL = LOCAL_QUORUM}}
> {{Write CL = ADAPTIVE (MIN:LOCAL_QUORUM, MAX:EACH_QUORUM)}}
> A similar scenario can be described for the {{Write CL = ADAPTIVE (MIN:QUORUM, MAX:ALL)}} case.
> h4. Criticism
> # This functionality can/should be implemented by the user himself.
> bq. It will be hard for an average user to implement topology monitoring and a state machine. Moreover, this is a pattern which repeats.
> # Transparent downgrading violates the CL contract, and that contract is considered to be just about the most important element of Cassandra's runtime behavior.
> bq. Fully transparent downgrading without any contract is dangerous. However, would it be a problem if we specify explicitly only two discrete CL levels - MIN_CL and MAX_CL?
> # If you have split-brain DCs (partitioned in CAP), you have to sacrifice either consistency or availability, and auto-downgrading sacrifices the consistency in dangerous ways if the application isn't designed to handle it. And if the application is designed to handle it, then it should be able to handle it in normal circumstances, not just degraded/extraordinary ones.
> bq. Agreed. The application should be designed for MIN_CL. In that case, MAX_CL will not be causing much harm, only adding flexibility.
> # It might be a better idea to loudly downgrade, instead of silently downgrading, meaning that the client code does an explicit retry with lower consistency on failure and takes some other kind of action to attempt to inform either users or operators of the problem. It is the silent part of the downgrading which could be dangerous.
> bq.
> There are certainly cases where the user should be informed when consistency changes in order to perform a custom action. For this purpose we could allow/require the user to register a callback function which will be triggered when the consistency level changes. Best practices could be enforced by requiring the callback.

--
This message was sent by Atlassian JIRA (v6.2#6252)
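The "loudly downgrade plus callback" idea from the criticism above can be sketched client-side. This is a hypothetical illustration, not a real driver API: `AdaptiveWriter`, `UnavailableError`, and `on_downgrade` are invented names, and `execute` stands in for whatever statement-execution function the application already has.

```python
class UnavailableError(Exception):
    """Stand-in for a driver's 'not enough live replicas' error."""


class AdaptiveWriter:
    """Try MAX_CL first; on failure, notify the app and retry at MIN_CL."""

    def __init__(self, execute, max_cl, min_cl, on_downgrade=None):
        self.execute = execute            # callable(statement, cl) -> result
        self.max_cl = max_cl              # e.g. "EACH_QUORUM"
        self.min_cl = min_cl              # e.g. "LOCAL_QUORUM"
        self.on_downgrade = on_downgrade  # callback, as the comment proposes

    def write(self, statement):
        try:
            return self.execute(statement, self.max_cl)
        except UnavailableError:
            # Loud downgrade: tell the application before retrying at MIN_CL,
            # so the CL change is never silent.
            if self.on_downgrade is not None:
                self.on_downgrade(self.max_cl, self.min_cl)
            return self.execute(statement, self.min_cl)
```

Because the downgrade happens in client code with an explicit callback, the CL contract per request stays visible to the application, which is the main objection to doing this transparently inside the server.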
[jira] [Comment Edited] (CASSANDRA-7642) Adaptive Consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079361#comment-14079361 ]

Jack Krupansky edited comment on CASSANDRA-7642 at 7/30/14 3:16 PM:
---

Is there any actual functional difference deep in Cassandra for a higher CL other than merely waiting for confirmation and returning a status code if a sufficient number of confirmations is not received? I imagine not (other than some transaction stuff). But I can sympathize with the difficulty of implementing a "consistency validation" check at the app level. IOW, if Cassandra is going to get to ALL consistency anyway, if at all humanly (so to speak!) possible, what advantage is there here other than how the waiting is performed?

And I have heard of users who want their writes to happen as quickly as possible, but also want some way to "check" whether or when a specified level of consistency is achieved, other than pinging with reads and checking values. Maybe the ultimate goal here should be asynchronous writes - send off a write with a relatively low CL, like ONE or even ANY or some LOCAL CL, get a response back that the operation is "initiated", and then have a "Check Operation CL Status" API call that would indicate whether or what level of CL has been achieved for a designated write operation.
[jira] [Commented] (CASSANDRA-7642) Adaptive Consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079361#comment-14079361 ]

Jack Krupansky commented on CASSANDRA-7642:
---

Is there any actual functional difference deep in Cassandra for a higher CL other than merely waiting for confirmation and returning a status code if a sufficient number of confirmations is not received? I imagine not (other than some transaction stuff). But I can sympathize with the difficulty of implementing a "consistency validation" check at the app level. IOW, if Cassandra is going to get to ALL consistency anyway, if at all humanly (so to speak!) possible, what advantage is there here other than how the waiting is performed?

And I have heard of users who want their writes to happen as quickly as possible, but also want some way to "check" whether or when a specified level of consistency is achieved, other than pinging with reads and checking values. Maybe the ultimate goal here should be asynchronous writes - send off a write with a relatively low CL, like ONE or even ANY or some LOCAL CL, get a response back that the operation is "initiated", and then have a "Check Operation CL Status" API call that would indicate whether or what level of CL has been achieved for that operation.
[jira] [Commented] (CASSANDRA-7637) Add CQL3 keyword for efficient lexical range queries (e.g. START_WITH)
[ https://issues.apache.org/jira/browse/CASSANDRA-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079225#comment-14079225 ]

Jack Krupansky commented on CASSANDRA-7637:
---

bq. attribute < 'interests.food.z';

That's a great argument for this improvement - the simple, "obvious" explicit range... is incorrect. It won't match {{'interests.food.zebra'}} or even {{'interests.food.z'}}. It would need to be something like:

{code}
SELECT * FROM profile WHERE profile_id = 123 AND
    attribute > 'interests.food.' AND
    attribute < 'interests.food.{';
{code}

> Add CQL3 keyword for efficient lexical range queries (e.g. START_WITH)
> ----------------------------------------------------------------------
>
>          Key: CASSANDRA-7637
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-7637
>      Project: Cassandra
>   Issue Type: New Feature
>   Components: API
>     Reporter: Rustam Aliyev
>      Fix For: 3.0
>
> Currently, if I want to perform a range query on a lexical type I need to do something like this:
> {code}
> SELECT * FROM profile WHERE profile_id = 123 AND
>     attribute > 'interests.food.' AND
>     attribute < 'interests.food.z';
> {code}
> This is a very efficient range query. Yet, many users who are not familiar with Thrift and the storage-level implementation are unaware of this "trick". Therefore, it would be convenient to introduce a CQL keyword which will do this more simply:
> {code}
> SELECT * FROM profile WHERE profile_id = 123 AND
>     attribute START_WITH('interests.food.');
> {code}
> The keyword would have the same restrictions as other inequality search operators plus some type restrictions.
> Allowed types would be:
> * {{ascii}}
> * {{text}} / {{varchar}}
> * {{map}} (same for ascii) (?)
> * {{set}} (same for ascii) (?)
> (?) may require more work, therefore optional
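For what it's worth, the exclusive upper bound used above ('{' being the ASCII character just after 'z') can be computed generally by incrementing the last character of the prefix, instead of hand-picking a sentinel character. A minimal sketch in plain Python; the function name is invented for illustration:

```python
def prefix_upper_bound(prefix: str) -> str:
    """Smallest string that sorts after every string starting with `prefix`.

    Assumes the last character is not the maximum code point; a fuller
    version would drop trailing max characters and bump the next one.
    """
    if not prefix:
        raise ValueError("empty prefix has no finite upper bound")
    # Bump the final character: everything with this prefix sorts below it.
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

# The range then becomes:
#   attribute >= prefix AND attribute < prefix_upper_bound(prefix)
```

For {{'interests.food.'}} this yields {{'interests.food/'}}, which correctly bounds both {{'interests.food.zebra'}} and {{'interests.food.z'}}; note that a {{>=}} lower bound also matches the bare prefix itself, which the strict {{>}} in the query above would exclude.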
[jira] [Comment Edited] (CASSANDRA-7637) Add CQL3 keyword for efficient lexical range queries (e.g. START_WITH)
[ https://issues.apache.org/jira/browse/CASSANDRA-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078555#comment-14078555 ]

Jack Krupansky edited comment on CASSANDRA-7637 at 7/29/14 11:42 PM:
---

Why not use the SQL "LIKE" keyword operator and just support a trailing wildcard (AKA a "prefix" query) for now?

{code}
SELECT * FROM profile WHERE profile_id = 123 AND
    attribute LIKE 'interests.food.*';
{code}

See: http://www.w3schools.com/sql/sql_wildcards.asp or http://docs.oracle.com/cd/B12037_01/server.101/b10759/conditions016.htm

Note: I use \* in my example, although SQL uses \% and \_ rather than the traditional \* and ? that "real programmers" use for "glob" characters. Take your pick, or maybe have a config option for that. Do we want to be strict SQL? I don't know - let the community decide!
[jira] [Commented] (CASSANDRA-7637) Add CQL3 keyword for efficient lexical range queries (e.g. START_WITH)
[ https://issues.apache.org/jira/browse/CASSANDRA-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078555#comment-14078555 ]

Jack Krupansky commented on CASSANDRA-7637:
---

Why not use the SQL "LIKE" keyword operator and just support a trailing wildcard (AKA a "prefix" query) for now?

{code}
SELECT * FROM profile WHERE profile_id = 123 AND
    attribute LIKE 'interests.food.*';
{code}

See: http://www.w3schools.com/sql/sql_wildcards.asp or http://docs.oracle.com/cd/B12037_01/server.101/b10759/conditions016.htm

> Add CQL3 keyword for efficient lexical range queries (e.g. START_WITH)
> ----------------------------------------------------------------------
>
>          Key: CASSANDRA-7637
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-7637
>      Project: Cassandra
>   Issue Type: New Feature
>   Components: API
>     Reporter: Rustam Aliyev
>
> Currently, if I want to perform a range query on a lexical type I need to do something like this:
> {code}
> SELECT * FROM profile WHERE profile_id = 123 AND
>     attribute > 'interests.food.' AND
>     attribute < 'interests.food.z';
> {code}
> This is a very efficient range query. Yet, many users who are not familiar with Thrift and the storage-level implementation are unaware of this "trick". Therefore, it would be convenient to introduce a CQL keyword which will do this more simply:
> {code}
> SELECT * FROM profile WHERE profile_id = 123 AND
>     attribute START_WITH('interests.food.');
> {code}
> The keyword would have the same restrictions as other inequality search operators plus some type restrictions.
> Allowed types would be:
> * {{ascii}}
> * {{text}} / {{varchar}}
> * {{inet}} (?)
> * {{map}} (same for ascii) (?)
> * {{set}} (same for ascii) (?)
> (?) may require more work, therefore optional
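A trailing-wildcard LIKE could be rewritten mechanically into exactly the range pair shown in the issue description. A rough sketch of that rewrite (a hypothetical helper, not Cassandra code; it accepts both '%' and '*' per the wildcard discussion, and handles only the trailing-wildcard form):

```python
def like_prefix_to_range(column: str, pattern: str) -> str:
    """Rewrite a trailing-wildcard LIKE pattern into two range predicates."""
    if not (pattern.endswith('%') or pattern.endswith('*')):
        raise ValueError("only trailing-wildcard patterns are supported here")
    prefix = pattern[:-1]
    if not prefix:
        raise ValueError("pattern must have a non-empty prefix")
    # Exclusive upper bound: bump the last character of the prefix, so every
    # string starting with the prefix falls inside [prefix, upper).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return f"{column} >= '{prefix}' AND {column} < '{upper}'"
```

So {{attribute LIKE 'interests.food.*'}} would become {{attribute >= 'interests.food.' AND attribute < 'interests.food/'}}, which is the same storage-level range scan the "trick" relies on.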
[jira] [Commented] (CASSANDRA-6384) "CREATE TABLE ..." execution time increases linearly with number of existing column families
[ https://issues.apache.org/jira/browse/CASSANDRA-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078263#comment-14078263 ]

Jack Krupansky commented on CASSANDRA-6384:
---

Is this related to the total number of column families on the cluster, or the total for a single keyspace? And is this with slab/arena allocation enabled or disabled (CASSANDRA-5935)?

> "CREATE TABLE ..." execution time increases linearly with number of existing column families
> --------------------------------------------------------------------------------------------
>
>          Key: CASSANDRA-6384
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-6384
>      Project: Cassandra
>   Issue Type: Bug
>   Components: Core
>  Environment: Cassandra 2.0.2
>               x86_64 GNU/Linux (RHEL)
>     Reporter: Anne Sullivan
>     Assignee: Ryan McGuire
>     Priority: Minor
>
> During creation of 9K column families, the time to execute the "CREATE TABLE" statement increased linearly from 100ms to 15min. Tried issuing the statements using both the Java Driver (2.0.0-beta2) and cqlsh (4.1.0), with the same result.
[jira] [Commented] (CASSANDRA-6977) attempting to create 10K column families fails with 100 node cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075416#comment-14075416 ]

Jack Krupansky commented on CASSANDRA-6977:
---

bq. 10K column families are distributed between different keyspaces

It would be helpful to have some guidance to offer to data modelers. For example, would it be better to have 100 keyspaces with 100 tables each, 10 keyspaces with 1,000 tables each, 50 keyspaces with 200 tables each, 200 keyspaces with 50 tables each, or... each table in a different keyspace?

Maybe we should go back and build upon the traditional guidance of "hundreds" of tables, and use that as the guidance for a single keyspace. So, that would suggest that 50 keyspaces with 200 tables each would be a better "sweet spot" for 10K tables in a cluster. That still leaves open the question of whether a single table per keyspace, with 10K keyspaces, would be just as viable. Maybe the final guidance could be "no more than a few hundred tables per keyspace."

> attempting to create 10K column families fails with 100 node cluster
> ---------------------------------------------------------------------
>
>          Key: CASSANDRA-6977
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-6977
>      Project: Cassandra
>   Issue Type: Bug
>  Environment: 100 nodes, Ubuntu 12.04.3 LTS, AWS m1.large instances
>     Reporter: Daniel Meyer
>     Assignee: Ryan McGuire
>     Priority: Minor
>  Attachments: 100_nodes_all_data.png, all_data_5_nodes.png, keyspace_create.py, logs.tar, tpstats.txt, visualvm_tracer_data.csv
>
> During this test we are attempting to create a total of 1K keyspaces with 10 column families each to bring the total column families to 10K. With a 5 node cluster this operation can be completed; however, it fails with 100 nodes. Please see the two charts. For the 5 node case the time required to create each keyspace and subsequent 10 column families increases linearly until the number of keyspaces is 1K.
> For a 100 node cluster there is a sudden increase in latency between 450 keyspaces and 550 keyspaces. The test ends when the test script times out. After the test script times out it is impossible to reconnect to the cluster with the DataStax Python driver because it cannot connect to the host:
> cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'10.199.5.98': OperationTimedOut()})
> It was found that running the following stress command does work from the same machine the test script runs on:
> cassandra-stress -d 10.199.5.98 -l 2 -e QUORUM -L3 -b -o INSERT
> It should be noted that this test was initially done with DSE 4.0 and C\* version 2.0.5.24, and in that case it was not possible to run stress against the cluster even locally on a node due to not finding the host.
> Attached are system logs from one of the nodes, charts showing schema creation latency for 5 and 100 node clusters, VisualVM tracer data for cpu, memory, num_threads and gc runs, tpstats output, and the test script.
> The test script was run on an m1.large AWS instance outside of the cluster under test.
[jira] [Commented] (CASSANDRA-7527) Bump CQL version and update doc for 2.1
[ https://issues.apache.org/jira/browse/CASSANDRA-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070795#comment-14070795 ]

Jack Krupansky commented on CASSANDRA-7527:
---

What is the official Apache landing place for the CQL doc? If I google "cassandra cql3", I find:

https://cassandra.apache.org/doc/cql3/CQL.html

But I don't find any links to that web page on the Apache Cassandra site - just links to the DataStax doc.

Also, if I look in the doc/CQL3 directory on the Apache site I see the following:

{code}
CQL-1.2.html    2014-03-19 16:47    81K
CQL-2.0.html    2014-06-30 09:09    92K
CQL.css         2012-07-13 09:15    2.0K
CQL.html        2014-06-30 09:09    92K
{code}

Will there be a CQL-2.1.html for the C\* 2.1 CQL doc, or will CQL-2.0.html be overwritten? And again, I was unable to find any links to CQL-2.0.html or CQL-1.2.html on the Apache site. I mean, it would be nice to have a clean web link to consult the 2.0 doc even when 2.1 goes GA. I tried to google "Cassandra 2.0 cql doc", but it doesn't find that CQL-2.0.html page, or find the 1.2 page when I search for 1.2.

Finally, will this official Apache C\* 2.1 CQL doc be available on the web real soon, or only at 2.1 GA?

> Bump CQL version and update doc for 2.1
> ---------------------------------------
>
>          Key: CASSANDRA-7527
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-7527
>      Project: Cassandra
>   Issue Type: Bug
>     Reporter: Sylvain Lebresne
>     Assignee: Tyler Hobbs
>      Fix For: 2.1.0
>  Attachments: 7527-v2.txt, 7527.txt
>
> It appears we forgot to bump the CQL version for new 2.1 features (UDT, tuple type, collection indexing), nor did we update the textile doc.
[jira] [Commented] (CASSANDRA-7372) Exception when querying a composite-keyed table with a collection index
[ https://issues.apache.org/jira/browse/CASSANDRA-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029750#comment-14029750 ]

Jack Krupansky commented on CASSANDRA-7372:
---

(Nit: the description says "composite-keyed table", but the example is for a "compound key".)

> Exception when querying a composite-keyed table with a collection index
> -----------------------------------------------------------------------
>
>          Key: CASSANDRA-7372
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-7372
>      Project: Cassandra
>   Issue Type: Bug
>     Reporter: Ghais Issa
>     Assignee: Mikhail Stepura
>      Fix For: 2.1 rc2
>  Attachments: CASSANDRA-2.1-7372-v3.patch
>
> Given the following schema:
> {code}
> CREATE TABLE products (
>     account text,
>     id int,
>     categories set<text>,
>     PRIMARY KEY (account, id)
> );
> CREATE INDEX cat_index ON products(categories);
> {code}
> The following query fails with an exception:
> {code}
> SELECT * FROM products WHERE account = 'xyz' AND categories CONTAINS 'lmn';
> errors={}, last_host=127.0.0.1
> {code}
> The exception in Cassandra's log is:
> {code}
> WARN 17:01:49 Uncaught exception on thread Thread[SharedPool-Worker-2,5,main]: {}
> java.lang.RuntimeException: java.lang.IndexOutOfBoundsException
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2015) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_25]
>     at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:162) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:103) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
> Caused by: java.lang.IndexOutOfBoundsException: null
>     at org.apache.cassandra.db.composites.Composites$EmptyComposite.get(Composites.java:60) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.index.composites.CompositesIndexOnCollectionKey.makeIndexColumnPrefix(CompositesIndexOnCollectionKey.java:78) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.index.composites.CompositesSearcher.makePrefix(CompositesSearcher.java:82) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.index.composites.CompositesSearcher.getIndexedIterator(CompositesSearcher.java:116) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.index.composites.CompositesSearcher.search(CompositesSearcher.java:68) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.index.SecondaryIndexManager.search(SecondaryIndexManager.java:589) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.ColumnFamilyStore.search(ColumnFamilyStore.java:2060) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.RangeSliceCommand.executeLocally(RangeSliceCommand.java:131) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1368) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2011) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     ... 4 common frames omitted
> {code}
> The following query however works:
> {code}
> SELECT * FROM products WHERE categories CONTAINS 'lmn';
> {code}