[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift
[ https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253959#comment-15253959 ] Jack Krupansky commented on CASSANDRA-11547:

It would be nice to have three distinct layers of defense against clock drift:

1. An external monitoring service that alerts users when the clocks on a cluster may be drifting, plus a super-alert when any clock in the cluster gets too far out of range. The hope is to catch and correct clock drift before the cluster gets into trouble.
2. A warning from Cassandra itself if a node's clock gets more than a minor threshold out of sync with the majority of the cluster.
3. A strong warning, or even a freeze, if a node's clock is more than a major threshold out of sync with the majority of the cluster.

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jason Brown
> Assignee: Jason Brown
> Priority: Minor
> Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a simple way to check whether this occurs, we can run a background thread that wakes up every n seconds, reads the system clock, and checks whether n seconds have indeed passed.
> * If the clock's current time is less than the last recorded time (captured n seconds in the past), we know the clock has jumped backward.
> * If fewer than n seconds have elapsed, we know the system clock is running slow or has moved backward (by a value less than n).
> * If (n + a small offset) seconds have elapsed, we can assume we are within an acceptable window of clock movement. Reasons for including an offset: the clock-checking thread might not have been scheduled on time, garbage collection, and so on.
> * If the elapsed time is greater than (n + a small offset) seconds, we can assume the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment a metric that the user's monitoring systems can trigger/alert on.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
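The four cases in the proposal above can be sketched as a small classifier plus a background loop. This is a Python illustration only (Cassandra's actual implementation would be Java, and all names here are hypothetical):

```python
import threading
import time

def classify_drift(elapsed, interval, tolerance):
    """Classify one wake-up of the checker thread, per the four cases:
    elapsed   = wall-clock seconds observed since the last check
    interval  = n, the intended sleep
    tolerance = allowance for late scheduling, GC pauses, etc."""
    if elapsed < 0:
        return "jumped-backward"     # clock is before the last recorded time
    if elapsed < interval:
        return "slow-or-moved-back"  # fewer than n seconds appeared to pass
    if elapsed <= interval + tolerance:
        return "ok"                  # within the acceptable window
    return "jumped-forward"          # well over n seconds appeared to pass

def clock_check_loop(stop, interval=10.0, tolerance=1.0, report=print):
    """Background loop: record the clock, sleep n seconds, re-read, classify.
    `stop` is a threading.Event; Event.wait doubles as the interruptible sleep."""
    last = time.time()
    while not stop.wait(interval):
        now = time.time()
        verdict = classify_drift(now - last, interval, tolerance)
        if verdict != "ok":
            report("clock drift detected: %s (elapsed=%.3fs)" % (verdict, now - last))
        last = now
```

In the unhappy cases the real thread would log and increment a metric rather than print, per the description above.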
[jira] [Comment Edited] (CASSANDRA-11566) read time out when do count(*)
[ https://issues.apache.org/jira/browse/CASSANDRA-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248694#comment-15248694 ] Jack Krupansky edited comment on CASSANDRA-11566 at 4/19/16 9:59 PM:

I suspect that this timeout is simply because cqlsh is set to only allow 10 seconds for a request by default. Try setting the request timeout to some largish number, like 2000 (seconds), using the {{--request-timeout}} command line option for cqlsh:

{code}
cqlsh --request-timeout=2000 ...
{code}

To be clear, even if setting a longer timeout works, it is not advisable to perform such a slow and resource-intensive operation on a production cluster unless absolutely necessary.

was (Author: jkrupan):
I suspect that this timeout is simply because cqlsh is set to only allow 10 seconds for a request by default. Try setting the request timeout to some largish number, like 2000 (seconds), using the {{--request-timeout}} command line option for cqlsh:

{code}
cqlsh --request-timeout=2000 ...
{code}

> read time out when do count(*) > -- > > Key: CASSANDRA-11566 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11566 > Project: Cassandra > Issue Type: Bug > Environment: staging >Reporter: nizar > Fix For: 3.3 > > > Hello I using Cassandra Datastax 3.3, I keep getting read time out even if I > set the limit to 1, it would make sense if the limit is high number .. > However only limit 1 and still timing out sounds odd? > [cqlsh 5.0.1 | Cassandra 3.3 | CQL spec 3.4.0 | Native protocol v4] > cqlsh:test> select count(*) from test.my_view where s_id=?
and flag=false > limit 1; > OperationTimedOut: errors={}, last_host= > my key look like this : > CREATE MATERIALIZED VIEW test.my_view AS > SELECT * > FROM table_name > WHERE id IS NOT NULL AND processed IS NOT NULL AND time IS NOT NULL AND id > IS NOT NULL > PRIMARY KEY ( ( s_id, flag ), time, id ) > WITH CLUSTERING ORDER BY ( time ASC ); > I have 5 nodes with replica 3 > CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', > 'dc': '3'} AND durable_writes = true; > Below was the result for nodetoolcfstats > Keyspace: test > Read Count: 128770 > Read Latency: 1.42208769123243 ms. > Write Count: 0 > Write Latency: NaN ms. > Pending Flushes: 0 > Table: tableName > SSTable count: 3 > Space used (live): 280777032 > Space used (total): 280777032 > Space used by snapshots (total): 0 > Off heap memory used (total): 2850227 > SSTable Compression Ratio: 0.24706731995327527 > Number of keys (estimate): 1277211 > Memtable cell count: 0 > Memtable data size: 0 > Memtable off heap memory used: 0 > Memtable switch count: 0 > Local read count: 3 > Local read latency: 0.396 ms > Local write count: 0 > Local write latency: NaN ms > Pending flushes: 0 > Bloom filter false positives: 0 > Bloom filter false ratio: 0.0 > Bloom filter space used: 1589848 > Bloom filter off heap memory used: 1589824 > Index summary off heap memory used: 1195691 > Compression metadata off heap memory used: 64712 > Compacted partition minimum bytes: 311 > Compacted partition maximum bytes: 535 > Compacted partition mean bytes: 458 > Average live cells per slice (last five minutes): 102.92671205446536 > Maximum live cells per slice (last five minutes): 103 > Average tombstones per slice (last five minutes): 1.0 > Maximum tombstones per slice (last five minutes): 1 > Table: my_view > SSTable count: 4 > Space used (live): 126114270 > Space used (total): 126114270 > Space used by snapshots (total): 0 > Off heap memory used (total): 91588 > SSTable Compression Ratio: 0.1652453778228639 > 
Number of keys (estimate): 8 > Memtable cell count: 0 > Memtable data size: 0 > Memtable off heap memory used: 0 > Memtable switch count: 0 > Local read count: 128767 > Local read latency: 1.590 ms > Local write count: 0 > Local write latency: NaN ms > Pending flushes: 0 > Bloom filter false positives: 0 > Bloom filter false ratio: 0.0 > Bloom filter space used: 96 > Bloom filter off heap memory used: 64 > Index summary off heap memory used: 140 > Compression metadata off heap memory used: 91384 > Compacted partition minimum bytes: 3974 > Compacted partition maximum bytes: 386857368 > Compacted partition mean bytes: 26034715 > Average live cells per slice (last five minutes): 102.99462595230145 > Maximum live cells per slice (last five minutes): 103 > Average tombstones per slice (last five minutes): 1.0 > Maximum tombstones per slice (last five minutes): 1 > Thank you. > Nizar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11566) read time out when do count(*)
[ https://issues.apache.org/jira/browse/CASSANDRA-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248742#comment-15248742 ] Jack Krupansky commented on CASSANDRA-11566:

This issue may also be considered a duplicate of CASSANDRA-9051. For reference, setting the {{--request-timeout}} parameter on the command line and the {{request_timeout}} option in the {{\[connection]}} section of the {{cqlshrc}} file are documented here: http://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlsh.html
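Per the cqlsh reference linked in the comment above, the same timeout can be made persistent in the {{cqlshrc}} file rather than passed on every invocation. A minimal fragment (path shown is the conventional default location):

```ini
; ~/.cassandra/cqlshrc -- persistent equivalent of --request-timeout=2000
[connection]
request_timeout = 2000
```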
[jira] [Commented] (CASSANDRA-11566) read time out when do count(*)
[ https://issues.apache.org/jira/browse/CASSANDRA-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248694#comment-15248694 ] Jack Krupansky commented on CASSANDRA-11566:

I suspect that this timeout is simply because cqlsh is set to only allow 10 seconds for a request by default. Try setting the request timeout to some largish number, like 2000 (seconds), using the {{--request-timeout}} command line option for cqlsh:

{code}
cqlsh --request-timeout=2000 ...
{code}
[jira] [Commented] (CASSANDRA-11566) read time out when do count(*)
[ https://issues.apache.org/jira/browse/CASSANDRA-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245719#comment-15245719 ] Jack Krupansky commented on CASSANDRA-11566:

COUNT(\*) should only be used on relatively small tables, relatively narrow token ranges, or relatively narrow slices of wide rows - probably no more than thousands (or maybe only hundreds) of rows, depending on your data and your hardware.

Your table may have only 8 "keys", but that means 8 partition keys, not primary keys. Your table is 120 MB, which is not large as tables go, but may in fact be large enough for the count operation to fail to complete in a small amount of time. Try doing a count for each of the partition keys in that table. Maybe one of the rows is very wide and is causing performance to bog down.

(FWIW, if you need to write "(\*)" in Jira, you need to escape the \* with a backslash, as in "(\*)".)
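The "relatively narrow token ranges" advice above is the standard workaround for count timeouts: split the token ring into subranges and issue one small count per subrange. A minimal sketch of the splitting arithmetic, assuming the default Murmur3Partitioner token space (function name is illustrative):

```python
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1  # Murmur3Partitioner token space

def token_subranges(n):
    """Split the full token ring into n contiguous (start, end] subranges,
    so a client can issue one small count() per subrange instead of a
    single huge full-table count."""
    width = (MAX_TOKEN - MIN_TOKEN) // n
    edges = [MIN_TOKEN + i * width for i in range(n)] + [MAX_TOKEN]
    return list(zip(edges[:-1], edges[1:]))
```

Each (lo, hi] pair would then back a query of the form SELECT count(\*) FROM t WHERE token(pk) > lo AND token(pk) <= hi, with the per-range counts summed client-side.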
[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
[ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235141#comment-15235141 ] Jack Krupansky commented on CASSANDRA-9754:
---

Any idea how a new wide partition will perform relative to the same amount of data and the same number of clustering rows divided into bucketed partitions? For example, a single 1 GB wide partition vs. ten 100 MB partitions (same partition key plus a 0-9 bucket number) vs. a hundred 10 MB partitions (0-99 bucket number), for two access patterns: 1) random access of a row or short slice, and 2) a full bulk read of the 1 GB of data, one moderate slice at a time.

Or maybe the question is equivalent to asking what the cost is to access the last row of the 1 GB partition vs. the last row of the tenth or hundredth bucket of the bucketed equivalent.

No precision required. Just inquiring whether we can get rid of bucketing as a preferred data modeling strategy, at least for the common use cases where the sum of the buckets is roughly 2 GB or less. The bucketing approach does have the side effect of distributing the buckets around the cluster, which could be a good thing - or maybe not.

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Michael Kjellman
> Priority: Minor
>
> Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo and its ByteBuffers. This is especially bad in endpoints with large CQL partitions. If a CQL partition is, say, 6.4 GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for GC. Can this be improved by not creating so many objects?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
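The bucketing strategy being questioned above is typically implemented by folding a bucket number into the partition key. A minimal sketch of the hash-based variant (all names hypothetical; time-based bucketing, which keeps slices contiguous, is the common alternative):

```python
import hashlib

def bucketed_key(logical_key, clustering_value, num_buckets=10):
    """Map one logical wide partition onto num_buckets physical partitions:
    the composite partition key becomes (logical_key, bucket). A point read
    of a known row hashes to exactly one bucket; a full scan of the logical
    partition must fan out over all num_buckets buckets."""
    digest = hashlib.md5(repr(clustering_value).encode()).digest()
    bucket = digest[0] % num_buckets
    return (logical_key, bucket)
```

A side effect noted in the comment: the buckets land on different replicas around the cluster, since each composite key hashes to its own token.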
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231487#comment-15231487 ] Jack Krupansky commented on CASSANDRA-8844: --- Since this new feature has evolved significantly since the original description, is there a good summary available for the current form of the feature? Not like full doc or the internal implementation details, but a concise summary at the user level, like where the CDC data will be stored, its format, how to retrieve it, and potential performance impact, both in terms of amount of CPU time required and additional memory required if CDC is enabled. Thanks. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]), propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. 
> # To provide a mechanism for implementing good and reliable (deliver-at-least-once with possible mechanisms for deliver-exactly-once) continuous semi-realtime feeds of mutations going into a Cassandra cluster.
> # To eliminate the developmental and operational burden on users so that they don't have to do dual writes to other systems.
> # For users that are currently doing batch export from a Cassandra system, give them the opportunity to make that realtime with a minimum of coding.
> h2. The mechanism:
> We propose a durable logging mechanism that functions similar to a commitlog, with the following nuances:
> - Takes place on every node, not just the coordinator, so RF number of copies are logged.
> - Separate log per table.
> - Per-table configuration. Only tables that are specified as CDC_LOG would do any logging.
> - Per DC. We are trying to keep the complexity to a minimum to make this an easy enhancement, but most likely use cases would prefer to only implement CDC logging in one (or a subset) of the DCs that are being replicated to.
> - In the critical path of ConsistencyLevel acknowledgment. Just as with the commitlog, failure to write to the CDC log should fail that node's write. If that means the requested consistency level was not met, then clients *should* experience UnavailableExceptions.
> - Be written in a row-centric manner such that it is easy for consumers to reconstitute rows atomically.
> - Written in a simple format designed to be consumed *directly* by daemons written in non-JVM languages.
> h2. Nice-to-haves
> I strongly suspect that the following features will be asked for, but I also believe that they can be deferred for a subsequent release, and to gauge actual interest.
> - Multiple logs per table. This would make it easy to have multiple "subscribers" to a single table's changes. A workaround would be to create a forking daemon listener, but that's not a great answer.
> - Log filtering. Being able to apply filters, including UDF-based filters, would make Cassandra a much more versatile feeder into other systems, and again, reduce complexity that would otherwise need to be built into the daemons.
> h2. Format and Consumption
> - Cassandra would only write to the CDC log, and never delete from it.
> - Cleaning up consumed logfiles would be the client daemon's responsibility.
> - Logfile size should probably be configurable.
> - Logfiles should be named with a predictable naming schema, making it trivial to process them in order.
> - Daemons should be able to checkpoint their work, and resume from where they left off. This means they would have to leave some file artifact in the CDC log's directory.
> - A sophisticated daemon should be able to be written that could
> -- Catch up, in written-order,
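The consumption model proposed above (predictably named logfiles, client-side cleanup, checkpoint artifacts left in the CDC directory) can be sketched as a single catch-up pass. Everything here is illustrative of the proposal, not an implemented API; filenames and the checkpoint format are assumptions:

```python
import os

def consume_cdc(log_dir, handle):
    """One catch-up pass of a CDC consumer daemon. Assumes logfiles sort
    lexicographically in write order and a 'checkpoint' marker file in the
    same directory records the last fully consumed logfile."""
    ckpt_path = os.path.join(log_dir, "checkpoint")
    last = ""
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            last = f.read().strip()
    for name in sorted(os.listdir(log_dir)):
        if name == "checkpoint" or name <= last:
            continue                      # already consumed (or the marker itself)
        with open(os.path.join(log_dir, name), "rb") as f:
            handle(f.read())              # deliver-at-least-once: repeats possible after a crash
        with open(ckpt_path, "w") as f:
            f.write(name)                 # checkpoint only after successful handling
```

Checkpointing after handling (not before) is what makes this deliver-at-least-once rather than at-most-once: a crash between the two steps replays the last file.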
[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216937#comment-15216937 ] Jack Krupansky commented on CASSANDRA-11383: +1 for using [~jrwest]'s most recent two comments here as the source for the doc changes that I myself was referring to here. > Avoid index segment stitching in RAM which lead to OOM on big SSTable files > > > Key: CASSANDRA-11383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11383 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: C* 3.4 >Reporter: DOAN DuyHai >Assignee: Jordan West > Labels: sasi > Fix For: 3.5 > > Attachments: CASSANDRA-11383.patch, > SASI_Index_build_LCS_1G_Max_SSTable_Size_logs.tar.gz, > new_system_log_CMS_8GB_OOM.log, system.log_sasi_build_oom > > > 13 bare metal machines > - 6 cores CPU (12 HT) > - 64Gb RAM > - 4 SSD in RAID0 > JVM settings: > - G1 GC > - Xms32G, Xmx32G > Data set: > - ≈ 100Gb/per node > - 1.3 Tb cluster-wide > - ≈ 20Gb for all SASI indices > C* settings: > - concurrent_compactors: 1 > - compaction_throughput_mb_per_sec: 256 > - memtable_heap_space_in_mb: 2048 > - memtable_offheap_space_in_mb: 2048 > I created 9 SASI indices > - 8 indices with text field, NonTokenizingAnalyser, PREFIX mode, > case-insensitive > - 1 index with numeric field, SPARSE mode > After a while, the nodes just gone OOM. > I attach log files. You can see a lot of GC happening while index segments > are flush to disk. At some point the node OOM ... > /cc [~xedin] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216763#comment-15216763 ] Jack Krupansky commented on CASSANDRA-11383:

Thanks, [~jrwest]. I think that I finally don't have any additional questions! BTW, the DataStax Distribution of Cassandra (DDC) for 3.4 is out now, so the DataStax Cassandra doc has been updated for 3.4, including SASI:
https://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndexConcept.html
https://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/refCreateSASIIndex.html

That happened four days ago, so maybe some of our recent discussion since then should get cycled into the doc - for example, your comments about range queries on SPARSE data. I'll ping docs to alert them to the discussion here, but you guys are free to highlight whatever info you think users should know about.
[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216233#comment-15216233 ] Jack Krupansky commented on CASSANDRA-11383:

Thanks, [~jrwest] and [~doanduyhai]. I think I finally have the SASI terminology down now - SPARSE mode means that the index is sparse (few index entries per indexed column value) while the column data is dense (many distinct values), and non-SPARSE (a.k.a. PREFIX) mode, the default, supports any cardinality of data, especially the low-cardinality data that SPARSE mode does not support.

Maybe that leaves one last question: whether non-SPARSE (PREFIX) mode is considered advisable for high-cardinality column data, where SPARSE mode is nominally the better choice. Maybe that is strictly a matter of whether the prefix/LIKE feature is to be utilized - if so, then PREFIX mode is required, but if not, SPARSE mode sounds like the better choice. I don't have a handle on the internal index structures to know whether that is absolutely the case - that a PREFIX index over high-cardinality data would necessarily be larger and/or slower than a SPARSE index over the same data. I would expect so, but it would be good to have that confirmed.
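For reference, the mode being discussed is chosen per index at creation time via the index options. A hedged illustration with hypothetical table and column names (the SASIIndex class name is real; syntax per the DataStax SASI reference linked earlier in this thread):

```sql
-- High-cardinality numeric column: SPARSE (few rows per indexed value)
CREATE CUSTOM INDEX ON readings (reading_id)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'SPARSE'};

-- Default non-SPARSE mode, required for prefix/LIKE matching
CREATE CUSTOM INDEX ON readings (vendor)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'PREFIX'};
```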
[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216168#comment-15216168 ] Jack Krupansky commented on CASSANDRA-11383: 1. Was the conclusion that a SPARSE SASI index would work well even for low cardinality data (as in the original reported case, for period_end_month_int), or was there some application-level change required to adapt to a SASI change as well? 2. Is it now official that a non-SPARSE SASI index (e.g., PREFIX) can be used for non-TEXT data (int in particular), at least for the case of exact match lookup?
[jira] [Commented] (CASSANDRA-11448) Running OOS should trigger the disk failure policy
[ https://issues.apache.org/jira/browse/CASSANDRA-11448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215912#comment-15215912 ] Jack Krupansky commented on CASSANDRA-11448: Curious that we haven't had an acronym for Out Of Space in more common usage. In fact, this is the first time I've seen it. OOM is so common and so obvious, but OOS seems so foreign. Maybe that's because disk drives are so big these days that most people will now no longer come close to... running OOS on an HDD. SSD changes that with the (currently) much smaller drive size. > Running OOS should trigger the disk failure policy > -- > > Key: CASSANDRA-11448 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11448 > Project: Cassandra > Issue Type: Bug >Reporter: Brandon Williams >Assignee: Branimir Lambov > Fix For: 2.1.x, 2.2.x, 3.0.x > > > Currently when you run OOS, this happens: > {noformat} > ERROR [MemtableFlushWriter:8561] 2016-03-28 01:17:37,047 > CassandraDaemon.java:229 - Exception in thread > Thread[MemtableFlushWriter:8561,5,main] java.lang.RuntimeException: > Insufficient disk space to write 48 bytes > at > org.apache.cassandra.io.util.DiskAwareRunnable.getWriteDirectory(DiskAwareRunnable.java:29) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:332) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) > ~[guava-16.0.1.jar:na] > at > org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1120) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_66] > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > ~[na:1.8.0_66] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66] > {noformat} > Now your flush writer is dead and postflush tasks build up forever. Instead > we should throw FSWE and trigger the failure policy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202141#comment-15202141 ] Jack Krupansky commented on CASSANDRA-11383: bq. recreated them one by one but with no avail, it eventually OOM after a while But are you waiting for each to finish its build before proceeding to the next? I mean, can even one index alone complete a build? Or, can you create the first 2 or 3 and let them run in parallel to completion before proceeding to the next. Maybe there is some practical limit to how many indexes you can build in parallel before the rate of garbage generation exceeds the rate of GC with all of this going on in parallel. > SASI index build leads to massive OOM > - > > Key: CASSANDRA-11383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11383 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: C* 3.4 >Reporter: DOAN DuyHai > Attachments: CASSANDRA-11383.patch, new_system_log_CMS_8GB_OOM.log, > system.log_sasi_build_oom > > > 13 bare metal machines > - 6 cores CPU (12 HT) > - 64Gb RAM > - 4 SSD in RAID0 > JVM settings: > - G1 GC > - Xms32G, Xmx32G > Data set: > - ≈ 100Gb/per node > - 1.3 Tb cluster-wide > - ≈ 20Gb for all SASI indices > C* settings: > - concurrent_compactors: 1 > - compaction_throughput_mb_per_sec: 256 > - memtable_heap_space_in_mb: 2048 > - memtable_offheap_space_in_mb: 2048 > I created 9 SASI indices > - 8 indices with text field, NonTokenizingAnalyser, PREFIX mode, > case-insensitive > - 1 index with numeric field, SPARSE mode > After a while, the nodes just gone OOM. > I attach log files. You can see a lot of GC happening while index segments > are flush to disk. At some point the node OOM ... > /cc [~xedin] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202080#comment-15202080 ] Jack Krupansky commented on CASSANDRA-11383: 1. How large are each of the text fields being indexed? Are they fairly short or are some quite long (and not tokenized, either)? I'm wondering if maybe a wide column is causing difficulty. 2. Does OOM occur if SASI indexes are created one at a time - serially, waiting for the full index to build before moving on to the next? 3. Do you need a 32G heap to build just one index? I cringe when I see a heap larger than 14G. See if you can get a single SASI index build to work in 10-12G or less.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202876#comment-15202876 ] Jack Krupansky commented on CASSANDRA-11383: The int field could easily be made a text field if that would make SASI work better (you can even do prefix query by year then.) Point 1 is precisely what SASI SPARSE is designed for. It is also what Materialized Views (formerly Global Indexes) are for, and MV may be the better fit since it eliminates the need to scan multiple nodes: the rows get collected under the new partition key, which can include the indexed data value. You're using cardinality backwards - it is supposed to be a measure of the number of distinct values in a column, not the number of rows containing each value. See: https://en.wikipedia.org/wiki/Cardinality_%28SQL_statements%29. Granted, in ERD cardinality is the count of rows in a second table for each column value in a given table (one to n, n to one, etc.), but in the context of an index only one table is involved - you could consider the index itself to be a table, but that would be a little odd. In any case, it is best to stick with the standard SQL meaning of the cardinality of data values in a column. So, to be clear, an email address is high cardinality and gender is low cardinality. And the end-of-month int field is low cardinality, or not dense in the original SASI doc terminology. 
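To illustrate the SQL sense of the term with a toy sketch (the column data here is made up for illustration; this is not Cassandra code):

```python
# Cardinality in the SQL sense: the number of distinct values in a
# column, independent of how many rows carry each value.
emails = ["a@x.com", "b@x.com", "c@x.com", "d@x.com"]  # hypothetical column data
genders = ["m", "f", "f", "m", "m", "f"]

def cardinality(column):
    """Count distinct values in a column."""
    return len(set(column))

print(cardinality(emails))   # 4 - high relative to row count
print(cardinality(genders))  # 2 - low cardinality
```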
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202877#comment-15202877 ] Jack Krupansky commented on CASSANDRA-11383: Sorry for any extra noise I may have generated here - [~xedin] has the info he needs without me.
[jira] [Comment Edited] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202816#comment-15202816 ] Jack Krupansky edited comment on CASSANDRA-11383 at 3/19/16 3:52 PM: - The terminology is a bit confusing here - everybody understands what a sparse matrix is, but exactly what constitutes sparseness in a column is very unclear. What is clear is that the cardinality (number of distinct values) is low for that int field. A naive person (okay... me) would have thought that sparse data meant few distinct values, which is what the int field is (36 distinct values). I decided to check the doc to see what it says about SPARSE, but discovered that the doc doesn't exist yet in the main Cassandra doc - I sent a message to d...@datastax.com about that (turns out, they sync the doc to the DataStax Distribution of Cassandra (DDC), and DDC 3.4 is not out yet, coming soon.) So I went back to the original, pre-integration doc (https://github.com/xedin/sasi) and see that there is a separate, non-integrated doc for SASI in the Cassandra source tree - https://github.com/apache/cassandra/blob/trunk/doc/SASI.md - which makes clear that "SPARSE, which is meant to improve performance of querying large, dense number ranges like timestamps for data inserted every millisecond." Oops... SPARSE=dense, but in any case SPARSE is designed for high cardinality of distinct values, which the int field is clearly not. I would argue that SASI should give a strongly-worded warning if the column data for a SPARSE index has low cardinality - a low number of distinct column values and a high number of index values per column value.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202816#comment-15202816 ] Jack Krupansky commented on CASSANDRA-11383: The terminology is a bit confusing here - everybody understands what a sparse matrix is, but exactly what constitutes sparseness in a column is very unclear. What is clear is that the cardinality (number of distinct values) is low for that int field. A naive person (okay... me) would have thought that sparse data meant few distinct values, which is what the int field is (36 distinct values). I decided to check the doc to see what it says about SPARSE, but discovered that the doc doesn't exist yet in the main Cassandra doc - I sent a message to d...@datastax.com about that. So I went back to the original, pre-integration doc (https://github.com/xedin/sasi) and see that there is a separate, non-integrated doc for SASI in the Cassandra source tree - https://github.com/apache/cassandra/blob/trunk/doc/SASI.md - which makes clear that "SPARSE, which is meant to improve performance of querying large, dense number ranges like timestamps for data inserted every millisecond." Oops... SPARSE=dense, but in any case SPARSE is designed for high cardinality of distinct values, which the int field is clearly not. I would argue that SASI should give a strongly-worded warning if the column data for a SPARSE index has low cardinality - a low number of distinct column values and a high number of index values per column value. 
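A minimal sketch of what such a warning heuristic might look like (the function name and the entries-per-value threshold are illustrative assumptions, not actual SASI code; SASI's own docs describe SPARSE as expecting only a handful of rows per indexed value):

```python
# Hypothetical sketch of the suggested low-cardinality warning; not
# actual SASI code. SPARSE mode assumes only a few rows per indexed
# value, so many entries per value indicates a poor fit.
def sparse_mode_warning(total_rows, distinct_values, max_entries_per_value=5):
    """Return True when a SPARSE index looks inappropriate: the average
    number of index entries per distinct value exceeds the threshold."""
    entries_per_value = total_rows / distinct_values
    return entries_per_value > max_entries_per_value

# period_end_month_int: ~3.4 billion rows over 36 distinct month values
print(sparse_mode_warning(3_400_000_000, 36))  # True - warn the user
```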
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202196#comment-15202196 ] Jack Krupansky commented on CASSANDRA-11383: Just to make sure I understand what's going on... 1. The first index is for the territory_code column, whose values are simple 2-character country codes from allCountries, which has 8 entries, with 'FR' repeated 3 times in that list of 8 country codes. 2. How many rows are generated per machine - is it 100 * 40,000,000 = 4 billion? 3. That means the SASI index will have six unique index values, each with roughly 4 billion / 8 = 500 million rows, correct? (Actually, 5 of the 6 unique values will have 500 million rows and the 6th will have 1.5 billion rows, 3 times 500 million.) Sounds like a great stress test for SASI! 4. That's just for the territory_code column. 5. Some of the columns have only 2 unique values, like commercial_offer_code. That would mean 2 billion rows for each indexed unique value. An even more excellent stress test for SASI!
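The arithmetic in points 1-3 can be checked with a quick sketch (the non-'FR' country codes below are placeholders, since the original list isn't given):

```python
# Back-of-the-envelope check of the row counts above. 'FR' appears 3
# times in the 8-entry list, so only 6 values are unique.
all_countries = ["FR", "FR", "FR", "DE", "GB", "ES", "IT", "US"]  # placeholder codes

total_rows = 100 * 40_000_000                             # 4 billion rows
rows_per_entry = total_rows // len(all_countries)         # 500 million per list entry
rows_for_fr = rows_per_entry * all_countries.count("FR")  # 1.5 billion for 'FR'

print(len(set(all_countries)))  # 6 unique index values
print(rows_per_entry)           # 500000000
print(rows_for_fr)              # 1500000000
```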
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202241#comment-15202241 ] Jack Krupansky commented on CASSANDRA-11383: What's the table schema? Is period_end_month_int text or int? period_end_month_int has 3 years times 12 months = 36 unique values, so 3.4 billion / 36 = 94.44 million rows for each indexed unique value.
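The per-value estimate above, recomputed as a quick sketch:

```python
# 3 years of monthly values -> 36 distinct ints; ~3.4 billion rows
# spread evenly gives roughly 94.44 million rows per indexed value.
distinct_months = 3 * 12
rows_per_value = 3_400_000_000 / distinct_months

print(distinct_months)                  # 36
print(round(rows_per_value / 1e6, 2))   # 94.44 (millions of rows)
```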
[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
[ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189407#comment-15189407 ] Jack Krupansky commented on CASSANDRA-9754: --- Is this issue still considered a Minor priority? Seems like a bigger deal to me. +1 for making it a Major priority - unless there is a longer list of even bigger fish in the queue. Just today there is a user on the list struggling with time series data and really not wanting to have to split a partition that he needs to be able to scan. Of course, scanning a super-wide partition will still be a very bad idea anyway, but at least narrower scans would still be workable with this improvement in place. Is this a 3.x improvement or 4.x or beyond? +1 for 3.x (3.6? 3.8?). > Make index info heap friendly for large CQL partitions > -- > > Key: CASSANDRA-9754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9754 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Michael Kjellman >Priority: Minor > > Looking at a heap dump of 2.0 cluster, I found that majority of the objects > are IndexInfo and its ByteBuffers. This is specially bad in endpoints with > large CQL partitions. If a CQL partition is say 6,4GB, it will have 100K > IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for > GC. Can this be improved by not creating so many objects? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11169) [sasi] exception thrown when trying to index row with index on set
[ https://issues.apache.org/jira/browse/CASSANDRA-11169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151289#comment-15151289 ] Jack Krupansky commented on CASSANDRA-11169: To be clear, "Fixed" means that a plain English error is given for the CQL statement rather than a nasty-looking exception. Is it still the intent to eventually/soon implement indexing of the column values of collection columns? Is there a Jira for that? Is it like a 3.x improvement or more like 4.x? > [sasi] exception thrown when trying to index row with index on set > > > Key: CASSANDRA-11169 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11169 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Jon Haddad >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I have a brand new cluster, built off 1944bf507d66b5c103c136319caeb4a9e3767a69 > I created a new table with a set, then a SASI index on the set. I > tried to insert a row with a set, Cassandra throws an exception and becomes > unavailable. > {code} > cqlsh> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > cqlsh> use test; > cqlsh:test> create table a (id int PRIMARY KEY , s set<text> ); > cqlsh:test> create CUSTOM INDEX on a(s) USING > 'org.apache.cassandra.index.sasi.SASIIndex'; > cqlsh:test> insert into a (id, s) values (1, {'jon', 'haddad'}); > WriteTimeout: code=1100 [Coordinator node timed out waiting for replica > nodes' responses] message="Operation timed out - received only 0 responses." 
> info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} > {code} > Cassandra stacktrace: > {code} > java.lang.AssertionError: null > at org.apache.cassandra.db.rows.BTreeRow.getCell(BTreeRow.java:212) > ~[main/:na] > at > org.apache.cassandra.index.sasi.conf.ColumnIndex.getValueOf(ColumnIndex.java:194) > ~[main/:na] > at > org.apache.cassandra.index.sasi.conf.ColumnIndex.index(ColumnIndex.java:95) > ~[main/:na] > at > org.apache.cassandra.index.sasi.SASIIndex$1.insertRow(SASIIndex.java:247) > ~[main/:na] > at > org.apache.cassandra.index.SecondaryIndexManager$WriteTimeTransaction.onInserted(SecondaryIndexManager.java:808) > ~[main/:na] > at > org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:335) > ~[main/:na] > at > org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:295) > ~[main/:na] > at org.apache.cassandra.utils.btree.BTree.buildInternal(BTree.java:136) > ~[main/:na] > at org.apache.cassandra.utils.btree.BTree.build(BTree.java:118) > ~[main/:na] > at org.apache.cassandra.utils.btree.BTree.update(BTree.java:177) > ~[main/:na] > at > org.apache.cassandra.db.partitions.AtomicBTreePartition.addAllWithSizeDelta(AtomicBTreePartition.java:156) > ~[main/:na] > at org.apache.cassandra.db.Memtable.put(Memtable.java:244) ~[main/:na] > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1216) > ~[main/:na] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:531) ~[main/:na] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:399) ~[main/:na] > at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:202) > ~[main/:na] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[main/:na] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:228) ~[main/:na] > at > org.apache.cassandra.service.StorageProxy$$Lambda$201/413275033.run(Unknown > Source) ~[na:na] > at > 
org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1343) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2520) > ~[main/:na] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_45] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > ~[main/:na] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) > [main/:na] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [main/:na] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11145) Materialized View throws error if Map type is in base table
[ https://issues.apache.org/jira/browse/CASSANDRA-11145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140331#comment-15140331 ] Jack Krupansky commented on CASSANDRA-11145: Sounds like a dup of CASSANDRA-11069. The good news is that there is a workaround: "all collection columns must be selected in a materialised view" - make sure to explicitly list each collection column from the base table in the MV SELECT. You can still use "*" to get all columns from the base, but also need to add the collection column names. Kind of surprised that this bug didn't have a priority for 3.3. > Materialized View throws error if Map type is in base table > --- > > Key: CASSANDRA-11145 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11145 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Patrick McFadin >Priority: Critical > > Using the following test setup: > {code}CREATE TABLE test ( > a int PRIMARY KEY, > b text, > c map<text, text> > ); > CREATE MATERIALIZED VIEW test_mv AS > SELECT a, b > FROM test > WHERE a IS NOT NULL AND b IS NOT NULL > PRIMARY KEY(b, a); > {code} > When inserting data to the base table: > {code} > INSERT INTO test (a,b,c) > VALUES(1, 'b', {'c':'c'}); > {code} > The insert will fail and a stack trace is generated in the logs: > {code} > ERROR [SharedPool-Worker-2] 2016-02-10 05:25:05,957 StorageProxy.java:1339 - > Failed to apply mutation locally : {} > java.lang.IllegalStateException: [ColumnDefinition{name=c, > type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), > kind=REGULAR, position=-1}] is not a subset of [] > at > org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:532) > ~[main/:na] > at > org.apache.cassandra.db.Columns$Serializer.serializedSubsetSize(Columns.java:484) > ~[main/:na] > at > org.apache.cassandra.db.rows.UnfilteredSerializer.serializedRowBodySize(UnfilteredSerializer.java:277) > 
~[main/:na] > at > org.apache.cassandra.db.rows.UnfilteredSerializer.serializedSize(UnfilteredSerializer.java:249) > ~[main/:na] > at > org.apache.cassandra.db.rows.UnfilteredSerializer.serializedSize(UnfilteredSerializer.java:236) > ~[main/:na] > at > org.apache.cassandra.db.rows.UnfilteredSerializer.serializedSize(UnfilteredSerializer.java:229) > ~[main/:na] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serializedSize(UnfilteredRowIteratorSerializer.java:171) > ~[main/:na] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.serializedSize(PartitionUpdate.java:716) > ~[main/:na] > at > org.apache.cassandra.db.Mutation$MutationSerializer.serializedSize(Mutation.java:372) > ~[main/:na] > at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:262) > ~[main/:na] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:498) ~[main/:na] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:399) ~[main/:na] > at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:202) > ~[main/:na] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[main/:na] > at > org.apache.cassandra.service.StorageProxy.mutateMV(StorageProxy.java:748) > ~[main/:na] > at > org.apache.cassandra.db.view.ViewManager.pushViewReplicaUpdates(ViewManager.java:149) > ~[main/:na] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:516) ~[main/:na] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:399) ~[main/:na] > at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:202) > ~[main/:na] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[main/:na] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:228) ~[main/:na] > at > org.apache.cassandra.service.StorageProxy$$Lambda$197/1675816556.run(Unknown > Source) ~[na:na] > at > org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1333) > ~[main/:na] > at > 
org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2510) > [main/:na] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_45] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > [main/:na] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) > [main/:na] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [main/:na] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45] >
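Applying the workaround from the comment above to the reporter's schema, the view would explicitly select the map column as well. A sketch only; the map's element types are inferred from the MapType(UTF8Type, UTF8Type) in the stack trace:

```sql
-- Workaround sketch: name every collection column of the base table in the
-- MV's SELECT list, even though c is not part of the view's primary key.
CREATE MATERIALIZED VIEW test_mv AS
    SELECT a, b, c
    FROM test
    WHERE a IS NOT NULL AND b IS NOT NULL
    PRIMARY KEY (b, a);
```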
[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
[ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139712#comment-15139712 ] Jack Krupansky commented on CASSANDRA-9754: --- bq. large CQL partitions (4GB,75GB,etc) What is the intended target/sweet spot for large partitions... 1GB, 2GB, 4GB, 8GB, 10GB, 15GB, 16GB, or... what? Will random access to larger partitions create any significant heap/off-heap memory demand, or will heap/memory simply become the total rows accessed regardless of how they might be bucketed into partitions? Will we be able to tell people that bucketing of partitions is now never needed, or will there now just be a larger bucket size, like 4GB/partition rather than the 10MB or 50MB or 100MB that some of us recommend today? > Make index info heap friendly for large CQL partitions > -- > > Key: CASSANDRA-9754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9754 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Michael Kjellman >Priority: Minor > > Looking at a heap dump of 2.0 cluster, I found that majority of the objects > are IndexInfo and its ByteBuffers. This is specially bad in endpoints with > large CQL partitions. If a CQL partition is say 6.4GB, it will have 100K > IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for > GC. Can this be improved by not creating so many objects? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132534#comment-15132534 ] Jack Krupansky commented on CASSANDRA-11067: Clarification question about SASI itself (as opposed to Cassandra syntax/semantics): If the column is tokenized, is the original raw literal text for each column also still available for indexing or are only the tokenized/analyzed terms indexed? > Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
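For reference, the kind of tokenized SASI column the question is about is declared roughly as follows. Table and column names are illustrative, and the option set is a sketch of what SASI's StandardAnalyzer accepts:

```sql
CREATE TABLE articles (id int PRIMARY KEY, body text);

-- The analyzer splits body into terms at index time; whether the raw,
-- untokenized literal is also kept in the index is exactly the question above.
CREATE CUSTOM INDEX articles_body_idx ON articles (body)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'mode': 'CONTAINS',
    'analyzed': 'true',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
    'tokenization_normalize_lowercase': 'true'
};
```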
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132531#comment-15132531 ] Jack Krupansky commented on CASSANDRA-11067: Clarification question: Will SASI apply the analyzer to the LIKE string? Then... what will happen if that analysis produces more than one term? In Solr land that is expected and the semantics is phrase query. What will SASI do? Will it be an error or be treated as a list of AND terms? > Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132517#comment-15132517 ] Jack Krupansky commented on CASSANDRA-11067: For reference, over in Solr land users constantly struggle with how to combine exact and partial matching - sometimes they want an absolute literal match for the full field/column, sometimes a wildcard on that full field, sometimes keyword tokenization, sometimes wildcard on tokenized terms, sometimes phrases of tokenized terms, and sometimes phrases from the full literal string. Unfortunately, Solr doesn't have a direct answer for that, so people are forced to copy the field (typically with a copyField directive) so that one field is the literal string and the other is the tokenized field. That gives them complete control at query time, so q=name_literal:Joe would only match when the full name is Joe while q=name_tokenized:joe would match for any name with joe. Similarly, q=name_lit:Jo* would only match names with Jo as a prefix, while q=name_tok:jo* would match Joe Smith as well as Bill Johnson. The user might also opt to copy to yet a third field which is tokenized but with the so-called keyword tokenizer, which permits the string to be normalized but not broken into tokens. The common case is to lower-case, but other common cases would be to eliminate punctuation, replace certain prefixes and suffixes, or whatever. The real point there is that "exact" match is still a range of possibilities. One of the issues here for Cassandra is whether you really want to combine these two separate exactness semantics that Solr keeps separate. 
> Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10368) Support Restricting non-PK Cols in Materialized View Select Statements
[ https://issues.apache.org/jira/browse/CASSANDRA-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122565#comment-15122565 ] Jack Krupansky commented on CASSANDRA-10368: I just stumbled on this issue as kind of a loose end. Is there any intent to support this feature any time soon, assuming that the implementation is not a big deal? > Support Restricting non-PK Cols in Materialized View Select Statements > -- > > Key: CASSANDRA-10368 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10368 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Tyler Hobbs >Priority: Minor > Fix For: 3.x > > > CASSANDRA-9664 allows materialized views to restrict primary key columns in > the select statement. Due to CASSANDRA-10261, the patch did not include > support for restricting non-PK columns. Now that the timestamp issue has > been resolved, we can add support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11075) Consider making SASI the default index implementation
[ https://issues.apache.org/jira/browse/CASSANDRA-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121779#comment-15121779 ] Jack Krupansky commented on CASSANDRA-11075: bq. A good start would probably be run all our dtest and utests on a version where SASI is hard-coded as default. Maybe it would make sense to introduce a config setting to select the default indexing. I presume that it would mean the default mode would be SPARSE, which may make sense for the traditional use cases of Cassandra secondary indexes - cardinality is not too high and not too low. Syntax-wise, OPTIONS can only be specified when USING is specified. That would only be an issue if there weren't keywords for all the SASI options. I vaguely recall [~jbellis] objecting some time ago in some completely unrelated context about cluttering up the CQL syntax with lots of keywords for options, so it might make sense to loosen up the CREATE INDEX syntax to allow WITH OPTIONS even when a class is not specified. The mode might make sense as a keyword, but then we get to the analyzer class and case sensitivity and the keyword clutter would start getting out of hand. > Consider making SASI the default index implementation > - > > Key: CASSANDRA-11075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11075 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Pavel Yaskevich > > We now have 2 secondary index implementation in tree: the old native ones and > SASI. Moving forward, that feels like one too much to maintain, especially > since it seems that SASI is an overall better implementation. > So we should gather enough data to decide if SASI is indeed always better (or > at least sufficiently better than we're convinced no-one would want to stick > with the native implementation), and if that's the case, we should consider > making it the default (and ultimately get rid of the current implementation). 
> So first, we should at least: > # double check that SASI handles all cases that the native implementation > handles. A good start would probably be run all our dtest and utests on a > version where SASI is hard-coded as default. > # compare the performance of SASI and native indexes. In particular our > native indexes, in all their weaknesses, have the advantage of not doing a > read-before-write. Haven't looked at SASI much so I don't know if it's the > case but anyway, we need numbers on both reads and writes. > Once we have that, if we do decide to make SASI the default, then we need to > figure out what is the upgrade path (and whether we add extra syntax for SASI > specific options). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
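The syntax point in the comment above, sketched concretely (table and index names are illustrative; note that WITH OPTIONS is currently tied to the USING clause):

```sql
-- Native secondary index today: no class name, and no WITH OPTIONS permitted.
CREATE INDEX users_state_idx ON users (state);

-- SASI today: options such as mode are only reachable through an explicit
-- USING clause, which is what the comment suggests loosening.
CREATE CUSTOM INDEX users_state_sasi ON users (state)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'SPARSE'};
```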
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119654#comment-15119654 ] Jack Krupansky commented on CASSANDRA-11067: Thanks, [~slebresne], for opening that separate issue. My apologies for taking advantage of the vague and general wording of the title/summary of this particular Jira. I had considered making my suggestions on the original ticket, but didn't when I saw that it was already "closed" and this one is suggestively labeled "Improve SASI syntax" (rather than "Restore = semantics for SASI".) Again, sorry for the distraction from getting SASI done for 3.4 ASAP. > Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117501#comment-15117501 ] Jack Krupansky commented on CASSANDRA-11067: Awesome! Watch out, SQL! One more nit... The fact that a SASI index needs to be "CUSTOM" and an explicit class name is needed feels a little hokey to me. Is there a longer-term plan to fully integrate SASI so it is a first-class feature rather than simply an add-on? In fact, is there any reason not to make it the default secondary indexing (other than the fact that it is new and experimental and unproven in the real world yet)? Having the mode be a keyword rather than all this extra lexical distraction would feel better to me. But if this is billed as experimental in 3.4, maybe there is no real harm in deferring first-class status until a future feature release. Still, it would be nice to be able to say CREATE PREFIX INDEX or CREATE SUFFIX INDEX or CREATE SPARSE INDEX. > Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10699) Make schema alterations strongly consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116229#comment-15116229 ] Jack Krupansky commented on CASSANDRA-10699: Will resolution of this ticket enable concurrent clients to successfully perform CREATE TABLE IF NOT EXISTS? Or will that still be problematic? I just want to know if this is the ticket to point people to for concurrent CREATE TABLE IF NOT EXISTS issues. In the mean time, should we update the doc to effectively say that concurrent CREATE TABLE IF NOT EXISTS is not supported and that it is the responsibility of the user to absolutely refrain from attempting any potentially concurrent attempts to CREATE TABLE IF NOT EXISTS for a given table? A related doc issue is how the user can tell that the CREATE TABLE has successfully completed around the ring. IOW, if cqlsh returns success, is the table really created on all nodes? Is a nodetool tablestats a reliable check - if all nodes are listed then the CREATE TABLE has succeeded/completed on all nodes? > Make schema alterations strongly consistent > --- > > Key: CASSANDRA-10699 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10699 > Project: Cassandra > Issue Type: Sub-task >Reporter: Aleksey Yeschenko > Fix For: 3.x > > > Schema changes do not necessarily commute. This has been the case before > CASSANDRA-5202, but now is particularly problematic. > We should employ a strongly consistent protocol instead of relying on > marshalling {{Mutation}} objects with schema changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
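For clarity, the racy pattern the comment above asks about is nothing more exotic than this (schema illustrative):

```sql
-- Until schema alterations are strongly consistent, two clients issuing this
-- statement concurrently can still race and leave the cluster in schema
-- disagreement, so DDL is best serialized through a single client.
CREATE TABLE IF NOT EXISTS events (
    id uuid PRIMARY KEY,
    payload text
);
```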
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115501#comment-15115501 ] Jack Krupansky commented on CASSANDRA-11067: How about a severely restricted LIKE that only permits patterns ending with a "%" for prefix query (LIKE 'J%'), with a "%" at either end for contains (LIKE '%abc%'), or starting with a "%" for suffix query (LIKE '%smith')? Then it would be fully compatible with SQL. In any case, "=" would then attempt an exact match using the SASI index? That would allow both exact and inexact matching for each column using a single index. If we can't have this restricted LIKE, descriptive keyword operators like SUFFIX and PREFIX would seem desirable. Could the existing CONTAINS operator also be used? They would also handle the case where the prefix/suffix/contains string is a parameter - otherwise the user has to do a messy concat. > Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Sam Tunnicliffe > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
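The three restricted pattern shapes proposed in the comment above would look like this in queries (table and column names illustrative; at the time of this thread LIKE was still only a proposal for SASI):

```sql
SELECT * FROM users WHERE last_name LIKE 'Smi%';    -- prefix match
SELECT * FROM users WHERE last_name LIKE '%mit%';   -- contains match
SELECT * FROM users WHERE last_name LIKE '%smith';  -- suffix match

-- while = would then mean an exact match served by the same SASI index:
SELECT * FROM users WHERE last_name = 'Smith';
```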
[jira] [Commented] (CASSANDRA-10937) OOM on multiple nodes on write load (v. 3.0.0), problem also present on DSE-4.8.3, but there it survives more time
[ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114614#comment-15114614 ] Jack Krupansky commented on CASSANDRA-10937: bq. Number of keys (estimate): 10142095 That indicates that you have over 99% of your data on a single node, which is a slam-dunk antipattern. Check the numbers to make sure what you posted are valid, and if so, you'll need to redesign your partition key to distribute the data to more partition keys so that they get assigned to other nodes. And if your client is sending INSERT requests to the various nodes of your cluster, five of them will have to forward those requests to that one node. You need to get this resolved before attempting anything else. Was this with RF=1? Presumably since those INSERTS are not being replicated to another node, or else the key count would have been roughly comparable on that other node. > OOM on multiple nodes on write load (v. 3.0.0), problem also present on > DSE-4.8.3, but there it survives more time > -- > > Key: CASSANDRA-10937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10937 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra : 3.0.0 > Installed as open archive, no connection to any OS specific installer. > Java: > Java(TM) SE Runtime Environment (build 1.8.0_65-b17) > OS : > Linux version 2.6.32-431.el6.x86_64 > (mockbu...@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red > Hat 4.4.7-4) (GCC) ) #1 SMP Sun Nov 10 22:19:54 EST 2013 > We have: > 8 guests ( Linux OS as above) on 2 (VMWare managed) physical hosts. Each > physical host keeps 4 guests. > Physical host parameters(shared by all 4 guests): > Model: HP ProLiant DL380 Gen9 > Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz > 46 logical processors. > Hyperthreading - enabled > Each guest assigned to have: > 1 disk 300 Gb for seq. log (NOT SSD) > 1 disk 4T for data (NOT SSD) > 11 CPU cores > Disks are local, not shared. 
> Memory on each host - 24 Gb total. > 8 (or 6, tested both) Gb - cassandra heap > (lshw and cpuinfo attached in file test2.rar) >Reporter: Peter Kovgan >Priority: Critical > Attachments: cassandra-to-jack-krupansky.docx, gc-stat.txt, > more-logs.rar, some-heap-stats.rar, test2.rar, test3.rar, test4.rar, > test5.rar, test_2.1.rar, test_2.1_logs_older.rar, > test_2.1_restart_attempt_log.rar > > > 8 cassandra nodes. > Load test started with 4 clients(different and not equal machines), each > running 1000 threads. > Each thread assigned in round-robin way to run one of 4 different inserts. > Consistency->ONE. > I attach the full CQL schema of tables and the query of insert. > Replication factor - 2: > create keyspace OBLREPOSITORY_NY with replication = > {'class':'NetworkTopologyStrategy','NY':2}; > Initiall throughput is: > 215.000 inserts /sec > or > 54Mb/sec, considering single insert size a bit larger than 256byte. > Data: > all fields(5-6) are short strings, except one is BLOB of 256 bytes. > After about a 2-3 hours of work, I was forced to increase timeout from 2000 > to 5000ms, for some requests failed for short timeout. > Later on(after aprox. 12 hous of work) OOM happens on multiple nodes. > (all failed nodes logs attached) > I attach also java load client and instructions how set-up and use > it.(test2.rar) > Update: > Later on test repeated with lesser load (10 mes/sec) with more relaxed > CPU (idle 25%), with only 2 test clients, but anyway test failed. > Update: > DSE-4.8.3 also failed on OOM (3 nodes from 8), but here it survived 48 hours, > not 10-12. > Attachments: > test2.rar -contains most of material > more-logs.rar - contains additional nodes logs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
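The partition-key redesign suggested in the comment above usually means introducing a synthetic bucket component so writes spread across many partitions. A hedged sketch only; the names and bucket count are illustrative, not taken from the reporter's schema:

```sql
-- Writers compute bucket = hash(source_id) % 16 (or round-robin), so the
-- rows for one logical source land on 16 partitions spread around the ring
-- instead of piling onto a single node.
CREATE TABLE messages (
    source_id text,
    bucket int,
    ts timeuuid,
    payload blob,
    PRIMARY KEY ((source_id, bucket), ts)
);
```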
[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113836#comment-15113836 ] Jack Krupansky edited comment on CASSANDRA-10661 at 1/23/16 6:57 PM: - Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc. Or, can I indeed have two indexes on a single column, one a traditional exact match, and one a prefix match. Hmmm... in which case, which gets used if I just specify a column name? CREATE INDEX first_name_full ON mytable (first_name)... CREATE CUSTOM INDEX first_name_prefix ON mytable (first_name)... (I may be confused here - can you specify an index name in place of a column name in a relation in a SELECT/WHERE clause (SELECT... WHERE... first_name_exact = 'Joe')? I don't see any doc/spec that indicates that you can. I'm not sure why I thought that you could. But I don't see any code that detects and fails on this case at CREATE INDEX time. The code checks for "everything but name" rather than detecting two non-keys/values indexes on the same column.) It would be good to have an example that illustrates this. In fact, I would argue that first and last names are perfect examples of where you really do need to query on both exact match and partial match. In fact, I'm not sure I can think of any examples of non-tokenized text fields where you don't want to reserve the ability to find an exact match even if you do need partial matches for some queries. Will SPARSE mode in fact give me an exact match? (Sounds like it.) 
In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.) Are there any use cases of traditional Cassandra indexes which shouldn't almost automatically be converted to SPARSE. After all, the current recommended best practice is to avoid secondary indexes where the column cardinality is either very high or very low, which seems to be a match for SPARSE, although the precise meaning of SPARSE is still a bit fuzzy for me. Maybe, for the first_name use case I mentioned the user would be better off with a first_name Materialized View using first_name in the PK instead of the SPARSE SASI index. In fact, by placing first_name in the partition key of the MV I could assure that all base table rows with the same first name would be on the same node. If all of that is true, we will need to give users some decent guidance on when to use SPARSE SASI vs. MV (vs. classic secondary... or even DSE Search.)
was (Author: jkrupan): Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc. Or, can I indeed have two indexes on a single column, one a traditional exact match, and one a prefix match. Hmmm... in which case, which gets used if I just specify a column name? CREATE INDEX first_name_full ON table CREATE CUSTOM INDEX first_name_prefix ... It would be good to have an example that illustrates this. In fact, I would argue that first and last names are perfect examples of where you really do need to query on both exact match and partial match. In fact, I'm not sure I can think of any examples of non-tokenized text fields where you don't want to reserve the ability to find an exact match even if you do need partial matches for some queries. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.) Are there any use cases of traditional Cassandra indexes which shouldn't almost automatically be converted to SPARSE. After all, the current recommended best practice is to avoid secondary indexes where the column cardinality is either very high or very low, which seems to be a match for SPARSE, although the precise meaning of SPARSE is still a bit fuzzy for me. 
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113836#comment-15113836 ] Jack Krupansky commented on CASSANDRA-10661: Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc. Or, can I indeed have two indexes on a single column, one a traditional exact match, and one a prefix match. Hmmm... in which case, which gets used if I just specify a column name? CREATE INDEX first_name_full ON table CREATE CUSTOM INDEX first_name_prefix ... It would be good to have an example that illustrates this. In fact, I would argue that first and last names are perfect examples of where you really do need to query on both exact match and partial match. In fact, I'm not sure I can think of any examples of non-tokenized text fields where you don't want to reserve the ability to find an exact match even if you do need partial matches for some queries. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.) Are there any use cases of traditional Cassandra indexes which shouldn't almost automatically be converted to SPARSE. After all, the current recommended best practice is to avoid secondary indexes where the column cardinality is either very high or very low, which seems to be a match for SPARSE, although the precise meaning of SPARSE is still a bit fuzzy for me. 
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113816#comment-15113816 ] Jack Krupansky commented on CASSANDRA-10661: So is this stuff actually ready to release? I mean, consistent with the new philosophy that "trunk is always releasable"? IOW, if it does get committed, it will be in 3.4 no matter what? I only ask because it just seemed that there was stuff in flux fairly recently (a couple of days ago), suggesting it wasn't quite baked enough to be considered "releasable". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9472) Reintroduce off heap memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112647#comment-15112647 ] Jack Krupansky commented on CASSANDRA-9472: --- A couple quick questions: 1. Does this Jira move memtables "entirely" offheap, or just "partially"? (Back in July the discussion was that fully offheap was too large an effort.) 2. Is there still an "arena" allocation onheap? 3. What ballpark fraction of a typical Cassandra heap is consumed by memtables - 80%, more, less? 4. Does moving memtables offheap get Cassandra to the point where a default JVM heap allocation is sufficient? If not, please be sure to offer new recommended best practice guidance as to how to estimate heap requirements when memtables are offheap. 5. What heuristic rule/threshold is used to determine how much of system memory can be consumed by offheap memtables? Is that limit user-controllable by a (documented) configuration setting? 6. Are offheap memtables an optional configuration setting, or hardwired? 7. Is this coming soon, like 3.4, or is it still a ways off? > Reintroduce off heap memtables > -- > > Key: CASSANDRA-9472 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9472 > Project: Cassandra > Issue Type: Improvement >Reporter: Benedict >Assignee: Benedict > Fix For: 3.x > > > CASSANDRA-8099 removes off heap memtables. We should reintroduce them ASAP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
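On questions 5 and 6 above: in the 2.1-era implementation that CASSANDRA-8099 removed, off-heap memtables were an optional, user-controllable setting in cassandra.yaml, with an off-heap cap separate from the on-heap cap. A sketch of those knobs (names as of the 2.1-era yaml; whether the reintroduction keeps them is exactly what this ticket will determine):

```yaml
# Where memtable cells live: heap_buffers (default, fully on-heap),
# offheap_buffers (cell values off-heap), or offheap_objects
# (entire cells off-heap). The off-heap modes are opt-in, not hardwired.
memtable_allocation_type: offheap_objects

# Independent caps for the on-heap and off-heap portions of memtables;
# each defaults to 1/4 of the JVM heap size when left unset.
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
```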
[jira] [Commented] (CASSANDRA-10937) OOM on multiple nodes on write load (v. 3.0.0), problem also present on DSE-4.8.3, but there it survives more time
[ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110798#comment-15110798 ] Jack Krupansky commented on CASSANDRA-10937: A few more questions: 1. When nodes do crash, what happens when you restart them? Do they immediately crash again, or run for many hours? 2. Is it just a single node crashing, or do all the nodes fail around the same time, like falling dominoes? Just to be clear, the fact that the cluster seemed fine for 48 hours does not tell us whether it might have been near the edge of failing for quite some time; maybe the precise pattern of load just statistically became the straw that broke the camel's back at that moment. That's why it's important to know what happened after you restarted and resumed the test following the crash at 48 hours. If it really was a resource leak, then reducing the heap would make the failure occur sooner. Determine what the minimal heap size is to run the test at all - set it low enough so the test won't run even for a minute, then increase the heap so it does run, then decrease it by less than you increased it - a binary search for the exact heap size that is needed for the test to run even for a few minutes or an hour. At least then you would have an easy-to-reproduce test case. So if you can tune the heap so that the test can run successfully for say 10 minutes before reliably hitting the OOM, then you can see how much you need to reduce the load (throttling the app) to be able to run without hitting OOM. I'm not saying that there is absolutely no chance that there is a resource leak, just that there are still a lot of open questions to answer about usage before we can leap to that conclusion. Ultimately, we do have to have a reliable repro test case before anything can be done.
In any case, at least at this stage it seems clear that you probably do need a much larger cluster (more nodes with less load on each node). Yes, it's unfortunate that Cassandra won't give you a nice clean message saying so, but that ultimate requirement remains unchanged - pending answers to all of the open questions. > OOM on multiple nodes on write load (v. 3.0.0), problem also present on > DSE-4.8.3, but there it survives more time > -- > > Key: CASSANDRA-10937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10937 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra : 3.0.0 > Installed as open archive, no connection to any OS specific installer. > Java: > Java(TM) SE Runtime Environment (build 1.8.0_65-b17) > OS : > Linux version 2.6.32-431.el6.x86_64 > (mockbu...@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red > Hat 4.4.7-4) (GCC) ) #1 SMP Sun Nov 10 22:19:54 EST 2013 > We have: > 8 guests ( Linux OS as above) on 2 (VMWare managed) physical hosts. Each > physical host keeps 4 guests. > Physical host parameters(shared by all 4 guests): > Model: HP ProLiant DL380 Gen9 > Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz > 46 logical processors. > Hyperthreading - enabled > Each guest assigned to have: > 1 disk 300 Gb for seq. log (NOT SSD) > 1 disk 4T for data (NOT SSD) > 11 CPU cores > Disks are local, not shared. > Memory on each host - 24 Gb total. > 8 (or 6, tested both) Gb - cassandra heap > (lshw and cpuinfo attached in file test2.rar) >Reporter: Peter Kovgan >Priority: Critical > Attachments: cassandra-to-jack-krupansky.docx, gc-stat.txt, > more-logs.rar, some-heap-stats.rar, test2.rar, test3.rar, test4.rar, > test5.rar, test_2.1.rar, test_2.1_logs_older.rar, > test_2.1_restart_attempt_log.rar > > > 8 cassandra nodes. > Load test started with 4 clients(different and not equal machines), each > running 1000 threads. > Each thread assigned in round-robin way to run one of 4 different inserts. > Consistency->ONE. 
> I attach the full CQL schema of tables and the insert query. > Replication factor - 2: > create keyspace OBLREPOSITORY_NY with replication = > {'class':'NetworkTopologyStrategy','NY':2}; > Initial throughput is: > 215,000 inserts/sec > or > 54Mb/sec, considering a single insert size a bit larger than 256 bytes. > Data: > all fields(5-6) are short strings, except one is a BLOB of 256 bytes. > After about 2-3 hours of work, I was forced to increase the timeout from 2000 > to 5000ms, as some requests failed due to the short timeout. > Later on (after approx. 12 hours of work) OOM happens on multiple nodes. > (all failed nodes' logs attached) > I also attach the java load client and instructions on how to set up and use > it. (test2.rar) > Update: > Later the test was repeated with lesser load (10 mes/sec) with more relaxed > CPU (idle 25%), with only 2 test c
[jira] [Commented] (CASSANDRA-10937) OOM on multiple nodes on write load (v. 3.0.0), problem also present on DSE-4.8.3, but there it survives more time
[ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110761#comment-15110761 ] Jack Krupansky commented on CASSANDRA-10937: Sorry, [~tierhetze], but as a matter of policy I don't download or read doc/docx files. Please post the essential text here. > OOM on multiple nodes on write load (v. 3.0.0), problem also present on > DSE-4.8.3, but there it survives more time > -- > > Key: CASSANDRA-10937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10937 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra : 3.0.0 > Installed as open archive, no connection to any OS specific installer. > Java: > Java(TM) SE Runtime Environment (build 1.8.0_65-b17) > OS : > Linux version 2.6.32-431.el6.x86_64 > (mockbu...@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red > Hat 4.4.7-4) (GCC) ) #1 SMP Sun Nov 10 22:19:54 EST 2013 > We have: > 8 guests ( Linux OS as above) on 2 (VMWare managed) physical hosts. Each > physical host keeps 4 guests. > Physical host parameters(shared by all 4 guests): > Model: HP ProLiant DL380 Gen9 > Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz > 46 logical processors. > Hyperthreading - enabled > Each guest assigned to have: > 1 disk 300 Gb for seq. log (NOT SSD) > 1 disk 4T for data (NOT SSD) > 11 CPU cores > Disks are local, not shared. > Memory on each host - 24 Gb total. > 8 (or 6, tested both) Gb - cassandra heap > (lshw and cpuinfo attached in file test2.rar) >Reporter: Peter Kovgan >Priority: Critical > Attachments: cassandra-to-jack-krupansky.docx, gc-stat.txt, > more-logs.rar, some-heap-stats.rar, test2.rar, test3.rar, test4.rar, > test5.rar, test_2.1.rar, test_2.1_logs_older.rar, > test_2.1_restart_attempt_log.rar > > > 8 cassandra nodes. > Load test started with 4 clients(different and not equal machines), each > running 1000 threads. > Each thread assigned in round-robin way to run one of 4 different inserts. 
> Consistency->ONE. > I attach the full CQL schema of tables and the insert query. > Replication factor - 2: > create keyspace OBLREPOSITORY_NY with replication = > {'class':'NetworkTopologyStrategy','NY':2}; > Initial throughput is: > 215,000 inserts/sec > or > 54Mb/sec, considering a single insert size a bit larger than 256 bytes. > Data: > all fields(5-6) are short strings, except one is a BLOB of 256 bytes. > After about 2-3 hours of work, I was forced to increase the timeout from 2000 > to 5000ms, as some requests failed due to the short timeout. > Later on (after approx. 12 hours of work) OOM happens on multiple nodes. > (all failed nodes' logs attached) > I also attach the java load client and instructions on how to set up and use > it. (test2.rar) > Update: > Later the test was repeated with lesser load (10 mes/sec) with more relaxed > CPU (idle 25%), with only 2 test clients, but the test failed anyway. > Update: > DSE-4.8.3 also failed on OOM (3 nodes from 8), but here it survived 48 hours, > not 10-12. > Attachments: > test2.rar - contains most of the material > more-logs.rar - contains additional nodes' logs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
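The heap-size bisection suggested a couple of messages up can be sketched in a few lines. This is only an illustration of the search logic; `test_runs` is a hypothetical placeholder predicate standing in for "start a node with this heap and see whether the load test survives, say, 10 minutes":

```python
def find_min_heap_mb(test_runs, lo_mb=512, hi_mb=8192):
    """Binary-search the smallest heap size (in MB) at which the load
    test survives, per the procedure described in the thread.

    test_runs(heap_mb) -> bool is a placeholder: True if a node started
    with that heap survives the chosen test window.
    Precondition: the test fails at lo_mb and passes at hi_mb.
    """
    while hi_mb - lo_mb > 64:          # stop at a 64 MB resolution
        mid_mb = (lo_mb + hi_mb) // 2
        if test_runs(mid_mb):
            hi_mb = mid_mb             # passed: try a smaller heap
        else:
            lo_mb = mid_mb             # failed: need a bigger heap
    return hi_mb                       # smallest known-good heap size
```

Once the minimal viable heap is pinned down this way, the same bisection idea can be applied to client load (requests/sec) at a fixed heap to find the throughput ceiling.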
[jira] [Comment Edited] (CASSANDRA-10937) OOM on multiple nodes on write load (v. 3.0.0), problem also present on DSE-4.8.3, but there it survives more time
[ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105495#comment-15105495 ] Jack Krupansky edited comment on CASSANDRA-10937 at 1/18/16 5:16 PM: - I still don't see any reason to believe that there is a bug here; the primary issue is that you are overloading the cluster. Sure, Cassandra should do a better job of shedding/failing excessive incoming requests, and there is an open Jira ticket to add just such a feature, but even with that new feature, the net effect will be the same - it will still be up to the application and operations to properly size the cluster and throttle application load before it gets to Cassandra. OOM is not typically an indication of a software bug. Sure, sometimes code has memory leaks, but with a highly dynamic system such as Cassandra, it typically means either a misconfigured JVM or just very heavy load. Sometimes OOM simply means that there is a lot of background processing going on (like compactions or hinted handoff) that is having trouble keeping up with incoming requests. Sometimes OOM occurs because you have too large a heap, which defers GC, but then GC takes too long and further incoming requests simply generate more pressure on the heap faster than that massive GC can deal with it. It is indeed tricky to make sure the JVM has enough heap but not too much. DSE typically runs with a larger heap by default. You can try increasing your heap to 10 or 12 GB. But if you make the heap too big, the big GC can bite you as described above. In that case, the heap needs to be reduced. Typically you don't need a heap smaller than 8 GB. If OOM occurs with an 8 GB heap, it typically means the load on that node is simply too heavy. 
Be sure to review the reasonable recommendations in this blog post: http://www.datastax.com/dev/blog/how-not-to-benchmark-cassandra A few questions that will help us better understand what you are really trying to do: 1. How much reading are you doing, and when, relative to writes? 2. Are you doing any updates or deletes? (These cause compaction, which can fall behind your write/update load.) 3. How much data is on the cluster (rows)? 4. How many tables? 5. What RF? RF=3 would be the recommendation, but if you have a heavy read load you may need RF=5, although heavy load usually means you just need a lot more nodes so that the fraction of incoming requests going to a particular node is dramatically reduced. RF>3 is only needed if there is high load for each particular row or partition. 6. Have you tested using cassandra-stress? That's the gold standard around here. 7. Are your clients using token-aware routing? (Otherwise a write must be bounced from the coordinating node to the node owning the token for the partition key.) 8. Are you using batches for your writes? If so, do all the writes in one batch have the same partition key? (If not, this adds more network hops.) 9. What expectations did you have as to how many writes/reads a given number of nodes should be able to handle? 
[jira] [Commented] (CASSANDRA-10937) OOM on multiple nodes on write load (v. 3.0.0), problem also present on DSE-4.8.3, but there it survives more time
[ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105495#comment-15105495 ] Jack Krupansky commented on CASSANDRA-10937: I still don't see any reason to believe that there is a bug here; the primary issue is that you are overloading the cluster. Sure, Cassandra should do a better job of shedding/failing excessive incoming requests, and there is an open Jira ticket to add just such a feature, but even with that new feature, the net effect will be the same - it will still be up to the application and operations to properly size the cluster and throttle application load before it gets to Cassandra. OOM is not typically an indication of a software bug. Sure, sometimes code has memory leaks, but with a highly dynamic system such as Cassandra, it typically means either a misconfigured JVM or just very heavy load. Sometimes OOM simply means that there is a lot of background processing going on (like compactions or hinted handoff) that is having trouble keeping up with incoming requests. Sometimes OOM occurs because you have too large a heap, which defers GC, but then GC takes too long and further incoming requests simply generate more pressure on the heap faster than that massive GC can deal with it. It is indeed tricky to make sure the JVM has enough heap but not too much. DSE typically runs with a larger heap by default. You can try increasing your heap to 10 or 12 GB. But if you make the heap too big, the big GC can bite you as described above. In that case, the heap needs to be reduced. Typically you don't need a heap smaller than 8 GB. If OOM occurs with an 8 GB heap, it typically means the load on that node is simply too heavy. Be sure to review the reasonable recommendations in this blog post: http://www.datastax.com/dev/blog/how-not-to-benchmark-cassandra A few questions that will help us better understand what you are really trying to do: 1. 
How much reading are you doing, and when, relative to writes? 2. Are you doing any updates or deletes? (These cause compaction, which can fall behind your write/update load.) 3. How much data is on the cluster (rows)? 4. How many tables? 5. What RF? RF=3 would be the recommendation, but if you have a heavy read load you may need RF=5. 6. Have you tested using cassandra-stress? That's the gold standard around here. 7. Are your clients using token-aware routing? (Otherwise a write must be bounced from the coordinating node to the node owning the token for the partition key.) 8. Are you using batches for your writes? If so, do all the writes in one batch have the same partition key? (If not, this adds more network hops.) > OOM on multiple nodes on write load (v. 3.0.0), problem also present on > DSE-4.8.3, but there it survives more time > -- > > Key: CASSANDRA-10937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10937 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra : 3.0.0 > Installed as open archive, no connection to any OS specific installer. > Java: > Java(TM) SE Runtime Environment (build 1.8.0_65-b17) > OS : > Linux version 2.6.32-431.el6.x86_64 > (mockbu...@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red > Hat 4.4.7-4) (GCC) ) #1 SMP Sun Nov 10 22:19:54 EST 2013 > We have: > 8 guests ( Linux OS as above) on 2 (VMWare managed) physical hosts. Each > physical host keeps 4 guests. > Physical host parameters(shared by all 4 guests): > Model: HP ProLiant DL380 Gen9 > Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz > 46 logical processors. > Hyperthreading - enabled > Each guest assigned to have: > 1 disk 300 Gb for seq. log (NOT SSD) > 1 disk 4T for data (NOT SSD) > 11 CPU cores > Disks are local, not shared. > Memory on each host - 24 Gb total. 
> 8 (or 6, tested both) Gb - cassandra heap > (lshw and cpuinfo attached in file test2.rar) >Reporter: Peter Kovgan >Priority: Critical > Attachments: gc-stat.txt, more-logs.rar, some-heap-stats.rar, > test2.rar, test3.rar, test4.rar, test5.rar, test_2.1.rar, > test_2.1_logs_older.rar, test_2.1_restart_attempt_log.rar > > > 8 cassandra nodes. > Load test started with 4 clients(different and not equal machines), each > running 1000 threads. > Each thread assigned in round-robin way to run one of 4 different inserts. > Consistency->ONE. > I attach the full CQL schema of tables and the query of insert. > Replication factor - 2: > create keyspace OBLREPOSITORY_NY with replication = > {'class':'NetworkTopologyStrategy','NY':2}; > Initiall throughput is: > 215.000 inserts /sec > or > 54Mb/sec, considering single insert size a bit larger than 256byte. > Data: > all fields(5-6) are short strings, e
[jira] [Commented] (CASSANDRA-10922) Inconsistent query results
[ https://issues.apache.org/jira/browse/CASSANDRA-10922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102056#comment-15102056 ] Jack Krupansky commented on CASSANDRA-10922: Normally this kind of investigation should be pursued on the user list before assuming that there is an actual bug, but now that we are here... 1. Describe how you got to this situation - did you upgrade the old cluster or copy and import sstables, or... exactly what? 2. Provide the schema. I'm anxious to know the type of that column and exactly why you are using hex. 3. Create a dummy table with exactly the same schema and compose INSERT statements that insert the data you are querying. Does the query work fine on that dummy table? Post that schema, INSERTs, SELECTs, and output here. 4. Create that same dummy table in a single-node test cluster running C* 2.2.3, execute those dummy INSERTs, see that the query works the way it used to, "upgrade" that test database to C* 3.0.2 the same way you did your main cluster, and see if the query fails in the way you have reported. If it doesn't... then nobody here will have much to go on. In any case, be sure to post the exact steps you used. That's a lot of work to do, but start by posting the schema and the output of the query that shows both rows. > Inconsistent query results > -- > > Key: CASSANDRA-10922 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10922 > Project: Cassandra > Issue Type: Bug >Reporter: Maxim Podkolzine >Priority: Critical > > I have a DB created with Cassandra 2.2.3. And currently I'm running it by > Cassandra 3.0.2. > The value of a particular cell is returned depending on the query I run (in > cqlsh): > - returned when iterate all columns, i.e. 
> SELECT value FROM "3xupsource".Content WHERE databaseid=0x2112 LIMIT 2 > (I can see the columns 0x and 0x0100 there, the values seem > correct) > - not returned when I specify a particular column > SELECT value FROM "3xupsource".Content WHERE databaseid=0x2112 AND > columnid=0x0100 > Other queries like SELECT value FROM "3xupsource".Content WHERE > databaseid=0x2112 AND columnid=0x work consistently. > There is nothing in Cassandra error log, so it does not look like a > corruption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
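To make step 3 of the request above concrete, here is the kind of minimal repro script being asked for. The schema is hypothetical - inferred only from the queries in the report (a blob partition key databaseid and a blob clustering column columnid) - so the reporter would need to substitute the real schema and data:

```sql
-- Hypothetical schema, inferred from the reported queries.
CREATE TABLE "3xupsource".Content (
    databaseid blob,
    columnid   blob,
    value      blob,
    PRIMARY KEY (databaseid, columnid)
);

-- Insert a row like the one that goes missing (value is a placeholder).
INSERT INTO "3xupsource".Content (databaseid, columnid, value)
VALUES (0x2112, 0x0100, 0xCAFE);

-- Both of these should return that row; per the report, only the first does.
SELECT value FROM "3xupsource".Content WHERE databaseid = 0x2112 LIMIT 2;
SELECT value FROM "3xupsource".Content
 WHERE databaseid = 0x2112 AND columnid = 0x0100;
```

If the same script shows the row under 2.2.3 but loses it after the same upgrade path to 3.0.2, that is a self-contained test case this ticket can act on.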
[jira] [Commented] (CASSANDRA-10528) Proposal: Integrate RxJava
[ https://issues.apache.org/jira/browse/CASSANDRA-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096425#comment-15096425 ] Jack Krupansky commented on CASSANDRA-10528: Pardon my interruption of the relevant discussion flow here, but can somebody point me to a deeper discussion of exactly how TPC applies to more complex requests? I mean, I can fully grasp TPC for the common use case where the client is doing a point query (exact row PK specified) with a token-aware driver and the row is fully in a memtable - a reasonably direct control flow path - but what about all the semi-common access patterns that are not so direct, namely anything that would require I/O or a network hop or is fairly CPU-intensive for a non-trivial amount of time? The simplest example being a non-token-aware query that the coordinator node has to send to another node. Is this thread/core completely tied up while waiting for the remote response across the network? IOW, is the redesign for a 100% pure-TPC architecture, or is it for a hybrid, with TPC only for some use cases and then SEDA (queuing) when the control flow path is no longer direct and fast? And what of requests that become I/O intensive, such as sstables when there has been heavy updating and compaction has fallen behind (maybe because it doesn't have enough threads)? And then there are scan-intensive operations that are just going to take a long time. Wouldn't it be architecturally better to break them into chunks such that each chunk gets TPC treatment while the overall aggregation gets queue/SEDA treatment, so that such resource-intensive operations don't interfere with the higher-volume, lower-latency point queries that TPC does better with? And then there are scatter-gather type queries (especially DSE Search/Solr/Lucene) which have a much greater network latency factor. 
First, tying up a full thread/core while these requests are mostly sitting idle waiting for the network seems excessive. Second, a queued/SEDA model that supports chunking/partitioning the overall request down to more atomic (TPC) requests so that they can run in parallel on multiple threads/cores would seem highly desirable. In short, will SEDA be completely gone or just TPC added for the cases where it is most relevant? Thanks! In any case, it's great to see architectural progress on this front. > Proposal: Integrate RxJava > -- > > Key: CASSANDRA-10528 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10528 > Project: Cassandra > Issue Type: Improvement >Reporter: T Jake Luciani >Assignee: T Jake Luciani > Fix For: 3.x > > Attachments: rxjava-stress.png > > > The purpose of this ticket is to discuss the merits of integrating the > [RxJava|https://github.com/ReactiveX/RxJava] framework into C*. Enabling us > to incrementally make the internals of C* async and move away from SEDA to a > more modern thread per core architecture. > Related tickets: >* CASSANDRA-8520 >* CASSANDRA-8457 >* CASSANDRA-5239 >* CASSANDRA-7040 >* CASSANDRA-5863 >* CASSANDRA-6696 >* CASSANDRA-7392 > My *primary* goals in raising this issue are to provide a way of: > * *Incrementally* making the backend async > * Avoiding code complexity/readability issues > * Avoiding NIH where possible > * Building on an extendable library > My *non*-goals in raising this issue are: > >* Rewrite the entire database in one big bang >* Write our own async api/framework > > - > I've attempted to integrate RxJava a while back and found it not ready mainly > due to our lack of lambda support. Now with Java 8 I've found it very > enjoyable and have not hit any performance issues. A gentle introduction to > RxJava is [here|http://blog.danlew.net/2014/09/15/grokking-rxjava-part-1/] as > well as their > [wiki|https://github.com/ReactiveX/RxJava/wiki/Additional-Reading]. 
The > primary concept of RX is the > [Observable|http://reactivex.io/documentation/observable.html] which is > essentially a stream of stuff you can subscribe to and act on, chain, etc. > This is quite similar to [Java 8 streams > api|http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html] > (or I should say streams api is similar to it). The difference is java 8 > streams can't be used for asynchronous events while RxJava can. > Another improvement since I last tried integrating RxJava is the completion > of CASSANDRA-8099 which provides a very iterable/incremental approach to > our storage engine. *Iterators and Observables are well paired conceptually > so morphing our current Storage engine to be async is much simpler now.* > In an e
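The pull-versus-push distinction the comment draws between iterators/streams and Observables can be sketched in a few lines. This is an illustrative Python analogue, not RxJava itself; the `Observable` class and its methods here are simplified stand-ins (real Rx adds schedulers, error channels, and backpressure):

```python
# Minimal push-based "observable" vs. a pull-based iterator.

class Observable:
    def __init__(self, source):
        self._source = source          # callable that accepts an emit() function

    def subscribe(self, on_next):
        self._source(on_next)          # the producer pushes items to the subscriber

    def map(self, fn):
        # Chain a transformation, the way Rx operators compose.
        return Observable(lambda emit: self._source(lambda x: emit(fn(x))))

# Pull model: the consumer drives iteration.
pulled = [x * 2 for x in iter([1, 2, 3])]

# Push model: the producer drives; consumers just react when items arrive.
pushed = []
Observable(lambda emit: [emit(x) for x in [1, 2, 3]]) \
    .map(lambda x: x * 2) \
    .subscribe(pushed.append)

assert pulled == pushed == [2, 4, 6]
```

The symmetry between the two loops is why iterators and Observables pair well conceptually: the same chain of transformations works whether the consumer pulls synchronously or the producer pushes asynchronously.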
[jira] [Commented] (CASSANDRA-10985) OOM during bulk read(slice query) operation
[ https://issues.apache.org/jira/browse/CASSANDRA-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090139#comment-15090139 ] Jack Krupansky commented on CASSANDRA-10985: How big a slice are you trying to read? I'd recommend no more than 5K columns in a single request and issue multiple requests. Very large operations are an anti-pattern even if they do manage to sort of work. Was this working before for you and suddenly stopped working or was this the first time you tried a slice of this size? You're dealing with Thrift, so don't expect too much support. > OOM during bulk read(slice query) operation > --- > > Key: CASSANDRA-10985 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10985 > Project: Cassandra > Issue Type: Bug > Components: Observability > Environment: OS : Linux 6.5 > RAM : 126GB > assign heap size: 8GB >Reporter: sumit thakur > > The thread java.lang.Thread @ 0x55000a4f0 Thrift:6 keeps local variables with > total size 16,214,953,728 (98.23%) bytes. > The memory is accumulated in one instance of "java.lang.Thread" loaded by > "". > The stacktrace of this Thread is available. See stacktrace. 
> Keywords > java.lang.Thread > -- > Trace: > Thrift:6 > at java.lang.OutOfMemoryError.()V (OutOfMemoryError.java:48) > at > org.apache.cassandra.utils.ByteBufferUtil.read(Ljava/io/DataInput;I)Ljava/nio/ByteBuffer; > (ByteBufferUtil.java:401) > at > org.apache.cassandra.utils.ByteBufferUtil.readWithVIntLength(Lorg/apache/cassandra/io/util/DataInputPlus;)Ljava/nio/ByteBuffer; > (ByteBufferUtil.java:339) > at > org.apache.cassandra.db.marshal.AbstractType.readValue(Lorg/apache/cassandra/io/util/DataInputPlus;)Ljava/nio/ByteBuffer; > (AbstractType.java:391) > at > org.apache.cassandra.db.rows.BufferCell$Serializer.deserialize(Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/LivenessInfo;Lorg/apache/cassandra/config/ColumnDefinition;Lorg/apache/cassandra/db/SerializationHeader;Lorg/apache/cassandra/db/rows/SerializationHelper;)Lorg/apache/cassandra/db/rows/Cell; > (BufferCell.java:298) > at > org.apache.cassandra.db.rows.UnfilteredSerializer.readSimpleColumn(Lorg/apache/cassandra/config/ColumnDefinition;Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/SerializationHeader;Lorg/apache/cassandra/db/rows/SerializationHelper;Lorg/apache/cassandra/db/rows/Row$Builder;Lorg/apache/cassandra/db/LivenessInfo;)V > (UnfilteredSerializer.java:453) > at > org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/SerializationHeader;Lorg/apache/cassandra/db/rows/SerializationHelper;IILorg/apache/cassandra/db/rows/Row$Builder;)Lorg/apache/cassandra/db/rows/Row; > (UnfilteredSerializer.java:431) > at > org.apache.cassandra.db.rows.UnfilteredSerializer.deserialize(Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/SerializationHeader;Lorg/apache/cassandra/db/rows/SerializationHelper;Lorg/apache/cassandra/db/rows/Row$Builder;)Lorg/apache/cassandra/db/rows/Unfiltered; > (UnfilteredSerializer.java:360) > at > 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext()Lorg/apache/cassandra/db/rows/Unfiltered; > (UnfilteredRowIteratorSerializer.java:217) > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext()Ljava/lang/Object; > (UnfilteredRowIteratorSerializer.java:210) > at org.apache.cassandra.utils.AbstractIterator.hasNext()Z > (AbstractIterator.java:47) > at org.apache.cassandra.db.transform.BaseRows.hasNext()Z (BaseRows.java:108) > at > org.apache.cassandra.db.LegacyLayout$3.computeNext()Lorg/apache/cassandra/db/LegacyLayout$LegacyCell; > (LegacyLayout.java:658) > at org.apache.cassandra.db.LegacyLayout$3.computeNext()Ljava/lang/Object; > (LegacyLayout.java:640) > at org.apache.cassandra.utils.AbstractIterator.hasNext()Z > (AbstractIterator.java:47) > at > org.apache.cassandra.thrift.CassandraServer.thriftifyColumns(Lorg/apache/cassandra/config/CFMetaData;Ljava/util/Iterator;)Ljava/util/List; > (CassandraServer.java:112) > at > org.apache.cassandra.thrift.CassandraServer.thriftifyPartition(Lorg/apache/cassandra/db/rows/RowIterator;ZZI)Ljava/util/List; > (CassandraServer.java:250) > at > org.apache.cassandra.thrift.CassandraServer.getSlice(Ljava/util/List;ZILorg/apache/cassandra/db/ConsistencyLevel;Lorg/apache/cassandra/service/ClientState;)Ljava/util/Map; > (CassandraServer.java:270) > at > org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(Ljava/lang/String;Ljava/util/List;Lorg/apache/cassandra/thrift/ColumnParent;ILorg/apache/cassandra/thrift/Sli
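The advice in the comment above — cap each slice at roughly 5K columns and issue multiple requests — amounts to simple client-side pagination. The sketch below is an illustrative Python helper, not the Thrift or driver API; `fetch_page` and its parameters are hypothetical stand-ins for whatever bounded slice call the client exposes:

```python
def read_in_pages(fetch_page, page_size=5000):
    """Read a large slice as a series of bounded requests.

    fetch_page(start, count) stands in for a driver slice call; it must
    return at most `count` items in order, starting at offset `start`.
    """
    start = 0
    while True:
        page = fetch_page(start, page_size)
        if not page:
            break
        yield from page
        if len(page) < page_size:
            break                      # short page means no more data
        start += len(page)

# Simulated backend with 12,345 "columns": three bounded requests instead
# of one 12K-item read that can blow out the coordinator's heap.
data = list(range(12345))
calls = []
def fetch(start, count):
    calls.append((start, count))
    return data[start:start + count]

result = list(read_in_pages(fetch))
assert result == data
assert len(calls) == 3
```

Each request now holds at most `page_size` cells in memory on the server, which is the point of the anti-pattern warning: the OOM in the stack trace comes from materializing the entire slice at once.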
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646738#comment-14646738 ] Jack Krupansky commented on CASSANDRA-6477: --- The CQL.textile for MV still shows parentheses around the selection list, which is not the case in SELECT. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 alpha 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9927) Security for MaterializedViews
[ https://issues.apache.org/jira/browse/CASSANDRA-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646736#comment-14646736 ] Jack Krupansky commented on CASSANDRA-9927: --- The CQL.textile for MV still shows parentheses being required around the selection list, which is not the case in SELECT. > Security for MaterializedViews > -- > > Key: CASSANDRA-9927 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9927 > Project: Cassandra > Issue Type: Task >Reporter: T Jake Luciani > Fix For: 3.0 beta 1 > > > We need to think about how to handle security wrt materialized views. Since > they are based on a source table we should possibly inherit the same security > model as that table. > However I can see cases where users would want to create different security > auth for different views. esp once we have CASSANDRA-9664 and users can > filter out sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (CASSANDRA-9927) Security for MaterializedViews
[ https://issues.apache.org/jira/browse/CASSANDRA-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Krupansky updated CASSANDRA-9927: -- Comment: was deleted (was: The CQL.textile for MV still shows parentheses being required around the selection list, which is not the case in SELECT.) > Security for MaterializedViews > -- > > Key: CASSANDRA-9927 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9927 > Project: Cassandra > Issue Type: Task >Reporter: T Jake Luciani > Fix For: 3.0 beta 1 > > > We need to think about how to handle security wrt materialized views. Since > they are based on a source table we should possibly inherit the same security > model as that table. > However I can see cases where users would want to create different security > auth for different views. esp once we have CASSANDRA-9664 and users can > filter out sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632464#comment-14632464 ] Jack Krupansky commented on CASSANDRA-6477: --- Are there any significant advantages or disadvantages of using an MV as a pure global index - no data columns other than the primary key columns? Consider the use case of large customer data rows with customer id as the primary key, and you wish to log in by any of customer id, user id, email address, social security number, full name and age or birth date, and name alone, but you really want to simply immediately map any of those alternative logins to the customer id so that the main customer data tables can be accessed directly rather than having all of the data replicated in a bunch of MVs. So, each of the four MVs would not need any non-PK data columns per se, since the base table PK is (must be, right?) in the MV PK, I think. Does this make sense? Would there be any special efficiency (or inefficiency) to having essentially empty partitions? For example: {code} CREATE TABLE cust (id text, email text, ssn text, name text, address text, zip text, birth timestamp, data map<text, text>, pwd text, PRIMARY KEY (id)); CREATE MATERIALIZED VIEW email AS SELECT id,email FROM cust PRIMARY KEY (email, id); CREATE MATERIALIZED VIEW ssn AS SELECT id,ssn FROM cust PRIMARY KEY (ssn, id); CREATE MATERIALIZED VIEW name AS SELECT id,name FROM cust PRIMARY KEY (name, id); CREATE MATERIALIZED VIEW name_zip_birth AS SELECT id,name,zip,birth FROM cust PRIMARY KEY ((name,zip,birth), id); {code} Incidentally, the lookup by name alone would not necessarily be unique - it might not be for an end-user login per se but for a customer service agent who would view the list and then ask the customer some questions to narrow down which specific customer they are. Does this specific use case represent what might be considered a best practice use of MVs? 
If not, why not or what improvements could be made? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
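The lookup pattern described above — use each view purely as an index that maps an alternate login key back to the customer id, then read the base table directly — can be mocked up with plain dictionaries. This is an illustrative Python sketch of the data flow only, not Cassandra itself:

```python
# Base "table": the full customer row, keyed by customer id.
cust = {
    "c1": {"email": "a@example.com", "ssn": "111", "name": "Ann Lee"},
    "c2": {"email": "b@example.com", "ssn": "222", "name": "Ann Lee"},
}

# Index-only "views": alternate key -> set of matching customer ids.
# Like the MVs in the comment, they carry no payload beyond the base-table PK.
by_email = {}
by_name = {}
for cid, row in cust.items():
    by_email.setdefault(row["email"], set()).add(cid)
    by_name.setdefault(row["name"], set()).add(cid)

def login(key, index):
    """Resolve an alternate key to customer ids, then hit the base table."""
    return [cust[cid] for cid in sorted(index.get(key, ()))]

# Email is unique, so login resolves to exactly one customer.
assert login("a@example.com", by_email) == [cust["c1"]]
# Name alone need not be unique: an agent gets the candidate list instead.
assert len(login("Ann Lee", by_name)) == 2
```

The two-step read (small index partition, then a point read on the base table) is exactly the trade the comment is asking about: less data duplicated into the views, at the cost of a second round trip.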
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631496#comment-14631496 ] Jack Krupansky commented on CASSANDRA-6477: --- bq. multiple MVs being updated It would be good to get a handle on what the scalability of MVs per base table is in terms of recommended best practice. Hundreds? Thousands? A few dozen? Maybe just a handful, like 5 or 10 or a dozen? I hate it when a feature like this gets implemented without scalability in mind and then some poor/idiot user comes along and tries a use case which is way out of line with the implemented architecture but we provide no guidance as to what the practical limits really are (e.g., number of tables - thousands vs. hundreds.) It seems to me that the primary use case is for query tables, where an app might typically have a handful of queries and probably not more than a small number of dozens in even extreme cases. In any case, it would be great to be clear about the design limit for number of MVs per base table - and to make sure some testing gets done to assure that the number is practical. And by design limit I don't mean a hard limit where more will cause an explicit error, but where performance is considered acceptable. Are the MV updates occurring in parallel with each other, or are they serial? How many MVs could a base table have before the MV updates effectively become serialized? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. 
However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
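The parallel-versus-serial question at the end of the comment above can be made concrete with a toy fan-out. This is an illustrative Python sketch of why the answer matters for MV-count scalability; it does not depict Cassandra's actual MV write path:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def update_view(view_id, delay=0.05):
    time.sleep(delay)          # stand-in for one view update's write latency
    return view_id

views = list(range(8))

# Serial fan-out: total latency grows linearly with the number of views.
t0 = time.perf_counter()
serial = [update_view(v) for v in views]
serial_time = time.perf_counter() - t0

# Parallel fan-out: latency approaches that of the slowest single update.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(views)) as pool:
    parallel = list(pool.map(update_view, views))
parallel_time = time.perf_counter() - t0

assert serial == parallel == views
assert parallel_time < serial_time
```

If updates are serialized, the practical MV-per-table limit is set by per-write latency times view count; if they run in parallel, it is set instead by available threads and write bandwidth — which is precisely the "how many MVs before updates effectively become serialized" question.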
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629847#comment-14629847 ] Jack Krupansky commented on CASSANDRA-6477: --- bq. force users to include _all_ columns from the original PK in the MV PK. I don't follow the rationale and that seems over-limiting. For example, if my base table was id, name, and address, with id as the PK, I couldn't have an MV with just name or address as the PK key according to this requirement, right? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628189#comment-14628189 ] Jack Krupansky commented on CASSANDRA-6477: --- What CL will apply when MV rows are deleted on TTL expiration? Presumably each of the replicas of the base table will have its TTL expiration triggering roughly at the same time, each local change presumably triggering a delete of the MV, but the MV has replicas as well. Maybe ANY is reasonable for CL for MV update on TTL since the app is not performing an explicit operation with explicit expectations. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623370#comment-14623370 ] Jack Krupansky commented on CASSANDRA-6477: --- bq. we allow <= 1 non-pk column in a MV partition key however the error message we log on multiple attempts reads "Cannot include non-primary key column '%s' in materialized view partition key". We should log that <= 1 are allowed instead. Wow, is that really true? It sounds like a crippling restriction. Is that simply a short-term expediency for the initial release or a hard-core long-term restriction? Just as an example if I had a table with name, address and id, with id as the primary key, I couldn't have an MV with just name or just address or just name and address as the partition key, right? In particular, this restriction seems to preclude pure inverted index MVs - where the non-key content of the row is used to index the key for the row. Still waiting to read an updated CQL spec - especially any such limitations. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618763#comment-14618763 ] Jack Krupansky edited comment on CASSANDRA-6477 at 7/8/15 3:22 PM: --- I don't see updated CQL.textile for CREATE MV on the branch. Coming soon? Also, the comment for CREATE MV in CQL.g does not quite match the actual syntax: 1. Missing the IF NOT EXISTS clause. 2. Has parentheses around the list, but SELECT does not have that. 3. Unclear whether AS or functions are supported in the column name list, but selectStatement would certainly allow that. 4. Has FROM (), which should be FROM , I think. was (Author: jkrupan): I don't see updated CQL.textile for CREATE MV on the branch. Coming soon? Also, the comment for CREATE MV in CQL.g does not quite match the actual syntax: 1. Missing the IF NOT EXISTS clause. 2. Has parentheses around the list, but SELECT does not have that. 3. Unclear whether AS on functions are supported in the column name list, but selectStatement would certainly allow that. 4. Has FROM (), which should be FROM , I think. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618763#comment-14618763 ] Jack Krupansky commented on CASSANDRA-6477: --- I don't see updated CQL.textile for CREATE MV on the branch. Coming soon? Also, the comment for CREATE MV in CQL.g does not quite match the actual syntax: 1. Missing the IF NOT EXISTS clause. 2. Has parentheses around the list, but SELECT does not have that. 3. Unclear whether AS or functions are supported in the column name list, but selectStatement would certainly allow that. 4. Has FROM (), which should be FROM , I think. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616866#comment-14616866 ] Jack Krupansky commented on CASSANDRA-6477: --- 1. Are MV updates still eventually consistent (not guaranteed)? 2. Is there any way for the app to assure that the MV updates have been completed to some desired CL? 3. Will a repair to the base table assure that all MVs are consistent? 4. Can a single MV be repaired to assure that it is consistent? (Especially since the data for a MV on a node will be derived from data on other nodes due to differences in the partition keys.) Great to see such an exciting new feature take shape! > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6977) attempting to create 10K column families fails with 100 node cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561752#comment-14561752 ] Jack Krupansky commented on CASSANDRA-6977: --- [~jasonstack], this issue was resolved as a duplicate of CASSANDRA-7444 which notes: {quote} The patch should change it from linear wrt the total number of tables in the schema, to linear wrt the number of tables in a keyspace. So if you are creating 1000s of tables in a single keyspace we expect no change at all. {quote} > attempting to create 10K column families fails with 100 node cluster > > > Key: CASSANDRA-6977 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6977 > Project: Cassandra > Issue Type: Bug > Environment: 100 nodes, Ubuntu 12.04.3 LTS, AWS m1.large instances >Reporter: Daniel Meyer >Assignee: Rocco Varela >Priority: Minor > Fix For: 2.1.1 > > Attachments: 100_nodes_all_data.png, all_data_5_nodes.png, > keyspace_create.py, logs.tar, tpstats.txt, visualvm_tracer_data.csv > > > During this test we are attempting to create a total of 1K keyspaces with 10 > column families each to bring the total column families to 10K. With a 5 > node cluster this operation can be completed; however, it fails with 100 > nodes. Please see the two charts. For the 5 node case the time required to > create each keyspace and subsequent 10 column families increases linearly > until the number of keyspaces is 1K. For a 100 node cluster there is a > sudden increase in latency between 450 keyspaces and 550 keyspaces. The test > ends when the test script times out. After the test script times out it is > impossible to reconnect to the cluster with the datastax python driver > because it cannot connect to the host: > cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', > {'10.199.5.98': OperationTimedOut()} > It was found that running the following stress command does work from the > same machine the test script runs on. 
> cassandra-stress -d 10.199.5.98 -l 2 -e QUORUM -L3 -b -o INSERT > It should be noted that this test was initially done with DSE 4.0 and c* > version 2.0.5.24 and in that case it was not possible to run stress against > the cluster even locally on a node due to not finding the host. > Attached are system logs from one of the nodes, charts showing schema > creation latency for 5 and 100 node clusters and virtualvm tracer data for > cpu, memory, num_threads and gc runs, tpstat output and the test script. > The test script was on an m1.large aws instance outside of the cluster under > test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556055#comment-14556055 ] Jack Krupansky commented on CASSANDRA-6477: --- 1. Has a decision been made on refresh modes? It sounds like the focus is on "always consistent", as opposed to manual refresh or one-time without refresh or on some time interval, but is that simply the default, preferred refresh mode, or the only mode that will be available (initially)? 2. What happens if an MV is created for a base table that is already populated? Will the operation block while all existing data is propagated to the MV, or will that propagation happen in the background (in which case, is there a way to monitor its status and completion?), or is that not supported (initially)? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9435) Documetation bug
[ https://issues.apache.org/jira/browse/CASSANDRA-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552493#comment-14552493 ] Jack Krupansky commented on CASSANDRA-9435: --- All of the doc references have a capital C in EC2 in the snitch class names, which the source code for the classes has as Ec2, with an uncapitalized c. Damn case-sensitivity! For example, http://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureSnitchesAbout_c.html http://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architectureSnitchEC2_t.html The EC2 multi-region snitch has the same issue. > Documetation bug > > > Key: CASSANDRA-9435 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9435 > Project: Cassandra > Issue Type: Bug > Components: Documentation & website > Environment: Debian 7 >Reporter: jincer >Priority: Minor > Fix For: 2.1.5 > > > Hello, you have some inaccuracy at docs on your website. > When I try to change snitch from default to EC2Snitch (endpoint_snitch: > EC2Snitch) I get the message that " Unable to find snitch class > 'org.apache.cassandra.locator.EC2Snitch' " . > Only when I change "endpoint_snitch: EC2Snitch" to "endpoint_snitch: > Ec2Snitch" does it start. > Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541814#comment-14541814 ] Jack Krupansky commented on CASSANDRA-6477: --- When exactly would population of the MV occur? What refresh options would initially be supported? Would population/refresh begin instantly when the MV is created, by default, or would an explicit command be required to begin population? Earlier I linked to the Oracle doc on MV, so a comparison to Oracle for refresh options might be nice, especially for users migrating from Oracle. Where would the state of refresh be stored, and how can a user monitor it? On each node of the base table? PostgreSQL doesn't seem to have as many options: http://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html With RF>1, which of the nodes containing a given token would push an update to the MV? All of them? Presumably the push can be token-aware, so that each push only goes to RF=n nodes based on the PK of the MV insert row. Would a consistency level be warranted for the push? Would there be hints as well? And repair of an MV if the rate of updates of the base table overwhelms the update bandwidth of the (many) MVs for the base table? Any thoughts on throttling of the flow of updates from other nodes so that population of a MV does not overwhelm or interfere with normal cluster operation? What default, and what override? What would be a reasonable default, and what would be best practice advice for a maximum? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.x > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. 
However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540714#comment-14540714 ] Jack Krupansky commented on CASSANDRA-6477: --- Back to the original description, will the revised MV purpose address the high cardinality issue? That may depend on what guidance the spec offers for how data modelers should set up the primary key columns in terms of partition (or routing!) columns vs. clustering columns. Is the basic concept that although the selected rows might be scattered across multiple nodes in the base table, the goal is that they would cluster together on a single node for the MV table based on careful specification of partition key columns in the MV? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.x > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540699#comment-14540699 ] Jack Krupansky commented on CASSANDRA-6477: --- Is it fair to say that the primary technique for using this feature is to have one base table and n views of that table, each with a different selection of the base columns as the primary key of the view, with all rows selected, possibly projected differently but with different keys? Would it also be sensible to select a subset of rows? Although that might confuse some users who might think it would give them sophisticated ad hoc queries when in fact the query column values are fixed. For example, select all rows for a specific state. In this way, it doesn't offer what a global index would offer. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.x > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
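For concreteness, a sketch of the re-keying idea discussed above, using the CREATE MATERIALIZED VIEW syntax that eventually shipped in Cassandra 3.0 (the table and column names here are hypothetical, not from the ticket):

```sql
-- Hypothetical base table, keyed by user id.
CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text
);

-- The same rows re-keyed by state: all rows for a given state now share
-- one partition (one replica set) instead of being scattered cluster-wide.
-- Cassandra 3.0 requires every view primary-key column to carry an
-- IS NOT NULL restriction; filtering to a fixed value (e.g. one specific
-- state) was not supported in the initial implementation.
CREATE MATERIALIZED VIEW users_by_state AS
    SELECT * FROM users
    WHERE state IS NOT NULL AND id IS NOT NULL
    PRIMARY KEY (state, id);
```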
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540664#comment-14540664 ] Jack Krupansky commented on CASSANDRA-6477: --- Still waiting for an updated description for the ticket. In particular, what specific use cases is this feature designed to handle well, and is it definitively expert-only, or will there be use cases that are safe for normal users. The key thing (ha ha!) is whether this feature will provide capabilities to make it much easier for people to migrate from SQL to Cassandra in terms of the denormalization process, and do it in a way that people can pick up easily in Data Modeling 101 training. A couple of examples would help a lot - like test cases. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.x > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Global indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504086#comment-14504086 ] Jack Krupansky commented on CASSANDRA-6477: --- It would be helpful if someone were to update the description and primary use case(s) for this feature. My understanding of the original use case was to avoid the fan out from the coordinator node on an indexed query - the global index would contain the partition keys for matched rows so that only the node(s) containing those partition key(s) would be needed. So, my question at this stage is whether the intention is that the initial cut of MV would include a focus on that performance optimization use case, or merely focus on the increased general flexibility of MV instead. Would the initial implementation of MV even necessarily use a GI? Would local vs. global index be an option to be specified? Also, whether it is GI or MV, what guidance will the spec, doc, and training give users as to its performance and scalability? My concern with GI was that it works well for small to medium-sized clusters, but not with very large clusters. So, what would the largest cluster that a user could use a GI for? And also how many GI's make sense. For example, with 1 billion rows per node, and 50 nodes, and a GI on 10 columns, that would be... 1B * 50 * 10 = 500 billion index entries on each node, right? Seems like a bit much for a JVM heap or even off-heap memory. Maybe 500M * 20 * 4 = 40 billion index entries per node would be a wiser upper limit, and even that may be a bit extreme. > Global indexes > -- > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. 
However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
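One way to sanity-check the arithmetic in the comment above (a sketch using the comment's hypothetical figures): 1B rows per node across 50 nodes with a GI on 10 columns yields 500 billion index entries cluster-wide; if the global index is itself distributed across those same 50 nodes, each node's share is closer to 10 billion entries than 500 billion.

```python
# Back-of-the-envelope global-index sizing, using the hypothetical
# figures from the comment above.
rows_per_node = 1_000_000_000   # 1B rows per node
nodes = 50
indexed_columns = 10

total_rows = rows_per_node * nodes                # 50 billion rows in the cluster
total_entries = total_rows * indexed_columns      # index entries cluster-wide
entries_per_node = total_entries // nodes         # share per node if the GI is distributed

print(total_entries)     # 500 billion cluster-wide
print(entries_per_node)  # 10 billion per node
```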
[jira] [Commented] (CASSANDRA-6477) Global indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503936#comment-14503936 ] Jack Krupansky commented on CASSANDRA-6477: --- Oracle has lots of options for the REFRESH clause of the CREATE MATERIALIZED VIEW statement: http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6002.htm Notes on that syntax: http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6002.htm#i2064161 Full MV syntax: http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6002.htm You can request that a materialized view be automatically refreshed when the base tables are updated using the "REFRESH ON COMMIT" option. The update transaction pauses while the views are updated - "Specify ON COMMIT to indicate that a fast refresh is to occur whenever the database commits a transaction that operates on a master table of the materialized view. This clause may increase the time taken to complete the commit, because the database performs the refresh operation as part of the commit process." You can also refresh on time intervals, on demand, or no refresh ever. Originally MV was known as SNAPSHOT - a one-time snapshot of a view of the base tables/query. Oracle has a FAST refresh, which depends on a MATERIALIZED VIEW LOG, which must be created for the base table(s). Otherwise a COMPLETE refresh is required. > Global indexes > -- > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
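A sketch of the Oracle constructs being referenced (table and column names are made up; this is illustrative only, since Oracle imposes extensive additional restrictions, such as including rowids or counts in the select list, before a view is actually fast-refreshable):

```sql
-- Materialized view log on the master table; FAST refresh depends on it.
CREATE MATERIALIZED VIEW LOG ON orders WITH ROWID;

-- Fast refresh performed as part of every commit against the master
-- table; as quoted above, this lengthens the commit itself.
CREATE MATERIALIZED VIEW eu_orders
    REFRESH FAST ON COMMIT
    AS SELECT * FROM orders WHERE region = 'EU';

-- Other REFRESH variants mentioned above:
--   REFRESH COMPLETE ON DEMAND          (manual, full rebuild)
--   REFRESH FORCE START WITH SYSDATE
--     NEXT SYSDATE + 1/24               (interval-based, here hourly)
--   NEVER REFRESH                       (one-time snapshot)
```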
[jira] [Commented] (CASSANDRA-6477) Global indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503273#comment-14503273 ] Jack Krupansky commented on CASSANDRA-6477: --- Why not call the feature "high cardinality index" since that's the use case it is focused on, right? My personal preference would be to have a "cardinality" option clause with option values like "low", "medium", "high", and "unique". The default being "low". A global index would be implied for "high" and "unique" cardinality. > Global indexes > -- > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8889) CQL spec is missing doc for support of bind variables for LIMIT, TTL, and TIMESTAMP
[ https://issues.apache.org/jira/browse/CASSANDRA-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345711#comment-14345711 ] Jack Krupansky commented on CASSANDRA-8889: --- Thanks. The change for the special variable names looks fine, but the grammar for LIMIT, TTL, and TIMESTAMP still says "<integer>" - it needs to be "( <integer> | <variable> )". > CQL spec is missing doc for support of bind variables for LIMIT, TTL, and > TIMESTAMP > --- > > Key: CASSANDRA-8889 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8889 > Project: Cassandra > Issue Type: Bug > Components: Documentation & website >Reporter: Jack Krupansky >Assignee: Tyler Hobbs >Priority: Minor > > CASSANDRA-4450 added the ability to specify a bind variable for the integer > value of a LIMIT, TTL, or TIMESTAMP option, but the CQL spec has not been > updated to reflect this enhancement. > Also, the special predefined bind variable names are not documented in the > CQL spec: "[limit]", "[ttl]", and "[timestamp]". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
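For reference, a sketch of the CASSANDRA-4450 feature the grammar should cover (table and column names hypothetical): a bind marker may stand in for each of these integer options in a prepared statement.

```sql
-- Prepared statements may bind the LIMIT, TTL, and TIMESTAMP values:
SELECT * FROM events WHERE id = ? LIMIT ?;

INSERT INTO events (id, payload) VALUES (?, ?)
    USING TTL ? AND TIMESTAMP ?;

-- In the prepared-statement metadata the server names these markers
-- "[limit]", "[ttl]", and "[timestamp]", per the ticket description.
```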
[jira] [Created] (CASSANDRA-8889) CQL spec is missing doc for support of bind variables for LIMIT, TTL, and TIMESTAMP
Jack Krupansky created CASSANDRA-8889: - Summary: CQL spec is missing doc for support of bind variables for LIMIT, TTL, and TIMESTAMP Key: CASSANDRA-8889 URL: https://issues.apache.org/jira/browse/CASSANDRA-8889 Project: Cassandra Issue Type: Bug Components: Documentation & website Reporter: Jack Krupansky Priority: Minor CASSANDRA-4450 added the ability to specify a bind variable for the integer value of a LIMIT, TTL, or TIMESTAMP option, but the CQL spec has not been updated to reflect this enhancement. Also, the special predefined bind variable names are not documented in the CQL spec: "[limit]", "[ttl]", and "[timestamp]". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8814) Formatting of code blocks in CQL doc in github is a little messed up
[ https://issues.apache.org/jira/browse/CASSANDRA-8814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Krupansky updated CASSANDRA-8814: -- Description: Although the html version of the CQL doc on the website looks fine, the textile conversion of the source files in github looks a little messed up. In particular, the "p." paragraph directives that terminate "bc.." block code directives are not properly recognized and then the following text gets subsumed into the code block. The directives look fine, as per my read of the textile doc, but it appears that the textile converter used by github requires that there be a blank line before the "p." directive to end the code block. It also requires a space after the dot for "p. ". If you go to the github pages for the CQL doc for trunk, 2.1, and 2.0, you will see stray "p." directives as well as "\_\_Sample\_\_" text in the code blocks, but only where the syntax code block was multiple lines. This is not a problem where the "bc." directive is used with a single dot for a single line, as opposed to the "bc.." directive used with a double dot for a block of lines. Or in the case of the CREATE KEYSPACE section you see all of the notes crammed into what should be the "Sample" box. See: https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.1.2/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.0.11/doc/cql3/CQL.textile This problem ("p." not recognized to terminate a code block unless followed by a space and preceded by a blank line) actually occurs for the interactive textile formatter as well: http://txstyle.org/doc/4/block-code was: Although the html version of the CQL doc on the website looks fine, the textile conversion of the source files in github looks a little messed up. In particular, the "p." paragraph directives that terminate "bc.." 
block code directives are not properly recognized and then the following text gets subsumed into the code block. The directives look fine, as per my read of the textile doc, but it appears that the textile converter used by github requires that there be a blank line before the "p." directive to end the code block. It also requires a space after the dot for "p. ". If you go to the github pages for the CQL doc for trunk, 2.1, and 2.0, you will see stray "p." directives as well as "\_\_Sample\_\_" text in the code blocks, but only where the syntax code block was multiple lines. This is not a problem where the "bc." directive is used with a single dot for a single line, as opposed to the "bc.." directive used with a double dot for a block of lines. Or in the case of the CREATE KEYSPACE section you see all of the notes crammed into what should be the "Sample" box. See: https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.1.2/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.0.11/doc/cql3/CQL.textile This problem ("p." not recognized to termined a code block unless followed by a space and preceded by a blank line) actually occurs for the interactive textile formatter as well: http://txstyle.org/doc/4/block-code > Formatting of code blocks in CQL doc in github is a little messed up > > > Key: CASSANDRA-8814 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8814 > Project: Cassandra > Issue Type: Task > Components: Documentation & website >Reporter: Jack Krupansky >Priority: Minor > > Although the html version of the CQL doc on the website looks fine, the > textile conversion of the source files in github looks a little messed up. In > particular, the "p." paragraph directives that terminate "bc.." block code > directives are not properly recognized and then the following text gets > subsumed into the code block. 
The directives look fine, as per my read of the > textile doc, but it appears that the textile converter used by github > requires that there be a blank line before the "p." directive to end the code > block. It also requires a space after the dot for "p. ". > If you go to the github pages for the CQL doc for trunk, 2.1, and 2.0, you > will see stray "p." directives as well as "\_\_Sample\_\_" text in the code > blocks, but only where the syntax code block was multiple lines. This is not > a problem where the "bc." directive is used with a single dot for a single > line, as opposed to the "bc.." directive used with a double dot for a block > of lines. Or in the case of the CREATE KEYSPACE section you see all of the > notes crammed into what should be the "Sample" box. > See: > https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile > https://github.co
[jira] [Created] (CASSANDRA-8814) Formatting of code blocks in CQL doc in github is a little messed up
Jack Krupansky created CASSANDRA-8814: - Summary: Formatting of code blocks in CQL doc in github is a little messed up Key: CASSANDRA-8814 URL: https://issues.apache.org/jira/browse/CASSANDRA-8814 Project: Cassandra Issue Type: Task Components: Documentation & website Reporter: Jack Krupansky Priority: Minor Although the html version of the CQL doc on the website looks fine, the textile conversion of the source files in github looks a little messed up. In particular, the "p." paragraph directives that terminate "bc.." block code directives are not properly recognized and then the following text gets subsumed into the code block. The directives look fine, as per my read of the textile doc, but it appears that the textile converter used by github requires that there be a blank line before the "p." directive to end the code block. It also requires a space after the dot for "p. ". If you go to the github pages for the CQL doc for trunk, 2.1, and 2.0, you will see stray "p." directives as well as "\_\_Sample\_\_" text in the code blocks, but only where the syntax code block was multiple lines. This is not a problem where the "bc." directive is used with a single dot for a single line, as opposed to the "bc.." directive used with a double dot for a block of lines. Or in the case of the CREATE KEYSPACE section you see all of the notes crammed into what should be the "Sample" box. See: https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.1.2/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.0.11/doc/cql3/CQL.textile This problem ("p." not recognized to terminate a code block unless followed by a space and preceded by a blank line) actually occurs for the interactive textile formatter as well: http://txstyle.org/doc/4/block-code -- This message was sent by Atlassian JIRA (v6.3.4#6332)
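A minimal textile snippet illustrating the workaround implied above (the content is hypothetical; what matters is the blank line before the terminating directive and the space after the dot):

```textile
bc.. CREATE KEYSPACE ks
  WITH replication = {'class': 'SimpleStrategy'};

p. __Sample__: this paragraph is only recognized as ending the "bc.." block
by github's converter because a blank line precedes "p." and a space
follows the dot.
```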
[jira] [Commented] (CASSANDRA-8135) documentation missing for CONTAINS keyword
[ https://issues.apache.org/jira/browse/CASSANDRA-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322240#comment-14322240 ] Jack Krupansky commented on CASSANDRA-8135: --- That web page appears to be the 2.0 doc, while CONTAINS is in 2.1. The 2.0 doc: https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile The 2.1 doc: https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile And of course you can always consult the DataStax CQL doc: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html Although it does seem odd that DataStax has that as "3.1" even though the CQL doc for 2.1 should be CQL 3.2.0. Hey, [~thobbs], since you did some of the recent edits to the CQL spec, I'm curious what the process is for deciding which CQL version doc should be posted on the Apache Cassandra site in that doc directory. I would think that CQL doc for both 2.0 and 2.1 should be published. Maybe more to the point, the doc/spec for CQL 3.1.7 and 3.2.0 should be published since both C* 2.0 and 2.1 are commonly used. And DataStax should also have doc for both CQL 3.1.7 and 3.2.0 - I mean, having doc for CONTAINS doesn't help a DSE customer since DSE doesn't support C* 2.1 yet. > documentation missing for CONTAINS keyword > -- > > Key: CASSANDRA-8135 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8135 > Project: Cassandra > Issue Type: New Feature > Components: Documentation & website >Reporter: Jon Haddad > > the contains keyword was covered in this blog entry > http://www.datastax.com/dev/blog/cql-in-2-1 but is missing from the > documentation https://cassandra.apache.org/doc/cql3/CQL.html#collections -- This message was sent by Atlassian JIRA (v6.3.4#6332)
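For anyone landing here, a sketch of the 2.1 feature whose documentation is at issue (hypothetical table; CONTAINS queries a secondary index on a collection column):

```sql
CREATE TABLE products (
    id int PRIMARY KEY,
    tags set<text>
);

-- Indexing the collection enables CONTAINS queries (C* 2.1 / CQL 3.1+):
CREATE INDEX ON products (tags);

SELECT * FROM products WHERE tags CONTAINS 'sale';
```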
[jira] [Created] (CASSANDRA-8538) Clarify default write time timestamp for CQL insert and update
Jack Krupansky created CASSANDRA-8538: - Summary: Clarify default write time timestamp for CQL insert and update Key: CASSANDRA-8538 URL: https://issues.apache.org/jira/browse/CASSANDRA-8538 Project: Cassandra Issue Type: Improvement Components: Documentation & website Reporter: Jack Krupansky Priority: Minor The current CQL spec (and downstream doc) says that the default timestamp for write time for CQL inserts and updates is "the current time of the insertion", but that is somewhat vague and non-specific. In particular, is that the time when the coordinator node parses the CQL statement, or... when the row is inserted or updated on the target nodes, or... something else? In particular, if the coordinator doesn't own the token of the primary key, will the owning node set the write time or does the coordinator node do that? Obviously the application can set an explicit TIMESTAMP, but this issue is concerned with the default if that explicit option is not used. Also, will all replicas of the insert or update share the precisely same write time, or will they reflect the actual time when each particular replica row is inserted or updated on each of the replica nodes? Finally, if a batch statement is used to insert or update multiple rows, will they all share the same write time (e.g., the time the batch statement was parsed) or when each replica row is actually inserted or updated on the target (if the coordinator node does not own the token of the partition key) or replica nodes? It would also be helpful if the tracing option was specific as to which time is the official write time for the insert or update. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
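For contrast with the default-timestamp ambiguity described above, a sketch of the explicit-override path the ticket mentions (hypothetical table; by CQL convention the timestamp is microseconds since the epoch):

```sql
-- Explicit client-supplied write time, sidestepping the ambiguity:
INSERT INTO events (id, payload) VALUES (1, 'x')
    USING TIMESTAMP 1419120000000000;

-- Inspect the write time the cluster actually recorded for a column:
SELECT WRITETIME(payload) FROM events WHERE id = 1;
```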
[jira] [Commented] (CASSANDRA-7769) Implement pg-style dollar syntax for string constants
[ https://issues.apache.org/jira/browse/CASSANDRA-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121468#comment-14121468 ] Jack Krupansky commented on CASSANDRA-7769: --- Those patterns don't seem to recognize empty string sequences or non-name sequences for the delimiter marker. The PG rules allow both. Or even single letter sequences, for that matter. Or... upper case. It would be good to list a set of test use cases, which can also be included in doc. > Implement pg-style dollar syntax for string constants > - > > Key: CASSANDRA-7769 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7769 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Robert Stupp > Fix For: 3.0 > > Attachments: 7769.txt, 7769v2.txt > > > Follow-up of CASSANDRA-7740: > {{$function$...$function$}} in addition to string style variant. > See also > http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING -- This message was sent by Atlassian JIRA (v6.3.4#6332)
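For illustration, the pg-style quoting under discussion as it appears in later Cassandra releases, plus the PG delimiter variants the comment asks about (table name hypothetical):

```sql
-- Dollar quoting avoids escaping embedded single quotes:
SELECT * FROM users WHERE name = $$O'Brien$$;

-- PostgreSQL additionally allows a tag between the dollar signs,
-- including single-letter and upper-case tags (the variants whose
-- support the comment questions):
--   $q$can't$q$
--   $X$don't$X$
```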
[jira] [Commented] (CASSANDRA-7447) New sstable format with support for columnar layout
[ https://issues.apache.org/jira/browse/CASSANDRA-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118109#comment-14118109 ] Jack Krupansky commented on CASSANDRA-7447: --- Thanks, [~benedict], so Just to paraphrase, "columnar" is referring to the CQL use of "clustering columns" - multiple/many CQL rows per partition, and "row-oriented" is referring to a primary key consisting of only partition key columns with no clustering columns, so that "row-oriented" means only a single CQL row per partition, right? One clarification, does the delta encoding require that each CQL row have only one column, so that each adjacent "cell" is for the same CQL column, or... is delta-coding effective when the CQL row has a sequence of columns that map to a repeating sequence of adjacent cells, but the cells for a particular CQL column are never immediately adjacent in the partition? > New sstable format with support for columnar layout > --- > > Key: CASSANDRA-7447 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7447 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Benedict > Labels: performance, storage > Fix For: 3.0 > > Attachments: ngcc-storage.odp > > > h2. Storage Format Proposal > C* has come a long way over the past few years, and unfortunately our storage > format hasn't kept pace with the data models we are now encouraging people to > utilise. This ticket proposes a collections of storage primitives that can be > combined to serve these data models more optimally. > It would probably help to first state the data model at the most abstract > level. We have a fixed three-tier structure: We have the partition key, the > clustering columns, and the data columns. Each have their own characteristics > and so require their own specialised treatment. 
> I should note that these changes will necessarily be delivered in stages, and > that we will be making some assumptions about what the most useful features > to support initially will be. Any features not supported will require > sticking with the old format until we extend support to all C* functionality. > h3. Partition Key > * This really has two components: the partition, and the value. Although the > partition is primarily used to distribute across nodes, it can also be used > to optimise lookups for a given key within a node > * Generally partitioning is by hash, and for the moment I want to focus this > ticket on the assumption that this is the case > * Given this, it makes sense to optimise our storage format to permit O(1) > searching of a given partition. It may be possible to achieve this with > little overhead based on the fact we store the hashes in order and know they > are approximately randomly distributed, as this effectively forms an > immutable contiguous split-ordered list (see Shalev/Shavit, or > CASSANDRA-7282), so we only need to store an amount of data based on how > imperfectly distributed the hashes are, or at worst a single value per block. > * This should completely obviate the need for a separate key-cache, which > will be relegated to supporting the old storage format only > h3. 
Primary Key / Clustering Columns > * Given we have a hierarchical data model, I propose the use of a > cache-oblivious trie > * The main advantage of the trie is that it is extremely compact and > _supports optimally efficient merges with other tries_ so that we can support > more efficient reads when multiple sstables are touched > * The trie will be preceded by a small amount of related data; the full > partition key, a timestamp epoch (for offset-encoding timestamps) and any > other partition level optimisation data, such as (potentially) a min/max > timestamp to abort merges earlier > * Initially I propose to limit the trie to byte-order comparable data types > only (the number of which we can expand through translations of the important > types that are not currently) > * Crucially the trie will also encapsulate any range tombstones, so that > these are merged early in the process and avoids re-iterating the same data > * Results in true bidirectional streaming without having to read entire range > into memory > h3. Values > There are generally two approaches to storing rows of data: columnar, or > row-oriented. The above two data structures can be combined with a value > storage scheme that is based on either. However, given the current model we > have of reading large 64Kb blocks for any read, I am inclined to focus on > columnar support first, as this delivers order-of-magnitude benefits to those > users with the correct workload, while for most workloads our 64Kb blocks are > large enough to store row-orie
[jira] [Comment Edited] (CASSANDRA-7855) Genralize use of IN for compound partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117360#comment-14117360 ] Jack Krupansky edited comment on CASSANDRA-7855 at 9/1/14 1:34 PM: --- bq. not necessary a better idea than parallelizing queries server side Server side? Or should that be *client* side? Typo: "necessary" s.b. "necessarily". And later in the description "give" s.b. "given". And earlier in the description "later" s.b. "latter". And even earlier, "compount" s.b. "compound" and "only support to have a IN" s.b. "only support an IN". was (Author: jkrupan): bq. not necessary a better idea than parallelizing queries server side Server side? Or should that be CLIENT side? Typo: "necessary" s.b. "necessarily". And later in the description "give" s.b. "given". And earlier in the description "later" s.b. "latter". And even earlier, "compount" s.b. "compound" and "only support to have a IN" s.b. "only support an IN". > Genralize use of IN for compound partition keys > --- > > Key: CASSANDRA-7855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7855 > Project: Cassandra > Issue Type: Bug >Reporter: Sylvain Lebresne >Priority: Minor > Labels: cql > Fix For: 2.0.11 > > > When you have a compount partition key, we currently only support to have a > {{IN}} on the last column of that partition key. So given: > {noformat} > CREATE TABLE foo ( > k1 int, > k2 int, > v int, > PRIMARY KEY ((k1, k2)) > ) > {noformat} > we allow > {noformat} > SELECT * FROM foo WHERE k1 = 0 AND k2 IN (1, 2) > {noformat} > but not > {noformat} > SELECT * FROM foo WHERE k1 IN (0, 1) AND k2 IN (1, 2) > {noformat} > There is no particular reason for us not supporting the later (to the best of > my knowledge) since it's reasonably straighforward, so we should fix it. 
> I'll note that using {{IN}} on a partition key is not necessarily a better > idea than parallelizing queries server client side so this syntax, when > introduced, should probably be used sparingly, but given we do support IN on > partition keys, I see no reason not to extend it to compound PK properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7447) New sstable format with support for columnar layout
[ https://issues.apache.org/jira/browse/CASSANDRA-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117375#comment-14117375 ] Jack Krupansky commented on CASSANDRA-7447: --- I'm also interested in the impact of the proposal on storage and performance on: 1. Secondary indexes - either native Cassandra, manually maintained index tables, or even DSE/Solr indexing. Would it make them faster, more compact, less needed, or no net impact? 2. Filtering - Would it enable more/faster filtering, discourage it, or no net change? > New sstable format with support for columnar layout > --- > > Key: CASSANDRA-7447 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7447 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Benedict > Labels: performance, storage > Fix For: 3.0 > > Attachments: ngcc-storage.odp > > > h2. Storage Format Proposal > C* has come a long way over the past few years, and unfortunately our storage > format hasn't kept pace with the data models we are now encouraging people to > utilise. This ticket proposes a collections of storage primitives that can be > combined to serve these data models more optimally. > It would probably help to first state the data model at the most abstract > level. We have a fixed three-tier structure: We have the partition key, the > clustering columns, and the data columns. Each have their own characteristics > and so require their own specialised treatment. > I should note that these changes will necessarily be delivered in stages, and > that we will be making some assumptions about what the most useful features > to support initially will be. Any features not supported will require > sticking with the old format until we extend support to all C* functionality. > h3. Partition Key > * This really has two components: the partition, and the value. 
Although the > partition is primarily used to distribute across nodes, it can also be used > to optimise lookups for a given key within a node > * Generally partitioning is by hash, and for the moment I want to focus this > ticket on the assumption that this is the case > * Given this, it makes sense to optimise our storage format to permit O(1) > searching of a given partition. It may be possible to achieve this with > little overhead based on the fact we store the hashes in order and know they > are approximately randomly distributed, as this effectively forms an > immutable contiguous split-ordered list (see Shalev/Shavit, or > CASSANDRA-7282), so we only need to store an amount of data based on how > imperfectly distributed the hashes are, or at worst a single value per block. > * This should completely obviate the need for a separate key-cache, which > will be relegated to supporting the old storage format only > h3. Primary Key / Clustering Columns > * Given we have a hierarchical data model, I propose the use of a > cache-oblivious trie > * The main advantage of the trie is that it is extremely compact and > _supports optimally efficient merges with other tries_ so that we can support > more efficient reads when multiple sstables are touched > * The trie will be preceded by a small amount of related data; the full > partition key, a timestamp epoch (for offset-encoding timestamps) and any > other partition level optimisation data, such as (potentially) a min/max > timestamp to abort merges earlier > * Initially I propose to limit the trie to byte-order comparable data types > only (the number of which we can expand through translations of the important > types that are not currently) > * Crucially the trie will also encapsulate any range tombstones, so that > these are merged early in the process and avoids re-iterating the same data > * Results in true bidirectional streaming without having to read entire range > into memory > h3. 
Values > There are generally two approaches to storing rows of data: columnar, or > row-oriented. The above two data structures can be combined with a value > storage scheme that is based on either. However, given the current model we > have of reading large 64Kb blocks for any read, I am inclined to focus on > columnar support first, as this delivers order-of-magnitude benefits to those > users with the correct workload, while for most workloads our 64Kb blocks are > large enough to store row-oriented data in a column-oriented fashion without > any performance degradation (I'm happy to consign very large row support to > phase 2). > Since we will most likely target both behaviours eventually, I am currently > inclined to suggest that static columns, sets and maps be targeted for a > row-oriented release, as they don't naturally fit in a columna
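The columnar-vs-row trade-off described in the proposal can be made concrete with a few lines. This is a toy model, not Cassandra code: it only shows how the same rows serialize under each layout, and why a columnar layout keeps each column's values contiguous (better compression, and readers can skip columns they do not need).

```python
# Toy illustration of row-oriented vs column-oriented value storage.
rows = [
    {"id": 1, "name": "jane", "score": 10},
    {"id": 2, "name": "john", "score": 20},
    {"id": 3, "name": "mary", "score": 30},
]

def row_oriented(rows):
    # Values interleaved row by row, as in a classic row store.
    return [v for r in rows for v in (r["id"], r["name"], r["score"])]

def column_oriented(rows):
    # One contiguous list per column; same-typed values sit together.
    return {col: [r[col] for r in rows] for col in ("id", "name", "score")}

assert row_oriented(rows) == [1, "jane", 10, 2, "john", 20, 3, "mary", 30]
assert column_oriented(rows)["score"] == [10, 20, 30]
```

A scan over only `score` touches one contiguous list in the columnar layout, while the row layout forces the reader past every `id` and `name` as well.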
[jira] [Commented] (CASSANDRA-7447) New sstable format with support for columnar layout
[ https://issues.apache.org/jira/browse/CASSANDRA-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117373#comment-14117373 ] Jack Krupansky commented on CASSANDRA-7447: --- Can you guys provide two very simple table definitions and corresponding CQL queries that exemplify row vs. columnar storage and processing optimality? IOW, the two key test cases that would confirm the extent to which the goals of this issue are met, although that might include a narrative about how much updating and deletions are impacting storage and performance as well.
[jira] [Commented] (CASSANDRA-7855) Genralize use of IN for compound partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117363#comment-14117363 ] Jack Krupansky commented on CASSANDRA-7855: --- And terminology - this improvement is for "composite partition keys" - "compound" refers to "primary keys" with clustering columns rather than just the partition key portion of the primary key. > Genralize use of IN for compound partition keys > --- > > Key: CASSANDRA-7855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7855 > Project: Cassandra > Issue Type: Bug >Reporter: Sylvain Lebresne >Priority: Minor > Labels: cql > Fix For: 2.0.11 > > > When you have a compount partition key, we currently only support to have a > {{IN}} on the last column of that partition key. So given: > {noformat} > CREATE TABLE foo ( > k1 int, > k2 int, > v int, > PRIMARY KEY ((k1, k2)) > ) > {noformat} > we allow > {noformat} > SELECT * FROM foo WHERE k1 = 0 AND k2 IN (1, 2) > {noformat} > but not > {noformat} > SELECT * FROM foo WHERE k1 IN (0, 1) AND k2 IN (1, 2) > {noformat} > There is no particular reason for us not supporting the later (to the best of > my knowledge) since it's reasonably straighforward, so we should fix it. > I'll note that using {{IN}} on a partition key is not necessary a better idea > than parallelizing queries server side so this syntax, when introduced, > should probably be used sparingly, but give we do support IN on partition > keys, I see no reason not to extend it to compound PK properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7855) Genralize use of IN for compound partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117360#comment-14117360 ] Jack Krupansky commented on CASSANDRA-7855: --- bq. not necessary a better idea than parallelizing queries server side Server side? Or should that be CLIENT side? Typo: "necessary" s.b. "necessarily". And later in the description "give" s.b. "given". And earlier in the description "later" s.b. "latter". And even earlier, "compount" s.b. "compound" and "only support to have a IN" s.b. "only support an IN". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
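A note on the semantics being requested: generalizing {{IN}} to every column of a composite partition key means the query targets the cross product of the IN values, one concrete partition per combination. A toy sketch of that expansion (illustrative only, not driver or server code):

```python
# Expanding IN restrictions on a composite partition key (k1, k2) into the
# concrete partitions they address. Each tuple is one partition lookup.
from itertools import product

def expand_in_restrictions(*in_values):
    """Return every concrete partition key targeted by the IN clauses."""
    return list(product(*in_values))

# SELECT * FROM foo WHERE k1 IN (0, 1) AND k2 IN (1, 2)
keys = expand_in_restrictions((0, 1), (1, 2))
assert keys == [(0, 1), (0, 2), (1, 1), (1, 2)]
```

The example from the description thus expands to four partitions, which is also why the ticket cautions that parallelizing individual queries client side may be preferable to a large IN.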
[jira] [Commented] (CASSANDRA-7813) Decide how to deal with conflict between native and user-defined functions
[ https://issues.apache.org/jira/browse/CASSANDRA-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108411#comment-14108411 ] Jack Krupansky commented on CASSANDRA-7813: --- +0 for requiring an empty namespace. It assures that a conflict with native functions is avoided, but... adds to the burden of people migrating from RDBMS SQL and makes CQL doc and training materials look overly complex. A top goal should be to make CQL look a lot easier and more approachable, not more complicated. CQL doesn't have to 100% mimic SQL, but at least learn from it and avoid reinventing the wheel unless there is a clear and compelling benefit to a wide range of developers. If CQL is going to detour from SQL for some reason, at least clearly refer to the specific SQL rule and the specific rationale for doing so. Being "better" than SQL is a positive, but merely being "different" is a distinct negative, IMHO. I would be +1 for ALLOWING an empty namespace to override native functions as long as it is not absolutely required. Compromise: produce a semi-noisy WARNING whenever an unqualified UDF is used, and then let the developer set an option to disable that warning. That way, newbie app developers will at least start out with the caution to be careful with unqualified UDF references, but still be able to write clean and simple CQL. Another possibility is to suggest a naming convention, such as an application prefix such as "u_" or "u." which would be unlikely in any future CQL native functions. The fact that CQL does not have a stable collection of native functions is a distinct negative for the project - makes it seem rather immature compared to RDBMS SQL. Maybe... come up with a roadmap for obvious future enhancements and then reserve those names, or at least give a noisy warning that the names can be overridden, but at the risk of future upgrade incompatibility. 
> Decide how to deal with conflict between native and user-defined functions > -- > > Key: CASSANDRA-7813 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7813 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne > Labels: cql > Fix For: 3.0 > > > We have a bunch of native/hardcoded functions (now(), dateOf(), ...) and in > 3.0, users will be able to define new functions. Now, there is a very high > chance that we will provide more native functions over time (to be clear, I'm > not particularly for adding native functions for allthethings just because we > can, but it's clear that we should ultimately provide more than what we > have). Which begs the question: how do we want to deal with the problem of > adding a native function potentially breaking a previously defined > user-defined function? > A priori I see the following options (maybe there are more?): > # don't do anything specific, hoping that it won't happen often and consider > it a user problem if it does. > # reserve a big number of names that we're hoping will cover all future need. > # make native functions and user-defined functions syntactically distinct so it > cannot happen. > I'm not a huge fan of solution 1). Solution 2) is actually what we did for > UDT but I think it's somewhat less practical here: there are only so many types > that it makes sense to provide natively and so it wasn't too hard to come up > with a reasonably small list of type names to reserve just in case. This > feels a lot harder for functions to me. > Which leaves solution 3). Since we already have the concept of namespaces for > functions, a simple idea would be to force user functions to have a namespace. > We could even allow that namespace to be empty as long as we force the > namespace separator (so we'd allow {{bar::foo}} and {{::foo}} for user > functions, but *not* {{foo}} which would be reserved for native functions). -- This message was sent by Atlassian JIRA (v6.2#6252)
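Option 3 from the description can be sketched as a small name resolver. Everything here (the tables, the {{resolve}} function) is a hypothetical illustration, not Cassandra's implementation: bare names resolve only to native functions, while user functions always carry the {{::}} separator, so no future native function can shadow them.

```python
# Toy resolver for the "force a namespace separator on UDFs" proposal.
NATIVE_FUNCTIONS = {"now", "dateof"}            # bare names, reserved
USER_FUNCTIONS = {("", "foo"), ("bar", "foo")}  # (namespace, name) pairs

def resolve(name):
    if "::" in name:
        # Namespaced (possibly with an empty namespace): always a UDF.
        ns, _, fn = name.partition("::")
        if (ns, fn) in USER_FUNCTIONS:
            return ("user", ns, fn)
        raise LookupError(f"unknown user function {name!r}")
    # Bare name: only native functions live here.
    if name in NATIVE_FUNCTIONS:
        return ("native", None, name)
    raise LookupError(f"unknown native function {name!r}")

assert resolve("now") == ("native", None, "now")
assert resolve("::foo") == ("user", "", "foo")
assert resolve("bar::foo") == ("user", "bar", "foo")
```

Because the two lookup spaces never overlap, adding a new native function can never break an existing `ns::fn` or `::fn` definition.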
[jira] [Commented] (CASSANDRA-7740) Parsing of UDF body is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092746#comment-14092746 ] Jack Krupansky commented on CASSANDRA-7740: --- PostgreSQL-style dollar-quoted string constants would be nice. See: http://www.postgresql.org/docs/8.2/static/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING But as [~slebresne] suggested, that should be a global feature of string constants, not limited to this feature. > Parsing of UDF body is broken > - > > Key: CASSANDRA-7740 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7740 > Project: Cassandra > Issue Type: Bug >Reporter: Sylvain Lebresne >Assignee: Robert Stupp > > The parsing of the function body introduced by CASSANDRA-7395 is somewhat broken. > It blindly parses everything up to {{END_BODY}}, which has two problems: > # it parses the function body as if it were part of the CQL syntax, so anything > that doesn't happen to be a valid CQL token won't even parse. > # something like > {noformat} > CREATE FUNCTION foo() RETURNS text LANGUAGE JAVA BODY return "END_BODY"; > END_BODY; > {noformat} > will not parse correctly. > I don't think we can accept random syntax like that. A better solution (which > is the one PostgreSQL uses) is to pass the function body as a normal string. > And in fact I'd be in favor of reusing PostgreSQL syntax (because why not), > that is to have: > {noformat} > CREATE FUNCTION foo() RETURNS text LANGUAGE JAVA AS 'return "END_BODY"'; > {noformat} > One minor annoyance might be, for certain languages, the necessity to double > every quote inside the string. But in a separate ticket we could introduce > the PostgreSQL solution of adding an [alternate syntax for string > constants|http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING]. -- This message was sent by Atlassian JIRA (v6.2#6252)
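The dollar-quoting idea referenced above can be illustrated with a simplified extractor. The regex here is a toy (real PostgreSQL lexing is more involved), but it shows why a body containing quotes, or even the literal text END_BODY, needs no escaping between matching $tag$ markers:

```python
# Toy extractor for PostgreSQL-style dollar-quoted string constants:
# everything between a $tag$ marker and its matching $tag$ is taken verbatim.
import re

def parse_dollar_quoted(text):
    """Return the body of the first dollar-quoted string in `text`, or None."""
    m = re.search(r"\$([A-Za-z_]*)\$(.*?)\$\1\$", text, re.DOTALL)
    return m.group(2) if m else None

stmt = 'CREATE FUNCTION foo() RETURNS text LANGUAGE java AS $body$return "END_BODY";$body$;'
assert parse_dollar_quoted(stmt) == 'return "END_BODY";'
assert parse_dollar_quoted("no dollar quotes here") is None
```

The backreference `\1` is what makes the closing marker have to match the opening tag, so a function body can even contain other dollar-quoted strings as long as their tags differ.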
[jira] [Commented] (CASSANDRA-6377) ALLOW FILTERING should allow seq scan filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087826#comment-14087826 ] Jack Krupansky commented on CASSANDRA-6377: --- [~ztyx], here's the relevant restriction from the DataStax CQL doc: bq. The WHERE clause is composed of conditions on the columns that are part of the primary key or are indexed. See: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html And the same restriction from the project spec for CQL3: bq. The <where-clause> specifies which rows must be queried. It is composed of relations on the columns that are part of the PRIMARY KEY and/or have a secondary index defined on them. See: https://cassandra.apache.org/doc/cql3/CQL.html Was part of either of those two statements perhaps worded too vaguely, or is the issue how you could have found those statements more easily? Improving doc usability is a priority. > ALLOW FILTERING should allow seq scan filtering > --- > > Key: CASSANDRA-6377 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6377 > Project: Cassandra > Issue Type: Bug > Components: API >Reporter: Jonathan Ellis >Assignee: Sylvain Lebresne > Labels: cql > Fix For: 3.0 > > > CREATE TABLE emp_table2 ( > empID int PRIMARY KEY, > firstname text, > lastname text, > b_mon text, > b_day text, > b_yr text > ); > INSERT INTO emp_table2 (empID,firstname,lastname,b_mon,b_day,b_yr) >VALUES (100,'jane','doe','oct','31','1980'); > INSERT INTO emp_table2 (empID,firstname,lastname,b_mon,b_day,b_yr) >VALUES (101,'john','smith','jan','01','1981'); > INSERT INTO emp_table2 (empID,firstname,lastname,b_mon,b_day,b_yr) >VALUES (102,'mary','jones','apr','15','1982'); > INSERT INTO emp_table2 (empID,firstname,lastname,b_mon,b_day,b_yr) >VALUES (103,'tim','best','oct','25','1982'); > > SELECT b_mon,b_day,b_yr,firstname,lastname FROM emp_table2 > WHERE b_mon='oct' ALLOW FILTERING; > Bad Request: No indexed columns present in by-columns clause 
with Equal > operator -- This message was sent by Atlassian JIRA (v6.2#6252)
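What the ticket's ALLOW FILTERING request opts into is a full sequential scan, since b_mon is neither part of the primary key nor indexed. A toy model of the example table (plain Python, not Cassandra's read path) makes the cost explicit:

```python
# Toy model of the ticket's emp_table2: filtering on an unindexed column
# means examining every row, which is why it must be opted into explicitly.
rows = [
    (100, "jane", "doe", "oct"),
    (101, "john", "smith", "jan"),
    (102, "mary", "jones", "apr"),
    (103, "tim", "best", "oct"),
]

def seq_scan_filter(rows, month):
    # O(n): every row is touched, unlike a primary-key or index lookup.
    return [r for r in rows if r[3] == month]

assert [r[0] for r in seq_scan_filter(rows, "oct")] == [100, 103]
```

An index on b_mon would instead jump straight to the matching rows; ALLOW FILTERING is the user asserting that the linear cost is acceptable.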
[jira] [Commented] (CASSANDRA-7683) Always allow CREATE TABLE IF NOT EXISTS if it exists
[ https://issues.apache.org/jira/browse/CASSANDRA-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085517#comment-14085517 ] Jack Krupansky commented on CASSANDRA-7683: --- Could you at least update the CQL3 spec to indicate the actual semantics, as I have suggested? > Always allow CREATE TABLE IF NOT EXISTS if it exists > > > Key: CASSANDRA-7683 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7683 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Jens Rantil >Priority: Minor > > Background: I have a table that I'd like to make sure exists when I boot up > my application. To make the life easier for our developers I execute an > `ALTER TABLE IF EXISTS`. > In production I am using user based authorization and for security reasons > regular production users are not allowed to CREATE TABLEs. > Problem: When a user without CREATE permission executes `ALTER TABLE IF > EXISTS` for a table that already exists, the command fails telling me the > user is not allowed to execute `CREATE TABLE`. It feels kinda ridiculous that > this fails when I'm not actually creating the table. > Proposal: That the permission check only should be done if the table is only > actually to be created. > Workaround: Right now, I have a boolean that checks if in production and in > that case don't try to create the table. Another approach would be to > manually check if the table exists. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7683) Always allow CREATE TABLE IF NOT EXISTS if it exists
[ https://issues.apache.org/jira/browse/CASSANDRA-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084605#comment-14084605 ] Jack Krupansky commented on CASSANDRA-7683: --- bq. I execute an `ALTER TABLE IF EXISTS`. Ummm... there's no such command, at least in the CQL3 spec! I suspect that you simply meant "CREATE TABLE IF NOT EXISTS". Assuming that, I think the CQL3 spec suggests that you should indeed be able to do what you suggest - or the spec needs to be revised to specifically disallow it: {code} Attempting to create an already existing table will return an error unless the IF NOT EXISTS option is used. If it is used, the statement will be a no-op if the table already exists. {code} So, unless somebody wants to propose changing that second sentence to "If it is used, the statement will be a no-op if the table already exists, unless the user does not have CREATE permission, in which case the request will return an error", the Wish should be considered reasonable. Personally, this one seems to be in a very gray area - fielder's choice, flip a coin. Maybe the proper argument to make here is that the user wishes to have a single script that can be used by a range of users and for completeness includes the CREATE TABLE so it can be used for initial as well as incremental operations. In that context it would make sense, but... I may be reading too much into the users' intentions! -- This message was sent by Atlassian JIRA (v6.2#6252)
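The ticket's proposal amounts to reordering two checks: test for table existence before testing CREATE permission, so the statement is a true no-op when the table is already there. A hypothetical sketch (all names illustrative, not Cassandra internals):

```python
# Toy model of CREATE TABLE IF NOT EXISTS with the proposed check ordering:
# existence first, permission second.
class AuthorizationError(Exception):
    pass

def create_table_if_not_exists(schema, table, user_can_create):
    if table in schema:
        return "no-op"          # table exists: no CREATE permission needed
    if not user_can_create:
        raise AuthorizationError("CREATE permission required")
    schema.add(table)
    return "created"

schema = {"messages"}
# The scenario from the report: an unprivileged user, table already present.
assert create_table_if_not_exists(schema, "messages", user_can_create=False) == "no-op"
assert create_table_if_not_exists(schema, "events", user_can_create=True) == "created"
```

Swapping the two `if` blocks reproduces the current behavior the reporter complains about: the unprivileged user gets an error even though nothing would be created.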
[jira] [Commented] (CASSANDRA-7395) Support for pure user-defined functions (UDF)
[ https://issues.apache.org/jira/browse/CASSANDRA-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084005#comment-14084005 ] Jack Krupansky commented on CASSANDRA-7395: --- In RDBMS terminology, these are strictly "single-row functions", as opposed to "aggregate functions", correct? Not that aggregate UDFs wouldn't be useful, but they are more complex. See: http://docs.oracle.com/database/121/SQLRF/functions002.htm > Support for pure user-defined functions (UDF) > - > > Key: CASSANDRA-7395 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7395 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Robert Stupp > Labels: cql > Fix For: 3.0 > > Attachments: 7395-dtest.txt, 7395.txt, udf-create-syntax.png, > udf-drop-syntax.png > > > We have some tickets for various aspects of UDF (CASSANDRA-4914, > CASSANDRA-5970, CASSANDRA-4998) but they all suffer from various degrees of > ocean-boiling. > Let's start with something simple: allowing pure user-defined functions in > the SELECT clause of a CQL query. That's it. > By "pure" I mean, must depend only on the input parameters. No side effects. > No exposure to C* internals. Column values in, result out. > http://en.wikipedia.org/wiki/Pure_function -- This message was sent by Atlassian JIRA (v6.2#6252)
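The single-row vs. aggregate distinction raised in the comment can be made concrete: a pure single-row function maps each input row to one output independently, while an aggregate folds a whole column into one value. A plain-Python illustration, not the proposed CQL syntax:

```python
# Pure single-row function vs aggregate function, in the ticket's sense.
import math

def sin_of(value):
    # Single-row ("pure"): output depends only on this row's input,
    # no side effects, no access to engine internals.
    return math.sin(value)

def avg(values):
    # Aggregate: consumes every row of the column to produce one value.
    return sum(values) / len(values)

column = [0.0, 1.0, 2.0]
per_row = [sin_of(v) for v in column]   # one output per input row
assert len(per_row) == len(column)
assert avg(column) == 1.0
```

The ticket scopes itself to the first kind only: column values in, result out, per row.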
[jira] [Commented] (CASSANDRA-7654) CQL INSERT improvement
[ https://issues.apache.org/jira/browse/CASSANDRA-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081542#comment-14081542 ] Jack Krupansky commented on CASSANDRA-7654: --- As far as the cqlsh angle, copying from a CSV file might be a lot more convenient anyway, if inserting more than just a very few rows. > CQL INSERT improvement > -- > > Key: CASSANDRA-7654 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7654 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Priority: Minor > > It would be nice to be able to add multiple rows using a single {{INSERT}}. > Restricted to the same partition. > For example: > Current behaviour: > {noformat} > INSERT INTO comp_key (key, num_val) > VALUES ('foo', 1, 41); > INSERT INTO comp_key (key, num_val) > VALUES ('foo', 2, 42); > {noformat} > Wanted behaviour: > {noformat} > INSERT INTO comp_key (key, num_val) > VALUES > ('foo', 1, 41), > ('foo', 2, 42), > ('foo', 3, 42), > ('foo', 4, 42); > {noformat} > Assumed table def: > {noformat} > CREATE TABLE comp_key ( > key TEXT, > clust INT, > num_val DECIMAL, > PRIMARY KEY ( key, clust ) > ); > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7654) CQL INSERT improvement
[ https://issues.apache.org/jira/browse/CASSANDRA-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081511#comment-14081511 ] Jack Krupansky commented on CASSANDRA-7654: --- This appears to be the duplicate - CASSANDRA-5959 - "CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)", which was in turn resolved as a duplicate of CASSANDRA-4693 - "CQL Protocol should allow multiple PreparedStatements to be atomically executed", which is for the feature that [~tjake] referenced. It used a slightly different syntax, factoring out the partition key: {code} insert into results (row_id, (index,value)) values ((0,text0), (1,text1), (2,text2), ..., (N,textN)); {code} Which highlights the fact that the example in this issue did not even have the partition key specified in either the primary key or the insert column list. For convenient (future) reference, the batch prepared statement replaces: {code} PreparedStatement ps = session.prepare( "BEGIN BATCH" + " INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?);" + " INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?);" + " INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?);" + "APPLY BATCH" ); session.execute(ps.bind(uid, mid1, title1, body1, uid, mid2, title2, body2, uid, mid3, title3, body3)); {code} with {code} PreparedStatement ps = session.prepare("INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?)"); BatchStatement batch = new BatchStatement(); batch.add(ps.bind(uid, mid1, title1, body1)); batch.add(ps.bind(uid, mid2, title2, body2)); batch.add(ps.bind(uid, mid3, title3, body3)); session.execute(batch); {code} Granted, that doesn't help with cqlsh. It also doesn't help with DevCenter either, right? 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7654) CQL INSERT improvement
[ https://issues.apache.org/jira/browse/CASSANDRA-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080766#comment-14080766 ] Jack Krupansky commented on CASSANDRA-7654: --- 1. For doc purposes, what is the rationale for restricting the insert to a single partition? 2. Will subsequent inserts occur if any of the inserts fail due to consistency or for any other reason? 3. Can the app assume that the inserts will be attempted in parallel? 4. Will the driver route the insert to a node that owns that partition key? 4a. Should all of the inserts really be routed to the same node, or distributed according to RF? (Driver question.) 5. Is it also proposed to enhance the driver to support such a "batch" insertion of rows? -- This message was sent by Atlassian JIRA (v6.2#6252)
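Question 1 above (the single-partition restriction) implies a validation step: every VALUES tuple must share the same partition key, which is also what lets the whole batch be routed to one replica set. A hypothetical sketch of that check (illustrative names, not the proposed implementation):

```python
# Toy validation for a multi-row INSERT restricted to one partition.
def multi_row_insert(partition_key_index, value_tuples):
    """Group the VALUES tuples into one single-partition batch, or fail."""
    keys = {t[partition_key_index] for t in value_tuples}
    if len(keys) != 1:
        raise ValueError("all rows must target the same partition")
    return {"partition": keys.pop(), "rows": value_tuples}

# The ticket's wanted behaviour: four rows, all under partition key 'foo'.
batch = multi_row_insert(0, [("foo", 1, 41), ("foo", 2, 42),
                             ("foo", 3, 42), ("foo", 4, 42)])
assert batch["partition"] == "foo"
assert len(batch["rows"]) == 4
```

Mixing partition keys (say `('foo', ...)` and `('bar', ...)`) would fail the check, which is the point: a cross-partition batch is a different, costlier operation.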
[jira] [Commented] (CASSANDRA-7642) Adaptive Consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079392#comment-14079392 ] Jack Krupansky commented on CASSANDRA-7642: --- Maybe this ends up being a doc issue - detailing best practice for achieving adaptive consistency. And just better doc for consistency in general. I mean, as more and more lower-skilled RDBMS ACID-heads get sucked into NoSQL Cassandra, understanding and managing consistency is only going to become a bigger and bigger issue. > Adaptive Consistency > > > Key: CASSANDRA-7642 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7642 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Rustam Aliyev > Fix For: 3.0 > > > h4. Problem > At minimum, the application requires a consistency level of X, which must be a fault > tolerant CL. However, when there is no failure it would be advantageous to > use the stronger consistency Y (Y>X). > h4. Suggestion > The application defines minimum (X) and maximum (Y) consistency levels. C* can > apply adaptive consistency logic to use Y whenever possible and downgrade to > X when a failure occurs. > The implementation should not negatively impact performance. Therefore, state has > to be maintained globally (not per request). > h4. Example > {{MIN_CL=LOCAL_QUORUM}} > {{MAX_CL=EACH_QUORUM}} > h4. Use Case > Consider a case where the user wants to maximize their uptime and consistency. > They are designing a system using C* where transactions are read/written with > LOCAL_QUORUM and distributed across 2 DCs. Occasional inconsistencies between > DCs can be tolerated. R/W with LOCAL_QUORUM is satisfactory in most of the > cases. > The application requires new transactions to be read back right after they were > generated. Write and read could be done through different DCs (no > stickiness). In some cases when the user writes into DC1 and reads immediately > from DC2, replication delay may cause problems. 
> The transaction won't show up on a read in DC2; the user will retry and create a duplicate transaction. Occasional duplicates are fine and the goal is to minimize the number of dups.
> Therefore, we want to perform writes with stronger consistency (EACH_QUORUM) whenever possible without compromising on availability. Using adaptive consistency they should be able to define:
> {{Read CL = LOCAL_QUORUM}}
> {{Write CL = ADAPTIVE (MIN:LOCAL_QUORUM, MAX:EACH_QUORUM)}}
> A similar scenario can be described for the {{Write CL = ADAPTIVE (MIN:QUORUM, MAX:ALL)}} case.
> h4. Criticism
> # This functionality can/should be implemented by the user himself.
> bq. It will be hard for an average user to implement topology monitoring and a state machine. Moreover, this is a pattern which repeats.
> # Transparent downgrading violates the CL contract, and that contract is considered to be just about the most important element of Cassandra's runtime behavior.
> bq. Fully transparent downgrading without any contract is dangerous. However, would it be a problem if we specify explicitly only two discrete CL levels - MIN_CL and MAX_CL?
> # If you have split-brain DCs (partitioned in CAP), you have to sacrifice either consistency or availability, and auto-downgrading sacrifices the consistency in dangerous ways if the application isn't designed to handle it. And if the application is designed to handle it, then it should be able to handle it in normal circumstances, not just degraded/extraordinary ones.
> bq. Agreed. The application should be designed for MIN_CL. In that case, MAX_CL will not be causing much harm, only adding flexibility.
> # It might be a better idea to loudly downgrade, instead of silently downgrading, meaning that the client code does an explicit retry with lower consistency on failure and takes some other kind of action to attempt to inform either users or operators of the problem. It is the silent part of the downgrading which could be dangerous.
> bq.
> There are certainly cases where the user should be informed when consistency changes in order to perform a custom action. For this purpose we could allow/require the user to register a callback function which will be triggered when the consistency level changes. Best practices could be enforced by requiring the callback.

--
This message was sent by Atlassian JIRA (v6.2#6252)
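The "loudly downgrade plus callback" idea from the criticism above can be sketched client-side. This is a hypothetical illustration, not a real driver API: `AdaptiveWriter`, `UnavailableError`, and `on_downgrade` are invented names, and `execute` stands in for whatever statement-execution function the application already has.

```python
class UnavailableError(Exception):
    """Stand-in for a driver's 'not enough live replicas' error."""


class AdaptiveWriter:
    """Try MAX_CL first; on failure, notify the app and retry at MIN_CL."""

    def __init__(self, execute, max_cl, min_cl, on_downgrade=None):
        self.execute = execute            # callable(statement, cl) -> result
        self.max_cl = max_cl              # e.g. "EACH_QUORUM"
        self.min_cl = min_cl              # e.g. "LOCAL_QUORUM"
        self.on_downgrade = on_downgrade  # callback, as the comment proposes

    def write(self, statement):
        try:
            return self.execute(statement, self.max_cl)
        except UnavailableError:
            # Loud downgrade: tell the application before retrying at MIN_CL,
            # so the CL change is never silent.
            if self.on_downgrade is not None:
                self.on_downgrade(self.max_cl, self.min_cl)
            return self.execute(statement, self.min_cl)
```

Because the downgrade happens in client code with an explicit callback, the CL contract per request stays visible to the application, which is the main objection to doing this transparently inside the server.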
[jira] [Comment Edited] (CASSANDRA-7642) Adaptive Consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079361#comment-14079361 ]

Jack Krupansky edited comment on CASSANDRA-7642 at 7/30/14 3:16 PM:
---

Is there any actual functional difference deep in Cassandra for a higher CL other than merely waiting for confirmation and returning a status code if a sufficient number of confirmations is not received? I imagine not (other than some transaction stuff). But I can sympathize with the difficulty of implementing a "consistency validation" check at the app level. IOW, if Cassandra is going to get to ALL consistency anyway, if at all humanly (so to speak!) possible, what advantage is there here other than how the waiting is performed?

And I have heard of users who want their writes to happen as quickly as possible, but also want some way to "check" whether or when a specified level of consistency is achieved, other than pinging with reads and checking values. Maybe the ultimate goal here should be asynchronous writes - send off a write with a relatively low CL, like ONE or even ANY or some LOCAL CL, get a response back that the operation is "initiated", and then have a "Check Operation CL Status" API call that would indicate whether or what level of CL has been achieved for a designated write operation.
[jira] [Commented] (CASSANDRA-7642) Adaptive Consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079361#comment-14079361 ]

Jack Krupansky commented on CASSANDRA-7642:
---

Is there any actual functional difference deep in Cassandra for a higher CL other than merely waiting for confirmation and returning a status code if a sufficient number of confirmations is not received? I imagine not (other than some transaction stuff). But I can sympathize with the difficulty of implementing a "consistency validation" check at the app level. IOW, if Cassandra is going to get to ALL consistency anyway, if at all humanly (so to speak!) possible, what advantage is there here other than how the waiting is performed?

And I have heard of users who want their writes to happen as quickly as possible, but also want some way to "check" whether or when a specified level of consistency is achieved, other than pinging with reads and checking values. Maybe the ultimate goal here should be asynchronous writes - send off a write with a relatively low CL, like ONE or even ANY or some LOCAL CL, get a response back that the operation is "initiated", and then have a "Check Operation CL Status" API call that would indicate whether or what level of CL has been achieved for that operation.
[jira] [Commented] (CASSANDRA-7637) Add CQL3 keyword for efficient lexical range queries (e.g. START_WITH)
[ https://issues.apache.org/jira/browse/CASSANDRA-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079225#comment-14079225 ]

Jack Krupansky commented on CASSANDRA-7637:
---

bq. attribute < 'interests.food.z';

That's a great argument for this improvement - the simple, "obvious" explicit range... is incorrect. It won't match {{'interests.food.zebra'}} or even {{'interests.food.z'}}. It would need to be something like:

{code}
SELECT * FROM profile WHERE profile_id = 123 AND
    attribute > 'interests.food.' AND
    attribute < 'interests.food.{';
{code}

> Add CQL3 keyword for efficient lexical range queries (e.g. START_WITH)
> ----------------------------------------------------------------------
>
>          Key: CASSANDRA-7637
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-7637
>      Project: Cassandra
>   Issue Type: New Feature
>   Components: API
>     Reporter: Rustam Aliyev
>      Fix For: 3.0
>
> Currently, if I want to perform a range query on a lexical type I need to do something like this:
> {code}
> SELECT * FROM profile WHERE profile_id = 123 AND
>     attribute > 'interests.food.' AND
>     attribute < 'interests.food.z';
> {code}
> This is a very efficient range query. Yet, many users who are not familiar with Thrift and the storage-level implementation are unaware of this "trick". Therefore, it would be convenient to introduce a CQL keyword which will do this more simply:
> {code}
> SELECT * FROM profile WHERE profile_id = 123 AND
>     attribute START_WITH('interests.food.');
> {code}
> The keyword would have the same restrictions as other inequality search operators plus some type restrictions.
> Allowed types would be:
> * {{ascii}}
> * {{text}} / {{varchar}}
> * {{map}} (same for ascii) (?)
> * {{set}} (same for ascii) (?)
> (?) may require more work, therefore optional
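For what it's worth, the exclusive upper bound used above ('{' being the ASCII character just after 'z') can be computed generally by incrementing the last character of the prefix, instead of hand-picking a sentinel character. A minimal sketch in plain Python; the function name is invented for illustration:

```python
def prefix_upper_bound(prefix: str) -> str:
    """Smallest string that sorts after every string starting with `prefix`.

    Assumes the last character is not the maximum code point; a fuller
    version would drop trailing max characters and bump the next one.
    """
    if not prefix:
        raise ValueError("empty prefix has no finite upper bound")
    # Bump the final character: everything with this prefix sorts below it.
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

# The range then becomes:
#   attribute >= prefix AND attribute < prefix_upper_bound(prefix)
```

For {{'interests.food.'}} this yields {{'interests.food/'}}, which correctly bounds both {{'interests.food.zebra'}} and {{'interests.food.z'}}; note that a {{>=}} lower bound also matches the bare prefix itself, which the strict {{>}} in the query above would exclude.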
[jira] [Comment Edited] (CASSANDRA-7637) Add CQL3 keyword for efficient lexical range queries (e.g. START_WITH)
[ https://issues.apache.org/jira/browse/CASSANDRA-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078555#comment-14078555 ]

Jack Krupansky edited comment on CASSANDRA-7637 at 7/29/14 11:42 PM:
---

Why not use the SQL "LIKE" keyword operator and just support a trailing wildcard (AKA a "prefix" query) for now?

{code}
SELECT * FROM profile WHERE profile_id = 123 AND
    attribute LIKE 'interests.food.*';
{code}

See: http://www.w3schools.com/sql/sql_wildcards.asp or http://docs.oracle.com/cd/B12037_01/server.101/b10759/conditions016.htm

Note: I use \* in my example, although SQL uses \% and \_ rather than the traditional \* and ? that "real programmers" use for "glob" characters. Take your pick, or maybe have a config option for that. Do we want to be strict SQL? I don't know - let the community decide!
[jira] [Commented] (CASSANDRA-7637) Add CQL3 keyword for efficient lexical range queries (e.g. START_WITH)
[ https://issues.apache.org/jira/browse/CASSANDRA-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078555#comment-14078555 ]

Jack Krupansky commented on CASSANDRA-7637:
---

Why not use the SQL "LIKE" keyword operator and just support a trailing wildcard (AKA a "prefix" query) for now?

{code}
SELECT * FROM profile WHERE profile_id = 123 AND
    attribute LIKE 'interests.food.*';
{code}

See: http://www.w3schools.com/sql/sql_wildcards.asp or http://docs.oracle.com/cd/B12037_01/server.101/b10759/conditions016.htm

> Add CQL3 keyword for efficient lexical range queries (e.g. START_WITH)
> ----------------------------------------------------------------------
>
>          Key: CASSANDRA-7637
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-7637
>      Project: Cassandra
>   Issue Type: New Feature
>   Components: API
>     Reporter: Rustam Aliyev
>
> Currently, if I want to perform a range query on a lexical type I need to do something like this:
> {code}
> SELECT * FROM profile WHERE profile_id = 123 AND
>     attribute > 'interests.food.' AND
>     attribute < 'interests.food.z';
> {code}
> This is a very efficient range query. Yet, many users who are not familiar with Thrift and the storage-level implementation are unaware of this "trick". Therefore, it would be convenient to introduce a CQL keyword which will do this more simply:
> {code}
> SELECT * FROM profile WHERE profile_id = 123 AND
>     attribute START_WITH('interests.food.');
> {code}
> The keyword would have the same restrictions as other inequality search operators plus some type restrictions.
> Allowed types would be:
> * {{ascii}}
> * {{text}} / {{varchar}}
> * {{inet}} (?)
> * {{map}} (same for ascii) (?)
> * {{set}} (same for ascii) (?)
> (?) may require more work, therefore optional
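A trailing-wildcard LIKE could be rewritten mechanically into exactly the range pair shown in the issue description. A rough sketch of that rewrite (a hypothetical helper, not Cassandra code; it accepts both '%' and '*' per the wildcard discussion, and handles only the trailing-wildcard form):

```python
def like_prefix_to_range(column: str, pattern: str) -> str:
    """Rewrite a trailing-wildcard LIKE pattern into two range predicates."""
    if not (pattern.endswith('%') or pattern.endswith('*')):
        raise ValueError("only trailing-wildcard patterns are supported here")
    prefix = pattern[:-1]
    if not prefix:
        raise ValueError("pattern must have a non-empty prefix")
    # Exclusive upper bound: bump the last character of the prefix, so every
    # string starting with the prefix falls inside [prefix, upper).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return f"{column} >= '{prefix}' AND {column} < '{upper}'"
```

So {{attribute LIKE 'interests.food.*'}} would become {{attribute >= 'interests.food.' AND attribute < 'interests.food/'}}, which is the same storage-level range scan the "trick" relies on.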
[jira] [Commented] (CASSANDRA-6384) "CREATE TABLE ..." execution time increases linearly with number of existing column families
[ https://issues.apache.org/jira/browse/CASSANDRA-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078263#comment-14078263 ]

Jack Krupansky commented on CASSANDRA-6384:
---

Is this related to the total number of column families on the cluster, or the total for a single keyspace? And is this with slab/arena allocation enabled or disabled (CASSANDRA-5935)?

> "CREATE TABLE ..." execution time increases linearly with number of existing column families
> --------------------------------------------------------------------------------------------
>
>          Key: CASSANDRA-6384
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-6384
>      Project: Cassandra
>   Issue Type: Bug
>   Components: Core
>  Environment: Cassandra 2.0.2
>               x86_64 GNU/Linux (RHEL)
>     Reporter: Anne Sullivan
>     Assignee: Ryan McGuire
>     Priority: Minor
>
> During creation of 9K column families, the time to execute the "CREATE TABLE" statement increased linearly from 100ms to 15min. Tried issuing the statements using both the Java Driver (2.0.0-beta2) and cqlsh (4.1.0), with the same result.
[jira] [Commented] (CASSANDRA-6977) attempting to create 10K column families fails with 100 node cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075416#comment-14075416 ]

Jack Krupansky commented on CASSANDRA-6977:
---

bq. 10K column families are distributed between different keyspaces

It would be helpful to have some guidance to offer to data modelers. For example, would it be better to have 100 keyspaces with 100 tables each, 10 keyspaces with 1,000 tables each, 50 keyspaces with 200 tables each, 200 keyspaces with 50 tables each, or... each table in a different keyspace?

Maybe we should go back and build upon the traditional guidance of "hundreds" of tables, and use that as the guidance for a single keyspace. So, that would suggest that 50 keyspaces with 200 tables each would be a better "sweet spot" for 10K tables in a cluster. That still leaves open the question of whether a single table per keyspace, with 10K keyspaces, would be just as viable. Maybe the final guidance could be "no more than a few hundred tables per keyspace."

> attempting to create 10K column families fails with 100 node cluster
> ---------------------------------------------------------------------
>
>          Key: CASSANDRA-6977
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-6977
>      Project: Cassandra
>   Issue Type: Bug
>  Environment: 100 nodes, Ubuntu 12.04.3 LTS, AWS m1.large instances
>     Reporter: Daniel Meyer
>     Assignee: Ryan McGuire
>     Priority: Minor
>  Attachments: 100_nodes_all_data.png, all_data_5_nodes.png, keyspace_create.py, logs.tar, tpstats.txt, visualvm_tracer_data.csv
>
> During this test we are attempting to create a total of 1K keyspaces with 10 column families each to bring the total column families to 10K. With a 5 node cluster this operation can be completed; however, it fails with 100 nodes. Please see the two charts. For the 5 node case the time required to create each keyspace and subsequent 10 column families increases linearly until the number of keyspaces is 1K.
> For a 100 node cluster there is a sudden increase in latency between 450 keyspaces and 550 keyspaces. The test ends when the test script times out. After the test script times out it is impossible to reconnect to the cluster with the DataStax Python driver because it cannot connect to the host:
> cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'10.199.5.98': OperationTimedOut()})
> It was found that running the following stress command does work from the same machine the test script runs on:
> cassandra-stress -d 10.199.5.98 -l 2 -e QUORUM -L3 -b -o INSERT
> It should be noted that this test was initially done with DSE 4.0 and C\* version 2.0.5.24, and in that case it was not possible to run stress against the cluster even locally on a node due to not finding the host.
> Attached are system logs from one of the nodes, charts showing schema creation latency for 5 and 100 node clusters, VisualVM tracer data for cpu, memory, num_threads and gc runs, tpstats output, and the test script.
> The test script was run on an m1.large AWS instance outside of the cluster under test.
[jira] [Commented] (CASSANDRA-7527) Bump CQL version and update doc for 2.1
[ https://issues.apache.org/jira/browse/CASSANDRA-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070795#comment-14070795 ]

Jack Krupansky commented on CASSANDRA-7527:
---

What is the official Apache landing place for the CQL doc? If I google "cassandra cql3", I find:

https://cassandra.apache.org/doc/cql3/CQL.html

But I don't find any links to that web page on the Apache Cassandra site - just links to the DataStax doc.

Also, if I look in the doc/CQL3 directory on the Apache site I see the following:

{code}
CQL-1.2.html    2014-03-19 16:47    81K
CQL-2.0.html    2014-06-30 09:09    92K
CQL.css         2012-07-13 09:15    2.0K
CQL.html        2014-06-30 09:09    92K
{code}

Will there be a CQL-2.1.html for the C\* 2.1 CQL doc, or will CQL-2.0.html be overwritten? And again, I was unable to find any links to CQL-2.0.html or CQL-1.2.html on the Apache site. I mean, it would be nice to have a clean web link to consult the 2.0 doc even when 2.1 goes GA. I tried to google "Cassandra 2.0 cql doc", but it doesn't find that CQL-2.0.html page, or find the 1.2 page when I search for 1.2.

Finally, will this official Apache C\* 2.1 CQL doc be available on the web real soon, or only at 2.1 GA?

> Bump CQL version and update doc for 2.1
> ---------------------------------------
>
>          Key: CASSANDRA-7527
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-7527
>      Project: Cassandra
>   Issue Type: Bug
>     Reporter: Sylvain Lebresne
>     Assignee: Tyler Hobbs
>      Fix For: 2.1.0
>  Attachments: 7527-v2.txt, 7527.txt
>
> It appears we forgot to bump the CQL version for new 2.1 features (UDT, tuple type, collection indexing), nor did we update the textile doc.
[jira] [Commented] (CASSANDRA-7372) Exception when querying a composite-keyed table with a collection index
[ https://issues.apache.org/jira/browse/CASSANDRA-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029750#comment-14029750 ]

Jack Krupansky commented on CASSANDRA-7372:
---

(Nit: the description says "composite-keyed table", but the example is for a "compound key".)

> Exception when querying a composite-keyed table with a collection index
> -----------------------------------------------------------------------
>
>          Key: CASSANDRA-7372
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-7372
>      Project: Cassandra
>   Issue Type: Bug
>     Reporter: Ghais Issa
>     Assignee: Mikhail Stepura
>      Fix For: 2.1 rc2
>  Attachments: CASSANDRA-2.1-7372-v3.patch
>
> Given the following schema:
> {code}
> CREATE TABLE products (
>     account text,
>     id int,
>     categories set<text>,
>     PRIMARY KEY (account, id)
> );
> CREATE INDEX cat_index ON products(categories);
> {code}
> The following query fails with an exception:
> {code}
> SELECT * FROM products WHERE account = 'xyz' AND categories CONTAINS 'lmn';
> errors={}, last_host=127.0.0.1
> {code}
> The exception in Cassandra's log is:
> {code}
> WARN 17:01:49 Uncaught exception on thread Thread[SharedPool-Worker-2,5,main]: {}
> java.lang.RuntimeException: java.lang.IndexOutOfBoundsException
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2015) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_25]
>     at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:162) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:103) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
> Caused by: java.lang.IndexOutOfBoundsException: null
>     at org.apache.cassandra.db.composites.Composites$EmptyComposite.get(Composites.java:60) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.index.composites.CompositesIndexOnCollectionKey.makeIndexColumnPrefix(CompositesIndexOnCollectionKey.java:78) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.index.composites.CompositesSearcher.makePrefix(CompositesSearcher.java:82) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.index.composites.CompositesSearcher.getIndexedIterator(CompositesSearcher.java:116) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.index.composites.CompositesSearcher.search(CompositesSearcher.java:68) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.index.SecondaryIndexManager.search(SecondaryIndexManager.java:589) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.ColumnFamilyStore.search(ColumnFamilyStore.java:2060) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.db.RangeSliceCommand.executeLocally(RangeSliceCommand.java:131) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1368) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2011) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
>     ... 4 common frames omitted
> {code}
> The following query however works:
> {code}
> SELECT * FROM products WHERE categories CONTAINS 'lmn';
> {code}