Re: ReadStage filling up and leading to Read Timeouts

2019-02-05 Thread Rajsekhar Mallick
Thank you Jeff for the link.
Please do comment on the G1GC settings, if they are OK for the cluster.
Also comment on reducing the concurrent reads to 32 on all nodes in the
cluster, as that has earlier led to reads getting dropped.
Will adding nodes to the cluster be helpful?

Thanks,
Rajsekhar Mallick



On Wed, 6 Feb, 2019, 1:12 PM Jeff Jirsa wrote:
> https://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/
>
>
> --
> Jeff Jirsa
>
>
> On Feb 5, 2019, at 11:33 PM, Rajsekhar Mallick 
> wrote:
>
> Hello Jeff,
>
> Thanks for the reply.
> We do have GC logs enabled.
> We do observe GC pauses of up to 2 seconds, but quite often we see this issue
> even when the GC log reads good and clear.
>
> JVM Flags related to G1GC:
>
> Xms: 48G
> Xmx:48G
> Maxgcpausemillis=200
> Parallels gc threads=32
> Concurrent gc threads= 10
> Initiatingheapoccupancypercent=50
>
> You talked about dropping the application page size. Please do elaborate on
> how to change the same.
> Reducing the concurrent reads to 32 does help, as we have tried the
> same... the CPU load average remains under threshold, but read timeouts
> keep on happening.
>
> We will definitely try increasing the key cache sizes after verifying the
> current max heap usage in the cluster.
>
> Thanks,
> Rajsekhar Mallick
>
> On Wed, 6 Feb, 2019, 11:17 AM Jeff Jirsa wrote:
>> What you're potentially seeing is the GC impact of reading a large
>> partition - do you have GC logs or StatusLogger output indicating you're
>> pausing? What are the actual JVM flags you're using?
>>
>> Given your heap size, the easiest mitigation may be significantly
>> increasing your key cache size (up to a gigabyte or two, if needed).
>>
>> Yes, when you read data, it's materialized in memory (iterators from each
>> sstable are merged and sent to the client), so reading lots of rows from a
>> wide partition can cause GC pressure just from materializing the responses.
>> Dropping your application's paging size could help if this is the problem.
>>
>> You may be able to drop concurrent reads from 64 to something lower
>> (potentially 48 or 32, given your core count) to mitigate GC impact from
>> lots of objects when you have a lot of concurrent reads, or consider
>> upgrading to 3.11.4 (when it's out) to take advantage of CASSANDRA-11206
>> (which made reading wide partitions less expensive). STCS especially won't
>> help here - a large partition may be larger than you think, if it's
>> spanning a lot of sstables.
>>
>>
>>
>>
>> On Tue, Feb 5, 2019 at 9:30 PM Rajsekhar Mallick 
>> wrote:
>>
>>> Hello Team,
>>>
>>> Cluster Details:
>>> 1. Number of Nodes in cluster : 7
>>> 2. Number of CPU cores: 48
>>> 3. Swap is enabled on all nodes
>>> 4. Memory available on all nodes : 120GB
>>> 5. Disk space available : 745GB
>>> 6. Cassandra version: 2.1
>>> 7. Active tables are using size-tiered compaction strategy
>>> 8. Read Throughput: 6000 reads/s on each node (42000 reads/s cluster
>>> wide)
>>> 9. Read latency 99%: 300 ms
>>> 10. Write Throughput : 1800 writes/s
>>> 11. Write Latency 99%: 50 ms
>>> 12. Known issues in the cluster (large partitions (up to 560MB, observed
>>> when they get compacted), tombstones)
>>> 13. To reduce the impact of tombstones, gc_grace_seconds set to 0 for
>>> the active tables
>>> 14. Heap size: 48 GB G1GC
>>> 15. Read timeout : 5000ms , Write timeouts: 2000ms
>>> 16. Number of concurrent reads: 64
>>> 17. Number of connections from clients on port 9042 stays almost
>>> constant (close to 1800)
>>> 18. Cassandra thread count also stays almost constant (close to 2000)
>>>
>>> Problem Statement:
>>> 1. ReadStage often gets full (reaches max size 64) on 2 to 3 nodes and
>>> pending reads go up to 4000.
>>> 2. When the above happens, Native-Transport-Stage gets full on
>>> neighbouring nodes (1024 max) and pending threads are also observed.
>>> 3. During this time, CPU load average rises and user % for the Cassandra
>>> process reaches 90%.
>>> 4. We see reads getting dropped; org.apache.cassandra.transport package
>>> errors for reads timing out are seen.
>>> 5. Read latency 99% reaches 5 seconds and clients start seeing the impact.
>>> 6. No IOwait is observed on any of the virtual cores; the sjk ttop command
>>> shows max us% being used by “Worker Threads”.
>>>
>>> I have been trying hard to zero in on the exact issue.
>>> What I make out of the above observations is that there might be some slow
>>> queries, which get stuck on a few nodes.
>>> Then there is a cascading effect wherein other queries get lined up.
>>> I have been unable to figure out any such slow queries up till now.
>>> As I mentioned, there are large partitions. We are using the size-tiered
>>> compaction strategy, hence a large partition might be spread across
>>> multiple sstables.
>>> Can this fact lead to slow queries? My understanding is also that
>>> data in sstables is stored in serialized format and, when read into memory,
>>> it is deserialized. This would lead to a large object in memory which then
>>> needs to be transferred across the wire to the client.

Re: ReadStage filling up and leading to Read Timeouts

2019-02-05 Thread Jeff Jirsa

https://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/


-- 
Jeff Jirsa


> On Feb 5, 2019, at 11:33 PM, Rajsekhar Mallick  
> wrote:
> 
> Hello Jeff,
> 
> Thanks for the reply.
> We do have GC logs enabled.
> We do observe GC pauses of up to 2 seconds, but quite often we see this issue even
> when the GC log reads good and clear.
> 
> JVM Flags related to G1GC:
> 
> Xms: 48G
> Xmx:48G
> Maxgcpausemillis=200
> Parallels gc threads=32
> Concurrent gc threads= 10
> Initiatingheapoccupancypercent=50
> 
> You talked about dropping the application page size. Please do elaborate on how
> to change the same.
> Reducing the concurrent reads to 32 does help, as we have tried the same... the
> CPU load average remains under threshold, but read timeouts keep on
> happening.
> 
> We will definitely try increasing the key cache sizes after verifying the 
> current max heap usage in the cluster.
> 
> Thanks,
> Rajsekhar Mallick
> 
> On Wed, 6 Feb, 2019, 11:17 AM Jeff Jirsa wrote:
>> What you're potentially seeing is the GC impact of reading a large partition
>> - do you have GC logs or StatusLogger output indicating you're pausing? What
>> are the actual JVM flags you're using?
>> 
>> Given your heap size, the easiest mitigation may be significantly increasing 
>> your key cache size (up to a gigabyte or two, if needed).
>> 
>> Yes, when you read data, it's materialized in memory (iterators from each 
>> sstable are merged and sent to the client), so reading lots of rows from a 
>> wide partition can cause GC pressure just from materializing the responses. 
>> Dropping your application's paging size could help if this is the problem. 
>> 
>> You may be able to drop concurrent reads from 64 to something lower 
>> (potentially 48 or 32, given your core count) to mitigate GC impact from 
>> lots of objects when you have a lot of concurrent reads, or consider 
>> upgrading to 3.11.4 (when it's out) to take advantage of CASSANDRA-11206 
>> (which made reading wide partitions less expensive). STCS especially won't
>> help here - a large partition may be larger than you think, if it's spanning 
>> a lot of sstables. 
>> 
>> 
>> 
>> 
>>> On Tue, Feb 5, 2019 at 9:30 PM Rajsekhar Mallick  
>>> wrote:
>>> Hello Team,
>>> 
>>> Cluster Details:
>>> 1. Number of Nodes in cluster : 7
>>> 2. Number of CPU cores: 48
>>> 3. Swap is enabled on all nodes
>>> 4. Memory available on all nodes : 120GB 
>>> 5. Disk space available : 745GB
>>> 6. Cassandra version: 2.1
>>> 7. Active tables are using size-tiered compaction strategy
>>> 8. Read Throughput: 6000 reads/s on each node (42000 reads/s cluster wide)
>>> 9. Read latency 99%: 300 ms
>>> 10. Write Throughput : 1800 writes/s
>>> 11. Write Latency 99%: 50 ms
>>> 12. Known issues in the cluster (large partitions (up to 560MB, observed
>>> when they get compacted), tombstones)
>>> 13. To reduce the impact of tombstones, gc_grace_seconds set to 0 for the 
>>> active tables
>>> 14. Heap size: 48 GB G1GC
>>> 15. Read timeout : 5000ms , Write timeouts: 2000ms
>>> 16. Number of concurrent reads: 64
>>> 17. Number of connections from clients on port 9042 stays almost constant 
>>> (close to 1800)
>>> 18. Cassandra thread count also stays almost constant (close to 2000)
>>> 
>>> Problem Statement:
>>> 1. ReadStage often gets full (reaches max size 64) on 2 to 3 nodes and
>>> pending reads go up to 4000.
>>> 2. When the above happens, Native-Transport-Stage gets full on neighbouring
>>> nodes (1024 max) and pending threads are also observed.
>>> 3. During this time, CPU load average rises and user % for the Cassandra
>>> process reaches 90%.
>>> 4. We see reads getting dropped; org.apache.cassandra.transport package
>>> errors for reads timing out are seen.
>>> 5. Read latency 99% reaches 5 seconds and clients start seeing the impact.
>>> 6. No IOwait is observed on any of the virtual cores; the sjk ttop command
>>> shows max us% being used by “Worker Threads”.
>>> 
>>> I have been trying hard to zero in on the exact issue.
>>> What I make out of the above observations is that there might be some slow
>>> queries, which get stuck on a few nodes.
>>> Then there is a cascading effect wherein other queries get lined up.
>>> I have been unable to figure out any such slow queries up till now.
>>> As I mentioned, there are large partitions. We are using the size-tiered compaction
>>> strategy, hence a large partition might be spread across multiple sstables.
>>> Can this fact lead to slow queries? My understanding is also that data
>>> in sstables is stored in serialized format and, when read into memory, it is
>>> deserialized. This would lead to a large object in memory which then needs
>>> to be transferred across the wire to the client.
>>> 
>>> Not sure what might be the reason. Kindly help me understand what the
>>> impact on read performance might be when we have large partitions.
>>> Kindly suggest ways to catch these slow queries.
>>> Also do add if you see any other issues from the above details.
>>> We are now considering expanding our cluster. Is the cluster under-sized?
>>> Will the addition of nodes help resolve the issue?

Re: ReadStage filling up and leading to Read Timeouts

2019-02-05 Thread Rajsekhar Mallick
Hello Jeff,

Thanks for the reply.
We do have GC logs enabled.
We do observe GC pauses of up to 2 seconds, but quite often we see this issue
even when the GC log reads good and clear.

JVM Flags related to G1GC:

Xms: 48G
Xmx:48G
Maxgcpausemillis=200
Parallels gc threads=32
Concurrent gc threads= 10
Initiatingheapoccupancypercent=50
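For reference, the shorthand above presumably corresponds to the standard G1
options as they would appear in cassandra-env.sh; the exact -XX spellings below
are an assumption rather than a copy of the node's actual configuration:

    -Xms48G
    -Xmx48G
    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=200
    -XX:ParallelGCThreads=32
    -XX:ConcGCThreads=10
    -XX:InitiatingHeapOccupancyPercent=50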

You talked about dropping the application page size. Please do elaborate on how
to change the same.
Reducing the concurrent reads to 32 does help, as we have tried the
same... the CPU load average remains under threshold, but read timeouts
keep on happening.

We will definitely try increasing the key cache sizes after verifying the
current max heap usage in the cluster.

Thanks,
Rajsekhar Mallick

On Wed, 6 Feb, 2019, 11:17 AM Jeff Jirsa wrote:
> What you're potentially seeing is the GC impact of reading a large
> partition - do you have GC logs or StatusLogger output indicating you're
> pausing? What are the actual JVM flags you're using?
>
> Given your heap size, the easiest mitigation may be significantly
> increasing your key cache size (up to a gigabyte or two, if needed).
>
> Yes, when you read data, it's materialized in memory (iterators from each
> sstable are merged and sent to the client), so reading lots of rows from a
> wide partition can cause GC pressure just from materializing the responses.
> Dropping your application's paging size could help if this is the problem.
>
> You may be able to drop concurrent reads from 64 to something lower
> (potentially 48 or 32, given your core count) to mitigate GC impact from
> lots of objects when you have a lot of concurrent reads, or consider
> upgrading to 3.11.4 (when it's out) to take advantage of CASSANDRA-11206
> (which made reading wide partitions less expensive). STCS especially won't
> help here - a large partition may be larger than you think, if it's
> spanning a lot of sstables.
>
>
>
>
> On Tue, Feb 5, 2019 at 9:30 PM Rajsekhar Mallick 
> wrote:
>
>> Hello Team,
>>
>> Cluster Details:
>> 1. Number of Nodes in cluster : 7
>> 2. Number of CPU cores: 48
>> 3. Swap is enabled on all nodes
>> 4. Memory available on all nodes : 120GB
>> 5. Disk space available : 745GB
>> 6. Cassandra version: 2.1
>> 7. Active tables are using size-tiered compaction strategy
>> 8. Read Throughput: 6000 reads/s on each node (42000 reads/s cluster wide)
>> 9. Read latency 99%: 300 ms
>> 10. Write Throughput : 1800 writes/s
>> 11. Write Latency 99%: 50 ms
>> 12. Known issues in the cluster (large partitions (up to 560MB, observed
>> when they get compacted), tombstones)
>> 13. To reduce the impact of tombstones, gc_grace_seconds set to 0 for the
>> active tables
>> 14. Heap size: 48 GB G1GC
>> 15. Read timeout : 5000ms , Write timeouts: 2000ms
>> 16. Number of concurrent reads: 64
>> 17. Number of connections from clients on port 9042 stays almost constant
>> (close to 1800)
>> 18. Cassandra thread count also stays almost constant (close to 2000)
>>
>> Problem Statement:
>> 1. ReadStage often gets full (reaches max size 64) on 2 to 3 nodes and
>> pending reads go up to 4000.
>> 2. When the above happens, Native-Transport-Stage gets full on
>> neighbouring nodes (1024 max) and pending threads are also observed.
>> 3. During this time, CPU load average rises and user % for the Cassandra
>> process reaches 90%.
>> 4. We see reads getting dropped; org.apache.cassandra.transport package
>> errors for reads timing out are seen.
>> 5. Read latency 99% reaches 5 seconds and clients start seeing the impact.
>> 6. No IOwait is observed on any of the virtual cores; the sjk ttop command
>> shows max us% being used by “Worker Threads”.
>>
>> I have been trying hard to zero in on the exact issue.
>> What I make out of the above observations is that there might be some slow
>> queries, which get stuck on a few nodes.
>> Then there is a cascading effect wherein other queries get lined up.
>> I have been unable to figure out any such slow queries up till now.
>> As I mentioned, there are large partitions. We are using the size-tiered
>> compaction strategy, hence a large partition might be spread across
>> multiple sstables.
>> Can this fact lead to slow queries? My understanding is also that data
>> in sstables is stored in serialized format and, when read into memory, it is
>> deserialized. This would lead to a large object in memory which then needs
>> to be transferred across the wire to the client.
>>
>> Not sure what might be the reason. Kindly help me understand what the
>> impact on read performance might be when we have large partitions.
>> Kindly suggest ways to catch these slow queries.
>> Also do add if you see any other issues from the above details.
>> We are now considering expanding our cluster. Is the cluster under-sized?
>> Will the addition of nodes help resolve the issue?
>>
>> Thanks,
>> Rajsekhar Mallick
>>
>>
>>
>>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: 

Re: ReadStage filling up and leading to Read Timeouts

2019-02-05 Thread Jeff Jirsa
What you're potentially seeing is the GC impact of reading a large
partition - do you have GC logs or StatusLogger output indicating you're
pausing? What are the actual JVM flags you're using?

Given your heap size, the easiest mitigation may be significantly
increasing your key cache size (up to a gigabyte or two, if needed).
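A sketch of how that is usually adjusted; the values are illustrative only, not
a recommendation:

    # cassandra.yaml
    key_cache_size_in_mb: 1024

    # or at runtime (key / row / counter cache capacities, in MB)
    nodetool setcachecapacity 1024 0 50

nodetool info reports the key cache hit rate, which is the quickest way to
check whether the larger cache is actually paying off.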

Yes, when you read data, it's materialized in memory (iterators from each
sstable are merged and sent to the client), so reading lots of rows from a
wide partition can cause GC pressure just from materializing the responses.
Dropping your application's paging size could help if this is the problem.
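Assuming the DataStax Java driver 3.x (per the paging documentation linked
elsewhere in the thread), the paging size is the fetch size; a minimal sketch,
with the contact point, query and values purely illustrative:

    // com.datastax.driver.core.* assumed imported
    Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.1")                             // placeholder
        .withQueryOptions(new QueryOptions().setFetchSize(500))  // global default
        .build();

    // or per statement, overriding the global default
    Statement stmt = new SimpleStatement(
            "SELECT col FROM ks.wide_table WHERE id = ?", id)    // placeholder query
        .setFetchSize(200);

The driver then pulls a wide partition back in pages of that many rows instead
of materializing thousands of rows per response.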

You may be able to drop concurrent reads from 64 to something lower
(potentially 48 or 32, given your core count) to mitigate GC impact from
lots of objects when you have a lot of concurrent reads, or consider
upgrading to 3.11.4 (when it's out) to take advantage of CASSANDRA-11206
(which made reading wide partitions less expensive). STCS especially won't
help here - a large partition may be larger than you think, if it's
spanning a lot of sstables.
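The knob being discussed is concurrent_reads in cassandra.yaml (it normally
requires a restart to take effect); a sketch of the change:

    # cassandra.yaml
    concurrent_reads: 32

The stock cassandra.yaml comment suggests sizing it at roughly 16 per data
drive, so 32-48 is a sensible range to test on this hardware.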




On Tue, Feb 5, 2019 at 9:30 PM Rajsekhar Mallick 
wrote:

> Hello Team,
>
> Cluster Details:
> 1. Number of Nodes in cluster : 7
> 2. Number of CPU cores: 48
> 3. Swap is enabled on all nodes
> 4. Memory available on all nodes : 120GB
> 5. Disk space available : 745GB
> 6. Cassandra version: 2.1
> 7. Active tables are using size-tiered compaction strategy
> 8. Read Throughput: 6000 reads/s on each node (42000 reads/s cluster wide)
> 9. Read latency 99%: 300 ms
> 10. Write Throughput : 1800 writes/s
> 11. Write Latency 99%: 50 ms
> 12. Known issues in the cluster (large partitions (up to 560MB, observed
> when they get compacted), tombstones)
> 13. To reduce the impact of tombstones, gc_grace_seconds set to 0 for the
> active tables
> 14. Heap size: 48 GB G1GC
> 15. Read timeout : 5000ms , Write timeouts: 2000ms
> 16. Number of concurrent reads: 64
> 17. Number of connections from clients on port 9042 stays almost constant
> (close to 1800)
> 18. Cassandra thread count also stays almost constant (close to 2000)
>
> Problem Statement:
> 1. ReadStage often gets full (reaches max size 64) on 2 to 3 nodes and
> pending reads go up to 4000.
> 2. When the above happens, Native-Transport-Stage gets full on neighbouring
> nodes (1024 max) and pending threads are also observed.
> 3. During this time, CPU load average rises and user % for the Cassandra
> process reaches 90%.
> 4. We see reads getting dropped; org.apache.cassandra.transport package
> errors for reads timing out are seen.
> 5. Read latency 99% reaches 5 seconds and clients start seeing the impact.
> 6. No IOwait is observed on any of the virtual cores; the sjk ttop command
> shows max us% being used by “Worker Threads”.
>
> I have been trying hard to zero in on the exact issue.
> What I make out of the above observations is that there might be some slow
> queries, which get stuck on a few nodes.
> Then there is a cascading effect wherein other queries get lined up.
> I have been unable to figure out any such slow queries up till now.
> As I mentioned, there are large partitions. We are using the size-tiered
> compaction strategy, hence a large partition might be spread across
> multiple sstables.
> Can this fact lead to slow queries? My understanding is also that data
> in sstables is stored in serialized format and, when read into memory, it is
> deserialized. This would lead to a large object in memory which then needs
> to be transferred across the wire to the client.
>
> Not sure what might be the reason. Kindly help me understand what the
> impact on read performance might be when we have large partitions.
> Kindly suggest ways to catch these slow queries.
> Also do add if you see any other issues from the above details.
> We are now considering expanding our cluster. Is the cluster under-sized?
> Will the addition of nodes help resolve the issue?
>
> Thanks,
> Rajsekhar Mallick
>
>
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


ReadStage filling up and leading to Read Timeouts

2019-02-05 Thread Rajsekhar Mallick
Hello Team,

Cluster Details:
1. Number of Nodes in cluster : 7
2. Number of CPU cores: 48
3. Swap is enabled on all nodes
4. Memory available on all nodes : 120GB 
5. Disk space available : 745GB
6. Cassandra version: 2.1
7. Active tables are using size-tiered compaction strategy
8. Read Throughput: 6000 reads/s on each node (42000 reads/s cluster wide)
9. Read latency 99%: 300 ms
10. Write Throughput : 1800 writes/s
11. Write Latency 99%: 50 ms
12. Known issues in the cluster (large partitions (up to 560MB, observed when
they get compacted), tombstones)
13. To reduce the impact of tombstones, gc_grace_seconds set to 0 for the 
active tables
14. Heap size: 48 GB G1GC
15. Read timeout : 5000ms , Write timeouts: 2000ms
16. Number of concurrent reads: 64
17. Number of connections from clients on port 9042 stays almost constant 
(close to 1800)
18. Cassandra thread count also stays almost constant (close to 2000)

Problem Statement:
1. ReadStage often gets full (reaches max size 64) on 2 to 3 nodes and pending
reads go up to 4000.
2. When the above happens, Native-Transport-Stage gets full on neighbouring
nodes (1024 max) and pending threads are also observed.
3. During this time, CPU load average rises and user % for the Cassandra process
reaches 90%.
4. We see reads getting dropped; org.apache.cassandra.transport package errors
for reads timing out are seen.
5. Read latency 99% reaches 5 seconds and clients start seeing the impact.
6. No IOwait is observed on any of the virtual cores; the sjk ttop command shows
max us% being used by “Worker Threads”.
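When the stages back up like this, nodetool tpstats on the affected nodes shows
the symptom directly; the numbers below are illustrative only:

    nodetool tpstats
    Pool Name                  Active  Pending  Completed  Blocked  All time blocked
    ReadStage                      64     3987    1234567        0                 0
    Native-Transport-Requests     128     1024    7654321        0               142
    ...
    Message type     Dropped
    READ                1823

Comparing Pending and Dropped across nodes during an incident helps confirm the
cascade described below: replicas back up in ReadStage first, and coordinators
then pile up in the native transport pool.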

I have been trying hard to zero in on the exact issue.
What I make out of the above observations is that there might be some slow
queries, which get stuck on a few nodes.
Then there is a cascading effect wherein other queries get lined up.
I have been unable to figure out any such slow queries up till now.
As I mentioned, there are large partitions. We are using the size-tiered compaction
strategy, hence a large partition might be spread across multiple sstables.
Can this fact lead to slow queries? My understanding is also that data in
sstables is stored in serialized format and, when read into memory, it is
deserialized. This would lead to a large object in memory which then needs to be
transferred across the wire to the client.

Not sure what might be the reason. Kindly help me understand what the impact
on read performance might be when we have large partitions.
Kindly suggest ways to catch these slow queries.
Also do add if you see any other issues from the above details.
We are now considering expanding our cluster. Is the cluster under-sized? Will
the addition of nodes help resolve the issue?
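Cassandra 2.1 has no built-in slow-query log (later releases add nodetool
toppartitions for sampling hot partitions), but the large partitions driving
this can usually be spotted from nodetool; keyspace/table names below are
placeholders:

    # per-table max/mean compacted partition size
    nodetool cfstats mykeyspace.mytable

    # partition-size and read-latency percentiles
    nodetool cfhistograms mykeyspace mytable

together with the compaction log lines already being used to spot the 560MB
partitions. Any query that routinely reads most of such a partition is a likely
candidate for the slow reads described above.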

Thanks,
Rajsekhar Mallick





-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: SASI queries- cqlsh vs java driver

2019-02-05 Thread Peter Heitman
The table and secondary indexes look generally like this. Note that I have
changed the names of many of the columns to be generic since they aren't
important to the question as far as I know. I left the actual names for
those columns that I've created SASI indexes for. The query I use to try to
create a PreparedStatement is:

SELECT sql_id, type, cpe_id, serial, product_class, manufacturer,
sw_version FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING

the schema cql statements are:

CREATE TABLE IF NOT EXISTS mykeyspace.mytable (
  id text,
  sql_id bigint,
  cpe_id text,
  sw_version text,
  hw_version text,
  manufacturer text,
  product_class text,
  manufacturer_oui text,
  description text,
  periodic_inform_interval text,
  restricted_mode_enabled text,
  restricted_mode_reason text,
  type text,
  model_name text,
  serial text,
  mac text,
   text,
  generic0 timestamp,
  household_id text,
  generic1 int,
  generic2 text,
  generic3 text,
  generic4 int,
  generic5 int,
  generic6 text,
  generic7 text,
  generic8 text,
  generic9 text,
  generic10 text,
  generic11 timestamp,
  generic12 text,
  generic13 text,
  generic14 timestamp,
  generic15 text,
  generic16 text,
  generic17 text,
  generic18 text,
  generic19 text,
  generic20 text,
  generic21 text,
  generic22 text,
  generic23 text,
  generic24 text,
  generic25 text,
  generic26 text,
  generic27 text,
  generic28 int,
  generic29 int,
  generic30 text,
  generic31 text,
  generic32 text,
  generic33 text,
  generic34 text,
  generic35 int,
  generic36 int,
  generic37 int,
  generic38 int,
  generic39 text,
  generic40 text,
  generic41 text,
  generic42 text,
  generic43 text,
  generic44 text,
  generic45 text,
  PRIMARY KEY (id)
);

CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);

CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable
(serial)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable
(cpe_id)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_idx ON mykeyspace.mytable
(manufacturer)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_oui_idx ON
mykeyspace.mytable (manufacturer_oui)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_hw_version_idx ON mykeyspace.mytable
(hw_version)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_sw_version_idx ON mykeyspace.mytable
(sw_version)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_household_id_idx ON mykeyspace.mytable
(household_id)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};
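For context, a minimal sketch of how that statement would be prepared and bound
with the Java driver 3.6; the binding style and values are assumptions, not
taken from the original application:

    // com.datastax.driver.core.* and java.util.Arrays assumed imported
    PreparedStatement ps = session.prepare(
        "SELECT sql_id, type, cpe_id, serial, product_class, manufacturer, sw_version "
      + "FROM mykeyspace.mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING");

    BoundStatement bound = ps.bind()
        .setList("v0", Arrays.asList("serial-1", "serial-2"))  // placeholder serials
        .setInt("limit0", 100);

    ResultSet rs = session.execute(bound);

This is the form that produces the InvalidQueryException reported in the
original question ("IN predicates on non-primary-key columns ... is not yet
supported"), whereas the literal IN ('mytext') form typed into cqlsh is
accepted.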


On Tue, Feb 5, 2019 at 3:33 PM Oleksandr Petrov 
wrote:

> Could you post full table schema (names obfuscated, if required) with
> index creation statements and queries?
>
> On Mon, Feb 4, 2019 at 10:04 AM Jacques-Henri Berthemet <
> jacques-henri.berthe...@genesys.com> wrote:
>
>> I’m not sure why it's not allowed by the DataStax driver, but maybe you
>> could try to use OR instead of IN?
>>
>> SELECT blah FROM foo WHERE  = :val1 OR  =
>> :val2 ALLOW FILTERING
>>
>>
>>
>> It should be the same as the IN query, but I don’t know if it makes a difference
>> for performance.
>>
>>
>>
>> From: Peter Heitman 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Monday 4 February 2019 at 07:17
>> To: "user@cassandra.apache.org" 
>> Subject: SASI queries- cqlsh vs java driver
>>

Cassandra 2.1.18 - NPE during startup

2019-02-05 Thread Steinmaurer, Thomas
Hello,

at a particular customer location, we are seeing the following NPE during 
startup with Cassandra 2.1.18.

INFO  [SSTableBatchOpen:2] 2019-02-03 13:32:56,131 SSTableReader.java:475 - 
Opening 
/var/opt/data/cassandra/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-130
 (256 bytes)
ERROR [main] 2019-02-03 13:32:56,552 CassandraDaemon.java:583 - Exception 
encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:672)
 ~[apache-cassandra-2.1.18.jar:2.1.18]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:310) 
[apache-cassandra-2.1.18.jar:2.1.18]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) 
[apache-cassandra-2.1.18.jar:2.1.18]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) 
[apache-cassandra-2.1.18.jar:2.1.18]
Caused by: java.lang.NullPointerException: null
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:664)
 ~[apache-cassandra-2.1.18.jar:2.1.18]
... 3 common frames omitted

I found https://issues.apache.org/jira/browse/CASSANDRA-10501, but this should 
be fixed in 2.1.18.

Is the above log stating that it is caused by a system keyspace related SSTable?

This is a 3 node setup with 2 others running fine. If system table related and 
as LocalStrategy is used as replication strategy (to my knowledge), perhaps 
simply copying over data for the schema_keyspaces table from another node might 
fix it?

Any help appreciated.

Thanks.
Thomas
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


Re: datamodelling

2019-02-05 Thread Jonathan Haddad
We (The Last Pickle) wrote a blog post on scaling time series:
http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html

Rather than an agent_type, you can use an application-determined bucket, so
that agents with more data use more buckets.  That'll keep your partition
sizes under control.  The blog post goes into a bit of detail, I won't
rehash it all here.  It's a pretty standard solution to this problem.
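A minimal sketch of that approach applied to the table from the original post;
the bucket column and the scheme in the comment are assumptions:

    CREATE TABLE IF NOT EXISTS xxx_bucketed (
      agent_id UUID,
      bucket INT,              -- chosen by the application, e.g. row_id % N
      row_id BIGINT,
      path TEXT,
      security_attributes TEXT,
      actor TEXT,
      PRIMARY KEY ((agent_id, bucket), row_id)
    );

Reads for one agent then fan out over its N buckets (N queries, or bucket IN
(...) on the partition key), and busy agents can be assigned a larger N so no
single partition grows unbounded.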

On Tue, Feb 5, 2019 at 11:38 AM Bobbie Haynes  wrote:

> even if I try to create an agent_type it will be the same issue again because
> agent_id and agent_type have the same values...
>
> On Tue, Feb 5, 2019 at 11:36 AM Bobbie Haynes 
> wrote:
>
>> unfortunately I do not have different types of agents (agent_type).. I only
>> have agent_id, which is also a UUID type.
>>
>> On Tue, Feb 5, 2019 at 11:34 AM Nitan Kainth 
>> wrote:
>>
>>> You could consider a pseudo column like agent_type and make it a compound
>>> partition key. It will help break your partition into smaller ones, but you
>>> will have to query with agent_id and agent_type in that case.
>>>
>>> On Tue, Feb 5, 2019 at 12:59 PM Bobbie Haynes 
>>> wrote:
>>>
 Hi Everyone,
   Could you please help me in modeling my table
 below. I'm stuck here. My partition key is agent_id and the clustering column is
 row_id. Each agent can have from a minimum of 1,000 rows up to 10M, depending on
 how busy the agent is. I'm facing a large-partition issue for my busy agents.
 I'm using SizeTieredCompaction here. The table has writes/reads (70/30
 ratio) and also has deletes in the table by agent_id.


 CREATE TABLE IF NOT EXISTS XXX (
  agent_id UUID,
  row_id BIGINT,
  path TEXT,
  security_attributes TEXT,
  actor TEXT,
  PRIMARY KEY (agent_id,row_id)
 )

>>>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: datamodelling

2019-02-05 Thread Bobbie Haynes
even if I try to create an agent_type it will be the same issue again because
agent_id and agent_type have the same values...

On Tue, Feb 5, 2019 at 11:36 AM Bobbie Haynes  wrote:

> unfortunately I do not have different types of agents (agent_type).. I only have
> agent_id, which is also a UUID type.
>
> On Tue, Feb 5, 2019 at 11:34 AM Nitan Kainth 
> wrote:
>
>> You could consider a pseudo column like agent_type and make it a compound
>> partition key. It will help break your partition into smaller ones, but you
>> will have to query with agent_id and agent_type in that case.
>>
>> On Tue, Feb 5, 2019 at 12:59 PM Bobbie Haynes 
>> wrote:
>>
>>> Hi Everyone,
>>>   Could you please help me in modeling my table
>>> below. I'm stuck here. My partition key is agent_id and the clustering column is
>>> row_id. Each agent can have from a minimum of 1,000 rows up to 10M, depending on
>>> how busy the agent is. I'm facing a large-partition issue for my busy agents.
>>> I'm using SizeTieredCompaction here. The table has writes/reads (70/30
>>> ratio) and also has deletes in the table by agent_id.
>>>
>>>
>>> CREATE TABLE IF NOT EXISTS XXX (
>>>  agent_id UUID,
>>>  row_id BIGINT,
>>>  path TEXT,
>>>  security_attributes TEXT,
>>>  actor TEXT,
>>>  PRIMARY KEY (agent_id,row_id)
>>> )
>>>
>>


Re: datamodelling

2019-02-05 Thread Bobbie Haynes
unfortunately I do not have different types of agents (agent_type).. I only have
agent_id, which is also a UUID type.

On Tue, Feb 5, 2019 at 11:34 AM Nitan Kainth  wrote:

> You could consider a pseudo column like agent_type and make it a compound
> partition key. It will help break your partition into smaller ones, but you
> will have to query with agent_id and agent_type in that case.
>
> On Tue, Feb 5, 2019 at 12:59 PM Bobbie Haynes 
> wrote:
>
>> Hi Everyone,
>>   Could you please help me in modeling my table
>> below. I'm stuck here. My partition key is agent_id and the clustering column is
>> row_id. Each agent can have from a minimum of 1,000 rows up to 10M, depending on
>> how busy the agent is. I'm facing a large-partition issue for my busy agents.
>> I'm using SizeTieredCompaction here. The table has writes/reads (70/30
>> ratio) and also has deletes in the table by agent_id.
>>
>>
>> CREATE TABLE IF NOT EXISTS XXX (
>>  agent_id UUID,
>>  row_id BIGINT,
>>  path TEXT,
>>  security_attributes TEXT,
>>  actor TEXT,
>>  PRIMARY KEY (agent_id,row_id)
>> )
>>
>


Re: datamodelling

2019-02-05 Thread Nitan Kainth
You could consider a pseudo column like agent_type and make it a compound
partition key. It will help break your partition into smaller ones, but you
will have to query with agent_id and agent_type in that case.
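Concretely, the compound partition key being suggested would look like this
against the table from the original post (a sketch only; the table name is
made up):

    CREATE TABLE IF NOT EXISTS xxx_by_agent_and_type (
      agent_id UUID,
      agent_type TEXT,
      row_id BIGINT,
      path TEXT,
      security_attributes TEXT,
      actor TEXT,
      PRIMARY KEY ((agent_id, agent_type), row_id)
    );

As Bobbie's replies earlier in this digest note, this only helps if agent_type
actually varies for a given agent_id; otherwise a synthetic bucket column plays
the same role.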

On Tue, Feb 5, 2019 at 12:59 PM Bobbie Haynes  wrote:

> Hi Everyone,
>   Could you please help me in modeling my table
> below. I'm stuck here. My partition key is agent_id and the clustering column is
> row_id. Each agent can have from a minimum of 1,000 rows up to 10M, depending on
> how busy the agent is. I'm facing a large-partition issue for my busy agents.
> I'm using SizeTieredCompaction here. The table has writes/reads (70/30
> ratio) and also has deletes in the table by agent_id.
>
>
> CREATE TABLE IF NOT EXISTS XXX (
>  agent_id UUID,
>  row_id BIGINT,
>  path TEXT,
>  security_attributes TEXT,
>  actor TEXT,
>  PRIMARY KEY (agent_id,row_id)
> )
>


datamodelling

2019-02-05 Thread Bobbie Haynes
Hi Everyone,
  Could you please help me in modeling my table
below. I'm stuck here. My partition key is agent_id and the clustering column is
row_id. Each agent can have from a minimum of 1,000 rows up to 10M, depending on
how busy the agent is. I'm facing a large-partition issue for my busy agents.
I'm using SizeTieredCompaction here. The table has writes/reads (70/30
ratio) and also has deletes in the table by agent_id.


CREATE TABLE IF NOT EXISTS XXX (
 agent_id UUID,
 row_id BIGINT,
 path TEXT,
 security_attributes TEXT,
 actor TEXT,
 PRIMARY KEY (agent_id,row_id)
)


DataStax Accelerate call for papers closing February 15

2019-02-05 Thread Patrick McFadin
Hello my fellow Cassandra people!

I’m not sure if you’ve heard, but DataStax is hosting a conference this year,
DataStax Accelerate. I’m in charge of the speaker part of the conference, so
I’m here to ask for some help. I want a ton of awesome Cassandra content! The
best part of our community is when we share our challenges and successes with
other users. Last year we ran a lot of developer events and the #1 topic
request from attendees was use cases. There are a ton of new Cassandra users
out there and they want to learn from what you have done. Yes. I’m talking to
you.

Here’s the big ask part. The deadline for talk submissions is February 15.
That’s 10 days away! Time to think of that awesome topic and get it in. Do it
now! -> http://bit.ly/DataStaxAccelerate-CallForPapers

But you may have questions:

Q: My use case is boring, why would anyone want to see it?
A: I can guarantee you faced some unique challenges and solved them in a
unique way. Talk sessions are 40 minutes and that forces you to stick to a few
talking points.

Q: This is a DataStax conference so you are only looking for DataStax
customers, right?
A: Nope! I’m looking for Cassandra use cases no matter how you roll it out.
Except no Cassandra 1.x use cases. Sorry Carlos Rolo. I have to draw a line!

Q: I have a cool use case but what if I can’t get approval to talk by
February 15?
A: No worries! Just submit your talk with a note in the abstract that it
should be held pending approvals. I’ll follow-up with you after the deadline
and hold it until you get final approval. I MAY have a few of those already in
the queue. ;)

Q: My talk idea could be cool, can I get some help pulling it together?
A: I’m here to help. If you want to talk out your idea, feel free to email me
or hit me on a Twitter DM, @PatrickMcFadin

If you still don’t want to submit a talk after all that, no problem. We would
love to see you in the halls. There will be a lot of new and familiar faces
from the Cassandra community. A great time to network and pick up some great
ideas. The technical keynote alone will be worth the trip with back-to-back
talks from Jonathan Ellis and Nate McCall.

Where: Gaylord Convention Center just outside of Washington D.C.
When: May 21-23, 2019
Register and take 20% off on me! McFadin20 -> http://bit.ly/DataStaxAccelerate

And finally, if you are new to Cassandra and you want to get some fast
experience building apps, we are hosting an Apache Cassandra Application
Bootcamp the day before the conference. I’ll be there teaching along with a
bunch of other Developer Relations folks from DataStax. We would love to have
you join us -> http://bit.ly/DataStaxAccelerate-Bootcamp

Thanks everyone. I hope to see you at Accelerate in May.

Patrick


Re: coordinator failure handling

2019-02-05 Thread Eric Stevens
This will depend on what driver you're using at the client.  The Java
driver, for example, has ways to configure each of the things you
mentioned, with a variety of implementations you can choose from.  There
are also ways to provide your own custom implementation if you don't like
the options available.

On Tue, Feb 5, 2019 at 8:45 AM amit sehas  wrote:

> Sorry to bother you, I am just starting to look into Cassandra, and am
> confused about a lot of things.
>
> If a client sends a query to a coordinator, then if it does not receive a
> response from the co-ordinator then:
>
> a) is there a timeout at which client retries the query?
> b) is there somewhere we can specify the timeout?
> c) will it resend the query to the same co-ordinator, if not then how does
> it determine which coordinators to send it to?
>
> thanks
>


Re: coordinator failure handling

2019-02-05 Thread Tom Wollert
All below AFAIK

a) The query will only be retried after half the timeout has passed, and only if
the query is idempotent (you have to set that on the prepared statement,
otherwise it will assume it isn't).
b) The query timeout can be set globally in
Cluster.Builder().WithQueryTimeout.
c) The LoadBalancingPolicy should handle that. Generally use
TokenAwarePolicy; it should round-robin through the nodes that are responsible
for the data you are trying to save/access.
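A compact Java driver 3.x sketch of those three knobs; the contact point,
timeout and query are placeholders (the Builder().WithQueryTimeout spelling
above suggests a non-Java driver, so the Java equivalents are shown here):

    // com.datastax.driver.core.* and com.datastax.driver.core.policies.* assumed imported
    Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.1")
        // (b) client-side per-request read timeout
        .withSocketOptions(new SocketOptions().setReadTimeoutMillis(12000))
        // (c) token-aware routing over DC-aware round robin
        .withLoadBalancingPolicy(new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
        .build();
    Session session = cluster.connect();

    // (a) mark statements idempotent so the driver is allowed to retry them
    PreparedStatement ps = session.prepare("SELECT col FROM ks.t WHERE id = ?");
    ps.setIdempotent(true);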

On Tue, 5 Feb 2019 at 15:45, amit sehas  wrote:

> Sorry to bother you, I am just starting to look into Cassandra, and am
> confused about a lot of things.
>
> If a client sends a query to a coordinator, then if it does not receive a
> response from the co-ordinator then:
>
> a) is there a timeout at which client retries the query?
> b) is there somewhere we can specify the timeout?
> c) will it resend the query to the same co-ordinator, if not then how does
> it determine which coordinators to send it to?
>
> thanks
>


-- 
Development Director

| T: 0800 021 0888 | M: 0790 4489797 | www.codeweavers.net |
| Codeweavers Limited | Barn 4 | Dunston Business Village | Dunston | ST18
9AB |
| Registered in England and Wales No. 04092394 | VAT registration no. 974
9705 63 |


coordinator failure handling

2019-02-05 Thread amit sehas
Sorry to bother you, I am just starting to look into Cassandra, and am confused
about a lot of things.

If a client sends a query to a coordinator, then if it does not receive a
response from the coordinator:

a) is there a timeout at which the client retries the query?
b) is there somewhere we can specify the timeout?
c) will it resend the query to the same coordinator, and if not, how does it
determine which coordinators to send it to?

thanks

Re: upgrading cassandra

2019-02-05 Thread Jeff Jirsa
The 3.0 branch is slightly different than the 3.11 branch

For you, going to 3.0.18 would be a minor version upgrade while going to 3.11.4 
would be a major version upgrade

3.11 would give you access to features like CDC and SASI, and performance
improvements like better large-partition support and the in-process chunk cache.

Note that I said 3.0.18 and 3.11.4 - new releases should come out this week or 
early next, and I encourage you to wait for these new versions 

-- 
Jeff Jirsa


> On Feb 5, 2019, at 3:03 AM, Adil  wrote:
> 
> Hi,
> we have a cluster with Cassandra 3.0.9 and we are going to upgrade it to the
> latest version, which I think is 3.11.3, but a teammate told me that the latest
> version is 3.0.17.
> What is the latest stable version?
> 
> thanks in advance.

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



upgrading cassandra

2019-02-05 Thread Adil
Hi,
we have a cluster with Cassandra 3.0.9 and we are going to upgrade it to
the latest version, which I think is 3.11.3, but a teammate told me that the
latest version is 3.0.17.
What is the latest stable version?

thanks in advance.


Re: SASI queries- cqlsh vs java driver

2019-02-05 Thread Oleksandr Petrov
Could you post full table schema (names obfuscated, if required) with index
creation statements and queries?

On Mon, Feb 4, 2019 at 10:04 AM Jacques-Henri Berthemet <
jacques-henri.berthe...@genesys.com> wrote:

> I’m not sure why it's not allowed by the DataStax driver, but maybe you
> could try to use OR instead of IN?
>
> SELECT blah FROM foo WHERE  = :val1 OR  =
> :val2 ALLOW FILTERING
>
>
>
> It should be the same as the IN query, but I don’t know if it makes a difference
> for performance.
>
>
>
> From: Peter Heitman 
> Reply-To: "user@cassandra.apache.org" 
> Date: Monday 4 February 2019 at 07:17
> To: "user@cassandra.apache.org" 
> Subject: SASI queries- cqlsh vs java driver
>
>
>
> When I create a SASI index on a secondary column, from cqlsh I can execute
> a query
>
>
>
> SELECT blah FROM foo WHERE  IN ('mytext') ALLOW FILTERING;
>
>
>
> but not from the java driver:
>
>
>
> SELECT blah FROM foo WHERE  IN :val ALLOW FILTERING
>
>
>
> Here I get an exception
>
>
>
> com.datastax.driver.core.exceptions.InvalidQueryException: IN predicates
> on non-primary-key columns () is not yet supported
>
> at
> com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:49)
> ~[cassandra-driver-core-3.6.0.jar:na]
>
>
>
> Why are they different? Is there anything I can do with the java driver to
> get past this exception?
>
>
>
> Peter
>
>
>
>
>


-- 
alex p