Re: Maximum memory usage

2019-02-06 Thread dinesh.jo...@yahoo.com.INVALID
Are you running any nodetool commands during that period? IIRC, this is a log 
entry emitted by the BufferPool. It is likely harmless unless it's happening very 
often or logging an OOM.
Dinesh 

On Wednesday, February 6, 2019, 6:19:42 AM PST, Rahul Reddy 
 wrote:  
 
 Hello,
I see maximum memory usage alerts in my system.log a couple of times a day, 
logged as INFO. So far I haven't seen any issue with the db. Why are those 
messages logged in system.log? Do they have any impact on reads/writes, and 
what needs to be looked at?
INFO  [RMI TCP Connection(170917)-127.0.0.1] 2019-02-05 23:15:47,408 
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot 
allocate chunk of 1.000MiB

Thanks in advance  
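For anyone monitoring these messages, the NoSpamLogger line can be parsed to track how often the cap is hit. A minimal sketch, assuming the cap corresponds to the `file_cache_size_in_mb` setting in cassandra.yaml (512 MiB by default); the regex here is an illustration, not part of Cassandra:

```python
import re

# Sample BufferPool log line, as seen in system.log.
line = ("NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), "
        "cannot allocate chunk of 1.000MiB")

# Extract the configured cap and the failed allocation size.
m = re.search(r"reached \(([\d.]+)MiB\), cannot allocate chunk of ([\d.]+)MiB", line)
cap_mib, chunk_mib = float(m.group(1)), float(m.group(2))
print(cap_mib, chunk_mib)
```

Counting how many such lines appear per hour is a reasonable way to decide whether the cap is worth raising.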

Re: Two datacenters with one cassandra node in each datacenter

2019-02-06 Thread dinesh.jo...@yahoo.com.INVALID
You also want to use Cassandra with a minimum of 3 nodes.
Dinesh 

On Wednesday, February 6, 2019, 11:26:07 PM PST, dinesh.jo...@yahoo.com 
 wrote:  
 
 Hey Kunal,
Can you add more details about the size of data, read/write throughput, what 
are your latency expectations, etc? What do you mean by "performance" issue 
with replication? Without these details it's a bit tough to answer your 
questions.
Dinesh 

On Wednesday, February 6, 2019, 3:47:05 PM PST, Kunal 
 wrote:  
 
 Hi All,
I need some recommendation on using two datacenters with one node in each 
datacenter. 
 
In our organization, we are trying to have two cassandra datacenters with only 1 
node on each side. From the preliminary investigation, I see replication is 
happening, but I want to know if we can use this deployment in production. Will 
there be any performance issue with replication?

We have already setup 2 datacenters with one node on each datacenter and 
replication is working fine. 

Can you please let me know if this kind of setup is recommended for production 
deployment?
Thanks in anticipation. 
Regards,
Kunal Vaid
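Dinesh's suggestion of a minimum of three nodes comes down to quorum arithmetic: with one node per datacenter the replication factor per DC is 1, so losing any node makes its data unavailable. A small sketch of how many replica losses each replication factor tolerates:

```python
# Quorum is a strict majority of replicas.
def quorum(rf: int) -> int:
    return rf // 2 + 1

for rf in (1, 2, 3):
    q = quorum(rf)
    tolerated = rf - q  # replicas you can lose and still satisfy quorum reads/writes
    print(f"RF={rf}: quorum={q}, can lose {tolerated} replica(s)")
```

Note that RF=2 is no better than RF=1 for quorum availability, which is why three replicas per datacenter is the usual production floor.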

Re: Two datacenters with one cassandra node in each datacenter

2019-02-06 Thread dinesh.jo...@yahoo.com.INVALID
Hey Kunal,
Can you add more details about the size of data, read/write throughput, what 
are your latency expectations, etc? What do you mean by "performance" issue 
with replication? Without these details it's a bit tough to answer your 
questions.
Dinesh 

On Wednesday, February 6, 2019, 3:47:05 PM PST, Kunal 
 wrote:  
 
 HI All,
I need some recommendation on using two datacenters with one node in each 
datacenter. 
 
In our organization, We are trying to have two cassandra dataceters with only 1 
node on each side. From the preliminary investigation, I see replication is 
happening but I want to know if we can use this deployment in production? Will 
there be any performance issue with replication ?

We have already setup 2 datacenters with one node on each datacenter and 
replication is working fine. 

Can you please let me know if this kind of setup is recommended for production 
deployment. 
 Thanks in anticipation. 
 Regards,Kunal Vaid  

Re: Bootstrap keeps failing

2019-02-06 Thread dinesh.jo...@yahoo.com.INVALID
Would it be possible for you to take a thread dump & logs and share them?
Dinesh 

On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON 
 wrote:  
 
 Hello !
I am having a recurrent problem when trying to bootstrap a few new nodes.
Some general info:
   - I am running cassandra 3.0.17
   - We have about 30 nodes in our cluster
   - All healthy nodes have between 60% and 90% used disk space on /var/lib/cassandra

So I create a new node and let auto_bootstrap do its job. After a few days the 
bootstrapping node stops streaming new data but is still not a member of the 
cluster.
`nodetool status` says the node is still joining.
When this happens I run `nodetool bootstrap resume`. This usually ends up in 
two different ways:
   - The node fills up to 100% disk space and crashes.
   - The bootstrap resume finishes with errors
When I look at `nodetool netstats -H` it looks like `bootstrap resume` does 
not resume but restarts a full transfer of all data from every node.
This is the output I get from `nodetool bootstrap resume`:

[2019-02-06 01:39:14,369] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:16,821] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:17,003] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress: 2113%)

[2019-02-06 01:41:15,160] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:02,864] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:09,284] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:10,522] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:10,622] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:11,925] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db
 (progress: 2114%)

[2019-02-06 01:42:14,887] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db
 (progress: 2114%)

[2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress: 2114%)

[2019-02-06 01:42:14,980] Stream failed

[2019-02-06 01:42:14,982] Error during bootstrap: Stream failed

[2019-02-06 01:42:14,982] Resume bootstrap complete

  The bootstrap `progress` goes way over 100% and eventually fails.
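A hedged guess at the >100% figure: if each `bootstrap resume` restarts the streams while the received counter keeps accumulating against the original session total, the reported percentage would inflate exactly like this. The numbers below are purely illustrative, not Cassandra's actual accounting:

```python
# Hypothetical progress accounting across restarted bootstrap attempts.
total_files = 480
received_per_attempt = [480, 480, 480, 480, 140]  # four full restarts plus a partial one

received = sum(received_per_attempt)          # cumulative, never reset
progress = 100 * received / total_files       # sails past 100%
print(f"{progress:.0f}%")
```

If that is what is happening, the node ends up re-downloading data it already holds, which would also explain the disk filling to 100%.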

Right now I have a node with this output from `nodetool status` : `UJ  
10.16.XX.YYY  2.93 TB    256          ?                 
5788f061-a3c0-46af-b712-ebeecd397bf7  c`
It is almost filled with data, yet if I look at `nodetool netstats` :
        Receiving 480 files, 325.39 GB total. Already received 5 files, 68.32 
MB total
        Receiving 499 files, 328.96 GB total. Already received 1 files, 1.32 GB 
total
        Receiving 506 files, 345.33 GB total. Already received 6 files, 24.19 
MB total
        Receiving 362 files, 206.73 GB total. Already received 7 files, 34 MB 
total
        Receiving 424 files, 281.25 GB total. Already received 1 files, 1.3 GB 
total
        Receiving 581 files, 349.26 GB total. Already received 8 files, 45.96 
MB total
        Receiving 443 files, 337.26 GB total. Already received 6 files, 96.15 
MB total
        Receiving 424 files, 275.23 GB total. Already received 5 files, 42.67 
MB total

It is trying to pull all the data again.
Am I missing something about the way `nodetool bootstrap resume` is supposed to 
be used?
Regards,
Leo
  

Re: SASI queries- cqlsh vs java driver

2019-02-06 Thread Peter Heitman
Yes, I have read the material. The problem is that the application has a
query facility available to the user where they can type in "(A = foo AND B
= bar) OR C = chex" where A, B, and C are from a defined list of terms,
many of which are columns in the mytable below while others are from other
tables. This query facility was implemented and shipped years before we
decided to move to Cassandra.

On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman 
wrote:

> The problem is you’re not using a query first design.  I would recommend
> first reading chapter 5 of Cassandra: The Definitive Guide by Jeff
> Carpenter and Eben Hewitt.  It’s available free online at this link
> 
> .
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:pe...@heitman.us]
> *Sent:* Wednesday, February 06, 2019 6:33 PM
>
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> Yes, I "know" that allow filtering is a sign of a (possibly fatal)
> inefficient data model. I haven't figured out how to do it correctly yet
>
> On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman 
> wrote:
>
> Exactly.  When you design your data model correctly you shouldn’t have to
> use ALLOW FILTERING in the queries.  That is not recommended.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:pe...@heitman.us]
> *Sent:* Wednesday, February 06, 2019 6:09 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> You are completely right! My problem is that I am trying to port code for
> SQL to CQL for an application that provides the user with a relatively
> general search facility. The original implementation didn't worry about
> secondary indexes - it just took advantage of the ability to create
> arbitrarily complex queries with inner joins, left joins, etc. I am
> reimplementing it to create a parse tree of CQL queries and doing the ANDs
> and ORs in the application. Of course once I get enough of this implemented
> I will have to load up the table with a large data set and see if it gives
> acceptable performance for our use case.
>
> On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman 
> wrote:
>
> Isn’t that a lot of SASI indexes for one table.  Could you denormalize
> more to reduce both columns per table and SASI indexes per table?  Eight
> SASI indexes on one table seems like a lot.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:pe...@heitman.us]
> *Sent:* Tuesday, February 05, 2019 6:59 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> The table and secondary indexes look generally like this. Note that I have
> changed the names of many of the columns to be generic since they aren't
> important to the question as far as I know. I left the actual names for
> those columns that I've created SASI indexes for. The query I use to try to
> create a PreparedStatement is:
>
>
>
> SELECT sql_id, type, cpe_id, serial, product_class, manufacturer,
> sw_version FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING
>
>
>
> the schema cql statements are:
>
>
>
> CREATE TABLE IF NOT EXISTS mykeyspace.mytable (
>
>   id text,
>
>   sql_id bigint,
>
>   cpe_id text,
>
>   sw_version text,
>
>   hw_version text,
>
>   manufacturer text,
>
>   product_class text,
>
>   manufacturer_oui text,
>
>   description text,
>
>   periodic_inform_interval text,
>
>   restricted_mode_enabled text,
>
>   restricted_mode_reason text,
>
>   type text,
>
>   model_name text,
>
>   serial text,
>
>   mac text,
>
>text,
>
>   generic0 timestamp,
>
>   household_id text,
>
>   generic1 int,
>
>   generic2 text,
>
>   generic3 text,
>
>   generic4 int,
>
>   generic5 int,
>
>   generic6 text,
>
>   generic7 text,
>
>   generic8 text,
>
>   generic9 text,
>
>   generic10 text,
>
>   generic11 timestamp,
>
>   generic12 text,
>
>   generic13 text,
>
>   generic14 timestamp,
>
>   generic15 text,
>
>   generic16 text,
>
>   generic17 text,
>
>   generic18 text,
>
>   generic19 text,
>
>   generic20 text,
>
>   generic21 text,
>
>   generic22 text,
>
>   generic23 text,
>
>   generic24 text,
>
>   generic25 text,
>
>   generic26 text,
>
>   generic27 text,
>
>   generic28 int,
>
>   generic29 int,
>
>   generic30 text,
>
>   generic31 text,
>
>   generic32 text,
>
>   generic33 text,
>
>   generic34 text,
>
>   generic35 int,
>
>   generic36 int,
>
>   generic37 int,
>
>   generic38 int,
>
>   generic39 text,
>
>   generic40 text,
>
>   generic41 text,
>
>   generic42 text,
>
>   generic43 text,
>
>   generic44 text,
>
>   generic45 text,
>
>   PRIMARY KEY (id)
>
> );
>
>
>
> CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);
>
>
>
> CREATE CUSTOM INDEX IF 

How to read the Index.db file

2019-02-06 Thread Pranay akula
I was trying to get all the partitions of a particular SSTable. I have tried
reading the Index.db file; I can read some of it but not all of it. Is
there any way to convert it to a readable format?


Thanks
Pranay
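One option, assuming Cassandra 3.x tooling: `sstabledump` (shipped under tools/bin) prints an SSTable as JSON, and its `-e` flag enumerates only the partition keys, which avoids decoding Index.db by hand. A sketch that just builds the command line (the SSTable path is hypothetical):

```python
import shlex

# sstabledump is pointed at the Data.db component of the SSTable;
# -e enumerates partition keys only, without dumping full row data.
sstable = "/var/lib/cassandra/data/mykeyspace/mytable-1/mc-1-big-Data.db"
cmd = ["sstabledump", "-e", sstable]

cmdline = " ".join(shlex.quote(p) for p in cmd)
print(cmdline)
```

Running the printed command on a node with the Cassandra tools installed should list every partition key in the SSTable.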


RE: SASI queries- cqlsh vs java driver

2019-02-06 Thread Kenneth Brotman
The problem is you’re not using a query first design.  I would recommend first 
reading chapter 5 of Cassandra: The Definitive Guide by Jeff Carpenter and Eben 
Hewitt.  It’s available free online at this link 

 .

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Wednesday, February 06, 2019 6:33 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

Yes, I "know" that allow filtering is a sign of a (possibly fatal) inefficient 
data model. I haven't figured out how to do it correctly yet 

On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman  
wrote:

Exactly.  When you design your data model correctly you shouldn’t have to use 
ALLOW FILTERING in the queries.  That is not recommended.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Wednesday, February 06, 2019 6:09 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

You are completely right! My problem is that I am trying to port code for SQL 
to CQL for an application that provides the user with a relatively general 
search facility. The original implementation didn't worry about secondary 
indexes - it just took advantage of the ability to create arbitrarily complex 
queries with inner joins, left joins, etc. I am reimplementing it to create a 
parse tree of CQL queries and doing the ANDs and ORs in the application. Of 
course once I get enough of this implemented I will have to load up the table 
with a large data set and see if it gives acceptable performance for our use 
case. 

On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman  
wrote:

Isn’t that a lot of SASI indexes for one table.  Could you denormalize more to 
reduce both columns per table and SASI indexes per table?  Eight SASI indexes 
on one table seems like a lot.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Tuesday, February 05, 2019 6:59 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

The table and secondary indexes look generally like this. Note that I have 
changed the names of many of the columns to be generic since they aren't 
important to the question as far as I know. I left the actual names for those 
columns that I've created SASI indexes for. The query I use to try to create a 
PreparedStatement is:

 

SELECT sql_id, type, cpe_id, serial, product_class, manufacturer, sw_version 
FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING

 

the schema cql statements are:

 

CREATE TABLE IF NOT EXISTS mykeyspace.mytable ( 

  id text,

  sql_id bigint,

  cpe_id text,

  sw_version text,

  hw_version text,

  manufacturer text,

  product_class text,

  manufacturer_oui text,

  description text,

  periodic_inform_interval text,

  restricted_mode_enabled text,

  restricted_mode_reason text,

  type text,

  model_name text,

  serial text,

  mac text,

   text,

  generic0 timestamp, 

  household_id text,

  generic1 int, 

  generic2 text,

  generic3 text,

  generic4 int,

  generic5 int,

  generic6 text,

  generic7 text,

  generic8 text,

  generic9 text,

  generic10 text,

  generic11 timestamp,

  generic12 text,

  generic13 text,

  generic14 timestamp,

  generic15 text,

  generic16 text,

  generic17 text,

  generic18 text,

  generic19 text,

  generic20 text,

  generic21 text,

  generic22 text,

  generic23 text,

  generic24 text,

  generic25 text,

  generic26 text,

  generic27 text,

  generic28 int,

  generic29 int,

  generic30 text,

  generic31 text,

  generic32 text,

  generic33 text,

  generic34 text,

  generic35 int,

  generic36 int,

  generic37 int,

  generic38 int,

  generic39 text,

  generic40 text,

  generic41 text,

  generic42 text,

  generic43 text,

  generic44 text,

  generic45 text,

  PRIMARY KEY (id)

);

 

CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable (serial)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable (cpe_id)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'orgapache.c

Re: SASI queries- cqlsh vs java driver

2019-02-06 Thread Peter Heitman
Yes, I "know" that allow filtering is a sign of a (possibly fatal)
inefficient data model. I haven't figured out how to do it correctly yet

On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman 
wrote:

> Exactly.  When you design your data model correctly you shouldn’t have to
> use ALLOW FILTERING in the queries.  That is not recommended.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:pe...@heitman.us]
> *Sent:* Wednesday, February 06, 2019 6:09 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> You are completely right! My problem is that I am trying to port code for
> SQL to CQL for an application that provides the user with a relatively
> general search facility. The original implementation didn't worry about
> secondary indexes - it just took advantage of the ability to create
> arbitrarily complex queries with inner joins, left joins, etc. I am
> reimplementing it to create a parse tree of CQL queries and doing the ANDs
> and ORs in the application. Of course once I get enough of this implemented
> I will have to load up the table with a large data set and see if it gives
> acceptable performance for our use case.
>
> On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman 
> wrote:
>
> Isn’t that a lot of SASI indexes for one table.  Could you denormalize
> more to reduce both columns per table and SASI indexes per table?  Eight
> SASI indexes on one table seems like a lot.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:pe...@heitman.us]
> *Sent:* Tuesday, February 05, 2019 6:59 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> The table and secondary indexes look generally like this. Note that I have
> changed the names of many of the columns to be generic since they aren't
> important to the question as far as I know. I left the actual names for
> those columns that I've created SASI indexes for. The query I use to try to
> create a PreparedStatement is:
>
>
>
> SELECT sql_id, type, cpe_id, serial, product_class, manufacturer,
> sw_version FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING
>
>
>
> the schema cql statements are:
>
>
>
> CREATE TABLE IF NOT EXISTS mykeyspace.mytable (
>
>   id text,
>
>   sql_id bigint,
>
>   cpe_id text,
>
>   sw_version text,
>
>   hw_version text,
>
>   manufacturer text,
>
>   product_class text,
>
>   manufacturer_oui text,
>
>   description text,
>
>   periodic_inform_interval text,
>
>   restricted_mode_enabled text,
>
>   restricted_mode_reason text,
>
>   type text,
>
>   model_name text,
>
>   serial text,
>
>   mac text,
>
>text,
>
>   generic0 timestamp,
>
>   household_id text,
>
>   generic1 int,
>
>   generic2 text,
>
>   generic3 text,
>
>   generic4 int,
>
>   generic5 int,
>
>   generic6 text,
>
>   generic7 text,
>
>   generic8 text,
>
>   generic9 text,
>
>   generic10 text,
>
>   generic11 timestamp,
>
>   generic12 text,
>
>   generic13 text,
>
>   generic14 timestamp,
>
>   generic15 text,
>
>   generic16 text,
>
>   generic17 text,
>
>   generic18 text,
>
>   generic19 text,
>
>   generic20 text,
>
>   generic21 text,
>
>   generic22 text,
>
>   generic23 text,
>
>   generic24 text,
>
>   generic25 text,
>
>   generic26 text,
>
>   generic27 text,
>
>   generic28 int,
>
>   generic29 int,
>
>   generic30 text,
>
>   generic31 text,
>
>   generic32 text,
>
>   generic33 text,
>
>   generic34 text,
>
>   generic35 int,
>
>   generic36 int,
>
>   generic37 int,
>
>   generic38 int,
>
>   generic39 text,
>
>   generic40 text,
>
>   generic41 text,
>
>   generic42 text,
>
>   generic43 text,
>
>   generic44 text,
>
>   generic45 text,
>
>   PRIMARY KEY (id)
>
> );
>
>
>
> CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable
> (serial)
>
>USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable
> (cpe_id)
>
>USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)
>
>USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_idx ON
> mykeyspace.mytable (manufacturer)
>
>USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonToke

RE: SASI queries- cqlsh vs java driver

2019-02-06 Thread Kenneth Brotman
Exactly.  When you design your data model correctly you shouldn’t have to use 
ALLOW FILTERING in the queries.  That is not recommended.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Wednesday, February 06, 2019 6:09 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

You are completely right! My problem is that I am trying to port code for SQL 
to CQL for an application that provides the user with a relatively general 
search facility. The original implementation didn't worry about secondary 
indexes - it just took advantage of the ability to create arbitrarily complex 
queries with inner joins, left joins, etc. I am reimplementing it to create a 
parse tree of CQL queries and doing the ANDs and ORs in the application. Of 
course once I get enough of this implemented I will have to load up the table 
with a large data set and see if it gives acceptable performance for our use 
case. 

On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman  
wrote:

Isn’t that a lot of SASI indexes for one table.  Could you denormalize more to 
reduce both columns per table and SASI indexes per table?  Eight SASI indexes 
on one table seems like a lot.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Tuesday, February 05, 2019 6:59 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

The table and secondary indexes look generally like this. Note that I have 
changed the names of many of the columns to be generic since they aren't 
important to the question as far as I know. I left the actual names for those 
columns that I've created SASI indexes for. The query I use to try to create a 
PreparedStatement is:

 

SELECT sql_id, type, cpe_id, serial, product_class, manufacturer, sw_version 
FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING

 

the schema cql statements are:

 

CREATE TABLE IF NOT EXISTS mykeyspace.mytable ( 

  id text,

  sql_id bigint,

  cpe_id text,

  sw_version text,

  hw_version text,

  manufacturer text,

  product_class text,

  manufacturer_oui text,

  description text,

  periodic_inform_interval text,

  restricted_mode_enabled text,

  restricted_mode_reason text,

  type text,

  model_name text,

  serial text,

  mac text,

   text,

  generic0 timestamp, 

  household_id text,

  generic1 int, 

  generic2 text,

  generic3 text,

  generic4 int,

  generic5 int,

  generic6 text,

  generic7 text,

  generic8 text,

  generic9 text,

  generic10 text,

  generic11 timestamp,

  generic12 text,

  generic13 text,

  generic14 timestamp,

  generic15 text,

  generic16 text,

  generic17 text,

  generic18 text,

  generic19 text,

  generic20 text,

  generic21 text,

  generic22 text,

  generic23 text,

  generic24 text,

  generic25 text,

  generic26 text,

  generic27 text,

  generic28 int,

  generic29 int,

  generic30 text,

  generic31 text,

  generic32 text,

  generic33 text,

  generic34 text,

  generic35 int,

  generic36 int,

  generic37 int,

  generic38 int,

  generic39 text,

  generic40 text,

  generic41 text,

  generic42 text,

  generic43 text,

  generic44 text,

  generic45 text,

  PRIMARY KEY (id)

);

 

CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable (serial)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable (cpe_id)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_idx ON mykeyspace.mytable 
(manufacturer)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_oui_idx ON mykeyspace.mytable 
(manufacturer_oui)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_hw_version_idx ON mykeyspace.mytable 
(hw_version)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTI

Re: SASI queries- cqlsh vs java driver

2019-02-06 Thread Peter Heitman
You are completely right! My problem is that I am trying to port code for
SQL to CQL for an application that provides the user with a relatively
general search facility. The original implementation didn't worry about
secondary indexes - it just took advantage of the ability to create
arbitrarily complex queries with inner joins, left joins, etc. I am
reimplementing it to create a parse tree of CQL queries and doing the ANDs
and ORs in the application. Of course once I get enough of this implemented
I will have to load up the table with a large data set and see if it gives
acceptable performance for our use case.
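The application-side AND/OR combination described here can be sketched as plain set algebra over the primary keys returned by each leaf query; all names and values below are illustrative, not from the actual schema:

```python
# Each set stands in for the primary keys returned by one narrow CQL query,
# e.g. SELECT id FROM mytable WHERE <single indexed predicate>.
rows_a = {"id1", "id2", "id3"}   # hypothetical result of: WHERE a = 'foo'
rows_b = {"id2", "id3"}          # hypothetical result of: WHERE b = 'bar'
rows_c = {"id4"}                 # hypothetical result of: WHERE c = 'chex'

# "(A = foo AND B = bar) OR C = chex" becomes intersection then union.
result = (rows_a & rows_b) | rows_c
print(sorted(result))
```

The caveat, as noted, is that each leaf query must be cheap and selective; combining large intermediate result sets in the application is where this approach can fall over at scale.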

On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman 
wrote:

> Isn’t that a lot of SASI indexes for one table.  Could you denormalize
> more to reduce both columns per table and SASI indexes per table?  Eight
> SASI indexes on one table seems like a lot.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:pe...@heitman.us]
> *Sent:* Tuesday, February 05, 2019 6:59 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> The table and secondary indexes look generally like this. Note that I have
> changed the names of many of the columns to be generic since they aren't
> important to the question as far as I know. I left the actual names for
> those columns that I've created SASI indexes for. The query I use to try to
> create a PreparedStatement is:
>
>
>
> SELECT sql_id, type, cpe_id, serial, product_class, manufacturer,
> sw_version FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING
>
>
>
> the schema cql statements are:
>
>
>
> CREATE TABLE IF NOT EXISTS mykeyspace.mytable (
>
>   id text,
>
>   sql_id bigint,
>
>   cpe_id text,
>
>   sw_version text,
>
>   hw_version text,
>
>   manufacturer text,
>
>   product_class text,
>
>   manufacturer_oui text,
>
>   description text,
>
>   periodic_inform_interval text,
>
>   restricted_mode_enabled text,
>
>   restricted_mode_reason text,
>
>   type text,
>
>   model_name text,
>
>   serial text,
>
>   mac text,
>
>text,
>
>   generic0 timestamp,
>
>   household_id text,
>
>   generic1 int,
>
>   generic2 text,
>
>   generic3 text,
>
>   generic4 int,
>
>   generic5 int,
>
>   generic6 text,
>
>   generic7 text,
>
>   generic8 text,
>
>   generic9 text,
>
>   generic10 text,
>
>   generic11 timestamp,
>
>   generic12 text,
>
>   generic13 text,
>
>   generic14 timestamp,
>
>   generic15 text,
>
>   generic16 text,
>
>   generic17 text,
>
>   generic18 text,
>
>   generic19 text,
>
>   generic20 text,
>
>   generic21 text,
>
>   generic22 text,
>
>   generic23 text,
>
>   generic24 text,
>
>   generic25 text,
>
>   generic26 text,
>
>   generic27 text,
>
>   generic28 int,
>
>   generic29 int,
>
>   generic30 text,
>
>   generic31 text,
>
>   generic32 text,
>
>   generic33 text,
>
>   generic34 text,
>
>   generic35 int,
>
>   generic36 int,
>
>   generic37 int,
>
>   generic38 int,
>
>   generic39 text,
>
>   generic40 text,
>
>   generic41 text,
>
>   generic42 text,
>
>   generic43 text,
>
>   generic44 text,
>
>   generic45 text,
>
>   PRIMARY KEY (id)
>
> );
>
>
>
> CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable
> (serial)
>
>USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable
> (cpe_id)
>
>USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)
>
>USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_idx ON
> mykeyspace.mytable (manufacturer)
>
>USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_oui_idx ON
> mykeyspace.mytable (manufacturer_oui)
>
>USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_hw_version_idx ON mykeyspace.mytable
> (hw_version)
>
>USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>WITH OPTIONS = {'mode': 'CON

RE: Bootstrap keeps failing

2019-02-06 Thread Kenneth Brotman
Not sure off hand why that is happening but could you try bootstrapping that 
node from scratch again or try a different new node?
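In practice "from scratch" means stopping the joining node and clearing its data, commit log, and saved caches before restarting, so auto_bootstrap begins cleanly. A minimal sketch of the clearing step, using a throwaway directory so it can be exercised safely (on a real node the root would be /var/lib/cassandra, and the stop/start would be done via your service manager):

```shell
# Sketch: clear a (hypothetical) Cassandra data root so the next start
# re-bootstraps from scratch. DATA_ROOT is a throwaway demo directory here;
# on a real node it would be /var/lib/cassandra, with the node stopped first.
DATA_ROOT="${DATA_ROOT:-/tmp/cassandra-bootstrap-demo}"
mkdir -p "$DATA_ROOT/data" "$DATA_ROOT/commitlog" "$DATA_ROOT/saved_caches"
rm -rf "$DATA_ROOT/data" "$DATA_ROOT/commitlog" "$DATA_ROOT/saved_caches"
```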

 

Kenneth Brotman

 

From: Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID] 
Sent: Wednesday, February 06, 2019 9:15 AM
To: user@cassandra.apache.org
Subject: Bootstrap keeps failing

 

Hello !

 

I am having a recurrent problem when trying to bootstrap a few new nodes.

 

Some general info : 

*   I am running cassandra 3.0.17
*   We have about 30 nodes in our cluster
*   All healthy nodes have between 60% to 90% used disk space on 
/var/lib/cassandra

So I create a new node and let auto_bootstrap do its job. After a few days the 
bootstrapping node stops streaming new data but is still not a member of the 
cluster.

 

`nodetool status` says the node is still joining.

 

When this happens I run `nodetool bootstrap resume`. This usually ends up in 
two different ways :

1.  The node fills up to 100% disk space and crashes.
2.  The bootstrap resume finishes with errors

When I look at `nodetool netstats -H` it looks like `bootstrap resume` does 
not resume but restarts a full transfer of all data from every node.

 

This is the output I get from `nodetool bootstrap resume` :

[2019-02-06 01:39:14,369] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:16,821] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:17,003] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress: 2113%)

[2019-02-06 01:41:15,160] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:02,864] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:09,284] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:10,522] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:10,622] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:11,925] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db
 (progress: 2114%)

[2019-02-06 01:42:14,887] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db
 (progress: 2114%)

[2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress: 2114%)

[2019-02-06 01:42:14,980] Stream failed

[2019-02-06 01:42:14,982] Error during bootstrap: Stream failed

[2019-02-06 01:42:14,982] Resume bootstrap complete

  

The bootstrap `progress` goes way over 100% and eventually fails.

 

 

Right now I have a node with this output from `nodetool status` : 

`UJ  10.16.XX.YYY  2.93 TB  256  ?  5788f061-a3c0-46af-b712-ebeecd397bf7  c`

 

It is almost filled with data, yet if I look at `nodetool netstats` :

Receiving 480 files, 325.39 GB total. Already received 5 files, 68.32 
MB total
Receiving 499 files, 328.96 GB total. Already received 1 files, 1.32 GB 
total
Receiving 506 files, 345.33 GB total. Already received 6 files, 24.19 
MB total
Receiving 362 files, 206.73 GB total. Already received 7 files, 34 MB 
total
Receiving 424 files, 281.25 GB total. Already received 1 files, 1.3 GB 
total
Receiving 581 files, 349.26 GB total. Already received 8 files, 45.96 
MB total
Receiving 443 files, 337.26 GB total. Already received 6 files, 96.15 
MB total
Receiving 424 files, 275.23 GB total. Already received 5 files, 42.67 
MB total

 

It is trying to pull all the data again.

 

Am I missing something about the way `nodetool bootstrap resume` is supposed to 
be used?

 

Regards,

 

Leo

 



RE: Maximum memory usage

2019-02-06 Thread Kenneth Brotman
Can you give us the “nodetool tablehistograms” output?

 

Kenneth Brotman

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Wednesday, February 06, 2019 6:19 AM
To: user@cassandra.apache.org
Subject: Maximum memory usage

 

Hello,

 

I see maximum memory usage alerts in my system.log a couple of times a day, 
logged as INFO. So far I haven't seen any issue with the db. Why are those 
messages logged in system.log? Do they have any impact on reads/writes? And 
what needs to be looked at?

 

INFO  [RMI TCP Connection(170917)-127.0.0.1] 2019-02-05 23:15:47,408 
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot 
allocate chunk of 1.000MiB

 

Thanks in advance



RE: Two datacenters with one cassandra node in each datacenter

2019-02-06 Thread Kenneth Brotman
Hi Kunal,

 

The short answer is absolutely not; that’s not what Cassandra is for.  
Cassandra is a distributed database for when you have too much data for one 
machine.

 

Kenneth Brotman

 

From: Kunal [mailto:kunal.v...@gmail.com] 
Sent: Wednesday, February 06, 2019 3:47 PM
To: user@cassandra.apache.org
Subject: Two datacenters with one cassandra node in each datacenter

 

Hi All,

 

I need some recommendation on using two datacenters with one node in each 
datacenter. 

In our organization, we are trying to have two Cassandra datacenters with only 
one node on each side. From the preliminary investigation, I see replication is 
happening, but I want to know if we can use this deployment in production. Will 
there be any performance issue with replication?

We have already set up 2 datacenters with one node on each datacenter and 
replication is working fine. 

Can you please let me know if this kind of setup is recommended for production 
deployment?

Thanks in anticipation. 

 

Regards,

Kunal Vaid



Two datacenters with one cassandra node in each datacenter

2019-02-06 Thread Kunal
Hi All,

I need some recommendation on using two datacenters with one node in each
datacenter.

In our organization, we are trying to have two Cassandra datacenters with
only one node on each side. From the preliminary investigation, I see
replication is happening, but I want to know if we can use this deployment
in production. Will there be any performance issue with replication?

We have already set up 2 datacenters with one node on each datacenter and
replication is working fine.
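For context on how that cross-DC replication is declared: it is a per-keyspace setting, so a minimal sketch looks like the following (keyspace and datacenter names are hypothetical; with only one node per DC, the replication factor in each DC cannot usefully exceed 1):

```sql
-- Hypothetical keyspace: replication is configured per keyspace, per DC.
-- With a single node in each datacenter, RF is capped at 1 per DC, so there
-- is no redundancy inside a datacenter; losing that node loses the DC.
CREATE KEYSPACE IF NOT EXISTS myks
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 1,
    'dc2': 1
  };
```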

Can you please let me know if this kind of setup is recommended for
production deployment?
Thanks in anticipation.

Regards,
Kunal Vaid


Bootstrap keeps failing

2019-02-06 Thread Léo FERLIN SUTTON
Hello !

I am having a recurrent problem when trying to bootstrap a few new nodes.

Some general info :

   - I am running cassandra 3.0.17
   - We have about 30 nodes in our cluster
   - All healthy nodes have between 60% to 90% used disk space on
   /var/lib/cassandra

So I create a new node and let auto_bootstrap do its job. After a few days
the bootstrapping node stops streaming new data but is still not a member
of the cluster.

`nodetool status` says the node is still joining.

When this happens I run `nodetool bootstrap resume`. This usually ends up
in two different ways :

   1. The node fills up to 100% disk space and crashes.
   2. The bootstrap resume finishes with errors

When I look at `nodetool netstats -H` it looks like `bootstrap resume`
does not resume but restarts a full transfer of all data from every node.

This is the output I get from `nodetool bootstrap resume` :

> [2019-02-06 01:39:14,369] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db
>> (progress: 2113%)
>
> [2019-02-06 01:39:16,821] received file
>> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db
>> (progress: 2113%)
>
> [2019-02-06 01:39:17,003] received file
>> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db
>> (progress: 2113%)
>
> [2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress:
>> 2113%)
>
> [2019-02-06 01:41:15,160] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db
>> (progress: 2113%)
>
> [2019-02-06 01:42:02,864] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db
>> (progress: 2113%)
>
> [2019-02-06 01:42:09,284] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db
>> (progress: 2113%)
>
> [2019-02-06 01:42:10,522] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db
>> (progress: 2113%)
>
> [2019-02-06 01:42:10,622] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db
>> (progress: 2113%)
>
> [2019-02-06 01:42:11,925] received file
>> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db
>> (progress: 2114%)
>
> [2019-02-06 01:42:14,887] received file
>> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db
>> (progress: 2114%)
>
> [2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress:
>> 2114%)
>
> [2019-02-06 01:42:14,980] Stream failed
>
> [2019-02-06 01:42:14,982] Error during bootstrap: Stream failed
>
> [2019-02-06 01:42:14,982] Resume bootstrap complete
>
>
The bootstrap `progress` goes way over 100% and eventually fails.


Right now I have a node with this output from `nodetool status` :
`UJ  10.16.XX.YYY  2.93 TB  256  ?  5788f061-a3c0-46af-b712-ebeecd397bf7  c`

It is almost filled with data, yet if I look at `nodetool netstats` :

> Receiving 480 files, 325.39 GB total. Already received 5 files,
> 68.32 MB total
> Receiving 499 files, 328.96 GB total. Already received 1 files,
> 1.32 GB total
> Receiving 506 files, 345.33 GB total. Already received 6 files,
> 24.19 MB total
> Receiving 362 files, 206.73 GB total. Already received 7 files, 34
> MB total
> Receiving 424 files, 281.25 GB total. Already received 1 files,
> 1.3 GB total
> Receiving 581 files, 349.26 GB total. Already received 8 files,
> 45.96 MB total
> Receiving 443 files, 337.26 GB total. Already received 6 files,
> 96.15 MB total
> Receiving 424 files, 275.23 GB total. Already received 5 files,
> 42.67 MB total


It is trying to pull all the data again.

Am I missing something about the way `nodetool bootstrap resume` is
supposed to be used?

Regards,

Leo


Re: Revive a downed node with a different IP address

2019-02-06 Thread Jeff Jirsa
On Wed, Feb 6, 2019 at 5:47 AM Antoine d'Otreppe 
wrote:

> Hi all,
>
> New to Cassandra, I'm trying to wrap my head around how dead nodes should
> be revived.
>
>
> Specifically, we deployed our cluster in Kubernetes, which means that
> nodes that go down will lose their IP address. When restarted, it is
> possible that:
>
> 1. their IP address changes
>

This in itself is not a problem, but


> 2. their new IP address is that of another downed node.
>

This ends up being a huge problem in cassandra with K8s. Since we use just
the bare IP as the key for some data structures, re-using the IP of another
down instance basically (incorrectly) removes it from the ring.


>
> I spent the last two days looking for, and reading, possible solutions
> online. However I could not find any recent or working solution (any link
> would be appreciated). I've seen plenty of hacks where people would define
> one k8s service per node, but that sounds like a burdensome and fragile
> solution.
>

What you may want to consider as a workaround is starting a pod and then
interrogating the assigned IP to see if it already exists / DOWN in the
cluster before you issue the start command for Cassandra itself.
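A minimal sketch of that pre-start check (all names are hypothetical; the `nodetool status` text is passed in as a string so the logic can be exercised without a live cluster):

```shell
# Return success (0) if the candidate IP is listed as a DOWN ("DN") node in
# the supplied `nodetool status` output, i.e. starting Cassandra on this IP
# would incorrectly shadow a dead cluster member.
ip_is_down_node() {
  candidate_ip="$1"
  status_output="$2"
  printf '%s\n' "$status_output" | grep '^DN' | grep -qw "$candidate_ip"
}
```

In an init wrapper this might run as something like `ip_is_down_node "$(hostname -i)" "$(nodetool status)" && exit 1` before exec'ing Cassandra.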


>
> My current understanding is that a node should be able to be revived and
> get its missing data from hinted handoff if it wasn't down longer than
> max_hint_handoff_window.
>
> Or, if that window is exceeded, a repair would be needed. In any case, it's
> possible that the data is still available, and I'd like to avoid having to
> stream everything from zero from the other nodes.
>

This is correct. If you don't do deletes, then the max_hint_handoff_window
becomes MUCH less important.


>
> I also looked into -Dcassandra.replace_address, but I feel like that would
> trigger a new token assignment, and again lots of streaming.
>

It puts a new instance on top of the old, down instance. Strictly speaking,
you'd want to run repair BEFORE you start streaming or you violate
consistency, so you'd have to repair, then re-stream a whole instance of
data.
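
For completeness, the replace flag goes into the JVM options of the replacement instance; a hedged sketch (the IP below is the redacted one from this thread, substitute the dead node's old address):

```shell
# cassandra-env.sh on the replacement instance (sketch; IP is hypothetical).
# replace_address_first_boot is only honored on the instance's first start,
# so unlike plain replace_address it does not need to be removed afterwards.
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.16.XX.YYY"
```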


>
> Finally there's one thing unclear to me as of yet (forgetting that dynamic
> IP address and kubernetes stuff): say I have several downed nodes, in the
> "DN" state. When one of those nodes is restarted, will it go through the
> "UJ" state?
>

No, it'll go straight to UN


> In other words, can I restart all downed nodes at once, or should I still
> respect the 2 minute rule?
>

You can restart them all at once.


>
> And how would that work with dynamic IP addresses?
>

As mentioned before, you need a way to avoid having a restarting instance
take the IP of another instance already in the cluster.


>
>
> tl;dr: is there any updated documentation on how to revive nodes
> consistently when static IP addresses can't be assigned?
>


Probably not documented, no.


>
>
> Best regards,
> Antoine
>


RE: SASI queries- cqlsh vs java driver

2019-02-06 Thread Kenneth Brotman
Isn’t that a lot of SASI indexes for one table?  Could you denormalize more to 
reduce both columns per table and SASI indexes per table?  Eight SASI indexes 
on one table seems like a lot.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Tuesday, February 05, 2019 6:59 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

The table and secondary indexes look generally like this. Note that I have 
changed the names of many of the columns to be generic since they aren't 
important to the question as far as I know. I left the actual names for those 
columns that I've created SASI indexes for. The query I use to try to create a 
PreparedStatement is:

 

SELECT sql_id, type, cpe_id, serial, product_class, manufacturer, sw_version 
FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING

 

the schema cql statements are:

 

CREATE TABLE IF NOT EXISTS mykeyspace.mytable (
  id text,
  sql_id bigint,
  cpe_id text,
  sw_version text,
  hw_version text,
  manufacturer text,
  product_class text,
  manufacturer_oui text,
  description text,
  periodic_inform_interval text,
  restricted_mode_enabled text,
  restricted_mode_reason text,
  type text,
  model_name text,
  serial text,
  mac text,
   text,
  generic0 timestamp,
  household_id text,
  generic1 int,
  generic2 text,
  generic3 text,
  generic4 int,
  generic5 int,
  generic6 text,
  generic7 text,
  generic8 text,
  generic9 text,
  generic10 text,
  generic11 timestamp,
  generic12 text,
  generic13 text,
  generic14 timestamp,
  generic15 text,
  generic16 text,
  generic17 text,
  generic18 text,
  generic19 text,
  generic20 text,
  generic21 text,
  generic22 text,
  generic23 text,
  generic24 text,
  generic25 text,
  generic26 text,
  generic27 text,
  generic28 int,
  generic29 int,
  generic30 text,
  generic31 text,
  generic32 text,
  generic33 text,
  generic34 text,
  generic35 int,
  generic36 int,
  generic37 int,
  generic38 int,
  generic39 text,
  generic40 text,
  generic41 text,
  generic42 text,
  generic43 text,
  generic44 text,
  generic45 text,
  PRIMARY KEY (id)
);

 

CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);

CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable (serial)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable (cpe_id)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_idx ON mykeyspace.mytable (manufacturer)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_oui_idx ON mykeyspace.mytable (manufacturer_oui)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_hw_version_idx ON mykeyspace.mytable (hw_version)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_sw_version_idx ON mykeyspace.mytable (sw_version)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_household_id_idx ON mykeyspace.mytable (household_id)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

 

 

On Tue, Feb 5, 2019 at 3:33 PM Oleksandr Petrov  
wrote:

Could you post full table schema (names obfuscated, if required) with index 
creation statements and queries?

 

On Mon, Feb 4, 2019 at 10:04 AM Jacques-Henri Berthemet 
 wrote:

I’m not su

Maximum memory usage

2019-02-06 Thread Rahul Reddy
Hello,

I see maximum memory usage alerts in my system.log a couple of times a day,
logged as INFO. So far I haven't seen any issue with the db. Why are those
messages logged in system.log? Do they have any impact on reads/writes? And
what needs to be looked at?

INFO  [RMI TCP Connection(170917)-127.0.0.1] 2019-02-05 23:15:47,408
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot
allocate chunk of 1.000MiB

Thanks in advance
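For reference, the 512 MiB figure in that log line is the cap on the off-heap buffer pool, which is set by `file_cache_size_in_mb` in cassandra.yaml (default 512). The message is the pool reporting it is full, not an error; raising the cap is optional. A sketch, with the value purely illustrative:

```yaml
# cassandra.yaml: off-heap buffer pool cap (the source of the
# "Maximum memory usage reached" NoSpamLogger line). Default is 512;
# 1024 below is an illustrative value, not a recommendation.
file_cache_size_in_mb: 1024
```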


Revive a downed node with a different IP address

2019-02-06 Thread Antoine d'Otreppe
Hi all,

New to Cassandra, I'm trying to wrap my head around how dead nodes should be 
revived.

Specifically, we deployed our cluster in Kubernetes, which means that nodes 
that go down will lose their IP address. When restarted, it is possible that:

1. their IP address changes
2. their new IP address is that of another downed node.

I spent the last two days looking for, and reading, possible solutions online. 
However I could not find any recent or working solution (any link would be 
appreciated). I've seen plenty of hacks where people would define one k8s 
service per node, but that sounds like a burdensome and fragile solution.

My current understanding is that a node should be able to be revived and get 
its missing data from hinted handoff if it wasn't down longer than 
max_hint_handoff_window. Or, if that window is exceeded, a repair would be 
needed. In any case, it's possible that the data is still available, and I'd 
like to avoid having to stream everything from zero from the other nodes.

I also looked into -Dcassandra.replace_address, but I feel like that would 
trigger a new token assignment, and again lots of streaming.

Finally there's one thing unclear to me as of yet (forgetting that dynamic IP 
address and kubernetes stuff): say I have several downed nodes, in the "DN" 
state. When one of those nodes is restarted, will it go through the "UJ" state? 
In other words, can I restart all downed nodes at once, or should I still 
respect the 2 minute rule?

And how would that work with dynamic IP addresses?

tl;dr: is there any updated documentation on how to revive nodes consistently 
when static IP addresses can't be assigned?

Best regards,
Antoine