RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal,

 

Also to check:

 

You should use the same list of seeds in all the yaml files, probably two in 
each data center if you will have five nodes in each.  The seed node addresses 
from all the data centers should be listed in each yaml file where it says 
“- seeds:”.  I’m not sure from your previous replies if you’re doing that.
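As a sketch (the addresses below are documentation placeholders, not your actual nodes), every node in both data centers would then carry an identical seed list in cassandra.yaml:

```yaml
# cassandra.yaml - identical on every node in BOTH data centers
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # two seeds per DC, using addresses reachable from the other DC
      - seeds: "203.0.113.10,203.0.113.11,198.51.100.10,198.51.100.11"
```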

 

Let us know your results.

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Monday, March 12, 2018 7:14 PM
To: 'user@cassandra.apache.org'
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

Kunal,

 

Sorry for asking you things you already answered.  You provided a lot of good 
information and you know what you’re doing.  It’s going to be something 
really simple to figure out.  While I read through the thread more closely 
(I’m guessing we are right on top of it), could I ask you:

 

Please read through 
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configMultiNetworks.html
 as it probably has the answer.  

 

One of things it says specifically is: 

Additional cassandra.yaml configuration for non-EC2 implementations

If multiple network interfaces are used in a non-EC2 implementation, enable 
the listen_on_broadcast_address option.

listen_on_broadcast_address: true

In non-EC2 environments, the public address to private address routing is not 
automatically enabled. Enabling listen_on_broadcast_address allows DSE to 
listen on both listen_address and broadcast_address with two network interfaces.
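Taken together, the doc's advice amounts to something like the following cassandra.yaml fragment for a non-EC2 node with two interfaces (the addresses are illustrative, and listen_on_broadcast_address is only available in versions that support it):

```yaml
# cassandra.yaml - non-EC2 node with a private and a public address
listen_address: 10.142.0.5           # address on the private interface
broadcast_address: 203.0.113.10      # public address other DCs connect to
listen_on_broadcast_address: true    # bind both addresses (non-EC2 only)
```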

 

Please consider that especially, and be sure everything else it mentions is done.

 

You said you changed the broadcast_rpc_address in one of the instances in GCE 
and saw a change.  Did you update the other nodes in GCE and then restart each 
one (in a rolling manner)?

 

Did you restart each node in each datacenter starting with the seed nodes since 
you last updated a yaml file?

 

Could the client in your application be causing the problem?  

 

Kenneth Brotman

 

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:43 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

Yes, that's correct. The customer wants us to migrate the cassandra setup in 
their AWS account.

 

Thanks,


Kunal

 

On 13 March 2018 at 04:56, Kenneth Brotman  wrote:

I didn’t understand something.  Are you saying you are using one data center on 
Google and one on Amazon?

 

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:24 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

 

On 13 March 2018 at 03:28, Kenneth Brotman  wrote:

You can’t migrate and upgrade at the same time perhaps, but you could do one and 
then the other so as to end up on the new version.  I’m guessing it’s an error 
in the yaml file or a port not open.  Is there any good reason for a production 
cluster to still be on version 2.1.x?

 

I'm not trying to migrate AND upgrade at the same time. However, the apt repo 
shows only 2.1.20 as the available version.

This is the output from the new node in AWS:

 

ubuntu@ip-10-0-43-213:~$ apt-cache policy cassandra 
cassandra: 
 Installed: 2.1.20 
 Candidate: 2.1.20 
 Version table: 
*** 2.1.20 500 
   500 http://www.apache.org/dist/cassandra/debian 21x/main amd64 Packages 
   100 /var/lib/dpkg/status

Regarding open ports, I can cqlsh into the GCE node(s) from the AWS node.

As I mentioned earlier, I've opened the ports 9042, 7000, 7001 in GCE firewall 
for the public IP of the AWS instance.

 

As I mentioned earlier, there are some differences in the column types - for 
example, date (>= 2.2) vs. timestamp (2.1.x).

The application has not been updated yet.

Hence sticking to 2.1.x for now.

 

And, so far, 2.1.x has been serving the purpose.



Kunal

 

 

Kenneth Brotman

 

From: Durity, Sean R [mailto:sean_r_dur...@homedepot.com] 
Sent: Monday, March 12, 2018 11:36 AM
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

 

As to the migration question, I would expect that you may need to put the 
external-facing ip addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is a non-restart more desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.

 

How large is the cluster to migrate (# of nodes and size of data)? The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

 

Sean Durity

lord of the (C*) rings (Staff Systems Engineer – Cassandra)

Re: Cassandra vs MySQL

2018-03-12 Thread Satendra
In my view, Cassandra is going to die out in the near future - it is not
solving its purpose, and people keep running into issues, especially in
virtual environments.

We have tried a CockroachDB (crdb) cluster and migrated a few of our clusters
over to it; it seems to be working, though it is relational in nature.
Saen

On 3/13/18, Matija Gobec  wrote:
> Hi Oliver,
>
> Few years back I had a similar problem where there was a lot of data in
> MySQL and it was starting to choke. I migrated data to Cassandra, ran
> benchmarks and blew MySQL out of the water with a small 3 node C* cluster.
> If you have a use case for Cassandra the answer is yes, but keep in mind
> that there are some use cases like relational problems which can be hard to
> solve with Cassandra and I tend to keep them in relational database. That
> being said, I don't think you can benchmark these two head to head since
> they basically solve different problems and Cassandra is distributed by
> design.
>
> Best,
> Matija
>
> On Mon, Mar 12, 2018 at 9:27 PM, Gábor Auth  wrote:
>
>> Hi,
>>
>> On Mon, Mar 12, 2018 at 8:58 PM Oliver Ruebenacker 
>> wrote:
>>
>>> We have a project currently using MySQL single-node with 5-6TB of data
>>> and some performance issues, and we plan to add data up to a total size
>>> of
>>> maybe 25-30TB.
>>>
>>
>> There is no 'silver bullet': Cassandra is not a 'drop-in' replacement
>> for MySQL. Maybe it will be faster, maybe it will be totally unusable,
>> depending
>> on your use-case and database schema.
>>
>> Is there some good more recent material?
>>>
>>
>> Are you able to completely redesign your database schema? :)
>>
>> Bye,
>> Gábor Auth
>>
>>
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: system.size_estimates - safe to remove sstables?

2018-03-12 Thread Kunal Gangakhedkar
No, this is a different cluster.

Kunal

On 13-Mar-2018 6:27 AM, "Kenneth Brotman" 
wrote:

Kunal,



Is this the GCE cluster you are speaking of in the “Adding new DC?” thread?



Kenneth Brotman



*From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
*Sent:* Sunday, March 11, 2018 2:18 PM
*To:* user@cassandra.apache.org
*Subject:* Re: system.size_estimates - safe to remove sstables?



Finally, got a chance to work on it over the weekend.

It worked as advertised. :)



Thanks a lot, Chris.


Kunal



On 8 March 2018 at 10:47, Kunal Gangakhedkar 
wrote:

Thanks a lot, Chris.



Will try it today/tomorrow and update here.



Thanks,

Kunal



On 7 March 2018 at 00:25, Chris Lohfink  wrote:

While it's off you can delete the files in the directory, yeah.



Chris





On Mar 6, 2018, at 2:35 AM, Kunal Gangakhedkar 
wrote:



Hi Chris,



I checked for snapshots and backups - none found.

Also, we're not using opscenter, hadoop or spark or any such tool.



So, do you think we can just remove the cf and restart the service?



Thanks,

Kunal



On 5 March 2018 at 21:52, Chris Lohfink  wrote:

Any chance space used by snapshots? What files exist there that are taking
up space?

> On Mar 5, 2018, at 1:02 AM, Kunal Gangakhedkar 
wrote:
>

> Hi all,
>
> I have a 2-node cluster running cassandra 2.1.18.
> One of the nodes has run out of disk space and died - almost all of it
shows up as occupied by size_estimates CF.
> Out of 296GiB, 288GiB shows up as consumed by size_estimates in 'du -sh'
output.
>
> This is while the other node is chugging along - shows only 25MiB
consumed by size_estimates (du -sh output).
>
> Any idea why this discrepancy?
> Is it safe to remove the size_estimates sstables from the affected node
and restart the service?
>
> Thanks,
> Kunal



Re: command to view yaml file setting in use on console

2018-03-12 Thread Anthony Grasso
Hi Kenneth,

In addition to CASSANDRA-7622, it may help to inspect the Cassandra
*system.log* and look for the following entry:

INFO  [main] ... - Node configuration:[...]

The content of "Node configuration" will have the settings the node is
using.

Regards,
Anthony



On Tue, 13 Mar 2018 at 12:50, Kenneth Brotman 
wrote:

> You say the nicest things!
>
>
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* Monday, March 12, 2018 6:43 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: command to view yaml file setting in use on console
>
>
>
> Cassandra-7622 went patch available today
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Mar 12, 2018, at 6:40 PM, Kenneth Brotman 
> wrote:
>
> Is there a command, perhaps a nodetool command to view the actual yaml
> settings a node is using so you can confirm it is using the changes to a
> yaml file you made?
>
>
>
> Kenneth Brotman
>
>


RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal,

 

While we are looking into all this I feel compelled to ask you to check your 
security configurations now that you are using public addresses to communicate 
inter-node across data centers.  Are you sure you are using best practices?  

 

Kenneth Brotman

 


Re: Anomaly detection

2018-03-12 Thread Fernando Ipar
Hello Salvatore,

On Mon, Mar 12, 2018 at 2:12 PM, D. Salvatore 
wrote:

> Hi Rahul,
> I was mainly thinking about performance anomaly detection but I am also
> interested in other types such as fault detection, data or queries
> anomalies.
>

I know VividCortex (http://vividcortex.com) supports Cassandra (2.1 or
greater) and I also know it does automatic (they call it adaptive) fault
detection for MySQL. I took a quick look at their website and could not
find an explicit list of features they support for Cassandra but it's
possible that fault detection is one of them too, so if SaaS is an option
I'd recommend you take a look at them.

Regards,

Fernando Ipar

http://fernandoipar.com




Re: Is node restart required to update yaml changes in 2.1x

2018-03-12 Thread Jeff Jirsa
There’s a bit of nuance in that there are some undocumented situations in some 
versions where we may reload seeds from the yaml without notice - notably when 
instances come online and we decide whether or not to gossip with them.

That’s not really intended, and it is fixed in recent versions.


-- 
Jeff Jirsa


> On Mar 12, 2018, at 6:29 PM, Lerh Chuan Low  wrote:
> 
> To my knowledge, for any version, updates to cassandra.yaml will only be 
> applied after you restart the node.
> 
>> On 13 March 2018 at 12:24, Kenneth Brotman  
>> wrote:
>> Can you update changes to cassandra.yaml in version 2.1x without restarting 
>> the node?
>> 
>>  
>> 
>> Kenneth Brotman
>> 
> 








Re: Cassandra at Instagram with Dikang Gu interview by Jeff Carpenter

2018-03-12 Thread Jeff Jirsa
On Mon, Mar 12, 2018 at 3:58 PM, Carl Mueller 
wrote:

>  Rocksandra can expand out its non-java footprint without rearchitecting
> the java codebase. Or are there serious concerns with Datastax and the
> binary protocols?
>
>
Rocksandra should eventually become part of Cassandra. The pluggable
storage has other benefits beyond avoiding JVM garbage.

I don't know what "concerns with Datastax and the binary protocols" means;
Apache Cassandra owns the protocol, not any company or driver.


Re: What snitch to use with AWS and Google

2018-03-12 Thread Jeff Jirsa
GPFS

-- 
Jeff Jirsa


> On Mar 12, 2018, at 4:31 PM, Kenneth Brotman  
> wrote:
> 
> Quick question:  If you have one cluster made of nodes of a datacenter in AWS 
> and a datacenter in Google, what snitch do you use?
>  
> Kenneth Brotman


Re: What snitch to use with AWS and Google

2018-03-12 Thread Lerh Chuan Low
I would just go with GossipingPropertyFileSnitch; it will work across both
data centers (I once had a test cluster with 1 DC in Azure, 1 DC in AWS and
1 DC in GCP using GPFS). Even if it's solely AWS, I think GPFS is
superior because you can configure virtual racks if you ever need them, while
with EC2Snitch you are at the mercy of AWS. You have to put in a little extra
work to configure rackdc.properties, but I think it's worth it :)
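For illustration (the DC and rack names below are made up, not a recommendation), each node's cassandra-rackdc.properties under GPFS would look something like:

```properties
# cassandra-rackdc.properties on an AWS node
dc=AWS_DC
rack=RAC1

# ...and on a GCE node of the same cluster:
# dc=GCE_DC
# rack=RAC1
```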



On 13 March 2018 at 10:40, Madhu-Nosql  wrote:

> Kenneth,
>
> For AWS -EC2Snitch(if DC in Single Region)
> For Google- Better go with GossipingPropertyFileSnitch
>
> Thanks,
> Madhu
>
> On Mon, Mar 12, 2018 at 6:31 PM, Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
>
>> Quick question:  If you have one cluster made of nodes of a datacenter in
>> AWS and a datacenter in Google, what snitch do you use?
>>
>>
>>
>> Kenneth Brotman
>>
>
>


Re: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kunal Gangakhedkar
Yes, that's correct. The customer wants us to migrate the cassandra setup
in their AWS account.

Thanks,
Kunal

> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com
> ]
> *Sent:* Sunday, March 11, 2018 10:20 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] RE: Adding new DC?
>
>
>
> Hi Kenneth,
>
>
>
> Replies inline below.
>
>
>
> On 12-Mar-2018 3:40 AM, "Kenneth Brotman" 
> wrote:
>
> Hi Kunal,
>
>
>
> That version of Cassandra is too far before me so I’ll let others answer.
> I was wonder why you wouldn’t want to end up on 3.0x if you’re going
> through all the trouble of migrating anyway?
>
>
>
>
>
> Application side constraints - some data types are different between 2.1.x
> and 3.x (for example, date vs. timestamp).
>
>
>
> Besides, this is production setup - so, cannot take risk
>
> Are both data centers in the same region on AWS?  Can you provide yaml
> file for us to see?
>
>
>
>
>
> No, they are in different regions - GCE setup is in us-east while AWS
> setup is in Asia-south (Mumbai)
>
>
>
> Thanks,
>
> Kunal
>
> Kenneth Brotman
>
>
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Sunday, March 11, 2018 2:32 PM
> *To:* user@cassandra.apache.org
> *Subject:* Adding new DC?
>
>
>
> Hi all,
>
>
>
> We currently have a cluster in GCE for one of the customers.
>
> They want it to be migrated to AWS.
>
>
>
> I have setup one node in AWS to join into the cluster by following:
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
>
>
>
>
> Will add more nodes once the first one joins successfully.
>
>
>
> The node in AWS has an elastic IP - which is white-listed for ports
> 7000-7001, 7199, 9042 in GCE firewall.
>
>
>
> The snitch is set to GossipingPropertyFileSnitch. The GCE setup has
> dc=DC1, rack=RAC1 while on AWS, 

Re: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kunal Gangakhedkar
On 13 March 2018 at 04:54, Kenneth Brotman 
wrote:

> Kunal,
>
>
>
> Please provide the following setting from the yaml files you  are using:
>
>
>
> seeds:
>

In GCE: seeds: "10.142.14.27"
In AWS (new node being added): seeds:
"35.196.96.247,35.227.127.245,35.196.241.232" (these are the public IP
addresses of 3 nodes from GCE)

 I have verified that I am able to do cqlsh from the AWS instance to all 3
ip addresses.


> listen_address:
>

We use the listen_interface setting instead of listen_address.

In GCE: listen_interface: eth0 (running ubuntu 14.04 LTS)
In AWS: listen_interface: ens3 (running ubuntu 16.04 LTS)


> broadcast_address:
>

I tried setting broadcast_address to one instance in GCE: broadcast_address:
35.196.96.247

In AWS: broadcast_address: 13.127.89.251 (this is the public/elastic IP of
the node in AWS)

rpc_address:
>

Like listen_address, we use rpc_interface.
In GCE: rpc_interface:  eth0
In AWS: rpc_interface:  ens3


> endpoint_snitch:
>

In both setups, we currently use GossipingPropertyFileSnitch.
The cassandra-rackdc.properties files from both setups:
GCE:
dc=DC1
rack=RAC1

AWS:
dc=DC2
rack=RAC1



> auto_bootstrap:
>

When the Google cloud instances started up, we hadn't set this explicitly -
so, they started off with the default value (auto_bootstrap: true).
However, as outlined in the datastax doc for adding new dc, I had added
'auto_bootstrap: false' to the google cloud instances (not restarted the
service as per the doc).

In the AWS instance, I had added 'auto_bootstrap: false' - the doc says we
need to do "nodetool rebuild" and hence no automatic bootstrapping.
But, haven't gotten to that step yet.
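Pulling the settings above together, the new AWS node's cassandra.yaml would look roughly like the following. This is a hedged sketch, not the actual file: cluster_name is a placeholder (it must match the existing GCE cluster), the addresses are the values quoted above, and listen_on_broadcast_address is the option the DataStax multi-network docs recommend for non-EC2 public/private dual-address setups.

```yaml
# Illustrative sketch of the relevant cassandra.yaml settings for the new
# AWS node; addresses are the ones quoted in this thread, cluster_name is
# a placeholder.
cluster_name: 'MyCluster'
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # public IPs of three GCE nodes, as listed above
      - seeds: "35.196.96.247,35.227.127.245,35.196.241.232"
listen_interface: ens3                 # private interface on Ubuntu 16.04
broadcast_address: 13.127.89.251       # public/elastic IP of this node
listen_on_broadcast_address: true      # bind both private and public addresses
rpc_interface: ens3
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: false                  # data arrives later via nodetool rebuild
```

Per the DataStax add-a-DC procedure linked in this thread, once the new DC's nodes have joined and the keyspaces replicate to DC2, `nodetool rebuild -- DC1` on each new node streams the existing data from the old DC.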

Thanks,
Kunal


>
> Kenneth Brotman
>
>
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Monday, March 12, 2018 4:13 PM
> *To:* user@cassandra.apache.org
> *Cc:* Nikhil Soman
> *Subject:* Re: [EXTERNAL] RE: Adding new DC?
>
>
>
>
>
> On 13 March 2018 at 00:06, Durity, Sean R 
> wrote:
>
> You cannot migrate and upgrade at the same time across major versions.
> Streaming is (usually) not compatible between versions.
>
>
>
> I'm not trying to upgrade as of now - first priority is the migration.
>
> We can look at version upgrade later on.
>
>
>
>
>
> As to the migration question, I would expect that you may need to put the
> external-facing ip addresses in several places in the cassandra.yaml file.
> And, yes, it would require a restart. Why is a non-restart more desirable?
> Most Cassandra changes require a restart, but you can do a rolling restart
> and not impact your application. This is fairly normal admin work and
> can/should be automated.
>
>
>
> I just tried setting the broadcast_address in one of the instances in GCE
> to its public IP and restarted the service.
>
> However, it now shows all other nodes (in GCE) as DN in nodetool status
> output and the other nodes also report this node as DN with its
> internal/private IP address.
>
>
>
> I also tried setting the broadcast_rpc_address to the internal/private IP
> address - still the same.
>
>
>
>
>
> How large is the cluster to migrate (# of nodes and size of data). The
> preferred method might depend on how much data needs to move. Is any
> application outage acceptable?
>
>
>
> No. of nodes: 5
>
> RF: 3
>
> Data size (as reported by the load factor in nodetool status output):
> ~30GB per node
>
>
>
> Thanks,
> Kunal
>
>
>
>
>
> Sean Durity
>
> lord of the (C*) rings (Staff Systems Engineer – Cassandra)
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Sunday, March 11, 2018 10:20 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] RE: Adding new DC?
>
>
>
> Hi Kenneth,
>
>
>
> Replies inline below.
>
>
>
> On 12-Mar-2018 3:40 AM, "Kenneth Brotman" 
> wrote:
>
> Hi Kunal,
>
>
>
> That version of Cassandra is too far before me so I’ll let others answer.
> I was wondering why you wouldn’t want to end up on 3.0x if you’re going
> through all the trouble of migrating anyway?
>
>
>
>
>
> Application side constraints - some data types are different between 2.1.x
> and 3.x (for example, date vs. timestamp).
>
>
>
> Besides, this is production setup - so, cannot take risk.
>
> Are both data centers in the same region on AWS?  Can you provide yaml
> file for us to see?
>
>
>
>
>
> No, they are in different regions - GCE setup is in us-east while AWS
> setup is in Asia-south (Mumbai)
>
>
>
> Thanks,
>
> Kunal
>
> Kenneth Brotman
>
>
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhedkar@gmail.com]
> *Sent:* Sunday, March 11, 2018 2:32 PM
> *To:* user@cassandra.apache.org
> *Subject:* Adding new DC?
>
>
>
> Hi all,
>
>
>
> We currently have a cluster in GCE for one of the customers.
>
> They want it to be migrated to AWS.
>
>
>
> I have setup one node in AWS to join into the cluster by following:
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html

Re: What snitch to use with AWS and Google

2018-03-12 Thread Madhu-Nosql
Kenneth,

For AWS -EC2Snitch(if DC in Single Region)
For Google- Better go with GossipingPropertyFileSnitch

Thanks,
Madhu
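With GossipingPropertyFileSnitch on both sides, each node declares its own data center and rack in cassandra-rackdc.properties, which is what makes a mixed AWS/Google cluster workable. For example, matching the DC names used elsewhere in this thread (names are illustrative):

```properties
# cassandra-rackdc.properties on each Google node
dc=DC1
rack=RAC1
```

```properties
# cassandra-rackdc.properties on each AWS node
dc=DC2
rack=RAC1
```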

On Mon, Mar 12, 2018 at 6:31 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> Quick question:  If you have one cluster made of nodes of a datacenter in
> AWS and a datacenter in Google, what snitch do you use?
>
>
>
> Kenneth Brotman
>


What snitch to use with AWS and Google

2018-03-12 Thread Kenneth Brotman
Quick question:  If you have one cluster made of nodes of a datacenter in
AWS and a datacenter in Google, what snitch do you use?

 

Kenneth Brotman



RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
I didn’t understand something.  Are you saying you are using one data center on 
Google and one on Amazon?

 

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:24 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

 

On 13 March 2018 at 03:28, Kenneth Brotman  wrote:

You can’t migrate and upgrade at the same time perhaps but you could do one and 
then the other so as to end up on the new version.  I’m guessing it’s an error in 
the yaml file or a port not open.  Is there any good reason for a production 
cluster to still be on version 2.1.x?

 

I'm not trying to migrate AND upgrade at the same time. However, the apt repo 
shows only 2.1.20 as the available version.

This is the output from the new node in AWS

 

ubuntu@ip-10-0-43-213:~$ apt-cache policy cassandra 
cassandra: 
 Installed: 2.1.20 
 Candidate: 2.1.20 
 Version table: 
*** 2.1.20 500 
   500 http://www.apache.org/dist/cassandra/debian 21x/main amd64 Packages 
   100 /var/lib/dpkg/status

Regarding open ports, I can cqlsh into the GCE node(s) from the AWS node.

As I mentioned earlier, I've opened the ports 9042, 7000, 7001 in GCE firewall 
for the public IP of the AWS instance.

 

I mentioned earlier - there are some differences in the column types - for 
example, date (>= 2.2) vs. timestamp (2.1.x)

The application has not been updated yet.

Hence sticking to 2.1.x for now.

 

And, so far, 2.1.x has been serving the purpose.



Kunal

 

 

Kenneth Brotman

 

From: Durity, Sean R [mailto:sean_r_dur...@homedepot.com] 
Sent: Monday, March 12, 2018 11:36 AM
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

 

As to the migration question, I would expect that you may need to put the 
external-facing ip addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is a non-restart more desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.

 

How large is the cluster to migrate (# of nodes and size of data). The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

 

Sean Durity

lord of the (C*) rings (Staff Systems Engineer – Cassandra)

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 10:20 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Adding new DC?

 

Hi Kenneth,

 

Replies inline below.

 

On 12-Mar-2018 3:40 AM, "Kenneth Brotman"  wrote:

Hi Kunal,

 

That version of Cassandra is too far before me so I’ll let others answer.  I 
was wondering why you wouldn’t want to end up on 3.0x if you’re going through all 
the trouble of migrating anyway?  

 

 

Application side constraints - some data types are different between 2.1.x and 
3.x (for example, date vs. timestamp).

 

Besides, this is production setup - so, cannot take risk.

Are both data centers in the same region on AWS?  Can you provide yaml file for 
us to see?

 

 

No, they are in different regions - GCE setup is in us-east while AWS setup is 
in Asia-south (Mumbai)

 

Thanks,

Kunal

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 2:32 PM
To: user@cassandra.apache.org
Subject: Adding new DC?

 

Hi all,

 

We currently have a cluster in GCE for one of the customers.

They want it to be migrated to AWS.

 

I have setup one node in AWS to join into the cluster by following:

https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
 

 

 

Will add more nodes once the first one joins successfully.

 

The node in AWS has an elastic IP - which is white-listed for ports 7000-7001, 
7199, 9042 in GCE firewall.

 

The snitch is set to GossipingPropertyFileSnitch. The GCE setup has dc=DC1, 
rack=RAC1 while on AWS, I changed the DC to dc=DC2.

 

When I start cassandra service on the AWS instance, I see the version handshake 
msgs in the logs trying to connect to the public IPs of the GCE nodes:

OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx

However, nodetool status output on both sides don't show the other side at all. 
That is, the GCE setup doesn't show the new DC (dc=DC2) and the AWS setup 
doesn't show old DC (dc=DC1).

 

In cassandra.yaml file, 

Re: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kunal Gangakhedkar
On 13 March 2018 at 03:28, Kenneth Brotman 
wrote:

> You can’t migrate and upgrade at the same time perhaps but you could do
> one and then the other so as to end up on the new version.  I’m guessing it’s
> an error in the yaml file or a port not open.  Is there any good reason for
> a production cluster to still be on version 2.1.x?
>

I'm not trying to migrate AND upgrade at the same time. However, the apt
repo shows only 2.1.20 as the available version.
This is the output from the new node in AWS

ubuntu@ip-10-0-43-213:~$ apt-cache policy cassandra
cassandra:
 Installed: 2.1.20
 Candidate: 2.1.20
 Version table:
*** 2.1.20 500
   500 http://www.apache.org/dist/cassandra/debian 21x/main amd64 Packages
   100 /var/lib/dpkg/status
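For reference, the apt-cache output above corresponds to the Apache Debian repository pinned to the 2.1 release series ("21x"); the matching sources entry would look roughly like this (the file path is illustrative):

```
# /etc/apt/sources.list.d/cassandra.sources.list (path illustrative)
deb http://www.apache.org/dist/cassandra/debian 21x main
```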

Regarding open ports, I can cqlsh into the GCE node(s) from the AWS node.
As I mentioned earlier, I've opened the ports 9042, 7000, 7001 in GCE
firewall for the public IP of the AWS instance.

I mentioned earlier - there are some differences in the column types - for
example, date (>= 2.2) vs. timestamp (2.1.x)
The application has not been updated yet.
Hence sticking to 2.1.x for now.

And, so far, 2.1.x has been serving the purpose.

Kunal


>
> Kenneth Brotman
>
>
>
> *From:* Durity, Sean R [mailto:sean_r_dur...@homedepot.com]
> *Sent:* Monday, March 12, 2018 11:36 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: [EXTERNAL] RE: Adding new DC?
>
>
>
> You cannot migrate and upgrade at the same time across major versions.
> Streaming is (usually) not compatible between versions.
>
>
>
> As to the migration question, I would expect that you may need to put the
> external-facing ip addresses in several places in the cassandra.yaml file.
> And, yes, it would require a restart. Why is a non-restart more desirable?
> Most Cassandra changes require a restart, but you can do a rolling restart
> and not impact your application. This is fairly normal admin work and
> can/should be automated.
>
>
>
> How large is the cluster to migrate (# of nodes and size of data). The
> preferred method might depend on how much data needs to move. Is any
> application outage acceptable?
>
>
>
> Sean Durity
>
> lord of the (C*) rings (Staff Systems Engineer – Cassandra)
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Sunday, March 11, 2018 10:20 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] RE: Adding new DC?
>
>
>
> Hi Kenneth,
>
>
>
> Replies inline below.
>
>
>
> On 12-Mar-2018 3:40 AM, "Kenneth Brotman" 
> wrote:
>
> Hi Kunal,
>
>
>
> That version of Cassandra is too far before me so I’ll let others answer.
> I was wondering why you wouldn’t want to end up on 3.0x if you’re going
> through all the trouble of migrating anyway?
>
>
>
>
>
> Application side constraints - some data types are different between 2.1.x
> and 3.x (for example, date vs. timestamp).
>
>
>
> Besides, this is production setup - so, cannot take risk.
>
> Are both data centers in the same region on AWS?  Can you provide yaml
> file for us to see?
>
>
>
>
>
> No, they are in different regions - GCE setup is in us-east while AWS
> setup is in Asia-south (Mumbai)
>
>
>
> Thanks,
>
> Kunal
>
> Kenneth Brotman
>
>
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Sunday, March 11, 2018 2:32 PM
> *To:* user@cassandra.apache.org
> *Subject:* Adding new DC?
>
>
>
> Hi all,
>
>
>
> We currently have a cluster in GCE for one of the customers.
>
> They want it to be migrated to AWS.
>
>
>
> I have setup one node in AWS to join into the cluster by following:
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
> 
>
>
>
> Will add more nodes once the first one joins successfully.
>
>
>
> The node in AWS has an elastic IP - which is white-listed for ports
> 7000-7001, 7199, 9042 in GCE firewall.
>
>
>
> The snitch is set to GossipingPropertyFileSnitch. The GCE setup has
> dc=DC1, rack=RAC1 while on AWS, I changed the DC to dc=DC2.
>
>
>
> When I start cassandra service on the AWS instance, I see the version
> handshake msgs in the logs trying to connect to the public IPs of the GCE
> nodes:
>
> OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx
>
> However, nodetool status output on both sides don't show the other side at
> all. That is, the GCE setup doesn't show the new DC (dc=DC2) and the AWS
> setup doesn't show old DC (dc=DC1).
>
>
>
> In cassandra.yaml file, I'm only using listen_interface and rpc_interface
> settings - no explicit IP addresses used - so, ends up using the internal
> private 

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal,

 

Please provide the following setting from the yaml files you  are using:

 

seeds: 

listen_address: 

broadcast_address: 

rpc_address: 

endpoint_snitch: 

auto_bootstrap: 

 

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:13 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

 

On 13 March 2018 at 00:06, Durity, Sean R  wrote:

You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

 

I'm not trying to upgrade as of now - first priority is the migration.

We can look at version upgrade later on.

 

 

As to the migration question, I would expect that you may need to put the 
external-facing ip addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is a non-restart more desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.

 

I just tried setting the broadcast_address in one of the instances in GCE to 
its public IP and restarted the service.

However, it now shows all other nodes (in GCE) as DN in nodetool status output 
and the other nodes also report this node as DN with its internal/private IP 
address.

 

I also tried setting the broadcast_rpc_address to the internal/private IP 
address - still the same.

 

 

How large is the cluster to migrate (# of nodes and size of data). The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

 

No. of nodes: 5

RF: 3

Data size (as reported by the load factor in nodetool status output): ~30GB per 
node

 

Thanks,
Kunal

 

 

Sean Durity

lord of the (C*) rings (Staff Systems Engineer – Cassandra)

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 10:20 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Adding new DC?

 

Hi Kenneth,

 

Replies inline below.

 

On 12-Mar-2018 3:40 AM, "Kenneth Brotman"  wrote:

Hi Kunal,

 

That version of Cassandra is too far before me so I’ll let others answer.  I 
was wondering why you wouldn’t want to end up on 3.0x if you’re going through all 
the trouble of migrating anyway?  

 

 

Application side constraints - some data types are different between 2.1.x and 
3.x (for example, date vs. timestamp).

 

Besides, this is production setup - so, cannot take risk.

Are both data centers in the same region on AWS?  Can you provide yaml file for 
us to see?

 

 

No, they are in different regions - GCE setup is in us-east while AWS setup is 
in Asia-south (Mumbai)

 

Thanks,

Kunal

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhedkar@gmail.com] 
Sent: Sunday, March 11, 2018 2:32 PM
To: user@cassandra.apache.org
Subject: Adding new DC?

 

Hi all,

 

We currently have a cluster in GCE for one of the customers.

They want it to be migrated to AWS.

 

I have setup one node in AWS to join into the cluster by following:

https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
 

 

 

Will add more nodes once the first one joins successfully.

 

The node in AWS has an elastic IP - which is white-listed for ports 7000-7001, 
7199, 9042 in GCE firewall.

 

The snitch is set to GossipingPropertyFileSnitch. The GCE setup has dc=DC1, 
rack=RAC1 while on AWS, I changed the DC to dc=DC2.

 

When I start cassandra service on the AWS instance, I see the version handshake 
msgs in the logs trying to connect to the public IPs of the GCE nodes:

OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx

However, nodetool status output on both sides don't show the other side at all. 
That is, the GCE setup doesn't show the new DC (dc=DC2) and the AWS setup 
doesn't show old DC (dc=DC1).

 

In cassandra.yaml file, I'm only using listen_interface and rpc_interface 
settings - no explicit IP addresses used - so, ends up using the internal 
private IP ranges.

 

> Do I need to explicitly add the broadcast_address for both sides?
>
> Would that require restarting the cassandra service on the GCE side? Or is it
possible to change that setting on-the-fly without a restart?

 

I would prefer a non-restart option.

 

PS: The cassandra version running in GCE is 2.1.18 while the new node setup in 
AWS is running 2.1.20 - just in case if that's relevant

 

Thanks,


Kunal

 

 


Re: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kunal Gangakhedkar
On 13 March 2018 at 00:06, Durity, Sean R 
wrote:

> You cannot migrate and upgrade at the same time across major versions.
> Streaming is (usually) not compatible between versions.
>

I'm not trying to upgrade as of now - first priority is the migration.
We can look at version upgrade later on.


>
>
> As to the migration question, I would expect that you may need to put the
> external-facing ip addresses in several places in the cassandra.yaml file.
> And, yes, it would require a restart. Why is a non-restart more desirable?
> Most Cassandra changes require a restart, but you can do a rolling restart
> and not impact your application. This is fairly normal admin work and
> can/should be automated.
>

I just tried setting the broadcast_address in one of the instances in GCE
to its public IP and restarted the service.
However, it now shows all other nodes (in GCE) as DN in nodetool status
output and the other nodes also report this node as DN with its
internal/private IP address.

I also tried setting the broadcast_rpc_address to the internal/private IP
address - still the same.
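The DN symptom is consistent with peers gossiping to the newly advertised public IP while the node is still bound only to its private interface. In non-EC2 environments the DataStax multi-network guide's fix is to listen on both addresses; a hedged sketch for one GCE node (each node sets its own public IP, values are the ones quoted in this thread):

```yaml
# Illustrative cassandra.yaml fragment for a GCE node with a private
# interface plus a public IP.
listen_interface: eth0               # private address for intra-DC traffic
broadcast_address: 35.196.96.247     # public IP advertised to remote peers
listen_on_broadcast_address: true    # also bind the public address, so
                                     # gossip arriving on it is accepted
```

Changing these still requires a restart on each node, done as a rolling restart starting with the seed nodes.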


>
>
> How large is the cluster to migrate (# of nodes and size of data). The
> preferred method might depend on how much data needs to move. Is any
> application outage acceptable?
>

No. of nodes: 5
RF: 3
Data size (as reported by the load factor in nodetool status output): ~30GB
per node

Thanks,
Kunal


>
>
> Sean Durity
>
> lord of the (C*) rings (Staff Systems Engineer – Cassandra)
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Sunday, March 11, 2018 10:20 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] RE: Adding new DC?
>
>
>
> Hi Kenneth,
>
>
>
> Replies inline below.
>
>
>
> On 12-Mar-2018 3:40 AM, "Kenneth Brotman" 
> wrote:
>
> Hi Kunal,
>
>
>
> That version of Cassandra is too far before me so I’ll let others answer.
> I was wondering why you wouldn’t want to end up on 3.0x if you’re going
> through all the trouble of migrating anyway?
>
>
>
>
>
> Application side constraints - some data types are different between 2.1.x
> and 3.x (for example, date vs. timestamp).
>
>
>
> Besides, this is production setup - so, cannot take risk.
>
> Are both data centers in the same region on AWS?  Can you provide yaml
> file for us to see?
>
>
>
>
>
> No, they are in different regions - GCE setup is in us-east while AWS
> setup is in Asia-south (Mumbai)
>
>
>
> Thanks,
>
> Kunal
>
> Kenneth Brotman
>
>
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Sunday, March 11, 2018 2:32 PM
> *To:* user@cassandra.apache.org
> *Subject:* Adding new DC?
>
>
>
> Hi all,
>
>
>
> We currently have a cluster in GCE for one of the customers.
>
> They want it to be migrated to AWS.
>
>
>
> I have setup one node in AWS to join into the cluster by following:
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
> 
>
>
>
> Will add more nodes once the first one joins successfully.
>
>
>
> The node in AWS has an elastic IP - which is white-listed for ports
> 7000-7001, 7199, 9042 in GCE firewall.
>
>
>
> The snitch is set to GossipingPropertyFileSnitch. The GCE setup has
> dc=DC1, rack=RAC1 while on AWS, I changed the DC to dc=DC2.
>
>
>
> When I start cassandra service on the AWS instance, I see the version
> handshake msgs in the logs trying to connect to the public IPs of the GCE
> nodes:
>
> OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx
>
> However, nodetool status output on both sides don't show the other side at
> all. That is, the GCE setup doesn't show the new DC (dc=DC2) and the AWS
> setup doesn't show old DC (dc=DC1).
>
>
>
> In cassandra.yaml file, I'm only using listen_interface and rpc_interface
> settings - no explicit IP addresses used - so, ends up using the internal
> private IP ranges.
>
>
>
> Do I need to explicitly add the broadcast_address for both sides?
>
> Would that require restarting the cassandra service on the GCE side? Or is it
> possible to change that setting on-the-fly without a restart?
>
>
>
> I would prefer a non-restart option.
>
>
>
> PS: The cassandra version running in GCE is 2.1.18 while the new node
> setup in AWS is running 2.1.20 - just in case if that's relevant
>
>
>
> Thanks,
>
> Kunal
>
>
>

Re: Cassandra at Instagram with Dikang Gu interview by Jeff Carpenter

2018-03-12 Thread Carl Mueller
Again, I'd really like to get a feel for scylla vs rocksandra vs cassandra.

Isn't the driver binary protocol the easiest / least redesign level of
storage engine swapping? Scylla and Cassandra and Rocksandra are currently
three options. Rocksandra can expand out its non-java footprint without
rearchitecting the java codebase. Or are there serious concerns with
Datastax and the binary protocols?

On Tue, Mar 6, 2018 at 12:42 PM, Goutham reddy 
wrote:

> It’s an interesting conversation. For more details about the pluggable
> storage engine here is the link.
>
> Blog:
> https://thenewstack.io/instagram-supercharges-cassandra-pluggable-rocksdb-storage-engine/
>
> JIRA:
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-13475
>
>
> On Tue, Mar 6, 2018 at 9:01 AM Kenneth Brotman
>  wrote:
>
>> Just released on DataStax Distributed Data Show, DiKang Gu of Instagram
>> interviewed by author Jeff Carpenter.
>>
>> Found it really interesting:  Shadow clustering, migrating from 2.2 to
>> 3.0, using the Rocks DB as a pluggable storage engine for Cassandra
>>
>> https://academy.datastax.com/content/distributed-data-show-episode-37-cassandra-instagram-dikang-gu
>>
>>
>>
>> Kenneth Brotman
>>
> --
> Regards
> Goutham Reddy
>


RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
You can’t migrate and upgrade at the same time perhaps but you could do one and 
then the other so as to end up on the new version.  I’m guessing it’s an error in 
the yaml file or a port not open.  Is there any good reason for a production 
cluster to still be on version 2.1.x?

 

Kenneth Brotman

 

From: Durity, Sean R [mailto:sean_r_dur...@homedepot.com] 
Sent: Monday, March 12, 2018 11:36 AM
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

 

As to the migration question, I would expect that you may need to put the 
external-facing ip addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is a non-restart more desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.

 

How large is the cluster to migrate (# of nodes and size of data). The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

 

Sean Durity

lord of the (C*) rings (Staff Systems Engineer – Cassandra)

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 10:20 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Adding new DC?

 

Hi Kenneth,

 

Replies inline below.

 

On 12-Mar-2018 3:40 AM, "Kenneth Brotman"  wrote:

Hi Kunal,

 

That version of Cassandra is too far before me so I’ll let others answer.  I 
was wondering why you wouldn’t want to end up on 3.0x if you’re going through all 
the trouble of migrating anyway?  

 

 

Application side constraints - some data types are different between 2.1.x and 
3.x (for example, date vs. timestamp).

 

Besides, this is production setup - so, cannot take risk.

Are both data centers in the same region on AWS?  Can you provide yaml file for 
us to see?

 

 

No, they are in different regions - GCE setup is in us-east while AWS setup is 
in Asia-south (Mumbai)

 

Thanks,

Kunal

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 2:32 PM
To: user@cassandra.apache.org
Subject: Adding new DC?

 

Hi all,

 

We currently have a cluster in GCE for one of the customers.

They want it to be migrated to AWS.

 

I have setup one node in AWS to join into the cluster by following:

https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
 

 

 

Will add more nodes once the first one joins successfully.

 

The node in AWS has an elastic IP - which is white-listed for ports 7000-7001, 
7199, 9042 in GCE firewall.

 

The snitch is set to GossipingPropertyFileSnitch. The GCE setup has dc=DC1, 
rack=RAC1 while on AWS, I changed the DC to dc=DC2.

 

When I start cassandra service on the AWS instance, I see the version handshake 
msgs in the logs trying to connect to the public IPs of the GCE nodes:

OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx

However, nodetool status output on both sides don't show the other side at all. 
That is, the GCE setup doesn't show the new DC (dc=DC2) and the AWS setup 
doesn't show old DC (dc=DC1).

 

In cassandra.yaml file, I'm only using listen_interface and rpc_interface 
settings - no explicit IP addresses used - so, ends up using the internal 
private IP ranges.

 

Do I need to explicitly add the broadcast_address for both sides?

Would that require restarting the cassandra service on the GCE side? Or is it 
possible to change that setting on-the-fly without a restart?

 

I would prefer a non-restart option.

 

PS: The cassandra version running in GCE is 2.1.18 while the new node setup in 
AWS is running 2.1.20 - just in case if that's relevant

 

Thanks,


Kunal

 

 

  _  


The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any 

Re: TWCS enabling tombstone compaction

2018-03-12 Thread Lerh Chuan Low
Dear Lucas,

Those properties that result in the log message you are seeing are
properties common to all compaction strategies. See
http://cassandra.apache.org/doc/latest/operating/compaction.html#common-options.
They are *tombstone_compaction_interval* and *tombstone_threshold*. If you
didn't define them when you created your
table, then you will see the log message. I'm not fully certain what the
intent is, but in a TWCS setting you should only rely on TTLs and not run
checks for including SSTables with dropping tombstones (so it may save a
little bit of computation there). DTCS has the same property, which you can
find detail in this JIRA: https://issues.apache.org/jira/browse/CASSANDRA-9234.

You shouldn't be seeing it all the time though unless you are constantly
creating and dropping tables. Hope this helps in some way :)
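For concreteness, the two common options in question can be set explicitly in a table's compaction map. A hypothetical CQL sketch (keyspace, table, and values are illustrative, not from this thread):

```sql
-- Illustrative TWCS table for TTL'd time-series data; names and values
-- are hypothetical.
CREATE TABLE sensors.readings (
    sensor_id uuid,
    ts        timestamp,
    value     double,
    PRIMARY KEY (sensor_id, ts)
) WITH default_time_to_live = 604800   -- 7 days
  AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1',
    'tombstone_threshold': '0.2',
    'tombstone_compaction_interval': '86400'
  };
```

With purely TTL'd data these explicit tombstone options are usually unnecessary, which matches the explanation above: when they are left unset, TWCS logs that it is disabling tombstone compactions.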




On 10 March 2018 at 04:38, Lucas Benevides 
wrote:

> Dear community,
>
> I have been using TWCS in my lab, with TTL'd data.
> In the debug log there is always the sentence:
> "TimeWindowCompactionStrategy.java:65 Disabling tombstone compactions for
> TWCS". Indeed, the line is always repeated.
>
> What does it actually mean? If my data gets expired, the TWCS is already
> working and purging the SSTables that become expired. It surely sound
> strange to me to disable tombstone compaction.
>
> In the subcompaction subproperties there are only two subproperties,
> compaction_window_unit and compaction_window_size. Jeff already told us
> that the STCS properties also apply to TWCS, although it is not in the
> documentation.
>
> Thanks in advance,
> Lucas Benevides Dias
>


Re: Cassandra vs MySQL

2018-03-12 Thread Matija Gobec
Hi Oliver,

A few years back I had a similar problem where there was a lot of data in
MySQL and it was starting to choke. I migrated data to Cassandra, ran
benchmarks and blew MySQL out of the water with a small 3 node C* cluster.
If you have a use case for Cassandra the answer is yes, but keep in mind
that there are some use cases like relational problems which can be hard to
solve with Cassandra and I tend to keep them in relational database. That
being said, I don't think you can benchmark these two head to head since
they basically solve different problems and Cassandra is distributed by
design.

Best,
Matija

On Mon, Mar 12, 2018 at 9:27 PM, Gábor Auth  wrote:

> Hi,
>
> On Mon, Mar 12, 2018 at 8:58 PM Oliver Ruebenacker 
> wrote:
>
>> We have a project currently using MySQL single-node with 5-6TB of data
>> and some performance issues, and we plan to add data up to a total size of
>> maybe 25-30TB.
>>
>
> There is no 'silver bullet'; Cassandra is not a 'drop-in' replacement
> for MySQL. Maybe it will be faster, maybe it will be totally unusable,
> depending on your use case and database schema.
>
> Is there some good more recent material?
>>
>
> Are you able to completely redesign your database schema? :)
>
> Bye,
> Gábor Auth
>
>


Re: Cassandra vs MySQL

2018-03-12 Thread Gábor Auth
Hi,

On Mon, Mar 12, 2018 at 8:58 PM Oliver Ruebenacker  wrote:

> We have a project currently using MySQL single-node with 5-6TB of data and
> some performance issues, and we plan to add data up to a total size of
> maybe 25-30TB.
>

There is no 'silver bullet'; Cassandra is not a 'drop-in' replacement
for MySQL. Maybe it will be faster, maybe it will be totally unusable,
depending on your use case and database schema.

Is there some good more recent material?
>

Are you able to completely redesign your database schema? :)

Bye,
Gábor Auth


Cassandra vs MySQL

2018-03-12 Thread Oliver Ruebenacker
 Hello,

  We have a project currently using MySQL single-node with 5-6TB of data
and some performance issues, and we plan to add data up to a total size of
maybe 25-30TB.

  We are thinking of migrating to Cassandra. I have been trying to find
benchmarks or other guidelines to compare MySQL and Cassandra, but most of
them seem to be five years old or older.

  Is there some good more recent material?

  Thanks!

 Best, Oliver

-- 
Oliver Ruebenacker
Senior Software Engineer, Diabetes Portal, Broad Institute



RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Durity, Sean R
You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

As to the migration question, I would expect that you may need to put the 
external-facing IP addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is avoiding a restart so desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.
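A hedged sketch of the cassandra.yaml fields typically involved in a cross-region (public-IP) setup like this one; the addresses are placeholders, not values from this thread:

```yaml
# Placeholder addresses for illustration only.
listen_address: 10.0.0.5              # private interface IP
broadcast_address: 203.0.113.5        # public IP the other DC connects to
listen_on_broadcast_address: true     # for non-EC2 multi-region setups
rpc_address: 0.0.0.0
broadcast_rpc_address: 203.0.113.5    # public IP handed back to clients
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "203.0.113.5,198.51.100.7"   # seeds from both data centers
```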

How large is the cluster to migrate (# of nodes and size of data)? The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

Sean Durity
lord of the (C*) rings (Staff Systems Engineer – Cassandra)
From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
Sent: Sunday, March 11, 2018 10:20 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Adding new DC?

Hi Kenneth,

Replies inline below.

On 12-Mar-2018 3:40 AM, "Kenneth Brotman" 
> wrote:
Hi Kunal,

That version of Cassandra is well before my time, so I’ll let others answer.  I 
was wondering why you wouldn’t want to end up on 3.0.x if you’re going through all 
the trouble of migrating anyway?


Application-side constraints - some data types are different between 2.1.x and 
3.x (for example, date vs. timestamp).

Besides, this is a production setup, so we cannot take risks.
Are both data centers in the same region on AWS?  Can you provide the yaml file 
for us to see?


No, they are in different regions - the GCE setup is in us-east while the AWS 
setup is in Asia-south (Mumbai).

Thanks,
Kunal
Kenneth Brotman

From: Kunal Gangakhedkar 
[mailto:kgangakhed...@gmail.com]
Sent: Sunday, March 11, 2018 2:32 PM
To: user@cassandra.apache.org
Subject: Adding new DC?

Hi all,

We currently have a cluster in GCE for one of the customers.
They want it to be migrated to AWS.

I have setup one node in AWS to join into the cluster by following:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html

Will add more nodes once the first one joins successfully.

The node in AWS has an elastic IP - which is white-listed for ports 7000-7001, 
7199, 9042 in GCE firewall.

The snitch is set to GossipingPropertyFileSnitch. The GCE setup has dc=DC1, 
rack=RAC1 while on AWS, I changed the DC to dc=DC2.

When I start cassandra service on the AWS instance, I see the version handshake 
msgs in the logs trying to connect to the public IPs of the GCE nodes:
OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx
However, nodetool status output on both sides doesn't show the other side at 
all. That is, the GCE setup doesn't show the new DC (dc=DC2) and the AWS setup 
doesn't show the old DC (dc=DC1).

In the cassandra.yaml file, I'm only using the listen_interface and 
rpc_interface settings - no explicit IP addresses used - so they end up using 
the internal private IP ranges.

Do I need to explicitly add the broadcast_address on both sides?
Would that require restarting the cassandra service on the GCE side? Or is it 
possible to change that setting on the fly, without a restart?

I would prefer a non-restart option.

PS: The cassandra version running in GCE is 2.1.18 while the new node setup in 
AWS is running 2.1.20 - just in case that's relevant.

Thanks,
Kunal




The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.


Re: What versions should the documentation support now?

2018-03-12 Thread Jon Haddad
Docs for 3.0 go in the 3.0 branch.

I’ve never heard of anyone shipping docs for multiple versions; I don’t know 
why we’d do that. You can get the docs for any version you need by downloading 
C*; the docs are included. I’m a firm -1 on changing that process.

Jon

> On Mar 12, 2018, at 9:19 AM, Kenneth Brotman  
> wrote:
> 
> It seems like the documentation that should be in the trunk for version 3.0 
> should include information for users of versions 3.0 and 2.1; the 
> documentation that should be in 4.0 (when it's released) should include 
> information for users of 4.0 and at least one previous version, etc. 
>  
> How about if we do it that way?
>  
> Kenneth Brotman
>  
> From: Jonathan Haddad [mailto:j...@jonhaddad.com] 
> Sent: Monday, March 12, 2018 9:10 AM
> To: user@cassandra.apache.org
> Subject: Re: What versions should the documentation support now?
>  
> Right now they can’t.
> On Mon, Mar 12, 2018 at 9:03 AM Kenneth Brotman  > wrote:
>> I see how that makes sense Jon but how does a user then select the 
>> documentation for the version they are running on the Apache Cassandra web 
>> site?
>>  
>> Kenneth Brotman
>>  
>> From: Jonathan Haddad [mailto:j...@jonhaddad.com 
>> ] 
>> Sent: Monday, March 12, 2018 8:40 AM
>> 
>> To: user@cassandra.apache.org 
>> Subject: Re: What versions should the documentation support now?
>>  
>> The docs are in tree, meaning they are versioned, and should be written for 
>> the version they correspond to. Trunk docs should reflect the current state 
>> of trunk, and shouldn’t have caveats for other versions. 
>> On Mon, Mar 12, 2018 at 8:15 AM Kenneth Brotman > > wrote:
>>> If we use DataStax’s example, we would have instructions for v3.0 and v2.1. 
>>>  How’s that?  
>>>  
>>> We should also have instructions for the cloud platforms like AWS, but how 
>>> do you do that and stay vendor-neutral?
>>>  
>>> Kenneth Brotman
>>>  
>>> From: Hannu Kröger [mailto:hkro...@gmail.com ] 
>>> Sent: Monday, March 12, 2018 7:40 AM
>>> To: user@cassandra.apache.org 
>>> Subject: Re: What versions should the documentation support now?
>>>  
>>> In my opinion, good documentation should somehow include version-specific 
>>> pieces of information, whether it is a nodetool command that came in a 
>>> certain version, a parameter for something, or something else.
>>>  
>>> That would be very useful. It’s confusing if I see documentation talking 
>>> about 4.0 specifics and then try to find that in my 3.11.x.
>>>  
>>> Hannu
>>>  
>>> 
>>> On 12 Mar 2018, at 16:38, Kenneth Brotman >> > wrote:
>>>  
>>> I’m unclear on which versions are most popular right now. What version are 
>>> you running?
>>>  
>>> What version should still be supported in the documentation?  For example, 
>>> I’m turning my attention back to writing a section on adding a data center. 
>>>  What versions should I support in that information?
>>>  
>>> I’m working on it right now.  Thanks,
>>>  
>>> Kenneth Brotman



Re: Anomaly detection

2018-03-12 Thread D. Salvatore
Hi Rahul,
I was mainly thinking about performance anomaly detection, but I am also
interested in other types, such as fault detection and data or query
anomalies.

Thanks
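For the metrics case, a minimal statistical check over a metric series (a sketch, not an existing Cassandra tool; the sample values below are made up) could look like:

```python
from statistics import mean, stdev

def anomalies(samples, threshold=2.0):
    """Return indices of samples more than `threshold` standard deviations
    away from the mean of the series."""
    mu, sigma = mean(samples), stdev(samples)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]

# Made-up write-latency samples (ms) with one obvious spike at index 5.
latencies_ms = [7.5, 7.6, 7.4, 7.7, 7.5, 42.0, 7.6]
print(anomalies(latencies_ms))  # [5]
```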

2018-03-12 16:52 GMT+00:00 Rahul Singh :

> Anomaly detection of what? The data inside Cassandra or Cassandra metrics?
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Mar 12, 2018, 12:44 PM -0400, D. Salvatore ,
> wrote:
>
> Hello everyone,
> Do you know if there exists a Cassandra tool that performs anomaly detection?
>
> Thank you in advance
> Salvatore
>
>


Re: Anomaly detection

2018-03-12 Thread Rahul Singh
Anomaly detection of what? The data inside Cassandra or Cassandra metrics?

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 12, 2018, 12:44 PM -0400, D. Salvatore , wrote:
> Hello everyone,
> Do you know if there exists a Cassandra tool that performs anomaly detection?
>
> Thank you in advance
> Salvatore


Anomaly detection

2018-03-12 Thread D. Salvatore
Hello everyone,
Do you know if there exists a Cassandra tool that performs anomaly detection?

Thank you in advance
Salvatore


RE: What versions should the documentation support now?

2018-03-12 Thread Kenneth Brotman
It seems like the documentation that should be in the trunk for version 3.0 
should include information for users of versions 3.0 and 2.1; the documentation 
that should be in 4.0 (when it's released) should include information for users 
of 4.0 and at least one previous version, etc. 

 

How about if we do it that way?

 

Kenneth Brotman

 

From: Jonathan Haddad [mailto:j...@jonhaddad.com] 
Sent: Monday, March 12, 2018 9:10 AM
To: user@cassandra.apache.org
Subject: Re: What versions should the documentation support now?

 

Right now they can’t.

On Mon, Mar 12, 2018 at 9:03 AM Kenneth Brotman  
wrote:

I see how that makes sense Jon but how does a user then select the 
documentation for the version they are running on the Apache Cassandra web site?

 

Kenneth Brotman

 

From: Jonathan Haddad [mailto:j...@jonhaddad.com] 
Sent: Monday, March 12, 2018 8:40 AM


To: user@cassandra.apache.org
Subject: Re: What versions should the documentation support now?

 

The docs are in tree, meaning they are versioned, and should be written for the 
version they correspond to. Trunk docs should reflect the current state of 
trunk, and shouldn’t have caveats for other versions. 

On Mon, Mar 12, 2018 at 8:15 AM Kenneth Brotman  > wrote:

If we use DataStax’s example, we would have instructions for v3.0 and v2.1.  
How’s that?  

 

We should also have instructions for the cloud platforms like AWS, but how do 
you do that and stay vendor-neutral?

 

Kenneth Brotman

 

From: Hannu Kröger [mailto:hkro...@gmail.com] 
Sent: Monday, March 12, 2018 7:40 AM
To: user@cassandra.apache.org
Subject: Re: What versions should the documentation support now?

 

In my opinion, good documentation should somehow include version-specific 
pieces of information, whether it is a nodetool command that came in a certain 
version, a parameter for something, or something else.

 

That would be very useful. It’s confusing if I see documentation talking about 
4.0 specifics and then try to find that in my 3.11.x.

 

Hannu

 

On 12 Mar 2018, at 16:38, Kenneth Brotman  wrote:

 

I’m unclear on which versions are most popular right now. What version are you 
running?

 

What version should still be supported in the documentation?  For example, I’m 
turning my attention back to writing a section on adding a data center.  What 
versions should I support in that information?

 

I’m working on it right now.  Thanks,

 

Kenneth Brotman

 



Re: What versions should the documentation support now?

2018-03-12 Thread Jonathan Haddad
Right now they can’t.
On Mon, Mar 12, 2018 at 9:03 AM Kenneth Brotman
 wrote:

> I see how that makes sense Jon but how does a user then select the
> documentation for the version they are running on the Apache Cassandra web
> site?
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Jonathan Haddad [mailto:j...@jonhaddad.com]
> *Sent:* Monday, March 12, 2018 8:40 AM
>
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: What versions should the documentation support now?
>
>
>
> The docs are in tree, meaning they are versioned, and should be written
> for the version they correspond to. Trunk docs should reflect the current
> state of trunk, and shouldn’t have caveats for other versions.
>
> On Mon, Mar 12, 2018 at 8:15 AM Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
>
> If we use DataStax’s example, we would have instructions for v3.0 and
> v2.1.  How’s that?
>
>
>
> We should also have instructions for the cloud platforms like AWS, but how
> do you do that and stay vendor-neutral?
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Hannu Kröger [mailto:hkro...@gmail.com]
> *Sent:* Monday, March 12, 2018 7:40 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: What versions should the documentation support now?
>
>
>
> In my opinion, good documentation should somehow include version-specific
> pieces of information, whether it is a nodetool command that came in a
> certain version, a parameter for something, or something else.
>
> That would be very useful. It’s confusing if I see documentation talking
> about 4.0 specifics and then try to find that in my 3.11.x.
>
>
>
> Hannu
>
>
>
> On 12 Mar 2018, at 16:38, Kenneth Brotman 
> wrote:
>
>
>
> I’m unclear on which versions are most popular right now. What version are
> you running?
>
>
>
> What version should still be supported in the documentation?  For example,
> I’m turning my attention back to writing a section on adding a data
> center.  What versions should I support in that information?
>
>
>
> I’m working on it right now.  Thanks,
>
>
>
> Kenneth Brotman
>
>
>
>


RE: What versions should the documentation support now?

2018-03-12 Thread Kenneth Brotman
I see how that makes sense Jon but how does a user then select the 
documentation for the version they are running on the Apache Cassandra web site?

 

Kenneth Brotman

 

From: Jonathan Haddad [mailto:j...@jonhaddad.com] 
Sent: Monday, March 12, 2018 8:40 AM
To: user@cassandra.apache.org
Subject: Re: What versions should the documentation support now?

 

The docs are in tree, meaning they are versioned, and should be written for the 
version they correspond to. Trunk docs should reflect the current state of 
trunk, and shouldn’t have caveats for other versions. 

On Mon, Mar 12, 2018 at 8:15 AM Kenneth Brotman  
wrote:

If we use DataStax’s example, we would have instructions for v3.0 and v2.1.  
How’s that?  

 

We should also have instructions for the cloud platforms like AWS, but how do 
you do that and stay vendor-neutral?

 

Kenneth Brotman

 

From: Hannu Kröger [mailto:hkro...@gmail.com] 
Sent: Monday, March 12, 2018 7:40 AM
To: user@cassandra.apache.org
Subject: Re: What versions should the documentation support now?

 

In my opinion, good documentation should somehow include version-specific 
pieces of information, whether it is a nodetool command that came in a certain 
version, a parameter for something, or something else.

 

That would be very useful. It’s confusing if I see documentation talking about 
4.0 specifics and then try to find that in my 3.11.x.

 

Hannu

 

On 12 Mar 2018, at 16:38, Kenneth Brotman  wrote:

 

I’m unclear on which versions are most popular right now. What version are you 
running?

 

What version should still be supported in the documentation?  For example, I’m 
turning my attention back to writing a section on adding a data center.  What 
versions should I support in that information?

 

I’m working on it right now.  Thanks,

 

Kenneth Brotman

 



Re: What versions should the documentation support now?

2018-03-12 Thread Jonathan Haddad
The docs are in tree, meaning they are versioned, and should be written for
the version they correspond to. Trunk docs should reflect the current state
of trunk, and shouldn’t have caveats for other versions.
On Mon, Mar 12, 2018 at 8:15 AM Kenneth Brotman
 wrote:

> If we use DataStax’s example, we would have instructions for v3.0 and
> v2.1.  How’s that?
>
>
>
> We should also have instructions for the cloud platforms like AWS, but how
> do you do that and stay vendor-neutral?
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Hannu Kröger [mailto:hkro...@gmail.com]
> *Sent:* Monday, March 12, 2018 7:40 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: What versions should the documentation support now?
>
>
>
> In my opinion, good documentation should somehow include version-specific
> pieces of information, whether it is a nodetool command that came in a
> certain version, a parameter for something, or something else.
>
> That would be very useful. It’s confusing if I see documentation talking
> about 4.0 specifics and then try to find that in my 3.11.x.
>
>
>
> Hannu
>
>
>
> On 12 Mar 2018, at 16:38, Kenneth Brotman 
> wrote:
>
>
>
> I’m unclear on which versions are most popular right now. What version are
> you running?
>
>
>
> What version should still be supported in the documentation?  For example,
> I’m turning my attention back to writing a section on adding a data
> center.  What versions should I support in that information?
>
>
>
> I’m working on it right now.  Thanks,
>
>
>
> Kenneth Brotman
>
>
>


RE: What versions should the documentation support now?

2018-03-12 Thread Kenneth Brotman
If we use DataStax’s example, we would have instructions for v3.0 and v2.1.  
How’s that?  

 

We should have to be instructions for the cloud platforms like AWS but how do 
you do that and stay vendor neutral?

 

Kenneth Brotman

 

From: Hannu Kröger [mailto:hkro...@gmail.com] 
Sent: Monday, March 12, 2018 7:40 AM
To: user@cassandra.apache.org
Subject: Re: What versions should the documentation support now?

 

In my opinion, good documentation should somehow include version-specific 
pieces of information, whether it is a nodetool command that came in a certain 
version, a parameter for something, or something else.

 

That would be very useful. It’s confusing if I see documentation talking about 
4.0 specifics and then try to find that in my 3.11.x.

 

Hannu





On 12 Mar 2018, at 16:38, Kenneth Brotman  wrote:

 

I’m unclear on which versions are most popular right now. What version are you 
running?

 

What version should still be supported in the documentation?  For example, I’m 
turning my attention back to writing a section on adding a data center.  What 
versions should I support in that information?

 

I’m working on it right now.  Thanks,

 

Kenneth Brotman

 



Re: yet another benchmark bottleneck

2018-03-12 Thread Michael Burman
Although there is a low amount of updates, it's possible that you hit a 
contention bug. A simple test would be to add multiple Cassandra nodes on the 
same physical node (for example, split your 20 cores into 5 instances of 
Cassandra). If you get much higher throughput, then you have an answer.


I don't think a single-instance Cassandra 3.11.2 scales to 20 cores (at 
least with the stress-test pattern). There are a few known issues, in the 
write path at least, that prevent scaling with a high CPU core count.


  - Micke


On 03/12/2018 03:14 PM, onmstester onmstester wrote:
I mentioned that I already tested increasing client threads, many 
stress-client instances on one node, and two stress clients on two 
separate nodes; in all of them the sum of throughputs is less than 
130K. I've been tuning all aspects of OS and Cassandra (whatever I've 
seen in the config files!) for two days, still no luck!


Sent using Zoho Mail 



 On Mon, 12 Mar 2018 16:38:22 +0330 Jacques-Henri Berthemet wrote 

What happens if you increase number of client threads?
Can you add another instance of cassandra-stress on another host?

--
Jacques-Henri Berthemet

From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 12:50 PM
To: user
Subject: RE: yet another benchmark bottleneck

no luck even with 320 threads for write

Sent using Zoho Mail

 On Mon, 12 Mar 2018 14:44:15 +0330 Jacques-Henri Berthemet wrote 

It makes more sense now, 130K is not that bad.

According to cassandra.yaml you should be able to increase your number of
write threads in Cassandra:

# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

Jumping directly to 160 would be a bit high with spinning disks, maybe start
with 64 just to see if it gets better.

--
Jacques-Henri Berthemet

From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 12:08 PM
To: user
Subject: RE: yet another benchmark bottleneck

RF=1
No errors or warnings.
Actually it's 300 Mbit/second and 130K op/second. I missed a 'K' in the
first mail, but anyway, the point is: more than half of the node's resources
(cpu, mem, disk, network) is unused and I can't increase write throughput.

Sent using Zoho Mail

 On Mon, 12 Mar 2018 14:25:12 +0330 Jacques-Henri Berthemet wrote 

Any errors/warnings in the Cassandra logs? What's your RF?
Using 300MB/s of network bandwidth for only 130 op/s looks very high.

--
Jacques-Henri Berthemet

From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 11:38 AM
To: user
Subject: RE: yet another benchmark bottleneck

1.2 TB 15K
latency reported by stress tool is 7.6 ms. disk latency is 2.6 ms

Sent using Zoho Mail

 On Mon, 12 Mar 2018 14:02:29 +0330 Jacques-Henri Berthemet wrote 

What's your disk latency? What kind of disk is it?

--
Jacques-Henri Berthemet

From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 10:48 AM
To: user
Subject: Re: yet another benchmark bottleneck

Running two instances of Apache Cassandra on the same server, each having
their own commit log disk, did not help. The sum of cpu/ram usage for both
instances would be less than half of all available resources.

Re: What versions should the documentation support now?

2018-03-12 Thread Hannu Kröger
In my opinion, good documentation should somehow include version-specific 
pieces of information, whether it is a nodetool command that came in a certain 
version, a parameter for something, or something else.

That would be very useful. It’s confusing if I see documentation talking about 
4.0 specifics and then try to find that in my 3.11.x.

Hannu

> On 12 Mar 2018, at 16:38, Kenneth Brotman  
> wrote:
> 
> I’m unclear on which versions are most popular right now. What version are 
> you running?
>  
> What version should still be supported in the documentation?  For example, 
> I’m turning my attention back to writing a section on adding a data center.  
> What versions should I support in that information?
>  
> I’m working on it right now.  Thanks,
>  
> Kenneth Brotman



What versions should the documentation support now?

2018-03-12 Thread Kenneth Brotman
I'm unclear on which versions are most popular right now. What version are you
running?

 

What version should still be supported in the documentation?  For example,
I'm turning my attention back to writing a section on adding a data center.
What versions should I support in that information?

 

I'm working on it right now.  Thanks,

 

Kenneth Brotman



RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
If throughput decreases as you add more load then it’s probably due to disk 
latency; can you test SSDs? Are you using VMware ESXi?

--
Jacques-Henri Berthemet

From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 2:15 PM
To: user 
Subject: RE: yet another benchmark bottleneck

I mentioned that I already tested increasing client threads, many stress-client 
instances on one node, and two stress clients on two separate nodes; in all of 
them the sum of throughputs is less than 130K. I've been tuning all aspects of 
OS and Cassandra (whatever I've seen in the config files!) for two days, still 
no luck!


Sent using Zoho Mail


 On Mon, 12 Mar 2018 16:38:22 +0330 Jacques-Henri Berthemet 
>
 wrote 

What happens if you increase number of client threads?
Can you add another instance of cassandra-stress on another host?

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 12:50 PM
To: user >
Subject: RE: yet another benchmark bottleneck

no luck even with 320 threads for write


Sent using Zoho Mail


 On Mon, 12 Mar 2018 14:44:15 +0330 Jacques-Henri Berthemet 
>
 wrote 

It makes more sense now, 130K is not that bad.

According to cassandra.yaml you should be able to increase your number of write 
threads in Cassandra:
# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

Jumping directly to 160 would be a bit high with spinning disks, maybe start 
with 64 just to see if it gets better.
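For the 20-core node discussed in this thread, that advice would translate to a cassandra.yaml change along these lines (a sketch; 64 is the conservative starting point suggested above, not a benchmarked value):

```yaml
# 8 * 20 cores = 160 by the rule of thumb; start lower on spinning disks
# and raise gradually while watching throughput and disk utilization.
concurrent_writes: 64
```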

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 12:08 PM
To: user >
Subject: RE: yet another benchmark bottleneck

RF=1
No errors or warnings.
Actually it's 300 Mbit/second and 130K op/second. I missed a 'K' in the first 
mail, but anyway, the point is: more than half of the node's resources (cpu, 
mem, disk, network) is unused and I can't increase write throughput.
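Those two figures can be cross-checked with simple arithmetic (assuming both reported numbers are accurate):

```python
# At ~130K op/s and ~300 Mbit/s inbound, each write carries roughly
# 288 bytes on the wire, which is plausible for small cassandra-stress rows.
ops_per_sec = 130_000
bits_per_sec = 300 * 1_000_000          # 300 Mbit/s
bytes_per_op = bits_per_sec / 8 / ops_per_sec
print(round(bytes_per_op))  # 288
```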


Sent using Zoho Mail


 On Mon, 12 Mar 2018 14:25:12 +0330 Jacques-Henri Berthemet 
>
 wrote 

Any errors/warning in Cassandra logs? What’s your RF?
Using 300MB/s of network bandwidth for only 130 op/s looks very high.

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 11:38 AM
To: user >
Subject: RE: yet another benchmark bottleneck

1.2 TB 15K
latency reported by stress tool is 7.6 ms. disk latency is 2.6 ms


Sent using Zoho Mail


 On Mon, 12 Mar 2018 14:02:29 +0330 Jacques-Henri Berthemet 
>
 wrote 

What’s your disk latency? What kind of disk is it?

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 10:48 AM
To: user >
Subject: Re: yet another benchmark bottleneck

Running two instances of Apache Cassandra on the same server, each having their 
own commit log disk, did not help. The sum of cpu/ram usage for both instances 
would be less than half of all available resources; disk usage is less than 20% 
and network is still less than 300Mb in Rx.


Sent using Zoho Mail


 On Mon, 12 Mar 2018 09:34:26 +0330 onmstester onmstester 
> wrote 

Apache-cassandra-3.11.1
Yes, I'm doing a single-host test


Sent using Zoho Mail


 On Mon, 12 Mar 2018 09:24:04 +0330 Jeff Jirsa 
> wrote 



Would help to know your version. 130 ops/second sounds like a ridiculously low 
rate. Are you doing a single host test?

On Sun, Mar 11, 2018 at 10:44 PM, onmstester onmstester 
> wrote:


I'm going to benchmark Cassandra's write throughput on a node with following 
spec:

  *   CPU: 20 Cores
  *   Memory: 128 GB (32 GB as Cassandra heap)
  *   Disk: 3 separate disks for OS, data and commitlog
  *   Network: 10 Gb (tested with iperf)
  *   OS: Ubuntu 16

Running 

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
I mentioned that I already tested increasing client threads, many stress-client 
instances on one node, and two stress clients on two separate nodes; in all of 
them the sum of throughputs is less than 130K. I've been tuning all aspects of 
OS and Cassandra (whatever I've seen in the config files!) for two days, still 
no luck!


Sent using Zoho Mail






 On Mon, 12 Mar 2018 16:38:22 +0330 Jacques-Henri Berthemet 
jacques-henri.berthe...@genesys.com wrote 

What happens if you increase number of client threads?
Can you add another instance of cassandra-stress on another host?

--
Jacques-Henri Berthemet

From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 12:50 PM
To: user user@cassandra.apache.org
Subject: RE: yet another benchmark bottleneck

no luck even with 320 threads for write

Sent using Zoho Mail

 On Mon, 12 Mar 2018 14:44:15 +0330 Jacques-Henri Berthemet 
jacques-henri.berthe...@genesys.com wrote 

It makes more sense now, 130K is not that bad.

According to cassandra.yaml you should be able to increase your number of write 
threads in Cassandra:

# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

Jumping directly to 160 would be a bit high with spinning disks, maybe start 
with 64 just to see if it gets better.

--
Jacques-Henri Berthemet

From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 12:08 PM
To: user user@cassandra.apache.org
Subject: RE: yet another benchmark bottleneck

RF=1
No errors or warnings.
Actually it's 300 Mbit/second and 130K op/second. I missed a 'K' in the first 
mail, but anyway, the point is: more than half of the node's resources (cpu, 
mem, disk, network) is unused and I can't increase write throughput.

Sent using Zoho Mail

 On Mon, 12 Mar 2018 14:25:12 +0330 Jacques-Henri Berthemet 
jacques-henri.berthe...@genesys.com wrote 

Any errors/warnings in the Cassandra logs? What's your RF?
Using 300MB/s of network bandwidth for only 130 op/s looks very high.

--
Jacques-Henri Berthemet

From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 11:38 AM
To: user user@cassandra.apache.org
Subject: RE: yet another benchmark bottleneck

1.2 TB 15K
latency reported by stress tool is 7.6 ms. disk latency is 2.6 ms

Sent using Zoho Mail

 On Mon, 12 Mar 2018 14:02:29 +0330 Jacques-Henri Berthemet 
jacques-henri.berthe...@genesys.com wrote 

What's your disk latency? What kind of disk is it?

--
Jacques-Henri Berthemet

From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 10:48 AM
To: user user@cassandra.apache.org
Subject: Re: yet another benchmark bottleneck

Running two instances of Apache Cassandra on the same server, each having their 
own commit log disk, did not help. The sum of cpu/ram usage for both instances 
would be less than half of all available resources; disk usage is less than 20% 
and network is still less than 300Mb in Rx.

Sent using Zoho Mail

 On Mon, 12 Mar 2018 09:34:26 +0330 onmstester onmstester 
onmstes...@zoho.com wrote 

Apache-cassandra-3.11.1
Yes, I'm doing a single-host test

Sent using Zoho Mail

 On Mon, 12 Mar 2018 09:24:04 +0330 Jeff Jirsa jji...@gmail.com 
wrote 

Would help to know your version. 130 ops/second sounds like a ridiculously low 
rate. Are you doing a single-host test?

On Sun, Mar 11, 2018 at 10:44 PM, onmstester onmstester 
onmstes...@zoho.com wrote:

I'm going to benchmark Cassandra's write throughput on a node with the 
following spec:

CPU: 20 Cores
Memory: 128 GB (32 GB as Cassandra heap)
Disk: 3 separate disks for OS, data and commitlog
Network: 10 Gb (tested with iperf)
OS: Ubuntu 16

Running Cassandra-stress:

cassandra-stress write n=100 -rate threads=1000 -mode native cql3 -node 
X.X.X.X

from two nodes with the same spec as above, I cannot get throughput of more 
than 130 Op/s. The clients are using less than 50% of CPU; the Cassandra node 
uses:

60% of cpu
30% of memory
30-40% util in iostat on the commitlog disk
300 Mb of network bandwidth

I suspect the network, because no matter how many clients I run, Cassandra is 
always using less than 300 Mb. I've done all the tuning mentioned by DataStax.
Increasing wmem_max and rmem_max did not help either.

Sent using Zoho Mail



RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
What happens if you increase the number of client threads?
Can you add another instance of cassandra-stress on another host?

--
Jacques-Henri Berthemet


Re: Row cache functionality - Some confusion

2018-03-12 Thread Hannu Kröger

> On 12 Mar 2018, at 14:45, Rahul Singh  wrote:
> 
> I may be wrong, but what I’ve read and used in the past assumes that the 
> “first” N rows are cached and the clustering key design is how I change what 
> N rows are put into memory. Looking at the code, it seems that’s the case. 

So we agree that the row cache stores only the first N rows from the beginning 
of the partition. So if only the last row in a partition is read, it probably 
doesn’t get cached, assuming there are more than N rows in the partition?

> The language of the comment basically says that it holds in cache what 
> satisfies the query if and only if it’s the head of the partition, if not it 
> fetches it and saves it - I dont interpret it differently from what I have 
> seen in the documentation. 

Hmm, I’m trying to understand this. Does it mean that it stores the result in 
the cache if it is the head of the partition, and if not, it fetches the head 
and stores that (instead of the result of the query)?

Hannu

Re: Archive cassandra old data into Hadoop

2018-03-12 Thread Rahul Singh
HDFS / S3 is a great place to dump this data. You can also consider other types 
of compaction strategies for “COLD DATA” in not so powerful C* clusters for 
which the purpose is write only. C* is still better in my opinion for data 
management than S3/HDFS.  It depends on how easy you want the retrieval and 
analysis to be.



--
Rahul Singh
rahul.si...@anant.us

Anant Corporation



Re: Row cache functionality - Some confusion

2018-03-12 Thread Rahul Singh
I may be wrong, but what I’ve read and used in the past assumes that the 
“first” N rows are cached and the clustering key design is how I change what N 
rows are put into memory. Looking at the code, it seems that’s the case.

The language of the comment basically says that it holds in cache what 
satisfies the query if and only if it’s the head of the partition, if not it 
fetches it and saves it - I dont interpret it differently from what I have seen 
in the documentation.



--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 12, 2018, 7:13 AM -0400, Hannu Kröger , wrote:
>
> rows_per_partition


Archive cassandra old data into Hadoop

2018-03-12 Thread Javier Pareja
Hi,

I understand that a well designed Cassandra system will allow you to query ANY
data within it at an incredible speed, as well as ingest data at a very
fast pace.

However this data is going to grow until it is archived. As I see it, data
has two stages: HOT DATA, when data is accessible to be queried with very low
latency, and COLD DATA, when data can be queried and processed but we can
allow a (relatively long) delay. Cassandra is VERY good with the HOT DATA
but it is not very cost effective when the COLD DATA starts to grow, because
each node only stores a tiny amount (1 TB recommended). The number of nodes
needed starts to grow even if this data is rarely queried!

Has anyone implemented a solution that "archives" data into a cold(er)
storage outside of Cassandra, while still being available for (offline)
processing with Spark? For example into a separate cluster with Hadoop/Hive?
What is the standard in these cases?

F Javier Pareja
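A minimal sketch of the hot/cold split such an archival job needs. The 90-day cutoff and every name below are assumptions for illustration, not an established tool; in practice the cold rows would then be exported to HDFS/S3 (e.g. as Parquet via a Spark job) rather than kept in a Python list:

```python
from datetime import datetime, timedelta, timezone

COLD_AFTER = timedelta(days=90)  # assumed archival policy, not a standard

def split_hot_cold(rows, now=None):
    """Partition rows (dicts with a 'ts' datetime) into hot and cold lists.

    Cold rows are candidates for export to cheaper storage (HDFS/S3);
    hot rows stay in Cassandra for low-latency queries.
    """
    now = now or datetime.now(timezone.utc)
    hot, cold = [], []
    for row in rows:
        (cold if now - row["ts"] > COLD_AFTER else hot).append(row)
    return hot, cold

now = datetime(2018, 3, 12, tzinfo=timezone.utc)
rows = [
    {"id": 1, "ts": datetime(2018, 3, 1, tzinfo=timezone.utc)},  # recent
    {"id": 2, "ts": datetime(2017, 6, 1, tzinfo=timezone.utc)},  # old
]
hot, cold = split_hot_cold(rows, now=now)
print([r["id"] for r in hot], [r["id"] for r in cold])  # [1] [2]
```

The interesting design choice is where the cutoff lives: deriving it from the row's write time (as here) keeps the job stateless and re-runnable.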


RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
no luck even with 320 threads for write


Sent using Zoho Mail







RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
It makes more sense now, 130K is not that bad.

According to cassandra.yaml you should be able to increase the number of write 
threads in Cassandra:
# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

Jumping directly to 160 would be a bit high with spinning disks; maybe start 
with 64 just to see if it gets better.

--
Jacques-Henri Berthemet
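[Editor's note: the 8 * number_of_cores rule of thumb quoted from cassandra.yaml above can be sketched as a quick calculation; the helper function is illustrative, not a Cassandra API.]

```python
import os

def recommended_concurrent_writes(cores: int) -> int:
    """cassandra.yaml rule of thumb: writes are rarely IO bound,
    so size the write stage from the CPU count (8 * number_of_cores)."""
    return 8 * cores

# The 20-core node discussed in this thread -- this is where 160 comes from:
print(recommended_concurrent_writes(20))  # 160

# A machine-local suggestion (os.cpu_count() may return None in odd setups):
cores = os.cpu_count() or 1
print(f"concurrent_writes: {recommended_concurrent_writes(cores)}")
```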



Re: Row cache functionality - Some confusion

2018-03-12 Thread Hannu Kröger
Hi,

My goal is to make sure that I understand functionality correctly and that the 
documentation is accurate. 

The question, in other words: is the documentation or the comment in the code 
wrong (or inaccurate)?

Hannu



RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
RF=1

No errors or warnings.

Actually it's 300 Mbit/s and 130K op/s. I missed a 'K' in the first mail, but 
anyway, the point is: more than half of the node's resources (CPU, memory, 
disk, network) are unused and I can't increase write throughput.


Sent using Zoho Mail










Re: Row cache functionality - Some confusion

2018-03-12 Thread Rahul Singh
What’s the goal? How big are your partitions, in MB and in rows?

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation



RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
Any errors/warning in Cassandra logs? What’s your RF?
Using 300MB/s of network bandwidth for only 130 op/s looks very high.

--
Jacques-Henri Berthemet







Re: Row cache functionality - Some confusion

2018-03-12 Thread Hannu Kröger
Anyone?

> On 4 Mar 2018, at 20:45, Hannu Kröger  wrote:
> 
> Hello,
> 
> I am trying to verify and understand fully the functionality of row cache in 
> Cassandra.
> 
> I have been using mainly two different sources for information:
> https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476
> AND
> http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options
> 
> and based on what I read, the documentation is not correct.
> 
> Documentation says like this:
> “rows_per_partition: The amount of rows to cache per partition (“row cache”). 
> If an integer n is specified, the first n queried rows of a partition will be 
> cached. Other possible options are ALL, to cache all rows of a queried 
> partition, or NONE to disable row caching.”
> 
> The problematic part is "the first n queried rows of a partition will be 
> cached”. Shouldn’t it be that the first N rows in a partition will be cached? 
> Not first N that are queried?
> 
> If this is the case, I’m more than happy to create a ticket (and maybe even 
> create a patch) for the doc update.
> 
> BR,
> Hannu
> 



RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
1.2 TB, 15K RPM

Latency reported by the stress tool is 7.6 ms; disk latency is 2.6 ms.


Sent using Zoho Mail









RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
What’s your disk latency? What kind of disk is it?

--
Jacques-Henri Berthemet






Re: vnodes: high availability

2018-03-12 Thread Hannu Kröger
If this is a universal recommendation, then should that actually be the 
default in Cassandra?

Hannu

> On 18 Jan 2018, at 00:49, Jon Haddad  wrote:
> 
> I *strongly* recommend disabling dynamic snitch.  I’ve seen it make latency 
> jump 10x.  
> 
> dynamic_snitch: false is your friend.
> 
> 
> 
>> On Jan 17, 2018, at 2:00 PM, Kyrylo Lebediev wrote:
>> 
>> Avi, 
>> If we prefer to have better balancing [like absence of hotspots during a 
>> node down event etc], large number of vnodes is a good solution.
>> Personally, I wouldn't prefer any balancing over overall resiliency  (and in 
>> case of non-optimal setup, larger number of nodes in a cluster decreases 
>> overall resiliency, as far as I understand.) 
>> 
>> Talking about hotspots, there is a number of features helping to mitigate 
>> the issue, for example:
>>   - dynamic snitch [if a node overloaded it won't be queried]
>>   - throttling of streaming operations
>> 
>> Thanks, 
>> Kyrill
>> 
>> From: Avi Kivity
>> Sent: Wednesday, January 17, 2018 2:50 PM
>> To: user@cassandra.apache.org; kurt greaves
>> Subject: Re: vnodes: high availability
>>  
>> On the flip side, a large number of vnodes is also beneficial. For example, 
>> if you add a node to a 20-node cluster with many vnodes, each existing node 
>> will contribute 5% of the data towards the new node, and all nodes will 
>> participate in streaming (meaning the impact on any single node will be 
>> limited, and completion time will be faster).
>> 
>> With a low number of vnodes, only a few nodes participate in streaming, 
>> which means that the cluster is left unbalanced and the impact on each 
>> streaming node is greater (or that completion time is slower).
>> 
>> Similarly, with a high number of vnodes, if a node is down its work is 
>> distributed equally among all nodes. With a low number of vnodes the cluster 
>> becomes unbalanced.
>> 
>> Overall I recommend high vnode count, and to limit the impact of failures in 
>> other ways (smaller number of large nodes vs. larger number of small nodes).
>> 
>> btw, rack-aware topology improves the multi-failure problem but at the cost 
>> of causing imbalance during maintenance operations. I recommend using 
>> rack-aware topology only if you really have racks with 
>> single-points-of-failure, not for other reasons.
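The rack-aware placement discussed above is configured per node; with 
GossipingPropertyFileSnitch (the snitch used in the "Adding new DC" thread) it 
is read from cassandra-rackdc.properties. A minimal sketch, with the DC/rack 
names taken from that thread:

```
# cassandra-rackdc.properties -- read by GossipingPropertyFileSnitch
dc=DC1
rack=RAC1
```

Rack names should map to real failure domains (e.g. power or AZ boundaries), 
per Avi's caveat.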
>> 
>> On 01/17/2018 05:43 AM, kurt greaves wrote:
>>> Even with a low amount of vnodes you're asking for a bad time. Even if you 
>>> managed to get down to 2 vnodes per node, you're still likely to include 
>>> double the amount of nodes in any streaming/repair operation which will 
>>> likely be very problematic for incremental repairs, and you still won't be 
>>> able to easily reason about which nodes are responsible for which token 
>>> ranges. It's still quite likely that a loss of 2 nodes would mean some 
>>> portion of the ring is down (at QUORUM). At the moment I'd say steer clear 
>>> of vnodes and use single tokens if you can; a lot of work still needs to be 
>>> done to ensure smooth operation of C* while using vnodes, and they are much 
>>> more difficult to reason about (which is probably the reason no one has 
>>> bothered to do the math). If you're really keen on the math your best bet 
>>> is to do it yourself, because it's not a point of interest for many C* devs 
>>> plus probably a lot of us wouldn't remember enough math to know how to 
>>> approach it.
>>> 
>>> If you want to get out of this situation you'll need to do a DC migration 
>>> to a new DC with a better configuration of snitch/replication 
>>> strategy/racks/tokens.
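The DC migration kurt describes follows the usual add-a-datacenter steps; a 
hedged sketch, with the keyspace name, replication factors, and DC names as 
placeholders:

```
# On any existing node, extend replication to the new DC
# (keyspace and DC names below are placeholders):
cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"

# Then, on each node in the new DC, stream the existing data over:
nodetool rebuild -- DC1
```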
>>> 
>>> 
>>> On 16 January 2018 at 21:54, Kyrylo Lebediev wrote:
>>> Thank you for this valuable info, Jon.
>>> I guess both you and Alex are referring to improved vnodes allocation 
>>> method https://issues.apache.org/jira/browse/CASSANDRA-7032, which was 
>>> implemented in 3.0.
>>> Based on your info and comments in the ticket it's really a bad idea to 
>>> have small number of vnodes for the versions using old allocation method 
>>> because of hot-spots, so it's not an option for my particular case (v.2.1) 
>>> :( 
>>> 
>>> [As far as I can see from the source code this new method wasn't backported 
>>> to 2.1.]
>>> 
>>> 
>>> Regards, 
>>> Kyrill
>>> 
>>> From: Jon Haddad on 

Re: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
Running two instances of Apache Cassandra on the same server, each with its own 
commit log disk, did not help. The combined CPU/RAM usage of both instances 
stays below half of the available resources, disk utilisation is less than 20%, 
and inbound network traffic is still below 300 Mb/s.


Sent using Zoho Mail



Re: Adding new DC?

2018-03-12 Thread Rahul Singh
How did you distribute your seed nodes across the whole cluster?

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 12, 2018, 5:12 AM -0400, Oleksandr Shulgin wrote:
> > On Sun, Mar 11, 2018 at 10:31 PM, Kunal Gangakhedkar wrote:
> > > Hi all,
> > >
> > > We currently have a cluster in GCE for one of the customers.
> > > They want it to be migrated to AWS.
> > >
> > > I have setup one node in AWS to join into the cluster by following:
> > > https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
> > >
> > > Will add more nodes once the first one joins successfully.
> > >
> > > The node in AWS has an elastic IP - which is white-listed for ports 
> > > 7000-7001, 7199, 9042 in GCE firewall.
> > >
> > > The snitch is set to GossipingPropertyFileSnitch. The GCE setup has 
> > > dc=DC1, rack=RAC1 while on AWS, I changed the DC to dc=DC2.
> > >
> > > When I start cassandra service on the AWS instance, I see the version 
> > > handshake msgs in the logs trying to connect to the public IPs of the GCE 
> > > nodes:
> > >     OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx
> > >
> > > However, nodetool status output on both sides don't show the other side 
> > > at all. That is, the GCE setup doesn't show the new DC (dc=DC2) and the 
> > > AWS setup doesn't show old DC (dc=DC1).
> > >
> > > In cassandra.yaml file, I'm only using listen_interface and rpc_interface 
> > > settings - no explicit IP addresses used - so, ends up using the internal 
> > > private IP ranges.
> > >
> > > Do I need to explicitly add the broadcast_address?
> >
> > On the AWS side you could use EC2MultiRegionSnitch: it will assign the 
> > appropriate address (Elastic IP) to this, as well as set DC and rack from 
> > the EC2 Availability Zone.
> >
> > > for both side?
> >
> > I would expect that you have to specify proper broadcast_address on the GCE 
> > side as well.
> >
> > > Would that require restarting of cassandra service on GCE side? Or is it 
> > > possible to change that setting on-the-fly without a restart?
> >
> > A restart is required AFAIK.
> >
> > --
> > Alex
> >


Re: Adding new DC?

2018-03-12 Thread Oleksandr Shulgin
On Sun, Mar 11, 2018 at 10:31 PM, Kunal Gangakhedkar <
kgangakhed...@gmail.com> wrote:

> Hi all,
>
> We currently have a cluster in GCE for one of the customers.
> They want it to be migrated to AWS.
>
> I have setup one node in AWS to join into the cluster by following:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
>
> Will add more nodes once the first one joins successfully.
>
> The node in AWS has an elastic IP - which is white-listed for ports
> 7000-7001, 7199, 9042 in GCE firewall.
>
> The snitch is set to GossipingPropertyFileSnitch. The GCE setup has
> dc=DC1, rack=RAC1 while on AWS, I changed the DC to dc=DC2.
>
> When I start cassandra service on the AWS instance, I see the version
> handshake msgs in the logs trying to connect to the public IPs of the GCE
> nodes:
> OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx
>
> However, nodetool status output on both sides don't show the other side at
> all. That is, the GCE setup doesn't show the new DC (dc=DC2) and the AWS
> setup doesn't show old DC (dc=DC1).
>
> In cassandra.yaml file, I'm only using listen_interface and rpc_interface
> settings - no explicit IP addresses used - so, ends up using the internal
> private IP ranges.
>
> Do I need to explicitly add the broadcast_address?
>

On the AWS side you could use EC2MultiRegionSnitch: it will assign the
appropriate address (Elastic IP) to this, as well as set DC and rack from
the EC2 Availability Zone.


> for both side?
>

I would expect that you have to specify proper broadcast_address on the GCE
side as well.


> Would that require restarting of cassandra service on GCE side? Or is it
> possible to change that setting on-the-fly without a restart?
>

A restart is required AFAIK.

--
Alex
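The address split discussed above, for a node behind NAT whose public IP must be 
advertised to the other DC, would look roughly like this in cassandra.yaml (a 
sketch; the addresses are placeholders, and as noted a restart is required):

```yaml
# cassandra.yaml fragment for a node with separate private/public addresses
# (e.g. GCE or EC2 behind NAT). Addresses below are placeholders.
listen_address: 10.0.0.5            # private interface address
broadcast_address: 203.0.113.7      # public IP that nodes in other DCs dial
listen_on_broadcast_address: true   # listen on both, per the DataStax
                                    # multi-network configuration doc
broadcast_rpc_address: 203.0.113.7  # public IP advertised to clients
```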


RE: Cassandra DevCenter

2018-03-12 Thread Jacques-Henri Berthemet
Hi,

There is no DevCenter 2.x, latest is 1.6. It would help if you provide jar 
names and exceptions you encounter. Make sure you’re not mixing Guava versions 
from other dependencies. DevCenter uses Datastax driver to connect to 
Cassandra, double check the versions of the jars you need here:
https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-core

Put only the jars listed for the driver version you have on your classpath, and 
it should work.
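As a minimal illustration of the setup Jacques-Henri describes -- the DataStax 
Java driver on the classpath (3.x assumed here), no DevCenter involved -- a CQL 
query could be run like this (contact point and query are placeholders):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Sketch only: requires cassandra-driver-core and its listed dependencies
// (notably Guava and Netty) on the classpath -- the same jars discussed above.
public class CqlExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")   // placeholder node address
                .build();
             Session session = cluster.connect()) {
            ResultSet rs = session.execute(
                    "SELECT release_version FROM system.local");
            Row row = rs.one();
            System.out.println(row.getString("release_version"));
        }
    }
}
```

The missing Google dependency Philippe mentions is most likely one of the 
driver's transitive dependencies, which is why matching the driver's listed jar 
versions matters.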

--
Jacques-Henri Berthemet

From: Philippe de Rochambeau [mailto:phi...@free.fr]
Sent: Saturday, March 10, 2018 6:56 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra DevCenter

Hi,
thank you for replying.
Unfortunately, the computer DevCenter is running on doesn’t have Internet 
access (for security reasons).  As a result, I can’t use the pom.xml.
Furthermore, I’ve tried running a Groovy program whose classpath included the 
DevCenter (2.x) lib directory, but to no avail as a Google dependency was 
missing (I can’t recall the dependency’s name).
Because DevCenter manages to connect to Cassandra without downloading 
dependencies, there’s bound to be a way to drive the former using Java or 
Groovy.

On 10 March 2018 at 18:34, Goutham reddy wrote:
Get the JARs from the Cassandra lib folder and put them on your build path, or 
use a Maven project with a pom.xml to download them directly from the repository.

Thanks and Regards,
Goutham Reddy Aenugu.

On Sat, Mar 10, 2018 at 9:30 AM, Philippe de Rochambeau wrote:
Hello,
has anyone tried running CQL queries from a Java program using the jars 
provided with DevCenter?
Many thanks.
Philippe

-
To unsubscribe, e-mail: 
user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: 
user-h...@cassandra.apache.org
--
Regards
Goutham Reddy


Re: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
Apache-cassandra-3.11.1

Yes, I'm doing a single-host test


Sent using Zoho Mail