Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jonathan Koppenhofer
We do multiple nodes per host as a standard practice. In our case, we never
put 2 nodes from a single cluster on the same host, though as mentioned
before, you could potentially get away with that if you properly use rack
awareness. Just be careful of load.

We also do NOT use any other layer of segregation such as docker or VMs, we
just have multiple IPs per host, and bind each IP to a distinct node. We
have looked at VMs and Containers, but they either add abstraction
complexity or some kind of performance penalty.

As for system resources, we dedicate individual SSDs to each node, but
CPU, memory, and network are shared. We are spoiled by good network and
beefy memory, so the only place we have to be careful is CPU. As such, we
pick fairly conservative cassandra.yaml settings and monitor CPU usage. If
workloads get hot on a particular host, we have some flexibility to move
things around.
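
A minimal sketch of the layout described above: one IP and one SSD per node,
with no containers. The cassandra.yaml keys are real settings, but the IPs,
the mount points, and the helper name `instance_overrides` are hypothetical,
and the code only builds the per-node overrides in memory. Each instance on a
host typically also needs its own JMX port (set in cassandra-env.sh), which is
not shown here.

```python
from typing import Dict


def instance_overrides(ip: str, ssd_mount: str) -> Dict[str, object]:
    """Return cassandra.yaml overrides for one node bound to one IP and one SSD."""
    return {
        "listen_address": ip,  # inter-node traffic
        "rpc_address": ip,     # client traffic
        "data_file_directories": [f"{ssd_mount}/data"],
        "commitlog_directory": f"{ssd_mount}/commitlog",
        "saved_caches_directory": f"{ssd_mount}/saved_caches",
        "hints_directory": f"{ssd_mount}/hints",
    }


# Hypothetical host with two IPs and two SSDs, running one node from each
# of two different clusters.
nodes = [
    instance_overrides("10.0.0.11", "/mnt/ssd1"),
    instance_overrides("10.0.0.12", "/mnt/ssd2"),
]
for node in nodes:
    print(node["listen_address"], node["data_file_directories"][0])
```

Because each node keeps the default ports but binds them to its own address,
the two processes do not compete for the same socket.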

In any case, it sounds like you will be fine running 1 node per host. With
that many resources, be sure to tune your nodes to make use of them.

Good luck.


Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread William R
Hi,

Thank you for your answers. The most important point I take from them is
that

"it is OK to go more than 1 TB in disk usage"

So if I use 50% of the disk capacity, I will end up with around 3 TB per
node, which means I will not need a Docker solution. That works very well
for our use case.

The goal of my setup is to store large data volumes on every node (~3 TB,
50% usage of the HD) with the hardware we currently possess. I consider high
availability a given, since we are going to have 2 DCs with RF3.
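
The arithmetic behind those figures can be sketched as follows. The function
names are hypothetical, and two assumptions are baked in: RF3 is read as 3
replicas per DC, and each DC is taken to hold 5 machines running one node
each (10 machines split across 2 DCs).

```python
def usable_tb_per_node(disks: int, tb_per_disk: float, fill_ratio: float) -> float:
    """Disk space a node is allowed to fill, in TB."""
    return disks * tb_per_disk * fill_ratio


def unique_data_tb(nodes_in_dc: int, tb_per_node: float, rf: int) -> float:
    """Unique (pre-replication) data one DC can hold when every row is stored rf times."""
    return nodes_in_dc * tb_per_node / rf


per_node = usable_tb_per_node(disks=2, tb_per_disk=3.0, fill_ratio=0.5)
print(per_node)                                                    # 3.0
print(unique_data_tb(nodes_in_dc=5, tb_per_node=per_node, rf=3))   # 5.0
```

So under these assumptions each DC stores about 5 TB of unique data, held
three times over across its nodes.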

I should also note that DataStax recommends using no more than 500 GB - 1
TB.

Cheers,

Vasilis


Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jacques-Henri Berthemet
So how much data can you safely fit per node using SSDs with Cassandra 3.11? 
How much free space do you need on your disks?

There should be some recommendations on node sizes on:

http://cassandra.apache.org/doc/latest/operating/hardware.html


Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jon Haddad
Agreed with Jeff here.  The whole "community recommends no more than
1TB" has been around, and inaccurate, for a long time.

The biggest issue with dense nodes is how long it takes to replace
them.  4.0 should help with that under certain circumstances.
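
A rough way to see why replacement time grows with density is to estimate how
long it takes to stream a node's data at a fixed rate. This is only a sketch:
the helper name is hypothetical, and 200 Mbit/s is an assumed throttle (it
should match the 3.x default for `stream_throughput_outbound_megabits_per_sec`,
but check your own config). Real replacements stream from several peers in
parallel and also pay for compaction, so treat the result as an
order-of-magnitude figure.

```python
def stream_hours(data_tb: float, throughput_megabits_per_sec: float) -> float:
    """Hours needed to move data_tb terabytes at a constant rate."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (throughput_megabits_per_sec * 1e6)
    return seconds / 3600


for tb in (1, 3):
    print(f"{tb} TB at 200 Mbit/s: {stream_hours(tb, 200):.1f} h")
```

At that assumed rate, a 3 TB node takes roughly three times as long to
rebuild as a 1 TB node, which is the cost being weighed against having fewer,
denser nodes.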



Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jeff Jirsa
Agreed that you can go larger than 1 TB on SSD.

You can do this safely with both instances in the same cluster if you guarantee
two replicas aren’t on the same machine. Cassandra provides a primitive to do
this - rack awareness through the network topology snitch.

The limitation (until 4.0) is that you’ll need two IPs per machine as both
instances have to run on the same port.
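
To illustrate the rack-awareness idea, here is a deliberately simplified
sketch of rack-aware replica selection, not Cassandra's actual placement
code. It assumes each instance's rack is set to the name of its physical
machine (with GossipingPropertyFileSnitch that would be the `rack=` value in
cassandra-rackdc.properties) and that there are at least as many machines as
replicas. All node names and tokens are hypothetical.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    name: str
    token: int
    rack: str  # set to the physical machine's name


def replicas(ring: List[Node], key_token: int, rf: int) -> List[Node]:
    """Walk the ring clockwise from key_token, taking at most one node per rack."""
    ordered = sorted(ring, key=lambda n: n.token)
    start = next((i for i, n in enumerate(ordered) if n.token >= key_token), 0)
    chosen: List[Node] = []
    used_racks = set()
    for step in range(len(ordered)):
        node = ordered[(start + step) % len(ordered)]
        if node.rack not in used_racks:
            chosen.append(node)
            used_racks.add(node.rack)
        if len(chosen) == rf:
            break
    return chosen


# Three hypothetical machines with two instances each; neighbouring tokens
# deliberately belong to the same machine.
ring = [
    Node("a1", 0, "hostA"), Node("a2", 10, "hostA"),
    Node("b1", 20, "hostB"), Node("b2", 30, "hostB"),
    Node("c1", 40, "hostC"), Node("c2", 50, "hostC"),
]
picked = replicas(ring, key_token=5, rf=3)
print([n.name for n in picked])  # ['a2', 'b1', 'c1']
```

Without the rack check the walk would pick a2, b1 and b2, putting two
replicas on hostB; with it, losing any one machine costs at most one replica.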


-- 
Jeff Jirsa




RE: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Durity, Sean R
What is the data problem that you are trying to solve with Cassandra? Is it 
high availability? Low latency queries? Large data volumes? High concurrent 
users? I would design the solution to fit the problem(s) you are solving.

For example, if high availability is the goal, I would be very cautious about 2 
nodes/machine. If you need the full amount of the disk – you *can* have larger 
nodes than 1 TB. I agree that administration tasks (like adding/removing nodes, 
etc.) are more painful with large nodes – but not impossible. For large amounts 
of data, I like nodes that have about 2.5 – 3 TB of usable SSD disk.

It is possible that your nodes might be under-utilized, especially at first. 
But if the hardware is already available, you have to use what you have.

We have done multiple nodes on single physical hardware, but they were two
separate clusters (for the same application). In that case, we had a different
install location and different ports for one of the clusters.
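
A small sketch of why either different ports or different IPs are needed
when two instances share a machine: two processes cannot bind the same
(address, port) pair. The config shape and the helper names are hypothetical;
7000 and 9042 are the usual defaults for `storage_port` and
`native_transport_port`, and a single `address` key stands in for both the
listen and client addresses.

```python
from collections import Counter
from typing import Dict, List, Tuple

Instance = Dict[str, object]


def instance(address: str, storage_port: int = 7000,
             native_transport_port: int = 9042) -> Instance:
    """Build a minimal, hypothetical description of one Cassandra instance."""
    return {
        "address": address,
        "storage_port": storage_port,
        "native_transport_port": native_transport_port,
    }


def bind_collisions(instances: Dict[str, Instance]) -> List[Tuple[str, int]]:
    """Return (address, port) pairs claimed by more than one instance."""
    claims: Counter = Counter()
    for cfg in instances.values():
        for port_key in ("storage_port", "native_transport_port"):
            claims[(cfg["address"], cfg[port_key])] += 1
    return sorted(pair for pair, count in claims.items() if count > 1)


clash = {"c1": instance("10.0.0.11"), "c2": instance("10.0.0.11")}
by_port = {"c1": instance("10.0.0.11"), "c2": instance("10.0.0.11", 7100, 9142)}
by_ip = {"c1": instance("10.0.0.11"), "c2": instance("10.0.0.12")}

print(bind_collisions(clash))    # both default ports collide
print(bind_collisions(by_port))  # separate clusters, separate ports
print(bind_collisions(by_ip))    # same ports, separate IPs
```

Changing ports works for separate clusters, as in the setup described above;
instances of the same cluster would instead keep the same ports and use
separate IPs.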

Sean Durity

From: William R 
Sent: Thursday, April 18, 2019 9:14 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] multiple Cassandra instances per server, possible?

Hi all,

In our small company we have 10 nodes, each with 6 TB of disk (2 x 3 TB HD),
128 GB RAM and 64 cores, and we are thinking of using them as Cassandra nodes.
From what I am reading, the community recommends that a node should not hold
more than 1 TB of data, so I am wondering whether it is possible to install 2
instances per node using Docker, so that each Docker instance can write to its
own physical disk and use the rest of the hardware (CPU & RAM) more efficiently.

I understand that this setup risks creating a single point of failure for 2
Cassandra nodes, but apart from that, do you think it is a feasible setup to
start the cluster with?

Apart from the Docker solution, do you recommend any other way to split the
physical node into 2 instances? (VMware? Or maybe even 2 separate installations
of Cassandra?)

Eventually we are aiming for a cluster consisting of 2 DCs with 10 nodes each
(5 bare-metal nodes with 2 Cassandra instances).

Probably later, when we start introducing more nodes to the cluster, we can
decommission the "double-instanced" ones and aim for a more homogeneous
solution.

Thank you,

Wil



The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.