Re: Deployment

2019-01-12 Thread amit sehas
Thanks for your response. My understanding is that the application will
typically send its queries to a coordinator node; if that coordinator does
not respond within a certain time frame, the application will probably resend
the query to a different coordinator node, on the assumption that the first
coordinator is no longer alive.
thanks
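
To make the failover idea above concrete, here is a minimal sketch in Python. It is only an illustration of the retry pattern, not how any real Cassandra driver is implemented: `execute_with_failover`, `CoordinatorTimeout` and the `send` callback are hypothetical names invented for this example.

```python
class CoordinatorTimeout(Exception):
    """Raised by the (hypothetical) send function when a coordinator is too slow."""


def execute_with_failover(query, coordinators, send, timeout_s=2.0):
    """Try each coordinator in order until one answers within timeout_s.

    `send(host, query, timeout_s)` is an assumed callback that returns a
    result or raises CoordinatorTimeout.
    """
    last_error = None
    for host in coordinators:
        try:
            return send(host, query, timeout_s)
        except CoordinatorTimeout as exc:
            # A timeout only means "no answer in time"; the node may still be
            # alive and may even have applied the request.
            last_error = exc
    raise RuntimeError("no coordinator responded") from last_error
```

One caveat with this pattern: because a timed-out request may still have been applied, blindly resending is only safe for idempotent statements.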

removing already joining node

2019-01-12 Thread Osman YOZGATLIOĞLU
Hello,

I have one node that is still joining. I have decided to change the cluster
topology, and I need to move this node to another cluster.

How can I decommission a joining node? I can't find this exact case on Google.


Regards,

Osman


Re: Deployment

2019-01-12 Thread Andy Tolbert
Hi Amit,

> a) If queries are submitted to co-ordinator nodes (i assume this includes
> writes as well as reads) then:
>   -- is this the approach also followed for the initial data load?
>

Writes get sent to all replica nodes, and then the coordinator responds to
the client as soon as enough replicas have responded to achieve the
configured consistency level.
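
As a rough illustration of "enough replicas have responded", the sketch below counts acknowledgements against a consistency level. It is a toy model under stated assumptions, not Cassandra's code: `required_acks` and `write_succeeds` are hypothetical helpers, and only a few single-datacenter levels are modelled.

```python
def required_acks(consistency_level, replication_factor):
    """Number of replica acknowledgements needed for the given level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,  # strict majority of replicas
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def write_succeeds(ack_stream, consistency_level, replication_factor):
    """ack_stream yields (replica, acked) pairs in arrival order.

    Returns True as soon as enough replicas have acknowledged. The write was
    still sent to every replica; the coordinator just stops waiting.
    """
    needed = required_acks(consistency_level, replication_factor)
    acks = 0
    for _replica, acked in ack_stream:
        if acked:
            acks += 1
        if acks >= needed:
            return True
    return False
```

With a replication factor of 3, QUORUM needs 2 acknowledgements, so one slow or failed replica does not block the response to the client.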


>   -- some select queries may not have restrictions on all the partition
> key columns, and Cassandra would reject such a query, but if we utilize
> ALLOW FILTERING then the query will execute, since there is no way to
> determine which node to send the query to, it will be sent to all the nodes
> that could potentially have results. In such a case it would seem that the
> co-ordinator would gather the results from all the nodes and return it to
> the application.
>

Correct, if the data is on multiple ranges, the coordinator will query as
many replicas as needed to cover those ranges and will then gather the
results.  Using tracing (blog post) is a good way to get insight into which
replicas are involved in your queries.
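
The scatter-and-gather behaviour described above can be pictured with a small Python sketch. This is a simplified model, not the coordinator's real logic: `scatter_gather`, the `range_replicas` mapping and the `fetch` callback are all hypothetical, and a real coordinator also deals with consistency levels, paging and failures.

```python
def scatter_gather(range_replicas, fetch):
    """Query one replica per token range and merge the rows.

    range_replicas: dict mapping a token range to the replicas that own it.
    fetch(replica, token_range): assumed callback returning a list of rows.
    Returns (rows, contacted_replicas).
    """
    rows, contacted = [], []
    for token_range, replicas in range_replicas.items():
        replica = replicas[0]  # toy choice: always take the first owner
        contacted.append(replica)
        rows.extend(fetch(replica, token_range))
    return rows, contacted
```

The `contacted` list plays the role that tracing output plays in practice: it shows which replicas a query actually touched.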

> Application does not know which nodes may have the data, so it can not
> directly send the data to the right nodes. Even if application had the
> data, it may not be able to perform load balancing.
>

Most client drivers have a nice optimization called token-aware load
balancing (e.g. the DataStax Java Driver's TokenAwarePolicy), where if the
driver is able to infer which partition is being accessed, it will
prioritize coordinators that have that data.  This determination will
typically work if all parts of your partition key are bind parameters in
your statement (requirements).
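
For intuition, token-aware routing can be sketched with a toy token ring in Python. Everything here is an assumption made for illustration: `ToyRing`, `token_for` and `pick_coordinator` are hypothetical names, MD5 stands in for the partitioner's hash only because it ships with the standard library, and replicas are simply the next nodes clockwise on the ring.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Toy token: first 8 bytes of MD5 (a stand-in for the real partitioner)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")


class ToyRing:
    def __init__(self, node_tokens, replication_factor):
        # node_tokens: dict of node name -> the token that node owns.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]
        self._rf = replication_factor

    def replicas(self, token):
        """First node whose token is >= the key's token, then its successors."""
        start = bisect.bisect_left(self._tokens, token) % len(self._ring)
        count = min(self._rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


def pick_coordinator(ring, partition_key, all_nodes):
    """Prefer a replica of the partition; fall back to any node otherwise."""
    if partition_key is None:  # driver could not infer the partition
        return all_nodes[0]
    return ring.replicas(token_for(partition_key))[0]
```

The fallback branch mirrors the point about bind parameters: when the driver cannot work out the partition key, it has no basis for choosing a replica and just picks a coordinator by its ordinary policy.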

> Does the coordinator perform load balancing? I imagine it would have to ...
>

The coordinator utilizes a dynamic snitch to determine where to route read
queries.
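
The general idea of routing reads by observed replica performance can be sketched as below. This is not the dynamic snitch's actual scoring algorithm, just a toy ranking by average recent latency; `ToyLatencyRanker` and its methods are hypothetical names.

```python
from collections import defaultdict, deque


class ToyLatencyRanker:
    """Ranks replicas by the mean of their most recent latency samples."""

    def __init__(self, window=100):
        # Keep only the last `window` samples per replica.
        self._samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, replica, latency_ms):
        self._samples[replica].append(latency_ms)

    def score(self, replica):
        samples = self._samples[replica]
        # Replicas with no samples score 0.0, so they are tried optimistically.
        return sum(samples) / len(samples) if samples else 0.0

    def rank(self, replicas):
        """Return replicas ordered from fastest to slowest recent average."""
        return sorted(replicas, key=self.score)
```

A coordinator using such a ranking would send a read to the first replica in the list and only fall back to slower ones when it needs more responses.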

Thanks,
Andy


Re: Deployment

2019-01-12 Thread amit sehas
Thanks for your response, this leads to some further questions:

a) If queries are submitted to coordinator nodes (I assume this includes
writes as well as reads), then:
  -- is this the approach also followed for the initial data load?
  -- some select queries may not have restrictions on all the partition key
columns, and Cassandra would reject such a query, but if we use ALLOW
FILTERING then the query will execute. Since there is no way to determine
which node to send the query to, it will be sent to all the nodes that could
potentially have results. In such a case it would seem that the coordinator
would gather the results from all the nodes and return them to the
application.

b) This seems to be a 3-tier architecture: the application sends the query to
a coordinator, and the coordinator sends it to the right nodes. The
application does not know which nodes may have the data, so it cannot send
requests directly to the right nodes. Even if the application had that
information, it may not be able to perform load balancing. Does the
coordinator perform load balancing? I imagine it would have to ...

thanks
  

Re: Deployment

2019-01-12 Thread Rajesh Kishore
The application sends each request to one of the nodes (called the
coordinator node). The coordinator knows where your data lives (provided you
have modelled your DB correctly, this should not result in
scatter-and-gather queries) and delegates the query to the respective node,
so it does follow a client-server architecture and your assumption is
correct.
As far as I know, the application should generally be unaware of where the
data lives (that is, on which node) and must not be tied to a specific node,
because that would have bigger implications when things like re-balancing
occur. Keeping the application in the same region as the Cassandra cluster
does make sense, though. I can't comment much on cloud deployment.

Thanks,
Rajesh

On Sat, Jan 12, 2019 at 8:54 AM amit sehas  wrote:

> I am new to Cassandra, and I am wondering how Cassandra applications are
> deployed in the cloud. Does Cassandra have a client server architecture and
> the application is deployed as a 3rd tier that sends over queries to the
> clients, which then submit them to the Cassandra servers?  Or does the
> application submit the request directly to any of the Cassandra servers,
> which then decides where the query will be routed to, and then gathers the
> response and returns that to the application?
>
> Does the application accessing the data get deployed on the same nodes in
> the cloud as the Cassandra cluster itself? Or on separate nodes?  Are there
> any best practices available in this regard?
>
> thanks
>