Hi Amit,

> a) If queries are submitted to co-ordinator nodes (I assume this includes
> writes as well as reads) then:
>   -- is this the approach also followed for the initial data load?
>

Writes get sent to all replica nodes, and then the coordinator responds to
the client as soon as enough replicas have responded to achieve the
configured consistency level.
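
As a rough illustration of that write path, here is a minimal Python sketch. It is not Cassandra code: the function names are hypothetical, and it models only ONE, QUORUM and ALL in a single data center. It computes when a coordinator can answer the client, given how long each replica takes to acknowledge the write:

```python
from typing import Optional, Sequence


def required_acks(consistency: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for a few single-DC consistency levels."""
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        return replication_factor // 2 + 1
    if consistency == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency}")


def coordinator_response_time(ack_latencies_ms: Sequence[Optional[float]],
                              consistency: str) -> Optional[float]:
    """Return when the coordinator can answer the client, or None if the write fails.

    ack_latencies_ms has one entry per replica, because the write is sent to all
    of them. None means that replica never acknowledged (down or timed out).
    """
    needed = required_acks(consistency, len(ack_latencies_ms))
    acks = sorted(lat for lat in ack_latencies_ms if lat is not None)
    if len(acks) < needed:
        return None  # not enough replicas answered: the client gets an error
    # The coordinator replies when the needed-th fastest acknowledgement arrives.
    # Slower replicas still apply the write; the client just does not wait for them.
    return acks[needed - 1]
```

For example, with three replicas where one is down, a QUORUM write still succeeds once the second acknowledgement arrives, while an ALL write fails.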


>   -- some select queries may not have restrictions on all the partition
> key columns, and Cassandra would reject such a query, but if we use
> ALLOW FILTERING then the query will execute; since there is no way to
> determine which node to send the query to, it will be sent to all the nodes
> that could potentially have results. In such a case it would seem that the
> co-ordinator would gather the results from all the nodes and return it to
> the application.
>

Correct. If the data spans multiple token ranges, the coordinator will query
as many replicas as needed to cover those ranges and will then gather the
results. Tracing (blog post
<https://www.datastax.com/dev/blog/tracing-in-cassandra-1-2>) is a good way
to see which replicas are involved in your queries.
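
Here is a toy model of that scatter-gather. It assumes consistency ONE, so one replica per token range is enough. The node, range and column names are made up, and it ignores paging and the concurrency with which a real coordinator walks the ranges:

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def scatter_gather(ranges: Dict[str, List[str]],
                   data_by_node: Dict[str, Dict[str, List[Row]]],
                   predicate: Callable[[Row], bool]) -> List[Row]:
    """Hypothetical model of a coordinator serving a filtering query that has
    no partition-key restriction.

    ranges: token-range id -> replica nodes owning that range
    data_by_node: node -> (token-range id -> rows that node stores for the range)
    """
    results: List[Row] = []
    for range_id, replicas in ranges.items():
        # One replica per range suffices at consistency ONE; a real coordinator
        # would also weigh replica health and latency when choosing it.
        node = replicas[0]
        rows = data_by_node[node].get(range_id, [])
        # In this sketch each replica filters locally, so only matching rows
        # are sent back for the coordinator to gather.
        results.extend(row for row in rows if predicate(row))
    return results
```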

> Application does not know which nodes may have the data, so it can not
> directly send the data to the right nodes. Even if application had the
> data, it may not be able to perform load balancing.
>

Most client drivers have a nice optimization called token-aware load
balancing (e.g. the DataStax Java Driver's TokenAwarePolicy
<https://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/TokenAwarePolicy.html>):
if the driver is able to infer which partition a statement accesses, it
will prefer coordinators that hold a replica of that partition. This
inference typically works if all parts of your partition key are bind
parameters in your statement (requirements
<https://docs.datastax.com/en/developer/java-driver/3.6/manual/load_balancing/#requirements>).
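
To show roughly what a token-aware policy computes, here is a simplified Python sketch of a token ring with SimpleStrategy-style replica placement. The hash function, the TokenRing class and the round-robin choice among replicas are all assumptions for illustration. The real driver uses the cluster's partitioner (Murmur3Partitioner by default) and the keyspace's actual replication strategy:

```python
import bisect
import hashlib
from typing import List, Tuple


def token_for(partition_key: str) -> int:
    """Stand-in hash for illustration only; Cassandra's Murmur3Partitioner
    produces different (signed 64-bit) tokens."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Hypothetical ring model: each node owns the range ending at its token."""

    def __init__(self, node_tokens: List[Tuple[int, str]], replication_factor: int):
        self.ring = sorted(node_tokens)                  # (token, node) in ring order
        self.tokens = [token for token, _ in self.ring]
        node_count = len({node for _, node in self.ring})
        self.rf = min(replication_factor, node_count)

    def replicas_for_token(self, token: int) -> List[str]:
        # The first node whose token is >= the key's token owns it (wrapping
        # around the ring); the next distinct nodes clockwise hold the other replicas.
        i = bisect.bisect_left(self.tokens, token) % len(self.ring)
        replicas: List[str] = []
        while len(replicas) < self.rf:
            node = self.ring[i % len(self.ring)][1]
            if node not in replicas:
                replicas.append(node)
            i += 1
        return replicas

    def pick_coordinator(self, partition_key: str, request_number: int) -> str:
        # Token-aware choice: only nodes holding a replica are candidates.
        # Rotating among them spreads the load.
        replicas = self.replicas_for_token(token_for(partition_key))
        return replicas[request_number % len(replicas)]
```

With four nodes at tokens 100, 200, 300 and 400 and a replication factor of 2, a key hashing to token 150 lands on B and C. Sending the request straight to one of them saves the extra hop a non-replica coordinator would need.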

> Does the coordinator perform load balancing? I imagine it would have to ...
>

The coordinator uses a dynamic snitch
<http://cassandra.apache.org/doc/latest/operating/snitch.html>, which tracks
how quickly each replica has been responding, to decide which replicas to
send read queries to.
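
The real dynamic snitch scores replicas mainly by recent read latency and is tuned through cassandra.yaml options such as dynamic_snitch_badness_threshold. What follows is only a simplified Python sketch of the idea. The class is hypothetical, it uses an exponentially weighted moving average instead of Cassandra's actual scoring, and its threshold of 0.1 is arbitrary:

```python
from typing import Dict, List


class LatencyAwareReplicaPicker:
    """Hypothetical, simplified stand-in for the idea behind the dynamic snitch:
    prefer replicas that have recently been answering quickly."""

    def __init__(self, alpha: float = 0.3, badness_threshold: float = 0.1):
        self.alpha = alpha                          # weight of the newest sample
        self.badness_threshold = badness_threshold  # tolerated slowdown before re-ordering
        self.scores: Dict[str, float] = {}          # replica -> smoothed latency (ms)

    def record_latency(self, replica: str, latency_ms: float) -> None:
        old = self.scores.get(replica)
        # Exponentially weighted moving average of observed read latencies.
        self.scores[replica] = latency_ms if old is None else (
            self.alpha * latency_ms + (1 - self.alpha) * old)

    def order_replicas(self, static_order: List[str]) -> List[str]:
        """Keep the static (topology-based) order unless the preferred replica is
        more than badness_threshold slower than the fastest known replica."""
        known = [r for r in static_order if r in self.scores]
        if not known:
            return list(static_order)
        best = min(self.scores[r] for r in known)
        first_score = self.scores.get(static_order[0], float("inf"))
        if first_score <= best * (1 + self.badness_threshold):
            return list(static_order)
        # Re-order by score; replicas with no measurements sort last.
        return sorted(static_order, key=lambda r: self.scores.get(r, float("inf")))
```

The sketch keeps the static order until the preferred replica is clearly slower. This mirrors the intent of the badness threshold: it avoids bouncing reads between replicas over small latency differences, which would hurt cache locality.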

Thanks,
Andy

On Sat, Jan 12, 2019 at 9:14 AM amit sehas <cu...@yahoo.com.invalid> wrote:

> Thanks for your response, this leads to some further questions:
>
> a) If queries are submitted to co-ordinator nodes (I assume this includes
> writes as well as reads) then:
>   -- is this the approach also followed for the initial data load?
>   -- some select queries may not have restrictions on all the partition
> key columns, and Cassandra would reject such a query, but if we use
> ALLOW FILTERING then the query will execute; since there is no way to
> determine which node to send the query to, it will be sent to all the nodes
> that could potentially have results. In such a case it would seem that the
> co-ordinator would gather the results from all the nodes and return it to
> the application.
>
> b) This seems like a 3-tier architecture: the application sends the query
> to the coordinator, and the coordinator sends it to the right nodes.
> Application does not know which nodes may have the data, so it can not
> directly send the data to the right nodes. Even if application had the
> data, it may not be able to perform load balancing. Does the coordinator
> perform load balancing? I imagine it would have to ...
>
> thanks
>
> On Saturday, January 12, 2019, 3:32:53 AM PST, Rajesh Kishore <
> rajesh10si...@gmail.com> wrote:
>
>
> The application sends a request to one of the nodes (called the coordinating
> node), and this coordinating node knows where your result lies (assuming
> you have modelled your DB correctly, it should not result in a
> scatter-gather kind of query), so it delegates the query to the respective
> node. So it does follow a client-server architecture, and your assumption is
> correct.
> As far as I know, the application should generally be unaware of where your
> result lies and must not be tied to a specific node, because that would have
> bigger implications when things like rebalancing occur. So your
> application should be unaware of where your data lies (in which node, I mean),
> but obviously keeping the application in the same region as the Cassandra
> cluster would make sense; I can't comment much on cloud deployment.
>
> Thanks,
> Rajesh
>
> On Sat, Jan 12, 2019 at 8:54 AM amit sehas <cu...@yahoo.com.invalid>
> wrote:
>
> I am new to Cassandra, and I am wondering how Cassandra applications are
> deployed in the cloud. Does Cassandra have a client-server architecture, with
> the application deployed as a 3rd tier that sends queries to the
> clients, which then submit them to the Cassandra servers? Or does the
> application submit the request directly to any of the Cassandra servers,
> which then decides where the query will be routed, gathers the
> response and returns it to the application?
>
> Does the application accessing the data get deployed on the same nodes in
> the cloud as the Cassandra cluster itself? Or on separate nodes?  Are there
> any best practices available in this regard?
>
> thanks
>
>
