Thanks, Igor.

That enhancement will be very useful. Both a faster (parallel) load and greater 
efficiency (not transferring all data <n> times) are highly desirable.

Roger

From: Igor Rudyak [mailto:[email protected]]
Sent: Thursday, August 03, 2017 10:58 PM
To: [email protected]
Subject: Re: Cassandra Cache Store: How are loadCache() queries distributed

Hi Roger,

As of now, the Cassandra Cache Store loadCache() implementation is pretty 
straightforward: every Ignite node sends all of the provided CQL queries. 
There is no query analysis to distribute the data loading routine among the 
cluster nodes.
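
To make the consequence concrete, here is a small Python model of that 
behavior. It is not Ignite code: the function name, the sample queries, and the 
row counts are made up for illustration. It only counts how often each query 
would run, and how many rows would be transferred, if every node executes 
every query.

```python
from collections import Counter


def simulate_load_cache(num_ignite_nodes, queries, rows_per_query):
    """Toy model: every Ignite node runs every query passed to loadCache()."""
    executions = Counter()   # how many times each query is executed
    rows_transferred = 0     # total rows pulled from Cassandra by all nodes
    for _node in range(num_ignite_nodes):
        for query in queries:
            executions[query] += 1
            rows_transferred += rows_per_query[query]
    return executions, rows_transferred


# Hypothetical queries and result sizes, for illustration only.
queries = [
    "SELECT * FROM t WHERE pk = 1",
    "SELECT * FROM t WHERE pk = 2",
]
rows_per_query = {queries[0]: 100, queries[1]: 50}

executions, rows_transferred = simulate_load_cache(3, queries, rows_per_query)
```

With 3 nodes in this model, each query runs 3 times and the full result set is 
transferred 3 times, which is the overhead the enhancement ticket is meant to 
address.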

There is an enhancement ticket created for this: 
https://issues.apache.org/jira/browse/IGNITE-3962

Igor



On Thu, Aug 3, 2017 at 2:29 PM, Roger Fischer (CW) 
<[email protected]> wrote:
Hello,

could someone please explain to me how loadCache() queries are distributed to 
the Cassandra instances when using the Cassandra Cache Store module?

I used Ignite logging and Cassandra server tracing (system_traces.sessions) to 
try to determine how queries are distributed, but I can’t make sense of what I 
have observed.

I am quite sure of this: an Ignite server stores only the objects for which it 
is the primary or a backup. It ignores other objects received from Cassandra.
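
As a rough illustration of that filtering, the sketch below uses a toy 
modulo-based ownership rule as a stand-in for Ignite's real affinity function, 
which works differently. The helper names and the integer keys are assumptions 
made for the example.

```python
def owners(key, num_nodes, backups=1):
    """Toy ownership rule: primary is key % num_nodes, backups are the next nodes."""
    primary = key % num_nodes
    return {(primary + i) % num_nodes for i in range(backups + 1)}


def keep_local(node, keys, num_nodes, backups=1):
    """Return only the keys this node would store (as primary or backup)."""
    return [k for k in keys if node in owners(k, num_nodes, backups)]


NUM_NODES = 3
keys_from_cassandra = list(range(6))  # every node receives all of these keys

kept = {
    node: keep_local(node, keys_from_cassandra, NUM_NODES)
    for node in range(NUM_NODES)
}
```

In this model each node receives all six keys but keeps only four of them, and 
every key ends up on exactly two nodes (one primary, one backup).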

I first tried a load-all scenario, with one query (select * from table) passed 
in the loadCache() call.

Initially, it looked like each Ignite server sends the query to one Cassandra 
node. That seems reasonable.

However, I have also observed cases when each Ignite server sends the query to 
more than one Cassandra node. Why?

Then I tried to call loadCache() with multiple queries. Specifically, I created 
a query for each Cassandra partition. Best practice for Cassandra is to limit 
queries to a single partition.
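
A minimal sketch of how such a per-partition query list might be built is 
below. The table name, the partition column, and the assumption that partition 
key values are integers are all hypothetical; the resulting strings are what 
would be passed as the query arguments of the loadCache() call.

```python
def partition_queries(table, partition_column, partition_values):
    """Build one CQL SELECT per partition key value.

    Values are forced through int() so that only integer literals can end up
    in the query text (this sketch assumes an integer partition key).
    """
    return [
        f"SELECT * FROM {table} WHERE {partition_column} = {int(value)}"
        for value in partition_values
    ]


# Hypothetical table and column names; six partitions, as in the test above.
cql_queries = partition_queries("my_keyspace.my_table", "partition_id", range(6))
```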

One test seemed to imply that each Ignite server sends all queries, 
distributing them across the available Cassandra nodes. This seems reasonable.

However, in another test one query (out of 6) got sent (really executed in 
Cassandra) only once, most got sent twice, and a few three times. With 3 Ignite 
servers, I would have expected each query to be sent 3 times (once from each 
Ignite server).

I am quite suspicious of that last observation, as it would invalidate what I 
stated earlier as “quite sure of”. Maybe Cassandra did not record all queries 
in the sessions table.

So how does Ignite handle a loadCache() request when there are <n> Ignite 
servers and <m> Cassandra servers? The loadCache() call is made from an Ignite 
client.
a) when there is a single query provided to loadCache().
b) when there are multiple (<r>) queries provided to loadCache().
c) does it make any difference if the query includes all Cassandra partitioning 
key columns in the where clause (i.e., would the Cassandra Cache Store analyze 
the query to optimize the distribution)?
d) does it make any difference if the query includes the Ignite affinity key 
(i.e., would the Cassandra Cache Store analyze the query to optimize which 
queries to send from where)?

Thanks…

Roger


