Hi,

Slight degradation is expected in some cases. Let me explain how it works:

1) The client sends a request to each node (if query parallelism > 1, the number of requests is multiplied by that value).
2) Each node runs the query against its local dataset.
3) Each node responds with up to 100 entries.
4) The client collects all responses and performs the reduce step.
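The flow above can be sketched as a toy scatter-gather in plain Python (no Ignite APIs involved; the per-node page size of 100 and the node datasets are made-up stand-ins for illustration):

```python
import heapq

PAGE_SIZE = 100  # assumed per-node response limit, as in the steps above

def node_query(local_rows, limit=PAGE_SIZE):
    """Steps 2-3: a node runs the query against its local dataset
    and responds with at most `limit` rows (here: smallest values first)."""
    return sorted(local_rows)[:limit]

def client_query(nodes, limit=PAGE_SIZE):
    """Steps 1 and 4: the client sends one request per node, then
    reduces (merges) all partial responses into the final result."""
    responses = [node_query(rows, limit) for rows in nodes]  # one request per node
    # Reduce step: merge the partial results and re-apply the limit.
    return heapq.nsmallest(limit, (row for resp in responses for row in resp))

# Toy dataset split across 3 "nodes" (values 0..299, round-robin).
nodes = [list(range(i, 300, 3)) for i in range(3)]
result = client_query(nodes)
# Note the client pulled 100 entries from each of the 3 nodes (300 total)
# just to produce the final 100 - this reduce-side work is exactly the
# overhead that grows as you add nodes.
```

This also makes the overhead argument below concrete: the amount of data the client must receive and reduce is proportional to the number of nodes, not to the result size.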
So what happens when you add a node? First of all, the dataset is split across a larger number of nodes. But if the dataset is too small, or if the newly added node does not significantly reduce the amount of data on each of the other nodes, you will not see any difference in query processing. E.g. you have 9 nodes and add one more: each node loses no more than 10% of its data, and with a small dataset that will not give you any performance boost. On the other hand, the client now has to send more requests and reduce more data. For instance, with 9 nodes it receives 900 entries; with 10 nodes, 1,000. Again, if the dataset is relatively small, you only gain client-side overhead from the additional requests/responses and data. Queries by primary key scale best, because in that case the client can send the request directly to the affinity node without broadcasting to all nodes.

So when can you get a scaling benefit for SQL?

1) You have a very large dataset. Each node processes less data, and all nodes work in parallel, so the per-node speedup outweighs the additional overhead on the client.
2) You add more clients that run queries in parallel. Total throughput increases because the request/response overhead is divided between a larger number of clients. (Or you can configure more connections per node to better utilize the client machine's resources.)
3) You query by primary key.

Please note one more thing: overall latency depends on how fast the slowest node is, because the client waits for all responses.

Thanks!
-Dmitry

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/