Ronak Thakrar created FLINK-13479:
-------------------------------------

             Summary: Cassandra POJO Sink - Prepared Statement query does not 
have deterministic ordering of columns - causing prepared statement cache 
overflow
                 Key: FLINK-13479
                 URL: https://issues.apache.org/jira/browse/FLINK-13479
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Cassandra
            Reporter: Ronak Thakrar


When using the Cassandra POJO sink in Flink jobs, the INSERT query string for a
prepared statement is generated automatically (via the Mapper.saveQuery method),
but the ordering of the Cassandra entity's columns in that query is not
deterministic. Each time the column positions change, a new prepared statement
is generated and used. As a result, the prepared statement cache overflows,
because the columns are in a different order every time the insert query string
is generated.

The following is the detailed explanation of what happens inside the Datastax
Java driver ([https://datastax-oss.atlassian.net/browse/JAVA-1587]):

The current Mapper uses random ordering of columns when it creates prepared
queries. This is fine when only one Java client is accessing a cluster (and
assuming the application developer does the correct thing by re-using a
Mapper), since each Mapper will reuse its prepared statement. However, when
many Java clients access a cluster, each will create its own permutation of
column ordering, and together they can thrash the prepared statement cache on
the cluster.
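To make the cache-thrashing mechanism concrete, here is a minimal sketch (not
driver code; the keyspace/table names and the buildInsert helper are hypothetical)
showing how two clients that iterate the same entity's columns in different
orders produce textually different INSERT statements, each of which occupies
its own slot in the server-side prepared statement cache:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class QueryOrdering {
    // Build an INSERT query string from a column ordering, roughly as a mapper would.
    static String buildInsert(List<String> columns) {
        return "INSERT INTO ks.pojo (" + String.join(", ", columns) + ") VALUES ("
                + "?, ".repeat(columns.size() - 1) + "?)";
    }

    public static void main(String[] args) {
        // Two clients mapping the same entity, but iterating its columns in different orders.
        List<String> clientA = Arrays.asList("id", "name", "email");
        List<String> clientB = Arrays.asList("email", "id", "name");

        // The server caches prepared statements keyed by the query string.
        Set<String> serverCache = new HashSet<>();
        serverCache.add(buildInsert(clientA));
        serverCache.add(buildInsert(clientB));

        // Semantically identical inserts, yet two distinct cache entries.
        System.out.println(serverCache.size()); // 2
    }
}
```

With many clients (or many restarts) each contributing its own permutation,
these redundant entries accumulate and evict genuinely distinct statements.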

I propose that the Mapper use a TreeMap instead of a HashMap when it builds
its set of AliasedMappedProperty, sorted by column name
(col.mappedProperty.getMappedName()). This would create a deterministic
ordering of columns, and all Java processes accessing the same cluster would
end up with the same prepared queries for the same entities.
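The proposed fix can be sketched as follows (a simplified illustration, not the
actual driver patch; the MappedProperty class here is a hypothetical stand-in
for the driver's per-column metadata):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class DeterministicColumns {
    // Hypothetical stand-in for the mapper's per-column metadata.
    static class MappedProperty {
        final String mappedName;
        MappedProperty(String mappedName) { this.mappedName = mappedName; }
    }

    // Keying a TreeMap by column name gives every client the same iteration
    // order, hence the same generated query string for the same entity.
    static Map<String, MappedProperty> sortedColumns(Map<String, MappedProperty> unsorted) {
        return new TreeMap<>(unsorted);
    }

    public static void main(String[] args) {
        Map<String, MappedProperty> cols = new HashMap<>();
        cols.put("name", new MappedProperty("name"));
        cols.put("id", new MappedProperty("id"));
        cols.put("email", new MappedProperty("email"));

        // TreeMap iterates keys in natural (alphabetical) order: email, id, name.
        System.out.println(String.join(", ", sortedColumns(cols).keySet()));
    }
}
```

Since the sort key is the column name itself, every JVM derives the identical
ordering independently, with no coordination between clients.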

This issue is already fixed in Datastax Java driver version 3.3.1, which the
Flink Cassandra connector does not yet use (it currently uses 3.0.0).

I upgraded the driver to 3.3.1 locally in the Flink Cassandra connector and
tested it: the driver stopped creating new prepared statements with different
column orderings for the same entity. I have a fix for this issue, would like
to contribute the change, and will open a PR for it.

Flink Cassandra Connector Version: flink-connector-cassandra_2.11

Flink Version: 1.7.1



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
