[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-03-21 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1477517177

   @zentol I addressed all your latest comments. Can we merge the PR ?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-03-20 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1475934338

   @zentol I addressed all the points PTAL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-03-17 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1474052060

   @zentol I pushed for security reasons, I have only the addSplitsBack case to 
deal with in lazy splits generation scenario. You can start to take a look at 
the other things if you have time or just wait for this part to be finished.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-03-14 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1468398944

   > (Note: This review is specifically about the JMX parts; nothing else)
   
   OK. Tell me if you have comments on the size-based split.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-03-14 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1468389988

   > > Instead of this whole JMX business we could use nodetool directly.
   > > (Note: This review is specifically about the JMX parts; nothing else)
   > 
   > Indeed, this binary is included in the offical cassandra image ! Way 
simpler. Thanks for the suggestion, I removed all the JMX related code.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-03-14 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1468388485

   > Instead of this whole JMX business we could use nodetool directly.
   > 
   > (Note: This review is specifically about the JMX parts; nothing else)
   
   Indeed, this binary is included in the offical cassandra image ! Way 
simpler. Thanks for the suggestion, I removed all the JMX related code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-03-13 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1465798685

   @zentol PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-03-04 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1454671877

   @mosche contributed testContainers configuration that allows to avoid having 
a private docker image + timeouts etc...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-02-28 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1448409889

   @zentol I addressed all your comments and changed the splits architecture:
   I Introduced a table size estimation (based on Cassandra statistical size 
estimates). I added an optional user conf to specify max split memory size. If 
set, the source generates splits of `maxSplitMemorySize` with protection 
measures (in comparison to task parallelism) for number of splits. 
   I now read a split as a whole (no state needed).
   I added the related splits and size tests. They require to use JMX to force 
mem tables flush on the cassandra cluster so that the system size estimates can 
be updated (as we just wrote test data). The official Cassandra image 
deactivates jmx, to enable it we need to provide authentication and modify 
cassandra-env.sh so I had to create my own image (!)
   Also the flush is very long (30s) so for all split tests I wrote and flush 
only once (contrary to other tests that write test data for each test)
   
   PTAL. I hope it will be last round of review as I changed a lot and spent so 
much time on that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-02-14 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1429539810

   @zentol the PR is not ready for review yet as I changed a lot (split size 
estimation and fetch state management) it is still under coding, the tests 
don't pass yet. But still thanks for the confirmation about the general 
architecture about splits and memory


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-01-27 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1406744295

   I have implified a lot SplitsGenerator: it is now based only on Cassandra 
min and max tokens per partitioner and generates (min,x][x, max) for 2 splits 
for example. It now supports all the Cassandra partitioners. Now a Cassandra 
split is just a ring range (a split of cassandra token range). This also 
simplifies a lot the range query generation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-01-26 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1405116976

   @zentol I'm almost done replacing the whole split mechanism. I could even 
remove the size estimates as Flink does not rely on expected split memory size 
as Beam. So I could use a very simple splitting mechanism based only on 
Cassandra min and max tokens per partitioner (min,x][x, max) for 2 splits for 
example. I also could consider that a Cassandra split processes only a single 
ring range and get rid of RingRange class. It'll change much. I also need to do 
a final pass before it is ready for another round. Otherwise you'll see sketchy 
things. don't waste your time reviewing now. I'll ping you to take a look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-01-26 Thread via GitHub


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1404820642

   > > I don't think it is an ASF rule but fair enough,
   > 
   > FYI; By and large it should be viewed as a legal requirement. By copying 
code from cassandra you have to adhere to their licensing, which among other 
state that you must have prominent notices for changes to a file.
   
   You're right cf ASF v2 4.d:
   `If the Work includes a "NOTICE" text file as part of its distribution, then 
any Derivative Works that You distribute must include a readable copy of the 
attribution notices contained within such NOTICE file, excluding those notices 
that do not pertain to any part of the Derivative Works, in at least one of the 
following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along 
with the Derivative Works; or, within a display generated by the Derivative 
Works, if and wherever such third-party notices normally appear. The contents 
of the NOTICE file are for informational purposes only and do not modify the 
License. You may add Your own attribution notices within Derivative Works that 
You distribute, alongside or as an addendum to the NOTICE text from the Work, 
provided that such additional attribution notices cannot be construed as 
modifying the License.`
   
   Anyway, as we decided to replace the split code with something that was 
never merged to Beam, there is no need anymore.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-01-19 Thread GitBox


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1397142509

   > > I did not author the RingRange and SplitsGenerator classes. I got them 
from the Apache Beam Cassandra connector.
   > 
   > If you're copying stuff from other projects it please add some 
notice/attribution to the files and update the Flink source notice accordingly.
   
   ah yes I forgot this Flink guideline. I don't think it is an ASF rule but 
fair enough, I'll add it to the javadoc and notice.
   
   > 
   > > Back in 2017 I coded a [splitter for Cassandra Beam 
connector](https://github.com/echauchot/beam/blob/bfa33b85b6b310556ffa5c44c99bef50575b2c56/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java#L346)
 that works [with 
tokens](https://github.com/echauchot/beam/blob/BEAM-245-CassandraIO/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/DataSizeEstimates.java)
 also but that is simpler and supports all the Cassandra partitionners. Would 
you prefer that we use this other approach ?
   > 
   > Not sure? Why didn't it make it into Beam? Do you know why the Beam code 
is written the way it is?
   
   Actually, another splitting approach was opted in by the reviewer in 2017. 
But short after there was another author who changed the splitting to something 
similar to my 2017 token based code. So when I thought about coding the split 
for Flink connector I decided to take the version of the code that was merged 
to Beam master. But it is true that it is over complicated, redundant and not 
supporting the non-default Cassandra partitioner. The approach I had in 2017 
was the same as the Cassandra Spark connector written by datastax (tokens + 
cassandra size estimates statistics). So I think I'll try to reuse this code, 
migrate it to Flink and update it to the latest Cassandra version and push it 
in this PR. 
   WDYT ?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-01-18 Thread GitBox


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1387058843

   > @zentol I did not author the `RingRange` and `SplitsGenerator` classes. I 
got them from the `Apache Beam` Cassandra connector. I agree, some notions need 
clarification. I'll add comments.
   
   I was thinking: this `SplitsGenerator/RingRanges` I got from the Apache Beam 
project seems weird in many aspects that you pointed out. Back in 2017 I coded 
a [partitionner for Cassandra Beam connector]( 
https://github.com/echauchot/beam/blob/BEAM-245-CassandraIO/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/DataSizeEstimates.java)
 that works with tokens also but that is simpler and supports all the Cassandra 
partitionners. Would you prefer that we use this other approach ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-01-18 Thread GitBox


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1386836888

   @zentol I did not author the `RingRange` and `SplitsGenerator` classes. I 
got them from the `Apache Beam` Cassandra connector. I agree, some notions need 
clarification. I'll add comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2023-01-02 Thread GitBox


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1368798535

   @zentol Ryan gave a LGTM can you take a look at this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2022-12-23 Thread GitBox


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1364182149

   @RyanSkraba I finished everything I wanted to change PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2022-12-23 Thread GitBox


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1363862897

   Also I figured out that the the test of the cassandra tokens split I got 
from Apache Beam tests only for `RandomPartitioner` (the tokens split method is 
compatible with both `RandomPartitioner` and `Murmur3Partitioner` as only 
tokens boundaries vary between the 2) whereas in the actual data split it is 
compatible only with `Murmur3Partitioner`. So it sould test the tokens split 
for this cassandra partitionner.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2022-12-22 Thread GitBox


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1363328649

   @RyanSkraba thanks for your review. I addressed all your comments PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2022-12-09 Thread GitBox


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1344284613

   > > Is there anything I can do to help the review ?
   > 
   > For better or worse, no :)
   > 
   > Holidays are coming up so I won't review it this year (scream) at the very 
least. If nothing happened til January (which I'd like to avoid!) I'll take a 
look myself to get things moving.
   
   Thanks @zentol 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2022-12-05 Thread GitBox


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1337060717

   Any updates on this PR ? This subject has been open for a long time: all the 
other Cassandra PRs (https://github.com/apache/flink/pull/19586, 
https://github.com/apache/flink/pull/19680, 
https://github.com/apache/flink-connector-cassandra/pull/2) were just a 
preparation for this PR and this PR was submitted in October. 
   Is there anything I can do to help the review ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2022-11-30 Thread GitBox


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1332013091

   > I'm taking a look! Thanks for the ping +1
   
   @RyanSkraba any review comments so far ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

2022-11-25 Thread GitBox


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1327516095

   R: @zentol 
   R: @RyanSkraba you mentioned that you already started to review this PR 
(when it was in the main flink repo) car you send your review comments ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org