[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/2369

Tests are running https://travis-ci.org/rmetzger/flink/builds/166441047

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/2369

Thank you for the review. I'll address your comments, rebase again, test it, and if it turns green merge it ;)
Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/2369

Just found some minor issues that can be fixed when merging.
Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/2369

@rmetzger Thanks for addressing the comments! Did a final pass, and the changes look good to me. I agree with merging the connector as-is. Adding the timestamp to the regular sink interface seems like a good long-term solution. +1 to merge once Travis turns green ;)
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/2369

@tzulitai I addressed all your comments except the one relating to `FlinkKafkaProducer010Configuration`: I had a quick offline discussion with @StephanEwen about the issue, and he suggested adding the timestamp to the regular sink interface. But I would like to make that change separately from this one and merge the Kafka 0.10 support as-is. This will make it easier for people to try it out now and provide us with feedback. Also, I think some other Kafka-related pull requests are blocked on this one. @tzulitai, could you do a final pass over the changes? If you agree, I'd like to merge it afterwards.
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/2369

I'm currently working on rebasing the PR and addressing the comments.
Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/2369

@cjstehno I would expect this to be in the 1.2.0 major release, which would probably be ~2 months from now according to Flink's past release cycle. The Flink community usually doesn't release major new features like this between minor bugfix releases.
Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/2369

Looks like we need to rebase this PR on the recently merged Kerberos support.
Github user cjstehno commented on the issue: https://github.com/apache/flink/pull/2369

Any thoughts on when this might make it into a release? We are having issues running Flink with Kafka 0.10 and would like to have an idea of whether we can/should wait for this or pull it and try building our own. Thanks.
Github user nemccarthy commented on the issue: https://github.com/apache/flink/pull/2369

+1 for this PR
Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/2369

Thanks Robert for addressing my comments :) Overall, I like the new hybrid producer approach. However, I'm still curious whether it is possible/reasonable to drop the `FlinkKafkaProducer010Configuration` return type of invocation (b), and let both invocation methods return `FlinkKafkaProducer010` instead. So:

```
FlinkKafkaProducer010 kafka = new FlinkKafkaProducer010(...);
// or, for timestamp support:
FlinkKafkaProducer010 kafka = FlinkKafkaProducer010.writeToKafkaWithTimestamps(...);

// setter config methods are called directly on the FlinkKafkaProducer010
// instance, regardless of whether (a) or (b) was used
kafka.setLogFailuresOnly(true);
kafka.setFlushOnCheckpoint(true);
kafka.setWriteTimestampToKafka(true); // would have no effect if invocation method (a) was used
```

But we'll need to be a bit hacky in `invokeInternal(element, elementTimestamp)`: something like only letting the `timestamp` given to `ProducerRecord` be non-null if `writeTimestampToKafka && elementTimestamp != Long.MIN_VALUE`. What do you think?
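The gating condition proposed above can be sketched in isolation as follows. This is an illustrative stand-alone snippet, not the connector's actual code; the class and method names are made up for the example, and `Long.MIN_VALUE` marks "no timestamp" as in the comment:

```java
// Sketch of the timestamp-gating idea: the timestamp handed to ProducerRecord
// is non-null only when the user enabled timestamps AND the element actually
// carries one. All names here are illustrative, not the real connector API.
public class TimestampGate {
    // Returns the timestamp to pass to ProducerRecord, or null to omit it.
    static Long effectiveTimestamp(boolean writeTimestampToKafka, long elementTimestamp) {
        if (writeTimestampToKafka && elementTimestamp != Long.MIN_VALUE) {
            return elementTimestamp;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(effectiveTimestamp(true, 1234L));          // 1234
        System.out.println(effectiveTimestamp(true, Long.MIN_VALUE)); // null
        System.out.println(effectiveTimestamp(false, 1234L));         // null
    }
}
```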
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/2369

@tzulitai I've addressed your comments. The Producer is now "hybrid": you can use it with both invocation methods. The AbstractFetcher now accepts a long timestamp instead of a record.
Github user eliaslevy commented on the issue: https://github.com/apache/flink/pull/2369

@rmetzger that's the one. NP. I realize breaking it up makes things easier. I just thought I'd mention it.
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/2369

@eliaslevy, I assume you are referring to https://issues.apache.org/jira/browse/FLINK-4050. It's good that you are mentioning the issue again, so I can move it up a bit on my TODO list. I would personally prefer to first add the Kafka 0.10 module in this pull request and then resolve FLINK-4050 independently. I know that this might lead to a little bit of duplicate work on the Kafka 0.10 code, but on the other hand it's easier to discuss one issue at a time :)
Github user eliaslevy commented on the issue: https://github.com/apache/flink/pull/2369

This may be the wrong place to bring this up, but as you are discussing changes to the Kafka connector API, I think it is worth bringing up. As I've pointed out elsewhere, the current connector API makes it difficult to use Kafka-native serializers or deserializers (`org.apache.kafka.common.serialization.[Serializer, Deserializer]`), which can be configured via the Kafka consumer and producer configs. The connector code assumes that `ConsumerRecord`s and `ProducerRecord`s are both parametrized as `<byte[], byte[]>`, with the Flink serdes performing the conversion to/from `byte[]`. This makes it difficult to use Confluent's `KafkaAvroSerializer` and `KafkaAvroDecoder`, which make use of their [schema registry](http://docs.confluent.io/3.0.0/schema-registry/docs/serializer-formatter.html#serializer). If you are going to change the connector API, it would be good to tackle this issue at the same time to avoid future changes. The connector should allow the type parametrization of the Kafka consumer and producer, and should use a pass-through Flink serde by default.
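One way to read the "pass-through serde" suggestion: the Flink-side schema hands `byte[]` through untouched, so that Kafka-native serializers/deserializers configured on the client (e.g. Confluent's `KafkaAvroSerializer`) own the actual conversion. A minimal stand-alone sketch; the class and method names are simplified stand-ins, not Flink's real `DeserializationSchema`/`SerializationSchema` API:

```java
import java.util.Arrays;

// Illustrative pass-through "serde": no conversion on the Flink side, leaving
// (de)serialization entirely to Kafka-native serializers configured on the
// Kafka client. Names are stand-ins, not the actual Flink interfaces.
public class PassThroughSerde {
    static byte[] deserialize(byte[] message) {
        return message; // hand the raw bytes through unchanged
    }

    static byte[] serialize(byte[] element) {
        return element; // likewise on the producer side
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};
        // A round trip through the pass-through serde leaves the bytes intact.
        System.out.println(Arrays.equals(deserialize(serialize(payload)), payload)); // true
    }
}
```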
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/2369

Thank you for the review @tzulitai. I'll try to find some time soon to look into your comments in detail.
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/2369

@StephanEwen: The explicit exclude is actually not needed, because the Kafka version defined in the connector has precedence over transitive Kafka versions.

```
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ flink-connector-kafka-0.10_2.10 ---
[INFO] org.apache.flink:flink-connector-kafka-0.10_2.10:jar:1.2-SNAPSHOT
[INFO] +- org.apache.flink:flink-connector-kafka-0.9_2.10:jar:1.2-SNAPSHOT:compile
[INFO] |  \- org.apache.flink:flink-connector-kafka-base_2.10:jar:1.2-SNAPSHOT:compile
[INFO] +- org.apache.kafka:kafka-clients:jar:0.10.0.1:compile
[INFO] |  +- net.jpountz.lz4:lz4:jar:1.3.0:compile
[INFO] |  \- org.xerial.snappy:snappy-java:jar:1.1.2.6:compile
```
Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/2369

Left a few comments on some high-level design choices for a first review. Mostly on `FlinkKafkaProducer010`; I wonder if there are better possibilities there?
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/2369

Just looked over this briefly. What struck me first is that this again uses the dirty trick of adding a dependency to "Flink Kafka 0.9" and then transitively excluding "Kafka 0.9". Is there a nicer way to solve this?