[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-10-10 Thread rmetzger
Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2369
  
Tests are running https://travis-ci.org/rmetzger/flink/builds/166441047




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-10-10 Thread rmetzger
Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2369
  
Thank you for the review. I'll address your comments, rebase again, test it, and, if it turns green, merge it ;)




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-09-27 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2369
  
Just found some minor issues that can be fixed when merging.




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-09-26 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2369
  
@rmetzger Thanks for addressing the comments! Did a final pass, and the changes look good to me.
I agree with merging the connector as-is. Adding the timestamp to the regular sink interface seems like a good long-term solution.

+1 to merge once travis turns green ;)




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-09-26 Thread rmetzger
Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2369
  
@tzulitai I addressed all your comments except the one relating to `FlinkKafkaProducer010Configuration`: I had a quick offline discussion with @StephanEwen about the issue and he suggested adding the timestamp to the regular sink interface.
But I would like to make that change separately from this one and merge the Kafka 0.10 support as-is. This will make it easier for people to try it out now and provide us with feedback. Also, I think some other Kafka-related pull requests are blocked on this one.
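
To make that idea concrete, here is a purely hypothetical sketch of what "adding the timestamp to the regular sink interface" could look like; the interface name and signature are illustrative assumptions only, not the agreed design:

```
// Hypothetical sketch only -- not an actual Flink interface. It just
// illustrates a sink that receives the element's timestamp alongside the
// element itself, instead of a producer-specific configuration wrapper.
public interface TimestampedSinkFunction<IN> extends java.io.Serializable {

    // elementTimestamp would carry the timestamp of the surrounding stream
    // record (or a sentinel value if the record has none attached).
    void invoke(IN value, long elementTimestamp) throws Exception;
}
```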

@tzulitai could you do a final pass over the changes. If you agree, I'd 
like to merge it afterwards.




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-09-26 Thread rmetzger
Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2369
  
I'm currently working on rebasing the PR and addressing the comments.




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-09-20 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2369
  
@cjstehno I would expect this to land in the 1.2.0 major release, which would probably be ~2 months from now, going by Flink's past release cycles. The Flink community usually doesn't ship major new features like this in minor bugfix releases.




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-09-20 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2369
  
Looks like we need to rebase this PR on the recently merged Kerberos 
support.




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-09-19 Thread cjstehno
Github user cjstehno commented on the issue:

https://github.com/apache/flink/pull/2369
  
Any thoughts on when this might make it into a release? We are having 
issues running Flink with Kafka 0.10 and would like to have an idea of whether 
we can/should wait for this or pull it and try building our own. Thanks.




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-09-18 Thread nemccarthy
Github user nemccarthy commented on the issue:

https://github.com/apache/flink/pull/2369
  
+1 for this PR




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-08-25 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2369
  
Thanks Robert for addressing my comments :)

Overall, I like the new hybrid producer approach. However, I'm still curious whether it is possible / reasonable to drop the `FlinkKafkaProducer010Configuration` return type of invocation (b) and let both invocation methods return `FlinkKafkaProducer010` instead. So,

```
FlinkKafkaProducer010 kafka = new FlinkKafkaProducer010(...);
// or, for timestamp support:
// FlinkKafkaProducer010 kafka = FlinkKafkaProducer010.writeToKafkaWithTimestamps(...);

// setter config methods are called directly on the FlinkKafkaProducer010
// instance, regardless of whether invocation (a) or (b) was used
kafka.setLogFailuresOnly(true);
kafka.setFlushOnCheckpoint(true);
kafka.setWriteTimestampToKafka(true); // has no effect if invocation method (a) was used
```

But we'll need to be a bit hacky in `invokeInternal(element, elementTimestamp)`: something like only passing a non-null `timestamp` to the `ProducerRecord` if `writeTimestampToKafka && elementTimestamp != Long.MIN_VALUE` (see the sketch below).
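
A minimal sketch of the kind of guard I mean (method, field, and parameter names here are purely illustrative and not the actual connector code; it relies on the Kafka 0.10 `ProducerRecord` constructor that takes a timestamp):

```
// Sketch only: illustrative names, not the real FlinkKafkaProducer010 code.
// Uses org.apache.kafka.clients.producer.ProducerRecord (Kafka 0.10+).
private ProducerRecord<byte[], byte[]> buildRecord(
        String targetTopic, Integer partition,
        byte[] serializedKey, byte[] serializedValue,
        boolean writeTimestampToKafka, long elementTimestamp) {

    // Only hand a timestamp to Kafka if the user enabled it and the element
    // actually carries one (Long.MIN_VALUE marking "no timestamp" here).
    Long timestamp = null;
    if (writeTimestampToKafka && elementTimestamp != Long.MIN_VALUE) {
        timestamp = elementTimestamp;
    }
    return new ProducerRecord<>(targetTopic, partition, timestamp,
            serializedKey, serializedValue);
}
```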

What do you think?




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-08-25 Thread rmetzger
Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2369
  
@tzulitai I've addressed your comments.
The Producer is now "hybrid": you can use it with both invocation methods.
The AbstractFetcher now accepts a long timestamp instead of a record.




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-08-24 Thread eliaslevy
Github user eliaslevy commented on the issue:

https://github.com/apache/flink/pull/2369
  
@rmetzger that's the one. NP.  I realize breaking it up makes things 
easier.  I just thought I'd mention it.




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-08-24 Thread rmetzger
Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2369
  
@eliaslevy, I assume you are referring to 
https://issues.apache.org/jira/browse/FLINK-4050. 
It's good that you are mentioning the issue again, so I can move it a bit up on my TODO list.

I would personally prefer to first add the Kafka 0.10 module in this pull request and then resolve FLINK-4050 independently. I know that this might lead to a little bit of duplicate work on the Kafka 0.10 code, but on the other hand it's easier to discuss one issue at a time :)




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-08-23 Thread eliaslevy
Github user eliaslevy commented on the issue:

https://github.com/apache/flink/pull/2369
  
This may be the wrong place to bring this up, but since you are discussing changes to the Kafka connector API, I think it is worth bringing up.

As I've pointed out elsewhere, the current connector API makes it difficult to make use of Kafka's native serializers and deserializers (`org.apache.kafka.common.serialization.[Serializer, Deserializer]`), which can be configured via the Kafka consumer and producer configs.

The connector code assumes that `ConsumerRecord`s and `ProducerRecord`s are both parametrized as `<byte[], byte[]>`, with the Flink serdes performing the conversion to/from `byte[]`. This makes it difficult to make use of Confluent's `KafkaAvroSerializer` and `KafkaAvroDecoder`, which make use of their [schema registry](http://docs.confluent.io/3.0.0/schema-registry/docs/serializer-formatter.html#serializer).

If you are going to change the connector API, it would be good to tackle this issue at the same time to avoid further API changes later. The connector should allow type parametrization of the Kafka consumer and producer, and should use a pass-through Flink serde by default (see the sketch below).
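
Purely as an illustration of what this would unlock, here is how the Kafka-native serializers are typically wired up on the plain Kafka producer via configs; the broker and registry addresses are placeholders, and a connector hard-wired to `<byte[], byte[]>` cannot hand these configs control over the record types:

```
import java.util.Properties;

// Sketch: plain Kafka producer configs selecting Confluent's Avro serializer,
// which resolves schemas against the schema registry.
public class NativeSerializerConfigSketch {

    public static Properties producerConfig() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");          // placeholder broker
        props.setProperty("key.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.setProperty("value.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.setProperty("schema.registry.url", "http://localhost:8081"); // placeholder registry
        return props;
    }
}
```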




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-08-23 Thread rmetzger
Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2369
  
Thank you for the review @tzulitai. I'll try to find some time soon to look 
into your comments in detail.




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-08-23 Thread rmetzger
Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2369
  
@StephanEwen: The explicit exclude is actually not needed, because the Kafka version defined in the connector takes precedence over transitively pulled-in Kafka versions.
```
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ flink-connector-kafka-0.10_2.10 ---
[INFO] org.apache.flink:flink-connector-kafka-0.10_2.10:jar:1.2-SNAPSHOT
[INFO] +- org.apache.flink:flink-connector-kafka-0.9_2.10:jar:1.2-SNAPSHOT:compile
[INFO] |  \- org.apache.flink:flink-connector-kafka-base_2.10:jar:1.2-SNAPSHOT:compile
[INFO] +- org.apache.kafka:kafka-clients:jar:0.10.0.1:compile
[INFO] |  +- net.jpountz.lz4:lz4:jar:1.3.0:compile
[INFO] |  \- org.xerial.snappy:snappy-java:jar:1.1.2.6:compile
```





[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-08-19 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2369
  
Left a few comments on some high-level design choices for a first review. Mostly on `FlinkKafkaProducer010`; I wonder if there are better alternatives there?




[GitHub] flink issue #2369: [FLINK-4035] Add a streaming connector for Apache Kafka 0...

2016-08-16 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2369
  
Just looked over this briefly.
What struck me first is that this again uses the dirty trick of adding a dependency on "Flink Kafka 0.9" and then excluding its transitive "Kafka 0.9" dependency. Is there a nicer way to solve this?

