Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-21 Thread Stig Døssing
Hi Kris, I'm sorry the new spout has been so unstable. We should probably have marked it as an alpha/beta release, instead of just following the Storm version number. I agree that we should try to avoid duplicating the Kafka config, and there's a PR for that here https://github.com/apache/storm/p

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-20 Thread Kristopher Kane
>From the perspective of a user who wraps a config driven product around Storm, the setup of the KafkaSpout, and its configuration, has been a moving target that is hard to track. In addition, the lack of supporting usage examples across the README vs examples/ has also made it difficult. My opini

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-17 Thread Priyank Shah
Hugo, I agree about the benefits of immutability and encapsulation. But are they so important for the case we are discussing? As far as code being super fragile, hard to debug and being thread unsafe, I don’t really think its applicable here. Can you elaborate on how does it make the code super

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-14 Thread Stig Døssing
It would obviously be ideal if Flux could be made to support object creation via builders, but if that's not possible I think leaving the KafkaSpoutConfig constructor public is a decent workaround. We are still getting some of the benefits of the Builder pattern, even if Flux doesn't use builder.bu

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-14 Thread Hugo Da Cruz Louro
Hi, Flux is simply a mechanism to enabling Java objects creation using a descriptor file. If Flux does not support creating classes that follow the Builder design pattern, that is a Flux limitation and has to be fixed. It is not reasonable to impose that no one can write a builder because Flux

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-14 Thread Priyank Shah
Hi Stig/Hugo, That constructor is indeed public. I actually made that change but forgot about it. https://github.com/apache/storm/commit/5ff7865cf0b86f40e99b54e789fa60b8843191aa The reason for making that change is to make it work with flux. I think changing flux code to access private constru

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-14 Thread Hugo Da Cruz Louro
@Harsha @Stig, I agree with you. Let’s make the de facto implementation manual partition assignment. I have already adjusted the KafkaTridentSpout code to reflect @Stig’s changes and things seem to be working very well for Trident as well. I am tracking that on https://issues.apache.org/jira/bro

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-14 Thread Stig Døssing
It looks public to me? https://github.com/apache/storm/blob/38e997ed96ce6627cabb4054224d7044fd2b40f9/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L461 I think its good to be able to avoid telescoping constructors, while at the same time not having a

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-13 Thread Priyank Shah
Hi Stig, I think KafkaSpoutConfig constructor is private and it's throwing errors while using the approach that you mentioned. Making it public defeats be purpose of the builder. Can you give it a shot and confirm at your end if it's possible? Thanks Priyank Sent from my iPhone > On Jun 13, 2

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-13 Thread Stig Døssing
Hi Priyank, I changed my mind since those mails were sent. I don't think setKey/Value are very useful. They couldn't be used with the default Kafka deserializers, only for deserializers implemented by the user, and then only if they were declared to implement SerializableDeserializer. I agree that

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-12 Thread Priyank Shah
Hi Stig, I think PR https://github.com/apache/storm/pull/2155/files you created gets rid of setKey and setValue. I am fine with it and in fact that’s what I was suggesting in first place. However, your last two email replies suggest something else. Just making sure you are not going to undo any

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-10 Thread Stig Døssing
Priyank, I was a bit too hasty in the last response. The setKey/Value functions are necessary when users want to set only the key or the value deserializer. I think we should keep those. It may be possible to deduplicate the functionality on the API by removing the Builder constructors that takes d

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-10 Thread Stig Døssing
Harsha, +1 for simplifying away those methods that are just setting consumer config. The properties I think we should keep are all the spout configuration (timeouts, retry handling, tuple construction). Maybe we deprecate the consumer config functions on 1.x and remove them on master? Priyank, W

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-09 Thread Harsha
Dynamic assignment is what causing all the issues that we see now. 1. Duplicates at the start of the KafkaSpout and upon any rebalance 2. Trident Kafka Spout not holding the transactional batches. Many corner cases can easily produce duplicates. There is no point in keeping dynamic assignment giv

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-09 Thread Priyank Shah
Just realized that from my email it was not clear what were my comments versus Stig’s. I have put mine in [] below. On 6/9/17, 5:42 PM, "Priyank Shah" wrote: I want to add a few things other than the issues raised by Sriharsha and Hugo. I am pasting one of the other emails that I sent some

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-09 Thread Priyank Shah
I want to add a few things other than the issues raised by Sriharsha and Hugo. I am pasting one of the other emails that I sent sometime back about cleaning up KafkaSpoutConfig. Stig responsed to that email. Trying to answer that in this email so that it is all in one place. Answers in line.

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-09 Thread Hugo Da Cruz Louro
+1 for simplifying KafkaSpoutConfig. Too many constructors and too many methods.. I am not sure it’s justifiable to have any methods that simply set KafkaConsumer properties. All of these properties should just go in a Map, which is what KafkaConsumer receives, and what was supported in the ini

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-09 Thread Harsha
I think question why we need all those settings when a user can pass it via Properties with consumer properties defined or via Map conf object. Having the methods on top of consumer config means every time Kafka consumer property added or changed one needs add a builder method. We need to get out

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-09 Thread Stig Døssing
I'd be happy with a simpler KafkaSpoutConfig, but I think most of the configuration parameters have good reasons for being there. Any examples of parameters you think we should remove? 2017-06-09 21:34 GMT+02:00 Harsha : > +1 on using the manual assignment for the reasons specified below. We > w

Re: Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-06-09 Thread Harsha
+1 on using the manual assignment for the reasons specified below. We will see duplicates even in stable conditions which is not good. I don’t see any reason not to switch to manual assignment. While we are at it we should refactor the KafkaConfig part. It should be as simple as accepting the kaf

Use of the KafkaConsumer.subscribe API in the storm-kafka-client spout

2017-05-31 Thread Stig Døssing
Hi, We've recently seen some issues raised by users using the default subscription API in the new KafkaSpout ( https://issues.apache.org/jira/browse/STORM-2514, https://issues.apache.org/jira/browse/STORM-2538). A while ago an alternative subscription implementation was added ( https://github.com