[jira] [Created] (KAFKA-4802) Support direct ByteBuffer serializers/deserializers in clients

2017-02-24 Thread Matt Sicker (JIRA)
Matt Sicker created KAFKA-4802:
--

 Summary: Support direct ByteBuffer serializers/deserializers in 
clients
 Key: KAFKA-4802
 URL: https://issues.apache.org/jira/browse/KAFKA-4802
 Project: Kafka
  Issue Type: New Feature
  Components: clients
Reporter: Matt Sicker


RecordAccumulator and Fetcher are already written to take advantage of a pool 
of ByteBuffers, but Serializer and Deserializer require you to return a byte 
array. If I have a key or value format that is better handled directly via 
ByteBuffer, the added conversion to a byte array introduces unnecessary garbage.

An example use case would be enhancing the KafkaAppender in Log4j 2 to 
support garbage-free logging (or minimal garbage; I haven't really looked at 
the entire code path).
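
A rough sketch of the kind of interface this would enable (purely 
hypothetical, not an actual clients API):

import java.nio.ByteBuffer;
import java.util.Map;

// Hypothetical sketch only, not part of the Kafka clients API: implementations
// hand back a ByteBuffer (possibly pooled or direct) so the client path can
// avoid the extra byte[] copy.
public interface ByteBufferSerializer<T> {

    void configure(Map<String, ?> configs, boolean isKey);

    // Serialize directly into a ByteBuffer; callers read between
    // buffer.position() and buffer.limit() without copying to a byte[].
    ByteBuffer serialize(String topic, T data);

    void close();
}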



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-1342) Slow controlled shutdowns can result in stale shutdown requests

2017-02-24 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884094#comment-15884094
 ] 

James Cheng commented on KAFKA-1342:


[~jjkoshy], [~toddpalino], is it still true that it is unsafe to increase the 
number of controlled shutdown retries? We currently have brokers with 10,000 
partitions each, and there is no way they can effectively shut down within the 
shutdown timeout of 30 seconds, even with the current default of 
controlled.shutdown.max.retries=3. If the brokers aren't able to shut down 
within the 90 seconds (30 seconds * 3), then when we bounce them and they start 
back up too quickly, we end up with a broker with all of its replica fetchers 
stopped (as described in this JIRA). This also seems like a specific instance 
of KAFKA-1120.

We have increased that to 40 or so, to allow brokers up to 20 minutes to 
shut down. Usually, it takes them 8 minutes.

Is it better to increase the value of controller.socket.timeout.ms? If we 
increase this to 25 minutes, for example, doesn't that impact much more than 
just the shutdown request? Won't normal controller->broker communication like 
LeaderAndIsr and MetadataUpdate requests also be subject to a 25-minute 
timeout?
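
For reference, the two knobs discussed above, shown as Properties for 
illustration (they normally live in server.properties; the values mirror this 
thread and are not recommendations):

import java.util.Properties;

public class ShutdownTuning {
    public static void main(String[] args) {
        Properties brokerProps = new Properties();
        // Allow up to 40 controlled shutdown attempts (default is 3).
        brokerProps.setProperty("controlled.shutdown.max.retries", "40");
        // Per this thread, this timeout (default 30000 ms) bounds each attempt.
        brokerProps.setProperty("controller.socket.timeout.ms", "30000");
        System.out.println(brokerProps);
    }
}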


> Slow controlled shutdowns can result in stale shutdown requests
> ---
>
> Key: KAFKA-1342
> URL: https://issues.apache.org/jira/browse/KAFKA-1342
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Joel Koshy
>Assignee: Joel Koshy
>Priority: Critical
>  Labels: newbie++, newbiee, reliability
> Fix For: 0.10.3.0
>
>
> I don't think this is a bug introduced in 0.8.1, but triggered by the fact
> that controlled shutdown seems to have become slower in 0.8.1 (will file a
> separate ticket to investigate that). When doing a rolling bounce, it is
> possible for a bounced broker to stop all its replica fetchers since the
> previous PID's shutdown requests are still being handled.
> - 515 is the controller
> - Controlled shutdown initiated for 503
> - Controller starts controlled shutdown for 503
> - The controlled shutdown takes a long time in moving leaders and moving
>   follower replicas on 503 to the offline state.
> - So 503's read from the shutdown channel times out and a new channel is
>   created. It issues another shutdown request.  This request (since it is a
>   new channel) is accepted at the controller's socket server but then waits
>   on the broker shutdown lock held by the previous controlled shutdown which
>   is still in progress.
> - The above step repeats for the remaining retries (six more requests).
> - 503 hits SocketTimeout exception on reading the response of the last
>   shutdown request and proceeds to do an unclean shutdown.
> - The controller's onBrokerFailure call-back fires and moves 503's replicas
>   to offline (not too important in this sequence).
> - 503 is brought back up.
> - The controller's onBrokerStartup call-back fires and moves its replicas
>   (and partitions) to online state. 503 starts its replica fetchers.
> - Unfortunately, the (phantom) shutdown requests are still being handled and
>   the controller sends StopReplica requests to 503.
> - The first shutdown request finally finishes (after 76 minutes in my case!).
> - The remaining shutdown requests also execute and do the same thing (send
>   StopReplica requests for all partitions to 503).
> - The remaining requests complete quickly because they end up not having to
>   touch zookeeper paths - no leaders left on the broker and no need to
>   shrink ISR in zookeeper since it has already been done by the first
>   shutdown request.
> - So in the end-state 503 is up, but effectively idle due to the previous
>   PID's shutdown requests.
> There are some obvious fixes that can be made to controlled shutdown to help
> address the above issue. E.g., we don't really need to move follower
> partitions to Offline. We did that as an "optimization" so the broker falls
> out of ISR sooner - which is helpful when producers set required.acks to -1.
> However it adds a lot of latency to controlled shutdown. Also, (more
> importantly) we should have a mechanism to abort any stale shutdown process.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-3436) Speed up controlled shutdown.

2017-02-24 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884088#comment-15884088
 ] 

James Cheng commented on KAFKA-3436:


Is there a chance this can be worked on for 0.10.3? We have a cluster with 
10,000 partitions per broker. It regularly takes around 8 minutes to shut down 
a broker.
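
For a sense of scale, here is a quick worked instance of the catch-up formula 
from the description quoted below (all numbers are assumed, purely 
illustrative):

public class CatchUpTime {
    public static void main(String[] args) {
        // Assumed numbers: 50 MB/s inbound traffic (X), 125 MB/s usable
        // network bandwidth, and 480 s of "down" time (T1.B in the ticket).
        double x = 50.0;          // MB/s inbound
        double bandwidth = 125.0; // MB/s
        double t = 480.0;         // seconds the replica fetchers were stopped

        // Catch-up time = (X * T) / (NetworkBandwidth - X), per the ticket.
        double catchUp = (x * t) / (bandwidth - x);
        System.out.printf("Catch-up time: %.0f seconds%n", catchUp); // ~320 s
    }
}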

> Speed up controlled shutdown.
> -
>
> Key: KAFKA-3436
> URL: https://issues.apache.org/jira/browse/KAFKA-3436
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.3.0
>
>
> Currently, rolling bouncing a Kafka cluster with tens of thousands of partitions 
> can take very long (~2 min for each broker with ~5000 partitions/broker in 
> our environment). The majority of the time is spent on shutting down a 
> broker. The time of shutting down a broker usually includes the following 
> parts:
> T1: During controlled shutdown, people usually want to make sure there are no 
> under-replicated partitions. So shutting down a broker during a rolling 
> bounce will have to wait for the previously restarted broker to catch up. This 
> is T1.
> T2: The time to send the controlled shutdown request and receive the controlled 
> shutdown response. Currently a controlled shutdown request will trigger 
> many LeaderAndIsrRequests and UpdateMetadataRequests, and also involves many 
> ZooKeeper updates in serial.
> T3: The actual time to shut down all the components. It is usually small 
> compared with T1 and T2.
> T1 is related to:
> A) the inbound throughput on the cluster, and 
> B) the "down" time of the broker (time between replica fetchers stop and 
> replica fetchers restart)
> The larger the traffic is, or the longer the broker stopped fetching, the 
> longer it will take for the broker to catch up and get back into ISR, and 
> therefore the longer T1 will be. Assume:
> * the inbound network traffic is X bytes/second on a broker
> * the time T1.B ("down" time) mentioned above is T
> Theoretically it will take (X * T) / (NetworkBandwidth - X) = 
> InboundNetworkUtilization * T / (1 - InboundNetworkUtilization) for the 
> broker to catch up after the restart. While X is out of our control, T is 
> largely related to T2.
> The purpose of this ticket is to reduce T2 by:
> 1. Batching the LeaderAndIsrRequest and UpdateMetadataRequest during 
> controlled shutdown.
> 2. Use async ZooKeeper writes to pipeline the ZooKeeper updates. According to 
> the ZooKeeper wiki (https://wiki.apache.org/hadoop/ZooKeeper/Performance), a 
> 3-node ZK cluster should be able to handle 20K writes (1K size) per second. So 
> if we use async writes, we will likely be able to reduce the ZooKeeper update 
> time to a few seconds or even sub-second level.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-02-24 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884085#comment-15884085
 ] 

James Cheng commented on KAFKA-1120:


I'm not sure if this helps, but I figured out how to trivially reproduce this 
problem.

1. Start 2 brokers.
2. Put 1 partitions on each of them.
3. Do a controlled shutdown of one of them. It will take its normal 3 attempts 
and then do an uncontrolled shutdown.
4. Once it exits, start it back up immediately.

When the brokers settle back down, you will see that you still have a bunch of 
under-replicated partitions. The brokers will be sitting there idly, with no 
strange behavior in their logs.

I've tested this on 0.10.0.0, 0.10.1.1, and 0.10.2.

What happens in step 4 is that the controller is still busy processing the 
shutdown from step 3. You can see this by looking at all the messages being 
written to controller.log. If the broker starts back up before the controller 
is done processing the controlled shutdown, then you will encounter this 
problem.

/cc [~junrao]

In order to make this repro happen faster, I set 
controlled.shutdown.max.retries=1. And, in order to not fill up my hard drive, 
I set log.index.size.max.bytes=10.


> Controller could miss a broker state change 
> 
>
> Key: KAFKA-1120
> URL: https://issues.apache.org/jira/browse/KAFKA-1120
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.1
>Reporter: Jun Rao
>  Labels: reliability
>
> When the controller is in the middle of processing a task (e.g., preferred 
> leader election, broker change), it holds a controller lock. During this 
> time, a broker could have de-registered and re-registered itself in ZK. After 
> the controller finishes processing the current task, it will start processing 
> the logic in the broker change listener. However, it will see no broker 
> change and therefore won't do anything to the restarted broker. This broker 
> will be in a weird state since the controller doesn't inform it to become the 
> leader of any partition. Yet, the cached metadata in other brokers could 
> still list that broker as the leader for some partitions. Client requests 
> routed to that broker will then get a TopicOrPartitionNotExistException. This 
> broker will continue to be in this bad state until it's restarted again.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Trying to understand design decision about producer ack and min.insync.replicas

2017-02-24 Thread James Cheng
I read the recent Client Survey 
(https://www.confluent.io/blog/first-annual-state-apache-kafka-client-use-survey/). 
It said that most respondents to the survey said that reliability was critical 
or very important. And so given that, I was inspired to follow up on this 
thread.

Grant, Ewen, Ismael, and I all think that defaulting the producer to acks=all 
would be a good thing to do.

And Grant suggested a couple more. I believe the producer suggestions in 
particular (block.on.buffer.full=true and 
max.in.flight.requests.per.connection=1) would prevent silent data loss and 
prevent message reordering.

What do you all think is the next step? I imagine that the actual 
implementation of these wouldn't be the hard part (you'd just flip a default 
somewhere). The hard part would be the KIP discussions and the migration 
process and whatever backwards compatibility and messaging are required.

-James
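
For concreteness, this is what the proposed defaults look like when set 
explicitly on a 0.10.x producer (a sketch; the bootstrap address and 
serializers are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

public class DurableProducerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // wait for all in-sync replicas
        props.put("max.in.flight.requests.per.connection", "1"); // prevents reordering on retries
        props.put("block.on.buffer.full", "true"); // deprecated since 0.9 in favor of max.block.ms
        // Note: full durability also needs min.insync.replicas on the topic/broker side.
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // produce as usual...
        }
    }
}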


> On Feb 3, 2017, at 8:01 AM, Grant Henke  wrote:
> 
> I would be in favor of defaulting acks=all.
> 
> I have found that most people want to start with the stronger/safer
> guarantees and then adjust them for performance on a case by case basis.
> This gives them a chance to understand and accept the tradeoffs.
> 
> A few other defaults I would be in favor of changing (some are harder and
> more controversial than others) are:
> 
> Broker:
> 
>   - zookeeper.chroot=kafka (was "")
>   - This will be easiest when direct communication to zookeeper isn't done
>  by clients
> 
> Producer:
> 
>   - block.on.buffer.full=true (was false)
>   - max.in.flight.requests.per.connection=1 (was 5)
> 
> All:
> 
>   - *receive.buffer.bytes=-1 (was 102400)
>   - *send.buffer.bytes=-1 (was 102400)
> 
> 
> 
> 
> On Fri, Feb 3, 2017 at 2:03 AM, Ismael Juma  wrote:
> 
>> I'd be in favour too.
>> 
>> Ismael
>> 
>> On 3 Feb 2017 7:33 am, "Ewen Cheslack-Postava"  wrote:
>> 
>>> On Thu, Feb 2, 2017 at 11:21 PM, James Cheng wrote:
>>> 
 Ewen,
 
 Ah right, that's a good point.
 
 My initial reaction to your examples was that "well, those should be in
 separate topics", but then I realized that people choose their topics for a
 variety of reasons. Sometimes they organize it based on their producers,
 sometimes they organize it based on the nature of the data, but sometimes
 (as you gave examples about), they may organize it based on the consuming
 application. And there are valid reasons to want different data types in a
 single topic:
 
 1) You get global ordering
 2) You get persistent ordering in the case of re-reads (whereas reading 2
 topics would cause different ordering upon re-reads.)
 3) Logically-related data types all co-located.
 
 I do still think it'd be convenient to only have to set
 min.insync.replicas on a topic and not have to require producing
 applications to also set acks=all. It'd then be a single thing you have to
 configure, instead of the current 2 things. (since, as currently
 implemented, you have to set both things, in order to achieve high
 durability.)
 
>>> 
>>> I entirely agree, I think the default should be acks=all and then this
>>> would be true :) Similar to the unclean leader election setting, I think
>>> defaulting to durable by default is a better choice. I understand
>>> historically why a different choice was made (Kafka didn't start out as a
>>> replicated, durable storage system), but given how it has evolved I think
>>> durable by default would be a better choice on both the broker and
>>> producer.
>>> 
>>> 
 
 But I admit that it's hard to find the balance of
 features/simplicity/complexity, to handle all the use cases.
 
>>> 
>>> Perhaps the KIP-106 adjustment to unclean leader election could benefit
>>> from a sister KIP for adjusting the default producer acks setting?
>>> 
>>> Not sure how popular it would be, but I would be in favor.
>>> 
>>> -Ewen
>>> 
>>> 
 
 Thanks,
 -James
 
> On Feb 2, 2017, at 9:42 PM, Ewen Cheslack-Postava wrote:
> 
> James,
> 
> Great question, I probably should have been clearer. log data is an example
> where the app (or even instance of the app) might know best what the right
> tradeoff is. Depending on your strategy for managing logs, you may or may
> not be mixing multiple logs (and logs from different deployments) into the
> same topic. For example, if you key by application, then you have an easy
> way to split logs up while still getting a global feed of log messages.
> Maybe logs from one app are really critical and we want to retry, but from
> another app are just a nice to have.
> 
> There are other examples 

New subscriber

2017-02-24 Thread Huimin He
My name is Huimin He. I want to subscribe to the mail list.

Thanks,
Huimin He


[jira] [Commented] (KAFKA-4798) Javadocs for 0.10.2 are missing from kafka.apache.org

2017-02-24 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884000#comment-15884000
 ] 

Guozhang Wang commented on KAFKA-4798:
--

Thanks [~ewencp], and I agree we should have a refactoring of the 
kafka-site with all the js script / template stuff.

BTW I noticed there is a font / style difference between 0102 and the older 
versions, and we no longer have the {{resources}} folder, replaced with a 
{{script.js}}. Do you know what got changed?

> Javadocs for 0.10.2 are missing from kafka.apache.org
> -
>
> Key: KAFKA-4798
> URL: https://issues.apache.org/jira/browse/KAFKA-4798
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.2.0
>Reporter: James Cheng
> Attachments: Screen Shot 2017-02-24 at 12.02.29 PM.png
>
>
> The javadocs on kafka.apache.org are missing for 0.10.2
> The following link yields a 404:
> http://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> If you look in http://kafka.apache.org/0102/, there is no javadoc directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-24 Thread Guozhang Wang
Correction: Non-Binding +0 count is 2.

On Fri, Feb 24, 2017 at 7:44 PM, Guozhang Wang  wrote:

> Hello all,
>
> Thanks a billion for your votes and comments. We have collected the
> following votes so far (+1 from myself as well):
>
> Binding +1: 9
>
>   Gwen Shapira
>   Jay Kreps
>   Becket Qin
>   Jun Rao
>   Ismael Juma
>   Jason Gustafson
>   Sriram Subramanian
>   Joel Koshy
>   Guozhang Wang
>
> Binding -1: 0
>
> Non-Binding +1: 4
>
>   Eno Thereska
>   Matthias J. Sax
>   Bill Bejeck
>   Rajini Sivaram
>
> Non Binding -1: 0
>
> Non binding +0: 0
>
>   Tom Crayford
>   Michael Pearce
>
>
> Based on the above result, I am now closing this voting thread as accepted.
>
> Thanks again for your comments.
>
> Guozhang
>
>
> On Fri, Feb 24, 2017 at 11:50 AM, Sriram Subramanian 
> wrote:
>
>> +1. Great work in driving this to a consensus. Lots of good constructive
>> conversations.
>>
>> On Fri, Feb 24, 2017 at 11:48 AM, Jason Gustafson 
>> wrote:
>>
>> > +1 from me (duh). Thanks to all the reviewers. The KIP has been much
>> > improved because of it!
>> >
>> > -Jason
>> >
>> > On Wed, Feb 22, 2017 at 8:48 AM, Ismael Juma  wrote:
>> >
>> > > Great work on the proposal and iterating on it based on community
>> > feedback.
>> > > As Jun (and others) said, it's likely that minor changes will happen
>> as
>> > the
>> > > PR is reviewed and additional testing takes place since this is a
>> > > significant change.
>> > >
>> > > I am +1 (binding) on the proposal without optional keys and values to
>> > keep
>> > > things consistent. If we show during performance testing that this is
>> > > worthwhile, we can update the proposal.
>> > >
>> > > Ismael
>> > >
>> > > On Tue, Feb 21, 2017 at 6:23 PM, Jun Rao  wrote:
>> > >
>> > > > It seems that it's simpler and more consistent to avoid optional
>> keys
>> > and
>> > > > values. Not sure if it's worth squeezing every byte at the expense
>> of
>> > > > additional complexity. Other than that, +1 from me.
>> > > >
>> > > > Also, since this is a large KIP, minor changes may arise as we start
>> > the
>> > > > implementation. It would be good if we can keep the community
>> posted of
>> > > > those changes, if any.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jun
>> > > >
>> > > > On Tue, Feb 21, 2017 at 4:33 PM, Michael Pearce <
>> michael.pea...@ig.com
>> > >
>> > > > wrote:
>> > > >
>> > > > > If the argument and objective within this KIP is to keep the
>> overhead
>> > > of
>> > > > > the protocol as small as possible and remove redundancy, and every
>> > byte
>> > > > is
>> > > > > being counted and the introduction of varInts, then it would make
>> > sense
>> > > > to
>> > > > > use attributes to me.
>> > > > >
>> > > > >
>> > > > > On 22/02/2017, 00:14, "Jason Gustafson" 
>> wrote:
>> > > > >
>> > > > > Done. I've left the key and value as optional since we may not
>> > have
>> > > > > reached
>> > > > > consensus on whether to use attributes or not. Perhaps we
>> should
>> > > just
>> > > > > keep
>> > > > > it simple and not do it? The benefit seems small.
>> > > > >
>> > > > > -Jason
>> > > > >
>> > > > > On Tue, Feb 21, 2017 at 4:05 PM, Michael Pearce <
>> > > > michael.pea...@ig.com
>> > > > > >
>> > > > > wrote:
>> > > > >
>> > > > > > Ok, no worries, can you add it back ValueLen on this KIP,
>> and
>> > > > update
>> > > > > the
>> > > > > > doc, then we can work from that ☺
>> > > > > >
>> > > > > > Cheers
>> > > > > > Mike
>> > > > > >
>> > > > > > On 22/02/2017, 00:02, "Jason Gustafson" > >
>> > > > wrote:
>> > > > > >
>> > > > > > I feel it was a little odd to leave out the value length
>> > > > anyway,
>> > > > > so I
>> > > > > > would
>> > > > > > rather add it back and put headers at the end. This is
>> more
>> > > > > consistent
>> > > > > > with
>> > > > > > the rest of the Kafka protocol.
>> > > > > >
>> > > > > > -Jason
>> > > > > >
>> > > > > > On Tue, Feb 21, 2017 at 3:58 PM, Michael Pearce <
>> > > > > michael.pea...@ig.com
>> > > > > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Or we keep as is (valuelen removed), and headers are
>> > added
>> > > > with
>> > > > > > headers
>> > > > > > > length..
>> > > > > > >
>> > > > > > > On 21/02/2017, 23:38, "Apurva Mehta" <
>> > apu...@confluent.io>
>> > > > > wrote:
>> > > > > > >
>> > > > > > > Right now, we don't need the value length: since
>> it
>> > is
>> > > > the
>> > > > > last
>> > > > > > item
>> > > > > > > in the
>> > > > > > > message, and we have the message length, we can
>> > deduce
>> > > > the
>> > > > > value
>> > > > > > > length.
>> > > > > > > However, if we are adding record headers to the
>> end,
>> > we
>> > > 

Re: [VOTE] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-24 Thread Guozhang Wang
Hello all,

Thanks a billion for your votes and comments. We have collected the
following votes so far (+1 from myself as well):

Binding +1: 9

  Gwen Shapira
  Jay Kreps
  Becket Qin
  Jun Rao
  Ismael Juma
  Jason Gustafson
  Sriram Subramanian
  Joel Koshy
  Guozhang Wang

Binding -1: 0

Non-Binding +1: 4

  Eno Thereska
  Matthias J. Sax
  Bill Bejeck
  Rajini Sivaram

Non Binding -1: 0

Non binding +0: 0

  Tom Crayford
  Michael Pearce


Based on the above result, I am now closing this voting thread as accepted.

Thanks again for your comments.

Guozhang


On Fri, Feb 24, 2017 at 11:50 AM, Sriram Subramanian 
wrote:

> +1. Great work in driving this to a consensus. Lots of good constructive
> conversations.
>
> On Fri, Feb 24, 2017 at 11:48 AM, Jason Gustafson 
> wrote:
>
> > +1 from me (duh). Thanks to all the reviewers. The KIP has been much
> > improved because of it!
> >
> > -Jason
> >
> > On Wed, Feb 22, 2017 at 8:48 AM, Ismael Juma  wrote:
> >
> > > Great work on the proposal and iterating on it based on community
> > feedback.
> > > As Jun (and others) said, it's likely that minor changes will happen as
> > the
> > > PR is reviewed and additional testing takes place since this is a
> > > significant change.
> > >
> > > I am +1 (binding) on the proposal without optional keys and values to
> > keep
> > > things consistent. If we show during performance testing that this is
> > > worthwhile, we can update the proposal.
> > >
> > > Ismael
> > >
> > > On Tue, Feb 21, 2017 at 6:23 PM, Jun Rao  wrote:
> > >
> > > > It seems that it's simpler and more consistent to avoid optional keys
> > and
> > > > values. Not sure if it's worth squeezing every byte at the expense of
> > > > additional complexity. Other than that, +1 from me.
> > > >
> > > > Also, since this is a large KIP, minor changes may arise as we start
> > the
> > > > implementation. It would be good if we can keep the community posted
> of
> > > > those changes, if any.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Tue, Feb 21, 2017 at 4:33 PM, Michael Pearce <
> michael.pea...@ig.com
> > >
> > > > wrote:
> > > >
> > > > > If the argument and objective within this KIP is to keep the
> overhead
> > > of
> > > > > the protocol as small as possible and remove redundancy, and every
> > byte
> > > > is
> > > > > being counted and the introduction of varInts, then it would make
> > sense
> > > > to
> > > > > use attributes to me.
> > > > >
> > > > >
> > > > > On 22/02/2017, 00:14, "Jason Gustafson" 
> wrote:
> > > > >
> > > > > Done. I've left the key and value as optional since we may not
> > have
> > > > > reached
> > > > > consensus on whether to use attributes or not. Perhaps we
> should
> > > just
> > > > > keep
> > > > > it simple and not do it? The benefit seems small.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Tue, Feb 21, 2017 at 4:05 PM, Michael Pearce <
> > > > michael.pea...@ig.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Ok, no worries, can you add it back ValueLen on this KIP, and
> > > > update
> > > > > the
> > > > > > doc, then we can work from that ☺
> > > > > >
> > > > > > Cheers
> > > > > > Mike
> > > > > >
> > > > > > On 22/02/2017, 00:02, "Jason Gustafson" 
> > > > wrote:
> > > > > >
> > > > > > I feel it was a little odd to leave out the value length
> > > > anyway,
> > > > > so I
> > > > > > would
> > > > > > rather add it back and put headers at the end. This is
> more
> > > > > consistent
> > > > > > with
> > > > > > the rest of the Kafka protocol.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Tue, Feb 21, 2017 at 3:58 PM, Michael Pearce <
> > > > > michael.pea...@ig.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Or we keep as is (valuelen removed), and headers are
> > added
> > > > with
> > > > > > headers
> > > > > > > length..
> > > > > > >
> > > > > > > On 21/02/2017, 23:38, "Apurva Mehta" <
> > apu...@confluent.io>
> > > > > wrote:
> > > > > > >
> > > > > > > Right now, we don't need the value length: since it
> > is
> > > > the
> > > > > last
> > > > > > item
> > > > > > > in the
> > > > > > > message, and we have the message length, we can
> > deduce
> > > > the
> > > > > value
> > > > > > > length.
> > > > > > > However, if we are adding record headers to the
> end,
> > we
> > > > > would
> > > > > > need to
> > > > > > > introduce the value length along with that change.
> > > > > > >
> > > > > > > On Tue, Feb 21, 2017 at 3:34 PM, Michael Pearce <
> > > > > > michael.pea...@ig.com
> > > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > 

Jenkins build is back to normal : kafka-trunk-jdk8 #1302

2017-02-24 Thread Apache Jenkins Server
See 




[jira] [Commented] (KAFKA-4798) Javadocs for 0.10.2 are missing from kafka.apache.org

2017-02-24 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883866#comment-15883866
 ] 

Ewen Cheslack-Postava commented on KAFKA-4798:
--

[~wushujames] I fixed the version on the website repo to use the 0.10.2.0 
released artifacts for Javadocs so they should match. Unfortunately Apache's 
syncing of that update to GitHub and to the deployed website seems to be taking 
some time. Could you close this up once you've verified the change (hopefully 
no more than a couple of hours wait)?

[~guozhang] 2 things. First, I noticed that the artifacts in the release svn 
repo don't quite match those in the dev one that I copied into place. There are 
a bunch of extra files that include query parameters. I'm guessing you did 
something like wget them instead of checking out both repos and copying the 
files directly. For Apache purposes, we might want to clean this up (I'm also 
not sure what those extra files will do to the website; although I think they 
are benign, I have excluded them from the update).

Second, I think unlike the main docs, javadoc updates in between releases will 
need some extra process to avoid the SNAPSHOT stuff [~wushujames] pointed out. 
I noticed this when generating the RCs as well, so perhaps we can automate it a 
bit with the same script to ensure we don't encounter that problem again (as 
well as to do things like standardize on which JDK's javadoc tool will be used 
to generate the docs, so they don't switch back and forth between styles 
depending on who generated them).

> Javadocs for 0.10.2 are missing from kafka.apache.org
> -
>
> Key: KAFKA-4798
> URL: https://issues.apache.org/jira/browse/KAFKA-4798
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.2.0
>Reporter: James Cheng
> Attachments: Screen Shot 2017-02-24 at 12.02.29 PM.png
>
>
> The javadocs on kafka.apache.org are missing for 0.10.2
> The following link yields a 404:
> http://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> If you look in http://kafka.apache.org/0102/, there is no javadoc directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2585: MINOR: Fix potential integer overflow and String.f...

2017-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2585


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-24 Thread Joel Koshy
+1

Apurva just updated the wiki after discussion on the doc. These aren't
major so if anyone needs a summary just view the recent edits on the wiki
or comments in the google doc (although I think you need to request
permissions for that).

Thanks,

Joel

On Fri, Feb 24, 2017 at 11:50 AM, Sriram Subramanian 
wrote:

> +1. Great work in driving this to a consensus. Lots of good constructive
> conversations.
>
> On Fri, Feb 24, 2017 at 11:48 AM, Jason Gustafson 
> wrote:
>
> > +1 from me (duh). Thanks to all the reviewers. The KIP has been much
> > improved because of it!
> >
> > -Jason
> >
> > On Wed, Feb 22, 2017 at 8:48 AM, Ismael Juma  wrote:
> >
> > > Great work on the proposal and iterating on it based on community
> > feedback.
> > > As Jun (and others) said, it's likely that minor changes will happen as
> > the
> > > PR is reviewed and additional testing takes place since this is a
> > > significant change.
> > >
> > > I am +1 (binding) on the proposal without optional keys and values to
> > keep
> > > things consistent. If we show during performance testing that this is
> > > worthwhile, we can update the proposal.
> > >
> > > Ismael
> > >
> > > On Tue, Feb 21, 2017 at 6:23 PM, Jun Rao  wrote:
> > >
> > > > It seems that it's simpler and more consistent to avoid optional keys
> > and
> > > > values. Not sure if it's worth squeezing every byte at the expense of
> > > > additional complexity. Other than that, +1 from me.
> > > >
> > > > Also, since this is a large KIP, minor changes may arise as we start
> > the
> > > > implementation. It would be good if we can keep the community posted
> of
> > > > those changes, if any.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Tue, Feb 21, 2017 at 4:33 PM, Michael Pearce <
> michael.pea...@ig.com
> > >
> > > > wrote:
> > > >
> > > > > If the argument and objective within this KIP is to keep the
> overhead
> > > of
> > > > > the protocol as small as possible and remove redundancy, and every
> > byte
> > > > is
> > > > > being counted and the introduction of varInts, then it would make
> > sense
> > > > to
> > > > > use attributes to me.
> > > > >
> > > > >
> > > > > On 22/02/2017, 00:14, "Jason Gustafson" 
> wrote:
> > > > >
> > > > > Done. I've left the key and value as optional since we may not
> > have
> > > > > reached
> > > > > consensus on whether to use attributes or not. Perhaps we
> should
> > > just
> > > > > keep
> > > > > it simple and not do it? The benefit seems small.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Tue, Feb 21, 2017 at 4:05 PM, Michael Pearce <
> > > > michael.pea...@ig.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Ok, no worries, can you add it back ValueLen on this KIP, and
> > > > update
> > > > > the
> > > > > > doc, then we can work from that ☺
> > > > > >
> > > > > > Cheers
> > > > > > Mike
> > > > > >
> > > > > > On 22/02/2017, 00:02, "Jason Gustafson" 
> > > > wrote:
> > > > > >
> > > > > > I feel it was a little odd to leave out the value length
> > > > anyway,
> > > > > so I
> > > > > > would
> > > > > > rather add it back and put headers at the end. This is
> more
> > > > > consistent
> > > > > > with
> > > > > > the rest of the Kafka protocol.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Tue, Feb 21, 2017 at 3:58 PM, Michael Pearce <
> > > > > michael.pea...@ig.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Or we keep as is (valuelen removed), and headers are
> > added
> > > > with
> > > > > > headers
> > > > > > > length..
> > > > > > >
> > > > > > > On 21/02/2017, 23:38, "Apurva Mehta" <
> > apu...@confluent.io>
> > > > > wrote:
> > > > > > >
> > > > > > > Right now, we don't need the value length: since it
> > is
> > > > the
> > > > > last
> > > > > > item
> > > > > > > in the
> > > > > > > message, and we have the message length, we can
> > deduce
> > > > the
> > > > > value
> > > > > > > length.
> > > > > > > However, if we are adding record headers to the
> end,
> > we
> > > > > would
> > > > > > need to
> > > > > > > introduce the value length along with that change.
> > > > > > >
> > > > > > > On Tue, Feb 21, 2017 at 3:34 PM, Michael Pearce <
> > > > > > michael.pea...@ig.com
> > > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > It seems I cannot add comment on the doc.
> > > > > > > >
> > > > > > > > In the section around the message protocol.
> > > > > > > >
> > > > > > > > It has stated:
> > > > > > > >
> > > > > > > > Message =>
> > > > > > 

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-24 Thread Michael Pearce
KIP updated in response to the below comments:

   > 1. Is the intent of `Headers.filter` to include or exclude the headers
> matching the key? Can you add a javadoc to clarify?
> 2. The KIP mentions that we will introduce V4 of FetchRequest and V4 of
> ProduceRequest. Can you change this to say that the changes will piggyback
> onto V3 of ProduceRequest and V4 of FetchRequest which were introduced in
> KIP-98?
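
For readers following along, a rough sketch of the shape of the API under 
discussion (hypothetical signatures, not the final KIP-82 interface; filter 
here is read as returning only the matching headers, per the clarification 
asked for above):

// Hypothetical sketch, not the final KIP-82 API. Methods return a new Headers
// instance rather than mutating, matching the immutability compromise
// discussed in this thread.
interface Header {
    String key();
    byte[] value();
}

public interface Headers extends Iterable<Header> {
    // Append a header; returns a new Headers with the entry added.
    Headers append(String key, byte[] value);

    // Remove all headers matching the key; returns a new Headers.
    Headers remove(String key);

    // Return only the headers whose key equals the given key.
    Headers filter(String key);
}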




On 24/02/2017, 23:20, "Michael Pearce"  wrote:

We’re trying to make an eco-system for people to be able to use headers, so I 
think we want to ensure some least common features are supported and not 
limited.


Some examples we have already.

On consume interceptors, a security interceptor may need to take the current 
header, decrypt the data, and replace the token with the next token for the 
next stage of processing, in case a single decryption token is one-time use 
only.

On produce, it could be that the interceptors add some values in the clear 
from the systems that supply them, but later a security header interceptor 
needs to encrypt some headers, and as such needs to replace the current value 
with a new one.

I note Radai already requested this in the thread, so I assume he has some 
use case also.

Simple add / remove is a least common feature.

Rgds,
Mike


On 24/02/2017, 23:00, "Jason Gustafson"  wrote:

Hey Michael,

I'm not strongly opposed to them; I just don't see a lot of benefit. One
thing it would be good to understand is why a consumer interceptor would
need to add headers and why a producer interceptor would need to remove
them. Maybe we only need the common cases?

Thanks,
Jason

On Fri, Feb 24, 2017 at 2:22 PM, Michael Pearce 
wrote:

> Hi Jason,
>
> Sorry I thought this was the agreed compromise to provide an api that
> avoids boilerplate in return for immutability.
>
> If not then mutability will be needed, as a goal is to have a single clean
> method call to append/remove a header.
>
> Cheers
> Mike
>
> On 24/02/2017, 22:15, "Jason Gustafson"  wrote:
>
> Hey Michael,
>
> I didn't actually comment on the new methods for ProducerRecord 
and
> ConsumerRecord. If they only save some boilerplate, I'd just as 
well
> not
> have them.
>
> Also a couple minor comments:
>
> 1. Is the intent of `Headers.filter` to include or exclude the 
headers
> matching the key? Can you add a javadoc to clarify?
> 2. The KIP mentions that we will introduce V4 of FetchRequest and 
V4 of
> ProduceRequest. Can you change this to say that the changes will
> piggyback
> onto V3 of ProduceRequest and V4 of FetchRequest which were 
introduced
> in
> KIP-98?
>
> The rest of the KIP looks good to me.
>
> -Jason
>
> On Fri, Feb 24, 2017 at 12:46 PM, Michael Pearce <
> michael.pea...@ig.com>
> wrote:
>
> > I’ve added the methods on the ProducerRecord that will return a 
new
> > instance of ProducerRecord with modified headers.
> >
> > On 24/02/2017, 19:22, "Michael Pearce" 
> wrote:
> >
> > Pattern.compile is expensive, and even if cached 
String.equals is
> > faster than matched. also if we end up with an internal map in
> future for
> > performance it will be easier to be by key.
> >
> > As all that's needed is to get header by key.
> >
> > With like the other arguements of let's implement simple and
> then we
> > can always add pattern later as well if it's found it's needed. 
(As
> noted
> > it's easier to add methods than to take away)
> >
> > Great I'll update kip with extra methods on producerecord 
and a
> note
> > that new objects are returned by method calls.
> >
> >
> >
> > Sent using OWA for iPhone
> > 
> > From: Jason Gustafson 
> > Sent: Friday, February 24, 2017 6:51:45 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> >
> > The APIs in the current KIP look good to me. Just a couple
> questions:

[GitHub] kafka pull request #2596: MINOR: Ensure consumer calls poll() if requests ar...

2017-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2596


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-24 Thread Becket Qin
Thanks for the KIP, Jorge. I think this is a useful KIP. I haven't read the
KIP in detail yet; some comments from a quick review:

1. At a glance, it seems that there is no delete option. At LinkedIn we
identified some cases where users want to delete the committed offset of a
group. It would be good to include that as well.

2. It seems the KIP is missing some necessary implementation key points,
e.g. how would the tool commit offsets for a consumer group; does the
broker need to know this is a special tool instead of an active consumer in
the group (the generation check will be made on offset commit)? They are
probably in your proof-of-concept code. Could you add them to the wiki as
well?
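
For illustration, one plausible mechanism (an assumption, not necessarily what 
the KIP proposes) would be for the tool to act as a plain consumer with the 
target group.id and commit via commitSync, along these lines:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ResetOffsetsSketch {
    public static void main(String[] args) {
        // Sketch only: reset partition 0 of topic "t1" for group "my-group".
        // Topic, group, and target offset are made-up examples.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "my-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("t1", 0);
            // Manual assignment does not join the group, so there is no
            // generation check; the commit succeeds while the group is empty.
            consumer.assign(Collections.singletonList(tp));
            Map<TopicPartition, OffsetAndMetadata> offsets =
                Collections.singletonMap(tp, new OffsetAndMetadata(42L)); // arbitrary target
            consumer.commitSync(offsets);
        }
    }
}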

Thanks,

Jiangjie (Becket) Qin

On Fri, Feb 24, 2017 at 1:19 PM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> Thanks Jorge for addressing my question/suggestion.
>
> One last thing. I noticed is that in the example you have for the "plan"
> option
> (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 122%3A+Add+Reset+Consumer+Group+Offsets+tooling#KIP-122:
> AddResetConsumerGroupOffsetstooling-ExecutionOptions
> )
> under "Description" column, you put 0 for lag. So I assume that is the
> current lag being reported, and not the new lag. Might be helpful to
> explicitly specify that (i.e. CURRENT-LAG) in the column header.
> The other option is to report both current and new lags, but I understand
> if we don't want to do that since it's rather redundant info.
>
> Thanks again.
> --Vahid
>
>
>
> From:   Jorge Esteban Quilcate Otoya 
> To: dev@kafka.apache.org
> Date:   02/24/2017 12:47 PM
> Subject:Re: KIP-122: Add a tool to Reset Consumer Group Offsets
>
>
>
> Hi Vahid,
>
> Thanks for your comments. Check my answers below:
>
> El vie., 24 feb. 2017 a las 19:41, Vahid S Hashemian (<
> vahidhashem...@us.ibm.com>) escribió:
>
> > Hi Jorge,
> >
> > Thanks for the useful KIP.
> >
> > I have a question regarding the proposed "plan" option.
> > The "current offset" and "lag" values of a topic partition are
> meaningful
> > within a consumer group. In other words, different consumer groups could
> > have different values for these properties of each topic partition.
> > I don't see that reflected in the discussion around the "plan" option.
> > Unless we are assuming a "--group" option is also provided by user
> (which
> > is not clear from the KIP if that is the case).
> >
>
> I have added an additional comment to state that this option will require
> a "group" argument.
> It is considered to affect only one Consumer Group.
>
>
> >
> > Also, I was wondering if you can provide at least one full command
> example
> > for each of the "plan", "execute", and "export" options. They would
> > definitely help in understanding some of the details.
> >
> >
> Added to the KIP.
>
>
> > Sorry for the delayed question/suggestion. I hope they make sense.
> >
> > Thanks.
> > --Vahid
> >
> >
> >
> > From:   Jorge Esteban Quilcate Otoya 
> > To: dev@kafka.apache.org
> > Date:   02/24/2017 09:51 AM
> > Subject:Re: KIP-122: Add a tool to Reset Consumer Group Offsets
> >
> >
> >
> > Great! KIP updated.
> >
> >
> >
> > El vie., 24 feb. 2017 a las 18:22, Matthias J. Sax
> > ()
> > escribió:
> >
> > > I like this!
> > >
> > > --by-duration and --shift-by
> > >
> > >
> > > -Matthias
> > >
> > > On 2/24/17 12:57 AM, Jorge Esteban Quilcate Otoya wrote:
> > > > Renaming to --by-duration LGTM
> > > >
> > > > Not sure about changing it to --shift-by-duration because we could
> end
> > up
> > > > with the same redundancy as before with reset: --reset-offsets
> > > > --reset-to-*.
> > > >
> > > > Maybe changing --shift-offset-by to --shift-by 'n' could make it
> > > consistent
> > > > enough?
> > > >
> > > >
> > > > El vie., 24 feb. 2017 a las 6:39, Matthias J. Sax (<
> > > matth...@confluent.io>)
> > > > escribió:
> > > >
> > > >> I just read the update KIP once more.
> > > >>
> > > >> I would suggest to rename --to-duration to --by-duration
> > > >>
> > > >> Or as a second idea, rename --to-duration to --shift-by-duration
> and
> > at
> > > >> the same time rename --shift-offset-by to --shift-by-offset
> > > >>
> > > >> Not sure what the best option is, but naming would be more
> consistent
> > > IMHO.
> > > >>
> > > >>
> > > >>
> > > >> -Matthias
> > > >>
> > > >> On 2/23/17 4:42 PM, Jorge Esteban Quilcate Otoya wrote:
> > > >>> Hi All,
> > > >>>
> > > >>> If there are no more concerns, I'd like to start vote for this
> KIP.
> > > >>>
> > > >>> Thanks!
> > > >>> Jorge.
> > > >>>
> > > >>> El jue., 23 feb. 2017 a las 22:50, Jorge Esteban Quilcate Otoya (<
> > > >>> quilcate.jo...@gmail.com>) escribió:
> > > >>>
> > >  Oh ok :)
> > > 
> > >  So, we can keep `--topic t1:1,2,3`
> > > 
> > >  I think with this one we have most of the feedback applied. I
> will
> > > >> update
> > >  the KIP with this change.
> > > 

[GitHub] kafka pull request #2579: MINOR: Make it impossible to invoke `Request.body`...

2017-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2579


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-24 Thread Michael Pearce
We’re trying to make an eco-system for people to be able to use headers, so I 
think we want to ensure some least common features are supported and not 
limited.


Some examples we have already.

On consume interceptors, a security interceptor may need to take the current 
header, decrypt the data, and replace the token with the next token for the 
next stage of processing, in case a single decryption token is one-time use 
only.

On produce, it could be that the interceptors add some values in the clear 
from the systems that supply them, but later a security header interceptor 
needs to encrypt some headers, and as such needs to replace the current value 
with a new one.

I note Radai already requested this in the thread, so I assume he has some 
use case also.

Simple add / remove is a least common feature.
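
As a concrete illustration of the consume-side case, a sketch against purely 
hypothetical Headers and SecurityService types (none of this is the actual 
KIP-82 API):

// Sketch of the consume-side security use case: decrypt with a one-time
// token, then replace the token for the next processor in the chain.
public class TokenRotationSketch {

    interface Headers {
        byte[] lastValue(String key);             // hypothetical accessor
        Headers remove(String key);               // returns a new Headers
        Headers append(String key, byte[] value); // returns a new Headers
    }

    interface SecurityService {                   // hypothetical service
        byte[] decrypt(byte[] payload, byte[] oneTimeToken);
        byte[] nextToken(byte[] usedToken);
    }

    static Headers rotateToken(Headers headers, SecurityService security) {
        byte[] usedToken = headers.lastValue("auth-token");
        // The token is single-use, so the interceptor must be able to remove
        // the old header and append a fresh one.
        return headers.remove("auth-token")
                      .append("auth-token", security.nextToken(usedToken));
    }
}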

Rgds,
Mike


On 24/02/2017, 23:00, "Jason Gustafson"  wrote:

Hey Michael,

I'm not strongly opposed to them; I just don't see a lot of benefit. One
thing it would be good to understand is why a consumer interceptor would
need to add headers and why a producer interceptor would need to remove
them. Maybe we only need the common cases?

Thanks,
Jason

On Fri, Feb 24, 2017 at 2:22 PM, Michael Pearce 
wrote:

> Hi Jason,
>
> Sorry I thought this was the agreed compromise to provide an api that
> avoids boilerplate in return for immutability.
>
> If not then mutability will be needed, as a goal is to have a single clean
> method call to append/remove a header.
>
> Cheers
> Mike
>
> On 24/02/2017, 22:15, "Jason Gustafson"  wrote:
>
> Hey Michael,
>
> I didn't actually comment on the new methods for ProducerRecord and
> ConsumerRecord. If they only save some boilerplate, I'd just as well
> not
> have them.
>
> Also a couple minor comments:
>
> 1. Is the intent of `Headers.filter` to include or exclude the headers
> matching the key? Can you add a javadoc to clarify?
> 2. The KIP mentions that we will introduce V4 of FetchRequest and V4 
of
> ProduceRequest. Can you change this to say that the changes will
> piggyback
> onto V3 of ProduceRequest and V4 of FetchRequest which were introduced
> in
> KIP-98?
>
> The rest of the KIP looks good to me.
>
> -Jason
>
> On Fri, Feb 24, 2017 at 12:46 PM, Michael Pearce <
> michael.pea...@ig.com>
> wrote:
>
> > I’ve added the methods on the ProducerRecord that will return a new
> > instance of ProducerRecord with modified headers.
> >
> > On 24/02/2017, 19:22, "Michael Pearce" 
> wrote:
> >
> > Pattern.compile is expensive, and even if cached String.equals 
is
> > faster than matched. also if we end up with an internal map in
> future for
> > performance it will be easier to be by key.
> >
> > As all that's needed is to get header by key.
> >
> > With like the other arguements of let's implement simple and
> then we
> > can always add pattern later as well if it's found it's needed. (As
> noted
> > it's easier to add methods than to take away)
> >
> > Great I'll update kip with extra methods on producerecord and a
> note
> > that new objects are returned by method calls.
> >
> >
> >
> > Sent using OWA for iPhone
> > 
> > From: Jason Gustafson 
> > Sent: Friday, February 24, 2017 6:51:45 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> >
> > The APIs in the current KIP look good to me. Just a couple
> questions:
> > why
> > does filter not return Headers? Also would it be useful if the
> key is a
> > regex?
> >
> > On the point of immutability.. One option might be to use a
> mutable
> > object
> > only when passing the headers through the interceptor chain. I
> think as
> > long as we resort to mutability only when clear performance
> results
> > show
> > that it is worthwhile, I am satisfied. As Ismael noted, for
> common
> > scenarios it is possible to get reasonable performance with
> immutable
> > objects.
> >
> > -Jason
> >
> > On Fri, Feb 24, 2017 at 8:48 AM, Michael Pearce <
> michael.pea...@ig.com
> > >
> > wrote:
> >
> > > Hi
> > >
> > > On 1,  How can you guarantee two separate implemented clients
> would
  

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-24 Thread Becket Qin
Hi Jay,

Yeah, I got your point.

I think there might be a solution which does not require adding a new
configuration. We can start from a very conservative compression ratio, say
1.0, and lower it very slowly according to the actual compression ratio
until we hit a point where we have to split a batch. At that point, we
exponentially back off on the compression ratio. The idea is somewhat like
TCP. This should help avoid frequent splits.
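
A minimal sketch of that estimator (the constants are assumed for
illustration; this is not the actual producer code):

// TCP-like estimation as described above: start conservative at 1.0, creep
// down toward the observed ratio, and back off multiplicatively whenever a
// batch had to be split. STEP and BACKOFF_FACTOR are assumed values.
public class CompressionRatioEstimator {
    private static final double STEP = 0.005;          // slow additive decrease
    private static final double BACKOFF_FACTOR = 1.5;  // backoff applied on each split
    private static final double MAX_RATIO = 1.0;

    private double estimate = MAX_RATIO; // conservative start: assume no compression

    // Called with the observed compressed/uncompressed ratio of a finished batch.
    public void observe(double actualRatio) {
        if (actualRatio < estimate) {
            estimate = Math.max(actualRatio, estimate - STEP);
        }
    }

    // Called when a batch turned out oversized and had to be split.
    public void onSplit() {
        estimate = Math.min(MAX_RATIO, estimate * BACKOFF_FACTOR);
    }

    public double estimate() {
        return estimate;
    }
}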

The upper bound of the batch size is also a little awkward today because we
say the batch size is based on compressed size, but users cannot set it to
the max message size because that would result in oversized messages. With
this change we will be able to allow users to set the batch size close to
the max message size.

However, the downside is that there could be latency spikes in the system in
this case due to the splitting, especially when many messages need to be
split at the same time. That could potentially be an issue for some users.

What do you think about this approach?

Thanks,

Jiangjie (Becket) Qin



On Thu, Feb 23, 2017 at 1:31 PM, Jay Kreps  wrote:

> Hey Becket,
>
> Yeah that makes sense.
>
> I agree that you'd really have to both fix the estimation (i.e. make it per
> topic or make it better estimate the high percentiles) AND have the
> recovery mechanism. If you are underestimating often and then paying a high
> recovery price that won't fly.
>
> I think you take my main point though, which is just that I hate to expose
> these super low level options to users because it is so hard to explain to
> people what it means and how they should set it. So if it is possible to
> make either some combination of better estimation and splitting or better
> tolerance of overage, that would be preferable.
>
> -Jay
>
> On Thu, Feb 23, 2017 at 11:51 AM, Becket Qin  wrote:
>
> > @Dong,
> >
> > Thanks for the comments. The default behavior of the producer won't change.
> > If the users want to use the uncompressed message size, they probably will
> > also bump up the batch size to somewhere close to the max message size.
> > This would be in the document. BTW the default batch size is 16K which is
> > pretty small.
> >
> > @Jay,
> >
> > Yeah, we actually had debated quite a bit internally what is the best
> > solution to this.
> >
> > I completely agree it is a bug. In practice we usually leave some headroom
> > to allow the compressed size to grow a little if the original messages
> > are not compressible, for example, 1000 KB instead of exactly 1 MB. It is
> > likely safe enough.
> >
> > The major concern for the rejected alternative is performance. It largely
> > depends on how frequently we need to split a batch, i.e. how likely the
> > estimation can go off. If we only need to do the split work occasionally,
> > the cost would be amortized so we don't need to worry about it too much.
> > However, it looks like for a producer with shared topics, the estimation
> > is always off. As an example, consider two topics, one with compression
> > ratio 0.6 and the other 0.2. Assuming exactly the same traffic, the average
> > compression ratio would be roughly 0.4, which is not right for either of
> > the topics. So almost half of the batches (of the topics with 0.6
> > compression ratio) will end up larger than the configured batch size. When
> > it comes to more topics, such as mirror maker, this becomes more
> > unpredictable. To avoid frequent rejection / splitting of the batches, we
> > need to configure the batch size pretty conservatively. This could actually
> > hurt the performance because we are shoehorning the messages that are
> > highly compressible into a small batch so that the other topics that are
> > not that compressible will not become too large with the same batch size.
> > At LinkedIn, our batch size is configured to 64 KB because of this. I think
> > we may actually have better batching if we just use the uncompressed
> > message size and 800 KB batch size.
> >
> > We did not think about loosening the message size restriction, but that
> > sounds like a viable solution given that the consumer can now fetch
> > oversized messages. One concern would be that on the broker side oversized
> > messages will bring more memory pressure. With KIP-92, we may mitigate
> > that, but the memory allocation for large messages may not be very GC
> > friendly. I need to think about this a little more.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Wed, Feb 22, 2017 at 8:57 PM, Jay Kreps  wrote:
> >
> > > Hey Becket,
> > >
> > > I get the problem we want to solve with this, but I don't think this is
> > > something that makes sense as a user-controlled knob that everyone
> > > sending data to Kafka has to think about. It is basically a bug, right?
> > >
> > > First, as a technical question, is it true that using the uncompressed
> > > size for batching actually guarantees that 

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-24 Thread Jason Gustafson
Hey Michael,

I'm not strongly opposed to them; I just don't see a lot of benefit. One
thing it would be good to understand is why a consumer interceptor would
need to add headers and why a producer interceptor would need to remove
them. Maybe we only need the common cases?

Thanks,
Jason

On Fri, Feb 24, 2017 at 2:22 PM, Michael Pearce 
wrote:

> Hi Jason,
>
> Sorry I thought this was the agreed compromise to provide an api that
> avoids boilerplate in return for immutability.
>
> If not then mutability will be needed, as a goal is to have a single clean
> method call to append/remove a header.
>
> Cheers
> Mike
>
> On 24/02/2017, 22:15, "Jason Gustafson"  wrote:
>
> Hey Michael,
>
> I didn't actually comment on the new methods for ProducerRecord and
> ConsumerRecord. If they only save some boilerplate, I'd just as well
> not
> have them.
>
> Also a couple minor comments:
>
> 1. Is the intent of `Headers.filter` to include or exclude the headers
> matching the key? Can you add a javadoc to clarify?
> 2. The KIP mentions that we will introduce V4 of FetchRequest and V4 of
> ProduceRequest. Can you change this to say that the changes will
> piggyback
> onto V3 of ProduceRequest and V4 of FetchRequest which were introduced
> in
> KIP-98?
>
> The rest of the KIP looks good to me.
>
> -Jason
>
> On Fri, Feb 24, 2017 at 12:46 PM, Michael Pearce <
> michael.pea...@ig.com>
> wrote:
>
> > I’ve added the methods on the ProducerRecord that will return a new
> > instance of ProducerRecord with modified headers.
> >
> > On 24/02/2017, 19:22, "Michael Pearce" 
> wrote:
> >
> > Pattern.compile is expensive, and even if cached String.equals is
> > faster than matched. also if we end up with an internal map in
> future for
> > performance it will be easier to be by key.
> >
> > As all that's needed is to get header by key.
> >
> > With like the other arguements of let's implement simple and
> then we
> > can always add pattern later as well if it's found it's needed. (As
> noted
> > it's easier to add methods than to take away)
> >
> > Great I'll update kip with extra methods on producerecord and a
> note
> > that new objects are returned by method calls.
> >
> >
> >
> > Sent using OWA for iPhone
> > 
> > From: Jason Gustafson 
> > Sent: Friday, February 24, 2017 6:51:45 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> >
> > The APIs in the current KIP look good to me. Just a couple
> questions:
> > why
> > does filter not return Headers? Also would it be useful if the
> key is a
> > regex?
> >
> > On the point of immutability.. One option might be to use a
> mutable
> > object
> > only when passing the headers through the interceptor chain. I
> think as
> > long as we resort to mutability only when clear performance
> results
> > show
> > that it is worthwhile, I am satisfied. As Ismael noted, for
> common
> > scenarios it is possible to get reasonable performance with
> immutable
> > objects.
> >
> > -Jason
> >
> > On Fri, Feb 24, 2017 at 8:48 AM, Michael Pearce <
> michael.pea...@ig.com
> > >
> > wrote:
> >
> > > Hi
> > >
> > > On 1,  How can you guarantee two separate implemented clients
> would
> > add
> > > the headers in the same order we are not specifying an order
> at the
> > > protocol level  (nor should we) with regards to keyA being
> ordered
> > before
> > > keyB? We shouldn’t be expecting keyA to be always set before
> keyB.
> > >
> > > On 2, I believe we have changed the naming based on feedback
> from
> > Jason
> > > already, e.g. we don’t have “get” method that inferred O(1)
> > performance,
> > > like wise nor “put” but we have an “append”
> > >
> > > On 3, in the KafkaProducer, I think we have mutability already: the
> > > value for time is changed if it is null, at the point of send:
> > > “
> > > long timestamp = record.timestamp() == null ?
> > > time.milliseconds() : record.timestamp();
> > > “
> > >
> > > As such the timestamp is already mutable, so what’s the difference
> > > here? We already have some mixed semantics on timestamp.
> > > E.g. currently if I send two records with the timestamp not set, the
> > > binary value sent on the wire for the timestamp would be different;
> > > as such we have mutation for the same record.
> > >
> > > On 4, I think we 

[GitHub] kafka pull request #2596: MINOR: Ensure consumer calls poll() if requests ar...

2017-02-24 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/2596

MINOR: Ensure consumer calls poll() if requests are outstanding



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka ensure-poll-with-inflight-requests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2596.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2596


commit d1fa6f272a182651587bd3c1393c9ea18d0c1d33
Author: Jason Gustafson 
Date:   2017-02-24T22:44:49Z

MINOR: Ensure consumer calls poll() if requests are outstanding




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2582: MINOR: Fixed Non-Final Close Method + its Duplicat...

2017-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2582


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4798) Javadocs for 0.10.2 are missing from kafka.apache.org

2017-02-24 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883632#comment-15883632
 ] 

James Cheng commented on KAFKA-4798:


And [~guozhang]. See my comment about the SNAPSHOT thing.

If that is fine, then let me know and I will close this JIRA.


> Javadocs for 0.10.2 are missing from kafka.apache.org
> -
>
> Key: KAFKA-4798
> URL: https://issues.apache.org/jira/browse/KAFKA-4798
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.2.0
>Reporter: James Cheng
> Attachments: Screen Shot 2017-02-24 at 12.02.29 PM.png
>
>
> The javadocs on kafka.apache.org are missing for 0.10.2
> The following link yields a 404:
> http://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> If you look in http://kafka.apache.org/0102/, there is no javadoc directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-3995) Add a new configuration "enable.compression.ratio.estimation" to the producer config

2017-02-24 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-3995:

Summary: Add a new configuration "enable.compression.ratio.estimation" to 
the producer config  (was: Add a new configuration 
"enable.comrpession.ratio.estimation" to the producer config)

> Add a new configuration "enable.compression.ratio.estimation" to the producer 
> config
> 
>
> Key: KAFKA-3995
> URL: https://issues.apache.org/jira/browse/KAFKA-3995
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Jiangjie Qin
>Assignee: Mayuresh Gharat
>
> We recently see a few cases where RecordTooLargeException is thrown because 
> the compressed message sent by KafkaProducer exceeded the max message size.
> The root cause of this issue is because the compressor is estimating the 
> batch size using an estimated compression ratio based on heuristic 
> compression ratio statistics. This does not quite work for the traffic with 
> highly variable compression ratios. 
> For example, suppose the batch size is set to 1MB and the max message size is
> 1MB. Initially the producer is sending messages (each message is 1MB) to topic_1
> whose data can be compressed to 1/10 of the original size. After a while the
> estimated compression ratio in the compressor will be trained to 1/10 and the
> producer would put 10 messages into one batch. Now the producer starts to
> send messages (each message is also 1MB) to topic_2 whose messages can only be
> compressed to 1/5 of the original size. The producer would still use 1/10 as
> the estimated compression ratio and put 10 messages into a batch. That batch
> would be 2MB after compression, which exceeds the maximum message size. In
> this case the user does not have many options other than resending everything
> or closing the producer if they care about ordering.
> This is especially an issue for services like MirrorMaker whose producer is 
> shared by many different topics.
> To solve this issue, we can probably add a configuration 
> "enable.compression.ratio.estimation" to the producer. So when this 
> configuration is set to false, we stop estimating the compressed size but 
> will close the batch once the uncompressed bytes in the batch reach the 
> batch size.
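
For illustration, a minimal sketch of how the proposed switch might be used
from a producer client. Note that the key "enable.compression.ratio.estimation"
is only proposed in this JIRA and does not exist in any released producer, so
the config name here is an assumption:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.ByteArraySerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.ByteArraySerializer");
    props.put("compression.type", "gzip");
    props.put("batch.size", 1 << 20); // 1MB batches, as in the example above
    // Proposed switch: when false, stop estimating the compressed size and
    // close a batch once its uncompressed bytes reach batch.size.
    props.put("enable.compression.ratio.estimation", "false");
    Producer<byte[], byte[]> producer = new KafkaProducer<>(props);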



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-24 Thread Michael Pearce
Hi Jason,

Sorry, I thought this was the agreed compromise: to provide an API that avoids 
boilerplate in return for immutability.

If not, then mutability will be needed, as a goal is to have a single clean 
method call to append/remove a header.

Cheers
Mike

On 24/02/2017, 22:15, "Jason Gustafson"  wrote:

Hey Michael,

I didn't actually comment on the new methods for ProducerRecord and
ConsumerRecord. If they only save some boilerplate, I'd just as well not
have them.

Also a couple minor comments:

1. Is the intent of `Headers.filter` to include or exclude the headers
matching the key? Can you add a javadoc to clarify?
2. The KIP mentions that we will introduce V4 of FetchRequest and V4 of
ProduceRequest. Can you change this to say that the changes will piggyback
onto V3 of ProduceRequest and V4 of FetchRequest which were introduced in
KIP-98?

The rest of the KIP looks good to me.

-Jason

On Fri, Feb 24, 2017 at 12:46 PM, Michael Pearce 
wrote:

> I’ve added the methods on the ProducerRecord that will return a new
> instance of ProducerRecord with modified headers.
>
> On 24/02/2017, 19:22, "Michael Pearce"  wrote:
>
> Pattern.compile is expensive, and even if cached, String.equals is
> faster than regex matching. Also, if we end up with an internal map in future
> for performance, it will be easier to look up by key.
>
> All that's needed is to get a header by key.
>
> As with the other arguments: let's implement the simple case, and we
> can always add pattern matching later if it's found to be needed. (As noted,
> it's easier to add methods than to take them away.)
>
> Great, I'll update the KIP with extra methods on ProducerRecord and a note
> that new objects are returned by method calls.
>
>
>
> Sent using OWA for iPhone
> 
> From: Jason Gustafson 
> Sent: Friday, February 24, 2017 6:51:45 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> The APIs in the current KIP look good to me. Just a couple questions: why
> does filter not return Headers? Also would it be useful if the key is a
> regex?
>
> On the point of immutability... One option might be to use a mutable
> object only when passing the headers through the interceptor chain. I
> think as long as we resort to mutability only when clear performance
> results show that it is worthwhile, I am satisfied. As Ismael noted, for
> common scenarios it is possible to get reasonable performance with
> immutable objects.
>
> -Jason
>
> On Fri, Feb 24, 2017 at 8:48 AM, Michael Pearce  >
> wrote:
>
> > Hi
> >
> > On 1, how can you guarantee two separately implemented clients would
> > add the headers in the same order? We are not specifying an order at
> > the protocol level (nor should we) with regard to keyA being ordered
> > before keyB. We shouldn’t expect keyA to always be set before keyB.
> >
> > On 2, I believe we have already changed the naming based on feedback
> > from Jason; e.g. we don’t have a “get” method, which would imply O(1)
> > performance, nor likewise a “put”, but we have an “append”.
> >
> > On 3, in the KafkaProducer, I think we have mutability already: the
> > value for time is changed if it is null, at the point of send:
> > “
> > long timestamp = record.timestamp() == null ?
> > time.milliseconds() : record.timestamp();
> > “
> >
> > As such the timestamp is already mutable, so what’s the difference
> > here? We already have some mixed semantics on timestamp.
> > E.g. currently if I send two records with the timestamp not set, the
> > binary value sent on the wire for the timestamp would be different;
> > as such we have mutation for the same record.
> >
> > On 4, I think we should expect not 1 or 2 headers, but in fact tens of
> > headers. This is the concern with immutable headers: whilst the append
> > self-reference works nicely, what if someone needs to remove a header?
> >
> > Trying to get this moving:
> >
> > Suppose we really want immutable Headers (and essentially you guys
> > won’t give +1 without it).
> >
> > What’s the feeling on adding methods to ProducerRecord that do the
> > boilerplate work, creating a new ProducerRecord with the altered new
> > headers (appended or 

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-24 Thread Jason Gustafson
Hey Michael,

I didn't actually comment on the new methods for ProducerRecord and
ConsumerRecord. If they only save some boilerplate, I'd just as well not
have them.

Also a couple minor comments:

1. Is the intent of `Headers.filter` to include or exclude the headers
matching the key? Can you add a javadoc to clarify?
2. The KIP mentions that we will introduce V4 of FetchRequest and V4 of
ProduceRequest. Can you change this to say that the changes will piggyback
onto V3 of ProduceRequest and V4 of FetchRequest which were introduced in
KIP-98?

The rest of the KIP looks good to me.

-Jason
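
To make the first question concrete, here is one possible reading of
Headers.filter, written as a sketch to be confirmed or corrected in the KIP's
javadoc. `Header` and `Headers` are the KIP's proposed types, and the
include-vs-exclude choice and return type shown are assumptions; pinning these
down is exactly what the javadoc request above asks for:

    public interface Headers extends Iterable<Header> {

        /**
         * One possible contract: returns a new Headers containing only the
         * headers whose key equals the given key (i.e. filter INCLUDES
         * matches). Returning Headers rather than a plain collection would
         * also allow calls to be chained.
         */
        Headers filter(String key);
    }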

On Fri, Feb 24, 2017 at 12:46 PM, Michael Pearce 
wrote:

> I’ve added the methods on the ProducerRecord that will return a new
> instance of ProducerRecord with modified headers.
>
> On 24/02/2017, 19:22, "Michael Pearce"  wrote:
>
> Pattern.compile is expensive, and even if cached String.equals is
> faster than matched. also if we end up with an internal map in future for
> performance it will be easier to be by key.
>
> As all that's needed is to get header by key.
>
> With like the other arguements of let's implement simple and then we
> can always add pattern later as well if it's found it's needed. (As noted
> it's easier to add methods than to take away)
>
> Great I'll update kip with extra methods on producerecord and a note
> that new objects are returned by method calls.
>
>
>
> Sent using OWA for iPhone
> 
> From: Jason Gustafson 
> Sent: Friday, February 24, 2017 6:51:45 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> The APIs in the current KIP look good to me. Just a couple questions:
> why
> does filter not return Headers? Also would it be useful if the key is a
> regex?
>
> On the point of immutability.. One option might be to use a mutable
> object
> only when passing the headers through the interceptor chain. I think as
> long as we resort to mutability only when clear performance results
> show
> that it is worthwhile, I am satisfied. As Ismael noted, for common
> scenarios it is possible to get reasonable performance with immutable
> objects.
>
> -Jason
>
> On Fri, Feb 24, 2017 at 8:48 AM, Michael Pearce  >
> wrote:
>
> > Hi
> >
> > On 1, how can you guarantee two separately implemented clients would
> > add the headers in the same order? We are not specifying an order at
> > the protocol level (nor should we) with regard to keyA being ordered
> > before keyB. We shouldn’t expect keyA to always be set before keyB.
> >
> > On 2, I believe we have already changed the naming based on feedback
> > from Jason; e.g. we don’t have a “get” method, which would imply O(1)
> > performance, nor likewise a “put”, but we have an “append”.
> >
> > On 3, in the KafkaProducer, I think we have mutability already: the
> > value for time is changed if it is null, at the point of send:
> > “
> > long timestamp = record.timestamp() == null ?
> > time.milliseconds() : record.timestamp();
> > “
> >
> > As such the timestamp is already mutable, so what’s the difference
> > here? We already have some mixed semantics on timestamp.
> > E.g. currently if I send two records with the timestamp not set, the
> > binary value sent on the wire for the timestamp would be different;
> > as such we have mutation for the same record.
> >
> > On 4, I think we should expect not 1 or 2 headers, but in fact tens of
> > headers. This is the concern with immutable headers: whilst the append
> > self-reference works nicely, what if someone needs to remove a header?
> >
> > Trying to get this moving:
> >
> > Suppose we really want immutable Headers (and essentially you guys
> > won’t give +1 without it).
> >
> > What’s the feeling on adding methods to ProducerRecord that do the
> > boilerplate work, creating a new ProducerRecord with the altered new
> > headers (appended or removed) inside? E.g.
> >
> > ProducerRecord {
> >
> >   ProducerRecord append(Iterable<Header> headersToAppend) {
> >     return new ProducerRecord(key, value,
> >         headers.append(headersToAppend), ….)
> >   }
> >
> >   ProducerRecord remove(Iterable<Header> headersToRemove) {
> >     return new ProducerRecord(key, value,
> >         headers.remove(headersToRemove), ….)
> >   }
> >
> > }
> >
> > Where the Headers methods actually return new objects, and the
> > ProducerRecord methods create a new ProducerRecord with all the
> > current values, but with the newly modified headers.
> >
> > Then interceptors / code return this new object?
> >
> >
> > Cheers
> 

Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-24 Thread Vahid S Hashemian
Thanks Jorge for addressing my question/suggestion.

One last thing I noticed is that in the example you have for the "plan" 
option
(
https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+Reset+Consumer+Group+Offsets+tooling#KIP-122:AddResetConsumerGroupOffsetstooling-ExecutionOptions
)
under "Description" column, you put 0 for lag. So I assume that is the 
current lag being reported, and not the new lag. Might be helpful to 
explicitly specify that (i.e. CURRENT-LAG) in the column header.
The other option is to report both current and new lags, but I understand 
if we don't want to do that since it's rather redundant info.
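
To illustrate the suggested rename (with made-up numbers; the real column set
is whatever the KIP finally defines), the "plan" output would then read
something like:

    TOPIC  PARTITION  CURRENT-OFFSET  NEW-OFFSET  CURRENT-LAG
    t1     0          150             100         0
    t1     1          300             280         5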

Thanks again.
--Vahid



From:   Jorge Esteban Quilcate Otoya 
To: dev@kafka.apache.org
Date:   02/24/2017 12:47 PM
Subject:Re: KIP-122: Add a tool to Reset Consumer Group Offsets



Hi Vahid,

Thanks for your comments. Check my answers below:

On Fri., 24 Feb 2017 at 19:41, Vahid S Hashemian (<
vahidhashem...@us.ibm.com>) wrote:

> Hi Jorge,
>
> Thanks for the useful KIP.
>
> I have a question regarding the proposed "plan" option.
> The "current offset" and "lag" values of a topic partition are 
meaningful
> within a consumer group. In other words, different consumer groups could
> have different values for these properties of each topic partition.
> I don't see that reflected in the discussion around the "plan" option.
> Unless we are assuming a "--group" option is also provided by the user
> (which is not clear from the KIP if that is the case).
>

I have added an additional comment to state that this option will require
a "group" argument.
It is considered to affect only one Consumer Group.


>
> Also, I was wondering if you can provide at least one full command
> example for each of the "plan", "execute", and "export" options. They would
> definitely help in understanding some of the details.
>
>
Added to the KIP.


> Sorry for the delayed question/suggestion. I hope they make sense.
>
> Thanks.
> --Vahid
>
>
>
> From:   Jorge Esteban Quilcate Otoya 
> To: dev@kafka.apache.org
> Date:   02/24/2017 09:51 AM
> Subject:Re: KIP-122: Add a tool to Reset Consumer Group Offsets
>
>
>
> Great! KIP updated.
>
>
>
> On Fri., 24 Feb 2017 at 18:22, Matthias J. Sax
> ()
> wrote:
>
> > I like this!
> >
> > --by-duration and --shift-by
> >
> >
> > -Matthias
> >
> > On 2/24/17 12:57 AM, Jorge Esteban Quilcate Otoya wrote:
> > > Renaming to --by-duration LGTM
> > >
> > > Not sure about changing it to --shift-by-duration because we could
> > > end up with the same redundancy as before with reset: --reset-offsets
> > > --reset-to-*.
> > >
> > > Maybe changing --shift-offset-by to --shift-by 'n' could make it
> > > consistent enough?
> > >
> > >
> > > On Fri., 24 Feb 2017 at 6:39, Matthias J. Sax (<
> > matth...@confluent.io>)
> > > wrote:
> > >
> > >> I just read the updated KIP once more.
> > >>
> > >> I would suggest renaming --to-duration to --by-duration
> > >>
> > >> Or as a second idea, rename --to-duration to --shift-by-duration and
> > >> at the same time rename --shift-offset-by to --shift-by-offset
> > >>
> > >> Not sure what the best option is, but naming would be more
> > >> consistent IMHO.
> > >>
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 2/23/17 4:42 PM, Jorge Esteban Quilcate Otoya wrote:
> > >>> Hi All,
> > >>>
> > >>> If there are no more concerns, I'd like to start the vote for this
> > >>> KIP.
> > >>>
> > >>> Thanks!
> > >>> Jorge.
> > >>>
> > >>> On Thu., 23 Feb 2017 at 22:50, Jorge Esteban Quilcate Otoya (<
> > >>> quilcate.jo...@gmail.com>) wrote:
> > >>>
> >  Oh ok :)
> > 
> >  So, we can keep `--topic t1:1,2,3`
> > 
> >  I think with this one we have most of the feedback applied. I will
> >  update the KIP with this change.
> > 
> >  On Thu., 23 Feb 2017 at 22:38, Matthias J. Sax (<
> > matth...@confluent.io>)
> >  wrote:
> > 
> >  Sounds reasonable.
> > 
> >  If we have multiple --topic arguments, it also does not matter
> >  whether we use t1:1,2 or t1=1,2
> > 
> >  I just suggested '=' because I wanted to use ':' to chain multiple
> >  topics.
> > 
> > 
> >  -Matthias
> > 
> >  On 2/23/17 10:49 AM, Jorge Esteban Quilcate Otoya wrote:
> > > Yeap, `--topic t1=1,2` LGTM
> > >
> > > I don't have an idea either about getting rid of the repeated
> > > --topic, but --group is also repeated in the case of deletion, so it
> > > could be OK to have repeated --topic arguments.
> > >
> > > On Thu., 23 Feb 2017 at 19:14, Matthias J. Sax (<
> >  matth...@confluent.io>)
> > > wrote:
> > >
> > >> So you suggest merging the "scope options" --topics, --topic, and
> > >> --partitions into a single option? Sounds good to me.
> > >>
> > >> I like the compact way to express it, ie,
> > 

Re: [ANNOUNCE] Apache Kafka 0.10.2.0 Released

2017-02-24 Thread Guozhang Wang
Hello folks,

Some people have reported that the java doc was missing for this release.
We have updated the web site including the java docs for 0102 just now.

Thanks,
Guozhang


On Wed, Feb 22, 2017 at 2:56 PM, James Cheng  wrote:

> Woohoo! Thanks for running the release, Ewen!
>
> -James
>
> > On Feb 22, 2017, at 12:33 AM, Ewen Cheslack-Postava 
> wrote:
> >
> > The Apache Kafka community is pleased to announce the release for Apache
> > Kafka 0.10.2.0. This is a feature release which includes the completion
> > of 15 KIPs, over 200 bug fixes and improvements, and more than 500 pull
> > requests merged.
> >
> > All of the changes in this release can be found in the release notes:
> > https://archive.apache.org/dist/kafka/0.10.2.0/RELEASE_NOTES.html
> >
> > Apache Kafka is a distributed streaming platform with four core
> > APIs:
> >
> > ** The Producer API allows an application to publish a stream of records to
> > one or more Kafka topics.
> >
> > ** The Consumer API allows an application to subscribe to one or more
> > topics and process the stream of records produced to them.
> >
> > ** The Streams API allows an application to act as a stream processor,
> > consuming an input stream from one or more topics and producing an
> > output
> > stream to one or more output topics, effectively transforming the input
> > streams to output streams.
> >
> > ** The Connector API allows building and running reusable producers or
> > consumers that connect Kafka topics to existing applications or data
> > systems. For example, a connector to a relational database might capture
> > every change to a table.
> >
> >
> > With these APIs, Kafka can be used for two broad classes of application:
> >
> > ** Building real-time streaming data pipelines that reliably get data
> > between systems or applications.
> >
> > ** Building real-time streaming applications that transform or react to
> > the
> > streams of data.
> >
> >
> > You can download the source release from
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.
> 0/kafka-0.10.2.0-src.tgz
> >
> > and binary releases from
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.
> 0/kafka_2.11-0.10.2.0.tgz
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.
> 0/kafka_2.10-0.10.2.0.tgz
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.
> 0/kafka_2.12-0.10.2.0.tgz
> > (experimental 2.12 artifact)
> >
> > Thanks to the 101 contributors on this release!
> >
> > Akash Sethi, Alex Loddengaard, Alexey Ozeritsky, amethystic, Andrea
> > Cosentino, Andrew Olson, Andrew Stevenson, Anton Karamanov, Antony
> > Stubbs, Apurva Mehta, Arun Mahadevan, Ashish Singh, Balint Molnar, Ben
> > Stopford, Bernard Leach, Bill Bejeck, Colin P. Mccabe, Damian Guy, Dan
> > Norwood, Dana Powers, dasl, Derrick Or, Dong Lin, Dustin Cote, Edoardo
> > Comar, Edward Ribeiro, Elias Levy, Emanuele Cesena, Eno Thereska, Ewen
> > Cheslack-Postava, Flavio Junqueira, fpj, Geoff Anderson, Guozhang Wang,
> > Gwen Shapira, Hikiko Murakami, Himani Arora, himani1, Hojjat Jafarpour,
> > huxi, Ishita Mandhan, Ismael Juma, Jakub Dziworski, Jan Lukavsky, Jason
> > Gustafson, Jay Kreps, Jeff Widman, Jeyhun Karimov, Jiangjie Qin, Joel
> > Koshy, Jon Freedman, Joshi, Jozef Koval, Json Tu, Jun He, Jun Rao,
> > Kamal, Kamal C, Kamil Szymanski, Kim Christensen, Kiran Pillarisetty,
> > Konstantine Karantasis, Lihua Xin, LoneRifle, Magnus Edenhill, Magnus
> > Reftel, Manikumar Reddy O, Mark Rose, Mathieu Fenniak, Matthias J. Sax,
> > Mayuresh Gharat, MayureshGharat, Michael Schiff, Mickael Maison,
> > MURAKAMI Masahiko, Nikki Thean, Olivier Girardot, pengwei-li, pilo,
> > Prabhat Kashyap, Qian Zheng, Radai Rosenblatt, radai-rosenblatt, Raghav
> > Kumar Gautam, Rajini Sivaram, Rekha Joshi, rnpridgeon, Ryan Pridgeon,
> > Sandesh K, Scott Ferguson, Shikhar Bhushan, steve, Stig Rohde Døssing,
> > Sumant Tambe, Sumit Arrawatia, Theo, Tim Carey-Smith, Tu Yang, Vahid
> > Hashemian, wangzzu, Will Marshall, Xavier Léauté, Xavier Léauté, Xi Hu,
> > Yang Wei, yaojuncn, Yuto Kawamura
> >
> > We welcome your help and feedback. For more information on how to
> > report problems, and to get involved, visit the project website at
> > http://kafka.apache.org/
> >
> > Thanks,
> > Ewen
>
>


-- 
-- Guozhang


Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-24 Thread Jorge Esteban Quilcate Otoya
Hi Vahid,

Thanks for your comments. Check my answers below:

On Fri., 24 Feb 2017 at 19:41, Vahid S Hashemian (<
vahidhashem...@us.ibm.com>) wrote:

> Hi Jorge,
>
> Thanks for the useful KIP.
>
> I have a question regarding the proposed "plan" option.
> The "current offset" and "lag" values of a topic partition are meaningful
> within a consumer group. In other words, different consumer groups could
> have different values for these properties of each topic partition.
> I don't see that reflected in the discussion around the "plan" option.
> Unless we are assuming a "--group" option is also provided by the user (which
> is not clear from the KIP if that is the case).
>

I have added an additional comment to state that this option will require
a "group" argument.
It is considered to affect only one Consumer Group.


>
> Also, I was wondering if you can provide at least one full command example
> for each of the "plan", "execute", and "export" options. They would
> definitely help in understanding some of the details.
>
>
Added to the KIP.
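
For readers of the archive, a hedged sketch of what such commands could look
like, assuming the KIP extends the existing kafka-consumer-groups.sh tool; the
exact flag spellings are whatever the KIP finally defines, not what is shown
here:

    # "plan" (dry run): print current offset, lag, and target offset per partition
    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --reset-offsets --group my-group --topic t1:0,1,2 --by-duration P1DT2H0M0S

    # "execute": actually apply the computed target offsets to the group
    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --reset-offsets --group my-group --topic t1:0,1,2 --shift-by -100 --execute

    # "export": write the plan out (e.g. as CSV) instead of applying it
    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --reset-offsets --group my-group --topic t1 --to-earliest --export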


> Sorry for the delayed question/suggestion. I hope they make sense.
>
> Thanks.
> --Vahid
>
>
>
> From:   Jorge Esteban Quilcate Otoya 
> To: dev@kafka.apache.org
> Date:   02/24/2017 09:51 AM
> Subject:Re: KIP-122: Add a tool to Reset Consumer Group Offsets
>
>
>
> Great! KIP updated.
>
>
>
> On Fri., 24 Feb 2017 at 18:22, Matthias J. Sax
> ()
> wrote:
>
> > I like this!
> >
> > --by-duration and --shift-by
> >
> >
> > -Matthias
> >
> > On 2/24/17 12:57 AM, Jorge Esteban Quilcate Otoya wrote:
> > > Renaming to --by-duration LGTM
> > >
> > > Not sure about changing it to --shift-by-duration because we could
> > > end up with the same redundancy as before with reset: --reset-offsets
> > > --reset-to-*.
> > >
> > > Maybe changing --shift-offset-by to --shift-by 'n' could make it
> > > consistent enough?
> > >
> > >
> > >> On Fri., 24 Feb 2017 at 6:39, Matthias J. Sax (<
> > matth...@confluent.io>)
> > >> wrote:
> > >
> > >> I just read the updated KIP once more.
> > >>
> > >> I would suggest renaming --to-duration to --by-duration
> > >>
> > >> Or as a second idea, rename --to-duration to --shift-by-duration and
> > >> at the same time rename --shift-offset-by to --shift-by-offset
> > >>
> > >> Not sure what the best option is, but naming would be more
> > >> consistent IMHO.
> > >>
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 2/23/17 4:42 PM, Jorge Esteban Quilcate Otoya wrote:
> > >>> Hi All,
> > >>>
> > >>> If there are no more concerns, I'd like to start the vote for this KIP.
> > >>>
> > >>> Thanks!
> > >>> Jorge.
> > >>>
> > >>> On Thu., 23 Feb 2017 at 22:50, Jorge Esteban Quilcate Otoya (<
> > >>> quilcate.jo...@gmail.com>) wrote:
> > >>>
> >  Oh ok :)
> > 
> >  So, we can keep `--topic t1:1,2,3`
> > 
> >  I think with this one we have most of the feedback applied. I will
> >  update the KIP with this change.
> > 
> >  On Thu., 23 Feb 2017 at 22:38, Matthias J. Sax (<
> > matth...@confluent.io>)
> >  wrote:
> > 
> >  Sounds reasonable.
> > 
> >  If we have multiple --topic arguments, it also does not matter
> >  whether we use t1:1,2 or t1=1,2
> > 
> >  I just suggested '=' because I wanted to use ':' to chain multiple
> >  topics.
> > 
> > 
> >  -Matthias
> > 
> >  On 2/23/17 10:49 AM, Jorge Esteban Quilcate Otoya wrote:
> > > Yeap, `--topic t1=1,2` LGTM
> > >
> > > I don't have an idea either about getting rid of the repeated
> > > --topic, but --group is also repeated in the case of deletion, so it
> > > could be OK to have repeated --topic arguments.
> > >
> > > On Thu., 23 Feb 2017 at 19:14, Matthias J. Sax (<
> >  matth...@confluent.io>)
> > > wrote:
> > >
> > >> So you suggest merging the "scope options" --topics, --topic, and
> > >> --partitions into a single option? Sounds good to me.
> > >>
> > >> I like the compact way to express it, i.e.,
> > >> topicname:list-of-partitions
> > >> with "all partitions" if no partitions are specified. It's quite
> > >> intuitive to use.
> > >>
> > >> Just wondering if we could get rid of the repeated --topic option;
> > >> it's somewhat verbose. I have no good idea though how to improve it.
> > >>
> > >> If you concatenate multiple topics, we need one more character that
> > >> is not allowed in topic names to separate the topics:
> > >>
> > >>> invalidChars = {'/', '\\', ',', '\u0000', ':', '"', '\'', ';', '*',
> > >>> '?', ' ', '\t', '\r', '\n', '='};
> > >>
> > >> maybe
> > >>
> > >> --topics t1=1,2,3:t2:t3=3
> > >>
> > >> use '=' to specify partitions (instead of ':' as you proposed) and
> > >> ':' to separate topics? All other characters seem worse to use, to
> > >> me.
> > 

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-24 Thread Michael Pearce
I’ve added the methods on the ProducerRecord that will return a new instance of 
ProducerRecord with modified headers.

On 24/02/2017, 19:22, "Michael Pearce"  wrote:

Pattern.compile is expensive, and even if cached, String.equals is faster 
than regex matching. Also, if we end up with an internal map in future for 
performance, it will be easier to look up by key.

All that's needed is to get a header by key.

As with the other arguments: let's implement the simple case, and we can 
always add pattern matching later if it's found to be needed. (As noted, it's 
easier to add methods than to take them away.)

Great, I'll update the KIP with extra methods on ProducerRecord and a note 
that new objects are returned by method calls.



Sent using OWA for iPhone

From: Jason Gustafson 
Sent: Friday, February 24, 2017 6:51:45 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

The APIs in the current KIP look good to me. Just a couple questions: why
does filter not return Headers? Also would it be useful if the key is a
regex?

On the point of immutability.. One option might be to use a mutable object
only when passing the headers through the interceptor chain. I think as
long as we resort to mutability only when clear performance results show
that it is worthwhile, I am satisfied. As Ismael noted, for common
scenarios it is possible to get reasonable performance with immutable
objects.

-Jason

On Fri, Feb 24, 2017 at 8:48 AM, Michael Pearce 
wrote:

> Hi
>
> On 1, how can you guarantee two separately implemented clients would add
> the headers in the same order? We are not specifying an order at the
> protocol level (nor should we) with regard to keyA being ordered before
> keyB. We shouldn’t expect keyA to always be set before keyB.
>
> On 2, I believe we have already changed the naming based on feedback from
> Jason; e.g. we don’t have a “get” method, which would imply O(1)
> performance, nor likewise a “put”, but we have an “append”.
>
> On 3, in the KafkaProducer, I think we have mutability already: the value
> for time is changed if it is null, at the point of send:
> “
> long timestamp = record.timestamp() == null ?
> time.milliseconds() : record.timestamp();
> “
>
> As such the timestamp is already mutable, so what’s the difference here?
> We already have some mixed semantics on timestamp.
> E.g. currently if I send two records with the timestamp not set, the
> binary value sent on the wire for the timestamp would be different; as
> such we have mutation for the same record.
>
> On 4, I think we should expect not 1 or 2 headers, but in fact tens of
> headers. This is the concern with immutable headers: whilst the append
> self-reference works nicely, what if someone needs to remove a header?
>
> Trying to get this moving:
>
> Suppose we really want immutable Headers (and essentially you guys won’t
> give +1 without it).
>
> What’s the feeling on adding methods to ProducerRecord that do the
> boilerplate work, creating a new ProducerRecord with the altered new
> headers (appended or removed) inside? E.g.
>
> ProducerRecord {
>
>   ProducerRecord append(Iterable<Header> headersToAppend) {
>     return new ProducerRecord(key, value,
>         headers.append(headersToAppend), ….)
>   }
>
>   ProducerRecord remove(Iterable<Header> headersToRemove) {
>     return new ProducerRecord(key, value,
>         headers.remove(headersToRemove), ….)
>   }
>
> }
>
> Where the Headers methods actually return new objects, and the
> ProducerRecord methods create a new ProducerRecord with all the current
> values, but with the newly modified headers.
>
> Then interceptors / code return this new object?
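
(For illustration, a minimal sketch of such an interceptor written against the
proposal above. Only ProducerInterceptor is existing Kafka API; the Header
type and the record.append() method are the proposed additions from this
thread, so the whole example is an assumption of how they would compose.)

    import java.util.Collections;
    import java.util.Map;
    import org.apache.kafka.clients.producer.ProducerInterceptor;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class LineageInterceptor implements ProducerInterceptor<String, String> {

        @Override
        public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
            // append() as proposed returns a NEW record carrying the extra
            // header; the caller's record is left untouched.
            return record.append(
                Collections.singletonList(new Header("lineage", "app-1".getBytes())));
        }

        @Override
        public void onAcknowledgement(RecordMetadata metadata, Exception exception) { }

        @Override
        public void close() { }

        @Override
        public void configure(Map<String, ?> configs) { }
    }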
>
>
> Cheers
> Mike
>
>
>
>
>
>
> On 24/02/2017, 16:02, "isma...@gmail.com on behalf of Ismael Juma" <
> isma...@gmail.com on behalf of ism...@juma.me.uk> wrote:
>
> Hi Michael,
>
> Did you mean that you were happy to compromise to keep it mutable or
> immutable? You wrote the former, but it sounded from the sentence that
> it could have been a typo. So, my thoughts on this are that there are a
> few things to take into account:
>
> 1. Semantics
> 2. Simplicity of use (the common operations should be easy to do)
> 3. If it's easy to reason about and safe (immutability helps with 
this)
> 4. Efficiency (both memory and CPU usage)
>
> Regarding 1, I think it would be good to be very clear about the
> guarantees
> that we 

[jira] [Commented] (KAFKA-4798) Javadocs for 0.10.2 are missing from kafka.apache.org

2017-02-24 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883467#comment-15883467
 ] 

James Cheng commented on KAFKA-4798:


Pinging [~ewencp], since you were the release manager for 0.10.2



> Javadocs for 0.10.2 are missing from kafka.apache.org
> -
>
> Key: KAFKA-4798
> URL: https://issues.apache.org/jira/browse/KAFKA-4798
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.2.0
>Reporter: James Cheng
> Attachments: Screen Shot 2017-02-24 at 12.02.29 PM.png
>
>
> The javadocs on kafka.apache.org are missing for 0.10.2
> The following link yields a 404:
> http://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> If you look in http://kafka.apache.org/0102/, there is no javadoc directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (KAFKA-4798) Javadocs for 0.10.2 are missing from kafka.apache.org

2017-02-24 Thread James Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Cheng resolved KAFKA-4798.

Resolution: Fixed

> Javadocs for 0.10.2 are missing from kafka.apache.org
> -
>
> Key: KAFKA-4798
> URL: https://issues.apache.org/jira/browse/KAFKA-4798
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.2.0
>Reporter: James Cheng
>
> The javadocs on kafka.apache.org are missing for 0.10.2
> The following link yields a 404:
> http://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> If you look in http://kafka.apache.org/0102/, there is no javadoc directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4798) Javadocs for 0.10.2 are missing from kafka.apache.org

2017-02-24 Thread James Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Cheng updated KAFKA-4798:
---
Attachment: Screen Shot 2017-02-24 at 12.02.29 PM.png

Attached screenshot of what the website tabs look like, for the 0.10.2 docs vs 
the 0.10.1.1 docs.

> Javadocs for 0.10.2 are missing from kafka.apache.org
> -
>
> Key: KAFKA-4798
> URL: https://issues.apache.org/jira/browse/KAFKA-4798
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.2.0
>Reporter: James Cheng
> Attachments: Screen Shot 2017-02-24 at 12.02.29 PM.png
>
>
> The javadocs on kafka.apache.org are missing for 0.10.2
> The following link yields a 404:
> http://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> If you look in http://kafka.apache.org/0102/, there is no javadoc directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4798) Javadocs for 0.10.2 are missing from kafka.apache.org

2017-02-24 Thread James Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Cheng updated KAFKA-4798:
---
Status: Reopened  (was: Reopened)

> Javadocs for 0.10.2 are missing from kafka.apache.org
> -
>
> Key: KAFKA-4798
> URL: https://issues.apache.org/jira/browse/KAFKA-4798
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.2.0
>Reporter: James Cheng
>
> The javadocs on kafka.apache.org are missing for 0.10.2
> The following link yields a 404:
> http://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> If you look in http://kafka.apache.org/0102/, there is no javadoc directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4798) Javadocs for 0.10.2 are missing from kafka.apache.org

2017-02-24 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883427#comment-15883427
 ] 

James Cheng commented on KAFKA-4798:


Sorry, reopening.

The website title bar for the 0.10.1 docs says "KafkaConsumer (kafka 0.10.1.1 
API)".

The above link for 0.10.2 says "(kafka-work 0.10.2.1-SNAPSHOT API)"

It seems that the docs that are there are not the official ones from the 0.10.2 
release, but rather from a SNAPSHOT branch.


> Javadocs for 0.10.2 are missing from kafka.apache.org
> -
>
> Key: KAFKA-4798
> URL: https://issues.apache.org/jira/browse/KAFKA-4798
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.2.0
>Reporter: James Cheng
>
> The javadocs on kafka.apache.org are missing for 0.10.2
> The following link yields a 404:
> http://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> If you look in http://kafka.apache.org/0102/, there is no javadoc directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4798) Javadocs for 0.10.2 are missing from kafka.apache.org

2017-02-24 Thread James Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Cheng updated KAFKA-4798:
---
Status: Reopened  (was: Closed)

> Javadocs for 0.10.2 are missing from kafka.apache.org
> -
>
> Key: KAFKA-4798
> URL: https://issues.apache.org/jira/browse/KAFKA-4798
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.2.0
>Reporter: James Cheng
>
> The javadocs on kafka.apache.org are missing for 0.10.2
> The following link yields a 404:
> http://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> If you look in http://kafka.apache.org/0102/, there is no javadoc directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4798) Javadocs for 0.10.2 are missing from kafka.apache.org

2017-02-24 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883423#comment-15883423
 ] 

James Cheng commented on KAFKA-4798:


Looks like someone fixed it. Closing.

> Javadocs for 0.10.2 are missing from kafka.apache.org
> -
>
> Key: KAFKA-4798
> URL: https://issues.apache.org/jira/browse/KAFKA-4798
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.2.0
>Reporter: James Cheng
>
> The javadocs on kafka.apache.org are missing for 0.10.2
> The following link yields a 404:
> http://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> If you look in http://kafka.apache.org/0102/, there is no javadoc directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (KAFKA-4798) Javadocs for 0.10.2 are missing from kafka.apache.org

2017-02-24 Thread James Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Cheng closed KAFKA-4798.
--

> Javadocs for 0.10.2 are missing from kafka.apache.org
> -
>
> Key: KAFKA-4798
> URL: https://issues.apache.org/jira/browse/KAFKA-4798
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.2.0
>Reporter: James Cheng
>
> The javadocs on kafka.apache.org are missing for 0.10.2
> The following link yields a 404:
> http://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> If you look in http://kafka.apache.org/0102/, there is no javadoc directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-24 Thread Sriram Subramanian
+1. Great work in driving this to a consensus. Lots of good constructive
conversations.

On Fri, Feb 24, 2017 at 11:48 AM, Jason Gustafson 
wrote:

> +1 from me (duh). Thanks to all the reviewers. The KIP has been much
> improved because of it!
>
> -Jason
>
> On Wed, Feb 22, 2017 at 8:48 AM, Ismael Juma  wrote:
>
> > Great work on the proposal and iterating on it based on community
> feedback.
> > As Jun (and others) said, it's likely that minor changes will happen as
> the
> > PR is reviewed and additional testing takes place since this is a
> > significant change.
> >
> > I am +1 (binding) on the proposal without optional keys and values to
> keep
> > things consistent. If we show during performance testing that this is
> > worthwhile, we can update the proposal.
> >
> > Ismael
> >
> > On Tue, Feb 21, 2017 at 6:23 PM, Jun Rao  wrote:
> >
> > > It seems that it's simpler and more consistent to avoid optional keys
> and
> > > values. Not sure if it's worth squeezing every byte at the expense of
> > > additional complexity. Other than that, +1 from me.
> > >
> > > Also, since this is a large KIP, minor changes may arise as we start
> the
> > > implementation. It would be good if we can keep the community posted of
> > > those changes, if any.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Feb 21, 2017 at 4:33 PM, Michael Pearce  >
> > > wrote:
> > >
> > > > If the argument and objective within this KIP is to keep the overhead
> > > > of the protocol as small as possible and remove redundancy, and every
> > > > byte is being counted with the introduction of varints, then it would
> > > > make sense to me to use attributes.
> > > >
> > > >
> > > > On 22/02/2017, 00:14, "Jason Gustafson"  wrote:
> > > >
> > > > Done. I've left the key and value as optional since we may not
> > > > have reached consensus on whether to use attributes or not. Perhaps
> > > > we should just keep it simple and not do it? The benefit seems small.
> > > >
> > > > -Jason
> > > >
> > > > On Tue, Feb 21, 2017 at 4:05 PM, Michael Pearce <
> > > michael.pea...@ig.com
> > > > >
> > > > wrote:
> > > >
> > > > > Ok, no worries, can you add it back ValueLen on this KIP, and
> > > update
> > > > the
> > > > > doc, then we can work from that ☺
> > > > >
> > > > > Cheers
> > > > > Mike
> > > > >
> > > > > On 22/02/2017, 00:02, "Jason Gustafson" 
> > > wrote:
> > > > >
> > > > > I feel it was a little odd to leave out the value length anyway,
> > > > > so I would rather add it back and put headers at the end. This is
> > > > > more consistent with the rest of the Kafka protocol.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Tue, Feb 21, 2017 at 3:58 PM, Michael Pearce <
> > > > michael.pea...@ig.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Or we keep as is (valuelen removed), and headers are
> added
> > > with
> > > > > headers
> > > > > > length..
> > > > > >
> > > > > > On 21/02/2017, 23:38, "Apurva Mehta" <
> apu...@confluent.io>
> > > > wrote:
> > > > > >
> > > > > > Right now, we don't need the value length: since it is the
> > > > > > last item in the message, and we have the message length, we
> > > > > > can deduce the value length. However, if we are adding record
> > > > > > headers to the end, we would need to introduce the value length
> > > > > > along with that change.
> > > > > >
> > > > > > On Tue, Feb 21, 2017 at 3:34 PM, Michael Pearce <
> > > > > michael.pea...@ig.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > It seems I cannot add comment on the doc.
> > > > > > >
> > > > > > > In the section around the message protocol.
> > > > > > >
> > > > > > > It has stated:
> > > > > > >
> > > > > > > Message =>
> > > > > > > Length => uintVar
> > > > > > > Attributes => int8
> > > > > > > TimestampDelta => intVar
> > > > > > > OffsetDelta => uintVar
> > > > > > > KeyLen => uintVar [OPTIONAL]
> > > > > > > Key => data [OPTIONAL]
> > > > > > > Value => data [OPTIONAL]
> > > > > > >
> > > > > > > Should it not be: (added missing value len)
> > > > > > >
> > > > > > > Message =>
> > > > > > > Length => uintVar
> > > > > > > Attributes => int8
> > > > > > > TimestampDelta => intVar
> > > >   

Re: [VOTE] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-24 Thread Jason Gustafson
+1 from me (duh). Thanks to all the reviewers. The KIP has been much
improved because of it!

-Jason

On Wed, Feb 22, 2017 at 8:48 AM, Ismael Juma  wrote:

> Great work on the proposal and iterating on it based on community feedback.
> As Jun (and others) said, it's likely that minor changes will happen as the
> PR is reviewed and additional testing takes place since this is a
> significant change.
>
> I am +1 (binding) on the proposal without optional keys and values to keep
> things consistent. If we show during performance testing that this is
> worthwhile, we can update the proposal.
>
> Ismael
>
> On Tue, Feb 21, 2017 at 6:23 PM, Jun Rao  wrote:
>
> > It seems that it's simpler and more consistent to avoid optional keys and
> > values. Not sure if it's worth squeezing every byte at the expense of
> > additional complexity. Other than that, +1 from me.
> >
> > Also, since this is a large KIP, minor changes may arise as we start the
> > implementation. It would be good if we can keep the community posted of
> > those changes, if any.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Feb 21, 2017 at 4:33 PM, Michael Pearce 
> > wrote:
> >
> > > If the argument and objective within this KIP is to keep the overhead
> > > of the protocol as small as possible and remove redundancy, and every
> > > byte is being counted with the introduction of varints, then it would
> > > make sense to me to use attributes.
> > >
> > >
> > > On 22/02/2017, 00:14, "Jason Gustafson"  wrote:
> > >
> > > Done. I've left the key and value as optional since we may not have
> > > reached consensus on whether to use attributes or not. Perhaps we
> > > should just keep it simple and not do it? The benefit seems small.
> > >
> > > -Jason
> > >
> > > On Tue, Feb 21, 2017 at 4:05 PM, Michael Pearce <
> > michael.pea...@ig.com
> > > >
> > > wrote:
> > >
> > > > Ok, no worries, can you add it back ValueLen on this KIP, and
> > update
> > > the
> > > > doc, then we can work from that ☺
> > > >
> > > > Cheers
> > > > Mike
> > > >
> > > > On 22/02/2017, 00:02, "Jason Gustafson" 
> > wrote:
> > > >
> > > > I feel it was a little odd to leave out the value length anyway,
> > > > so I would rather add it back and put headers at the end. This is
> > > > more consistent with the rest of the Kafka protocol.
> > > >
> > > > -Jason
> > > >
> > > > On Tue, Feb 21, 2017 at 3:58 PM, Michael Pearce <
> > > michael.pea...@ig.com
> > > > >
> > > > wrote:
> > > >
> > > > > Or we keep as is (valuelen removed), and headers are added
> > with
> > > > headers
> > > > > length..
> > > > >
> > > > > On 21/02/2017, 23:38, "Apurva Mehta" 
> > > wrote:
> > > > >
> > > > > Right now, we don't need the value length: since it is the last
> > > > > item in the message, and we have the message length, we can
> > > > > deduce the value length. However, if we are adding record headers
> > > > > to the end, we would need to introduce the value length along
> > > > > with that change.
> > > > >
> > > > > On Tue, Feb 21, 2017 at 3:34 PM, Michael Pearce <
> > > > michael.pea...@ig.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > It seems I cannot add comment on the doc.
> > > > > >
> > > > > > In the section around the message protocol.
> > > > > >
> > > > > > It has stated:
> > > > > >
> > > > > > Message =>
> > > > > > Length => uintVar
> > > > > > Attributes => int8
> > > > > > TimestampDelta => intVar
> > > > > > OffsetDelta => uintVar
> > > > > > KeyLen => uintVar [OPTIONAL]
> > > > > > Key => data [OPTIONAL]
> > > > > > Value => data [OPTIONAL]
> > > > > >
> > > > > > Should it not be: (added missing value len)
> > > > > >
> > > > > > Message =>
> > > > > > Length => uintVar
> > > > > > Attributes => int8
> > > > > > TimestampDelta => intVar
> > > > > > OffsetDelta => uintVar
> > > > > > KeyLen => uintVar [OPTIONAL]
> > > > > > Key => data [OPTIONAL]
> > > > > > ValueLen => uintVar [OPTIONAL]
> > > > > > Value => data [OPTIONAL]
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 21/02/2017, 23:07, "Joel Koshy" <
> > jjkosh...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > 

Jenkins build is back to normal : kafka-trunk-jdk8 #1300

2017-02-24 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-24 Thread Michael Pearce
Pattern.compile is expensive, and even if cached, String.equals is faster than 
regex matching. Also, if we end up with an internal map in future for 
performance, it will be easier to look up by key.

All that's needed is to get a header by key.

As with the other arguments: let's implement the simple case, and we can always 
add pattern matching later if it's found to be needed. (As noted, it's easier 
to add methods than to take them away.)

Great, I'll update the KIP with extra methods on ProducerRecord and a note that 
new objects are returned by method calls.



Sent using OWA for iPhone

From: Jason Gustafson 
Sent: Friday, February 24, 2017 6:51:45 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

The APIs in the current KIP look good to me. Just a couple questions: why
does filter not return Headers? Also would it be useful if the key is a
regex?

On the point of immutability.. One option might be to use a mutable object
only when passing the headers through the interceptor chain. I think as
long as we resort to mutability only when clear performance results show
that it is worthwhile, I am satisfied. As Ismael noted, for common
scenarios it is possible to get reasonable performance with immutable
objects.

-Jason

On Fri, Feb 24, 2017 at 8:48 AM, Michael Pearce 
wrote:

> Hi
>
> On 1, how can you guarantee two separately implemented clients would add
> the headers in the same order? We are not specifying an order at the
> protocol level (nor should we) with regard to keyA being ordered before
> keyB. We shouldn’t expect keyA to always be set before keyB.
>
> On 2, I believe we have already changed the naming based on feedback from
> Jason; e.g. we don’t have a “get” method, which would imply O(1)
> performance, nor likewise a “put”, but we have an “append”.
>
> On 3, in the KafkaProducer, I think we have mutability already: the value
> for time is changed if it is null, at the point of send:
> “
> long timestamp = record.timestamp() == null ?
> time.milliseconds() : record.timestamp();
> “
>
> As such the timestamp is already mutable, so what’s the difference here?
> We already have some mixed semantics on timestamp.
> E.g. currently if I send two records with the timestamp not set, the
> binary value sent on the wire for the timestamp would be different; as
> such we have mutation for the same record.
>
> On 4, I think we should expect not 1 or 2 headers, but in fact tens of
> headers. This is the concern with immutable headers: whilst the append
> self-reference works nicely, what if someone needs to remove a header?
>
> Trying to get this moving:
>
> Suppose we really want immutable Headers (and essentially you guys won’t
> give +1 without it).
>
> What’s the feeling on adding methods to ProducerRecord that do the
> boilerplate work, creating a new ProducerRecord with the altered new
> headers (appended or removed) inside? E.g.
>
> ProducerRecord {
>
>   ProducerRecord append(Iterable<Header> headersToAppend) {
>     return new ProducerRecord(key, value,
>         headers.append(headersToAppend), ….)
>   }
>
>   ProducerRecord remove(Iterable<Header> headersToRemove) {
>     return new ProducerRecord(key, value,
>         headers.remove(headersToRemove), ….)
>   }
>
> }
>
> Where the Headers methods actually return new objects, and the
> ProducerRecord methods create a new ProducerRecord with all the current
> values, but with the newly modified headers.
>
> Then interceptors / code return this new object?
>
>
> Cheers
> Mike
>
>
>
>
>
>
> On 24/02/2017, 16:02, "isma...@gmail.com on behalf of Ismael Juma" <
> isma...@gmail.com on behalf of ism...@juma.me.uk> wrote:
>
> Hi Michael,
>
> Did you mean that you were happy to compromise to keep it mutable or
> immutable? You wrote the former, but it sounded from the sentence that
> it
> could have been a typo. So, my thoughts on this are that there are a few
> things to take into account:
>
> 1. Semantics
> 2. Simplicity of use (the common operations should be easy to do)
> 3. If it's easy to reason about and safe (immutability helps with this)
> 4. Efficiency (both memory and CPU usage)
>
> Regarding 1, I think it would be good to be very clear about the
> guarantees
> that we are providing. It seems that we are saying that keys are
> unordered,
> but what about the case where there are multiple values for the same
> key?
> It seems that for some use cases (e.g. lineage), it may be useful to
> add
> values to the same key while preserving the order.
>
> Regarding 2, I agree that it's useful to have methods in `Headers` for
> the
> very common use cases although we have to be careful with the naming to
> avoid giving the wrong impression. Also, when it comes to `Map.Entry`,
> I
> think I'd prefer a `toMap` method that simply gives the user a Map if
> that's what they want (it makes it clear that there's a 

[GitHub] kafka pull request #2584: MINOR: Fix transient failure of testCannotSendToIn...

2017-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2584


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-127: Pluggable JAAS LoginModule configuration for SSL

2017-02-24 Thread Christopher Shannon
Sounds good, thanks for the feedback. I will move my KIP to the discarded
section for now as the PrincipalBuilder should be sufficient.  If for some
reason it is not I can revisit this.

Chris

On Fri, Feb 24, 2017 at 1:37 PM Ismael Juma  wrote:

> Hi Christopher,
>
> It is possible to retrieve the certificates already. The Principal returned
> by TransportLayer.peerPrincipal() is a X500Principal if TLS is being used.
> We could add a method to make things nicer, potentially, but just wanted
> you to know that it's possible today.
>
> I am in favour of keeping it simple, if possible. :)
>
> As Rajini said, KIP-103 means users have a way to set the config on a per
> listener basis, so it makes sense to allow users to have both SSL client
> authentication and SASL authentication enabled at the same time if they
> wish. As long as the default remains that SSL client auth is disabled for
> SASL_SSL (which I believe is the common case), seems fine to me.
>
> Ismael
>
> On Fri, Feb 24, 2017 at 6:18 PM, Christopher Shannon <
> christopher.l.shan...@gmail.com> wrote:
>
> > Rajini, if the override can be dropped for SASL_SSL then I have no
> > problem with doing this as SASL_SSL External (basically just TLS
> > authentication). If the configurable callback handlers KIP passes then
> > that would affect this, and one of the callback handlers could be an
> > X509 callback handler.
> >
> > Ismael, For why the PrincipalBuilder is not good enough...a custom
> > PrincipalBuilder would work (would just need to expose the peer
> > certificates through a new getter but that is simple).  The main reason
> why
> > I suggested using JAAS is that it is the standard way of plugging in
> > authentication.  In terms of dual authentication, yes I could have one
> > listener as SSL and one as SASL.  Or I could just use SASL_SSL and
> > configure two login modules.
> >
> > After thinking about it more maybe the simplest approach would be to just
> > use a custom PrincipalBuilder along with a small PR to expose the peer
> > certificates from the SSL handshake.  One thing I didn't like about
> using a
> > JAAS module for SSL is that then it is a bit weird because someone might
> > configure a PrincipalBuilder for returning the Principal but usually that
> > is the JAAS module's job.  So perhaps we should keep it simple and just
> > rely on the existing PrincipalBuilder interface so things don't get
> > confusing.  Thoughts? Would I just close/reject the KIP if I decide to go
> > this route?
> >
> > As a side note, I think it would be a good idea to create a Jira and PR
> > either way to get rid of the override for ssl.client.auth on SASL_SSL as
> it
> > would be good to let the user configure that regardless (I can do the
> > jira/pr)
> >
> > On Fri, Feb 24, 2017 at 9:47 AM, Ismael Juma  wrote:
> >
> > > Hi Christopher,
> > >
> > > Thanks for the KIP. I have two comments:
> > >
> > > 1. Can you please explain in the KIP (maybe in the Rejected
> Alternatives
> > > section) why a PrincipalBuilder is not good enough for your use case?
> > This
> > > is typically what people use to customise authentication for the TLS
> > case.
> > >
> > > 2. You mention that you have a use case where you need dual
> > authentication
> > > (username/password and certificate based). Could you not achieve this
> via
> > > two separate listeners? username/password could be via a listener
> > > configured to use SASL and certificate-based could be via a listener
> > > configured to use TLS.
> > >
> > > Ismael
> > >
> > > On Tue, Feb 21, 2017 at 8:23 PM, Christopher Shannon <
> > > christopher.l.shan...@gmail.com> wrote:
> > >
> > > > Thanks for the feedback Harsha.
> > > >
> > > > Can you clarify what you mean for the use cases for SASL_SSL and
> X509?
> > > My
> > > > proposal is to only have X509 based pluggable authentication for the
> > SSL
> > > > channel and not SASL_SSL.  I suppose you could use X509 credentials
> > with
> > > > SASL_SSL but the authentication mode would probably need to be SASL
> > > > EXTERNAL as the authentication is done by the SSL channel whereas
> with
> > > > Kerberos or PLAINTEXT the user is providing credentials.  That's why
> I
> > > > proposed adding it to the SSL channel instead of SASL_SSL.
> > > >
> > > > That being said I guess one option would be to just allow the use of
> a
> > > X509
> > > > callback handler and don't disable client auth when using SASL_SSL.
> > Then
> > > > after login have some way to signal it's EXTERNAL mode so it doesn't
> do
> > > any
> > > > other authentication steps.
> > > >
> > > > I have a use case where I need dual authentication (both
> > > username/password
> > > > and certificate based) and either one would work as multiple
> > LoginModules
> > > > can be chained together.
> > > >
> > > > Chris
> > > >
> > > > On Tue, Feb 21, 2017 at 3:06 PM, Harsha Chintalapani <
> ka...@harsha.io>
> > > > wrote:
> > > >
> > > > > Hi Chris,
> > > > >   Thanks 

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-24 Thread Jason Gustafson
The APIs in the current KIP look good to me. Just a couple questions: why
does filter not return Headers? Also, would it be useful if the key is a
regex?

On the point of immutability.. One option might be to use a mutable object
only when passing the headers through the interceptor chain. I think as
long as we resort to mutability only when clear performance results show
that it is worthwhile, I am satisfied. As Ismael noted, for common
scenarios it is possible to get reasonable performance with immutable
objects.

-Jason
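
A minimal sketch of the immutable alternative mentioned here, where append
returns a new instance instead of mutating the receiver; the names and shape
are illustrative only, not the KIP's proposed API:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    final class Header {
        final String key;
        final byte[] value;
        Header(String key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    final class ImmutableHeaders {
        private final List<Header> headers;

        ImmutableHeaders(List<Header> headers) {
            this.headers = Collections.unmodifiableList(new ArrayList<>(headers));
        }

        // Returns a new instance and leaves the receiver untouched, so a
        // record holding the old instance can safely be resent.
        ImmutableHeaders append(Header header) {
            List<Header> copy = new ArrayList<>(this.headers);
            copy.add(header);
            return new ImmutableHeaders(copy);
        }
    }

The cost is one array copy per append, which is the performance trade-off
referred to above for the common case of a few headers per record.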

On Fri, Feb 24, 2017 at 8:48 AM, Michael Pearce 
wrote:

> Hi
>
> On 1, how can you guarantee two separately implemented clients would add
> the headers in the same order? We are not specifying an order at the
> protocol level (nor should we) with regard to keyA being ordered before
> keyB. We shouldn’t be expecting keyA to always be set before keyB.
>
> On 2, I believe we have changed the naming based on feedback from Jason
> already, e.g. we don’t have a “get” method that infers O(1) performance,
> likewise no “put”, but we have an “append”.
>
> On 3, in the KafkaProducer, I think we have mutability already, the value
> for time is changed if it is null, at the point of send:
> “
> long timestamp = record.timestamp() == null ?
> time.milliseconds() : record.timestamp();
> “
>
> As such the timestamp is already mutable, so what’s the difference here?
> We already have some mixed semantics on timestamp.
> E.g. currently if I send two records with the timestamp not set, the value
> for the timestamp sent in the wire binary would be different, as such we
> have mutation for the same record.
>
> On 4, I think we should expect not 1 or 2 headers, but in fact tens of
> headers. This is the concern on immutable headers, whilst the append
> self-reference works nicely, what if someone needs to remove a header?
>
> Trying to get this moving:
>
> If we really want immutable headers, and essentially you guys won’t give
> a +1 without them:
>
> What’s the feeling for adding methods to ProducerRecord that do the
> boilerplate code of creating a new ProducerRecord with the altered new
> headers (appended or removed) inside? E.g.:
>
> ProducerRecord<K, V> {
>
>   ProducerRecord<K, V> append(Iterable<Header> headersToAppend) {
>     return new ProducerRecord<>(key, value,
>         headers.append(headersToAppend), ….);
>   }
>
>   ProducerRecord<K, V> remove(Iterable<Header> headersToRemove) {
>     return new ProducerRecord<>(key, value,
>         headers.remove(headersToRemove), ….);
>   }
> }
>
> Where the headers methods actually return new objects, and the producer
> record’s methods create a new producer record with all the current values,
> but with the new modified headers.
>
> Then interceptors / code return this new object?
>
>
> Cheers
> Mike
>
>
>
>
>
>
> On 24/02/2017, 16:02, "isma...@gmail.com on behalf of Ismael Juma" <
> isma...@gmail.com on behalf of ism...@juma.me.uk> wrote:
>
> Hi Michael,
>
> Did you mean that you were happy to compromise to keep it mutable or
> immutable? You wrote the former, but it sounded from the sentence that
> it
> could have been a typo. So, my thoughts on this are that there are a few
> things to take into account:
>
> 1. Semantics
> 2. Simplicity of use (the common operations should be easy to do)
> 3. If it's easy to reason about and safe (immutability helps with this)
> 4. Efficiency (both memory and CPU usage)
>
> Regarding 1, I think it would be good to be very clear about the
> guarantees
> that we are providing. It seems that we are saying that keys are
> unordered,
> but what about the case where there are multiple values for the same
> key?
> It seems that for some use cases (e.g. lineage), it may be useful to
> add
> values to the same key while preserving the order.
>
> Regarding 2, I agree that it's useful to have methods in `Headers` for
> the
> very common use cases although we have to be careful with the naming to
> avoid giving the wrong impression. Also, when it comes to `Map.Entry`,
> I
> think I'd prefer a `toMap` method that simply gives the user a Map if
> that's what they want (it makes it clear that there's a conversion
> happening).
>
> Regarding 3, the concern I have if we make the headers mutable is that
> it
> seems to introduce some inconsistencies and potential edge cases. At
> the
> moment, it's safe to keep a ProducerRecord and resend it, for example.
> If
> the record is mutated by an interceptor, then this can lead to weird
> behaviour. Also, it seems inconsistent that one has to create a new
> ProducerRecord to modify the record timestamp, but that one has to
> mutate
> the record to add headers. It seems like one should either embrace
> immutability or mutability, but mixing both is not ideal.
>
> Regarding 4, for the cases where there are a small number of headers
> and/or
> 1 or 2 interceptors, it doesn't seem 

Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-24 Thread Vahid S Hashemian
Hi Jorge,

Thanks for the useful KIP.

I have a question regarding the proposed "plan" option.
The "current offset" and "lag" values of a topic partition are meaningful 
within a consumer group. In other words, different consumer groups could 
have different values for these properties of each topic partition.
I don't see that reflected in the discussion around the "plan" option. 
Unless we are assuming a "--group" option is also provided by the user (it 
is not clear from the KIP if that is the case).

Also, I was wondering if you can provide at least one full command example 
for each of the "plan", "execute", and "export" options. They would 
definitely help in understanding some of the details.

Sorry for the delayed question/suggestion. I hope they make sense.

Thanks.
--Vahid
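
For orientation while those examples are pending, hypothetical invocations
assembled from the options discussed in this thread could look like the lines
below; this assumes the functionality lands in kafka-consumer-groups.sh, and
the option names were still being renamed at the time, so treat this as a
sketch rather than final syntax:

    # plan (default): print the current and new offsets without applying them
    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --reset-offsets --group my-group --topic t1:0,1,2 --to-offset 100

    # execute: actually commit the new offsets for the group
    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --reset-offsets --group my-group --all-topics --shift-by -100 --execute

    # export: write the computed plan out for a later reset
    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --reset-offsets --group my-group --topic t1 --export > reset-plan.csv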



From:   Jorge Esteban Quilcate Otoya 
To: dev@kafka.apache.org
Date:   02/24/2017 09:51 AM
Subject: Re: KIP-122: Add a tool to Reset Consumer Group Offsets



Great! KIP updated.



El vie., 24 feb. 2017 a las 18:22, Matthias J. Sax 
()
escribió:

> I like this!
>
> --by-duration and --shift-by
>
>
> -Matthias
>
> On 2/24/17 12:57 AM, Jorge Esteban Quilcate Otoya wrote:
> > Renaming to --by-duration LGTM
> >
> > Not sure about changing it to --shift-by-duration because we could end 
up
> > with the same redundancy as before with reset: --reset-offsets
> > --reset-to-*.
> >
> > Maybe changing --shift-offset-by to --shift-by 'n' could make it
> consistent
> > enough?
> >
> >
> > El vie., 24 feb. 2017 a las 6:39, Matthias J. Sax (<
> matth...@confluent.io>)
> > escribió:
> >
> >> I just read the updated KIP once more.
> >>
> >> I would suggest to rename --to-duration to --by-duration
> >>
> >> Or as a second idea, rename --to-duration to --shift-by-duration and 
at
> >> the same time rename --shift-offset-by to --shift-by-offset
> >>
> >> Not sure what the best option is, but naming would be more consistent
> IMHO.
> >>
> >>
> >>
> >> -Matthias
> >>
> >> On 2/23/17 4:42 PM, Jorge Esteban Quilcate Otoya wrote:
> >>> Hi All,
> >>>
> >>> If there are no more concerns, I'd like to start vote for this KIP.
> >>>
> >>> Thanks!
> >>> Jorge.
> >>>
> >>> El jue., 23 feb. 2017 a las 22:50, Jorge Esteban Quilcate Otoya (<
> >>> quilcate.jo...@gmail.com>) escribió:
> >>>
>  Oh ok :)
> 
>  So, we can keep `--topic t1:1,2,3`
> 
>  I think with this one we have most of the feedback applied. I will
> >> update
>  the KIP with this change.
> 
>  El jue., 23 feb. 2017 a las 22:38, Matthias J. Sax (<
> >> matth...@confluent.io>)
>  escribió:
> 
>  Sounds reasonable.
> 
>  If we have multiple --topic arguments, it does also not matter if 
we
> use
>  t1:1,2 or t1=1,2
> 
>  I just suggested '=' because I wanted to use ':' to chain multiple
> topics.
> 
> 
>  -Matthias
> 
>  On 2/23/17 10:49 AM, Jorge Esteban Quilcate Otoya wrote:
> > Yeap, `--topic t1=1,2`LGTM
> >
> > I don't have an idea either about getting rid of repeated --topic, but
>  --group
> > is also repeated in the case of deletion, so it could be ok to 
have
> > repeated --topic arguments.
> >
> > El jue., 23 feb. 2017 a las 19:14, Matthias J. Sax (<
>  matth...@confluent.io>)
> > escribió:
> >
> >> So you suggest merging the "scope options" --topics, --topic, and
> >> --partitions into a single option? Sounds good to me.
> >>
> >> I like the compact way to express it, ie,
> topicname:list-of-partitions
> >> with "all partitions" if not partitions are specified. It's quite
> >> intuitive to use.
> >>
> >> Just wondering, if we could get rid of the repeated --topic 
option;
> >> it's
> >> somewhat verbose. Have no good idea though how to improve it.
> >>
> >> If you concatenate multiple topics, we need one more character 
that
> is
> >> not allowed in topic names to separate the topics:
> >>
> >>> invalidChars = {'/', '\\', ',', '\u0000', ':', '"', '\'', ';', 
'*',
> >> '?', ' ', '\t', '\r', '\n', '='};
> >>
> >> maybe
> >>
> >> --topics t1=1,2,3:t2:t3=3
> >>
> >> use '=' to specify partitions (instead of ':' as you proposed) 
and
> ':'
> >> to separate topics? All other characters seem to be worse to use 
to
> >> me.
> >> But maybe you have a better idea.
> >>
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 2/23/17 3:15 AM, Jorge Esteban Quilcate Otoya wrote:
> >>> @Matthias about the point 9:
> >>>
> >>> What about keeping only the --topic option, and support this
> format:
> >>>
> >>> `--topic t1:0,1,2 --topic t2 --topic t3:2`
> >>>
> >>> In this case topics t1, t2, and t3 will be selected: topic t1 
with
> >>> partitions 0,1 and 2; topic t2 with all its partitions; and 
topic
> t3,
> >> with
> >>> only partition 2.
> >>>
> >>> Jorge.
> >>>
> >>> 

Re: [DISCUSS] KIP-127: Pluggable JAAS LoginModule configuration for SSL

2017-02-24 Thread Ismael Juma
Hi Christopher,

It is possible to retrieve the certificates already. The Principal returned
by TransportLayer.peerPrincipal() is an X500Principal if TLS is being used.
We could add a method to make things nicer, potentially, but just wanted
you to know that it's possible today.

I am in favour of keeping it simple, if possible. :)

As Rajini said, KIP-103 means users have a way to set the config on a per
listener basis, so it makes sense to allow users to have both SSL client
authentication and SASL authentication enabled at the same time if they
wish. As long as the default remains that SSL client auth is disabled for
SASL_SSL (which I believe is the common case), seems fine to me.

Ismael
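
For concreteness, a custom builder along these lines could look roughly like
the sketch below, written against the 0.10.x PrincipalBuilder interface; the
class name is hypothetical and any real mapping logic (e.g. extracting a CN
from the X500 name) is left out:

    import java.security.Principal;
    import java.util.Map;

    import org.apache.kafka.common.KafkaException;
    import org.apache.kafka.common.network.Authenticator;
    import org.apache.kafka.common.network.TransportLayer;
    import org.apache.kafka.common.security.auth.PrincipalBuilder;

    public class X500PrincipalBuilder implements PrincipalBuilder {

        @Override
        public void configure(Map<String, ?> configs) {}

        @Override
        public Principal buildPrincipal(TransportLayer transportLayer,
                                        Authenticator authenticator) throws KafkaException {
            try {
                // On a TLS connection this is the X500Principal noted above.
                return transportLayer.peerPrincipal();
            } catch (Exception e) {
                throw new KafkaException("Failed to build principal", e);
            }
        }

        @Override
        public void close() throws KafkaException {}
    }

The broker would pick such a class up via the existing principal.builder.class
configuration.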

On Fri, Feb 24, 2017 at 6:18 PM, Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> Rajini, If the override can be dropped for SASL_SSL then I have no problem
> with doing this as SASL_SSL External (basically just TLS authentication).
> If the configurable callback handlers KIP passes then that would affect
> this and one of the callback handlers could be an X509 callback handler.
>
> Ismael, For why the PrincipalBuilder is not good enough...a custom
> PrincipalBuilder would work (would just need to expose the peer
> certificates through a new getter but that is simple).  The main reason why
> I suggested using JAAS is that it is the standard way of plugging in
> authentication.  In terms of dual authentication, yes I could have one
> listener as SSL and one as SASL.  Or I could just use SASL_SSL and
> configure two login modules.
>
> After thinking about it more maybe the simplest approach would be to just
> use a custom PrincipalBuilder along with a small PR to expose the peer
> certificates from the SSL handshake.  One thing I didn't like about using a
> JAAS module for SSL is that then it is a bit weird because someone might
> configure a PrincipalBuilder for returning the Principal but usually that
> is the JAAS module's job.  So perhaps we should keep it simple and just
> rely on the existing PrincipalBuilder interface so things don't get
> confusing.  Thoughts? Would I just close/reject the KIP if I decide to go
> this route?
>
> As a side note, I think it would be a good idea to create a Jira and PR
> either way to get rid of the override for ssl.client.auth on SASL_SSL as it
> would be good to let the user configure that regardless (I can do the
> jira/pr)
>
> On Fri, Feb 24, 2017 at 9:47 AM, Ismael Juma  wrote:
>
> > Hi Christopher,
> >
> > Thanks for the KIP. I have two comments:
> >
> > 1. Can you please explain in the KIP (maybe in the Rejected Alternatives
> > section) why a PrincipalBuilder is not good enough for your use case?
> This
> > is typically what people use to customise authentication for the TLS
> case.
> >
> > 2. You mention that you have a use case where you need dual
> authentication
> > (username/password and certificate based). Could you not achieve this via
> > two separate listeners? username/password could be via a listener
> > configured to use SASL and certificate-based could be via a listener
> > configured to use TLS.
> >
> > Ismael
> >
> > On Tue, Feb 21, 2017 at 8:23 PM, Christopher Shannon <
> > christopher.l.shan...@gmail.com> wrote:
> >
> > > Thanks for the feedback Harsha.
> > >
> > > Can you clarify what you mean for the use cases for SASL_SSL and X509?
> > My
> > > proposal is to only have X509 based pluggable authentication for the
> SSL
> > > channel and not SASL_SSL.  I suppose you could use X509 credentials
> with
> > > SASL_SSL but the authentication mode would probably need to be SASL
> > > EXTERNAL as the authentication is done by the SSL channel whereas with
> > > Kerberos or PLAINTEXT the user is providing credentials.  That's why I
> > > proposed adding it to the SSL channel instead of SASL_SSL.
> > >
> > > That being said I guess one option would be to just allow the use of a
> > X509
> > > callback handler and don't disable client auth when using SASL_SSL.
> Then
> > > after login have some way to signal it's EXTERNAL mode so it doesn't do
> > any
> > > other authentication steps.
> > >
> > > I have a use case where I need dual authentication (both
> > username/password
> > > and certificate based) and either one would work as multiple
> LoginModules
> > > can be chained together.
> > >
> > > Chris
> > >
> > > On Tue, Feb 21, 2017 at 3:06 PM, Harsha Chintalapani 
> > > wrote:
> > >
> > > > Hi Chris,
> > > >   Thanks for the KIP. Could you also add details/use-cases
> for
> > > > having X509 certificate based authentication in the context of SASL_SSL.
> > > > The reason that we disabled the SSL auth for SASL_SSL is the intent
> > > behind
> > > > using SASL auth over SSL encryption and users can enforce a
> > > > role based auth and have wire encryption for data transfer. If users
> > just
> > > > want SSL based authentication they have the option to do so via SSL.
> > > > I think we are providing too many options of authentication in SASL_SSL
> > > > mode, which can be a bit confusing.

[jira] [Commented] (KAFKA-4494) Significant startup delays in KStreams app

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883264#comment-15883264
 ] 

ASF GitHub Bot commented on KAFKA-4494:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2560


> Significant startup delays in KStreams app
> --
>
> Key: KAFKA-4494
> URL: https://issues.apache.org/jira/browse/KAFKA-4494
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
> Environment: AWS Linux ami, mac os
>Reporter: j yeargers
>Assignee: Damian Guy
>  Labels: performance
> Fix For: 0.10.3.0
>
>
> Often starting a KStreams based app results in significant (5-10 minutes) 
> delay before processing of stream begins. 
> Sample debug output: 
> https://gist.github.com/jyeargers/e8398fb353d67397f99148bc970479ee
> Topology in question: stream -> map -> groupbykey.aggregate -> print
> Stream is JSON.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (KAFKA-4494) Significant startup delays in KStreams app

2017-02-24 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-4494.
--
   Resolution: Fixed
Fix Version/s: 0.10.3.0

Issue resolved by pull request 2560
[https://github.com/apache/kafka/pull/2560]

> Significant startup delays in KStreams app
> --
>
> Key: KAFKA-4494
> URL: https://issues.apache.org/jira/browse/KAFKA-4494
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
> Environment: AWS Linux ami, mac os
>Reporter: j yeargers
>Assignee: Damian Guy
>  Labels: performance
> Fix For: 0.10.3.0
>
>
> Often starting a KStreams based app results in significant (5-10 minutes) 
> delay before processing of stream begins. 
> Sample debug output: 
> https://gist.github.com/jyeargers/e8398fb353d67397f99148bc970479ee
> Topology in question: stream -> map -> groupbykey.aggregate -> print
> Stream is JSON.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2560: KAFKA-4494: Reduce startup and rebalance time

2017-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2560




Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-24 Thread Jun Rao
Hi, Jay,

2. Regarding request.unit vs request.percentage. I started with
request.percentage too. The reasoning for request.unit is the following.
Suppose that the capacity has been reached on a broker and the admin needs
to add a new user. A simple way to increase the capacity is to increase the
number of io threads, assuming there are still enough cores. If the limit
is based on percentage, the additional capacity automatically gets
distributed to existing users and we haven't really carved out any
additional resource for the new user. Now, is it easy for a user to reason
about 0.1 unit vs 10%? My feeling is that both are hard and have to be
configured empirically. Not sure if percentage is obviously easier to
reason about.

Thanks,

Jun
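
For intuition only, the arithmetic relating the two knobs can be written down
as a small sketch; this is illustrative and not the accounting proposed in
the KIP:

    // With N io threads and a quota window of length W, total capacity is
    // N * W of thread time. A percentage quota f and a thread-unit quota u
    // relate as u = f * N: adding threads grows a percentage user's absolute
    // share but leaves a thread-unit share fixed, which is the trade-off
    // discussed in this thread.
    final class RequestTimeQuota {
        private final double allowedFraction; // e.g. 0.05 for "5%"
        private final int numIoThreads;

        RequestTimeQuota(double allowedFraction, int numIoThreads) {
            this.allowedFraction = allowedFraction;
            this.numIoThreads = numIoThreads;
        }

        // usedNanos: io-thread time consumed by this user in the window.
        long throttleMillis(long usedNanos, long windowNanos) {
            double allowedNanos = allowedFraction * numIoThreads * windowNanos;
            if (usedNanos <= allowedNanos) {
                return 0;
            }
            // Delay until usage measured over (window + delay) falls back
            // under the quota: delay = (used - allowed) / (f * N).
            double delayNanos = (usedNanos - allowedNanos) / (allowedFraction * numIoThreads);
            return (long) (delayNanos / 1_000_000);
        }
    }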

On Fri, Feb 24, 2017 at 8:10 AM, Jay Kreps  wrote:

> A couple of quick points:
>
> 1. Even though the implementation of this quota is only using io thread
> time, I think we should call it something like "request-time". This will
> give us flexibility to improve the implementation to cover network threads
> in the future and will avoid exposing internal details like our thread
> pools on the server.
>
> 2. Jun/Roger, I get what you are trying to fix but the idea of thread/units
> is super unintuitive as a user-facing knob. I had to read the KIP like
> eight times to understand this. I'm not sure about your point that
> increasing the number of threads is a problem with a percentage-based
> value; it really depends on whether the user thinks about the "percentage
> of request processing time" or "thread units". If they think "I have
> allocated 10% of my request processing time to user x" then it is a bug
> that increasing the thread count decreases that percent as it does in the
> current proposal. As a practical matter I think the only way to actually
> reason about this is as a percent---I just don't believe people are going
> to think, "ah, 4.3 thread units, that is the right amount!". Instead I
> think they have to understand this thread unit concept, figure out what
> they have set in number of threads, compute a percent and then come up with
> the number of thread units, and these will all be wrong if that thread
> count changes. I also think this ties us to throttling the I/O thread pool,
> which may not be where we want to end up.
>
> 3. For what it's worth I do think having a single throttle_ms field in all
> the responses that combines all throttling from all quotas is probably the
> simplest. There could be a use case for having separate fields for each,
> but I think that is actually harder to use/monitor in the common case so
> unless someone has a use case I think just one should be fine.
>
> -Jay
>
> On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram 
> wrote:
>
> > I have updated the KIP based on the discussions so far.
> >
> >
> > Regards,
> >
> > Rajini
> >
> > On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > wrote:
> >
> > > Thank you all for the feedback.
> > >
> > > Ismael #1. It makes sense not to throttle inter-broker requests like
> > > LeaderAndIsr etc. The simplest way to ensure that clients cannot use
> > these
> > > requests to bypass quotas for DoS attacks is to ensure that ACLs
> prevent
> > > clients from using these requests and unauthorized requests are
> included
> > > towards quotas.
> > >
> > > Ismael #2, Jay #1 : I was thinking that these quotas can return a
> > separate
> > > throttle time, and all utilization based quotas could use the same
> field
> > > (we won't add another one for network thread utilization for instance).
> > But
> > > perhaps it makes sense to keep byte rate quotas separate in
> produce/fetch
> > > responses to provide separate metrics? Agree with Ismael that the name
> of
> > > the existing field should be changed if we have two. Happy to switch
> to a
> > > single combined throttle time if that is sufficient.
> > >
> > > Ismael #4, #5, #6: Will update KIP. Will use dot separated name for new
> > > property. Replication quotas use dot separated, so it will be
> consistent
> > > with all properties except byte rate quotas.
> > >
> > > Radai: #1 Request processing time rather than request rate was chosen
> > > because the time per request can vary significantly between requests as
> > > mentioned in the discussion and KIP.
> > > #2 Two separate quotas for heartbeats/regular requests feel like more
> > > configuration and more metrics. Since most users would set quotas
> higher
> > > than the expected usage and quotas are more of a safety net, a single
> > quota
> > > should work in most cases.
> > >  #3 The number of requests in purgatory is limited by the number of
> > active
> > > connections since only one request per connection will be throttled at
> a
> > > time.
> > > #4 As with byte rate quotas, to use the full allocated quotas,
> > > clients/users would need to use partitions that are distributed across
> > the
> > > cluster. The 

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-24 Thread Dong Lin
Hey Jun,

I don't think we should allow failed replicas to be re-created on the good
disks. Say there are 2 disks and each of them is 51% loaded. If either disk
fails and we allow its replicas to be re-created on the other disk, the
surviving disk would need to hold 102% of its capacity, so both disks
will fail. Alternatively we can disable replica creation if there is a bad
disk on a broker. I personally think it is worth the additional complexity
in the broker to store created replicas in ZK so that we allow new replicas
to be created on the broker even when there is a bad log directory. This
approach won't add complexity in the controller. But I am fine with
disabling replica creation when there is a bad log directory, if that is

Whether we store created flags is independent of whether/how we store
offline replicas. Per our previous discussion, do you think it is OK not to
store offline replicas in ZK and to propagate the offline replicas from broker
to controller via LeaderAndIsrRequest?

Thanks,
Dong


Re: [DISCUSS] KIP-127: Pluggable JAAS LoginModule configuration for SSL

2017-02-24 Thread Christopher Shannon
Rajini, If the override can be dropped for SASL_SSL then I have no problem
with doing this as SASL_SSL External (basically just TLS authentication).
If the configurable callback handlers KIP passes then that would affect
this and one of the callback handlers could be an X509 callback handler.

Ismael, For why the PrincipalBuilder is not good enough...a custom
PrincipalBuilder would work (would just need to expose the peer
certificates through a new getter but that is simple).  The main reason why
I suggested using JAAS is that it is the standard way of plugging in
authentication.  In terms of dual authentication, yes I could have one
listener as SSL and one as SASL.  Or I could just use SASL_SSL and
configure two login modules.

After thinking about it more maybe the simplest approach would be to just
use a custom PrincipalBuilder along with a small PR to expose the peer
certificates from the SSL handshake.  One thing I didn't like about using a
JAAS module for SSL is that then it is a bit weird because someone might
configure a PrincipalBuilder for returning the Principal but usually that
is the JAAS module's job.  So perhaps we should keep it simple and just
rely on the existing PrincipalBuilder interface so things don't get
confusing.  Thoughts? Would I just close/reject the KIP if I decide to go
this route?

As a side note, I think it would be a good idea to create a Jira and PR
either way to get rid of the override for ssl.client.auth on SASL_SSL as it
would be good to let the user configure that regardless (I can do the
jira/pr)

On Fri, Feb 24, 2017 at 9:47 AM, Ismael Juma  wrote:

> Hi Christopher,
>
> Thanks for the KIP. I have two comments:
>
> 1. Can you please explain in the KIP (maybe in the Rejected Alternatives
> section) why a PrincipalBuilder is not good enough for your use case? This
> is typically what people use to customise authentication for the TLS case.
>
> 2. You mention that you have a use case where you need dual authentication
> (username/password and certificate based). Could you not achieve this via
> two separate listeners? username/password could be via a listener
> configured to use SASL and certificate-based could be via a listener
> configured to use TLS.
>
> Ismael
>
> On Tue, Feb 21, 2017 at 8:23 PM, Christopher Shannon <
> christopher.l.shan...@gmail.com> wrote:
>
> > Thanks for the feedback Harsha.
> >
> > Can you clarify what you mean for the use cases for SASL_SSL and X509?
> My
> > proposal is to only have X509 based pluggable authentication for the SSL
> > channel and not SASL_SSL.  I suppose you could use X509 credentials with
> > SASL_SSL but the authentication mode would probably need to be SASL
> > EXTERNAL as the authentication is done by the SSL channel whereas with
> > Kerberos or PLAINTEXT the user is providing credentials.  That's why I
> > proposed adding it to the SSL channel instead of SASL_SSL.
> >
> > That being said I guess one option would be to just allow the use of a
> X509
> > callback handler and don't disable client auth when using SASL_SSL.  Then
> > after login have some way to signal it's EXTERNAL mode so it doesn't do
> any
> > other authentication steps.
> >
> > I have a use case where I need dual authentication (both
> username/password
> > and certificate based) and either one would work as multiple LoginModules
> > can be chained together.
> >
> > Chris
> >
> > On Tue, Feb 21, 2017 at 3:06 PM, Harsha Chintalapani 
> > wrote:
> >
> > > Hi Chris,
> > >   Thanks for the KIP. Could you also add details/use-cases for
> > > having X509 certificate based authentication in the context of SASL_SSL.
> > > The reason that we disabled the SSL auth for SASL_SSL is the intent
> > behind
> > > using SASL auth over SSL encryption and users can enforce a
> > > role based auth and have wire encryption for data transfer. If users
> just
> > > want SSL based authentication they have the option to do so via SSL.
> > > I think we are providing too many options of authentication in SASL_SSL
> > > mode, which can be a bit confusing.
> > >
> > > Thanks,
> > > Harsha
> > >
> > >
> > > On Tue, Feb 21, 2017 at 11:23 AM Christopher Shannon <
> > > christopher.l.shan...@gmail.com> wrote:
> > >
> > > Hi everyone
> > >
> > > I have just created KIP-127 to introduce custom JAAS configuration for
> > the
> > > SSL channel:
> > >
> > > *
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 127%3A+Pluggable+JAAS+LoginModule+configuration+for+SSL
> > > <
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 127%3A+Pluggable+JAAS+LoginModule+configuration+for+SSL
> > > >*
> > >
> > > The idea here is to be able to do custom authentication based off of a
> > > user's X509 credentials in addition to the SSL handshake.
> > >
> > > I have created a rough draft of a commit to give an idea of what my
> plan
> > is
> > > which matches the KIP:
> > > https://github.com/cshannon/kafka/tree/KAFKA-4784
> > >
> > > It still 

[jira] [Assigned] (KAFKA-4800) Streams State transition ASCII diagrams need fixing and polishing

2017-02-24 Thread Clemens Valiente (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clemens Valiente reassigned KAFKA-4800:
---

Assignee: Clemens Valiente

> Streams State transition ASCII diagrams need fixing and polishing
> -
>
> Key: KAFKA-4800
> URL: https://issues.apache.org/jira/browse/KAFKA-4800
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
>Assignee: Clemens Valiente
>Priority: Minor
>  Labels: newbie
> Fix For: 0.10.3.0
>
>
> The ASCII transition diagram in KafkaStreams.java on top of "public enum 
> State" does not read well in Javadoc. Also the self-loops to running and 
> rebalancing are not necessary. Same with the StreamThread.java diagram.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: add contributor

2017-02-24 Thread Jun Rao
Hi, Clement,

Thanks for your interest in Kafka. Added you to the contributor list.

Jun

On Fri, Feb 24, 2017 at 6:10 AM, Eno Thereska 
wrote:

> Hello,
>
> Sending on behalf of a user who wants to contribute. Could we add Clement
> Valiente to the contributor list please? Jira user id is "cvaliente".
>
> Thanks
> Eno


Re: [VOTE] KIP-122: Add Reset Consumer Group Offsets tooling

2017-02-24 Thread Matthias J. Sax
+1

On 2/23/17 4:46 PM, Jorge Esteban Quilcate Otoya wrote:
> Hi All,
> 
> It seems that there is no further concern with the KIP-122.
> At this point we would like to start the voting process.
> 
> The KIP can be found here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+Reset+Consumer+Group+Offsets+tooling
> 
> 
> Thanks!
> 
> Jorge.
> 





Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-24 Thread Jorge Esteban Quilcate Otoya
Great! KIP updated.



El vie., 24 feb. 2017 a las 18:22, Matthias J. Sax ()
escribió:

> I like this!
>
> --by-duration and --shift-by
>
>
> -Matthias
>
> On 2/24/17 12:57 AM, Jorge Esteban Quilcate Otoya wrote:
> > Renaming to --by-duration LGTM
> >
> > Not sure about changing it to --shift-by-duration because we could end up
> > with the same redundancy as before with reset: --reset-offsets
> > --reset-to-*.
> >
> > Maybe changing --shift-offset-by to --shift-by 'n' could make it
> consistent
> > enough?
> >
> >
> > El vie., 24 feb. 2017 a las 6:39, Matthias J. Sax (<
> matth...@confluent.io>)
> > escribió:
> >
> >> I just read the updated KIP once more.
> >>
> >> I would suggest to rename --to-duration to --by-duration
> >>
> >> Or as a second idea, rename --to-duration to --shift-by-duration and at
> >> the same time rename --shift-offset-by to --shift-by-offset
> >>
> >> Not sure what the best option is, but naming would be more consistent
> IMHO.
> >>
> >>
> >>
> >> -Matthias
> >>
> >> On 2/23/17 4:42 PM, Jorge Esteban Quilcate Otoya wrote:
> >>> Hi All,
> >>>
> >>> If there are no more concerns, I'd like to start vote for this KIP.
> >>>
> >>> Thanks!
> >>> Jorge.
> >>>
> >>> El jue., 23 feb. 2017 a las 22:50, Jorge Esteban Quilcate Otoya (<
> >>> quilcate.jo...@gmail.com>) escribió:
> >>>
>  Oh ok :)
> 
>  So, we can keep `--topic t1:1,2,3`
> 
>  I think with this one we have most of the feedback applied. I will
> >> update
>  the KIP with this change.
> 
>  El jue., 23 feb. 2017 a las 22:38, Matthias J. Sax (<
> >> matth...@confluent.io>)
>  escribió:
> 
>  Sounds reasonable.
> 
>  If we have multiple --topic arguments, it does also not matter if we
> use
>  t1:1,2 or t1=1,2
> 
>  I just suggested '=' because I wanted to use ':' to chain multiple
> topics.
> 
> 
>  -Matthias
> 
>  On 2/23/17 10:49 AM, Jorge Esteban Quilcate Otoya wrote:
> > Yeap, `--topic t1=1,2`LGTM
> >
> > I don't have an idea either about getting rid of repeated --topic, but
>  --group
> > is also repeated in the case of deletion, so it could be ok to have
> > repeated --topic arguments.
> >
> > El jue., 23 feb. 2017 a las 19:14, Matthias J. Sax (<
>  matth...@confluent.io>)
> > escribió:
> >
> >> So you suggest merging the "scope options" --topics, --topic, and
> >> --partitions into a single option? Sounds good to me.
> >>
> >> I like the compact way to express it, ie,
> topicname:list-of-partitions
> >> with "all partitions" if not partitions are specified. It's quite
> >> intuitive to use.
> >>
> >> Just wondering, if we could get rid of the repeated --topic option;
> >> it's
> >> somewhat verbose. Have no good idea though how to improve it.
> >>
> >> If you concatenate multiple topics, we need one more character that
> is
> >> not allowed in topic names to separate the topics:
> >>
> >>> invalidChars = {'/', '\\', ',', '\u0000', ':', '"', '\'', ';', '*',
> >> '?', ' ', '\t', '\r', '\n', '='};
> >>
> >> maybe
> >>
> >> --topics t1=1,2,3:t2:t3=3
> >>
> >> use '=' to specify partitions (instead of ':' as you proposed) and
> ':'
> >> to separate topics? All other characters seem to be worse to use to
> >> me.
> >> But maybe you have a better idea.
> >>
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 2/23/17 3:15 AM, Jorge Esteban Quilcate Otoya wrote:
> >>> @Matthias about the point 9:
> >>>
> >>> What about keeping only the --topic option, and support this
> format:
> >>>
> >>> `--topic t1:0,1,2 --topic t2 --topic t3:2`
> >>>
> >>> In this case topics t1, t2, and t3 will be selected: topic t1 with
> >>> partitions 0,1 and 2; topic t2 with all its partitions; and topic
> t3,
> >> with
> >>> only partition 2.
> >>>
> >>> Jorge.
> >>>
> >>> El mar., 21 feb. 2017 a las 11:11, Jorge Esteban Quilcate Otoya (<
> >>> quilcate.jo...@gmail.com>) escribió:
> >>>
>  Thanks for the feedback Matthias.
> 
>  * 1. You're right. I'll reorder the scenarios.
> 
>  * 2. Agree. I'll update the KIP.
> 
>  * 3. I like it, updating to `reset-offsets`
> 
>  * 4. Agree, removing the `reset-` part
> 
>  * 5. Yes, 1.e option without --execute or --export will print out
> >> current
>  offset, and the new offset, that will be the same. The use-case of
>  this
>  option is to use it in combination with --export mostly and have a
> >> current
>  'checkpoint' to reset later. I will add to the KIP how the output
>  should
>  looks like.
> 
>  * 6. Considering 4., I will update it to `--to-offset`
> 
>  * 7. I like the idea to unify these options (plus, minus).
>  

[jira] [Commented] (KAFKA-4779) Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883144#comment-15883144
 ] 

ASF GitHub Bot commented on KAFKA-4779:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2589


> Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py
> 
>
> Key: KAFKA-4779
> URL: https://issues.apache.org/jira/browse/KAFKA-4779
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Rajini Sivaram
> Fix For: 0.10.3.0, 0.10.2.1
>
>
> This test failed on 01/29, on both trunk and 0.10.2, error message:
> {noformat}
> The consumer has terminated, or timed out, on node ubuntu@worker3.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 100, in run_produce_consume_validate
> self.stop_producer_and_consumer()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 87, in stop_producer_and_consumer
> self.check_alive()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in check_alive
> raise Exception(msg)
> Exception: The consumer has terminated, or timed out, on node ubuntu@worker3.
> {noformat}
> Looks like the console consumer times out: 
> {noformat}
> [2017-01-30 04:56:00,972] ERROR Error processing message, terminating 
> consumer process:  (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
> at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:90)
> at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
> at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
> at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
> at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> {noformat}
> A bunch of these security_rolling_upgrade tests failed, and in all cases, the 
> producer produced ~15k messages, of which ~7k were acked, and the consumer 
> only got around ~2600 before timing out. 
> There are a lot of messages like the following for different request types on 
> the producer and consumer:
> {noformat}
> [2017-01-30 05:13:35,954] WARN Received unknown topic or partition error in 
> produce request on partition test_topic-0. The topic/partition may not exist 
> or the user may not have Describe access to it 
> (org.apache.kafka.clients.producer.internals.Sender)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2589: KAFKA-4779: Fix security upgrade system test to be...

2017-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2589




[jira] [Resolved] (KAFKA-4779) Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py

2017-02-24 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-4779.

   Resolution: Fixed
Fix Version/s: 0.10.2.1
   0.10.3.0

Issue resolved by pull request 2589
[https://github.com/apache/kafka/pull/2589]

> Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py
> 
>
> Key: KAFKA-4779
> URL: https://issues.apache.org/jira/browse/KAFKA-4779
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Rajini Sivaram
> Fix For: 0.10.3.0, 0.10.2.1
>
>
> This test failed on 01/29, on both trunk and 0.10.2, error message:
> {noformat}
> The consumer has terminated, or timed out, on node ubuntu@worker3.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 100, in run_produce_consume_validate
> self.stop_producer_and_consumer()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 87, in stop_producer_and_consumer
> self.check_alive()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in check_alive
> raise Exception(msg)
> Exception: The consumer has terminated, or timed out, on node ubuntu@worker3.
> {noformat}
> Looks like the console consumer times out: 
> {noformat}
> [2017-01-30 04:56:00,972] ERROR Error processing message, terminating 
> consumer process:  (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
> at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:90)
> at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
> at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
> at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
> at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> {noformat}
> A bunch of these security_rolling_upgrade tests failed, and in all cases, the 
> producer produced ~15k messages, of which ~7k were acked, and the consumer 
> only got around ~2600 before timing out. 
> There are a lot of messages like the following for different request types on 
> the producer and consumer:
> {noformat}
> [2017-01-30 05:13:35,954] WARN Received unknown topic or partition error in 
> produce request on partition test_topic-0. The topic/partition may not exist 
> or the user may not have Describe access to it 
> (org.apache.kafka.clients.producer.internals.Sender)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-24 Thread Matthias J. Sax
I like this!

--by-duration and --shift-by


-Matthias

On 2/24/17 12:57 AM, Jorge Esteban Quilcate Otoya wrote:
> Renaming to --by-duration LGTM
> 
> Not sure about changing it to --shift-by-duration because we could end up
> with the same redundancy as before with reset: --reset-offsets
> --reset-to-*.
> 
> Maybe changing --shift-offset-by to --shift-by 'n' could make it consistent
> enough?
> 
> 
> El vie., 24 feb. 2017 a las 6:39, Matthias J. Sax ()
> escribió:
> 
>> I just read the updated KIP once more.
>>
>> I would suggest to rename --to-duration to --by-duration
>>
>> Or as a second idea, rename --to-duration to --shift-by-duration and at
>> the same time rename --shift-offset-by to --shift-by-offset
>>
>> Not sure what the best option is, but naming would be more consistent IMHO.
>>
>>
>>
>> -Matthias
>>
>> On 2/23/17 4:42 PM, Jorge Esteban Quilcate Otoya wrote:
>>> Hi All,
>>>
>>> If there are no more concerns, I'd like to start vote for this KIP.
>>>
>>> Thanks!
>>> Jorge.
>>>
>>> El jue., 23 feb. 2017 a las 22:50, Jorge Esteban Quilcate Otoya (<
>>> quilcate.jo...@gmail.com>) escribió:
>>>
 Oh ok :)

 So, we can keep `--topic t1:1,2,3`

 I think with this one we have most of the feedback applied. I will
>> update
 the KIP with this change.

 El jue., 23 feb. 2017 a las 22:38, Matthias J. Sax (<
>> matth...@confluent.io>)
 escribió:

 Sounds reasonable.

 If we have multiple --topic arguments, it does also not matter if we use
 t1:1,2 or t1=1,2

 I just suggested '=' because I wanted to use ':' to chain multiple topics.


 -Matthias

 On 2/23/17 10:49 AM, Jorge Esteban Quilcate Otoya wrote:
> Yeap, `--topic t1=1,2`LGTM
>
> I don't have an idea either about getting rid of repeated --topic, but
 --group
> is also repeated in the case of deletion, so it could be ok to have
> repeated --topic arguments.
>
> El jue., 23 feb. 2017 a las 19:14, Matthias J. Sax (<
 matth...@confluent.io>)
> escribió:
>
>> So you suggest merging the "scope options" --topics, --topic, and
>> --partitions into a single option? Sounds good to me.
>>
>> I like the compact way to express it, ie, topicname:list-of-partitions
>> with "all partitions" if not partitions are specified. It's quite
>> intuitive to use.
>>
>> Just wondering, if we could get rid of the repeated --topic option;
>> it's
>> somewhat verbose. Have no good idea though how to improve it.
>>
>> If you concatenate multiple topics, we need one more character that is
>> not allowed in topic names to separate the topics:
>>
>>> invalidChars = {'/', '\\', ',', '\u0000', ':', '"', '\'', ';', '*',
>> '?', ' ', '\t', '\r', '\n', '='};
>>
>> maybe
>>
>> --topics t1=1,2,3:t2:t3=3
>>
>> use '=' to specify partitions (instead of ':' as you proposed) and ':'
>> to separate topics? All other characters seem to be worse to use to
>> me.
>> But maybe you have a better idea.
>>
>>
>>
>> -Matthias
>>
>>
>> On 2/23/17 3:15 AM, Jorge Esteban Quilcate Otoya wrote:
>>> @Matthias about the point 9:
>>>
>>> What about keeping only the --topic option, and support this format:
>>>
>>> `--topic t1:0,1,2 --topic t2 --topic t3:2`
>>>
>>> In this case topics t1, t2, and t3 will be selected: topic t1 with
>>> partitions 0,1 and 2; topic t2 with all its partitions; and topic t3,
>> with
>>> only partition 2.
>>>
>>> Jorge.
>>>
>>> El mar., 21 feb. 2017 a las 11:11, Jorge Esteban Quilcate Otoya (<
>>> quilcate.jo...@gmail.com>) escribió:
>>>
 Thanks for the feedback Matthias.

 * 1. You're right. I'll reorder the scenarios.

 * 2. Agree. I'll update the KIP.

 * 3. I like it, updating to `reset-offsets`

 * 4. Agree, removing the `reset-` part

 * 5. Yes, 1.e option without --execute or --export will print out
>> current
 offset, and the new offset, that will be the same. The use-case of
 this
 option is to use it in combination with --export mostly and have a
>> current
 'checkpoint' to reset later. I will add to the KIP how the output
 should
 looks like.

 * 6. Considering 4., I will update it to `--to-offset`

 * 7. I like the idea to unify these options (plus, minus).
 `shift-offsets-by` is a good option, but I would like some more
 feedback
 here about the name. I will update the KIP in the meantime.

 * 8. Yes, discussed in 9.

 * 9. Agree. I'd love some feedback here. `topic` is already used by
 `delete`, and we can add `--all-topics` to consider all
>> topics/partitions
 assigned to a group. How could we define specific 
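
As a footnote on the scope syntax discussed above (a repeated --topic option
taking `t1:0,1,2`, with no partition list meaning all partitions), a small
parsing sketch; the helper is hypothetical, not part of the KIP:

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    final class TopicScopeParser {
        // Parses arguments like "t1:0,1,2" or "t2"; an empty partition list
        // stands for "all partitions of the topic".
        static Map<String, List<Integer>> parse(List<String> topicArgs) {
            Map<String, List<Integer>> scope = new LinkedHashMap<>();
            for (String arg : topicArgs) {
                int sep = arg.indexOf(':');
                if (sep < 0) {
                    scope.put(arg, new ArrayList<>());
                    continue;
                }
                List<Integer> partitions = new ArrayList<>();
                for (String p : arg.substring(sep + 1).split(",")) {
                    partitions.add(Integer.parseInt(p.trim()));
                }
                scope.put(arg.substring(0, sep), partitions);
            }
            return scope;
        }
    }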

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-24 Thread Rajini Sivaram
Thanks, Jay.

*(1)* The rename from *request.time.percent* to *io.thread.units* for the
quota configuration was based on the change from percent to thread-units,
since we will need different quota configuration for I/O threads and
network threads if we use units. If we agree that *(2)* percent (or ratio)
is a better configuration, then the name can be request.time.percent, with
the same config applying to both request thread utilization and network
thread utilization. Metrics and sensors on the broker side will probably
need to be separate for I/O and network threads so that these can be
accounted separately (5% request.time.percent would mean maximum 5% of
request thread utilization and maximum 5% of network thread utilization
with either violation leading to throttling).

*(3)* Agree - KIP reflects combined throttling time in a single field in
the response.
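
Under that naming, a quota override might end up looking like the line below;
the property name follows this discussion and is not final:

    # hypothetical: cap a user at 5% of request thread time and 5% of
    # network thread time, each measured and enforced separately
    request.time.percent=5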


On Fri, Feb 24, 2017 at 4:10 PM, Jay Kreps  wrote:

> A couple of quick points:
>
> 1. Even though the implementation of this quota is only using io thread
> time, I think we should call it something like "request-time". This will
> give us flexibility to improve the implementation to cover network threads
> in the future and will avoid exposing internal details like our thread
> pools on the server.
>
> 2. Jun/Roger, I get what you are trying to fix but the idea of thread/units
> is super unintuitive as a user-facing knob. I had to read the KIP like
> eight times to understand this. I'm not sure that your point that
> increasing the number of threads is a problem with a percentage-based
> value, it really depends on whether the user thinks about the "percentage
> of request processing time" or "thread units". If they think "I have
> allocated 10% of my request processing time to user x" then it is a bug
> that increasing the thread count decreases that percent as it does in the
> current proposal. As a practical matter I think the only way to actually
> reason about this is as a percent---I just don't believe people are going
> to think, "ah, 4.3 thread units, that is the right amount!". Instead I
> think they have to understand this thread unit concept, figure out what
> they have set in number of threads, compute a percent and then come up with
> the number of thread units, and these will all be wrong if that thread
> count changes. I also think this ties us to throttling the I/O thread pool,
> which may not be where we want to end up.
>
> 3. For what it's worth I do think having a single throttle_ms field in all
> the responses that combines all throttling from all quotas is probably the
> simplest. There could be a use case for having separate fields for each,
> but I think that is actually harder to use/monitor in the common case so
> unless someone has a use case I think just one should be fine.
>
> -Jay
>
> On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram 
> wrote:
>
> > I have updated the KIP based on the discussions so far.
> >
> >
> > Regards,
> >
> > Rajini
> >
> > On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > wrote:
> >
> > > Thank you all for the feedback.
> > >
> > > Ismael #1. It makes sense not to throttle inter-broker requests like
> > > LeaderAndIsr etc. The simplest way to ensure that clients cannot use
> > these
> > > requests to bypass quotas for DoS attacks is to ensure that ACLs
> prevent
> > > clients from using these requests and unauthorized requests are
> included
> > > towards quotas.
> > >
> > > Ismael #2, Jay #1 : I was thinking that these quotas can return a
> > separate
> > > throttle time, and all utilization based quotas could use the same
> field
> > > (we won't add another one for network thread utilization for instance).
> > But
> > > perhaps it makes sense to keep byte rate quotas separate in
> produce/fetch
> > > responses to provide separate metrics? Agree with Ismael that the name
> of
> > > the existing field should be changed if we have two. Happy to switch
> to a
> > > single combined throttle time if that is sufficient.
> > >
> > > Ismael #4, #5, #6: Will update KIP. Will use dot separated name for new
> > > property. Replication quotas use dot separated, so it will be
> consistent
> > > with all properties except byte rate quotas.
> > >
> > > Radai: #1 Request processing time rather than request rate was chosen
> > > because the time per request can vary significantly between requests as
> > > mentioned in the discussion and KIP.
> > > #2 Two separate quotas for heartbeats/regular requests feel like more
> > > configuration and more metrics. Since most users would set quotas
> higher
> > > than the expected usage and quotas are more of a safety net, a single
> > quota
> > > should work in most cases.
> > >  #3 The number of requests in purgatory is limited by the number of
> > active
> > > connections since only one request per connection will be throttled at
> a
> > > time.
> > > #4 As with byte rate quotas, to use the 

[GitHub] kafka pull request #2595: MINOR: Fix typos in javadoc and code comments

2017-02-24 Thread vahidhashemian
GitHub user vahidhashemian opened a pull request:

https://github.com/apache/kafka/pull/2595

MINOR: Fix typos in javadoc and code comments



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka minor/fix_typos_1702

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2595.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2595


commit 8039a9e189a38576bf60f6fad14c78fbd18ed0a8
Author: Vahid Hashemian 
Date:   2017-02-24T16:55:02Z

MINOR: Fix typos in javadoc and code comments




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-24 Thread Michael Pearce
Hi

On 1, how can you guarantee that two separately implemented clients would add the 
headers in the same order, when we are not specifying an order at the protocol level 
(nor should we) with regards to keyA being ordered before keyB? We shouldn’t 
expect keyA to always be set before keyB.

On 2, I believe we have changed the naming based on feedback from Jason 
already, e.g. we don’t have a “get” method that implied O(1) performance, and 
likewise no “put”, but we do have an “append”.

On 3, in the KafkaProducer I think we have mutability already: the value for 
the timestamp is changed if it is null, at the point of send:
“
long timestamp = record.timestamp() == null ? time.milliseconds() : 
record.timestamp();
“

As such the timestamp is effectively already mutable, so what’s the difference 
here? We already have some mixed semantics on timestamps: e.g. currently if I 
send two records with the timestamp not set, the timestamp value in the wire 
binary sent would be different for each, so we already have mutation for the 
same record.

On 4, I think we should expect not just 1 or 2 headers but in fact tens of 
headers. This is the concern with immutable headers: whilst the append 
self-reference works nicely, what if someone needs to remove a header?

Trying to get this moving:

If we really want immutable headers, and essentially you guys won't give a +1 
without them:

What's the feeling on adding methods to ProducerRecord that do the boilerplate 
of creating a new ProducerRecord with the altered headers (appended or removed) 
inside? E.g.

ProducerRecord {


 ProducerRecord append(Iterable<Header> headersToAppend){
return new ProducerRecord(key, value, headers.append(headersToAppend), ….)
 }

 ProducerRecord remove(Iterable<Header> headersToRemove){
return new ProducerRecord(key, value, headers.remove(headersToRemove), ….)
 }

}

Where the headers methods actually return new objects, and the producer record 
methods create a new producer record with all the current values, but with the 
newly modified headers.

Then interceptors / other code would return this new object?
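
For illustration, an interceptor under this copy-on-write style would derive a 
new record instead of mutating its input. The sketch below assumes the 
hypothetical append(...) method and Header type from above; neither is part of 
any released Kafka API.

// Sketch only: ProducerRecord.append(...) and this Header constructor are the
// proposed API from this thread, not released Kafka classes.
import java.util.Collections;

public class LineageInterceptorSketch {
    public <K, V> ProducerRecord<K, V> onSend(ProducerRecord<K, V> record) {
        Header hop = new Header("lineage.hop", "service-a".getBytes());
        // Returns a NEW record carrying one extra header; the input is untouched.
        return record.append(Collections.singletonList(hop));
    }
}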


Cheers
Mike






On 24/02/2017, 16:02, "isma...@gmail.com on behalf of Ismael Juma" 
 wrote:

Hi Michael,

Did you mean that you were happy to compromise to keep it mutable or
immutable? You wrote the former, but it sounded from the sentence that it
could have been a typo. So, my thoughts on this are that there are a few
things to take into account:

1. Semantics
2. Simplicity of use (the common operations should be easy to do)
3. If it's easy to reason about and safe (immutability helps with this)
4. Efficiency (both memory and CPU usage)

Regarding 1, I think it would be good to be very clear about the guarantees
that we are providing. It seems that we are saying that keys are unordered,
but what about the case where there are multiple values for the same key?
It seems that for some use cases (e.g. lineage), it may be useful to add
values to the same key while preserving the order.

Regarding 2, I agree that it's useful to have methods in `Headers` for the
very common use cases although we have to be careful with the naming to
avoid giving the wrong impression. Also, when it comes to `Map.Entry`, I
think I'd prefer a `toMap` method that simply gives the user a Map if
that's what they want (it makes it clear that there's a conversion
happening).

Regarding 3, the concern I have if we make the headers mutable is that it
seems to introduce some inconsistencies and potential edge cases. At the
moment, it's safe to keep a ProducerRecord and resend it, for example. If
the record is mutated by an interceptor, then this can lead to weird
behaviour. Also, it seems inconsistent that one has to create a new
ProducerRecord to modify the record timestamp, but that one has to mutate
the record to add headers. It seems like one should either embrace
immutability or mutability, but mixing both is not ideal.

Regarding 4, for the cases where there are a small number of headers and/or
1 or 2 interceptors, it doesn't seem difficult to come up with reasonable
implementations for mutable and immutable cases that do a good enough job.
However, if the number of headers and interceptors is large, then care is
needed for the immutable case to avoid unnecessary copying. A simple option
that adds little memory overhead if we have Header instances is to simply
add a self-reference to the previous Header in the linked list.

Ismael

On Thu, Feb 23, 2017 at 1:09 AM, Michael Pearce 
wrote:

> I'm happy to compromise to keep it mutable but move to an append-style API
> (as in Guava Iterables.concat).
>
> class Headers {
>    Headers append(Iterable<Header> headers);
> }
>
>
> I don’t think we’d want prepend, this would give the 

[jira] [Created] (KAFKA-4801) Transient test failure (part 2): ConsumerBounceTest.testConsumptionWithBrokerFailures

2017-02-24 Thread Armin Braun (JIRA)
Armin Braun created KAFKA-4801:
--

 Summary: Transient test failure (part 2): 
ConsumerBounceTest.testConsumptionWithBrokerFailures
 Key: KAFKA-4801
 URL: https://issues.apache.org/jira/browse/KAFKA-4801
 Project: Kafka
  Issue Type: Sub-task
Reporter: Armin Braun
Priority: Minor


There is still some instability left in the test (though very little ... when 
reproducing this you statistically need more than 100 runs in half the cases):

{code}
ConsumerBounceTest.testConsumptionWithBrokerFailures
{code}

This results in the following exception being thrown at a relatively low rate 
(I'd say definitely less than 0.5% of all runs on my machine):

{code}
kafka.api.ConsumerBounceTest > testConsumptionWithBrokerFailures FAILED
java.lang.IllegalArgumentException: You can only check the position for 
partitions assigned to this consumer.
at 
org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1271)
at 
kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:96)
at 
kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:69)
{code}

this was also reported in a comment to the original KAFKA-4198

https://issues.apache.org/jira/browse/KAFKA-4198?focusedCommentId=15765468=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15765468
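
One way the test could be hardened (a sketch only, not necessarily the fix that 
will land; the consumer type is assumed here) is to query positions only for 
partitions the consumer still owns, so a rebalance between poll() and 
position() cannot throw:

{code}
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

final class PositionGuard {
    // Sketch: only query positions for partitions this consumer still owns.
    static long safePosition(KafkaConsumer<byte[], byte[]> consumer, TopicPartition tp) {
        return consumer.assignment().contains(tp) ? consumer.position(tp) : -1L;
    }
}
{code}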






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-24 Thread Jay Kreps
A couple of quick points:

1. Even though the implementation of this quota is only using io thread
time, i think we should call it something like "request-time". This will
give us flexibility to improve the implementation to cover network threads
in the future and will avoid exposing internal details like our thread
pools on the server.

2. Jun/Roger, I get what you are trying to fix but the idea of thread/units
is super unintuitive as a user-facing knob. I had to read the KIP like
eight times to understand this. I'm not sure about your point that
increasing the number of threads is a problem with a percentage-based
value; it really depends on whether the user thinks about the "percentage
of request processing time" or "thread units". If they think "I have
allocated 10% of my request processing time to user x" then it is a bug
that increasing the thread count decreases that percent as it does in the
current proposal. As a practical matter I think the only way to actually
reason about this is as a percent---I just don't believe people are going
to think, "ah, 4.3 thread units, that is the right amount!". Instead I
think they have to understand this thread unit concept, figure out what
they have set in number of threads, compute a percent and then come up with
the number of thread units, and these will all be wrong if that thread
count changes. I also think this ties us to throttling the I/O thread pool,
which may not be where we want to end up.
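
To make the arithmetic concrete (hypothetical numbers, not from the KIP): with
8 I/O threads, a quota of 0.8 thread units is 10% of request processing time;
double the pool to 16 threads and the same quota silently becomes 5%.

// Hypothetical numbers illustrating the point above: a fixed thread-unit
// quota represents a shrinking share of request time as the pool grows.
public class ThreadUnitExample {
    public static void main(String[] args) {
        double quotaThreadUnits = 0.8;            // fixed user quota
        for (int ioThreads : new int[] {8, 16}) {
            double percent = 100.0 * quotaThreadUnits / ioThreads;
            System.out.printf("io threads = %d -> user share = %.1f%%%n",
                    ioThreads, percent);
        }
    }
}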

3. For what it's worth I do think having a single throttle_ms field in all
the responses that combines all throttling from all quotas is probably the
simplest. There could be a use case for having separate fields for each,
but I think that is actually harder to use/monitor in the common case so
unless someone has a use case I think just one should be fine.

-Jay

On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram 
wrote:

> I have updated the KIP based on the discussions so far.
>
>
> Regards,
>
> Rajini
>
> On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram 
> wrote:
>
> > Thank you all for the feedback.
> >
> > Ismael #1. It makes sense not to throttle inter-broker requests like
> > LeaderAndIsr etc. The simplest way to ensure that clients cannot use
> these
> > requests to bypass quotas for DoS attacks is to ensure that ACLs prevent
> > clients from using these requests and unauthorized requests are included
> > towards quotas.
> >
> > Ismael #2, Jay #1 : I was thinking that these quotas can return a
> separate
> > throttle time, and all utilization based quotas could use the same field
> > (we won't add another one for network thread utilization for instance).
> But
> > perhaps it makes sense to keep byte rate quotas separate in produce/fetch
> > responses to provide separate metrics? Agree with Ismael that the name of
> > the existing field should be changed if we have two. Happy to switch to a
> > single combined throttle time if that is sufficient.
> >
> > Ismael #4, #5, #6: Will update KIP. Will use dot separated name for new
> > property. Replication quotas use dot separated, so it will be consistent
> > with all properties except byte rate quotas.
> >
> > Radai: #1 Request processing time rather than request rate were chosen
> > because the time per request can vary significantly between requests as
> > mentioned in the discussion and KIP.
> > #2 Two separate quotas for heartbeats/regular requests feel like more
> > configuration and more metrics. Since most users would set quotas higher
> > than the expected usage and quotas are more of a safety net, a single
> quota
> > should work in most cases.
> >  #3 The number of requests in purgatory is limited by the number of
> active
> > connections since only one request per connection will be throttled at a
> > time.
> > #4 As with byte rate quotas, to use the full allocated quotas,
> > clients/users would need to use partitions that are distributed across
> the
> > cluster. The alternative of using cluster-wide quotas instead of
> per-broker
> > quotas would be far too complex to implement.
> >
> > Dong : We currently have two ClientQuotaManagers for quota types Fetch
> and
> > Produce. A new one will be added for IOThread, which manages quotas for
> I/O
> > thread utilization. This will not update the Fetch or Produce queue-size,
> > but will have a separate metric for the queue-size.  I wasn't planning to
> > add any additional metrics apart from the equivalent ones for existing
> > quotas as part of this KIP. Ratio of byte-rate to I/O thread utilization
> > could be slightly misleading since it depends on the sequence of
> requests.
> > But we can look into more metrics after the KIP is implemented if
> required.
> >
> > I think we need to limit the maximum delay since all requests are
> > throttled. If a client has a quota of 0.001 units and a single request
> used
> > 50ms, we don't want to delay all requests from the client by 50 seconds,
> > throwing the 

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-24 Thread Ismael Juma
Hi Michael,

Did you mean that you were happy to compromise to keep it mutable or
immutable? You wrote the former, but it sounded from the sentence that it
could have been a typo. So, my thoughts on this are that there are a few
things to take into account:

1. Semantics
2. Simplicity of use (the common operations should be easy to do)
3. If it's easy to reason about and safe (immutability helps with this)
4. Efficiency (both memory and CPU usage)

Regarding 1, I think it would be good to be very clear about the guarantees
that we are providing. It seems that we are saying that keys are unordered,
but what about the case where there are multiple values for the same key?
It seems that for some use cases (e.g. lineage), it may be useful to add
values to the same key while preserving the order.

Regarding 2, I agree that it's useful to have methods in `Headers` for the
very common use cases although we have to be careful with the naming to
avoid giving the wrong impression. Also, when it comes to `Map.Entry`, I
think I'd prefer a `toMap` method that simply gives the user a Map if
that's what they want (it makes it clear that there's a conversion
happening).

Regarding 3, the concern I have if we make the headers mutable is that it
seems to introduce some inconsistencies and potential edge cases. At the
moment, it's safe to keep a ProducerRecord and resend it, for example. If
the record is mutated by an interceptor, then this can lead to weird
behaviour. Also, it seems inconsistent that one has to create a new
ProducerRecord to modify the record timestamp, but that one has to mutate
the record to add headers. It seems like one should either embrace
immutability or mutability, but mixing both is not ideal.

Regarding 4, for the cases where there are a small number of headers and/or
1 or 2 interceptors, it doesn't seem difficult to come up with reasonable
implementations for mutable and immutable cases that do a good enough job.
However, if the number of headers and interceptors is large, then care is
needed for the immutable case to avoid unnecessary copying. A simple option
that adds little memory overhead if we have Header instances is to simply
add a self-reference to the previous Header in the linked list.
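
As a rough sketch of that self-reference idea (purely illustrative, not the
API under discussion): append can be O(1) with structural sharing if each node
points at the previous head.

// Sketch: a persistent list where append shares all existing nodes, so
// interceptors can derive new Headers without copying the old ones.
final class ImmutableHeaders {
    final String key;
    final byte[] value;
    final ImmutableHeaders previous;   // self-reference to the prior head

    static final ImmutableHeaders EMPTY = new ImmutableHeaders(null, null, null);

    private ImmutableHeaders(String key, byte[] value, ImmutableHeaders previous) {
        this.key = key;
        this.value = value;
        this.previous = previous;
    }

    ImmutableHeaders append(String key, byte[] value) {
        return new ImmutableHeaders(key, value, this);   // old chain untouched
    }
}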

Ismael

On Thu, Feb 23, 2017 at 1:09 AM, Michael Pearce 
wrote:

> I'm happy to compromise to keep it mutable but move to an append-style API
> (as in Guava Iterables.concat).
>
> class Headers {
>    Headers append(Iterable<Header> headers);
> }
>
>
> I don’t think we’d want prepend; this would give the idea of guaranteed
> ordering, when in actual fact we don’t provide that guarantee (e.g. one
> client can put headerA then headerB, but another could put headerB then
> headerA, and this shouldn’t cause issues). Also, what if we changed to a
> hashmap for the internal implementation? It's just a bucket of entries with
> no ordering. I think we just need to provide an api to add/append headers.
>
> This ok? If so ill update KIP to record this.
>
> Cheers
> Mike
>
> On 23/02/2017, 00:37, "Jason Gustafson"  wrote:
>
> The point about usability is fair. It's also reasonable to expect that
> common use cases such as appending headers should be done efficiently.
>
> Perhaps we could compromise with something like this?
>
> class Headers {
>  Headers append(Iterable<Header> headers);
>  Headers prepend(Iterable<Header> headers);
> }
>
> That retains ease of use while still giving ourselves some flexibility in
> the implementation.
>
> -Jason
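
One conceivable backing for that compromise interface (a sketch; Header is
assumed here to be a simple key/value pair, and none of this is committed API):

import java.util.ArrayList;
import java.util.List;

class SimpleHeaders {
    private final List<Header> entries = new ArrayList<>();

    SimpleHeaders append(Iterable<Header> headers) {
        for (Header h : headers)
            entries.add(h);             // new entries go at the tail
        return this;
    }

    SimpleHeaders prepend(Iterable<Header> headers) {
        int i = 0;
        for (Header h : headers)
            entries.add(i++, h);        // new entries go at the head, in order
        return this;
    }
}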
>
>
> On Wed, Feb 22, 2017 at 3:03 PM, Michael Pearce  >
> wrote:
>
> > I wasn’t referring to the headers needing to be copied; I mean the fact
> > that we’d be forcing a new producer record to be created, with all the
> > contents copied.
> >
> > i.e. what will happen is that a utility method will be created, or end up
> > being used, which does this and returns the new ProducerRecord instance:
> >
> > ProducerRecord addHeader(ProducerRecord record, Header header){
> >     return new ProducerRecord(record.key, record.value, record.timestamp…..,
> > record.headers.concat(header))
> > }
> >
> > To me this seems ugly, but it will be inevitable if we don’t make adding
> > headers to existing records a simple, clean method call.
> >
> >
> >
> > On 22/02/2017, 22:57, "Michael Pearce" 
> wrote:
> >
> > Lazy init can achieve/avoid that.
> >
> > Re the concat: why don’t we implement that inside Headers rather than
> > making everyone implement it themselves, since adding headers in
> > interceptors will be a dominant use case. We want a user-friendly API.
> > As a user, having to code this instead of having the headers handle it
> > for me seems redundant.
> >
> > On 22/02/2017, 22:34, "Jason Gustafson" 
> wrote:
> >

[GitHub] kafka pull request #2594: MINOR: add code quality checks to checkstyle.xml. ...

2017-02-24 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2594

MINOR: add code quality checks to checkstyle.xml. Also add suppressions

Add code quality/complexity checks to checkstyle

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka checkstyle

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2594.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2594


commit 16d00bb36eddf53b2bd29cc27ec532d80455fb29
Author: Damian Guy 
Date:   2017-02-24T15:46:43Z

add code quality checks to checkstyle.xml. Also add suppressions




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Broaden the utilization of Kafka by being more user friendly

2017-02-24 Thread Werner Daehn
I have a couple of ideas about Kafka and would love to get your input:
tell me if I am totally off, if it is possible today without me noticing,
or whether you embrace or disregard the ideas.

Statement 1: A typical scenario for Kafka is coupling an on-premise system
with a Kafka-based cloud application. Today Kafka requires a direct
connection, but the on-prem side allows http only.

Kafka Rest is obviously one solution, but it is slow and has overhead.
Suggestion: In Kafka Connect add the ability to run connectors on-prem.
Each opens one long running http connection with the Kafka server streaming
the data. Or maybe a WebSocket as a bridge.


Statement 2: For above use cases you need to support security and
multi-tenant.

Suggestion: Above http connection should validate that a connector has the
proper certificate and is entitled to write into a Kafka topic.


Statement 3: Connect needs a UI so it can be configured by a non-Kafka user.

Suggestion: The connectors are hosted by a Tomcat which also acts as the
configuration, administration and monitoring UI for the on-prem side.


I have a few more thoughts about metadata (schema registry), a mapping
(Kafka Connect Transform) UI,... but above would be the most pressing
points for me.

Thanks in advance


[GitHub] kafka pull request #2592: MINOR: fixed javadoc typo in KafkaProducer::partit...

2017-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2592


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-127: Pluggable JAAS LoginModule configuration for SSL

2017-02-24 Thread Ismael Juma
Hi Christopher,

Thanks for the KIP. I have two comments:

1. Can you please explain in the KIP (maybe in the Rejected Alternatives
section) why a PrincipalBuilder is not good enough for your use case? This
is typically what people use to customise authentication for the TLS case.

2. You mention that you have a use case where you need dual authentication
(username/password and certificate based). Could you not achieve this via
two separate listeners? username/password could be via a listener
configured to use SASL and certificate-based could be via a listener
configured to use TLS.
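
For context, the PrincipalBuilder route looks roughly like the sketch below.
It is written against the PrincipalBuilder interface current at the time; the
exact signatures may differ between versions, so treat it as an illustration
only.

import java.security.Principal;
import java.util.Map;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.network.Authenticator;
import org.apache.kafka.common.network.TransportLayer;
import org.apache.kafka.common.security.auth.PrincipalBuilder;

// Sketch: derive the Kafka principal from the TLS peer principal, i.e. the
// subject of the client certificate presented during the handshake.
public class DnPrincipalBuilder implements PrincipalBuilder {
    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public Principal buildPrincipal(TransportLayer transportLayer,
                                    Authenticator authenticator) throws KafkaException {
        try {
            return transportLayer.peerPrincipal();
        } catch (Exception e) {
            throw new KafkaException("Failed to build principal", e);
        }
    }

    @Override
    public void close() throws KafkaException { }
}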

Ismael

On Tue, Feb 21, 2017 at 8:23 PM, Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> Thanks for the feedback Harsha.
>
> Can you clarify what you mean for the use cases for SASL_SSL and X509?  My
> proposal is to only have X509 based pluggable authentication for the SSL
> channel and not SASL_SSL.  I suppose you could use X509 credentials with
> SASL_SSL but the authentication mode would probably need to be SASL
> EXTERNAL as the authentication is done by the SSL channel, whereas with
> Kerberos or PLAINTEXT the user is providing credentials.  That's why I
> proposed adding it to the SSL channel instead of SASL_SSL.
>
> That being said, I guess one option would be to just allow the use of an X509
> callback handler and don't disable client auth when using SASL_SSL.  Then
> after login have some way to signal it's EXTERNAL mode so it doesn't do any
> other authentication steps.
>
> I have a use case where I need dual authentication (both username/password
> and certificate based) and either one would work as multiple LoginModules
> can be chained together.
>
> Chris
>
> On Tue, Feb 21, 2017 at 3:06 PM, Harsha Chintalapani 
> wrote:
>
> > Hi Chris,
> >   Thanks for the KIP. Could you also add details/use-cases for
> > having X509 certificate based authentication in the context of SASL_SSL?
> > The reason we disabled SSL auth for SASL_SSL is the intent behind
> > using SASL auth over SSL encryption: a user can enforce role-based auth
> > and have wire encryption for data transfer. If users just want SSL-based
> > authentication they have the option to do so via SSL. I think we are
> > providing too many options of authentication in SASL_SSL mode, which can
> > be a bit confusing.
> >
> > Thanks,
> > Harsha
> >
> >
> > On Tue, Feb 21, 2017 at 11:23 AM Christopher Shannon <
> > christopher.l.shan...@gmail.com> wrote:
> >
> > Hi everyone
> >
> > I have just created KIP-127 to introduce custom JAAS configuration for
> the
> > SSL channel:
> >
> > *
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 127%3A+Pluggable+JAAS+LoginModule+configuration+for+SSL
> > <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 127%3A+Pluggable+JAAS+LoginModule+configuration+for+SSL
> > >*
> >
> > The idea here is to be able to do custom authentication based off of a
> > user's X509 credentials in addition to the SSL handshake.
> >
> > I have created a rough draft of a commit to give an idea of what my plan
> is
> > which matches the KIP:
> > https://github.com/cshannon/kafka/tree/KAFKA-4784
> >
> > It still needs some work (needs more tests for example) but I wanted to
> get
> > some feedback before I went any farther on this and do a pull request.
> >
> > Thanks,
> > Chris
> >
>


add contributor

2017-02-24 Thread Eno Thereska
Hello,

Sending on behalf of a user who wants to contribute. Could we add Clement 
Valiente to the contributor list please? The Jira user id is "cvaliente".

Thanks
Eno

[jira] [Created] (KAFKA-4800) Streams State transition ASCII diagrams need fixing and polishing

2017-02-24 Thread Eno Thereska (JIRA)
Eno Thereska created KAFKA-4800:
---

 Summary: Streams State transition ASCII diagrams need fixing and 
polishing
 Key: KAFKA-4800
 URL: https://issues.apache.org/jira/browse/KAFKA-4800
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.2.0
Reporter: Eno Thereska
Priority: Minor
 Fix For: 0.10.3.0


The ASCII transition diagram in KafkaStreams.java on top of "public enum State" 
does not read well in Javadoc. Also the self-loops to running and rebalancing 
are not necessary. Same with the StreamThread.java diagram.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2578: MINOR: ConfigKey variable renamed

2017-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2578


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-127: Pluggable JAAS LoginModule configuration for SSL

2017-02-24 Thread Rajini Sivaram
Christopher,

1. We currently disable ssl.client.auth for SASL_SSL. This was done at a
time when a broker with two listeners, SSL and SASL_SSL, had no way of
specifying two different values for the property (e.g. ssl.client.auth=required
for SSL and ssl.client.auth=none for SASL_SSL). With the changes under
KIP-103, this is no longer a limitation, so we can stop overriding
ssl.client.auth for SASL_SSL.
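
A sketch of the kind of per-listener configuration this enables (the property
naming below is my assumption for illustration, not a quote from KIP-103):

listeners=SSL://:9093,SASL_SSL://:9094
listener.name.ssl.ssl.client.auth=required
listener.name.sasl_ssl.ssl.client.auth=none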

2. Now you want to do custom authentication for SSL with access to the
client certificates. I think you should consider doing this as a SASL
mechanism - EXTERNAL makes sense as you suggested. Perhaps you would need
changes to login code, but it may be possible to work with just custom
callbacks. KIP-86 (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-86%3A+Configurable+SASL+callback+handlers)
might help, though this has not yet gone through the approval process. It
may need some changes for your specific use case.


On Fri, Feb 24, 2017 at 11:32 AM, Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> Does anyone else have any comments or suggestions on this?
>
> On Tue, Feb 21, 2017 at 4:05 PM, Christopher Shannon <
> christopher.l.shan...@gmail.com> wrote:
>
> > I should mention another reason I went with adding this enhancement to
> the
> > SSL channel instead of SASL_SSL is that as you can see from my sample
> > commit, I had to delay the JAAS LoginManager from getting loaded until
> the
> > authenticate() call in SslServerAuthenticator in order to make sure that
> > the SSL handshake was done because loading the LoginManager does the
> actual
> > login() call and requires the X509 callback handler.
> >
> > The SASL_SSL implementation loads the LoginManager during the configure
> in
> > SaslChannelBuilder which is too early as the X509 credentials won't be
> > available yet without the handshake being completed so this would require
> > some refactoring to get this to work properly and load at the right
> time.
> >
> > On Tue, Feb 21, 2017 at 3:44 PM, Christopher Shannon <
> > christopher.l.shan...@gmail.com> wrote:
> >
> >> As a follow up to my previous post, EXTERNAL could be added to the list
> >> of mechanisms supported with the existing property:
> sasl.enabled.mechanisms
> >> so I think this could also be achieved with SASL_SSL.  If EXTERNAL is
> used
> >> then it would not disable the client certificate from being required.
> >>
> >> So I can go either way on this, I can update my KIP to allow X509
> >> authentication with SASL_SSL through the EXTERNAL mechanism or keep the
> >> proposal as is for the SSL channel based on what everyone thinks.
> >>
> >> On Tue, Feb 21, 2017 at 3:23 PM, Christopher Shannon <
> >> christopher.l.shan...@gmail.com> wrote:
> >>
> >>> Thanks for the feedback Harsha.
> >>>
> >>> Can you clarify what you mean for the use cases for SASL_SSL and X509?
> >>> My proposal is to only have X509 based pluggable authentication for
> the SSL
> >>> channel and not SASL_SSL.  I suppose you could use X509 credentials
> with
> >>> SASL_SSL but the authentication mode would probably need to be SASL
> >>> EXTERNAL as the authentication is done by the SSL channel, whereas with
> >>> Kerberos or PLAINTEXT the user is providing credentials.  That's why I
> >>> proposed adding it to the SSL channel instead of SASL_SSL.
> >>>
> >>> That being said, I guess one option would be to just allow the use of an
> >>> X509 callback handler and don't disable client auth when using
> SASL_SSL.
> >>> Then after login have some way to signal it's EXTERNAL mode so it
> doesn't
> >>> do any other authentication steps.
> >>>
> >>> I have a use case where I need dual authentication (both
> >>> username/password and certificate based) and either one would work as
> >>> multiple LoginModules can be chained together.
> >>>
> >>> Chris
> >>>
> >>> On Tue, Feb 21, 2017 at 3:06 PM, Harsha Chintalapani 
> >>> wrote:
> >>>
>  Hi Chris,
>    Thanks for the KIP. Could you also add details/use-cases for
>  having X509 certificate based authentication in the context of SASL_SSL?
>  The reason we disabled SSL auth for SASL_SSL is the intent behind
>  using SASL auth over SSL encryption: a user can enforce role-based auth
>  and have wire encryption for data transfer. If users just want SSL-based
>  authentication they have the option to do so via SSL. I think we are
>  providing too many options of authentication in SASL_SSL mode, which can
>  be a bit confusing.
> 
>  Thanks,
>  Harsha
> 
> 
>  On Tue, Feb 21, 2017 at 11:23 AM Christopher Shannon <
>  christopher.l.shan...@gmail.com> wrote:
> 
>  Hi everyone
> 
>  I have just created KIP-127 to introduce custom JAAS configuration for
>  the
>  SSL channel:
> 
>  *
>  https://cwiki.apache.org/confluence/display/KAFKA/KIP-127%3A
>  +Pluggable+JAAS+LoginModule+configuration+for+SSL
>  <
>  

[jira] [Resolved] (KAFKA-4198) Transient test failure: ConsumerBounceTest.testConsumptionWithBrokerFailures

2017-02-24 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-4198.

   Resolution: Fixed
Fix Version/s: 0.10.2.1
   0.10.3.0

> Transient test failure: ConsumerBounceTest.testConsumptionWithBrokerFailures
> 
>
> Key: KAFKA-4198
> URL: https://issues.apache.org/jira/browse/KAFKA-4198
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ismael Juma
>Assignee: Armin Braun
>  Labels: transient-unit-test-failure
> Fix For: 0.10.3.0, 0.10.2.1
>
>
> The issue seems to be that we call `startup` while `shutdown` is still taking 
> place.
> {code}
> java.lang.AssertionError: expected:<107> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> kafka.api.ConsumerBounceTest$$anonfun$consumeWithBrokerFailures$2.apply(ConsumerBounceTest.scala:91)
>   at 
> kafka.api.ConsumerBounceTest$$anonfun$consumeWithBrokerFailures$2.apply(ConsumerBounceTest.scala:90)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:90)
>   at 
> kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:70)
> {code}
> {code}
> java.lang.IllegalStateException: Kafka server is still shutting down, cannot 
> re-start!
>   at kafka.server.KafkaServer.startup(KafkaServer.scala:184)
>   at 
> kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply$mcVI$sp(KafkaServerTestHarness.scala:117)
>   at 
> kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
>   at 
> kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>   at scala.collection.immutable.Range.foreach(Range.scala:160)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>   at 
> kafka.integration.KafkaServerTestHarness$class.restartDeadBrokers(KafkaServerTestHarness.scala:116)
>   at 
> kafka.api.ConsumerBounceTest.restartDeadBrokers(ConsumerBounceTest.scala:34)
>   at 
> kafka.api.ConsumerBounceTest$BounceBrokerScheduler.doWork(ConsumerBounceTest.scala:158)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: kafka-trunk-jdk8 #1298

2017-02-24 Thread Apache Jenkins Server
See 


Changes:

[ismael] KAFKA-4198; Fix race condition in KafkaServer.shutdown()

--
[...truncated 305.90 KB...]
at 
org.gradle.internal.operations.DefaultBuildOperationWorkerRegistry.doStartOperation(DefaultBuildOperationWorkerRegistry.java:65)
at 
org.gradle.internal.operations.DefaultBuildOperationWorkerRegistry.access$400(DefaultBuildOperationWorkerRegistry.java:30)
at 
org.gradle.internal.operations.DefaultBuildOperationWorkerRegistry$DefaultOperation.operationStart(DefaultBuildOperationWorkerRegistry.java:163)
at 
org.gradle.api.internal.tasks.testing.worker.ForkingTestClassProcessor.processTestClass(ForkingTestClassProcessor.java:68)
at 
org.gradle.api.internal.tasks.testing.processors.RestartEveryNTestClassProcessor.processTestClass(RestartEveryNTestClassProcessor.java:47)
at sun.reflect.GeneratedMethodAccessor240.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.FailureHandlingDispatch.dispatch(FailureHandlingDispatch.java:29)
at 
org.gradle.internal.dispatch.AsyncDispatch.dispatchMessages(AsyncDispatch.java:132)
at 
org.gradle.internal.dispatch.AsyncDispatch.access$000(AsyncDispatch.java:33)
at 
org.gradle.internal.dispatch.AsyncDispatch$1.run(AsyncDispatch.java:72)
at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54)
at 
org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Cannot nest operations in the same thread. Each nested operation must run in 
its own thread.
java.lang.UnsupportedOperationException: Cannot nest operations in the same 
thread. Each nested operation must run in its own thread.
at 
org.gradle.internal.operations.DefaultBuildOperationWorkerRegistry.doStartOperation(DefaultBuildOperationWorkerRegistry.java:65)
at 
org.gradle.internal.operations.DefaultBuildOperationWorkerRegistry.access$400(DefaultBuildOperationWorkerRegistry.java:30)
at 
org.gradle.internal.operations.DefaultBuildOperationWorkerRegistry$DefaultOperation.operationStart(DefaultBuildOperationWorkerRegistry.java:163)
at 
org.gradle.api.internal.tasks.testing.worker.ForkingTestClassProcessor.processTestClass(ForkingTestClassProcessor.java:68)
at 
org.gradle.api.internal.tasks.testing.processors.RestartEveryNTestClassProcessor.processTestClass(RestartEveryNTestClassProcessor.java:47)
at sun.reflect.GeneratedMethodAccessor240.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.FailureHandlingDispatch.dispatch(FailureHandlingDispatch.java:29)
at 
org.gradle.internal.dispatch.AsyncDispatch.dispatchMessages(AsyncDispatch.java:132)
at 
org.gradle.internal.dispatch.AsyncDispatch.access$000(AsyncDispatch.java:33)
at 
org.gradle.internal.dispatch.AsyncDispatch$1.run(AsyncDispatch.java:72)
at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54)
at 
org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Cannot nest operations in the same thread. Each nested operation must run in 
its own thread.
java.lang.UnsupportedOperationException: Cannot nest operations in the same 
thread. Each nested operation must run in its own thread.
at 
org.gradle.internal.operations.DefaultBuildOperationWorkerRegistry.doStartOperation(DefaultBuildOperationWorkerRegistry.java:65)
at 
org.gradle.internal.operations.DefaultBuildOperationWorkerRegistry.access$400(DefaultBuildOperationWorkerRegistry.java:30)
at 

[jira] [Assigned] (KAFKA-4198) Transient test failure: ConsumerBounceTest.testConsumptionWithBrokerFailures

2017-02-24 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-4198:
--

Assignee: Armin Braun

> Transient test failure: ConsumerBounceTest.testConsumptionWithBrokerFailures
> 
>
> Key: KAFKA-4198
> URL: https://issues.apache.org/jira/browse/KAFKA-4198
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ismael Juma
>Assignee: Armin Braun
>  Labels: transient-unit-test-failure
>
> The issue seems to be that we call `startup` while `shutdown` is still taking 
> place.
> {code}
> java.lang.AssertionError: expected:<107> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> kafka.api.ConsumerBounceTest$$anonfun$consumeWithBrokerFailures$2.apply(ConsumerBounceTest.scala:91)
>   at 
> kafka.api.ConsumerBounceTest$$anonfun$consumeWithBrokerFailures$2.apply(ConsumerBounceTest.scala:90)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:90)
>   at 
> kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:70)
> {code}
> {code}
> java.lang.IllegalStateException: Kafka server is still shutting down, cannot 
> re-start!
>   at kafka.server.KafkaServer.startup(KafkaServer.scala:184)
>   at 
> kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply$mcVI$sp(KafkaServerTestHarness.scala:117)
>   at 
> kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
>   at 
> kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>   at scala.collection.immutable.Range.foreach(Range.scala:160)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>   at 
> kafka.integration.KafkaServerTestHarness$class.restartDeadBrokers(KafkaServerTestHarness.scala:116)
>   at 
> kafka.api.ConsumerBounceTest.restartDeadBrokers(ConsumerBounceTest.scala:34)
>   at 
> kafka.api.ConsumerBounceTest$BounceBrokerScheduler.doWork(ConsumerBounceTest.scala:158)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-24 Thread Rajini Sivaram
I have updated the KIP based on the discussions so far.


Regards,

Rajini

On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram 
wrote:

> Thank you all for the feedback.
>
> Ismael #1. It makes sense not to throttle inter-broker requests like
> LeaderAndIsr etc. The simplest way to ensure that clients cannot use these
> requests to bypass quotas for DoS attacks is to ensure that ACLs prevent
> clients from using these requests and unauthorized requests are included
> towards quotas.
>
> Ismael #2, Jay #1 : I was thinking that these quotas can return a separate
> throttle time, and all utilization based quotas could use the same field
> (we won't add another one for network thread utilization for instance). But
> perhaps it makes sense to keep byte rate quotas separate in produce/fetch
> responses to provide separate metrics? Agree with Ismael that the name of
> the existing field should be changed if we have two. Happy to switch to a
> single combined throttle time if that is sufficient.
>
> Ismael #4, #5, #6: Will update KIP. Will use dot separated name for new
> property. Replication quotas use dot separated, so it will be consistent
> with all properties except byte rate quotas.
>
> Radai: #1 Request processing time rather than request rate were chosen
> because the time per request can vary significantly between requests as
> mentioned in the discussion and KIP.
> #2 Two separate quotas for heartbeats/regular requests feel like more
> configuration and more metrics. Since most users would set quotas higher
> than the expected usage and quotas are more of a safety net, a single quota
> should work in most cases.
>  #3 The number of requests in purgatory is limited by the number of active
> connections since only one request per connection will be throttled at a
> time.
> #4 As with byte rate quotas, to use the full allocated quotas,
> clients/users would need to use partitions that are distributed across the
> cluster. The alternative of using cluster-wide quotas instead of per-broker
> quotas would be far too complex to implement.
>
> Dong : We currently have two ClientQuotaManagers for quota types Fetch and
> Produce. A new one will be added for IOThread, which manages quotas for I/O
> thread utilization. This will not update the Fetch or Produce queue-size,
> but will have a separate metric for the queue-size.  I wasn't planning to
> add any additional metrics apart from the equivalent ones for existing
> quotas as part of this KIP. Ratio of byte-rate to I/O thread utilization
> could be slightly misleading since it depends on the sequence of requests.
> But we can look into more metrics after the KIP is implemented if required.
>
> I think we need to limit the maximum delay since all requests are
> throttled. If a client has a quota of 0.001 units and a single request used
> 50ms, we don't want to delay all requests from the client by 50 seconds,
> throwing the client out of all its consumer groups. The issue is only if a
> user is allocated a quota that is insufficient to process one large
> request. The expectation is that the units allocated per user will be much
> higher than the time taken to process one request and the limit should
> seldom be applied. Agree this needs proper documentation.
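
To make the arithmetic concrete (my reading of the proposal, with hypothetical
numbers): a single 50 ms request against a quota of 0.001 units implies a naive
delay of 50 s, which the cap would reduce to the quota window.

// Hypothetical numbers from the thread; the window size is an assumption.
public class ThrottleDelayExample {
    public static void main(String[] args) {
        double usedMs = 50.0;        // time the request actually consumed
        double quota = 0.001;        // fraction of one I/O thread allowed
        double windowMs = 30_000.0;  // assumed quota window size

        double naiveDelayMs = usedMs / quota;              // 50,000 ms
        double cappedDelayMs = Math.min(naiveDelayMs, windowMs);
        System.out.printf("naive = %.0f ms, capped = %.0f ms%n",
                naiveDelayMs, cappedDelayMs);
    }
}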
>
> Regards,
>
> Rajini
>
>
> On Thu, Feb 23, 2017 at 8:04 PM, radai  wrote:
>
>> @jun: I wasn't concerned about tying up a request processing thread, but
>> IIUC the code does still read the entire request out, which might add up
>> to a non-negligible amount of memory.
>>
>> On Thu, Feb 23, 2017 at 11:55 AM, Dong Lin  wrote:
>>
>> > Hey Rajini,
>> >
>> > The current KIP says that the maximum delay will be reduced to window
>> size
>> > if it is larger than the window size. I have a concern with this:
>> >
>> > 1) This essentially means that the user is allowed to exceed their quota
>> > over a long period of time. Can you provide an upper bound on this
>> > deviation?
>> >
>> > 2) What is the motivation for capping the maximum delay at the window
>> > size? I am wondering if there is a better alternative to address the
>> > problem.
>> >
>> > 3) It means that the existing metric-related config will have a more
>> > direct impact on the mechanism of this io-thread-unit-based quota. This
>> > may be an important change depending on the answer to 1) above. We
>> > probably need to document this more explicitly.
>> >
>> > Dong
>> >
>> >
>> > On Thu, Feb 23, 2017 at 10:56 AM, Dong Lin  wrote:
>> >
>> > > Hey Jun,
>> > >
>> > > Yeah, you are right. I thought it wasn't, because at LinkedIn it would
>> > > be too much pressure on inGraph to expose those per-clientId metrics, so
>> > > we ended up printing them periodically to a local log. Never mind if it
>> > > is not a general problem.
>> > >
>> > > Hey Rajini,
>> > >
>> > > - I agree with Jay that we probably don't want to add a new field for
>> > > every quota 

[jira] [Commented] (KAFKA-4198) Transient test failure: ConsumerBounceTest.testConsumptionWithBrokerFailures

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882534#comment-15882534
 ] 

ASF GitHub Bot commented on KAFKA-4198:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2568


> Transient test failure: ConsumerBounceTest.testConsumptionWithBrokerFailures
> 
>
> Key: KAFKA-4198
> URL: https://issues.apache.org/jira/browse/KAFKA-4198
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ismael Juma
>  Labels: transient-unit-test-failure
>
> The issue seems to be that we call `startup` while `shutdown` is still taking 
> place.
> {code}
> java.lang.AssertionError: expected:<107> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> kafka.api.ConsumerBounceTest$$anonfun$consumeWithBrokerFailures$2.apply(ConsumerBounceTest.scala:91)
>   at 
> kafka.api.ConsumerBounceTest$$anonfun$consumeWithBrokerFailures$2.apply(ConsumerBounceTest.scala:90)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:90)
>   at 
> kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:70)
> {code}
> {code}
> java.lang.IllegalStateException: Kafka server is still shutting down, cannot 
> re-start!
>   at kafka.server.KafkaServer.startup(KafkaServer.scala:184)
>   at 
> kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply$mcVI$sp(KafkaServerTestHarness.scala:117)
>   at 
> kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
>   at 
> kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>   at scala.collection.immutable.Range.foreach(Range.scala:160)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>   at 
> kafka.integration.KafkaServerTestHarness$class.restartDeadBrokers(KafkaServerTestHarness.scala:116)
>   at 
> kafka.api.ConsumerBounceTest.restartDeadBrokers(ConsumerBounceTest.scala:34)
>   at 
> kafka.api.ConsumerBounceTest$BounceBrokerScheduler.doWork(ConsumerBounceTest.scala:158)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2568: Kafka 4198: Fix Race Condition in KafkaServer Shut...

2017-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2568


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-4799) session timeout during event processing shuts down stream

2017-02-24 Thread Jacob Gur (JIRA)
Jacob Gur created KAFKA-4799:


 Summary: session timeout during event processing shuts down stream
 Key: KAFKA-4799
 URL: https://issues.apache.org/jira/browse/KAFKA-4799
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.1.1
 Environment: kafka streams client running on os x, with docker machine 
running broker
Reporter: Jacob Gur
Priority: Critical


I have a simple stream application like this:

{code:title=Part of my class|borderStyle=solid}
private <T> IConsumerSubscription buildSubscriptionStream(
Class<T> clazz, Consumer<T> consumer, String group,
Function<KStreamBuilder, KStream<String, String>> topicStreamFunc)
{
KStreamBuilder builder = new KStreamBuilder();

KStream<String, String> stream = topicStreamFunc.apply(builder);
stream.foreach((k, v) -> {
try {
T value = 
_jsonObjectMapper.mapFromJsonString(v, clazz);
consumer.accept(value);

Logger.trace("Consumed message {}", value);

} catch (Throwable th) {
Logger.warn("Error while consuming message", 
th);
}
});

final KafkaStreams streams = new KafkaStreams(builder, 
constructProperties(group));
streams.start();

return streams::close;
}
{code}

There is just one client running this application stream.

If I run the client in a debugger with a breakpoint on the event processor 
(i.e., inside the foreach lambda), with the debugger suspending all threads for 
perhaps more than 10 seconds, then when I resume the application:
Actual behavior - the stream shuts down
Expected behavior - the stream should recover, perhaps temporarily losing its 
partition but then being re-added and recovering.

It looks like what happens is this:
1) The kafka client session times out.
2) The partition is revoked
3) The streams library has a rebalance listener that tries to commit offsets, 
but that commit fails due to a rebalance exception.
4) Stream shuts down.

Steps 3 and 4 occur in StreamThread's rebalance listener.

It seems that it should be more resilient and recover just like a regular 
KafkaConsumer would. Its partition would be revoked, and then it would get it 
back again and resume processing at the last offset.
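
In the meantime, a debugging workaround (a sketch only; the application id and 
bootstrap address are placeholders, and it assumes that unknown properties in 
the streams config are forwarded to the embedded consumer) is to raise the 
consumer session timeout so breakpoint pauses don't evict the member. The 
broker's group.max.session.timeout.ms may also need raising to match.

{code}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class DebugStreamsConfig {
    public static Properties props() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Longer session timeout while debugging, so pauses don't trigger
        // a rebalance and the shutdown described above.
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 120_000);
        return props;
    }
}
{code}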

Is the current behavior expected and I'm just not understanding the intention, 
or is this a bug?

Thanks!





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-127: Pluggable JAAS LoginModule configuration for SSL

2017-02-24 Thread Christopher Shannon
Does anyone else have any comments or suggestions on this?

On Tue, Feb 21, 2017 at 4:05 PM, Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> I should mention another reason I went with adding this enhancement to the
> SSL channel instead of SASL_SSL is that as you can see from my sample
> commit, I had to delay the JAAS LoginManager from getting loaded until the
> authenticate() call in SslServerAuthenticator in order to make sure that
> the SSL handshake was done because loading the LoginManager does the actual
> login() call and requires the X509 callback handler.
>
> The SASL_SSL implementation loads the LoginManager during the configure in
> SaslChannelBuilder which is too early as the X509 credentials won't be
> available yet without the handshake being completed so this would require
> some refactoring to get this to work properly and load at the right time.
>
> On Tue, Feb 21, 2017 at 3:44 PM, Christopher Shannon <
> christopher.l.shan...@gmail.com> wrote:
>
>> As a follow up to my previous post, EXTERNAL could be added to the list
>> of mechanisms supported with the existing property: sasl.enabled.mechanisms
>> so I think this could also be achieved with SASL_SSL.  If EXTERNAL is used
>> then it would not disable the client certificate from being required.
>>
>> So I can go either way on this, I can update my KIP to allow X509
>> authentication with SASL_SSL through the EXTERNAL mechanism or keep the
>> proposal as is for the SSL channel based on what everyone thinks.
>>
>> On Tue, Feb 21, 2017 at 3:23 PM, Christopher Shannon <
>> christopher.l.shan...@gmail.com> wrote:
>>
>>> Thanks for the feedback Harsha.
>>>
>>> Can you clarify what you mean for the use cases for SASL_SSL and X509?
>>> My proposal is to only have X509 based pluggable authentication for the SSL
>>> channel and not SASL_SSL.  I suppose you could use X509 credentials with
>>> SASL_SSL but the authentication mode would probably need to be SASL
>>> EXTERNAL as the authentication is done by the SSL channel, whereas with
>>> Kerberos or PLAINTEXT the user is providing credentials.  That's why I
>>> proposed adding it to the SSL channel instead of SASL_SSL.
>>>
>>> That being said, I guess one option would be to just allow the use of an
>>> X509 callback handler and don't disable client auth when using SASL_SSL.
>>> Then after login have some way to signal it's EXTERNAL mode so it doesn't
>>> do any other authentication steps.
>>>
>>> I have a use case where I need dual authentication (both
>>> username/password and certificate based) and either one would work as
>>> multiple LoginModules can be chained together.
>>>
>>> Chris
>>>
>>> On Tue, Feb 21, 2017 at 3:06 PM, Harsha Chintalapani 
>>> wrote:
>>>
 Hi Chris,
   Thanks for the KIP. Could you also add details/use-cases for
 having X509 certificate based authentication in the context of SASL_SSL?
 The reason we disabled SSL auth for SASL_SSL is the intent behind
 using SASL auth over SSL encryption: a user can enforce role-based auth
 and have wire encryption for data transfer. If users just want SSL-based
 authentication they have the option to do so via SSL. I think we are
 providing too many options of authentication in SASL_SSL mode, which can
 be a bit confusing.

 Thanks,
 Harsha


 On Tue, Feb 21, 2017 at 11:23 AM Christopher Shannon <
 christopher.l.shan...@gmail.com> wrote:

 Hi everyone

 I have just created KIP-127 to introduce custom JAAS configuration for
 the
 SSL channel:

 *
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-127%3A
 +Pluggable+JAAS+LoginModule+configuration+for+SSL
 <
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-127%3A
 +Pluggable+JAAS+LoginModule+configuration+for+SSL
 >*

 The idea here is to be able to do custom authentication based off of a
 user's X509 credentials in addition to the SSL handshake.

 I have created a rough draft of a commit to give an idea of what my
 plan is
 which matches the KIP:
 https://github.com/cshannon/kafka/tree/KAFKA-4784

 It still needs some work (needs more tests for example) but I wanted to
 get
 some feedback before I went any farther on this and do a pull request.

 Thanks,
 Chris

>>>
>>>
>>
>


KIPs to broaden the utilization of Kafka by making it more user friendly

2017-02-24 Thread Werner Daehn
I have a couple of ideas about Kafka and would love to get your input:
tell me if I am totally off, if it is possible today without me noticing,
or whether you embrace or disregard the ideas. I appreciate all your feedback.

Statement 1: A typical scenario for Kafka is coupling an on-premise system
with a Kafka-based cloud application. Today Kafka requires a direct
connection, but the on-prem side allows http only.

Kafka Rest is obviously one solution, but it is slow and has overhead.
Suggestion: In Kafka Connect add the ability to run connectors on-prem.
Each opens one long running http connection with the Kafka server streaming
the data. Or maybe a WebSocket as a bridge.


Statement 2: For above use cases you need to support security and
multi-tenant.

Suggestion: Above http connection should validate that a connector has the
proper certificate and is entitled to write into a Kafka topic.


Statement 3: Connect needs a UI so it can be configured by a non-Kafka user.

Suggestion: The connectors are hosted by a Tomcat which also acts as the
configuration, administration and monitoring UI for the on-prem side.


I have a few more thoughts about metadata (schema registry), a mapping
(Kafka Connect Transform) UI,... but above would be the most pressing
points for me.

Thanks in advance


Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-24 Thread Jorge Esteban Quilcate Otoya
Renaming to --by-duration LGTM

Not sure about changing it to --shift-by-duration because we could end up
with the same redundancy as before with reset: --reset-offsets
--reset-to-*.

Maybe changing --shift-offset-by to --shift-by 'n' could make it consistent
enough?


El vie., 24 feb. 2017 a las 6:39, Matthias J. Sax ()
escribió:

> I just read the updated KIP once more.
>
> I would suggest renaming --to-duration to --by-duration
>
> Or as a second idea, rename --to-duration to --shift-by-duration and at
> the same time rename --shift-offset-by to --shift-by-offset
>
> Not sure what the best option is, but naming would be more consistent IMHO.
>
>
>
> -Matthias
>
> On 2/23/17 4:42 PM, Jorge Esteban Quilcate Otoya wrote:
> > Hi All,
> >
> > If there are no more concerns, I'd like to start vote for this KIP.
> >
> > Thanks!
> > Jorge.
> >
> > On Thu, Feb 23, 2017 at 10:50 PM, Jorge Esteban Quilcate Otoya (<
> > quilcate.jo...@gmail.com>) wrote:
> >
> >> Oh ok :)
> >>
> >> So, we can keep `--topic t1:1,2,3`
> >>
> >> I think with this one we have most of the feedback applied. I will
> update
> >> the KIP with this change.
> >>
> >> On Thu, Feb 23, 2017 at 10:38 PM, Matthias J. Sax (<
> >> matth...@confluent.io>) wrote:
> >>
> >> Sounds reasonable.
> >>
> >> If we have multiple --topic arguments, it also does not matter whether
> >> we use t1:1,2 or t1=1,2.
> >>
> >> I just suggested '=' because I wanted to use ':' to chain multiple topics.
> >>
> >>
> >> -Matthias
> >>
> >> On 2/23/17 10:49 AM, Jorge Esteban Quilcate Otoya wrote:
> >>> Yeap, `--topic t1=1,2` LGTM
> >>>
> >>> I don't have an idea either about getting rid of the repeated --topic,
> >>> but --group is also repeated in the case of deletion, so it could be OK
> >>> to have repeated --topic arguments.
> >>>
> >>> On Thu, Feb 23, 2017 at 7:14 PM, Matthias J. Sax (<
> >>> matth...@confluent.io>) wrote:
> >>>
>  So you suggest merging the "scope options" --topics, --topic, and
>  --partitions into a single option? Sounds good to me.
> 
>  I like the compact way to express it, i.e., topicname:list-of-partitions
>  with "all partitions" if no partitions are specified. It's quite
>  intuitive to use.
> 
>  Just wondering if we could get rid of the repeated --topic option; it's
>  somewhat verbose. I have no good idea though how to improve it.
> 
>  If you concatenate multiple topics, we need one more character that is
>  not allowed in topic names to separate the topics:
> 
> > invalidChars = {'/', '\\', ',', '\u0000', ':', '"', '\'', ';', '*',
> >  '?', ' ', '\t', '\r', '\n', '='};
> 
>  maybe
> 
>  --topics t1=1,2,3:t2:t3=3
> 
>  use '=' to specify partitions (instead of ':' as you proposed) and ':'
>  to separate topics? All other characters seem worse to me.
>  But maybe you have a better idea.
> 
> 
> 
>  -Matthias
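
To illustrate the syntax proposed above, a small self-contained Java
sketch that parses a scope string like "t1=1,2,3:t2:t3=3" (the class and
method names are invented for illustration; an empty partition list
stands for "all partitions"):

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class ScopeParserSketch {
        // maps each topic to its partition list; empty list = all partitions
        public static Map<String, List<Integer>> parse(String scope) {
            Map<String, List<Integer>> result = new LinkedHashMap<>();
            for (String entry : scope.split(":")) {
                String[] parts = entry.split("=", 2);
                List<Integer> partitions = new ArrayList<>();
                if (parts.length == 2) {
                    for (String p : parts[1].split(",")) {
                        partitions.add(Integer.parseInt(p));
                    }
                }
                result.put(parts[0], partitions);
            }
            return result;
        }

        public static void main(String[] args) {
            // prints {t1=[1, 2, 3], t2=[], t3=[3]}
            System.out.println(parse("t1=1,2,3:t2:t3=3"));
        }
    }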
> 
> 
>  On 2/23/17 3:15 AM, Jorge Esteban Quilcate Otoya wrote:
> > @Matthias, about point 9:
> >
> > What about keeping only the --topic option, and support this format:
> >
> > `--topic t1:0,1,2 --topic t2 --topic t3:2`
> >
> > In this case topics t1, t2, and t3 will be selected: topic t1 with
> > partitions 0,1 and 2; topic t2 with all its partitions; and topic t3,
>  with
> > only partition 2.
> >
> > Jorge.
> >
> > On Tue, Feb 21, 2017 at 11:11 AM, Jorge Esteban Quilcate Otoya (<
> > quilcate.jo...@gmail.com>) wrote:
> >
> >> Thanks for the feedback Matthias.
> >>
> >> * 1. You're right. I'll reorder the scenarios.
> >>
> >> * 2. Agree. I'll update the KIP.
> >>
> >> * 3. I like it, updating to `reset-offsets`
> >>
> >> * 4. Agree, removing the `reset-` part
> >>
> >> * 5. Yes, the 1.e option without --execute or --export will print out
> >> the current offset and the new offset, which will be the same. The
> >> use case of this option is mostly to use it in combination with
> >> --export and have a current 'checkpoint' to reset to later. I will add
> >> to the KIP what the output should look like.
> >>
> >> * 6. Considering 4., I will update it to `--to-offset`
> >>
> >> * 7. I like the idea of unifying these options (plus, minus).
> >> `shift-offsets-by` is a good option, but I would like some more
> >> feedback here about the name. I will update the KIP in the meantime.
> >>
> >> * 8. Yes, discussed in 9.
> >>
> >> * 9. Agree. I'd love some feedback here. `topic` is already used by
> >> `delete`, and we can add `--all-topics` to consider all topics/partitions
> >> assigned to a group. How could we define specific topics/partitions?
> >>
> >> * 10. Hadn't thought about it, but that makes sense.
> >> <topic>,<partition>,<offset> would be enough.
> >>
> >> * 11. Agree. Solved with 10.
>
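
For illustration, an exported 'checkpoint' file as discussed in points 5
and 10 above might then look like this (assuming the simple
<topic>,<partition>,<offset> line format; the exact format was still
under discussion in the thread):

    t1,0,150
    t1,1,149
    t2,0,4023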