[jira] [Created] (KAFKA-5808) The Preferred Replica should be global and dynamic

2017-08-29 Thread Canes Kelly (JIRA)
Canes Kelly created KAFKA-5808:
--

 Summary: The Preferred Replica should be global and dynamic
 Key: KAFKA-5808
 URL: https://issues.apache.org/jira/browse/KAFKA-5808
 Project: Kafka
  Issue Type: New Feature
  Components: core
Affects Versions: 0.11.0.0
Reporter: Canes Kelly
 Fix For: 0.11.0.0


When we create a topic in Kafka, the broker assigns replicas to the partitions of that topic, and the first replica in each assignment becomes the Preferred Replica. This means the Kafka cluster will migrate a partition's leader back to its Preferred Replica based on the "imbalance rate".

As brokers are added to the cluster, however, each partition's Preferred Replica remains the one assigned when the topic was created, so leader load balancing does not scale with the size of the cluster.

I would therefore like to propose adjusting the Preferred Replica assignment when brokers are added, with appropriate consideration of the resulting performance impact.
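For background, here is a minimal sketch of the behaviour described above (illustrative Java only, not the actual controller code): the Preferred Replica is simply the first entry of a partition's replica assignment, and a broker's imbalance rate is the fraction of partitions that prefer it but are currently led elsewhere (compared against the existing broker setting leader.imbalance.per.broker.percentage).

    import java.util.List;
    import java.util.Map;

    // Illustrative sketch only; not the controller implementation.
    public class PreferredReplicaSketch {

        static int preferredReplica(List<Integer> replicaAssignment) {
            return replicaAssignment.get(0); // the first assigned replica is preferred
        }

        // assignments: partitionId -> replica list; leaders: partitionId -> current leader
        static double imbalanceRatio(int brokerId,
                                     Map<Integer, List<Integer>> assignments,
                                     Map<Integer, Integer> leaders) {
            int preferredHere = 0;      // partitions whose preferred replica is this broker
            int notLedByPreferred = 0;  // of those, how many are led by some other broker
            for (Map.Entry<Integer, List<Integer>> e : assignments.entrySet()) {
                if (preferredReplica(e.getValue()) == brokerId) {
                    preferredHere++;
                    Integer leader = leaders.get(e.getKey());
                    if (leader == null || leader != brokerId) {
                        notLedByPreferred++;
                    }
                }
            }
            return preferredHere == 0 ? 0.0 : (double) notLedByPreferred / preferredHere;
        }
    }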





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is back to normal : kafka-trunk-jdk8 #1955

2017-08-29 Thread Apache Jenkins Server
See 




Build failed in Jenkins: kafka-trunk-jdk8 #1954

2017-08-29 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] KAFKA-5379: ProcessorContext.appConfigs() should return parsed values

--
[...truncated 4.79 MB...]

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerMaxInFlightRequestPerConnectionsWhenEosDisabled 
STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerMaxInFlightRequestPerConnectionsWhenEosDisabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowStreamsExceptionIfValueSerdeConfigFails STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowStreamsExceptionIfValueSerdeConfigFails PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfConsumerAutoCommitIsOverridden STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfConsumerAutoCommitIsOverridden PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowExceptionIfNotAtLestOnceOrExactlyOnce STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowExceptionIfNotAtLestOnceOrExactlyOnce PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingConsumerIsolationLevelIfEosDisabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingConsumerIsolationLevelIfEosDisabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedProducerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedProducerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedPropertiesThatAreNotPartOfRestoreConsumerConfig STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedPropertiesThatAreNotPartOfRestoreConsumerConfig PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldOverrideStreamsDefaultProducerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldOverrideStreamsDefaultProducerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedPropertiesThatAreNotPartOfProducerConfig STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedPropertiesThatAreNotPartOfProducerConfig PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldBeBackwardsCompatibleWithDeprecatedConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldBeBackwardsCompatibleWithDeprecatedConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldBeSupportNonPrefixedConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldBeSupportNonPrefixedConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldOverrideStreamsDefaultConsumerConifgsOnRestoreConsumer STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldOverrideStreamsDefaultConsumerConifgsOnRestoreConsumer PASSED

org.apache.kafka.streams.StreamsConfigTest > shouldAcceptExactlyOnce STARTED

org.apache.kafka.streams.StreamsConfigTest > shouldAcceptExactlyOnce PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSetInternalLeaveGroupOnCloseConfigToFalseInConsumer STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSetInternalLeaveGroupOnCloseConfigToFalseInConsumer PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedPropertiesThatAreNotPartOfConsumerConfig STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedPropertiesThatAreNotPartOfConsumerConfig PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfRestoreConsumerAutoCommitIsOverridden STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfRestoreConsumerAutoCommitIsOverridden PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSpecifyCorrectValueSerdeClassOnError STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSpecifyCorrectValueSerdeClassOnError PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSpecifyCorrectValueSerdeClassOnErrorUsingDeprecatedConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSpecifyCorrectValueSerdeClassOnErrorUsingDeprecatedConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfConsumerIsolationLevelIsOverriddenIfEosEnabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfConsumerIsolationLevelIsOverriddenIfEosEnabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldNotOverrideUserConfigCommitIntervalMsIfExactlyOnceEnabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldNotOverrideUserConfigCommitIntervalMsIfExactlyOnceEnabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportMultipleBootstrapServers STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportMultipleBootstrapServers PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowStreamsExceptionIfKeySerdeConfigFails STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowStreamsExceptionIfKeySerdeConfigFails 

Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-29 Thread Jun Rao
Hi, Rajini,

Thanks for the updated KIP. I agree that those additional metrics can be
useful. I was thinking about what an admin would do if the value of one of
those metrics is abnormal. An admin would probably want to determine which
client is causing the anomaly. So, a couple more comments below.

10. About FetchDownConversionsMs. Should we model it at the topic level or
at the request level as a new latency component like messageConversionTime
in addition to the existing localTime, requestQueueTime, etc? The benefit
of the latter is that we can include it in the request log and use it to
figure out which client is doing the conversion. Also, should we track the
conversion time in the producer as well?

11. About ProduceBatchSize. Currently, the largest chunk of memory is
allocated at the request level (mostly produce requests). So, instead of
ProduceBatchSize, perhaps we can add a request-size metric for each type of
request and also include it in the request log. This way, if there is a
memory issue, we can trace it back to a particular client. Similarly, for
ProduceUncompressedBatchSize, would it be better to track it at the request
level as something like temporaryMemorySize and include it in the request log?
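To make the request-level idea a bit more concrete, here is a rough sketch of how such a per-request metric could be registered with the Kafka Metrics API; the sensor and metric names below are invented for illustration and are not proposed names:

    import org.apache.kafka.common.metrics.Metrics;
    import org.apache.kafka.common.metrics.Sensor;
    import org.apache.kafka.common.metrics.stats.Avg;
    import org.apache.kafka.common.metrics.stats.Max;

    // Rough illustration only; the names here are made up and not part of the KIP.
    public class RequestSizeMetricsSketch {
        private final Sensor temporaryMemorySensor;

        public RequestSizeMetricsSketch(Metrics metrics) {
            temporaryMemorySensor = metrics.sensor("TemporaryMemoryBytes");
            temporaryMemorySensor.add(
                metrics.metricName("temporary-memory-avg", "request-metrics",
                    "Average temporary memory used per request"), new Avg());
            temporaryMemorySensor.add(
                metrics.metricName("temporary-memory-max", "request-metrics",
                    "Maximum temporary memory used per request"), new Max());
        }

        // Called once per request, after the (de)compressed size is known, so an
        // abnormal value can be traced back to a request type / client.
        public void record(long temporaryMemoryBytes) {
            temporaryMemorySensor.record(temporaryMemoryBytes);
        }
    }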

Thanks,

Jun

On Tue, Aug 29, 2017 at 10:07 AM, Roger Hoover 
wrote:

> Great suggestions, Ismael and thanks for incorporating them, Rajini.
>
> Tracking authentication success and failures (#3) across listeners seems
> very useful for cluster administrators to identify misconfigured client or
> bad actors, especially until all clients implement KIP-152 which will add
> an explicit error code for authentication failures.  Currently, clients
> just get disconnected so it's hard to distinguish authentication failures
> from any other error that can cause disconnect.  This broker-side metric is
> useful regardless but can help fill this gap until all clients support KIP
> 152.
>
> Just to be clear, the ones called `successful-authentication-rate` and
> `failed-authentication-rate` will also have failed-authentication-count
> and successful-authentication-count to match KIP 187?
>
> On Tue, Aug 29, 2017 at 7:26 AM, Rajini Sivaram 
> wrote:
>
> > Hi Ismael,
> >
> > Thank you for the suggestions. The additional metrics sound very useful
> and
> > I have added them to the KIP.
> >
> > Regards,
> >
> > Rajini
> >
> > On Tue, Aug 29, 2017 at 5:34 AM, Ismael Juma  wrote:
> >
> > > Hi Rajini,
> > >
> > > There are a few other metrics that could potentially be useful. I'd be
> > > interested in what you and the community thinks:
> > >
> > > 1. The KIP currently includes `FetchDownConversionsPerSec`, which is
> > > useful. In the common case, one would want to avoid down conversion by
> > > using the lower message format supported by most of the consumers.
> > However,
> > > there are good reasons to use a newer message format even if there are
> > some
> > > legacy consumers around. It would be good to quantify the cost of these
> > > consumers a bit more clearly. Looking at the request metric
> `LocalTimeMs`
> > > provides a hint, but it may be useful to have a dedicated
> > > `FetchDownConversionsMs` metric.
> > >
> > > 2. Large messages can cause GC issues (it's particularly bad if fetch
> > down
> > > conversion takes place). One can currently configure the max message
> > batch
> > > size per topic to keep this under control, but that is the size after
> > > compression. However, we decompress the batch to validate produce
> > requests
> > > and we decompress and recompress during fetch downconversion). It would
> > be
> > > helpful to have topic metrics for the produce message batch size after
> > > decompression (and perhaps compressed as well as that would help
> > understand
> > > the compression ratio).
> > >
> > > 3. Authentication success/failures per second. This is helpful to
> > > understand if some clients are misconfigured or if bad actors are
> trying
> > to
> > > authenticate.
> > >
> > > Thoughts?
> > >
> > > Ismael
> > >
> > >
> > >
> > > On Wed, Aug 23, 2017 at 2:53 AM, Jun Rao  wrote:
> > >
> > > > Hi, Rajini,
> > > >
> > > > Yes, if those error metrics are registered dynamically, we could
> worry
> > > > about expiration later.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Aug 18, 2017 at 1:55 AM, Rajini Sivaram <
> > rajinisiva...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Perhaps we could register dynamically for now. If we find that the
> > cost
> > > > of
> > > > > retaining these is high, we can add the code to expire them later.
> Is
> > > > that
> > > > > ok?
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Aug 18, 2017 at 9:41 AM, Ismael Juma 
> > > wrote:
> > > > >
> > > > > > Can we quantify the cost of having these metrics around if they
> are
> > > > > > dynamically registered? Given that the 

Re: [DISCUSS] KIP-191: KafkaConsumer.subscribe() overload that takes just Pattern

2017-08-29 Thread Becket Qin
Sounds good to me as well.

On Tue, Aug 29, 2017 at 2:43 AM, Ismael Juma  wrote:

> Sounds good to me too. Since this is a non controversial change, I suggest
> starting the vote in 1-2 days if no-one else comments.
>
> Ismael
>
> On Thu, Aug 24, 2017 at 7:32 PM, Jason Gustafson 
> wrote:
>
> > Seems reasonable. I don't recall any specific reason for not providing
> this
> > method initially.
> >
> > -Jason
> >
> > On Thu, Aug 24, 2017 at 5:50 AM, Attila Kreiner 
> wrote:
> >
> > > Hi All,
> > >
> > > I created KIP-191:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 191%3A+KafkaConsumer.subscribe%28%29+overload+that+takes+just+Pattern
> > >
> > > Jira: https://issues.apache.org/jira/browse/KAFKA-5726
> > > PR: https://github.com/apache/kafka/pull/3669
> > >
> > > Please check it.
> > >
> > > Thanks,
> > > Attila
> > >
> >
>


[GitHub] kafka-site pull request #72: Jason and Becket are PMC members

2017-08-29 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka-site/pull/72

Jason and Becket are PMC members



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka-site update-pmc-members

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka-site/pull/72.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #72


commit 6625934c1250c22cf9ba222f871aff78951b6071
Author: Jason Gustafson 
Date:   2017-08-30T01:05:52Z

Jason and Becket are PMC members




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to normal : kafka-0.10.2-jdk7 #194

2017-08-29 Thread Apache Jenkins Server
See 



Re: [DISCUSS] KIP-184 Rename LogCleaner and related classes to LogCompactor

2017-08-29 Thread Jason Gustafson
I think we need a KIP for config deprecations. The case for this one seems
clear-cut since we strongly depend on the log cleaner for the management of
consumer offsets and the transaction log, but we can see what others think.

-Jason

On Fri, Aug 25, 2017 at 10:00 AM, Pranav Maniar 
wrote:

> Yes I can take it up deprecation of  "log.cleaner.enable"
>
> Will it require KIP ?
> Since as per my understanding we will be honoring value set for
> "log.cleaner.enable" till the time it is around. For now just a warning
> message about deprecation will be logged only.
>
> Or should we just remove the config now?
>
>
> Thanks,
> Pranav
>
> On Wed, Aug 23, 2017 at 3:37 AM, Jason Gustafson 
> wrote:
>
> > Hi Pranav,
> >
> > Yeah, I'd recommend closing it since the benefit is unclear and since no
> > one has jumped in to offer stronger support for the change. Were you
> > planning to do a KIP to deprecate `log.cleaner.enable`? I still think
> that
> > makes sense.
> >
> > Thanks,
> > Jason
> >
> > On Tue, Aug 22, 2017 at 1:47 PM, Colin McCabe 
> wrote:
> >
> > > Hmm.  There are a lot of configuration keys that involve "log cleaner."
> > > It seems like if we rename this component, logically we'd have to
> rename
> > > all of them and support the old versions as deprecated config keys:
> > >
> > >   val LogCleanupPolicyProp = "log.cleanup.policy"
> > >   val LogCleanerThreadsProp = "log.cleaner.threads"
> > >   val LogCleanerIoMaxBytesPerSecondProp =
> > >   "log.cleaner.io.max.bytes.per.second"
> > >   val LogCleanerDedupeBufferSizeProp = "log.cleaner.dedupe.buffer.
> size"
> > >   val LogCleanerIoBufferSizeProp = "log.cleaner.io.buffer.size"
> > >   val LogCleanerDedupeBufferLoadFactorProp =
> > >   "log.cleaner.io.buffer.load.factor"
> > >   val LogCleanerBackoffMsProp = "log.cleaner.backoff.ms"
> > >   val LogCleanerMinCleanRatioProp = "log.cleaner.min.cleanable.ratio"
> > >   val LogCleanerEnableProp = "log.cleaner.enable"
> > >   val LogCleanerDeleteRetentionMsProp =
> > >   "log.cleaner.delete.retention.ms"
> > >   val LogCleanerMinCompactionLagMsProp =
> > >   "log.cleaner.min.compaction.lag.ms"
> > >
> > > This seems like it would be quite painful to users, since they'd have
> to
> > > deal with deprecation warnings and multiple names for the same
> > > configuration.  In general I think Jason and Ismael's point is valid:
> do
> > > we have evidence that "log cleaner" is causing confusion?  If not, it
> > > may not be worth it to rename this at the moment.
> > >
> > > regards,
> > > Colin
> > >
> > >
> > > On Mon, Aug 21, 2017, at 05:19, Pranav Maniar wrote:
> > > > Hi Jason,
> > > >
> > > > Haven't heard from other on this KIP. Should I close it ?
> > > >
> > > > ~Pranav
> > > >
> > > > On Thu, Aug 10, 2017 at 12:04 AM, Jason Gustafson <
> ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Pranav,
> > > > >
> > > > > Let's see what others think before closing the KIP. If there are
> > strong
> > > > > reasons for the renaming, I would reconsider.
> > > > >
> > > > > As far as deprecating `log.cleaner.enable`, I think it's a good
> idea
> > > and
> > > > > can be done in a separate KIP. Guozhang's suggestion seems
> > reasonable,
> > > but
> > > > > I'd just turn it on always (it won't cause much harm if there are
> no
> > > topics
> > > > > enabled for compaction). This is an implementation detail which
> > > probably
> > > > > doesn't need to be included in the KIP.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Wed, Aug 9, 2017 at 10:47 AM, Pranav Maniar <
> pranav9...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Thanks Ismael, Jason for the suggestion.
> > > > > > My bad. I should have followed up on mail-list discussion before
> > > starting
> > > > > > KIP. Apologies.
> > > > > >
> > > > > > I am relatively new, so I do not know if any confusion was
> reported
> > > in
> > > > > past
> > > > > > due to terminology. May be others can chime in.
> > > > > > If the old naming is fine with majority then no changes will be
> > > needed. I
> > > > > > will mark JIRA as wont'fix and close the KIP !
> > > > > >
> > > > > > Ismael, Jason,
> > > > > > There was another suggestion from Guozhang on deprecating and
> > > eventually
> > > > > > removing log.cleaner.enable property all together and always
> > > enabling log
> > > > > > cleaner if "log.cleanup.policy=compact".
> > > > > > What are your suggestion on this ?
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Pranav
> > > > > >
> > > > > > On Wed, Aug 9, 2017 at 10:27 PM, Jason Gustafson <
> > ja...@confluent.io
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Yes, as Ismael noted above, I am not fond of this renaming.
> Keep
> > in
> > > > > mind
> > > > > > > that the LogCleaner does not only handle compaction. It is
> > > possible to
> > > > > > > configure a cleanup policy of "compact" and "delete," in which
> > > case the
> > > > > > > LogCleaner also handles 

Re: [VOTE] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-29 Thread Roger Hoover
Hi Rajini,

One more thought: maybe we should also add an error_message field in the
response, like we do with the CreateTopics response, so that the server can
return an appropriate message that we can bubble up to the user. Examples
would be "Invalid username or password", "SASL impersonation not allowed",
or "Your account has been locked, please contact the cluster admin".

Thanks,

Roger

On Tue, Aug 29, 2017 at 12:41 PM, Roger Hoover 
wrote:

> Hi Rajini,
>
> The metrics in KIP-188 will provide counts across all users but the log
> could potentially be used to audit individual authentication events.  I
> think these would be useful at INFO level but if it's inconsistent with the
> rest of Kafka, DEBUG is ok too.  The default log4j config for Kafka
> separates authorization logs.  It seems like a good idea to treat
> authentication logs the same way whether or not we choose DEBUG or INFO.
>
> https://github.com/apache/kafka/blob/trunk/config/log4j.properties#L54-L58
>
> Cheers,
>
> Roger
>
> On Tue, Aug 29, 2017 at 10:51 AM, Rajini Sivaram 
> wrote:
>
>> Hi Roger,
>>
>> If we are changing logging level for successful SASL authentications in
>> the
>> broker, we should probably do the same for SSL too. Since KIP-188 proposes
>> to add new metrics for successful and failed authentications which may be
>> more useful for monitoring, do we really need info-level logging for
>> authentication? At the moment, there don't seem to be any per-connection
>> informational messages at info-level, but if you think it is useful, we
>> could do this in a separate JIRA. Let me know what you think.
>>
>> On Tue, Aug 29, 2017 at 1:09 PM, Roger Hoover 
>> wrote:
>>
>> > Just re-read the KIP and was wondering if you think INFO would be ok for
>> > logging successful authentications?  They should be relatively
>> infrequent.
>> >
>> > On Tue, Aug 29, 2017 at 9:54 AM, Roger Hoover 
>> > wrote:
>> >
>> > > +1 (non-binding).  Thanks, Rajini
>> > >
>> > > On Tue, Aug 29, 2017 at 2:10 AM, Ismael Juma 
>> wrote:
>> > >
>> > >> Thanks for the KIP, +1 (binding) from me.
>> > >>
>> > >> Ismael
>> > >>
>> > >> On Thu, Aug 24, 2017 at 6:29 PM, Rajini Sivaram <
>> > rajinisiva...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > Hi all,
>> > >> >
>> > >> > I would like to start vote on KIP-152 to improve diagnostics of
>> > >> > authentication failures and to update clients to treat
>> authentication
>> > >> > failures as fatal exceptions rather than transient errors:
>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > >> > 152+-+Improve+diagnostics+for+SASL+authentication+failures
>> > >> >
>> > >> > Thank you...
>> > >> >
>> > >> > Rajini
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>


Jenkins build is back to normal : kafka-trunk-jdk7 #2688

2017-08-29 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-29 Thread Becket Qin
Yeah, I think expiring a batch but still waiting for the response is probably
reasonable, given the result is not guaranteed anyway.

@Jun,

I think frequent PID resets may still be possible if we do not wait for the
in-flight response to return. Consider two partitions, p0 and p1. The
deadlines of the batches for p0 are T + 10, T + 30, T + 50..., and the
deadlines of the batches for p1 are T + 20, T + 40, T + 60... Assuming each
request takes more than 10 ms to get a response, the following sequence is
possible:

T: PID0 send batch0_p0(PID0), batch0_p1(PID0)
T + 10: PID0 expires batch0_p0(PID0), without resetting PID, sends
batch1_p0(PID0) and batch0_p1(PID0, retry)
T + 20: PID0 expires batch0_p1(PID0, retry), resets the PID to PID1, sends
batch1_p0(PID0, retry) and batch1_p1(PID1)
T + 30: PID1 expires batch1_p0(PID0, retry), without resetting PID, sends
batch2_p0(PID1) and batch1_p1(PID1, retry)
T + 40: PID1 expires batch1_p1(PID1, retry), resets the PID to PID2, sends
batch2_p0(PID1, retry) and sends batch2_p1(PID2)


In the above example, the producer resets the PID once every two requests.
The example does not take retry backoff into consideration, but it still
seems possible to encounter frequent PID resets if we do not wait for the
request to finish. Also, in this case we will have a lot of retries and a
mixture of PIDs, which seems pretty complicated.

I think Jason's suggestion will address both concerns, i.e. we fire the
callback at exactly delivery.timeout.ms, but we will still wait for the
response to be returned before sending the next request.
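As a very rough sketch of that behaviour (illustrative Java only, not the actual Sender implementation; the InFlightBatch type and the method names are made up):

    import org.apache.kafka.common.errors.TimeoutException;

    // Illustrative only: the batch is expired from the user's point of view at
    // delivery.timeout.ms, but the produce request stays in flight so PID/sequence
    // bookkeeping happens only once the broker response (or request.timeout.ms)
    // actually arrives.
    public class ExpirySketch {

        // Hypothetical minimal view of a batch inside an in-flight request.
        interface InFlightBatch {
            long createdMs();
            boolean callbackCompleted();
            void completeExceptionally(Exception e); // fires the user callback
        }

        static void maybeExpire(InFlightBatch batch, long nowMs, long deliveryTimeoutMs) {
            if (!batch.callbackCompleted() && nowMs - batch.createdMs() >= deliveryTimeoutMs) {
                // The user sees the timeout exactly at delivery.timeout.ms ...
                batch.completeExceptionally(
                    new TimeoutException("Expiring batch after " + deliveryTimeoutMs + " ms"));
                // ... but the request is not dropped here; the sender keeps waiting for
                // the response before deciding whether a PID reset is needed.
            }
        }
    }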

Thanks,

Jiangjie (Becket) Qin


On Tue, Aug 29, 2017 at 4:00 PM, Jun Rao  wrote:

> Hmm, I thought delivery.timeout.ms bounds the time from a message is in
> the
> accumulator (i.e., when send() returns) to the time when the callback is
> called. If we wait for request.timeout.ms for an inflight request and the
> remaining delivery.timeout.ms is less than request.timeout.ms, the
> callback
> may be called later than delivery.timeout.ms, right?
>
> Jiangjie's concern on resetting the pid on every expired batch is probably
> not an issue if we only reset the pid when the expired batch's pid is the
> same as the current pid, as Jason suggested.
>
> Thanks,
>
> Jun
>
> On Tue, Aug 29, 2017 at 3:09 PM, Jason Gustafson 
> wrote:
>
> > I think the semantics of delivery.timeout.ms need to allow for the
> > possibility that the record was actually written. Unless we can keep on
> > retrying indefinitely, there's really no way to know for sure whether the
> > record was written or not. A delivery timeout just means that we cannot
> > guarantee that the record was delivered.
> >
> > -Jason
> >
> > On Tue, Aug 29, 2017 at 2:51 PM, Becket Qin 
> wrote:
> >
> > > Hi Jason,
> > >
> > > If we expire the batch from user's perspective but still waiting for
> the
> > > response, would that mean it is likely that the batch will be
> > successfully
> > > appended but the users will receive a TimeoutException? That seems a
> > little
> > > non-intuitive to the users. Arguably it maybe OK though because
> currently
> > > when TimeoutException is thrown, there is no guarantee whether the
> > messages
> > > are delivered or not.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Tue, Aug 29, 2017 at 12:33 PM, Jason Gustafson 
> > > wrote:
> > >
> > > > I think I'm with Becket. We should wait for request.timeout.ms for
> > each
> > > > produce request we send. We can still await the response internally
> for
> > > > PID/sequence maintenance even if we expire the batch from the user's
> > > > perspective. New sequence numbers would be assigned based on the
> > current
> > > > PID until the response returns and we find whether a PID reset is
> > > actually
> > > > needed. This makes delivery.timeout.ms a hard limit which is easier
> to
> > > > explain.
> > > >
> > > > -Jason
> > > >
> > > > On Tue, Aug 29, 2017 at 11:10 AM, Sumant Tambe 
> > > wrote:
> > > >
> > > > > I'm updating the kip-91 writeup. There seems to be some confusion
> > about
> > > > > expiring an inflight request. An inflight request gets a full
> > > > > delivery.timeout.ms duration from creation, right? So it should be
> > > > > max(remaining delivery.timeout.ms, request.timeout.ms)?
> > > > >
> > > > > Jun, we do want to wait for an inflight request for longer than
> > > > > request.timeout.ms. right?
> > > > >
> > > > > What happens to a batch when retries * (request.timeout.ms +
> > > > > retry.backoff.ms) < delivery.timeout.ms  and all retries are
> > > > exhausted?  I
> > > > > remember an internal discussion where we concluded that retries can
> > be
> > > no
> > > > > longer relevant (i.e., ignored, which is same as retries=MAX_LONG)
> > when
> > > > > there's an end-to-end delivery.timeout.ms. Do you agree?
> > > > >
> > > > > Regards,
> > > > > Sumant
> > > > >
> > > > > On 27 

Re: Permission to create KIP for discussions and the max.in.flight.requests.per.connection config doc

2017-08-29 Thread Ismael Juma
Hi Bhaskar,

Comment inline.

On Sun, Aug 27, 2017 at 12:37 PM, Bhaskar Gollapudi <
bhaskargollap...@gmail.com> wrote:

> max.in.flight.requests.per.connection The maximum number of unacknowledged
> requests the client will send on a single connection before blocking. Note
> that if this setting is set to be greater than 1 and there are failed
> sends, there is a risk of message re-ordering due to retries (i.e., if
> retries are enabled).
> I think this documentation is a bit confusing as it say this controls the #
> of unack'd requests client can send before being blocked. However , AFAI
> checked the code, it only checks for pending writes to the socketChannel
> before allowing next write, and this write might have been buffered in the
> socket's internal buffers (  based on SO_SNDBUF
>  StandardSocketOptions.html#SO_SNDBUF>
> value
> ) without any ack as yet. There is still a chance that this message might
> get lost ( network failure and session loss ) so its not correct to equate
> this state to saying the  message has been acked.
>

There is more to it than that: the Sender has a guaranteeMessageOrder
parameter that is set to true when max.in.flight.requests.per.connection is 1.
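For reference, a minimal producer configuration showing the setting under discussion (the bootstrap server and serializers below are placeholders): with retries enabled, keeping max.in.flight.requests.per.connection at 1 is what preserves per-partition ordering.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    // Minimal example; broker address and serializers are placeholders.
    public class OrderedProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.RETRIES_CONFIG, 5);
            // With more than one in-flight request, a failed-and-retried request can
            // land behind a later one; a value of 1 preserves per-partition ordering.
            props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // send records as usual ...
            }
        }
    }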

Ismael


Re: [DISCUSS] KIP-189: Improve principal builder interface and add support for SASL

2017-08-29 Thread Colin McCabe
Thanks, Jason.  +1 (non-binding)

Colin

On Tue, Aug 29, 2017, at 09:12, Jason Gustafson wrote:
> Hi Colin,
> 
> Thanks for the comments. Seems reasonable to provide a safer `equals` for
> extensions. I don't think this needs to be part of the KIP, but I can add
> it to my patch.
> 
> Moving `fromString` makes sense also. This method is technically already
> part of the public API, which means we should probably deprecate it
> instead
> of removing it from `KafkaPrincipal`. I'll mention this in the KIP.
> 
> Thanks,
> Jason
> 
> On Mon, Aug 28, 2017 at 5:50 PM, Ted Yu  wrote:
> 
> > bq. change the check in equals() to be this.getClass().equals(other.
> > getClass())
> >
> > I happened to have Effective Java on hand.
> > Please take a look at related discussion on page 39.
> >
> > Josh later on mentioned Liskov substitution principle and a workaround
> > (favoring composition).
> >
> > FYI
> >
> >
> >
> > On Mon, Aug 28, 2017 at 4:48 PM, Colin McCabe  wrote:
> >
> > > Thanks, Jason, this is a great improvement!
> > >
> > > One minor nit.  The current KafkaPrincipal#equals looks like this:
> > >
> > > >@Override
> > > >public boolean equals(Object o) {
> > > >if (this == o) return true;
> > > >if (!(o instanceof KafkaPrincipal)) return false;
> > > >
> > > >KafkaPrincipal that = (KafkaPrincipal) o;
> > > >
> > > >if (!principalType.equals(that.principalType)) return false;
> > > >return name.equals(that.name);
> > > >}
> > >
> > > So if I implement MyKafkaPrincipalWithGroup that has an extra groupName
> > > field, I can have this situation:
> > >
> > > > KafkaPrincipal oldPrincipal = new KafkaPrincipal("User", "foo");
> > > > MyKafkaPrincipalWithGroup newPrincipal =
> > > >new MyKafkaPrincipalWithGroup("User", "foo", "mygroup")
> > > > System.out.println("" + oldPrincipal == newPrincipal) // true
> > > > System.out.println("" + newPrincipal == oldPrincipal) // false
> > >
> > > This is clearly bad, because it makes equality non-transitive.  The
> > > reason for this problem is because KafkaPrincipal#equals checks if
> > > MyKafkaPrincipalWithGroup is an instance of KafkaPrincipal-- and it is.
> > > It then proceeds to check if the user is the same-- and it is.  So it
> > > returns true.  It does not check the groups field, because it doesn't
> > > know about it.  On the other hand, MyKafkaPrincipalWithGroup#equals will
> > > check to see KafkaPrincipal is an instance of
> > > MyKafkaPrincipalWithGroup-- and it isn't.  So it returns false.
> > >
> > > In the KafkaPrincipal base class, it would be better to change the check
> > > in equals() to be this.getClass().equals(other.getClass()).  In other
> > > words, check for exact class equality, not instanceof.
> > >
> > > Alternately, we could implement a final equals method in the base class
> > > that compares by the toString method, under the assumption that any
> > > difference in KafkaPrincipal objects should be reflected in their
> > > serialized form.
> > >
> > > >@Override
> > > >public final boolean equals(Object o) {
> > > >if (this == o) return true;
> > > >if (!(o instanceof KafkaPrincipal)) return false;
> > > >return toString().equals(o.toString());
> > > >}
> > >
> > > Another question related to subclassing KafkaPrincipal: should we move
> > > KafkaPrincipal#fromString into an internal, non-public class?  It seems
> > > like people might expect
> > > KafkaPrincipal.fromString(myCustomPrincipal.toString()) to work, but it
> > > will not for subclasses.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Mon, Aug 28, 2017, at 15:51, Jason Gustafson wrote:
> > > > Thanks all for the discussion. I'll begin a vote in the next couple
> > days.
> > > >
> > > > -Jason
> > > >
> > > > On Fri, Aug 25, 2017 at 5:01 PM, Don Bosco Durai 
> > > > wrote:
> > > >
> > > > > Jason, thanks for the clarification.
> > > > >
> > > > > Bosco
> > > > >
> > > > >
> > > > > On 8/25/17, 4:59 PM, "Jason Gustafson"  wrote:
> > > > >
> > > > > Hey Don,
> > > > >
> > > > > That is not actually part of the KIP. It was a (somewhat
> > pedantic)
> > > > > example
> > > > > used to illustrate how the kafka principal semantics could be
> > > applied
> > > > > to
> > > > > authorizers which understood group-level ACLs. The key point is
> > > this:
> > > > > although a principal is identified only by its type and name, the
> > > > > KafkaPrincipal can be used to represent relations to other
> > > principals.
> > > > > In
> > > > > this case, we have a user principal which is related to a group
> > > > > principal
> > > > > through the UserPrincipalAndGroup object. A GroupAuthorizer could
> > > then
> > > > > leverage this relation. As you suggest, a true implementation
> > would
> > > > > allow
> > > > > multiple groups.
> > > > >
> > > > > I will add a note to the KIP to 

Re: [DISCUSS] KIP-171: Extend Consumer Group Reset Offset for Stream Application

2017-08-29 Thread Guozhang Wang
Hi Jorge,

Thanks for the KIP. It would be a great feature to add to the reset tool.
I made a pass over it and it looks good to me overall. I have a few
comments:

1. For all the scenarios, do we allow users to specify more than one
parameter? If not, could you make that clear in the wiki, e.g. we would
return an error message saying that only one is allowed (see the sketch
after these comments); if yes, what precedence order are we following?

2. Personally I feel that "--by-duration", "--to-offset" and "--shift-by"
are a tad overkill, because 1) they assume there is a committed offset for
each of the topics, which may not always be true, and 2) a single offset or
time shift amount may not fit all topics universally, i.e. one could imagine
wanting to reset all input topics to their offsets at a given time, but
resetting all topics' offsets to the same value, or shifting them all by the
same number of offsets, is usually not applicable. "--by-duration" also
seems easily covered by "--to-date".

For the general consumer group reset tool, since offsets can be set per
partition, these parameters may be more useful there.

3. As for the implementation details, when removing the zookeeper config
from `kafka-streams-application-reset`, we should consider returning a
meaningful error message; otherwise the user would just see an
"unrecognized config" error.
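Regarding point 1 above, a tiny sketch of the "only one scenario per invocation" check (plain Java for illustration; the real tool's option names and parsing may differ):

    import java.util.Arrays;
    import java.util.List;

    // Illustration of rejecting more than one reset scenario per invocation.
    // Option names follow the ones mentioned in this thread; the real tool may differ.
    public class ResetScenarioValidation {
        private static final List<String> SCENARIO_OPTIONS =
            Arrays.asList("--to-date", "--by-duration", "--to-offset", "--shift-by");

        static void validate(List<String> args) {
            long scenarios = args.stream().filter(SCENARIO_OPTIONS::contains).count();
            if (scenarios != 1) {
                throw new IllegalArgumentException(
                    "Exactly one reset scenario must be specified, one of: " + SCENARIO_OPTIONS);
            }
        }

        public static void main(String[] mainArgs) {
            validate(Arrays.asList("--to-date", "2017-08-29T00:00:00"));      // passes
            validate(Arrays.asList("--to-offset", "100", "--shift-by", "5")); // throws
        }
    }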


If you feel confident about the wiki after discussing these points, please
feel free to move on and start a voting thread. Note that we are about 3
weeks away from the KIP deadline and 4 weeks away from the feature deadline.


Guozhang





On Tue, Aug 22, 2017 at 1:45 PM, Matthias J. Sax 
wrote:

> Thanks for the update Jorge.
>
> I don't have any further comments.
>
>
> -Matthias
>
> On 8/12/17 6:43 PM, Jorge Esteban Quilcate Otoya wrote:
> > I have updated the KIP:
> >
> > - Change execution parameters, using `--dry-run`
> > - Reference KAFKA-4327
> > - And advise about changes on `StreamResetter`
> >
> > Also includes that it will cover a change on `ConsumerGroupCommand` to
> > align execution options.
> >
> > El dom., 16 jul. 2017 a las 5:37, Matthias J. Sax (<
> matth...@confluent.io>)
> > escribió:
> >
> >> Thanks a lot for the update!
> >>
> >> I like the KIP!
> >>
> >> One more question about `--dry-run` vs `--execute`: While I agree that
> >> we should use the same flag for both tools, I am not sure which one is
> >> the better one... My personal take is, that I like `--dry-run` better.
> >> Not sure what others think.
> >>
> >> One more comment: with the removal of ZK, we can also tackle this JIRA:
> >> https://issues.apache.org/jira/browse/KAFKA-4327 If we do so, I think
> we
> >> should mention it in the KIP.
> >>
> >> I am also not sure about backward compatibility issue for this case.
> >> Actually, I don't expect people to call `StreamsResetter` from Java
> >> code, but you can never know. So if we break this, we need to make sure
> >> to cover it in the KIP and later on in the release notes.
> >>
> >>
> >> -Matthias
> >>
> >> On 7/14/17 7:15 AM, Jorge Esteban Quilcate Otoya wrote:
> >>> Hi,
> >>>
> >>> KIP is updated.
> >>> Changes:
> >>> 1. Approach discussed to keep both tools (streams application resetter
> >> and
> >>> consumer group reset offset).
> >>> 2. Options has been aligned between both tools.
> >>> 3. Zookeeper option from streams-application-resetted will be removed,
> >> and
> >>> replaced internally for Kafka AdminClient.
> >>>
> >>> Looking forward to your feedback.
> >>>
> >>> El jue., 29 jun. 2017 a las 15:04, Matthias J. Sax (<
> >> matth...@confluent.io>)
> >>> escribió:
> >>>
>  Damian,
> 
> > resets everything and clears up
> >> the state stores.
> 
>  That's not correct. The reset tool does not touch local store. For
> this,
>  we have `KafkaStreams#clenup` -- otherwise, you would need to run the
>  tool in every machine you run an application instance.
> 
>  With regard to state, the tool only deletes the underlying changelog
>  topics.
> 
>  Just to clarify. The idea of allowing to rest with different offset is
>  to clear all state but just use a different start offset (instead of
>  zero). This is for use case where your topic has a larger retention
> time
>  than the amount of data you want to reprocess. But we always need to
>  start with an empty state. (Resetting with consistent state is
> something
>  we might do at some point, but it's much hard and not part of this
> KIP)
> 
> > @matthias, could we remove the ZK dependency from the streams reset
> >> tool
> > now?
> 
>  I think so. The new AdminClient provide the feature we need AFAIK. I
>  guess we can picky back this into the KIP (we would need a KIP anyway
>  because we deprecate `--zookeeper` what is an public API change).
> 
> 
>  I just want to point out, that even without ZK dependency, I prefer to
>  wrap the consumer offset tool and keep 

Re: Permission to create KIP for discussions and the max.in.flight.requests.per.connection config doc

2017-08-29 Thread Jun Rao
Hi, Bhaskar,

Thanks for your interest. Just granted you the wiki permission.

Jun


On Sun, Aug 27, 2017 at 4:37 AM, Bhaskar Gollapudi <
bhaskargollap...@gmail.com> wrote:

> Hi ,
>
> I would like to have have some permissions to create a KIP and start a
> discussion.
>
> I think I was able to create a JIRA but cant do anything more beyond that.
> I guess the process is to first create a KIP and get it discussed and
> voted.
>
> https://issues.apache.org/jira/browse/KAFKA-5761
>
> Can someone please let me know what is needed for me to be having the
> permission to create a discussion thread ?
>
> There are a couple of other points too that I want to put forward to the
> Kafka dev community : like the point I noticed about
> the max.in.flight.requests.per.connection property -
>
> max.in.flight.requests.per.connection The maximum number of unacknowledged
> requests the client will send on a single connection before blocking. Note
> that if this setting is set to be greater than 1 and there are failed
> sends, there is a risk of message re-ordering due to retries (i.e., if
> retries are enabled).
> I think this documentation is a bit confusing as it say this controls the #
> of unack'd requests client can send before being blocked. However , AFAI
> checked the code, it only checks for pending writes to the socketChannel
> before allowing next write, and this write might have been buffered in the
> socket's internal buffers (  based on SO_SNDBUF
>  StandardSocketOptions.html#SO_SNDBUF>
> value
> ) without any ack as yet. There is still a chance that this message might
> get lost ( network failure and session loss ) so its not correct to equate
> this state to saying the  message has been acked.
>
> Regards
> Bhaskar
>


[GitHub] kafka pull request #3721: Upgrade to ducktape 0.7.1

2017-08-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3721




[jira] [Resolved] (KAFKA-5768) Upgrade ducktape version to 0.7.1, and use new kill_java_processes

2017-08-29 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-5768.

   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 3721
[https://github.com/apache/kafka/pull/3721]

> Upgrade ducktape version to 0.7.1, and use new kill_java_processes
> --
>
> Key: KAFKA-5768
> URL: https://issues.apache.org/jira/browse/KAFKA-5768
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
> Fix For: 1.0.0
>
>
> Upgrade the ducktape version to 0.7.0.  Use the new {{kill_java_processes}} 
> function in ducktape to kill only the processes that are part of a service 
> when starting or stopping a service, rather than killing all java processes 
> (in some cases)





[DISCUSS] KIP-192 - Provide cleaner semantics when idempotence is enabled

2017-08-29 Thread Apurva Mehta
Hi,

In the discussion of KIP-185 (enable idempotence by default), we discovered
some shortcomings of the existing idempotent producer implementation.
Fixing these issues requires changes to the ProduceRequest and
ProduceResponse protocols as well as changes to the values of the
'enable.idempotence' producer config.

Hence, I split off those changes into another KIP so as to decouple the two
issues. Please have a look at my follow up KIP below:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-192+%3A+Provide+cleaner+semantics+when+idempotence+is+enabled

KIP-185 depends on KIP-192, and I hope to make progress on the latter
independently.

Thanks,
Apurva


Jenkins build is back to normal : kafka-0.11.0-jdk7 #289

2017-08-29 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-29 Thread Jun Rao
Hmm, I thought delivery.timeout.ms bounds the time from when a message is in
the accumulator (i.e., when send() returns) to the time when the callback is
called. If we wait for request.timeout.ms for an inflight request and the
remaining delivery.timeout.ms is less than request.timeout.ms, the callback
may be called later than delivery.timeout.ms, right?

Jiangjie's concern on resetting the pid on every expired batch is probably
not an issue if we only reset the pid when the expired batch's pid is the
same as the current pid, as Jason suggested.

Thanks,

Jun

On Tue, Aug 29, 2017 at 3:09 PM, Jason Gustafson  wrote:

> I think the semantics of delivery.timeout.ms need to allow for the
> possibility that the record was actually written. Unless we can keep on
> retrying indefinitely, there's really no way to know for sure whether the
> record was written or not. A delivery timeout just means that we cannot
> guarantee that the record was delivered.
>
> -Jason
>
> On Tue, Aug 29, 2017 at 2:51 PM, Becket Qin  wrote:
>
> > Hi Jason,
> >
> > If we expire the batch from user's perspective but still waiting for the
> > response, would that mean it is likely that the batch will be
> successfully
> > appended but the users will receive a TimeoutException? That seems a
> little
> > non-intuitive to the users. Arguably it maybe OK though because currently
> > when TimeoutException is thrown, there is no guarantee whether the
> messages
> > are delivered or not.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Aug 29, 2017 at 12:33 PM, Jason Gustafson 
> > wrote:
> >
> > > I think I'm with Becket. We should wait for request.timeout.ms for
> each
> > > produce request we send. We can still await the response internally for
> > > PID/sequence maintenance even if we expire the batch from the user's
> > > perspective. New sequence numbers would be assigned based on the
> current
> > > PID until the response returns and we find whether a PID reset is
> > actually
> > > needed. This makes delivery.timeout.ms a hard limit which is easier to
> > > explain.
> > >
> > > -Jason
> > >
> > > On Tue, Aug 29, 2017 at 11:10 AM, Sumant Tambe 
> > wrote:
> > >
> > > > I'm updating the kip-91 writeup. There seems to be some confusion
> about
> > > > expiring an inflight request. An inflight request gets a full
> > > > delivery.timeout.ms duration from creation, right? So it should be
> > > > max(remaining delivery.timeout.ms, request.timeout.ms)?
> > > >
> > > > Jun, we do want to wait for an inflight request for longer than
> > > > request.timeout.ms. right?
> > > >
> > > > What happens to a batch when retries * (request.timeout.ms +
> > > > retry.backoff.ms) < delivery.timeout.ms  and all retries are
> > > exhausted?  I
> > > > remember an internal discussion where we concluded that retries can
> be
> > no
> > > > longer relevant (i.e., ignored, which is same as retries=MAX_LONG)
> when
> > > > there's an end-to-end delivery.timeout.ms. Do you agree?
> > > >
> > > > Regards,
> > > > Sumant
> > > >
> > > > On 27 August 2017 at 12:08, Jun Rao  wrote:
> > > >
> > > > > Hi, Jiangjie,
> > > > >
> > > > > If we want to enforce delivery.timeout.ms, we need to take the min
> > > > right?
> > > > > Also, if a user sets a large delivery.timeout.ms, we probably
> don't
> > > want
> > > > > to
> > > > > wait for an inflight request longer than request.timeout.ms.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Fri, Aug 25, 2017 at 5:19 PM, Becket Qin 
> > > > wrote:
> > > > >
> > > > > > Hi Jason,
> > > > > >
> > > > > > I see what you mean. That makes sense. So in the above case after
> > the
> > > > > > producer resets PID, when it retry batch_0_tp1, the batch will
> > still
> > > > have
> > > > > > the old PID even if the producer has already got a new PID.
> > > > > >
> > > > > > @Jun, do you mean max(remaining delivery.timeout.ms,
> > > > request.timeout.ms)
> > > > > > instead of min(remaining delivery.timeout.ms, request.timeout.ms
> )?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Fri, Aug 25, 2017 at 9:34 AM, Jun Rao 
> wrote:
> > > > > >
> > > > > > > Hi, Becket,
> > > > > > >
> > > > > > > Good point on expiring inflight requests. Perhaps we can expire
> > an
> > > > > > inflight
> > > > > > > request after min(remaining delivery.timeout.ms,
> > > request.timeout.ms
> > > > ).
> > > > > > This
> > > > > > > way, if a user sets a high delivery.timeout.ms, we can still
> > > recover
> > > > > > from
> > > > > > > broker power outage sooner.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Thu, Aug 24, 2017 at 12:52 PM, Becket Qin <
> > becket@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jason,
> > > > > > > >
> > > > > > > > 

Build failed in Jenkins: kafka-0.10.2-jdk7 #193

2017-08-29 Thread Apache Jenkins Server
See 


Changes:

[me] MINOR: Verify startup of zookeeper service in system tests

--
[...truncated 323.77 KB...]

kafka.server.KafkaConfigTest > testUncleanLeaderElectionDefault STARTED

kafka.server.KafkaConfigTest > testUncleanLeaderElectionDefault PASSED

kafka.server.KafkaConfigTest > testInvalidAdvertisedListenersProtocol STARTED

kafka.server.KafkaConfigTest > testInvalidAdvertisedListenersProtocol PASSED

kafka.server.KafkaConfigTest > testUncleanElectionEnabled STARTED

kafka.server.KafkaConfigTest > testUncleanElectionEnabled PASSED

kafka.server.KafkaConfigTest > testAdvertisePortDefault STARTED

kafka.server.KafkaConfigTest > testAdvertisePortDefault PASSED

kafka.server.KafkaConfigTest > testVersionConfiguration STARTED

kafka.server.KafkaConfigTest > testVersionConfiguration PASSED

kafka.server.KafkaConfigTest > testEqualAdvertisedListenersProtocol STARTED

kafka.server.KafkaConfigTest > testEqualAdvertisedListenersProtocol PASSED

kafka.server.SslReplicaFetchTest > testReplicaFetcherThread STARTED

kafka.server.SslReplicaFetchTest > testReplicaFetcherThread PASSED

kafka.server.IsrExpirationTest > testIsrExpirationForSlowFollowers STARTED

kafka.server.IsrExpirationTest > testIsrExpirationForSlowFollowers PASSED

kafka.server.IsrExpirationTest > testIsrExpirationForStuckFollowers STARTED

kafka.server.IsrExpirationTest > testIsrExpirationForStuckFollowers PASSED

kafka.server.IsrExpirationTest > testIsrExpirationIfNoFetchRequestMade STARTED

kafka.server.IsrExpirationTest > testIsrExpirationIfNoFetchRequestMade PASSED

kafka.server.SaslPlaintextReplicaFetchTest > testReplicaFetcherThread STARTED

kafka.server.SaslPlaintextReplicaFetchTest > testReplicaFetcherThread PASSED

kafka.server.ReplicationQuotasTest > 
shouldBootstrapTwoBrokersWithLeaderThrottle STARTED

kafka.server.ReplicationQuotasTest > 
shouldBootstrapTwoBrokersWithLeaderThrottle PASSED

kafka.server.ReplicationQuotasTest > shouldThrottleOldSegments STARTED

kafka.server.ReplicationQuotasTest > shouldThrottleOldSegments PASSED

kafka.server.ReplicationQuotasTest > 
shouldBootstrapTwoBrokersWithFollowerThrottle STARTED

kafka.server.ReplicationQuotasTest > 
shouldBootstrapTwoBrokersWithFollowerThrottle PASSED

kafka.server.ServerStartupTest > testBrokerStateRunningAfterZK STARTED

kafka.server.ServerStartupTest > testBrokerStateRunningAfterZK PASSED

kafka.server.ServerStartupTest > testBrokerCreatesZKChroot STARTED

kafka.server.ServerStartupTest > testBrokerCreatesZKChroot PASSED

kafka.server.ServerStartupTest > testConflictBrokerStartupWithSamePort STARTED

kafka.server.ServerStartupTest > testConflictBrokerStartupWithSamePort PASSED

kafka.server.ServerStartupTest > testConflictBrokerRegistration STARTED

kafka.server.ServerStartupTest > testConflictBrokerRegistration PASSED

kafka.server.ServerStartupTest > testBrokerSelfAware STARTED

kafka.server.ServerStartupTest > testBrokerSelfAware PASSED

kafka.server.ProduceRequestTest > testSimpleProduceRequest STARTED

kafka.server.ProduceRequestTest > testSimpleProduceRequest PASSED

kafka.server.ProduceRequestTest > testCorruptLz4ProduceRequest STARTED

kafka.server.ProduceRequestTest > testCorruptLz4ProduceRequest PASSED

kafka.server.ReplicaManagerTest > testHighWaterMarkDirectoryMapping STARTED

kafka.server.ReplicaManagerTest > testHighWaterMarkDirectoryMapping PASSED

kafka.server.ReplicaManagerTest > 
testFetchBeyondHighWatermarkReturnEmptyResponse STARTED

kafka.server.ReplicaManagerTest > 
testFetchBeyondHighWatermarkReturnEmptyResponse PASSED

kafka.server.ReplicaManagerTest > testIllegalRequiredAcks STARTED

kafka.server.ReplicaManagerTest > testIllegalRequiredAcks PASSED

kafka.server.ReplicaManagerTest > testClearPurgatoryOnBecomingFollower STARTED

kafka.server.ReplicaManagerTest > testClearPurgatoryOnBecomingFollower PASSED

kafka.server.ReplicaManagerTest > testHighwaterMarkRelativeDirectoryMapping 
STARTED

kafka.server.ReplicaManagerTest > testHighwaterMarkRelativeDirectoryMapping 
PASSED

kafka.server.KafkaMetricReporterClusterIdTest > testClusterIdPresent STARTED

kafka.server.KafkaMetricReporterClusterIdTest > testClusterIdPresent PASSED

kafka.server.CreateTopicsRequestWithPolicyTest > testValidCreateTopicsRequests 
STARTED

kafka.server.CreateTopicsRequestWithPolicyTest > testValidCreateTopicsRequests 
PASSED

kafka.server.CreateTopicsRequestWithPolicyTest > testErrorCreateTopicsRequests 
STARTED

kafka.server.CreateTopicsRequestWithPolicyTest > testErrorCreateTopicsRequests 
PASSED

kafka.server.OffsetCommitTest > testUpdateOffsets STARTED

kafka.server.OffsetCommitTest > testUpdateOffsets PASSED

kafka.server.OffsetCommitTest > testLargeMetadataPayload STARTED

kafka.server.OffsetCommitTest > testLargeMetadataPayload PASSED

kafka.server.OffsetCommitTest > testOffsetsDeleteAfterTopicDeletion STARTED

kafka.server.OffsetCommitTest > 

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-29 Thread Jason Gustafson
I think the semantics of delivery.timeout.ms need to allow for the
possibility that the record was actually written. Unless we can keep on
retrying indefinitely, there's really no way to know for sure whether the
record was written or not. A delivery timeout just means that we cannot
guarantee that the record was delivered.

-Jason

On Tue, Aug 29, 2017 at 2:51 PM, Becket Qin  wrote:

> Hi Jason,
>
> If we expire the batch from user's perspective but still waiting for the
> response, would that mean it is likely that the batch will be successfully
> appended but the users will receive a TimeoutException? That seems a little
> non-intuitive to the users. Arguably it maybe OK though because currently
> when TimeoutException is thrown, there is no guarantee whether the messages
> are delivered or not.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Aug 29, 2017 at 12:33 PM, Jason Gustafson 
> wrote:
>
> > I think I'm with Becket. We should wait for request.timeout.ms for each
> > produce request we send. We can still await the response internally for
> > PID/sequence maintenance even if we expire the batch from the user's
> > perspective. New sequence numbers would be assigned based on the current
> > PID until the response returns and we find whether a PID reset is
> actually
> > needed. This makes delivery.timeout.ms a hard limit which is easier to
> > explain.
> >
> > -Jason
> >
> > On Tue, Aug 29, 2017 at 11:10 AM, Sumant Tambe 
> wrote:
> >
> > > I'm updating the kip-91 writeup. There seems to be some confusion about
> > > expiring an inflight request. An inflight request gets a full
> > > delivery.timeout.ms duration from creation, right? So it should be
> > > max(remaining delivery.timeout.ms, request.timeout.ms)?
> > >
> > > Jun, we do want to wait for an inflight request for longer than
> > > request.timeout.ms. right?
> > >
> > > What happens to a batch when retries * (request.timeout.ms +
> > > retry.backoff.ms) < delivery.timeout.ms  and all retries are
> > exhausted?  I
> > > remember an internal discussion where we concluded that retries can be
> no
> > > longer relevant (i.e., ignored, which is same as retries=MAX_LONG) when
> > > there's an end-to-end delivery.timeout.ms. Do you agree?
> > >
> > > Regards,
> > > Sumant
> > >
> > > On 27 August 2017 at 12:08, Jun Rao  wrote:
> > >
> > > > Hi, Jiangjie,
> > > >
> > > > If we want to enforce delivery.timeout.ms, we need to take the min
> > > right?
> > > > Also, if a user sets a large delivery.timeout.ms, we probably don't
> > want
> > > > to
> > > > wait for an inflight request longer than request.timeout.ms.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Aug 25, 2017 at 5:19 PM, Becket Qin 
> > > wrote:
> > > >
> > > > > Hi Jason,
> > > > >
> > > > > I see what you mean. That makes sense. So in the above case after
> the
> > > > > producer resets PID, when it retry batch_0_tp1, the batch will
> still
> > > have
> > > > > the old PID even if the producer has already got a new PID.
> > > > >
> > > > > @Jun, do you mean max(remaining delivery.timeout.ms,
> > > request.timeout.ms)
> > > > > instead of min(remaining delivery.timeout.ms, request.timeout.ms)?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Fri, Aug 25, 2017 at 9:34 AM, Jun Rao  wrote:
> > > > >
> > > > > > Hi, Becket,
> > > > > >
> > > > > > Good point on expiring inflight requests. Perhaps we can expire
> an
> > > > > inflight
> > > > > > request after min(remaining delivery.timeout.ms,
> > request.timeout.ms
> > > ).
> > > > > This
> > > > > > way, if a user sets a high delivery.timeout.ms, we can still
> > recover
> > > > > from
> > > > > > broker power outage sooner.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Thu, Aug 24, 2017 at 12:52 PM, Becket Qin <
> becket@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi Jason,
> > > > > > >
> > > > > > > delivery.timeout.ms sounds good to me.
> > > > > > >
> > > > > > > I was referring to the case that we are resetting the
> > PID/sequence
> > > > > after
> > > > > > > expire a batch. This is more about the sending the batches
> after
> > > the
> > > > > > > expired batch.
> > > > > > >
> > > > > > > The scenario being discussed is expiring one of the batches in
> a
> > > > > > in-flight
> > > > > > > request and retry the other batches in the that in-flight
> > request.
> > > So
> > > > > > > consider the following case:
> > > > > > > 1. Producer sends request_0 with two batches (batch_0_tp0 and
> > > > > > batch_0_tp1).
> > > > > > > 2. Broker receives the request enqueued the request to the log.
> > > > > > > 3. Before the producer receives the response from the broker,
> > > > > batch_0_tp0
> > > > > > > expires. The producer will expire batch_0_tp0 immediately,
> resets
> > > > PID,
> > > > 

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-29 Thread Becket Qin
Hi Jason,

If we expire the batch from the user's perspective but are still waiting for
the response, would that mean it is likely that the batch will be
successfully appended but the user will receive a TimeoutException? That
seems a little non-intuitive to users. Arguably it may be OK, though,
because currently when a TimeoutException is thrown there is no guarantee
whether the messages were delivered or not.

Thanks,

Jiangjie (Becket) Qin
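
For illustration, here is a minimal Java sketch of what this would look like from the
caller's side, assuming the standard KafkaProducer callback API; the semantics in the
comments are the proposal under discussion, not current guaranteed behaviour.

{code}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.TimeoutException;

public class TimeoutAwareSend {
    // 'producer' and 'record' are assumed to be configured elsewhere; this only shows
    // how a caller would have to interpret a timeout under the proposed semantics.
    public static void send(KafkaProducer<String, String> producer,
                            ProducerRecord<String, String> record) {
        producer.send(record, (metadata, exception) -> {
            if (exception instanceof TimeoutException) {
                // The batch may still have been appended on the broker even though the
                // client reports a timeout: treat this as "delivery outcome unknown",
                // not "definitely not delivered".
            } else if (exception == null) {
                // Acknowledged: metadata carries the partition and committed offset.
            }
        });
    }
}
{code}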

On Tue, Aug 29, 2017 at 12:33 PM, Jason Gustafson 
wrote:

> I think I'm with Becket. We should wait for request.timeout.ms for each
> produce request we send. We can still await the response internally for
> PID/sequence maintenance even if we expire the batch from the user's
> perspective. New sequence numbers would be assigned based on the current
> PID until the response returns and we find whether a PID reset is actually
> needed. This makes delivery.timeout.ms a hard limit which is easier to
> explain.
>
> -Jason
>
> On Tue, Aug 29, 2017 at 11:10 AM, Sumant Tambe  wrote:
>
> > I'm updating the kip-91 writeup. There seems to be some confusion about
> > expiring an inflight request. An inflight request gets a full
> > delivery.timeout.ms duration from creation, right? So it should be
> > max(remaining delivery.timeout.ms, request.timeout.ms)?
> >
> > Jun, we do want to wait for an inflight request for longer than
> > request.timeout.ms. right?
> >
> > What happens to a batch when retries * (request.timeout.ms +
> > retry.backoff.ms) < delivery.timeout.ms  and all retries are
> exhausted?  I
> > remember an internal discussion where we concluded that retries can be no
> > longer relevant (i.e., ignored, which is same as retries=MAX_LONG) when
> > there's an end-to-end delivery.timeout.ms. Do you agree?
> >
> > Regards,
> > Sumant
> >
> > On 27 August 2017 at 12:08, Jun Rao  wrote:
> >
> > > Hi, Jiangjie,
> > >
> > > If we want to enforce delivery.timeout.ms, we need to take the min
> > right?
> > > Also, if a user sets a large delivery.timeout.ms, we probably don't
> want
> > > to
> > > wait for an inflight request longer than request.timeout.ms.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Aug 25, 2017 at 5:19 PM, Becket Qin 
> > wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > I see what you mean. That makes sense. So in the above case after the
> > > > producer resets PID, when it retry batch_0_tp1, the batch will still
> > have
> > > > the old PID even if the producer has already got a new PID.
> > > >
> > > > @Jun, do you mean max(remaining delivery.timeout.ms,
> > request.timeout.ms)
> > > > instead of min(remaining delivery.timeout.ms, request.timeout.ms)?
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Fri, Aug 25, 2017 at 9:34 AM, Jun Rao  wrote:
> > > >
> > > > > Hi, Becket,
> > > > >
> > > > > Good point on expiring inflight requests. Perhaps we can expire an
> > > > inflight
> > > > > request after min(remaining delivery.timeout.ms,
> request.timeout.ms
> > ).
> > > > This
> > > > > way, if a user sets a high delivery.timeout.ms, we can still
> recover
> > > > from
> > > > > broker power outage sooner.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Thu, Aug 24, 2017 at 12:52 PM, Becket Qin  >
> > > > wrote:
> > > > >
> > > > > > Hi Jason,
> > > > > >
> > > > > > delivery.timeout.ms sounds good to me.
> > > > > >
> > > > > > I was referring to the case that we are resetting the
> PID/sequence
> > > > after
> > > > > > expire a batch. This is more about the sending the batches after
> > the
> > > > > > expired batch.
> > > > > >
> > > > > > The scenario being discussed is expiring one of the batches in a
> > > > > in-flight
> > > > > > request and retry the other batches in the that in-flight
> request.
> > So
> > > > > > consider the following case:
> > > > > > 1. Producer sends request_0 with two batches (batch_0_tp0 and
> > > > > batch_0_tp1).
> > > > > > 2. Broker receives the request enqueued the request to the log.
> > > > > > 3. Before the producer receives the response from the broker,
> > > > batch_0_tp0
> > > > > > expires. The producer will expire batch_0_tp0 immediately, resets
> > > PID,
> > > > > and
> > > > > > then resend batch_0_tp1, and maybe send batch_1_tp0 (i.e. the
> next
> > > > batch
> > > > > to
> > > > > > the expired batch) as well.
> > > > > >
> > > > > > For batch_0_tp1, it is OK to reuse PID and and sequence number.
> The
> > > > > problem
> > > > > > is for batch_1_tp0, If we reuse the same PID and the broker has
> > > already
> > > > > > appended batch_0_tp0, the broker will think batch_1_tp0 is a
> > > duplicate
> > > > > with
> > > > > > the same sequence number. As a result broker will drop
> batch_0_tp1.
> > > > That
> > > > > is
> > > > > > why we have to either bump up sequence number or reset PID. To
> > avoid
> > > > this
> > > > > > 

[GitHub] kafka pull request #3757: KAFKA-5379 follow up: reduce redundant mock proces...

2017-08-29 Thread guozhangwang
GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/3757

KAFKA-5379 follow up: reduce redundant mock processor context



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka K5379-follow-up

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3757.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3757


commit c750932d05559c3e7d28c64a1a334db680cfe77f
Author: Guozhang Wang 
Date:   2017-08-29T20:59:44Z

remove two test processor context into mock processor context

commit d2fc3853ff3caa6f10e9932e90b995ff5a4807ea
Author: Guozhang Wang 
Date:   2017-08-29T21:27:28Z

some reverts




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-5779) Single message may exploit application based on KStream

2017-08-29 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-5779.

Resolution: Not A Problem

> Single message may exploit application based on KStream
> ---
>
> Key: KAFKA-5779
> URL: https://issues.apache.org/jira/browse/KAFKA-5779
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.1, 0.11.0.0
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Critical
>
> The context: in Kafka Streams I am *defining* simple KStream processing:
> {code}
> stringInput // line 54 of the SingleTopicStreamer class
> .filter( streamFilter::passOrFilterMessages )
> .map( normalizer )
> .to( outTopicName );
> {code}
> For some reason I got a wrong message (I am still investigating what the 
> problem is), 
> but anyhow my service was killed with a FATAL error:
> {code}
> 2017-08-22 17:08:44 FATAL SingleTopicStreamer:54 - Caught unhandled 
> exception: Input record ConsumerRecord(topic = XXX_topic, partition = 8, 
> offset = 15, CreateTime = -1, serialized key size = -1, serialized value size 
> = 255, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, 
> value = 
> {"recordTimestamp":"2017-08-22T17:07:40:619+02:00","logLevel":"INFO","sourceApplication":"WPT","message":"Kafka-Init","businessError":false,"normalizedStatus":"green","logger":"CoreLogger"})
>  has invalid (negative) timestamp. Possibly because a pre-0.10 producer 
> client was used to write this record to Kafka without embedding a timestamp, 
> or because the input topic was created before upgrading the Kafka cluster to 
> 0.10+. Use a different TimestampExtractor to process this data.; 
> [org.apache.kafka.streams.processor.FailOnInvalidTimestamp.onInvalidTimestamp(FailOnInvalidTimestamp.java:63),
>  
> org.apache.kafka.streams.processor.ExtractRecordMetadataTimestamp.extract(ExtractRecordMetadataTimestamp.java:61),
>  
> org.apache.kafka.streams.processor.FailOnInvalidTimestamp.extract(FailOnInvalidTimestamp.java:46),
>  
> org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:85),
>  
> org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:117),
>  
> org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:464),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.addRecordsToTasks(StreamThread.java:650),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:556),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)]
>  in thread restreamer-d4e77d18-6e7b-4708-8436-7fea0d4b1cdf-StreamThread-3
> {code}
> The suggested explanation that an old producer wrote the message is false, as 
> we are using Kafka 0.10.2.1 and 0.11.0.0 and the topics were created within 
> these versions of Kafka. 
> The sender application is the .NET client from Confluent.
> The awkward part of this exception is that it was suggested to be thrown 
> during initialization of the stream, but it effectively happens during 
> processing, so adding try {} catch {} around the stringInput statement does 
> not help: the stream was defined correctly, but a single message sent later 
> killed the whole application.
> In my opinion KStream should be robust enough to catch such an exception and 
> protect the application from dying because of a single corrupted message, 
> especially when the timestamp is not embedded. In such a case one could patch 
> the message with the current timestamp without losing overall performance.
> I would expect Kafka Streams to handle this.
> I will continue to investigate what the problem with the message is, but it 
> is quite hard for me, as it happens internally in Kafka Streams combined with 
> the .NET producer.
> And I have already verified that this problem does not occur when I consume 
> these same messages with the old-fashioned Kafka consumer :-).
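
A minimal sketch of the workaround hinted at in the error message and in the description
above: a custom TimestampExtractor that falls back to wall-clock time when a record carries
an invalid (negative) timestamp, instead of the default FailOnInvalidTimestamp. The class
name is made up, it would still have to be wired in via the Streams timestamp-extractor
config, and the built-in UsePreviousTimeOnInvalidTimestamp / LogAndSkipOnInvalidTimestamp
extractors cover closely related cases.

{code}
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Hypothetical extractor: patch records that carry an invalid (negative) timestamp
// with the current wall-clock time rather than failing the whole stream thread.
public class WallclockOnInvalidTimestamp implements TimestampExtractor {
    @Override
    public long extract(final ConsumerRecord<Object, Object> record, final long previousTimestamp) {
        final long ts = record.timestamp();
        return ts >= 0 ? ts : System.currentTimeMillis();
    }
}
{code}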



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: kafka-trunk-jdk7 #2687

2017-08-29 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] KAFKA-5379: ProcessorContext.appConfigs() should return parsed values

--
[...truncated 914.18 KB...]

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryTailIfUndefinedPassed STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryTailIfUndefinedPassed PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnUnsupportedIfNoEpochRecorded STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnUnsupportedIfNoEpochRecorded PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliest STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliest PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPersistEpochsBetweenInstances STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPersistEpochsBetweenInstances PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotClearAnythingIfOffsetToFirstOffset STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotClearAnythingIfOffsetToFirstOffset PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotLetOffsetsGoBackwardsEvenIfEpochsProgress STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotLetOffsetsGoBackwardsEvenIfEpochsProgress PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldGetFirstOffsetOfSubsequentEpochWhenOffsetRequestedForPreviousEpoch STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldGetFirstOffsetOfSubsequentEpochWhenOffsetRequestedForPreviousEpoch PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest2 STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest2 PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearEarliestOnEmptyCache 
STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearEarliestOnEmptyCache 
PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPreserveResetOffsetOnClearEarliestIfOneExists STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPreserveResetOffsetOnClearEarliestIfOneExists PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnInvalidOffsetIfEpochIsRequestedWhichIsNotCurrentlyTracked STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnInvalidOffsetIfEpochIsRequestedWhichIsNotCurrentlyTracked PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldFetchEndOffsetOfEmptyCache 
STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldFetchEndOffsetOfEmptyCache 
PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliestAndUpdateItsOffset STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliestAndUpdateItsOffset PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearAllEntries STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearAllEntries PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearLatestOnEmptyCache 
STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearLatestOnEmptyCache 
PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryHeadIfUndefinedPassed STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryHeadIfUndefinedPassed PASSED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldIncreaseLeaderEpochBetweenLeaderRestarts STARTED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldIncreaseLeaderEpochBetweenLeaderRestarts PASSED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldAddCurrentLeaderEpochToMessagesAsTheyAreWrittenToLeader STARTED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldAddCurrentLeaderEpochToMessagesAsTheyAreWrittenToLeader PASSED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldSendLeaderEpochRequestAndGetAResponse STARTED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldSendLeaderEpochRequestAndGetAResponse PASSED

kafka.server.epoch.OffsetsForLeaderEpochTest > shouldGetEpochsFromReplica 
STARTED

kafka.server.epoch.OffsetsForLeaderEpochTest > shouldGetEpochsFromReplica PASSED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnUnknownTopicOrPartitionIfThrown STARTED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnUnknownTopicOrPartitionIfThrown PASSED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnNoLeaderForPartitionIfThrown STARTED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnNoLeaderForPartitionIfThrown PASSED


[jira] [Resolved] (KAFKA-3570) It'd be nice to line up display output in columns in ConsumerGroupCommand

2017-08-29 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-3570.
--
Resolution: Fixed

> It'd be nice to line up display output in columns in ConsumerGroupCommand
> -
>
> Key: KAFKA-3570
> URL: https://issues.apache.org/jira/browse/KAFKA-3570
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Greg Zoller
>Priority: Trivial
>
> Not a huge deal but the output for ConsumerGroupCommand is pretty messy.  
> It'd be cool to line up the columns on the display output.
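
A toy illustration of the requested change (nothing to do with the actual fix): fixed-width
format strings are enough to make the columns line up.

{code}
public class AlignedOutputExample {
    public static void main(String[] args) {
        // Pad each column to a fixed width so the rows line up under the header.
        String fmt = "%-30s %-10s %-15s %-15s %-10s%n";
        System.out.printf(fmt, "TOPIC", "PARTITION", "CURRENT-OFFSET", "LOG-END-OFFSET", "LAG");
        System.out.printf(fmt, "test.large", "0", "1500", "1500", "0");
        System.out.printf(fmt, "test.large", "1", "987", "1001", "14");
    }
}
{code}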



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-1980) Console consumer throws OutOfMemoryError with large max-messages

2017-08-29 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-1980.
--
Resolution: Fixed

Closing as per the comments. Please reopen if you think the issue still exists


> Console consumer throws OutOfMemoryError with large max-messages
> 
>
> Key: KAFKA-1980
> URL: https://issues.apache.org/jira/browse/KAFKA-1980
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.8.1.1, 0.8.2.0
>Reporter: Håkon Hitland
>Priority: Minor
> Attachments: kafka-1980.patch
>
>
> Tested on kafka_2.11-0.8.2.0
> Steps to reproduce:
> - Have any topic with at least 1 GB of data.
> - Use kafka-console-consumer.sh on the topic passing a large number to 
> --max-messages, e.g.:
> $ bin/kafka-console-consumer.sh --zookeeper localhost --topic test.large 
> --from-beginning --max-messages  | head -n 40
> Expected result:
> Should stream messages up to max-messages
> Result:
> Out of memory error:
> [2015-02-23 19:41:35,006] ERROR OOME with size 1048618 
> (kafka.network.BoundedByteBufferReceive)
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
>   at 
> kafka.network.BoundedByteBufferReceive.byteBufferAllocate(BoundedByteBufferReceive.scala:80)
>   at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:63)
>   at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>   at 
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>   at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
>   at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:71)
>   at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
>   at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:112)
>   at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:112)
>   at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:112)
>   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>   at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:111)
>   at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:111)
>   at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:111)
>   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>   at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:110)
>   at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:94)
>   at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
> As a first guess I'd say that this is caused by slice() taking more memory 
> than expected. Perhaps because it is called on an Iterable and not an 
> Iterator?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #3756: KAFKA-5261 added cached authorizer

2017-08-29 Thread simplesteph
GitHub user simplesteph opened a pull request:

https://github.com/apache/kafka/pull/3756

KAFKA-5261 added cached authorizer

attempt to improve Kafka performance by caching the results of the 
authorisation calls.
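
For context, a generic sketch of the caching idea (illustrative only, not the code in this
pull request and not bound to Kafka's actual Authorizer interface): memoize authorize()
results per (principal, operation, resource), with the caveat that the cache must be
invalidated whenever ACLs change.

{code}
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Wraps any "real" ACL check behind a concurrent cache keyed by a composite string.
final class CachingAuthorizerSketch {
    private final Predicate<String> delegate;              // the real, expensive check
    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();

    CachingAuthorizerSketch(Predicate<String> delegate) {
        this.delegate = Objects.requireNonNull(delegate);
    }

    boolean authorize(String principal, String operation, String resource) {
        String key = principal + '|' + operation + '|' + resource;
        return cache.computeIfAbsent(key, delegate::test);
    }

    // Must be called whenever ACLs change, otherwise stale decisions are served.
    void invalidateAll() {
        cache.clear();
    }
}
{code}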

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/simplesteph/kafka KAFKA-5261

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3756.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3756


commit 4fc2c0f365b675eb471a848b29001973ec1c2aa5
Author: Stephane Maarek 
Date:   2017-08-29T20:03:56Z

added cached authorizer




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-1675) bootstrapping tidy-up

2017-08-29 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-1675.
--
Resolution: Fixed

 The gradlew and gradlew.bat scripts have been removed from the repo. Please reopen if you 
think the issue still exists


> bootstrapping tidy-up
> -
>
> Key: KAFKA-1675
> URL: https://issues.apache.org/jira/browse/KAFKA-1675
> Project: Kafka
>  Issue Type: Bug
>Reporter: Szczepan Faber
>Assignee: Ivan Lyutov
> Attachments: KAFKA-1675.patch
>
>
> I'd like to suggest following changes:
> 1. remove the 'gradlew' and 'gradlew.bat' scripts from the source tree. Those 
> scripts don't work, e.g. they fail with exception when invoked. I just got a 
> user report where those scripts were invoked by the user and it led to an 
> exception that was not easy to grasp. Bootstrapping step will generate those 
> files anyway.
> 2. move the 'gradleVersion' extra property from the 'build.gradle' into 
> 'gradle.properties'. Otherwise it is hard to automate the bootstrapping 
> process - in order to find out the gradle version, I need to evaluate the 
> build script, and for that I need gradle with correct version (kind of a 
> vicious circle). Project properties declared in the gradle.properties file 
> can be accessed exactly the same as the 'ext' properties, for example: 
> 'project.gradleVersion'.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #3455: KAFKA-5379 - ProcessorContext.appConfigs() should ...

2017-08-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3455


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-5379) ProcessorContext.appConfigs() should return parsed/validated values

2017-08-29 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-5379.
--
   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 3455
[https://github.com/apache/kafka/pull/3455]

> ProcessorContext.appConfigs() should return parsed/validated values
> ---
>
> Key: KAFKA-5379
> URL: https://issues.apache.org/jira/browse/KAFKA-5379
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Tommy Becker
>Assignee: Tommy Becker
>Priority: Minor
> Fix For: 1.0.0
>
>
> As part of KAFKA-5334, it was decided that the current behavior of 
> {{ProcessorContext.appConfigs()}} is sub-optimal in that it returns the 
> original unparsed config values. Alternatively, the parsed values could be 
> returned, which would allow callers to know what they are getting as well as 
> avoid duplicating type conversions (e.g. className -> class).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-2173) Kafka died after throw more error

2017-08-29 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-2173.
--
Resolution: Cannot Reproduce

 This might have been fixed in later versions. Please reopen if you think the issue still 
exists


> Kafka died after throw more error
> -
>
> Key: KAFKA-2173
> URL: https://issues.apache.org/jira/browse/KAFKA-2173
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
> Environment: VPS Server CentOs 6.6 4G Ram
>Reporter: Truyet Nguyen
>
> Kafka died after server.log started throwing errors: 
> [2015-05-05 16:08:34,616] ERROR Closing socket for /127.0.0.1 because of 
> error (kafka.network.Processor)
> java.io.IOException: Broken pipe
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470)
>   at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:123)
>   at kafka.network.MultiSend.writeTo(Transmission.scala:101)
>   at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:231)
>   at kafka.network.Processor.write(SocketServer.scala:472)
>   at kafka.network.Processor.run(SocketServer.scala:342)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-1636) High CPU in very active environment

2017-08-29 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-1636.
--
Resolution: Won't Fix

ConsumerIterator waits for data from the underlying stream, so threads parked in await are 
idle rather than consuming CPU. Please reopen if you think the issue still exists


> High CPU in very active environment
> ---
>
> Key: KAFKA-1636
> URL: https://issues.apache.org/jira/browse/KAFKA-1636
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8.0
> Environment: Redhat 64 bit
>  2.6.32-431.23.3.el6.x86_64 #1 SMP Wed Jul 16 06:12:23 EDT 2014 x86_64 x86_64 
> x86_64 GNU/Linux
>Reporter: Laurie Turner
>Assignee: Neha Narkhede
>
> Found the same issue on StackOverFlow below:
> http://stackoverflow.com/questions/22983435/kafka-consumer-threads-are-in-waiting-state-and-lag-is-getting-increased
> This is a very busy environment and the majority of the CPU time seems to be 
> spent in the await method. 
> 3XMTHREADINFO3   Java callstack:
> 4XESTACKTRACEat sun/misc/Unsafe.park(Native Method)
> 4XESTACKTRACEat 
> java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:237(Compiled
>  Code))
> 4XESTACKTRACEat 
> java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2093(Compiled
>  Code))
> 4XESTACKTRACEat 
> java/util/concurrent/LinkedBlockingQueue.poll(LinkedBlockingQueue.java:478(Compiled
>  Code))
> 4XESTACKTRACEat 
> kafka/consumer/ConsumerIterator.makeNext(ConsumerIterator.scala:65(Compiled 
> Code))
> 4XESTACKTRACEat 
> kafka/consumer/ConsumerIterator.makeNext(ConsumerIterator.scala:33(Compiled 
> Code))
> 4XESTACKTRACEat 
> kafka/utils/IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:61(Compiled
>  Code))
> 4XESTACKTRACEat 
> kafka/utils/IteratorTemplate.hasNext(IteratorTemplate.scala:53(Compiled Code))



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [VOTE] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-29 Thread Roger Hoover
Hi Rajini,

The metrics in KIP-188 will provide counts across all users but the log
could potentially be used to audit individual authentication events.  I
think these would be useful at INFO level but if it's inconsistent with the
rest of Kafka, DEBUG is ok too.  The default log4j config for Kafka
separates authorization logs.  It seems like a good idea to treat
authentication logs the same way whether or not we choose DEBUG or INFO.

https://github.com/apache/kafka/blob/trunk/config/log4j.properties#L54-L58

Cheers,

Roger

On Tue, Aug 29, 2017 at 10:51 AM, Rajini Sivaram 
wrote:

> Hi Roger,
>
> If we are changing logging level for successful SASL authentications in the
> broker, we should probably do the same for SSL too. Since KIP-188 proposes
> to add new metrics for successful and failed authentications which may be
> more useful for monitoring, do we really need info-level logging for
> authentication? At the moment, there don't seem to be any per-connection
> informational messages at info-level, but if you think it is useful, we
> could do this in a separate JIRA. Let me know what you think.
>
> On Tue, Aug 29, 2017 at 1:09 PM, Roger Hoover 
> wrote:
>
> > Just re-read the KIP and was wondering if you think INFO would be ok for
> > logging successful authentications?  They should be relatively
> infrequent.
> >
> > On Tue, Aug 29, 2017 at 9:54 AM, Roger Hoover 
> > wrote:
> >
> > > +1 (non-binding).  Thanks, Rajini
> > >
> > > On Tue, Aug 29, 2017 at 2:10 AM, Ismael Juma 
> wrote:
> > >
> > >> Thanks for the KIP, +1 (binding) from me.
> > >>
> > >> Ismael
> > >>
> > >> On Thu, Aug 24, 2017 at 6:29 PM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > I would like to start vote on KIP-152 to improve diagnostics of
> > >> > authentication failures and to update clients to treat
> authentication
> > >> > failures as fatal exceptions rather than transient errors:
> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > 152+-+Improve+diagnostics+for+SASL+authentication+failures
> > >> >
> > >> > Thank you...
> > >> >
> > >> > Rajini
> > >> >
> > >>
> > >
> > >
> >
>


[jira] [Created] (KAFKA-5807) NPE on Connector.validate

2017-08-29 Thread Jeremy Custenborder (JIRA)
Jeremy Custenborder created KAFKA-5807:
--

 Summary: NPE on Connector.validate
 Key: KAFKA-5807
 URL: https://issues.apache.org/jira/browse/KAFKA-5807
 Project: Kafka
  Issue Type: Bug
Reporter: Jeremy Custenborder
Assignee: Jeremy Custenborder
Priority: Minor


An NPE is thrown when a developer returns null when overriding 
Connector.validate(). 

{code}
[2017-08-23 13:36:30,086] ERROR Stopping after connector error 
(org.apache.kafka.connect.cli.ConnectStandalone:99)
java.lang.NullPointerException
at 
org.apache.kafka.connect.connector.Connector.validate(Connector.java:134)
at 
org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:254)
at 
org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:158)
at 
org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:93)
{code}
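
A minimal developer-side guard for the case described above, assuming the standard Connect
API (the class is illustrative, not the fix in any patch): never hand the framework a null
from config() or validate().

{code}
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.common.config.Config;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Connector;

// Hypothetical connector base showing the guard; real connectors extend
// SourceConnector/SinkConnector and implement the remaining abstract methods.
public abstract class NullSafeConnector extends Connector {
    @Override
    public ConfigDef config() {
        return new ConfigDef();   // never null, even when nothing is declared
    }

    @Override
    public Config validate(Map<String, String> connectorConfigs) {
        Config result = super.validate(connectorConfigs);
        return result != null ? result : new Config(Collections.emptyList());
    }
}
{code}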





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-29 Thread Jason Gustafson
I think I'm with Becket. We should wait for request.timeout.ms for each
produce request we send. We can still await the response internally for
PID/sequence maintenance even if we expire the batch from the user's
perspective. New sequence numbers would be assigned based on the current
PID until the response returns and we find whether a PID reset is actually
needed. This makes delivery.timeout.ms a hard limit which is easier to
explain.

-Jason
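
To make the two clocks concrete, a small sketch of the separation being proposed; the
delivery.timeout.ms name is the KIP-91 proposal rather than an existing config, and the
code is illustrative, not the producer internals.

{code}
// User-visible expiry is driven by the proposed delivery.timeout.ms hard limit, while the
// network layer still waits up to request.timeout.ms for each in-flight response so that
// PID/sequence bookkeeping can be completed.
final class BatchClock {
    private final long createdMs;
    private final long deliveryTimeoutMs;   // proposed hard limit (KIP-91)

    BatchClock(long createdMs, long deliveryTimeoutMs) {
        this.createdMs = createdMs;
        this.deliveryTimeoutMs = deliveryTimeoutMs;
    }

    // The user sees a timeout as soon as the hard limit is reached...
    boolean userVisiblyExpired(long nowMs) {
        return nowMs - createdMs >= deliveryTimeoutMs;
    }

    // ...but the in-flight request itself is still awaited for up to request.timeout.ms
    // from the time it was sent, independent of the user-visible expiry.
    static boolean inFlightExpired(long nowMs, long sentMs, long requestTimeoutMs) {
        return nowMs - sentMs >= requestTimeoutMs;
    }
}
{code}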

On Tue, Aug 29, 2017 at 11:10 AM, Sumant Tambe  wrote:

> I'm updating the kip-91 writeup. There seems to be some confusion about
> expiring an inflight request. An inflight request gets a full
> delivery.timeout.ms duration from creation, right? So it should be
> max(remaining delivery.timeout.ms, request.timeout.ms)?
>
> Jun, we do want to wait for an inflight request for longer than
> request.timeout.ms. right?
>
> What happens to a batch when retries * (request.timeout.ms +
> retry.backoff.ms) < delivery.timeout.ms  and all retries are exhausted?  I
> remember an internal discussion where we concluded that retries can be no
> longer relevant (i.e., ignored, which is same as retries=MAX_LONG) when
> there's an end-to-end delivery.timeout.ms. Do you agree?
>
> Regards,
> Sumant
>
> On 27 August 2017 at 12:08, Jun Rao  wrote:
>
> > Hi, Jiangjie,
> >
> > If we want to enforce delivery.timeout.ms, we need to take the min
> right?
> > Also, if a user sets a large delivery.timeout.ms, we probably don't want
> > to
> > wait for an inflight request longer than request.timeout.ms.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Aug 25, 2017 at 5:19 PM, Becket Qin 
> wrote:
> >
> > > Hi Jason,
> > >
> > > I see what you mean. That makes sense. So in the above case after the
> > > producer resets PID, when it retry batch_0_tp1, the batch will still
> have
> > > the old PID even if the producer has already got a new PID.
> > >
> > > @Jun, do you mean max(remaining delivery.timeout.ms,
> request.timeout.ms)
> > > instead of min(remaining delivery.timeout.ms, request.timeout.ms)?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Fri, Aug 25, 2017 at 9:34 AM, Jun Rao  wrote:
> > >
> > > > Hi, Becket,
> > > >
> > > > Good point on expiring inflight requests. Perhaps we can expire an
> > > inflight
> > > > request after min(remaining delivery.timeout.ms, request.timeout.ms
> ).
> > > This
> > > > way, if a user sets a high delivery.timeout.ms, we can still recover
> > > from
> > > > broker power outage sooner.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, Aug 24, 2017 at 12:52 PM, Becket Qin 
> > > wrote:
> > > >
> > > > > Hi Jason,
> > > > >
> > > > > delivery.timeout.ms sounds good to me.
> > > > >
> > > > > I was referring to the case that we are resetting the PID/sequence
> > > after
> > > > > expire a batch. This is more about the sending the batches after
> the
> > > > > expired batch.
> > > > >
> > > > > The scenario being discussed is expiring one of the batches in a
> > > > in-flight
> > > > > request and retry the other batches in the that in-flight request.
> So
> > > > > consider the following case:
> > > > > 1. Producer sends request_0 with two batches (batch_0_tp0 and
> > > > batch_0_tp1).
> > > > > 2. Broker receives the request enqueued the request to the log.
> > > > > 3. Before the producer receives the response from the broker,
> > > batch_0_tp0
> > > > > expires. The producer will expire batch_0_tp0 immediately, resets
> > PID,
> > > > and
> > > > > then resend batch_0_tp1, and maybe send batch_1_tp0 (i.e. the next
> > > batch
> > > > to
> > > > > the expired batch) as well.
> > > > >
> > > > > For batch_0_tp1, it is OK to reuse PID and and sequence number. The
> > > > problem
> > > > > is for batch_1_tp0, If we reuse the same PID and the broker has
> > already
> > > > > appended batch_0_tp0, the broker will think batch_1_tp0 is a
> > duplicate
> > > > with
> > > > > the same sequence number. As a result broker will drop batch_0_tp1.
> > > That
> > > > is
> > > > > why we have to either bump up sequence number or reset PID. To
> avoid
> > > this
> > > > > complexity, I was suggesting not expire the in-flight batch
> > > immediately,
> > > > > but wait for the produce response. If the batch has been
> successfully
> > > > > appended, we do not expire it. Otherwise, we expire it.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Aug 24, 2017 at 11:26 AM, Jason Gustafson <
> > ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > @Becket
> > > > > >
> > > > > > Good point about unnecessarily resetting the PID in cases where
> we
> > > know
> > > > > the
> > > > > > request has failed. Might be worth opening a JIRA to try and
> > improve
> > > > > this.
> > > > > >
> > > > > > So if we expire the batch prematurely and resend all
> > > > > > > the other 

[GitHub] kafka pull request #3755: KAFKA-5806. Fix transient unit test failure in tro...

2017-08-29 Thread cmccabe
GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/3755

KAFKA-5806. Fix transient unit test failure in trogdor coordinator shutdown

In the coordinator, we should check that 'shutdown' is not true before 
going to sleep waiting for the condition.
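
The fix described is the standard condition-variable idiom; a self-contained sketch of the
pattern (not the Trogdor source) is below.

{code}
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Re-check the shutdown flag in a loop around every wait, so a signal that arrives just
// before await() cannot be lost and spurious wakeups are handled.
final class ShutdownLatch {
    private final Lock lock = new ReentrantLock();
    private final Condition cond = lock.newCondition();
    private boolean shutdown = false;

    void awaitShutdown() throws InterruptedException {
        lock.lock();
        try {
            while (!shutdown) {      // check the flag before sleeping on the condition
                cond.await();
            }
        } finally {
            lock.unlock();
        }
    }

    void beginShutdown() {
        lock.lock();
        try {
            shutdown = true;
            cond.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
{code}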

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-5806

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3755.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3755


commit 5391c92df4610fbf5479e24e271574561e0da447
Author: Colin P. Mccabe 
Date:   2017-08-29T19:32:23Z

KAFKA-5806. Fix transient unit test failure in trogdor coordinator shutdown




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-5806) Fix transient unit test failure in trogdor coordinator shutdown

2017-08-29 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-5806:
--

 Summary: Fix transient unit test failure in trogdor coordinator 
shutdown
 Key: KAFKA-5806
 URL: https://issues.apache.org/jira/browse/KAFKA-5806
 Project: Kafka
  Issue Type: Sub-task
  Components: system tests
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


Fix transient unit test failure in trogdor coordinator shutdown



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-4634) Issue of one kafka brokers not listed in zookeeper

2017-08-29 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-4634.
--
Resolution: Fixed

A similar issue was fixed in KAFKA-1387 and newer versions. Please reopen if you think the 
issue still exists


> Issue of one kafka brokers not listed in zookeeper
> --
>
> Key: KAFKA-4634
> URL: https://issues.apache.org/jira/browse/KAFKA-4634
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
> Environment: Zookeeper version: 3.4.6-1569965, built on 02/20/2014 
> 09:09 GMT
> kafka_2.10-0.8.2.1
>Reporter: Maharajan Shunmuga Sundaram
>
> Hi,
> We have an incident where one of the 10 brokers is not listed in the brokers 
> list in ZooKeeper.
> This is verified by running the following command:
> >> echo dump | nc cz2 2181
> SessionTracker dump:
> Session Sets (4):
> 0 expire at Fri Jan 13 22:32:14 EST 2017:
> 0 expire at Fri Jan 13 22:32:16 EST 2017:
> 7 expire at Fri Jan 13 22:32:18 EST 2017:
> 0x259968e41e3
> 0x35996670d5d0001
> 0x35996670d5d
> 0x159966708470004
> 0x159966e4776
> 0x159966708470003
> 0x2599672df26
> 3 expire at Fri Jan 13 22:32:20 EST 2017:
> 0x159968e41dd
> 0x259966708550001
> 0x25996670855
> ephemeral nodes dump:
> Sessions with Ephemerals (9):
> 0x25996670855:
> /brokers/ids/112
> 0x259968e41e3:
> /brokers/ids/213
> 0x159968e41dd:
> /brokers/ids/19
> 0x159966708470003:
> /brokers/ids/110
> 0x35996670d5d:
> /brokers/ids/113
> /controller
> 0x259966708550001:
> /brokers/ids/111
> 0x159966708470004:
> /brokers/ids/212
> 0x2599672df26:
> /brokers/ids/29
> 0x35996670d5d0001:
> /brokers/ids/210
> --
> There are 10 sessions, but only 9 sessions are listed with brokers.
> Broker with id 211 is not listed. Session 0x159966e4776 is not shown with 
> broker id 211.
> In the broker-side log, I do see it is connected:
> >> zgrep "0x159966e4776" *log*
>  
> zk.log:[2017-01-13 01:05:28,513] INFO Session establishment complete on 
> server cz1/10.254.2.19:2181, sessionid = 0x159966e4776, negotiated 
> timeout = 6000 (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:38,163] INFO Unable to read additional data from 
> server sessionid 0x159966e4776, likely server has closed socket, closing 
> socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:39,101] WARN Session 0x159966e4776 for server 
> null, unexpected error, closing socket connection and attempting reconnect 
> (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:40,121] INFO Unable to read additional data from 
> server sessionid 0x159966e4776, likely server has closed socket, closing 
> socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:41,770] WARN Session 0x159966e4776 for server 
> null, unexpected error, closing socket connection and attempting reconnect 
> (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:42,439] WARN Session 0x159966e4776 for server 
> null, unexpected error, closing socket connection and attempting reconnect 
> (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:43,235] INFO Unable to read additional data from 
> server sessionid 0x159966e4776, likely server has closed socket, closing 
> socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:44,950] WARN Session 0x159966e4776 for server 
> null, unexpected error, closing socket connection and attempting reconnect 
> (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:45,837] INFO Unable to read additional data from 
> server sessionid 0x159966e4776, likely server has closed socket, closing 
> socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> .
> .
> .
> .
> zk.log:[2017-01-13 01:40:14,818] INFO Unable to read additional data from 
> server sessionid 0x159966e4776, likely server has closed socket, closing 
> socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:40:15,916] WARN Session 0x159966e4776 for server 
> null, unexpected error, closing socket connection and attempting reconnect 
> (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:40:19,692] INFO Client session timed out, have not 
> heard from server in 3676ms for sessionid 0x159966e4776, closing socket 
> connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:40:20,632] INFO Unable to read additional data from 
> server sessionid 0x159966e4776, likely server has closed socket, closing 
> 

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-29 Thread Sumant Tambe
I'm updating the kip-91 writeup. There seems to be some confusion about
expiring an inflight request. An inflight request gets a full
delivery.timeout.ms duration from creation, right? So it should be
max(remaining delivery.timeout.ms, request.timeout.ms)?

Jun, we do want to wait for an inflight request for longer than
request.timeout.ms. right?

What happens to a batch when retries * (request.timeout.ms +
retry.backoff.ms) < delivery.timeout.ms  and all retries are exhausted?  I
remember an internal discussion where we concluded that retries can be no
longer relevant (i.e., ignored, which is same as retries=MAX_LONG) when
there's an end-to-end delivery.timeout.ms. Do you agree?

Regards,
Sumant
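
For readers following the min-vs-max question, the two candidate bounds on how long to keep
waiting for an in-flight request can be written out directly (a sketch with made-up names,
not producer code):

{code}
final class InFlightWaitBounds {
    // Jun's suggestion: never wait on an in-flight request longer than request.timeout.ms.
    static long minBound(long remainingDeliveryTimeoutMs, long requestTimeoutMs) {
        return Math.min(remainingDeliveryTimeoutMs, requestTimeoutMs);
    }

    // The alternative reading in this thread: an in-flight request always gets its full
    // request.timeout.ms from the time it was sent, even if delivery.timeout.ms is nearly up.
    static long maxBound(long remainingDeliveryTimeoutMs, long requestTimeoutMs) {
        return Math.max(remainingDeliveryTimeoutMs, requestTimeoutMs);
    }
}
{code}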

On 27 August 2017 at 12:08, Jun Rao  wrote:

> Hi, Jiangjie,
>
> If we want to enforce delivery.timeout.ms, we need to take the min right?
> Also, if a user sets a large delivery.timeout.ms, we probably don't want
> to
> wait for an inflight request longer than request.timeout.ms.
>
> Thanks,
>
> Jun
>
> On Fri, Aug 25, 2017 at 5:19 PM, Becket Qin  wrote:
>
> > Hi Jason,
> >
> > I see what you mean. That makes sense. So in the above case after the
> > producer resets PID, when it retry batch_0_tp1, the batch will still have
> > the old PID even if the producer has already got a new PID.
> >
> > @Jun, do you mean max(remaining delivery.timeout.ms, request.timeout.ms)
> > instead of min(remaining delivery.timeout.ms, request.timeout.ms)?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Fri, Aug 25, 2017 at 9:34 AM, Jun Rao  wrote:
> >
> > > Hi, Becket,
> > >
> > > Good point on expiring inflight requests. Perhaps we can expire an
> > inflight
> > > request after min(remaining delivery.timeout.ms, request.timeout.ms).
> > This
> > > way, if a user sets a high delivery.timeout.ms, we can still recover
> > from
> > > broker power outage sooner.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Aug 24, 2017 at 12:52 PM, Becket Qin 
> > wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > delivery.timeout.ms sounds good to me.
> > > >
> > > > I was referring to the case that we are resetting the PID/sequence
> > after
> > > > expire a batch. This is more about the sending the batches after the
> > > > expired batch.
> > > >
> > > > The scenario being discussed is expiring one of the batches in a
> > > in-flight
> > > > request and retry the other batches in the that in-flight request. So
> > > > consider the following case:
> > > > 1. Producer sends request_0 with two batches (batch_0_tp0 and
> > > batch_0_tp1).
> > > > 2. Broker receives the request enqueued the request to the log.
> > > > 3. Before the producer receives the response from the broker,
> > batch_0_tp0
> > > > expires. The producer will expire batch_0_tp0 immediately, resets
> PID,
> > > and
> > > > then resend batch_0_tp1, and maybe send batch_1_tp0 (i.e. the next
> > batch
> > > to
> > > > the expired batch) as well.
> > > >
> > > > For batch_0_tp1, it is OK to reuse PID and and sequence number. The
> > > problem
> > > > is for batch_1_tp0, If we reuse the same PID and the broker has
> already
> > > > appended batch_0_tp0, the broker will think batch_1_tp0 is a
> duplicate
> > > with
> > > > the same sequence number. As a result broker will drop batch_0_tp1.
> > That
> > > is
> > > > why we have to either bump up sequence number or reset PID. To avoid
> > this
> > > > complexity, I was suggesting not expire the in-flight batch
> > immediately,
> > > > but wait for the produce response. If the batch has been successfully
> > > > appended, we do not expire it. Otherwise, we expire it.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > >
> > > > On Thu, Aug 24, 2017 at 11:26 AM, Jason Gustafson <
> ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > @Becket
> > > > >
> > > > > Good point about unnecessarily resetting the PID in cases where we
> > know
> > > > the
> > > > > request has failed. Might be worth opening a JIRA to try and
> improve
> > > > this.
> > > > >
> > > > > So if we expire the batch prematurely and resend all
> > > > > > the other batches in the same request, chances are there will be
> > > > > > duplicates. If we wait for the response instead, it is less
> likely
> > to
> > > > > > introduce duplicates, and we may not need to reset the PID.
> > > > >
> > > > >
> > > > > Not sure I follow this. Are you assuming that we change the batch
> > > > > PID/sequence of the retried batches after resetting the PID? I
> think
> > we
> > > > > probably need to ensure that when we retry a batch, we always use
> the
> > > > same
> > > > > PID/sequence.
> > > > >
> > > > > By the way, as far as naming, `max.message.delivery.wait.ms` is
> > quite
> > > a
> > > > > mouthful. Could we shorten it? Perhaps `delivery.timeout.ms`?
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Wed, Aug 23, 2017 at 8:51 PM, Becket Qin 

Re: [VOTE] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-29 Thread Rajini Sivaram
Hi Roger,

If we are changing logging level for successful SASL authentications in the
broker, we should probably do the same for SSL too. Since KIP-188 proposes
to add new metrics for successful and failed authentications which may be
more useful for monitoring, do we really need info-level logging for
authentication? At the moment, there don't seem to be any per-connection
informational messages at info-level, but if you think it is useful, we
could do this in a separate JIRA. Let me know what you think.

On Tue, Aug 29, 2017 at 1:09 PM, Roger Hoover 
wrote:

> Just re-read the KIP and was wondering if you think INFO would be ok for
> logging successful authentications?  They should be relatively infrequent.
>
> On Tue, Aug 29, 2017 at 9:54 AM, Roger Hoover 
> wrote:
>
> > +1 (non-binding).  Thanks, Rajini
> >
> > On Tue, Aug 29, 2017 at 2:10 AM, Ismael Juma  wrote:
> >
> >> Thanks for the KIP, +1 (binding) from me.
> >>
> >> Ismael
> >>
> >> On Thu, Aug 24, 2017 at 6:29 PM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I would like to start vote on KIP-152 to improve diagnostics of
> >> > authentication failures and to update clients to treat authentication
> >> > failures as fatal exceptions rather than transient errors:
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > 152+-+Improve+diagnostics+for+SASL+authentication+failures
> >> >
> >> > Thank you...
> >> >
> >> > Rajini
> >> >
> >>
> >
> >
>


[GitHub] kafka pull request #3713: MINOR: simplify state transition for Kafka Streams...

2017-08-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3713


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-29 Thread Jason Gustafson
Great improvement! +1

On Tue, Aug 29, 2017 at 10:09 AM, Roger Hoover 
wrote:

> Just re-read the KIP and was wondering if you think INFO would be ok for
> logging successful authentications?  They should be relatively infrequent.
>
> On Tue, Aug 29, 2017 at 9:54 AM, Roger Hoover 
> wrote:
>
> > +1 (non-binding).  Thanks, Rajini
> >
> > On Tue, Aug 29, 2017 at 2:10 AM, Ismael Juma  wrote:
> >
> >> Thanks for the KIP, +1 (binding) from me.
> >>
> >> Ismael
> >>
> >> On Thu, Aug 24, 2017 at 6:29 PM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I would like to start vote on KIP-152 to improve diagnostics of
> >> > authentication failures and to update clients to treat authentication
> >> > failures as fatal exceptions rather than transient errors:
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > 152+-+Improve+diagnostics+for+SASL+authentication+failures
> >> >
> >> > Thank you...
> >> >
> >> > Rajini
> >> >
> >>
> >
> >
>


Re: [VOTE] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-29 Thread Roger Hoover
Just re-read the KIP and was wondering if you think INFO would be ok for
logging successful authentications?  They should be relatively infrequent.

On Tue, Aug 29, 2017 at 9:54 AM, Roger Hoover 
wrote:

> +1 (non-binding).  Thanks, Rajini
>
> On Tue, Aug 29, 2017 at 2:10 AM, Ismael Juma  wrote:
>
>> Thanks for the KIP, +1 (binding) from me.
>>
>> Ismael
>>
>> On Thu, Aug 24, 2017 at 6:29 PM, Rajini Sivaram 
>> wrote:
>>
>> > Hi all,
>> >
>> > I would like to start vote on KIP-152 to improve diagnostics of
>> > authentication failures and to update clients to treat authentication
>> > failures as fatal exceptions rather than transient errors:
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > 152+-+Improve+diagnostics+for+SASL+authentication+failures
>> >
>> > Thank you...
>> >
>> > Rajini
>> >
>>
>
>


Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-29 Thread Roger Hoover
Great suggestions, Ismael and thanks for incorporating them, Rajini.

Tracking authentication successes and failures (#3) across listeners seems
very useful for cluster administrators to identify misconfigured clients or
bad actors, especially until all clients implement KIP-152 which will add
an explicit error code for authentication failures.  Currently, clients
just get disconnected so it's hard to distinguish authentication failures
from any other error that can cause disconnect.  This broker-side metric is
useful regardless but can help fill this gap until all clients support KIP
152.

Just to be clear, the ones called `successful-authentication-rate` and
`failed-authentication-rate` will also have failed-authentication-count
and successful-authentication-count to match KIP 187?
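
As a rough idea of how an operator would consume such metrics once they exist, here is a
sketch that polls a broker over JMX. The JMX URL, the MBean name and the attribute names
are all assumptions for illustration; the real names are whatever KIP-188 finalizes.

{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class AuthMetricsProbe {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with a JMX port exposed on 9999.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Hypothetical MBean and attribute names, per the metric names discussed above.
            ObjectName name = new ObjectName(
                "kafka.server:type=socket-server-metrics,listener=SASL_SSL,networkProcessor=0");
            System.out.println("failed-authentication-rate = "
                + mbs.getAttribute(name, "failed-authentication-rate"));
            System.out.println("successful-authentication-rate = "
                + mbs.getAttribute(name, "successful-authentication-rate"));
        } finally {
            connector.close();
        }
    }
}
{code}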

On Tue, Aug 29, 2017 at 7:26 AM, Rajini Sivaram 
wrote:

> Hi Ismael,
>
> Thank you for the suggestions. The additional metrics sound very useful and
> I have added them to the KIP.
>
> Regards,
>
> Rajini
>
> On Tue, Aug 29, 2017 at 5:34 AM, Ismael Juma  wrote:
>
> > Hi Rajini,
> >
> > There are a few other metrics that could potentially be useful. I'd be
> > interested in what you and the community thinks:
> >
> > 1. The KIP currently includes `FetchDownConversionsPerSec`, which is
> > useful. In the common case, one would want to avoid down conversion by
> > using the lower message format supported by most of the consumers.
> However,
> > there are good reasons to use a newer message format even if there are
> some
> > legacy consumers around. It would be good to quantify the cost of these
> > consumers a bit more clearly. Looking at the request metric `LocalTimeMs`
> > provides a hint, but it may be useful to have a dedicated
> > `FetchDownConversionsMs` metric.
> >
> > 2. Large messages can cause GC issues (it's particularly bad if fetch
> down
> > conversion takes place). One can currently configure the max message
> batch
> > size per topic to keep this under control, but that is the size after
> > compression. However, we decompress the batch to validate produce
> requests
> > and we decompress and recompress during fetch downconversion). It would
> be
> > helpful to have topic metrics for the produce message batch size after
> > decompression (and perhaps compressed as well as that would help
> understand
> > the compression ratio).
> >
> > 3. Authentication success/failures per second. This is helpful to
> > understand if some clients are misconfigured or if bad actors are trying
> to
> > authenticate.
> >
> > Thoughts?
> >
> > Ismael
> >
> >
> >
> > On Wed, Aug 23, 2017 at 2:53 AM, Jun Rao  wrote:
> >
> > > Hi, Rajini,
> > >
> > > Yes, if those error metrics are registered dynamically, we could worry
> > > about expiration later.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Aug 18, 2017 at 1:55 AM, Rajini Sivaram <
> rajinisiva...@gmail.com
> > >
> > > wrote:
> > >
> > > > Perhaps we could register dynamically for now. If we find that the
> cost
> > > of
> > > > retaining these is high, we can add the code to expire them later. Is
> > > that
> > > > ok?
> > > >
> > > > Regards,
> > > >
> > > > Rajini
> > > >
> > > >
> > > >
> > > > On Fri, Aug 18, 2017 at 9:41 AM, Ismael Juma 
> > wrote:
> > > >
> > > > > Can we quantify the cost of having these metrics around if they are
> > > > > dynamically registered? Given that the maximum is bounded at
> > > development
> > > > > time, is it really worth all the extra code?
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Fri, Aug 18, 2017 at 9:34 AM, Rajini Sivaram <
> > > rajinisiva...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > It feels more consistent to add errors as yammer metrics similar
> to
> > > > other
> > > > > > request metrics. Perhaps we could add some code to track and
> remove
> > > > these
> > > > > > if unused? It is a bit more work, but it would keep the externals
> > > > > > consistent.
> > > > > >
> > > > > > Ismael/Manikumar,
> > > > > >
> > > > > > Agree that version as a String attribute makes more sense.
> > > > Unfortunately,
> > > > > > the whole KafkaMetric implementation is written around a single
> > > > "double"
> > > > > > type, so introducing a new type is a big change. But I suppose it
> > can
> > > > be
> > > > > > done. I have updated the KIP.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 18, 2017 at 7:42 AM, Manikumar <
> > > manikumar.re...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I agree it will be good if we can add  "commit id/version" as
> an
> > > > > > > attribute value.
> > > > > > > It will be easy to parse. But as of now, KafkaMetric supports
> > only
> > > > > > > numerical values.
> > > > > > >
> > > > > > > On Fri, Aug 18, 2017 at 5:49 AM, Ismael Juma <
> ism...@juma.me.uk>
> > > > > wrote:
> > > > > > >
> > > > 

Re: [VOTE] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-29 Thread Roger Hoover
+1 (non-binding).  Thanks, Rajini

On Tue, Aug 29, 2017 at 2:10 AM, Ismael Juma  wrote:

> Thanks for the KIP, +1 (binding) from me.
>
> Ismael
>
> On Thu, Aug 24, 2017 at 6:29 PM, Rajini Sivaram 
> wrote:
>
> > Hi all,
> >
> > I would like to start vote on KIP-152 to improve diagnostics of
> > authentication failures and to update clients to treat authentication
> > failures as fatal exceptions rather than transient errors:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 152+-+Improve+diagnostics+for+SASL+authentication+failures
> >
> > Thank you...
> >
> > Rajini
> >
>


[jira] [Created] (KAFKA-5805) The console consumer crashes the broker on Windows

2017-08-29 Thread Aleksandar (JIRA)
Aleksandar created KAFKA-5805:
-

 Summary: The console consumer crashes the broker on Windows
 Key: KAFKA-5805
 URL: https://issues.apache.org/jira/browse/KAFKA-5805
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.11.0.0
 Environment: Windows 7 x86 || Windows 10 x64
jre1.8.0_144
jdk1.8.0_144


Reporter: Aleksandar
Priority: Critical


I was just following the Quick Start guide at
http://kafka.apache.org/documentation.html#quickstart

Started the ZooKeeper server
Started the Kafka server
Created a test topic
Published a message to the test topic

In another terminal window I ran kafka-console-consumer.bat, which crashed the 
broker with the following exception:

java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(Unknown Source)
at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:63)
at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:52)
at kafka.log.LogSegment.<init>(LogSegment.scala:77)
at kafka.log.Log.loadSegments(Log.scala:385)
at kafka.log.Log.<init>(Log.scala:179)
at kafka.log.Log$.apply(Log.scala:1580)
at kafka.log.LogManager$$anonfun$createLog$1.apply(LogManager.scala:417)
at kafka.log.LogManager$$anonfun$createLog$1.apply(LogManager.scala:412)
at scala.Option.getOrElse(Option.scala:121)
at kafka.log.LogManager.createLog(LogManager.scala:412)
at 
kafka.cluster.Partition$$anonfun$getOrCreateReplica$1.apply(Partition.scala:122)
at 
kafka.cluster.Partition$$anonfun$getOrCreateReplica$1.apply(Partition.scala:119)
at kafka.utils.Pool.getAndMaybePut(Pool.scala:70)
at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:118)
at 
kafka.cluster.Partition$$anonfun$3$$anonfun$5.apply(Partition.scala:179)
at 
kafka.cluster.Partition$$anonfun$3$$anonfun$5.apply(Partition.scala:179)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.cluster.Partition$$anonfun$3.apply(Partition.scala:179)
at kafka.cluster.Partition$$anonfun$3.apply(Partition.scala:173)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:221)
at kafka.cluster.Partition.makeLeader(Partition.scala:173)
at 
kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:929)
at 
kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:928)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:928)
at 
kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:873)
at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:168)
at kafka.server.KafkaApis.handle(KafkaApis.scala:101)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:66)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
... 43 more
[2017-08-29 18:36:37,110] INFO [Group Metadata Manager on Broker 0]: Removed 0 
expired offsets in 0 milliseconds. 
(kafka.coordinator.group.GroupMetadataManager)
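
For context on the stack trace above: the broker memory-maps each log
segment's offset index (the AbstractIndex frame), and on Windows, especially
with a 32-bit JVM, FileChannel.map can fail with "Map failed" once the
process runs out of contiguous virtual address space even though physical
memory is free. A rough, self-contained sketch of the operation that fails,
with an illustrative file name and roughly the default index preallocation
size:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class IndexMmapSketch {
        public static void main(String[] args) throws Exception {
            // Illustrative file name; Kafka preallocates each segment's offset
            // index (segment.index.bytes, 10 MB by default) and then maps it.
            try (RandomAccessFile raf =
                     new RandomAccessFile("00000000000000000000.index", "rw")) {
                raf.setLength(10 * 1024 * 1024);
                // This is the call that fails with "Map failed" when the JVM
                // cannot reserve enough contiguous address space for the mapping.
                MappedByteBuffer buffer = raf.getChannel()
                    .map(FileChannel.MapMode.READ_WRITE, 0, raf.length());
                buffer.putInt(0); // index entries are written through the mapping
            }
        }
    }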




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [VOTE] KIP-187 - Add cumulative count metric for all Kafka rate metrics

2017-08-29 Thread Jason Gustafson
Thanks, +1

On Tue, Aug 29, 2017 at 8:37 AM, Gwen Shapira  wrote:

> +1
> Thank you for this improvement!
>
> On Tue, Aug 29, 2017 at 7:46 AM, Mickael Maison 
> wrote:
>
> > +1 non binding
> > Thanks
> >
> > On Tue, Aug 29, 2017 at 2:17 AM, Manikumar 
> > wrote:
> > > +1 (non-binding)
> > >
> > > On Tue, Aug 29, 2017 at 2:43 PM, Stevo Slavić 
> wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> On Tue, Aug 29, 2017 at 11:09 AM, Ismael Juma 
> > wrote:
> > >>
> > >> > Thanks for the KIP, +1 (binding) from me.
> > >> >
> > >> > Ismael
> > >> >
> > >> > On Thu, Aug 24, 2017 at 6:48 PM, Rajini Sivaram <
> > rajinisiva...@gmail.com
> > >> >
> > >> > wrote:
> > >> >
> > >> > > Hi all,
> > >> > >
> > >> > > I would like to start the vote on KIP-187 that adds a cumulative
> > count
> > >> > > metric associated with each Kafka rate metric to improve
> downstream
> > >> > > processing of rate metrics. Details are here:
> > >> > >
> > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > > 187+-+Add+cumulative+count+metric+for+all+Kafka+rate+metrics
> > >> > >
> > >> > >
> > >> > > Thank you,
> > >> > >
> > >> > > Rajini
> > >> > >
> > >> >
> > >>
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter  | blog
> 
>


Re: [DISCUSS] KIP-189: Improve principal builder interface and add support for SASL

2017-08-29 Thread Jason Gustafson
Hi Colin,

Thanks for the comments. Seems reasonable to provide a safer `equals` for
extensions. I don't think this needs to be part of the KIP, but I can add
it to my patch.

Moving `fromString` makes sense also. This method is technically already
part of the public API, which means we should probably deprecate it instead
of removing it from `KafkaPrincipal`. I'll mention this in the KIP.
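
For anyone skimming the thread, a minimal sketch of the class-equality check
Colin suggests below (a sketch only, not the actual patch; principalType and
name are the fields shown in the quoted code):

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // Compare exact classes so a subclass with extra fields can never be
        // equal to a plain KafkaPrincipal in one direction but not the other.
        if (o == null || getClass() != o.getClass()) return false;
        KafkaPrincipal that = (KafkaPrincipal) o;
        return principalType.equals(that.principalType) && name.equals(that.name);
    }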

Thanks,
Jason

On Mon, Aug 28, 2017 at 5:50 PM, Ted Yu  wrote:

> bq. change the check in equals() to be this.getClass().equals(other.
> getClass())
>
> I happened to have Effective Java on hand.
> Please take a look at related discussion on page 39.
>
> Josh later on mentioned Liskov substitution principle and a workaround
> (favoring composition).
>
> FYI
>
>
>
> On Mon, Aug 28, 2017 at 4:48 PM, Colin McCabe  wrote:
>
> > Thanks, Jason, this is a great improvement!
> >
> > One minor nit.  The current KafkaPrincipal#equals looks like this:
> >
> > >@Override
> > >public boolean equals(Object o) {
> > >if (this == o) return true;
> > >if (!(o instanceof KafkaPrincipal)) return false;
> > >
> > >KafkaPrincipal that = (KafkaPrincipal) o;
> > >
> > >if (!principalType.equals(that.principalType)) return false;
> > >return name.equals(that.name);
> > >}
> >
> > So if I implement MyKafkaPrincipalWithGroup that has an extra groupName
> > field, I can have this situation:
> >
> > > KafkaPrincipal oldPrincipal = new KafkaPrincipal("User", "foo");
> > > MyKafkaPrincipalWithGroup newPrincipal =
> > >     new MyKafkaPrincipalWithGroup("User", "foo", "mygroup");
> > > System.out.println(oldPrincipal.equals(newPrincipal)); // true
> > > System.out.println(newPrincipal.equals(oldPrincipal)); // false
> >
> > This is clearly bad, because it makes equality non-symmetric.  The
> > reason for this problem is that KafkaPrincipal#equals checks if
> > MyKafkaPrincipalWithGroup is an instance of KafkaPrincipal-- and it is.
> > It then proceeds to check if the user is the same-- and it is.  So it
> > returns true.  It does not check the groups field, because it doesn't
> > know about it.  On the other hand, MyKafkaPrincipalWithGroup#equals will
> > check to see KafkaPrincipal is an instance of
> > MyKafkaPrincipalWithGroup-- and it isn't.  So it returns false.
> >
> > In the KafkaPrincipal base class, it would be better to change the check
> > in equals() to be this.getClass().equals(other.getClass()).  In other
> > words, check for exact class equality, not instanceof.
> >
> > Alternately, we could implement a final equals method in the base class
> > that compares by the toString method, under the assumption that any
> > difference in KafkaPrincipal objects should be reflected in their
> > serialized form.
> >
> > >@Override
> > >public final boolean equals(Object o) {
> > >if (this == o) return true;
> > >if (!(o instanceof KafkaPrincipal)) return false;
> > >return toString().equals(o.toString());
> > >}
> >
> > Another question related to subclassing KafkaPrincipal: should we move
> > KafkaPrincipal#fromString into an internal, non-public class?  It seems
> > like people might expect
> > KafkaPrincipal.fromString(myCustomPrincipal.toString()) to work, but it
> > will not for subclasses.
> >
> > best,
> > Colin
> >
> >
> > On Mon, Aug 28, 2017, at 15:51, Jason Gustafson wrote:
> > > Thanks all for the discussion. I'll begin a vote in the next couple
> days.
> > >
> > > -Jason
> > >
> > > On Fri, Aug 25, 2017 at 5:01 PM, Don Bosco Durai 
> > > wrote:
> > >
> > > > Jason, thanks for the clarification.
> > > >
> > > > Bosco
> > > >
> > > >
> > > > On 8/25/17, 4:59 PM, "Jason Gustafson"  wrote:
> > > >
> > > > Hey Don,
> > > >
> > > > That is not actually part of the KIP. It was a (somewhat
> pedantic)
> > > > example
> > > > used to illustrate how the kafka principal semantics could be
> > applied
> > > > to
> > > > authorizers which understood group-level ACLs. The key point is
> > this:
> > > > although a principal is identified only by its type and name, the
> > > > KafkaPrincipal can be used to represent relations to other
> > principals.
> > > > In
> > > > this case, we have a user principal which is related to a group
> > > > principal
> > > > through the UserPrincipalAndGroup object. A GroupAuthorizer could
> > then
> > > > leverage this relation. As you suggest, a true implementation
> would
> > > > allow
> > > > multiple groups.
> > > >
> > > > I will add a note to the KIP to emphasize that this is just an
> > example.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Fri, Aug 25, 2017 at 4:37 PM, Don Bosco Durai <
> bo...@apache.org
> > >
> > > > wrote:
> > > >
> > > > > Jason, thanks for confirming that. Since there are existing
> > custom
> > > > > plugins, we might have to give enough time for 

[VOTE] KIP-183 - Change PreferredReplicaLeaderElectionCommand to use AdminClient

2017-08-29 Thread Tom Bentley
Hi all,

I would like to start the vote on KIP-183 which will provide an AdminClient
interface for electing the preferred replica, and refactor the
kafka-preferred-replica-election.sh tool to use this interface. More
details here:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-183+-+Change+PreferredReplicaLeaderElectionCommand+to+use+AdminClient
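
For illustration only, usage from client code might end up looking something
like the sketch below; electPreferredLeaders() and its result type are
placeholder names based on this proposal, not a released API, and may well
change before the KIP is finalized:

    import java.util.Collections;
    import java.util.Properties;
    import java.util.Set;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.common.TopicPartition;

    public class PreferredLeaderElectionSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                Set<TopicPartition> partitions =
                    Collections.singleton(new TopicPartition("my-topic", 0));
                // Hypothetical call per KIP-183; the method name and result
                // type here are assumptions, not an existing API.
                admin.electPreferredLeaders(partitions).all().get();
            }
        }
    }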


Regards,

Tom


Re: [VOTE] KIP-187 - Add cumulative count metric for all Kafka rate metrics

2017-08-29 Thread Gwen Shapira
+1
Thank you for this improvement!

On Tue, Aug 29, 2017 at 7:46 AM, Mickael Maison 
wrote:

> +1 non binding
> Thanks
>
> On Tue, Aug 29, 2017 at 2:17 AM, Manikumar 
> wrote:
> > +1 (non-binding)
> >
> > On Tue, Aug 29, 2017 at 2:43 PM, Stevo Slavić  wrote:
> >
> >> +1 (non-binding)
> >>
> >> On Tue, Aug 29, 2017 at 11:09 AM, Ismael Juma 
> wrote:
> >>
> >> > Thanks for the KIP, +1 (binding) from me.
> >> >
> >> > Ismael
> >> >
> >> > On Thu, Aug 24, 2017 at 6:48 PM, Rajini Sivaram <
> rajinisiva...@gmail.com
> >> >
> >> > wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > I would like to start the vote on KIP-187 that adds a cumulative
> count
> >> > > metric associated with each Kafka rate metric to improve downstream
> >> > > processing of rate metrics. Details are here:
> >> > >
> >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > > 187+-+Add+cumulative+count+metric+for+all+Kafka+rate+metrics
> >> > >
> >> > >
> >> > > Thank you,
> >> > >
> >> > > Rajini
> >> > >
> >> >
> >>
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter  | blog



[GitHub] kafka pull request #3754: KAFKA-5804: retain duplicates in ChangeLoggingWind...

2017-08-29 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/3754

KAFKA-5804: retain duplicates in ChangeLoggingWindowBytesStore

`ChangeLoggingWindowBytesStore` needs to have the same `retainDuplicates` 
functionality as `RocksDBWindowStore` else data could be lost upon 
failover/restoration.
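
For context, the way the RocksDB-backed window store keeps duplicates is,
roughly, to make each stored key unique by appending a sequence number, so
repeated values for the same key and window sit side by side instead of
overwriting each other. A simplified sketch of that idea (not the actual
store code; class and method names here are illustrative):

    import java.nio.ByteBuffer;

    // Simplified illustration of retainDuplicates: append a per-store sequence
    // number to the serialized key so that repeated puts for the same key and
    // window land under distinct composite keys instead of overwriting.
    class DuplicateRetainingKey {
        private int seqnum = 0;

        byte[] maybeAddSequenceNumber(byte[] serializedKey, boolean retainDuplicates) {
            if (!retainDuplicates) {
                return serializedKey;
            }
            seqnum = (seqnum + 1) & 0x7FFFFFFF; // stay non-negative on wrap-around
            return ByteBuffer.allocate(serializedKey.length + Integer.BYTES)
                             .put(serializedKey)
                             .putInt(seqnum)
                             .array();
        }
    }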

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka hotfix-changelog-window-store

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3754.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3754


commit d63358a67a98d2fbd53ea47f9b7d54c5dd65b937
Author: Damian Guy 
Date:   2017-08-29T15:18:31Z

retain duplicates in ChangeLoggingWindowBytesStore




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-5804) ChangeLoggingWindowBytesStore needs to retain duplicates when writing to the log

2017-08-29 Thread Damian Guy (JIRA)
Damian Guy created KAFKA-5804:
-

 Summary: ChangeLoggingWindowBytesStore needs to retain duplicates 
when writing to the log
 Key: KAFKA-5804
 URL: https://issues.apache.org/jira/browse/KAFKA-5804
 Project: Kafka
  Issue Type: Bug
Reporter: Damian Guy
Assignee: Damian Guy


The {{ChangeLoggingWindowBytesStore}} needs to have the same duplicate 
retaining logic as {{RocksDBWindowStore}} otherwise data loss may occur when 
performing windowed joins. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [VOTE] KIP-187 - Add cumulative count metric for all Kafka rate metrics

2017-08-29 Thread Mickael Maison
+1 non binding
Thanks

On Tue, Aug 29, 2017 at 2:17 AM, Manikumar  wrote:
> +1 (non-binding)
>
> On Tue, Aug 29, 2017 at 2:43 PM, Stevo Slavić  wrote:
>
>> +1 (non-binding)
>>
>> On Tue, Aug 29, 2017 at 11:09 AM, Ismael Juma  wrote:
>>
>> > Thanks for the KIP, +1 (binding) from me.
>> >
>> > Ismael
>> >
>> > On Thu, Aug 24, 2017 at 6:48 PM, Rajini Sivaram > >
>> > wrote:
>> >
>> > > Hi all,
>> > >
>> > > I would like to start the vote on KIP-187 that adds a cumulative count
>> > > metric associated with each Kafka rate metric to improve downstream
>> > > processing of rate metrics. Details are here:
>> > >
>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > > 187+-+Add+cumulative+count+metric+for+all+Kafka+rate+metrics
>> > >
>> > >
>> > > Thank you,
>> > >
>> > > Rajini
>> > >
>> >
>>


[GitHub] kafka pull request #3753: Allow timestamp parameter in `ProcessorTopologyTes...

2017-08-29 Thread sebigavril
GitHub user sebigavril opened a pull request:

https://github.com/apache/kafka/pull/3753

Allow timestamp parameter in `ProcessorTopologyTestDriver.process`

All current implementations process records using the same timestamp. This 
makes it difficult to test operations that require time windows, like 
`KStream-KStream joins`. 

This change would allow tests to simulate records created at different 
times, thus making it possible to test operations like the above mentioned 
joins.
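
As a rough sketch of the intent, a windowed-join test would drive records at
controlled timestamps, along the lines below; the driver setup is omitted and
the process(...) overload shown is the one this PR proposes, so the exact
signature may differ:

    // 'driver' is a ProcessorTopologyTestDriver built elsewhere for a topology
    // with a KStream-KStream join over a 30-second window (setup omitted).
    StringSerializer keySer = new StringSerializer();
    StringSerializer valSer = new StringSerializer();

    long t0 = 0L;
    driver.process("left-topic", "A", "left-1", keySer, valSer, t0);
    // Within the window: should produce a join result.
    driver.process("right-topic", "A", "right-1", keySer, valSer, t0 + 100L);
    // Outside the window: should not join with "left-1".
    driver.process("right-topic", "A", "right-2", keySer, valSer, t0 + 60_000L);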

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sebigavril/kafka 
allow-timestamps-in-test-driver

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3753.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3753


commit e8d3f4fdb1d28ca8710afe64bacd9231bd7a82b7
Author: Sebastian Gavril 
Date:   2017-08-29T14:34:10Z

Allow timestamp parameter in ProcessorTopologyTestDriver.process




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-29 Thread Rajini Sivaram
Hi Ismael,

Thank you for the suggestions. The additional metrics sound very useful and
I have added them to the KIP.

Regards,

Rajini

On Tue, Aug 29, 2017 at 5:34 AM, Ismael Juma  wrote:

> Hi Rajini,
>
> There are a few other metrics that could potentially be useful. I'd be
> interested in what you and the community thinks:
>
> 1. The KIP currently includes `FetchDownConversionsPerSec`, which is
> useful. In the common case, one would want to avoid down conversion by
> using the lower message format supported by most of the consumers. However,
> there are good reasons to use a newer message format even if there are some
> legacy consumers around. It would be good to quantify the cost of these
> consumers a bit more clearly. Looking at the request metric `LocalTimeMs`
> provides a hint, but it may be useful to have a dedicated
> `FetchDownConversionsMs` metric.
>
> 2. Large messages can cause GC issues (it's particularly bad if fetch down
> conversion takes place). One can currently configure the max message batch
> size per topic to keep this under control, but that is the size after
> compression. However, we decompress the batch to validate produce requests
> and we decompress and recompress during fetch downconversion). It would be
> helpful to have topic metrics for the produce message batch size after
> decompression (and perhaps compressed as well as that would help understand
> the compression ratio).
>
> 3. Authentication success/failures per second. This is helpful to
> understand if some clients are misconfigured or if bad actors are trying to
> authenticate.
>
> Thoughts?
>
> Ismael
>
>
>
> On Wed, Aug 23, 2017 at 2:53 AM, Jun Rao  wrote:
>
> > Hi, Rajini,
> >
> > Yes, if those error metrics are registered dynamically, we could worry
> > about expiration later.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Aug 18, 2017 at 1:55 AM, Rajini Sivaram  >
> > wrote:
> >
> > > Perhaps we could register dynamically for now. If we find that the cost
> > of
> > > retaining these is high, we can add the code to expire them later. Is
> > that
> > > ok?
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > >
> > >
> > > On Fri, Aug 18, 2017 at 9:41 AM, Ismael Juma 
> wrote:
> > >
> > > > Can we quantify the cost of having these metrics around if they are
> > > > dynamically registered? Given that the maximum is bounded at
> > development
> > > > time, is it really worth all the extra code?
> > > >
> > > > Ismael
> > > >
> > > > On Fri, Aug 18, 2017 at 9:34 AM, Rajini Sivaram <
> > rajinisiva...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > It feels more consistent to add errors as yammer metrics similar to
> > > other
> > > > > request metrics. Perhaps we could add some code to track and remove
> > > these
> > > > > if unused? It is a bit more work, but it would keep the externals
> > > > > consistent.
> > > > >
> > > > > Ismael/Manikumar,
> > > > >
> > > > > Agree that version as a String attribute makes more sense.
> > > Unfortunately,
> > > > > the whole KafkaMetric implementation is written around a single
> > > "double"
> > > > > type, so introducing a new type is a big change. But I suppose it
> can
> > > be
> > > > > done. I have updated the KIP.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > > >
> > > > > On Fri, Aug 18, 2017 at 7:42 AM, Manikumar <
> > manikumar.re...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I agree it will be good if we can add  "commit id/version" as an
> > > > > > attribute value.
> > > > > > It will be easy to parse. But as of now, KafkaMetric supports
> only
> > > > > > numerical values.
> > > > > >
> > > > > > On Fri, Aug 18, 2017 at 5:49 AM, Ismael Juma 
> > > > wrote:
> > > > > >
> > > > > > > Hi Rajini,
> > > > > > >
> > > > > > > About the gauges, I was thinking that the attribute would be
> the
> > > > value
> > > > > > > (i.e. commit id or version). I understand that Kafka Metrics
> > > doesn't
> > > > > > > support this (unlike Yammer Metrics), but would it make sense
> to
> > > add?
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Thu, Aug 17, 2017 at 2:54 PM, Rajini Sivaram <
> > > > > rajinisiva...@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Ismael,
> > > > > > > >
> > > > > > > > Thank you for the review.
> > > > > > > >
> > > > > > > > 1. Agree on keeping it simple with dynamic registration and
> no
> > > > > expiry.
> > > > > > > Will
> > > > > > > > wait for Jun's feedback before updating KIP.
> > > > > > > > 2. I have switched to two metrics for commit-id and version
> > (not
> > > > sure
> > > > > > if
> > > > > > > it
> > > > > > > > matches what you meant). I also added the client-id tag which
> > is
> > > > used
> > > > > > in
> > > > > > > > all metrics from clients.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > > Rajini

Re: [DISCUSS] KIP-185: Make exactly once in order delivery per partition the default producer setting

2017-08-29 Thread Ismael Juma
Thanks for the proposals. I think they make sense and I also agree with
Jason's suggestions. Also, it would be good to include the updated
ProduceRequest/Response schema in the KIP.
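
For readers following along: today idempotence is an explicit opt-in, and the
discussion below is about making it the default and how older brokers and
message formats should behave. A minimal sketch of the current opt-in (the
"requested"/"safe" values mentioned below are proposals, not existing
settings):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class IdempotentProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // Explicit opt-in today; requires acks=all and constrains the number
            // of in-flight requests per connection.
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // send records as usual; the broker de-duplicates retries per partition
            }
        }
    }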

Ismael

On Tue, Aug 22, 2017 at 11:42 PM, Jason Gustafson 
wrote:

> Thanks Apurva,
>
> On compatibility: I think the proposal makes sense. It's a pity that we
> can't support idempotence for 0.11.0.0 brokers in the "safe" mode even if
> it is supported by the broker. I can already imagine users complaining
> about this, but I guess it's the consequence of missing the impact of that
> validation check and not thinking through the ultimate goal of enabling
> idempotence by default. A couple minor comments:
>
> 1. Instead of "safe," Ismael suggested "requested" as an alternative. That
> seems to suggest more clearly that idempotence will only be used when the
> broker supports it.
> 2. Should we deprecate the "true" and "false" options? It's a little weird
> long term to support them in addition to the descriptive names.
>
> On the OutOfOrderSequence proposal: high-level, the design makes sense. A
> couple questions:
>
> 1. With this proposal, OutOfOrderSequence means that we must have a last
> produced offset. Is the idea to expose that in the
> OutOfOrderSequenceException so that users know which data was lost?
> 2. Previously we discussed duplicate handling. Currently we raise
> OutOfOrderSequence if we happen to get a sequence number which is earlier
> than the sequence numbers we have cached. Alternatively, you suggested we
> can return a separate DuplicateError for this case, which clients can
> ignore if they do not care about the offset. I think it might make sense to
> include that here so that the OutOfOrderSequence error is unambiguous.
>
> Finally, do you plan to roll these proposals into the current KIP or do
> them separately? Probably makes sense to combine them since they both
> require a bump to the ProduceRequest.
>
> Thanks,
> Jason
>
>
>
> On Fri, Aug 18, 2017 at 5:18 PM, Apurva Mehta  wrote:
>
> > Thanks Jason and Ismael.
> >
> > The message format problem is an acute one: if we enable idempotence by
> > default, the UnsupportedVersionException when writing to topics with the
> > older message format would mean that our prescribed upgrade steps would
> not
> > work. I have detailed the problems and the solutions on this page (linked
> > to from the wiki):
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/
> > Kafka+Exactly+Once+-+Dealing+with+older+message+formats+
> > when+idempotence+is+enabled
> >
> > It is worth discussing the solution to the problem proposed there. If it
> is
> > conceptually sound, it doesnt' seem too hard to implement.
> >
> > As far as the other problem of the spurious OutOfOrderSequence problem, I
> > have documented a proposed solution here:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/
> > Kafka+Exactly+Once+-+Solving+the+problem+of+spurious+
> > OutOfOrderSequence+errors
> >
> > This solution is a bit more involved in terms of effort.
> >
> > I think we cannot make the idempotent producer the default unless we
> solve
> > the message format compatibility problem. I would also prefer to solve
> the
> > second problem before making idempotence the default.
> >
> > I would be interested to hear everyone's thoughts on the two solutions
> > proposed above.
> >
> > Thanks,
> > Apurva
> >
> > On Fri, Aug 18, 2017 at 9:24 AM, Jason Gustafson 
> > wrote:
> >
> > > >
> > > >  so this change will break client backward compatibility while
> > connecting
> > > > to 0.10.X brokers.
> > > >  users need to change producer default settings while connecting
> older
> > > > brokers.
> > >
> > >
> > > At the moment, I think the answer is yes. The old broker will not
> support
> > > the InitProducerId request, so the producer will immediately fail.
> > Similar
> > > to the handling of old message formats mentioned above, we probably
> need
> > to
> > > change this so that we just revert to old producer semantics if the
> > broker
> > > can't support idempotence.
> > >
> > > -Jason
> > >
> > >
> > > On Fri, Aug 18, 2017 at 8:48 AM, Manikumar 
> > > wrote:
> > >
> > > > >
> > > > > 3. The message format requirement is a good point. This should be
> > > > mentioned
> > > > > in the compatibility section. Users who are still using the old
> > message
> > > > > format will get an error after the upgrade, right?
> > > > >
> > > >
> > > >  so this change will break client backward compatibility while
> > connecting
> > > > to 0.10.X brokers.
> > > >  users need to change producer default settings while connecting
> older
> > > > brokers.
> > > >
> > >
> >
>


Build failed in Jenkins: kafka-0.11.0-jdk7 #288

2017-08-29 Thread Apache Jenkins Server
See 


Changes:

[damian.guy] KAFKA-5787; StoreChangelogReader needs to restore partitions that 
were

--
[...truncated 2.44 MB...]

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldNotWriteToChangeLogOnPutIfAbsentWhenValueForKeyExists PASSED

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldReturnNullOnPutIfAbsentWhenNoPreviousValue STARTED

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldReturnNullOnPutIfAbsentWhenNoPreviousValue PASSED

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldLogKeyNullOnDelete STARTED

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldLogKeyNullOnDelete PASSED

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldReturnNullOnGetWhenDoesntExist STARTED

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldReturnNullOnGetWhenDoesntExist PASSED

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldReturnOldValueOnDelete STARTED

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldReturnOldValueOnDelete PASSED

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldNotWriteToInnerOnPutIfAbsentWhenValueForKeyExists STARTED

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldNotWriteToInnerOnPutIfAbsentWhenValueForKeyExists PASSED

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldReturnValueOnGetWhenExists STARTED

org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStoreTest > 
shouldReturnValueOnGetWhenExists PASSED

org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStoreTest > 
shouldRemove STARTED

org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStoreTest > 
shouldRemove PASSED

org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStoreTest > 
shouldPutAndFetch STARTED

org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStoreTest > 
shouldPutAndFetch PASSED

org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStoreTest > 
shouldRollSegments STARTED

org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStoreTest > 
shouldRollSegments PASSED

org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStoreTest > 
shouldFindValuesWithinRange STARTED

org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStoreTest > 
shouldFindValuesWithinRange PASSED

org.apache.kafka.streams.StreamsConfigTest > shouldUseNewConfigsWhenPresent 
STARTED

org.apache.kafka.streams.StreamsConfigTest > shouldUseNewConfigsWhenPresent 
PASSED

org.apache.kafka.streams.StreamsConfigTest > testGetConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > testGetConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > shouldAcceptAtLeastOnce STARTED

org.apache.kafka.streams.StreamsConfigTest > shouldAcceptAtLeastOnce PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldUseCorrectDefaultsWhenNoneSpecified STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldUseCorrectDefaultsWhenNoneSpecified PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerEnableIdempotenceIfEosDisabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerEnableIdempotenceIfEosDisabled PASSED

org.apache.kafka.streams.StreamsConfigTest > defaultSerdeShouldBeConfigured 
STARTED

org.apache.kafka.streams.StreamsConfigTest > defaultSerdeShouldBeConfigured 
PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSetDifferentDefaultsIfEosEnabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSetDifferentDefaultsIfEosEnabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldNotOverrideUserConfigRetriesIfExactlyOnceEnabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldNotOverrideUserConfigRetriesIfExactlyOnceEnabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldOverrideStreamsDefaultConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldOverrideStreamsDefaultConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > testGetProducerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > testGetProducerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerMaxInFlightRequestPerConnectionsWhenEosDisabled 
STARTED

org.apache.kafka.streams.StreamsConfigTest > 

Re: [DISCUSS] 0.11.0.1 bug fix release

2017-08-29 Thread Damian Guy
We still have 2 outstanding issues to close. They both have patches
available and will hopefully be completed shortly.

Thanks,
Damian

On Thu, 24 Aug 2017 at 10:05 Damian Guy  wrote:

> A quick update. There are 2 remaining issues, that both have patches
> available. Hopefully they will be merged soon and we can begin:
> https://issues.apache.org/jira/projects/KAFKA/versions/12340632
>
> Thanks,
> Damian
>
> On Tue, 22 Aug 2017 at 10:29 Damian Guy  wrote:
>
>> An update on the 0.11.0.1 release. We currently have 5 outstanding
>> issues: https://issues.apache.org/jira/projects/KAFKA/versions/12340632
>> 3 with patches available that we can hopefully get merged pretty soon (1
>> is actually already on 0.11.0.1)
>> 2 issues that are Open, 1 is unassigned.
>> Hopefully we can get this cleaned up in the next day or two and then i
>> can go about building an RC.
>>
>> Thanks,
>> Damian
>>
>> On Thu, 17 Aug 2017 at 17:45 Damian Guy  wrote:
>>
>>> Just a quick update.
>>>
>>> The list has reduced to 6 remaining issues:
>>> https://issues.apache.org/jira/projects/KAFKA/versions/12340632
>>>
>>> Thanks to everyone for completing and/or moving tickets to future
>>> releases.
>>>
>>> Damian
>>>
>>>
>>> On Thu, 17 Aug 2017 at 09:50 Damian Guy  wrote:
>>>
 Hi Srikanth,
 Optimistically i'm aiming for end of next week. Though it depends on
 how quickly the outstanding issues are closed and any other blockers that
 arise.

 Thanks,
 Damian

 On Thu, 17 Aug 2017 at 07:59 Srikanth Sampath <
 ssampath.apa...@gmail.com> wrote:

> Thanks Damian.  What's the ballpark when 0.11.0.1 will be available?
> -Srikanth
>
> On Wed, Aug 16, 2017 at 5:59 PM, Damian Guy 
> wrote:
>
> > Hi,
> >
> > It seems like it must be time for 0.11.0.1 bug fix release!
> >
> > Since the 0.11.0.0 release we've fixed 30 JIRAs that
> > are targeted for 0.11.0.1:
> >
> > https://issues.apache.org/jira/browse/KAFKA-5700?jql=
> > project%20%3D%20KAFKA%20AND%20resolution%20%3D%20Fixed%
> > 20AND%20fixVersion%20%3D%200.11.0.1%20ORDER%20BY%
> > 20priority%20DESC%2C%20key%20DESC
> >
> > We have 15 outstanding issues that are targeted at 0.11.0.1:
> >
> > https://issues.apache.org/jira/browse/KAFKA-5567?jql=
> > project%20%3D%20KAFKA%20AND%20resolution%20%3D%20Unresolved%20AND%
> > 20fixVersion%20%3D%200.11.0.1%20ORDER%20BY%20priority%
> > 20DESC%2C%20key%20DESC
> >
> > Can the owners of the remaining issues please resolve them or move
> them to
> > a future release.
> >
> > As soon as the remaining tasks for 0.11.0.1 reaches zero i'll create
> the
> > first RC.
> >
> > Thanks,
> > Damian
> >
>



Re: [DISCUSS] KIP-191: KafkaConsumer.subscribe() overload that takes just Pattern

2017-08-29 Thread Ismael Juma
Sounds good to me too. Since this is a non-controversial change, I suggest
starting the vote in 1-2 days if no-one else comments.
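
For reference, the change is purely ergonomic: pattern subscription already
exists but currently requires a ConsumerRebalanceListener argument, and the
KIP adds an overload without it. A small sketch (the no-listener overload is
the proposed API, not yet released):

    import java.util.Properties;
    import java.util.regex.Pattern;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class PatternSubscribeSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "pattern-demo");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Today a ConsumerRebalanceListener must be passed alongside the pattern.
                // Proposed overload from KIP-191:
                consumer.subscribe(Pattern.compile("events-.*"));
            }
        }
    }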

Ismael

On Thu, Aug 24, 2017 at 7:32 PM, Jason Gustafson  wrote:

> Seems reasonable. I don't recall any specific reason for not providing this
> method initially.
>
> -Jason
>
> On Thu, Aug 24, 2017 at 5:50 AM, Attila Kreiner  wrote:
>
> > Hi All,
> >
> > I created KIP-191:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 191%3A+KafkaConsumer.subscribe%28%29+overload+that+takes+just+Pattern
> >
> > Jira: https://issues.apache.org/jira/browse/KAFKA-5726
> > PR: https://github.com/apache/kafka/pull/3669
> >
> > Please check it.
> >
> > Thanks,
> > Attila
> >
>


Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-29 Thread Ismael Juma
Hi Rajini,

There are a few other metrics that could potentially be useful. I'd be
interested in what you and the community think:

1. The KIP currently includes `FetchDownConversionsPerSec`, which is
useful. In the common case, one would want to avoid down conversion by
using the lower message format supported by most of the consumers. However,
there are good reasons to use a newer message format even if there are some
legacy consumers around. It would be good to quantify the cost of these
consumers a bit more clearly. Looking at the request metric `LocalTimeMs`
provides a hint, but it may be useful to have a dedicated
`FetchDownConversionsMs` metric.

2. Large messages can cause GC issues (it's particularly bad if fetch down
conversion takes place). One can currently configure the max message batch
size per topic to keep this under control, but that is the size after
compression. However, we decompress the batch to validate produce requests
and we decompress and recompress during fetch downconversion. It would be
helpful to have topic metrics for the produce message batch size after
decompression (and perhaps the compressed size as well, as that would help
understand the compression ratio).

3. Authentication success/failures per second. This is helpful to
understand if some clients are misconfigured or if bad actors are trying to
authenticate.
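
To make the three suggestions above a bit more concrete, on the broker side
these would presumably be Yammer metrics much like the existing
BrokerTopicMetrics; a rough sketch (the metric group/type/names below are
illustrative, not the final KIP-188 names):

    import java.util.concurrent.TimeUnit;
    import com.yammer.metrics.Metrics;
    import com.yammer.metrics.core.Histogram;
    import com.yammer.metrics.core.Meter;
    import com.yammer.metrics.core.MetricName;

    public class HealthMetricsSketch {
        // Names below mirror the discussion and are placeholders only.
        private final Meter failedAuths = Metrics.newMeter(
            new MetricName("kafka.server", "SocketServer", "FailedAuthenticationsPerSec"),
            "failed-auths", TimeUnit.SECONDS);
        private final Histogram batchSizeAfterDecompression = Metrics.newHistogram(
            new MetricName("kafka.server", "BrokerTopicMetrics", "ProduceMessageBatchBytes"),
            true); // biased sampling

        public void onAuthenticationFailure() {
            failedAuths.mark();
        }

        public void onProduceBatchValidated(int decompressedSizeBytes) {
            batchSizeAfterDecompression.update(decompressedSizeBytes);
        }
    }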

Thoughts?

Ismael



On Wed, Aug 23, 2017 at 2:53 AM, Jun Rao  wrote:

> Hi, Rajini,
>
> Yes, if those error metrics are registered dynamically, we could worry
> about expiration later.
>
> Thanks,
>
> Jun
>
> On Fri, Aug 18, 2017 at 1:55 AM, Rajini Sivaram 
> wrote:
>
> > Perhaps we could register dynamically for now. If we find that the cost
> of
> > retaining these is high, we can add the code to expire them later. Is
> that
> > ok?
> >
> > Regards,
> >
> > Rajini
> >
> >
> >
> > On Fri, Aug 18, 2017 at 9:41 AM, Ismael Juma  wrote:
> >
> > > Can we quantify the cost of having these metrics around if they are
> > > dynamically registered? Given that the maximum is bounded at
> development
> > > time, is it really worth all the extra code?
> > >
> > > Ismael
> > >
> > > On Fri, Aug 18, 2017 at 9:34 AM, Rajini Sivaram <
> rajinisiva...@gmail.com
> > >
> > > wrote:
> > >
> > > > Jun,
> > > >
> > > > It feels more consistent to add errors as yammer metrics similar to
> > other
> > > > request metrics. Perhaps we could add some code to track and remove
> > these
> > > > if unused? It is a bit more work, but it would keep the externals
> > > > consistent.
> > > >
> > > > Ismael/Manikumar,
> > > >
> > > > Agree that version as a String attribute makes more sense.
> > Unfortunately,
> > > > the whole KafkaMetric implementation is written around a single
> > "double"
> > > > type, so introducing a new type is a big change. But I suppose it can
> > be
> > > > done. I have updated the KIP.
> > > >
> > > > Regards,
> > > >
> > > > Rajini
> > > >
> > > >
> > > > On Fri, Aug 18, 2017 at 7:42 AM, Manikumar <
> manikumar.re...@gmail.com>
> > > > wrote:
> > > >
> > > > > I agree it will be good if we can add  "commit id/version" as an
> > > > > attribute value.
> > > > > It will be easy to parse. But as of now, KafkaMetric supports only
> > > > > numerical values.
> > > > >
> > > > > On Fri, Aug 18, 2017 at 5:49 AM, Ismael Juma 
> > > wrote:
> > > > >
> > > > > > Hi Rajini,
> > > > > >
> > > > > > About the gauges, I was thinking that the attribute would be the
> > > value
> > > > > > (i.e. commit id or version). I understand that Kafka Metrics
> > doesn't
> > > > > > support this (unlike Yammer Metrics), but would it make sense to
> > add?
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Thu, Aug 17, 2017 at 2:54 PM, Rajini Sivaram <
> > > > rajinisiva...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Ismael,
> > > > > > >
> > > > > > > Thank you for the review.
> > > > > > >
> > > > > > > 1. Agree on keeping it simple with dynamic registration and no
> > > > expiry.
> > > > > > Will
> > > > > > > wait for Jun's feedback before updating KIP.
> > > > > > > 2. I have switched to two metrics for commit-id and version
> (not
> > > sure
> > > > > if
> > > > > > it
> > > > > > > matches what you meant). I also added the client-id tag which
> is
> > > used
> > > > > in
> > > > > > > all metrics from clients.
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > Rajini
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Aug 17, 2017 at 10:55 AM, Ismael Juma <
> ism...@juma.me.uk
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the KIP, Rajini. I think this is helpful too. A
> few
> > > > minor
> > > > > > > > comments.
> > > > > > > >
> > > > > > > > 1. About the number of metrics and expiration, if we
> > dynamically
> > > > > > register
> > > > > > > > metrics for the error codes, the number is likely to be much
> > > lower
> > > > > than
> > > > > > > > 

Re: [VOTE] KIP-187 - Add cumulative count metric for all Kafka rate metrics

2017-08-29 Thread Manikumar
+1 (non-binding)

On Tue, Aug 29, 2017 at 2:43 PM, Stevo Slavić  wrote:

> +1 (non-binding)
>
> On Tue, Aug 29, 2017 at 11:09 AM, Ismael Juma  wrote:
>
> > Thanks for the KIP, +1 (binding) from me.
> >
> > Ismael
> >
> > On Thu, Aug 24, 2017 at 6:48 PM, Rajini Sivaram  >
> > wrote:
> >
> > > Hi all,
> > >
> > > I would like to start the vote on KIP-187 that adds a cumulative count
> > > metric associated with each Kafka rate metric to improve downstream
> > > processing of rate metrics. Details are here:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 187+-+Add+cumulative+count+metric+for+all+Kafka+rate+metrics
> > >
> > >
> > > Thank you,
> > >
> > > Rajini
> > >
> >
>


Re: [VOTE] KIP-187 - Add cumulative count metric for all Kafka rate metrics

2017-08-29 Thread Stevo Slavić
+1 (non-binding)

On Tue, Aug 29, 2017 at 11:09 AM, Ismael Juma  wrote:

> Thanks for the KIP, +1 (binding) from me.
>
> Ismael
>
> On Thu, Aug 24, 2017 at 6:48 PM, Rajini Sivaram 
> wrote:
>
> > Hi all,
> >
> > I would like to start the vote on KIP-187 that adds a cumulative count
> > metric associated with each Kafka rate metric to improve downstream
> > processing of rate metrics. Details are here:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 187+-+Add+cumulative+count+metric+for+all+Kafka+rate+metrics
> >
> >
> > Thank you,
> >
> > Rajini
> >
>


Re: [VOTE] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-29 Thread Ismael Juma
Thanks for the KIP, +1 (binding) from me.

Ismael

On Thu, Aug 24, 2017 at 6:29 PM, Rajini Sivaram 
wrote:

> Hi all,
>
> I would like to start vote on KIP-152 to improve diagnostics of
> authentication failures and to update clients to treat authentication
> failures as fatal exceptions rather than transient errors:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 152+-+Improve+diagnostics+for+SASL+authentication+failures
>
> Thank you...
>
> Rajini
>


Re: [VOTE] KIP-187 - Add cumulative count metric for all Kafka rate metrics

2017-08-29 Thread Ismael Juma
Thanks for the KIP, +1 (binding) from me.

Ismael

On Thu, Aug 24, 2017 at 6:48 PM, Rajini Sivaram 
wrote:

> Hi all,
>
> I would like to start the vote on KIP-187 that adds a cumulative count
> metric associated with each Kafka rate metric to improve downstream
> processing of rate metrics. Details are here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 187+-+Add+cumulative+count+metric+for+all+Kafka+rate+metrics
>
>
> Thank you,
>
> Rajini
>


[GitHub] kafka pull request #3747: KAFKA-5787: StoreChangelogReader needs to restore ...

2017-08-29 Thread dguy
Github user dguy closed the pull request at:

https://github.com/apache/kafka/pull/3747


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---