Re: [DISCUSS] KIP-185: Make exactly once in order delivery per partition the default producer setting

2017-08-15 Thread Becket Qin
Hi Apurva,

Thanks for the clarification of the definition. The definitions are clear
and helpful.

It seems the scope of this KIP is just the producer-side configuration
change, not attempting to achieve the exactly-once semantics with all
default settings out of the box. The broker still needs to be configured
appropriately to achieve the exactly-once semantics. If so, the current
proposal sounds reasonable to me. Apologies if I misunderstood the goal of
this KIP.

Regarding max.in.flight.requests.per.connection, I don't think we have to
support an infinite number of in-flight requests. But admittedly there are
use cases where people would want reasonably high in-flight requests. Given
that we need to make code changes to support idempotence with
in.flight.requests > 1, it would be nice to see if we can cover those use
cases now instead of doing that later. We can discuss this in a separate
thread.

Thanks,

Jiangjie (Becket) Qin


On Tue, Aug 15, 2017 at 1:46 PM, Guozhang Wang  wrote:

> Hi Jay,
>
> I chatted with Apurva offline, and we think the key question of the
> discussion is, as summarized in the updated KIP wiki, whether we should
> consider replication a necessary condition of at-least-once, and of course
> also of exactly-once. Originally I thought replication was not a necessary
> condition for at-least-once, since the scope of failures that we should be
> covering is different in my definition; if we claim that "even for
> at-least-once, you should have a replication factor larger than 2, let
> alone exactly-once", then I agree that having acks=all on the client side
> should also be a necessary condition for at-least-once, and for
> exactly-once as well. Then this KIP would just be providing necessary but
> not sufficient conditions, from the client-side configs, to achieve EOS,
> while you also need the broker-side configs together to really support it.
>
> Guozhang
>
>
> On Tue, Aug 15, 2017 at 1:15 PM, Jay Kreps  wrote:
>
> > Hey Guozhang,
> >
> > I think the argument is that with acks=1 the message could be lost and
> > hence you aren't guaranteeing exactly once delivery.
> >
> > -Jay
> >
> > On Mon, Aug 14, 2017 at 1:36 PM, Guozhang Wang 
> wrote:
> >
> > > Just want to clarify that regarding 1), I'm fine with changing it to
> > > `all`, but I just wanted to argue that it does not necessarily correlate
> > > with the exactly-once semantics, but rather with persistence vs.
> > > availability trade-offs, so I'd like to discuss them separately.
> > >
> > > Regarding 2), one minor concern I had is that the enforcement is on the
> > > client side while the parts it affects are on the broker side. I.e. the
> > > broker code would assume at most 5 in-flight requests when idempotence is
> > > turned on, but this is not enforced at the broker; it relies on the
> > > client side's sanity. So other implementations of the client that may not
> > > obey this may well break the broker code. If we do enforce this, we'd
> > > better enforce it at the broker side. Also, I'm wondering if we have
> > > considered the approach of having brokers read the logs in order to get
> > > the starting offset when they do not know about it in their snapshot, and
> > > whether it is worthwhile if we assume that such issues are very rare.
> > >
> > >
> > > Guozhang
> > >
> > >
> > >
> > > On Mon, Aug 14, 2017 at 11:01 AM, Apurva Mehta 
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > I just want to summarize where we are in this discussion
> > > >
> > > > There are two major points of contention: should we have acks=1 or
> > > acsk=all
> > > > by default? and how to cap max.in.flight.requests.per.connection?
> > > >
> > > > 1) acks=1 vs acks=all
> > > >
> > > > Here are the tradeoffs of each:
> > > >
> > > > If you have replication-factor=N, your data is resilient to N-1 disk
> > > > failures. For N>1, here is the tradeoff between acks=1 and acks=all.
> > > >
> > > > With proposed defaults and acks=all, the stock Kafka producer and the
> > > > default broker settings would guarantee that ack'd messages would be
> in
> > > the
> > > > log exactly once.
> > > >
> > > > With the proposed defaults and acks=1, the stock Kafka producer and
> the
> > > > default broker settings would guarantee that 'retained ack'd messages
> > > would
> > > > be in the log exactly once. But all ack'd messages may not be
> > retained'.
> > > >
> > > > If you leave replication-factor=1, acks=1 and acks=all have identical
> > > > semantics and performance, but you are resilient to 0 disk failures.
> > > >
> > > > I think the measured cost (again the performance details are in the
> > wiki)
> > > > of acks=all is well worth the much clearer semantics. What does the
> > rest
> > > of
> > > > the community think?
> > > >
> > > > 2) capping max.in.flight at 5 when idempotence is enabled.
> > > >
> > > > We need to limit the max.in.flight for the broker to 

[GitHub] kafka pull request #3672: KAFKA-5731 Corrected how the sink task worker upda...

2017-08-15 Thread rhauch
Github user rhauch closed the pull request at:

https://github.com/apache/kafka/pull/3672




[jira] [Resolved] (KAFKA-5731) Connect WorkerSinkTask out of order offset commit can lead to inconsistent state

2017-08-15 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-5731.

   Resolution: Fixed
Fix Version/s: (was: 1.0.0)

Issue resolved by pull request 3672
[https://github.com/apache/kafka/pull/3672]

> Connect WorkerSinkTask out of order offset commit can lead to inconsistent 
> state
> 
>
> Key: KAFKA-5731
> URL: https://issues.apache.org/jira/browse/KAFKA-5731
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Jason Gustafson
>Assignee: Randall Hauch
> Fix For: 0.11.0.1
>
>
> In Connect's WorkerSinkTask, we do sequence number validation to ensure that 
> offset commits are handled in the right order 
> (https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L199).
>  
> Unfortunately, for asynchronous commits, the {{lastCommittedOffsets}} field 
> is overridden regardless of this sequence check as long as the response had 
> no error 
> (https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L284):
> {code:java}
> OffsetCommitCallback cb = new OffsetCommitCallback() {
>     @Override
>     public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception error) {
>         if (error == null) {
>             lastCommittedOffsets = offsets;
>         }
>         onCommitCompleted(error, seqno);
>     }
> };
> {code}
> Hence if we get an out of order commit, then the internal state will be 
> inconsistent. To fix this, we should only override {{lastCommittedOffsets}} 
> after sequence validation as part of the {{onCommitCompleted(...)}} method.
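As a rough illustration of that fix (a sketch only: the three-argument {{onCommitCompleted}} shape and the {{commitSeqno}} field name are assumptions for this example; the actual change is in pull request 3672), the callback defers the state update to the sequence-validated completion path:

{code:java}
OffsetCommitCallback cb = new OffsetCommitCallback() {
    @Override
    public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception error) {
        // Do not touch lastCommittedOffsets here; let the sequence-checked
        // completion handler decide whether this response is still current.
        onCommitCompleted(error, seqno, error == null ? offsets : null);
    }
};

private void onCommitCompleted(Throwable error, long seqno, Map<TopicPartition, OffsetAndMetadata> committedOffsets) {
    if (commitSeqno != seqno) {
        // A newer commit request has been issued; this response is stale.
        return;
    }
    if (error == null && committedOffsets != null) {
        lastCommittedOffsets = committedOffsets; // updated only for the most recent request
    }
}
{code}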





[GitHub] kafka pull request #3674: KAFKA-5737. KafkaAdminClient thread should be daem...

2017-08-15 Thread cmccabe
GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/3674

KAFKA-5737. KafkaAdminClient thread should be daemon



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-5737

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3674.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3674


commit 35dcae186e254e871def6939d4cac9a213d724d6
Author: Colin P. Mccabe 
Date:   2017-08-15T23:20:56Z

KAFKA-5737. KafkaAdminClient thread should be daemon






[jira] [Created] (KAFKA-5737) KafkaAdminClient thread should be daemon

2017-08-15 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-5737:
--

 Summary: KafkaAdminClient thread should be daemon
 Key: KAFKA-5737
 URL: https://issues.apache.org/jira/browse/KAFKA-5737
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.11.0.0
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


The admin client thread should be daemon, for consistency with the consumer and 
producer threads.
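For context, "daemon" here is just a flag on the thread so that the JVM can exit even while the thread is still running; a minimal sketch with a plain java.lang.Thread (the real client uses its own thread utility) looks like:

{code:java}
Runnable ioLoop = () -> { /* admin client network I/O loop */ };
Thread adminClientThread = new Thread(ioLoop, "kafka-admin-client-thread");
adminClientThread.setDaemon(true); // does not keep the JVM alive on shutdown
adminClientThread.start();
{code}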





Build failed in Jenkins: kafka-trunk-jdk8 #1911

2017-08-15 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-5731; Corrected how the sink task worker updates the last

--
[...truncated 4.28 MB...]
kafka.producer.AsyncProducerTest > testRandomPartitioner PASSED

kafka.producer.AsyncProducerTest > testInvalidConfiguration STARTED

kafka.producer.AsyncProducerTest > testInvalidConfiguration PASSED

kafka.producer.AsyncProducerTest > testInvalidPartition STARTED

kafka.producer.AsyncProducerTest > testInvalidPartition PASSED

kafka.producer.AsyncProducerTest > testNoBroker STARTED

kafka.producer.AsyncProducerTest > testNoBroker PASSED

kafka.producer.AsyncProducerTest > testProduceAfterClosed STARTED

kafka.producer.AsyncProducerTest > testProduceAfterClosed PASSED

kafka.producer.AsyncProducerTest > testJavaProducer STARTED

kafka.producer.AsyncProducerTest > testJavaProducer PASSED

kafka.producer.AsyncProducerTest > testIncompatibleEncoder STARTED

kafka.producer.AsyncProducerTest > testIncompatibleEncoder PASSED

kafka.producer.SyncProducerTest > testReachableServer STARTED

kafka.producer.SyncProducerTest > testReachableServer PASSED

kafka.producer.SyncProducerTest > testMessageSizeTooLarge STARTED

kafka.producer.SyncProducerTest > testMessageSizeTooLarge PASSED

kafka.producer.SyncProducerTest > testNotEnoughReplicas STARTED

kafka.producer.SyncProducerTest > testNotEnoughReplicas PASSED

kafka.producer.SyncProducerTest > testMessageSizeTooLargeWithAckZero STARTED

kafka.producer.SyncProducerTest > testMessageSizeTooLargeWithAckZero PASSED

kafka.producer.SyncProducerTest > testProducerCanTimeout STARTED

kafka.producer.SyncProducerTest > testProducerCanTimeout PASSED

kafka.producer.SyncProducerTest > testProduceRequestWithNoResponse STARTED

kafka.producer.SyncProducerTest > testProduceRequestWithNoResponse PASSED

kafka.producer.SyncProducerTest > testEmptyProduceRequest STARTED

kafka.producer.SyncProducerTest > testEmptyProduceRequest PASSED

kafka.producer.SyncProducerTest > testProduceCorrectlyReceivesResponse STARTED

kafka.producer.SyncProducerTest > testProduceCorrectlyReceivesResponse PASSED

kafka.producer.ProducerTest > testSendToNewTopic STARTED

kafka.producer.ProducerTest > testSendToNewTopic PASSED

kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout STARTED

kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout PASSED

kafka.producer.ProducerTest > testSendNullMessage STARTED

kafka.producer.ProducerTest > testSendNullMessage PASSED

kafka.producer.ProducerTest > testUpdateBrokerPartitionInfo STARTED

kafka.producer.ProducerTest > testUpdateBrokerPartitionInfo PASSED

kafka.producer.ProducerTest > testSendWithDeadBroker STARTED

kafka.producer.ProducerTest > testSendWithDeadBroker PASSED

unit.kafka.server.KafkaApisTest > shouldRespondWithUnsupportedForMessageFormatOnHandleWriteTxnMarkersWhenMagicLowerThanRequired STARTED

unit.kafka.server.KafkaApisTest > shouldRespondWithUnsupportedForMessageFormatOnHandleWriteTxnMarkersWhenMagicLowerThanRequired PASSED

unit.kafka.server.KafkaApisTest > shouldThrowUnsupportedVersionExceptionOnHandleTxnOffsetCommitRequestWhenInterBrokerProtocolNotSupported STARTED

unit.kafka.server.KafkaApisTest > shouldThrowUnsupportedVersionExceptionOnHandleTxnOffsetCommitRequestWhenInterBrokerProtocolNotSupported PASSED

unit.kafka.server.KafkaApisTest > shouldThrowUnsupportedVersionExceptionOnHandleAddPartitionsToTxnRequestWhenInterBrokerProtocolNotSupported STARTED

unit.kafka.server.KafkaApisTest > shouldThrowUnsupportedVersionExceptionOnHandleAddPartitionsToTxnRequestWhenInterBrokerProtocolNotSupported PASSED

unit.kafka.server.KafkaApisTest > testReadUncommittedConsumerListOffsetLatest STARTED

unit.kafka.server.KafkaApisTest > testReadUncommittedConsumerListOffsetLatest PASSED

unit.kafka.server.KafkaApisTest > shouldAppendToLogOnWriteTxnMarkersWhenCorrectMagicVersion STARTED

unit.kafka.server.KafkaApisTest > shouldAppendToLogOnWriteTxnMarkersWhenCorrectMagicVersion PASSED

unit.kafka.server.KafkaApisTest > shouldThrowUnsupportedVersionExceptionOnHandleWriteTxnMarkersRequestWhenInterBrokerProtocolNotSupported STARTED

unit.kafka.server.KafkaApisTest > shouldThrowUnsupportedVersionExceptionOnHandleWriteTxnMarkersRequestWhenInterBrokerProtocolNotSupported PASSED

unit.kafka.server.KafkaApisTest > shouldRespondWithUnknownTopicWhenPartitionIsNotHosted STARTED

unit.kafka.server.KafkaApisTest > shouldRespondWithUnknownTopicWhenPartitionIsNotHosted PASSED

unit.kafka.server.KafkaApisTest > testReadCommittedConsumerListOffsetEarliestOffsetEqualsLastStableOffset STARTED

unit.kafka.server.KafkaApisTest > testReadCommittedConsumerListOffsetEarliestOffsetEqualsLastStableOffset PASSED

unit.kafka.server.KafkaApisTest > testReadCommittedConsumerListOffsetLatest STARTED

unit.kafka.server.KafkaApisTest > testReadCommittedConsumerListOffsetLatest PASSED


[GitHub] kafka pull request #3673: MINOR: Consolidate broker request/response handlin...

2017-08-15 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/3673

MINOR: Consolidate broker request/response handling

This patch contains a few small improvements to make request/response 
handling more consistent. Primarily it consolidates request/response 
serialization logic so that `SaslServerAuthenticator` and `KafkaApis` follow 
the same path. It also reduces the amount of custom logic needed to handle 
unsupported versions of the ApiVersions request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka consolidate-response-handling

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3673


commit bbbeb17ed5aa4fcd5ca52bb98564dafdd854bb94
Author: Jason Gustafson 
Date:   2017-08-04T21:58:11Z

MINOR: Consolidate broker request/response handling






Re: [DISCUSS] KIP-131 : Add access to OffsetStorageReader from SourceConnector

2017-08-15 Thread Randall Hauch
Sorry it's taken me so long to come back to this.

Have you considered creating a `SourceConnectorContext` interface that
extends `ConnectorContext` and that adds the method to access the offset
storage? This would very closely match the existing `SourceTaskContext`.

`SourceConnector` implementations could always cast the `context` field in
the superclass to `SourceConnectorContext`, but perhaps a slightly better
way to do this is to add the following method to the `SourceConnector`
class:


public SourceConnectorContext context() {
    return (SourceConnectorContext) context;
}


Now, `SourceConnector` implementations can either cast themselves or use
this additional method to obtain the correctly cast context.

In fact, it might be good to do this for `SinkConnector` as well, and we
could even add a `context()` method in the `Connector` interface, since
subinterfaces can change the return type to be a subtype of that returned
by the interface:

ConnectorContext context();

One advantage of this approach is that `SourceConnectorContext` and
`SinkConnectorContext` remain interfaces. Another is that we avoid adding new
methods to `SourceConnector` that implementers would need to learn they should
not override or implement. A third is that we would now have a
`SourceConnectorContext` and `SinkConnectorContext` to which we can add more
methods if needed, and they are very similar to `SourceTaskContext` and
`SinkTaskContext`.
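To make that concrete, a rough sketch of the shape being proposed (not a
finalized API) could be:

public interface SourceConnectorContext extends ConnectorContext {
    // Gives connectors (not just tasks) access to previously committed offsets.
    OffsetStorageReader offsetStorageReader();
}

public abstract class SourceConnector extends Connector {
    // Covariant accessor so implementations need not cast the inherited field.
    public SourceConnectorContext context() {
        return (SourceConnectorContext) context;
    }
}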

Thoughts?

On Wed, Apr 5, 2017 at 3:59 PM, Florian Hussonnois 
wrote:

> Hi All,
>
> Is there any feedback regarding that KIP ?
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-131+-+Add+access+to+
> OffsetStorageReader+from+SourceConnector
>
> Thanks,
>
> 2017-03-14 22:51 GMT+01:00 Florian Hussonnois :
>
> > Hi Matthias,
> >
> > Sorry, I didn't know about this page. This KIP has been added to it.
> >
> > Thanks,
> >
> > 2017-03-13 21:30 GMT+01:00 Matthias J. Sax :
> >
> >> Can you please add the KIP to this table:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+
> >> Improvement+Proposals#KafkaImprovementProposals-KIPsunderdiscussion
> >>
> >> Thanks,
> >>
> >>  Matthias
> >>
> >>
> >> On 3/7/17 1:24 PM, Florian Hussonnois wrote:
> >> > Hi all,
> >> >
> >> > I've created a new KIP to add access to OffsetStorageReader from
> >> > SourceConnector
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-131+-+
> >> Add+access+to+OffsetStorageReader+from+SourceConnector
> >> >
> >> > Thanks.
> >> >
> >>
> >>
> >
> >
> > --
> > Florian HUSSONNOIS
> >
>
>
>
> --
> Florian HUSSONNOIS
>


Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-15 Thread Sumant Tambe
Question about "the closing of a batch can be delayed longer than linger.ms":
Is it possible to cause an indefinite delay? At some point bytes limit
might kick in. Also, why is closing of a batch coupled with availability of
its destination? In this approach a batch chosen for eviction due to delay
needs to "close" anyway, right (without regards to destination
availability)?

I'm not too worried about notifying at super-exact time specified in the
configs. But expiring before the full wait-span has elapsed sounds a little
weird. So expiration time has a +/- spread. It works more like a hint than
max. So why not message.delivery.wait.hint.ms?

Yeah, cancellable future will be similar in complexity.

I'm unsure if max.message.delivery.wait.ms will the final nail for producer
timeouts. We still won't have a precise way to control delay in just the
accumulator segment. batch.expiry.ms does not try to abstract. It's very
specific.

My biggest concern at the moment is implementation complexity.

At this state, I would like to encourage other independent opinions.

Regards,
Sumant
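(For reference, the future.get(timeout) emulation discussed further down in
this thread can be sketched roughly as follows; the variable names are
placeholders, and the bound comes from the KIP-91 excerpt quoted below.)

// Bound from KIP-91: batch.expiry.ms + retries * (request.timeout.ms + retry.backoff.ms)
long timeoutMs = batchExpiryMs + retries * (requestTimeoutMs + retryBackoffMs);

Future<RecordMetadata> future = producer.send(new ProducerRecord<>("my-topic", key, value));
try {
    RecordMetadata metadata = future.get(timeoutMs, TimeUnit.MILLISECONDS);
} catch (java.util.concurrent.TimeoutException e) {
    // The wait is over, but the Sender may keep delivering the record in the
    // background; a cancellable future would be needed to truly abandon it.
} catch (InterruptedException | ExecutionException e) {
    // Interrupted, or the send itself failed.
}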

On 11 August 2017 at 17:35, Jun Rao  wrote:

> Hi, Sumant,
>
> 1. Yes, it's probably reasonable to require max.message.delivery.wait.ms >
> linger.ms. As for retries, perhaps we can set the default retries to
> infinite or just ignore it. Then the latency will be bounded by
> max.message.delivery.wait.ms. request.timeout.ms is the max time the
> request will spend on the server. The client can expire an in-flight
> request early if needed.
>
> 2. Well, since max.message.delivery.wait.ms specifies the max, calling the
> callback a bit early may be ok? Note that max.message.delivery.wait.ms
> only
> comes into play in the rare error case. So, I am not sure if we need to be
> very precise. The issue with starting the clock on closing a batch is that
> currently if the leader is not available, the closing of a batch can be
> delayed longer than linger.ms.
>
> 4. As you said, future.get(timeout) itself doesn't solve the problem since
> you still need a way to expire the record in the sender. The amount of work
> to implement a cancellable future is probably the same?
>
> Overall, my concern with patch work is that we have iterated on the produce
> request timeout multiple times and new issues keep coming back. Ideally,
> this time, we want to have a solution that covers all cases, even though
> that requires a bit more work.
>
> Thanks,
>
> Jun
>
>
> On Fri, Aug 11, 2017 at 12:30 PM, Sumant Tambe  wrote:
>
> > Hi Jun,
> >
> > Thanks for looking into it.
> >
> > Yes, we did consider this message-level timeout approach and expiring
> > batches selectively in a request but rejected it due to the reasons of
> > added complexity without a strong benefit to counter-weigh that. Your
> > proposal is a slight variation so I'll mention some issues here.
> >
> > 1. It sounds like max.message.delivery.wait.ms will overlap with "time
> > segments" of both linger.ms and retries * (request.timeout.ms +
> > retry.backoff.ms). In that case, which config set takes precedence? It
> > would not make sense to configure configs from both sets. Especially, we
> > discussed exhaustively internally that retries and
> > max.message.delivery.wait.ms can't / shouldn't be configured together.
> > Retries become moot, as you already mention. I think that's going to be
> > surprising to anyone wanting to use max.message.delivery.wait.ms. We
> > probably need max.message.delivery.wait.ms > linger.ms or something like
> > that.
> >
> > 2. If clock starts when a batch is created and expire when
> > max.message.delivery.wait.ms is over in the accumulator, the last few
> > messages in the expiring batch may not have lived long enough. As the
> > config seems to suggests per-message timeout, it's incorrect to expire
> > messages prematurely. On the other hand if clock starts after batch is
> > closed (which also implies that linger.ms is not covered by the
> > max.message.delivery.wait.ms config), no message would be expired too
> > soon. Yeah, expiration may be a little bit too late, but hey, this ain't a
> > real-time service.
> >
> > 3. I agree that steps #3, #4, (and #5) are complex to implement. On the
> > other hand, batch.expiry.ms is next to trivial to implement. We just
> pass
> > the config all the way down to ProducerBatch.maybeExpire and be done with
> > it.
> >
> > 4. Do you think the effect of max.message.delivery.wait.ms can be
> > simulated
> > with future.get(timeout) method? Copying excerpt from the kip-91: An
> > end-to-end timeout may be partially emulated using the
> future.get(timeout).
> > The timeout must be greater than (batch.expiry.ms + nRetries * (
> > request.timeout.ms + retry.backoff.ms)). Note that when future times
> out,
> > Sender may continue to send the records in the background. To avoid that,
> > implementing a cancellable future is a possibility.
> >
> > For simplicity, we could just implement a 

[jira] [Created] (KAFKA-5736) Improve error message in Connect when all kafka brokers are down

2017-08-15 Thread Konstantine Karantasis (JIRA)
Konstantine Karantasis created KAFKA-5736:
-

 Summary: Improve error message in Connect when all kafka brokers 
are down
 Key: KAFKA-5736
 URL: https://issues.apache.org/jira/browse/KAFKA-5736
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 0.11.0.0
Reporter: Konstantine Karantasis
Assignee: Konstantine Karantasis
 Fix For: 0.11.0.1




Currently when all the Kafka brokers are down, Kafka Connect is failing with a 
pretty unintuitive message when it tries to, for instance, reconfigure tasks. 

Example output: 
{code:java}
[2017-08-15 19:12:09,959] ERROR Failed to reconfigure connector's tasks, 
retrying after backoff: 
(org.apache.kafka.connect.runtime.distributed.DistributedHerder)
java.lang.IllegalArgumentException: CircularIterator can only be used on 
non-empty lists
at org.apache.kafka.common.utils.CircularIterator.<init>(CircularIterator.java:29)
at org.apache.kafka.clients.consumer.RoundRobinAssignor.assign(RoundRobinAssignor.java:61)
at org.apache.kafka.clients.consumer.internals.AbstractPartitionAssignor.assign(AbstractPartitionAssignor.java:68)
at 

... (connector code)

at org.apache.kafka.connect.runtime.Worker.connectorTaskConfigs(Worker.java:230)
{code}

The error message needs to be improved, since its root cause is the absence of 
Kafka brokers available for assignment.
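A hypothetical sketch of the kind of pre-check that would yield a clearer message (the condition and wording below are illustrations, not the actual Connect code):

{code:java}
// Hypothetical guard before computing task assignments.
if (availablePartitions.isEmpty()) {
    throw new ConnectException(
        "Cannot reconfigure connector tasks: no partition metadata is available, "
        + "which usually means no Kafka broker in 'bootstrap.servers' is reachable.");
}
{code}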





Jenkins build is back to normal : kafka-trunk-jdk7 #2642

2017-08-15 Thread Apache Jenkins Server
See 




[GitHub] kafka pull request #3672: KAFKA-5731 Corrected how the sink task worker upda...

2017-08-15 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3672

KAFKA-5731 Corrected how the sink task worker updates the last committed 
offsets (0.11.0)

Prior to this change, it was possible for the synchronous consumer commit 
request to be handled before previously-submitted asynchronous commit requests. 
If that happened, the out-of-order handlers improperly set the last committed 
offsets, which then became inconsistent with the offsets the connector task is 
working with.

This change ensures that the last committed offsets are updated only for 
the most recent commit request, even if the consumer reorders the calls to the 
callbacks.

**This is for the `0.11.0` branch; see #3662 for the equivalent and 
already-approved PR for `trunk`.**

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5731-0.11.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3672.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3672


commit 526dbcb776effcc1661e51293a6d03256b19d0a6
Author: Randall Hauch 
Date:   2017-08-12T00:42:06Z

KAFKA-5731 Corrected how the sink task worker updates the last committed 
offsets

Prior to this change, it was possible for the synchronous consumer commit 
request to be handled before previously-submitted asynchronous commit requests. 
If that happened, the out-of-order handlers improperly set the last committed 
offsets, which then became inconsistent with the offsets the connector task is 
working with.

This change ensures that the last committed offsets are updated only for 
the most recent commit request, even if the consumer reorders the calls to the 
callbacks.

# Conflicts:
#   
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java

commit 80965cb5f8771e63b0dad095287cb9a29dea47f6
Author: Randall Hauch 
Date:   2017-08-14T19:11:08Z

KAFKA-5731 Corrected mock consumer behavior during rebalance

Corrects the test case added in the previous commit to properly revoke the 
existing partition assignments before adding new partition assigments.

commit 37687c544513566c7d728273137a12751702ad41
Author: Randall Hauch 
Date:   2017-08-14T19:11:45Z

KAFKA-5731 Added expected call that was missing in another test

commit bfac0688ab64935bc4ac9c11e0a6251ca03e1043
Author: Randall Hauch 
Date:   2017-08-14T22:24:35Z

KAFKA-5731 Improved log messages related to offset commits

# Conflicts:
#   
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java

commit bbc316dfa4353a7e914d11ceaca2b60c1bdaf291
Author: Randall Hauch 
Date:   2017-08-15T14:47:05Z

KAFKA-5731 More cleanup of log messages related to offset commits

commit 00b17ebbb5effb7f8aa171ea69b0227c7b009e97
Author: Randall Hauch 
Date:   2017-08-15T16:21:52Z

KAFKA-5731 More improvements to the log messages in WorkerSinkTask

# Conflicts:
#   
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java

commit 3a5c31f1b912e06aded4b74524c73dfe1033e76c
Author: Randall Hauch 
Date:   2017-08-15T16:31:28Z

KAFKA-5731 Removed unnecessary log message

commit 8b91f93e8c4c7b6b8e1aa6721f86ff01f8ecf40e
Author: Randall Hauch 
Date:   2017-08-15T17:54:16Z

KAFKA-5731 Additional tweaks to debug and trace log messages to ensure 
clarity and usefulness

# Conflicts:
#   
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java

commit bea03f69055524e9005302200f3a560a9cad2c3f
Author: Randall Hauch 
Date:   2017-08-15T19:30:09Z

KAFKA-5731 Use the correct value in trace messages






Re: [KAFKA-5138] MirrorMaker doesn't exit on send failure occasionally

2017-08-15 Thread Tamás Máté
Hey,

I think I have found something.
My guess is that when the AbstractCoordinator.maybeLeaveGroup(...)
function's pollNoWakeup part throws an exception, it cannot call
resetGeneration() and the HeartbeatThread stays in the STABLE state. The
broker won't be notified about the consumer's leave request, so it thinks
that everything is all right and responds to its requests.

If this is the case, it seems impossible to fix it on the consumer side,
except maybe with a new config parameter (leave retry?).
The other option could be a MirrorMaker fix: for example, when the producer
dies, shoot the consumers in the head.

What do you think about these?

Although I still couldn't repro the issue, I will try to do that tomorrow. :)

Best regards,
Tamas

On 15 August 2017 at 14:49, Tamás Máté  wrote:

> Hi Guys,
>
> I have just started to work on this ticket a little more than a week ago:
> https://issues.apache.org/jira/browse/KAFKA-5138
>
> I could not reproduce it sadly, but from the logs Dustin gave and from the
> code it seems like this might not be just a MirrorMaker issue but a
> consumer one.
>
> My theory is:
>  1) MM send failure happens because of heavy load
>  2) MM starts to close its producer
>  3) during MM shutdown the source server starts a consumer rebalance
> (the consumers couldn't respond because of the heavy load)
>  4) the heartbeat response gets delayed
>  5) the MM producer is closed, but MM gets a heartbeat response and resets
> the connection
>  6) because there is a thread left in the JVM, it can't shut down
>  7) MM hangs
>
> Maybe the order is a bit different, I couldn't prove it without
> reproduction.
>
> I played with the following configs under 100ms and then stress tested the
> source cluster with JMeter.
>  - request.timeout.ms
>  - replica.lag.time.max.ms
>  - session.timeout.ms
>  - group.min.session.timeout.ms
>  - group.max.session.timeout.ms
>  - heartbeat.interval.ms
>
> Could you give me some pointers on how I could reproduce this issue?
>
> Thanks,
> Tamas
>
>


[GitHub] kafka pull request #3662: KAFKA-5731 Corrected how the sink task worker upda...

2017-08-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3662




Re: [DISCUSS] KIP-185: Make exactly once in order delivery per partition the default producer setting

2017-08-15 Thread Guozhang Wang
Hi Jay,

I chatted with Apurva offline, and we think the key question of the
discussion is, as summarized in the updated KIP wiki, whether we should
consider replication a necessary condition of at-least-once, and of course
also of exactly-once. Originally I thought replication was not a necessary
condition for at-least-once, since the scope of failures that we should be
covering is different in my definition; if we claim that "even for
at-least-once, you should have a replication factor larger than 2, let
alone exactly-once", then I agree that having acks=all on the client side
should also be a necessary condition for at-least-once, and for exactly-once
as well. Then this KIP would just be providing necessary but not sufficient
conditions, from the client-side configs, to achieve EOS, while you also
need the broker-side configs together to really support it.

Guozhang


On Tue, Aug 15, 2017 at 1:15 PM, Jay Kreps  wrote:

> Hey Guozhang,
>
> I think the argument is that with acks=1 the message could be lost and
> hence you aren't guaranteeing exactly once delivery.
>
> -Jay
>
> On Mon, Aug 14, 2017 at 1:36 PM, Guozhang Wang  wrote:
>
> > Just want to clarify that regarding 1), I'm fine with changing it to
> > `all`, but I just wanted to argue that it does not necessarily correlate
> > with the exactly-once semantics, but rather with persistence vs.
> > availability trade-offs, so I'd like to discuss them separately.
> >
> > Regarding 2), one minor concern I had is that the enforcement is on the
> > client side while the parts it affects are on the broker side. I.e. the
> > broker code would assume at most 5 in-flight requests when idempotence is
> > turned on, but this is not enforced at the broker; it relies on the
> > client side's sanity. So other implementations of the client that may not
> > obey this may well break the broker code. If we do enforce this, we'd
> > better enforce it at the broker side. Also, I'm wondering if we have
> > considered the approach of having brokers read the logs in order to get
> > the starting offset when they do not know about it in their snapshot, and
> > whether it is worthwhile if we assume that such issues are very rare.
> >
> >
> > Guozhang
> >
> >
> >
> > On Mon, Aug 14, 2017 at 11:01 AM, Apurva Mehta 
> > wrote:
> >
> > > Hello,
> > >
> > > I just want to summarize where we are in this discussion
> > >
> > > There are two major points of contention: should we have acks=1 or
> > acsk=all
> > > by default? and how to cap max.in.flight.requests.per.connection?
> > >
> > > 1) acks=1 vs acks=all
> > >
> > > Here are the tradeoffs of each:
> > >
> > > If you have replication-factor=N, your data is resilient to N-1 disk
> > > failures. For N>1, here is the tradeoff between acks=1 and acks=all.
> > >
> > > With proposed defaults and acks=all, the stock Kafka producer and the
> > > default broker settings would guarantee that ack'd messages would be in
> > the
> > > log exactly once.
> > >
> > > With the proposed defaults and acks=1, the stock Kafka producer and the
> > > default broker settings would guarantee that 'retained ack'd messages
> > would
> > > be in the log exactly once. But all ack'd messages may not be
> retained'.
> > >
> > > If you leave replication-factor=1, acks=1 and acks=all have identical
> > > semantics and performance, but you are resilient to 0 disk failures.
> > >
> > > I think the measured cost (again the performance details are in the
> wiki)
> > > of acks=all is well worth the much clearer semantics. What does the
> rest
> > of
> > > the community think?
> > >
> > > 2) capping max.in.flight at 5 when idempotence is enabled.
> > >
> > > We need to limit the max.in.flight for the broker to de-duplicate
> > messages
> > > properly. The limitation would only apply when idempotence is enabled.
> > The
> > > shared numbers show that when the client-broker latency is low, there
> is
> > no
> > > performance gain for max.inflight > 2.
> > >
> > > Further, it is highly debatable that max.in.flight=500 is significantly
> > > better than max.in.flight=5  for a really high latency client-broker
> > link,
> > > and so far there are no hard numbers one way or another. However,
> > assuming
> > > that max.in.flight=500 is significantly better than max.inflight=5 in
> > some
> > > niche use case, the user would have to sacrifice idempotence for
> > > throughput. In this extreme corner case, I think it is an acceptable
> > > tradeoff.
> > >
> > > What does the community think?
> > >
> > > Thanks,
> > > Apurva
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-15 Thread Rajini Sivaram
I have updated the KIP based on the discussions so far. It will be good if
we can get some more feedback so that this can be implemented for 1.0.0.


Thanks,

Rajini


On Thu, May 4, 2017 at 10:22 PM, Ismael Juma  wrote:

> Hi Rajini,
>
> I think we were talking about slightly different things. I was just
> referring to the fact that there are cases where we throw an
> AuthorizationException back to the user without retrying from various
> methods (poll, commitSync, etc).
>
> As you said, my initial preference was for not retrying at all because it
> is what you want in the common case of a misconfigured application. I
> hadn't considered credential updates for authenticators that rely on
> eventual consistency. Thinking about it some more, it seems like this
> should be solved by the authenticator implementation as well. For example,
> it could refresh the cached data for a user if authentication failed (a
> good implementation would be a bit more involved to avoid going to the
> underlying data source too often).
>
> Given that, not retrying sounds good to me.
>
> Ismael
>
> On Thu, May 4, 2017 at 4:04 PM, Rajini Sivaram 
> wrote:
>
> > Hi Ismael,
> >
> > I thought the blocking waits in the producer and consumer are always
> > related to retrying for metadata. So an authorization exception that
> > impacts this wait can only be due to Describe authorization failure -
> that
> > always retries?
> >
> > I agree that connecting to different brokers when authentication fails
> with
> > one is not desirable. But I am not keen on retrying with a suitable
> backoff
> > until timeout either. Because that has the same problem as the scenario
> > that you described. The next metadata request could be to broker-1 to
> which
> > authentication succeeds and subsequent produce/consume  to broker-0 could
> > still fail.
> >
> > How about we just fail fast if one authentication fails - I think that is
> > what you were suggesting in the first place? We don't need to blackout
> any
> > nodes beyond the reconnect backoff interval. Applications can still retry
> > if they want to. In the case of credential updates, it will be up to the
> > application to retry. During regular operation, a misconfigured
> application
> > fails fast with a meaningful exception. What do you think?
> >
> > Regards,
> >
> > Rajini
> >
> >
> > On Thu, May 4, 2017 at 3:01 PM, Ismael Juma  wrote:
> >
> > > Hi Rajini,
> > >
> > > Comments inline.
> > >
> > > On Thu, May 4, 2017 at 2:29 PM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > > wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > Thank you for reviewing the KIP.
> > > >
> > > > An authenticated client that is not authorized to access a topic is
> > never
> > > > told that the operation was not authorized. This is to prevent the
> > client
> > > > from finding out if the topic exists by sending an unauthorized
> > request.
> > > So
> > > > in this case, the client will retry metadata requests with the
> > configured
> > > > backoff until it times out.
> > >
> > >
> > > This is true if the user does not have Describe permission. If the user
> > has
> > > Describe access and no Read or Write access, then the user is informed
> > that
> > > the operation was not authorized.
> > >
> > >
> > > > Another important distinction for authorization failures is that the
> > > > connection is not terminated.
> > > >
> > > > For unauthenticated clients, we do want to inform the client that
> > > > authentication failed. The connection is terminated by the broker.
> > > > Especially if the client is using SASL_SSL, we really do want to
> avoid
> > > > reconnections that result in unnecessary expensive handshakes. So we
> > want
> > > > to return an exception to the user with minimal retries.
> > > >
> > >
> > > Agreed.
> > >
> > > I was thinking that it may be useful to try more than one broker for
> the
> > > > case where brokers are being upgraded and some brokers haven't yet
> seen
> > > the
> > > > latest credentials. I suppose I was thinking that at the moment we
> keep
> > > on
> > > > retrying every broker forever in the consumer and suddenly if we stop
> > > > retrying altogether, it could potentially lead to some unforeseen
> > timing
> > > > issues. Hence the suggestion to try every broker once.
> > > >
> > >
> > > I see. Retrying forever is a side-effect of auto-topic creation, but
> it's
> > > something we want to move away from. As mentioned, we actually don't
> > retry
> > > at all if the user has Describe permission.
> > >
> > > Broker upgrades could be fixed by ensuring that the latest credentials
> > are
> > > loaded before the broker starts serving requests. More problematic is
> > > dealing with credential updates. This is another distinction when
> > compared
> > > to authorization.
> > >
> > > I am not sure if trying different brokers really helps us though. Say,
> we
> > > fail to authenticate with broker 0 and then we 

Re: [DISCUSS] KIP-185: Make exactly once in order delivery per partition the default producer setting

2017-08-15 Thread Jay Kreps
Hey Guozhang,

I think the argument is that with acks=1 the message could be lost and
hence you aren't guaranteeing exactly once delivery.

-Jay

On Mon, Aug 14, 2017 at 1:36 PM, Guozhang Wang  wrote:

> Just want to clarify that regarding 1), I'm fine with changing it to `all`,
> but I just wanted to argue that it does not necessarily correlate with the
> exactly-once semantics, but rather with persistence vs. availability
> trade-offs, so I'd like to discuss them separately.
>
> Regarding 2), one minor concern I had is that the enforcement is on the
> client side while the parts it affects are on the broker side. I.e. the
> broker code would assume at most 5 in-flight requests when idempotence is
> turned on, but this is not enforced at the broker; it relies on the client
> side's sanity. So other implementations of the client that may not obey
> this may well break the broker code. If we do enforce this, we'd better
> enforce it at the broker side. Also, I'm wondering if we have considered
> the approach of having brokers read the logs in order to get the starting
> offset when they do not know about it in their snapshot, and whether it is
> worthwhile if we assume that such issues are very rare.
>
>
> Guozhang
>
>
>
> On Mon, Aug 14, 2017 at 11:01 AM, Apurva Mehta 
> wrote:
>
> > Hello,
> >
> > I just want to summarize where we are in this discussion
> >
> > There are two major points of contention: should we have acks=1 or
> acsk=all
> > by default? and how to cap max.in.flight.requests.per.connection?
> >
> > 1) acks=1 vs acks=all
> >
> > Here are the tradeoffs of each:
> >
> > If you have replication-factor=N, your data is resilient to N-1 disk
> > failures. For N>1, here is the tradeoff between acks=1 and acks=all.
> >
> > With proposed defaults and acks=all, the stock Kafka producer and the
> > default broker settings would guarantee that ack'd messages would be in
> the
> > log exactly once.
> >
> > With the proposed defaults and acks=1, the stock Kafka producer and the
> > default broker settings would guarantee that 'retained ack'd messages
> would
> > be in the log exactly once. But all ack'd messages may not be retained'.
> >
> > If you leave replication-factor=1, acks=1 and acks=all have identical
> > semantics and performance, but you are resilient to 0 disk failures.
> >
> > I think the measured cost (again the performance details are in the wiki)
> > of acks=all is well worth the much clearer semantics. What does the rest
> of
> > the community think?
> >
> > 2) capping max.in.flight at 5 when idempotence is enabled.
> >
> > We need to limit the max.in.flight for the broker to de-duplicate
> messages
> > properly. The limitation would only apply when idempotence is enabled.
> The
> > shared numbers show that when the client-broker latency is low, there is
> no
> > performance gain for max.inflight > 2.
> >
> > Further, it is highly debatable that max.in.flight=500 is significantly
> > better than max.in.flight=5  for a really high latency client-broker
> link,
> > and so far there are no hard numbers one way or another. However,
> assuming
> > that max.in.flight=500 is significantly better than max.inflight=5 in
> some
> > niche use case, the user would have to sacrifice idempotence for
> > throughput. In this extreme corner case, I think it is an acceptable
> > tradeoff.
> >
> > What does the community think?
> >
> > Thanks,
> > Apurva
> >
>
>
>
> --
> -- Guozhang
>
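For readers following the thread, the client-side settings under discussion
can be sketched as below. This illustrates the proposed defaults being
debated, not the final outcome of the KIP, and the broker-side configs
(replication factor, min.insync.replicas) still matter:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Settings debated in this thread: idempotence plus acks=all for
// exactly-once, in-order delivery per partition on the producer side.
props.put("enable.idempotence", "true");
props.put("acks", "all");
props.put("max.in.flight.requests.per.connection", "5"); // proposed cap with idempotence
Producer<String, String> producer = new KafkaProducer<>(props);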


[jira] [Resolved] (KAFKA-2283) scheduler exception on non-controller node when shutdown

2017-08-15 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-2283.
--
Resolution: Fixed

> scheduler exception on non-controller node when shutdown
> 
>
> Key: KAFKA-2283
> URL: https://issues.apache.org/jira/browse/KAFKA-2283
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.1
> Environment: linux debian
>Reporter: allenlee
>Assignee: Neha Narkhede
>Priority: Minor
>
> When a broker shuts down, there is an error log about 'Kafka scheduler has not 
> been started'.
> It only appears on non-controller nodes. If the broker is the controller, it 
> shuts down without the warning log.
> IMHO, *autoRebalanceScheduler.shutdown()* should only be valid for the controller, 
> right?
> {quote}
> [2015-06-17 22:32:51,814] INFO Shutdown complete. (kafka.log.LogManager)
> [2015-06-17 22:32:51,815] WARN Kafka scheduler has not been started 
> (kafka.utils.Utils$)
> java.lang.IllegalStateException: Kafka scheduler has not been started
> at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
> at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
> at 
> kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350)
> at 
> kafka.controller.KafkaController.shutdown(KafkaController.scala:664)
> at 
> kafka.server.KafkaServer$$anonfun$shutdown$8.apply$mcV$sp(KafkaServer.scala:285)
> at kafka.utils.Utils$.swallow(Utils.scala:172)
> at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
> at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
> at kafka.utils.Logging$class.swallow(Logging.scala:94)
> at kafka.utils.Utils$.swallow(Utils.scala:45)
> at kafka.server.KafkaServer.shutdown(KafkaServer.scala:285)
> at 
> kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
> at kafka.Kafka$$anon$1.run(Kafka.scala:42)
> [2015-06-17 22:32:51,818] INFO Terminate ZkClient event thread. 
> (org.I0Itec.zkclient.ZkEventThread)
> {quote}





[jira] [Resolved] (KAFKA-2220) Improvement: Could we support rewind by time ?

2017-08-15 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-2220.
--
Resolution: Fixed

This got fixed in  KAFKA-4743 / KIP-122.

> Improvement: Could we support  rewind by time  ?
> 
>
> Key: KAFKA-2220
> URL: https://issues.apache.org/jira/browse/KAFKA-2220
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.1.1
>Reporter: Li Junjun
> Attachments: screenshot.png
>
>
> Improvement: support rewind by time!
> My scenario is as follows:
> A program reads records from Kafka, processes them, and then writes to a dir 
> in HDFS like /hive/year=/month=xx/day=xx/hour=10. If the program goes down, 
> I can restart it, so it reads from the last offset.
> But what if the program was configured with wrong params? Then I need to 
> remove the dir hour=10, reconfigure my program, and find the offset where 
> hour=10 starts, but right now I can't do this.
> And there are many scenarios like this.
> So, can we add a time partition, so we can rewind by time?





[jira] [Resolved] (KAFKA-1832) Async Producer will cause 'java.net.SocketException: Too many open files' when broker host does not exist

2017-08-15 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-1832.
--
Resolution: Fixed

Fixed in  KAFKA-1041

> Async Producer will cause 'java.net.SocketException: Too many open files' 
> when broker host does not exist
> -
>
> Key: KAFKA-1832
> URL: https://issues.apache.org/jira/browse/KAFKA-1832
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1, 0.8.1.1
> Environment: linux
>Reporter: barney
>Assignee: Jun Rao
>
> h3.How to replay the problem:
> * producer configuration:
> ** producer.type=async
> ** metadata.broker.list=not.existed.com:9092
> Make sure the host '*not.existed.com*' does not exist in DNS server or 
> /etc/hosts;
> * send a lot of messages continuously using the above producer
> It will cause '*java.net.SocketException: Too many open files*' after a 
> while, or you can use '*lsof -p $pid|wc -l*' to check the count of open files 
> which will be increasing as time goes by until it reaches the system 
> limit(check by '*ulimit -n*').
> h3.Problem cause:
> {code:title=kafka.network.BlockingChannel|borderStyle=solid} 
> channel.connect(new InetSocketAddress(host, port))
> {code}
> this line will throw an exception 
> '*java.nio.channels.UnresolvedAddressException*' when broker host does not 
> exist, and at this same time the field '*connected*' is false;
> In *kafka.producer.SyncProducer*, '*disconnect()*' will not invoke 
> '*blockingChannel.disconnect()*' because '*blockingChannel.isConnected*' is 
> false which means the FileDescriptor will be created but never closed;
> h3.More:
> When the broker is an non-existent ip(for example: 
> metadata.broker.list=1.1.1.1:9092) instead of an non-existent host, the 
> problem will not appear;
> In *SocketChannelImpl.connect()*, '*Net.checkAddress()*' is not in try-catch 
> block but '*Net.connect()*' is in, that makes the difference;
> h3.Temporary Solution:
> {code:title=kafka.network.BlockingChannel|borderStyle=solid}
> try {
>   channel.connect(new InetSocketAddress(host, port))
> } catch {
>   case e: UnresolvedAddressException =>
>     disconnect()
>     throw e
> }
> {code}





[GitHub] kafka pull request #3622: HOTFIX: state transition cherry picking

2017-08-15 Thread enothereska
Github user enothereska closed the pull request at:

https://github.com/apache/kafka/pull/3622




Jenkins build is back to normal : kafka-0.11.0-jdk7 #271

2017-08-15 Thread Apache Jenkins Server
See 




[GitHub] kafka pull request #3671: KAFKA-5723[WIP]: Refactor BrokerApiVersionsCommand...

2017-08-15 Thread adyach
GitHub user adyach opened a pull request:

https://github.com/apache/kafka/pull/3671

KAFKA-5723[WIP]: Refactor BrokerApiVersionsCommand to use the new 
AdminClient

This PR refactors BrokerApiVersionsCommand to use the new AdminClient Java 
class. The code is not tested yet, because I just want to make sure that I am 
going in the right direction with the implementation; tests will be in place 
at the end. There is also no Javadoc, for the same reason. I took a look at 
#3514 to be more consistent with the implementation across the whole topic of 
AdminClient refactoring, so I grabbed argparse4j.
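As a rough sketch of the direction (this only uses the describeCluster() call
of the new org.apache.kafka.clients.admin.AdminClient; the command itself
would add the per-broker API version lookup on top of something like this):

AdminClient admin = AdminClient.create(props); // props containing bootstrap.servers
try {
    DescribeClusterResult cluster = admin.describeCluster();
    for (Node node : cluster.nodes().get()) {
        // A BrokerApiVersionsCommand would query each node's supported API versions here.
        System.out.println("Broker " + node.idString() + " at " + node.host() + ":" + node.port());
    }
} finally {
    admin.close();
}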

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adyach/kafka kafka-5723

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3671.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3671


commit cc7b0eb5055ae44467fd41699ca53bf5a91da371
Author: Andrey Dyachkov 
Date:   2017-08-14T20:46:27Z

kafka-5723: preparation phase

commit 8c2864cfda8e12db836de98ed7a59047f2311f85
Author: Andrey Dyachkov 
Date:   2017-08-15T14:43:07Z

kafka-5723: draft impl for ListBrokersVersionInfo command






[KAFKA-5138] MirrorMaker doesn't exit on send failure occasionally

2017-08-15 Thread Tamás Máté
Hi Guys,

I have just started to work on this ticket a little more than a week ago:
https://issues.apache.org/jira/browse/KAFKA-5138

I could not reproduce it sadly, but from the logs Dustin gave and from the
code it seems like this might not be just a MirrorMaker issue but a
consumer one.

My theory is:
 1) MM send failure happens because of heavy load
 2) MM starts to close its producer
 3) during MM shutdown the source server starts a consumer rebalance
(the consumers couldn't respond because of the heavy load)
 4) the heartbeat response gets delayed
 5) the MM producer is closed, but MM gets a heartbeat response and resets
the connection
 6) because there is a thread left in the JVM, it can't shut down
 7) MM hangs

Maybe the order is a bit different, I couldn't prove it without
reproduction.

I played with the following configs under 100ms and then stress tested the
source cluster with JMeter.
 - request.timeout.ms
 - replica.lag.time.max.ms
 - session.timeout.ms
 - group.min.session.timeout.ms
 - group.max.session.timeout.ms
 - heartbeat.interval.ms

Could you give me some pointers on how I could reproduce this issue?

Thanks,
Tamas


[jira] [Created] (KAFKA-5735) Client-ids are not handled consistently by clients and broker

2017-08-15 Thread Rajini Sivaram (JIRA)
Rajini Sivaram created KAFKA-5735:
-

 Summary: Client-ids are not handled consistently by clients and 
broker
 Key: KAFKA-5735
 URL: https://issues.apache.org/jira/browse/KAFKA-5735
 Project: Kafka
  Issue Type: Bug
  Components: clients, core
Affects Versions: 0.11.0.0
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram
 Fix For: 1.0.0


At the moment, Java clients expect client-ids to use a limited set of 
characters so that they can be used without quoting in metrics. 
kafka-configs.sh allows quotas to be defined only for that limited set. But the 
broker does not validate client-ids. And the documentation does not mention any 
limitations. We need to either limit characters used in client-ids, document 
and validate them or we should allow any characters and treat them consistently 
in the same way as we handle user principals.





[GitHub] kafka pull request #3670: MINOR: fixed StickyAssignor javadoc

2017-08-15 Thread mimaison
GitHub user mimaison opened a pull request:

https://github.com/apache/kafka/pull/3670

MINOR: fixed StickyAssignor javadoc

StickyAssignor javadoc has a bunch of formatting issues which make it 
pretty hard to read:

http://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/consumer/StickyAssignor.html

cc @vahidhashemian 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mimaison/kafka sticky_assignor_javadoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3670.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3670


commit 204f0ba542d026149ed7c48f2fdb16d9dbcb510c
Author: Mickael Maison 
Date:   2017-08-15T10:25:02Z

MINOR: fixed StickyAssignor javadoc






[jira] [Created] (KAFKA-5734) Heap (Old generation space) gradually increase

2017-08-15 Thread jang (JIRA)
jang created KAFKA-5734:
---

 Summary: Heap (Old generation space) gradually increase
 Key: KAFKA-5734
 URL: https://issues.apache.org/jira/browse/KAFKA-5734
 Project: Kafka
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.10.2.0
 Environment: ubuntu 14.04 / java 1.7.0
Reporter: jang
 Attachments: heap-log.xlsx

I set up a Kafka server on Ubuntu with 4GB RAM.

Heap (old generation space) size is increasing gradually, as shown in the attached Excel 
file, which recorded GC info at 1-minute intervals.

Finally OU occupies 2.6GB and GC spends too much time (and eventually an out-of-memory 
exception occurs).

The Kafka process arguments are below.

_java -Xmx3000M -Xms2G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 
-XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC 
-Djava.awt.headless=true 
-Xloggc:/usr/local/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps 
-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false 
-Dkafka.logs.dir=/usr/local/kafka/bin/../logs 
-Dlog4j.configuration=file:/usr/local/kafka/bin/../config/log4j.properties_










[GitHub] kafka pull request #3669: KAFKA-5726: KafkaConsumer.subscribe() overload tha...

2017-08-15 Thread attilakreiner
GitHub user attilakreiner opened a pull request:

https://github.com/apache/kafka/pull/3669

KAFKA-5726: KafkaConsumer.subscribe() overload that takes just Pattern

- changed the interface & implementations
- updated tests to use the new method where applicable
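
A small usage sketch of the proposed overload ("orders-.*" is a made-up pattern):

Pattern pattern = Pattern.compile("orders-.*");

// Previously, subscribing by pattern required passing a ConsumerRebalanceListener:
// consumer.subscribe(pattern, rebalanceListener);

// With this change, the pattern alone is enough:
consumer.subscribe(pattern);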

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/attilakreiner/kafka KAFKA-5726

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3669.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3669


commit 8e7b6e741f0ad042421ada39e0c9c1546d231bfb
Author: Attila Kreiner 
Date:   2017-08-14T20:56:21Z

KAFKA-5726: KafkaConsumer.subscribe() overload that takes just Pattern






[GitHub] kafka pull request #3668: KAFKA-5679: Add logging for broker termination due...

2017-08-15 Thread rajinisivaram
GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/3668

KAFKA-5679: Add logging for broker termination due to SIGTERM or SIGINT
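
As a rough illustration of the general technique (not necessarily the actual
patch, and assuming a `log` instance is available): SIGTERM and SIGINT both
trigger an orderly JVM shutdown, so the termination can be logged from a
shutdown hook:

Runtime.getRuntime().addShutdownHook(new Thread("kafka-shutdown-hook") {
    @Override
    public void run() {
        // Runs when the JVM is terminated by SIGTERM or SIGINT (but not SIGKILL).
        log.info("Kafka broker terminating due to SIGTERM or SIGINT");
    }
});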



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-5679

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3668


commit e6f02a6eb90c242d850d3aab2e59d74162d88626
Author: rajini 
Date:   2017-08-15T07:29:49Z

KAFKA-5679: Add logging for broker termination due to SIGTERM or SIGINT



