[jira] [Commented] (KAFKA-6054) ERROR "SubscriptionInfo - unable to decode subscription data: version=2" when upgrading from 0.10.0.0 to 0.10.2.1

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381566#comment-16381566
 ] 

ASF GitHub Bot commented on KAFKA-6054:
---

mjsax opened a new pull request #4630: KAFKA-6054: Code cleanup to prepare the 
actual fix for an upgrade path
URL: https://github.com/apache/kafka/pull/4630
 
 
   Small change in decoding version 1 metadata: don't upgrade to version 2 
automatically


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ERROR "SubscriptionInfo - unable to decode subscription data: version=2" when 
> upgrading from 0.10.0.0 to 0.10.2.1
> -
>
> Key: KAFKA-6054
> URL: https://issues.apache.org/jira/browse/KAFKA-6054
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: James Cheng
>Priority: Major
>
> We upgraded an app from kafka-streams 0.10.0.0 to 0.10.2.1. We did a rolling 
> upgrade of the app, so at one point there were both 0.10.0.0-based 
> instances and 0.10.2.1-based instances running. 
> We observed the following stack trace:
> {code}
> 2017-10-11 07:02:19.964 [StreamThread-3] ERROR o.a.k.s.p.i.a.SubscriptionInfo 
> -
> unable to decode subscription data: version=2
> org.apache.kafka.streams.errors.TaskAssignmentException: unable to decode
> subscription data: version=2
> at 
> org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfo.decode(SubscriptionInfo.java:113)
> at 
> org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.assign(StreamPartitionAssignor.java:235)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:260)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:404)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$900(AbstractCoordinator.java:81)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:358)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:340)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)
> 
> {code}
> I spoke with [~mjsax] and he said this is a known issue that happens when you 
> have both 0.10.0.0 instances and 0.10.2.1 instances running at the same time, 
> because the internal version number of the protocol changed when adding 
> 
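
As background for anyone hitting this during a rolling bounce, here is a minimal, purely illustrative sketch (not the actual SubscriptionInfo code) of why old instances reject the new metadata: the subscription bytes carry a version number up front, and an instance that only understands version 1 refuses anything newer, which surfaces as the TaskAssignmentException above.

{code:java}
import java.nio.ByteBuffer;

// Illustrative sketch only: a 0.10.0.0 instance effectively behaves like this,
// with LATEST_SUPPORTED_VERSION = 1, so version-2 metadata sent by upgraded
// instances cannot be decoded during a mixed-version rolling upgrade.
public class VersionedSubscriptionDecoder {

    static final int LATEST_SUPPORTED_VERSION = 1;

    public static void decode(ByteBuffer data) {
        data.rewind();
        int usedVersion = data.getInt(); // the version is encoded first
        if (usedVersion > LATEST_SUPPORTED_VERSION) {
            // the real code throws TaskAssignmentException("unable to decode
            // subscription data: version=" + usedVersion)
            throw new IllegalStateException(
                    "unable to decode subscription data: version=" + usedVersion);
        }
        // ... decode the remaining fields of the known version ...
    }
}
{code}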

[jira] [Updated] (KAFKA-4831) Extract WindowedSerde to public APIs

2018-02-28 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-4831:
---
Labels: needs-kip newbie user-experience  (was: newbie user-experience)

> Extract WindowedSerde to public APIs
> 
>
> Key: KAFKA-4831
> URL: https://issues.apache.org/jira/browse/KAFKA-4831
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Vitaly Pushkar
>Priority: Major
>  Labels: needs-kip, newbie, user-experience
>
> Now that we have augmented WindowedSerde with non-arg parameters, the next step 
> is to extract it out as part of the public APIs so that users who want to 
> read and write windowed streams can use it.
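
As a rough illustration of the use case (a sketch only; the classes referenced below currently live in the internals package, which is exactly what this ticket wants to change, and the constructor usage is based on the 1.0.x code, so it may differ):

{code:java}
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.kstream.internals.WindowedDeserializer;
import org.apache.kafka.streams.kstream.internals.WindowedSerializer;

// Sketch: building a Serde for windowed String keys so that a plain consumer or
// another Streams app can read/write a windowed stream. Today this requires
// reaching into internals; the ticket proposes promoting it to public API.
public class WindowedSerdeSketch {
    public static Serde<Windowed<String>> windowedStringSerde() {
        return Serdes.serdeFrom(
                new WindowedSerializer<>(Serdes.String().serializer()),
                new WindowedDeserializer<>(Serdes.String().deserializer()));
    }
}
{code}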



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-4936) Allow dynamic routing of output records

2018-02-28 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381457#comment-16381457
 ] 

Matthias J. Sax commented on KAFKA-4936:


Thanks for your interest in picking this up.

If we integrate this in the library, I don't think that we need an additional 
producer; we can extend the existing `SinkNode` implementation. This would 
avoid the requirement of synchronous flushes and would also work with EOS enabled.

The SinkNode could accept a "topic extractor" function that computes the topic 
for each record instead of being configured with a single topic name. That's at 
least my thinking. Can you share your ideas on how to implement this?

I am still wondering about requiring the existence of output topics (ie, should we 
require this?) vs auto-topic creation. It should be clear to the user what the 
requirements and impact are in this regard. Maybe I am overthinking this, 
but if auto-topic creation is enabled and something goes wrong (ie, the "topic 
name extractor" returns the wrong field from an AVRO record), users might end up 
with many topics that get created on error, which could result in a quite messed 
up cluster, and I am a little concerned about this. Even with small input rates 
(let's say 1000 records per second) one could create 1000 different topics per 
second...
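
To make that concrete, a rough sketch of what such a hook could look like; the interface and method names are assumptions for illustration, not existing API:

{code:java}
// Hypothetical interface for illustration only; nothing like this exists in the
// public API today. A SinkNode configured with an extractor instead of a fixed
// topic name would call extract() per record to decide where to forward it.
public interface TopicNameExtractor<K, V> {
    String extract(K key, V value);
}

// Example extractor: route records by a key prefix (made-up topic names).
class KeyPrefixTopicExtractor implements TopicNameExtractor<String, String> {
    @Override
    public String extract(String key, String value) {
        return key.startsWith("audit-") ? "audit-events" : "regular-events";
    }
}
{code}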

> Allow dynamic routing of output records
> ---
>
> Key: KAFKA-4936
> URL: https://issues.apache.org/jira/browse/KAFKA-4936
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>  Labels: needs-kip
>
> Currently, all used output topics must be known beforehand, and thus, it's not 
> possible to send output records to topics in a dynamic fashion.
> There have been a couple of requests for this feature and we should consider 
> adding it. There are many open questions, however, with regard to topic 
> creation and configuration (replication factor, number of partitions), etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (KAFKA-6348) Kafka consumer can't restore from coordinator failure

2018-02-28 Thread sikaijian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sikaijian updated KAFKA-6348:
-
Comment: was deleted

(was: I got the same problem. In my case, I use Kafka for logs. So I killed one 
broker, and it worked. But when I started the broker back up, the failure came back.

 
{code:java}
2018-02-27 11:09:07 DEBUG kafka-consumer-ElasticsearchStorage - Issuing group 
metadata request to broker 11
2018-02-27 11:09:07 DEBUG kafka-consumer-ElasticsearchStorage - Group metadata 
response ClientResponse(receivedTimeMs=1519700947797, disconnected=false, 
request=ClientRequest(expectResponse=true, 
callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@4afb1ca2,
 
request=RequestSend(header={api_key=10,api_version=0,correlation_id=196,client_id=consumer-1},
 body={group_id=ElasticsearchStorage-group}), createdTimeMs=1519700947796, 
sendTimeMs=1519700947796), 
responseBody={error_code=0,coordinator={node_id=4,host=kafkabroker5.pajkdc.com,port=9092}})
2018-02-27 11:09:07 DEBUG kafka-consumer-ElasticsearchStorage - (Re-)joining 
group ElasticsearchStorage-group
2018-02-27 11:09:07 DEBUG kafka-consumer-ElasticsearchStorage - Issuing request 
(JOIN_GROUP: 
{group_id=ElasticsearchStorage-group,session_timeout=3,member_id=,protocol_type=consumer,group_protocols=[{protocol_name=range,protocol_metadata=java.nio.HeapByteBuffer[pos=0
 lim=32 cap=32]}]}) to coordinator 2147483643
2018-02-27 11:09:07 INFO kafka-consumer-ElasticsearchStorage - Marking the 
coordinator 2147483643 dead.
2018-02-27 11:09:07 INFO kafka-consumer-ElasticsearchStorage - Attempt to join 
group ElasticsearchStorage-group failed due to obsolete coordinator 
information, retrying.
{code}
The coordinator for the consumer group "ElasticsearchStorage-group" is broker 4. I 
just wanted to change the coordinator, so I killed it.

As far as I know, only the brokers that own a partition of __consumer_offsets can 
act as the coordinator for a consumer group, and a consumer group's offsets are 
located on only one partition of __consumer_offsets. In my case, 
Math.abs("ElasticsearchStorage-group".hashCode()) % 100 (the partition count of 
__consumer_offsets, which is 100 in my case) = 4. But I consume 3 topics using the 
same consumer group "ElasticsearchStorage-group", and only one topic hit the 
problem.

I still can't figure out the root cause. I am using 0.9.0.1 and the new 
consumer.)
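
For reference, the group-to-partition mapping mentioned above is just a non-negative hash of the group id modulo the __consumer_offsets partition count (offsets.topic.num.partitions, 50 by default); a minimal sketch of that computation:

{code:java}
// Minimal sketch of how the coordinator partition is derived from the group id:
// the leader of this __consumer_offsets partition acts as the group coordinator.
public class OffsetsPartition {
    public static int partitionFor(String groupId, int offsetsTopicPartitionCount) {
        // equivalent to the broker's non-negative hash (hashCode & 0x7fffffff)
        return (groupId.hashCode() & 0x7fffffff) % offsetsTopicPartitionCount;
    }

    public static void main(String[] args) {
        // 100 partitions, as in the cluster described above
        System.out.println(partitionFor("ElasticsearchStorage-group", 100));
    }
}
{code}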

> Kafka consumer can't restore from coordinator failure
> -
>
> Key: KAFKA-6348
> URL: https://issues.apache.org/jira/browse/KAFKA-6348
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, core
>Affects Versions: 0.10.1.1
>Reporter: Renjie Liu
>Priority: Major
>
> The Kafka consumer blocks and keeps reporting that the coordinator is dead. I tried to 
> restart the process and it still didn't work. Then we shut down the broker and 
> restarted the consumer, but it still kept reporting that the coordinator was dead. This 
> situation continued until we changed our group id, after which it worked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6499) Avoid creating dummy checkpoint files with no state stores

2018-02-28 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6499:
---
Fix Version/s: (was: 1.2.0)

> Avoid creating dummy checkpoint files with no state stores
> --
>
> Key: KAFKA-6499
> URL: https://issues.apache.org/jira/browse/KAFKA-6499
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 1.1.0
>
>
> Today, for a streams task that contains no state stores, its processor state 
> manager would still write a dummy checkpoint file that contains some 
> characters (version, size). This introduces unnecessary disk I/O and should 
> be avoidable.
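
A minimal sketch of the kind of guard this asks for (names are illustrative, not the actual ProcessorStateManager/OffsetCheckpoint code):

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

// Illustrative only: if a task has no state stores there are no offsets to
// checkpoint, so skip creating a header-only file (and clean up a stale one).
public class CheckpointWriter {
    public static void maybeWriteCheckpoint(File checkpointFile,
                                            Map<TopicPartition, Long> offsets) throws IOException {
        if (offsets.isEmpty()) {
            Files.deleteIfExists(checkpointFile.toPath());
            return;
        }
        // ... otherwise write the version, the entry count and the offsets,
        // as the real OffsetCheckpoint format does ...
    }
}
{code}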



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6327) IllegalArgumentException in RocksDB when RocksDBException being generated

2018-02-28 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381357#comment-16381357
 ] 

Guozhang Wang commented on KAFKA-6327:
--

This will be fixed in RocksDB release rocksdb-5.9.2 and beyond, most likely in 
v5.10.3.

> IllegalArgumentException in RocksDB when RocksDBException being generated
> -
>
> Key: KAFKA-6327
> URL: https://issues.apache.org/jira/browse/KAFKA-6327
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Anthony May
>Priority: Minor
>
> RocksDB had a bug where RocksDBException subcodes related to disk usage were 
> not present, and when a RocksDBException is generated for one of those it throws an 
> IllegalArgumentException instead, obscuring the error. This is 
> [fixed|https://github.com/facebook/rocksdb/pull/3050] in RocksDB master but 
> doesn't appear to have been released yet. Adding this issue so that it can be 
> tracked for a future release.
> Exception:
> {noformat}
> java.lang.IllegalArgumentException: Illegal value provided for SubCode.
>   at org.rocksdb.Status$SubCode.getSubCode(Status.java:109)
>   at org.rocksdb.Status.<init>(Status.java:30)
>   at org.rocksdb.RocksDB.write0(Native Method)
>   at org.rocksdb.RocksDB.write(RocksDB.java:602)
> {noformat}
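
Until a RocksDB release containing that fix is picked up, a defensive catch is one way to avoid losing the write failure entirely; this is purely a workaround sketch at the application level, not a proposed Streams change:

{code:java}
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

// Workaround sketch: affected RocksDB versions throw IllegalArgumentException
// from Status$SubCode while building the RocksDBException for certain
// disk-usage errors, so catch it as well and surface a generic storage error.
public class SafeRocksWrite {
    public static void write(RocksDB db, WriteOptions options, WriteBatch batch) {
        try {
            db.write(options, batch);
        } catch (RocksDBException e) {
            throw new RuntimeException("RocksDB write failed: " + e.getMessage(), e);
        } catch (IllegalArgumentException e) {
            // unknown SubCode on affected RocksDB versions; the real cause is lost
            throw new RuntimeException("RocksDB write failed with an unmappable status", e);
        }
    }
}
{code}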



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6555) Making state store queryable during restoration

2018-02-28 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381278#comment-16381278
 ] 

Guozhang Wang commented on KAFKA-6555:
--

First of all, as I mentioned before, this change itself deserves a KIP 
discussion given its impact and scope, so that we can bring it to a broader 
audience's attention for feedback.

And here are my two cents: although this ticket may be orthogonal to 6144 and 
6145, they are trying to tackle the same general issue, which is that during a 
rebalance the stores may be unavailable for queries while they are restoring, and 
how long that unavailability lasts depends on whether or not you have 
standby replicas, whether the rebalance is triggered by fail-over or scaling 
out, etc. But ultimately we'd like to reduce that unavailability window as much 
as possible by trading off some data consistency.

My take then is that, if eventually we are going to support KAFKA-6145 (which I 
think we should), then the scaling-out case should be well covered, and 
hence I'd prefer to just read from the restoring active task for simplicity, 
and additionally make it configurable as Damian suggested. So the situation 
becomes:

1. For the scaling-out scenario, KAFKA-6145 will make sure we have a 
close-to-latest replica when the actual rebalance happens, so this scenario is 
effectively reduced to a "controlled fail-over" scenario.
2. For the fail-over scenario, we can assume that the restoring state is the 
closest to latest, and hence have a knob to allow users to read stale data from 
it during restoration to reduce the unavailability gap.

I.e., we would drop KAFKA-6144 and, in the end, only do this JIRA and KAFKA-6145.
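
For context on the gap being discussed, this is what callers see today with the existing API: while a task is restoring, interactive queries fail with InvalidStateStoreException and the application has to retry until the instance is RUNNING again. A small sketch:

{code:java}
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.errors.InvalidStateStoreException;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// Today's behavior: stores are only queryable once the task is RUNNING, so
// callers typically loop on InvalidStateStoreException during restoration.
// This retry window is exactly the unavailability this ticket wants to shrink.
public class InteractiveQueryRetry {
    public static Long countFor(KafkaStreams streams, String storeName, String key)
            throws InterruptedException {
        while (true) {
            try {
                ReadOnlyKeyValueStore<String, Long> store =
                        streams.store(storeName, QueryableStoreTypes.<String, Long>keyValueStore());
                return store.get(key);
            } catch (InvalidStateStoreException restoringOrRebalancing) {
                Thread.sleep(100); // store not queryable yet; retry
            }
        }
    }
}
{code}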

> Making state store queryable during restoration
> ---
>
> Key: KAFKA-6555
> URL: https://issues.apache.org/jira/browse/KAFKA-6555
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Ashish Surana
>Assignee: Ashish Surana
>Priority: Major
>
> State stores in Kafka Streams are currently only queryable when the StreamTask is 
> in the RUNNING state. The idea is to make them queryable even in the RESTORATION 
> (PARTITION_ASSIGNED) state, as the time spent on restoration can be huge, and 
> making the data inaccessible during this time amounts to downtime that is not 
> acceptable for many applications.
> When the active partition goes down, one of the following occurs:
>  # One of the standby replica partitions gets promoted to active: the replica task 
> has to restore the remaining state from the changelog topic before it can 
> become RUNNING. The time taken for this depends on how much the replica is 
> lagging behind. During this restoration time the state store for that 
> partition is currently not queryable, resulting in downtime for the partition. We 
> can make the state store partition queryable for the data already present in 
> the state store.
>  # When there is no replica or standby task, the active task will be started 
> on one of the existing nodes. That node has to build the entire state from the 
> changelog topic, which can take a lot of time depending on how big the 
> changelog topic is, and keeping the state store non-queryable during this time 
> means downtime for the partition.
> This is a very important improvement, as it could significantly improve the 
> availability of microservices developed using Kafka Streams.
> I am working on a patch for this change. Any feedback or comments are welcome.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6593) Coordinator disconnect in heartbeat thread can cause commitSync to block indefinitely

2018-02-28 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-6593:
---
Priority: Critical  (was: Major)

> Coordinator disconnect in heartbeat thread can cause commitSync to block 
> indefinitely
> -
>
> Key: KAFKA-6593
> URL: https://issues.apache.org/jira/browse/KAFKA-6593
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 1.0.0, 0.11.0.2
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 1.1.0
>
> Attachments: consumer.log
>
>
> If a coordinator disconnect is observed in the heartbeat thread, it can cause 
> a pending offset commit to be cancelled just before the foreground thread 
> begins waiting on its response in poll(). Since the poll timeout is 
> Long.MAX_VALUE, this will cause the consumer to effectively hang until some 
> other network event causes the poll() to return. We try to protect this case 
> with a poll condition on the future, but this isn't bulletproof since the 
> future can be completed outside of the lock.
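
Until the fix lands, one client-side mitigation (a sketch of a workaround, not the fix tracked here) is to bound commitSync() externally by scheduling a wakeup(): the blocked commit then surfaces as a WakeupException instead of hanging forever.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

// Mitigation sketch only: schedule a wakeup() so a commitSync() that would
// otherwise block indefinitely is interrupted and can be retried or logged.
public class BoundedCommit {
    private static final ScheduledExecutorService watchdog =
            Executors.newSingleThreadScheduledExecutor();

    public static void commitWithTimeout(KafkaConsumer<?, ?> consumer, long timeoutMs) {
        ScheduledFuture<?> wakeupTask =
                watchdog.schedule(consumer::wakeup, timeoutMs, TimeUnit.MILLISECONDS);
        try {
            consumer.commitSync();
        } catch (WakeupException e) {
            // commit did not complete in time; offsets will be committed on a later attempt
        } finally {
            // note: a wakeup that fires after commitSync() returns will surface
            // as a WakeupException on the next blocking consumer call
            wakeupTask.cancel(false);
        }
    }
}
{code}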



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-4936) Allow dynamic routing of output records

2018-02-28 Thread Blake Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380868#comment-16380868
 ] 

Blake Miller commented on KAFKA-4936:
-

IMO, such a feature does not need to support automatic topic creation in order 
to be useful. As [~mjsax] pointed out, it adds significant complexity. I might 
consider picking this up. I see from the Confluent Google Group that Damian Guy 
suggested a workaround:

"If you don't know the set of topics then you would need to use a custom 
Processor and you would also need to create an instance of the KafkaProducer. 
Keeping in mind that in order to guarantee at-least-once each producer.send 
would need to be synchronous."

[https://groups.google.com/forum/#!topic/confluent-platform/wnXLKw1-XQk]

I suppose a proper implementation in the KStreams API would do something 
analogous under the hood. Does that sound reasonable?

 

Supporting exactly-once here sounds plausible, but the details are a little 
beyond me at present. It seems like the feature might be useful without that 
as well, since currently the only options seem to be a custom Processor + 
Producer, or falling back to using a Producer & Consumer directly instead of 
Kafka Streams, neither of which supports exactly-once.
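
For concreteness, here is roughly what that workaround looks like with the current Processor API; the routing rule, topic names and config values are made up, and the blocking get() is what buys at-least-once at the cost of throughput (and it does not participate in Streams EOS).

{code:java}
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;

// Sketch of the workaround quoted above: a terminal Processor that owns its own
// producer and picks the destination topic per record.
public class DynamicRoutingProcessor extends AbstractProcessor<String, String> {
    private KafkaProducer<String, String> producer;

    @Override
    public void init(ProcessorContext context) {
        super.init(context);
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption for the sketch
        producer = new KafkaProducer<>(props, new StringSerializer(), new StringSerializer());
    }

    @Override
    public void process(String key, String value) {
        String topic = "events-" + key; // hypothetical routing rule
        try {
            // synchronous send: block until acked, which gives at-least-once here
            producer.send(new ProducerRecord<>(topic, key, value)).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("Failed to forward record to " + topic, e);
        }
    }

    @Override
    public void close() {
        producer.close();
    }
}
{code}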

> Allow dynamic routing of output records
> ---
>
> Key: KAFKA-4936
> URL: https://issues.apache.org/jira/browse/KAFKA-4936
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>  Labels: needs-kip
>
> Currently, all used output topics must be known beforehand, and thus, it's not 
> possible to send output records to topics in a dynamic fashion.
> There have been a couple of requests for this feature and we should consider 
> adding it. There are many open questions, however, with regard to topic 
> creation and configuration (replication factor, number of partitions), etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6598) Kafka to support using ETCD beside Zookeeper

2018-02-28 Thread Colin P. McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380753#comment-16380753
 ] 

Colin P. McCabe commented on KAFKA-6598:


Thanks for looking at this.  We're planning on removing the ZooKeeper 
dependency (the KIP hasn't been posted yet, but will soon).  There are a number 
of advantages in removing the dependency on external systems.  Therefore, it 
shouldn't be necessary to add pluggability at this layer of the code.

> Kafka to support using ETCD beside Zookeeper
> 
>
> Key: KAFKA-6598
> URL: https://issues.apache.org/jira/browse/KAFKA-6598
> Project: Kafka
>  Issue Type: New Feature
>  Components: clients, core
>Reporter: Sebastian Toader
>Priority: Major
>
> The current Kafka implementation is bound to {{Zookeeper}} to store its 
> metadata for forming a cluster of nodes (producer/consumer/broker). 
> As Kafka is becoming popular for streaming in various environments where 
> {{Zookeeper}} is either not easy to deploy/manage or there are better 
> alternatives to it, there is a need 
> to run Kafka with a metastore implementation other than {{Zookeeper}}.
> {{etcd}} can provide the same semantics as {{Zookeeper}} for Kafka, and since 
> {{etcd}} is the favorable choice in certain environments (e.g. Kubernetes), 
> Kafka should be able to run with {{etcd}}.
> From the user's point of view it should be straightforward to configure Kafka 
> to use {{etcd}} by simply specifying a connection string that points to an 
> {{etcd}} cluster.
> To avoid introducing instability, the original interfaces should be kept and 
> only the low-level {{Zookeeper}} API calls should be replaced with {{etcd}} 
> API calls in case Kafka is configured 
> to use {{etcd}}.
> In the long run (which is out of scope of this JIRA) there should be an 
> abstraction layer in Kafka which various metastore implementations would 
> then implement.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-4632) Kafka Connect WorkerSinkTask.closePartitions doesn't handle WakeupException

2018-02-28 Thread Andrey Bratus (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380686#comment-16380686
 ] 

Andrey Bratus edited comment on KAFKA-4632 at 2/28/18 5:17 PM:
---

I am experiencing a similar issue when pausing a connector (version 1.0.0):
{noformat}
WorkerSinkTask{id=cerved-balance-sheets-collaudo-1} Task threw an uncaught and 
unrecoverable exception

org.apache.kafka.common.errors.WakeupException: null
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:441)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:251)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:214)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:190)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:600)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1242)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.doCommitSync(WorkerSinkTask.java:299)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.doCommit(WorkerSinkTask.java:327)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:398)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:547)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at 
org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214){noformat}

  Looks like it happens in the finally clause:
{code:java}
@Override
public void execute() {
    initializeAndStart();
    try {
        while (!isStopping())
            iteration();
    } finally {
        // Make sure any uncommitted data has been committed and the task has
        // a chance to clean up its state
        closePartitions();
    }
}
{code}
 

Should we reopen this?


was (Author: andreybratus):
I am experiencing a similar issue, when pausing a connector (version 1.0.0)

 
WorkerSinkTask\{id=cerved-balance-sheets-collaudo-1} Task threw an uncaught and 
unrecoverable exception
 

The stack trace is:
org.apache.kafka.common.errors.WakeupException: null
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:441)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:251)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:214)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:190)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:600)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1242)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.doCommitSync(WorkerSinkTask.java:299)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.doCommit(WorkerSinkTask.java:327)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:398)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:547)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
 

Looks like it happens in the finally clause:
{code:java}
@Override
public void execute() {
    initializeAndStart();
    try {
        while (!isStopping())
            iteration();
    } finally {
        // Make sure any uncommitted data has been committed and the task has
        // a chance to clean up its state
        closePartitions();
    }
}
{code}
 

Should we reopen this?

> Kafka Connect WorkerSinkTask.closePartitions doesn't handle WakeupException
> ---
>
> Key: KAFKA-4632
> URL: https://issues.apache.org/jira/browse/KAFKA-4632
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0, 0.10.0.1, 0.10.1.0
>Reporter: Scott Reynolds
>Priority: Major
> Fix For: 0.10.0.1, 0.10.1.0
>
>
> WorkerSinkTask's closePartitions method isn't handling WakeupException that 
> can be thrown from commitSync.
> {code}
> org.apache.kafka.common.errors.WakeupException
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup
>  (ConsumerNetworkClient.java:404)
> at 

[jira] [Commented] (KAFKA-4632) Kafka Connect WorkerSinkTask.closePartitions doesn't handle WakeupException

2018-02-28 Thread Andrey Bratus (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380686#comment-16380686
 ] 

Andrey Bratus commented on KAFKA-4632:
--

I am experiencing a similar issue when pausing a connector (version 1.0.0):

 
WorkerSinkTask\{id=cerved-balance-sheets-collaudo-1} Task threw an uncaught and 
unrecoverable exception
 

The stack trace is:
org.apache.kafka.common.errors.WakeupException: null
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:441)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:251)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:214)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:190)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:600)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1242)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.doCommitSync(WorkerSinkTask.java:299)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.doCommit(WorkerSinkTask.java:327)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:398)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:547)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
 

Looks like it happens in the finally clause:
{code:java}
@Override
public void execute() {
    initializeAndStart();
    try {
        while (!isStopping())
            iteration();
    } finally {
        // Make sure any uncommitted data has been committed and the task has
        // a chance to clean up its state
        closePartitions();
    }
}
{code}
 

Should we reopen this?

> Kafka Connect WorkerSinkTask.closePartitions doesn't handle WakeupException
> ---
>
> Key: KAFKA-4632
> URL: https://issues.apache.org/jira/browse/KAFKA-4632
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0, 0.10.0.1, 0.10.1.0
>Reporter: Scott Reynolds
>Priority: Major
> Fix For: 0.10.0.1, 0.10.1.0
>
>
> WorkerSinkTask's closePartitions method isn't handling WakeupException that 
> can be thrown from commitSync.
> {code}
> org.apache.kafka.common.errors.WakeupException
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup
>  (ConsumerNetworkClient.java:404)
> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll 
> (ConsumerNetworkClient.java:245)
> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll 
> (ConsumerNetworkClient.java:180)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync
>  (ConsumerCoordinator.java:499)
> at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync 
> (KafkaConsumer.java:1104)
> at org.apache.kafka.connect.runtime.WorkerSinkTask.doCommitSync 
> (WorkerSinkTask.java:245)
> at org.apache.kafka.connect.runtime.WorkerSinkTask.doCommit 
> (WorkerSinkTask.java:264)
> at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets 
> (WorkerSinkTask.java:305)
> at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions 
> (WorkerSinkTask.java:435)
> at org.apache.kafka.connect.runtime.WorkerSinkTask.execute 
> (WorkerSinkTask.java:147)
> at org.apache.kafka.connect.runtime.WorkerTask.doRun (WorkerTask.java:140)
> at org.apache.kafka.connect.runtime.WorkerTask.run (WorkerTask.java:175)
> at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511)
> at java.util.concurrent.FutureTask.run (FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker 
> (ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run 
> (ThreadPoolExecutor.java:617)
> at java.lang.Thread.run (Thread.java:745)
> {code}
> I believe it should catch it and ignore it as that is what the poll method 
> does when isStopping is true
> {code:java}
> } catch (WakeupException we) {
> log.trace("{} consumer woken up", id);
> if (isStopping())
> return;
> if (shouldPause()) {
> pauseAll();
> } else if (!pausedForRedelivery) {
> resumeAll();
> }
> }
> {code}
> But unsure, love some insight into this.
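
The shape of the suggested handling would presumably mirror that existing catch; a rough standalone sketch (not the actual WorkerSinkTask patch) of tolerating the wakeup during the final commit when the task is stopping:

{code:java}
import org.apache.kafka.common.errors.WakeupException;

// Sketch only: run the final commit and tolerate a WakeupException if the task
// is stopping or pausing, mirroring the catch the poll path already has.
public class CloseCommitGuard {
    public static void commitOnClose(Runnable commitOffsets, boolean stopping) {
        try {
            commitOffsets.run();
        } catch (WakeupException e) {
            if (!stopping) {
                throw e; // unexpected wakeup while still running: propagate
            }
            // expected during shutdown/pause: ignore and continue closing partitions
        }
    }
}
{code}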



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-6601) Kafka manager does not provide consumer offset producer rate with kafka v2.10-0.10.2.0

2018-02-28 Thread Sönke Liebau (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sönke Liebau resolved KAFKA-6601.
-
Resolution: Cannot Reproduce

Hi [~raj6329], 

I've just tested this with kafka_2.10_0.10.2.0 and there doesn't seem to be an 
issue with that metric; I can clearly see it with jconsole. Do you have an 
active consumer that is committing offsets to your cluster? After a server 
restart, the metric will only show up once someone actually produces to that 
topic.

If that doesn't fix it, then I am afraid you should open an issue for this on 
the KafkaManager bugtracker, as I cannot see anything wrong on the Kafka side.

> Kafka manager does not provide consumer offset producer rate with kafka 
> v2.10-0.10.2.0
> --
>
> Key: KAFKA-6601
> URL: https://issues.apache.org/jira/browse/KAFKA-6601
> Project: Kafka
>  Issue Type: Bug
>Reporter: Rajendra Jangir
>Priority: Major
>
> I am using kafka-manager and kafka version 2.10-0.10.2.
> And I am not able to see the producer rate for the __consumer_offsets topic.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6555) Making state store queryable during restoration

2018-02-28 Thread Damian Guy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380293#comment-16380293
 ] 

Damian Guy commented on KAFKA-6555:
---

I think if we are going to allow stale reads from restoring tasks then it 
probably should be configurable, as this may not always be desirable. Imagine a 
situation where there is a lot of data to restore: you could be reading 
different, incorrect values for quite a while. Sure, there may be situations 
where you don't care, but I'm sure there are situations where you do care.

> Making state store queryable during restoration
> ---
>
> Key: KAFKA-6555
> URL: https://issues.apache.org/jira/browse/KAFKA-6555
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Ashish Surana
>Assignee: Ashish Surana
>Priority: Major
>
> State stores in Kafka Streams are currently only queryable when the StreamTask is 
> in the RUNNING state. The idea is to make them queryable even in the RESTORATION 
> (PARTITION_ASSIGNED) state, as the time spent on restoration can be huge, and 
> making the data inaccessible during this time amounts to downtime that is not 
> acceptable for many applications.
> When the active partition goes down, one of the following occurs:
>  # One of the standby replica partitions gets promoted to active: the replica task 
> has to restore the remaining state from the changelog topic before it can 
> become RUNNING. The time taken for this depends on how much the replica is 
> lagging behind. During this restoration time the state store for that 
> partition is currently not queryable, resulting in downtime for the partition. We 
> can make the state store partition queryable for the data already present in 
> the state store.
>  # When there is no replica or standby task, the active task will be started 
> on one of the existing nodes. That node has to build the entire state from the 
> changelog topic, which can take a lot of time depending on how big the 
> changelog topic is, and keeping the state store non-queryable during this time 
> means downtime for the partition.
> This is a very important improvement, as it could significantly improve the 
> availability of microservices developed using Kafka Streams.
> I am working on a patch for this change. Any feedback or comments are welcome.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-5828) Allow explicitly setting polling interval for Kafka connectors

2018-02-28 Thread Behrang Saeedzadeh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380122#comment-16380122
 ] 

Behrang Saeedzadeh commented on KAFKA-5828:
---

Hi,

I think quotas could handle this scenario quite well.

> Allow explicitly setting polling interval for Kafka connectors
> --
>
> Key: KAFKA-5828
> URL: https://issues.apache.org/jira/browse/KAFKA-5828
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Behrang Saeedzadeh
>Priority: Major
>
> I have a Kafka cluster deployed in our internal data center and a Kafka 
> Connect server deployed in AWS that gets data records from a topic on the 
> Kafka cluster and writes them into a Kinesis stream.
> We want to ensure that our Kafka Connect server does not saturate the 
> available bandwidth between our internal data center and AWS.
> Using {{max.partition.fetch.bytes}} we can limit the upper bound of data that 
> can be fetched in each poll call. If we can also configure the poll interval, 
> then we can limit how much data is transferred per partition per second.
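
To make the throughput math concrete, the per-poll fetch cap combined with a fixed poll interval gives a rough upper bound; a back-of-the-envelope sketch with illustrative numbers (the connector/worker property that would carry a poll interval does not exist yet, which is the point of this ticket):

{code:java}
// Back-of-the-envelope sketch: with a fixed poll interval, per-partition
// throughput is capped by max.partition.fetch.bytes divided by that interval.
// All numbers below are illustrative only.
public class FetchBandwidthBound {
    public static void main(String[] args) {
        long maxPartitionFetchBytes = 1_048_576L; // max.partition.fetch.bytes = 1 MiB
        int partitions = 12;                      // partitions assigned to the connector
        long pollIntervalMs = 500L;               // the interval this ticket asks to make configurable

        double bytesPerSecond =
                (double) maxPartitionFetchBytes * partitions * (1000.0 / pollIntervalMs);
        System.out.printf("worst-case fetch rate ~= %.1f MiB/s%n",
                bytesPerSecond / (1024 * 1024));
    }
}
{code}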



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6597) Issues with Zookeeper and Kafka startup in Windows environment

2018-02-28 Thread Alex Dunayevsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Dunayevsky updated KAFKA-6597:
---
Environment: MS Windows 7 Corporate Edition

> Issues with Zookeeper and Kafka startup in Windows environment
> --
>
> Key: KAFKA-6597
> URL: https://issues.apache.org/jira/browse/KAFKA-6597
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.1, 0.11.0.1
> Environment: MS Windows 7 Corporate Edition
>Reporter: Alex Dunayevsky
>Priority: Trivial
>
> Inability to start Zookeeper and Kafka services using standard Kafka .bat 
> utilities for Windows environment
> *Problem 1:* CLASSPATH string not being formed correctly in 
> bin\windows\kafka-run-class.bat.
> |bin\windows\zookeeper-server-start.bat config\zookeeper.properties
>  ... class not found ...|
> *Possible working solution*:
> Assign CLASSPATH correctly in *bin\windows\kafka-run-class.bat:*
> |set CLASSPATH=%~dp0..\..\libs*|
>  
> *Problem 2:* *call :concat* may crash *bin\windows\kafka-run-class.bat* :
> |rem Classpath addition for release
>  call :concat %BASE_DIR%\libs*|
> *Possible working solution:*
> Comment or delete those lines of code.
> |rem Classpath addition for release
>  rem call :concat %BASE_DIR%\libs*|
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)