[jira] [Commented] (KAFKA-7358) Alternative Partitioner to Support "Always Round-Robin" Selection

2018-09-06 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606712#comment-16606712
 ] 

Matthias J. Sax commented on KAFKA-7358:


Is this a duplicate of https://issues.apache.org/jira/browse/KAFKA- ?

> Alternative Partitioner to Support "Always Round-Robin" Selection
> -
>
> Key: KAFKA-7358
> URL: https://issues.apache.org/jira/browse/KAFKA-7358
> Project: Kafka
>  Issue Type: Wish
>  Components: clients
>Reporter: M. Manna
>Assignee: M. Manna
>Priority: Minor
>
> In my organisation, we have been using Kafka as the basic publish-subscribe 
> messaging system. Our goal is to send event-based (secure, encrypted) SQL 
> messages reliably, and to process them accordingly. For us, the message keys 
> represent some metadata which we use to either ignore messages (if a loopback 
> to the sender) or log some information. We have the following use case for 
> messaging:
> 1) A database transaction event takes place.
> 2) The event is captured and messaged across 10 data centres all around the 
> world.
> 3) A group of consumers (one group per data centre, each with a unique 
> consumer-group ID) will process messages from their respective partitions, 
> with 1 consumer per partition.
> Under these circumstances, we only need a guarantee that the same message won't 
> be sent to multiple partitions. In other words, 1 partition will +never+ be 
> sought by multiple consumers.
> Using DefaultPartitioner, we can achieve this only with NULL keys. But since 
> we need keys for metadata, we cannot maintain "round-robin" selection of 
> partitions, because a key hash determines which partition is chosen. We need 
> round-robin style selection regardless of key type (NULL or not-NULL).
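
A minimal sketch of what such an always-round-robin partitioner could look like
(illustrative only, not the patch attached to this ticket; the class name is made up):
{code:java}
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Illustrative only: assigns partitions round-robin and ignores the key entirely.
public class AlwaysRoundRobinPartitioner implements Partitioner {

    private final AtomicInteger counter = new AtomicInteger(0);

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // Next partition in strict rotation, whether or not the key is null.
        return Math.abs(counter.getAndIncrement() % numPartitions);
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
{code}
A producer would pick such a class up via the {{partitioner.class}} configuration.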



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6260) AbstractCoordinator not clearly handles NULL Exception

2018-09-06 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6260:
---
Component/s: consumer

> AbstractCoordinator not clearly handles NULL Exception
> --
>
> Key: KAFKA-6260
> URL: https://issues.apache.org/jira/browse/KAFKA-6260
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 1.0.0
> Environment: RedHat Linux
>Reporter: Seweryn Habdank-Wojewodzki
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 1.0.1, 1.1.0
>
>
> The error reporting is not clear. But it seems that the Kafka heartbeat shuts 
> down the application due to a NullPointerException caused by "fake" 
> disconnections.
> One more comment. We are processing messages in the stream, but sometimes we 
> have to block processing for minutes, as the consumers cannot handle too much 
> load. Is it possible that when the stream is waiting, the heartbeat is blocked 
> as well?
> Can you check that?
> {code}
> 2017-11-23 23:54:47 DEBUG AbstractCoordinator:177 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Received successful Heartbeat response
> 2017-11-23 23:54:50 DEBUG AbstractCoordinator:183 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Sending Heartbeat request to coordinator 
> cljp01.eb.lan.at:9093 (id: 2147483646 rack: null)
> 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Sending HEARTBEAT 
> {group_id=kafka-endpoint,generation_id=3834,member_id=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer-94f18be5-e49a-4817-9e5a-fe82a64e0b08}
>  with correlation id 24 to node 2147483646
> 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Completed receive from node 2147483646 for HEARTBEAT 
> with correlation id 24, received {throttle_time_ms=0,error_code=0}
> 2017-11-23 23:54:50 DEBUG AbstractCoordinator:177 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Received successful Heartbeat response
> 2017-11-23 23:54:52 DEBUG NetworkClient:183 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Disconnecting from node 1 due to request timeout.
> 2017-11-23 23:54:52 TRACE NetworkClient:135 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Cancelled request 
> {replica_id=-1,max_wait_time=6000,min_bytes=1,max_bytes=52428800,isolation_level=0,topics=[{topic=clj_internal_topic,partitions=[{partition=6,fetch_offset=211558395,log_start_offset=-1,max_bytes=1048576},{partition=8,fetch_offset=210178209,log_start_offset=-1,max_bytes=1048576},{partition=0,fetch_offset=209353523,log_start_offset=-1,max_bytes=1048576},{partition=2,fetch_offset=209291462,log_start_offset=-1,max_bytes=1048576},{partition=4,fetch_offset=210728595,log_start_offset=-1,max_bytes=1048576}]}]}
>  with correlation id 21 due to node 1 being disconnected
> 2017-11-23 23:54:52 DEBUG ConsumerNetworkClient:195 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Cancelled FETCH request RequestHeader(apiKey=FETCH, 
> apiVersion=6, 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  correlationId=21) with correlation id 21 due to node 1 being disconnected
> 2017-11-23 23:54:52 DEBUG Fetcher:195 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Fetch request 
> {clj_internal_topic-6=(offset=211558395, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-8=(offset=210178209, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-0=(offset=209353523, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-2=(offset=209291462, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-4=(offset=210728595, logStartOffset=-1, 
> maxBytes=1048576)} to cljp01.eb.lan.at:9093 (id: 1 rack: DC-1) failed 
> org.apache.kafka.common.errors.DisconnectException: null
> 2017-11-23 23:54:52 TRACE NetworkClient:123 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Found least loaded node cljp01.eb.lan.at:9093 (id: 1 
> rack: DC-1)
> 2017-11-23 23:54:52 DEBUG 

[jira] [Updated] (KAFKA-6457) Error: NOT_LEADER_FOR_PARTITION leads to NPE

2018-09-06 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6457:
---
Component/s: (was: streams)
 consumer

> Error: NOT_LEADER_FOR_PARTITION leads to NPE
> 
>
> Key: KAFKA-6457
> URL: https://issues.apache.org/jira/browse/KAFKA-6457
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 1.0.0
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Major
> Fix For: 1.0.1
>
>
> One of our nodes was dead. Then the second one took over all responsibility.
> But the streaming application crashed in the meantime due to an NPE caused by 
> {{Error: NOT_LEADER_FOR_PARTITION}}.
> The stack trace is below.
>  
> Is this expected?
>  
> {code:java}
> 2018-01-17 11:47:21 [my] [WARN ] Sender:251 - [Producer ...2018-01-17 
> 11:47:21 [my] [WARN ] Sender:251 - [Producer 
> clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-producer]
>  Got error produce response with correlation id 768962 on topic-partition 
> my_internal_topic-5, retrying (9 attempts left). Error: 
> NOT_LEADER_FOR_PARTITION
> 2018-01-17 11:47:21 [my] [WARN ] Sender:251 - [Producer 
> clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-producer]
>  Got error produce response with correlation id 768962 on topic-partition 
> my_internal_topic-7, retrying (9 attempts left). Error: 
> NOT_LEADER_FOR_PARTITION
> 2018-01-17 11:47:21 [my] [ERROR] AbstractCoordinator:296 - [Consumer 
> clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-consumer,
>  groupId=restreamer-my] Heartbeat thread for group restreamer-my failed due 
> to unexpected error
> java.lang.NullPointerException: null
>     at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:436) 
> ~[my-restreamer.jar:?]
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:395) 
> ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) 
> ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:238)
>  ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:275)
>  ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:934)
>  [my-restreamer.jar:?]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (KAFKA-6457) Error: NOT_LEADER_FOR_PARTITION leads to NPE

2018-09-06 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reopened KAFKA-6457:


> Error: NOT_LEADER_FOR_PARTITION leads to NPE
> 
>
> Key: KAFKA-6457
> URL: https://issues.apache.org/jira/browse/KAFKA-6457
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Major
> Fix For: 1.0.1
>
>
> One of our nodes was dead. Then the second one took over all responsibility.
> But the streaming application crashed in the meantime due to an NPE caused by 
> {{Error: NOT_LEADER_FOR_PARTITION}}.
> The stack trace is below.
>  
> Is this expected?
>  
> {code:java}
> 2018-01-17 11:47:21 [my] [WARN ] Sender:251 - [Producer ...2018-01-17 
> 11:47:21 [my] [WARN ] Sender:251 - [Producer 
> clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-producer]
>  Got error produce response with correlation id 768962 on topic-partition 
> my_internal_topic-5, retrying (9 attempts left). Error: 
> NOT_LEADER_FOR_PARTITION
> 2018-01-17 11:47:21 [my] [WARN ] Sender:251 - [Producer 
> clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-producer]
>  Got error produce response with correlation id 768962 on topic-partition 
> my_internal_topic-7, retrying (9 attempts left). Error: 
> NOT_LEADER_FOR_PARTITION
> 2018-01-17 11:47:21 [my] [ERROR] AbstractCoordinator:296 - [Consumer 
> clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-consumer,
>  groupId=restreamer-my] Heartbeat thread for group restreamer-my failed due 
> to unexpected error
> java.lang.NullPointerException: null
>     at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:436) 
> ~[my-restreamer.jar:?]
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:395) 
> ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) 
> ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:238)
>  ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:275)
>  ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:934)
>  [my-restreamer.jar:?]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-6457) Error: NOT_LEADER_FOR_PARTITION leads to NPE

2018-09-06 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-6457.

Resolution: Duplicate

> Error: NOT_LEADER_FOR_PARTITION leads to NPE
> 
>
> Key: KAFKA-6457
> URL: https://issues.apache.org/jira/browse/KAFKA-6457
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Major
> Fix For: 1.0.1
>
>
> One of our nodes was dead. Then the second one took over all responsibility.
> But the streaming application crashed in the meantime due to an NPE caused by 
> {{Error: NOT_LEADER_FOR_PARTITION}}.
> The stack trace is below.
>  
> Is this expected?
>  
> {code:java}
> 2018-01-17 11:47:21 [my] [WARN ] Sender:251 - [Producer ...2018-01-17 
> 11:47:21 [my] [WARN ] Sender:251 - [Producer 
> clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-producer]
>  Got error produce response with correlation id 768962 on topic-partition 
> my_internal_topic-5, retrying (9 attempts left). Error: 
> NOT_LEADER_FOR_PARTITION
> 2018-01-17 11:47:21 [my] [WARN ] Sender:251 - [Producer 
> clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-producer]
>  Got error produce response with correlation id 768962 on topic-partition 
> my_internal_topic-7, retrying (9 attempts left). Error: 
> NOT_LEADER_FOR_PARTITION
> 2018-01-17 11:47:21 [my] [ERROR] AbstractCoordinator:296 - [Consumer 
> clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-consumer,
>  groupId=restreamer-my] Heartbeat thread for group restreamer-my failed due 
> to unexpected error
> java.lang.NullPointerException: null
>     at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:436) 
> ~[my-restreamer.jar:?]
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:395) 
> ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) 
> ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:238)
>  ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:275)
>  ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:934)
>  [my-restreamer.jar:?]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7373) GetOffsetShell doesn't work when SSL authentication is enabled

2018-09-06 Thread Andy Bryant (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606492#comment-16606492
 ] 

Andy Bryant commented on KAFKA-7373:


Thanks Stanislav - good spot on the KIP. Hopefully that one makes it into 2.1.0.

Regarding the OutOfMemoryError, set the following before the call to increase the 
default heap allocation for the tool:
{code:java}
export KAFKA_HEAP_OPTS="-Xms512m -Xmx1g"{code}
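The variable can also be set just for the invocation, for example (the broker 
list and topic below are placeholders):
{code:java}
KAFKA_HEAP_OPTS="-Xms512m -Xmx1g" ./kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list ${BROKER_LIST} --topic my-topic{code}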

> GetOffsetShell doesn't work when SSL authentication is enabled
> --
>
> Key: KAFKA-7373
> URL: https://issues.apache.org/jira/browse/KAFKA-7373
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Andy Bryant
>Priority: Major
>
> GetOffsetShell doesn't provide a mechanism to pass additional configuration to 
> the underlying KafkaConsumer, as the `ConsoleConsumer` does. Passing SSL config 
> as system properties doesn't propagate to the consumer either.
> {code:java}
> 10:47 $ ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 
> ${BROKER_LIST} --topic cld-dev-sor-crods-crodsdba_contact
> Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: 
> Timeout expired while fetching topic metadata{code}
> Editing {{GetOffsetShell.scala}} to include the SSL properties in the 
> KafkaConsumer configuration resolved the issue.
> Providing {{consumer-property}} and {{consumer-config}} configuration options 
> for {{kafka-run-class.sh}}, or creating a separate run script for offsets, and 
> using these properties in {{GetOffsetShell.scala}} seems like a good solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-5235) GetOffsetShell: retrieve offsets for all given topics and partitions with single request to the broker

2018-09-06 Thread Andy Bryant (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606488#comment-16606488
 ] 

Andy Bryant commented on KAFKA-5235:


Looking forward to this KIP being implemented. We currently can't use 
GetOffsetShell as it stands with our cluster, since our cluster uses SSL 
authentication (see KAFKA-7373). Having consistent parameter values will be a 
welcome change too.

> GetOffsetShell: retrieve offsets for all given topics and partitions with 
> single request to the broker
> --
>
> Key: KAFKA-5235
> URL: https://issues.apache.org/jira/browse/KAFKA-5235
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Arseniy Tashoyan
>Priority: Major
>  Labels: kip, tool
> Fix For: 2.1.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> GetOffsetShell is implemented on the old SimpleConsumer. It needs ZooKeeper to 
> retrieve metadata about topics and partitions. At present, GetOffsetShell 
> does the following:
> - get metadata from Zookeeper
> - iterate over partitions
> - for each partition, connect to its leader broker and request offsets
> Instead, GetOffsetShell can use the new KafkaConsumer and retrieve offsets by 
> means of the endOffsets(), beginningOffsets() and offsetsForTimes() methods. One 
> request is sufficient for all topics and partitions.
> Once GetOffsetShell is re-implemented with the new KafkaConsumer API, it will 
> no longer depend on obsolete APIs: SimpleConsumer and the old producer API.
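
A minimal sketch of the consumer-based approach (illustrative only; the bootstrap 
server, topic name and class name are placeholders):
{code:java}
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class OffsetLookup {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // One metadata lookup for the topic, then one request each for the
            // beginning and end offsets of all its partitions.
            List<TopicPartition> partitions = consumer.partitionsFor("my-topic").stream()
                .map(p -> new TopicPartition(p.topic(), p.partition()))
                .collect(Collectors.toList());
            Map<TopicPartition, Long> earliest = consumer.beginningOffsets(partitions);
            Map<TopicPartition, Long> latest = consumer.endOffsets(partitions);
            latest.forEach((tp, end) ->
                System.out.println(tp + " earliest=" + earliest.get(tp) + " latest=" + end));
        }
    }
}
{code}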



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7044) kafka-consumer-groups.sh NullPointerException describing round robin or sticky assignors

2018-09-06 Thread Anna Povzner (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606375#comment-16606375
 ] 

Anna Povzner commented on KAFKA-7044:
-

Hi [~vahid], We are seeing this bug a lot recently, and really want to get it 
fixed asap. I will try Jason's suggestion and open a PR if it works. I hope 
that's ok with you. 

> kafka-consumer-groups.sh NullPointerException describing round robin or 
> sticky assignors
> 
>
> Key: KAFKA-7044
> URL: https://issues.apache.org/jira/browse/KAFKA-7044
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 1.1.0
> Environment: CentOS 7.4, Oracle JDK 1.8
>Reporter: Jeff Field
>Assignee: Vahid Hashemian
>Priority: Major
> Fix For: 2.1.0
>
>
> We've recently moved to using the round robin assignor for one of our 
> consumer groups, and started testing the sticky assignor. In both cases, 
> using Kafka 1.1.0 we get a null pointer exception *unless* the group being 
> described is rebalancing:
> {code:java}
> kafka-consumer-groups --bootstrap-server fqdn:9092 --describe --group 
> groupname-for-consumer
> Error: Executing consumer group command failed due to null
> [2018-06-12 01:32:34,179] DEBUG Exception in consumer group command 
> (kafka.admin.ConsumerGroupCommand$)
> java.lang.NullPointerException
> at scala.Predef$.Long2long(Predef.scala:363)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$getLogEndOffsets$2.apply(ConsumerGroupCommand.scala:612)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$getLogEndOffsets$2.apply(ConsumerGroupCommand.scala:610)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.List.foreach(List.scala:392)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.immutable.List.map(List.scala:296)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.getLogEndOffsets(ConsumerGroupCommand.scala:610)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describePartitions(ConsumerGroupCommand.scala:328)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.collectConsumerAssignment(ConsumerGroupCommand.scala:308)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.collectConsumerAssignment(ConsumerGroupCommand.scala:544)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10$$anonfun$13.apply(ConsumerGroupCommand.scala:571)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10$$anonfun$13.apply(ConsumerGroupCommand.scala:565)
> at scala.collection.immutable.List.flatMap(List.scala:338)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10.apply(ConsumerGroupCommand.scala:565)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10.apply(ConsumerGroupCommand.scala:558)
> at scala.Option.map(Option.scala:146)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.collectGroupOffsets(ConsumerGroupCommand.scala:558)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describeGroup(ConsumerGroupCommand.scala:271)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.describeGroup(ConsumerGroupCommand.scala:544)
> at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:77)
> at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
> [2018-06-12 01:32:34,255] DEBUG Removed sensor with name connections-closed: 
> (org.apache.kafka.common.metrics.Metrics){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7383) Verify leader epoch in produce requests (KIP-359)

2018-09-06 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-7383:
--

 Summary: Verify leader epoch in produce requests (KIP-359)
 Key: KAFKA-7383
 URL: https://issues.apache.org/jira/browse/KAFKA-7383
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Reporter: Jason Gustafson
Assignee: Jason Gustafson


Implementation of 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-359%3A+Verify+leader+epoch+in+produce+requests.
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6817) UnknownProducerIdException when writing messages with old timestamps

2018-09-06 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606362#comment-16606362
 ] 

Matthias J. Sax commented on KAFKA-6817:


Yes, it is marked as affecting `1.1` and is thus an issue for `2.0`, too 
(otherwise, it would be marked as fixed in `2.0`). It's a broker-side issue, and 
thus a broker upgrade will be required after it is fixed.

> UnknownProducerIdException when writing messages with old timestamps
> 
>
> Key: KAFKA-6817
> URL: https://issues.apache.org/jira/browse/KAFKA-6817
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 1.1.0
>Reporter: Odin Standal
>Priority: Major
>
> We are seeing the following exception in our Kafka application: 
> {code:java}
> ERROR o.a.k.s.p.internals.StreamTask - task [0_0] Failed to close producer 
> due to the following error: org.apache.kafka.streams.errors.StreamsException: 
> task [0_0] Abort sending since an error caught with a previous record (key 
> 22 value some-value timestamp 1519200902670) to topic 
> exactly-once-test-topic- v2 due to This exception is raised by the broker if 
> it could not locate the producer metadata associated with the producerId in 
> question. This could happen if, for instance, the producer's records were 
> deleted because their retention time had elapsed. Once the last records of 
> the producerId are removed, the producer's metadata is removed from the 
> broker, and future appends by the producer will return this exception. at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:125)
>  at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:48)
>  at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:180)
>  at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1199)
>  at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
>  at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
>  at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:627) 
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:596) 
> at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:557)
>  at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:481)
>  at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 
> at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:692)
>  at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101) 
> at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:482)
>  at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:474) at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239) at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163) at 
> java.lang.Thread.run(Thread.java:748) Caused by: 
> org.apache.kafka.common.errors.UnknownProducerIdException
> {code}
> We discovered this error when we had the need to reprocess old messages. See 
> more details on 
> [Stackoverflow|https://stackoverflow.com/questions/49872827/unknownproduceridexception-in-kafka-streams-when-enabling-exactly-once?noredirect=1#comment86901917_49872827]
> We have reproduced the error with a smaller example application. The error 
> occurs after 10 minutes of producing messages that have old timestamps 
> (typically 1 year old). The topic we are writing to has retention.ms set to 1 
> year, so we are expecting the messages to stay there.
> After digging through the ProducerStateManager-code in the Kafka source code 
> we have a theory of what might be wrong.
> The ProducerStateManager.removeExpiredProducers() seems to remove producers 
> from memory erroneously when processing records which are older than the 
> maxProducerIdExpirationMs (coming from the `transactional.id.expiration.ms` 
> configuration), which is set by default to 7 days. 
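
For reference, the broker setting the description refers to, shown with its 
default value (changing it is at most a possible mitigation, not a confirmed fix):
{code:java}
# broker configuration (server.properties)
# default: 604800000 ms (7 days); producer metadata for producers idle longer
# than this may be expired by the broker
transactional.id.expiration.ms=604800000
{code}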



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-5117) Kafka Connect REST endpoints reveal Password typed values

2018-09-06 Thread satyanarayan komandur (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606326#comment-16606326
 ] 

satyanarayan komandur commented on KAFKA-5117:
--

I would like to add a couple more points related to this KIP.

Currently, I noticed that even accessing the endpoint 
connectors/\{connector-name}/status also hits the configuration. I think this 
endpoint need not gather config information.

> Kafka Connect REST endpoints reveal Password typed values
> -
>
> Key: KAFKA-5117
> URL: https://issues.apache.org/jira/browse/KAFKA-5117
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.0
>Reporter: Thomas Holmes
>Priority: Major
>  Labels: needs-kip
>
> A Kafka Connect connector can specify ConfigDef keys with a type of Password. 
> This type was added to prevent logging the values ("[hidden]" is logged 
> instead).
> This change does not apply to the values returned by executing a GET on 
> {{connectors/\{connector-name\}}} and 
> {{connectors/\{connector-name\}/config}}. This creates an easily accessible 
> way for an attacker who has infiltrated your network to gain access to 
> potential secrets that should not be available.
> I have started on a code change that addresses this issue by parsing the 
> config values through the ConfigDef for the connector and returning their 
> output instead (which leads to the masking of Password typed configs as 
> [hidden]).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7214) Mystic FATAL error

2018-09-06 Thread John Roesler (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606303#comment-16606303
 ] 

John Roesler commented on KAFKA-7214:
-

Hi [~habdank],

If I understood the scenario, you get stability problems when you increase the 
load by 50x but only increase memory 6x. This doesn't seem too surprising.

Following a simple linear projection, if your app is healthy at 100 msg/s with 
300 MB heap, when you increase the load by 50x (if it were heap-constrained to 
begin with) you also need 50x the heap, which puts you at 15GB. The fact that 
you are healthy at 1/3 this, or 5GB, indicates that it actually wasn't heap 
constrained to begin with.

It seems like your initial hypothesis was that you needed 3MB per msg/sec, and 
your new hypothesis is that you need 1MB per msg/sec. So if you scale up again, 
you can use this as a starting point and adjust down or up, depending on how 
the system performs.

Note that on the lower end of the spectrum, the overhead will tend to dominate, 
so I'm not sure if you can run 100 msg/s in only 100MB of heap, and you almost 
certainly cannot run 1 msg/s in 1MB of heap.

Scaling up, the data itself will dominate the heap, but you'll find that there 
is also a limit to this thinking, as Java performs poorly with very large heaps 
(like the terabyte range).

 

About your analysis:

> 5000 Msg/s ~ 150 000 Mgs/30 sec ~ 150 MB

This is a good lower bound, since you know that at a minimum all the live 
messages must reside in memory, but it is not likely to be a tight lower bound.

This assumes that there is no overhead at all, i.e., that the only thing in 
the heap is the messages themselves. That cannot be true, since the JVM has 
overhead of its own, Streams and the clients have their own data structures 
to maintain, each message is also resident a couple of times due to 
serialization and deserialization, and finally, because Java is memory managed, 
every object continues to occupy heap after it is no longer live until it gets 
collected.

 

Does this help?

-John

 

> Mystic FATAL error
> --
>
> Key: KAFKA-7214
> URL: https://issues.apache.org/jira/browse/KAFKA-7214
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.3, 1.1.1
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Critical
>
> Dears,
> Very often at startup of the streaming application I get this exception:
> {code}
> Exception caught in process. taskId=0_1, processor=KSTREAM-SOURCE-00, 
> topic=my_instance_medium_topic, partition=1, offset=198900203; 
> [org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:212),
>  
> org.apache.kafka.streams.processor.internals.AssignedTasks$2.apply(AssignedTasks.java:347),
>  
> org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:420),
>  
> org.apache.kafka.streams.processor.internals.AssignedTasks.process(AssignedTasks.java:339),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.processAndPunctuate(StreamThread.java:648),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:513),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:482),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:459)]
>  in thread 
> my_application-my_instance-my_instance_medium-72ee1819-edeb-4d85-9d65-f67f7c321618-StreamThread-62
> {code}
> and then (without a shutdown request from my side):
> {code}
> 2018-07-30 07:45:02 [ar313] [INFO ] StreamThread:912 - stream-thread 
> [my_application-my_instance-my_instance-72ee1819-edeb-4d85-9d65-f67f7c321618-StreamThread-62]
>  State transition from PENDING_SHUTDOWN to DEAD.
> {code}
> What is this?
> How do I handle it correctly?
> Thanks in advance for your help.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-4544) Add system tests for delegation token based authentication

2018-09-06 Thread Manikumar (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606206#comment-16606206
 ] 

Manikumar commented on KAFKA-4544:
--

[~asasvari]

_>> Initially, I wanted to use console_consumer.py and verifiable clients to 
validate things (messages produced / consumed), but I ran into some issues:_

We can use the "sasl.jaas.config" client config property to pass token 
credentials. With this we can avoid a jaas.conf for token authentication, which 
simplifies the code. We should be able to pass this property to 
console_consumer.py and the verifiable clients.

{code:java}
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule 
required \
 username="BQoHvMSzRpCC0Z8EHnCwkA" \
 
password="WKzmngyxSyoEdlkzUQoP7TsTrsNzeMC6+aKg7S0oeLkV+dnzBMjYo3tTtlAYYSFmLs4bTjf+lTZ1LCHR/ZZFNA=="
 \
 tokenauth="true";
{code}
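
For example, a client properties file along these lines could be passed to the 
console consumer via --consumer.config (the mechanism, listener and credentials 
below are placeholders):
{code:java}
# client.properties - illustrative only
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="<token-id>" \
  password="<token-hmac>" \
  tokenauth="true";
{code}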



> Add system tests for delegation token based authentication
> --
>
> Key: KAFKA-4544
> URL: https://issues.apache.org/jira/browse/KAFKA-4544
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Ashish Singh
>Assignee: Attila Sasvari
>Priority: Major
>
> Add system tests for delegation token based authentication.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-6817) UnknownProducerIdException when writing messages with old timestamps

2018-09-06 Thread Collin Scangarella (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606121#comment-16606121
 ] 

Collin Scangarella edited comment on KAFKA-6817 at 9/6/18 5:28 PM:
---

Does this bug impact 2.0? If not, would a streams upgrade be enough? Or would 
we have to upgrade the brokers as well?


was (Author: col...@scangarella.com):
Does this bug impact 2.0?

> UnknownProducerIdException when writing messages with old timestamps
> 
>
> Key: KAFKA-6817
> URL: https://issues.apache.org/jira/browse/KAFKA-6817
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 1.1.0
>Reporter: Odin Standal
>Priority: Major
>
> We are seeing the following exception in our Kafka application: 
> {code:java}
> ERROR o.a.k.s.p.internals.StreamTask - task [0_0] Failed to close producer 
> due to the following error: org.apache.kafka.streams.errors.StreamsException: 
> task [0_0] Abort sending since an error caught with a previous record (key 
> 22 value some-value timestamp 1519200902670) to topic 
> exactly-once-test-topic- v2 due to This exception is raised by the broker if 
> it could not locate the producer metadata associated with the producerId in 
> question. This could happen if, for instance, the producer's records were 
> deleted because their retention time had elapsed. Once the last records of 
> the producerId are removed, the producer's metadata is removed from the 
> broker, and future appends by the producer will return this exception. at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:125)
>  at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:48)
>  at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:180)
>  at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1199)
>  at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
>  at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
>  at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:627) 
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:596) 
> at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:557)
>  at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:481)
>  at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 
> at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:692)
>  at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101) 
> at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:482)
>  at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:474) at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239) at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163) at 
> java.lang.Thread.run(Thread.java:748) Caused by: 
> org.apache.kafka.common.errors.UnknownProducerIdException
> {code}
> We discovered this error when we had the need to reprocess old messages. See 
> more details on 
> [Stackoverflow|https://stackoverflow.com/questions/49872827/unknownproduceridexception-in-kafka-streams-when-enabling-exactly-once?noredirect=1#comment86901917_49872827]
> We have reproduced the error with a smaller example application. The error 
> occurs after 10 minutes of producing messages that have old timestamps 
> (typically 1 year old). The topic we are writing to has retention.ms set to 1 
> year, so we are expecting the messages to stay there.
> After digging through the ProducerStateManager-code in the Kafka source code 
> we have a theory of what might be wrong.
> The ProducerStateManager.removeExpiredProducers() seems to remove producers 
> from memory erroneously when processing records which are older than the 
> maxProducerIdExpirationMs (coming from the `transactional.id.expiration.ms` 
> configuration), which is set by default to 7 days. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7093) Kafka warn messages after upgrade from 0.11.0.1 to 1.1.0

2018-09-06 Thread William Hammond (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606126#comment-16606126
 ] 

William Hammond commented on KAFKA-7093:


Seeing the same issue as well, only on the __consumer_offsets partitions. 

> Kafka warn messages after upgrade from 0.11.0.1 to 1.1.0
> 
>
> Key: KAFKA-7093
> URL: https://issues.apache.org/jira/browse/KAFKA-7093
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.1.0
>Reporter: Suleyman
>Priority: Major
>
> I upgraded Kafka from version 0.11.0.1 to 1.1.0. After the upgrade, I'm 
> getting the warn message below very frequently.
> WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. 
> This implies messages have arrived out of order. New: \{epoch:0, 
> offset:793868383}, Current: \{epoch:4, offset:792201264} for Partition: 
> __consumer_offsets-42 (kafka.server.epoch.LeaderEpochFileCache) 
> How can I resolve these warn messages? And why am I getting them?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6817) UnknownProducerIdException when writing messages with old timestamps

2018-09-06 Thread Collin Scangarella (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606121#comment-16606121
 ] 

Collin Scangarella commented on KAFKA-6817:
---

Does this bug impact 2.0?

> UnknownProducerIdException when writing messages with old timestamps
> 
>
> Key: KAFKA-6817
> URL: https://issues.apache.org/jira/browse/KAFKA-6817
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 1.1.0
>Reporter: Odin Standal
>Priority: Major
>
> We are seeing the following exception in our Kafka application: 
> {code:java}
> ERROR o.a.k.s.p.internals.StreamTask - task [0_0] Failed to close producer 
> due to the following error: org.apache.kafka.streams.errors.StreamsException: 
> task [0_0] Abort sending since an error caught with a previous record (key 
> 22 value some-value timestamp 1519200902670) to topic 
> exactly-once-test-topic- v2 due to This exception is raised by the broker if 
> it could not locate the producer metadata associated with the producerId in 
> question. This could happen if, for instance, the producer's records were 
> deleted because their retention time had elapsed. Once the last records of 
> the producerId are removed, the producer's metadata is removed from the 
> broker, and future appends by the producer will return this exception. at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:125)
>  at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:48)
>  at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:180)
>  at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1199)
>  at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
>  at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
>  at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:627) 
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:596) 
> at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:557)
>  at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:481)
>  at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 
> at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:692)
>  at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101) 
> at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:482)
>  at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:474) at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239) at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163) at 
> java.lang.Thread.run(Thread.java:748) Caused by: 
> org.apache.kafka.common.errors.UnknownProducerIdException
> {code}
> We discovered this error when we had the need to reprocess old messages. See 
> more details on 
> [Stackoverflow|https://stackoverflow.com/questions/49872827/unknownproduceridexception-in-kafka-streams-when-enabling-exactly-once?noredirect=1#comment86901917_49872827]
> We have reproduced the error with a smaller example application. The error 
> occurs after 10 minutes of producing messages that have old timestamps 
> (typically 1 year old). The topic we are writing to has retention.ms set to 1 
> year, so we are expecting the messages to stay there.
> After digging through the ProducerStateManager-code in the Kafka source code 
> we have a theory of what might be wrong.
> The ProducerStateManager.removeExpiredProducers() seems to remove producers 
> from memory erroneously when processing records which are older than the 
> maxProducerIdExpirationMs (coming from the `transactional.id.expiration.ms` 
> configuration), which is set by default to 7 days. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6699) When one of two Kafka nodes are dead, streaming API cannot handle messaging

2018-09-06 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606059#comment-16606059
 ] 

Matthias J. Sax commented on KAFKA-6699:


It's recommended to run with at least 3 brokers, replication factor 3, 
min.insync.replicas = 2, and acks=all to guarantee consistency if one broker 
goes down.

With your setting, i.e., acks=1, a write is acknowledged once the leader has 
received it, but before it is replicated. Thus, if the leader broker fails 
before replication happens, you might lose the write (even though it was acked 
to the producer client).

Also, with min.insync.replicas=2 and only two brokers, if one broker goes down 
you are no longer able to write. This is the reason why your Streams application 
starts to fail: the cluster cannot provide 2 in-sync replicas, and thus rejects 
writes. You can either add one more broker, or reduce min.insync.replicas to 1, 
to allow writes when one broker is down.

Overall, this ticket seems to be a config issue rather than a bug. In general, 
please use the mailing list for questions and open tickets only if you know it's 
a bug. If you can verify that changing the config/setup resolves your issue, 
please report back so we can close this ticket.
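
A minimal sketch of those settings (illustrative values; assumes a 3-broker cluster):
{code:java}
# broker configuration (server.properties)
default.replication.factor=3
min.insync.replicas=2

# producer configuration - wait for all in-sync replicas to acknowledge
acks=all
{code}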

> When one of two Kafka nodes are dead, streaming API cannot handle messaging
> ---
>
> Key: KAFKA-6699
> URL: https://issues.apache.org/jira/browse/KAFKA-6699
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.2
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Major
>
> Dears,
> I am observing quite often that when the Kafka broker is partly dead(*), 
> applications which use the Streams API do nothing.
> (*) Partly dead in my case means that one of the two Kafka nodes is out of 
> order. 
> Especially when the disk is full on one machine, the broker goes into some 
> strange state where the Streams API goes on vacation. The regular 
> producer/consumer API seems to have no problem in such a case.
> Can you have a look at this matter?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (KAFKA-5882) NullPointerException in StreamTask

2018-09-06 Thread Seweryn Habdank-Wojewodzki (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seweryn Habdank-Wojewodzki closed KAFKA-5882.
-

No longer visible in Kafka 1.1.1.

> NullPointerException in StreamTask
> --
>
> Key: KAFKA-5882
> URL: https://issues.apache.org/jira/browse/KAFKA-5882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Major
> Fix For: 1.1.1
>
>
> It seems the bugfix [KAFKA-5073|https://issues.apache.org/jira/browse/KAFKA-5073] 
> was made, but it introduced some other issue.
> In some cases (I am not sure which ones) I get an NPE (below).
> I would expect that even in the case of a FATAL error, anything except an NPE 
> is thrown.
> {code}
> 2017-09-12 23:34:54 ERROR ConsumerCoordinator:269 - User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group streamer failed on partition assignment
> java.lang.NullPointerException: null
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:123)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043) 
> [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
>  [myapp-streamer.jar:?]
> 2017-09-12 23:34:54 INFO  StreamThread:1040 - stream-thread 
> [streamer-3a44578b-faa8-4b5b-bbeb-7a7f04639563-StreamThread-1] Shutting down
> 2017-09-12 23:34:54 INFO  KafkaProducer:972 - Closing the Kafka producer with 
> timeoutMillis = 9223372036854775807 ms.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-5882) NullPointerException in StreamTask

2018-09-06 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606053#comment-16606053
 ] 

Guozhang Wang commented on KAFKA-5882:
--

Thanks [~habdank] for confirming. Since [~vikasamar] mentioned they see the 
issue only in 0.11.0.x, I think it is okay to resolve it as fixed in 1.1.1 for 
now.

If people see a similar issue beyond 1.1.1, we can always create a new JIRA.

> NullPointerException in StreamTask
> --
>
> Key: KAFKA-5882
> URL: https://issues.apache.org/jira/browse/KAFKA-5882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Major
> Fix For: 1.1.1
>
>
> It seems the bugfix [KAFKA-5073|https://issues.apache.org/jira/browse/KAFKA-5073] 
> was made, but it introduced some other issue.
> In some cases (I am not sure which ones) I get an NPE (below).
> I would expect that even in the case of a FATAL error, anything except an NPE 
> is thrown.
> {code}
> 2017-09-12 23:34:54 ERROR ConsumerCoordinator:269 - User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group streamer failed on partition assignment
> java.lang.NullPointerException: null
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:123)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043) 
> [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
>  [myapp-streamer.jar:?]
> 2017-09-12 23:34:54 INFO  StreamThread:1040 - stream-thread 
> [streamer-3a44578b-faa8-4b5b-bbeb-7a7f04639563-StreamThread-1] Shutting down
> 2017-09-12 23:34:54 INFO  KafkaProducer:972 - Closing the Kafka producer with 
> timeoutMillis = 9223372036854775807 ms.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-5882) NullPointerException in StreamTask

2018-09-06 Thread Guozhang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-5882.
--
   Resolution: Fixed
Fix Version/s: 1.1.1

> NullPointerException in StreamTask
> --
>
> Key: KAFKA-5882
> URL: https://issues.apache.org/jira/browse/KAFKA-5882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Major
> Fix For: 1.1.1
>
>
> It seems the bugfix [KAFKA-5073|https://issues.apache.org/jira/browse/KAFKA-5073] 
> was made, but it introduced some other issue.
> In some cases (I am not sure which ones) I get an NPE (below).
> I would expect that even in the case of a FATAL error, anything except an NPE 
> is thrown.
> {code}
> 2017-09-12 23:34:54 ERROR ConsumerCoordinator:269 - User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group streamer failed on partition assignment
> java.lang.NullPointerException: null
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:123)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043) 
> [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
>  [myapp-streamer.jar:?]
> 2017-09-12 23:34:54 INFO  StreamThread:1040 - stream-thread 
> [streamer-3a44578b-faa8-4b5b-bbeb-7a7f04639563-StreamThread-1] Shutting down
> 2017-09-12 23:34:54 INFO  KafkaProducer:972 - Closing the Kafka producer with 
> timeoutMillis = 9223372036854775807 ms.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-4544) Add system tests for delegation token based authentication

2018-09-06 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605944#comment-16605944
 ] 

Attila Sasvari commented on KAFKA-4544:
---

[~omkreddy] thanks for the info. I extended the test case to better cover the 
lifecycle of a delegation token, based on your idea:
- Create a delegation token
- Create a console-producer using SCRAM and the delegation token, and produce a 
message
- Verify the message is created (with kafka.search_data_files())
- Create a console-consumer using SCRAM and the delegation token, and consume 1 
message
- Expire the token immediately
- Try producing one more message with the expired token
- Verify the last message is not persisted by the broker

Initially, I wanted to use console_consumer.py and verifiable clients to 
validate things (messages produced / consumed), but I ran into some issues:
- jaas.conf / KafkaClient config cannot include more than one login module 
{code}  
Multiple LoginModule-s in JAAS 
Caused by: java.lang.IllegalArgumentException: JAAS config property contains 2 
login modules, should be 1 module
at 
org.apache.kafka.common.security.JaasContext.load(JaasContext.java:95)
at 
org.apache.kafka.common.security.JaasContext.loadClientContext(JaasContext.java:84)
at 
org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:119)
at 
org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:65)
at 
org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:88)
at 
org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:419)
{code}
- To request a delegation token, we need GSSAPI (and a keytab); subsequently, 
consumers and producers use the delegation token. So I ended up constructing 
the jaas.config and client configs manually in my POC. 
- Both with and without my changes, the JMX tool failed to start up when I tried 
to run {{./ducker-ak test ../kafkatest/sanity_checks/test_console_consumer.py}}:
{code}
Exception in thread ConsoleConsumer-0-140287252789520-worker-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
  File 
"/usr/local/lib/python2.7/dist-packages/ducktape/services/background_thread.py",
 line 35, in _protected_worker
self._worker(idx, node)
  File "/opt/kafka-dev/tests/kafkatest/services/console_consumer.py", line 229, 
in _worker
self.start_jmx_tool(idx, node)
  File "/opt/kafka-dev/tests/kafkatest/services/monitor/jmx.py", line 86, in 
start_jmx_tool
wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, 
backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
  File "/usr/local/lib/python2.7/dist-packages/ducktape/utils/util.py", line 
36, in wait_until
raise TimeoutError(err_msg)
TimeoutError: ducker@ducker04: Jmx tool took too long to start
{code}

Right now a lot of things are 
[hardcoded|https://github.com/asasvari/kafka/commit/edfc37012079764d2a589dbf5f24ad04505975d4#diff-3e7b2bdbd55d075bcebbbe5ba8c4e269]
 (using shell scripts) in my POC. It would be nice to extract the common 
functionality and make it easily reusable (e.g. Python wrappers for delegation 
token handling).
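
For reference, a minimal sketch of the kind of single-login-module {{KafkaClient}} 
JAAS entry used for token-based clients (token ID and HMAC are placeholders; this 
is an illustrative sketch, not the exact file from the POC):
{code}
KafkaClient {
  org.apache.kafka.common.security.scram.ScramLoginModule required
  username="<token-id>"
  password="<token-hmac>"
  tokenauth=true;
};
{code}
The client properties then point at this file via {{java.security.auth.login.config}} 
and use {{security.protocol=SASL_PLAINTEXT}} (or SASL_SSL) together with 
{{sasl.mechanism=SCRAM-SHA-256}}.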
 

> Add system tests for delegation token based authentication
> --
>
> Key: KAFKA-4544
> URL: https://issues.apache.org/jira/browse/KAFKA-4544
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Ashish Singh
>Assignee: Attila Sasvari
>Priority: Major
>
> Add system tests for delegation token based authentication.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7373) GetOffsetShell doesn't work when SSL authentication is enabled

2018-09-06 Thread Stanislav Kozlovski (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605937#comment-16605937
 ] 

Stanislav Kozlovski commented on KAFKA-7373:


Hi [~kiwiandy],

There is KIP-308 for GetOffsetShell, which suggests adding a 
`--consumer-property` option:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+GetOffsetShell%3A+new+KafkaConsumer+API%2C+support+for+multiple+topics%2C+minimize+the+number+of+requests+to+server

> GetOffsetShell doesn't work when SSL authentication is enabled
> --
>
> Key: KAFKA-7373
> URL: https://issues.apache.org/jira/browse/KAFKA-7373
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Andy Bryant
>Priority: Major
>
> GetOffsetShell doesn't provide a way to supply additional configuration for 
> the underlying KafkaConsumer, as {{ConsoleConsumer}} does. Passing SSL config 
> as system properties doesn't propagate to the consumer either.
> {code:java}
> 10:47 $ ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 
> ${BROKER_LIST} --topic cld-dev-sor-crods-crodsdba_contact
> Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: 
> Timeout expired while fetching topic metadata{code}
> Editing {{GetOffsetShell.scala}} to include the SSL properties in the 
> KafkaConsumer configuration resolved the issue.
> Providing {{consumer-property}} and {{consumer-config}} configuration options 
> for {{kafka-run-class.sh}}, or creating a separate run script for offsets and 
> using these properties in {{GetOffsetShell.scala}}, seems like a good solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-7373) GetOffsetShell doesn't work when SSL authentication is enabled

2018-09-06 Thread Stanislav Kozlovski (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski reassigned KAFKA-7373:
--

Assignee: (was: Andy Bryant)

> GetOffsetShell doesn't work when SSL authentication is enabled
> --
>
> Key: KAFKA-7373
> URL: https://issues.apache.org/jira/browse/KAFKA-7373
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Andy Bryant
>Priority: Major
>
> GetOffsetShell doesn't provide a way to supply additional configuration for 
> the underlying KafkaConsumer, as {{ConsoleConsumer}} does. Passing SSL config 
> as system properties doesn't propagate to the consumer either.
> {code:java}
> 10:47 $ ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 
> ${BROKER_LIST} --topic cld-dev-sor-crods-crodsdba_contact
> Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: 
> Timeout expired while fetching topic metadata{code}
> Editing {{GetOffsetShell.scala}} to include the SSL properties in the 
> KafkaConsumer configuration resolved the issue.
> Providing {{consumer-property}} and {{consumer-config}} configuration options 
> for {{kafka-run-class.sh}}, or creating a separate run script for offsets and 
> using these properties in {{GetOffsetShell.scala}}, seems like a good solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-7373) GetOffsetShell doesn't work when SSL authentication is enabled

2018-09-06 Thread Stanislav Kozlovski (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski reassigned KAFKA-7373:
--

Assignee: Andy Bryant  (was: Stanislav Kozlovski)

> GetOffsetShell doesn't work when SSL authentication is enabled
> --
>
> Key: KAFKA-7373
> URL: https://issues.apache.org/jira/browse/KAFKA-7373
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Andy Bryant
>Assignee: Andy Bryant
>Priority: Major
>
> GetOffsetShell doesn't provide a way to supply additional configuration for 
> the underlying KafkaConsumer, as {{ConsoleConsumer}} does. Passing SSL config 
> as system properties doesn't propagate to the consumer either.
> {code:java}
> 10:47 $ ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 
> ${BROKER_LIST} --topic cld-dev-sor-crods-crodsdba_contact
> Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: 
> Timeout expired while fetching topic metadata{code}
> Editing {{GetOffsetShell.scala}} to include the SSL properties in the 
> KafkaConsumer configuration resolved the issue.
> Providing {{consumer-property}} and {{consumer-config}} configuration options 
> for {{kafka-run-class.sh}}, or creating a separate run script for offsets and 
> using these properties in {{GetOffsetShell.scala}}, seems like a good solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7373) GetOffsetShell doesn't work when SSL authentication is enabled

2018-09-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605931#comment-16605931
 ] 

ASF GitHub Bot commented on KAFKA-7373:
---

stanislavkozlovski closed pull request #5617: KAFKA-7373: Allow GetOffsetShell 
command to accept a configurations file
URL: https://github.com/apache/kafka/pull/5617
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/core/src/main/scala/kafka/tools/GetOffsetShell.scala 
b/core/src/main/scala/kafka/tools/GetOffsetShell.scala
index eafddc66de4..6174479217f 100644
--- a/core/src/main/scala/kafka/tools/GetOffsetShell.scala
+++ b/core/src/main/scala/kafka/tools/GetOffsetShell.scala
@@ -26,6 +26,7 @@ import org.apache.kafka.clients.consumer.{ConsumerConfig, 
KafkaConsumer}
 import org.apache.kafka.common.{PartitionInfo, TopicPartition}
 import org.apache.kafka.common.requests.ListOffsetRequest
 import org.apache.kafka.common.serialization.ByteArrayDeserializer
+import org.apache.kafka.common.utils.Utils
 
 import scala.collection.JavaConverters._
 
@@ -61,6 +62,10 @@ object GetOffsetShell {
.describedAs("ms")
.ofType(classOf[java.lang.Integer])
.defaultsTo(1000)
+val configOpt = parser.accepts("config", s"Configuration properties file. 
Useful for configuring authentication. Note that all argument options take 
precedence over this config.")
+  .withOptionalArg()
+  .describedAs("config file")
+  .ofType(classOf[String])
 
if (args.length == 0)
   CommandLineUtils.printUsageAndDie(parser, "An interactive shell for 
getting topic offsets.")
@@ -89,7 +94,10 @@ object GetOffsetShell {
 }
 val listOffsetsTimestamp = options.valueOf(timeOpt).longValue
 
-val config = new Properties
+val config = if (options.has(configOpt))
+  Utils.loadProps(options.valueOf(configOpt))
+else
+  new Properties
 config.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList)
 config.setProperty(ConsumerConfig.CLIENT_ID_CONFIG, clientId)
 val consumer = new KafkaConsumer(config, new ByteArrayDeserializer, new 
ByteArrayDeserializer)
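
For reference, with the {{--config}} option added above, usage would look roughly 
like this; the file name, host, and SSL property values below are illustrative 
placeholders, not taken from the ticket:

{code}
# client-ssl.properties (illustrative values)
security.protocol=SSL
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=changeit
{code}
{code}
./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list broker1:9093 \
  --topic my-topic --config client-ssl.properties
{code}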


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> GetOffsetShell doesn't work when SSL authentication is enabled
> --
>
> Key: KAFKA-7373
> URL: https://issues.apache.org/jira/browse/KAFKA-7373
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Andy Bryant
>Assignee: Stanislav Kozlovski
>Priority: Major
>
> GetOffsetShell doesn't provide a way to supply additional configuration for 
> the underlying KafkaConsumer, as {{ConsoleConsumer}} does. Passing SSL config 
> as system properties doesn't propagate to the consumer either.
> {code:java}
> 10:47 $ ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 
> ${BROKER_LIST} --topic cld-dev-sor-crods-crodsdba_contact
> Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: 
> Timeout expired while fetching topic metadata{code}
> Editing {{GetOffsetShell.scala}} to include the SSL properties in the 
> KafkaConsumer configuration resolved the issue.
> Providing {{consumer-property}} and {{consumer-config}} configuration options 
> for {{kafka-run-class.sh}}, or creating a separate run script for offsets and 
> using these properties in {{GetOffsetShell.scala}}, seems like a good solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6777) Wrong reaction on Out Of Memory situation

2018-09-06 Thread John Roesler (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605926#comment-16605926
 ] 

John Roesler commented on KAFKA-6777:
-

Yes, it certainly seems safer to generally avoid catching Errors.

> But anyhow, I would expect, that lack of critical resources like memory will 
>quickly lead to crash with FATAL error e.g. Out Of Memory.

Have you seen some evidence that this is not happening? Note that frequent or 
long GC pauses are not the same as running out of memory.
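
For completeness: if the goal is for the process to die rather than limp along 
once a real OutOfMemoryError is actually thrown, the usual operational workaround 
is a JVM flag rather than anything Kafka-specific. A sketch, assuming HotSpot 
8u92+ and start scripts that honor KAFKA_OPTS (note it does not help with pure 
GC thrashing, where no OOME is thrown):
{code}
# Make the JVM exit immediately when an OutOfMemoryError is thrown (HotSpot 8u92+)
export KAFKA_OPTS="-XX:+ExitOnOutOfMemoryError"
# Optionally also capture a heap dump for later analysis (path is illustrative)
export KAFKA_OPTS="$KAFKA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/kafka"
{code}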

> Wrong reaction on Out Of Memory situation
> -
>
> Key: KAFKA-6777
> URL: https://issues.apache.org/jira/browse/KAFKA-6777
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.0.0
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Critical
> Attachments: screenshot-1.png
>
>
> Dears,
> We have already encountered problems related to Out Of Memory situations many 
> times in the Kafka Broker and in streaming clients.
> The scenario is the following.
> When the Kafka Broker (or Streaming Client) is under load and has too little 
> memory, there are no errors in the server logs. One can see some cryptic entries 
> in the GC logs, but they are definitely not self-explanatory.
> The Kafka Broker (and Streaming Clients) keeps working. Later we see in JMX 
> monitoring that the JVM spends more and more time in GC. In our case it grows from 
> e.g. 1% to 80%-90% of CPU time being used by GC.
> Next, the software collapses into a zombie mode: the process does not end. In such a 
> case I would expect the process to crash (e.g. get a SIGSEGV). Even worse, 
> Kafka treats such a zombie process as normal and still sends it messages, which 
> are in fact getting lost; also, the cluster does not exclude the broken node. The 
> question is how to configure Kafka to really terminate the JVM and not remain 
> in zombie mode, to give other nodes a chance to know that something is 
> dead.
> I would expect that in an Out Of Memory situation the JVM is ended, if not 
> gracefully then at least by crashing the process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7373) GetOffsetShell doesn't work when SSL authentication is enabled

2018-09-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605911#comment-16605911
 ] 

ASF GitHub Bot commented on KAFKA-7373:
---

stanislavkozlovski opened a new pull request #5617: KAFKA-7373: Allow 
GetOffsetShell command to accept a configurations file
URL: https://github.com/apache/kafka/pull/5617
 
 
   GetOffsetShell doesn't provide a mechanism to provide additional 
configuration for the underlying KafkaConsumer as does the `ConsoleConsumer`. 
This leaves it unable to connect to a broker using SSL.
   
   This PR allows it to accept a client configuration file, subsequently 
allowing it to provide SSL configurations and connect to a broker.
   
   I tested this manually. Trying to connect to a broker's SSL listener raised 
an out of memory error for me. After passing in the appropriate configurations 
via a config file, it connected successfully.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> GetOffsetShell doesn't work when SSL authentication is enabled
> --
>
> Key: KAFKA-7373
> URL: https://issues.apache.org/jira/browse/KAFKA-7373
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Andy Bryant
>Assignee: Stanislav Kozlovski
>Priority: Major
>
> GetOffsetShell doesn't provide a way to supply additional configuration for 
> the underlying KafkaConsumer, as {{ConsoleConsumer}} does. Passing SSL config 
> as system properties doesn't propagate to the consumer either.
> {code:java}
> 10:47 $ ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 
> ${BROKER_LIST} --topic cld-dev-sor-crods-crodsdba_contact
> Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: 
> Timeout expired while fetching topic metadata{code}
> Editing {{GetOffsetShell.scala}} to include the SSL properties in the 
> KafkaConsumer configuration resolved the issue.
> Providing {{consumer-property}} and {{consumer-config}} configuration options 
> for {{kafka-run-class.sh}}, or creating a separate run script for offsets and 
> using these properties in {{GetOffsetShell.scala}}, seems like a good solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7373) GetOffsetShell doesn't work when SSL authentication is enabled

2018-09-06 Thread Stanislav Kozlovski (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605905#comment-16605905
 ] 

Stanislav Kozlovski commented on KAFKA-7373:


I get another error with Kafka trunk
{code:java}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:68)
    at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:341)
    at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
    at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:360)
    at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:321)
    at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:609)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:541)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:467)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:265)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
    at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:272)
    at org.apache.kafka.clients.consumer.internals.Fetcher.getAllTopicMetadata(Fetcher.java:255)
    at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1799)
    at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1777)
    at kafka.tools.GetOffsetShell$.listPartitionInfos(GetOffsetShell.scala:156)
    at kafka.tools.GetOffsetShell$.main(GetOffsetShell.scala:110)
    at kafka.tools.GetOffsetShell.main(GetOffsetShell.scala)
{code}

I'm going to implement the approach of passing in a config file.

> GetOffsetShell doesn't work when SSL authentication is enabled
> --
>
> Key: KAFKA-7373
> URL: https://issues.apache.org/jira/browse/KAFKA-7373
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Andy Bryant
>Assignee: Stanislav Kozlovski
>Priority: Major
>
> GetOffsetShell doesn't provide a way to supply additional configuration for 
> the underlying KafkaConsumer, as {{ConsoleConsumer}} does. Passing SSL config 
> as system properties doesn't propagate to the consumer either.
> {code:java}
> 10:47 $ ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 
> ${BROKER_LIST} --topic cld-dev-sor-crods-crodsdba_contact
> Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: 
> Timeout expired while fetching topic metadata{code}
> Editing {{GetOffsetShell.scala}} to include the SSL properties in the 
> KafkaConsumer configuration resolved the issue.
> Providing {{consumer-property}} and {{consumer-config}} configuration options 
> for {{kafka-run-class.sh}}, or creating a separate run script for offsets and 
> using these properties in {{GetOffsetShell.scala}}, seems like a good solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-7373) GetOffsetShell doesn't work when SSL authentication is enabled

2018-09-06 Thread Stanislav Kozlovski (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski reassigned KAFKA-7373:
--

Assignee: Stanislav Kozlovski

> GetOffsetShell doesn't work when SSL authentication is enabled
> --
>
> Key: KAFKA-7373
> URL: https://issues.apache.org/jira/browse/KAFKA-7373
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Andy Bryant
>Assignee: Stanislav Kozlovski
>Priority: Major
>
> GetOffsetShell doesn't provide a way to supply additional configuration for 
> the underlying KafkaConsumer, as {{ConsoleConsumer}} does. Passing SSL config 
> as system properties doesn't propagate to the consumer either.
> {code:java}
> 10:47 $ ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 
> ${BROKER_LIST} --topic cld-dev-sor-crods-crodsdba_contact
> Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: 
> Timeout expired while fetching topic metadata{code}
> Editing {{GetOffsetShell.scala}} to include the SSL properties in the 
> KafkaConsumer configuration resolved the issue.
> Providing {{consumer-property}} and {{consumer-config}} configuration options 
> for {{kafka-run-class.sh}}, or creating a separate run script for offsets and 
> using these properties in {{GetOffsetShell.scala}}, seems like a good solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-5882) NullPointerException in StreamTask

2018-09-06 Thread Seweryn Habdank-Wojewodzki (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605883#comment-16605883
 ] 

Seweryn Habdank-Wojewodzki commented on KAFKA-5882:
---

In Kafka 1.1.1 I do not see this anymore.
Shall I close the issue with a comment about that?

> NullPointerException in StreamTask
> --
>
> Key: KAFKA-5882
> URL: https://issues.apache.org/jira/browse/KAFKA-5882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Major
>
> It seems the bugfix [KAFKA-5073|https://issues.apache.org/jira/browse/KAFKA-5073] 
> was made, but it introduced another issue.
> In some cases (I am not sure which ones) I get an NPE (below).
> I would expect that even in case of a FATAL error, anything other than an NPE is thrown.
> {code}
> 2017-09-12 23:34:54 ERROR ConsumerCoordinator:269 - User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group streamer failed on partition assignment
> java.lang.NullPointerException: null
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:123)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043) 
> [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
>  [myapp-streamer.jar:?]
> 2017-09-12 23:34:54 INFO  StreamThread:1040 - stream-thread 
> [streamer-3a44578b-faa8-4b5b-bbeb-7a7f04639563-StreamThread-1] Shutting down
> 2017-09-12 23:34:54 INFO  KafkaProducer:972 - Closing the Kafka producer with 
> timeoutMillis = 9223372036854775807 ms.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (KAFKA-6457) Error: NOT_LEADER_FOR_PARTITION leads to NPE

2018-09-06 Thread Seweryn Habdank-Wojewodzki (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seweryn Habdank-Wojewodzki closed KAFKA-6457.
-

In Kafka 1.1.1 I no longer see it.

> Error: NOT_LEADER_FOR_PARTITION leads to NPE
> 
>
> Key: KAFKA-6457
> URL: https://issues.apache.org/jira/browse/KAFKA-6457
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Major
> Fix For: 1.0.1
>
>
> One of our nodes was dead. Then the second one took over all responsibility.
> But the streaming application in the meanwhile crashed due to an NPE caused by 
> {{Error: NOT_LEADER_FOR_PARTITION}}.
> The stack trace is below.
>  
> Is this something expected?
>  
> {code:java}
> 2018-01-17 11:47:21 [my] [WARN ] Sender:251 - [Producer ...2018-01-17 
> 11:47:21 [my] [WARN ] Sender:251 - [Producer 
> clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-producer]
>  Got error produce response with correlation id 768962 on topic-partition 
> my_internal_topic-5, retrying (9 attempts left). Error: 
> NOT_LEADER_FOR_PARTITION
> 2018-01-17 11:47:21 [my] [WARN ] Sender:251 - [Producer 
> clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-producer]
>  Got error produce response with correlation id 768962 on topic-partition 
> my_internal_topic-7, retrying (9 attempts left). Error: 
> NOT_LEADER_FOR_PARTITION
> 2018-01-17 11:47:21 [my] [ERROR] AbstractCoordinator:296 - [Consumer 
> clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-consumer,
>  groupId=restreamer-my] Heartbeat thread for group restreamer-my failed due 
> to unexpected error
> java.lang.NullPointerException: null
>     at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:436) 
> ~[my-restreamer.jar:?]
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:395) 
> ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) 
> ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:238)
>  ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:275)
>  ~[my-restreamer.jar:?]
>     at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:934)
>  [my-restreamer.jar:?]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6699) When one of two Kafka nodes are dead, streaming API cannot handle messaging

2018-09-06 Thread Seweryn Habdank-Wojewodzki (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605867#comment-16605867
 ] 

Seweryn Habdank-Wojewodzki commented on KAFKA-6699:
---

ISR: 2
ack: 1

Do you suggest having more brokers?
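
For reference, the combination usually discussed for tolerating the loss of one 
broker without stalling or losing acknowledged writes is acks=all plus 
min.insync.replicas, which in practice implies at least 3 brokers and a 
replication factor of 3. A sketch with illustrative values (not a verified fix 
for this ticket):
{code}
# producer side
acks=all

# topic / broker side (set as a topic config or in server.properties)
min.insync.replicas=2
# with a replication factor of 3, one broker can be down and acked writes still succeed
{code}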

> When one of two Kafka nodes are dead, streaming API cannot handle messaging
> ---
>
> Key: KAFKA-6699
> URL: https://issues.apache.org/jira/browse/KAFKA-6699
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.2
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Major
>
> Dears,
> I am observing quite often that when the Kafka Broker is partly dead (*), 
> applications which use the streaming API do nothing.
> (*) Partly dead in my case means that one of the two Kafka nodes is out of 
> order. 
> Especially when the disk is full on one machine, the Broker goes into some 
> strange state where the streaming API goes on vacation. It seems like the regular 
> producer/consumer API has no problem in such a case.
> Can you have a look at that matter?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-7214) Mystic FATAL error

2018-09-06 Thread Seweryn Habdank-Wojewodzki (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605851#comment-16605851
 ] 

Seweryn Habdank-Wojewodzki edited comment on KAFKA-7214 at 9/6/18 2:40 PM:
---

The problem is that KSTREAM-SOURCE-X is mostly KSTREAM-SOURCE-0, 
independently of which process and how many processes are running (or trying to 
run).

How I reproduce the error on my side. Let's assume I have a low message flow, < 100 
Msg/sec, Msg size ~ 1kB.

I start an app using the streaming API. This app reads from 30 topics and sends 
messages to 1 topic.
Let's give this app a 300 MB JVM heap. It starts. Cool.

On a second server I start a second instance. The same. It starts.

The other case. Let's assume I have a somewhat higher message flow, > 5000 Msg/sec, 
Msg size ~ 1kB.

I start an app using the streaming API. This app reads from 30 topics and sends 
messages to 1 topic.
Let's give this app a 300 MB JVM heap. It does not start, even though the memory 
estimate says it is enough to hold 30 sec of messages:
5000 Msg/s ~ 150 000 Msgs/30 sec ~ 150 MB.
I give the app a 2 GB heap. It starts. Everything between 300 MB and 2 GB 
leads at some point to yet another mystic crash.

On a second server I start a second instance. If I start it with 300 
MB, I get this error immediately. The application tries to start, but then I get 
this error and all affected topics are going to be dead. If I give it 1 GB, it 
is better: the application works for some hours, but any minimal peak from around 
5000 Msg/s to e.g. 7000 Msg/s causes the same. Finally, for now I am starting the 
processes with 5 GB; they can work continuously for about 2-4 days.

I am sorry I have no better description.
Once I tried to turn on TRACE level logs in Kafka, but this is impossible with 
a message flow at 5000 Msg/s.




was (Author: habdank):
Problem is that  KSTREAM-SOURCE-X is mostly  KSTREAM-SOURCE-0 
independently of which process and how much processes are running (or trying to 
run).

How I reproduce error at my side. Let's assume I have low message flow < 100 
Msg/sec Msg size ~ 1kB.

I am starting app using streaming API. This app reads from 30 topics and send 
messages to 1 topic.
Let's give this app 300MB JVM Heap. It is starting. Cool.

At second server I am starting. second instance. The same. It is starting.

The other case. Let's assume I have a bit higher message flow > 5000 Msg/sec 
Msg size ~ 1kB.

I am starting app using streaming API. This app reads from 30 topics and send 
messages to 1 topic.
Let's give this app 300 MB JVM Heap. It is not starting, even in memory spec 
stays that it is enough to calculate 30 sec of messages.
5000 Msg/s ~ 150 000 Mgs/30 sec ~ 150 MB.
I am giving to app 2GB Heap. Is starting. Everything between 300 MB and 2 GB 
leads at some point to yet another mystic crasches.

At second server I am starting. second instance. If I am starting it with 300 
MB - I got immediately this error. Application tries to starrt, but then I got 
this error and all affected topics are goig to be dead. If I am giving 1GB, it 
is better application works some hours, but any minimal peak aroud 5000 Msg/s 
to e.g. 7000 Msg/s, causes the same. Finally - now - I am starting processes 
with 5GB. they could work continuously like 2-4 days.

I am sorry I have no better description.
Once I tried to start TRACE level logs in Kafka, but this is impossible with 
message flow at 5000 Msg/s.



> Mystic FATAL error
> --
>
> Key: KAFKA-7214
> URL: https://issues.apache.org/jira/browse/KAFKA-7214
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.3, 1.1.1
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Critical
>
> Dears,
> Very often at startup of the streaming application I get an exception:
> {code}
> Exception caught in process. taskId=0_1, processor=KSTREAM-SOURCE-00, 
> topic=my_instance_medium_topic, partition=1, offset=198900203; 
> [org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:212),
>  
> org.apache.kafka.streams.processor.internals.AssignedTasks$2.apply(AssignedTasks.java:347),
>  
> org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:420),
>  
> org.apache.kafka.streams.processor.internals.AssignedTasks.process(AssignedTasks.java:339),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.processAndPunctuate(StreamThread.java:648),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:513),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:482),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:459)]
>  in thread 
> 

[jira] [Commented] (KAFKA-7214) Mystic FATAL error

2018-09-06 Thread Seweryn Habdank-Wojewodzki (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605851#comment-16605851
 ] 

Seweryn Habdank-Wojewodzki commented on KAFKA-7214:
---

The problem is that KSTREAM-SOURCE-X is mostly KSTREAM-SOURCE-0, 
independently of which process and how many processes are running (or trying to 
run).

How I reproduce the error on my side. Let's assume I have a low message flow, < 100 
Msg/sec, Msg size ~ 1kB.

I start an app using the streaming API. This app reads from 30 topics and sends 
messages to 1 topic.
Let's give this app a 300 MB JVM heap. It starts. Cool.

On a second server I start a second instance. The same. It starts.

The other case. Let's assume I have a somewhat higher message flow, > 5000 Msg/sec, 
Msg size ~ 1kB.

I start an app using the streaming API. This app reads from 30 topics and sends 
messages to 1 topic.
Let's give this app a 300 MB JVM heap. It does not start, even though the memory 
estimate says it is enough to hold 30 sec of messages:
5000 Msg/s ~ 150 000 Msgs/30 sec ~ 150 MB.
I give the app a 2 GB heap. It starts. Everything between 300 MB and 2 GB 
leads at some point to yet another mystic crash.

On a second server I start a second instance. If I start it with 300 
MB, I get this error immediately. The application tries to start, but then I get 
this error and all affected topics are going to be dead. If I give it 1 GB, it 
is better: the application works for some hours, but any minimal peak from around 
5000 Msg/s to e.g. 7000 Msg/s causes the same. Finally, for now I am starting the 
processes with 5 GB; they can work continuously for about 2-4 days.

I am sorry I have no better description.
Once I tried to turn on TRACE level logs in Kafka, but this is impossible with 
a message flow at 5000 Msg/s.
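
As an aside, the back-of-envelope sizing above (5000 Msg/s x ~1 kB x 30 sec ≈ 150 MB) 
only counts raw record payloads; the Streams client also buffers records per 
partition and keeps a record cache. A sketch of the settings commonly tuned to 
bound that client-side memory (illustrative values, not a verified fix for this 
ticket):
{code}
# Streams record cache shared across all threads (10 MB)
cache.max.bytes.buffering=10485760
# Maximum records buffered per partition before the consumer is paused
buffered.records.per.partition=500
# Maximum data fetched per partition in one consumer request (1 MB)
max.partition.fetch.bytes=1048576
{code}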



> Mystic FATAL error
> --
>
> Key: KAFKA-7214
> URL: https://issues.apache.org/jira/browse/KAFKA-7214
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.3, 1.1.1
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Critical
>
> Dears,
> Very often at startup of the streaming application I get an exception:
> {code}
> Exception caught in process. taskId=0_1, processor=KSTREAM-SOURCE-00, 
> topic=my_instance_medium_topic, partition=1, offset=198900203; 
> [org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:212),
>  
> org.apache.kafka.streams.processor.internals.AssignedTasks$2.apply(AssignedTasks.java:347),
>  
> org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:420),
>  
> org.apache.kafka.streams.processor.internals.AssignedTasks.process(AssignedTasks.java:339),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.processAndPunctuate(StreamThread.java:648),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:513),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:482),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:459)]
>  in thread 
> my_application-my_instance-my_instance_medium-72ee1819-edeb-4d85-9d65-f67f7c321618-StreamThread-62
> {code}
> and then (without shutdown request from my side):
> {code}
> 2018-07-30 07:45:02 [ar313] [INFO ] StreamThread:912 - stream-thread 
> [my_application-my_instance-my_instance-72ee1819-edeb-4d85-9d65-f67f7c321618-StreamThread-62]
>  State transition from PENDING_SHUTDOWN to DEAD.
> {code}
> What is this?
> How to correctly handle it?
> Thanks in advance for help.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7382) We should guarantee at least one replica of a partition is alive when creating or updating a topic

2018-09-06 Thread zhaoshijie (JIRA)
zhaoshijie created KAFKA-7382:
-

 Summary: We should guarantee at least one replica of a partition is 
alive when creating or updating a topic
 Key: KAFKA-7382
 URL: https://issues.apache.org/jira/browse/KAFKA-7382
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.2.0
Reporter: zhaoshijie


For example: I have brokers 1,2,3,4,5. I create a new topic with the command: 
{code:java}
sh kafka-topics.sh --create --topic replicaserror --zookeeper localhost:2181 
--replica-assignment 11:12:13,12:13:14,14:15:11,14:12:11,13:14:11
{code}
Then the KafkaController will process this. After the partitionStateMachine and 
replicaStateMachine handle the state change, the topic metadata and state will be 
strange: the partitions are in NewPartition and the replicas are in OnlineReplica. 
Next we cannot delete this topic (because the state change is illegal), and this 
will cause a number of problems. So I think we should check the replica assignment 
when creating or updating a topic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7093) Kafka warn messages after upgrade from 0.11.0.1 to 1.1.0

2018-09-06 Thread Kathleen (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605452#comment-16605452
 ] 

Kathleen commented on KAFKA-7093:
-

We see this error as well on a partition specific to our code but also on a 
partition of __consumer_offsets.

> Kafka warn messages after upgrade from 0.11.0.1 to 1.1.0
> 
>
> Key: KAFKA-7093
> URL: https://issues.apache.org/jira/browse/KAFKA-7093
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.1.0
>Reporter: Suleyman
>Priority: Major
>
> I upgraded Kafka from version 0.11.0.1 to 1.1.0. After the upgrade, I'm 
> getting the warning message below very frequently.
> WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. 
> This implies messages have arrived out of order. New: \{epoch:0, 
> offset:793868383}, Current: \{epoch:4, offset:792201264} for Partition: 
> __consumer_offsets-42 (kafka.server.epoch.LeaderEpochFileCache) 
> How can I resolve these warning messages? And why am I getting them?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7093) Kafka warn messages after upgrade from 0.11.0.1 to 1.1.0

2018-09-06 Thread Jameel Al-Aziz (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605361#comment-16605361
 ] 

Jameel Al-Aziz commented on KAFKA-7093:
---

We've been observing the error as well. It occurred after trying to recover 
from a broker outage. The errors appear to prevent proper recovery or partition 
reassignment.

> Kafka warn messages after upgrade from 0.11.0.1 to 1.1.0
> 
>
> Key: KAFKA-7093
> URL: https://issues.apache.org/jira/browse/KAFKA-7093
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.1.0
>Reporter: Suleyman
>Priority: Major
>
> I upgraded Kafka from version 0.11.0.1 to 1.1.0. After the upgrade, I'm 
> getting the warning message below very frequently.
> WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. 
> This implies messages have arrived out of order. New: \{epoch:0, 
> offset:793868383}, Current: \{epoch:4, offset:792201264} for Partition: 
> __consumer_offsets-42 (kafka.server.epoch.LeaderEpochFileCache) 
> How can I resolve these warning messages? And why am I getting them?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)