[jira] [Updated] (KAFKA-6718) Rack Aware Stand-by Task Assignment for Kafka Streams

2019-04-04 Thread Deepak Goyal (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Goyal updated KAFKA-6718:

Summary: Rack Aware Stand-by Task Assignment for Kafka Streams  (was: Rack 
Aware Replica Task Assignment for Kafka Streams)

> Rack Aware Stand-by Task Assignment for Kafka Streams
> -
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 1.1.0
>Reporter: Deepak Goyal
>Assignee: Deepak Goyal
>Priority: Major
>  Labels: needs-kip
>
> Machines in a data centre are sometimes grouped in racks. Racks provide 
> isolation, as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, this 
> provides fault tolerance: if a rack goes down, the remaining racks can 
> continue to serve traffic.
>
> This feature is already implemented in Kafka itself 
> ([KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]), 
> but we need something similar for task assignment at the Kafka Streams 
> application layer.
>
> This feature enables stand-by (replica) tasks to be assigned to different 
> racks for fault tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replicas + active)
>  # If no rackId is provided: the cluster behaves rack-unaware.
>  # If the same rackId is given to all nodes: the cluster behaves rack-unaware.
>  # If (totalTasks <= number of racks): the cluster is rack-aware, i.e. each 
> copy of a task is assigned to a different rack.
>  # If (totalTasks > number of racks): tasks are first spread across different 
> racks; any further tasks are assigned to the least-loaded node, cluster-wide.
> We have added another config to StreamsConfig, "RACK_ID_CONFIG", which helps 
> the StickyPartitionAssignor assign tasks so that no two copies of a task end 
> up on the same rack, if possible.
>  Beyond that, it also helps maintain stickiness within a rack. (A sketch of 
> these assignment rules follows below.)
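A minimal, illustrative sketch of the four assignment rules quoted above, assuming each instance exposes a rack id via the proposed RACK_ID_CONFIG; the class and method names here are made up for illustration and are not the actual StreamsPartitionAssignor/StickyPartitionAssignor internals from the PR:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a standalone model of the rack-aware standby placement rules
// described in the ticket, not the real assignor code from the PR.
public class RackAwareStandbySketch {

    static class ClientNode {
        final String clientId;
        final String rackId;   // null if the instance did not configure the proposed rack id
        int load;              // tasks already assigned to this instance
        ClientNode(String clientId, String rackId) { this.clientId = clientId; this.rackId = rackId; }
    }

    /** Place totalTasks = 1 active + NUM_STANDBY_REPLICAS standby copies of one task. */
    static List<ClientNode> placeCopies(List<ClientNode> nodes, int totalTasks) {
        Set<String> racks = new HashSet<>();
        for (ClientNode n : nodes) racks.add(n.rackId);
        // Rules 1 and 2: missing rack ids, or the same rack id everywhere => rack-unaware.
        boolean rackAware = !racks.contains(null) && racks.size() > 1;

        List<ClientNode> chosen = new ArrayList<>();
        Set<String> usedRacks = new HashSet<>();
        for (int i = 0; i < totalTasks; i++) {
            ClientNode best = null;
            boolean bestOnFreshRack = false;
            for (ClientNode n : nodes) {
                if (chosen.contains(n)) continue;
                // Rules 3 and 4: prefer a rack not yet used for this task; once every rack
                // is used (totalTasks > number of racks), fall back to the least-loaded
                // remaining node, cluster-wide.
                boolean freshRack = !rackAware || !usedRacks.contains(n.rackId);
                if (best == null
                        || (freshRack && !bestOnFreshRack)
                        || (freshRack == bestOnFreshRack && n.load < best.load)) {
                    best = n;
                    bestOnFreshRack = freshRack;
                }
            }
            if (best == null) break;   // fewer live instances than requested copies
            chosen.add(best);
            usedRacks.add(best.rackId);
            best.load++;
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<ClientNode> nodes = Arrays.asList(
                new ClientNode("a", "rack-1"), new ClientNode("b", "rack-1"),
                new ClientNode("c", "rack-2"), new ClientNode("d", "rack-3"));
        // NUM_STANDBY_REPLICAS = 2 => totalTasks = 3, so one copy lands on each rack.
        for (ClientNode n : placeCopies(nodes, 3)) {
            System.out.println(n.clientId + " -> " + n.rackId);
        }
    }
}
{code}

A real implementation would also have to honour the assignor's existing stickiness and load-balancing behaviour; the sketch only captures the rack rules.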



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6631) Kafka Streams - Rebalancing exception in Kafka 1.0.0

2018-06-07 Thread Deepak Goyal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504532#comment-16504532
 ] 

Deepak Goyal commented on KAFKA-6631:
-

[~alex.iv] What version of Kafka were you using on the broker end?
 [~guozhang] This is the same issue that we are facing in KAFKA-6976.
 Unexpectedly, we somehow seem to be exceeding 1 GB, which appears impossible, 
but this is what our stack trace reports: 
{{org.apache.kafka.common.network.InvalidReceiveException: Invalid receive 
(size = 1195725856 larger than 104857600)}}
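For context, 1195725856 is exactly the four ASCII bytes of {{"GET "}} interpreted as the protocol's big-endian int32 size prefix, so an "impossible" receive size like this usually suggests that something other than a Kafka client (for example, a plain HTTP request, or traffic meant for a different listener) connected to the broker port, rather than a genuine >1 GB request. A quick, illustrative check of that decoding:

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ReceiveSizeDecode {
    public static void main(String[] args) {
        // Every Kafka request starts with a 4-byte big-endian size field. If the first
        // bytes on the socket are an HTTP request line ("GET ..."), the broker reads
        // the characters "GET " as that size field.
        int size = ByteBuffer.wrap("GET ".getBytes(StandardCharsets.US_ASCII)).getInt();
        System.out.println(size);   // prints 1195725856, the value reported above
    }
}
{code}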

> Kafka Streams - Rebalancing exception in Kafka 1.0.0
> 
>
> Key: KAFKA-6631
> URL: https://issues.apache.org/jira/browse/KAFKA-6631
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
> Environment: Container Linux by CoreOS 1576.5.0
>Reporter: Alexander Ivanichev
>Priority: Critical
>
>  
> In Kafka Streams 1.0.0, we saw a strange rebalance error. Our stream app 
> performs window-based aggregations. Sometimes, on start, when all stream 
> workers join, the app just crashes; if we enable only one worker it works 
> fine, and sometimes two workers work just fine, but when a third joins the 
> app crashes again. There is some critical issue with rebalancing.
> {code:java}
> 2018-03-08T18:51:01.226243000Z org.apache.kafka.common.KafkaException: 
> Unexpected error from SyncGroup: The server experienced an unexpected error 
> when processing the request
> 2018-03-08T18:51:01.226557000Z at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:566)
> 2018-03-08T18:51:01.22686Z at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:539)
> 2018-03-08T18:51:01.227328000Z at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:808)
> 2018-03-08T18:51:01.22763Z at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:788)
> 2018-03-08T18:51:01.228152000Z at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
> 2018-03-08T18:51:01.228449000Z at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
> 2018-03-08T18:51:01.228897000Z at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
> 2018-03-08T18:51:01.229196000Z at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:506)
> 2018-03-08T18:51:01.229673000Z at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:353)
> 2018-03-08T18:51:01.229971000Z at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:268)
> 2018-03-08T18:51:01.230436000Z at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:214)
> 2018-03-08T18:51:01.230749000Z at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:174)
> 2018-03-08T18:51:01.231065000Z at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:364)
> 2018-03-08T18:51:01.231584000Z at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:316)
> 2018-03-08T18:51:01.231911000Z at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:295)
> 2018-03-08T18:51:01.23219Z at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1138)
> 2018-03-08T18:51:01.232643000Z at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1103)
> 2018-03-08T18:51:01.233121000Z at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:851)
> 2018-03-08T18:51:01.233409000Z at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:808)
> 2018-03-08T18:51:01.23372Z at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
> 2018-03-08T18:51:01.234196000Z at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)
> 2018-03-08T18:51:01.234655000Z org.apache.kafka.common.KafkaException: 
> Unexpected error from SyncGroup: The server experienced an unexpected error 
> when processing the request
> 2018-03-08T18:51:01.234972000Z exception in thread, closing process
> 2018-03-08T18:51:01.23550Z at 
> 

[jira] [Commented] (KAFKA-6976) Kafka Streams instances going in to DEAD state

2018-06-03 Thread Deepak Goyal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499347#comment-16499347
 ] 

Deepak Goyal commented on KAFKA-6976:
-

[^kafkaStreamsDeadState.log] 
The attached logs are at DEBUG level. At line number 64017, you'll be able to see 
the client going into the DEAD state. Also, please note that the logs are from the 
instance that served as the application leader for the Kafka Streams application.

> Kafka Streams instances going in to DEAD state
> --
>
> Key: KAFKA-6976
> URL: https://issues.apache.org/jira/browse/KAFKA-6976
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Deepak Goyal
>Priority: Blocker
>
> We are using Kafka 0.10.2.0 and Kafka Streams 1.1.0. We have a Kafka cluster of 16 
> machines, and the topic being consumed by Kafka Streams has 256 
> partitions. We spawned 400 machines running the Kafka Streams application. We see that 
> all of the StreamThreads go into the DEAD state.
> {quote}{{[2018-05-25 05:59:29,282] INFO stream-thread 
> [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] State transition 
> from PENDING_SHUTDOWN to DEAD 
> (org.apache.kafka.streams.processor.internals.StreamThread) [2018-05-25 
> 05:59:29,282] INFO stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] 
> State transition from REBALANCING to ERROR 
> (org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] WARN 
> stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] All stream threads 
> have died. The instance will be in error state and should be closed. 
> (org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] INFO 
> stream-thread [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] 
> Shutdown complete 
> (org.apache.kafka.streams.processor.internals.StreamThread)}}
> {quote}
> Please note that when we have only 100 Kafka Streams application machines, 
> things work as expected: we see that the instances consume messages 
> from the topic.
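The WARN line quoted above says the instance ends up in the ERROR state and "should be closed". A minimal, illustrative sketch of detecting that transition with a state listener is shown below; the application id, bootstrap server, and topic names are placeholders, not values taken from this deployment:

{code:java}
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ErrorStateWatcher {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ksapp");            // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // placeholder

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");                   // trivial placeholder topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        CountDownLatch died = new CountDownLatch(1);

        // Once every StreamThread has gone PENDING_SHUTDOWN -> DEAD, the client
        // transitions to ERROR and will not recover on its own.
        streams.setStateListener((newState, oldState) -> {
            if (newState == KafkaStreams.State.ERROR) {
                died.countDown();
            }
        });

        streams.start();
        died.await();     // close (and alert/restart) from outside the callback
        streams.close();
    }
}
{code}

The listener only makes the failure visible and lets the process shut down cleanly; it does not address why the StreamThreads die in the first place.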



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-6976) Kafka Streams instances going in to DEAD state

2018-05-31 Thread Deepak Goyal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497020#comment-16497020
 ] 

Deepak Goyal edited comment on KAFKA-6976 at 5/31/18 7:08 PM:
--

Furthermore, in the course of our application development, our application needs 
to consume from four topics (each with 256 partitions). In this scenario, we 
could only achieve a working cluster with up to 60 machines. When we increased 
beyond 60 instances, all instances, including the older ones, went into the DEAD 
state. We need approximately 400 machines in our cluster.


was (Author: _deepakgoyal):
Furthermore, in the course of our application development, our application need 
consuming from four topics (each with 256 partitions). In this scenario, we 
could achieve a working cluster with only 60 machines. We have an approximate 
need of 400 machines in our cluster.

> Kafka Streams instances going in to DEAD state
> --
>
> Key: KAFKA-6976
> URL: https://issues.apache.org/jira/browse/KAFKA-6976
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Deepak Goyal
>Priority: Blocker
>
> We are using Kafka 0.10.2.0 and Kafka Streams 1.1.0. We have a Kafka cluster of 16 
> machines, and the topic being consumed by Kafka Streams has 256 
> partitions. We spawned 400 machines running the Kafka Streams application. We see that 
> all of the StreamThreads go into the DEAD state.
> {quote}{{[2018-05-25 05:59:29,282] INFO stream-thread 
> [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] State transition 
> from PENDING_SHUTDOWN to DEAD 
> (org.apache.kafka.streams.processor.internals.StreamThread) [2018-05-25 
> 05:59:29,282] INFO stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] 
> State transition from REBALANCING to ERROR 
> (org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] WARN 
> stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] All stream threads 
> have died. The instance will be in error state and should be closed. 
> (org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] INFO 
> stream-thread [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] 
> Shutdown complete 
> (org.apache.kafka.streams.processor.internals.StreamThread)}}
> {quote}
> Please note that when we have only 100 Kafka Streams application machines, 
> things work as expected: we see that the instances consume messages 
> from the topic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6976) Kafka Streams instances going in to DEAD state

2018-05-31 Thread Deepak Goyal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497020#comment-16497020
 ] 

Deepak Goyal commented on KAFKA-6976:
-

Furthermore, in the course of our application development, our application needs 
to consume from four topics (each with 256 partitions). In this scenario, we 
could only achieve a working cluster with up to 60 machines. We need 
approximately 400 machines in our cluster.

> Kafka Streams instances going in to DEAD state
> --
>
> Key: KAFKA-6976
> URL: https://issues.apache.org/jira/browse/KAFKA-6976
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Deepak Goyal
>Priority: Blocker
>
> We are using Kafka 0.10.2.0 and Kafka Streams 1.1.0. We have a Kafka cluster of 16 
> machines, and the topic being consumed by Kafka Streams has 256 
> partitions. We spawned 400 machines running the Kafka Streams application. We see that 
> all of the StreamThreads go into the DEAD state.
> {quote}{{[2018-05-25 05:59:29,282] INFO stream-thread 
> [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] State transition 
> from PENDING_SHUTDOWN to DEAD 
> (org.apache.kafka.streams.processor.internals.StreamThread) [2018-05-25 
> 05:59:29,282] INFO stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] 
> State transition from REBALANCING to ERROR 
> (org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] WARN 
> stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] All stream threads 
> have died. The instance will be in error state and should be closed. 
> (org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] INFO 
> stream-thread [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] 
> Shutdown complete 
> (org.apache.kafka.streams.processor.internals.StreamThread)}}
> {quote}
> Please note that when we have only 100 Kafka Streams application machines, 
> things work as expected: we see that the instances consume messages 
> from the topic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6976) Kafka Streams instances going in to DEAD state

2018-05-31 Thread Deepak Goyal (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Goyal updated KAFKA-6976:

Affects Version/s: 0.10.2.0
  Description: 
We are using Kafka 0.10.2.0 and Kafka Streams 1.1.0. We have a Kafka cluster of 16 
machines, and the topic being consumed by Kafka Streams has 256 partitions. 
We spawned 400 machines running the Kafka Streams application. We see that all of 
the StreamThreads go into the DEAD state.
{quote}{{[2018-05-25 05:59:29,282] INFO stream-thread 
[ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] State transition 
from PENDING_SHUTDOWN to DEAD 
(org.apache.kafka.streams.processor.internals.StreamThread) [2018-05-25 
05:59:29,282] INFO stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] 
State transition from REBALANCING to ERROR 
(org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] WARN 
stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] All stream threads 
have died. The instance will be in error state and should be closed. 
(org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] INFO 
stream-thread [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] 
Shutdown complete (org.apache.kafka.streams.processor.internals.StreamThread)}}
{quote}
Please note that when we have only 100 Kafka Streams application machines, 
things work as expected: we see that the instances consume messages from the topic.

  was:
We are using Kafka 0.10.2.0, kafka streams 1.1.0. We have Kafka Cluster of 16 
machines, and topic that is being consumed by Kafka Streams has 256 partitions. 
We spawned 400 instances of Kakfa Streams application. We see that all of the 
StreamThreads go in to DEAD state.
{quote}{{[2018-05-25 05:59:29,282] INFO stream-thread 
[ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] State transition 
from PENDING_SHUTDOWN to DEAD 
(org.apache.kafka.streams.processor.internals.StreamThread) [2018-05-25 
05:59:29,282] INFO stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] 
State transition from REBALANCING to ERROR 
(org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] WARN 
stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] All stream threads 
have died. The instance will be in error state and should be closed. 
(org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] INFO 
stream-thread [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] 
Shutdown complete (org.apache.kafka.streams.processor.internals.StreamThread)}}
{quote}

Please note that when we have only 100 kafka instances, things are working as 
expected. We see that instances are consuming messages from topic.


> Kafka Streams instances going in to DEAD state
> --
>
> Key: KAFKA-6976
> URL: https://issues.apache.org/jira/browse/KAFKA-6976
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Deepak Goyal
>Priority: Blocker
>
> We are using Kafka 0.10.2.0 and Kafka Streams 1.1.0. We have a Kafka cluster of 16 
> machines, and the topic being consumed by Kafka Streams has 256 
> partitions. We spawned 400 machines running the Kafka Streams application. We see that 
> all of the StreamThreads go into the DEAD state.
> {quote}{{[2018-05-25 05:59:29,282] INFO stream-thread 
> [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] State transition 
> from PENDING_SHUTDOWN to DEAD 
> (org.apache.kafka.streams.processor.internals.StreamThread) [2018-05-25 
> 05:59:29,282] INFO stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] 
> State transition from REBALANCING to ERROR 
> (org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] WARN 
> stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] All stream threads 
> have died. The instance will be in error state and should be closed. 
> (org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] INFO 
> stream-thread [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] 
> Shutdown complete 
> (org.apache.kafka.streams.processor.internals.StreamThread)}}
> {quote}
> Please note that when we have only 100 Kafka Streams application machines, 
> things work as expected: we see that the instances consume messages 
> from the topic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6976) Kafka Streams instances going in to DEAD state

2018-05-31 Thread Deepak Goyal (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Goyal updated KAFKA-6976:

Issue Type: Bug  (was: New Feature)

> Kafka Streams instances going in to DEAD state
> --
>
> Key: KAFKA-6976
> URL: https://issues.apache.org/jira/browse/KAFKA-6976
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Deepak Goyal
>Priority: Blocker
>
> We are using Kafka 0.10.2.0 and Kafka Streams 1.1.0. We have a Kafka cluster of 16 
> machines, and the topic being consumed by Kafka Streams has 256 
> partitions. We spawned 400 instances of the Kafka Streams application. We see 
> that all of the StreamThreads go into the DEAD state.
> {quote}{{[2018-05-25 05:59:29,282] INFO stream-thread 
> [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] State transition 
> from PENDING_SHUTDOWN to DEAD 
> (org.apache.kafka.streams.processor.internals.StreamThread) [2018-05-25 
> 05:59:29,282] INFO stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] 
> State transition from REBALANCING to ERROR 
> (org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] WARN 
> stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] All stream threads 
> have died. The instance will be in error state and should be closed. 
> (org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] INFO 
> stream-thread [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] 
> Shutdown complete 
> (org.apache.kafka.streams.processor.internals.StreamThread)}}
> {quote}
> Please note that when we have only 100 Kafka instances, things work as 
> expected: we see that the instances consume messages from the topic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6976) Kafka Streams instances going in to DEAD state

2018-05-31 Thread Deepak Goyal (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Goyal updated KAFKA-6976:

Priority: Blocker  (was: Critical)

> Kafka Streams instances going in to DEAD state
> --
>
> Key: KAFKA-6976
> URL: https://issues.apache.org/jira/browse/KAFKA-6976
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Deepak Goyal
>Priority: Blocker
>
> We are using Kafka 0.10.2.0 and Kafka Streams 1.1.0. We have a Kafka cluster of 16 
> machines, and the topic being consumed by Kafka Streams has 256 
> partitions. We spawned 400 instances of the Kafka Streams application. We see 
> that all of the StreamThreads go into the DEAD state.
> {quote}{{[2018-05-25 05:59:29,282] INFO stream-thread 
> [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] State transition 
> from PENDING_SHUTDOWN to DEAD 
> (org.apache.kafka.streams.processor.internals.StreamThread) [2018-05-25 
> 05:59:29,282] INFO stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] 
> State transition from REBALANCING to ERROR 
> (org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] WARN 
> stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] All stream threads 
> have died. The instance will be in error state and should be closed. 
> (org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] INFO 
> stream-thread [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] 
> Shutdown complete 
> (org.apache.kafka.streams.processor.internals.StreamThread)}}
> {quote}
> Please note that when we have only 100 Kafka instances, things work as 
> expected: we see that the instances consume messages from the topic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6976) Kafka Streams instances going in to DEAD state

2018-05-31 Thread Deepak Goyal (JIRA)
Deepak Goyal created KAFKA-6976:
---

 Summary: Kafka Streams instances going in to DEAD state
 Key: KAFKA-6976
 URL: https://issues.apache.org/jira/browse/KAFKA-6976
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Reporter: Deepak Goyal


We are using Kafka 0.10.2.0 and Kafka Streams 1.1.0. We have a Kafka cluster of 16 
machines, and the topic being consumed by Kafka Streams has 256 partitions. 
We spawned 400 instances of the Kafka Streams application. We see that all of the 
StreamThreads go into the DEAD state.
{quote}{{[2018-05-25 05:59:29,282] INFO stream-thread 
[ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] State transition 
from PENDING_SHUTDOWN to DEAD 
(org.apache.kafka.streams.processor.internals.StreamThread) [2018-05-25 
05:59:29,282] INFO stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] 
State transition from REBALANCING to ERROR 
(org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] WARN 
stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] All stream threads 
have died. The instance will be in error state and should be closed. 
(org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] INFO 
stream-thread [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] 
Shutdown complete (org.apache.kafka.streams.processor.internals.StreamThread)}}
{quote}

Please note that when we have only 100 Kafka instances, things work as 
expected: we see that the instances consume messages from the topic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6718) Rack Aware Replica Task Assignment for Kafka Streams

2018-03-28 Thread Deepak Goyal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417173#comment-16417173
 ] 

Deepak Goyal commented on KAFKA-6718:
-

Meanwhile, please look at the PR: [https://github.com/apache/kafka/pull/4785] 

> Rack Aware Replica Task Assignment for Kafka Streams
> 
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 1.1.0
>Reporter: Deepak Goyal
>Assignee: Deepak Goyal
>Priority: Major
>  Labels: needs-kip
>
> Machines in a data centre are sometimes grouped in racks. Racks provide 
> isolation, as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, this 
> provides fault tolerance: if a rack goes down, the remaining racks can 
> continue to serve traffic.
>
> This feature is already implemented in Kafka itself 
> ([KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]), 
> but we need something similar for task assignment at the Kafka Streams 
> application layer.
>
> This feature enables stand-by (replica) tasks to be assigned to different 
> racks for fault tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replicas + active)
>  # If no rackId is provided: the cluster behaves rack-unaware.
>  # If the same rackId is given to all nodes: the cluster behaves rack-unaware.
>  # If (totalTasks <= number of racks): the cluster is rack-aware, i.e. each 
> copy of a task is assigned to a different rack.
>  # If (totalTasks > number of racks): tasks are first spread across different 
> racks; any further tasks are assigned to the least-loaded node, cluster-wide.
> We have added another config to StreamsConfig, "RACK_ID_CONFIG", which helps 
> the StickyPartitionAssignor assign tasks so that no two copies of a task end 
> up on the same rack, if possible.
>  Beyond that, it also helps maintain stickiness within a rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6718) Rack Aware Replica Task Assignment for Kafka Streams

2018-03-27 Thread Deepak Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Goyal updated KAFKA-6718:

Description: 
|Machines in data centre are sometimes grouped in racks. Racks provide 
isolation as each rack may be in a different physical location and has its own 
power source. When tasks are properly replicated across racks, it provides 
fault tolerance in that if a rack goes down, the remaining racks can continue 
to serve traffic.
  
 This feature is already implemented at Kafka 
[KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
 but we needed similar for task assignments at Kafka Streams Application layer. 
  
 This features enables replica tasks to be assigned on different racks for 
fault-tolerance.
 NUM_STANDBY_REPLICAS = x
 totalTasks = x+1 (replica + active)
 # If there are no rackID provided: Cluster will behave rack-unaware
 # If same rackId is given to all the nodes: Cluster will behave rack-unaware
 # If (totalTasks >= number of racks), then Cluster will be rack aware i.e. 
each replica task is each assigned to a different rack.
 # Id (totalTasks < number of racks), then it will first assign tasks on 
different racks, further tasks will be assigned to least loaded node, cluster 
wide.|

We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
helps StickyPartitionAssignor to assign tasks in such a way that no two replica 
tasks are on same rack if possible.
 Post that it also helps to maintain stickyness with-in the rack.|

  was:
|Machines in data centre are sometimes grouped in racks. Racks provide 
isolation as each rack may be in a different physical location and has its own 
power source. When tasks are properly replicated across racks, it provides 
fault tolerance in that if a rack goes down, the remaining racks can continue 
to serve traffic.
  
 This feature is already implemented at Kafka 
[KIP-36\|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
 but we needed similar for task assignments at Kafka Streams Application layer. 
  
 This features enables replica tasks to be assigned on different racks for 
fault-tolerance.
 NUM_STANDBY_REPLICAS = x
 totalTasks = x+1 (replica + active)
 # If there are no rackID provided: Cluster will behave rack-unaware
 # If same rackId is given to all the nodes: Cluster will behave rack-unaware
 # If (totalTasks >= number of racks), then Cluster will be rack aware i.e. 
each replica task is each assigned to a different rack.
 # Id (totalTasks < number of racks), then it will first assign tasks on 
different racks, further tasks will be assigned to least loaded node, cluster 
wide.|

We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
helps StickyPartitionAssignor to assign tasks in such a way that no two replica 
tasks are on same rack if possible.
 Post that it also helps to maintain stickyness with-in the rack.|


> Rack Aware Replica Task Assignment for Kafka Streams
> 
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 1.1.0
>Reporter: Deepak Goyal
>Priority: Major
>
> |Machines in data centre are sometimes grouped in racks. Racks provide 
> isolation as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, it 
> provides fault tolerance in that if a rack goes down, the remaining racks can 
> continue to serve traffic.
>   
>  This feature is already implemented at Kafka 
> [KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
>  but we needed similar for task assignments at Kafka Streams Application 
> layer. 
>   
>  This features enables replica tasks to be assigned on different racks for 
> fault-tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If there are no rackID provided: Cluster will behave rack-unaware
>  # If same rackId is given to all the nodes: Cluster will behave rack-unaware
>  # If (totalTasks >= number of racks), then Cluster will be rack aware i.e. 
> each replica task is each assigned to a different rack.
>  # Id (totalTasks < number of racks), then it will first assign tasks on 
> different racks, further tasks will be assigned to least loaded node, cluster 
> wide.|
> We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
> helps StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on same rack if possible.
>  Post that it also helps to maintain stickyness with-in the rack.|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6718) Rack Aware Replica Task Assignment for Kafka Streams

2018-03-27 Thread Deepak Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Goyal updated KAFKA-6718:

Description: 
|Machines in data centre are sometimes grouped in racks. Racks provide 
isolation as each rack may be in a different physical location and has its own 
power source. When tasks are properly replicated across racks, it provides 
fault tolerance in that if a rack goes down, the remaining racks can continue 
to serve traffic.
  
 This feature is already implemented at Kafka 
[KIP-36\|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
 but we needed similar for task assignments at Kafka Streams Application layer. 
  
 This features enables replica tasks to be assigned on different racks for 
fault-tolerance.
 NUM_STANDBY_REPLICAS = x
 totalTasks = x+1 (replica + active)
 # If there are no rackID provided: Cluster will behave rack-unaware
 # If same rackId is given to all the nodes: Cluster will behave rack-unaware
 # If (totalTasks >= number of racks), then Cluster will be rack aware i.e. 
each replica task is each assigned to a different rack.
 # Id (totalTasks < number of racks), then it will first assign tasks on 
different racks, further tasks will be assigned to least loaded node, cluster 
wide.|

We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
helps StickyPartitionAssignor to assign tasks in such a way that no two replica 
tasks are on same rack if possible.
 Post that it also helps to maintain stickyness with-in the rack.|

  was:
|Machines in data centre are sometimes grouped in racks. Racks provide 
isolation as each rack may be in a different physical location and has its own 
power source. When tasks are properly replicated across racks, it provides 
fault tolerance in that if a rack goes down, the remaining racks can continue 
to serve traffic.
 
This feature is already implemented at Kafka but we needed similar for task 
assignments at Kafka Streams Application layer. 
 
This features enables replica tasks to be assigned on different racks for 
fault-tolerance.
NUM_STANDBY_REPLICAS = x
 totalTasks = x+1 (replica + active)
 # If there are no rackID provided: Cluster will behave rack-unaware
 # If same rackId is given to all the nodes: Cluster will behave rack-unaware
 # If (totalTasks >= number of racks), then Cluster will be rack aware i.e. 
each replica task is each assigned to a different rack.
 # Id (totalTasks < number of racks), then it will first assign tasks on 
different racks, further tasks will be assigned to least loaded node, cluster 
wide.

We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
helps StickyPartitionAssignor to assign tasks in such a way that no two replica 
tasks are on same rack if possible.
Post that it also helps to maintain stickyness with-in the rack.|


> Rack Aware Replica Task Assignment for Kafka Streams
> 
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 1.1.0
>Reporter: Deepak Goyal
>Priority: Major
>
> |Machines in data centre are sometimes grouped in racks. Racks provide 
> isolation as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, it 
> provides fault tolerance in that if a rack goes down, the remaining racks can 
> continue to serve traffic.
>   
>  This feature is already implemented at Kafka 
> [KIP-36\|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
>  but we needed similar for task assignments at Kafka Streams Application 
> layer. 
>   
>  This features enables replica tasks to be assigned on different racks for 
> fault-tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If there are no rackID provided: Cluster will behave rack-unaware
>  # If same rackId is given to all the nodes: Cluster will behave rack-unaware
>  # If (totalTasks >= number of racks), then Cluster will be rack aware i.e. 
> each replica task is each assigned to a different rack.
>  # Id (totalTasks < number of racks), then it will first assign tasks on 
> different racks, further tasks will be assigned to least loaded node, cluster 
> wide.|
> We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
> helps StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on same rack if possible.
>  Post that it also helps to maintain stickyness with-in the rack.|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6718) Rack Aware Replica Task Assignment for Kafka Streams

2018-03-27 Thread Deepak Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Goyal updated KAFKA-6718:

Description: 
|Machines in data centre are sometimes grouped in racks. Racks provide 
isolation as each rack may be in a different physical location and has its own 
power source. When tasks are properly replicated across racks, it provides 
fault tolerance in that if a rack goes down, the remaining racks can continue 
to serve traffic.
 
This feature is already implemented at Kafka but we needed similar for task 
assignments at Kafka Streams Application layer. 
 
This features enables replica tasks to be assigned on different racks for 
fault-tolerance.
NUM_STANDBY_REPLICAS = x
 totalTasks = x+1 (replica + active)
 # If there are no rackID provided: Cluster will behave rack-unaware
 # If same rackId is given to all the nodes: Cluster will behave rack-unaware
 # If (totalTasks >= number of racks), then Cluster will be rack aware i.e. 
each replica task is each assigned to a different rack.
 # Id (totalTasks < number of racks), then it will first assign tasks on 
different racks, further tasks will be assigned to least loaded node, cluster 
wide.

We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
helps StickyPartitionAssignor to assign tasks in such a way that no two replica 
tasks are on same rack if possible.
Post that it also helps to maintain stickyness with-in the rack.|

  was:
|This features enables replica tasks to be assigned on different racks.
Replication factor = x
Number of Replica tasks = x
totalTasks = x+1 (replica + active) # If there are no racks provided: Cluster 
will behave rack-unaware
 # If same rackId is given to all the nodes: Cluster will behave rack-unaware
 # If (totalTasks >= number of racks), then Cluster will be rack aware i.e. 
each replica task is each assigned to a different rack.
 # Id (totalTasks < number of racks), then it will first assign tasks on 
different racks, further tasks will be assigned to least loaded node, cluster 
wide.|


> Rack Aware Replica Task Assignment for Kafka Streams
> 
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 1.1.0
>Reporter: Deepak Goyal
>Priority: Major
>
> |Machines in data centre are sometimes grouped in racks. Racks provide 
> isolation as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, it 
> provides fault tolerance in that if a rack goes down, the remaining racks can 
> continue to serve traffic.
>  
> This feature is already implemented at Kafka but we needed similar for task 
> assignments at Kafka Streams Application layer. 
>  
> This features enables replica tasks to be assigned on different racks for 
> fault-tolerance.
> NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If there are no rackID provided: Cluster will behave rack-unaware
>  # If same rackId is given to all the nodes: Cluster will behave rack-unaware
>  # If (totalTasks >= number of racks), then Cluster will be rack aware i.e. 
> each replica task is each assigned to a different rack.
>  # Id (totalTasks < number of racks), then it will first assign tasks on 
> different racks, further tasks will be assigned to least loaded node, cluster 
> wide.
> We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
> helps StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on same rack if possible.
> Post that it also helps to maintain stickyness with-in the rack.|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6718) Rack Aware Replica Task Assignment for Kafka Streams

2018-03-27 Thread Deepak Goyal (JIRA)
Deepak Goyal created KAFKA-6718:
---

 Summary: Rack Aware Replica Task Assignment for Kafka Streams
 Key: KAFKA-6718
 URL: https://issues.apache.org/jira/browse/KAFKA-6718
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Affects Versions: 1.1.0
Reporter: Deepak Goyal


|This features enables replica tasks to be assigned on different racks.
Replication factor = x
Number of Replica tasks = x
totalTasks = x+1 (replica + active) # If there are no racks provided: Cluster 
will behave rack-unaware
 # If same rackId is given to all the nodes: Cluster will behave rack-unaware
 # If (totalTasks >= number of racks), then Cluster will be rack aware i.e. 
each replica task is each assigned to a different rack.
 # Id (totalTasks < number of racks), then it will first assign tasks on 
different racks, further tasks will be assigned to least loaded node, cluster 
wide.|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)