[jira] [Commented] (KAFKA-2561) Optionally support OpenSSL for SSL/TLS

2018-01-13 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325333#comment-16325333
 ] 

Prasanna Gautam commented on KAFKA-2561:


[~ijuma] is this still being planned anytime soon? I'd like to check whether Java 9 
or using Open/Boring/LibreSSL brings any meaningful performance improvements for 
SSL.

> Optionally support OpenSSL for SSL/TLS 
> ---
>
> Key: KAFKA-2561
> URL: https://issues.apache.org/jira/browse/KAFKA-2561
> Project: Kafka
>  Issue Type: New Feature
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Ismael Juma
>
> JDK's `SSLEngine` is unfortunately a bit slow (KAFKA-2431 covers this in more 
> detail). We should consider supporting OpenSSL for SSL/TLS. Initial 
> experiments on my laptop show that it performs a lot better:
> {code}
> start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, 
> nMsg.sec, config
> 2015-09-21 14:41:58:245, 2015-09-21 14:47:02:583, 28610.2295, 94.0081, 
> 3000, 98574.6111, Java 8u60/server auth JDK 
> SSLEngine/TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
> 2015-09-21 14:38:24:526, 2015-09-21 14:40:19:941, 28610.2295, 247.8900, 
> 3000, 259931.5514, Java 8u60/server auth 
> OpenSslEngine/TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
> 2015-09-21 14:49:03:062, 2015-09-21 14:50:27:764, 28610.2295, 337.7751, 
> 3000, 354182.9000, Java 8u60/plaintext
> {code}
> Extracting the throughput figures:
> * JDK SSLEngine: 94 MB/s
> * OpenSSL SSLEngine: 247 MB/s
> * Plaintext: 337 MB/s (code from trunk, so no zero-copy due to KAFKA-2517)
> In order to get these figures, I used Netty's `OpenSslEngine` by hacking 
> `SSLFactory` to use Netty's `SslContextBuilder` and made a few changes to 
> `SSLTransportLayer` in order to workaround differences in behaviour between 
> `OpenSslEngine` and JDK's SSLEngine (filed 
> https://github.com/netty/netty/issues/4235 and 
> https://github.com/netty/netty/issues/4238 upstream).
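
As a sanity check, the MB.sec column in the benchmark output above can be recomputed from the start/end timestamps and the data volume. A quick stdlib-only sketch (the timestamps and 28610.2295 MB figure are taken verbatim from the rows above; the helper name is illustrative):

```python
from datetime import datetime

def throughput_mb_s(start, end, mb):
    # Recompute MB.sec from a benchmark row: data volume / elapsed seconds.
    fmt = "%Y-%m-%d %H:%M:%S:%f"  # note the ':' before the millisecond field
    elapsed = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    return mb / elapsed

jdk = throughput_mb_s("2015-09-21 14:41:58:245", "2015-09-21 14:47:02:583", 28610.2295)
openssl = throughput_mb_s("2015-09-21 14:38:24:526", "2015-09-21 14:40:19:941", 28610.2295)
plaintext = throughput_mb_s("2015-09-21 14:49:03:062", "2015-09-21 14:50:27:764", 28610.2295)
```

The recomputed values match the reported 94.0, 247.9, and 337.8 MB/s, confirming the roughly 2.6x gap between the JDK and OpenSSL engines.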



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6082) consider fencing zookeeper updates with controller epoch zkVersion

2017-12-17 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294391#comment-16294391
 ] 

Prasanna Gautam commented on KAFKA-6082:


[~onurkaraman] Does this require fencing all ZK updates from the controller and 
brokers, or only some subset of the changes?

> consider fencing zookeeper updates with controller epoch zkVersion
> --
>
> Key: KAFKA-6082
> URL: https://issues.apache.org/jira/browse/KAFKA-6082
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Onur Karaman
>
> If we want, we can use multi-op to fence zookeeper updates with the 
> controller epoch's zkVersion.
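
The fencing idea can be illustrated with a toy version-checked store (a stdlib-only simulation; ZooKeeper's actual multi-op API and BadVersion semantics are only mimicked here, and all names are made up):

```python
class FencedStore:
    """Toy znode: every conditional write carries the version it expects,
    mimicking the zkVersion check a ZooKeeper multi-op would perform."""

    def __init__(self):
        self.data = None
        self.version = 0

    def conditional_set(self, value, expected_version):
        # Reject writes from a controller holding a stale epoch zkVersion.
        if expected_version != self.version:
            raise RuntimeError("BadVersion: writer was fenced")
        self.data = value
        self.version += 1
        return self.version

store = FencedStore()
store.conditional_set("state from controller epoch 1", expected_version=0)

# A zombie controller still holding version 0 is now fenced out:
try:
    store.conditional_set("stale write", expected_version=0)
    fenced = False
except RuntimeError:
    fenced = True
```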





[jira] [Commented] (KAFKA-6065) Add zookeeper metrics to ZookeeperClient as in KIP-188

2017-11-25 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265758#comment-16265758
 ] 

Prasanna Gautam commented on KAFKA-6065:


Not at all. Thanks. 




> Add zookeeper metrics to ZookeeperClient as in KIP-188
> --
>
> Key: KAFKA-6065
> URL: https://issues.apache.org/jira/browse/KAFKA-6065
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Onur Karaman
>Assignee: Prasanna Gautam
> Fix For: 1.1.0
>
>
> Among other things, KIP-188 added latency metrics to ZkUtils. We should add 
> the same metrics to ZookeeperClient.
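
A minimal sketch of what per-request latency recording could look like (the class and method names here are illustrative, not the actual KIP-188 or ZookeeperClient API):

```python
import time
from collections import defaultdict

class LatencyRecorder:
    """Record wall-clock latency per request type, in the spirit of KIP-188."""

    def __init__(self):
        self.samples_ms = defaultdict(list)

    def timed(self, request_type, fn, *args, **kwargs):
        # Time fn and record the elapsed milliseconds under request_type,
        # whether or not the call raises.
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples_ms[request_type].append(elapsed_ms)

    def avg_ms(self, request_type):
        samples = self.samples_ms[request_type]
        return sum(samples) / len(samples)

recorder = LatencyRecorder()
result = recorder.timed("GetData", lambda path: len(path), "/brokers/ids/0")
```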





[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-11-14 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252192#comment-16252192
 ] 

Prasanna Gautam commented on KAFKA-5473:


[~junrao] [~ijuma] I have updated the PR without the config. I don't quite 
follow what you mean by adding it in kafkaController.newSession(), because I 
can't find that in KafkaController anymore. I'm currently running the startup 
function when the state callback stops returning SessionEstablishmentError and 
it's not already in a startingUp state. Did you mean a different way to run it?
Also, I have been getting ducktape errors on TravisCI for the tests. I assume 
this requires some ducktape tests too?

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
> Fix For: 1.1.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-11-10 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247927#comment-16247927
 ] 

Prasanna Gautam commented on KAFKA-5473:


[~junrao] It looks like I can get it done this weekend. I think the KafkaHealth 
check just needs to log errors, and the reconnects can all be handled in the new 
ZKClient.

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
> Fix For: 1.1.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-11-06 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240979#comment-16240979
 ] 

Prasanna Gautam commented on KAFKA-5473:


Ok, yeah. I ran into the same (or a related) issue this weekend on a 12-node 
cluster, so I can continue on this. Are you still planning to expose a metric 
indicating that the zookeeper node is in a reconnecting state?


> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
> Fix For: 1.1.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Assigned] (KAFKA-6065) Add zookeeper metrics to ZookeeperClient as in KIP-188

2017-10-27 Thread Prasanna Gautam (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Gautam reassigned KAFKA-6065:
--

Assignee: Prasanna Gautam

> Add zookeeper metrics to ZookeeperClient as in KIP-188
> --
>
> Key: KAFKA-6065
> URL: https://issues.apache.org/jira/browse/KAFKA-6065
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Onur Karaman
>Assignee: Prasanna Gautam
> Fix For: 1.1.0
>
>
> Among other things, KIP-188 added latency metrics to ZkUtils. We should add 
> the same metrics to ZookeeperClient.





[jira] [Comment Edited] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-10-27 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223232#comment-16223232
 ] 

Prasanna Gautam edited comment on KAFKA-5473 at 10/28/17 4:43 AM:
--

Thanks [~junrao], looks like a good start. 
-Is there a plan for a way to get notified via a metric or state change when 
Kafka gets into this state? I think it would be useful to know how often the 
cluster is getting into that state and to trigger alerts.-
I missed the ZKSessionState being set to RECONNECTING on first read.


was (Author: prasincs):
Thanks [~junrao], looks like a good start.- Is there plan for a way to get 
notified via a metric or state change when kafka gets in this state? I think it 
would be useful to know how often the cluster is getting in that state and 
trigger alerts. - I missed the ZKSessionState being set to RECONNECTING on 
first read.

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
> Fix For: 1.1.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Comment Edited] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-10-27 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223232#comment-16223232
 ] 

Prasanna Gautam edited comment on KAFKA-5473 at 10/28/17 4:43 AM:
--

Thanks [~junrao], looks like a good start. 


was (Author: prasincs):
Thanks [~junrao], looks like a good start. 
-Is there plan for a way to get notified via a metric or state change when 
kafka gets in this state? I think it would be useful to know how often the 
cluster is getting in that state and trigger alerts. -
I missed the ZKSessionState being set to RECONNECTING on first read.

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
> Fix For: 1.1.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Comment Edited] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-10-27 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223232#comment-16223232
 ] 

Prasanna Gautam edited comment on KAFKA-5473 at 10/28/17 4:42 AM:
--

Thanks [~junrao], looks like a good start.- Is there plan for a way to get 
notified via a metric or state change when kafka gets in this state? I think it 
would be useful to know how often the cluster is getting in that state and 
trigger alerts. - I missed the ZKSessionState being set to RECONNECTING on 
first read.


was (Author: prasincs):
Thanks [~junrao], looks like a good start. Is there plan for a way to get 
notified via a metric or state change when kafka gets in this state? I think it 
would be useful to know how often the cluster is getting in that state and 
trigger alerts. 

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
> Fix For: 1.1.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-10-27 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223232#comment-16223232
 ] 

Prasanna Gautam commented on KAFKA-5473:


Thanks [~junrao], looks like a good start. Is there a plan for a way to get 
notified via a metric or state change when Kafka gets into this state? I think 
it would be useful to know how often the cluster is getting into that state and 
to trigger alerts. 

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
> Fix For: 1.1.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-09-29 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185434#comment-16185434
 ] 

Prasanna Gautam commented on KAFKA-5473:


[~ijuma] I added a new configuration that's consistent with what [~junrao] was 
mentioning previously. I have added zookeeper.connection.retry.timeout.ms to 
set an upper bound on how long to wait before killing the connection and 
triggering the shutdown. This is looking like a bigger structural change than 
I'd originally anticipated, so I want to make sure I'm on the right track. Since 
ZkUtils is initialized and needs to be closed/reconnected in the ZKServer 
object, does it make sense to pass the connection state to the KafkaServer so 
that the timeout can be guaranteed and the services cleanly shut down?
This is different from other examples in the codebase where ZK is used to share 
state, but since this involves ZK itself not being available, we need a 
different mechanism to inform KafkaServer that it needs to start the reconnect 
and then use the ZkUtils instance thereafter; if the reconnect retry timeout has 
been reached, it should start the shutdown process. The IZkStateListener is used 
in multiple places in the code, and I think it's easier to add another class, 
something like ZKSessionTimeoutRecovery, that only handles reconnects and exits 
cleanly if that fails. 
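
The proposed behaviour can be sketched as a small deadline watchdog. This is a hypothetical sketch: zookeeper.connection.retry.timeout.ms is the config proposed in this comment, and every class and method name below is made up for illustration:

```python
import time

class ReconnectWatchdog:
    """Track how long we have been without a ZK session; once the retry
    timeout elapses, signal the broker to begin a clean shutdown."""

    def __init__(self, retry_timeout_ms, clock=time.monotonic):
        self.timeout_s = retry_timeout_ms / 1000.0
        self.clock = clock
        self.deadline = None  # None means we currently have a session

    def on_disconnected(self):
        # Start the clock on the first disconnect; don't reset it on repeats.
        if self.deadline is None:
            self.deadline = self.clock() + self.timeout_s

    def on_connected(self):
        self.deadline = None

    def should_shut_down(self):
        return self.deadline is not None and self.clock() >= self.deadline

# Deterministic walkthrough with a fake clock:
now = [0.0]
wd = ReconnectWatchdog(retry_timeout_ms=5000, clock=lambda: now[0])
wd.on_disconnected()
now[0] = 4.0
early = wd.should_shut_down()   # still within the retry window
now[0] = 5.0
late = wd.should_shut_down()    # timeout reached: begin clean shutdown
```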

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
> Fix For: 1.0.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-09-28 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185144#comment-16185144
 ] 

Prasanna Gautam commented on KAFKA-5473:


Yeah, I'm OK if someone else picks it up too. I'm aiming for a PR sometime 
tomorrow or over the weekend. I'll be happy to review, test, and help in any 
way I can.

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
> Fix For: 1.0.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-09-28 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184653#comment-16184653
 ] 

Prasanna Gautam commented on KAFKA-5473:


I think I can make it in time for the code freeze. I'm at a conference this week 
and it's a bit of a hassle to get the environment properly set up on the machine 
I have here. Is there an easy way to bootstrap the environment for testing? I'd 
like to reuse anything that's already been done for that.

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
> Fix For: 1.0.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-09-28 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184314#comment-16184314
 ] 

Prasanna Gautam commented on KAFKA-5473:


[~ijuma] Yes, I intend to send a PR for this. I need to resume this work and test it. 

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
> Fix For: 1.0.1
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Updated] (KAFKA-5628) Kafka Startup fails on corrupted index files

2017-07-23 Thread Prasanna Gautam (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Gautam updated KAFKA-5628:
---
   Priority: Minor  (was: Major)
Description: 
One of our Kafka brokers shut down after a load test and, while there are some 
corrupted index files, the broker is failing to start with an unsafe memory 
access error:


{code:java}
[2017-07-23 15:52:32,019] FATAL Fatal error during KafkaServerStartable 
startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
java.lang.InternalError: a fault occurred in a recent unsafe memory access 
operation in compiled Java code
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:53)
at org.apache.kafka.common.utils.Utils.readFully(Utils.java:854)
at org.apache.kafka.common.utils.Utils.readFullyOrFail(Utils.java:827)
at 
org.apache.kafka.common.record.FileLogInputStream$FileChannelLogEntry.loadRecord(FileLogInputStream.java:136)
at 
org.apache.kafka.common.record.FileLogInputStream$FileChannelLogEntry.record(FileLogInputStream.java:149)
at kafka.log.LogSegment$$anonfun$recover$1.apply(LogSegment.scala:225)
at kafka.log.LogSegment$$anonfun$recover$1.apply(LogSegment.scala:224)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.log.LogSegment.recover(LogSegment.scala:224)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:231)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at kafka.log.Log.loadSegments(Log.scala:188)
at kafka.log.Log.(Log.scala:116)
at 
kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:157)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

This doesn't seem to be the same as 
https://issues.apache.org/jira/browse/KAFKA-1554 because these topics are 
actively in use and the other empty indices are recovered fine.

It seems the machine had died because the disk was full, and the problem 
resolved itself after the disk issue was fixed. Should Kafka just check the 
disk at startup and refuse to continue starting up if it's full? 
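
A startup check along those lines could be as simple as the following sketch (hedged: the function name, threshold, and example directory are all made up for illustration, and this is not an actual Kafka config):

```python
import shutil

def has_free_space(log_dir, min_free_bytes):
    """Return True if the filesystem holding log_dir has at least
    min_free_bytes free, so a broker could refuse to start otherwise."""
    return shutil.disk_usage(log_dir).free >= min_free_bytes

# At broker startup one might do something like:
# if not has_free_space("/var/lib/kafka", 1 << 30):
#     raise SystemExit("log dir filesystem is (nearly) full; refusing to start")
```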

  was:
One of our Kafka brokers shut down after a load test and, while there are some 
corrupted index files, the broker is failing to start with an unsafe memory 
access error:


{code:java}
[2017-07-23 15:52:32,019] FATAL Fatal error during KafkaServerStartable 
startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
java.lang.InternalError: a fault occurred in a recent unsafe memory access 
operation in compiled Java code
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:53)
at org.apache.kafka.common.utils.Utils.readFully(Utils.java:854)
at org.apache.kafka.common.utils.Utils.readFullyOrFail(Utils.java:827)
at 
org.apache.kafka.common.record.FileLogInputStream$FileChannelLogEntry.loadRecord(FileLogInputStream.java:136)
at 
org.apache.kafka.common.record.FileLogInputStream$FileChannelLogEntry.record(FileLogInputStream.java:149)
at kafka.log.LogSegment$$anonfun$recover$1.apply(LogSegment.scala:225)
at kafka.log.LogSegment$$anonfun$recover$1.apply(LogSegment.scala:224)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.log.LogSegment.recover(LogSegment.scala:224)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:231)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at 

[jira] [Assigned] (KAFKA-5628) Kafka Startup fails on corrupted index files

2017-07-23 Thread Prasanna Gautam (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Gautam reassigned KAFKA-5628:
--

 Assignee: Jun Rao
Affects Version/s: 0.10.2.0
  Environment: Ubuntu 14.04, Java 8(1.8.0_65)
  Description: 
One of our Kafka brokers shut down after a load test and, while there are some 
corrupted index files, the broker is failing to start with an unsafe memory 
access error:


{code:java}
[2017-07-23 15:52:32,019] FATAL Fatal error during KafkaServerStartable 
startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
java.lang.InternalError: a fault occurred in a recent unsafe memory access 
operation in compiled Java code
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:53)
at org.apache.kafka.common.utils.Utils.readFully(Utils.java:854)
at org.apache.kafka.common.utils.Utils.readFullyOrFail(Utils.java:827)
at 
org.apache.kafka.common.record.FileLogInputStream$FileChannelLogEntry.loadRecord(FileLogInputStream.java:136)
at 
org.apache.kafka.common.record.FileLogInputStream$FileChannelLogEntry.record(FileLogInputStream.java:149)
at kafka.log.LogSegment$$anonfun$recover$1.apply(LogSegment.scala:225)
at kafka.log.LogSegment$$anonfun$recover$1.apply(LogSegment.scala:224)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.log.LogSegment.recover(LogSegment.scala:224)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:231)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at kafka.log.Log.loadSegments(Log.scala:188)
at kafka.log.Log.(Log.scala:116)
at 
kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:157)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

This doesn't seem to be the same as 
https://issues.apache.org/jira/browse/KAFKA-1554 because these topics are 
actively in use and the other empty indices are recovered fine.

It seems the machine had died because the disk was full.


> Kafka Startup fails on corrupted index files
> 
>
> Key: KAFKA-5628
> URL: https://issues.apache.org/jira/browse/KAFKA-5628
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0
> Environment: Ubuntu 14.04, Java 8(1.8.0_65)
>Reporter: Prasanna Gautam
>Assignee: Jun Rao
>
> One of our Kafka brokers shut down after a load test and, while there are some 
> corrupted index files, the broker is failing to start with an unsafe memory 
> access error:
> {code:java}
> [2017-07-23 15:52:32,019] FATAL Fatal error during KafkaServerStartable 
> startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
> java.lang.InternalError: a fault occurred in a recent unsafe memory access 
> operation in compiled Java code
> at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:53)
> at org.apache.kafka.common.utils.Utils.readFully(Utils.java:854)
> at org.apache.kafka.common.utils.Utils.readFullyOrFail(Utils.java:827)
> at 
> org.apache.kafka.common.record.FileLogInputStream$FileChannelLogEntry.loadRecord(FileLogInputStream.java:136)
> at 
> org.apache.kafka.common.record.FileLogInputStream$FileChannelLogEntry.record(FileLogInputStream.java:149)
> at kafka.log.LogSegment$$anonfun$recover$1.apply(LogSegment.scala:225)
> at kafka.log.LogSegment$$anonfun$recover$1.apply(LogSegment.scala:224)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at 

[jira] [Created] (KAFKA-5628) Kafka Startup fails on corrupted index files

2017-07-23 Thread Prasanna Gautam (JIRA)
Prasanna Gautam created KAFKA-5628:
--

 Summary: Kafka Startup fails on corrupted index files
 Key: KAFKA-5628
 URL: https://issues.apache.org/jira/browse/KAFKA-5628
 Project: Kafka
  Issue Type: Bug
Reporter: Prasanna Gautam








[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-06-23 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061610#comment-16061610
 ] 

Prasanna Gautam commented on KAFKA-5473:


[~junrao] Why not do an exponential backoff (with jitter) with an upper bound? 
If you're temporarily disconnected, it should recover within a few seconds; 
otherwise, an upper bound before the broker dies feels like a more sensible 
solution. That way, ZK nodes being network-partitioned from Kafka wouldn't 
immediately bring down all brokers if it's a recoverable issue.
Also, if there's a cleaner way to exit that allows all writes to be synced to 
disk, that seems preferable to System.exit() too.
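
The suggested policy, exponential backoff with jitter capped per attempt plus an upper bound on the total wait before giving up, can be sketched as follows (parameter names and default values are illustrative, not anything from the Kafka codebase):

```python
import random

def backoff_delays(base_s=0.5, cap_s=30.0, max_total_s=300.0, rng=None):
    """Yield 'full jitter' retry delays: each sleep is uniform in
    [0, min(cap_s, base_s * 2**attempt)]. The generator stops once the
    cumulative wait reaches max_total_s, at which point the broker
    should give up and shut down cleanly."""
    rng = rng or random.Random()
    total, attempt = 0.0, 0
    while total < max_total_s:
        delay = rng.uniform(0.0, min(cap_s, base_s * 2 ** attempt))
        total += delay
        attempt += 1
        yield delay

# A reconnect loop would sleep(delay) between attempts; seeded for determinism:
delays = list(backoff_delays(rng=random.Random(42)))
```

Full jitter keeps brokers from retrying in lockstep after a shared ZK outage, while the per-attempt cap and total bound give the predictable worst-case behaviour argued for above.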

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Assigned] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-06-23 Thread Prasanna Gautam (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Gautam reassigned KAFKA-5473:
--

Assignee: Prasanna Gautam

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-06-22 Thread Prasanna Gautam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060243#comment-16060243
 ] 

Prasanna Gautam commented on KAFKA-5473:


I don't think failing the broker immediately is the right solution, even though 
that's what we're effectively doing now. I think the state should be handled 
automatically, and only if it cannot be handled should the broker fail. 
[~junrao] If there are no plans to assign this to someone in the near future, I 
can take a stab at it. 

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.


