[jira] [Commented] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged

2014-10-16 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174222#comment-14174222
 ] 

Ewen Cheslack-Postava commented on KAFKA-1108:
--

Updated reviewboard https://reviews.apache.org/r/26770/diff/
 against branch origin/trunk

 when controlled shutdown attempt fails, the reason is not always logged
 ---

 Key: KAFKA-1108
 URL: https://issues.apache.org/jira/browse/KAFKA-1108
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Rosenberg
Assignee: Ewen Cheslack-Postava
  Labels: newbie
 Fix For: 0.9.0

 Attachments: KAFKA-1108.patch, KAFKA-1108_2014-10-16_13:53:11.patch


 KafkaServer.controlledShutdown() initiates a controlled shutdown and 
 retries it if the attempt fails.
 Looking at the code, there are two ways an attempt can fail. One is an error 
 response from the controller, which is reported by this logging code:
 {code}
 info("Remaining partitions to move: %s".format(shutdownResponse.partitionsRemaining.mkString(",")))
 info("Error code from controller: %d".format(shutdownResponse.errorCode))
 {code}
 Alternatively, there could be an IOException, with this code executed:
 {code}
 catch {
   case ioe: java.io.IOException =>
     channel.disconnect()
     channel = null
     // ignore and try again
 }
 {code}
 And then finally, in either case:
 {code}
 if (!shutdownSuceeded) {
   Thread.sleep(config.controlledShutdownRetryBackoffMs)
   warn("Retrying controlled shutdown after the previous attempt failed...")
 }
 {code}
 It would be nice if the nature of the IOException were logged in either case 
 (I'd be happy with an ioe.getMessage() instead of a full stack trace, as 
 Kafka in general tends to be too willing to dump IOException stack traces!).
 I suspect, in my case, the actual IOException is a socket timeout, as the 
 time between the initial "Starting controlled shutdown" message and the first 
 "Retrying..." message is usually about 35 seconds (the socket timeout + the 
 controlled shutdown retry backoff). So it would seem that the real issue 
 in this case is that controlled shutdown is taking too long. It would seem 
 sensible instead to have the controller report back to the server (before the 
 socket timeout) that more time is needed, etc.
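
 The logging requested above could look roughly like the following. This is a 
 minimal, self-contained sketch and not the actual KafkaServer code or the 
 attached patch: retryWithLoggedReason, logLines, and the warn stub are 
 hypothetical names introduced only for illustration, and the attempt callback 
 stands in for sending the controlled shutdown request over the channel.

```scala
import java.io.IOException

object ControlledShutdownRetrySketch {
  // Hypothetical stand-in for the broker's logging helper; it collects
  // messages in memory so the behaviour can be inspected.
  val logLines = scala.collection.mutable.ArrayBuffer.empty[String]
  def warn(msg: String): Unit = logLines += ("WARN " + msg)

  /** Runs `attempt` up to `maxRetries` times and returns true on the first success.
    * `attempt` returns true when the controller reports success, false when it
    * reports an error, and may throw an IOException (for example a socket timeout).
    */
  def retryWithLoggedReason(maxRetries: Int, backoffMs: Long)(attempt: () => Boolean): Boolean = {
    var remaining = maxRetries
    var succeeded = false
    while (!succeeded && remaining > 0) {
      remaining -= 1
      try {
        succeeded = attempt()
        if (!succeeded) warn("Controlled shutdown attempt was rejected by the controller")
      } catch {
        case ioe: IOException =>
          // Log the exception class and message only, not a full stack trace.
          warn("Error during controlled shutdown, possibly because the request timed out: %s: %s"
            .format(ioe.getClass.getSimpleName, ioe.getMessage))
      }
      // Unlike the snippet quoted above, this sketch skips the backoff and the
      // "Retrying" message once no retries remain.
      if (!succeeded && remaining > 0) {
        Thread.sleep(backoffMs)
        warn("Retrying controlled shutdown after the previous attempt failed...")
      }
    }
    succeeded
  }
}
```

 With this shape, a socket timeout on the first attempt leaves a warning that 
 names the exception before the "Retrying..." line, so the cause of the retry 
 is visible in the log. The roughly 35 seconds mentioned above would be 
 consistent with a 30 second socket timeout plus a 5 second retry backoff; 
 those two values are an assumption about the configuration in use, not 
 something stated in this report.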



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged

2014-10-15 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172775#comment-14172775
 ] 

Ewen Cheslack-Postava commented on KAFKA-1108:
--

Created reviewboard https://reviews.apache.org/r/26770/diff/
 against branch origin/trunk

 when controlled shutdown attempt fails, the reason is not always logged
 ---

 Key: KAFKA-1108
 URL: https://issues.apache.org/jira/browse/KAFKA-1108
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Rosenberg
  Labels: newbie
 Fix For: 0.9.0

 Attachments: KAFKA-1108.patch




[jira] [Commented] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122062#comment-14122062
 ] 

Guozhang Wang commented on KAFKA-1108:
--

Moving to 0.9 for now.

 when controlled shutdown attempt fails, the reason is not always logged
 ---

 Key: KAFKA-1108
 URL: https://issues.apache.org/jira/browse/KAFKA-1108
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Rosenberg
 Fix For: 0.9.0

