[jira] [Updated] (KAFKA-1054) Eliminate Compilation Warnings for 0.8 Final Release

2014-09-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof SzafraƄski updated KAFKA-1054:

Summary: Eliminate Compilation Warnings for 0.8 Final Release  (was: 
Elimiate Compilation Warnings for 0.8 Final Release)

 Eliminate Compilation Warnings for 0.8 Final Release
 

 Key: KAFKA-1054
 URL: https://issues.apache.org/jira/browse/KAFKA-1054
 Project: Kafka
  Issue Type: Improvement
Reporter: Guozhang Wang
  Labels: newbie
 Fix For: 0.9.0

 Attachments: KAFKA-1054.patch


 Currently we have a total of 38 warnings for the source code compilation 
 of 0.8:
 1) 3 from Unchecked type pattern
 2) 6 from Unchecked conversion
 3) 29 from Deprecated Hadoop API functions
 It would be better to finish these before the final release of 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1633) Data loss if broker is killed

2014-09-20 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142124#comment-14142124
 ] 

Jun Rao commented on KAFKA-1633:


If both brokers are shut down normally, there shouldn't be any data loss since 
during the normal shutdown, we force unflushed data to disk.

The hard killing case is probably a bit different. Depending on the OS and file 
system, unflushed data could be lost during a hard kill. If that's the case, 
when both brokers are hard killed, some previously acked messages (but not 
flushed yet) could be lost.
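For readers following the thread, here is a minimal sketch of the old Scala producer configured for stronger durability; the broker addresses and topic name are placeholders, and request.required.acks=-1 only narrows the window Jun describes, since acknowledged messages can still sit unflushed in the page cache when every replica is hard killed.

{code}
import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

// Sketch only: placeholder brokers/topic; shows the 0.8 producer knobs
// discussed in this thread (acks and retries), not a fix for the fsync window.
object DurableProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("metadata.broker.list", "broker1:9092,broker2:9092") // placeholder
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    props.put("request.required.acks", "-1")   // wait for all in-sync replicas
    props.put("message.send.max.retries", "3") // the default retry count

    val producer = new Producer[String, String](new ProducerConfig(props))
    producer.send(new KeyedMessage[String, String]("test-topic", "key", "value"))
    producer.close()
  }
}
{code}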

 Data loss if broker is killed
 -

 Key: KAFKA-1633
 URL: https://issues.apache.org/jira/browse/KAFKA-1633
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.1.1
 Environment: centos 6.3, open jdk 7
Reporter: gautham varada
Assignee: Jun Rao

 We have a 2 node kafka cluster, we experienced data loss when we did a kill 
 -9 on the brokers.  We also found a work around to prevent this loss.
 Replication factor :2, 4 partitions
 Steps to reproduce
 1. Create a 2 node cluster with replication factor 2, num partitions 4
 2. We used Jmeter to pump events
 3. We used kafka web console to inspect the log size after the test
 During the test we simultaneously killed the brokers using kill -9. When we 
 tallied the metrics reported by Jmeter against the sizes we observed in the web 
 console, we had lost tons of messages.
 We went back, set the Producer retry to 1 instead of the default 3, and 
 repeated the above tests; we did not lose a single message.
 We repeated the above tests with the Producer retry set to 3 and 1 against a 
 single broker and observed data loss when the retry was 3 and no loss when 
 the retry was 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25703: Patch for KAFKA-1490

2014-09-20 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25703/#review54082
---


Thanks for the patch. Could you update the README to explain any extra step 
that's needed before using gradlew?

- Jun Rao


On Sept. 16, 2014, 6:54 p.m., Ivan Lyutov wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25703/
 ---
 
 (Updated Sept. 16, 2014, 6:54 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1490
 https://issues.apache.org/jira/browse/KAFKA-1490
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Removed the generated wrapper contents and added a default task to generate the 
 gradle wrapper binary.
 
 
 remove gradlew initial setup output from source distribution
 
 
 Diffs
 -
 
   build.gradle 74c8c8a9e2f4d9a651181d5337d5a8f07f0cb313 
   gradle/wrapper/gradle-wrapper.jar a7634b071cb255e91a4572934e55b8cd8877b3e4 
   gradle/wrapper/gradle-wrapper.properties 
 d238df326fec6d925ccecdecaac0bcedf8e68672 
   wrapper.gradle PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25703/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ivan Lyutov
 




[jira] [Resolved] (KAFKA-1634) Update protocol wiki to reflect the new offset management feature

2014-09-20 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-1634.

Resolution: Fixed

The wiki is actually consistent with the code in trunk. The timestamp is used 
to decide whether some old consumer offsets can be garbage collected. 
Typically, the timestamp should just be the time when the offset is committed 
(and can be obtained just on the broker side). However, the idea is that a 
client, if needed, can set a larger timestamp if it wants to preserve the 
offset for longer. It's not clear to me if this is super useful. We can debate 
that in a separate jira if needed.
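A tiny illustration of the retention rule described above (the names are hypothetical, not the broker's actual classes): an offset entry expires once its timestamp falls outside the retention window, so committing a larger timestamp keeps it around longer.

{code}
// Hypothetical names, for illustration only.
case class OffsetEntry(offset: Long, metadata: String, timestamp: Long)

// An entry can be garbage collected once its (possibly client-supplied)
// timestamp is older than the configured retention window.
def isExpired(entry: OffsetEntry, retentionMs: Long, nowMs: Long): Boolean =
  nowMs - entry.timestamp > retentionMs

// A client that wants its offset preserved longer can commit a timestamp in
// the future, which pushes back the point at which isExpired becomes true.
{code}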

 Update protocol wiki to reflect the new offset management feature
 -

 Key: KAFKA-1634
 URL: https://issues.apache.org/jira/browse/KAFKA-1634
 Project: Kafka
  Issue Type: Bug
Reporter: Neha Narkhede
Assignee: Joel Koshy
Priority: Blocker
 Fix For: 0.8.2


 From the mailing list -
 following up on this -- I think the online API docs for OffsetCommitRequest
 still incorrectly refer to client-side timestamps:
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest
 Wasn't that removed, and isn't it now always handled server-side?  Would one of
 the devs mind updating the API spec wiki?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option

2014-09-20 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142132#comment-14142132
 ] 

Jun Rao commented on KAFKA-1493:


Since this is a blocker for 0.8.2, if we can't get this fixed in the next few 
days, I suggest that we just remove the documentation in producerConfig about 
LZ4 and leave LZ4 as an unsupported compression codec for now.

 Use a well-documented LZ4 compression format and remove redundant LZ4HC option
 --

 Key: KAFKA-1493
 URL: https://issues.apache.org/jira/browse/KAFKA-1493
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.2
Reporter: James Oliver
Assignee: James Oliver
Priority: Blocker
 Fix For: 0.8.2






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1054) Eliminate Compilation Warnings for 0.8 Final Release

2014-09-20 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1054:
-
Reviewer: Neha Narkhede

 Eliminate Compilation Warnings for 0.8 Final Release
 

 Key: KAFKA-1054
 URL: https://issues.apache.org/jira/browse/KAFKA-1054
 Project: Kafka
  Issue Type: Improvement
Reporter: Guozhang Wang
  Labels: newbie
 Fix For: 0.9.0

 Attachments: KAFKA-1054.patch


 Currently we have a total of 38 warnings for the source code compilation 
 of 0.8:
 1) 3 from Unchecked type pattern
 2) 6 from Unchecked conversion
 3) 29 from Deprecated Hadoop API functions
 It would be better to finish these before the final release of 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KAFKA-1634) Update protocol wiki to reflect the new offset management feature

2014-09-20 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede reopened KAFKA-1634:
--
  Assignee: Jun Rao  (was: Joel Koshy)

Reopening until we have filed a JIRA to fix the issue.

 Update protocol wiki to reflect the new offset management feature
 -

 Key: KAFKA-1634
 URL: https://issues.apache.org/jira/browse/KAFKA-1634
 Project: Kafka
  Issue Type: Bug
Reporter: Neha Narkhede
Assignee: Jun Rao
Priority: Blocker
 Fix For: 0.8.2


 From the mailing list -
 following up on this -- I think the online API docs for OffsetCommitRequest
 still incorrectly refer to client-side timestamps:
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest
 Wasn't that removed, and isn't it now always handled server-side?  Would one of
 the devs mind updating the API spec wiki?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker

2014-09-20 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142168#comment-14142168
 ] 

Neha Narkhede commented on KAFKA-1618:
--

Thanks for the patch, [~balaji.sesha...@dish.com]. A few review comments -
ConsoleProducer
1. The variable regex is not used on line 222. 
2. In the validate() API, shouldn't the regex be val regex = new 
Regex(":[0-9].*") instead?
3. Since the PortValidator is common to all tools, it is worth moving 
validateAndDie to a new class, ToolsUtils. I'm sure we can refactor the tools to 
move some commonly usable APIs there.
4. Let's rename validateAndDie to validatePortOrDie. When you create 
ToolsUtils, let's only move validatePortOrDie there and move the logic from 
validate to validatePortOrDie, since validate is only used by that API. (A rough 
sketch of this refactoring follows below.)

Throughout the patch-
1. Please remove the whitespace introduced for the imports. Let's keep things 
consistent across the codebase
2. Please get rid of the whitespace between if and (. Also not consistent with 
the rest of the code. 
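A rough sketch of the refactoring suggested in points 2-4 above; the object name, method shape, and exact regex are illustrative, not the committed patch:

{code}
import scala.util.matching.Regex

object ToolsUtils {
  // Exit with an error unless every entry in a comma-separated broker list
  // carries an explicit :port suffix, e.g. host1:9092,host2:9092.
  def validatePortOrDie(brokerList: String): Unit = {
    val portPattern: Regex = new Regex(":[0-9]+$")
    val invalid = brokerList.split(",").map(_.trim)
      .filter(hp => portPattern.findFirstIn(hp).isEmpty)
    if (invalid.nonEmpty) {
      System.err.println("Missing port in broker list entries: " + invalid.mkString(","))
      sys.exit(1)
    }
  }
}
{code}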

 Exception thrown when running console producer with no port number for the 
 broker
 -

 Key: KAFKA-1618
 URL: https://issues.apache.org/jira/browse/KAFKA-1618
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.1.1
Reporter: Gwen Shapira
Assignee: Balaji Seshadri
  Labels: newbie
 Fix For: 0.8.2

 Attachments: KAFKA-1618-ALL.patch, KAFKA-1618.patch


 When running console producer with just localhost as the broker list, I get 
 ArrayIndexOutOfBounds exception.
 I expect either a clearer error about arguments or for the producer to 
 guess a default port.
 [root@shapira-1 bin]# ./kafka-console-producer.sh  --topic rufus1 
 --broker-list localhost
 java.lang.ArrayIndexOutOfBoundsException: 1
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
   at 
 kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
   at 
 kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
   at kafka.producer.Producer.<init>(Producer.scala:59)
   at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
   at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group

2014-09-20 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1637:
-
Assignee: (was: Neha Narkhede)

 SimpleConsumer.fetchOffset returns wrong error code when no offset exists for 
 topic/partition/consumer group
 

 Key: KAFKA-1637
 URL: https://issues.apache.org/jira/browse/KAFKA-1637
 Project: Kafka
  Issue Type: Bug
  Components: consumer, core
Affects Versions: 0.8.1, 0.8.1.1
 Environment: Linux
Reporter: Amir Malekpour
  Labels: newbie

 This concerns Kafka's Offset Fetch API:
 According to Kafka's current documentation, if there is no offset associated 
 with a topic-partition under that consumer group, the broker does not set an 
 error code (since it is not really an error), but returns empty metadata and 
 sets the offset field to -1. (Link below.)
 However, in Kafka 0.8.1.1 error code '3' is returned, which effectively makes 
 it impossible for the client to decide whether there was an error or whether 
 there is no offset associated with that topic-partition under that consumer group.
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI
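A hedged sketch of the ambiguity from a client's point of view; the helper and its parameters are illustrative, only the ErrorMapping codes are the real constants:

{code}
import kafka.common.ErrorMapping

// Illustrative helper: with the documented behaviour, "no offset yet" comes
// back as NoError plus offset -1; with the 0.8.1.1 behaviour described above,
// error code 3 is returned instead, so it cannot be told apart from a real
// UnknownTopicOrPartition error.
def interpretOffsetFetch(errorCode: Short, offset: Long): String =
  if (errorCode == ErrorMapping.NoError && offset == -1L)
    "no offset committed yet (documented behaviour)"
  else if (errorCode == ErrorMapping.UnknownTopicOrPartitionCode)
    "ambiguous in 0.8.1.1: real error, or simply no committed offset"
  else if (errorCode != ErrorMapping.NoError)
    "genuine error: " + errorCode
  else
    "committed offset: " + offset
{code}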



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group

2014-09-20 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1637:
-
Labels: newbie  (was: )

 SimpleConsumer.fetchOffset returns wrong error code when no offset exists for 
 topic/partition/consumer group
 

 Key: KAFKA-1637
 URL: https://issues.apache.org/jira/browse/KAFKA-1637
 Project: Kafka
  Issue Type: Bug
  Components: consumer, core
Affects Versions: 0.8.1, 0.8.1.1
 Environment: Linux
Reporter: Amir Malekpour
  Labels: newbie

 This concerns Kafka's Offset Fetch API:
 According to Kafka's current documentation, if there is no offset associated 
 with a topic-partition under that consumer group, the broker does not set an 
 error code (since it is not really an error), but returns empty metadata and 
 sets the offset field to -1. (Link below.)
 However, in Kafka 0.8.1.1 error code '3' is returned, which effectively makes 
 it impossible for the client to decide whether there was an error or whether 
 there is no offset associated with that topic-partition under that consumer group.
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1476) Get a list of consumer groups

2014-09-20 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142170#comment-14142170
 ] 

Neha Narkhede commented on KAFKA-1476:
--

[~balaji.sesha...@dish.com], for #1 and #2, please read my previous comment. For 
#3, for reading offsets we need:
--offsets --describe --group groupName

Let's file another JIRA for the reset-offset tool, as that requires more thought and 
that tool doesn't even exist today.



 Get a list of consumer groups
 -

 Key: KAFKA-1476
 URL: https://issues.apache.org/jira/browse/KAFKA-1476
 Project: Kafka
  Issue Type: Wish
  Components: tools
Affects Versions: 0.8.1.1
Reporter: Ryan Williams
Assignee: Balaji Seshadri
  Labels: newbie
 Fix For: 0.9.0

 Attachments: KAFKA-1476-RENAME.patch, KAFKA-1476.patch


 It would be useful to have a way to get a list of consumer groups currently 
 active via some tool/script that ships with kafka. This would be helpful so 
 that the system tools can be explored more easily.
 For example, when running the ConsumerOffsetChecker, it requires a group 
 option
 bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group 
 ?
 But, when just getting started with kafka, using the console producer and 
 consumer, it is not clear what value to use for the group option.  If a list 
 of consumer groups could be listed, then it would be clear what value to use.
 Background:
 http://mail-archives.apache.org/mod_mbox/kafka-users/201405.mbox/%3cCAOq_b1w=slze5jrnakxvak0gu9ctdkpazak1g4dygvqzbsg...@mail.gmail.com%3e



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-328) Write unit test for kafka server startup and shutdown API

2014-09-20 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142171#comment-14142171
 ] 

Neha Narkhede commented on KAFKA-328:
-

Ping [~balaji.sesha...@dish.com]

 Write unit test for kafka server startup and shutdown API 
 --

 Key: KAFKA-328
 URL: https://issues.apache.org/jira/browse/KAFKA-328
 Project: Kafka
  Issue Type: Bug
Reporter: Neha Narkhede
Assignee: Balaji Seshadri
  Labels: newbie

 Background discussion in KAFKA-320
 People often try to embed KafkaServer in an application that ends up calling 
 startup() and shutdown() repeatedly and sometimes in odd ways. To ensure this 
 works correctly we have to be very careful about cleaning up resources. This 
 is a good practice for making unit tests reliable anyway.
 A good first step would be to add some unit tests on startup and shutdown to 
 cover various cases (a rough sketch follows below):
 1. A Kafka server can start up if it is not already starting up, is not 
 currently being shut down, and has not already been started.
 2. A Kafka server can shut down if it is not already shutting down, is not 
 currently starting up, and has not already been shut down. 
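A rough sketch of what such tests could look like; the createServer helper is a placeholder, and the real tests would build a KafkaConfig via the existing test utilities:

{code}
import org.junit.Test
import kafka.server.KafkaServer

class ServerStartupShutdownTest {
  // Placeholder: the real test would construct a KafkaServer from a
  // TestUtils-generated config.
  private def createServer(): KafkaServer = ???

  @Test
  def testRepeatedStartupAndShutdown(): Unit = {
    val server = createServer()
    server.startup()
    server.shutdown()
    // A second cycle must not leak sockets, threads, or log directory locks.
    server.startup()
    server.shutdown()
  }

  @Test
  def testShutdownBeforeStartupIsSafe(): Unit = {
    val server = createServer()
    server.shutdown()   // should be a harmless no-op, not an exception
  }
}
{code}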



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-20 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142174#comment-14142174
 ] 

Neha Narkhede commented on KAFKA-1282:
--

+1 on your latest patch. I'm leaning towards accepting the patch since the test 
above points to an issue that seems unrelated to the patch. [~nmarasoi], it 
will be great if you can follow Jun's suggestion to reproduce the issue. Then 
file a JIRA to track it. I'm guessing killing idle connections shouldn't lead 
to data loss.

 Disconnect idle socket connection in Selector
 -

 Key: KAFKA-1282
 URL: https://issues.apache.org/jira/browse/KAFKA-1282
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2
Reporter: Jun Rao
Assignee: nicu marasoiu
  Labels: newbie++
 Fix For: 0.9.0

 Attachments: 1282_brush.patch, 1282_brushed_up.patch, 
 KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch


 To reduce # socket connections, it would be useful for the new producer to 
 close socket connections that are idle. We can introduce a new producer 
 config for the idle time.
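For context, an illustrative sketch (not the committed patch, which lives in the Selector) of the bookkeeping such a config implies: track last activity per connection in recency order and close anything idle past the bound.

{code}
import java.nio.channels.SelectionKey
import scala.collection.mutable

// Illustrative sketch only.
class IdleConnectionReaper(maxIdleMs: Long) {
  // Insertion order doubles as least-recently-used order because we re-insert
  // a key every time it is active.
  private val lastUsed = mutable.LinkedHashMap[SelectionKey, Long]()

  def recordActivity(key: SelectionKey, nowMs: Long): Unit = {
    lastUsed.remove(key)
    lastUsed.put(key, nowMs)
  }

  def closeIdle(nowMs: Long): Unit = {
    // The oldest entries come first, so we can stop at the first fresh one.
    val expired = lastUsed.takeWhile { case (_, ts) => nowMs - ts > maxIdleMs }
    expired.keys.foreach { key =>
      key.channel().close()
      lastUsed.remove(key)
    }
  }
}
{code}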



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-20 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142199#comment-14142199
 ] 

Neha Narkhede commented on KAFKA-1282:
--

Pushed the latest patch to trunk.

 Disconnect idle socket connection in Selector
 -

 Key: KAFKA-1282
 URL: https://issues.apache.org/jira/browse/KAFKA-1282
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2
Reporter: Jun Rao
Assignee: nicu marasoiu
  Labels: newbie++
 Fix For: 0.9.0

 Attachments: 1282_brush.patch, 1282_brushed_up.patch, 
 KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch


 To reduce # socket connections, it would be useful for the new producer to 
 close socket connections that are idle. We can introduce a new producer 
 config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: Kafka-trunk #270

2014-09-20 Thread Apache Jenkins Server
See https://builds.apache.org/job/Kafka-trunk/270/changes

Changes:

[neha.narkhede] KAFKA-1282 Disconnect idle socket connection in Selector; 
reviewed by Neha Narkhede and Jun Rao

--
[...truncated 2154 lines...]
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest  testPartitionReassignmentNonOverlappingReplicas FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest  testReassigningNonExistingPartition FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest  testResumePartitionReassignmentThatWasCompleted FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest  testPreferredReplicaJsonData FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest  testBasicPreferredReplicaElection FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest  testShutdownBroker FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest  testTopicConfigChange FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at 

[jira] [Updated] (KAFKA-1622) project shouldn't require signing to build

2014-09-20 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-1622:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patch. +1 and committed to trunk.

 project shouldn't require signing to build
 --

 Key: KAFKA-1622
 URL: https://issues.apache.org/jira/browse/KAFKA-1622
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
Assignee: Ivan Lyutov
Priority: Blocker
  Labels: build, newbie, packaging
 Fix For: 0.8.2

 Attachments: KAFKA-1622.patch


 We only need signing for uploadArchives; that is it.
 The project trunk failed to build due to some signing/license checks (the 
 diff I used to get things to build is here: 
 https://gist.github.com/dehora/7e3c0bd75bb2b5d87557)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : Kafka-trunk #271

2014-09-20 Thread Apache Jenkins Server
See https://builds.apache.org/job/Kafka-trunk/271/changes



[jira] [Commented] (KAFKA-1305) Controller can hang on controlled shutdown with auto leader balance enabled

2014-09-20 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142261#comment-14142261
 ] 

Jiangjie Qin commented on KAFKA-1305:
-

I looked into this problem and it seems to me the issue is mainly because the 
default controller queue size is too small.
The problem flow is as below:
1. Controller 265 received a controlled shutdown request.
2. Controller 265 put the leaderAndIsrRequest into the controller message queue and 
responded to broker 265.
3. Broker 265 received the response from Controller 265, shut down successfully and 
de-registered itself from zk.
4. The Controller 265 request send thread started to send the leaderAndIsrRequests 
queued in step 2 to the brokers. Since broker 265 had already shut down, 
it started retrying indefinitely. From this moment on, the controller message queue size 
will never decrease. (Thread t@113)
5. The scheduled preferred leader election started, grabbed the controller lock and 
tried to put the LeaderAndIsr request into the controller message queue. 
However, because the queue size is only 10, it could not finish and just 
blocked on the put method while still holding the controller lock. (Thread 
t@117)
6. The broker change listener on controller 265 was triggered by the broker path 
change in step 3; it tried to grab the controller lock and stop thread 
t@113, but failed to do so because thread t@117 was holding the controller lock 
and waiting on the controller message queue.

Currently the controller message queue size is 10. IMO if we increase the 
number to 100 or even bigger, this problem won't happen again. In fact, most 
of the time the number of messages in the queue will be small or even zero because 
there should not be too many controller messages, so increasing the queue size 
won't noticeably increase memory consumption.
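The lock-plus-bounded-queue interaction in steps 4-6 can be reduced to a small toy example; this is not controller code, just the pattern:

{code}
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.locks.ReentrantLock

// Toy reproduction of the pattern: the election task blocks on put() into a
// full bounded queue while holding the controller lock, and the listener that
// could unblock things needs that same lock.
object ControllerQueueDeadlockSketch {
  val controllerLock = new ReentrantLock()
  val messageQueue = new LinkedBlockingQueue[String](10) // same small bound as in the report

  def scheduledPreferredLeaderElection(): Unit = {
    controllerLock.lock()
    try {
      // Step 5: if the queue is full (the send thread is stuck retrying a dead
      // broker), this put() blocks forever while the lock is still held.
      messageQueue.put("LeaderAndIsrRequest")
    } finally controllerLock.unlock()
  }

  def brokerChangeListener(): Unit = {
    // Step 6: never acquires the lock, so the retrying send thread is never stopped.
    controllerLock.lock()
    try { /* would remove broker 265 and stop its send thread here */ }
    finally controllerLock.unlock()
  }
}
{code}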

 Controller can hang on controlled shutdown with auto leader balance enabled
 ---

 Key: KAFKA-1305
 URL: https://issues.apache.org/jira/browse/KAFKA-1305
 Project: Kafka
  Issue Type: Bug
Reporter: Joel Koshy
Assignee: Neha Narkhede
Priority: Blocker
 Fix For: 0.9.0


 This is relatively easy to reproduce especially when doing a rolling bounce.
 What happened here is as follows:
 1. The previous controller was bounced and broker 265 became the new 
 controller.
 2. I went on to do a controlled shutdown of broker 265 (the new controller).
 3. In the mean time the automatically scheduled preferred replica leader 
 election process started doing its thing and starts sending 
 LeaderAndIsrRequests/UpdateMetadataRequests to itself (and other brokers).  
 (t@113 below).
 4. While that's happening, the controlled shutdown process on 265 succeeds 
 and proceeds to deregister itself from ZooKeeper and shuts down the socket 
 server.
 5. (ReplicaStateMachine actually removes deregistered brokers from the 
 controller channel manager's list of brokers to send requests to.  However, 
 that removal cannot take place (t@18 below) because preferred replica leader 
 election task owns the controller lock.)
 6. So the request thread to broker 265 gets into infinite retries.
 7. The entire broker shutdown process is blocked on controller shutdown for 
 the same reason (it needs to acquire the controller lock).
 Relevant portions from the thread-dump:
 Controller-265-to-broker-265-send-thread - Thread t@113
java.lang.Thread.State: TIMED_WAITING
   at java.lang.Thread.sleep(Native Method)
   at 
 kafka.controller.RequestSendThread$$anonfun$liftedTree1$1$1.apply$mcV$sp(ControllerChannelManager.scala:143)
   at kafka.utils.Utils$.swallow(Utils.scala:167)
   at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
   at kafka.utils.Utils$.swallowWarn(Utils.scala:46)
   at kafka.utils.Logging$class.swallow(Logging.scala:94)
   at kafka.utils.Utils$.swallow(Utils.scala:46)
   at 
 kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:143)
   at 
 kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
   - locked java.lang.Object@6dbf14a7
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
Locked ownable synchronizers:
   - None
 ...
 Thread-4 - Thread t@17
java.lang.Thread.State: WAITING on 
 java.util.concurrent.locks.ReentrantLock$NonfairSync@4836840 owned by: 
 kafka-scheduler-0
   at sun.misc.Unsafe.park(Native Method)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
   at 
 

[jira] [Comment Edited] (KAFKA-1305) Controller can hang on controlled shutdown with auto leader balance enabled

2014-09-20 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142261#comment-14142261
 ] 

Jiangjie Qin edited comment on KAFKA-1305 at 9/21/14 12:45 AM:
---

I looked into this problem and it seems to me the issue is mainly because the 
default controller queue size was too small.
The problem flow is as below:
1. Controller 265 received controlled shutdown request
2. Controller 265 put leaderAndIsrRequest into controller message queue and 
responded to broker 265.
3. Broker 265 received respond from Controller 265, shutdown successfully and 
de-registerred itself form zk.
4. Controller 265 request send thread started to send leaderAndIsrRequests 
which are put in step 2 to the brokers. Since broker 265 has already shutdown, 
it will start infinite retry. At this moment, the controller message queue size 
will never decrease. (Thread t@113)
5. Scheduled preferred leader election started, grabbed the controller lock and 
was trying to put the LeaderAndIsr request into controller message queue. 
However, because the queue size is only 10, it could not finish but just 
blocking on the put method while still holding the controller lock. (Thread 
t@117)
6. Broker change listener on controller 265 was triggered because broker path 
change in step 3, it was trying to grab the controller lock and stop thread 
t@113, but failed to do that because thread t@117 was holding controller lock 
and waiting on the controller message queue.

Currently the controller message queue size is 10. IMO if we can increase the 
number to be 100 or even bigger, this problem won't happen again. Actually, in 
most time, the number of messages in the queue will be small even empty because 
there should not be too many controller messages. So increasing the queue size 
won't cause memory consumption to be increase.


was (Author: becket_qin):
I looked into this problem and it seems to me the issue is mainly because the 
default controller queue size was too small.
The follow is as below:
1. Controller 265 received controlled shutdown request
2. Controller 265 put leaderAndIsrRequest into controller message queue and 
responded to broker 265.
3. Broker 265 received respond from Controller 265, shutdown successfully and 
de-registerred itself form zk.
4. Controller 265 request send thread started to send leaderAndIsrRequests 
which are put in step 2 to the brokers. Since broker 265 has already shutdown, 
it will start infinite retry. At this moment, the controller message queue size 
will never decrease. (Thread t@113)
5. Scheduled preferred leader election started, grabbed the controller lock and 
was trying to put the LeaderAndIsr request into controller message queue. 
However, because the queue size is only 10, it could not finish but just 
blocking on the put method while still holding the controller lock. (Thread 
t@117)
6. Broker change listener on controller 265 was triggered because broker path 
change in step 3, it was trying to grab the controller lock and stop thread 
t@113, but failed to do that because thread t@117 was holding controller lock 
and waiting on the controller message queue.

Currently the controller message queue size is 10. IMO if we can increase the 
number to be 100 or even bigger, this problem won't happen again. Actually, in 
most time, the number of messages in the queue will be small even empty because 
there should not be too many controller messages. So increasing the queue size 
won't cause memory consumption to be increase.

 Controller can hang on controlled shutdown with auto leader balance enabled
 ---

 Key: KAFKA-1305
 URL: https://issues.apache.org/jira/browse/KAFKA-1305
 Project: Kafka
  Issue Type: Bug
Reporter: Joel Koshy
Assignee: Neha Narkhede
Priority: Blocker
 Fix For: 0.9.0


 This is relatively easy to reproduce especially when doing a rolling bounce.
 What happened here is as follows:
 1. The previous controller was bounced and broker 265 became the new 
 controller.
 2. I went on to do a controlled shutdown of broker 265 (the new controller).
 3. In the mean time the automatically scheduled preferred replica leader 
 election process started doing its thing and starts sending 
 LeaderAndIsrRequests/UpdateMetadataRequests to itself (and other brokers).  
 (t@113 below).
 4. While that's happening, the controlled shutdown process on 265 succeeds 
 and proceeds to deregister itself from ZooKeeper and shuts down the socket 
 server.
 5. (ReplicaStateMachine actually removes deregistered brokers from the 
 controller channel manager's list of brokers to send requests to.  However, 
 that removal cannot take place (t@18 below) because preferred replica leader 
 election task owns 

[jira] [Comment Edited] (KAFKA-1305) Controller can hang on controlled shutdown with auto leader balance enabled

2014-09-20 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142261#comment-14142261
 ] 

Jiangjie Qin edited comment on KAFKA-1305 at 9/21/14 12:48 AM:
---

I looked into this problem and it seems to me the issue is mainly because the 
default controller queue size was too small.
The problem flow is as below:
1. Controller 265 received controlled shutdown request
2. Controller 265 put leaderAndIsrRequest into controller message queue and 
responded to broker 265.
3. Broker 265 received respond from Controller 265, shutdown successfully and 
de-registerred itself form zk.
4. Controller 265 request send thread started to send leaderAndIsrRequests 
which are put in step 2 to the brokers. Since broker 265 has already shutdown, 
it will start infinite retry. At this moment, the controller message queue size 
will never decrease. (Thread t@113)
5. Scheduled preferred leader election started, grabbed the controller lock and 
was trying to put the LeaderAndIsr request into controller message queue. 
However, because the queue size is only 10, it could not finish but just 
blocking on the put method while still holding the controller lock. (Thread 
t@117)
6. Broker change listener on controller 265 was triggered because broker path 
change in step 3, it was trying to grab the controller lock and stop thread 
t@113, but failed to do that because thread t@117 was holding controller lock 
and waiting on the controller message queue.

Currently the controller message queue size is 10. IMO if we can increase the 
number to be 100 or even bigger, this problem won't happen again. Actually, in 
most time, the number of messages in the queue will be small even empty because 
there should not be too many controller messages. So increasing the queue size 
won't cause memory consumption to increase.


was (Author: becket_qin):
I looked into this problem and it seems to me the issue is mainly because the 
default controller queue size was too small.
The problem flow is as below:
1. Controller 265 received controlled shutdown request
2. Controller 265 put leaderAndIsrRequest into controller message queue and 
responded to broker 265.
3. Broker 265 received respond from Controller 265, shutdown successfully and 
de-registerred itself form zk.
4. Controller 265 request send thread started to send leaderAndIsrRequests 
which are put in step 2 to the brokers. Since broker 265 has already shutdown, 
it will start infinite retry. At this moment, the controller message queue size 
will never decrease. (Thread t@113)
5. Scheduled preferred leader election started, grabbed the controller lock and 
was trying to put the LeaderAndIsr request into controller message queue. 
However, because the queue size is only 10, it could not finish but just 
blocking on the put method while still holding the controller lock. (Thread 
t@117)
6. Broker change listener on controller 265 was triggered because broker path 
change in step 3, it was trying to grab the controller lock and stop thread 
t@113, but failed to do that because thread t@117 was holding controller lock 
and waiting on the controller message queue.

Currently the controller message queue size is 10. IMO if we can increase the 
number to be 100 or even bigger, this problem won't happen again. Actually, in 
most time, the number of messages in the queue will be small even empty because 
there should not be too many controller messages. So increasing the queue size 
won't cause memory consumption to be increase.

 Controller can hang on controlled shutdown with auto leader balance enabled
 ---

 Key: KAFKA-1305
 URL: https://issues.apache.org/jira/browse/KAFKA-1305
 Project: Kafka
  Issue Type: Bug
Reporter: Joel Koshy
Assignee: Neha Narkhede
Priority: Blocker
 Fix For: 0.9.0


 This is relatively easy to reproduce especially when doing a rolling bounce.
 What happened here is as follows:
 1. The previous controller was bounced and broker 265 became the new 
 controller.
 2. I went on to do a controlled shutdown of broker 265 (the new controller).
 3. In the mean time the automatically scheduled preferred replica leader 
 election process started doing its thing and starts sending 
 LeaderAndIsrRequests/UpdateMetadataRequests to itself (and other brokers).  
 (t@113 below).
 4. While that's happening, the controlled shutdown process on 265 succeeds 
 and proceeds to deregister itself from ZooKeeper and shuts down the socket 
 server.
 5. (ReplicaStateMachine actually removes deregistered brokers from the 
 controller channel manager's list of brokers to send requests to.  However, 
 that removal cannot take place (t@18 below) because preferred replica leader 
 election task 

[jira] [Commented] (KAFKA-1558) AdminUtils.deleteTopic does not work

2014-09-20 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142342#comment-14142342
 ] 

Sriharsha Chintalapani commented on KAFKA-1558:
---

[~nehanarkhede] [~junrao] Update on the tests.
I ran the above tests on 5 kafka brokers and 3 zookeeper nodes with 1000 topics, 
each with 3 partitions and a replication factor of 3.
All of these topics were being written to and read from simultaneously.

1) after the controller is restarted
topic log files and metadata deleted successfully
2) after a soft failure of the controller (can be simulated by pausing the jvm for 
longer than the zk session timeout)
I am not able to consistently pass this test. I am getting this error:
[2014-09-21 01:21:39,101] INFO Partition [my-topic-435,0] on broker 5: Cached 
zkVersion [114] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition) 
Initially I thought it could be related to KAFKA-1382, 
but from the debug logs:
expected Some((Leader:2,ISR:2,LeaderEpoch:4,ControllerEpoch:7)) written 
Some((Leader:4,ISR:4,LeaderEpoch:5,ControllerEpoch:9)) 
(kafka.utils.ReplicationUtils$)
The leader is different. This is causing the broker to keep logging these 
messages.
I am not sure why it didn't get the leader update. I didn't see any errors in 
the zookeeper logs.
I'll get more details on this.
3) after a topic's partitions have been reassigned to some other brokers
topic deleted, no issues
4) after running a preferred leader election command
 topic successfully deleted
5) after a topic's partition count has been increased
   topic and new partition data also deleted
6) controller broker is killed (kill -9)
successfully deleted the topic and metadata. Once the killed controller came back 
online, the log files for that topic's partitions on that broker also got deleted.
I am not able to reproduce the ReplicaFetcher issue pointed out earlier. 
Before, I was running tests on vms with low memory; I wasn't able to reproduce 
this on these nodes.




 AdminUtils.deleteTopic does not work
 

 Key: KAFKA-1558
 URL: https://issues.apache.org/jira/browse/KAFKA-1558
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Henning Schmiedehausen
Assignee: Sriharsha Chintalapani
Priority: Blocker
 Fix For: 0.8.2


 the AdminUtils.deleteTopic method is implemented as
 {code}
 def deleteTopic(zkClient: ZkClient, topic: String) {
 ZkUtils.createPersistentPath(zkClient, 
 ZkUtils.getDeleteTopicPath(topic))
 }
 {code}
 but the DeleteTopicCommand actually does
 {code}
 zkClient = new ZkClient(zkConnect, 3, 3, ZKStringSerializer)
 zkClient.deleteRecursive(ZkUtils.getTopicPath(topic))
 {code}
 so I guess that the 'createPersistentPath' above should actually be 
 {code}
 def deleteTopic(zkClient: ZkClient, topic: String) {
 ZkUtils.deletePathRecursive(zkClient, ZkUtils.getTopicPath(topic))
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)