[jira] [Updated] (KAFKA-581) provides windows batch script for starting Kafka/Zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

antoine vianey updated KAFKA-581:
---------------------------------

    Attachment: zookeeper-server-stop.bat
                kafka-server-stop.bat

Can you add these scripts to the 0.8 branch as well? They are useful when starting ZooKeeper and Kafka from the maven-antrun plugin (or another tool that creates a separate process): Zookeeper and Storm can then be started and stopped in the pre- and post-integration-test phases.

Regards

> provides windows batch script for starting Kafka/Zookeeper
> ----------------------------------------------------------
>
>                 Key: KAFKA-581
>                 URL: https://issues.apache.org/jira/browse/KAFKA-581
>             Project: Kafka
>          Issue Type: Improvement
>          Components: config
>    Affects Versions: 0.8
>         Environment: Windows
>            Reporter: antoine vianey
>            Priority: Trivial
>              Labels: features, run, windows
>             Fix For: 0.8
>
>         Attachments: kafka-console-consumer.bat, kafka-console-producer.bat, kafka-run-class.bat, kafka-server-start.bat, kafka-server-stop.bat, sbt.bat, zookeeper-server-start.bat, zookeeper-server-stop.bat
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Provide a port for quickstarting Kafka dev on Windows:
> - kafka-run-class.bat
> - kafka-server-start.bat
> - zookeeper-server-start.bat
> This will help Kafka community growth

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (KAFKA-581) provides windows batch script for starting Kafka/Zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526440#comment-13526440 ]

antoine vianey edited comment on KAFKA-581 at 12/7/12 3:11 PM:
--------------------------------------------------------------

*Added the server-stop scripts*
Can you add these scripts to the 0.8 branch as well? They are useful when starting ZooKeeper and Kafka from the maven-antrun plugin (or another tool that creates a separate process): Zookeeper and Storm can then be started and stopped in the pre- and post-integration-test phases.
Regards

was (Author: avianey):
*Added the stop scripts*
Can you add these scripts to the 0.8 branch as well? They are useful when starting ZooKeeper and Kafka from the maven-antrun plugin (or another tool that creates a separate process): Zookeeper and Storm can then be started and stopped in the pre- and post-integration-test phases.
Regards
0.8/HEAD Console consumer breakage?
So I was testing my own code, using the console consumer against my seemingly-working producer code. Since the last update, the console consumer crashes. I am going to try to track it down in the debugger and will come back with a patch if found.
[jira] [Created] (KAFKA-663) Add deploy feature to System Test
John Fung created KAFKA-663:
-------------------------------

             Summary: Add deploy feature to System Test
                 Key: KAFKA-663
                 URL: https://issues.apache.org/jira/browse/KAFKA-663
             Project: Kafka
          Issue Type: Task
            Reporter: John Fung
            Assignee: John Fung
[jira] [Created] (KAFKA-664) Kafka server threads die due to OOME during long running test
Neha Narkhede created KAFKA-664:
-----------------------------------

             Summary: Kafka server threads die due to OOME during long running test
                 Key: KAFKA-664
                 URL: https://issues.apache.org/jira/browse/KAFKA-664
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 0.8
            Reporter: Neha Narkhede
            Priority: Blocker
             Fix For: 0.8

I set up a Kafka cluster with 5 brokers (JVM memory 512M) and set up a long running producer process that sends data to 100s of partitions continuously for ~15 hours. After ~4 hours of operation, a few server threads (acceptor and processor) exited due to OOME -

[2012-12-07 08:24:44,355] ERROR OOME with size 1700161893 (kafka.network.BoundedByteBufferReceive)
java.lang.OutOfMemoryError: Java heap space
[2012-12-07 08:24:44,356] ERROR Uncaught exception in thread 'kafka-acceptor': (kafka.utils.Utils$)
java.lang.OutOfMemoryError: Java heap space
[2012-12-07 08:24:44,356] ERROR Uncaught exception in thread 'kafka-processor-9092-1': (kafka.utils.Utils$)
java.lang.OutOfMemoryError: Java heap space
[2012-12-07 08:24:46,344] INFO Unable to reconnect to ZooKeeper service, session 0x13afd0753870103 has expired, closing socket connection (org.apache.zookeeper.ClientCnxn)
[2012-12-07 08:24:46,344] INFO zookeeper state changed (Expired) (org.I0Itec.zkclient.ZkClient)
[2012-12-07 08:24:46,344] INFO Initiating client connection, connectString=eat1-app309.corp:12913,eat1-app310.corp:12913,eat1-app311.corp:12913,eat1-app312.corp:12913,eat1-app313.corp:12913 sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@19202d69 (org.apache.zookeeper.ZooKeeper)
[2012-12-07 08:24:55,702] ERROR OOME with size 2001040997 (kafka.network.BoundedByteBufferReceive)
java.lang.OutOfMemoryError: Java heap space
[2012-12-07 08:25:01,192] ERROR Uncaught exception in thread 'kafka-request-handler-0': (kafka.utils.Utils$)
java.lang.OutOfMemoryError: Java heap space
[2012-12-07 08:25:08,739] INFO Opening socket connection to server eat1-app311.corp/172.20.72.75:12913 (org.apache.zookeeper.ClientCnxn)
[2012-12-07 08:25:14,221] INFO Socket connection established to eat1-app311.corp/172.20.72.75:12913, initiating session (org.apache.zookeeper.ClientCnxn)
[2012-12-07 08:25:17,943] INFO Client session timed out, have not heard from server in 3722ms for sessionid 0x0, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2012-12-07 08:25:19,805] ERROR error in loggedRunnable (kafka.utils.Utils$)
java.lang.OutOfMemoryError: Java heap space
[2012-12-07 08:25:23,528] ERROR OOME with size 1853095936 (kafka.network.BoundedByteBufferReceive)
java.lang.OutOfMemoryError: Java heap space

It seems like it runs out of memory while trying to read the producer request, but it's unclear so far.
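The repeated "OOME with size 1700161893" errors point at kafka.network.BoundedByteBufferReceive, which suggests a buffer is being allocated from a size prefix read off the wire. As a hedged illustration of guarding such an allocation (class name, method, and limit are hypothetical, not Kafka's actual code):

```java
import java.nio.ByteBuffer;

// Sketch: validate a length prefix read off the socket before allocating a
// buffer for it, so a corrupt or implausibly large size fails fast instead
// of triggering an OutOfMemoryError on the allocation itself.
public class BoundedReceiveSketch {
    static final int MAX_REQUEST_SIZE = 100 * 1024 * 1024; // hypothetical cap

    // Returns a buffer for the payload, or throws if the size is implausible.
    static ByteBuffer allocateFor(int sizePrefix) {
        if (sizePrefix < 0 || sizePrefix > MAX_REQUEST_SIZE)
            throw new IllegalStateException(
                "request size " + sizePrefix + " exceeds limit " + MAX_REQUEST_SIZE);
        return ByteBuffer.allocate(sizePrefix);
    }

    public static void main(String[] args) {
        System.out.println(allocateFor(4096).capacity()); // a plausible request: 4096
        try {
            allocateFor(1700161893); // the size from the log above
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A rejected request still fails that one client, but it does so with a controlled error instead of killing acceptor/processor threads with heap exhaustion.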
[jira] [Commented] (KAFKA-664) Kafka server threads die due to OOME during long running test
[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526590#comment-13526590 ]

Neha Narkhede commented on KAFKA-664:
-------------------------------------

Another observation - the server is probably GCing quite a lot, since I see the following in the server logs -

[2012-12-07 09:32:14,742] INFO Client session timed out, have not heard from server in 1204905ms for sessionid 0x23afd074d6600ea, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)

The zookeeper session timeout is pretty high (15 secs) and it is in the same DC as the Kafka cluster and the producer.
[jira] [Updated] (KAFKA-651) Create testcases on auto create topics
[ https://issues.apache.org/jira/browse/KAFKA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-651:
----------------------------

    Status: Patch Available  (was: Open)

> Create testcases on auto create topics
> --------------------------------------
>
>                 Key: KAFKA-651
>                 URL: https://issues.apache.org/jira/browse/KAFKA-651
>             Project: Kafka
>          Issue Type: Task
>            Reporter: John Fung
>              Labels: replication-testing
>         Attachments: kafka-651-v1.patch
[jira] [Updated] (KAFKA-651) Create testcases on auto create topics
[ https://issues.apache.org/jira/browse/KAFKA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-651:
----------------------------

    Attachment: kafka-651-v1.patch

Uploaded kafka-651-v1.patch with 1 testcase to cover each functional group:
testcase_0011
testcase_0024
testcase_0119
testcase_0128
testcase_0134
testcase_0159
testcase_0209
testcase_0259
testcase_0309
[jira] [Commented] (KAFKA-664) Kafka server threads die due to OOME during long running test
[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526719#comment-13526719 ]

Neha Narkhede commented on KAFKA-664:
-------------------------------------

Heap dump is here - http://people.apache.org/~nehanarkhede/kafka-misc/kafka-0.8/heap-dump.tar.gz

Almost all the largest objects trace back to RequestPurgatory$ExpiredRequestReaper as the GC root.
[jira] [Commented] (KAFKA-664) Kafka server threads die due to OOME during long running test
[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526720#comment-13526720 ]

Neha Narkhede commented on KAFKA-664:
-------------------------------------

I'm re-running the tests with that option now.
[jira] [Commented] (KAFKA-664) Kafka server threads die due to OOME during long running test
[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526808#comment-13526808 ]

Neha Narkhede commented on KAFKA-664:
-------------------------------------

The root cause seems to be that the watchersForKey map keeps growing. I see that we add keys to the map, but never actually delete them.
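The pattern Neha describes (keys and pending requests added to watchersForKey but never removed) can be illustrated with a minimal Java sketch; the names here are hypothetical stand-ins, not Kafka's actual RequestPurgatory code:

```java
import java.util.*;

// Sketch of the leak: a per-key watcher map that only ever adds entries.
// Without the purge step below, memory grows with every request ever watched.
public class WatcherMapSketch {
    final Map<String, List<String>> watchersForKey = new HashMap<>();

    // Register a pending request under a key (e.g. a topic-partition).
    void watch(String key, String request) {
        watchersForKey.computeIfAbsent(key, k -> new ArrayList<>()).add(request);
    }

    // The missing piece: drop satisfied/expired requests and now-empty keys.
    void purgeSatisfied(Set<String> satisfied) {
        Iterator<Map.Entry<String, List<String>>> it = watchersForKey.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, List<String>> e = it.next();
            e.getValue().removeIf(satisfied::contains);
            if (e.getValue().isEmpty())
                it.remove(); // delete the key itself once nothing is watching it
        }
    }

    int size() { return watchersForKey.size(); }

    public static void main(String[] args) {
        WatcherMapSketch m = new WatcherMapSketch();
        m.watch("topic-0", "req1");
        m.watch("topic-1", "req2");
        m.purgeSatisfied(new HashSet<>(Arrays.asList("req1", "req2")));
        System.out.println(m.size()); // with the purge: 0; without it, stuck at 2
    }
}
```

The purge has to run on some trigger other than "a new request arrived for the same key", otherwise keys that stop receiving traffic hold their garbage forever - which is exactly the low-volume-partition scenario discussed below in this thread.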
[jira] [Commented] (KAFKA-644) System Test should run properly with mixed File System Pathname
[ https://issues.apache.org/jira/browse/KAFKA-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526851#comment-13526851 ]

Neha Narkhede commented on KAFKA-644:
-------------------------------------

+1. Thanks for the patch!

> System Test should run properly with mixed File System Pathname
> ---------------------------------------------------------------
>
>                 Key: KAFKA-644
>                 URL: https://issues.apache.org/jira/browse/KAFKA-644
>             Project: Kafka
>          Issue Type: Task
>            Reporter: John Fung
>            Assignee: John Fung
>              Labels: replication-testing
>         Attachments: kafka-644-v1.patch
>
> Currently, System Test assumes that all the entities (ZK, Broker, Producer, Consumer) are running on machines which have the same File System Pathname as the machine on which the System Test scripts are running. Usually, our own local boxes would be like /home/kafka/... and remote boxes may look like /mnt/... In this case, System Test won't work properly.
[jira] [Closed] (KAFKA-644) System Test should run properly with mixed File System Pathname
[ https://issues.apache.org/jira/browse/KAFKA-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede closed KAFKA-644.
-------------------------------
[jira] [Updated] (KAFKA-597) Refactor KafkaScheduler
[ https://issues.apache.org/jira/browse/KAFKA-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-597:
----------------------------

    Attachment: KAFKA-597-v4.patch

Patch v4.
- Rebased
- Makes use of thread factory
- Fixed broken scaladoc

> Refactor KafkaScheduler
> -----------------------
>
>                 Key: KAFKA-597
>                 URL: https://issues.apache.org/jira/browse/KAFKA-597
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Jay Kreps
>            Priority: Minor
>         Attachments: KAFKA-597-v1.patch, KAFKA-597-v2.patch, KAFKA-597-v3.patch, KAFKA-597-v4.patch
>
> It would be nice to clean up KafkaScheduler. Here is what I am thinking. Extract the following interface:
>
>   trait Scheduler {
>     def startup()
>     def schedule(fun: () => Unit, name: String, delayMs: Long = 0, periodMs: Long): Scheduled
>     def shutdown(interrupt: Boolean = false)
>   }
>
>   class Scheduled {
>     def lastExecution: Long
>     def cancel()
>   }
>
> We would have two implementations, KafkaScheduler and MockScheduler. KafkaScheduler would be a wrapper for ScheduledThreadPoolExecutor. MockScheduler would only allow manual time advancement rather than using the system clock; we would switch unit tests over to this. This change would differ from the existing scheduler in the following ways:
> 1. Would not return a ScheduledFuture (since this is useless)
> 2. shutdown() would be a blocking call. The current shutdown calls don't really do what people want.
> 3. We would remove the daemon thread flag, as I don't think it works.
> 4. It returns an object which lets you cancel the job or get the last execution time.
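A mock scheduler with manual time advancement, as proposed above, might look roughly like this Java sketch (names and API are illustrative assumptions, not the actual patch). Tasks sit in a priority queue ordered by due time and only run when the test advances a fake clock:

```java
import java.util.PriorityQueue;

// Sketch of a deterministic test scheduler: no system clock, no threads.
public class MockSchedulerSketch {
    static class Task implements Comparable<Task> {
        final long timeMs;
        final Runnable fun;
        Task(long timeMs, Runnable fun) { this.timeMs = timeMs; this.fun = fun; }
        public int compareTo(Task o) { return Long.compare(timeMs, o.timeMs); }
    }

    private final PriorityQueue<Task> tasks = new PriorityQueue<>();
    private long nowMs = 0;

    void schedule(long delayMs, Runnable fun) {
        tasks.add(new Task(nowMs + delayMs, fun));
    }

    // Advance the fake clock, running every task that comes due, in order.
    // Re-checking the queue each iteration keeps this reentrant: a task that
    // schedules another task inside the window still gets it executed.
    void advance(long deltaMs) {
        long target = nowMs + deltaMs;
        while (!tasks.isEmpty() && tasks.peek().timeMs <= target) {
            Task t = tasks.poll();
            nowMs = t.timeMs; // run the task "at" its due time so nested schedules line up
            t.fun.run();
        }
        nowMs = target;
    }

    public static void main(String[] args) {
        MockSchedulerSketch s = new MockSchedulerSketch();
        StringBuilder log = new StringBuilder();
        s.schedule(10, () -> log.append("a"));
        s.schedule(5, () -> s.schedule(2, () -> log.append("b"))); // nested scheduling
        s.advance(12);
        System.out.println(log); // prints "ba": the nested task (due t=7) runs before the t=10 task
    }
}
```

The priority queue is what makes nested scheduling behave: tasks always fire in due-time order, which is the property the non-reentrant MockScheduler (discussed under KAFKA-636 below in this digest) lacked.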
[jira] [Commented] (KAFKA-664) Kafka server threads die due to OOME during long running test
[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526876#comment-13526876 ]

Joel Koshy commented on KAFKA-664:
----------------------------------

To clarify, the map itself shouldn't grow indefinitely, right? I.e., if there are no new partitions, the number of keys should stay the same. I think the issue is that expired requests (for a key) are not removed from the list of outstanding requests for that key.
[jira] [Updated] (KAFKA-636) Make log segment delete asynchronous
[ https://issues.apache.org/jira/browse/KAFKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-636:
----------------------------

    Attachment: KAFKA-636-v1.patch

This patch implements asynchronous delete in the log. To do this, Log.scala now requires a scheduler to be used for scheduling the deletions. The deletion works as described above. The locking for segment deletion can now be more aggressive: since the file renames are assumed to be fast, they can happen inside the lock.

As part of testing this I also found a problem with MockScheduler, namely that it is not reentrant. That is, if scheduled tasks themselves create scheduled tasks, it misbehaves. To fix this I rewrote MockScheduler to use a priority queue. The code is simpler and more correct, since it now performs all executions in the correct order too.

> Make log segment delete asynchronous
> ------------------------------------
>
>                 Key: KAFKA-636
>                 URL: https://issues.apache.org/jira/browse/KAFKA-636
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>         Attachments: KAFKA-636-v1.patch
>
> We have a few corner-case bugs around delete of segment files:
> 1. It is possible for delete and truncate to kind of cross streams and end up with a case where you have no segments.
> 2. Reads on the log have no locking (which is good), but as a result deleting a segment that is being read will result in some kind of I/O exception.
> 3. We can't easily fix the synchronization problems without deleting files inside the log's write lock. This can be a problem as deleting a 2GB segment can take a couple of seconds even on an unloaded system.
> The proposed fix for these problems is to make file removal asynchronous using the following scheme as the new delete scheme:
> 1. Immediately remove the file from the segment map and rename the file from X to X.deleted (e.g. 000.log to 00.log.deleted). We think renaming a file will not impact reads since the file is already open and hence the name is irrelevant. This will always be O(1) and can be done inside the write lock.
> 2. Schedule a future operation to delete the file. The time to wait would be configurable, but we would just default it to 60 seconds and probably no one would ever change it.
> 3. On startup we would delete any files with the .deleted suffix as they would have been pending deletes that didn't take place.
> I plan to do this soon working against the refactored log (KAFKA-521). We can opt to back port the patch for 0.8 if we are feeling daring.
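The three-step delete scheme above can be sketched in Java as follows; the suffix constant, delay default, and class/method names mirror the description but are illustrative assumptions, not the actual KAFKA-636 patch:

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of asynchronous segment deletion: the O(1) rename happens right away
// (cheap enough to sit inside the log's write lock); the slow file deletion
// runs later on a scheduler thread.
public class AsyncDeleteSketch {
    static final String DELETED_SUFFIX = ".deleted";
    static final long DELETE_DELAY_MS = 60_000; // the proposed 60s default

    // Daemon thread so pending deletes never keep the JVM alive.
    final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "segment-delete");
            t.setDaemon(true);
            return t;
        });

    // Steps 1 + 2: rename now, schedule the real delete for later.
    void asyncDelete(File segment) throws IOException {
        File renamed = new File(segment.getPath() + DELETED_SUFFIX);
        if (!segment.renameTo(renamed))
            throw new IOException("rename failed for " + segment);
        scheduler.schedule(renamed::delete, DELETE_DELAY_MS, TimeUnit.MILLISECONDS);
    }

    // Step 3: on startup, sweep any .deleted files left behind by a crash.
    static void sweep(File logDir) {
        File[] leftovers = logDir.listFiles((d, name) -> name.endsWith(DELETED_SUFFIX));
        if (leftovers != null)
            for (File f : leftovers)
                f.delete();
    }
}
```

A reader holding an already-open handle to the segment is unaffected by the rename, which is what makes step 1 safe without read-side locking.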
[jira] [Commented] (KAFKA-664) Kafka server threads die due to OOME during long running test
[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526883#comment-13526883 ] Joel Koshy commented on KAFKA-664: -- Okay, I'm slightly confused. Even on expiration the request is marked as satisfied. So even if it is not removed from the watcher's list during expiration, it will be removed on the next call to collectSatisfiedRequests - which in this case will be when the next produce request arrives for that partition. This means it should only be due to low-volume partitions that are no longer growing, i.e., the replica fetcher would keep issuing fetch requests that keep expiring but never get removed from the list of pending requests in watchersFor(the-low-volume-partition). Kafka server threads die due to OOME during long running test - Key: KAFKA-664 URL: https://issues.apache.org/jira/browse/KAFKA-664 Project: Kafka Issue Type: Bug Affects Versions: 0.8 Reporter: Neha Narkhede Assignee: Jay Kreps Priority: Blocker Labels: bugs Fix For: 0.8 Attachments: thread-dump.log I set up a Kafka cluster with 5 brokers (JVM memory 512M) and set up a long running producer process that sends data to 100s of partitions continuously for ~15 hours. 
After ~4 hours of operation, a few server threads (acceptor and processor) exited due to OOME - [2012-12-07 08:24:44,355] ERROR OOME with size 1700161893 (kafka.network.BoundedByteBufferReceive) java.lang.OutOfMemoryError: Java heap space [2012-12-07 08:24:44,356] ERROR Uncaught exception in thread 'kafka-acceptor': (kafka.utils.Utils$) java.lang.OutOfMemoryError: Java heap space [2012-12-07 08:24:44,356] ERROR Uncaught exception in thread 'kafka-processor-9092-1': (kafka.utils.Utils$) java.lang.OutOfMemoryError: Java heap space [2012-12-07 08:24:46,344] INFO Unable to reconnect to ZooKeeper service, session 0x13afd0753870103 has expired, closing socket connection (org.apache.zookeeper.ClientCnxn) [2012-12-07 08:24:46,344] INFO zookeeper state changed (Expired) (org.I0Itec.zkclient.ZkClient) [2012-12-07 08:24:46,344] INFO Initiating client connection, connectString=eat1-app309.corp:12913,eat1-app310.corp:12913,eat1-app311.corp:12913,eat1-app312.corp:12913,eat1-app313.corp:12913 sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@19202d69 (org.apache.zookeeper.ZooKeeper) [2012-12-07 08:24:55,702] ERROR OOME with size 2001040997 (kafka.network.BoundedByteBufferReceive) java.lang.OutOfMemoryError: Java heap space [2012-12-07 08:25:01,192] ERROR Uncaught exception in thread 'kafka-request-handler-0': (kafka.utils.Utils$) java.lang.OutOfMemoryError: Java heap space [2012-12-07 08:25:08,739] INFO Opening socket connection to server eat1-app311.corp/172.20.72.75:12913 (org.apache.zookeeper.ClientCnxn) [2012-12-07 08:25:14,221] INFO Socket connection established to eat1-app311.corp/172.20.72.75:12913, initiating session (org.apache.zookeeper.ClientCnxn) [2012-12-07 08:25:17,943] INFO Client session timed out, have not heard from server in 3722ms for sessionid 0x0, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) [2012-12-07 08:25:19,805] ERROR error in loggedRunnable (kafka.utils.Utils$) java.lang.OutOfMemoryError: Java heap space 
[2012-12-07 08:25:23,528] ERROR OOME with size 1853095936 (kafka.network.BoundedByteBufferReceive) java.lang.OutOfMemoryError: Java heap space It seems like it runs out of memory while trying to read the producer request, but it's unclear so far.
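Joel's observation above can be modeled with a few lines of code. The sketch below is a hypothetical toy model of the purgatory behavior, not Kafka's actual RequestPurgatory: requests are only purged from a key's watcher list when a new request arrives for that same key, so a quiet partition whose fetch requests keep expiring accumulates dead entries without bound. Class and method names (WatcherLeak, watch, update) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the leak: expired requests are marked satisfied but stay in
// the per-key watcher list until a produce request for the SAME key purges them.
public class WatcherLeak {
    static class DelayedRequest { boolean satisfied = false; }

    private final Map<String, List<DelayedRequest>> watchersForKey = new HashMap<>();

    // A fetch request starts watching a key (e.g. a topic/partition).
    public void watch(String key, DelayedRequest r) {
        watchersForKey.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
    }

    // The expiry thread marks the request satisfied but does NOT remove it.
    public void expire(DelayedRequest r) {
        r.satisfied = true;
    }

    // Only a produce request for the key sweeps its watcher list.
    public void update(String key) {
        List<DelayedRequest> watchers = watchersForKey.get(key);
        if (watchers != null) {
            watchers.forEach(w -> w.satisfied = true);
            watchers.clear();
        }
    }

    public int pending(String key) {
        return watchersForKey.getOrDefault(key, Collections.emptyList()).size();
    }
}
```

With this model, a partition that receives fetch requests but no produce requests grows its pending list one entry per expired fetch, matching the unbounded memory growth seen in the test.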
[jira] [Commented] (KAFKA-664) Kafka server threads die due to OOME during long running test
[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526887#comment-13526887 ] Jay Kreps commented on KAFKA-664: - Another issue is that we are saving the full producer request in memory for as long as it is in purgatory. Not sure that is causing this, but that is pretty bad.
[jira] [Updated] (KAFKA-644) System Test should run properly with mixed File System Pathname
[ https://issues.apache.org/jira/browse/KAFKA-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Fung updated KAFKA-644: Attachment: kafka-644-v2.patch Uploaded kafka-644-v2.patch which supports the property auto_create_topic System Test should run properly with mixed File System Pathname --- Key: KAFKA-644 URL: https://issues.apache.org/jira/browse/KAFKA-644 Project: Kafka Issue Type: Task Reporter: John Fung Assignee: John Fung Labels: replication-testing Attachments: kafka-644-v1.patch, kafka-644-v2.patch Currently, System Test assumes that all the entities (ZK, Broker, Producer, Consumer) are running in machines which have the same File System Pathname as the machine in which the System Test scripts are running. Usually, our own local boxes would be like /home/kafka/. . . and remote boxes may look like /mnt/. . . In this case, System Test won't work properly.
[jira] [Updated] (KAFKA-646) Provide aggregate stats at the high level Producer and ZookeeperConsumerConnector level
[ https://issues.apache.org/jira/browse/KAFKA-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swapnil Ghike updated KAFKA-646: Attachment: kafka-646-patch-num1-v1.patch This patch has a bunch of refactoring changes and a couple of new additions. Addressing Jun's comments: These are all great catches! Thanks for being so thorough. 60. By default, metrics-core will return an existing metric object of the same name using a getOrCreate() like functionality. As discussed offline, we should fail the clients that use an already registered clientId name. We will need to create two objects that contain hashmaps to record the existing producer and consumer clientIds and methods to throw an exception if a client attempts to use an existing clientId. I worked on this change a bit, but it breaks a lot of our unit tests (about half) and the refactoring will take some time. Hence, I think it will be better if I submit a patch for all other changes and create another patch for this issue under this jira. Until then we can keep this jira open. 61. For recording stats about all topics, I am now using a string All.Topics. Since '.' is not allowed in the legal character set for topic names, this will differentiate from a topic named AllTopics. 62. Yes, we should validate groupId. Added the functionality and a unit test. It has the same validation rules as ClientId. 63. A metric name is something like (clientId + topic + some string) and this entire string is limited by filename size. We already allow topic name to be at most 255 bytes long. We could fix max lengths for each of clientId, groupId, topic name so that the metric name never exceeds filename size. But those lengths will be quite arbitrary, perhaps we should skip the check on the length of clientId and groupId. 64. Removed brokerInfo from the clientId used to instantiate FetchRequestBuilder. Refactoring: 1. 
Moved validation of clientId to the end of instantiation of ProducerConfig and ConsumerConfig. - Created static objects ProducerConfig and ConsumerConfig which contain a validate() method. 2. Created global *Registry objects in which each high level Producer and Consumer can register their *stats objects. - These objects are registered in the static object only once using utils.Pool.getAndMaybePut functionality. - This will remove the need to pass *stats objects around the code in constructors (I thought having the metrics objects right up in the constructors was a bit intrusive, since one doesn't quite always think about the monitoring mechanism while instantiating various modules of the program, for example while unit testing.) - Instead of the constructor, each concerned class obtains the *Stats objects from the global registry object. - This cleans up any metrics objects created in the unit tests. - Special mention: The producer constructors are back to their old selves. With clientId validation moved to *Config objects, the intermediate Producer constructor that merely separated the parameters of a quadruplet is gone. 3. Created separate files - for ProducerStats, ProducerTopicStats, ProducerRequestStats in kafka.producer package and for FetchRequestAndResponseStats in kafka.consumer package. Thought it was appropriate given that we already had ConsumerTopicStats in a separate file, and since the code for metrics had increased in size due to addition of *Registry and Aggregated* objects. Added comments. - for objects Topic, ClientId and GroupId in kafka.utils package. - to move the helper case classes ClientIdAndTopic, ClientIdAndBroker to kafka.common package. 4. Renamed a few variables to easier names (anyOldName to metricId change). New additions: 1. Added two objects to aggregate metrics recorded by SyncProducers and SimpleConsumers at the high level Producer and Consumer. - For this, changed KafkaTimer to accept a list of Timers. 
Typically we will pass a specificTimer and a globalTimer to this KafkaTimer class. Created a new KafkaHistogram in a similar way. 2. Validation of groupId. Issues: 1. Initializing the aggregator metrics with default values: For example, let's say that a syncProducer is created (which will register a ProducerRequestStats mbean for this syncProducer). However, if no request is sent by this syncProducer then the absence of its data is not reflected in the aggregator histogram. For instance, the min requestSize for the syncProducer that never sent a request will be 0, but this won't be accurately represented in the aggregator histogram. Thus, we need to understand that if the request count of a syncProducer is 0, then its data will not be accurately reflected in the aggregator histogram. The question is whether it is possible to inform the aggregator histogram of some default values without increasing the request count of any syncProducer or the aggregated stats. Further
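Two of the ideas in the comment above (a global registry with get-or-create semantics like Pool.getAndMaybePut, and failing clients that reuse an already registered clientId) can be sketched together. This is an illustrative Java sketch, not the actual Kafka classes; ClientMetricsRegistry and its methods are hypothetical names, with ConcurrentHashMap.computeIfAbsent standing in for getAndMaybePut.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Sketch of a global stats registry keyed by clientId, plus a tracker that
// throws if two clients attempt to use the same clientId.
public class ClientMetricsRegistry<T> {
    private final ConcurrentMap<String, T> statsByClientId = new ConcurrentHashMap<>();
    private final Set<String> registeredIds = ConcurrentHashMap.newKeySet();

    // Get-or-create: the stats object for a clientId is created exactly once,
    // and every later lookup returns the same instance.
    public T getAndMaybePut(String clientId, Function<String, T> factory) {
        return statsByClientId.computeIfAbsent(clientId, factory);
    }

    // Fail fast on clientId reuse instead of silently sharing metrics.
    public void register(String clientId) {
        if (!registeredIds.add(clientId))
            throw new IllegalArgumentException("clientId already in use: " + clientId);
    }
}
```

The get-or-create behavior is what lets classes fetch their *Stats objects from a registry instead of threading them through constructors, while the registration check gives the fail-fast clientId semantics discussed in point 60.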
[jira] [Updated] (KAFKA-664) Kafka server threads die due to OOME during long running test
[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-664: Attachment: watchersForKey.png kafka-664-draft.patch The problem was ever-increasing requests in the watchersForKey map. Please look at the graph attached. This can happen for very low volume topics since the replica fetcher requests keep entering this map, and since there are no more produce requests coming for those topics/partitions, no one ever removes those requests from the map. With Joel's help, hacked RequestPurgatory to force the cleanup of expired/satisfied requests by the expiry thread inside purgeSatisfied. Of course, a better solution is re-designing the purgatory data structure to point from the queue to the map, but that is a bigger change. I just want to get around this issue and continue performance testing. Kafka server threads die due to OOME during long running test - Key: KAFKA-664 URL: https://issues.apache.org/jira/browse/KAFKA-664 Project: Kafka Issue Type: Bug Affects Versions: 0.8 Reporter: Neha Narkhede Assignee: Jay Kreps Priority: Blocker Labels: bugs Fix For: 0.8 Attachments: kafka-664-draft.patch, thread-dump.log, watchersForKey.png I set up a Kafka cluster with 5 brokers (JVM memory 512M) and set up a long running producer process that sends data to 100s of partitions continuously for ~15 hours. 
After ~4 hours of operation, a few server threads (acceptor and processor) exited due to OOME (see the log excerpt earlier in this thread). It seems like it runs out of memory while trying to read the producer request, but it's unclear so far.
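The workaround Neha describes (having the expiry thread purge satisfied requests from every watcher list inside purgeSatisfied, instead of waiting for a produce request on the same key) can be sketched as below. This is a hypothetical Java model, not the actual RequestPurgatory patch; class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the fix: the expiry thread sweeps ALL watcher lists and drops
// entries already marked satisfied, so low-volume partitions cannot
// accumulate dead requests indefinitely.
public class PurgingWatchers {
    static class DelayedRequest {
        final AtomicBoolean satisfied = new AtomicBoolean(false);
    }

    private final Map<String, List<DelayedRequest>> watchersForKey = new HashMap<>();

    public synchronized void watch(String key, DelayedRequest r) {
        watchersForKey.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
    }

    // Called periodically by the expiry thread; returns how many dead
    // entries were reclaimed across all keys.
    public synchronized int purgeSatisfied() {
        int purged = 0;
        for (List<DelayedRequest> watchers : watchersForKey.values()) {
            int before = watchers.size();
            watchers.removeIf(w -> w.satisfied.get());
            purged += before - watchers.size();
        }
        return purged;
    }

    public synchronized int totalPending() {
        return watchersForKey.values().stream().mapToInt(List::size).sum();
    }
}
```

Sweeping every list from the expiry thread trades a little CPU for bounded memory, which is exactly the trade-off wanted here until the purgatory data structure is redesigned to point from the queue back into the map.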
[jira] [Comment Edited] (KAFKA-664) Kafka server threads die due to OOME during long running test
[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526973#comment-13526973 ] Neha Narkhede edited comment on KAFKA-664 at 12/8/12 1:56 AM: -- The problem was ever-increasing requests in the watchersForKey map. Please look at the graph attached. In merely 40 minutes of running the broker, the number of requests in the purgatory map shot up to 4 million. This can happen for very low volume topics since the replica fetcher requests keep entering this map, and since there are no more produce requests coming for those topics/partitions, no one ever removes those requests from the map. With Joel's help, hacked RequestPurgatory to force the cleanup of expired/satisfied requests by the expiry thread inside purgeSatisfied. Of course, a better solution is re-designing the purgatory data structure to point from the queue to the map, but that is a bigger change. I just want to get around this issue and continue performance testing. was (Author: nehanarkhede): The problem was ever-increasing requests in the watchersForKey map. Please look at the graph attached. This can happen for very low volume topics since the replica fetcher requests keep entering this map, and since there are no more produce requests coming for those topics/partitions, no one ever removes those requests from the map. With Joel's help, hacked RequestPurgatory to force the cleanup of expired/satisfied requests by the expiry thread inside purgeSatisfied. Of course, a better solution is re-designing the purgatory data structure to point from the queue to the map, but that is a bigger change. I just want to get around this issue and continue performance testing. 
[jira] [Updated] (KAFKA-597) Refactor KafkaScheduler
[ https://issues.apache.org/jira/browse/KAFKA-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Kreps updated KAFKA-597: Attachment: KAFKA-597-v5.patch Thanks, new patch v5 addresses your comments: - Improved javadoc - This is actually good. I thought about it a bit: since I am making shutdown block, the only time daemon vs. non-daemon comes into play is if you don't call shutdown. If that is the case, non-daemon threads will prevent garbage collection of the scheduler tasks and eventually block shutdown of the jvm, which seems unnecessary. - The change to shutdownNow is not good. This will invoke interrupt on all threads, which is too aggressive. Better to let them finish. If we end up needing to schedule long-running tasks we can invent a new notification mechanism. I changed this so that we use normal shutdown instead. Refactor KafkaScheduler --- Key: KAFKA-597 URL: https://issues.apache.org/jira/browse/KAFKA-597 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1 Reporter: Jay Kreps Priority: Minor Attachments: KAFKA-597-v1.patch, KAFKA-597-v2.patch, KAFKA-597-v3.patch, KAFKA-597-v4.patch, KAFKA-597-v5.patch It would be nice to clean up KafkaScheduler. Here is what I am thinking. Extract the following interface: trait Scheduler { def startup() def schedule(fun: () => Unit, name: String, delayMs: Long = 0, periodMs: Long): Scheduled def shutdown(interrupt: Boolean = false) } class Scheduled { def lastExecution: Long def cancel() } We would have two implementations, KafkaScheduler and MockScheduler. KafkaScheduler would be a wrapper for ScheduledThreadPoolExecutor. MockScheduler would only allow manual time advancement rather than using the system clock; we would switch unit tests over to this. This change would be different from the existing scheduler in the following ways: 1. Would not return a ScheduledFuture (since this is useless) 2. shutdown() would be a blocking call. The current shutdown calls don't really do what people want. 
3. We would remove the daemon thread flag, as I don't think it works. 4. It returns an object which lets you cancel the job or get the last execution time.
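The shutdown semantics discussed above (normal shutdown that lets in-flight tasks finish, rather than shutdownNow's interrupts) can be illustrated with a thin Java wrapper around ScheduledThreadPoolExecutor. This is a sketch under stated assumptions, not the actual KafkaScheduler; the Kafka trait is Scala, and SimpleScheduler here is a hypothetical name. It returns a plain ScheduledFuture handle for cancellation rather than the Scheduled class from the proposal.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: schedule one-shot or periodic tasks, and make shutdown() block
// until running tasks complete instead of interrupting them.
public class SimpleScheduler {
    private final ScheduledThreadPoolExecutor executor =
        new ScheduledThreadPoolExecutor(1);

    // periodMs <= 0 means a one-shot task after delayMs.
    public ScheduledFuture<?> schedule(Runnable fun, long delayMs, long periodMs) {
        if (periodMs > 0)
            return executor.scheduleAtFixedRate(fun, delayMs, periodMs,
                                                TimeUnit.MILLISECONDS);
        return executor.schedule(fun, delayMs, TimeUnit.MILLISECONDS);
    }

    // Blocking shutdown: stop accepting new tasks, let in-flight and already
    // queued one-shot tasks finish, then wait for termination. Deliberately
    // avoids shutdownNow(), which would interrupt running tasks.
    public void shutdown() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

The handle returned by schedule() supports cancel(), covering the cancellation half of the proposed Scheduled class; lastExecution tracking would need extra bookkeeping and is omitted from this sketch.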