[jira] [Commented] (KAFKA-3795) Transient system test failure upgrade_test.TestUpgrade
[ https://issues.apache.org/jira/browse/KAFKA-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953759#comment-15953759 ]

Roger Hoover commented on KAFKA-3795:
-------------------------------------

Happened again: http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2017-04-03--001.1491220440--apache--trunk--bdf4cba/

{noformat}
test_id:    kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.9.0.1.to_message_format_version=None.security_protocol=SASL_SSL.compression_types=.none
status:     FAIL
run time:   4 minutes 4.673 seconds

199680 acked message did not make it to the Consumer. They are: 538129, 538132, 538135, 538138, 538140, 538141, 538143, 538144, 538146, 538147, 538149, 538150, 538152, 538153, 538155, 538156, 538158, 538159, 538161, 538162...plus 199660 more. Total Acked: 331954, Total Consumed: 138002. The first 1000 missing messages were validated to ensure they are in Kafka's data files. 1000 were missing. This suggests data loss.
Here are some of the messages not found in the data files: [538624, 538625, 538626, 538627, 538628, 538629, 538630, 538631, 538632, 538633]

Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
    data = self.run_test()
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
    return self.test_context.function(self.test)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/upgrade_test.py", line 125, in test_upgrade
    self.run_produce_consume_validate(core_test_action=lambda: self.perform_upgrade(from_kafka_version,
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 118, in run_produce_consume_validate
    self.validate()
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 188, in validate
    assert success, msg
AssertionError: 199680 acked message did not make it to the Consumer. They are: 538129, 538132, 538135, 538138, 538140, 538141, 538143, 538144, 538146, 538147, 538149, 538150, 538152, 538153, 538155, 538156, 538158, 538159, 538161, 538162...plus 199660 more. Total Acked: 331954, Total Consumed: 138002. The first 1000 missing messages were validated to ensure they are in Kafka's data files. 1000 were missing. This suggests data loss.
Here are some of the messages not found in the data files: [538624, 538625, 538626, 538627, 538628, 538629, 538630, 538631, 538632, 538633]
{noformat}

> Transient system test failure upgrade_test.TestUpgrade
> ------------------------------------------------------
>
>                 Key: KAFKA-3795
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3795
>             Project: Kafka
>          Issue Type: Bug
>          Components: system tests
>            Reporter: Jason Gustafson
>              Labels: reliability
>
> From a recent build running on the 0.10.0 branch:
> {code}
> test_id:    2016-06-06--001.kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.9.0.1.to_message_format_version=0.9.0.1.compression_types=.snappy.new_consumer=True
> status:     FAIL
> run time:   3 minutes 29.166 seconds
> 3522 acked message did not make it to the Consumer. They are: 476524, 476525, 476527, 476528, 476530, 476531, 476533, 476534, 476536, 476537, 476539, 476540, 476542, 476543, 476545, 476546, 476548, 476549, 476551, 476552, ...plus 3482 more. Total Acked: 110437, Total Consumed: 127470. The first 1000 missing messages were validated to ensure they are in Kafka's data files. 1000 were missing. This suggests data loss. Here are some of the messages not found in the data files: [477184, 477185, 477187, 477188, 477190, 477191, 477193, 477194, 477196, 477197]
> Traceback (most recent call last):
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py", line 106, in run_all_tests
>     data = self.run_single_test()
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py", line 162, in run_single_test
>     return self.current_test_context.function(self.current_test)
>   File
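The failure message above comes from kafkatest's produce/consume validation, which is at heart a set difference between the ids the producer received acks for and the ids the consumer actually delivered. A minimal sketch of that logic (hypothetical names, not the actual produce_consume_validate.py code):

```python
# Sketch of the acked-vs-consumed check behind the failure above.
# Names and message wording are illustrative; the real logic lives in
# kafkatest/tests/produce_consume_validate.py.
def validate(acked, consumed):
    """Return (success, message) for a produce/consume run."""
    missing = sorted(set(acked) - set(consumed))
    if not missing:
        return True, ""
    sample = ", ".join(str(m) for m in missing[:20])
    msg = ("%d acked messages did not make it to the Consumer. "
           "They are: %s...plus %d more. Total Acked: %d, Total Consumed: %d."
           % (len(missing), sample, max(0, len(missing) - 20),
              len(acked), len(consumed)))
    return False, msg

ok, msg = validate(acked=range(10), consumed=[0, 1, 2, 4, 5, 6, 7, 8, 9])
# ok is False; msg reports the single missing id 3
```

The test then asserts `success` with this message, which is exactly the AssertionError text captured in the traceback.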
[jira] [Updated] (KAFKA-4755) SimpleBenchmark consume test fails for streams
[ https://issues.apache.org/jira/browse/KAFKA-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roger Hoover updated KAFKA-4755:
--------------------------------
    Priority: Blocker  (was: Major)

> SimpleBenchmark consume test fails for streams
> ----------------------------------------------
>
>                 Key: KAFKA-4755
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4755
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.0
>            Reporter: Eno Thereska
>            Assignee: Eno Thereska
>            Priority: Blocker
>             Fix For: 0.11.0.0
>
> This occurred Feb 10th 2017:
> kafkatest.benchmarks.streams.streams_simple_benchmark_test.StreamsSimpleBenchmarkTest.test_simple_benchmark.test=consume.scale=1
> status: FAIL
> run time: 7 minutes 36.712 seconds
> Streams Test process on ubuntu@worker2 took too long to exit
> Traceback (most recent call last):
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
>     data = self.run_test()
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
>     return self.test_context.function(self.test)
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/benchmarks/streams/streams_simple_benchmark_test.py", line 86, in test_simple_benchmark
>     self.driver[num].wait()
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py", line 102, in wait
>     self.wait_node(node, timeout_sec)
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py", line 106, in wait_node
>     wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec, err_msg="Streams Test process on " + str(node.account) + " took too long to exit")
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", line 36, in wait_until
>     raise TimeoutError(err_msg)
> TimeoutError: Streams Test process on ubuntu@worker2 took too long to exit

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Reopened] (KAFKA-4755) SimpleBenchmark consume test fails for streams
[ https://issues.apache.org/jira/browse/KAFKA-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roger Hoover reopened KAFKA-4755:
---------------------------------

Happened again. Logs here: http://confluent-kafka-0-10-2-system-test-results.s3-us-west-2.amazonaws.com/2017-03-28--001.1490697484--apache--0.10.2--1e4cab7/StreamsSimpleBenchmarkTest/test_simple_benchmark/212.tgz

Streams Test process on ubuntu@worker2 took too long to exit
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
    data = self.run_test()
  File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
    return self.test_context.function(self.test)
  File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/benchmarks/streams/streams_simple_benchmark_test.py", line 48, in test_simple_benchmark
    self.driver.wait()
  File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/services/streams.py", line 99, in wait
    self.wait_node(node, timeout_sec)
  File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/services/streams.py", line 103, in wait_node
    wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec, err_msg="Streams Test process on " + str(node.account) + " took too long to exit")
  File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", line 36, in wait_until
    raise TimeoutError(err_msg)
TimeoutError: Streams Test process on ubuntu@worker2 took too long to exit
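Both KAFKA-4755 failures surface through ducktape's wait_until helper, which polls a condition until a deadline and raises on timeout. A minimal re-implementation of that polling pattern (a sketch of the idea, not ducktape's actual utils/util.py):

```python
import time

class TimeoutError(Exception):
    """Raised when a polled condition does not become true in time."""

def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    # Poll `condition` every `backoff_sec` seconds until it returns truthy
    # or `timeout_sec` elapses, mirroring the shape of ducktape's wait_until.
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError(err_msg)
```

The streams service calls this with `lambda: not node.account.alive(pid)`; when the Streams JVM never exits, the deadline lapses and the TimeoutError carries the err_msg seen in the log above.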
[jira] [Commented] (KAFKA-4689) OffsetValidationTest fails validation with "Current position greater than the total number of consumed records"
[ https://issues.apache.org/jira/browse/KAFKA-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945543#comment-15945543 ]

Roger Hoover commented on KAFKA-4689:
-------------------------------------

Happened again. Logs here: http://confluent-kafka-0-10-2-system-test-results.s3-us-west-2.amazonaws.com/2017-03-28--001.1490697484--apache--0.10.2--1e4cab7/OffsetValidationTest/test_consumer_bounce/clean_shutdown%3DFalse.bounce_mode%3Drolling/49.tgz

test_id:    kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling
status:     FAIL
run time:   3 minutes 32.756 seconds

Current position 79302 greater than the total number of consumed records 79299
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
    data = self.run_test()
  File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
    return self.test_context.function(self.test)
  File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/client/consumer_test.py", line 159, in test_consumer_bounce
    (consumer.current_position(partition), consumer.total_consumed())
AssertionError: Current position 79302 greater than the total number of consumed records 79299

> OffsetValidationTest fails validation with "Current position greater than the total number of consumed records"
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4689
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4689
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer, system tests
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Jason Gustafson
>              Labels: system-test-failure
>
> {quote}
> test_id:    kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=all
> status: FAIL
> run time: 1 minute 49.834 seconds
> Current position greater than the total number of consumed records
> Traceback (most recent call last):
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
>     data = self.run_test()
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
>     return self.test_context.function(self.test)
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/client/consumer_test.py", line 157, in test_consumer_bounce
>     "Current position greater than the total number of consumed records"
> AssertionError: Current position greater than the total number of consumed records
> {quote}
> See also https://issues.apache.org/jira/browse/KAFKA-3513?focusedCommentId=15791790&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15791790 which is another instance of this bug, which indicates the issue goes back at least as far as 1/17/2017. Note that I don't think we've seen this in 0.10.1 yet.
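The assertion here encodes a simple invariant: after a bounce, the consumer's current position in a partition should never exceed the number of records the test actually observed, because a larger position means records were skipped. A hedged sketch of that check (illustrative names, not the actual consumer_test.py code):

```python
def check_position_invariant(current_position, total_consumed):
    # If the consumer's position advanced past the number of records the
    # validator saw, some records were skipped, e.g. offsets committed
    # ahead of delivery during an unclean bounce.
    assert current_position <= total_consumed, (
        "Current position %d greater than the total number of consumed records %d"
        % (current_position, total_consumed))

check_position_invariant(79299, 79299)  # passes
# check_position_invariant(79302, 79299) would raise AssertionError with
# the same message seen in the failure above
```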
[jira] [Closed] (KAFKA-4670) Kafka Consumer should validate FetchResponse
[ https://issues.apache.org/jira/browse/KAFKA-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roger Hoover closed KAFKA-4670.
-------------------------------

> Kafka Consumer should validate FetchResponse
> --------------------------------------------
>
>                 Key: KAFKA-4670
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4670
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.2.0
>            Reporter: Roger Hoover
>            Assignee: Jason Gustafson
>            Priority: Minor
[jira] [Commented] (KAFKA-4670) Kafka Consumer should validate FetchResponse
[ https://issues.apache.org/jira/browse/KAFKA-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830670#comment-15830670 ]

Roger Hoover commented on KAFKA-4670:
-------------------------------------

Ah, yes, thanks, [~ijuma].

> Kafka Consumer should validate FetchResponse
> --------------------------------------------
>
>                 Key: KAFKA-4670
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4670
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.2.0
>            Reporter: Roger Hoover
>            Assignee: Jason Gustafson
>            Priority: Minor
[jira] [Created] (KAFKA-4670) Kafka Consumer should validate FetchResponse
Roger Hoover created KAFKA-4670:
-----------------------------------

             Summary: Kafka Consumer should validate FetchResponse
                 Key: KAFKA-4670
                 URL: https://issues.apache.org/jira/browse/KAFKA-4670
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 0.10.2.0
            Reporter: Roger Hoover
            Assignee: Jason Gustafson
            Priority: Minor


As a negative test case, I purposefully configured a bad advertised listener endpoint.

{code}
advertised.listeners=PLAINTEXT://www.google.com:80
{code}

This causes the Consumer to over-allocate and run out of memory.

{quote}
[2017-01-18 10:03:03,866] DEBUG Sending metadata request (type=MetadataRequest, topics=foo) to node -1 (org.apache.kafka.clients.NetworkClient)
[2017-01-18 10:03:03,870] DEBUG Updated cluster metadata version 2 to Cluster(id = oerqPfCuTCKYUUaWdFUSVQ, nodes = [www.google.com:80 (id: 0 rack: null)], partitions = [Partition(topic = foo, partition = 0, leader = 0, replicas = [0], isr = [0])]) (org.apache.kafka.clients.Metadata)
[2017-01-18 10:03:03,871] DEBUG Received group coordinator response ClientResponse(receivedTimeMs=1484762583870, latencyMs=88, disconnected=false, requestHeader={api_key=10,api_version=0,correlation_id=0,client_id=consumer-1}, responseBody={error_code=0,coordinator={node_id=0,host=www.google.com,port=80}}) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-01-18 10:03:03,871] INFO Discovered coordinator www.google.com:80 (id: 2147483647 rack: null) for group console-consumer-64535. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-01-18 10:03:03,871] DEBUG Initiating connection to node 2147483647 at www.google.com:80. (org.apache.kafka.clients.NetworkClient)
[2017-01-18 10:03:03,915] INFO Revoking previously assigned partitions [] for group console-consumer-64535 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-01-18 10:03:03,915] INFO (Re-)joining group console-consumer-64535 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-01-18 10:03:03,917] DEBUG Sending JoinGroup ((type: JoinGroupRequest, groupId=console-consumer-64535, sessionTimeout=1, rebalanceTimeout=30, memberId=, protocolType=consumer, groupProtocols=org.apache.kafka.common.requests.JoinGroupRequest$ProtocolMetadata@564fabc8)) to coordinator www.google.com:80 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-01-18 10:03:03,932] DEBUG Created socket with SO_RCVBUF = 66646, SO_SNDBUF = 131874, SO_TIMEOUT = 0 to node 2147483647 (org.apache.kafka.common.network.Selector)
[2017-01-18 10:03:03,932] DEBUG Completed connection to node 2147483647. Fetching API versions. (org.apache.kafka.clients.NetworkClient)
[2017-01-18 10:03:03,932] DEBUG Initiating API versions fetch from node 2147483647. (org.apache.kafka.clients.NetworkClient)
[2017-01-18 10:03:03,990] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$)
java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
        at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:169)
        at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:150)
        at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:355)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:346)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:172)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:331)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:300)
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:261)
        at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1025)
        at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:990)
        at kafka.consumer.NewShinyConsumer.<init>(BaseConsumer.scala:55)
        at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
        at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
        at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
{quote}

It seems like the consumer should validate responses
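The OutOfMemoryError is a classic symptom of speaking the Kafka binary protocol to a non-Kafka endpoint: the client reads the first four bytes of the peer's reply as a big-endian size prefix and allocates a buffer of that size. If the peer answers with HTTP, those first bytes are the ASCII text "HTTP", which decodes to roughly 1.13 GiB. A small illustration of the arithmetic (plain Python, not Kafka client code):

```python
import struct

# The Kafka wire protocol frames every response as a 4-byte big-endian
# length followed by the payload; NetworkReceive trusts that length and
# allocates a buffer for it.
def frame_size(first_four_bytes):
    (size,) = struct.unpack(">i", first_four_bytes)
    return size

# A real Kafka response starts with a sane length...
frame_size(struct.pack(">i", 1024))  # -> 1024

# ...but an HTTP server's reply starts with the bytes b"HTTP",
# which read as a ~1.13 GiB "payload length":
frame_size(b"HTTP")                  # -> 1213486160
```

Attempting a single 1213486160-byte allocation is what blows the heap in the stack trace above; validating the response or bounding the accepted frame size would turn this misconfiguration into a clean error instead of an OOM.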
[jira] [Commented] (KAFKA-4527) Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where paused connector produces messages
[ https://issues.apache.org/jira/browse/KAFKA-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15759269#comment-15759269 ]

Roger Hoover commented on KAFKA-4527:
-------------------------------------

Happened again: http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-18--001.1482053747--apache--trunk--d6b0b52/

> Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where paused connector produces messages
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4527
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4527
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect, system tests
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Shikhar Bhushan
>              Labels: system-test-failure, system-tests
>             Fix For: 0.10.2.0
>
> {quote}
> test_id:    kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink
> status: FAIL
> run time: 40.164 seconds
> Paused sink connector should not consume any messages
> Traceback (most recent call last):
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
>     data = self.run_test()
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
>     return self.test_context.function(self.test)
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py", line 257, in test_pause_and_resume_sink
>     assert num_messages == len(self.sink.received_messages()), "Paused sink connector should not consume any messages"
> AssertionError: Paused sink connector should not consume any messages
> {quote}
> See one case here: http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html but it has also happened before, e.g. http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-06--001.1481017508--apache--trunk--34aa538/report.html
> Thinking about the test, one simple possibility is that our approach to get the number of messages produced/consumed during the test is flawed -- I think we may not account for additional buffering between the connectors and the process reading their output to determine what they have produced. However, that's just a theory -- the minimal checking on the logs that I did didn't reveal anything obviously wrong.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
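The assertion in the quoted traceback compares a snapshot of the sink's received-message count against a later reading taken while the connector is paused. A schematic version of that check (hypothetical helper, not ConnectDistributedTest itself), with a settle delay that illustrates the buffering theory raised in the report:

```python
import time

def assert_paused_sink_consumes_nothing(sink, settle_sec=0.0):
    # Snapshot the sink's message count while the connector is paused,
    # give any already-buffered records a chance to surface, then verify
    # the count did not grow. The settle delay sketches one way to reduce
    # the buffering race described above; it is not the test's actual code.
    num_messages = len(sink.received_messages())
    time.sleep(settle_sec)
    assert num_messages == len(sink.received_messages()), \
        "Paused sink connector should not consume any messages"
```

If records buffered between the connector and the reader only become visible after the first snapshot, the second reading is larger and the assertion fires even though the connector itself consumed nothing while paused, which would make the test transiently flaky exactly as described.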
[jira] [Commented] (KAFKA-3808) Transient failure in ReplicaVerificationToolTest
[ https://issues.apache.org/jira/browse/KAFKA-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15759266#comment-15759266 ] Roger Hoover commented on KAFKA-3808: - Happened again: http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-18--001.1482053747--apache--trunk--d6b0b52/ > Transient failure in ReplicaVerificationToolTest > > > Key: KAFKA-3808 > URL: https://issues.apache.org/jira/browse/KAFKA-3808 > Project: Kafka > Issue Type: Bug > Components: system tests >Reporter: Geoff Anderson > > {code} > test_id: > 2016-05-29--001.kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags > status: FAIL > run time: 1 minute 9.231 seconds > Timed out waiting to reach non-zero number of replica lags. > Traceback (most recent call last): > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py", > line 106, in run_all_tests > data = self.run_single_test() > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py", > line 162, in run_single_test > return self.current_test_context.function(self.current_test) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py", > line 88, in test_replica_lags > err_msg="Timed out waiting to reach non-zero number of replica lags.") > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/utils/util.py", > line 36, in wait_until > raise TimeoutError(err_msg) > TimeoutError: Timed out waiting to reach non-zero number of replica lags. 
> {code} > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-05-29--001.1464540508--apache--trunk--404b696/report.html
[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure
[ https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15759264#comment-15759264 ] Roger Hoover commented on KAFKA-4166: - Similar failure: http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-18--001.1482053747--apache--trunk--d6b0b52/ > TestMirrorMakerService.test_bounce transient system test failure > > > Key: KAFKA-4166 > URL: https://issues.apache.org/jira/browse/KAFKA-4166 > Project: Kafka > Issue Type: Bug >Reporter: Ismael Juma >Assignee: Jason Gustafson > Labels: transient-system-test-failure > > We've only seen one failure so far and it's a timeout error so it could be an > environment issue. Filing it here so that we can track it in case there are > additional failures: > {code} > Module: kafkatest.tests.core.mirror_maker_test > Class: TestMirrorMakerService > Method: test_bounce > Arguments: > { > "clean_shutdown": true, > "new_consumer": true, > "security_protocol": "SASL_SSL" > } > {code} > > {code} > test_id: > 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True > status: FAIL > run time: 3 minutes 30.354 seconds > > Traceback (most recent call last): > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py", > line 106, in run_all_tests > data = self.run_single_test() > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py", > line 162, in run_single_test > return self.current_test_context.function(self.current_test) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py", > line 331, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py", > line 178, in test_bounce > self.run_produce_consume_validate(core_test_action=lambda: > self.bounce(clean_shutdown=clean_shutdown)) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 79, in run_produce_consume_validate > raise e > TimeoutError > {code} > > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment
[ https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15759260#comment-15759260 ] Roger Hoover commented on KAFKA-4526: - Thanks, [~apurva] > Transient failure in ThrottlingTest.test_throttled_reassignment > --- > > Key: KAFKA-4526 > URL: https://issues.apache.org/jira/browse/KAFKA-4526 > Project: Kafka > Issue Type: Bug >Reporter: Ewen Cheslack-Postava >Assignee: Apurva Mehta > Labels: system-test-failure, system-tests > Fix For: 0.10.2.0 > > > This test is seeing transient failures sometimes > {quote} > Module: kafkatest.tests.core.throttling_test > Class: ThrottlingTest > Method: test_throttled_reassignment > Arguments: > { > "bounce_brokers": false > } > {quote} > This happens with both bounce_brokers = true and false. Fails with > {quote} > AssertionError: 1646 acked message did not make it to the Consumer. They are: > 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus > 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the > first 1000 of these missing messages correctly made it into Kafka's data > files. This suggests they were lost on their way to the consumer. > {quote} > See > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html > for an example. > Note that there are a number of similar bug reports for different tests: > https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka > I am wondering if we have a wrong ack setting somewhere that we should be > specifying as acks=all but is only defaulting to 0? > It also seems interesting that the missing messages in these recent failures > seem to always start at 0... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
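The failure mode shared by these tickets reduces to a set difference between acked and consumed message IDs. A simplified sketch of that validation logic — the function name and signature are illustrative, not the actual `produce_consume_validate.py` code:

```python
def validate_delivery(acked, consumed, preview=20):
    """Check that every acked message ID was consumed; on failure, build a
    summary like the AssertionError text quoted in the reports above."""
    missing = sorted(set(acked) - set(consumed))
    if not missing:
        return True, "All %d acked messages were consumed." % len(set(acked))
    shown = ", ".join(str(m) for m in missing[:preview])
    msg = ("%d acked messages did not make it to the consumer. They are: %s"
           "...plus %d more. Total Acked: %d, Total Consumed: %d."
           % (len(missing), shown, max(0, len(missing) - preview),
              len(set(acked)), len(set(consumed))))
    return False, msg
```

If the acks hypothesis in the issue description is right, the fix would of course live in the producer configuration (`acks=all` instead of an implicit `acks=0`/`1`), not in this validation step.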
[jira] [Commented] (KAFKA-3808) Transient failure in ReplicaVerificationToolTest
[ https://issues.apache.org/jira/browse/KAFKA-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758169#comment-15758169 ] Roger Hoover commented on KAFKA-3808: - Happened again: {code} test_id: kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags status: FAIL run time: 1 minute 18.074 seconds Timed out waiting to reach zero replica lags. Traceback (most recent call last): File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run data = self.run_test() File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test return self.test_context.function(self.test) File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py", line 84, in test_replica_lags err_msg="Timed out waiting to reach zero replica lags.") File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", line 36, in wait_until raise TimeoutError(err_msg) TimeoutError: Timed out waiting to reach zero replica lags. {code} http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/ > Transient failure in ReplicaVerificationToolTest > > > Key: KAFKA-3808 > URL: https://issues.apache.org/jira/browse/KAFKA-3808 > Project: Kafka > Issue Type: Bug > Components: system tests >Reporter: Geoff Anderson > > {code} > test_id: > 2016-05-29--001.kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags > status: FAIL > run time: 1 minute 9.231 seconds > Timed out waiting to reach non-zero number of replica lags. 
> Traceback (most recent call last): > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py", > line 106, in run_all_tests > data = self.run_single_test() > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py", > line 162, in run_single_test > return self.current_test_context.function(self.current_test) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py", > line 88, in test_replica_lags > err_msg="Timed out waiting to reach non-zero number of replica lags.") > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/utils/util.py", > line 36, in wait_until > raise TimeoutError(err_msg) > TimeoutError: Timed out waiting to reach non-zero number of replica lags. > {code} > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-05-29--001.1464540508--apache--trunk--404b696/report.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-4554) ReplicaVerificationToolTest.test_replica_lags system test failure
[ https://issues.apache.org/jira/browse/KAFKA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover resolved KAFKA-4554. - Resolution: Duplicate Duplicate of https://issues.apache.org/jira/browse/KAFKA-3808 > ReplicaVerificationToolTest.test_replica_lags system test failure > - > > Key: KAFKA-4554 > URL: https://issues.apache.org/jira/browse/KAFKA-4554 > Project: Kafka > Issue Type: Bug >Reporter: Roger Hoover > > {code} > > test_id: > kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags > status: FAIL > run time: 1 minute 18.074 seconds > Timed out waiting to reach zero replica lags. > Traceback (most recent call last): > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 123, in run > data = self.run_test() > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 176, in run_test > return self.test_context.function(self.test) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py", > line 84, in test_replica_lags > err_msg="Timed out waiting to reach zero replica lags.") > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", > line 36, in wait_until > raise TimeoutError(err_msg) > TimeoutError: Timed out waiting to reach zero replica lags. > {code} > http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4554) ReplicaVerificationToolTest.test_replica_lags system test failure
Roger Hoover created KAFKA-4554: --- Summary: ReplicaVerificationToolTest.test_replica_lags system test failure Key: KAFKA-4554 URL: https://issues.apache.org/jira/browse/KAFKA-4554 Project: Kafka Issue Type: Bug Reporter: Roger Hoover {code} test_id: kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags status: FAIL run time: 1 minute 18.074 seconds Timed out waiting to reach zero replica lags. Traceback (most recent call last): File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run data = self.run_test() File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test return self.test_context.function(self.test) File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py", line 84, in test_replica_lags err_msg="Timed out waiting to reach zero replica lags.") File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", line 36, in wait_until raise TimeoutError(err_msg) TimeoutError: Timed out waiting to reach zero replica lags. {code} http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
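Both ReplicaVerificationToolTest failures end the same way: ducktape's `wait_until` gives up polling and raises `TimeoutError`. Its contract is roughly the following — a simplified sketch, not the actual `ducktape/utils/util.py` source (ducktape defines its own `TimeoutError`; Python 3's builtin is used here):

```python
import time

def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Poll `condition` until it returns truthy or `timeout_sec` elapses,
    then raise TimeoutError(err_msg) as seen in the tracebacks above."""
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError(err_msg)
```

So the test failure means the replica-lag condition never became true within the timeout, which could be genuine breakage or simply a timeout too tight for the test environment.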
[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure
[ https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758158#comment-15758158 ] Roger Hoover commented on KAFKA-4166: - Failed again: http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/ > TestMirrorMakerService.test_bounce transient system test failure > > > Key: KAFKA-4166 > URL: https://issues.apache.org/jira/browse/KAFKA-4166 > Project: Kafka > Issue Type: Bug >Reporter: Ismael Juma >Assignee: Jason Gustafson > Labels: transient-system-test-failure > > We've only seen one failure so far and it's a timeout error so it could be an > environment issue. Filing it here so that we can track it in case there are > additional failures: > {code} > Module: kafkatest.tests.core.mirror_maker_test > Class: TestMirrorMakerService > Method: test_bounce > Arguments: > { > "clean_shutdown": true, > "new_consumer": true, > "security_protocol": "SASL_SSL" > } > {code} > > {code} > test_id: > 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True > status: FAIL > run time: 3 minutes 30.354 seconds > > Traceback (most recent call last): > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py", > line 106, in run_all_tests > data = self.run_single_test() > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py", > line 162, in run_single_test > return self.current_test_context.function(self.current_test) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py", > line 331, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py", > line 178, in test_bounce > self.run_produce_consume_validate(core_test_action=lambda: > self.bounce(clean_shutdown=clean_shutdown)) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 79, in run_produce_consume_validate > raise e > TimeoutError > {code} > > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment
[ https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758155#comment-15758155 ] Roger Hoover commented on KAFKA-4526: - Failed again {code} test_id: kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=False status: FAIL run time: 8 minutes 55.261 seconds 531 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 511 more. Total Acked: 172487, Total Consumed: 172119. We validated that the first 531 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer. Traceback (most recent call last): File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run data = self.run_test() File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test return self.test_context.function(self.test) File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/throttling_test.py", line 175, in test_throttled_reassignment lambda: self.reassign_partitions(bounce_brokers, self.throttle)) File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 101, in run_produce_consume_validate self.validate() File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 163, in validate assert success, msg AssertionError: 531 acked message did not make it to the Consumer. 
They are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 511 more. Total Acked: 172487, Total Consumed: 172119. We validated that the first 531 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer. test_id: kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=True status: FAIL run time: 8 minutes 52.939 seconds 567 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 547 more. Total Acked: 169804, Total Consumed: 169248. We validated that the first 567 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer. Traceback (most recent call last): File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run data = self.run_test() File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test return self.test_context.function(self.test) File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/throttling_test.py", line 175, in test_throttled_reassignment lambda: self.reassign_partitions(bounce_brokers, self.throttle)) File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 101, in run_produce_consume_validate self.validate() File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 163, in 
validate assert success, msg AssertionError: 567 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 547 more. Total Acked: 169804, Total Consumed: 169248. We validated that the first 567 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer. {code} http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/ > Transient failure in ThrottlingTest.test_throttled_reassignment
[jira] [Created] (KAFKA-4551) StreamsSmokeTest.test_streams intermittent failure
Roger Hoover created KAFKA-4551: --- Summary: StreamsSmokeTest.test_streams intermittent failure Key: KAFKA-4551 URL: https://issues.apache.org/jira/browse/KAFKA-4551 Project: Kafka Issue Type: Bug Reporter: Roger Hoover Priority: Blocker {code} test_id: kafkatest.tests.streams.streams_smoke_test.StreamsSmokeTest.test_streams status: FAIL run time: 4 minutes 44.872 seconds Traceback (most recent call last): File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run data = self.run_test() File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test return self.test_context.function(self.test) File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/streams/streams_smoke_test.py", line 78, in test_streams node.account.ssh("grep SUCCESS %s" % self.driver.STDOUT_FILE, allow_fail=False) File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/cluster/remoteaccount.py", line 253, in ssh raise RemoteCommandError(self, cmd, exit_status, stderr.read()) RemoteCommandError: ubuntu@worker6: Command 'grep SUCCESS /mnt/streams/streams.stdout' returned non-zero exit status 1. {code} http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-15--001.1481794587--apache--trunk--7049938/StreamsSmokeTest/test_streams/91.tgz -- This message was sent by Atlassian JIRA (v6.3.4#6332)
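The Streams smoke test decides pass/fail by grepping the driver's stdout file for a SUCCESS marker (`grep SUCCESS /mnt/streams/streams.stdout`, per the traceback above); grep's non-zero exit status means the marker never appeared. The same check can be sketched in Python — the path and marker come from the log, the helper itself is illustrative:

```python
def driver_succeeded(stdout_path, marker="SUCCESS"):
    """Mimic `grep SUCCESS <file>`: True iff any line contains the marker.
    grep exiting 1 (as in the failure above) corresponds to False here."""
    with open(stdout_path) as f:
        return any(marker in line for line in f)
```

A False result only says the driver never printed the marker; the interesting diagnosis (why the smoke-test driver did not reach SUCCESS) still lives in the archived logs.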
[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure
[ https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754984#comment-15754984 ] Roger Hoover commented on KAFKA-4166: - Happened again here: http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-16--001.1481880892--apache--trunk--e55205b/ > TestMirrorMakerService.test_bounce transient system test failure > > > Key: KAFKA-4166 > URL: https://issues.apache.org/jira/browse/KAFKA-4166 > Project: Kafka > Issue Type: Bug >Reporter: Ismael Juma > Labels: transient-system-test-failure > > We've only seen one failure so far and it's a timeout error so it could be an > environment issue. Filing it here so that we can track it in case there are > additional failures: > {code} > Module: kafkatest.tests.core.mirror_maker_test > Class: TestMirrorMakerService > Method: test_bounce > Arguments: > { > "clean_shutdown": true, > "new_consumer": true, > "security_protocol": "SASL_SSL" > } > {code} > > {code} > test_id: > 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True > status: FAIL > run time: 3 minutes 30.354 seconds > > Traceback (most recent call last): > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py", > line 106, in run_all_tests > data = self.run_single_test() > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py", > line 162, in run_single_test > return self.current_test_context.function(self.current_test) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py", > line 331, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py", > line 178, in test_bounce > self.run_produce_consume_validate(core_test_action=lambda: > self.bounce(clean_shutdown=clean_shutdown)) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 79, in run_produce_consume_validate > raise e > TimeoutError > {code} > > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4527) Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where paused connector produces messages
[ https://issues.apache.org/jira/browse/KAFKA-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754980#comment-15754980 ] Roger Hoover commented on KAFKA-4527: - Happened again: http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-16--001.1481880892--apache--trunk--e55205b/ > Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where > paused connector produces messages > --- > > Key: KAFKA-4527 > URL: https://issues.apache.org/jira/browse/KAFKA-4527 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect, system tests >Reporter: Ewen Cheslack-Postava >Assignee: Shikhar Bhushan > Labels: system-test-failure, system-tests > Fix For: 0.10.2.0 > > > {quote} > > test_id: > kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink > status: FAIL > run time: 40.164 seconds > Paused sink connector should not consume any messages > Traceback (most recent call last): > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 123, in run > data = self.run_test() > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 176, in run_test > return self.test_context.function(self.test) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py", > line 257, in test_pause_and_resume_sink > assert num_messages == len(self.sink.received_messages()), "Paused sink > connector should not consume any messages" > AssertionError: Paused sink connector should not consume any messages > {quote} > See one case here: > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html > but it has also 
happened before, e.g. > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-06--001.1481017508--apache--trunk--34aa538/report.html > Thinking about the test, one simple possibility is that our approach to get > the number of messages produced/consumed during the test is flawed -- I think > we may not account for additional buffering between the connectors and the > process reading their output to determine what they have produced. However, > that's just a theory -- the minimal checking on the logs that I did didn't > reveal anything obviously wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment
[ https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754982#comment-15754982 ] Roger Hoover commented on KAFKA-4526: - Happened again here: http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-16--001.1481880892--apache--trunk--e55205b/ > Transient failure in ThrottlingTest.test_throttled_reassignment > --- > > Key: KAFKA-4526 > URL: https://issues.apache.org/jira/browse/KAFKA-4526 > Project: Kafka > Issue Type: Bug >Reporter: Ewen Cheslack-Postava >Assignee: Jason Gustafson > Labels: system-test-failure, system-tests > Fix For: 0.10.2.0 > > > This test is seeing transient failures sometimes > {quote} > Module: kafkatest.tests.core.throttling_test > Class: ThrottlingTest > Method: test_throttled_reassignment > Arguments: > { > "bounce_brokers": false > } > {quote} > This happens with both bounce_brokers = true and false. Fails with > {quote} > AssertionError: 1646 acked message did not make it to the Consumer. They are: > 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus > 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the > first 1000 of these missing messages correctly made it into Kafka's data > files. This suggests they were lost on their way to the consumer. > {quote} > See > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html > for an example. > Note that there are a number of similar bug reports for different tests: > https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka > I am wondering if we have a wrong ack setting somewhere that we should be > specifying as acks=all but is only defaulting to 0? 
> It also seems interesting that the missing messages in these recent failures > seem to always start at 0...
[jira] [Comment Edited] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure
[ https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747454#comment-15747454 ] Roger Hoover edited comment on KAFKA-4166 at 12/14/16 6:37 AM: --- It happened twice more: {code} Module: kafkatest.tests.core.mirror_maker_test Class: TestMirrorMakerService Method: test_bounce Arguments: { "clean_shutdown": true, "new_consumer": false, "offsets_storage": "kafka" } {code} http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_bounce/clean_shutdown%3DTrue.offsets_storage%3Dkafka.new_consumer%3DFalse/51.tgz and {code} Module: kafkatest.tests.core.mirror_maker_test Class: TestMirrorMakerService Method: test_simple_end_to_end Arguments: { "security_protocol": "PLAINTEXT", "new_consumer": false } {code} http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_simple_end_to_end/security_protocol%3DPLAINTEXT.new_consumer%3DFalse/55.tgz was (Author: theduderog): It happened twice more on these tests: {code} "Module: kafkatest.tests.core.mirror_maker_test Class: TestMirrorMakerService Method: test_bounce Arguments: { "clean_shutdown": true, "new_consumer": false, "offsets_storage": "kafka" } {code} http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_bounce/clean_shutdown%3DTrue.offsets_storage%3Dkafka.new_consumer%3DFalse/51.tgz and {code} "Module: kafkatest.tests.core.mirror_maker_test Class: TestMirrorMakerService Method: test_simple_end_to_end Arguments: { ""security_protocol': ""PLAINTEXT"", ""new_consumer"": false }" {code} 
http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_simple_end_to_end/security_protocol%3DPLAINTEXT.new_consumer%3DFalse/55.tgz > TestMirrorMakerService.test_bounce transient system test failure > > > Key: KAFKA-4166 > URL: https://issues.apache.org/jira/browse/KAFKA-4166 > Project: Kafka > Issue Type: Bug >Reporter: Ismael Juma > Labels: transient-system-test-failure > > We've only seen one failure so far and it's a timeout error so it could be an > environment issue. Filing it here so that we can track it in case there are > additional failures: > {code} > Module: kafkatest.tests.core.mirror_maker_test > Class: TestMirrorMakerService > Method: test_bounce > Arguments: > { > "clean_shutdown": true, > "new_consumer": true, > "security_protocol": "SASL_SSL" > } > {code} > > {code} > test_id: > 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True > status: FAIL > run time: 3 minutes 30.354 seconds > > Traceback (most recent call last): > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py", > line 106, in run_all_tests > data = self.run_single_test() > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py", > line 162, in run_single_test > return self.current_test_context.function(self.current_test) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py", > line 331, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py", > line 178, in test_bounce > self.run_produce_consume_validate(core_test_action=lambda: > 
self.bounce(clean_shutdown=clean_shutdown)) > File > "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 79, in run_produce_consume_validate > raise e > TimeoutError > {code} > > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment
[ https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747308#comment-15747308 ] Roger Hoover commented on KAFKA-4526: - This happened again on the Dec 13 nightly run. > Transient failure in ThrottlingTest.test_throttled_reassignment > --- > > Key: KAFKA-4526 > URL: https://issues.apache.org/jira/browse/KAFKA-4526 > Project: Kafka > Issue Type: Bug >Reporter: Ewen Cheslack-Postava >Assignee: Jason Gustafson > Labels: system-test-failure, system-tests > Fix For: 0.10.2.0 > > > This test is seeing transient failures sometimes > {quote} > Module: kafkatest.tests.core.throttling_test > Class: ThrottlingTest > Method: test_throttled_reassignment > Arguments: > { > "bounce_brokers": false > } > {quote} > This happens with both bounce_brokers = true and false. Fails with > {quote} > AssertionError: 1646 acked message did not make it to the Consumer. They are: > 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus > 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the > first 1000 of these missing messages correctly made it into Kafka's data > files. This suggests they were lost on their way to the consumer. > {quote} > See > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html > for an example. > Note that there are a number of similar bug reports for different tests: > https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka > I am wondering if we have a wrong ack setting somewhere that we should be > specifying as acks=all but is only defaulting to 0? > It also seems interesting that the missing messages in these recent failures > seem to always start at 0... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
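[Editor's note] The assertion text quoted above comes from the produce/consume validator, which diffs the set of acked message IDs against the set of consumed IDs. A minimal sketch of that check (names are illustrative; the real logic lives in kafkatest's produce_consume_validate.py):

```python
# Sketch of the acked-vs-consumed check behind the assertion message.
# Function and variable names here are hypothetical, not kafkatest's.

def find_missing(acked, consumed, preview=20):
    """Return (missing_ids, summary) for acked messages never consumed."""
    missing = sorted(set(acked) - set(consumed))
    summary = (
        "%d acked messages did not make it to the consumer. First %d: %s. "
        "Total acked: %d, total consumed: %d"
        % (len(missing), min(preview, len(missing)), missing[:preview],
           len(acked), len(consumed))
    )
    return missing, summary

missing, msg = find_missing(acked=[1, 2, 3, 4, 5], consumed=[1, 3, 5])
assert missing == [2, 4]
```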
[jira] [Updated] (KAFKA-4361) Streams does not respect user configs for "default" params
[ https://issues.apache.org/jira/browse/KAFKA-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated KAFKA-4361: Description: For the config params in CONSUMER_DEFAULT_OVERRIDES (https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274) such as consumer.max.poll.records and consumer.auto.offset.reset, those parameters are not used and instead overridden by the defaults. It may not work for some producer config values as well. If your application sets those params in the StreamsConfig, they are not used in the underlying Kafka Consumers (and possibly Producers) was: For the config params in CONSUMER_DEFAULT_OVERRIDES (https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274) such as consumer.max.poll.records and consumer.auto.offset.reset, those parameters are not used and instead overridden by the defaults. It may not work for some producer config values as well. If your application sets those params in the StreamsConfig, they are not used in the underlying Kafka Consumers > Streams does not respect user configs for "default" params > -- > > Key: KAFKA-4361 > URL: https://issues.apache.org/jira/browse/KAFKA-4361 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.1.0 >Reporter: Roger Hoover >Assignee: Damian Guy > > For the config params in CONSUMER_DEFAULT_OVERRIDES > (https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274) > such as consumer.max.poll.records and consumer.auto.offset.reset, those > parameters are not used and instead overridden by the defaults. It may not > work for some producer config values as well. > If your application sets those params in the StreamsConfig, they are not used > in the underlying Kafka Consumers (and possibly Producers) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
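[Editor's note] The bug pattern described here is a merge-ordering problem: defaults being applied over user-supplied values instead of under them. A hedged sketch of the two orderings (plain dicts, not the actual StreamsConfig code):

```python
# Illustrative sketch of the override-ordering bug, not Kafka's code:
# defaults must be merged *under* user configs, not over them.

CONSUMER_DEFAULT_OVERRIDES = {
    "max.poll.records": 1000,
    "auto.offset.reset": "earliest",
}

def buggy_merge(user):
    # Defaults applied last: user-supplied values get clobbered.
    return {**user, **CONSUMER_DEFAULT_OVERRIDES}

def fixed_merge(user):
    # Defaults applied first: they only fill in keys the user did not set.
    return {**CONSUMER_DEFAULT_OVERRIDES, **user}

user = {"max.poll.records": 50}
assert buggy_merge(user)["max.poll.records"] == 1000   # user setting lost
assert fixed_merge(user)["max.poll.records"] == 50     # user setting respected
```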
[jira] [Created] (KAFKA-4361) Streams does not respect user configs for "default" params
Roger Hoover created KAFKA-4361: --- Summary: Streams does not respect user configs for "default" params Key: KAFKA-4361 URL: https://issues.apache.org/jira/browse/KAFKA-4361 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 0.10.1.0 Reporter: Roger Hoover Assignee: Damian Guy For the config params in CONSUMER_DEFAULT_OVERRIDES (https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274) such as consumer.max.poll.records and consumer.auto.offset.reset, those parameters are not used and are instead overridden by the defaults. It may not work for some producer config values as well. If your application sets those params in the StreamsConfig, they are not used in the underlying Kafka Consumers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4331) Kafka Streams resetter is slow because it joins the same group for each topic
Roger Hoover created KAFKA-4331: --- Summary: Kafka Streams resetter is slow because it joins the same group for each topic Key: KAFKA-4331 URL: https://issues.apache.org/jira/browse/KAFKA-4331 Project: Kafka Issue Type: Improvement Components: streams Affects Versions: 0.10.0.1, 0.10.0.0 Reporter: Roger Hoover Assignee: Matthias J. Sax The resetter joins the same group for each topic, which takes ~10 secs in my testing. This makes the reset very slow when you have a lot of topics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
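[Editor's note] The cost model behind this report: each subscribe triggers a group join (~10 secs above), so N topics cost N joins, while a single subscription over all topics costs one. A sketch with a hypothetical consumer that counts joins:

```python
# Sketch of why the per-topic loop is slow. FakeConsumer is a stand-in;
# the real client performs a group rebalance on each new subscription.

class FakeConsumer:
    def __init__(self):
        self.group_joins = 0

    def subscribe(self, topics):
        self.group_joins += 1  # one group join (~10s) per subscribe call

topics = ["t1", "t2", "t3", "t4"]

slow = FakeConsumer()
for t in topics:            # resetter's original pattern: join per topic
    slow.subscribe([t])

fast = FakeConsumer()
fast.subscribe(topics)      # proposed fix: join once for all topics

assert slow.group_joins == len(topics)
assert fast.group_joins == 1
```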
[jira] [Commented] (KAFKA-3993) Console producer drops data
[ https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450054#comment-15450054 ] Roger Hoover commented on KAFKA-3993: - Thanks, [~cotedm]. I tried to set acks=all but apparently that doesn't work. > Console producer drops data > --- > > Key: KAFKA-3993 > URL: https://issues.apache.org/jira/browse/KAFKA-3993 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover > > The console producer drops data when if the process exits too quickly. I > suspect that the shutdown hook does not call close() or something goes wrong > during that close(). > Here's a simple to illustrate the issue: > {noformat} > export BOOTSTRAP_SERVERS=localhost:9092 > export TOPIC=bar > export MESSAGES=1 > ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 > --replication-factor 1 --topic "$TOPIC" \ > && echo "acks=all" > /tmp/producer.config \ > && echo "linger.ms=0" >> /tmp/producer.config \ > && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list > "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \ > && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" > --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC" > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (KAFKA-4063) Add support for infinite endpoints for range queries in Kafka Streams KV stores
[ https://issues.apache.org/jira/browse/KAFKA-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover closed KAFKA-4063. --- The JIRA UI was unresponsive so I accidentally submitted the form twice. > Add support for infinite endpoints for range queries in Kafka Streams KV > stores > --- > > Key: KAFKA-4063 > URL: https://issues.apache.org/jira/browse/KAFKA-4063 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 0.10.1.0 >Reporter: Roger Hoover >Assignee: Roger Hoover >Priority: Minor > Fix For: 0.10.1.0 > > > In some applications, it's useful to iterate over the key-value store either: > 1. from the beginning up to a certain key > 2. from a certain key to the end > We can add two new methods rangeUtil() and rangeFrom() easily to support this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4063) Add support for infinite endpoints for range queries in Kafka Streams KV stores
Roger Hoover created KAFKA-4063: --- Summary: Add support for infinite endpoints for range queries in Kafka Streams KV stores Key: KAFKA-4063 URL: https://issues.apache.org/jira/browse/KAFKA-4063 Project: Kafka Issue Type: Improvement Components: streams Affects Versions: 0.10.1.0 Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Fix For: 0.10.1.0 In some applications, it's useful to iterate over the key-value store either: 1. from the beginning up to a certain key 2. from a certain key to the end We can add two new methods rangeUtil() and rangeFrom() easily to support this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
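[Editor's note] The intended semantics of the two open-ended range queries can be sketched over a sorted key list with `bisect`. Function names here are illustrative (the issue spells them rangeUtil() and rangeFrom()), and this is not the Streams store API:

```python
import bisect

# Illustrative semantics for open-ended range queries over a sorted
# key-value store: one end of the range is unbounded.

def range_from(keys, start):
    """All keys >= start, through the end of the store."""
    return keys[bisect.bisect_left(keys, start):]

def range_until(keys, end):
    """All keys from the beginning of the store up to and including end."""
    return keys[:bisect.bisect_right(keys, end)]

keys = ["a", "c", "e", "g"]
assert range_from(keys, "c") == ["c", "e", "g"]
assert range_until(keys, "e") == ["a", "c", "e"]
```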
[jira] [Created] (KAFKA-4064) Add support for infinite endpoints for range queries in Kafka Streams KV stores
Roger Hoover created KAFKA-4064: --- Summary: Add support for infinite endpoints for range queries in Kafka Streams KV stores Key: KAFKA-4064 URL: https://issues.apache.org/jira/browse/KAFKA-4064 Project: Kafka Issue Type: Improvement Components: streams Affects Versions: 0.10.1.0 Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Fix For: 0.10.1.0 In some applications, it's useful to iterate over the key-value store either: 1. from the beginning up to a certain key 2. from a certain key to the end We can add two new methods rangeUtil() and rangeFrom() easily to support this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3993) Console producer drops data
[ https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415811#comment-15415811 ] Roger Hoover commented on KAFKA-3993: - Thanks, [~vahid]. Yeah, it looks the same. > Console producer drops data > --- > > Key: KAFKA-3993 > URL: https://issues.apache.org/jira/browse/KAFKA-3993 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover > > The console producer drops data when if the process exits too quickly. I > suspect that the shutdown hook does not call close() or something goes wrong > during that close(). > Here's a simple to illustrate the issue: > {noformat} > export BOOTSTRAP_SERVERS=localhost:9092 > export TOPIC=bar > export MESSAGES=1 > ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 > --replication-factor 1 --topic "$TOPIC" \ > && echo "acks=all" > /tmp/producer.config \ > && echo "linger.ms=0" >> /tmp/producer.config \ > && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list > "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \ > && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" > --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC" > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3993) Console producer drops data
[ https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated KAFKA-3993: Description: The console producer drops data when if the process exits too quickly. I suspect that the shutdown hook does not call close() or something goes wrong during that close(). Here's a simple to illustrate the issue: {noformat} export BOOTSTRAP_SERVERS=localhost:9092 export TOPIC=bar export MESSAGES=1 ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 --replication-factor 1 --topic "$TOPIC" \ && echo "acks=all" > /tmp/producer.config \ && echo "linger.ms=0" >> /tmp/producer.config \ && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \ && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC" {noformat} was: The console producer drops data when if the process exits too quickly. I suspect that the shutdown hook does not call close() or something goes wrong during that close(). 
Here's a simple to illustrate the issue: {noformat} export BOOTSTRAP_SERVERS=localhost:9092 export TOPIC=bar export MESSAGES=1 ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 --replication-factor 1 --topic "$TOPIC" \ && echo "acks=all" > /tmp/producer.config \ && echo "linger.ms=0" >> /tmp/producer.config \ && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list "$BOOTSTRAP_SERVERS" --topic "$TOPIC" \ && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC" {noformat} > Console producer drops data > --- > > Key: KAFKA-3993 > URL: https://issues.apache.org/jira/browse/KAFKA-3993 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover > > The console producer drops data when if the process exits too quickly. I > suspect that the shutdown hook does not call close() or something goes wrong > during that close(). > Here's a simple to illustrate the issue: > {noformat} > export BOOTSTRAP_SERVERS=localhost:9092 > export TOPIC=bar > export MESSAGES=1 > ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 > --replication-factor 1 --topic "$TOPIC" \ > && echo "acks=all" > /tmp/producer.config \ > && echo "linger.ms=0" >> /tmp/producer.config \ > && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list > "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \ > && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" > --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC" > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown
[ https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404439#comment-15404439 ] Roger Hoover commented on KAFKA-3752: - [~guozhang] Your assessment seems correct. It happened again when I restarted after a clean shutdown (SIGTERM + wait for exit). 1. We have a single KafkaStreams instance with 8 threads. 2. Here's the full log: https://gist.github.com/theduderog/f9ab4767cd3b098d404f5513a7e1c27e > Provide a way for KStreams to recover from unclean shutdown > --- > > Key: KAFKA-3752 > URL: https://issues.apache.org/jira/browse/KAFKA-3752 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover > Labels: architecture > > If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM > Killer), it may leave behind lock files and fail to recover. > It would be useful to have an option (say --force) to tell KStreams to > proceed even if it finds old LOCK files. 
> {noformat} > [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in > thread [StreamThread-1]: > (org.apache.kafka.streams.processor.internals.StreamThread:583) > org.apache.kafka.streams.errors.ProcessorStateException: Error while creating > the state manager > at > org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71) > at > org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86) > at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550) > at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577) > at > org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68) > at > org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422) > at > 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977) > at >
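[Editor's note] The "--force" idea requested above can be sketched as a stale-lock cleanup step run before opening the state store. This is a hedged sketch only: the path layout and function name are hypothetical, not Streams' actual on-disk format, and the caller must guarantee no other instance is running:

```python
import os
import tempfile

# Hypothetical "--force" recovery: if a stale LOCK file was left behind
# by a SIGKILL'd process, remove it before opening the state directory.

def clear_stale_lock(state_dir, force=False):
    lock = os.path.join(state_dir, "LOCK")
    if os.path.exists(lock):
        if not force:
            raise RuntimeError("state dir is locked: %s" % lock)
        os.remove(lock)  # caller asserts no other instance holds it

d = tempfile.mkdtemp()
open(os.path.join(d, "LOCK"), "w").close()   # simulate leftover lock
clear_stale_lock(d, force=True)
assert not os.path.exists(os.path.join(d, "LOCK"))
```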
[jira] [Created] (KAFKA-3993) Console producer drops data
Roger Hoover created KAFKA-3993: --- Summary: Console producer drops data Key: KAFKA-3993 URL: https://issues.apache.org/jira/browse/KAFKA-3993 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 0.10.0.0 Reporter: Roger Hoover The console producer drops data if the process exits too quickly. I suspect that the shutdown hook does not call close(), or that something goes wrong during that close(). Here's a simple example to illustrate the issue: {noformat}
export BOOTSTRAP_SERVERS=localhost:9092
export TOPIC=bar
export MESSAGES=1
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 --replication-factor 1 --topic "$TOPIC" \
  && echo "acks=all" > /tmp/producer.config \
  && echo "linger.ms=0" >> /tmp/producer.config \
  && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list "$BOOTSTRAP_SERVERS" --topic "$TOPIC" \
  && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
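The suspected failure mode (a process exiting before close() drains buffered records) can be sketched generically. The BufferingSender class below is a hypothetical stand-in for a lingering producer, not the console producer's actual code; it only illustrates why a shutdown hook must flush before the JVM exits:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch of the flush-on-exit pattern. BufferingSender is a
// hypothetical stand-in for a producer with linger/buffering; it is NOT
// the console producer's real implementation.
public class FlushOnExit {
    static class BufferingSender {
        final List<String> buffer = new ArrayList<>();
        final List<String> delivered = new ArrayList<>();
        final AtomicBoolean closed = new AtomicBoolean(false);

        void send(String record) {
            if (closed.get()) throw new IllegalStateException("sender closed");
            buffer.add(record); // linger: records sit here until flushed
        }

        void flush() {
            delivered.addAll(buffer); // simulate draining in-flight records
            buffer.clear();
        }

        void close() {
            flush(); // close() must drain the buffer, or records are lost
            closed.set(true);
        }
    }

    public static void main(String[] args) {
        BufferingSender sender = new BufferingSender();
        // Register the hook before sending, so even a fast exit flushes.
        Runtime.getRuntime().addShutdownHook(new Thread(sender::close));
        sender.send("message-1");
        // Without the hook calling close()/flush(), "message-1" would still
        // be sitting in the buffer when the process exits: the data loss
        // described in this issue.
    }
}
```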
[jira] [Commented] (KAFKA-3740) Add configs for RocksDBStores
[ https://issues.apache.org/jira/browse/KAFKA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334812#comment-15334812 ] Roger Hoover commented on KAFKA-3740: - [~h...@pinterest.com], any update on this? I'm wondering if you have any idea when it might be available. > Add configs for RocksDBStores > - > > Key: KAFKA-3740 > URL: https://issues.apache.org/jira/browse/KAFKA-3740 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Guozhang Wang >Assignee: Henry Cai > Labels: api, newbie > > Today most of the rocksDB configs are hard-coded inside {{RocksDBStore}}, > or the default values are used directly. We need to make them configurable > for advanced users. For example, some default values may not work perfectly > for some scenarios: > https://github.com/HenryCaiHaiying/kafka/commit/ccc4e25b110cd33eea47b40a2f6bf17ba0924576 > > One way of doing that is to introduce a "RocksDBStoreConfigs" object similar > to "StreamsConfig", which defines all related rocksDB options configs, that > can be passed as key-value pairs to "StreamsConfig". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
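The "RocksDBStoreConfigs" idea quoted above (passing RocksDB options as key-value pairs inside the main streams config) could look roughly like the sketch below. The `rocksdb.` prefix and the key names are hypothetical, not an actual Kafka Streams API:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the proposed mechanism: RocksDB options ride along in the
// flat streams config under a prefix, and the store extracts its own entries.
// Prefix and key names here are illustrative only.
public class RocksDBStoreConfigs {
    static final String PREFIX = "rocksdb.";

    // Pull out every "rocksdb."-prefixed entry, stripping the prefix so the
    // remainder can be handed to the RocksDB options object.
    static Map<String, String> extract(Map<String, String> streamsConfig) {
        Map<String, String> rocksConfig = new HashMap<>();
        for (Map.Entry<String, String> e : streamsConfig.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                rocksConfig.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return rocksConfig;
    }
}
```

Keeping the options flat in the main config means no new config object is needed on the user side; the trade-off is that unknown prefixed keys cannot be validated until the store opens.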
[jira] [Created] (KAFKA-3858) Add functions to print stream topologies
Roger Hoover created KAFKA-3858: --- Summary: Add functions to print stream topologies Key: KAFKA-3858 URL: https://issues.apache.org/jira/browse/KAFKA-3858 Project: Kafka Issue Type: New Feature Components: streams Affects Versions: 0.10.0.0 Reporter: Roger Hoover Assignee: Guozhang Wang For debugging and development, it would be very useful to be able to print Kafka Streams topologies. At a minimum, it would be great to be able to see the logical topology, including the Kafka topics linked by sub-topologies. I think that this information does not depend on partitioning. For more detail, it would be great to be able to print the same logical topology but also showing the number of tasks (and perhaps task ids?). Finally, it would be great to show the physical topology after the tasks have been mapped to JVMs + threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (KAFKA-3761) Controller has RunningAsBroker instead of RunningAsController state
[ https://issues.apache.org/jira/browse/KAFKA-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-3761 started by Roger Hoover. --- > Controller has RunningAsBroker instead of RunningAsController state > --- > > Key: KAFKA-3761 > URL: https://issues.apache.org/jira/browse/KAFKA-3761 > Project: Kafka > Issue Type: Bug >Reporter: Ismael Juma >Assignee: Roger Hoover > > In `KafkaServer.start`, we start `KafkaController`: > {code} > /* start kafka controller */ > kafkaController = new KafkaController(config, zkUtils, brokerState, > kafkaMetricsTime, metrics, threadNamePrefix) > kafkaController.startup() > {code} > Which sets the state to `RunningAsController` in > `KafkaController.onControllerFailover`: > `brokerState.newState(RunningAsController)` > And this later gets set to `RunningAsBroker`. > This doesn't match the diagram in `BrokerStates`. [~junrao] suggested that we > should start the controller after we register the broker in ZK, but this > seems tricky as we need the controller in `KafkaApis`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (KAFKA-3760) Set broker state as running after publishing to ZooKeeper
[ https://issues.apache.org/jira/browse/KAFKA-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated KAFKA-3760: Comment: was deleted (was: Create PR: https://github.com/apache/kafka/pull/1436) > Set broker state as running after publishing to ZooKeeper > - > > Key: KAFKA-3760 > URL: https://issues.apache.org/jira/browse/KAFKA-3760 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover >Assignee: Jun Rao >Priority: Minor > > Currently, the broker state is set to running before it registers itself in > ZooKeeper. This is too early in the broker lifecycle. If clients use the > broker state as an indicator that the broker is ready to accept requests, > they will get errors. This change is to delay setting the broker state to > running until it's registered in ZK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3760) Set broker state as running after publishing to ZooKeeper
[ https://issues.apache.org/jira/browse/KAFKA-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303141#comment-15303141 ] Roger Hoover commented on KAFKA-3760: - Create PR: https://github.com/apache/kafka/pull/1436 > Set broker state as running after publishing to ZooKeeper > - > > Key: KAFKA-3760 > URL: https://issues.apache.org/jira/browse/KAFKA-3760 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover >Assignee: Jun Rao >Priority: Minor > > Currently, the broker state is set to running before it registers itself in > ZooKeeper. This is too early in the broker lifecycle. If clients use the > broker state as an indicator that the broker is ready to accept requests, > they will get errors. This change is to delay setting the broker state to > running until it's registered in ZK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3760) Set broker state as running after publishing to ZooKeeper
[ https://issues.apache.org/jira/browse/KAFKA-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated KAFKA-3760: Summary: Set broker state as running after publishing to ZooKeeper (was: Setting broker state as running after publishing to ZooKeeper) > Set broker state as running after publishing to ZooKeeper > - > > Key: KAFKA-3760 > URL: https://issues.apache.org/jira/browse/KAFKA-3760 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover >Assignee: Jun Rao >Priority: Minor > > Currently, the broker state is set to running before it registers itself in > ZooKeeper. This is too early in the broker lifecycle. If clients use the > broker state as an indicator that the broker is ready to accept requests, > they will get errors. This change is to delay setting the broker state to > running until it's registered in ZK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3760) Setting broker state as running after publishing to ZooKeeper
Roger Hoover created KAFKA-3760: --- Summary: Setting broker state as running after publishing to ZooKeeper Key: KAFKA-3760 URL: https://issues.apache.org/jira/browse/KAFKA-3760 Project: Kafka Issue Type: Improvement Components: core Affects Versions: 0.10.0.0 Reporter: Roger Hoover Assignee: Jun Rao Priority: Minor Currently, the broker state is set to running before it registers itself in ZooKeeper. This is too early in the broker lifecycle. If clients use the broker state as an indicator that the broker is ready to accept requests, they will get errors. This change is to delay setting the broker state to running until it's registered in ZK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown
Roger Hoover created KAFKA-3752: --- Summary: Provide a way for KStreams to recover from unclean shutdown Key: KAFKA-3752 URL: https://issues.apache.org/jira/browse/KAFKA-3752 Project: Kafka Issue Type: Improvement Components: streams Affects Versions: 0.10.0.0 Reporter: Roger Hoover Assignee: Guozhang Wang If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM Killer), it may leave behind lock files and fail to recover. It would be useful to have an option (say --force) to tell KStreams to proceed even if it finds old LOCK files. {noformat}
[2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in thread [StreamThread-1]: (org.apache.kafka.streams.processor.internals.StreamThread:583)
org.apache.kafka.streams.errors.ProcessorStateException: Error while creating the state manager
	at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:71)
	at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:86)
	at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550)
	at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577)
	at org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68)
	at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123)
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227)
	at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
	at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
	at org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182)
	at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
	at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
	at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
	at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
	at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243)
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345)
	at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977)
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
	at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295)
	at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)
Caused by: java.io.IOException: Failed to lock the state directory: /data/test/2/kafka-streams/test-2/0_0
	at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:95)
	at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
{noformat}
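A minimal sketch of the proposed --force behavior, using a plain LOCK file in the state directory. This is illustrative only (the hypothetical `lock` helper below is not how the actual StreamThread/ProcessorStateManager code works); it shows the idea of refusing to start on an existing lock unless the caller explicitly overrides:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of "--force" semantics for a state-directory LOCK file left behind
// by a SIGKILLed process. Hypothetical helper, not the Streams implementation.
public class StateDirLock {
    static boolean lock(Path stateDir, boolean force) {
        try {
            Files.createDirectories(stateDir);
            Path lockFile = stateDir.resolve("LOCK");
            if (Files.exists(lockFile)) {
                if (!force) {
                    return false; // refuse to start, as Streams does today
                }
                Files.delete(lockFile); // --force: assume the old owner is dead
            }
            Files.createFile(lockFile);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience wrapper so callers need not handle the checked exception.
    static Path newTempStateDir() {
        try {
            return Files.createTempDirectory("kstreams-state");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The risk --force trades away is that the old owner might still be alive; a real implementation would also want to check whether the owning process is actually gone before deleting the lock.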
[jira] [Created] (KAFKA-3741) KStream config for changelog min.in.sync.replicas
Roger Hoover created KAFKA-3741: --- Summary: KStream config for changelog min.in.sync.replicas Key: KAFKA-3741 URL: https://issues.apache.org/jira/browse/KAFKA-3741 Project: Kafka Issue Type: Improvement Components: streams Affects Versions: 0.10.0.0 Reporter: Roger Hoover Assignee: Guozhang Wang Kafka Streams currently allows you to specify a replication factor for changelog and repartition topics that it creates. It should also allow you to specify min.in.sync.replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
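One way the requested setting could be threaded through is as an extra entry in the topic-level config Streams applies when it creates changelog and repartition topics. The `min.insync.replicas` and `cleanup.policy` keys below are real broker topic configs, but the `internalTopicConfig` helper itself is a hypothetical sketch, not a Streams API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: build the topic-level config for an internal (changelog) topic,
// optionally forwarding min.insync.replicas. Hypothetical helper.
public class ChangelogTopicConfig {
    static Map<String, String> internalTopicConfig(int replicationFactor,
                                                   Integer minInSyncReplicas) {
        Map<String, String> topicConfig = new HashMap<>();
        topicConfig.put("replication.factor", Integer.toString(replicationFactor));
        topicConfig.put("cleanup.policy", "compact"); // changelogs are compacted
        if (minInSyncReplicas != null) {
            // the broker-side topic config this setting would map onto
            topicConfig.put("min.insync.replicas", Integer.toString(minInSyncReplicas));
        }
        return topicConfig;
    }
}
```

With acks=all on the changelog producer, min.insync.replicas is what turns the replication factor into an actual durability guarantee, which is why exposing it alongside the replication factor matters.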
[jira] [Created] (KAFKA-3650) AWS test script fails to install vagrant
Roger Hoover created KAFKA-3650: --- Summary: AWS test script fails to install vagrant Key: KAFKA-3650 URL: https://issues.apache.org/jira/browse/KAFKA-3650 Project: Kafka Issue Type: Bug Affects Versions: 0.10.0.0 Reporter: Roger Hoover Priority: Trivial The URL for installing vagrant in the AWS init script is no longer valid (https://dl.bintray.com/mitchellh/vagrant/vagrant_1.7.2_x86_64.deb). We can use this instead: https://releases.hashicorp.com/vagrant/1.7.2/vagrant_1.7.2_x86_64.deb -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2105) NullPointerException in client on MetadataRequest
Roger Hoover created KAFKA-2105: --- Summary: NullPointerException in client on MetadataRequest Key: KAFKA-2105 URL: https://issues.apache.org/jira/browse/KAFKA-2105 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 0.8.2.1 Reporter: Roger Hoover Priority: Minor With the new producer, if you accidentally pass null to KafkaProducer.partitionsFor(null), it will cause the I/O thread to throw an NPE. {noformat}
Uncaught error in kafka producer I/O thread: java.lang.NullPointerException
	at org.apache.kafka.common.utils.Utils.utf8Length(Utils.java:174)
	at org.apache.kafka.common.protocol.types.Type$5.sizeOf(Type.java:176)
	at org.apache.kafka.common.protocol.types.ArrayOf.sizeOf(ArrayOf.java:55)
	at org.apache.kafka.common.protocol.types.Schema.sizeOf(Schema.java:81)
	at org.apache.kafka.common.protocol.types.Struct.sizeOf(Struct.java:218)
	at org.apache.kafka.common.requests.RequestSend.serialize(RequestSend.java:35)
	at org.apache.kafka.common.requests.RequestSend.<init>(RequestSend.java:29)
	at org.apache.kafka.clients.NetworkClient.metadataRequest(NetworkClient.java:369)
	at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:391)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:188)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
	at java.lang.Thread.run(Thread.java:745)
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
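A fail-fast argument check in the calling thread would surface this as a clear IllegalArgumentException at the call site instead of an NPE that kills the producer I/O thread. A minimal sketch of that fix direction (not the actual client code):

```java
// Sketch: validate the topic argument in the caller's thread, before the
// request is ever handed to the background I/O thread for serialization.
public class TopicValidation {
    static String validateTopic(String topic) {
        if (topic == null) {
            throw new IllegalArgumentException("topic cannot be null");
        }
        if (topic.isEmpty()) {
            throw new IllegalArgumentException("topic cannot be empty");
        }
        return topic;
    }
}
```

The key point is where the exception is thrown: an error raised in the caller's thread is catchable by the caller, whereas an error inside the background I/O thread is only reported as "Uncaught error", as in the log above.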
[jira] [Commented] (KAFKA-1369) snappy version update 1.1.x
[ https://issues.apache.org/jira/browse/KAFKA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132229#comment-14132229 ] Roger Hoover commented on KAFKA-1369: - I also ran into this issue on 64-bit CentOS 5. The snappy-java maintainer said that 1.1.1.3 is API compatible with 1.0.5.3 and has libstdc++ statically compiled into the object file so that it doesn't rely on the OS to have a new enough version. https://github.com/xerial/snappy-java/issues/17 snappy version update 1.1.x --- Key: KAFKA-1369 URL: https://issues.apache.org/jira/browse/KAFKA-1369 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.0, 0.8.1.1 Environment: Red Hat Enterprise Linux Server release 5.8 (Tikanga) - x64 Reporter: thinker0 Priority: Minor Attachments: patch.diff https://github.com/xerial/snappy-java/issues/38 issue snappy version 1.1.x {code}
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
	at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:239)
	at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
	at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:351)
	at org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:159)
	at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:142)
	at java.io.InputStream.read(InputStream.java:101)
	at kafka.message.ByteBufferMessageSet$$anonfun$decompress$1.apply$mcI$sp(ByteBufferMessageSet.scala:68)
	at kafka.message.ByteBufferMessageSet$$anonfun$decompress$1.apply(ByteBufferMessageSet.scala:68)
	at kafka.message.ByteBufferMessageSet$$anonfun$decompress$1.apply(ByteBufferMessageSet.scala:68)
	at scala.collection.immutable.Stream$.continually(Stream.scala:1129)
	at kafka.message.ByteBufferMessageSet$.decompress(ByteBufferMessageSet.scala:68)
	at kafka.message.ByteBufferMessageSet$$anon$1.makeNextOuter(ByteBufferMessageSet.scala:178)
	at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:191)
	at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:145)
	at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
	at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
{code}
{code}
/tmp] ldd snappy-1.0.5-libsnappyjava.so
./snappy-1.0.5-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by ./snappy-1.0.5-libsnappyjava.so)
./snappy-1.0.5-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by ./snappy-1.0.5-libsnappyjava.so)
linux-vdso.so.1 => (0x7fff81dfc000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b554b43)
libm.so.6 => /lib64/libm.so.6 (0x2b554b731000)
libc.so.6 => /lib64/libc.so.6 (0x2b554b9b4000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b554bd0c000)
/lib64/ld-linux-x86-64.so.2 (0x0033e2a0)
{code}
{code}
/tmp] ldd snappy-1.1.1M1-be6ba593-9ac7-488e-953e-ba5fd9530ee1-libsnappyjava.so
ldd: warning: you do not have execution permission for `./snappy-1.1.1M1-be6ba593-9ac7-488e-953e-ba5fd9530ee1-libsnappyjava.so'
linux-vdso.so.1 => (0x7fff1c132000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b9548319000)
libm.so.6 => /lib64/libm.so.6 (0x2b954861a000)
libc.so.6 => /lib64/libc.so.6 (0x2b954889d000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b9548bf5000)
/lib64/ld-linux-x86-64.so.2 (0x0033e2a0)
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-260) Add audit trail to kafka
[ https://issues.apache.org/jira/browse/KAFKA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911909#comment-13911909 ] Roger Hoover edited comment on KAFKA-260 at 6/23/14 7:12 PM: - It also makes false negatives possible since if you lose both normal messages and the associated audit messages it will appear that everything adds up. The latter problem is astronomically unlikely to happen exactly, though. This may be true once messages have reached a broker. However, if a producer process were to be killed (say by SIGKILL), both its unack'ed normal messages and the audit data would be lost. Would it make sense to persist the audit counts to the file system for producers so that they could potentially be recovered? was (Author: theduderog): It also makes false negatives possible since if you lose both normal messages and the associated audit messages it will appear that everything adds up. The latter problem is astronomically unlikely to happen exactly, though. This may be true once messages have reached a broker. However, if a producer process were to be killed (say by SIGKILL), both its committed messages and the audit data would be lost. Would it make sense to persist the audit counts to the file system for producers so that they could potentially be recovered? Add audit trail to kafka Key: KAFKA-260 URL: https://issues.apache.org/jira/browse/KAFKA-260 Project: Kafka Issue Type: New Feature Affects Versions: 0.8.0 Reporter: Jay Kreps Assignee: Jay Kreps Attachments: Picture 18.png, kafka-audit-trail-draft.patch LinkedIn has a system that does monitoring on top of our data flow to ensure all data is delivered to all consumers of data. This works by having each logical tier through which data passes produce messages to a central audit-trail topic; these messages give a time period and the number of messages that passed through that tier in that time period. Example tiers for data might be producer, broker, hadoop-etl, etc. This makes it possible to compare the total events for a given time period to ensure that all events that are produced are consumed by all consumers. This turns out to be extremely useful. We also have an application that balances the books and checks that all data is consumed in a timely fashion. This gives graphs for each topic and shows any data loss and the lag at which the data is consumed (if any). This would be an optional feature that would allow you to do this kind of reconciliation automatically for all the topics kafka hosts against all the tiers of applications that interact with the data. Some details, the proposed format of the data is JSON using the following format for messages:
{
  time: 1301727060032, // the timestamp at which this audit message is sent
  topic: my_topic_name, // the topic this audit data is for
  tier: producer, // a user-defined tier name
  bucket_start: 130172640, // the beginning of the time bucket this data applies to
  bucket_end: 130172700, // the end of the time bucket this data applies to
  host: my_host_name.datacenter.linkedin.com, // the server that this was sent from
  datacenter: hlx32, // the datacenter this occurred in
  application: newsfeed_service, // a user-defined application name
  guid: 51656274-a86a-4dff-b824-8e8e20a6348f, // a unique identifier for this message
  count: 43634
}
DISCUSSION Time is complex: 1. The audit data must be based on a timestamp in the events, not the time on the machine processing the event. Using this timestamp means that all downstream consumers will report audit data on the right time bucket. This means that there must be a timestamp in the event, which we don't currently require. Arguably we should just add a timestamp to the events, but I think it is sufficient for now just to allow the user to provide a function to extract the time from their events. 2. For counts to reconcile exactly we can only do analysis at a granularity based on the least common multiple of the bucket size used by all tiers. The simplest is just to configure them all to use the same bucket size. We currently use a bucket size of 10 mins, but anything from 1-60 mins is probably reasonable. For analysis purposes one tier is designated as the source tier and we do reconciliation against this count (e.g. if another tier has less, that is treated as lost, if another tier has more that is duplication). Note that this system makes false positives possible since you can lose an audit message. It also makes false negatives possible since if you lose both normal messages and the associated audit messages it will appear that everything adds up. The latter problem is astronomically unlikely to happen
[jira] [Commented] (KAFKA-260) Add audit trail to kafka
[ https://issues.apache.org/jira/browse/KAFKA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911909#comment-13911909 ] Roger Hoover commented on KAFKA-260: It also makes false negatives possible since if you lose both normal messages and the associated audit messages it will appear that everything adds up. The latter problem is astronomically unlikely to happen exactly, though. This may be true once messages have reached a broker. However, if a producer process were to be killed (say by SIGKILL), both its committed messages and the audit data would be lost. Would it make sense to persist the audit counts to the file system for producers so that they could potentially be recovered? Add audit trail to kafka Key: KAFKA-260 URL: https://issues.apache.org/jira/browse/KAFKA-260 Project: Kafka Issue Type: New Feature Affects Versions: 0.8.0 Reporter: Jay Kreps Assignee: Jay Kreps Attachments: Picture 18.png, kafka-audit-trail-draft.patch LinkedIn has a system that does monitoring on top of our data flow to ensure all data is delivered to all consumers of data. This works by having each logical tier through which data passes produce messages to a central audit-trail topic; these messages give a time period and the number of messages that passed through that tier in that time period. Example tiers for data might be producer, broker, hadoop-etl, etc. This makes it possible to compare the total events for a given time period to ensure that all events that are produced are consumed by all consumers. This turns out to be extremely useful. We also have an application that balances the books and checks that all data is consumed in a timely fashion. This gives graphs for each topic and shows any data loss and the lag at which the data is consumed (if any). This would be an optional feature that would allow you to do this kind of reconciliation automatically for all the topics kafka hosts against all the tiers of applications that interact with the data. Some details, the proposed format of the data is JSON using the following format for messages:
{
  time: 1301727060032, // the timestamp at which this audit message is sent
  topic: my_topic_name, // the topic this audit data is for
  tier: producer, // a user-defined tier name
  bucket_start: 130172640, // the beginning of the time bucket this data applies to
  bucket_end: 130172700, // the end of the time bucket this data applies to
  host: my_host_name.datacenter.linkedin.com, // the server that this was sent from
  datacenter: hlx32, // the datacenter this occurred in
  application: newsfeed_service, // a user-defined application name
  guid: 51656274-a86a-4dff-b824-8e8e20a6348f, // a unique identifier for this message
  count: 43634
}
DISCUSSION Time is complex: 1. The audit data must be based on a timestamp in the events, not the time on the machine processing the event. Using this timestamp means that all downstream consumers will report audit data on the right time bucket. This means that there must be a timestamp in the event, which we don't currently require. Arguably we should just add a timestamp to the events, but I think it is sufficient for now just to allow the user to provide a function to extract the time from their events. 2. For counts to reconcile exactly we can only do analysis at a granularity based on the least common multiple of the bucket size used by all tiers. The simplest is just to configure them all to use the same bucket size. We currently use a bucket size of 10 mins, but anything from 1-60 mins is probably reasonable. For analysis purposes one tier is designated as the source tier and we do reconciliation against this count (e.g. if another tier has less, that is treated as lost, if another tier has more that is duplication). Note that this system makes false positives possible since you can lose an audit message. It also makes false negatives possible since if you lose both normal messages and the associated audit messages it will appear that everything adds up. The latter problem is astronomically unlikely to happen exactly, though. This would integrate into the client (producer and consumer both) in the following way: 1. The user provides a way to get timestamps from messages (required) 2. The user configures the tier name, host name, datacenter name, and application name as part of the consumer and producer config. We can provide reasonable defaults if not supplied (e.g. if it is a Producer then set tier to producer and get the hostname from the OS). The application that processes this data is currently a Java Jetty app and talks to mysql. It feeds off the audit topic in kafka and runs both automatic monitoring
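The reconciliation rule described above (designate a source tier; a lower count means loss, a higher count means duplication) reduces to a per-bucket count comparison. A minimal sketch, with hypothetical tier names:

```java
import java.util.Map;

// Sketch of per-bucket audit reconciliation against a designated source tier.
// Tier names and the delta() helper are illustrative, not LinkedIn's system.
public class AuditReconciler {
    // negative = messages lost relative to the source tier,
    // positive = duplicates, 0 = the books balance for this bucket
    static long delta(Map<String, Long> countsByTier, String sourceTier, String tier) {
        long source = countsByTier.getOrDefault(sourceTier, 0L);
        long observed = countsByTier.getOrDefault(tier, 0L);
        return observed - source;
    }
}
```

Note the caveat from the issue text carries over: if both the normal messages and the audit message for a bucket are lost, the delta reads zero, which is exactly the false-negative case discussed above.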
[jira] [Updated] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname
[ https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated KAFKA-1092: Status: Open (was: Patch Available) Add server config parameter to separate bind address and ZK hostname Key: KAFKA-1092 URL: https://issues.apache.org/jira/browse/KAFKA-1092 Project: Kafka Issue Type: New Feature Components: config Affects Versions: 0.8.1 Reporter: Roger Hoover Attachments: KAFKA-1092.patch, KAFKA-1092.patch Currently, in server.properties, you can configure host.name, which gets used for two purposes: 1) to bind the socket and 2) to publish the broker details to ZK for clients to use. There are times when these two settings need to be different. Here's an example. I want to set up Kafka brokers on OpenStack virtual machines in a private cloud, but I need producers to connect from elsewhere on the internal corporate network. With OpenStack, the virtual machines are only exposed to DHCP addresses (typically RFC 1918 private addresses). You can assign floating IPs to a virtual machine, but traffic is forwarded using Network Address Translation and not exposed directly to the VM. Also, there's typically no DNS to provide hostname lookup. Hosts have names like fubar.novalocal that are not externally routable. Here's what I want: I want the broker to bind to the VM's private network IP, but I want it to publish its floating IP to ZooKeeper so that producers can publish to it. I propose a new optional parameter, listen, which would allow you to specify the socket address to listen on. If not set, the parameter would default to host.name, which is the current behavior.
#Publish the externally routable IP in ZK
host.name = floating ip
#Accept connections from any interface the VM knows about
listen = *
-- This message was sent by Atlassian JIRA (v6.1#6144)
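The defaulting rule this issue proposes (later in the thread the parameter names become advertised.host.name and advertised.port) amounts to: publish the advertised value if one is configured, otherwise fall back to the bind address. A minimal sketch of that fallback logic; the helper itself is hypothetical, not broker code:

```java
import java.util.Properties;

// Sketch of the fallback: advertise advertised.host.name when set,
// otherwise advertise the bind address (host.name). Hypothetical helper.
public class AdvertisedHost {
    static String advertisedHost(Properties serverProps) {
        String advertised = serverProps.getProperty("advertised.host.name");
        return advertised != null ? advertised : serverProps.getProperty("host.name");
    }
}
```

This split is what lets the broker bind to the private (NATed) interface while clients discover the externally routable address via ZooKeeper.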
[jira] [Updated] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname
[ https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated KAFKA-1092: Status: Patch Available (was: Open) Add server config parameter to separate bind address and ZK hostname Key: KAFKA-1092 URL: https://issues.apache.org/jira/browse/KAFKA-1092 Project: Kafka Issue Type: New Feature Components: config Affects Versions: 0.8.1 Reporter: Roger Hoover Attachments: KAFKA-1092.patch, KAFKA-1092.patch Currently, in server.properties, you can configure host.name, which gets used for two purposes: 1) to bind the socket and 2) to publish the broker details to ZK for clients to use. There are times when these two settings need to be different. Here's an example. I want to set up Kafka brokers on OpenStack virtual machines in a private cloud, but I need producers to connect from elsewhere on the internal corporate network. With OpenStack, the virtual machines are only exposed to DHCP addresses (typically RFC 1918 private addresses). You can assign floating IPs to a virtual machine, but traffic is forwarded using Network Address Translation and not exposed directly to the VM. Also, there's typically no DNS to provide hostname lookup. Hosts have names like fubar.novalocal that are not externally routable. Here's what I want: I want the broker to bind to the VM's private network IP, but I want it to publish its floating IP to ZooKeeper so that producers can publish to it. I propose a new optional parameter, listen, which would allow you to specify the socket address to listen on. If not set, the parameter would default to host.name, which is the current behavior.
#Publish the externally routable IP in ZK
host.name = floating ip
#Accept connections from any interface the VM knows about
listen = *
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname
[ https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated KAFKA-1092: Attachment: KAFKA-1092.patch Updated parameter names to advertised.host.name and advertised.port and fixed broken ProducerTest
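With the renamed parameters from this patch revision, a broker behind NAT might be configured roughly as follows. This is a sketch with hypothetical addresses, not values taken from the patch:

```properties
# Bind the socket to the VM's private (RFC 1918) address
host.name=10.0.0.5

# Publish the externally routable floating IP and port to ZooKeeper
# so clients outside the NAT boundary can reach the broker
advertised.host.name=192.168.100.7
advertised.port=9092
```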
[jira] [Commented] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname
[ https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809668#comment-13809668 ] Roger Hoover commented on KAFKA-1092: - The reason kafka.producer.ProducerTest was broken was that the test created an anonymous subclass of KafkaConfig and overrode a couple of members. I don't know Scala that well, but I discovered that vals that depend on other vals behave unexpectedly when overridden. To fix this, I just set properties in the test and do not subclass KafkaConfig, as users would do.
{code}
class A() {
  val x = 1
  val y = x
}
class B() extends A {
  override val x = 2
}
val b = new B()
b.y // this is 0, not 1 or 2
{code}
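The initialization-order quirk described in the comment above can be seen in a self-contained form. This is a minimal sketch (the object and method names are mine, not from the patch): while A's constructor runs, the read of x dispatches to B's overriding accessor, whose backing field has not been assigned yet, so y captures the JVM default value 0.

```scala
class A {
  val x: Int = 1
  val y: Int = x // evaluated while A's constructor runs
}

class B extends A {
  override val x: Int = 2 // B's backing field is assigned only after A's constructor finishes
}

object InitOrderDemo {
  def main(args: Array[String]): Unit = {
    val b = new B
    println(b.x) // 2
    println(b.y) // 0: the overridden x still held its default (0) when y was initialized
  }
}
```

A common workaround is to declare x as a `def` or `lazy val` in A so its value is not captured during construction, which is consistent with the fix here of configuring the test via properties instead of overriding vals.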
[jira] [Comment Edited] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname
[ https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809668#comment-13809668 ] Roger Hoover edited comment on KAFKA-1092 at 10/30/13 9:52 PM: --- The reason kafka.producer.ProducerTest was broken was that the test created an anonymous subclass of KafkaConfig and overrode a couple of members. I don't know Scala that well, but I discovered that vals that depend on other vals behave unexpectedly when overridden. To fix this, I just set properties in the test and do not subclass KafkaConfig, as users would do.
{code}
class A() {
  val x = 1
  val y = x
}
class B() extends A {
  override val x = 2
}
val b = new B()
b.y // this is 0, not 1 or 2
{code}
[jira] [Updated] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname
[ https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated KAFKA-1092: Status: Patch Available (was: Open) Added two new configuration parameters:
# Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for host.name if configured. Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
#advertise.host.name=hostname routable by clients
# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertise.port=port accessible by clients
[jira] [Updated] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname
[ https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated KAFKA-1092: Attachment: KAFKA-1092.patch
[jira] [Comment Edited] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname
[ https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807251#comment-13807251 ] Roger Hoover edited comment on KAFKA-1092 at 10/28/13 9:31 PM: --- Added two new configuration parameters:
advertise.host.name - Hostname the broker will advertise to producers and consumers. If not set, it uses the value for host.name if configured. Otherwise, it will use the value returned from java.net.InetAddress.getCanonicalHostName().
advertise.port - The port to publish to ZooKeeper for clients to use. If this is not set, it will publish the same port that the broker binds to.
[jira] [Commented] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname
[ https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807258#comment-13807258 ] Roger Hoover commented on KAFKA-1092: - Note: this change is backward compatible. Not specifying the two new optional parameters will yield the same behavior as before.
[jira] [Created] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname
Roger Hoover created KAFKA-1092: --- Summary: Add server config parameter to separate bind address and ZK hostname Key: KAFKA-1092 URL: https://issues.apache.org/jira/browse/KAFKA-1092 Project: Kafka Issue Type: New Feature Components: config Affects Versions: 0.8.1 Reporter: Roger Hoover
[jira] [Commented] (KAFKA-982) Logo for Kafka
[ https://issues.apache.org/jira/browse/KAFKA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13715691#comment-13715691 ] Roger Hoover commented on KAFKA-982: +1 for 298 Logo for Kafka -- Key: KAFKA-982 URL: https://issues.apache.org/jira/browse/KAFKA-982 Project: Kafka Issue Type: Improvement Reporter: Jay Kreps Attachments: 289.jpeg, 294.jpeg, 296.png, 298.jpeg We should have a logo for kafka. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira