[jira] [Commented] (KAFKA-3795) Transient system test failure upgrade_test.TestUpgrade

2017-04-03 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953759#comment-15953759
 ] 

Roger Hoover commented on KAFKA-3795:
-

Happened again: 
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2017-04-03--001.1491220440--apache--trunk--bdf4cba/

{noformat}
test_id:
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.9.0.1.to_message_format_version=None.security_protocol=SASL_SSL.compression_types=.none
status: FAIL
run time:   4 minutes 4.673 seconds


199680 acked message did not make it to the Consumer. They are: 538129, 
538132, 538135, 538138, 538140, 538141, 538143, 538144, 538146, 538147, 538149, 
538150, 538152, 538153, 538155, 538156, 538158, 538159, 538161, 538162...plus 
199660 more. Total Acked: 331954, Total Consumed: 138002. The first 1000 
missing messages were validated to ensure they are in Kafka's data files. 1000 
were missing. This suggests data loss. Here are some of the messages not found 
in the data files: [538624, 538625, 538626, 538627, 538628, 538629, 538630, 
538631, 538632, 538633]

Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/upgrade_test.py",
 line 125, in test_upgrade
self.run_produce_consume_validate(core_test_action=lambda: 
self.perform_upgrade(from_kafka_version,
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 118, in run_produce_consume_validate
self.validate()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 188, in validate
assert success, msg
AssertionError: 199680 acked message did not make it to the Consumer. They are: 
538129, 538132, 538135, 538138, 538140, 538141, 538143, 538144, 538146, 538147, 
538149, 538150, 538152, 538153, 538155, 538156, 538158, 538159, 538161, 
538162...plus 199660 more. Total Acked: 331954, Total Consumed: 138002. The 
first 1000 missing messages were validated to ensure they are in Kafka's data 
files. 1000 were missing. This suggests data loss. Here are some of the 
messages not found in the data files: [538624, 538625, 538626, 538627, 538628, 
538629, 538630, 538631, 538632, 538633]
{noformat}
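The validation step behind this failure compares the set of producer-acked message ids against the set of consumed ids; anything acked but never consumed is reported as missing and, if also absent from the broker's data files, counted as data loss. A minimal sketch of that check (simplified; not the actual kafkatest validator):

```python
# Sketch of the acked-vs-consumed check that produced the failure above.
# Every id the producer received an ack for must eventually be consumed;
# ids that were acked but never consumed are the "missing" messages.

def find_missing(acked, consumed):
    """Return sorted ids acked by the producer but never seen by the consumer."""
    return sorted(set(acked) - set(consumed))

acked = [538129, 538130, 538131, 538132]
consumed = [538130, 538131]
print(find_missing(acked, consumed))  # -> [538129, 538132]
```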

> Transient system test failure upgrade_test.TestUpgrade
> --
>
> Key: KAFKA-3795
> URL: https://issues.apache.org/jira/browse/KAFKA-3795
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Jason Gustafson
>  Labels: reliability
>
> From a recent build running on the 0.10.0 branch:
> {code}
> test_id:
> 2016-06-06--001.kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.9.0.1.to_message_format_version=0.9.0.1.compression_types=.snappy.new_consumer=True
> status: FAIL
> run time:   3 minutes 29.166 seconds
> 3522 acked message did not make it to the Consumer. They are: 476524, 
> 476525, 476527, 476528, 476530, 476531, 476533, 476534, 476536, 476537, 
> 476539, 476540, 476542, 476543, 476545, 476546, 476548, 476549, 476551, 
> 476552, ...plus 3482 more. Total Acked: 110437, Total Consumed: 127470. The 
> first 1000 missing messages were validated to ensure they are in Kafka's data 
> files. 1000 were missing. This suggests data loss. Here are some of the 
> messages not found in the data files: [477184, 477185, 477187, 477188, 
> 477190, 477191, 477193, 477194, 477196, 477197]
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> 

[jira] [Updated] (KAFKA-4755) SimpleBenchmark consume test fails for streams

2017-03-28 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-4755:

Priority: Blocker  (was: Major)

> SimpleBenchmark consume test fails for streams
> --
>
> Key: KAFKA-4755
> URL: https://issues.apache.org/jira/browse/KAFKA-4755
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
>Assignee: Eno Thereska
>Priority: Blocker
> Fix For: 0.11.0.0
>
>
> This occurred Feb 10th 2017:
> kafkatest.benchmarks.streams.streams_simple_benchmark_test.StreamsSimpleBenchmarkTest.test_simple_benchmark.test=consume.scale=1
> status: FAIL
> run time:   7 minutes 36.712 seconds
> Streams Test process on ubuntu@worker2 took too long to exit
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/benchmarks/streams/streams_simple_benchmark_test.py",
>  line 86, in test_simple_benchmark
> self.driver[num].wait()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py",
>  line 102, in wait
> self.wait_node(node, timeout_sec)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py",
>  line 106, in wait_node
> wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec, 
> err_msg="Streams Test process on " + str(node.account) + " took too long to 
> exit")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Streams Test process on ubuntu@worker2 took too long to exit



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (KAFKA-4755) SimpleBenchmark consume test fails for streams

2017-03-28 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover reopened KAFKA-4755:
-

Happened again.  Logs here:  
http://confluent-kafka-0-10-2-system-test-results.s3-us-west-2.amazonaws.com/2017-03-28--001.1490697484--apache--0.10.2--1e4cab7/StreamsSimpleBenchmarkTest/test_simple_benchmark/212.tgz




Streams Test process on ubuntu@worker2 took too long to exit
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/benchmarks/streams/streams_simple_benchmark_test.py",
 line 48, in test_simple_benchmark
self.driver.wait()
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/services/streams.py",
 line 99, in wait
self.wait_node(node, timeout_sec)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/services/streams.py",
 line 103, in wait_node
wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec, 
err_msg="Streams Test process on " + str(node.account) + " took too long to 
exit")
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
 line 36, in wait_until
raise TimeoutError(err_msg)
TimeoutError: Streams Test process on ubuntu@worker2 took too long to exit
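The TimeoutError above is raised by ducktape's wait_until helper, which polls a condition until a deadline expires. A rough sketch of that polling loop (simplified names; not ducktape's actual implementation):

```python
import time

class TimeoutError(Exception):
    """Raised when the condition never became true before the deadline."""
    pass

def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Poll `condition` until it returns True or `timeout_sec` elapses."""
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError(err_msg)

# The streams service uses it roughly like this to wait for process exit:
# wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec,
#            err_msg="Streams Test process ... took too long to exit")
```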



> SimpleBenchmark consume test fails for streams
> --
>
> Key: KAFKA-4755
> URL: https://issues.apache.org/jira/browse/KAFKA-4755
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
>Assignee: Eno Thereska
> Fix For: 0.11.0.0
>
>
> This occurred Feb 10th 2017:
> kafkatest.benchmarks.streams.streams_simple_benchmark_test.StreamsSimpleBenchmarkTest.test_simple_benchmark.test=consume.scale=1
> status: FAIL
> run time:   7 minutes 36.712 seconds
> Streams Test process on ubuntu@worker2 took too long to exit
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/benchmarks/streams/streams_simple_benchmark_test.py",
>  line 86, in test_simple_benchmark
> self.driver[num].wait()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py",
>  line 102, in wait
> self.wait_node(node, timeout_sec)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py",
>  line 106, in wait_node
> wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec, 
> err_msg="Streams Test process on " + str(node.account) + " took too long to 
> exit")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Streams Test process on ubuntu@worker2 took too long to exit





[jira] [Commented] (KAFKA-4689) OffsetValidationTest fails validation with "Current position greater than the total number of consumed records"

2017-03-28 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945543#comment-15945543
 ] 

Roger Hoover commented on KAFKA-4689:
-

Happened again. Logs here 
http://confluent-kafka-0-10-2-system-test-results.s3-us-west-2.amazonaws.com/2017-03-28--001.1490697484--apache--0.10.2--1e4cab7/OffsetValidationTest/test_consumer_bounce/clean_shutdown%3DFalse.bounce_mode%3Drolling/49.tgz


test_id:
kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling
status: FAIL
run time:   3 minutes 32.756 seconds


Current position 79302 greater than the total number of consumed records 
79299
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/client/consumer_test.py",
 line 159, in test_consumer_bounce
(consumer.current_position(partition), consumer.total_consumed())
AssertionError: Current position 79302 greater than the total number of 
consumed records 79299
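The failing assertion checks an invariant: with an unclean bounce, redelivery can only add duplicates, so the consumer's committed position should never exceed the count of records actually consumed; a position beyond that count implies records were skipped. Sketched below (not the actual test code):

```python
# Invariant behind the assertion above: after a hard consumer bounce,
# duplicates are allowed, so total_consumed >= current_position must hold.
# A position greater than the consumed count means messages were skipped.

def position_invariant_holds(current_position, total_consumed):
    """True iff the consumer's position is consistent with what it consumed."""
    return current_position <= total_consumed

print(position_invariant_holds(79302, 79299))  # -> False (the failure above)
```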


> OffsetValidationTest fails validation with "Current position greater than the 
> total number of consumed records"
> ---
>
> Key: KAFKA-4689
> URL: https://issues.apache.org/jira/browse/KAFKA-4689
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Reporter: Ewen Cheslack-Postava
>Assignee: Jason Gustafson
>  Labels: system-test-failure
>
> {quote}
> 
> test_id:
> kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=all
> status: FAIL
> run time:   1 minute 49.834 seconds
> Current position greater than the total number of consumed records
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/client/consumer_test.py",
>  line 157, in test_consumer_bounce
> "Current position greater than the total number of consumed records"
> AssertionError: Current position greater than the total number of consumed 
> records
> {quote}
> See also 
> https://issues.apache.org/jira/browse/KAFKA-3513?focusedCommentId=15791790=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15791790
>  which is another instance of this bug, which indicates the issue goes back 
> at least as far as 1/17/2017. Note that I don't think we've seen this in 
> 0.10.1 yet.





[jira] [Closed] (KAFKA-4670) Kafka Consumer should validate FetchResponse

2017-01-19 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover closed KAFKA-4670.
---

> Kafka Consumer should validate FetchResponse
> 
>
> Key: KAFKA-4670
> URL: https://issues.apache.org/jira/browse/KAFKA-4670
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.2.0
>Reporter: Roger Hoover
>Assignee: Jason Gustafson
>Priority: Minor
>
> As a negative test case, I purposefully configured a bad advertised listener 
> endpoint.  
> {code}
> advertised.listeners=PLAINTEXT://www.google.com:80
> {code}
> This causes the Consumer to over-allocate and run out of memory.
> {quote}
> [2017-01-18 10:03:03,866] DEBUG Sending metadata request 
> (type=MetadataRequest, topics=foo) to node -1 
> (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,870] DEBUG Updated cluster metadata version 2 to 
> Cluster(id = oerqPfCuTCKYUUaWdFUSVQ, nodes = [www.google.com:80 (id: 0 rack: 
> null)], partitions = [Partition(topic = foo, partition = 0, leader = 0, 
> replicas = [0], isr = [0])]) (org.apache.kafka.clients.Metadata)
> [2017-01-18 10:03:03,871] DEBUG Received group coordinator response 
> ClientResponse(receivedTimeMs=1484762583870, latencyMs=88, 
> disconnected=false, 
> requestHeader={api_key=10,api_version=0,correlation_id=0,client_id=consumer-1},
>  
> responseBody={error_code=0,coordinator={node_id=0,host=www.google.com,port=80}})
>  (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,871] INFO Discovered coordinator www.google.com:80 (id: 
> 2147483647 rack: null) for group console-consumer-64535. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,871] DEBUG Initiating connection to node 2147483647 at 
> www.google.com:80. (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,915] INFO Revoking previously assigned partitions [] for 
> group console-consumer-64535 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-01-18 10:03:03,915] INFO (Re-)joining group console-consumer-64535 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,917] DEBUG Sending JoinGroup ((type: JoinGroupRequest, 
> groupId=console-consumer-64535, sessionTimeout=1, 
> rebalanceTimeout=30, memberId=, protocolType=consumer, 
> groupProtocols=org.apache.kafka.common.requests.JoinGroupRequest$ProtocolMetadata@564fabc8))
>  to coordinator www.google.com:80 (id: 2147483647 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,932] DEBUG Created socket with SO_RCVBUF = 66646, 
> SO_SNDBUF = 131874, SO_TIMEOUT = 0 to node 2147483647 
> (org.apache.kafka.common.network.Selector)
> [2017-01-18 10:03:03,932] DEBUG Completed connection to node 2147483647.  
> Fetching API versions. (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,932] DEBUG Initiating API versions fetch from node 
> 2147483647. (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,990] ERROR Unknown error when running consumer:  
> (kafka.tools.ConsoleConsumer$)
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>   at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:169)
>   at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:150)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:355)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:346)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:172)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:331)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:300)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:261)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1025)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:990)
>   at 

[jira] [Commented] (KAFKA-4670) Kafka Consumer should validate FetchResponse

2017-01-19 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830670#comment-15830670
 ] 

Roger Hoover commented on KAFKA-4670:
-

Ah, yes, thanks, [~ijuma].  

> Kafka Consumer should validate FetchResponse
> 
>
> Key: KAFKA-4670
> URL: https://issues.apache.org/jira/browse/KAFKA-4670
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.2.0
>Reporter: Roger Hoover
>Assignee: Jason Gustafson
>Priority: Minor
>
> As a negative test case, I purposefully configured a bad advertised listener 
> endpoint.  
> {code}
> advertised.listeners=PLAINTEXT://www.google.com:80
> {code}
> This causes the Consumer to over-allocate and run out of memory.
> {quote}
> [2017-01-18 10:03:03,866] DEBUG Sending metadata request 
> (type=MetadataRequest, topics=foo) to node -1 
> (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,870] DEBUG Updated cluster metadata version 2 to 
> Cluster(id = oerqPfCuTCKYUUaWdFUSVQ, nodes = [www.google.com:80 (id: 0 rack: 
> null)], partitions = [Partition(topic = foo, partition = 0, leader = 0, 
> replicas = [0], isr = [0])]) (org.apache.kafka.clients.Metadata)
> [2017-01-18 10:03:03,871] DEBUG Received group coordinator response 
> ClientResponse(receivedTimeMs=1484762583870, latencyMs=88, 
> disconnected=false, 
> requestHeader={api_key=10,api_version=0,correlation_id=0,client_id=consumer-1},
>  
> responseBody={error_code=0,coordinator={node_id=0,host=www.google.com,port=80}})
>  (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,871] INFO Discovered coordinator www.google.com:80 (id: 
> 2147483647 rack: null) for group console-consumer-64535. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,871] DEBUG Initiating connection to node 2147483647 at 
> www.google.com:80. (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,915] INFO Revoking previously assigned partitions [] for 
> group console-consumer-64535 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-01-18 10:03:03,915] INFO (Re-)joining group console-consumer-64535 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,917] DEBUG Sending JoinGroup ((type: JoinGroupRequest, 
> groupId=console-consumer-64535, sessionTimeout=1, 
> rebalanceTimeout=30, memberId=, protocolType=consumer, 
> groupProtocols=org.apache.kafka.common.requests.JoinGroupRequest$ProtocolMetadata@564fabc8))
>  to coordinator www.google.com:80 (id: 2147483647 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,932] DEBUG Created socket with SO_RCVBUF = 66646, 
> SO_SNDBUF = 131874, SO_TIMEOUT = 0 to node 2147483647 
> (org.apache.kafka.common.network.Selector)
> [2017-01-18 10:03:03,932] DEBUG Completed connection to node 2147483647.  
> Fetching API versions. (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,932] DEBUG Initiating API versions fetch from node 
> 2147483647. (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,990] ERROR Unknown error when running consumer:  
> (kafka.tools.ConsoleConsumer$)
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>   at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:169)
>   at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:150)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:355)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:346)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:172)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:331)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:300)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:261)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1025)
>   at 
> 

[jira] [Created] (KAFKA-4670) Kafka Consumer should validate FetchResponse

2017-01-18 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4670:
---

 Summary: Kafka Consumer should validate FetchResponse
 Key: KAFKA-4670
 URL: https://issues.apache.org/jira/browse/KAFKA-4670
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.10.2.0
Reporter: Roger Hoover
Assignee: Jason Gustafson
Priority: Minor


As a negative test case, I purposefully configured a bad advertised listener 
endpoint.  

{code}
advertised.listeners=PLAINTEXT://www.google.com:80
{code}

This causes the Consumer to over-allocate and run out of memory.

{quote}
[2017-01-18 10:03:03,866] DEBUG Sending metadata request (type=MetadataRequest, 
topics=foo) to node -1 (org.apache.kafka.clients.NetworkClient)
[2017-01-18 10:03:03,870] DEBUG Updated cluster metadata version 2 to 
Cluster(id = oerqPfCuTCKYUUaWdFUSVQ, nodes = [www.google.com:80 (id: 0 rack: 
null)], partitions = [Partition(topic = foo, partition = 0, leader = 0, 
replicas = [0], isr = [0])]) (org.apache.kafka.clients.Metadata)
[2017-01-18 10:03:03,871] DEBUG Received group coordinator response 
ClientResponse(receivedTimeMs=1484762583870, latencyMs=88, disconnected=false, 
requestHeader={api_key=10,api_version=0,correlation_id=0,client_id=consumer-1}, 
responseBody={error_code=0,coordinator={node_id=0,host=www.google.com,port=80}})
 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-01-18 10:03:03,871] INFO Discovered coordinator www.google.com:80 (id: 
2147483647 rack: null) for group console-consumer-64535. 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-01-18 10:03:03,871] DEBUG Initiating connection to node 2147483647 at 
www.google.com:80. (org.apache.kafka.clients.NetworkClient)
[2017-01-18 10:03:03,915] INFO Revoking previously assigned partitions [] for 
group console-consumer-64535 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-01-18 10:03:03,915] INFO (Re-)joining group console-consumer-64535 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-01-18 10:03:03,917] DEBUG Sending JoinGroup ((type: JoinGroupRequest, 
groupId=console-consumer-64535, sessionTimeout=1, rebalanceTimeout=30, 
memberId=, protocolType=consumer, 
groupProtocols=org.apache.kafka.common.requests.JoinGroupRequest$ProtocolMetadata@564fabc8))
 to coordinator www.google.com:80 (id: 2147483647 rack: null) 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-01-18 10:03:03,932] DEBUG Created socket with SO_RCVBUF = 66646, 
SO_SNDBUF = 131874, SO_TIMEOUT = 0 to node 2147483647 
(org.apache.kafka.common.network.Selector)
[2017-01-18 10:03:03,932] DEBUG Completed connection to node 2147483647.  
Fetching API versions. (org.apache.kafka.clients.NetworkClient)
[2017-01-18 10:03:03,932] DEBUG Initiating API versions fetch from node 
2147483647. (org.apache.kafka.clients.NetworkClient)
[2017-01-18 10:03:03,990] ERROR Unknown error when running consumer:  
(kafka.tools.ConsoleConsumer$)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at 
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:169)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:150)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:355)
at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:346)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:172)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:331)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:300)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:261)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1025)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:990)
at kafka.consumer.NewShinyConsumer.<init>(BaseConsumer.scala:55)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
{quote}

It seems like the consumer should validate responses.
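The likely mechanism for the OOM: the client reads a 4-byte big-endian size prefix off the socket and allocates a receive buffer of that size, and the first bytes of an HTTP response decode to an enormous "length". A hypothetical illustration of the decoding (not the actual NetworkReceive code):

```python
import struct

def frame_size(first_four_bytes):
    """Interpret the first 4 bytes on the wire as a big-endian int32
    frame length, as the Kafka wire protocol does."""
    return struct.unpack(">i", first_four_bytes)[0]

# A broker sends a sane length prefix; an HTTP server sends text such as
# "HTTP/1.0 ...", whose first four bytes decode to roughly 1.2 GB.
print(frame_size(b"HTTP"))  # -> 1213486160
```

Allocating a buffer of that "size" without sanity-checking it against a maximum is what exhausts the heap.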

[jira] [Commented] (KAFKA-4527) Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where paused connector produces messages

2016-12-18 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15759269#comment-15759269
 ] 

Roger Hoover commented on KAFKA-4527:
-

Happened again: 

http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-18--001.1482053747--apache--trunk--d6b0b52/

> Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where 
> paused connector produces messages
> ---
>
> Key: KAFKA-4527
> URL: https://issues.apache.org/jira/browse/KAFKA-4527
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, system tests
>Reporter: Ewen Cheslack-Postava
>Assignee: Shikhar Bhushan
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> {quote}
> 
> test_id:
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink
> status: FAIL
> run time:   40.164 seconds
> Paused sink connector should not consume any messages
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 257, in test_pause_and_resume_sink
> assert num_messages == len(self.sink.received_messages()), "Paused sink 
> connector should not consume any messages"
> AssertionError: Paused sink connector should not consume any messages
> {quote}
> See one case here: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  but it has also happened before, e.g. 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-06--001.1481017508--apache--trunk--34aa538/report.html
> Thinking about the test, one simple possibility is that our approach to get 
> the number of messages produced/consumed during the test is flawed -- I think 
> we may not account for additional buffering between the connectors and the 
> process reading their output to determine what they have produced. However, 
> that's just a theory -- the minimal checking on the logs that I did didn't 
> reveal anything obviously wrong.
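The buffering theory above suggests a test-side fix: rather than asserting immediately after pausing, wait for the observed message count to go quiescent so in-flight output is drained first. A minimal sketch of that idea (`stable_message_count` is a hypothetical helper, not part of the kafkatest framework):

```python
import time

def stable_message_count(fetch_messages, settle_time=0.5, timeout=5.0):
    """Poll a message source until its count stops changing for
    `settle_time` seconds, so buffered/in-flight messages are drained
    before the test asserts. `fetch_messages` is any zero-arg callable
    returning the messages observed so far."""
    deadline = time.time() + timeout
    last_count = len(fetch_messages())
    last_change = time.time()
    while time.time() < deadline:
        count = len(fetch_messages())
        if count != last_count:
            # Still receiving buffered output; reset the settle clock.
            last_count = count
            last_change = time.time()
        elif time.time() - last_change >= settle_time:
            break
        time.sleep(0.05)
    return last_count
```

The pause test could then capture `stable_message_count(...)` before pausing and compare against another stabilized read afterwards, instead of two instantaneous reads.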



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3808) Transient failure in ReplicaVerificationToolTest

2016-12-18 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15759266#comment-15759266
 ] 

Roger Hoover commented on KAFKA-3808:
-

Happened again:  
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-18--001.1482053747--apache--trunk--d6b0b52/



> Transient failure in ReplicaVerificationToolTest
> 
>
> Key: KAFKA-3808
> URL: https://issues.apache.org/jira/browse/KAFKA-3808
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Geoff Anderson
>
> {code}
> test_id:
> 2016-05-29--001.kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 9.231 seconds
> Timed out waiting to reach non-zero number of replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 88, in test_replica_lags
> err_msg="Timed out waiting to reach non-zero number of replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach non-zero number of replica lags.
> {code}
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-05-29--001.1464540508--apache--trunk--404b696/report.html





[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-18 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15759264#comment-15759264
 ] 

Roger Hoover commented on KAFKA-4166:
-

Similar failure: 
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-18--001.1482053747--apache--trunk--d6b0b52/

> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
>  Labels: transient-system-test-failure
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz





[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment

2016-12-18 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15759260#comment-15759260
 ] 

Roger Hoover commented on KAFKA-4526:
-

Thanks, [~apurva]

> Transient failure in ThrottlingTest.test_throttled_reassignment
> ---
>
> Key: KAFKA-4526
> URL: https://issues.apache.org/jira/browse/KAFKA-4526
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Apurva Mehta
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> This test is seeing transient failures sometimes
> {quote}
> Module: kafkatest.tests.core.throttling_test
> Class:  ThrottlingTest
> Method: test_throttled_reassignment
> Arguments:
> {
>   "bounce_brokers": false
> }
> {quote}
> This happens with both bounce_brokers = true and false. Fails with
> {quote}
> AssertionError: 1646 acked message did not make it to the Consumer. They are: 
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 
> 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the 
> first 1000 of these missing messages correctly made it into Kafka's data 
> files. This suggests they were lost on their way to the consumer.
> {quote}
> See 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  for an example.
> Note that there are a number of similar bug reports for different tests: 
> https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka
>  I am wondering if we have a wrong ack setting somewhere that we should be 
> specifying as acks=all but is only defaulting to 0?
> It also seems interesting that the missing messages in these recent failures 
> seem to always start at 0...
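The acks theory above is easy to encode as a guard in the test harness: a produce-consume-validate test should only trust "acked" counts when the producer was configured with acks=all. A hypothetical sketch (this helper is illustrative, not an existing kafkatest function):

```python
def producer_guarantees_durability(config):
    """Return True only if the producer config makes an ack mean the
    message is durable on all in-sync replicas. acks=0 (fire-and-forget)
    and acks=1 (leader-only) can both report messages as acked that are
    later lost during reassignment or leader failover."""
    acks = str(config.get("acks", "1"))  # "1" assumed as the classic producer default
    return acks in ("all", "-1")
```

Asserting `producer_guarantees_durability(producer_config)` at test setup would turn a silently-wrong ack setting into an immediate failure rather than intermittent data-loss reports.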





[jira] [Commented] (KAFKA-3808) Transient failure in ReplicaVerificationToolTest

2016-12-17 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758169#comment-15758169
 ] 

Roger Hoover commented on KAFKA-3808:
-

Happened again:

{code}

test_id:
kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
status: FAIL
run time:   1 minute 18.074 seconds


Timed out waiting to reach zero replica lags.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
 line 84, in test_replica_lags
err_msg="Timed out waiting to reach zero replica lags.")
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
 line 36, in wait_until
raise TimeoutError(err_msg)
TimeoutError: Timed out waiting to reach zero replica lags.
{code}

http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/
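The timeouts in these tracebacks all come from ducktape's `wait_until` polling helper. A minimal re-implementation, assuming only the behavior visible in the stack traces (poll a condition, then raise `TimeoutError(err_msg)` on expiry), is a sketch rather than the exact ducktape source:

```python
import time

def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Poll `condition` until it returns truthy or `timeout_sec`
    elapses, sleeping `backoff_sec` between attempts; on expiry,
    raise TimeoutError carrying the caller-supplied err_msg."""
    end = time.time() + timeout_sec
    while time.time() < end:
        if condition():
            return True
        time.sleep(backoff_sec)
    raise TimeoutError(err_msg)
```

So "Timed out waiting to reach zero replica lags" means the lag-checking condition never returned truthy within the test's timeout window, not necessarily that the tool itself failed.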

> Transient failure in ReplicaVerificationToolTest
> 
>
> Key: KAFKA-3808
> URL: https://issues.apache.org/jira/browse/KAFKA-3808
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Geoff Anderson
>
> {code}
> test_id:
> 2016-05-29--001.kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 9.231 seconds
> Timed out waiting to reach non-zero number of replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 88, in test_replica_lags
> err_msg="Timed out waiting to reach non-zero number of replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach non-zero number of replica lags.
> {code}
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-05-29--001.1464540508--apache--trunk--404b696/report.html





[jira] [Resolved] (KAFKA-4554) ReplicaVerificationToolTest.test_replica_lags system test failure

2016-12-17 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover resolved KAFKA-4554.
-
Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/KAFKA-3808


> ReplicaVerificationToolTest.test_replica_lags system test failure
> -
>
> Key: KAFKA-4554
> URL: https://issues.apache.org/jira/browse/KAFKA-4554
> Project: Kafka
>  Issue Type: Bug
>Reporter: Roger Hoover
>
> {code}
> 
> test_id:
> kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 18.074 seconds
> Timed out waiting to reach zero replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 84, in test_replica_lags
> err_msg="Timed out waiting to reach zero replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach zero replica lags.
> {code}
> http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/





[jira] [Created] (KAFKA-4554) ReplicaVerificationToolTest.test_replica_lags system test failure

2016-12-17 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4554:
---

 Summary: ReplicaVerificationToolTest.test_replica_lags system test 
failure
 Key: KAFKA-4554
 URL: https://issues.apache.org/jira/browse/KAFKA-4554
 Project: Kafka
  Issue Type: Bug
Reporter: Roger Hoover


{code}

test_id:
kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
status: FAIL
run time:   1 minute 18.074 seconds


Timed out waiting to reach zero replica lags.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
 line 84, in test_replica_lags
err_msg="Timed out waiting to reach zero replica lags.")
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
 line 36, in wait_until
raise TimeoutError(err_msg)
TimeoutError: Timed out waiting to reach zero replica lags.
{code}

http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/





[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-17 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758158#comment-15758158
 ] 

Roger Hoover commented on KAFKA-4166:
-

Failed again: 
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/

> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
>  Labels: transient-system-test-failure
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz





[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment

2016-12-17 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758155#comment-15758155
 ] 

Roger Hoover commented on KAFKA-4526:
-

Failed again

{code}

test_id:
kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=False
status: FAIL
run time:   8 minutes 55.261 seconds


531 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 511 more. Total 
Acked: 172487, Total Consumed: 172119. We validated that the first 531 of these 
missing messages correctly made it into Kafka's data files. This suggests they 
were lost on their way to the consumer.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/throttling_test.py",
 line 175, in test_throttled_reassignment
lambda: self.reassign_partitions(bounce_brokers, self.throttle))
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 101, in run_produce_consume_validate
self.validate()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 163, in validate
assert success, msg
AssertionError: 531 acked message did not make it to the Consumer. They are: 0, 
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 511 
more. Total Acked: 172487, Total Consumed: 172119. We validated that the first 
531 of these missing messages correctly made it into Kafka's data files. This 
suggests they were lost on their way to the consumer.


test_id:
kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=True
status: FAIL
run time:   8 minutes 52.939 seconds


567 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 547 more. Total 
Acked: 169804, Total Consumed: 169248. We validated that the first 567 of these 
missing messages correctly made it into Kafka's data files. This suggests they 
were lost on their way to the consumer.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/throttling_test.py",
 line 175, in test_throttled_reassignment
lambda: self.reassign_partitions(bounce_brokers, self.throttle))
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 101, in run_produce_consume_validate
self.validate()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 163, in validate
assert success, msg
AssertionError: 567 acked message did not make it to the Consumer. They are: 0, 
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 547 
more. Total Acked: 169804, Total Consumed: 169248. We validated that the first 
567 of these missing messages correctly made it into Kafka's data files. This 
suggests they were lost on their way to the consumer.
{code}

http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/

> Transient failure in 

[jira] [Created] (KAFKA-4551) StreamsSmokeTest.test_streams intermittent failure

2016-12-16 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4551:
---

 Summary: StreamsSmokeTest.test_streams intermittent failure
 Key: KAFKA-4551
 URL: https://issues.apache.org/jira/browse/KAFKA-4551
 Project: Kafka
  Issue Type: Bug
Reporter: Roger Hoover
Priority: Blocker




{code}
test_id:
kafkatest.tests.streams.streams_smoke_test.StreamsSmokeTest.test_streams
status: FAIL
run time:   4 minutes 44.872 seconds



Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/streams/streams_smoke_test.py",
 line 78, in test_streams
node.account.ssh("grep SUCCESS %s" % self.driver.STDOUT_FILE, 
allow_fail=False)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/cluster/remoteaccount.py",
 line 253, in ssh
raise RemoteCommandError(self, cmd, exit_status, stderr.read())
RemoteCommandError: ubuntu@worker6: Command 'grep SUCCESS 
/mnt/streams/streams.stdout' returned non-zero exit status 1.
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-15--001.1481794587--apache--trunk--7049938/StreamsSmokeTest/test_streams/91.tgz





[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-16 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754984#comment-15754984
 ] 

Roger Hoover commented on KAFKA-4166:
-

Happened again here:  
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-16--001.1481880892--apache--trunk--e55205b/

> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>  Labels: transient-system-test-failure
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz





[jira] [Commented] (KAFKA-4527) Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where paused connector produces messages

2016-12-16 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754980#comment-15754980
 ] 

Roger Hoover commented on KAFKA-4527:
-

Happened again:  
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-16--001.1481880892--apache--trunk--e55205b/

> Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where 
> paused connector produces messages
> ---
>
> Key: KAFKA-4527
> URL: https://issues.apache.org/jira/browse/KAFKA-4527
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, system tests
>Reporter: Ewen Cheslack-Postava
>Assignee: Shikhar Bhushan
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> {quote}
> 
> test_id:
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink
> status: FAIL
> run time:   40.164 seconds
> Paused sink connector should not consume any messages
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 257, in test_pause_and_resume_sink
> assert num_messages == len(self.sink.received_messages()), "Paused sink 
> connector should not consume any messages"
> AssertionError: Paused sink connector should not consume any messages
> {quote}
> See one case here: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  but it has also happened before, e.g. 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-06--001.1481017508--apache--trunk--34aa538/report.html
> Thinking about the test, one simple possibility is that our approach to get 
> the number of messages produced/consumed during the test is flawed -- I think 
> we may not account for additional buffering between the connectors and the 
> process reading their output to determine what they have produced. However, 
> that's just a theory -- the minimal checking on the logs that I did didn't 
> reveal anything obviously wrong.





[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment

2016-12-16 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754982#comment-15754982
 ] 

Roger Hoover commented on KAFKA-4526:
-

Happened again here:  
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-16--001.1481880892--apache--trunk--e55205b/

> Transient failure in ThrottlingTest.test_throttled_reassignment
> ---
>
> Key: KAFKA-4526
> URL: https://issues.apache.org/jira/browse/KAFKA-4526
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Jason Gustafson
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> This test is seeing transient failures sometimes
> {quote}
> Module: kafkatest.tests.core.throttling_test
> Class:  ThrottlingTest
> Method: test_throttled_reassignment
> Arguments:
> {
>   "bounce_brokers": false
> }
> {quote}
> This happens with both bounce_brokers = true and false. Fails with
> {quote}
> AssertionError: 1646 acked message did not make it to the Consumer. They are: 
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 
> 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the 
> first 1000 of these missing messages correctly made it into Kafka's data 
> files. This suggests they were lost on their way to the consumer.
> {quote}
> See 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  for an example.
> Note that there are a number of similar bug reports for different tests: 
> https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka
>  I am wondering whether we have a wrong ack setting somewhere that should be 
> specified as acks=all but is only defaulting to 0.
> It also seems interesting that the missing messages in these recent failures 
> seem to always start at 0...
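For reference, the validation these system tests perform boils down to a set difference between the acked and consumed message IDs. A minimal sketch of that check (the function name `find_missing` is illustrative, not the actual ducktape helper):

```python
def find_missing(acked, consumed, sample=20):
    """Return (missing_count, first_few_missing) for an acked-vs-consumed check.

    Mirrors the shape of the system-test assertion: every message the producer
    received an ack for must eventually be seen by the consumer.
    """
    missing = sorted(set(acked) - set(consumed))
    return len(missing), missing[:sample]

# Toy run: five acked messages, one lost on the way to the consumer.
count, lost = find_missing(acked=[0, 1, 2, 3, 4], consumed=[0, 1, 3, 4])
print(count, lost)  # 1 [2]
```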





[jira] [Comment Edited] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-13 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747454#comment-15747454
 ] 

Roger Hoover edited comment on KAFKA-4166 at 12/14/16 6:37 AM:
---

It happened twice more:

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_bounce
Arguments:
{
  "clean_shutdown": true,
  "new_consumer": false,
  "offsets_storage": "kafka"
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_bounce/clean_shutdown%3DTrue.offsets_storage%3Dkafka.new_consumer%3DFalse/51.tgz

and

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class: TestMirrorMakerService
Method: test_simple_end_to_end
Arguments: {
  "security_protocol": "PLAINTEXT",
  "new_consumer": false
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_simple_end_to_end/security_protocol%3DPLAINTEXT.new_consumer%3DFalse/55.tgz




was (Author: theduderog):
It happened twice more on these tests:

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_bounce
Arguments:
{
  "clean_shutdown": true,
  "new_consumer": false,
  "offsets_storage": "kafka"
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_bounce/clean_shutdown%3DTrue.offsets_storage%3Dkafka.new_consumer%3DFalse/51.tgz

and

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class: TestMirrorMakerService
Method: test_simple_end_to_end
Arguments: {
  "security_protocol": "PLAINTEXT",
  "new_consumer": false
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_simple_end_to_end/security_protocol%3DPLAINTEXT.new_consumer%3DFalse/55.tgz



> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>  Labels: transient-system-test-failure
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz





[jira] [Comment Edited] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-13 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747454#comment-15747454
 ] 

Roger Hoover edited comment on KAFKA-4166 at 12/14/16 6:36 AM:
---

It happened twice more on these tests:

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_bounce
Arguments:
{
  "clean_shutdown": true,
  "new_consumer": false,
  "offsets_storage": "kafka"
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_bounce/clean_shutdown%3DTrue.offsets_storage%3Dkafka.new_consumer%3DFalse/51.tgz

and

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class: TestMirrorMakerService
Method: test_simple_end_to_end
Arguments: {
  "security_protocol": "PLAINTEXT",
  "new_consumer": false
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_simple_end_to_end/security_protocol%3DPLAINTEXT.new_consumer%3DFalse/55.tgz




was (Author: theduderog):
It happened twice more on these tests:

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_bounce
Arguments:
{
  "clean_shutdown": true,
  "new_consumer": false,
  "offsets_storage": "kafka"
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_bounce/clean_shutdown%3DTrue.offsets_storage%3Dkafka.new_consumer%3DFalse/51.tgz

and

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class: TestMirrorMakerService
Method: test_simple_end_to_end
Arguments: {
  "security_protocol": "PLAINTEXT",
  "new_consumer": false
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_simple_end_to_end/security_protocol%3DPLAINTEXT.new_consumer%3DFalse/55.tgz



> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>  Labels: transient-system-test-failure
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz





[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-13 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747454#comment-15747454
 ] 

Roger Hoover commented on KAFKA-4166:
-

It happened twice more on these tests:

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_bounce
Arguments:
{
  "clean_shutdown": true,
  "new_consumer": false,
  "offsets_storage": "kafka"
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_bounce/clean_shutdown%3DTrue.offsets_storage%3Dkafka.new_consumer%3DFalse/51.tgz

and

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class: TestMirrorMakerService
Method: test_simple_end_to_end
Arguments: {
  "security_protocol": "PLAINTEXT",
  "new_consumer": false
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_simple_end_to_end/security_protocol%3DPLAINTEXT.new_consumer%3DFalse/55.tgz



> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>  Labels: transient-system-test-failure
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz





[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment

2016-12-13 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747308#comment-15747308
 ] 

Roger Hoover commented on KAFKA-4526:
-

This happened again on the Dec 13 nightly run.

> Transient failure in ThrottlingTest.test_throttled_reassignment
> ---
>
> Key: KAFKA-4526
> URL: https://issues.apache.org/jira/browse/KAFKA-4526
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Jason Gustafson
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> This test is seeing transient failures sometimes
> {quote}
> Module: kafkatest.tests.core.throttling_test
> Class:  ThrottlingTest
> Method: test_throttled_reassignment
> Arguments:
> {
>   "bounce_brokers": false
> }
> {quote}
> This happens with both bounce_brokers = true and false. Fails with
> {quote}
> AssertionError: 1646 acked message did not make it to the Consumer. They are: 
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 
> 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the 
> first 1000 of these missing messages correctly made it into Kafka's data 
> files. This suggests they were lost on their way to the consumer.
> {quote}
> See 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  for an example.
> Note that there are a number of similar bug reports for different tests: 
> https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka
>  I am wondering whether we have a wrong ack setting somewhere that should be 
> specified as acks=all but is only defaulting to 0.
> It also seems interesting that the missing messages in these recent failures 
> seem to always start at 0...





[jira] [Updated] (KAFKA-4361) Streams does not respect user configs for "default" params

2016-10-31 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-4361:

Description: 
For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset, those 
parameters are not used and instead overridden by the defaults.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers (and possibly Producers)

  was:
For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset, those 
parameters are not used and instead overridden by the defaults.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers 


> Streams does not respect user configs for "default" params
> --
>
> Key: KAFKA-4361
> URL: https://issues.apache.org/jira/browse/KAFKA-4361
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Roger Hoover
>Assignee: Damian Guy
>
> For the config params in CONSUMER_DEFAULT_OVERRIDES 
> (https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
>  such as consumer.max.poll.records and consumer.auto.offset.reset, those 
> parameters are not used and instead overridden by the defaults.  It may not 
> work for some producer config values as well.
> If your application sets those params in the StreamsConfig, they are not used 
> in the underlying Kafka Consumers (and possibly Producers)





[jira] [Updated] (KAFKA-4361) Streams does not respect user configs for "default" params

2016-10-31 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-4361:

Description: 
For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset, those 
parameters are not used and instead overridden by the defaults.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers 

  was:
For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset, those 
parameters are not used and instead overridden by the defaults.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers.


> Streams does not respect user configs for "default" params
> --
>
> Key: KAFKA-4361
> URL: https://issues.apache.org/jira/browse/KAFKA-4361
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Roger Hoover
>Assignee: Damian Guy
>
> For the config params in CONSUMER_DEFAULT_OVERRIDES 
> (https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
>  such as consumer.max.poll.records and consumer.auto.offset.reset, those 
> parameters are not used and instead overridden by the defaults.  It may not 
> work for some producer config values as well.
> If your application sets those params in the StreamsConfig, they are not used 
> in the underlying Kafka Consumers 





[jira] [Updated] (KAFKA-4361) Streams does not respect user configs for "default" params

2016-10-31 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-4361:

Description: 
For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset, those 
parameters are not used and instead overridden by the defaults.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers.

  was:
For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers.


> Streams does not respect user configs for "default" params
> --
>
> Key: KAFKA-4361
> URL: https://issues.apache.org/jira/browse/KAFKA-4361
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Roger Hoover
>Assignee: Damian Guy
>
> For the config params in CONSUMER_DEFAULT_OVERRIDES 
> (https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
>  such as consumer.max.poll.records and consumer.auto.offset.reset, those 
> parameters are not used and instead overridden by the defaults.  It may not 
> work for some producer config values as well.
> If your application sets those params in the StreamsConfig, they are not used 
> in the underlying Kafka Consumers.





[jira] [Created] (KAFKA-4361) Streams does not respect user configs for "default" params

2016-10-31 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4361:
---

 Summary: Streams does not respect user configs for "default" params
 Key: KAFKA-4361
 URL: https://issues.apache.org/jira/browse/KAFKA-4361
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.1.0
Reporter: Roger Hoover
Assignee: Damian Guy


For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers.





[jira] [Created] (KAFKA-4331) Kafka Streams resetter is slow because it joins the same group for each topic

2016-10-21 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4331:
---

 Summary: Kafka Streams resetter is slow because it joins the same 
group for each topic
 Key: KAFKA-4331
 URL: https://issues.apache.org/jira/browse/KAFKA-4331
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.0.1, 0.10.0.0
Reporter: Roger Hoover
Assignee: Matthias J. Sax


The resetter is joining the same group for each topic, which takes ~10 seconds in my 
testing.  This makes the reset very slow when you have a lot of topics.
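With a group join costing roughly 10 seconds, one join per topic makes the reset linear in the topic count, while a single subscription covering all topics would pay that cost once. A rough cost model (the 10-second figure comes from the report above; the function names are illustrative):

```python
JOIN_COST_S = 10  # approximate consumer-group join latency reported above

def reset_time_per_topic_join(num_topics):
    # Current behaviour: the resetter joins the group once per topic.
    return num_topics * JOIN_COST_S

def reset_time_single_join(num_topics):
    # Proposed behaviour: subscribe to all topics and join once.
    # num_topics no longer affects the join cost.
    return JOIN_COST_S

print(reset_time_per_topic_join(50))  # 500 (seconds spent in joins alone)
print(reset_time_single_join(50))     # 10
```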





[jira] [Commented] (KAFKA-3993) Console producer drops data

2016-08-30 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450054#comment-15450054
 ] 

Roger Hoover commented on KAFKA-3993:
-

Thanks, [~cotedm].  I tried to set acks=all but apparently that doesn't work. 

> Console producer drops data
> ---
>
> Key: KAFKA-3993
> URL: https://issues.apache.org/jira/browse/KAFKA-3993
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>
> The console producer drops data if the process exits too quickly.  I 
> suspect that the shutdown hook does not call close() or something goes wrong 
> during that close().
> Here's a simple example to illustrate the issue:
> {noformat}
> export BOOTSTRAP_SERVERS=localhost:9092
> export TOPIC=bar
> export MESSAGES=1
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
> --replication-factor 1 --topic "$TOPIC" \
> && echo "acks=all" > /tmp/producer.config \
> && echo "linger.ms=0" >> /tmp/producer.config \
> && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
> "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \
> && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
> --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
> {noformat}
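The suspected failure mode, records still sitting in the client's batch buffer when the process exits without a clean close(), can be modeled with a toy buffering producer (an illustration of the behaviour, not the real Kafka client):

```python
class ToyBufferingProducer:
    """Batches records in memory and only hands them off on a clean close."""

    def __init__(self):
        self.buffer = []     # records batched client-side, not yet "on the wire"
        self.delivered = []  # records the broker actually received

    def send(self, record):
        self.buffer.append(record)

    def close(self):
        # A clean close flushes the remaining batch, like KafkaProducer.close().
        self.delivered.extend(self.buffer)
        self.buffer = []

# Process exits right after send() without close(): buffered records are lost.
p = ToyBufferingProducer()
p.send("msg-1")
print(p.delivered)  # []

# With a shutdown hook that calls close(), nothing is dropped.
p.close()
print(p.delivered)  # ['msg-1']
```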





[jira] [Closed] (KAFKA-4063) Add support for infinite endpoints for range queries in Kafka Streams KV stores

2016-08-19 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover closed KAFKA-4063.
---

The JIRA UI was unresponsive so I accidentally submitted the form twice.

> Add support for infinite endpoints for range queries in Kafka Streams KV 
> stores
> ---
>
> Key: KAFKA-4063
> URL: https://issues.apache.org/jira/browse/KAFKA-4063
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> In some applications, it's useful to iterate over the key-value store either:
> 1. from the beginning up to a certain key
> 2. from a certain key to the end
> We can add two new methods rangeUtil() and rangeFrom() easily to support this.





[jira] [Created] (KAFKA-4063) Add support for infinite endpoints for range queries in Kafka Streams KV stores

2016-08-18 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4063:
---

 Summary: Add support for infinite endpoints for range queries in 
Kafka Streams KV stores
 Key: KAFKA-4063
 URL: https://issues.apache.org/jira/browse/KAFKA-4063
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.1.0
Reporter: Roger Hoover
Assignee: Roger Hoover
Priority: Minor
 Fix For: 0.10.1.0


In some applications, it's useful to iterate over the key-value store either:
1. from the beginning up to a certain key
2. from a certain key to the end

We can add two new methods rangeUtil() and rangeFrom() easily to support this.
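For a store that keeps keys sorted, both open-ended ranges reduce to a binary search over the keys. A sketch using Python's bisect on an in-memory stand-in (the range_from/range_until names follow the proposal; the store class itself is hypothetical):

```python
import bisect

class SortedKVStore:
    """In-memory stand-in for a sorted key-value store."""

    def __init__(self, entries):
        self._map = dict(entries)
        self._keys = sorted(self._map)

    def range_from(self, key):
        # Iterate from `key` (inclusive) to the end of the store.
        i = bisect.bisect_left(self._keys, key)
        return [(k, self._map[k]) for k in self._keys[i:]]

    def range_until(self, key):
        # Iterate from the beginning of the store up to `key` (inclusive).
        i = bisect.bisect_right(self._keys, key)
        return [(k, self._map[k]) for k in self._keys[:i]]

store = SortedKVStore({"a": 1, "c": 3, "e": 5})
print(store.range_from("c"))   # [('c', 3), ('e', 5)]
print(store.range_until("c"))  # [('a', 1), ('c', 3)]
```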





[jira] [Created] (KAFKA-4064) Add support for infinite endpoints for range queries in Kafka Streams KV stores

2016-08-18 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4064:
---

 Summary: Add support for infinite endpoints for range queries in 
Kafka Streams KV stores
 Key: KAFKA-4064
 URL: https://issues.apache.org/jira/browse/KAFKA-4064
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.1.0
Reporter: Roger Hoover
Assignee: Roger Hoover
Priority: Minor
 Fix For: 0.10.1.0


In some applications, it's useful to iterate over the key-value store either:
1. from the beginning up to a certain key
2. from a certain key to the end

We can add two new methods rangeUtil() and rangeFrom() easily to support this.





[jira] [Commented] (KAFKA-3993) Console producer drops data

2016-08-10 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415811#comment-15415811
 ] 

Roger Hoover commented on KAFKA-3993:
-

Thanks, [~vahid].  Yeah, it looks the same.

> Console producer drops data
> ---
>
> Key: KAFKA-3993
> URL: https://issues.apache.org/jira/browse/KAFKA-3993
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>
> The console producer drops data if the process exits too quickly.  I 
> suspect that the shutdown hook does not call close() or something goes wrong 
> during that close().
> Here's a simple example to illustrate the issue:
> {noformat}
> export BOOTSTRAP_SERVERS=localhost:9092
> export TOPIC=bar
> export MESSAGES=1
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
> --replication-factor 1 --topic "$TOPIC" \
> && echo "acks=all" > /tmp/producer.config \
> && echo "linger.ms=0" >> /tmp/producer.config \
> && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
> "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \
> && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
> --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
> {noformat}





[jira] [Updated] (KAFKA-3993) Console producer drops data

2016-08-10 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-3993:

Description: 
The console producer drops data if the process exits too quickly.  I 
suspect that the shutdown hook does not call close() or something goes wrong 
during that close().

Here's a simple example to illustrate the issue:

{noformat}
export BOOTSTRAP_SERVERS=localhost:9092
export TOPIC=bar
export MESSAGES=1
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
--replication-factor 1 --topic "$TOPIC" \
&& echo "acks=all" > /tmp/producer.config \
&& echo "linger.ms=0" >> /tmp/producer.config \
&& seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
"$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \
&& ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
--new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
{noformat}

  was:
The console producer drops data if the process exits too quickly.  I 
suspect that the shutdown hook does not call close() or something goes wrong 
during that close().

Here's a simple example to illustrate the issue:

{noformat}
export BOOTSTRAP_SERVERS=localhost:9092
export TOPIC=bar
export MESSAGES=1
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
--replication-factor 1 --topic "$TOPIC" \
&& echo "acks=all" > /tmp/producer.config \
&& echo "linger.ms=0" >> /tmp/producer.config \
&& seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
"$BOOTSTRAP_SERVERS" --topic "$TOPIC" \
&& ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
--new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
{noformat}


> Console producer drops data
> ---
>
> Key: KAFKA-3993
> URL: https://issues.apache.org/jira/browse/KAFKA-3993
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>
> The console producer drops data if the process exits too quickly.  I 
> suspect that the shutdown hook does not call close() or something goes wrong 
> during that close().
> Here's a simple example to illustrate the issue:
> {noformat}
> export BOOTSTRAP_SERVERS=localhost:9092
> export TOPIC=bar
> export MESSAGES=1
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
> --replication-factor 1 --topic "$TOPIC" \
> && echo "acks=all" > /tmp/producer.config \
> && echo "linger.ms=0" >> /tmp/producer.config \
> && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
> "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \
> && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
> --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
> {noformat}





[jira] [Commented] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown

2016-08-02 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404439#comment-15404439
 ] 

Roger Hoover commented on KAFKA-3752:
-

[~guozhang] Your assessment seems correct.  It happened again when I 
restarted after a clean shutdown (SIGTERM + wait for exit). 

1.  We have a single KafkaStreams instance with 8 threads.
2. Here's the full log:  
https://gist.github.com/theduderog/f9ab4767cd3b098d404f5513a7e1c27e

> Provide a way for KStreams to recover from unclean shutdown
> ---
>
> Key: KAFKA-3752
> URL: https://issues.apache.org/jira/browse/KAFKA-3752
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>  Labels: architecture
>
> If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM 
> Killer), it may leave behind lock files and fail to recover.
> It would be useful to have an option (say --force) to tell KStreams to 
> proceed even if it finds old LOCK files.
> {noformat}
> [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in 
> thread [StreamThread-1]:  
> (org.apache.kafka.streams.processor.internals.StreamThread:583)
> org.apache.kafka.streams.errors.ProcessorStateException: Error while creating 
> the state manager
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:71)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:86)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977)
>   at 
> 

[jira] [Created] (KAFKA-3993) Console producer drops data

2016-07-26 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-3993:
---

 Summary: Console producer drops data
 Key: KAFKA-3993
 URL: https://issues.apache.org/jira/browse/KAFKA-3993
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.10.0.0
Reporter: Roger Hoover


The console producer drops data if the process exits too quickly.  I 
suspect that the shutdown hook does not call close() or something goes wrong 
during that close().

Here's a simple script to illustrate the issue:

{noformat}
export BOOTSTRAP_SERVERS=localhost:9092
export TOPIC=bar
export MESSAGES=1
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
--replication-factor 1 --topic "$TOPIC" \
&& echo "acks=all" > /tmp/producer.config \
&& echo "linger.ms=0" >> /tmp/producer.config \
&& seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
"$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \
&& ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
--new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
{noformat}
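The suspected failure mode (a buffered producer whose process exits before its shutdown hook flushes) can be illustrated without Kafka at all. This is a toy Python stand-in, not the Kafka client: `BufferedProducer` and its methods are invented for illustration only.

```python
import atexit

class BufferedProducer:
    """Toy stand-in for a producer that batches records in memory.

    Records are only durable once flush() moves them to `delivered`;
    exiting without flush() loses whatever is still buffered, which is
    the kind of loss described in this issue.
    """
    def __init__(self):
        self.buffer = []
        self.delivered = []
        self.closed = False

    def send(self, record):
        self.buffer.append(record)

    def flush(self):
        self.delivered.extend(self.buffer)
        self.buffer.clear()

    def close(self):
        # Flush-then-close is what a shutdown hook must guarantee.
        if not self.closed:
            self.flush()
            self.closed = True

producer = BufferedProducer()
# Registering close() as an exit hook mimics the JVM shutdown hook the
# console producer relies on; if that hook is skipped or fails, any
# still-buffered records never reach the broker.
atexit.register(producer.close)

for i in range(5):
    producer.send(i)
producer.close()  # explicit close: all 5 records become durable
```

If `producer.close()` (and the exit hook) were skipped, `delivered` would stay empty, mirroring the dropped messages in the repro above.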





[jira] [Commented] (KAFKA-3740) Add configs for RocksDBStores

2016-06-16 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334812#comment-15334812
 ] 

Roger Hoover commented on KAFKA-3740:
-

[~h...@pinterest.com], any update on this?  I'm wondering if you have any idea 
when it might be available.

> Add configs for RocksDBStores
> -
>
> Key: KAFKA-3740
> URL: https://issues.apache.org/jira/browse/KAFKA-3740
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Henry Cai
>  Labels: api, newbie
>
> Today most of the rocksDB configs are hard written inside {{RocksDBStore}}, 
> or the default values are directly used. We need to make them configurable 
> for advanced users. For example, some default values may not work perfectly 
> for some scenarios: 
> https://github.com/HenryCaiHaiying/kafka/commit/ccc4e25b110cd33eea47b40a2f6bf17ba0924576
>  
> One way of doing that is to introduce a "RocksDBStoreConfigs" objects similar 
> to "StreamsConfig", which defines all related rocksDB options configs, that 
> can be passed as key-value pairs to "StreamsConfig".
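The prefix-namespacing idea described above can be sketched generically. This is a minimal Python stand-in; the `rocksdb.` prefix and the option names are hypothetical, not a real RocksDB or StreamsConfig option set.

```python
ROCKSDB_PREFIX = "rocksdb."

def extract_rocksdb_configs(streams_config):
    """Pull rocksdb.*-prefixed entries out of a flat, StreamsConfig-style
    dict, stripping the prefix so they can be handed to the store."""
    return {
        key[len(ROCKSDB_PREFIX):]: value
        for key, value in streams_config.items()
        if key.startswith(ROCKSDB_PREFIX)
    }

config = {
    "application.id": "my-app",
    "rocksdb.block.cache.size": 104857600,   # illustrative option names
    "rocksdb.write.buffer.size": 33554432,
}
rocksdb_opts = extract_rocksdb_configs(config)
```

The design choice is the same one `StreamsConfig` uses for client configs: a single flat config map, with a reserved prefix carving out the store-specific namespace.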





[jira] [Created] (KAFKA-3858) Add functions to print stream topologies

2016-06-16 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-3858:
---

 Summary: Add functions to print stream topologies
 Key: KAFKA-3858
 URL: https://issues.apache.org/jira/browse/KAFKA-3858
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Affects Versions: 0.10.0.0
Reporter: Roger Hoover
Assignee: Guozhang Wang


For debugging and development, it would be very useful to be able to print 
Kafka Streams topologies.  At a minimum, it would be great to be able to see 
the logical topology, including the Kafka topics linked by sub-topologies.  I 
think that this information does not depend on partitioning.  For more detail, 
it would be great to be able to print the same logical topology but also 
showing the number of tasks (and perhaps task ids?).  Finally, it would be 
great to show the physical topology after the tasks have been mapped to JVMs + 
threads.
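A minimal sketch of the logical view requested here, in plain Python (the data shape is an assumption; a real topology also carries processor nodes and state stores):

```python
def describe_topology(subtopologies):
    """Render a logical topology: sub-topologies joined by Kafka topics.

    `subtopologies` maps a sub-topology id to a pair
    (input_topics, output_topics); linking topics appear as the output
    of one sub-topology and the input of another.
    """
    lines = []
    for st_id, (inputs, outputs) in sorted(subtopologies.items()):
        lines.append("Sub-topology %d" % st_id)
        for t in inputs:
            lines.append("  <-- %s" % t)
        for t in outputs:
            lines.append("  --> %s" % t)
    return "\n".join(lines)

# Hypothetical two-stage topology joined by a repartition topic.
topology = {
    0: (["orders"], ["orders-repartition"]),
    1: (["orders-repartition"], ["order-totals"]),
}
print(describe_topology(topology))
```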





[jira] [Work started] (KAFKA-3761) Controller has RunningAsBroker instead of RunningAsController state

2016-05-26 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3761 started by Roger Hoover.
---
> Controller has RunningAsBroker instead of RunningAsController state
> ---
>
> Key: KAFKA-3761
> URL: https://issues.apache.org/jira/browse/KAFKA-3761
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Roger Hoover
>
> In `KafkaServer.start`, we start `KafkaController`:
> {code}
> /* start kafka controller */
> kafkaController = new KafkaController(config, zkUtils, brokerState, 
> kafkaMetricsTime, metrics, threadNamePrefix)
> kafkaController.startup()
> {code}
> Which sets the state to `RunningAsController` in 
> `KafkaController.onControllerFailover`:
> `brokerState.newState(RunningAsController)`
> And this later gets set to `RunningAsBroker`.
> This doesn't match the diagram in `BrokerStates`. [~junrao] suggested that we 
> should start the controller after we register the broker in ZK, but this 
> seems tricky as we need the controller in `KafkaApis`.





[jira] [Issue Comment Deleted] (KAFKA-3760) Set broker state as running after publishing to ZooKeeper

2016-05-26 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-3760:

Comment: was deleted

(was: Create PR:  https://github.com/apache/kafka/pull/1436)

> Set broker state as running after publishing to ZooKeeper
> -
>
> Key: KAFKA-3760
> URL: https://issues.apache.org/jira/browse/KAFKA-3760
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>Assignee: Jun Rao
>Priority: Minor
>
> Currently, the broker state is set to running before it registers itself in 
> ZooKeeper. This is too early in the broker lifecycle. If clients use the 
> broker state as an indicator that the broker is ready to accept requests, 
> they will get errors. This change is to delay setting the broker state to 
> running until it's registered in ZK.





[jira] [Commented] (KAFKA-3760) Set broker state as running after publishing to ZooKeeper

2016-05-26 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303141#comment-15303141
 ] 

Roger Hoover commented on KAFKA-3760:
-

Created PR:  https://github.com/apache/kafka/pull/1436

> Set broker state as running after publishing to ZooKeeper
> -
>
> Key: KAFKA-3760
> URL: https://issues.apache.org/jira/browse/KAFKA-3760
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>Assignee: Jun Rao
>Priority: Minor
>
> Currently, the broker state is set to running before it registers itself in 
> ZooKeeper. This is too early in the broker lifecycle. If clients use the 
> broker state as an indicator that the broker is ready to accept requests, 
> they will get errors. This change is to delay setting the broker state to 
> running until it's registered in ZK.





[jira] [Updated] (KAFKA-3760) Set broker state as running after publishing to ZooKeeper

2016-05-26 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-3760:

Summary: Set broker state as running after publishing to ZooKeeper  (was: 
Setting broker state as running after publishing to ZooKeeper)

> Set broker state as running after publishing to ZooKeeper
> -
>
> Key: KAFKA-3760
> URL: https://issues.apache.org/jira/browse/KAFKA-3760
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>Assignee: Jun Rao
>Priority: Minor
>
> Currently, the broker state is set to running before it registers itself in 
> ZooKeeper. This is too early in the broker lifecycle. If clients use the 
> broker state as an indicator that the broker is ready to accept requests, 
> they will get errors. This change is to delay setting the broker state to 
> running until it's registered in ZK.





[jira] [Created] (KAFKA-3760) Setting broker state as running after publishing to ZooKeeper

2016-05-26 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-3760:
---

 Summary: Setting broker state as running after publishing to 
ZooKeeper
 Key: KAFKA-3760
 URL: https://issues.apache.org/jira/browse/KAFKA-3760
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 0.10.0.0
Reporter: Roger Hoover
Assignee: Jun Rao
Priority: Minor


Currently, the broker state is set to running before it registers itself in 
ZooKeeper. This is too early in the broker lifecycle. If clients use the broker 
state as an indicator that the broker is ready to accept requests, they will 
get errors. This change is to delay setting the broker state to running until 
it's registered in ZK.
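The proposed ordering (register in ZooKeeper first, only then report the running state) can be sketched as follows. Plain Python stand-in; the function names and the event list are illustrative, not Kafka's actual startup code.

```python
from enum import Enum

class BrokerState(Enum):
    STARTING = "Starting"
    RUNNING = "RunningAsBroker"

events = []

def register_in_zookeeper(broker_id):
    # Stand-in for creating the /brokers/ids/<id> znode.
    events.append("registered:%d" % broker_id)

def start_broker(broker_id):
    # The fix under discussion: advertise in ZK *before* reporting
    # RUNNING, so a client polling broker state never sees RUNNING
    # for a broker that is not yet discoverable.
    state = BrokerState.STARTING
    register_in_zookeeper(broker_id)
    state = BrokerState.RUNNING
    events.append("state:%s" % state.name)
    return state

start_broker(0)
```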





[jira] [Created] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown

2016-05-24 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-3752:
---

 Summary: Provide a way for KStreams to recover from unclean 
shutdown
 Key: KAFKA-3752
 URL: https://issues.apache.org/jira/browse/KAFKA-3752
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.0.0
Reporter: Roger Hoover
Assignee: Guozhang Wang


If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM 
Killer), it may leave behind lock files and fail to recover.

It would be useful to have an option (say --force) to tell KStreams to proceed 
even if it finds old LOCK files.
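One way such a --force option could work is stale-lock detection: if the PID recorded in the LOCK file is no longer alive, reclaim the lock. The sketch below is plain Python with invented names (a real fix would use OS-level file locks), just to show the recovery pattern.

```python
import os
import tempfile

def acquire_state_dir_lock(state_dir, force=False):
    """Take the per-task LOCK file, reclaiming it if its owner is dead.

    The LOCK file records the owner's PID, so a lock left behind by a
    SIGKILLed process can be detected and, with force=True, reclaimed,
    roughly the --force behaviour proposed above.
    """
    lock_path = os.path.join(state_dir, "LOCK")
    if os.path.exists(lock_path):
        with open(lock_path) as f:
            owner_pid = int(f.read().strip())
        try:
            os.kill(owner_pid, 0)          # signal 0: existence check only
            alive = True
        except ProcessLookupError:
            alive = False
        if alive:
            raise IOError("state dir locked by live pid %d" % owner_pid)
        if not force:
            raise IOError("stale LOCK found; pass force=True to reclaim")
        os.remove(lock_path)               # owner is gone: reclaim
    with open(lock_path, "w") as f:
        f.write(str(os.getpid()))
    return lock_path

state_dir = tempfile.mkdtemp()
# Simulate a LOCK left behind by a SIGKILLed process (a PID that cannot
# correspond to a live process on this machine).
with open(os.path.join(state_dir, "LOCK"), "w") as f:
    f.write("999999999")
lock = acquire_state_dir_lock(state_dir, force=True)
```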

{noformat}
[2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in thread 
[StreamThread-1]:  
(org.apache.kafka.streams.processor.internals.StreamThread:583)
org.apache.kafka.streams.errors.ProcessorStateException: Error while creating 
the state manager
at 
org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:71)
at 
org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:86)
at 
org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550)
at 
org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577)
at 
org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68)
at 
org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)
Caused by: java.io.IOException: Failed to lock the state directory: 
/data/test/2/kafka-streams/test-2/0_0
at 
org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:95)
at 
org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
  

[jira] [Created] (KAFKA-3741) KStream config for changelog min.in.sync.replicas

2016-05-20 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-3741:
---

 Summary: KStream config for changelog min.in.sync.replicas
 Key: KAFKA-3741
 URL: https://issues.apache.org/jira/browse/KAFKA-3741
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.0.0
Reporter: Roger Hoover
Assignee: Guozhang Wang


Kafka Streams currently allows you to specify a replication factor for 
changelog and repartition topics that it creates.  It should also allow you to 
specify min.in.sync.replicas.
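A sketch of the kind of config passthrough being requested, in Java-properties style. The key names here are hypothetical illustrations, not settings Kafka Streams actually supported at the time:

```properties
# Streams already exposes replication for its internal topics:
replication.factor=3
# The request is an analogous knob (name hypothetical) forwarded to
# changelog/repartition topics when Streams creates them:
topic.min.insync.replicas=2
```

Combined with acks=all on the changelog producer, such a setting would let internal topics tolerate broker failures without silently accepting under-replicated writes.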





[jira] [Created] (KAFKA-3650) AWS test script fails to install vagrant

2016-05-03 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-3650:
---

 Summary: AWS test script fails to install vagrant
 Key: KAFKA-3650
 URL: https://issues.apache.org/jira/browse/KAFKA-3650
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.0.0
Reporter: Roger Hoover
Priority: Trivial


The URL for installing vagrant in the AWS init script is no longer valid 
(https://dl.bintray.com/mitchellh/vagrant/vagrant_1.7.2_x86_64.deb).

We can use this instead:  
https://releases.hashicorp.com/vagrant/1.7.2/vagrant_1.7.2_x86_64.deb





[jira] [Created] (KAFKA-2105) NullPointerException in client on MetadataRequest

2015-04-07 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-2105:
---

 Summary: NullPointerException in client on MetadataRequest
 Key: KAFKA-2105
 URL: https://issues.apache.org/jira/browse/KAFKA-2105
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.8.2.1
Reporter: Roger Hoover
Priority: Minor


With the new producer, if you accidentally call 
KafkaProducer.partitionsFor(null), it will cause the I/O thread to throw an NPE.

Uncaught error in kafka producer I/O thread: 
java.lang.NullPointerException
at org.apache.kafka.common.utils.Utils.utf8Length(Utils.java:174)
at org.apache.kafka.common.protocol.types.Type$5.sizeOf(Type.java:176)
at 
org.apache.kafka.common.protocol.types.ArrayOf.sizeOf(ArrayOf.java:55)
at org.apache.kafka.common.protocol.types.Schema.sizeOf(Schema.java:81)
at org.apache.kafka.common.protocol.types.Struct.sizeOf(Struct.java:218)
at 
org.apache.kafka.common.requests.RequestSend.serialize(RequestSend.java:35)
at 
org.apache.kafka.common.requests.RequestSend.<init>(RequestSend.java:29)
at 
org.apache.kafka.clients.NetworkClient.metadataRequest(NetworkClient.java:369)
at 
org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:391)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:188)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
at java.lang.Thread.run(Thread.java:745)
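The usual remedy is to validate arguments at the public API boundary, so the caller sees the error rather than the background I/O thread dying on it later. A toy Python stand-in (the `Producer` class here is invented, not the Kafka client):

```python
class Producer:
    """Toy producer illustrating boundary validation."""

    def partitions_for(self, topic):
        # Reject bad input synchronously, on the caller's thread, instead
        # of letting a null propagate into the async I/O machinery where
        # it surfaces as an uncatchable background-thread error.
        if topic is None:
            raise ValueError("topic must not be None")
        return [0]  # toy partition list

p = Producer()
try:
    p.partitions_for(None)
    failed_in_caller = False
except ValueError:
    failed_in_caller = True
```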





[jira] [Commented] (KAFKA-1369) snappy version update 1.1.x

2014-09-12 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132229#comment-14132229
 ] 

Roger Hoover commented on KAFKA-1369:
-

I also ran into this issue on 64-bit Centos 5.  The snappy-java maintainer said 
that 1.1.1.3 is API compatible with 1.0.5.3 and has libstdc++ statically 
compiled into the object file so that it doesn't rely on the OS to have a new 
enough version.

https://github.com/xerial/snappy-java/issues/17

 snappy version update 1.1.x
 ---

 Key: KAFKA-1369
 URL: https://issues.apache.org/jira/browse/KAFKA-1369
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.0, 0.8.1.1
 Environment: Red Hat Enterprise Linux Server release 5.8 (Tikanga)
 - x64 
Reporter: thinker0
Priority: Minor
 Attachments: patch.diff


 https://github.com/xerial/snappy-java/issues/38 issue
 snappy version 1.1.x
 {code}
 org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
 at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:239)
 at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
 at 
 org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:351)
 at org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:159)
 at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:142)
 at java.io.InputStream.read(InputStream.java:101)
 at 
 kafka.message.ByteBufferMessageSet$$anonfun$decompress$1.apply$mcI$sp(ByteBufferMessageSet.scala:68)
 at 
 kafka.message.ByteBufferMessageSet$$anonfun$decompress$1.apply(ByteBufferMessageSet.scala:68)
 at 
 kafka.message.ByteBufferMessageSet$$anonfun$decompress$1.apply(ByteBufferMessageSet.scala:68)
 at scala.collection.immutable.Stream$.continually(Stream.scala:1129)
 at 
 kafka.message.ByteBufferMessageSet$.decompress(ByteBufferMessageSet.scala:68)
 at 
 kafka.message.ByteBufferMessageSet$$anon$1.makeNextOuter(ByteBufferMessageSet.scala:178)
 at 
 kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:191)
 at 
 kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:145)
 at 
 kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
 at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
 {code}
 {code}
 /tmp] ldd snappy-1.0.5-libsnappyjava.so
 ./snappy-1.0.5-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version 
 `GLIBCXX_3.4.9' not found (required by ./snappy-1.0.5-libsnappyjava.so)
 ./snappy-1.0.5-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version 
 `GLIBCXX_3.4.11' not found (required by ./snappy-1.0.5-libsnappyjava.so)
   linux-vdso.so.1 =>  (0x7fff81dfc000)
   libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b554b43)
   libm.so.6 => /lib64/libm.so.6 (0x2b554b731000)
   libc.so.6 => /lib64/libc.so.6 (0x2b554b9b4000)
   libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b554bd0c000)
   /lib64/ld-linux-x86-64.so.2 (0x0033e2a0)
 {code}
 {code}
 /tmp] ldd snappy-1.1.1M1-be6ba593-9ac7-488e-953e-ba5fd9530ee1-libsnappyjava.so
 ldd: warning: you do not have execution permission for 
 `./snappy-1.1.1M1-be6ba593-9ac7-488e-953e-ba5fd9530ee1-libsnappyjava.so'
   linux-vdso.so.1 =>  (0x7fff1c132000)
   libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b9548319000)
   libm.so.6 => /lib64/libm.so.6 (0x2b954861a000)
   libc.so.6 => /lib64/libc.so.6 (0x2b954889d000)
   libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b9548bf5000)
   /lib64/ld-linux-x86-64.so.2 (0x0033e2a0)
 {code}





[jira] [Comment Edited] (KAFKA-260) Add audit trail to kafka

2014-06-23 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911909#comment-13911909
 ] 

Roger Hoover edited comment on KAFKA-260 at 6/23/14 7:12 PM:
-

{quote}
It also makes false negatives possible since if you lose both normal messages 
and the associated audit messages it will appear that everything adds up. The 
latter problem is astronomically unlikely to happen exactly, though.
{quote}

This may be true once messages have reached a broker.  However, if a producer 
process were to be killed (say by SIGKILL), both its unack'ed normal messages 
and the audit data would be lost.  Would it make sense to persist the audit 
counts to the file system for producers so that they could potentially be 
recovered?


was (Author: theduderog):
It also makes false negatives possible since if you lose both normal messages 
and the associated audit messages it will appear that everything adds up. The 
later problem is astronomically unlikely to happen exactly, though.

This may be true once messages have reached a broker.  However, if a producer 
process were to be killed (say by SIGKILL), both it's commited messages and the 
audit data would be lost.  Would it make sense to persist the audit counts to 
the file system for producers so that they could potentially be recovered?

 Add audit trail to kafka
 

 Key: KAFKA-260
 URL: https://issues.apache.org/jira/browse/KAFKA-260
 Project: Kafka
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Kreps
Assignee: Jay Kreps
 Attachments: Picture 18.png, kafka-audit-trail-draft.patch


 LinkedIn has a system that does monitoring on top of our data flow to ensure 
 all data is delivered to all consumers of data. This works by having each 
 logical tier through which data passes produce messages to a central 
 audit-trail topic; these messages give a time period and the number of 
 messages that passed through that tier in that time period. Example of tiers 
 for data might be producer, broker, hadoop-etl, etc. This makes it 
 possible to compare the total events for a given time period to ensure that 
 all events that are produced are consumed by all consumers.
 This turns out to be extremely useful. We also have an application that 
 balances the books and checks that all data is consumed in a timely 
 fashion. This gives graphs for each topic and shows any data loss and the lag 
 at which the data is consumed (if any).
 This would be an optional feature that would allow you to do this kind of 
 reconciliation automatically for all the topics kafka hosts against all the 
 tiers of applications that interact with the data.
 Some details, the proposed format of the data is JSON using the following 
 format for messages:
 {
   "time":1301727060032,  // the timestamp at which this audit message is sent
   "topic":"my_topic_name", // the topic this audit data is for
   "tier":"producer", // a user-defined tier name
   "bucket_start": 130172640, // the beginning of the time bucket this 
 data applies to
   "bucket_end": 130172700, // the end of the time bucket this data 
 applies to
   "host":"my_host_name.datacenter.linkedin.com", // the server that this was 
 sent from
   "datacenter":"hlx32", // the datacenter this occurred in
   "application":"newsfeed_service", // a user-defined application name
   "guid":"51656274-a86a-4dff-b824-8e8e20a6348f", // a unique identifier for 
 this message
   "count":43634
 }
 DISCUSSION
 Time is complex:
 1. The audit data must be based on a timestamp in the events not the time on 
 machine processing the event. Using this timestamp means that all downstream 
 consumers will report audit data on the right time bucket. This means that 
 there must be a timestamp in the event, which we don't currently require. 
 Arguably we should just add a timestamp to the events, but I think it is 
 sufficient for now just to allow the user to provide a function to extract 
 the time from their events.
 2. For counts to reconcile exactly we can only do analysis at a granularity 
 based on the least common multiple of the bucket size used by all tiers. The 
 simplest is just to configure them all to use the same bucket size. We 
 currently use a bucket size of 10 mins, but anything from 1-60 mins is 
 probably reasonable.
 For analysis purposes one tier is designated as the source tier and we do 
 reconciliation against this count (e.g. if another tier has less, that is 
 treated as lost, if another tier has more that is duplication).
 Note that this system makes false positives possible since you can lose an 
 audit message. It also makes false negatives possible since if you lose both 
 normal messages and the associated audit messages it will appear that 
 everything adds up. The latter problem is astronomically unlikely to happen 
 

[jira] [Commented] (KAFKA-260) Add audit trail to kafka

2014-02-25 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911909#comment-13911909
 ] 

Roger Hoover commented on KAFKA-260:


{quote}
It also makes false negatives possible since if you lose both normal messages 
and the associated audit messages it will appear that everything adds up. The 
latter problem is astronomically unlikely to happen exactly, though.
{quote}

This may be true once messages have reached a broker.  However, if a producer 
process were to be killed (say by SIGKILL), both its committed messages and the 
audit data would be lost.  Would it make sense to persist the audit counts to 
the file system for producers so that they could potentially be recovered?

 Add audit trail to kafka
 

 Key: KAFKA-260
 URL: https://issues.apache.org/jira/browse/KAFKA-260
 Project: Kafka
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Kreps
Assignee: Jay Kreps
 Attachments: Picture 18.png, kafka-audit-trail-draft.patch


 LinkedIn has a system that does monitoring on top of our data flow to ensure 
 all data is delivered to all consumers of data. This works by having each 
 logical tier through which data passes produce messages to a central 
 audit-trail topic; these messages give a time period and the number of 
 messages that passed through that tier in that time period. Example of tiers 
 for data might be producer, broker, hadoop-etl, etc. This makes it 
 possible to compare the total events for a given time period to ensure that 
 all events that are produced are consumed by all consumers.
 This turns out to be extremely useful. We also have an application that 
 balances the books and checks that all data is consumed in a timely 
 fashion. This gives graphs for each topic and shows any data loss and the lag 
 at which the data is consumed (if any).
 This would be an optional feature that would allow you to do this kind of 
 reconciliation automatically for all the topics kafka hosts against all the 
 tiers of applications that interact with the data.
 Some details, the proposed format of the data is JSON using the following 
 format for messages:
 {
   "time":1301727060032,  // the timestamp at which this audit message is sent
   "topic":"my_topic_name", // the topic this audit data is for
   "tier":"producer", // a user-defined tier name
   "bucket_start": 130172640, // the beginning of the time bucket this 
 data applies to
   "bucket_end": 130172700, // the end of the time bucket this data 
 applies to
   "host":"my_host_name.datacenter.linkedin.com", // the server that this was 
 sent from
   "datacenter":"hlx32", // the datacenter this occurred in
   "application":"newsfeed_service", // a user-defined application name
   "guid":"51656274-a86a-4dff-b824-8e8e20a6348f", // a unique identifier for 
 this message
   "count":43634
 }
 DISCUSSION
 Time is complex:
 1. The audit data must be based on a timestamp in the events, not the time on 
 the machine processing the event. Using this timestamp means that all downstream 
 consumers will report audit data on the right time bucket. This means that 
 there must be a timestamp in the event, which we don't currently require. 
 Arguably we should just add a timestamp to the events, but I think it is 
 sufficient for now just to allow the user to provide a function to extract 
 the time from their events.
 2. For counts to reconcile exactly we can only do analysis at a granularity 
 based on the least common multiple of the bucket size used by all tiers. The 
 simplest is just to configure them all to use the same bucket size. We 
 currently use a bucket size of 10 mins, but anything from 1-60 mins is 
 probably reasonable.
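The timestamp-extraction hook and fixed-size bucketing described above can be sketched as follows. This is a hedged illustration, not the patch's implementation; the JSON event shape and the `extract_timestamp` hook are assumptions for the example:

```python
import json

BUCKET_MS = 10 * 60 * 1000  # 10-minute buckets, as used at LinkedIn


def extract_timestamp(event_bytes):
    # Hypothetical user-supplied extractor: assumes events are JSON
    # objects carrying a millisecond "time" field.
    return json.loads(event_bytes)["time"]


def bucket_of(ts_ms, bucket_ms=BUCKET_MS):
    # Map an event timestamp to its (bucket_start, bucket_end) pair.
    start = ts_ms - ts_ms % bucket_ms
    return start, start + bucket_ms


start, end = bucket_of(extract_timestamp(b'{"time": 1301727060032}'))
```

Because every tier derives the bucket from the same in-event timestamp, all tiers count a given event in the same bucket regardless of when they process it.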
 For analysis purposes one tier is designated as the source tier and we do 
 reconciliation against this count (e.g. if another tier has less, that is 
 treated as lost, if another tier has more that is duplication).
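The reconciliation against the source tier can be sketched as below. A hedged sketch only, assuming the audit-message fields proposed above; the real analysis application is a Java/Jetty app backed by MySQL:

```python
from collections import defaultdict


def reconcile(audit_messages, source_tier="producer"):
    # Sum audited counts per (bucket, tier), then compare every other
    # tier against the designated source tier for the same bucket.
    counts = defaultdict(int)
    for m in audit_messages:
        counts[(m["bucket_start"], m["tier"])] += m["count"]
    report = {}
    for (bucket, tier), got in list(counts.items()):
        if tier == source_tier:
            continue
        src = counts[(bucket, source_tier)]
        if got < src:
            report[(bucket, tier)] = ("lost", src - got)      # fewer than source: loss
        elif got > src:
            report[(bucket, tier)] = ("duplicated", got - src)  # more than source: duplication
    return report


msgs = [
    {"bucket_start": 0, "tier": "producer", "count": 100},
    {"bucket_start": 0, "tier": "hadoop-etl", "count": 97},
]
report = reconcile(msgs)  # {(0, "hadoop-etl"): ("lost", 3)}
```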
 Note that this system makes false positives possible since you can lose an 
 audit message. It also makes false negatives possible since if you lose both 
 normal messages and the associated audit messages it will appear that 
 everything adds up. The latter problem is astronomically unlikely to happen 
 exactly, though.
 This would integrate into the client (producer and consumer both) in the 
 following way:
 1. The user provides a way to get timestamps from messages (required)
 2. The user configures the tier name, host name, datacenter name, and 
 application name as part of the consumer and producer config. We can provide 
 reasonable defaults if not supplied (e.g. if it is a Producer then set tier 
 to producer and get the hostname from the OS).
 The application that processes this data is currently a Java Jetty app and 
 talks to mysql. It feeds off the audit topic in kafka and runs both automatic 
 monitoring 

[jira] [Updated] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname

2013-10-30 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-1092:


Status: Open  (was: Patch Available)

 Add server config parameter to separate bind address and ZK hostname
 

 Key: KAFKA-1092
 URL: https://issues.apache.org/jira/browse/KAFKA-1092
 Project: Kafka
  Issue Type: New Feature
  Components: config
Affects Versions: 0.8.1
Reporter: Roger Hoover
 Attachments: KAFKA-1092.patch, KAFKA-1092.patch


 Currently, in server.properties, you can configure host.name which gets used 
 for two purposes: 1) to bind the socket 2) to publish the broker details to 
 ZK for clients to use.
 There are times when these two settings need to be different.  Here's an 
 example.  I want to set up Kafka brokers on OpenStack virtual machines in a 
 private cloud but I need producers to connect from elsewhere on the internal 
 corporate network.  With OpenStack, the virtual machines are only exposed to 
 DHCP addresses (typically RFC 1918 private addresses).  You can assign 
 floating IPs to a virtual machine, but a floating IP is forwarded using Network 
 Address Translation and not exposed directly to the VM.  Also, there's typically 
 no DNS to provide hostname lookup.  Hosts have names like fubar.novalocal that 
 are not externally routable.
 Here's what I want.  I want the broker to bind to the VM's private network IP 
 but I want it to publish its floating IP to ZooKeeper so that producers can 
 publish to it.
 I propose a new optional parameter, listen, which would allow you to 
 specify the socket address to listen on.  If not set, the parameter would 
 default to host.name, which is the current behavior.
 #Publish the externally routable IP in ZK
 host.name = floating ip
 #Accept connections from any interface the VM knows about
 listen = *
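The separation being proposed — bind to one address, tell clients another — can be sketched with plain sockets. A hedged illustration only; the host values are hypothetical and dict-style config stands in for server.properties:

```python
import socket

BIND_HOST = ""                   # "" behaves like the proposed listen = * (all interfaces)
ADVERTISED_HOST = "203.0.113.7"  # e.g. an OpenStack floating IP, NATed to the VM

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind((BIND_HOST, 0))         # port 0: let the OS choose a free port
srv.listen(1)
port = srv.getsockname()[1]

# What the broker would publish (e.g. to ZooKeeper) for clients to dial:
advertised = (ADVERTISED_HOST, port)
srv.close()
```

The socket never knows or cares about the advertised address; only the metadata handed to clients does, which is why the two settings can diverge safely.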



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname

2013-10-30 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-1092:


Status: Patch Available  (was: Open)

 Add server config parameter to separate bind address and ZK hostname
 

 Key: KAFKA-1092
 URL: https://issues.apache.org/jira/browse/KAFKA-1092
 Project: Kafka
  Issue Type: New Feature
  Components: config
Affects Versions: 0.8.1
Reporter: Roger Hoover
 Attachments: KAFKA-1092.patch, KAFKA-1092.patch




[jira] [Updated] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname

2013-10-30 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-1092:


Attachment: KAFKA-1092.patch

Updated parameter names to advertised.host.name and advertised.port and 
fixed broken ProducerTest

 Add server config parameter to separate bind address and ZK hostname
 

 Key: KAFKA-1092
 URL: https://issues.apache.org/jira/browse/KAFKA-1092
 Project: Kafka
  Issue Type: New Feature
  Components: config
Affects Versions: 0.8.1
Reporter: Roger Hoover
 Attachments: KAFKA-1092.patch, KAFKA-1092.patch




[jira] [Commented] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname

2013-10-30 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809668#comment-13809668
 ] 

Roger Hoover commented on KAFKA-1092:
-

The reason kafka.producer.ProducerTest was broken was that the test created an 
anonymous subclass of KafkaConfig and overrode a couple of members.  I don't 
know Scala that well but discovered that statically initialized variables that 
depend on other statically initialized variables behave unexpectedly.  To fix 
this, I just set properties in the test and do not subclass KafkaConfig, as 
users would do.

{code}
class A() {
 val x = 1
 val y = x
}

class B() extends A {
 override val x = 2
}

val b = new B()
b.y //this is 0, not 1 or 2
{code}
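The same class of initialization-order surprise exists outside Scala. A hedged Python analogue of the trap (not the Kafka code itself): a derived value is captured in the base constructor before the subclass has finished overriding:

```python
class A:
    def __init__(self):
        self.x = 1
        self.y = self.x  # y copies x here, before the subclass runs


class B(A):
    def __init__(self):
        super().__init__()
        self.x = 2       # too late: y already captured the old x


b = B()
# b.x == 2, but b.y == 1
```

In the Scala case it is even sharper: during A's constructor the overridden `val x` is still at its JVM default (0), so `y` ends up 0, which is why setting config properties directly rather than subclassing KafkaConfig avoids the problem.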

 Add server config parameter to separate bind address and ZK hostname
 

 Key: KAFKA-1092
 URL: https://issues.apache.org/jira/browse/KAFKA-1092
 Project: Kafka
  Issue Type: New Feature
  Components: config
Affects Versions: 0.8.1
Reporter: Roger Hoover
 Attachments: KAFKA-1092.patch, KAFKA-1092.patch




[jira] [Comment Edited] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname

2013-10-30 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809668#comment-13809668
 ] 

Roger Hoover edited comment on KAFKA-1092 at 10/30/13 9:52 PM:
---

The reason kafka.producer.ProducerTest was broken was that the test created an 
anonymous subclass of KafkaConfig and overrode a couple of members.  I don't 
know Scala that well but discovered that statically initialized variables that 
depend on other statically initialized variables behave unexpectedly.  To fix 
this, I just set properties in the test and do not subclass KafkaConfig, as 
users would do.

{code}
class A() {
 val x = 1
 val y = x
}

class B() extends A {
 override val x = 2
}

val b = new B()
b.y //this is 0, not 1 or 2
{code}


was (Author: theduderog):
The reason the kafka.producer.ProducerTest was broken was because that test was 
creating an anonymous subclass of KafkaConfig and overriding a couple of 
members.  I don't know Scala that well but discovered that statically 
initialized variables that depend on other statically initialized variables 
behave unexpectedly.  To fix this, I just set properties in the test and do not 
subclass KafkaConfig, as users would do.

class A() {
 val x = 1
 val y = x
}

class B() extends A {
 override val x = 2
}

val b = new B()
b.y //this is 0, not 1 or 2

 Add server config parameter to separate bind address and ZK hostname
 

 Key: KAFKA-1092
 URL: https://issues.apache.org/jira/browse/KAFKA-1092
 Project: Kafka
  Issue Type: New Feature
  Components: config
Affects Versions: 0.8.1
Reporter: Roger Hoover
 Attachments: KAFKA-1092.patch, KAFKA-1092.patch




[jira] [Updated] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname

2013-10-28 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-1092:


Status: Patch Available  (was: Open)

Added two new configuration parameters:

# Hostname the broker will advertise to producers and consumers. If not set,
# it uses the value for host.name if configured. Otherwise, it will use the
# value returned from java.net.InetAddress.getCanonicalHostName().
#advertise.host.name=hostname routable by clients

# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertise.port=port accessible by clients
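The fallback chain these two parameters describe can be sketched as below. A hedged sketch under stated assumptions: the config keys mirror the comment above, a dict stands in for the broker's real config parsing, and `socket.getfqdn()` plays the role of Java's `InetAddress.getCanonicalHostName()`:

```python
import socket


def advertised_endpoint(config, bound_port):
    # advertise.host.name -> host.name -> canonical hostname from the OS
    host = (config.get("advertise.host.name")
            or config.get("host.name")
            or socket.getfqdn())
    # advertise.port -> the port the broker actually bound
    port = config.get("advertise.port", bound_port)
    return host, port


# With neither advertise.* key set, behavior is unchanged from before:
advertised_endpoint({"host.name": "10.0.0.5"}, 9092)  # ("10.0.0.5", 9092)
```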

 Add server config parameter to separate bind address and ZK hostname
 

 Key: KAFKA-1092
 URL: https://issues.apache.org/jira/browse/KAFKA-1092
 Project: Kafka
  Issue Type: New Feature
  Components: config
Affects Versions: 0.8.1
Reporter: Roger Hoover
 Attachments: KAFKA-1092.patch




[jira] [Updated] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname

2013-10-28 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-1092:


Attachment: KAFKA-1092.patch

 Add server config parameter to separate bind address and ZK hostname
 

 Key: KAFKA-1092
 URL: https://issues.apache.org/jira/browse/KAFKA-1092
 Project: Kafka
  Issue Type: New Feature
  Components: config
Affects Versions: 0.8.1
Reporter: Roger Hoover
 Attachments: KAFKA-1092.patch




[jira] [Comment Edited] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname

2013-10-28 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807251#comment-13807251
 ] 

Roger Hoover edited comment on KAFKA-1092 at 10/28/13 9:31 PM:
---

Added two new configuration parameters:

advertise.host.name - Hostname the broker will advertise to producers and 
consumers. If not set, it uses the value for host.name if configured.  
Otherwise, it will use the value returned from 
java.net.InetAddress.getCanonicalHostName().

advertise.port - The port to publish to ZooKeeper for clients to use. If this 
is not set, it will publish the same port that the broker binds to.


was (Author: theduderog):
Added two new configuration parameters:

# Hostname the broker will advertise to producers and consumers. If not set, it 
uses the
# value for host.name if configured.  Otherwise, it will use the value 
returned from
# java.net.InetAddress.getCanonicalHostName().
#advertise.host.name=hostname routable by clients

# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertise.port=port accessible by clients

 Add server config parameter to separate bind address and ZK hostname
 

 Key: KAFKA-1092
 URL: https://issues.apache.org/jira/browse/KAFKA-1092
 Project: Kafka
  Issue Type: New Feature
  Components: config
Affects Versions: 0.8.1
Reporter: Roger Hoover
 Attachments: KAFKA-1092.patch




[jira] [Commented] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname

2013-10-28 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807258#comment-13807258
 ] 

Roger Hoover commented on KAFKA-1092:
-

Note: this change is backward compatible.  Not specifying the two new optional 
parameters will yield the same behavior as before.

 Add server config parameter to separate bind address and ZK hostname
 

 Key: KAFKA-1092
 URL: https://issues.apache.org/jira/browse/KAFKA-1092
 Project: Kafka
  Issue Type: New Feature
  Components: config
Affects Versions: 0.8.1
Reporter: Roger Hoover
 Attachments: KAFKA-1092.patch




[jira] [Created] (KAFKA-1092) Add server config parameter to separate bind address and ZK hostname

2013-10-17 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-1092:
---

 Summary: Add server config parameter to separate bind address and 
ZK hostname
 Key: KAFKA-1092
 URL: https://issues.apache.org/jira/browse/KAFKA-1092
 Project: Kafka
  Issue Type: New Feature
  Components: config
Affects Versions: 0.8.1
Reporter: Roger Hoover




[jira] [Commented] (KAFKA-982) Logo for Kafka

2013-07-22 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13715691#comment-13715691
 ] 

Roger Hoover commented on KAFKA-982:


+1 for 298

 Logo for Kafka
 --

 Key: KAFKA-982
 URL: https://issues.apache.org/jira/browse/KAFKA-982
 Project: Kafka
  Issue Type: Improvement
Reporter: Jay Kreps
 Attachments: 289.jpeg, 294.jpeg, 296.png, 298.jpeg


 We should have a logo for kafka.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira