[jira] [Commented] (KAFKA-5663) LogDirFailureTest system test fails
[ https://issues.apache.org/jira/browse/KAFKA-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127744#comment-16127744 ] Dong Lin commented on KAFKA-5663: - [~cmccabe] It appears that the bug in ConsoleConsumer has been hot-fixed by Jason yesterday. The test should pass now. > LogDirFailureTest system test fails > --- > > Key: KAFKA-5663 > URL: https://issues.apache.org/jira/browse/KAFKA-5663 > Project: Kafka > Issue Type: Bug >Reporter: Apurva Mehta >Assignee: Dong Lin > Fix For: 1.0.0 > > > The recently added JBOD system test failed last night. > {noformat} > Producer failed to produce messages for 20s. > Traceback (most recent call last): > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 123, in run > data = self.run_test() > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 176, in run_test > return self.test_context.function(self.test) > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", > line 321, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/core/log_dir_failure_test.py", > line 166, in test_replication_with_disk_failure > self.start_producer_and_consumer() > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 75, in start_producer_and_consumer > self.producer_start_timeout_sec) > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", > line 36, in wait_until > raise TimeoutError(err_msg) > TimeoutError: Producer failed to produce messages for 20s. > {noformat} > Complete logs here: > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-07-26--001.1501074756--apache--trunk--91c207c/LogDirFailureTest/test_replication_with_disk_failure/bounce_broker=False.security_protocol=PLAINTEXT.broker_type=follower/48.tgz -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5663) LogDirFailureTest system test fails
[ https://issues.apache.org/jira/browse/KAFKA-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126753#comment-16126753 ] Dong Lin commented on KAFKA-5663: - [~cmccabe] I just looked into the issue. It seems that the error is not related the to log directory failure handling. The test failed at line 119 of log_dir_failure.py, which is before the test tries to log directory unavailable. The test failed because consumer failed to start. According to ConsoleConsumer-0-139748592136016/worker7/console_consumer.log, the consumer failed to start due to "kafka.common.InvalidConfigException: Wrong value earliest of auto.offset.reset in ConsumerConfig; Valid values are smallest and largest". I looked into the python code and the log to understand why "auto.offset.reset" is configured to be "earliest". However, the code suggests that this should not happen. This error should consistently cause the test to fail. I tried to verify this but https://jenkins.confluent.io/job/system-test-kafka-branch-builder is not working. I tried to test this locally but for some reason vagrant fails to work... I will try again tomorrow. Can you tell me how to find out the git hash in the log you provided? Also, does this test fail consistently on your side? Thanks, > LogDirFailureTest system test fails > --- > > Key: KAFKA-5663 > URL: https://issues.apache.org/jira/browse/KAFKA-5663 > Project: Kafka > Issue Type: Bug >Reporter: Apurva Mehta >Assignee: Dong Lin > Fix For: 1.0.0 > > > The recently added JBOD system test failed last night. > {noformat} > Producer failed to produce messages for 20s. > Traceback (most recent call last): > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 123, in run > data = self.run_test() > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 176, in run_test > return self.test_context.function(self.test) > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", > line 321, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/core/log_dir_failure_test.py", > line 166, in test_replication_with_disk_failure > self.start_producer_and_consumer() > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 75, in start_producer_and_consumer > self.producer_start_timeout_sec) > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", > line 36, in wait_until > raise TimeoutError(err_msg) > TimeoutError: Producer failed to produce messages for 20s. > {noformat} > Complete logs here: > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-07-26--001.1501074756--apache--trunk--91c207c/LogDirFailureTest/test_replication_with_disk_failure/bounce_broker=False.security_protocol=PLAINTEXT.broker_type=follower/48.tgz -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5663) LogDirFailureTest system test fails
[ https://issues.apache.org/jira/browse/KAFKA-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122654#comment-16122654 ] Dong Lin commented on KAFKA-5663: - [~cmccabe] Sorry about it. Thanks for providing the log. I will look into this soon. > LogDirFailureTest system test fails > --- > > Key: KAFKA-5663 > URL: https://issues.apache.org/jira/browse/KAFKA-5663 > Project: Kafka > Issue Type: Bug >Reporter: Apurva Mehta >Assignee: Dong Lin > Fix For: 1.0.0 > > > The recently added JBOD system test failed last night. > {noformat} > Producer failed to produce messages for 20s. > Traceback (most recent call last): > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 123, in run > data = self.run_test() > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 176, in run_test > return self.test_context.function(self.test) > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", > line 321, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/core/log_dir_failure_test.py", > line 166, in test_replication_with_disk_failure > self.start_producer_and_consumer() > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 75, in start_producer_and_consumer > self.producer_start_timeout_sec) > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", > line 36, in wait_until > raise TimeoutError(err_msg) > TimeoutError: Producer failed to produce messages for 20s. > {noformat} > Complete logs here: > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-07-26--001.1501074756--apache--trunk--91c207c/LogDirFailureTest/test_replication_with_disk_failure/bounce_broker=False.security_protocol=PLAINTEXT.broker_type=follower/48.tgz -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5710) KafkaAdminClient should remove inflight call correctly after response is received
Dong Lin created KAFKA-5710: --- Summary: KafkaAdminClient should remove inflight call correctly after response is received Key: KAFKA-5710 URL: https://issues.apache.org/jira/browse/KAFKA-5710 Project: Kafka Issue Type: Bug Reporter: Dong Lin Assignee: Dong Lin -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5695) Test DeleteRecordsRequest in AuthorizerIntegrationTest
Dong Lin created KAFKA-5695: --- Summary: Test DeleteRecordsRequest in AuthorizerIntegrationTest Key: KAFKA-5695 URL: https://issues.apache.org/jira/browse/KAFKA-5695 Project: Kafka Issue Type: Improvement Reporter: Dong Lin Assignee: Dong Lin -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5694) Add ChangeReplicaDirRequest and DescribeReplicaDirRequest (KIP-113)
Dong Lin created KAFKA-5694: --- Summary: Add ChangeReplicaDirRequest and DescribeReplicaDirRequest (KIP-113) Key: KAFKA-5694 URL: https://issues.apache.org/jira/browse/KAFKA-5694 Project: Kafka Issue Type: New Feature Reporter: Dong Lin Assignee: Dong Lin -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5663) LogDirFailureTest system test fails
[ https://issues.apache.org/jira/browse/KAFKA-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102403#comment-16102403 ] Dong Lin commented on KAFKA-5663: - Thanks [~apurva]. I will look into this. > LogDirFailureTest system test fails > --- > > Key: KAFKA-5663 > URL: https://issues.apache.org/jira/browse/KAFKA-5663 > Project: Kafka > Issue Type: Bug >Reporter: Apurva Mehta >Assignee: Dong Lin > > The recently added JBOD system test failed last night. > {noformat} > Producer failed to produce messages for 20s. > Traceback (most recent call last): > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 123, in run > data = self.run_test() > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", > line 176, in run_test > return self.test_context.function(self.test) > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", > line 321, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/core/log_dir_failure_test.py", > line 166, in test_replication_with_disk_failure > self.start_producer_and_consumer() > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 75, in start_producer_and_consumer > self.producer_start_timeout_sec) > File > "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", > line 36, in wait_until > raise TimeoutError(err_msg) > TimeoutError: Producer failed to produce messages for 20s. > {noformat} > Complete logs here: > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-07-26--001.1501074756--apache--trunk--91c207c/LogDirFailureTest/test_replication_with_disk_failure/bounce_broker=False.security_protocol=PLAINTEXT.broker_type=follower/48.tgz -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-5627) Reduce classes needed for LeaderAndIsrPartitionState and MetadataPartitionState
[ https://issues.apache.org/jira/browse/KAFKA-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin reassigned KAFKA-5627: --- Assignee: Dong Lin > Reduce classes needed for LeaderAndIsrPartitionState and > MetadataPartitionState > --- > > Key: KAFKA-5627 > URL: https://issues.apache.org/jira/browse/KAFKA-5627 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Dong Lin > > It will be cleaner to replace LeaderAndIsrPartitionState and > MetadataPartitionState in LeaderAndIsr.scala with > org.apache.kafka.common.requests.PartitionState and > UpdateMetadataRequest.PartitionState respectively. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5542) Improve Java doc for LeaderEpochFileCache.endOffsetFor()
Dong Lin created KAFKA-5542: --- Summary: Improve Java doc for LeaderEpochFileCache.endOffsetFor() Key: KAFKA-5542 URL: https://issues.apache.org/jira/browse/KAFKA-5542 Project: Kafka Issue Type: Bug Reporter: Dong Lin Assignee: Ben Stopford -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5163) Support replicas movement between log directories (KIP-113)
[ https://issues.apache.org/jira/browse/KAFKA-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069085#comment-16069085 ] Dong Lin commented on KAFKA-5163: - [~yuanjiali] ReplicaMoveThread will not grab lock for the entire period of movement. The data movement involves repeated FetchRequest and ProduceRequest until the new log catches up with the exiting log. Why would message be lost? > Support replicas movement between log directories (KIP-113) > --- > > Key: KAFKA-5163 > URL: https://issues.apache.org/jira/browse/KAFKA-5163 > Project: Kafka > Issue Type: Bug >Reporter: Dong Lin >Assignee: Dong Lin > > See > https://cwiki.apache.org/confluence/display/KAFKA/KIP-113%3A+Support+replicas+movement+between+log+directories > for motivation and design. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5521) Support replicas movement between log directories (KIP-113)
Dong Lin created KAFKA-5521: --- Summary: Support replicas movement between log directories (KIP-113) Key: KAFKA-5521 URL: https://issues.apache.org/jira/browse/KAFKA-5521 Project: Kafka Issue Type: Bug Reporter: Dong Lin Assignee: Dong Lin -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (KAFKA-5367) Producer should not expiry topic from metadata cache if accumulator still has data for this topic
[ https://issues.apache.org/jira/browse/KAFKA-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin resolved KAFKA-5367. - Resolution: Invalid > Producer should not expiry topic from metadata cache if accumulator still has > data for this topic > - > > Key: KAFKA-5367 > URL: https://issues.apache.org/jira/browse/KAFKA-5367 > Project: Kafka > Issue Type: Bug >Reporter: Dong Lin >Assignee: Dong Lin > > To be added. -- This message was sent by Atlassian JIRA (v6.4.14#64029)