[jira] [Commented] (KAFKA-1954) Speed Up The Unit Tests
[ https://issues.apache.org/jira/browse/KAFKA-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507561#comment-15507561 ] Balint Molnar commented on KAFKA-1954: -- Thanks, [~ijuma]. I realized that nearly every test case recreates the server infrastructure (Kafka/ZooKeeper) before it runs, even when that isn't needed, so first I would like to refactor the classes to restart the infrastructure only as often as required.

> Speed Up The Unit Tests
> ---
>
> Key: KAFKA-1954
> URL: https://issues.apache.org/jira/browse/KAFKA-1954
> Project: Kafka
> Issue Type: Improvement
> Reporter: Jay Kreps
> Assignee: Sriharsha Chintalapani
> Labels: newbie++
> Attachments: KAFKA-1954.patch
>
>
> The server unit tests are pretty slow. They take about 8m40s on my machine.
> Combined with slow Scala compile time this is kind of painful.
> Almost all of this time comes from the integration tests which start one or
> more brokers and then shut them down.
> Our finding has been that these integration tests are actually quite useful,
> so we probably can't just get rid of them.
> Here are some times:
> Zk startup: 100ms
> Kafka server startup: 600ms
> Kafka server shutdown: 500ms
>
> So you can see that an integration test suite with 10 tests that starts and
> stops a 3-node cluster for each test will take ~34 seconds even if the tests
> themselves are instantaneous.
> I think the best solution to this is to get the test harness classes in shape
> and then performance tune them a bit, as this would potentially speed
> everything up. There are several test harness classes:
> - ZooKeeperTestHarness
> - KafkaServerTestHarness
> - ProducerConsumerTestHarness
> - IntegrationTestHarness (similar to ProducerConsumerTestHarness but using
> new clients)
> Unfortunately tests often don't use the right harness; they use a
> lower-level harness than they should and manually create stuff. Usually the
> cause of this is that the harness is missing some feature.
> I think the right thing to do here is
> 1. Get the tests converted to the best possible harness. If you are testing
> producers and consumers then you should use the harness that creates all that
> and shuts it down for you.
> 2. Optimize the harnesses to be faster.
> How can we optimize the harnesses? I'm not sure; I would solicit ideas. Here
> are a few:
> 1. It's worth analyzing the logging to see what is taking up time in the
> startup and shutdown.
> 2. There may be things like controlled shutdown that we can disable (since we
> are anyway going to discard the brokers after shutdown).
> 3. The harnesses could probably start all the servers and all the clients in
> parallel.
> 4. We may be able to tune down the resource usage in the server config for
> test cases a bit.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
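The ~34 second estimate in the description is the quoted timings added up: each test pays one ZooKeeper startup plus three broker startups and three broker shutdowns, 100 + 3 × (600 + 500) = 3,400 ms, and ten tests pay that ten times. The Python sketch below is only a cost model (the function names are hypothetical, and it is not Kafka's Scala harness). It reproduces that arithmetic and compares it with sharing one cluster across a whole test class, which is roughly the saving Balint's refactoring aims for, assuming the tests in a class can safely share a cluster.

```python
# Timings quoted in the issue description, in milliseconds.
ZK_STARTUP_MS = 100
BROKER_STARTUP_MS = 600
BROKER_SHUTDOWN_MS = 500


def cluster_cycle_ms(brokers: int) -> int:
    """Cost of one start/stop cycle: one ZooKeeper plus `brokers` brokers,
    handled one after another. ZooKeeper shutdown is not counted, matching
    the estimate in the description."""
    return ZK_STARTUP_MS + brokers * (BROKER_STARTUP_MS + BROKER_SHUTDOWN_MS)


def suite_overhead_ms(tests: int, brokers: int, per_test: bool) -> int:
    """Harness overhead for a suite whose tests themselves take no time.

    per_test=True  -> the cluster is rebuilt for every test
    per_test=False -> one cluster is shared by every test in the class
    """
    cycles = tests if per_test else 1
    return cycles * cluster_cycle_ms(brokers)


if __name__ == "__main__":
    print(suite_overhead_ms(10, 3, per_test=True))   # 34000 ms, the ~34 s in the description
    print(suite_overhead_ms(10, 3, per_test=False))  # 3400 ms with one shared cluster
```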
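Ideas 2 and 4 in the description (disable controlled shutdown, tune down server resources) amount to a set of test-only broker overrides. The sketch below shows one way a harness might layer such overrides on top of the per-broker settings. The property keys are real Kafka broker settings, and several of them appear in the startup log later in this thread. The helper names and the chosen values are hypothetical, and whether these values actually shorten startup or shutdown is an assumption that would have to be measured.

```python
from typing import Dict, Optional

# Hypothetical test-only overrides. The keys are real broker settings; the
# values are guesses aimed at cheaper, throwaway brokers.
TEST_BROKER_OVERRIDES: Dict[str, str] = {
    "controlled.shutdown.enable": "false",   # idea 2: brokers are discarded anyway
    "num.network.threads": "1",              # idea 4: fewer threads per test broker
    "num.io.threads": "1",
    "num.recovery.threads.per.data.dir": "1",
    "log.cleaner.enable": "false",
}


def make_test_broker_config(broker_id: int, port: int, zk_connect: str,
                            extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build properties for a throwaway test broker (hypothetical helper)."""
    props = {
        "broker.id": str(broker_id),
        "port": str(port),
        "zookeeper.connect": zk_connect,
    }
    props.update(TEST_BROKER_OVERRIDES)
    if extra:
        props.update(extra)  # a test that needs a setting back can still override it
    return props


def to_properties(props: Dict[str, str]) -> str:
    """Render the settings in Java .properties syntax, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))
```

Keeping the overrides in one shared table means a test that specifically exercises controlled shutdown can opt back in through `extra`, instead of every test building its own configuration by hand.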
[jira] [Commented] (KAFKA-1954) Speed Up The Unit Tests
[ https://issues.apache.org/jira/browse/KAFKA-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507511#comment-15507511 ] Ismael Juma commented on KAFKA-1954: It's worth mentioning that we use one Gradle fork per core by default when running the test suite, so adding parallelism at the individual test level will help less in that case. However, it will still help when running tests individually or when the fork count is overridden (we set it to 1 in Jenkins for better test stability).
[jira] [Commented] (KAFKA-1954) Speed Up The Unit Tests
[ https://issues.apache.org/jira/browse/KAFKA-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507444#comment-15507444 ] Sriharsha Chintalapani commented on KAFKA-1954: --- [~baluchicken] feel free to take it over.
[jira] [Commented] (KAFKA-1954) Speed Up The Unit Tests
[ https://issues.apache.org/jira/browse/KAFKA-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507430#comment-15507430 ] Balint Molnar commented on KAFKA-1954: -- [~sriharsha] if you are not working on this, do you mind if I give it a try?
[jira] [Commented] (KAFKA-1954) Speed Up The Unit Tests
[ https://issues.apache.org/jira/browse/KAFKA-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14396385#comment-14396385 ] Sriharsha Chintalapani commented on KAFKA-1954: --- Created reviewboard https://reviews.apache.org/r/32866/diff/ against branch origin/trunk
[jira] [Commented] (KAFKA-1954) Speed Up The Unit Tests
[ https://issues.apache.org/jira/browse/KAFKA-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14407411#comment-14407411 ] Sriharsha Chintalapani commented on KAFKA-1954: --- [~jkreps] Added a parallel start to KafkaServerTestHarness and made almost all test cases use it instead of starting the servers inside the tests. The above patch reduced the test runtime from 8 min 40 s to 6 min 9 s on my machine.
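Sriharsha's change starts the KafkaServerTestHarness brokers in parallel, which is idea 3 in the description. The patch itself is Scala and is not reproduced here. The Python sketch below only illustrates the technique, using a hypothetical `FakeBroker` whose start and stop simply sleep for the 600 ms and 500 ms quoted in the issue. When all three brokers are started and stopped concurrently, one cluster cycle costs about one startup plus one shutdown instead of three of each.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class FakeBroker:
    """Hypothetical stand-in for a broker: startup/shutdown only sleep for
    the times quoted in the issue (600 ms and 500 ms)."""

    def __init__(self, broker_id: int, startup_s: float = 0.6, shutdown_s: float = 0.5):
        self.broker_id = broker_id
        self.startup_s = startup_s
        self.shutdown_s = shutdown_s
        self.running = False

    def startup(self) -> None:
        time.sleep(self.startup_s)
        self.running = True

    def shutdown(self) -> None:
        time.sleep(self.shutdown_s)
        self.running = False


def _run_concurrently(brokers: List[FakeBroker], action: Callable[[FakeBroker], None]) -> None:
    """Apply `action` to every broker at once and wait for all of them.

    Consuming the map() iterator re-raises any exception from a worker here,
    so a broker that fails to start is not silently ignored.
    """
    if not brokers:  # ThreadPoolExecutor rejects max_workers=0
        return
    with ThreadPoolExecutor(max_workers=len(brokers)) as pool:
        list(pool.map(action, brokers))


def start_all(brokers: List[FakeBroker]) -> None:
    _run_concurrently(brokers, lambda b: b.startup())


def stop_all(brokers: List[FakeBroker]) -> None:
    _run_concurrently(brokers, lambda b: b.shutdown())


if __name__ == "__main__":
    cluster = [FakeBroker(i) for i in range(3)]
    t0 = time.monotonic()
    start_all(cluster)
    stop_all(cluster)
    # Roughly 0.6 + 0.5 s, versus 3 * (0.6 + 0.5) s when brokers are cycled one by one.
    print(f"cycle took {time.monotonic() - t0:.2f} s")
```

In a real harness, each broker would also need its own port and log directory, and ZooKeeper would have to be running before any broker starts, because brokers connect to it during startup.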
[jira] [Commented] (KAFKA-1954) Speed Up The Unit Tests
[ https://issues.apache.org/jira/browse/KAFKA-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321668#comment-14321668 ] Jay Kreps commented on KAFKA-1954: -- Some data:

Here are the slow tests:

||Class||Tests||Duration (s)||
|kafka.integration.UncleanLeaderElectionTest|5|52.848|
|kafka.api.ConsumerTest|9|49.334|
|kafka.api.test.ProducerFailureHandlingTest|11|49.213|
|kafka.admin.DeleteTopicTest|9|48.783|
|kafka.integration.RollingBounceTest|1|48.382|
|kafka.consumer.ZookeeperConsumerConnectorTest|6|29.052|
|kafka.admin.AddPartitionsTest|5|26.375|
|kafka.admin.AdminTest|12|24.464|
|kafka.admin.DeleteConsumerGroupTest|7|22.477|
|kafka.server.LogRecoveryTest|4|21.186|
|kafka.server.AdvertiseBrokerTest|1|16.004|
|kafka.producer.ProducerTest|5|15.201|
|kafka.integration.PrimitiveApiTest|8|12.037|
|kafka.api.test.ProducerSendTest|5|10.104|
|kafka.integration.AutoOffsetResetTest|4|9.035|
|kafka.producer.SyncProducerTest|8|8.344|
|kafka.server.ServerGenerateBrokerIdTest|4|7.54|
|kafka.server.OffsetCommitTest|3|7.422|
|kafka.producer.AsyncProducerTest|13|6.781|
|unit.kafka.consumer.PartitionAssignorTest|2|6.429|
|kafka.server.LeaderElectionTest|2|5.677|
|kafka.server.LogOffsetTest|5|5.013|
|kafka.integration.TopicMetadataTest|4|4.956|
|kafka.server.ServerShutdownTest|4|4.885|
|kafka.api.test.ProducerCompressionTest|4|4.77|
|kafka.consumer.MetricsTest|1|3.072|
|kafka.consumer.ConsumerIteratorTest|2|2.49|
|kafka.server.ReplicaFetchTest|1|2.467|
|kafka.javaapi.consumer.ZookeeperConsumerConnectorTest|1|2.066|
|kafka.server.DynamicConfigChangeTest|2|1.892|
|kafka.log4j.KafkaLog4jAppenderTest|2|1.881|
|kafka.log.LogManagerTest|10|1.865|
|kafka.integration.FetcherTest|1|1.235|
|kafka.server.ReplicaManagerTest|3|1.229|

Here is the server startup and shutdown logging to get a sense of timings:
{noformat}
[2015-02-14 11:07:58,350] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,392] INFO Property broker.id is overridden to 0 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,392] INFO Property log.cleaner.enable is overridden to false (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,392] INFO Property log.dirs is overridden to /tmp/kafka-logs (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,392] INFO Property log.retention.check.interval.ms is overridden to 30 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,392] INFO Property log.retention.hours is overridden to 168 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,392] INFO Property log.segment.bytes is overridden to 1073741824 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,392] INFO Property num.io.threads is overridden to 8 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,393] INFO Property num.network.threads is overridden to 3 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,393] INFO Property num.partitions is overridden to 1 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,393] INFO Property num.recovery.threads.per.data.dir is overridden to 1 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,393] INFO Property port is overridden to 9092 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,393] INFO Property socket.receive.buffer.bytes is overridden to 102400 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,393] INFO Property socket.request.max.bytes is overridden to 104857600 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,393] INFO Property socket.send.buffer.bytes is overridden to 102400 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,394] INFO Property zookeeper.connect is overridden to localhost:2181 (kafka.utils.VerifiableProperties)
[2015-02-14 11:07:58,394] INFO Property zookeeper.connection.timeout.ms is overridden to 6000 (kafka.utils.VerifiableProperties)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/jay/work/kafka/core/build/dependant-libs-2.10.4/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/jay/work/kafka/core/build/dependant-libs-2.10.4/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2015-02-14 11:07:58,438] INFO starting (kafka.server.KafkaServer)
[2015-02-14 11:07:58,441] INFO Connecting to zookeeper on localhost:2181 (kafka.server.KafkaServer)
[2015-02-14 11:07:58,453] INFO Starting ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2015-02-14 11:07:58,462] INFO Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT (org.apache.zookeeper.ZooKeeper)
[2015-02-14 11:07:58,462] INFO Client environment:host.name=10.0.0.248 (org.apache.zookeeper.ZooKeeper)
[2015-02-14 11:07:58,462] INFO Client