[jira] [Commented] (KAFKA-736) Add an option to the 0.8 producer to mimic 0.7 producer behavior
[ https://issues.apache.org/jira/browse/KAFKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584827#comment-13584827 ]

Jun Rao commented on KAFKA-736:
-------------------------------

Thanks for patch v4. Looks good. A few minor comments. Once they are addressed, the patch can be checked in.

40. SocketServerTest.testPipelinedRequestOrdering(): id, send, id2, send2 are not referenced.
41. SyncProducerTest.testProducerCanTimeout(): Should we remove the println?
42. PrimitiveApiTest: testPipelinedProduceRequests seems to fail. I suspect that it's timing related. Also, produceList is not referenced.

Add an option to the 0.8 producer to mimic 0.7 producer behavior
----------------------------------------------------------------

Key: KAFKA-736
URL: https://issues.apache.org/jira/browse/KAFKA-736
Project: Kafka
Issue Type: Improvement
Components: producer
Affects Versions: 0.8
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Blocker
Labels: p1, replication-performance
Attachments: check-message-ordering.py, kafka-736-draft.patch, kafka-736-draft-producer-latency-20threads-acks1.out, kafka-736-v1.patch, kafka-736-v2.patch, kafka-736-v3.patch, kafka-736-v3-producer-latency-20threads-acks1.out, kafka-736-v4.patch
Original Estimate: 24h
Remaining Estimate: 24h

I profiled a producer throughput benchmark between a producer and a remote broker. It turns out that the background send thread spends ~97% of its time waiting to read the acknowledgement from the broker. I propose we change the current behavior of request.required.acks=0 to mean no acknowledgement from the broker. This will mimic the 0.7 producer behavior and will enable tuning the producer for very high throughput.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
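For readers unfamiliar with the setting named in the issue description, the following is a minimal sketch of how a client might select the proposed no-acknowledgement mode. Only request.required.acks comes from the issue text. The class name, the helper methods, and the metadata.broker.list property name are assumptions made for illustration, and the sketch uses plain java.util.Properties rather than any Kafka class.

```java
import java.util.Properties;

class AckConfigSketch {
    // Builds producer properties with the requested acknowledgement level.
    // "metadata.broker.list" is an assumed property name for the broker address list.
    static Properties producerProps(String brokers, int requiredAcks) {
        Properties props = new Properties();
        props.setProperty("metadata.broker.list", brokers);
        // Setting taken from the issue description: 0 is proposed to mean
        // "do not wait for any acknowledgement from the broker".
        props.setProperty("request.required.acks", Integer.toString(requiredAcks));
        return props;
    }

    // True when the properties ask for the proposed no-acknowledgement mode.
    static boolean isFireAndForget(Properties props) {
        return "0".equals(props.getProperty("request.required.acks"));
    }
}
```

With acks set to 0 the send thread would no longer block on reading a response, which is the wait the profile in the description attributes ~97% of the time to.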
[jira] [Commented] (KAFKA-736) Add an option to the 0.8 producer to mimic 0.7 producer behavior
[ https://issues.apache.org/jira/browse/KAFKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584903#comment-13584903 ]

Sriram Subramanian commented on KAFKA-736:
------------------------------------------

+1
[jira] [Commented] (KAFKA-736) Add an option to the 0.8 producer to mimic 0.7 producer behavior
[ https://issues.apache.org/jira/browse/KAFKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577800#comment-13577800 ]

Jun Rao commented on KAFKA-736:
-------------------------------

Because of the problem identified in KAFKA-706, only the v3 patch will work. The v3 patch needs to be rebased though. Also, not sure why it includes changes related to the Hadoop bridge. Is this patch mixed with another one?
[jira] [Commented] (KAFKA-736) Add an option to the 0.8 producer to mimic 0.7 producer behavior
[ https://issues.apache.org/jira/browse/KAFKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571485#comment-13571485 ]

Neha Narkhede commented on KAFKA-736:
-------------------------------------

Producer performance does not measure latency directly, but we expose a JMX bean that measures latency and request rate. I ran a 20 thread producer performance test with acks = 1, sent 500 messages, and the following are the latency numbers (avg and max). The full distribution of latency over the period of the test is attached here.

v3:    avg = 0.8452, max = 593.7651
draft: avg = 0.8059, max = 537.3038
[jira] [Commented] (KAFKA-736) Add an option to the 0.8 producer to mimic 0.7 producer behavior
[ https://issues.apache.org/jira/browse/KAFKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569828#comment-13569828 ]

Jun Rao commented on KAFKA-736:
-------------------------------

Thanks for the results. The case with 20 producer threads is pretty interesting. How many partitions are there in the topic? If there is only 1 partition, all I/O threads will need to synchronize on the same log during append. So, this will serialize all I/O threads and therefore the reading of the next produce request. If this is the case, I'd suggest that we try with more partitions in the topic, with something like 20 network threads.
[jira] [Commented] (KAFKA-736) Add an option to the 0.8 producer to mimic 0.7 producer behavior
[ https://issues.apache.org/jira/browse/KAFKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569840#comment-13569840 ]

Neha Narkhede commented on KAFKA-736:
-------------------------------------

In all the tests above, the number of partitions is 1. I think that to compare the 2 patches, we don't need to worry about changing the # of partitions, simply because there is no change in the Log layer in either patch. But, out of curiosity, I can give that a try.
[jira] [Commented] (KAFKA-736) Add an option to the 0.8 producer to mimic 0.7 producer behavior
[ https://issues.apache.org/jira/browse/KAFKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569670#comment-13569670 ]

Neha Narkhede commented on KAFKA-736:
-------------------------------------

Benchmarked the draft and v2 patches for producer throughput; here are the results. Message size is 1K in all the tests.

batch size 1, producer threads 1
  kafka-736-v2    - 13 MB/s
  kafka-736-draft - 30 MB/s

batch size 100, producer threads 1
  kafka-736-v2    - 48.4 MB/s
  kafka-736-draft - 61.5 MB/s

batch size 100, producer threads 20
  kafka-736-v2    - 11.6 MB/s
  kafka-736-draft - 81.6 MB/s

I looked into the cause of this performance degradation on the v2 patch. What's happening is that setting the selection key's interest bits to READ in processNewResponses is not reflected in the following select() operation for all BUT the first network thread (id 0). I tried the producer performance test with varying # of producer threads and network threads on the server and I consistently see this result. Due to this, all the producer connections handled by network threads with ids >= 1 see very low throughput, since the next request is not read until 300 ms after the previous request has finished processing. I also confirmed that the producer had sent a lot of data on those low throughput connections; the server was just reading it 300 ms later.

I read up a little bit about concurrency and selection keys and found this:

"Generally, SelectionKey objects are thread-safe, but it's important to know that operations that modify the interest set are synchronized by Selector objects. This could cause calls to the interestOps( ) method to block for an indeterminate amount of time. The specific locking policy used by a selector, such as whether the locks are held throughout the selection process, is implementation-dependent."

Overall, it seems like Java NIO doesn't behave the way we want with respect to having the updated interest bits take effect in the next select operation. This makes the v2 approach even trickier to reason about.
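The interest-set behaviour described in this comment can be explored in isolation with a small program. The sketch below is not taken from any of the patches. It registers a pipe's read end with no interest, lets a selector thread block in select(timeout), and then sets OP_READ from another thread. Calling Selector.wakeup() after the change is a standard NIO call that forces the blocked select to return so the new interest set is picked up on the next pass. Whether any of the KAFKA-736 patches use wakeup() is not stated in the thread, and the class and method names here are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

class InterestOpsSketch {
    // Returns the milliseconds between changing the interest set on one thread
    // and the selector thread reporting the key as selected.
    static long millisUntilReadable(boolean wakeup, long selectTimeoutMs) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            // Register with an empty interest set: nothing is selected yet.
            SelectionKey key = pipe.source().register(selector, 0);
            // Data is already waiting, mimicking a client that has sent its next request.
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));

            long[] seenAt = new long[1];
            Thread selectorThread = new Thread(() -> {
                try {
                    // Keep selecting until the key shows up in the selected set.
                    while (!selector.selectedKeys().contains(key)) {
                        selector.select(selectTimeoutMs);
                    }
                    seenAt[0] = System.nanoTime();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            selectorThread.start();
            Thread.sleep(100); // give the selector thread time to block in select()

            long changedAt = System.nanoTime();
            key.interestOps(SelectionKey.OP_READ); // interest changed from another thread
            if (wakeup) {
                selector.wakeup(); // make the in-progress select() return promptly
            }
            selectorThread.join();
            pipe.sink().close();
            pipe.source().close();
            return Math.max(0, (seenAt[0] - changedAt) / 1_000_000);
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling millisUntilReadable(true, 2000) should return well under the select timeout. Running it with wakeup set to false shows how long the change takes to be noticed on a given JVM and platform, which, as the quoted passage says, is implementation-dependent.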
[jira] [Commented] (KAFKA-736) Add an option to the 0.8 producer to mimic 0.7 producer behavior
[ https://issues.apache.org/jira/browse/KAFKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569224#comment-13569224 ]

Jay Kreps commented on KAFKA-736:
---------------------------------

Reviewing the draft patch in case we end up going that route. It might be better to do the assignment to the queue inside the request channel rather than in the socket server read method. key.hashCode % numQueues should be a fine way to go. I think the selector keys have no hashCode implementation, so this effectively uses the object memory address, which should be plenty fast.
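The queue assignment suggested in this comment can be sketched as follows. The class and method names are hypothetical and the code is not from the draft patch. One detail worth noting: on the JVM the % operator returns a negative result for a negative left operand, and a hash code is not guaranteed to be non-negative in general, so the sketch uses Math.floorMod to keep the index within [0, numQueues).

```java
class QueueAssigner {
    // Maps a key to a stable queue index in [0, numQueues).
    // The same key always lands on the same queue, which preserves
    // per-connection ordering when each queue is drained in order.
    static int queueFor(Object key, int numQueues) {
        if (numQueues <= 0) {
            throw new IllegalArgumentException("numQueues must be positive");
        }
        // floorMod avoids the negative index that key.hashCode() % numQueues
        // would produce for a negative hash code.
        return Math.floorMod(key.hashCode(), numQueues);
    }
}
```

For a key whose hashCode() is not overridden, the value comes from Object.hashCode(), so the mapping stays fixed for the lifetime of the key object.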
[jira] [Commented] (KAFKA-736) Add an option to the 0.8 producer to mimic 0.7 producer behavior
[ https://issues.apache.org/jira/browse/KAFKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569294#comment-13569294 ]

ben fleis commented on KAFKA-736:
---------------------------------

Went back and tested the 'draft' patch, and this failed in the same way that 706 was previously failing. I have not looked at the patches at all, merely blindly applied. Perhaps I made an error when applying/testing the v2 patch, although I believe I ran the same steps... YMMV. In any case, the draft patch has reliably failed 3x in a row, and a la 706, tcpdump confirms the lost messages.
[jira] [Commented] (KAFKA-736) Add an option to the 0.8 producer to mimic 0.7 producer behavior
[ https://issues.apache.org/jira/browse/KAFKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568192#comment-13568192 ]

ben fleis commented on KAFKA-736:
---------------------------------

I tried the v2 patch linked above this morning against v0.8 HEAD. It appears (after full throttle testing for over an hour) to have eliminated the problem originally reported in KAFKA-706.
[jira] [Commented] (KAFKA-736) Add an option to the 0.8 producer to mimic 0.7 producer behavior
[ https://issues.apache.org/jira/browse/KAFKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566883#comment-13566883 ]

Jay Kreps commented on KAFKA-736:
---------------------------------

1. It would be good to remove getShutdownReceive() too. I think that method is totally unnecessary.
2. It would be good to remove getEmptyResponse and just have KafkaApis enqueue the request object with the send null. The Scala style isn't to start with getXXX, and in any case I don't think this is an operation that our request channel would do.
3. Yes, just changing the parameter name for doSend was all I had in mind.
4. I'm not sure I understand the perf numbers, will swing by.
[jira] [Commented] (KAFKA-736) Add an option to the 0.8 producer to mimic 0.7 producer behavior
[ https://issues.apache.org/jira/browse/KAFKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566215#comment-13566215 ]

Jay Kreps commented on KAFKA-736:
---------------------------------

This is basically good, but the comments and naming are all just carrying through specifics of producer behavior into the network layer, which is a no-no. Instead, think of this as a general feature you are implementing:

1. server enqueues one request at a time
2. null as the response send indicates no response

This is a nice feature for our socket server to have.

Specifically:

1. RequestChannel.getFakeProducerResponse? Let's not add this as a public method on the thing that handles request queueing to our network server. That doesn't seem like part of the contract of a request queue. Can you remove that and the getShutdownReceive producer request that Victor seems to have hacked in? The socket server's idea of a request is just a byte buffer. There shouldn't be any notion of producers or anything like that.
2. SocketServer.processNewResponses(): the actual logic here is good; using a null send is a very logical way to say no response. Please remove the comment about producers and num.acks and just describe the feature you have implemented: null responseSend means no response to send. This is just part of the contract of the socket server.
3. SocketServer.processNewResponses(): fix the misformatted else statement.
4. SyncProducer.send: this is an API change; is that going to break anything? Can we just have it return null?
5. SyncProducer.doSend: this sends a generic request; you can't add numacks since acks are specific to ProducerRequest.
6. Not sure if I get why we need to override request.required.acks. If we want to override, is there a generic place to do that instead of in each test?
7. It would be good to add a unit test for one-way requests in the socket server.
8. It would be good to add a unit test for the producer num.acks=0 feature.

We also should do a quick perf test on your machine to assess the impact of only reading one request at a time (if any).
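The contract proposed in this review (a null response send means there is nothing to write back) can be modelled with a short sketch. Everything below is hypothetical and is not the SocketServer code from the patch. It only illustrates the decision a processor might make per queued response: resume reading on the connection when the send is null, otherwise switch to writing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class OneWaySketch {
    // A queued response for one connection. The network layer knows nothing
    // about producers or acks; it only sees whether there are bytes to send.
    static final class Response {
        final int connectionId;
        final byte[] send; // null means "no response to send"

        Response(int connectionId, byte[] send) {
            this.connectionId = connectionId;
            this.send = send;
        }
    }

    // Drains the queue and records, per connection, what the processor would
    // do next: READ the following request, or WRITE the response first.
    static List<String> processNewResponses(Queue<Response> responses) {
        List<String> actions = new ArrayList<>();
        Response r;
        while ((r = responses.poll()) != null) {
            if (r.send == null) {
                actions.add(r.connectionId + ":READ");
            } else {
                actions.add(r.connectionId + ":WRITE");
            }
        }
        return actions;
    }
}
```

Keeping the check on the payload alone is what lets the feature stay generic, in line with the point that the request queue should not expose producer-specific methods.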