[jira] [Created] (KAFKA-5465) FetchResponse v0 does not return any messages when max_bytes smaller than v2 message set
Dana Powers created KAFKA-5465: -- Summary: FetchResponse v0 does not return any messages when max_bytes smaller than v2 message set Key: KAFKA-5465 URL: https://issues.apache.org/jira/browse/KAFKA-5465 Project: Kafka Issue Type: Bug Affects Versions: 0.11.0.0 Reporter: Dana Powers Priority: Minor In prior releases, when consuming uncompressed messages FetchResponse v0 will return a message if it is smaller than the max_bytes sent in the FetchRequest. In 0.11.0.0 RC0, when messages are stored as v2 internally, the response will be empty unless the full message set is smaller than max_bytes. In some configurations, this may cause some old consumers to get stuck on large messages where previously they were able to make progress one message at a time. For example, when I produce 10 5KB messages using ProduceRequest v0 and then attempt FetchRequest v0 with partition max bytes = 6KB (larger than a single message but smaller than all 10 messages together), I get an empty message set from 0.11.0.0. Previous brokers would have returned a single message. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
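The size arithmetic in the report can be sketched in Python (illustrative only, not broker code; function names are invented for this sketch, and the 5KB/6KB numbers come from the example above):

```python
# Illustrative sketch (not broker code) of the behavior change described above.

def old_v0_fetch(message_sizes, max_bytes):
    """Pre-0.11 brokers: raw log bytes are copied up to max_bytes, so the
    response can end with a truncated (partial) message.
    Returns (complete_messages, bytes_returned)."""
    returned = 0
    complete = 0
    for size in message_sizes:
        if returned + size <= max_bytes:
            returned += size
            complete += 1
        else:
            returned = max_bytes  # remainder is a partial message
            break
    return complete, returned

def new_v0_fetch_from_v2_log(batch_size, max_bytes):
    """0.11.0.0 RC0: messages stored as one v2 record batch are down-converted
    only as a whole, so a batch larger than max_bytes yields an empty set."""
    return batch_size if batch_size <= max_bytes else 0

messages = [5 * 1024] * 10   # ten 5KB messages, as in the example above
print(old_v0_fetch(messages, 6 * 1024))       # one whole message plus a partial
print(new_v0_fetch_from_v2_log(sum(messages), 6 * 1024))  # 0: empty response
```

Old consumers could make progress one message at a time; the whole-batch behavior returns nothing until max_bytes covers the full set.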
[jira] [Commented] (KAFKA-3878) Support exponential backoff for broker reconnect attempts
[ https://issues.apache.org/jira/browse/KAFKA-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339941#comment-15339941 ] Dana Powers commented on KAFKA-3878: Proposed implementation at https://github.com/apache/kafka/pull/1523 > Support exponential backoff for broker reconnect attempts > - > > Key: KAFKA-3878 > URL: https://issues.apache.org/jira/browse/KAFKA-3878 > Project: Kafka > Issue Type: Improvement > Components: clients, network >Reporter: Dana Powers > > The client currently uses a constant backoff policy, configured via > 'reconnect.backoff.ms' . To reduce network load during longer broker outages, > it would be useful to support an optional exponentially increasing backoff > policy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3878) Support exponential backoff for broker reconnect attempts
Dana Powers created KAFKA-3878: -- Summary: Support exponential backoff for broker reconnect attempts Key: KAFKA-3878 URL: https://issues.apache.org/jira/browse/KAFKA-3878 Project: Kafka Issue Type: Improvement Components: clients, network Reporter: Dana Powers The client currently uses a constant backoff policy, configured via 'reconnect.backoff.ms' . To reduce network load during longer broker outages, it would be useful to support an optional exponentially increasing backoff policy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
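A sketch of what an exponential policy could look like (parameter names and defaults here are illustrative, not actual Kafka config keys):

```python
import random

def reconnect_backoff_ms(attempt, base_ms=50, max_ms=1000):
    """Exponential backoff sketch: double the wait per failed attempt, cap it,
    and add jitter so many clients don't reconnect in lockstep.
    base_ms/max_ms are illustrative defaults, not Kafka's config values."""
    backoff = min(base_ms * (2 ** attempt), max_ms)
    return backoff * random.uniform(0.8, 1.2)  # +/- 20% jitter

# attempts 0..5 -> roughly 50, 100, 200, 400, 800, 1000 ms (before jitter)
for attempt in range(6):
    print(round(reconnect_backoff_ms(attempt)))
```

The cap keeps a long outage from producing absurd waits, and the jitter avoids synchronized reconnect storms across many clients.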
[jira] [Commented] (KAFKA-3615) Exclude test jars in CLASSPATH of kafka-run-class.sh
[ https://issues.apache.org/jira/browse/KAFKA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265561#comment-15265561 ] Dana Powers commented on KAFKA-3615: This PR has a bug that breaks the classpath setup for bin scripts in the rc2 release. Should we reopen this and follow up, or open a new issue? > Exclude test jars in CLASSPATH of kafka-run-class.sh > > > Key: KAFKA-3615 > URL: https://issues.apache.org/jira/browse/KAFKA-3615 > Project: Kafka > Issue Type: Improvement > Components: admin, build >Affects Versions: 0.10.0.0 >Reporter: Liquan Pei >Assignee: Liquan Pei > Labels: newbie > Fix For: 0.10.1.0, 0.10.0.0 > > Original Estimate: 24h > Remaining Estimate: 24h > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3550) Broker does not honor MetadataRequest api version; always returns v0 MetadataResponse
Dana Powers created KAFKA-3550: -- Summary: Broker does not honor MetadataRequest api version; always returns v0 MetadataResponse Key: KAFKA-3550 URL: https://issues.apache.org/jira/browse/KAFKA-3550 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.9.0.1, 0.8.2.2, 0.8.1.1, 0.8.0 Reporter: Dana Powers To reproduce: Send a MetadataRequest (api key 3) with incorrect api version (e.g., 1234). The expected behavior is for the broker to reject the request as unrecognized. Broker (incorrectly) responds with MetadataResponse v0. The problem here is that any request for a "new" MetadataRequest (i.e., KIP-4) sent to an old broker will generate an incorrect response. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
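The missing server-side check amounts to a range test on the requested api version before dispatching. A hypothetical sketch (the version table and function name are invented for illustration):

```python
# Hypothetical sketch of the version validation the broker should perform.
# api_key 3 is Metadata; supporting only v0 mirrors the reproduction above.
SUPPORTED_VERSIONS = {3: range(0, 1)}

def validate_request(api_key, api_version):
    """Reject a request whose api_version is outside the supported range,
    instead of silently answering with a v0 response."""
    if api_version not in SUPPORTED_VERSIONS.get(api_key, range(0)):
        raise ValueError(f"unsupported version {api_version} for api_key {api_key}")
    return True

validate_request(3, 0)       # accepted
# validate_request(3, 1234)  # would raise, matching the expected behavior above
```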
[jira] [Commented] (KAFKA-3160) Kafka LZ4 framing code miscalculates header checksum
[ https://issues.apache.org/jira/browse/KAFKA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234221#comment-15234221 ] Dana Powers commented on KAFKA-3160: Magnus: have you made any progress on this? The more I think about it, the more I think this needs to get included w/ KIP-31. If the goal of KIP-31 is to avoid recompression, and the goal of this JIRA is to fix the compression format, and in all cases we need to maintain compatibility with old clients, then I think the only way to solve all conditions is to make the pre-KIP-31 FetchRequest / ProduceRequest versions use the broken LZ4 format, and require the fixed format in the new FetchRequest / ProduceRequest version: Old 0.8/0.9 clients (current behavior): produce messages w/ broken checksum; consume messages w/ incorrect checksum only New 0.10 clients (proposed behavior): produce messages in "new KIP-31 format" w/ correct checksum; consume messages in "new KIP-31 format" w/ correct checksum only Proposed behavior for 0.10 broker: - convert all "old format" messages to "new KIP-31 format" + fix checksum to correct value - require incoming "new KIP-31 format" messages to have correct checksum, otherwise throw error - when serving requests for "old format", fixup checksum to be incorrect when converting "new KIP-31 format" messages to old format Thoughts? > Kafka LZ4 framing code miscalculates header checksum > > > Key: KAFKA-3160 > URL: https://issues.apache.org/jira/browse/KAFKA-3160 > Project: Kafka > Issue Type: Bug > Components: compression >Affects Versions: 0.8.2.0, 0.8.2.1, 0.9.0.0, 0.8.2.2, 0.9.0.1 >Reporter: Dana Powers >Assignee: Magnus Edenhill > Labels: compatibility, compression, lz4 > > KAFKA-1493 partially implements the LZ4 framing specification, but it > incorrectly calculates the header checksum. This causes > KafkaLZ4BlockInputStream to raise an error > [IOException(DESCRIPTOR_HASH_MISMATCH)] if a client sends *correctly* framed > LZ4 data. 
It also causes KafkaLZ4BlockOutputStream to generate incorrectly > framed LZ4 data, which means clients decoding LZ4 messages from kafka will > always receive incorrectly framed data. > Specifically, the current implementation includes the 4-byte MagicNumber in > the checksum, which is incorrect. > http://cyan4973.github.io/lz4/lz4_Frame_format.html > Third-party clients that attempt to use off-the-shelf lz4 framing find that > brokers reject messages as having a corrupt checksum. So currently non-java > clients must 'fixup' lz4 packets to deal with the broken checksum. > Magnus first identified this issue in librdkafka; kafka-python has the same > problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3160) Kafka LZ4 framing code miscalculates header checksum
[ https://issues.apache.org/jira/browse/KAFKA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dana Powers updated KAFKA-3160: --- Description: KAFKA-1493 partially implements the LZ4 framing specification, but it incorrectly calculates the header checksum. This causes KafkaLZ4BlockInputStream to raise an error [IOException(DESCRIPTOR_HASH_MISMATCH)] if a client sends *correctly* framed LZ4 data. It also causes KafkaLZ4BlockOutputStream to generate incorrectly framed LZ4 data, which means clients decoding LZ4 messages from kafka will always receive incorrectly framed data. Specifically, the current implementation includes the 4-byte MagicNumber in the checksum, which is incorrect. http://cyan4973.github.io/lz4/lz4_Frame_format.html Third-party clients that attempt to use off-the-shelf lz4 framing find that brokers reject messages as having a corrupt checksum. So currently non-java clients must 'fixup' lz4 packets to deal with the broken checksum. Magnus first identified this issue in librdkafka; kafka-python has the same problem. was: KAFKA-1493 partially implements the LZ4 framing specification, but it incorrectly calculates the header checksum. This causes KafkaLZ4BlockInputStream to raise an error [IOException(DESCRIPTOR_HASH_MISMATCH)] if a client sends *correctly* framed LZ4 data. It also causes the kafka broker to always return incorrectly framed LZ4 data to clients. Specifically, the current implementation includes the 4-byte MagicNumber in the checksum, which is incorrect. http://cyan4973.github.io/lz4/lz4_Frame_format.html Third-party clients that attempt to use off-the-shelf lz4 framing find that brokers reject messages as having a corrupt checksum. So currently non-java clients must 'fixup' lz4 packets to deal with the broken checksum. Magnus first identified this issue in librdkafka; kafka-python has the same problem. 
> Kafka LZ4 framing code miscalculates header checksum > > > Key: KAFKA-3160 > URL: https://issues.apache.org/jira/browse/KAFKA-3160 > Project: Kafka > Issue Type: Bug > Components: compression >Affects Versions: 0.8.2.0, 0.8.2.1, 0.9.0.0, 0.8.2.2, 0.9.0.1 >Reporter: Dana Powers >Assignee: Magnus Edenhill > Labels: compatibility, compression, lz4 > > KAFKA-1493 partially implements the LZ4 framing specification, but it > incorrectly calculates the header checksum. This causes > KafkaLZ4BlockInputStream to raise an error > [IOException(DESCRIPTOR_HASH_MISMATCH)] if a client sends *correctly* framed > LZ4 data. It also causes KafkaLZ4BlockOutputStream to generate incorrectly > framed LZ4 data, which means clients decoding LZ4 messages from kafka will > always receive incorrectly framed data. > Specifically, the current implementation includes the 4-byte MagicNumber in > the checksum, which is incorrect. > http://cyan4973.github.io/lz4/lz4_Frame_format.html > Third-party clients that attempt to use off-the-shelf lz4 framing find that > brokers reject messages as having a corrupt checksum. So currently non-java > clients must 'fixup' lz4 packets to deal with the broken checksum. > Magnus first identified this issue in librdkafka; kafka-python has the same > problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
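The framing bug is easiest to see in terms of which bytes feed the hash. The sketch below uses zlib.crc32 as a stand-in for the xxhash32 the LZ4 spec actually requires (xxhash is not in the Python stdlib); only the difference in byte ranges matters here, and the FLG/BD values are example bytes, not from a real message:

```python
import zlib

MAGIC = b"\x04\x22\x4d\x18"   # LZ4 frame MagicNumber
FLG_BD = b"\x60\x40"          # frame descriptor: FLG + BD bytes (example values)

def stand_in_hash(data):
    # Stand-in for the spec's (xxh32(data, seed=0) >> 8) & 0xff; only the
    # byte range being hashed is the point of this sketch.
    return (zlib.crc32(data) >> 8) & 0xFF

frame_header = MAGIC + FLG_BD

# Spec-correct HC: hash only the frame descriptor, i.e. the bytes AFTER magic
correct_hc = stand_in_hash(frame_header[4:])

# Broken pre-fix Kafka behavior: the 4 magic bytes were included in the hash
broken_hc = stand_in_hash(frame_header)

print(correct_hc, broken_hc)  # hash inputs differ by the 4 magic bytes
```

This is why off-the-shelf lz4 framing is rejected by the broker, and why non-java clients must munge the HC byte both when producing and consuming.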
[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes
[ https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208721#comment-15208721 ] Dana Powers commented on KAFKA-3442: +1. verified test passes on trunk commit `7af67ce` > FetchResponse size exceeds max.partition.fetch.bytes > > > Key: KAFKA-3442 > URL: https://issues.apache.org/jira/browse/KAFKA-3442 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0 >Reporter: Dana Powers >Assignee: Jiangjie Qin >Priority: Blocker > Fix For: 0.10.0.0 > > > Produce 1 byte message to topic foobar > Fetch foobar w/ max.partition.fetch.bytes=1024 > Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass > this test, but 0.10 FetchResponse has full message, exceeding the max > specified in the FetchRequest. > I tested with v0 and v1 apis, both fail. Have not tested w/ v2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes
[ https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207142#comment-15207142 ] Dana Powers commented on KAFKA-3442: sounds good to me! > FetchResponse size exceeds max.partition.fetch.bytes > > > Key: KAFKA-3442 > URL: https://issues.apache.org/jira/browse/KAFKA-3442 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0 >Reporter: Dana Powers >Assignee: Jiangjie Qin >Priority: Blocker > Fix For: 0.10.0.0 > > > Produce 1 byte message to topic foobar > Fetch foobar w/ max.partition.fetch.bytes=1024 > Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass > this test, but 0.10 FetchResponse has full message, exceeding the max > specified in the FetchRequest. > I tested with v0 and v1 apis, both fail. Have not tested w/ v2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes
[ https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206876#comment-15206876 ] Dana Powers commented on KAFKA-3442: [~junrao] Actually I meant adding the error code to v0 and v1 responses. I think that would be helpful if the response behavior changes to no longer include partial messages. But if behavior doesn't change for v0/v1 then agree it is probably better to defer to future release. > FetchResponse size exceeds max.partition.fetch.bytes > > > Key: KAFKA-3442 > URL: https://issues.apache.org/jira/browse/KAFKA-3442 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0 >Reporter: Dana Powers >Assignee: Jiangjie Qin >Priority: Blocker > Fix For: 0.10.0.0 > > > Produce 1 byte message to topic foobar > Fetch foobar w/ max.partition.fetch.bytes=1024 > Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass > this test, but 0.10 FetchResponse has full message, exceeding the max > specified in the FetchRequest. > I tested with v0 and v1 apis, both fail. Have not tested w/ v2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes
[ https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206790#comment-15206790 ] Dana Powers commented on KAFKA-3442: yes, kafka-python uses the same check for partial messages as the java client and will raise a RecordTooLargeException to user if there is only a partial message. I think it would be best if the 0.10 broker continued to return partial messages, but since the protocol spec refers to this behavior as an optimization then I think it would be acceptable to change the behavior and return empty payload. Would it be possible to include an error_code with the empty payload? > FetchResponse size exceeds max.partition.fetch.bytes > > > Key: KAFKA-3442 > URL: https://issues.apache.org/jira/browse/KAFKA-3442 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0 >Reporter: Dana Powers >Assignee: Jiangjie Qin >Priority: Blocker > Fix For: 0.10.0.0 > > > Produce 1 byte message to topic foobar > Fetch foobar w/ max.partition.fetch.bytes=1024 > Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass > this test, but 0.10 FetchResponse has full message, exceeding the max > specified in the FetchRequest. > I tested with v0 and v1 apis, both fail. Have not tested w/ v2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes
[ https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205655#comment-15205655 ] Dana Powers commented on KAFKA-3442: In all prior broker releases clients check the response for a "partial" message -- i.e. the MessageSetSize is less than the MessageSize. This follows this statement in the protocol wiki: "As an optimization the server is allowed to return a partial message at the end of the message set. Clients should handle this case." So in this case the client is checking for a partial message, not an error code. > FetchResponse size exceeds max.partition.fetch.bytes > > > Key: KAFKA-3442 > URL: https://issues.apache.org/jira/browse/KAFKA-3442 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0 >Reporter: Dana Powers >Assignee: Jiangjie Qin >Priority: Blocker > Fix For: 0.10.0.0 > > > Produce 1 byte message to topic foobar > Fetch foobar w/ max.partition.fetch.bytes=1024 > Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass > this test, but 0.10 FetchResponse has full message, exceeding the max > specified in the FetchRequest. > I tested with v0 and v1 apis, both fail. Have not tested w/ v2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes
[ https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205411#comment-15205411 ] Dana Powers edited comment on KAFKA-3442 at 3/22/16 12:13 AM: --
{code}
# clone kafka-python repo
git clone https://github.com/dpkp/kafka-python.git

# install kafka fixture binaries
tar xzvf kafka_2.10-0.10.0.0.tgz -C servers/0.10.0.0/
mv servers/0.10.0.0/kafka_2.10-0.10.0.0 servers/0.10.0.0/kafka-bin

# you can install other versions of kafka to compare test results
KAFKA_VERSION=0.9.0.1 ./build_integration.sh

# install python test harness
pip install tox

# run just the failing test [replace py27 w/ py## as needed: py26,py27,py33,py34,py35]
KAFKA_VERSION=0.10.0.0 tox -e py27 test/test_consumer_integration.py::TestConsumerIntegration::test_huge_messages
{code}
was (Author: dana.powers): # clone kafka-python repo git clone https://github.com/dpkp/kafka-python.git # install kafka fixture binaries tar xzvf kafka_2.10-0.10.0.0.tgz -C servers/0.10.0.0/ mv servers/0.10.0.0/kafka_2.10-0.10.0.0 servers/0.10.0.0/kafka-bin # you can install other versions of kafka to compare test results KAFKA_VERSION=0.9.0.1 ./build_integration.sh # install python test harness pip install tox # run just the failing test [replace py27 w/ py## as needed: py26,py27,py33,py34,py35] KAFKA_VERSION=0.10.0.0 tox -e py27 test/test_consumer_integration.py::TestConsumerIntegration::test_huge_messages > FetchResponse size exceeds max.partition.fetch.bytes > > > Key: KAFKA-3442 > URL: https://issues.apache.org/jira/browse/KAFKA-3442 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0 >Reporter: Dana Powers >Priority: Blocker > Fix For: 0.10.0.0 > > > Produce 1 byte message to topic foobar > Fetch foobar w/ max.partition.fetch.bytes=1024 > Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass > this test, but 0.10 FetchResponse has full message, exceeding the max > specified in the FetchRequest.
> I tested with v0 and v1 apis, both fail. Have not tested w/ v2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes
[ https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205411#comment-15205411 ] Dana Powers commented on KAFKA-3442: # clone kafka-python repo git clone https://github.com/dpkp/kafka-python.git # install kafka fixture binaries tar xzvf kafka_2.10-0.10.0.0.tgz -C servers/0.10.0.0/ mv servers/0.10.0.0/kafka_2.10-0.10.0.0 servers/0.10.0.0/kafka-bin # you can install other versions of kafka to compare test results KAFKA_VERSION=0.9.0.1 ./build_integration.sh # install python test harness pip install tox # run just the failing test [replace py27 w/ py## as needed: py26,py27,py33,py34,py35] KAFKA_VERSION=0.10.0.0 tox -e py27 test/test_consumer_integration.py::TestConsumerIntegration::test_huge_messages > FetchResponse size exceeds max.partition.fetch.bytes > > > Key: KAFKA-3442 > URL: https://issues.apache.org/jira/browse/KAFKA-3442 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0 >Reporter: Dana Powers >Priority: Blocker > Fix For: 0.10.0.0 > > > Produce 1 byte message to topic foobar > Fetch foobar w/ max.partition.fetch.bytes=1024 > Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass > this test, but 0.10 FetchResponse has full message, exceeding the max > specified in the FetchRequest. > I tested with v0 and v1 apis, both fail. Have not tested w/ v2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes
[ https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dana Powers updated KAFKA-3442: --- Priority: Blocker (was: Major) > FetchResponse size exceeds max.partition.fetch.bytes > > > Key: KAFKA-3442 > URL: https://issues.apache.org/jira/browse/KAFKA-3442 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0 >Reporter: Dana Powers >Priority: Blocker > Fix For: 0.10.0.0 > > > Produce 1 byte message to topic foobar > Fetch foobar w/ max.partition.fetch.bytes=1024 > Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass > this test, but 0.10 FetchResponse has full message, exceeding the max > specified in the FetchRequest. > I tested with v0 and v1 apis, both fail. Have not tested w/ v2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes
Dana Powers created KAFKA-3442: -- Summary: FetchResponse size exceeds max.partition.fetch.bytes Key: KAFKA-3442 URL: https://issues.apache.org/jira/browse/KAFKA-3442 Project: Kafka Issue Type: Bug Affects Versions: 0.10.0.0 Reporter: Dana Powers Produce 1 byte message to topic foobar Fetch foobar w/ max.partition.fetch.bytes=1024 Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass this test, but 0.10 FetchResponse has full message, exceeding the max specified in the FetchRequest. I tested with v0 and v1 apis, both fail. Have not tested w/ v2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3200) MessageSet from broker seems invalid
[ https://issues.apache.org/jira/browse/KAFKA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131142#comment-15131142 ] Dana Powers commented on KAFKA-3200: the minimum Message payload size is 26 bytes (8 offset, 4 size, 14 for an 'empty' message), so generally I would break if there are less than 26 bytes left and then also break if the decoded size is larger than the remaining buffer. for reference, the code I wrote to handle message set decoding in kafka-python is here: https://github.com/dpkp/kafka-python/blob/master/kafka/protocol/message.py#L123-L158 > MessageSet from broker seems invalid > > > Key: KAFKA-3200 > URL: https://issues.apache.org/jira/browse/KAFKA-3200 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0 > Environment: Linux, running Oracle JVM 1.8 >Reporter: Rajiv Kurian > > I am writing a java consumer client for Kafka and using the protocol guide at > https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol > to parse buffers. I am currently running into a problem parsing certain > fetch responses. Many times it works fine but some other times it does not. > It might just be a bug with my implementation in which case I apologize. > My messages are uncompressed and exactly 23 bytes in length and has null > keys. So each Message in my MessageSet is exactly size 4 (crc) + > 1(magic_bytes) + 1 (attributes) + 4(key_num_bytes) + 0 (key is null) + > 4(num_value_bytes) + 23(value_bytes) = 37 bytes. > So each element of the MessageSet itself is exactly 37 (size of message) + 8 > (offset) + 4 (message_size) = 49 bytes. > In comparison an empty message set element should be of size 8 (offset) + 4 > (message_size) + 4 (crc) + 1(magic_bytes) + 1 (attributes) + 4(key_num_bytes) > + 0 (key is null) + 4(num_value_bytes) + 0(value is null) = 26 bytes > I occasionally receive a MessageSet which says size is 1000. A size of 1000 > is not divisible by my MessageSet element size which is 49 bytes. 
When I > parse such a message set I can actually read 20 of message set elements(49 > bytes) which is 980 bytes. I have 20 extra bytes to parse now which is > actually less than even an empty message (26 bytes). At this moment I don't > know how to parse the messages any more. > I will attach a file for a response that can actually cause me to run into > this problem as well as the sample ccde. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
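The decode loop described in the comment can be sketched as follows (simplified from the kafka-python implementation linked above; v0 message set framing with an 8-byte offset and 4-byte size, message bodies here are fake placeholder bytes):

```python
import struct

# 8 (offset) + 4 (size) + 14 (an 'empty' v0 message), per the comment above
MIN_MESSAGE_SET_ENTRY = 26

def decode_message_set(buf):
    """Return (messages, partial): messages as (offset, raw_bytes) tuples,
    partial=True when trailing bytes were a truncated message."""
    messages = []
    pos = 0
    partial = False
    while pos < len(buf):
        remaining = len(buf) - pos
        if remaining < MIN_MESSAGE_SET_ENTRY:
            partial = True   # too few bytes left for even an empty message
            break
        offset, size = struct.unpack_from(">qi", buf, pos)
        if size > remaining - 12:
            partial = True   # declared size overruns the buffer: truncated
            break
        messages.append((offset, buf[pos + 12:pos + 12 + size]))
        pos += 12 + size
    return messages, partial

# Two fake 30-byte messages followed by a 20-byte truncated tail, mimicking
# the 1000-byte response with a leftover that is smaller than 26 bytes.
buf = b"".join(struct.pack(">qi", i, 30) + b"x" * 30 for i in range(2))
buf += b"\x00" * 20
print(decode_message_set(buf))  # 2 messages decoded, partial=True
```

Stopping on both conditions means the leftover 20 bytes in the reported case are treated as a partial trailing message rather than a parse error.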
[jira] [Commented] (KAFKA-3177) Kafka consumer can hang when position() is called on a non-existing partition.
[ https://issues.apache.org/jira/browse/KAFKA-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127701#comment-15127701 ] Dana Powers commented on KAFKA-3177: A similar infinite loop happens when the partition exists but has no leader b/c it is under-replicated. In that case, Fetcher.listOffset infinitely retries on the leaderNotAvailableError returned by sendListOffsetRequest. > Kafka consumer can hang when position() is called on a non-existing partition. > -- > > Key: KAFKA-3177 > URL: https://issues.apache.org/jira/browse/KAFKA-3177 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.0 >Reporter: Jiangjie Qin >Assignee: Jiangjie Qin > Fix For: 0.9.0.1 > > > This can be easily reproduced as following: > {code} > { > ... > consumer.assign(SomeNonExsitingTopicParition); > consumer.position(); > ... > } > {code} > It seems when position is called we will try to do the following: > 1. Fetch committed offsets. > 2. If there is no committed offsets, try to reset offset using reset > strategy. in sendListOffsetRequest(), if the consumer does not know the > TopicPartition, it will refresh its metadata and retry. In this case, because > the partition does not exist, we fall in to the infinite loop of refreshing > topic metadata. > Another orthogonal issue is that if the topic in the above code piece does > not exist, position() call will actually create the topic due to the fact > that currently topic metadata request could automatically create the topic. > This is a known separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
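A sketch of the bounded-retry alternative to the infinite loop (names are invented; Fetcher.listOffset is Java, this only shows the shape of a possible fix):

```python
import time

class LeaderNotAvailableError(Exception):
    """Stand-in for the retriable error returned by sendListOffsetRequest."""

def list_offset_with_bound(send_request, retries=5, backoff_s=0.1):
    """Bounded-retry sketch of what Fetcher.listOffset could do instead of
    retrying forever; send_request stands in for sendListOffsetRequest."""
    for attempt in range(retries):
        try:
            return send_request()
        except LeaderNotAvailableError:
            time.sleep(backoff_s * (attempt + 1))  # back off between retries
    raise TimeoutError(f"no leader found after {retries} attempts")

# Demo: a fake request that fails twice, then succeeds
state = {"calls": 0}
def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise LeaderNotAvailableError()
    return 42

print(list_offset_with_bound(flaky_request, backoff_s=0))  # prints 42
```

Surfacing a timeout after a bounded number of attempts lets callers distinguish "partition temporarily leaderless" from "partition does not exist".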
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119703#comment-15119703 ] Dana Powers commented on KAFKA-1493: Hi all - it appears that the header checksum (HC) byte is incorrect. The Kafka implementation hashes the magic bytes + header, but the spec is to only hash header (don't include magic). We are having some trouble encoding/decoding from non-java clients because the framing must be munged before reading / writing to kafka. Is this known? I don't see another JIRA for it. Should I file separately or should this be reopened? > Use a well-documented LZ4 compression format and remove redundant LZ4HC option > -- > > Key: KAFKA-1493 > URL: https://issues.apache.org/jira/browse/KAFKA-1493 > Project: Kafka > Issue Type: Improvement >Affects Versions: 0.8.2.0 >Reporter: James Oliver >Assignee: James Oliver >Priority: Blocker > Fix For: 0.8.2.0 > > Attachments: KAFKA-1493.patch, KAFKA-1493.patch, > KAFKA-1493_2014-10-16_13:49:34.patch, KAFKA-1493_2014-10-16_21:25:23.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120072#comment-15120072 ] Dana Powers commented on KAFKA-1493: filed KAFKA-3160 > Use a well-documented LZ4 compression format and remove redundant LZ4HC option > -- > > Key: KAFKA-1493 > URL: https://issues.apache.org/jira/browse/KAFKA-1493 > Project: Kafka > Issue Type: Improvement >Affects Versions: 0.8.2.0 >Reporter: James Oliver >Assignee: James Oliver >Priority: Blocker > Fix For: 0.8.2.0 > > Attachments: KAFKA-1493.patch, KAFKA-1493.patch, > KAFKA-1493_2014-10-16_13:49:34.patch, KAFKA-1493_2014-10-16_21:25:23.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3160) Kafka LZ4 framing code miscalculates header checksum
Dana Powers created KAFKA-3160: -- Summary: Kafka LZ4 framing code miscalculates header checksum Key: KAFKA-3160 URL: https://issues.apache.org/jira/browse/KAFKA-3160 Project: Kafka Issue Type: Bug Components: compression Affects Versions: 0.9.0.0, 0.8.2.1, 0.8.2.0, 0.8.2.2 Reporter: Dana Powers KAFKA-1493 implements the LZ4 framing specification, but it incorrectly calculates the header checksum. Specifically, the current implementation includes the 4-byte MagicNumber in the checksum, which is incorrect. http://cyan4973.github.io/lz4/lz4_Frame_format.html Third-party clients that attempt to use off-the-shelf lz4 framing find that brokers reject messages as having a corrupt checksum. So currently non-java clients must 'fixup' lz4 packets to deal with the broken checksum. Magnus first identified this issue in librdkafka; kafka-python has the same problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3088) 0.9.0.0 broker crash on receipt of produce request with empty client ID
[ https://issues.apache.org/jira/browse/KAFKA-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111020#comment-15111020 ] Dana Powers commented on KAFKA-3088: But client ids are not globally unique, even in the java implementation, right? Start 2 consumers in different processes / different servers, and you'll get two identical client-ids (consumer-1) as I understand that code. Also note that a user-supplied client-id does not get any incremented value added, so all metrics get blended in that case. And most third-party clients that set a client-id don't attempt to add a unique incremented number at all. So I don't think option-2 adds much value. Another alternative to consider is skipping client metrics if there is no client-id. > 0.9.0.0 broker crash on receipt of produce request with empty client ID > --- > > Key: KAFKA-3088 > URL: https://issues.apache.org/jira/browse/KAFKA-3088 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.9.0.0 >Reporter: Dave Peterson >Assignee: Jun Rao > > Sending a produce request with an empty client ID to a 0.9.0.0 broker causes > the broker to crash as shown below.
More details can be found in the > following email thread: > http://mail-archives.apache.org/mod_mbox/kafka-users/201601.mbox/%3c5693ecd9.4050...@dspeterson.com%3e >[2016-01-10 23:03:44,957] ERROR [KafkaApi-3] error when handling request > Name: ProducerRequest; Version: 0; CorrelationId: 1; ClientId: null; > RequiredAcks: 1; AckTimeoutMs: 1 ms; TopicAndPartition: [topic_1,3] -> 37 > (kafka.server.KafkaApis) >java.lang.NullPointerException > at > org.apache.kafka.common.metrics.JmxReporter.getMBeanName(JmxReporter.java:127) > at > org.apache.kafka.common.metrics.JmxReporter.addAttribute(JmxReporter.java:106) > at > org.apache.kafka.common.metrics.JmxReporter.metricChange(JmxReporter.java:76) > at > org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:288) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:177) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:162) > at > kafka.server.ClientQuotaManager.getOrCreateQuotaSensors(ClientQuotaManager.scala:209) > at > kafka.server.ClientQuotaManager.recordAndMaybeThrottle(ClientQuotaManager.scala:111) > at > kafka.server.KafkaApis.kafka$server$KafkaApis$$sendResponseCallback$2(KafkaApis.scala:353) > at > kafka.server.KafkaApis$$anonfun$handleProducerRequest$1.apply(KafkaApis.scala:371) > at > kafka.server.KafkaApis$$anonfun$handleProducerRequest$1.apply(KafkaApis.scala:371) > at > kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:348) > at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:366) > at kafka.server.KafkaApis.handle(KafkaApis.scala:68) > at > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
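The NPE above comes from building a metric name from a null client-id. A null-safe fallback sketch (the name format and helper name here are invented for illustration, not the broker's actual MBean naming):

```python
def mbean_name(group, client_id):
    """Null-safe sketch: fall back to a placeholder instead of crashing when
    client_id is missing. Name format is invented for illustration."""
    client_id = client_id or "anonymous"   # None or "" both fall back
    return f"kafka.server:type={group},client-id={client_id}"

print(mbean_name("Produce", None))        # client-id=anonymous
print(mbean_name("Produce", "my-client"))
```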
[jira] [Commented] (KAFKA-1649) Protocol documentation does not indicate that ReplicaNotAvailable can be ignored
[ https://issues.apache.org/jira/browse/KAFKA-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277690#comment-14277690 ] Dana Powers commented on KAFKA-1649:

I think this could be a problem -- it is not updated in the wiki, and there appears to be a server-side change between 0.8.1.1 and 0.8.2.0 that could expose third-party clients to uncaught errors if the client authors/maintainers were not aware that the ReplicaNotAvailable error should be ignored.

Protocol documentation does not indicate that ReplicaNotAvailable can be ignored

Key: KAFKA-1649
URL: https://issues.apache.org/jira/browse/KAFKA-1649
Project: Kafka
Issue Type: Improvement
Components: website
Affects Versions: 0.8.1.1
Reporter: Hernan Rivas Inaka
Priority: Minor
Labels: protocol-documentation
Original Estimate: 10m
Remaining Estimate: 10m

The protocol documentation here https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-ErrorCodes should indicate that error 9 (ReplicaNotAvailable) can be safely ignored on producers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-1649) Protocol documentation does not indicate that ReplicaNotAvailable can be ignored
[ https://issues.apache.org/jira/browse/KAFKA-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277807#comment-14277807 ] Dana Powers commented on KAFKA-1649:

I am only testing at the wire-protocol level, running a broker failure test with 2 brokers, 1 topic w/ num.partitions=2 and default.replication.factor=2: send 100 random messages directly to partition 0, kill the leader for partition 0, then attempt to write messages to partition 0, with retries and metadata reloads. Running the test against 0.8.2.0 returns the ReplicaNotAvailable error code in the PartitionMetadata, whereas 0.8.1.1 does not.

This is the metadata w/ both brokers up:

Topic metadata: [TopicMetadata(topic='test_switch_leader-qkUBJTZGLA', error=0,
  partitions=[PartitionMetadata(topic='test_switch_leader-qkUBJTZGLA', partition=1, leader=1, replicas=(1, 0), isr=(1, 0), error=0),
              PartitionMetadata(topic='test_switch_leader-qkUBJTZGLA', partition=0, leader=0, replicas=(0, 1), isr=(0, 1), error=0)])]

And this is the metadata after killing one broker (0.8.2.0):

Topic metadata: [TopicMetadata(topic='test_switch_leader-qkUBJTZGLA', error=0,
  partitions=[PartitionMetadata(topic='test_switch_leader-qkUBJTZGLA', partition=1, leader=1, replicas=(1,), isr=(1,), error=9),
              PartitionMetadata(topic='test_switch_leader-qkUBJTZGLA', partition=0, leader=1, replicas=(1,), isr=(1,), error=9)])]

The 0.8.1.1 output is slightly different -- and, significantly, there is no error in PartitionMetadata.

Before killing partition 0 leader (broker 1):

Topic metadata: [TopicMetadata(topic='test_switch_leader-eMbCMlVrOC', error=0,
  partitions=[PartitionMetadata(topic='test_switch_leader-eMbCMlVrOC', partition=0, leader=1, replicas=(1, 0), isr=(1, 0), error=0),
              PartitionMetadata(topic='test_switch_leader-eMbCMlVrOC', partition=1, leader=0, replicas=(0, 1), isr=(0, 1), error=0)])]

After killing partition 0 leader:

Topic metadata: [TopicMetadata(topic='test_switch_leader-eMbCMlVrOC', error=0,
  partitions=[PartitionMetadata(topic='test_switch_leader-eMbCMlVrOC', partition=0, leader=0, replicas=(1, 0), isr=(0,), error=0),
              PartitionMetadata(topic='test_switch_leader-eMbCMlVrOC', partition=1, leader=0, replicas=(0, 1), isr=(0, 1), error=0)])]

Protocol documentation does not indicate that ReplicaNotAvailable can be ignored

Key: KAFKA-1649
URL: https://issues.apache.org/jira/browse/KAFKA-1649
Project: Kafka
Issue Type: Improvement
Components: website
Affects Versions: 0.8.1.1
Reporter: Hernan Rivas Inaka
Priority: Minor
Labels: protocol-documentation
Original Estimate: 10m
Remaining Estimate: 10m

The protocol documentation here https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-ErrorCodes should indicate that error 9 (ReplicaNotAvailable) can be safely ignored on producers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
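The client-side handling discussed in this issue can be sketched as follows: when selecting partitions to produce to, treat ReplicaNotAvailable (error 9) as informational -- it describes a missing replica, not a missing leader -- rather than as a fatal error. The namedtuple shapes below mirror the metadata reprs shown in the test output above; they are hypothetical client-side structures, not an official API:

```python
from collections import namedtuple

# Hypothetical structures matching the TopicMetadata / PartitionMetadata
# reprs in the test output above.
PartitionMetadata = namedtuple(
    "PartitionMetadata", ["topic", "partition", "leader", "replicas", "isr", "error"])
TopicMetadata = namedtuple("TopicMetadata", ["topic", "error", "partitions"])

REPLICA_NOT_AVAILABLE = 9  # error code 9 in the protocol error-code table


def writable_partitions(md):
    """Return partition ids a producer may write to.

    ReplicaNotAvailable is treated as safe to ignore, per this issue:
    a 0.8.2.0 broker sets error=9 when a replica is down even though
    the partition still has a usable leader.
    """
    return [p.partition
            for p in md.partitions
            if p.error in (0, REPLICA_NOT_AVAILABLE) and p.leader != -1]
```

A client that raised on any non-zero error code would (incorrectly) refuse to produce to both partitions in the 0.8.2.0 output above, even though broker 1 is leading both.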
[jira] [Updated] (KAFKA-1841) OffsetCommitRequest API - timestamp field is not versioned
[ https://issues.apache.org/jira/browse/KAFKA-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dana Powers updated KAFKA-1841:
---
Description:

Timestamp field was added to the OffsetCommitRequest wire protocol api for 0.8.2 by KAFKA-1012. The 0.8.1.1 server does not support the timestamp field, so I think the api version of OffsetCommitRequest should be incremented and checked by the 0.8.2 kafka server before attempting to read a timestamp from the network buffer in OffsetCommitRequest.readFrom (core/src/main/scala/kafka/api/OffsetCommitRequest.scala).

It looks like a subsequent patch (KAFKA-1462) added another api change to support a new constructor w/ params generationId and consumerId, calling that version 1, and a pending patch (KAFKA-1634) adds retentionMs as another field, while possibly removing timestamp altogether, calling this version 2. So the fix here is not straightforward enough for me to submit a patch.

This could possibly be merged into KAFKA-1634, but I am opening it as a separate issue because I believe the lack of versioning in the current trunk should block the 0.8.2 release.

OffsetCommitRequest API - timestamp field is not versioned
--

Key: KAFKA-1841
URL: https://issues.apache.org/jira/browse/KAFKA-1841
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8.2
Environment: wire-protocol
Reporter: Dana Powers
Priority: Blocker

Timestamp field was added to the OffsetCommitRequest wire protocol api for 0.8.2 by KAFKA-1012. The 0.8.1.1 server does not support the timestamp field, so I think the api version of OffsetCommitRequest should be incremented and checked by the 0.8.2 kafka server before attempting to read a timestamp from the network buffer in OffsetCommitRequest.readFrom (core/src/main/scala/kafka/api/OffsetCommitRequest.scala). It looks like a subsequent patch (KAFKA-1462) added another api change to support a new constructor w/ params generationId and consumerId, calling that version 1, and a pending patch (KAFKA-1634) adds retentionMs as another field, while possibly removing timestamp altogether, calling this version 2. So the fix here is not straightforward enough for me to submit a patch. This could possibly be merged into KAFKA-1634, but opening as a separate Issue because I believe the lack of versioning in the current trunk should block 0.8.2 release.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
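The versioning concern in this report comes down to version-gated parsing: the server must dispatch on the request's api version before reading any optional fields from the network buffer. The sketch below shows the idea for the per-partition payload only, with field layouts simplified from the descriptions in this issue (it is an illustration of the technique, not the actual Kafka wire schema):

```python
import struct

# Simplified per-partition payloads for the request shapes described above
# (illustrative only; the real wire format has more fields):
#   v0: offset (int64)                      -- 0.8.1.1
#   v1: offset (int64) + timestamp (int64)  -- KAFKA-1012 / KAFKA-1462
#   v2: offset (int64), timestamp removed   -- KAFKA-1634 (retention is
#                                              proposed at the request level)


def read_partition_commit(buf, pos, api_version):
    """Read one partition's commit payload, gated on api_version.

    An un-versioned reader that unconditionally expects a timestamp would
    either crash or mis-parse a v0 request -- the hazard this issue is
    raising. Returns (offset, timestamp_or_None, next_position).
    """
    (committed_offset,) = struct.unpack_from(">q", buf, pos)
    pos += 8
    timestamp = None
    if api_version == 1:  # only v1 carries a per-partition timestamp
        (timestamp,) = struct.unpack_from(">q", buf, pos)
        pos += 8
    return committed_offset, timestamp, pos
```

The key point is that the version check happens before the read: without it, the same bytes deserialize differently depending on which client sent them, which is exactly why an unversioned field addition cannot ship.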
[jira] [Commented] (KAFKA-1634) Improve semantics of timestamp in OffsetCommitRequests and update documentation
[ https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265042#comment-14265042 ] Dana Powers commented on KAFKA-1634:

Possibly related to this JIRA: KAFKA-1841. The timestamp field itself was not in the released api version 0, and if it is to be included in 0.8.2 (this JIRA suggests it is, but to be removed in 0.8.3?) then I think it will need to be versioned.

Improve semantics of timestamp in OffsetCommitRequests and update documentation
---

Key: KAFKA-1634
URL: https://issues.apache.org/jira/browse/KAFKA-1634
Project: Kafka
Issue Type: Bug
Reporter: Neha Narkhede
Assignee: Guozhang Wang
Priority: Blocker
Fix For: 0.8.3
Attachments: KAFKA-1634.patch, KAFKA-1634_2014-11-06_15:35:46.patch, KAFKA-1634_2014-11-07_16:54:33.patch, KAFKA-1634_2014-11-17_17:42:44.patch, KAFKA-1634_2014-11-21_14:00:34.patch, KAFKA-1634_2014-12-01_11:44:35.patch, KAFKA-1634_2014-12-01_18:03:12.patch

From the mailing list -- following up on this: I think the online API docs for OffsetCommitRequest still incorrectly refer to client-side timestamps: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest Wasn't that removed, and isn't it now always handled server-side? Would one of the devs mind updating the API spec wiki?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)