[jira] [Resolved] (KAFKA-13040) Increase minimum value of segment.ms and segment.bytes
[ https://issues.apache.org/jira/browse/KAFKA-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Badai Aqrandista resolved KAFKA-13040.
--------------------------------------
    Resolution: Duplicate

> Increase minimum value of segment.ms and segment.bytes
> ------------------------------------------------------
>
>        Key: KAFKA-13040
>        URL: https://issues.apache.org/jira/browse/KAFKA-13040
>    Project: Kafka
> Issue Type: Improvement
>   Reporter: Badai Aqrandista
>   Assignee: Badai Aqrandista
>   Priority: Minor
>
> Raised for KIP-760 (linked).
> Many times, Kafka brokers in production crash with "Too many open files" or "Out of memory" errors because some Kafka topics have a lot of segment files as a result of a small {{segment.ms}} or {{segment.bytes}}. These two configurations can be set by any user who is authorized to create a topic or modify topic configuration.
> To prevent these two configurations from crashing the Kafka broker, they should have a sufficiently large minimum value.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Reopened] (KAFKA-7760) Add broker configuration to set minimum value for segment.bytes and segment.ms
[ https://issues.apache.org/jira/browse/KAFKA-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Badai Aqrandista reopened KAFKA-7760:
-------------------------------------

Reopening this issue and making it the main ticket for KIP-760.

> Add broker configuration to set minimum value for segment.bytes and segment.ms
> ------------------------------------------------------------------------------
>
>        Key: KAFKA-7760
>        URL: https://issues.apache.org/jira/browse/KAFKA-7760
>    Project: Kafka
> Issue Type: Improvement
>   Reporter: Badai Aqrandista
>   Assignee: Badai Aqrandista
>   Priority: Major
>     Labels: kip, newbie
>
> If someone sets segment.bytes or segment.ms at the topic level to a very small value (e.g. segment.bytes=1000 or segment.ms=1000), Kafka will generate a very high number of segment files. This can bring down the whole broker by hitting the maximum number of open files (for logs) or the maximum number of mmap-ed files (for indexes).
> To prevent that from happening, I would like to suggest adding two new items to the broker configuration:
> * min.topic.segment.bytes, defaults to 1048576: The minimum value for segment.bytes. When someone sets the topic configuration segment.bytes to a value lower than this, Kafka throws an INVALID VALUE error.
> * min.topic.segment.ms, defaults to 360: The minimum value for segment.ms. When someone sets the topic configuration segment.ms to a value lower than this, Kafka throws an INVALID VALUE error.
> Thanks
> Badai
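The proposed broker-side check can be sketched in a few lines. This is only an illustration of the validation described in the ticket, not Kafka's actual config-validation code; the class name `MinSegmentConfig` and the `validate` helper are hypothetical, and the default constants are taken from the proposal above.

```java
// Hypothetical sketch of the proposed broker-side minimum-value check for
// topic-level segment.bytes / segment.ms (names are illustrative only).
public class MinSegmentConfig {
    // Defaults taken from the proposal in this ticket.
    static final long MIN_TOPIC_SEGMENT_BYTES = 1_048_576L;
    static final long MIN_TOPIC_SEGMENT_MS = 360L;

    /** Rejects a topic-level value that is below the broker-wide minimum. */
    public static void validate(String name, long value, long min) {
        if (value < min) {
            throw new IllegalArgumentException(
                "Invalid value " + value + " for " + name
                    + ": must be at least " + min);
        }
    }
}
```

With this shape, `kafka-topics --create --config segment.bytes=1000` would be rejected at topic creation or alteration time instead of slowly exhausting file descriptors at runtime.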
[jira] [Resolved] (KAFKA-7760) Add broker configuration to set minimum value for segment.bytes and segment.ms
[ https://issues.apache.org/jira/browse/KAFKA-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Badai Aqrandista resolved KAFKA-7760.
-------------------------------------
    Resolution: Duplicate

> Add broker configuration to set minimum value for segment.bytes and segment.ms
> ------------------------------------------------------------------------------
>
>        Key: KAFKA-7760
>        URL: https://issues.apache.org/jira/browse/KAFKA-7760
>    Project: Kafka
> Issue Type: Improvement
>   Reporter: Badai Aqrandista
>   Assignee: Dulvin Witharane
>   Priority: Major
>     Labels: kip, newbie
>
> If someone sets segment.bytes or segment.ms at the topic level to a very small value (e.g. segment.bytes=1000 or segment.ms=1000), Kafka will generate a very high number of segment files. This can bring down the whole broker by hitting the maximum number of open files (for logs) or the maximum number of mmap-ed files (for indexes).
> To prevent that from happening, I would like to suggest adding two new items to the broker configuration:
> * min.topic.segment.bytes, defaults to 1048576: The minimum value for segment.bytes. When someone sets the topic configuration segment.bytes to a value lower than this, Kafka throws an INVALID VALUE error.
> * min.topic.segment.ms, defaults to 360: The minimum value for segment.ms. When someone sets the topic configuration segment.ms to a value lower than this, Kafka throws an INVALID VALUE error.
> Thanks
> Badai
[jira] [Created] (KAFKA-13040) Increase minimum value of segment.ms and segment.bytes
Badai Aqrandista created KAFKA-13040:
-------------------------------------

Summary: Increase minimum value of segment.ms and segment.bytes
Key: KAFKA-13040
URL: https://issues.apache.org/jira/browse/KAFKA-13040
Project: Kafka
Issue Type: Improvement
Reporter: Badai Aqrandista

Many times, Kafka brokers in production crash with "Too many open files" or "Out of memory" errors because some Kafka topics have a lot of segment files as a result of a small {{segment.ms}} or {{segment.bytes}}. These two configurations can be set by any user who is authorized to create a topic or modify topic configuration.

To prevent these two configurations from crashing the Kafka broker, they should have a sufficiently large minimum value.
[jira] [Created] (KAFKA-12163) Controller should ensure zkVersion is monotonically increasing when sending UpdateMetadata requests.
Badai Aqrandista created KAFKA-12163:
-------------------------------------

Summary: Controller should ensure zkVersion is monotonically increasing when sending UpdateMetadata requests.
Key: KAFKA-12163
URL: https://issues.apache.org/jira/browse/KAFKA-12163
Project: Kafka
Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Badai Aqrandista

When sending UpdateMetadata requests, the controller does not currently perform any check to ensure zkVersion is monotonically increasing. If Zookeeper gets into a bad state, this can put the Kafka cluster into a bad state as well and possibly cause data loss. The controller should perform a check to protect Kafka clusters from getting into a bad state.

The following shows an example of zkVersion going backward at 2020-12-08 14:10:46,420.

{noformat}
[2020-11-23 00:56:20,315] TRACE [Controller id=1153 epoch=196] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=195, leader=2152, leaderEpoch=210, isr=[2154, 2152, 1153, 1152], zkVersion=535, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-11-23 01:15:28,449] TRACE [Controller id=1153 epoch=196] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=195, leader=2152, leaderEpoch=210, isr=[2154, 2152, 1153, 1152], zkVersion=535, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-11-24 00:15:17,042] TRACE [Controller id=1153 epoch=196] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=196, leader=2152, leaderEpoch=211, isr=[2154, 2152, 1152], zkVersion=536, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[1153]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-06 21:53:14,887] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2154, leaderEpoch=212, isr=[2154, 1152, 1153], zkVersion=538, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2152]) to brokers Set(2153, 1152, 2154, 2151, 1153, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-06 22:11:43,739] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2154, leaderEpoch=212, isr=[2154, 1152, 1153], zkVersion=538, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2152) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-06 22:11:43,815] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2154, leaderEpoch=212, isr=[2154, 1152, 1153], zkVersion=538, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-06 22:12:12,602] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2154, leaderEpoch=212, isr=[2154, 1152, 1153, 2152], zkVersion=539, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-06 22:12:17,019] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2152, leaderEpoch=213, isr=[2154, 1152, 1153, 2152], zkVersion=540, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-07 00:08:46,077] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2152, leaderEpoch=214, isr=[1152, 1153, 2152], zkVersion=541, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2154]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-07 00:08:54,790] TRACE [Controller id=1152 epoch=197] Sending
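The check the ticket asks for amounts to remembering the last zkVersion sent per partition and refusing to propagate an older one. A minimal sketch of that idea (illustrative only, not the controller's actual code; the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative guard: drop an UpdateMetadata update whose zkVersion is
// older than the last one already sent for that partition.
public class ZkVersionGuard {
    private final Map<String, Integer> lastSent = new HashMap<>();

    /** Returns true if the update may be sent; false if zkVersion regressed. */
    public boolean shouldSend(String topicPartition, int zkVersion) {
        Integer last = lastSent.get(topicPartition);
        if (last != null && zkVersion < last) {
            return false; // refuse to propagate stale metadata
        }
        lastSent.put(topicPartition, zkVersion);
        return true;
    }
}
```

In the log excerpt above, a guard like this would flag any update for Execution_CustomsStatus-6 carrying a zkVersion lower than the 541 already sent.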
[jira] [Created] (KAFKA-12162) Kafka broker continued to run after failing to create "/brokers/ids/X" znode.
Badai Aqrandista created KAFKA-12162:
-------------------------------------

Summary: Kafka broker continued to run after failing to create "/brokers/ids/X" znode.
Key: KAFKA-12162
URL: https://issues.apache.org/jira/browse/KAFKA-12162
Project: Kafka
Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Badai Aqrandista

We found that a Kafka broker continued to run after it failed to create the "/brokers/ids/X" znode and still acted as a partition leader. Here's the log snippet.

{code:java}
[2020-12-08 14:10:25,040] INFO Client successfully logged in. (org.apache.zookeeper.Login)
[2020-12-08 14:10:25,040] INFO Client will use DIGEST-MD5 as SASL mechanism. (org.apache.zookeeper.client.ZooKeeperSaslClient)
[2020-12-08 14:10:25,040] INFO Opening socket connection to server kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,045] INFO Socket connection established, initiating session, client: /0.0.0.0:36056, server: kafkaprzk4.example.com/0.0.0.0:2181 (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,054] WARN Unable to reconnect to ZooKeeper service, session 0x5002ed5001b has expired (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,054] INFO Unable to reconnect to ZooKeeper service, session 0x5002ed5001b has expired, closing socket connection (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,055] INFO EventThread shut down for session: 0x5002ed5001b (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,055] INFO [ZooKeeperClient Kafka server] Session expired. (kafka.zookeeper.ZooKeeperClient)
[2020-12-08 14:10:25,059] INFO [ZooKeeperClient Kafka server] Initializing a new session to kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01. (kafka.zookeeper.ZooKeeperClient)
[2020-12-08 14:10:25,059] INFO Initiating client connection, connectString=kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01 sessionTimeout=22500 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@1b11171f (org.apache.zookeeper.ZooKeeper)
[2020-12-08 14:10:25,060] INFO jute.maxbuffer value is 4194304 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2020-12-08 14:10:25,060] INFO zookeeper.request.timeout value is 0. feature enabled= (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,061] INFO Client successfully logged in. (org.apache.zookeeper.Login)
[2020-12-08 14:10:25,061] INFO Client will use DIGEST-MD5 as SASL mechanism. (org.apache.zookeeper.client.ZooKeeperSaslClient)
[2020-12-08 14:10:25,061] INFO Opening socket connection to server kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,065] INFO Socket connection established, initiating session, client: /0.0.0.0:36058, server: kafkaprzk4.example.com/0.0.0.0:2181 (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,070] INFO Creating /brokers/ids/2152 (is it secure? true) (kafka.zk.KafkaZkClient)
[2020-12-08 14:10:25,081] INFO Session establishment complete on server kafkaprzk4.example.com/0.0.0.0:2181, sessionid = 0x400645dad9d0001, negotiated timeout = 22500 (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,112] ERROR Error while creating ephemeral at /brokers/ids/2152, node already exists and owner '288459910057232437' does not match current session '288340729659195393' (kafka.zk.KafkaZkClient$CheckedEphemeral)
[2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter] Got error produce response with correlation id 520478 on topic-partition _confluent-metrics-9, retrying (9 attempts left). Error: NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter] Received invalid metadata error in produce request on partition _confluent-metrics-9 due to org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)
[2020-12-08 14:10:29,818] WARN [Producer clientId=confluent-metrics-reporter] Got error produce response with correlation id 520480 on topic-partition _confluent-metrics-4, retrying (9 attempts left). Error: NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
{code}
[jira] [Created] (KAFKA-10872) Log broker configuration prefixed with "listener.name.*"
Badai Aqrandista created KAFKA-10872:
-------------------------------------

Summary: Log broker configuration prefixed with "listener.name.*"
Key: KAFKA-10872
URL: https://issues.apache.org/jira/browse/KAFKA-10872
Project: Kafka
Issue Type: Improvement
Reporter: Badai Aqrandista

When configuring broker listeners with the "listener.name.*" prefix, it is very hard to verify in the log whether we are passing the correct values or missing any. Can we log these configurations at INFO level? For example:

{code:java}
listener.name.internal.ssl.truststore.location
listener.name.internal.ssl.truststore.password
listener.name.internal.ssl.keystore.location
listener.name.internal.ssl.keystore.password
{code}
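If these values were logged, password-typed entries would need redaction. A hedged sketch of what such a helper might look like; the class name `ListenerConfigLogger` and the `[hidden]` placeholder are hypothetical, not Kafka's actual logging code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: select listener.name.* configs for INFO logging,
// masking any password entries.
public class ListenerConfigLogger {
    public static Map<String, String> loggable(Map<String, String> configs) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : configs.entrySet()) {
            if (e.getKey().startsWith("listener.name.")) {
                boolean secret = e.getKey().endsWith(".password");
                out.put(e.getKey(), secret ? "[hidden]" : e.getValue());
            }
        }
        return out;
    }
}
```

Kafka already masks password-typed configs as `[hidden]` elsewhere in its startup logging, so the same convention would presumably apply here.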
[jira] [Created] (KAFKA-10776) JMX metric RequestsPerSec requires API version to access
Badai Aqrandista created KAFKA-10776:
-------------------------------------

Summary: JMX metric RequestsPerSec requires API version to access
Key: KAFKA-10776
URL: https://issues.apache.org/jira/browse/KAFKA-10776
Project: Kafka
Issue Type: Bug
Reporter: Badai Aqrandista

The JMX metric "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce" seems to require the API version to be appended as "version=8" at the end of the JMX metric name.

{noformat}
badai@Badai-Aqrandista-MBP15 % bin/kafka-run-class kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi --object-name kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce
Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi.
No matched attributes for the queried objects ArrayBuffer(kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce).
{noformat}

{noformat}
badai@Badai-Aqrandista-MBP15 % bin/kafka-run-class kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi --object-name kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=8
Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi.
"time","kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=8:Count","kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=8:EventType","kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=8:FifteenMinuteRate","kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=8:FiveMinuteRate","kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=8:MeanRate","kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=8:OneMinuteRate","kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=8:RateUnit"
1606699872861,6,requests,0.1989162534702867,0.19690572377187665,0.15294943373697226,0.1883600179350904,SECONDS
1606699874859,6,requests,0.19781422719176484,0.19365115842372535,0.14552410136055885,0.1732995824406591,SECONDS
1606699876860,8,requests,0.19781422719176484,0.19365115842372535,0.18505354392836157,0.1732995824406591,SECONDS
1606699878863,8,requests,0.19893436710436987,0.19706180478057453,0.1768573689590503,0.19142554703039305,SECONDS
^C{noformat}

Meanwhile, other JMX metrics under RequestMetrics do not require the API version to be specified:

{noformat}
badai@Badai-Aqrandista-MBP15 confluent-5.5.1 % bin/kafka-run-class kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi --object-name kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce
Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi.
"time","kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce:50thPercentile","kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce:75thPercentile","kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce:95thPercentile","kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce:98thPercentile","kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce:999thPercentile","kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce:99thPercentile","kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce:Count","kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce:Max","kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce:Mean","kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce:Min","kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce:StdDev"
1606700344429,2.0,2.0,4.689,31.2799745,46.0,46.0,72,46.0,2.639,1.0,5.436718033280366
1606700346433,2.0,2.0,4.689,31.2799745,46.0,46.0,72,46.0,2.639,1.0,5.436718033280366
^C
{noformat}

This is not documented here: [https://kafka.apache.org/documentation/#monitoring]

I think the "version=X" part of "RequestsPerSec" is a mistake and should be removed.
[jira] [Created] (KAFKA-10529) Controller should throttle partition reassignment
Badai Aqrandista created KAFKA-10529:
-------------------------------------

Summary: Controller should throttle partition reassignment
Key: KAFKA-10529
URL: https://issues.apache.org/jira/browse/KAFKA-10529
Project: Kafka
Issue Type: Improvement
Reporter: Badai Aqrandista

With [KIP-455|https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment], reassignment can be triggered via the AdminClient API. However, reassigning a large number of topic partitions at once can cause a storm of LeaderAndIsr and UpdateMetadata requests, which can occupy the controller thread for some time and prevents the controller from processing other requests. So, the controller should throttle sending LeaderAndIsr requests when actioning a reassignment request.
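The throttling the ticket asks for could take the shape of a simple per-second rate limit on outgoing controller requests. A minimal, hypothetical sketch of that idea (not controller code; the class name and window semantics are illustrative):

```java
// Illustrative fixed-window rate limiter: allow at most maxPerSecond
// LeaderAndIsr sends per one-second window, deferring the rest.
public class RequestThrottle {
    private final long maxPerSecond;
    private long windowStart;   // start of the current one-second window (ms)
    private long sentInWindow;  // requests sent in the current window

    public RequestThrottle(long maxPerSecond) {
        this.maxPerSecond = maxPerSecond;
    }

    /** Returns true if a request may be sent now under the rate limit. */
    public synchronized boolean tryAcquire(long nowMs) {
        if (nowMs - windowStart >= 1000) {
            windowStart = nowMs;
            sentInWindow = 0;
        }
        if (sentInWindow < maxPerSecond) {
            sentInWindow++;
            return true;
        }
        return false; // caller should defer and retry in the next window
    }
}
```

Requests that fail `tryAcquire` would be queued and drained in later windows, smoothing the burst instead of monopolizing the controller thread.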
[jira] [Resolved] (KAFKA-9684) Add support for SNI names in SSL request
[ https://issues.apache.org/jira/browse/KAFKA-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Badai Aqrandista resolved KAFKA-9684.
-------------------------------------
    Resolution: Not A Problem

Java 7 and later should include SNI names by default.

> Add support for SNI names in SSL request
> ----------------------------------------
>
>        Key: KAFKA-9684
>        URL: https://issues.apache.org/jira/browse/KAFKA-9684
>    Project: Kafka
> Issue Type: Improvement
>   Reporter: Badai Aqrandista
>   Priority: Minor
>
> When running a Kafka cluster with SSL security behind HA Proxy, we need the client to send SSL packets with the SNI name extension [1]. This allows HA Proxy to forward the request to the relevant broker behind it (passthrough).
> Java 7 and higher support this by adding SNIHostName [2] to SSLParameters [3].
> [1] https://www.ietf.org/rfc/rfc6066.txt
> [2] https://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SNIHostName.html
> [3] https://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SSLParameters.html#setServerNames-java.util.List-
[jira] [Created] (KAFKA-9684) Add support for SNI names in SSL request
Badai Aqrandista created KAFKA-9684:
------------------------------------

Summary: Add support for SNI names in SSL request
Key: KAFKA-9684
URL: https://issues.apache.org/jira/browse/KAFKA-9684
Project: Kafka
Issue Type: Improvement
Reporter: Badai Aqrandista

When running a Kafka cluster with SSL security behind HA Proxy, we need the client to send SSL packets with the SNI name extension [1]. This allows HA Proxy to forward the request to the relevant broker behind it (passthrough).

Java 7 and higher support this by adding SNIHostName [2] to SSLParameters [3].

[1] https://www.ietf.org/rfc/rfc6066.txt
[2] https://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SNIHostName.html
[3] https://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SSLParameters.html#setServerNames-java.util.List-
[jira] [Created] (KAFKA-9540) Application getting "Could not find the standby task 0_4 while closing it" error
Badai Aqrandista created KAFKA-9540:
------------------------------------

Summary: Application getting "Could not find the standby task 0_4 while closing it" error
Key: KAFKA-9540
URL: https://issues.apache.org/jira/browse/KAFKA-9540
Project: Kafka
Issue Type: Bug
Components: streams
Reporter: Badai Aqrandista

Because of the following line, there is a possibility that some standby tasks might not be created:

https://github.com/apache/kafka/blob/2.4.0/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L436

This then causes the following line to not add the task to the standby task list:

https://github.com/apache/kafka/blob/2.4.0/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L299

But this line assumes that all standby tasks have been created and adds them to the standby list:

https://github.com/apache/kafka/blob/2.4.0/streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java#L168

This results in the user getting this error message on the next PARTITION_ASSIGNMENT state:

{noformat}
Could not find the standby task 0_4 while closing it (org.apache.kafka.streams.processor.internals.AssignedStandbyTasks:74)
{noformat}

But the harm caused by this issue is minimal: no standby task for some partitions, and it is recreated on the next rebalance anyway. So, I suggest lowering this message to WARN, or alternatively logging a WARN when a standby task could not be created.
[jira] [Created] (KAFKA-9459) MM2 sync topic config does not work
Badai Aqrandista created KAFKA-9459:
------------------------------------

Summary: MM2 sync topic config does not work
Key: KAFKA-9459
URL: https://issues.apache.org/jira/browse/KAFKA-9459
Project: Kafka
Issue Type: Bug
Components: mirrormaker
Affects Versions: 2.4.0
Reporter: Badai Aqrandista

I have MM2 configured as follows:

{code:java}
{
  "name": "mm2-from-1-to-2",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "topics": "foo",
    "key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "sync.topic.configs.enabled": "true",
    "sync.topic.configs.interval.seconds": 60,
    "sync.topic.acls.enabled": "false",
    "replication.factor": 1,
    "offset-syncs.topic.replication.factor": 1,
    "heartbeats.topic.replication.factor": 1,
    "checkpoints.topic.replication.factor": 1,
    "target.cluster.alias": "dest",
    "target.cluster.bootstrap.servers": "dest.example.com:9092",
    "source.cluster.alias": "src",
    "source.cluster.bootstrap.servers": "src.example.com:9092",
    "tasks.max": 1
  }
}
{code}

Topic "foo" is configured with "cleanup.policy=compact". But after waiting for 15 minutes, I still don't see "src.foo" in the destination cluster with "cleanup.policy=compact".

I ran the connect node at TRACE level and could not find any calls to describeConfigs (https://github.com/apache/kafka/blob/2.4.0/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorSourceConnector.java#L327). This implies it never actually gets the list of topics whose configs it needs to fetch.

I am suspecting this code (https://github.com/apache/kafka/blob/2.4.0/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorSourceConnector.java#L214-L220):

{code:java}
private Set<String> topicsBeingReplicated() {
    return knownTopicPartitions.stream()
        .map(x -> x.topic())
        .distinct()
        .filter(x -> knownTargetTopics.contains(formatRemoteTopic(x)))
        .collect(Collectors.toSet());
}
{code}

knownTopicPartitions contains topic-partitions from the source cluster. knownTargetTopics contains topic-partitions from the target cluster, whose topic names already contain the source alias. So, why is topicsBeingReplicated (topics from the source cluster) being filtered using knownTargetTopics (topic-partitions from the target cluster)?
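The suspected filter's behaviour can be reproduced with sample data. Below is a hedged, self-contained sketch that mimics the topicsBeingReplicated() logic quoted above, with a hypothetical formatRemoteTopic that prefixes the source alias as configured in this ticket (the real method lives in MirrorSourceConnector and also handles separators):

```java
import java.util.Set;
import java.util.stream.Collectors;

// Sketch mimicking MirrorSourceConnector.topicsBeingReplicated() with plain sets.
public class ReplicationFilterDemo {
    // Hypothetical stand-in for formatRemoteTopic with source alias "src".
    static String formatRemoteTopic(String topic) {
        return "src." + topic;
    }

    /** Source topics kept only if their remote counterpart exists on the target. */
    static Set<String> topicsBeingReplicated(Set<String> sourceTopics,
                                             Set<String> targetTopics) {
        return sourceTopics.stream()
            .filter(t -> targetTopics.contains(formatRemoteTopic(t)))
            .collect(Collectors.toSet());
    }
}
```

Note the consequence visible in this sketch: until "src.foo" actually exists on the target cluster, "foo" never passes the filter, which would explain why describeConfigs is never called in the TRACE logs.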
[jira] [Created] (KAFKA-9275) Print assignment and IP address in the log message when consumer leaves/removed from the group
Badai Aqrandista created KAFKA-9275:
------------------------------------

Summary: Print assignment and IP address in the log message when consumer leaves/removed from the group
Key: KAFKA-9275
URL: https://issues.apache.org/jira/browse/KAFKA-9275
Project: Kafka
Issue Type: Improvement
Reporter: Badai Aqrandista

In order to simplify identifying which member is causing a rebalance, can we add the IP address and the list of topic-partitions assigned to the member that leaves or is removed from the group? Especially in the "reason" string for the rebalance:

https://github.com/apache/kafka/blob/2.4/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L964
https://github.com/apache/kafka/blob/2.4/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L972
https://github.com/apache/kafka/blob/2.4/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L1144
https://github.com/apache/kafka/blob/2.4/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L493

This will allow a much faster investigation when a consumer group is stuck in a rebalancing state.
[jira] [Created] (KAFKA-8562) SASL_SSL still performs reverse DNS lookup despite KAFKA-5051
Badai Aqrandista created KAFKA-8562:
------------------------------------

Summary: SASL_SSL still performs reverse DNS lookup despite KAFKA-5051
Key: KAFKA-8562
URL: https://issues.apache.org/jira/browse/KAFKA-8562
Project: Kafka
Issue Type: Bug
Reporter: Badai Aqrandista

When using SASL_SSL, the Kafka client performs a reverse DNS lookup to resolve the IP address to a hostname. This circumvents the security fix made in KAFKA-5051.

This is the line of code from AK 2.2 where it performs the lookup:

https://github.com/apache/kafka/blob/2.2.0/clients/src/main/java/org/apache/kafka/common/network/SaslChannelBuilder.java#L205

The following log messages show that the consumer initially tried to connect with IP address 10.0.2.15. Then suddenly it created a SaslClient with a hostname:

{code:java}
[2019-06-18 06:23:36,486] INFO Kafka commitId: 00d486623990ed9d (org.apache.kafka.common.utils.AppInfoParser)
[2019-06-18 06:23:36,487] DEBUG [Consumer clientId=KafkaStore-reader-_schemas, groupId=schema-registry-10.0.2.15-18081] Kafka consumer initialized (org.apache.kafka.clients.consumer.KafkaConsumer)
[2019-06-18 06:23:36,505] DEBUG [Consumer clientId=KafkaStore-reader-_schemas, groupId=schema-registry-10.0.2.15-18081] Initiating connection to node 10.0.2.15:19094 (id: -1 rack: null) using address /10.0.2.15 (org.apache.kafka.clients.NetworkClient)
[2019-06-18 06:23:36,512] DEBUG Set SASL client state to SEND_APIVERSIONS_REQUEST (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-06-18 06:23:36,515] DEBUG Creating SaslClient: client=null;service=kafka;serviceHostname=quickstart.confluent.io;mechs=[PLAIN] (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
{code}

Thanks
Badai
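The distinction at the heart of KAFKA-5051 is visible in the JDK API itself: `InetSocketAddress.getHostName()` may trigger a reverse DNS lookup, while `getHostString()` returns the literal IP or hostname the address was created with, without touching DNS. A small illustration of the lookup-free path:

```java
import java.net.InetSocketAddress;

public class SniLookupDemo {
    public static void main(String[] args) {
        // createUnresolved never performs any DNS resolution at all.
        InetSocketAddress addr =
            InetSocketAddress.createUnresolved("10.0.2.15", 19094);

        // getHostString() returns the literal string with no reverse lookup,
        // whereas getHostName() on a resolved address may perform the reverse
        // DNS lookup this ticket reports.
        System.out.println(addr.getHostString()); // 10.0.2.15
    }
}
```

In the log above, the jump from "10.0.2.15" to "serviceHostname=quickstart.confluent.io" is the observable effect of taking the lookup path instead of the literal one.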
[jira] [Created] (KAFKA-8546) Call System#runFinalization to avoid memory leak caused by JDK-6293787
Badai Aqrandista created KAFKA-8546:
------------------------------------

Summary: Call System#runFinalization to avoid memory leak caused by JDK-6293787
Key: KAFKA-8546
URL: https://issues.apache.org/jira/browse/KAFKA-8546
Project: Kafka
Issue Type: Bug
Affects Versions: 2.0.1
Reporter: Badai Aqrandista

When a heavily used broker uses gzip compression on all topics, you can sometimes hit GC pauses greater than the zookeeper.session.timeout.ms of 6000 ms. This is caused by a memory leak from JDK-6293787 ([https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6293787]), which in turn is caused by JDK-4797189 ([https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4797189]).

In summary, this is what happens:

* The Inflater class contains a finalizer method.
* Whenever a class with a finalizer method is instantiated, a Finalizer object is created.
* The GC finalizer thread is responsible for processing all Finalizer objects.
* If the rate of Finalizer object creation exceeds the rate at which the GC finalizer thread can process them, the number of Finalizer objects grows continuously and eventually triggers a full GC (because they are stored in the old generation).
The following stack trace shows what happens when a process is frozen doing a full GC:

{code:java}
kafka-request-handler-13 Runnable Thread ID: 79
java.util.zip.Inflater.inflateBytes(long, byte[], int, int) Inflater.java
java.util.zip.Inflater.inflate(byte[], int, int) Inflater.java:259
java.util.zip.InflaterInputStream.read(byte[], int, int) InflaterInputStream.java:152
java.util.zip.GZIPInputStream.read(byte[], int, int) GZIPInputStream.java:117
java.io.BufferedInputStream.fill() BufferedInputStream.java:246
java.io.BufferedInputStream.read() BufferedInputStream.java:265
java.io.DataInputStream.readByte() DataInputStream.java:265
org.apache.kafka.common.utils.ByteUtils.readVarint(DataInput) ByteUtils.java:168
org.apache.kafka.common.record.DefaultRecord.readFrom(DataInput, long, long, int, Long) DefaultRecord.java:292
org.apache.kafka.common.record.DefaultRecordBatch$1.readNext(long, long, int, Long) DefaultRecordBatch.java:264
org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next() DefaultRecordBatch.java:563
org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next() DefaultRecordBatch.java:532
org.apache.kafka.common.record.DefaultRecordBatch.iterator() DefaultRecordBatch.java:327
scala.collection.convert.Wrappers$JIterableWrapper.iterator() Wrappers.scala:54
scala.collection.IterableLike$class.foreach(IterableLike, Function1) IterableLike.scala:72
scala.collection.AbstractIterable.foreach(Function1) Iterable.scala:54
kafka.log.LogValidator$$anonfun$validateMessagesAndAssignOffsetsCompressed$1.apply(MutableRecordBatch) LogValidator.scala:267
kafka.log.LogValidator$$anonfun$validateMessagesAndAssignOffsetsCompressed$1.apply(Object) LogValidator.scala:259
scala.collection.Iterator$class.foreach(Iterator, Function1) Iterator.scala:891
scala.collection.AbstractIterator.foreach(Function1) Iterator.scala:1334
scala.collection.IterableLike$class.foreach(IterableLike, Function1) IterableLike.scala:72
scala.collection.AbstractIterable.foreach(Function1) Iterable.scala:54
kafka.log.LogValidator$.validateMessagesAndAssignOffsetsCompressed(MemoryRecords, LongRef, Time, long, CompressionCodec, CompressionCodec, boolean, byte, TimestampType, long, int, boolean) LogValidator.scala:259
kafka.log.LogValidator$.validateMessagesAndAssignOffsets(MemoryRecords, LongRef, Time, long, CompressionCodec, CompressionCodec, boolean, byte, TimestampType, long, int, boolean) LogValidator.scala:70
kafka.log.Log$$anonfun$append$2.liftedTree1$1(LogAppendInfo, ObjectRef, LongRef, long) Log.scala:771
kafka.log.Log$$anonfun$append$2.apply() Log.scala:770
kafka.log.Log$$anonfun$append$2.apply() Log.scala:752
kafka.log.Log.maybeHandleIOException(Function0, Function0) Log.scala:1842
kafka.log.Log.append(MemoryRecords, boolean, boolean, int) Log.scala:752
kafka.log.Log.appendAsLeader(MemoryRecords, int, boolean) Log.scala:722
kafka.cluster.Partition$$anonfun$13.apply() Partition.scala:660
kafka.cluster.Partition$$anonfun$13.apply() Partition.scala:648
kafka.utils.CoreUtils$.inLock(Lock, Function0) CoreUtils.scala:251
kafka.utils.CoreUtils$.inReadLock(ReadWriteLock, Function0) CoreUtils.scala:257
kafka.cluster.Partition.appendRecordsToLeader(MemoryRecords, boolean, int) Partition.scala:647
kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(Tuple2) ReplicaManager.scala:745
kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(Object) ReplicaManager.scala:733
scala.collection.TraversableLike$$anonfun$map$1.apply(Object) TraversableLike.scala:234
scala.collection.TraversableLike$$anonfun$map$1.apply(Object) TraversableLike.scala:234
[jira] [Created] (KAFKA-8501) Remove key and value from exception message
Badai Aqrandista created KAFKA-8501:
---
Summary: Remove key and value from exception message
Key: KAFKA-8501
URL: https://issues.apache.org/jira/browse/KAFKA-8501
Project: Kafka
Issue Type: Bug
Reporter: Badai Aqrandista

KAFKA-7510 moved the WARN messages that contain the key and value to TRACE level, but the exceptions still contain the key and value. These are the two in RecordCollectorImpl:
[https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L185-L196]
[https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L243-L254]
Can these be modified as well to remove the key and value from the exception message, since we do not know at what log level it will eventually be printed?
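As an illustration of the requested change, an exception message can report payload sizes instead of contents. The helper below is a hypothetical sketch (class, method, and message format are made up for illustration, not the actual RecordCollectorImpl code):

```java
public class SafeExceptionMessage {
    // Report key/value sizes instead of their contents, so payloads never leak
    // into exception messages regardless of the log level they end up at.
    static String describeFailedSend(String topic, byte[] key, byte[] value, String cause) {
        return String.format(
                "Failed to send record to topic %s (key size: %d bytes, value size: %d bytes): %s",
                topic,
                key == null ? 0 : key.length,
                value == null ? 0 : value.length,
                cause);
    }

    public static void main(String[] args) {
        System.out.println(describeFailedSend("payments", new byte[]{1, 2}, new byte[]{1, 2, 3}, "timeout"));
        // Failed to send record to topic payments (key size: 2 bytes, value size: 3 bytes): timeout
    }
}
```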
[jira] [Created] (KAFKA-7760) Add broker configuration to set minimum value for segment.bytes and segment.ms
Badai Aqrandista created KAFKA-7760:
---
Summary: Add broker configuration to set minimum value for segment.bytes and segment.ms
Key: KAFKA-7760
URL: https://issues.apache.org/jira/browse/KAFKA-7760
Project: Kafka
Issue Type: Improvement
Reporter: Badai Aqrandista

If someone sets segment.bytes or segment.ms at the topic level to a very small value (e.g. segment.bytes=1000 or segment.ms=1000), Kafka will generate a very high number of segment files. This can bring down the whole broker by hitting the maximum number of open files (for logs) or the maximum number of mmap-ed files (for indexes).

To prevent that from happening, I would like to suggest adding two new items to the broker configuration:
* min.topic.segment.bytes, defaults to 1048576: The minimum value for segment.bytes. When someone sets topic configuration segment.bytes to a value lower than this, Kafka throws an INVALID VALUE error.
* min.topic.segment.ms, defaults to 360: The minimum value for segment.ms. When someone sets topic configuration segment.ms to a value lower than this, Kafka throws an INVALID VALUE error.

Thanks
Badai
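The proposed broker-side check could look roughly like the sketch below. Class name, method, and the minimum values are assumptions for illustration only, not actual Kafka code:

```java
public class SegmentConfigValidator {
    private final long minSegmentBytes;
    private final long minSegmentMs;

    public SegmentConfigValidator(long minSegmentBytes, long minSegmentMs) {
        this.minSegmentBytes = minSegmentBytes;
        this.minSegmentMs = minSegmentMs;
    }

    // Reject topic-level overrides below the broker-wide minimums.
    public void validate(long segmentBytes, long segmentMs) {
        if (segmentBytes < minSegmentBytes)
            throw new IllegalArgumentException("INVALID VALUE: segment.bytes=" + segmentBytes
                    + " is below the broker minimum " + minSegmentBytes);
        if (segmentMs < minSegmentMs)
            throw new IllegalArgumentException("INVALID VALUE: segment.ms=" + segmentMs
                    + " is below the broker minimum " + minSegmentMs);
    }

    public static void main(String[] args) {
        SegmentConfigValidator v = new SegmentConfigValidator(1048576L, 3600000L);
        v.validate(1048576L, 3600000L); // exactly at the minimums: accepted
        try {
            v.validate(1000L, 3600000L); // segment.bytes too small: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```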
[jira] [Created] (KAFKA-7754) zookeeper-security-migration.sh sets the root ZNode as world-readable
Badai Aqrandista created KAFKA-7754:
---
Summary: zookeeper-security-migration.sh sets the root ZNode as world-readable
Key: KAFKA-7754
URL: https://issues.apache.org/jira/browse/KAFKA-7754
Project: Kafka
Issue Type: Bug
Components: security
Affects Versions: 2.0.1
Reporter: Badai Aqrandista

If I start the broker with {{zookeeper.set.acl=true}} from the very first start, the root ZNode is not made world-readable (which would allow other applications to share the ZooKeeper ensemble with a chroot). But if I run {{zookeeper-security-migration.sh}} with {{--zookeeper.acl secure}}, the root ZNode becomes world-readable. Is this correct?

{noformat}
root@localhost:/# zookeeper-shell localhost:2181
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2181(CONNECTING) 0]
WATCHER::
WatchedEvent state:SyncConnected type:None path:null

WATCHER::
WatchedEvent state:SaslAuthenticated type:None path:null

[zk: localhost:2181(CONNECTED) 0] getAcl /
'world,'anyone
: cdrwa
[zk: localhost:2181(CONNECTED) 1] getAcl /brokers
'world,'anyone
: r
'sasl,'kafkabroker
: cdrwa
[zk: localhost:2181(CONNECTED) 2] quit
Quitting...
root@localhost:/# zookeeper-security-migration --zookeeper.acl secure --zookeeper.connect localhost:2181
root@localhost:/# zookeeper-shell localhost:2181
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2181(CONNECTING) 0]
WATCHER::
WatchedEvent state:SyncConnected type:None path:null

WATCHER::
WatchedEvent state:SaslAuthenticated type:None path:null

[zk: localhost:2181(CONNECTED) 0] getAcl /
'world,'anyone
: r
'sasl,'kafkabroker
: cdrwa
[zk: localhost:2181(CONNECTED) 1] getAcl /brokers
'world,'anyone
: r
'sasl,'kafkabroker
: cdrwa
[zk: localhost:2181(CONNECTED) 2]
{noformat}
[jira] [Created] (KAFKA-7467) NoSuchElementException is raised because controlBatch is empty
Badai Aqrandista created KAFKA-7467:
---
Summary: NoSuchElementException is raised because controlBatch is empty
Key: KAFKA-7467
URL: https://issues.apache.org/jira/browse/KAFKA-7467
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 1.1.0
Reporter: Badai Aqrandista

The log cleaner died with a NoSuchElementException when it called onControlBatchRead:

{noformat}
[2018-09-25 14:18:31,088] INFO Cleaner 0: Cleaning segment 0 in log __consumer_offsets-45 (largest timestamp Fri Apr 27 16:12:39 CDT 2018) into 0, discarding deletes. (kafka.log.LogCleaner)
[2018-09-25 14:18:31,092] ERROR [kafka-log-cleaner-thread-0]: Error due to (kafka.log.LogCleaner)
java.util.NoSuchElementException
	at java.util.Collections$EmptyIterator.next(Collections.java:4189)
	at kafka.log.CleanedTransactionMetadata.onControlBatchRead(LogCleaner.scala:945)
	at kafka.log.Cleaner.kafka$log$Cleaner$$shouldDiscardBatch(LogCleaner.scala:636)
	at kafka.log.Cleaner$$anon$5.checkBatchRetention(LogCleaner.scala:573)
	at org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:157)
	at org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:138)
	at kafka.log.Cleaner.cleanInto(LogCleaner.scala:604)
	at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:518)
	at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:462)
	at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:461)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at kafka.log.Cleaner.doClean(LogCleaner.scala:461)
	at kafka.log.Cleaner.clean(LogCleaner.scala:438)
	at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:305)
	at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:291)
	at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2018-09-25 14:18:31,093] INFO [kafka-log-cleaner-thread-0]: Stopped (kafka.log.LogCleaner)
{noformat}

The following code does not seem to expect the controlBatch to be empty:
https://github.com/apache/kafka/blob/1.1/core/src/main/scala/kafka/log/LogCleaner.scala#L946
{noformat}
def onControlBatchRead(controlBatch: RecordBatch): Boolean = {
  consumeAbortedTxnsUpTo(controlBatch.lastOffset)
  val controlRecord = controlBatch.iterator.next()
  val controlType = ControlRecordType.parse(controlRecord.key)
  val producerId = controlBatch.producerId
{noformat}
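The failure mode is an unconditional next() on a possibly empty iterator. A sketch of the kind of guard that is missing (illustrative only, in Java rather than the Scala above, and not the actual Kafka fix):

```java
import java.util.Collections;
import java.util.Iterator;

public class ControlBatchGuard {
    // Check hasNext() before calling next(), since a cleaned control batch
    // can contain no records. Return value stands in for "retain this batch".
    static boolean onControlBatchRead(Iterator<byte[]> records) {
        if (!records.hasNext()) {
            return true; // empty control batch: nothing to parse, skip it safely
        }
        byte[] controlRecord = records.next();
        return controlRecord.length > 0; // placeholder for the real parsing logic
    }

    public static void main(String[] args) {
        // With the guard, an empty iterator no longer throws NoSuchElementException.
        System.out.println(onControlBatchRead(Collections.<byte[]>emptyIterator())); // true
    }
}
```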
[jira] [Created] (KAFKA-7379) send.buffer.bytes should be allowed to set -1 in KafkaStreams
Badai Aqrandista created KAFKA-7379:
---
Summary: send.buffer.bytes should be allowed to set -1 in KafkaStreams
Key: KAFKA-7379
URL: https://issues.apache.org/jira/browse/KAFKA-7379
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 2.0.0
Reporter: Badai Aqrandista

send.buffer.bytes and receive.buffer.bytes are declared with an atLeast(0) constraint in StreamsConfig, whereas -1 should also be an allowed value. This is similar to KAFKA-6891.
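The requested constraint amounts to accepting -1 as a sentinel for the OS default buffer size, in addition to values >= 0. A hypothetical sketch of such a check (not the actual StreamsConfig code):

```java
public class BufferSizeCheck {
    // Looser constraint the ticket asks for: allow -1 (use the OS default)
    // plus any value >= 0, instead of atLeast(0).
    static void validateBufferBytes(String name, int value) {
        if (value < -1)
            throw new IllegalArgumentException(name + " must be >= -1, got " + value);
    }

    public static void main(String[] args) {
        validateBufferBytes("send.buffer.bytes", -1);         // accepted: OS default
        validateBufferBytes("receive.buffer.bytes", 131072);  // accepted
        System.out.println("ok");
    }
}
```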
[jira] [Created] (KAFKA-6939) Change the default of log.message.timestamp.difference.max.ms to 500 years
Badai Aqrandista created KAFKA-6939:
---
Summary: Change the default of log.message.timestamp.difference.max.ms to 500 years
Key: KAFKA-6939
URL: https://issues.apache.org/jira/browse/KAFKA-6939
Project: Kafka
Issue Type: Improvement
Reporter: Badai Aqrandista

If a producer incorrectly provides timestamps in microseconds (not in milliseconds), the records are accepted by default and can cause the broker to roll the segment files continuously. And on a heavily used broker, this generates a lot of index files, which then causes the broker to hit `vm.max_map_count`.

So I'd like to suggest changing the default of log.message.timestamp.difference.max.ms to 15768000000000 (500 years * 365 days * 86400 seconds * 1000). This would reject microsecond timestamps from producers while still allowing most historical data to be stored in Kafka.
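The arithmetic behind the proposed default can be checked directly. The sketch below also shows why a timestamp mistakenly given in microseconds falls far outside a 500-year window (class name and the sample timestamp are illustrative):

```java
public class TimestampBound {
    static long maxDiffMs() {
        // 500 years expressed in milliseconds: 500 * 365 * 86400 * 1000
        return 500L * 365 * 86400 * 1000;
    }

    public static void main(String[] args) {
        System.out.println(maxDiffMs()); // 15768000000000

        long nowMs = 1_526_000_000_000L; // a plausible millisecond timestamp
        long microTs = nowMs * 1000;     // the same instant mistakenly sent in microseconds
        // The microsecond value differs from "now" by far more than 500 years,
        // so the proposed default would reject it.
        System.out.println(microTs - nowMs > maxDiffMs()); // true
    }
}
```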
[jira] [Created] (KAFKA-6756) client.id and group.id validation in the old vs new consumer
Badai Aqrandista created KAFKA-6756:
---
Summary: client.id and group.id validation in the old vs new consumer
Key: KAFKA-6756
URL: https://issues.apache.org/jira/browse/KAFKA-6756
Project: Kafka
Issue Type: Bug
Components: clients
Affects Versions: 1.1.0, 0.10.2.1
Reporter: Badai Aqrandista

It looks like the old consumer, based on the Scala code, validates "client.id" and "group.id" using this code:
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/common/Config.scala#L25]
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/consumer/ConsumerConfig.scala#L60]

However, the new consumer uses the Java code, which does not validate "client.id" and "group.id" at all:
[https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java#L264]
[https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java#L298]

So the new consumer does not validate "client.id" and "group.id" the way the old consumer did. Either way, the documentation never specifies the valid characters for "client.id" and "group.id". Is this a bug or by design?
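For illustration, the old Scala-side validation restricts ids to a small character set. The sketch below approximates that check in Java (the exact pattern is an assumption based on kafka.common.Config, not copied from it):

```java
import java.util.regex.Pattern;

public class ClientIdValidation {
    // Approximation of the legal-character check the old consumer applied
    // to client.id/group.id: letters, digits, '.', '_' and '-' only.
    private static final Pattern LEGAL = Pattern.compile("[a-zA-Z0-9._-]*");

    static boolean isValid(String id) {
        return LEGAL.matcher(id).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("my-group.1")); // true
        System.out.println(isValid("my group!"));  // false: space and '!' are rejected
    }
}
```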