[GitHub] [kafka] wenzelcheng closed pull request #11360: Floating point number comparison problem

2021-10-03 Thread GitBox


wenzelcheng closed pull request #11360:
URL: https://github.com/apache/kafka/pull/11360
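
The PR title refers to the classic pitfall of comparing floating-point values with `==`. As a hedged illustration only (not the PR's actual code), the usual remedy in Java is an epsilon comparison:

```java
public class FloatCompare {
    // Compare two doubles with an absolute tolerance instead of ==,
    // since binary floating point cannot represent values like 0.1 exactly.
    static boolean nearlyEqual(double a, double b, double epsilon) {
        return Math.abs(a - b) <= epsilon;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);                  // false: accumulated rounding error
        System.out.println(nearlyEqual(sum, 0.3, 1e-9)); // true
    }
}
```

The tolerance must be chosen per use case; a fixed epsilon is a sketch, not a universal answer.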


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] lbradstreet opened a new pull request #11376: KAFKA-13342: LeaderAndIsrRequest should not be sent for topic queued for deletion

2021-10-03 Thread GitBox


lbradstreet opened a new pull request #11376:
URL: https://github.com/apache/kafka/pull/11376


   In some cases a broker may be lost during a topic deletion, before
   its replica has moved to the OfflineReplica state. When the broker comes
   back up, the controller will send it a LeaderAndIsrRequest containing the
   partition even though the topic is already in the deleting state in the controller.
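
The fix idea described above can be sketched roughly as follows. This is a hypothetical Java illustration with made-up names (the real controller logic is in Scala and tracks deletion state differently), not the PR's actual code: partitions whose topic is queued for deletion are filtered out before the LeaderAndIsrRequest is built.

```java
import java.util.*;
import java.util.stream.*;

public class LisrFilter {
    // Hypothetical sketch: before sending a LeaderAndIsrRequest to a broker
    // that just came back, drop partitions whose topic is queued for deletion,
    // so the broker does not recreate partition directories locally.
    static List<String> partitionsToSend(List<String> partitions,
                                         Set<String> topicsQueuedForDeletion) {
        return partitions.stream()
                .filter(tp -> !topicsQueuedForDeletion.contains(topic(tp)))
                .collect(Collectors.toList());
    }

    static String topic(String topicPartition) {
        // "orders-3" -> "orders": strip the trailing partition index
        return topicPartition.substring(0, topicPartition.lastIndexOf('-'));
    }

    public static void main(String[] args) {
        List<String> toSend = partitionsToSend(
                Arrays.asList("orders-0", "orders-1", "audit-0"),
                new HashSet<>(Collections.singleton("orders")));
        System.out.println(toSend); // [audit-0]
    }
}
```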






[jira] [Commented] (KAFKA-13331) Slow reassignments in 2.8 because of large number of UpdateMetadataResponseReceived(UpdateMetadataResponseData(errorCode=0),) Events

2021-10-03 Thread GEORGE LI (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423761#comment-17423761
 ] 

GEORGE LI commented on KAFKA-13331:
---

Besides the slowness of the reassignments, when the EventQueue is long, with a lot 
of UpdateMetadataResponseReceived events, the cluster is also not behaving 
correctly (slow responses / timeouts on some RPC calls). e.g. using {{kafka-topics.sh 
--bootstrap-server}} will time out like below, because it puts 
{{listPartitionReassignments}} into the EventQueue. The old way, 
{{kafka-topics.sh --zookeeper}}, returned fast. 

{code}
$ /usr/lib/kafka/bin/kafka-topics.sh --bootstrap-server :9092 
--topic  --describe
Error while executing topic command : Call(callName=listPartitionReassignments, 
deadlineMs=1632339660352, tries=1, nextAllowedTryMs=1632339660520) timed out at 
1632339660420 after 1 attempt(s)
[2021-09-22 19:41:00,425] ERROR 
org.apache.kafka.common.errors.TimeoutException: 
Call(callName=listPartitionReassignments, deadlineMs=1632339660352, tries=1, 
nextAllowedTryMs=1632339660520) timed out at 1632339660420 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.DisconnectException: Cancelled 
listPartitionReassignments request with correlation id 9 due to node 10043 
being disconnected
 (kafka.admin.TopicCommand$)
{code}

Any RPC call put into the EventQueue will be affected during the 
reassignments (batch size 50+) in our environment. 



> Slow reassignments in 2.8 because of large number of  
> UpdateMetadataResponseReceived(UpdateMetadataResponseData(errorCode=0),)
>  Events
> 
>
> Key: KAFKA-13331
> URL: https://issues.apache.org/jira/browse/KAFKA-13331
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, controller, core
>Affects Versions: 2.8.1, 3.0.0
>Reporter: GEORGE LI
>Priority: Critical
> Fix For: 3.1.0
>
> Attachments: Screen Shot 2021-09-28 at 12.57.34 PM.png
>
>
> Slowness is observed when doing reassignments on clusters with more brokers 
> (e.g. 80 brokers). 
> After investigation, it looks like the slowness is because, for 
> reassignments, the controller sends an UpdateMetadataRequest to all the 
> brokers for every topic partition affected by the reassignment (maybe some 
> optimization can be done). e.g. 
> for a reassignment with a batch size of 50 partitions, it will generate about 
> 10k - 20k ControllerEventManager.EventQueueSize and the p99 EventQueueTimeMs 
> will be 1M. If the batch size is 100 partitions, about 40K 
> EventQueueSize and 3M p99 EventQueueTimeMs. See the screen shot below on the 
> metrics. 
> !Screen Shot 2021-09-28 at 12.57.34 PM.png! 
> It takes about 10-30 minutes to process 100 reassignments per batch, and 
> 20-30 seconds for 1 reassignment per batch, even when the topic partition is 
> almost empty. In version 1.1, the reassignment was almost instant. 
> Looking at what is in the ControllerEventManager.EventQueue, the majority 
> (depending on how many brokers are in the cluster, it can be 90%+) is 
> {{UpdateMetadataResponseReceived(UpdateMetadataResponseData(errorCode=0),)}} 
> events, which were introduced in this commit: 
> {code}
> commit 4e431246c31170a7f632da8edfdb9cf4f882f6ef
> Author: Jason Gustafson 
> Date:   Thu Nov 21 07:41:29 2019 -0800
> MINOR: Controller should log UpdateMetadata response errors (#7717)
> 
> Create a controller event for handling UpdateMetadata responses and log a 
> message when a response contains an error.
> 
> Reviewers: Stanislav Kozlovski , Ismael 
> Juma 
> {code}
> Checking how the events in the ControllerEventManager are processed for the 
> {{UpdateMetadata response}}, it seems like it's only checking whether there 
> is an error and simply logging it, but it takes about 3ms - 60ms to 
> dequeue each event. Because it's a FIFO queue, other events are left waiting 
> in the queue. 
> {code}
> private def processUpdateMetadataResponseReceived(updateMetadataResponse: UpdateMetadataResponse, brokerId: Int): Unit = {
>   if (!isActive) return
>   if (updateMetadataResponse.error != Errors.NONE) {
>     stateChangeLogger.error(s"Received error ${updateMetadataResponse.error} in UpdateMetadata " +
>       s"response $updateMetadataResponse from broker $brokerId")
>   }
> }
> {code}
> There might be more sophisticated logic for handling the UpdateMetadata 
> response error in the future. For the current version, would it be better to 
> check whether the response error code is Errors.NONE before putting it into 
> the EventQueue? e.g. I added this additional check and saw the reassignment 
> performance increase dramatically on the 80-broker cluster. 
> {code}
>  val up
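
The check proposed above (skip enqueueing error-free UpdateMetadata responses) can be sketched as follows. This is a hypothetical Java illustration with made-up names, not the actual Kafka controller code, which is in Scala:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class UpdateMetadataEnqueue {
    // Hypothetical sketch of the suggested optimization: only enqueue an
    // UpdateMetadataResponseReceived event when the response actually carries
    // an error; error-free responses (the common case) are dropped before
    // they can flood the controller's FIFO event queue.
    static final short NONE = 0;

    static void onUpdateMetadataResponse(short errorCode, int brokerId, Queue<String> eventQueue) {
        if (errorCode != NONE) {
            eventQueue.add("UpdateMetadataResponseReceived(broker=" + brokerId
                    + ", error=" + errorCode + ")");
        }
        // else: nothing to log, so no event is queued
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        onUpdateMetadataResponse(NONE, 1, queue);       // common case: dropped
        onUpdateMetadataResponse((short) 41, 2, queue); // error: queued
        System.out.println(queue.size()); // 1
    }
}
```

The trade-off is that the check now runs on the response-handling path rather than in the single-threaded event loop, which is exactly why the queue stops growing.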

[jira] [Commented] (KAFKA-13342) LISR sent for topic queued for deletion in controller

2021-10-03 Thread Lucas Bradstreet (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423756#comment-17423756
 ] 

Lucas Bradstreet commented on KAFKA-13342:
--

Under some scenarios I think this could cause partition dirs to be recreated.

> LISR sent for topic queued for deletion in controller
> -
>
> Key: KAFKA-13342
> URL: https://issues.apache.org/jira/browse/KAFKA-13342
> Project: Kafka
>  Issue Type: Bug
>Reporter: Lucas Bradstreet
>Priority: Minor
>
> Under certain conditions in some system tests a broker will be hard killed 
> during a topic deletion and before its replica has moved to OfflineReplica 
> state. When the broker comes back up the controller will send it a 
> LeaderAndIsrRequest containing the partition causing it to recreate the 
> partition locally even though it is in deleting state in the controller.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-13342) LISR sent for topic queued for deletion in controller

2021-10-03 Thread Lucas Bradstreet (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Bradstreet updated KAFKA-13342:
-
Priority: Minor  (was: Major)

> LISR sent for topic queued for deletion in controller
> -
>
> Key: KAFKA-13342
> URL: https://issues.apache.org/jira/browse/KAFKA-13342
> Project: Kafka
>  Issue Type: Bug
>Reporter: Lucas Bradstreet
>Priority: Minor
>
> Under certain conditions in some system tests a broker will be hard killed 
> during a topic deletion and before its replica has moved to OfflineReplica 
> state. When the broker comes back up the controller will send it a 
> LeaderAndIsrRequest containing the partition causing it to recreate the 
> partition locally even though it is in deleting state in the controller.





[jira] [Created] (KAFKA-13342) LISR sent for topic queued for deletion in controller

2021-10-03 Thread Lucas Bradstreet (Jira)
Lucas Bradstreet created KAFKA-13342:


 Summary: LISR sent for topic queued for deletion in controller
 Key: KAFKA-13342
 URL: https://issues.apache.org/jira/browse/KAFKA-13342
 Project: Kafka
  Issue Type: Bug
Reporter: Lucas Bradstreet


Under certain conditions in some system tests a broker will be hard killed 
during a topic deletion and before its replica has moved to OfflineReplica 
state. When the broker comes back up the controller will send it a 
LeaderAndIsrRequest containing the partition causing it to recreate the 
partition locally even though it is in deleting state in the controller.





[jira] [Created] (KAFKA-13341) Quotas are not applied to requests with null clientId

2021-10-03 Thread David Mao (Jira)
David Mao created KAFKA-13341:
-

 Summary: Quotas are not applied to requests with null clientId
 Key: KAFKA-13341
 URL: https://issues.apache.org/jira/browse/KAFKA-13341
 Project: Kafka
  Issue Type: Bug
Reporter: David Mao


ClientQuotaManager.DefaultQuotaCallback will not check for the existence of a 
default quota if a request's clientId is null. This results in null clientIds 
bypassing quotas.

Null clientIds are permitted in the protocol, so this seems like a bug.

This looks like it may be a regression introduced by 
https://github.com/apache/kafka/pull/7372
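
A fix in the direction the report implies can be sketched as follows. This is a hypothetical Java illustration with made-up names, not Kafka's actual ClientQuotaManager API: the point is that a null clientId is normalized before the lookup so it still hits the default quota.

```java
import java.util.HashMap;
import java.util.Map;

public class QuotaLookup {
    // Hypothetical sketch: normalize a null clientId to the empty string
    // before the quota lookup, so such requests still fall back to the
    // default client quota instead of bypassing throttling entirely.
    static Double clientQuota(String clientId, Map<String, Double> overrides, Double defaultQuota) {
        String key = (clientId == null) ? "" : clientId; // null no longer short-circuits
        Double override = overrides.get(key);
        return (override != null) ? override : defaultQuota;
    }

    public static void main(String[] args) {
        Map<String, Double> overrides = new HashMap<>();
        overrides.put("analytics", 5.0);
        System.out.println(clientQuota(null, overrides, 10.0));        // default applies: 10.0
        System.out.println(clientQuota("analytics", overrides, 10.0)); // override applies: 5.0
    }
}
```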





[jira] [Updated] (KAFKA-13341) Quotas are not applied to requests with null clientId

2021-10-03 Thread David Mao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mao updated KAFKA-13341:
--
Priority: Major  (was: Minor)

> Quotas are not applied to requests with null clientId
> -
>
> Key: KAFKA-13341
> URL: https://issues.apache.org/jira/browse/KAFKA-13341
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Mao
>Priority: Major
>
> ClientQuotaManager.DefaultQuotaCallback will not check for the existence of a 
> default quota if a request's clientId is null. This results in null clientIds 
> bypassing quotas.
> Null clientIds are permitted in the protocol, so this seems like a bug.
> This looks like it may be a regression introduced by 
> https://github.com/apache/kafka/pull/7372





[GitHub] [kafka] satishd commented on pull request #11058: KAFKA-12802 Added a file based cache for consumed remote log metadata for each partition to avoid consuming again incase of broker restarts.

2021-10-03 Thread GitBox


satishd commented on pull request #11058:
URL: https://github.com/apache/kafka/pull/11058#issuecomment-933099724


   Thanks @junrao for the review comments. Addressed them with the latest 
commit and replies. 
   
   Added a scenario to verify checkpointed offsets in the 
[test](https://github.com/apache/kafka/pull/11058/commits/554df8c5e58f5dc14b5d1a3476f011184116a088#diff-8c57d1a1451531841bccd4de7f38b838cfa8444e0257c5097480e92e3e0fe72bR145).






[GitHub] [kafka] upsidedownsmile opened a new pull request #11375: KAFKA-10865: Log transformed record in WorkerSinkTask

2021-10-03 Thread GitBox


upsidedownsmile opened a new pull request #11375:
URL: https://github.com/apache/kafka/pull/11375


   This PR adds a log message with the **topic**, **key** and **value** of the 
transformed record in **WorkerSinkTask**, similar to what was already being done 
in **WorkerSourceTask**.
   
   Also fixed a small typo in the javadocs where there was a double period.
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   





