Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1145

2022-08-15 Thread Apache Jenkins Server
See 




[GitHub] [kafka-site] jsancio merged pull request #438: MINOR; Add new signing key for jsan...@apache.org

2022-08-15 Thread GitBox


jsancio merged PR #438:
URL: https://github.com/apache/kafka-site/pull/438


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (KAFKA-14167) Unexpected UNKNOWN_SERVER_ERROR raised from kraft controller

2022-08-15 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14167:
---

 Summary: Unexpected UNKNOWN_SERVER_ERROR raised from kraft 
controller
 Key: KAFKA-14167
 URL: https://issues.apache.org/jira/browse/KAFKA-14167
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


In `ControllerApis`, we have callbacks such as the following after completion:
{code:java}
    controller.allocateProducerIds(context, allocatedProducerIdsRequest.data)
      .handle[Unit] { (results, exception) =>
        if (exception != null) {
          requestHelper.handleError(request, exception)
        } else {
          requestHelper.sendResponseMaybeThrottle(request, requestThrottleMs => {
            results.setThrottleTimeMs(requestThrottleMs)
            new AllocateProducerIdsResponse(results)
          })
        }
      }
{code}
What I see locally is that the underlying exception that gets passed to 
`handle` always gets wrapped in a `CompletionException`. When passed to 
`getErrorResponse`, this error will get converted to `UNKNOWN_SERVER_ERROR`. 
For example, in this case, a `NOT_CONTROLLER` error returned from the 
controller would be returned as `UNKNOWN_SERVER_ERROR`. It looks like there are 
a few APIs that are potentially affected by this bug, such as `DeleteTopics` 
and `UpdateFeatures`.
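The wrapping described here is standard `CompletableFuture` semantics: dependent stages observe an upstream failure as a `CompletionException` wrapping the original cause, so error mapping must unwrap before matching on the exception type. A stdlib-only sketch (class and method names are illustrative, not Kafka code):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class CompletionWrappingDemo {

    // Unwrap a CompletionException to recover the original cause for error mapping.
    static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> source = new CompletableFuture<>();
        // A dependent stage, as created by chaining on the controller's future.
        CompletableFuture<Integer> dependent = source.thenApply(x -> x + 1);
        source.completeExceptionally(new IllegalStateException("NOT_CONTROLLER"));

        dependent.handle((result, exception) -> {
            // The dependent stage sees a CompletionException, not the original error.
            System.out.println(exception.getClass().getSimpleName());          // CompletionException
            System.out.println(unwrap(exception).getClass().getSimpleName()); // IllegalStateException
            return null;
        }).join();
    }
}
```

Without the unwrap step, the wrapper type matches no known error and falls through to the UNKNOWN_SERVER_ERROR default, which is the failure mode reported above.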



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14039) Fix KRaft AlterConfigPolicy usage

2022-08-15 Thread David Arthur (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Arthur resolved KAFKA-14039.
--
Resolution: Fixed

> Fix KRaft AlterConfigPolicy usage
> -
>
> Key: KAFKA-14039
> URL: https://issues.apache.org/jira/browse/KAFKA-14039
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> In ConfigurationControlManager, we are currently passing all the 
> configuration values known to the controller down into the AlterConfigPolicy. 
> This does not match the behavior in ZK mode where we only pass configs which 
> were included in the alter configs request.
> This can lead to unexpected behavior in custom AlterConfigPolicy 
> implementations.
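The fix amounts to filtering the controller's full config snapshot down to the keys present in the alter-configs request before invoking the policy. A stdlib-only sketch under that assumption (names are illustrative, not the actual Kafka classes):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class PolicyConfigFilter {

    // Keep only entries whose keys appear in the alter-configs request,
    // mirroring ZK-mode behavior where the policy sees just the requested configs.
    static Map<String, String> requestedOnly(Map<String, String> allConfigs,
                                             Set<String> requestedKeys) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : allConfigs.entrySet()) {
            if (requestedKeys.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> all = Map.of(
            "cleanup.policy", "compact",
            "retention.ms", "604800000",
            "segment.bytes", "1073741824");
        // Only "retention.ms" was in the request; the policy should see just that.
        System.out.println(requestedOnly(all, Set.of("retention.ms")));
        // prints {retention.ms=604800000}
    }
}
```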





Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #1144

2022-08-15 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-860: Add client-provided option to guard against unintentional replication factor change during partition reassignments

2022-08-15 Thread Stanislav Kozlovski
Thanks for the discussion all,

I have updated the KIP to mention throwing an UnsupportedVersionException
if the server is running an old version that would not honor the configured
allowReplicationFactorChange option.

Please take a look:
- KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-860%3A+Add+client-provided+option+to+guard+against+replication+factor+change+during+partition+reassignments
- changes:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=217392873=4=3

If there aren't extra comments, I plan on starting a vote thread by the end
of this week.

Best,
Stanislav
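The size-based validation discussed in the quoted thread can be sketched as follows; the exception type and method names here are hypothetical stand-ins, not the KIP's actual API. Note that the comparison uses the ongoing reassignment's target replica set, never the transient merged set:

```java
import java.util.List;

public class ReassignmentGuard {

    // Reject a new reassignment that would change the replication factor.
    // currentTargetReplicas is the target set of any ongoing reassignment
    // (e.g. [4,5,6]), not the transient merged set (e.g. [1,2,3,4,5,6]).
    static void validate(List<Integer> currentTargetReplicas,
                         List<Integer> newTargetReplicas,
                         boolean allowReplicationFactorChange) {
        if (!allowReplicationFactorChange
                && currentTargetReplicas.size() != newTargetReplicas.size()) {
            throw new IllegalArgumentException(
                "Replication factor would change from " + currentTargetReplicas.size()
                    + " to " + newTargetReplicas.size());
        }
    }

    public static void main(String[] args) {
        // Same size: accepted.
        validate(List.of(4, 5, 6), List.of(7, 8, 9), false);
        // Ongoing RF increase to 4 replicas vs. a new 3-replica target: rejected.
        try {
            validate(List.of(4, 5, 6, 7), List.of(9, 10, 11), false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```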

On Tue, Aug 9, 2022 at 5:06 AM David Jacot 
wrote:

> Throwing an UnsupportedVersionException with an appropriate message
> seems to be the best option when the new API is not supported and
> AllowReplicationFactorChange is not set to the default value.
>
> Cheers,
> David
>
> On Mon, Aug 8, 2022 at 6:25 PM Vikas Singh 
> wrote:
> >
> > I personally like the UVE option. It provides options for clients to go
> > either way, retry or abort. If we do it in AdminClient, then users have
> to
> > live with what we have chosen.
> >
> > > Note this can happen during an RF change too. e.g [1,2,3] => [4,5,6,7]
> (RF
> > > change, intermediate set is [1,2,3,4,5,6,7]), and we try to do a
> > > reassignment to [9,10,11], the logic will compare [4,5,6,7] to
> [9,10,11].
> > > In such a situation where one wants to cancel the RF increase and
> reassign
> > > again, one first needs to cancel the existing reassignment via the API
> (no
> > > special action required despite RF change)
> >
> > Thanks for the explanation. I did realize this nuance and thus requested
> to
> > put that in KIP as it's not mentioned why the choice was made. I am fine
> if
> > you choose to not do it in the interest of brevity.
> >
> > Vikas
> >
> > On Sun, Aug 7, 2022 at 9:02 AM Stanislav Kozlovski
> >  wrote:
> >
> > > Thank you for the reviews.
> > >
> > > Vikas,
> > > > > In the case of an already-reassigning partition being reassigned
> again,
> > > the validation compares the targetReplicaSet size of the reassignment
> to
> > > the targetReplicaSet size of the new reassignment and throws if those
> > > differ.
> > > > Can you add more detail to this, or clarify what is targetReplicaSet
> (for
> > > e.g. why not sourceReplicaSet?) and how the target replica set will be
> > > calculated?
> > > If a reassignment is ongoing, such that [1,2,3] => [4,5,6] (the
> replica set
> > > in Kafka will be [1,2,3,4,5,6] during the reassignment), and you try to
> > > issue a new reassignment (e.g [7,8,9], Kafka should NOT think that the
> RF
> > > of the partition is 6 just because a reassignment is ongoing. Hence, we
> > > compare [4,5,6]'s length to [7,8,9]
> > > The targetReplicaSet is a term we use in KIP-455
> > > <
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > >.
> > > It means the desired replica set that a given reassignment is trying to
> > > achieve. Here we compare said set of the on-going reassignment to the
> new
> > > reassignment.
> > >
> > > Note this can happen during an RF change too. e.g [1,2,3] => [4,5,6,7]
> (RF
> > > change, intermediate set is [1,2,3,4,5,6,7]), and we try to do a
> > > reassignment to [9,10,11], the logic will compare [4,5,6,7] to
> [9,10,11].
> > > In such a situation where one wants to cancel the RF increase and
> reassign
> > > again, one first needs to cancel the existing reassignment via the API
> (no
> > > special action required despite RF change)
> > >
> > >
> > > > And what about the reassign partitions CLI? Do we want to expose the
> > > option there too?
> > > Yes, this is already present in the KIP if I'm not mistaken. We
> describe it
> > > in "Accordingly, the kafka-reassign-partitions.sh tool will be updated
> to
> > > allow supplying the new option:"
> > > I have edited the KIP to contain two clear paragraphs called Admin API
> and
> > > CLI now.
> > >
> > > Colin,
> > >
> > > >  it would be nice for the first paragraph to be a bit more explicit
> about
> > > this goal.
> > > sounds good, updated it with that suggestion.
> > >
> > > > client-side forward compatibility
> > > I was under the assumption that it is not recommended to upgrade
> clients
> > > before brokers, but a quick search cleared it up to me that we're
> pretty
> > > intentional about allowing that
> > > <
> > >
> https://www.confluent.io/blog/upgrading-apache-kafka-clients-just-got-easier/
> > > >
> > > .
> > > Do you happen to know if we have any policy on client-side forward
> > > compatibility with regard to such things -- extending "write" APIs
> (that
> > > mutate the state) with fields that conditionally limit that
> modification?
> > > It seems like a rare use case to me, hence renaming it to something like
> > > tryDisableReplicationFactorChange may unnecessarily impair the API.
> > >
> > > Would Admin API documentation that says "this is 

[jira] [Resolved] (KAFKA-13809) FileStreamSinkConnector and FileStreamSourceConnector should propagate full configuration to tasks

2022-08-15 Thread Chris Egerton (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton resolved KAFKA-13809.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> FileStreamSinkConnector and FileStreamSourceConnector should propagate full 
> configuration to tasks
> --
>
> Key: KAFKA-13809
> URL: https://issues.apache.org/jira/browse/KAFKA-13809
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Daniel Urban
>Assignee: Yash Mayya
>Priority: Major
> Fix For: 3.4.0
>
>
> The 2 example connectors do not propagate the full connector configuration to 
> the tasks. This makes it impossible to override built-in configs, such as 
> producer/consumer overrides.
> This causes an issue even when used for testing purposes.
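The intended fix is for the connector's `taskConfigs` to hand each task a copy of the full connector configuration rather than a hand-picked subset. A stdlib-only sketch of the before/after shape (simplified; the real connectors live in Kafka's `connect/file` module, and the key names here are illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaskConfigPropagation {

    // Before: only explicitly known keys were copied, silently dropping
    // overrides such as producer.*/consumer.* client configs.
    static List<Map<String, String>> subsetTaskConfigs(Map<String, String> connectorConfig) {
        Map<String, String> taskConfig = new HashMap<>();
        if (connectorConfig.containsKey("file")) {
            taskConfig.put("file", connectorConfig.get("file"));
        }
        return List.of(taskConfig);
    }

    // After: propagate everything, so per-connector overrides reach the task.
    static List<Map<String, String>> fullTaskConfigs(Map<String, String> connectorConfig) {
        return List.of(new HashMap<>(connectorConfig));
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of(
            "file", "/tmp/out.txt",
            "producer.override.linger.ms", "5");
        // The override is lost with the subset approach but survives full propagation.
        System.out.println(subsetTaskConfigs(config).get(0).containsKey("producer.override.linger.ms")); // false
        System.out.println(fullTaskConfigs(config).get(0).containsKey("producer.override.linger.ms"));   // true
    }
}
```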





[jira] [Reopened] (KAFKA-13809) FileStreamSinkConnector and FileStreamSourceConnector should propagate full configuration to tasks

2022-08-15 Thread Chris Egerton (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton reopened KAFKA-13809:
---

> FileStreamSinkConnector and FileStreamSourceConnector should propagate full 
> configuration to tasks
> --
>
> Key: KAFKA-13809
> URL: https://issues.apache.org/jira/browse/KAFKA-13809
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Daniel Urban
>Assignee: Yash Mayya
>Priority: Major
>
> The 2 example connectors do not propagate the full connector configuration to 
> the tasks. This makes it impossible to override built-in configs, such as 
> producer/consumer overrides.
> This causes an issue even when used for testing purposes.





Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1143

2022-08-15 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 418909 lines...]
[2022-08-15T20:37:50.469Z] InternalTopicIntegrationTest > 
shouldCompactTopicsForKeyValueStoreChangelogs() PASSED
[2022-08-15T20:37:50.469Z] 
[2022-08-15T20:37:50.469Z] InternalTopicIntegrationTest > 
shouldGetToRunningWithWindowedTableInFKJ() STARTED
[2022-08-15T20:37:52.832Z] 
[2022-08-15T20:37:52.832Z] FineGrainedAutoResetIntegrationTest > 
shouldOnlyReadRecordsWhereEarliestSpecifiedWithInvalidCommittedOffsets() PASSED
[2022-08-15T20:37:52.832Z] 
[2022-08-15T20:37:52.832Z] FineGrainedAutoResetIntegrationTest > 
shouldOnlyReadRecordsWhereEarliestSpecifiedWithNoCommittedOffsetsWithDefaultGlobalAutoOffsetResetEarliest()
 STARTED
[2022-08-15T20:37:53.335Z] 
[2022-08-15T20:37:53.335Z] InternalTopicIntegrationTest > 
shouldGetToRunningWithWindowedTableInFKJ() PASSED
[2022-08-15T20:37:53.335Z] 
[2022-08-15T20:37:53.335Z] InternalTopicIntegrationTest > 
shouldCompactAndDeleteTopicsForWindowStoreChangelogs() STARTED
[2022-08-15T20:37:54.341Z] 
[2022-08-15T20:37:54.341Z] InternalTopicIntegrationTest > 
shouldCompactAndDeleteTopicsForWindowStoreChangelogs() PASSED
[2022-08-15T20:37:55.356Z] 
[2022-08-15T20:37:55.356Z] KStreamAggregationIntegrationTest > 
shouldAggregateSlidingWindows(TestInfo) STARTED
[2022-08-15T20:37:59.810Z] 
[2022-08-15T20:37:59.810Z] KStreamAggregationIntegrationTest > 
shouldAggregateSlidingWindows(TestInfo) PASSED
[2022-08-15T20:37:59.810Z] 
[2022-08-15T20:37:59.810Z] KStreamAggregationIntegrationTest > 
shouldReduceSessionWindows() STARTED
[2022-08-15T20:38:00.740Z] 
[2022-08-15T20:38:00.740Z] KStreamAggregationIntegrationTest > 
shouldReduceSessionWindows() PASSED
[2022-08-15T20:38:00.740Z] 
[2022-08-15T20:38:00.740Z] KStreamAggregationIntegrationTest > 
shouldReduceSlidingWindows(TestInfo) STARTED
[2022-08-15T20:38:03.776Z] 
[2022-08-15T20:38:03.776Z] KStreamAggregationIntegrationTest > 
shouldReduceSlidingWindows(TestInfo) PASSED
[2022-08-15T20:38:03.776Z] 
[2022-08-15T20:38:03.776Z] KStreamAggregationIntegrationTest > 
shouldReduce(TestInfo) STARTED
[2022-08-15T20:38:07.337Z] 
[2022-08-15T20:38:07.337Z] KStreamAggregationIntegrationTest > 
shouldReduce(TestInfo) PASSED
[2022-08-15T20:38:07.337Z] 
[2022-08-15T20:38:07.337Z] KStreamAggregationIntegrationTest > 
shouldAggregate(TestInfo) STARTED
[2022-08-15T20:38:10.656Z] 
[2022-08-15T20:38:10.656Z] KStreamAggregationIntegrationTest > 
shouldAggregate(TestInfo) PASSED
[2022-08-15T20:38:10.656Z] 
[2022-08-15T20:38:10.656Z] KStreamAggregationIntegrationTest > 
shouldCount(TestInfo) STARTED
[2022-08-15T20:38:14.600Z] 
[2022-08-15T20:38:14.600Z] KStreamAggregationIntegrationTest > 
shouldCount(TestInfo) PASSED
[2022-08-15T20:38:14.600Z] 
[2022-08-15T20:38:14.600Z] KStreamAggregationIntegrationTest > 
shouldGroupByKey(TestInfo) STARTED
[2022-08-15T20:38:17.767Z] 
[2022-08-15T20:38:17.767Z] KStreamAggregationIntegrationTest > 
shouldGroupByKey(TestInfo) PASSED
[2022-08-15T20:38:17.767Z] 
[2022-08-15T20:38:17.767Z] KStreamAggregationIntegrationTest > 
shouldCountWithInternalStore(TestInfo) STARTED
[2022-08-15T20:38:21.675Z] 
[2022-08-15T20:38:21.675Z] KStreamAggregationIntegrationTest > 
shouldCountWithInternalStore(TestInfo) PASSED
[2022-08-15T20:38:21.675Z] 
[2022-08-15T20:38:21.675Z] KStreamAggregationIntegrationTest > 
shouldCountUnlimitedWindows() STARTED
[2022-08-15T20:38:22.605Z] 
[2022-08-15T20:38:22.605Z] KStreamAggregationIntegrationTest > 
shouldCountUnlimitedWindows() PASSED
[2022-08-15T20:38:22.605Z] 
[2022-08-15T20:38:22.605Z] KStreamAggregationIntegrationTest > 
shouldReduceWindowed(TestInfo) STARTED
[2022-08-15T20:38:25.594Z] 
[2022-08-15T20:38:25.594Z] KStreamAggregationIntegrationTest > 
shouldReduceWindowed(TestInfo) PASSED
[2022-08-15T20:38:25.594Z] 
[2022-08-15T20:38:25.594Z] KStreamAggregationIntegrationTest > 
shouldCountSessionWindows() STARTED
[2022-08-15T20:38:27.340Z] 
[2022-08-15T20:38:27.340Z] KStreamAggregationIntegrationTest > 
shouldCountSessionWindows() PASSED
[2022-08-15T20:38:27.340Z] 
[2022-08-15T20:38:27.340Z] KStreamAggregationIntegrationTest > 
shouldAggregateWindowed(TestInfo) STARTED
[2022-08-15T20:38:30.161Z] 
[2022-08-15T20:38:30.161Z] KStreamAggregationIntegrationTest > 
shouldAggregateWindowed(TestInfo) PASSED
[2022-08-15T20:38:34.064Z] 
[2022-08-15T20:38:34.064Z] FineGrainedAutoResetIntegrationTest > 
shouldOnlyReadRecordsWhereEarliestSpecifiedWithNoCommittedOffsetsWithDefaultGlobalAutoOffsetResetEarliest()
 PASSED
[2022-08-15T20:38:34.064Z] 
[2022-08-15T20:38:34.064Z] FineGrainedAutoResetIntegrationTest > 
shouldThrowStreamsExceptionNoResetSpecified() STARTED
[2022-08-15T20:38:34.064Z] 
[2022-08-15T20:38:34.064Z] FineGrainedAutoResetIntegrationTest > 
shouldThrowStreamsExceptionNoResetSpecified() PASSED
[2022-08-15T20:38:36.138Z] 
[2022-08-15T20:38:36.138Z] GlobalKTableIntegrationTest > 

Re: Last sprint to finish line: Replace EasyMock/Powermock with Mockito

2022-08-15 Thread Christo
 Hello!
Following Divij's example I wanted to give an update on the progress being made.
With a combined effort from Yash Mayya, Matthew de Detrich and myself we are 
3/4 through providing pull requests for moving the remaining EasyMock tests to 
Mockito (https://issues.apache.org/jira/browse/KAFKA-14133).
Across those and other pull requests we have received insightful feedback from 
Bruno Cadonna, Chris Egerton, Dalibor Plavcic and Ismael Juma on how things can 
be improved. Thank you very much!
Current pull requests we are trying to get to a resolution from oldest to 
newest are:
* https://github.com/apache/kafka/pull/12409
* https://github.com/apache/kafka/pull/12418
* https://github.com/apache/kafka/pull/12459
* https://github.com/apache/kafka/pull/12465
* https://github.com/apache/kafka/pull/12473
* https://github.com/apache/kafka/pull/12492
* https://github.com/apache/kafka/pull/12505
* https://github.com/apache/kafka/pull/12509
Best,
Christo

On Thursday, 4 August 2022, 18:27:17 BST, Divij Vaidya 
 wrote:  
 
 Hi everyone

To provide you with quick updates on the progress.

Open PRs (pending review):

  1. Streams - https://github.com/apache/kafka/pull/12449
  2. Streams - https://github.com/apache/kafka/pull/12465
  3. Streams - https://github.com/apache/kafka/pull/12459
  4. Connect - https://github.com/apache/kafka/pull/12484
  5. Connect - https://github.com/apache/kafka/pull/12473
  6. Connect - https://github.com/apache/kafka/pull/12409
  7. Connect - https://github.com/apache/kafka/pull/12472

Open tasks (pending an owners):

  1. https://issues.apache.org/jira/browse/KAFKA-14132 (need owners for
  separate individual tests)
  2. https://issues.apache.org/jira/browse/KAFKA-14133


General guidance to reduce code review churn when working on these test
conversions:

  1. Please use @RunWith(MockitoJUnitRunner.StrictStubs.class) since it
  provides many benefits.
  2. Please do not perform JUnit 5 migration in the same PR as Mockito
  conversion to keep the changes few and easy to review. We will follow up
  with a blanket JUnit5 conversion when Mockito migration is complete.
  3. Please use @Mock annotation to mock (Chris Egerton has added this
  comment on various PRs, hence calling it out)
  4. Note that @RunWith(MockitoJUnitRunner.StrictStubs.class) verifies the
  invocation of declared stubs automatically. If the stubs are not invoked,
  the test throws an UnnecessaryStubbingException. Note that this doesn't seem
  to work for `mockStatic`, so I would suggest explicitly verifying stub
  invocations there.
  5. As a reference, you can use the merged PR from Chris Egerton here:
  https://github.com/apache/kafka/pull/12409
  6. Add a verification step in the description that the test has
  successfully run with the command `./gradlew connect:runtime:unitTest` (or
  equivalent for the module you are changing the test for). Additionally, you
  can add the code coverage report using `./gradlew streams:reportCoverage
  -PenableTestCoverage=true -Dorg.gradle.parallel=false` to verify that no
  test assertion has been accidentally removed during the change.


*Chris*, would you like to add anything else to the general guidance above
which would help reduce the code review churn?

--
Divij Vaidya



On Mon, Aug 1, 2022 at 6:49 PM Divij Vaidya  wrote:

> Hi folks
>
> We have been trying to replace EasyMock/Powermock with Mockito
>  for quite a while.
> This adds complications for migrating to JDK 17 & Junit5. Significant
> contributions have been made by various folks towards this goal and the
> finish line is almost in sight.
>
> Let's join forces this week and get the task done!
>
> I and Christo(cc'ed) will be spending time converting the straggler tests
> during this week.
>
> At this stage, we are missing a shepherd to help us wrap up this task. *Could
> we please solicit some code review bandwidth from a committer for this week
> to help us reach the finish line?*
>
> Current pending PR requests:
> 1. KAFKA-13036: Replace EasyMock and PowerMock with Mockito for 
> RocksDBMetricsRecorderTest by divijvaidya · Pull Request #12459 · apache/kafka

> 2. KAFKA-12950: Replace EasyMock and PowerMock with Mockito for 
> KafkaStreamsTest by divijvaidya · Pull Request #12465 · apache/kafka
> 3. KAFKA-13414: Replace PowerMock/EasyMock with Mockito in 
> connect.storage.KafkaOffsetBackingStoreTest by clolov · Pull Request #12418 · 
> apache/kafka
>
> Regards,
> Divij Vaidya
>
>
  

[jira] [Resolved] (KAFKA-14154) Persistent URP after controller soft failure

2022-08-15 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14154.
-
Resolution: Fixed

> Persistent URP after controller soft failure
> 
>
> Key: KAFKA-14154
> URL: https://issues.apache.org/jira/browse/KAFKA-14154
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 3.3.0
>
>
> We ran into a scenario where a partition leader was unable to expand the ISR 
> after a soft controller failover. Here is what happened:
> Initial state: leader=1, isr=[1,2], leader epoch=10. Broker 1 is acting as 
> the current controller.
> 1. Broker 1 loses its session in Zookeeper.  
> 2. Broker 2 becomes the new controller.
> 3. During initialization, controller 2 removes 1 from the ISR. So state is 
> updated: leader=2, isr=[2], leader epoch=11.
> 4. Broker 2 receives `LeaderAndIsr` from the new controller with leader 
> epoch=11.
> 5. Broker 2 immediately tries to add replica 1 back to the ISR since it is 
> still fetching and is caught up. However, the 
> `BrokerToControllerChannelManager` is still pointed at controller 1, so that 
> is where the `AlterPartition` is sent.
> 6. Controller 1 does not yet realize that it is not the controller, so it 
> processes the `AlterPartition` request. It sees the leader epoch of 11, which 
> is higher than what it has in its own context. Following changes to the 
> `AlterPartition` validation in 
> [https://github.com/apache/kafka/pull/12032/files], the controller returns 
> FENCED_LEADER_EPOCH.
> 7. After receiving the FENCED_LEADER_EPOCH from the old controller, the 
> leader is stuck because it assumes that the error implies that another 
> LeaderAndIsr request should be sent.
> Prior to 
> [https://github.com/apache/kafka/pull/12032/files],
>  the way we handled this case was a little different. We only verified that 
> the leader epoch in the request was at least as large as the current epoch in 
> the controller context. Anything higher was accepted. The controller would 
> have attempted to write the updated state to Zookeeper. This update would 
> have failed because of the controller epoch check, however, we would have 
> returned NOT_CONTROLLER in this case, which is handled in 
> `AlterPartitionManager`.
> It is tempting to revert the logic, but the risk is in the idempotency check: 
> [https://github.com/apache/kafka/pull/12032/files#diff-3e042c962e80577a4cc9bbcccf0950651c6b312097a86164af50003c00c50d37L2369].
>  If the AlterPartition request happened to match the state inside the old 
> controller, the controller would consider the update successful and return no 
> error. But if its state was already stale at that point, then that might 
> cause the leader to incorrectly assume that the state had been updated.
> One way to fix this problem without weakening the validation is to rely on 
> the controller epoch in `AlterPartitionManager`. When we discover a new 
> controller, we also discover its epoch, so we can pass that through. The 
> `LeaderAndIsr` request already includes the controller epoch of the 
> controller that sent it and we already propagate this through to 
> `AlterPartition.submit`. Hence all we need to do is verify that the epoch of 
> the current controller target is at least as large as the one discovered 
> through the `LeaderAndIsr`.
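The final guard described above reduces to an epoch comparison before `AlterPartition` is sent. A minimal sketch (method and variable names are hypothetical, not Kafka's actual API):

```java
public class ControllerEpochGuard {

    // Only send AlterPartition to a controller whose epoch is at least as large
    // as the controller epoch learned from the LeaderAndIsr request; a smaller
    // epoch means the cached target is stale and we should wait for rediscovery.
    static boolean targetIsCurrent(int targetControllerEpoch, int leaderAndIsrControllerEpoch) {
        return targetControllerEpoch >= leaderAndIsrControllerEpoch;
    }

    public static void main(String[] args) {
        System.out.println(targetIsCurrent(12, 11)); // true: safe to send
        System.out.println(targetIsCurrent(10, 11)); // false: stale controller target
    }
}
```

In the scenario above, the channel manager still points at controller 1 (epoch 10) while the `LeaderAndIsr` carried epoch 11, so the check would prevent the request from reaching the stale controller at all.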





Consumer Lag-Apache_kafka_JMX metrics

2022-08-15 Thread Kafka Life
Dear Kafka Experts
we need to monitor the consumer lag in kafka clusters 2.5.1 and 2.8.0
versions of kafka in Grafana.

1/ What is the correct JMX metric path to evaluate consumer lag in a
kafka cluster?

2/ I had thought it is FetcherLag, but it looks like it is not, as per the
link below.
https://www.instaclustr.com/support/documentation/kafka/monitoring-information/fetcher-lag-metrics/#:~:text=Aggregated%20Fetcher%20Consumer%20Lag%20This%20metric%20aggregates%20lag,in%20sync%20with%20partitions%20that%20it%20is%20replicating
.

Could one of you experts please guide on which JMX metric I should use for
consumer lag, apart from Kafka Burrow or similar intermediate tools?

Thanking you in advance
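Whatever the metric source, per-partition consumer lag is just the difference between the partition's log end offset and the group's committed offset. A stdlib sketch of the arithmetic (offset values are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class ConsumerLag {

    // lag = log end offset - committed offset, floored at zero; a partition with
    // no committed offset is treated as starting from 0 here for simplicity.
    static Map<String, Long> lagPerPartition(Map<String, Long> endOffsets,
                                             Map<String, Long> committedOffsets) {
        Map<String, Long> lag = new HashMap<>();
        for (Map.Entry<String, Long> e : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag.put(e.getKey(), Math.max(0L, e.getValue() - committed));
        }
        return lag;
    }

    public static void main(String[] args) {
        Map<String, Long> end = Map.of("topic-0", 1050L, "topic-1", 400L);
        Map<String, Long> committed = Map.of("topic-0", 1000L, "topic-1", 400L);
        Map<String, Long> lag = lagPerPartition(end, committed);
        System.out.println(lag.get("topic-0")); // 50
        System.out.println(lag.get("topic-1")); // 0
    }
}
```

Note that FetcherLag, as the linked page explains, measures replica fetcher lag on the broker side, which is a different quantity.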


[jira] [Created] (KAFKA-14166) Consistent toString implementations for byte arrays in generated messages

2022-08-15 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14166:
---

 Summary: Consistent toString implementations for byte arrays in 
generated messages
 Key: KAFKA-14166
 URL: https://issues.apache.org/jira/browse/KAFKA-14166
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson


In the generated `toString()` implementations for message objects (such as 
protocol RPCs), we are a little inconsistent in how we display types with raw 
bytes. If the type is `Array[Byte]`, then we use Arrays.toString. If the type 
is `ByteBuffer` (i.e. when `zeroCopy` is set), then we use the corresponding 
`ByteBuffer.toString`, which is not often useful. We should try to be 
consistent. By default, it is probably not useful to print the full array 
contents, but we might print summary information (e.g. size, checksum).
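One consistent option, sketched here with stdlib classes only (the class and helper names are illustrative, not the generator's actual output), is to render raw bytes as a size-plus-checksum summary regardless of whether the field is `byte[]` or `ByteBuffer`:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class BytesSummary {

    // Summarize raw bytes as "bytes(size=N, crc32=...)" instead of dumping contents.
    static String summarize(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return "bytes(size=" + data.length + ", crc32=" + Long.toHexString(crc.getValue()) + ")";
    }

    static String summarize(ByteBuffer buffer) {
        // Work on a duplicate so the caller's position is untouched.
        ByteBuffer dup = buffer.duplicate();
        byte[] data = new byte[dup.remaining()];
        dup.get(data);
        return summarize(data);
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4};
        System.out.println(summarize(payload));
        System.out.println(summarize(ByteBuffer.wrap(payload))); // same summary for both types
    }
}
```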





[GitHub] [kafka-site] jsancio opened a new pull request, #438: MINOR; Add new signing key for jsan...@apache.org

2022-08-15 Thread GitBox


jsancio opened a new pull request, #438:
URL: https://github.com/apache/kafka-site/pull/438

   This key can also be found on keys.openpgp.org





Re: [DISCUSS] KIP-844: Transactional State Stores

2022-08-15 Thread Alexander Sorokoumov
Hey Guozhang,

Thank you for elaborating! I like your idea to introduce a StreamsConfig
specifically for the default store APIs. You mentioned Materialized, but I
think changes in StreamJoined follow the same logic.

I updated the KIP and the prototype according to your suggestions:
* Add a new StoreType and a StreamsConfig for transactional RocksDB.
* Decide whether Materialized/StreamJoined are transactional based on the
configured StoreType.
* Move RocksDBTransactionalMechanism to
org.apache.kafka.streams.state.internals to remove it from the proposal
scope.
* Add a flag in new Stores methods to configure a state store as
transactional. Transactional state stores use the default transactional
mechanism.
* The changes above allowed to remove all changes to the StoreSupplier
interface.

I am not sure about marking StateStore#transactional() as evolving. As long
as we allow custom user implementations of that interface, we should
probably either keep that flag to distinguish between transactional and
non-transactional implementations or change the contract behind the
interface. What do you think?

Best,
Alex

On Thu, Aug 11, 2022 at 1:00 AM Guozhang Wang  wrote:

> Hello Alex,
>
> Thanks for the replies. Regarding the global config v.s. per-store spec, I
> agree with John's early comments to some degrees, but I think we may well
> distinguish a couple scenarios here. In sum we are discussing about the
> following levels of per-store spec:
>
> * Materialized#transactional()
> * StoreSupplier#transactional()
> * StateStore#transactional()
> * Stores.persistentTransactionalKeyValueStore()...
>
> And my thoughts are the following:
>
> * In the current proposal users could specify transactional as either
> "Materialized.as("storeName").withTransantionsEnabled()" or
> "Materialized.as(Stores.persistentTransactionalKeyValueStore(..))", which
> seems not necessary to me. In general, the more options the library
> provides, the messier for users to learn the new APIs.
>
> * When using built-in stores, users would usually go with
> Materialized.as("storeName"). In such cases I feel it's not very meaningful
> to specify "some of the built-in stores to be transactional, while others
> be non transactional": as long as one of your stores are non-transactional,
> you'd still pay for large restoration cost upon unclean failure. People
> may, indeed, want to specify if different transactional mechanisms to be
> used across stores; but for whether or not the stores should be
> transactional, I feel it's really an "all or none" answer, and our built-in
> form (rocksDB) should support transactionality for all store types.
>
> * When using customized stores, users would usually go with
> Materialized.as(StoreSupplier). And it's possible if users would choose
> some to be transactional while others non-transactional (e.g. if their
> customized store only supports transactional for some store types, but not
> others).
>
> * At a per-store level, the library do not really care, or need to know
> whether that store is transactional or not at runtime, except for
> compatibility reasons today we want to make sure the written checkpoint
> files do not include those non-transactional stores. But this check would
> eventually go away as one day we would always checkpoint files.
>
> ---
>
> With all of that in mind, my gut feeling is that:
>
> * Materialized#transactional(): we would not need this knob, since for
> built-in stores I think just a global config should be sufficient (see
> below), while for customized store users would need to specify that via the
> StoreSupplier anyways and not through this API. Hence I think for either
> case we do not need to expose such a knob on the Materialized level.
>
> * Stores.persistentTransactionalKeyValueStore(): I think we could refactor
> that function without introducing new constructors in the Stores factory,
> but just add new overloads to the existing func name e.g.
>
> ```
> persistentKeyValueStore(final String name, final boolean transactional)
> ```
>
> Plus we can augment the storeImplType as introduced in
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-591%3A+Add+Kafka+Streams+config+to+set+default+state+store
> as a syntax sugar for users, e.g.
>
> ```
> public enum StoreImplType {
> ROCKS_DB,
> TXN_ROCKS_DB,
> IN_MEMORY
>   }
> ```
>
> ```
> stream.groupByKey().count(Materialized.withStoreType(StoreImplType.TXN_ROCKS_DB));
> ```
>
> The above provides this global config at the store impl type level.
>
> * RocksDBTransactionalMechanism: I agree with Bruno that we would better
> not expose this knob to users, but rather keep it purely as an impl detail
> abstracted from the "TXN_ROCKS_DB" type. Over time we may, e.g. use
> in-memory stores as the secondary stores with optional spill-to-disks when
> we hit the memory limit, but all of that optimizations in the future should
> be kept away from the users.
>
> * 

[jira] [Resolved] (KAFKA-14051) KRaft remote controllers do not create metrics reporters

2022-08-15 Thread Ron Dagostino (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ron Dagostino resolved KAFKA-14051.
---
Resolution: Fixed

> KRaft remote controllers do not create metrics reporters
> 
>
> Key: KAFKA-14051
> URL: https://issues.apache.org/jira/browse/KAFKA-14051
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3
>Reporter: Ron Dagostino
>Assignee: Ron Dagostino
>Priority: Major
>
> KRaft remote controllers (KRaft nodes with the configuration value 
> process.roles=controller) do not create the configured metrics reporters 
> defined by the configuration key metric.reporters.  The reason is that 
> KRaft remote controllers are not wired up for dynamic config changes, and the 
> creation of the configured metrics reporters actually happens while the 
> broker is being wired up for dynamic reconfiguration, in the invocation of 
> DynamicBrokerConfig.addReconfigurables(KafkaBroker).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1142

2022-08-15 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-13559) The broker's ProduceResponse may be delayed for 300ms

2022-08-15 Thread Rajini Sivaram (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-13559.

Fix Version/s: 3.4.0
 Reviewer: Rajini Sivaram
   Resolution: Fixed

> The broker's  ProduceResponse may be delayed for 300ms
> --
>
> Key: KAFKA-13559
> URL: https://issues.apache.org/jira/browse/KAFKA-13559
> Project: Kafka
>  Issue Type: Task
>  Components: core
>Affects Versions: 2.7.0
>Reporter: frankshi
>Assignee: Badai Aqrandista
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: image-1.png, image-2.png, 
> image-2021-12-21-14-44-56-689.png, image-2021-12-21-14-45-22-716.png, 
> image-3.png, image-5.png, image-6.png, image-7.png, image.png
>
>
> Hi team:
> We have found the value in the source code 
> [here|https://github.com/apache/kafka/blob/2.7/core/src/main/scala/kafka/network/SocketServer.scala#L922]
>  may lead broker’s  ProduceResponse to be delayed for 300ms.
>  * Server-version: 2.13-2.7.0.
>  * Client-version: confluent-kafka-python-1.5.0.
> We have set the client's configuration as follows:
> {code:java}
> linger.ms = 0
> acks = 1
> delivery.timeout.ms = 100
> request.timeout.ms = 80
> sasl.mechanism = "PLAIN"
> security.protocol = "SASL_SSL"
> ..
> {code}
> Because we set acks=1, the client sends ProduceRequests and receives 
> ProduceResponses from brokers. The leader broker doesn't need to wait for the 
> other ISR replicas to write the data to disk; it can reply to the client by 
> sending a ProduceResponse directly. In our setup, the ping time between 
> the client and the Kafka brokers is about ~10ms, and most of the time the 
> responses are received about 10ms after the ProduceRequests are sent. But 
> sometimes the responses are received about ~300ms later.
> The following shows the log from the client.
> {code:java}
> 2021-11-26 02:31:30,567  Sent partial ProduceRequest (v7, 0+16527/37366 
> bytes, CorrId 2753)
> 2021-11-26 02:31:30,568  Sent partial ProduceRequest (v7, 16527+16384/37366 
> bytes, CorrId 2753)
> 2021-11-26 02:31:30,568  Sent ProduceRequest (v7, 37366 bytes @ 32911, CorrId 
> 2753)
> 2021-11-26 02:31:30,570  Sent ProduceRequest (v7, 4714 bytes @ 0, CorrId 2754)
> 2021-11-26 02:31:30,571  Sent ProduceRequest (v7, 1161 bytes @ 0, CorrId 2755)
> 2021-11-26 02:31:30,572  Sent ProduceRequest (v7, 1240 bytes @ 0, CorrId 2756)
> 2021-11-26 02:31:30,572  Received ProduceResponse (v7, 69 bytes, CorrId 2751, 
> rtt 9.79ms)
> 2021-11-26 02:31:30,572  Received ProduceResponse (v7, 69 bytes, CorrId 2752, 
> rtt 10.34ms)
> 2021-11-26 02:31:30,573  Received ProduceResponse (v7, 69 bytes, CorrId 2753, 
> rtt 10.11ms)
> 2021-11-26 02:31:30,872  Received ProduceResponse (v7, 69 bytes, CorrId 2754, 
> rtt 309.69ms)
> 2021-11-26 02:31:30,883  Sent ProduceRequest (v7, 1818 bytes @ 0, CorrId 2757)
> 2021-11-26 02:31:30,887  Sent ProduceRequest (v7, 1655 bytes @ 0, CorrId 2758)
> 2021-11-26 02:31:30,888  Received ProduceResponse (v7, 69 bytes, CorrId 2755, 
> rtt 318.85ms)
> 2021-11-26 02:31:30,893  Sent partial ProduceRequest (v7, 0+16527/37562 
> bytes, CorrId 2759)
> 2021-11-26 02:31:30,894  Sent partial ProduceRequest (v7, 16527+16384/37562 
> bytes, CorrId 2759)
> 2021-11-26 02:31:30,895  Sent ProduceRequest (v7, 37562 bytes @ 32911, CorrId 
> 2759)
> 2021-11-26 02:31:30,896  Sent ProduceRequest (v7, 4700 bytes @ 0, CorrId 2760)
> 2021-11-26 02:31:30,897  Received ProduceResponse (v7, 69 bytes, CorrId 2756, 
> rtt 317.74ms)
> 2021-11-26 02:31:30,897  Received ProduceResponse (v7, 69 bytes, CorrId 2757, 
> rtt 4.22ms)
> 2021-11-26 02:31:30,899  Received ProduceResponse (v7, 69 bytes, CorrId 2758, 
> rtt 2.61ms){code}
>  
> The requests with CorrId 2753 and 2754 are sent almost at the same time, but 
> the response for 2754 is delayed by ~300ms. 
> We checked the logs on the broker.
>  
> {code:java}
> [2021-11-26 02:31:30,873] DEBUG Completed 
> request:RequestHeader(apiKey=PRODUCE, apiVersion=7, clientId=rdkafka, 
> correlationId=2754) – {acks=1,timeout=80,numPartitions=1},response: 
> {responses=[\{topic=***,partition_responses=[{partition=32,error_code=0,base_offset=58625,log_append_time=-1,log_start_offset=49773}]}
> ],throttle_time_ms=0} from connection 
> 10.10.44.59:9093-10.10.0.68:31183-66;totalTime:0.852,requestQueueTime:0.128,localTime:0.427,remoteTime:0.09,throttleTime:0,responseQueueTime:0.073,sendTime:0.131,securityProtocol:SASL_SSL,principal:User:***,listener:SASL_SSL,clientInformation:ClientInformation(softwareName=confluent-kafka-python,
>  softwareVersion=1.5.0-rdkafka-1.5.2) (kafka.request.logger)
> {code}
>  
>  
> It seems that the time spent on the server side is very small. What's the 
> reason for the latency spikes?
> We also did tcpdump  at the server side and
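For context on where a flat ~300 ms delay can come from: the linked SocketServer code polls with a 300 ms timeout, so a response enqueued just after a poll begins can sit until the timeout expires if the network thread is never woken up. This is a minimal java.nio sketch of that mechanism, not Kafka code; the class name and timings are illustrative, and whether this is the actual root cause of KAFKA-13559 is the question the ticket resolves:

```java
import java.nio.channels.Selector;

public class PollDelayDemo {
    // How long select(timeoutMs) blocks when nothing becomes ready and
    // nobody calls wakeup(): essentially the full timeout.
    static long blockedSelectMillis(long timeoutMs) throws Exception {
        try (Selector selector = Selector.open()) {
            long start = System.nanoTime();
            selector.select(timeoutMs); // no channels registered, no wakeup
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    // Same poll, but a second thread calls wakeup() after ~10 ms: the
    // select() call returns almost immediately instead of waiting 300 ms.
    static long wokenSelectMillis(long timeoutMs) throws Exception {
        try (Selector selector = Selector.open()) {
            Thread waker = new Thread(() -> {
                try { Thread.sleep(10); } catch (InterruptedException ignored) {}
                selector.wakeup(); // interrupts the blocking select()
            });
            long start = System.nanoTime();
            waker.start();
            selector.select(timeoutMs);
            waker.join();
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("no wakeup:   ~" + blockedSelectMillis(300) + " ms");
        System.out.println("with wakeup: ~" + wokenSelectMillis(300) + " ms");
    }
}
```

The ~310-320 ms round-trip times in the client log above are consistent with a fast response (a few ms of server time, per the request log) plus one full 300 ms poll interval spent waiting for the network thread to notice it.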