Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #82

2021-04-29 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck

2021-04-29 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-12734:


 Summary: LazyTimeIndex & LazyOffsetIndex may cause 
niobufferoverflow when skip activeSegment  sanityCheck
 Key: KAFKA-12734
 URL: https://issues.apache.org/jira/browse/KAFKA-12734
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 2.8.0, 2.7.0, 2.6.0, 2.5.0, 2.4.0
Reporter: Wenbing Shen
 Attachments: LoadIndex.png, niobufferoverflow.png

This issue is similar to KAFKA-9156.

We introduced lazy indexes (LazyTimeIndex and LazyOffsetIndex), which let us 
skip checking the index files of all log segments when starting Kafka; this 
greatly improved the speed of our Kafka startup.

Unfortunately, this also skips the index file check for the active segment, 
even though the active segment still receives write requests from clients and 
from the replica synchronization threads.

There is a case where we skip the index check for all segments, no unflushed 
log segments need to be recovered, and yet the index file of the last 
(active) segment is corrupted. When data is then appended to the active 
segment, the program reports an error.

Below is the problem I encountered in our production environment:

When Kafka started loading log segments, the program log showed that the 
memory-map position of the time index and offset index files was near the end 
of each file, even though the files did not actually contain that many index 
entries. My guess is that this can happen during the Kafka startup process: 
if the Kafka process is stopped while startup is still in progress and then 
started again, the limit of the index file's memory map remains the 
preallocated maximum file size and is not trimmed back to the size actually 
used, so the memory-map position is set to that limit when Kafka starts.
At this time, appending data to the active segment causes an NIO buffer 
overflow.
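
As a minimal sketch of this failure mode in plain java.nio (not Kafka's 
actual index code; the file name and sizes are made up), mapping a 
pre-allocated file and restoring the position to the untrimmed limit makes 
the next write overflow:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapOverflowDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("demo.index", "rw");
             FileChannel channel = file.getChannel()) {
            // Index files are pre-allocated to their maximum size and memory-mapped.
            MappedByteBuffer mmap = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            // If the map was never trimmed back to the bytes actually used,
            // the position can be restored to the limit on the next start...
            mmap.position(mmap.limit());
            // ...so appending the next index entry overflows:
            mmap.putLong(42L); // throws java.nio.BufferOverflowException
        }
    }
}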

I agree with skipping the index check for all inactive segments, because they 
will in fact no longer receive write requests, but for the active segment we 
still need to perform the index file check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-690 Add additional configuration to control MirrorMaker 2 internal topics naming convention

2021-04-29 Thread Ryanne Dolan
Thanks Omnia. lgtm!

Ryanne

On Thu, Apr 29, 2021 at 10:50 AM Omnia Ibrahim 
wrote:

> I updated the KIP
>
> On Thu, Apr 29, 2021 at 4:43 PM Omnia Ibrahim 
> wrote:
>
>> Sure, this would make it easier. We can make these functions return the
>> original behaviour (.checkpoints.internal,
>> "mm2-offset-syncs..internal", heartbeat) without any
>> customisation using `replication.policy.separator`, and use the separator in
>> the DefaultReplicationPolicy.
>>
>> On Wed, Apr 28, 2021 at 1:31 AM Ryanne Dolan 
>> wrote:
>>
>>> Thanks Omnia, makes sense to me.
>>>
>>> > Customers who have their own customised ReplicationPolicy will need to add
>>> the definition of their internal topic naming convention
>>>
>>> I wonder whether we should include default impls in the interface to avoid
>>> that requirement?
>>>
>>> Ryanne
>>>
>>> On Sun, Apr 25, 2021, 2:20 PM Omnia Ibrahim 
>>> wrote:
>>>
 Hi Mickael and Ryanne,
 I updated the KIP to add these methods to the ReplicationPolicy instead
 of an extra interface to simplify the changes. Please have a look and let
 me know your thoughts.

 Thanks

 On Tue, Mar 30, 2021 at 7:19 PM Omnia Ibrahim 
 wrote:

> *(sorry, forgot to Reply All)*
> Hi Ryanne,
> It's a valid concern. I was trying to keep the concerns of internal and
> replicated policy separate from each other and to keep the code readable,
> as extending ReplicationPolicy to manage both internal and replicated
> topics is a bit odd. I'm not against simplifying things and having
> ReplicationPolicy handle both; at the end of the day, if an MM2 user has a
> special naming convention for topics, it will affect both replicated and
> MM2 internal topics.
>
> To simplify things, we can extend `ReplicationPolicy` to the following
> instead of adding an extra class:
>
>> public interface ReplicationPolicy {
>>     String topicSource(String topic);
>>     String upstreamTopic(String topic);
>>
>>     /** Returns heartbeats topic name. */
>>     String heartbeatsTopic();
>>
>>     /** Returns the offset-syncs topic for given cluster alias. */
>>     String offsetSyncTopic(String targetAlias);
>>
>>     /** Returns the name of the checkpoint topic for given cluster alias. */
>>     String checkpointTopic(String sourceAlias);
>>
>>     default String originalTopic(String topic) {
>>         String upstream = upstreamTopic(topic);
>>         if (upstream == null) {
>>             return topic;
>>         } else {
>>             return originalTopic(upstream);
>>         }
>>     }
>>
>>     /** Internal topics are never replicated. */
>>     boolean isInternalTopic(String topic);
>>     // the implementation will be moved to `DefaultReplicationPolicy` to
>>     // handle both Kafka internal topics and MM2 internal topics.
>> }
>>
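
For illustration only, a DefaultReplicationPolicy-style implementation of the
proposed methods could reproduce the original names using
`replication.policy.separator`, roughly like this (a sketch based on the
thread above, not the KIP's final code; the class name and config handling
are assumed):

import java.util.Map;

public class SketchReplicationPolicy {
    private String separator = ".";  // default replication.policy.separator

    public void configure(Map<String, ?> props) {
        Object sep = props.get("replication.policy.separator");
        if (sep != null) {
            separator = (String) sep;
        }
    }

    public String heartbeatsTopic() {
        return "heartbeats";
    }

    // e.g. "mm2-offset-syncs.target.internal"
    public String offsetSyncTopic(String targetAlias) {
        return "mm2-offset-syncs" + separator + targetAlias + separator + "internal";
    }

    // e.g. "source.checkpoints.internal"
    public String checkpointTopic(String sourceAlias) {
        return sourceAlias + separator + "checkpoints" + separator + "internal";
    }

    // Treat Kafka internal topics and the MM2 topics above as internal.
    public boolean isInternalTopic(String topic) {
        return topic.startsWith("__")                   // e.g. __consumer_offsets
            || topic.endsWith(separator + "internal")   // checkpoint/offset-sync topics
            || topic.equals(heartbeatsTopic());
    }
}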
>
> On Fri, Mar 26, 2021 at 3:05 PM Ryanne Dolan 
> wrote:
>
>> Omnia, have we considered just adding methods to ReplicationPolicy?
>> I'm reluctant to add a new class because, as Mickael points out, we'd 
>> need
>> to carry it around in client code.
>>
>> Ryanne
>>
>> On Fri, Feb 19, 2021 at 8:31 AM Mickael Maison <
>> mickael.mai...@gmail.com> wrote:
>>
>>> Hi Omnia,
>>>
>>> Thanks for the clarifications.
>>>
>>> - I'm still a bit uneasy with the overlap between these 2 methods as
>>> currently `ReplicationPolicy.isInternalTopic` already handles MM2
>>> internal topics. Should we make it only handle Kafka internal topics
>>> and `isMM2InternalTopic()` only handle MM2 topics?
>>>
>>> - I'm not sure I understand what this method is used for. There are
>>> no
>>> such methods for the other 2 topics (offset-sync and heartbeat). Also
>>> what happens if there are other MM2 instances using different naming
>>> schemes in the same cluster. Do all instances have to know about the
>>> other naming schemes? What are the expected issues if they don't?
>>>
>>> - RemoteClusterUtils is a client-side utility so it does not have
>>> access to the MM2 configuration. Since this new API can affect the
>>> name of the checkpoint topic, it will need to be used client-side too
>>> so users can find the checkpoint topic name. I hadn't realized this
>>> was the case.
>>>
>>> Thanks
>>>
>>> On Mon, Feb 15, 2021 at 9:33 PM Omnia Ibrahim <
>>> o.g.h.ibra...@gmail.com> wrote:
>>> >
>>> > Hi Mickael, did you have some time to check my answer?
>>> >
>>> > On Thu, Jan 21, 2021 at 10:10 PM Omnia Ibrahim <
>>> o.g.h.ibra...@gmail.com> wrote:
>>> >>
>>> >> Hi Mickael,
>>> >> Thanks for taking another look at the KIP; regarding your
>>> >> questions:
>>> >>
>>> >> - I believe we need both "isMM2InternalTopic" and
>>> 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #81

2021-04-29 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #80

2021-04-29 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-6409) LogRecoveryTest (testHWCheckpointWithFailuresSingleLogSegment) is flaky

2021-04-29 Thread A. Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Sophie Blee-Goldman resolved KAFKA-6409.
---
Resolution: Duplicate

> LogRecoveryTest (testHWCheckpointWithFailuresSingleLogSegment) is flaky
> ---
>
> Key: KAFKA-6409
> URL: https://issues.apache.org/jira/browse/KAFKA-6409
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Wladimir Schmidt
>Priority: Major
>
> In LogRecoveryTest, the test named
> testHWCheckpointWithFailuresSingleLogSegment is flaky: sometimes it passes,
> sometimes it does not.
> Scala 2.12, JDK 9
> java.lang.AssertionError: Timing out after 3 ms since a new leader that 
> is different from 1 was not elected for partition new-topic-0, leader is 
> Some(1)
>   at kafka.utils.TestUtils$.fail(TestUtils.scala:351)
>   at 
> kafka.utils.TestUtils$.$anonfun$waitUntilLeaderIsElectedOrChanged$8(TestUtils.scala:828)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> kafka.utils.TestUtils$.waitUntilLeaderIsElectedOrChanged(TestUtils.scala:818)
>   at 
> kafka.server.LogRecoveryTest.testHWCheckpointWithFailuresSingleLogSegment(LogRecoveryTest.scala:152)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at jdk.internal.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy1.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:108)
>   at jdk.internal.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>   at 
> 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #79

2021-04-29 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-12733) KRaft: always bump leader epoch when changing the ISR for a controlled shutdown

2021-04-29 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-12733:


 Summary: KRaft: always bump leader epoch when changing the ISR for 
a controlled shutdown
 Key: KAFKA-12733
 URL: https://issues.apache.org/jira/browse/KAFKA-12733
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-735: Increase default consumer session timeout

2021-04-29 Thread Jason Gustafson
Hey All,

Thanks everyone for the votes.

I had some offline discussion with Magnus Edenhill and he brought up a
potential problem with the change in behavior for group.(min|max).
session.timeout.ms. Unlike the Java consumer, the librdkafka consumer
proactively revokes partitions after the session timeout expires locally
with no response from the coordinator. That means that the automatic
adjustment of the session timeout would not be respected by the client.
What is worse, after the consumer expires its local session timeout and
revokes partitions, it rejoins as a new member. However, the rebalance
would not be able to complete until the old member had expired, so the net
effect would be that rebalances get unexpectedly delayed by the adjusted
session timeout.

I think this means we need to give this change some more thought, so I am
removing it from the KIP. Probably we will have to bump the JoinGroup API
so that the coordinator can make the client aware of the adjusted session
timeout (and probably the heartbeat interval as well). I will look into
doing this change in a separate proposal.

Thanks,
Jason
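
For reference, the settings discussed in this thread (a sketch: the client
default below is the KIP's proposed value, and the broker-side bounds are
shown at their long-standing defaults):

# consumer configuration -- KIP-735 raises this default from 10000 to 45000
session.timeout.ms=45000

# broker configuration -- the bounds the coordinator enforces on a member's
# requested session timeout
group.min.session.timeout.ms=6000
group.max.session.timeout.ms=1800000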

On Thu, Apr 29, 2021 at 1:43 AM Bruno Cadonna  wrote:

> Thank you for the KIP, Jason!
>
> +1 (binding)
>
> Best,
> Bruno
>
> On 29.04.21 10:10, Luke Chen wrote:
> > Hi Jason,
> > +1 (non-binding)
> >
> > Really need this KIP to save poor jenkins flaky tests. :)
> >
> > Luke
> >
> > On Thu, Apr 29, 2021 at 4:01 PM David Jacot  >
> > wrote:
> >
> >> +1 (binding)
> >>
> >> Thanks for the KIP.
> >>
> >> On Thu, Apr 29, 2021 at 2:27 AM Bill Bejeck  wrote:
> >>
> >>> Thanks for the KIP Jason, +1(binding)
> >>>
> >>> -Bill
> >>>
> >>> On Wed, Apr 28, 2021 at 7:47 PM Guozhang Wang 
> >> wrote:
> >>>
>  +1. Thanks Jason!
> 
>  On Wed, Apr 28, 2021 at 12:50 PM Gwen Shapira
> >>  
>  wrote:
> 
> > I love this improvement.
> >
> > +1 (binding)
> >
> > On Wed, Apr 28, 2021 at 10:46 AM Jason Gustafson
> > 
> > wrote:
> >
> >> Hi All,
> >>
> >> I'd like to start a vote on KIP-735:
> >>
> >>
> >
> 
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout
> >> .
> >> +1
> >> from myself obviously
> >>
> >> -Jason
> >>
> >
> >
> > --
> > Gwen Shapira
> > Engineering Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
> >
> 
> 
>  --
>  -- Guozhang
> 
> >>>
> >>
> >
>


[jira] [Resolved] (KAFKA-12265) Move the BatchAccumulator in KafkaRaftClient to LeaderState

2021-04-29 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-12265.
-
Resolution: Fixed

> Move the BatchAccumulator in KafkaRaftClient to LeaderState
> ---
>
> Key: KAFKA-12265
> URL: https://issues.apache.org/jira/browse/KAFKA-12265
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jose Armando Garcia Sancio
>Assignee: Ryan Dielhenn
>Priority: Major
>
> The {{KafkaRaftClient}} has a field for the {{BatchAccumulator}} that is only 
> used and set when it is the leader. In other cases, leader specific 
> information was stored in {{LeaderState}}. In a recent change {{EpochState}}, 
> which {{LeaderState}} implements, was changed to be a {{Closable}}. 
> {{QuorumState}} makes sure to always close the previous state before 
> transitioning to the next state. We can use this redesign to move the 
> {{BatchAccumulator}} to the {{LeaderState}} and simplify some of the handling 
> in {{KafkaRaftClient}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12732) Possible Kerberos configuration bug in mirrormaker

2021-04-29 Thread Wayland Goodliffe (Jira)
Wayland Goodliffe created KAFKA-12732:
-

 Summary: Possible Kerberos configuration bug in mirrormaker
 Key: KAFKA-12732
 URL: https://issues.apache.org/jira/browse/KAFKA-12732
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Affects Versions: 2.8.0, 1.0.0
Reporter: Wayland Goodliffe


I'm fairly sure that I've found an obscure configuration bug in MirrorMaker 
when using Kerberos. Luckily there is an easy workaround.

I have tested the old MirrorMaker in Kafka v1.0.0, as well as MirrorMaker 2 
in v2.8.0 (both legacy mode and otherwise) - so I think there's a good chance 
this affects all versions.

The setup is as follows:
 * consumer and producer both use
 ** security.protocol=SASL_SSL
 ** sasl.mechanism=GSSAPI
 * consumer and producer are in the same realm, but have different 
sasl.kerberos.service.name
 ** e.g. kafka-source and kafka-target
 * you specify the service names via sasl.kerberos.service.name (rather than 
serviceName in the jaas.conf)
 ** either in mm2.properties source.sasl.kerberos.service.name and 
target.sasl.kerberos.service.name
 ** or for legacy mode in consumer.config and producer.config

Then
 * MirrorMaker uses the wrong service name in the second AdminClient 
connection when trying to validate the response ticket.
 ** Note that it reports the correct values when it logs the client config, 
so this is likely due to some hidden issue deeper in the Kerberos code.
 * For example, in my case with 2.8.0, it would try to validate 
kafka-source/target.host@REALM instead of validating 
kafka-target/target.host@REALM
 * The resultant exception was 
KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER, 
due to kafka-source not being defined for target.host

Workaround
 * ensure you specify separate jaas.conf for consumer/producer via 
sasl.jaas.config properties (instead of via -Djava.security.auth.login.config)
 * Use the jaas.conf field serviceName to specify the correct values for the 
brokers.
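
For illustration, the workaround in mm2.properties might look like the
following (a sketch only; the keytab path, principal, and REALM are made up,
not from this report):

source.security.protocol=SASL_SSL
source.sasl.mechanism=GSSAPI
# serviceName is set in the JAAS entry rather than via
# sasl.kerberos.service.name
source.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
  useKeyTab=true keyTab="/etc/security/keytabs/mm2.keytab" \
  principal="mm2@REALM" serviceName="kafka-source";

target.security.protocol=SASL_SSL
target.sasl.mechanism=GSSAPI
target.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
  useKeyTab=true keyTab="/etc/security/keytabs/mm2.keytab" \
  principal="mm2@REALM" serviceName="kafka-target";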



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-690 Add additional configuration to control MirrorMaker 2 internal topics naming convention

2021-04-29 Thread Omnia Ibrahim
I updated the KIP

On Thu, Apr 29, 2021 at 4:43 PM Omnia Ibrahim 
wrote:

> Sure, this would make it easier. We can make these functions return the
> original behaviour (.checkpoints.internal,
> "mm2-offset-syncs..internal", heartbeat) without any
> customisation using `replication.policy.separator`, and use the separator in
> the DefaultReplicationPolicy.
>
> On Wed, Apr 28, 2021 at 1:31 AM Ryanne Dolan 
> wrote:
>
>> Thanks Omnia, makes sense to me.
>>
>> > Customers who have their own customised ReplicationPolicy will need to add
>> the definition of their internal topic naming convention
>>
>> I wonder whether we should include default impls in the interface to avoid
>> that requirement?
>>
>> Ryanne
>>
>> On Sun, Apr 25, 2021, 2:20 PM Omnia Ibrahim 
>> wrote:
>>
>>> Hi Mickael and Ryanne,
>>> I updated the KIP to add these methods to the ReplicationPolicy instead
>>> of an extra interface to simplify the changes. Please have a look and let
>>> me know your thoughts.
>>>
>>> Thanks
>>>
>>> On Tue, Mar 30, 2021 at 7:19 PM Omnia Ibrahim 
>>> wrote:
>>>
 *(sorry, forgot to Reply All)*
 Hi Ryanne,
 It's a valid concern. I was trying to keep the concerns of internal and
 replicated policy separate from each other and to keep the code readable,
 as extending ReplicationPolicy to manage both internal and replicated
 topics is a bit odd. I'm not against simplifying things and having
 ReplicationPolicy handle both; at the end of the day, if an MM2 user has a
 special naming convention for topics, it will affect both replicated and
 MM2 internal topics.

 To simplify things, we can extend `ReplicationPolicy` to the following
 instead of adding an extra class:

> public interface ReplicationPolicy {
>     String topicSource(String topic);
>     String upstreamTopic(String topic);
>
>     /** Returns heartbeats topic name. */
>     String heartbeatsTopic();
>
>     /** Returns the offset-syncs topic for given cluster alias. */
>     String offsetSyncTopic(String targetAlias);
>
>     /** Returns the name of the checkpoint topic for given cluster alias. */
>     String checkpointTopic(String sourceAlias);
>
>     default String originalTopic(String topic) {
>         String upstream = upstreamTopic(topic);
>         if (upstream == null) {
>             return topic;
>         } else {
>             return originalTopic(upstream);
>         }
>     }
>
>     /** Internal topics are never replicated. */
>     boolean isInternalTopic(String topic);
>     // the implementation will be moved to `DefaultReplicationPolicy` to
>     // handle both Kafka internal topics and MM2 internal topics.
> }
>

 On Fri, Mar 26, 2021 at 3:05 PM Ryanne Dolan 
 wrote:

> Omnia, have we considered just adding methods to ReplicationPolicy?
> I'm reluctant to add a new class because, as Mickael points out, we'd need
> to carry it around in client code.
>
> Ryanne
>
> On Fri, Feb 19, 2021 at 8:31 AM Mickael Maison <
> mickael.mai...@gmail.com> wrote:
>
>> Hi Omnia,
>>
>> Thanks for the clarifications.
>>
>> - I'm still a bit uneasy with the overlap between these 2 methods as
>> currently `ReplicationPolicy.isInternalTopic` already handles MM2
>> internal topics. Should we make it only handle Kafka internal topics
>> and `isMM2InternalTopic()` only handle MM2 topics?
>>
>> - I'm not sure I understand what this method is used for. There are no
>> such methods for the other 2 topics (offset-sync and heartbeat). Also
>> what happens if there are other MM2 instances using different naming
>> schemes in the same cluster. Do all instances have to know about the
>> other naming schemes? What are the expected issues if they don't?
>>
>> - RemoteClusterUtils is a client-side utility so it does not have
>> access to the MM2 configuration. Since this new API can affect the
>> name of the checkpoint topic, it will need to be used client-side too
>> so users can find the checkpoint topic name. I hadn't realized this
>> was the case.
>>
>> Thanks
>>
>> On Mon, Feb 15, 2021 at 9:33 PM Omnia Ibrahim <
>> o.g.h.ibra...@gmail.com> wrote:
>> >
>> > Hi Mickael, did you have some time to check my answer?
>> >
>> > On Thu, Jan 21, 2021 at 10:10 PM Omnia Ibrahim <
>> o.g.h.ibra...@gmail.com> wrote:
>> >>
>> >> Hi Mickael,
>> >> Thanks for taking another look at the KIP; regarding your questions:
>> >>
>> >> - I believe we need both "isMM2InternalTopic" and
>> `ReplicationPolicy.isInternalTopic`, as `ReplicationPolicy.isInternalTopic`
>> checks whether a topic is a Kafka internal topic, while `isMM2InternalTopic`
>> focuses only on whether a topic is an MM2 internal topic (which is
>> heartbeat/checkpoint/offset-sync). 

Re: [DISCUSS] KIP-690 Add additional configuration to control MirrorMaker 2 internal topics naming convention

2021-04-29 Thread Omnia Ibrahim
Sure, this would make it easier. We can make these functions return the
original behaviour (.checkpoints.internal,
"mm2-offset-syncs..internal", heartbeat) without any
customisation using `replication.policy.separator`, and use the separator in
the DefaultReplicationPolicy.

On Wed, Apr 28, 2021 at 1:31 AM Ryanne Dolan  wrote:

> Thanks Omnia, makes sense to me.
>
> > Customers who have their own customised ReplicationPolicy will need to add
> the definition of their internal topic naming convention
>
> I wonder whether we should include default impls in the interface to avoid
> that requirement?
>
> Ryanne
>
> On Sun, Apr 25, 2021, 2:20 PM Omnia Ibrahim 
> wrote:
>
>> Hi Mickael and Ryanne,
>> I updated the KIP to add these methods to the ReplicationPolicy instead
>> of an extra interface to simplify the changes. Please have a look and let
>> me know your thoughts.
>>
>> Thanks
>>
>> On Tue, Mar 30, 2021 at 7:19 PM Omnia Ibrahim 
>> wrote:
>>
>>> *(sorry, forgot to Reply All)*
>>> Hi Ryanne,
>>> It's a valid concern. I was trying to keep the concerns of internal and
>>> replicated policy separate from each other and to keep the code readable,
>>> as extending ReplicationPolicy to manage both internal and replicated
>>> topics is a bit odd. I'm not against simplifying things and having
>>> ReplicationPolicy handle both; at the end of the day, if an MM2 user has a
>>> special naming convention for topics, it will affect both replicated and
>>> MM2 internal topics.
>>>
>>> To simplify things, we can extend `ReplicationPolicy` to the following
>>> instead of adding an extra class:
>>>
 public interface ReplicationPolicy {
     String topicSource(String topic);
     String upstreamTopic(String topic);

     /** Returns heartbeats topic name. */
     String heartbeatsTopic();

     /** Returns the offset-syncs topic for given cluster alias. */
     String offsetSyncTopic(String targetAlias);

     /** Returns the name of the checkpoint topic for given cluster alias. */
     String checkpointTopic(String sourceAlias);

     default String originalTopic(String topic) {
         String upstream = upstreamTopic(topic);
         if (upstream == null) {
             return topic;
         } else {
             return originalTopic(upstream);
         }
     }

     /** Internal topics are never replicated. */
     boolean isInternalTopic(String topic);
     // the implementation will be moved to `DefaultReplicationPolicy` to
     // handle both Kafka internal topics and MM2 internal topics.
 }

>>>
>>> On Fri, Mar 26, 2021 at 3:05 PM Ryanne Dolan 
>>> wrote:
>>>
 Omnia, have we considered just adding methods to ReplicationPolicy? I'm
 reluctant to add a new class because, as Mickael points out, we'd need to
 carry it around in client code.

 Ryanne

 On Fri, Feb 19, 2021 at 8:31 AM Mickael Maison <
 mickael.mai...@gmail.com> wrote:

> Hi Omnia,
>
> Thanks for the clarifications.
>
> - I'm still a bit uneasy with the overlap between these 2 methods as
> currently `ReplicationPolicy.isInternalTopic` already handles MM2
> internal topics. Should we make it only handle Kafka internal topics
> and `isMM2InternalTopic()` only handle MM2 topics?
>
> - I'm not sure I understand what this method is used for. There are no
> such methods for the other 2 topics (offset-sync and heartbeat). Also
> what happens if there are other MM2 instances using different naming
> schemes in the same cluster. Do all instances have to know about the
> other naming schemes? What are the expected issues if they don't?
>
> - RemoteClusterUtils is a client-side utility so it does not have
> access to the MM2 configuration. Since this new API can affect the
> name of the checkpoint topic, it will need to be used client-side too
> so users can find the checkpoint topic name. I hadn't realized this
> was the case.
>
> Thanks
>
> On Mon, Feb 15, 2021 at 9:33 PM Omnia Ibrahim 
> wrote:
> >
> > Hi Mickael, did you have some time to check my answer?
> >
> > On Thu, Jan 21, 2021 at 10:10 PM Omnia Ibrahim <
> o.g.h.ibra...@gmail.com> wrote:
> >>
> >> Hi Mickael,
> >> Thanks for taking another look at the KIP; regarding your questions:
> >>
> >> - I believe we need both "isMM2InternalTopic" and
> `ReplicationPolicy.isInternalTopic`, as `ReplicationPolicy.isInternalTopic`
> checks whether a topic is a Kafka internal topic, while `isMM2InternalTopic`
> focuses only on whether a topic is an MM2 internal topic (which is
> heartbeat/checkpoint/offset-sync). The fact that the default for MM2
> internal topics matches "ReplicationPolicy.isInternalTopic" will no longer
> be an accurate assumption once we implement this KIP.
> >>
> >> - "isCheckpointTopic" will detect all 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #78

2021-04-29 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-12730) A single Kerberos login failure fails all future connections from Java 9 onwards

2021-04-29 Thread Rajini Sivaram (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-12730.

  Reviewer: Manikumar
Resolution: Fixed

> A single Kerberos login failure fails all future connections from Java 9 
> onwards
> 
>
> Key: KAFKA-12730
> URL: https://issues.apache.org/jira/browse/KAFKA-12730
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 3.0.0
>
>
> The refresh thread for Kerberos performs re-login by logging out and then 
> logging in again. If login fails, we retry after a backoff. Every iteration 
> of the loop performs loginContext.logout() and loginContext.login(). If login 
> fails, we end up with two consecutive logouts. This used to work, but from 
> Java 9 onwards, this results in a NullPointerException due to 
> https://bugs.openjdk.java.net/browse/JDK-8173069. We should check if logout 
> is required before attempting logout.
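
The fix amounts to guarding the logout, roughly as below (a sketch of the
loop described above, not Kafka's actual KerberosLogin code):

import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

public class ReLoginSketch {
    // Skipping logout() when the previous login() failed avoids the double
    // logout that triggers JDK-8173069 on Java 9 and later.
    static void reLogin(LoginContext loginContext, long backoffMs)
            throws InterruptedException {
        boolean loggedIn = true;  // we start from a logged-in state
        while (true) {
            try {
                if (loggedIn) {
                    loginContext.logout();  // only log out if a login is active
                    loggedIn = false;
                }
                loginContext.login();
                loggedIn = true;            // re-login succeeded
                return;
            } catch (LoginException e) {
                Thread.sleep(backoffMs);    // retry after a backoff; no second logout
            }
        }
    }
}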



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12731) High number of rebalances lead to GC Overhead limit exceeded JVM crash

2021-04-29 Thread Filip (Jira)
Filip created KAFKA-12731:
-

 Summary: High number of rebalances lead to GC Overhead limit 
exceeded JVM crash
 Key: KAFKA-12731
 URL: https://issues.apache.org/jira/browse/KAFKA-12731
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 2.5.1
Reporter: Filip
 Attachments: image-2021-04-29-15-39-12-608.png, 
image-2021-04-29-15-39-52-541.png, rebalancing.log

We have an application that uses Spring Cloud Stream which delegates to 
{{kafka-clients:2.5.1}}. 

The application is started as follows:
{code:java}
java -jar -Xmx3072m -XX:+CrashOnOutOfMemoryError reporting-service.jar{code}
Normally, the application starts and joins its consumer group which has a 
variable number of members. A rebalancing occurs on startup and partitions (of 
which there are *9* per topic) get assigned across all consumers in the group.

After starting two of these members within a short time of each other, we saw 
quite a large number of rebalances on both clients, which seems to have led to 
extremely high CPU usage on one of the clients. Ultimately, this led to a 
{{GC Overhead limit exceeded}} JVM crash, as GC was unable to keep up and do 
meaningful work.

I've included logs in the {{rebalancing.log}} file attached to this issue.

In our monitoring, we saw that the CPU usage for the container experiencing 
this issue shot up to over 300% (crashes occurred @11:00 & @12:07):

!image-2021-04-29-15-39-52-541.png!

We have plenty of JVM metrics but I am unsure which would be helpful in 
debugging this behaviour.

Since there are a lot of components at play here:
 * org.apache.kafka.clients.consumer.internals.Fetcher
 * org.apache.kafka.clients.consumer.internals.SubscriptionState
 * org.apache.kafka.clients.consumer.internals.ConsumerCoordinator
 * org.apache.kafka.clients.consumer.internals.AbstractCoordinator
 * org.springframework.cloud.stream.binder.kafka.KafkaMessageChannelBinder$1

it seems like some kind of memory leak is occurring which prevents the GC from 
reclaiming any meaningful memory until the connection can be stably established 
and the consumer has fully joined the group. 

Any pointers as to where we could look into deeper would be much appreciated.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-735: Increase default consumer session timeout

2021-04-29 Thread Bruno Cadonna

Thank you for the KIP, Jason!

+1 (binding)

Best,
Bruno

On 29.04.21 10:10, Luke Chen wrote:

Hi Jason,
+1 (non-binding)

Really need this KIP to save poor jenkins flaky tests. :)

Luke

On Thu, Apr 29, 2021 at 4:01 PM David Jacot 
wrote:


+1 (binding)

Thanks for the KIP.

On Thu, Apr 29, 2021 at 2:27 AM Bill Bejeck  wrote:


Thanks for the KIP Jason, +1(binding)

-Bill

On Wed, Apr 28, 2021 at 7:47 PM Guozhang Wang wrote:

+1. Thanks Jason!

On Wed, Apr 28, 2021 at 12:50 PM Gwen Shapira wrote:


I love this improvement.

+1 (binding)

On Wed, Apr 28, 2021 at 10:46 AM Jason Gustafson wrote:


Hi All,

I'd like to start a vote on KIP-735:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout
+1 from myself obviously

-Jason

--
Gwen Shapira
Engineering Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog

--
-- Guozhang


[jira] [Created] (KAFKA-12730) A single Kerberos login failure fails all future connections from Java 9 onwards

2021-04-29 Thread Rajini Sivaram (Jira)
Rajini Sivaram created KAFKA-12730:
--

 Summary: A single Kerberos login failure fails all future 
connections from Java 9 onwards
 Key: KAFKA-12730
 URL: https://issues.apache.org/jira/browse/KAFKA-12730
 Project: Kafka
  Issue Type: Bug
  Components: security
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram
 Fix For: 3.0.0


The refresh thread for Kerberos performs re-login by logging out and then 
logging in again. If login fails, we retry after a backoff. Every iteration of 
the loop performs loginContext.logout() and loginContext.login(). If login 
fails, we end up with two consecutive logouts. This used to work, but from Java 
9 onwards, this results in a NullPointerException due to 
https://bugs.openjdk.java.net/browse/JDK-8173069. We should check if logout is 
required before attempting logout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-735: Increase default consumer session timeout

2021-04-29 Thread Luke Chen
Hi Jason,
+1 (non-binding)

Really need this KIP to save poor jenkins flaky tests. :)

Luke

On Thu, Apr 29, 2021 at 4:01 PM David Jacot 
wrote:

> +1 (binding)
>
> Thanks for the KIP.
>
> On Thu, Apr 29, 2021 at 2:27 AM Bill Bejeck  wrote:
>
> > Thanks for the KIP Jason, +1(binding)
> >
> > -Bill
> >
> > On Wed, Apr 28, 2021 at 7:47 PM Guozhang Wang 
> wrote:
> >
> > > +1. Thanks Jason!
> > >
> > > On Wed, Apr 28, 2021 at 12:50 PM Gwen Shapira
>  > >
> > > wrote:
> > >
> > > > I love this improvement.
> > > >
> > > > +1 (binding)
> > > >
> > > > On Wed, Apr 28, 2021 at 10:46 AM Jason Gustafson
> > > > 
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I'd like to start a vote on KIP-735:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout
> > > > > .
> > > > > +1
> > > > > from myself obviously
> > > > >
> > > > > -Jason
> > > > >
> > > >
> > > >
> > > > --
> > > > Gwen Shapira
> > > > Engineering Manager | Confluent
> > > > 650.450.2760 | @gwenshap
> > > > Follow us: Twitter | blog
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>


Re: [VOTE] KIP-735: Increase default consumer session timeout

2021-04-29 Thread David Jacot
+1 (binding)

Thanks for the KIP.

On Thu, Apr 29, 2021 at 2:27 AM Bill Bejeck  wrote:

> Thanks for the KIP Jason, +1(binding)
>
> -Bill
>
> On Wed, Apr 28, 2021 at 7:47 PM Guozhang Wang  wrote:
>
> > +1. Thanks Jason!
> >
> > On Wed, Apr 28, 2021 at 12:50 PM Gwen Shapira  >
> > wrote:
> >
> > > I love this improvement.
> > >
> > > +1 (binding)
> > >
> > > On Wed, Apr 28, 2021 at 10:46 AM Jason Gustafson
> > > 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I'd like to start a vote on KIP-735:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout
> > > > .
> > > > +1
> > > > from myself obviously
> > > >
> > > > -Jason
> > > >
> > >
> > >
> > > --
> > > Gwen Shapira
> > > Engineering Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter | blog
> > >
> >
> >
> > --
> > -- Guozhang
> >
>