[jira] [Created] (KAFKA-13421) Fix ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2021-10-29 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-13421:


 Summary: Fix 
ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
 Key: KAFKA-13421
 URL: https://issues.apache.org/jira/browse/KAFKA-13421
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe


ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
 is failing with this error:

{code}
ConsumerBounceTest > testSubscribeWhenTopicUnavailable() PASSED
kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup() failed, log available in /home/cmccabe/src/kafka9/core/build/reports/testOutput/kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup().test.stdout

ConsumerBounceTest > testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup() FAILED
    org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
        at kafka.zk.KafkaZkClient$CheckedEphemeral.getAfterNodeExists(KafkaZkClient.scala:1904)
        at kafka.zk.KafkaZkClient$CheckedEphemeral.create(KafkaZkClient.scala:1842)
        at kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1809)
        at kafka.zk.KafkaZkClient.registerBroker(KafkaZkClient.scala:96)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:320)
        at kafka.integration.KafkaServerTestHarness.$anonfun$restartDeadBrokers$2(KafkaServerTestHarness.scala:212)
        at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18)
        at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
        at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:919)
        at scala.collection.IterableOps$WithFilter.foreach(Iterable.scala:889)
        at kafka.integration.KafkaServerTestHarness.restartDeadBrokers(KafkaServerTestHarness.scala:203)
        at kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:327)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
        at kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319)
{code}
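For context, the NodeExists failure arises when a restarting broker tries to re-create its ephemeral registration znode while the node from its previous ZooKeeper session still lingers. A minimal, self-contained sketch of the create-with-ownership-check pattern that CheckedEphemeral implements (the in-memory store and method names here are illustrative stand-ins, not Kafka's actual KafkaZkClient API):

```java
import java.util.HashMap;
import java.util.Map;

// Simulates ephemeral-node registration: create succeeds when no node
// exists, or when the existing node belongs to the caller's own session.
public class CheckedEphemeralSketch {
    // path -> owner session id (stands in for ZooKeeper's ephemeralOwner)
    static final Map<String, Long> store = new HashMap<>();

    // Returns true if registration succeeded, false on a genuine conflict.
    static boolean register(String path, long sessionId) {
        Long owner = store.get(path);
        if (owner == null) {
            store.put(path, sessionId);   // plain create
            return true;
        }
        // NodeExists: tolerate it only when our own session created the
        // node (e.g. a retried request); a node from a dead session that
        // has not yet expired is a real conflict.
        return owner == sessionId;
    }

    public static void main(String[] args) {
        assert register("/brokers/ids/1", 100L);   // fresh create
        assert register("/brokers/ids/1", 100L);   // same-session retry is fine
        assert !register("/brokers/ids/1", 200L);  // stale node from old session
    }
}
```

A fix along these lines would make the test harness wait for the old session's node to disappear before restarting the broker.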



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka-site] stanislavkozlovski opened a new pull request #379: MINOR: Update coding guide to mention need to maintain public client API compatibility

2021-10-29 Thread GitBox


stanislavkozlovski opened a new pull request #379:
URL: https://github.com/apache/kafka-site/pull/379


   Previously, the coding guide implied that it was acceptable to break API 
compatibility because of the project's early stage of maturity. For years now, 
the project has been mature and widely adopted enough to warrant not breaking 
API compatibility.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: Wiki Permissions Request

2021-10-29 Thread Matthias J. Sax

Added.

On 10/29/21 3:15 AM, Ahmed Oubalas wrote:

Hello all,

Wiki ID:  ahmed.oulabas

I need to create a new KIP page.
Thank you for your help!

Ahmed



Re: [DISCUSS] KIP-779: Allow Source Tasks to Handle Producer Exceptions

2021-10-29 Thread Knowles Atchison Jr
Arjun,

Thank you for your feedback, I have updated the KIP.

This solution is more elegant than my original proposal; however, after
working on the implementation, we have now pushed the configuration from
the connector/task itself back to the Connect worker. All tasks running on
the worker would share this ignore-producer-exception configuration flag.
This works for my use cases, where I cannot envision setting this for only
one type of connector we have, but it does take the choice out of the
hands of the connector developer. I suppose that is for the best; in a
vacuum, only the worker should have a say in how it handles message
production.

Additional thoughts and feedback are welcome.

Knowles
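
As a rough illustration of the worker-level switch described above, here is a self-contained sketch (the flag name and handler shape are hypothetical; the KIP text, not this sketch, defines the real configuration surface):

```java
// Sketch of a worker-wide switch for source-side producer failures,
// shared by every task on the worker rather than set per connector.
public class ProducerFailureSketch {
    enum Outcome { TASK_FAILED, RECORD_SKIPPED }

    final boolean ignoreProducerExceptions; // worker-wide flag

    ProducerFailureSketch(boolean ignore) {
        this.ignoreProducerExceptions = ignore;
    }

    Outcome onSendFailure(String record, Exception e) {
        if (ignoreProducerExceptions) {
            // Skip the record; a real worker would also log / route error
            // context to a DLQ and invoke commitRecord so the task can
            // ack the message back to its source system.
            return Outcome.RECORD_SKIPPED;
        }
        return Outcome.TASK_FAILED; // current behavior: the task dies
    }

    public static void main(String[] args) {
        Exception boom = new RuntimeException("non-retriable send failure");
        assert new ProducerFailureSketch(true).onSendFailure("r1", boom)
                == Outcome.RECORD_SKIPPED;
        assert new ProducerFailureSketch(false).onSendFailure("r1", boom)
                == Outcome.TASK_FAILED;
    }
}
```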

On Thu, Oct 28, 2021 at 10:54 AM Arjun Satish 
wrote:

> Yes, that makes sense. And it fits in very nicely with the current error
> handling framework.
>
> On Thu, Oct 28, 2021 at 10:39 AM Knowles Atchison Jr <
> katchiso...@gmail.com>
> wrote:
>
> > That would work. I originally thought that it would be confusing to
> > overload that function for a record that wasn't actually written, but
> > looking at SourceTask more closely, in commitRecord(SourceRecord,
> > RecordMetadata), the RecordMetadata is set to null in the event of a
> > filtered transformation, so the framework is already doing this in a
> > certain regard.
> >
> > Knowles
> >
> > On Thu, Oct 28, 2021 at 10:29 AM Arjun Satish 
> > wrote:
> >
> > > To ack the message back to the source system, we already have a
> > > commitRecord method. Once the bad record is handled by skip/dlq, we
> could
> > > just call commitRecord() on it?
> > >
> > > On Thu, Oct 28, 2021 at 9:35 AM Knowles Atchison Jr <
> > katchiso...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi Chris,
> > > >
> > > > Thank you for your reply!
> > > >
> > > > It is a clarity error regarding the javadoc. I am not operationally
> > > > familiar with all of the exceptions Kafka considers non-retriable,
> so I
> > > > pulled the list from Callback.java:
> > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/1afe2a5190e9c98e38c84dc793f4303ea51bc19b/clients/src/main/java/org/apache/kafka/clients/producer/Callback.java#L35
> > > > to be an illustrative example of the types of exceptions that would
> > kill
> > > > the connector outright. Any exception thrown during the producer
> write
> > > will
> > > > be passed to this handler. I will update the KIP/PR to be more clear
> on
> > > > this matter.
> > > >
> > > > You raise an excellent point: how should the framework protect the
> > > > connector or developer from themselves? If a connector enables
> > > > exactly-once semantics, it would make sense to me to have the task
> > > > killed. The framework should reject this type of misconfiguration,
> > > > which would break the internal semantics of KIP-618.
> > > > WorkerSourceTask could check the configuration before handing off
> > > > the records and exception to this function, fail the initial
> > > > configuration check, or something of that nature.
> > > >
> > > > Hi Arjun,
> > > >
> > > > Thank you for your response!
> > > >
> > > > My specific use case is our custom JMS connector. We ack back to the
> > jms
> > > > broker once Kafka commits the record. We thread out our JMS consumer
> > such
> > > > that I would need access to the SourceRecord to confirm we are going
> to
> > > > throw away the message.
> > > >
> > > > Skipping such records, writing some log messages, and/or writing some
> > > error
> > > > context to a DLQ would cover most if not all of the use cases I
> > envision.
> > > >
> > > > "discard.message.on.producer.exception": "true"
> > > >
> > > > or some equivalent would get my personal use case 99% of the way
> > there. I
> > > > would still need some kind of callback from inside the connector with
> > the
> > > > Source Record to successfully ack back to my source system.
> > > >
> > > > I have updated the KIP regarding the callback being executed in a
> > > different
> > > > thread than poll().
> > > >
> > > > Knowles
> > > >
> > > > On Thu, Oct 28, 2021 at 2:02 AM Arjun Satish  >
> > > > wrote:
> > > >
> > > > > Hi Knowles,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > Could you please call out some use-cases on what the source
> > connectors
> > > > > would do when they hit such exceptions? I'm wondering if we would
> > need
> > > to
> > > > > do anything other than skipping such records, writing some log
> > > messages,
> > > > > and/or writing some error context to a DLQ?
> > > > >
> > > > > One of the goals for Connect was to abstract away intricacies of
> > Kafka
> > > > > topics, clients etc, so that connectors could focus on the external
> > > > systems
> > > > > themselves. Ideally, we'd want to see if we could call out the most
> > > > common
> > > > > cases and handle them in the framework itself, instead of
> delegating
> > > them
> > > > > back to the connector. This way, instead of the new API, we'd
> > probably
> > > 

[jira] [Created] (KAFKA-13420) consumer protocol should include "generation" field for assignor to distinguish between new/old OwnedPartitions

2021-10-29 Thread Luke Chen (Jira)
Luke Chen created KAFKA-13420:
-

 Summary: consumer protocol should include "generation" field for 
assignor to distinguish between new/old OwnedPartitions
 Key: KAFKA-13420
 URL: https://issues.apache.org/jira/browse/KAFKA-13420
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer
Reporter: Luke Chen
Assignee: Luke Chen


In 
[KIP-429|https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol],
 we added a new field, `OwnedPartitions`, to the consumer protocol so that the 
cooperative protocol can perform partition revocation. But recently, we found 
that the `ownedPartitions` info might be out-of-date for several reasons (ex: 
an unstable network), and the out-of-date `ownedPartitions` causes unexpected 
rebalance-stuck issues (ex: KAFKA-12984, KAFKA-13406). To fix this, we should 
consider adding a "generation" field to the consumer protocol, so that we can 
rely on the "generation" info to determine whether the `ownedPartitions` are 
up-to-date or out-of-date.
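
A minimal sketch of the proposed generation check, assuming the assignor simply ignores ownedPartitions reported with a stale generation (the method and names are illustrative, not the actual ConsumerProtocol API):

```java
import java.util.Collections;
import java.util.List;

// An assignor trusts a member's ownedPartitions only when the generation
// they were reported with matches the current rebalance generation.
public class OwnedPartitionsCheck {
    static List<String> validOwnedPartitions(List<String> owned,
                                             int reportedGeneration,
                                             int currentGeneration) {
        // A stale generation means these partitions may already be owned
        // elsewhere; drop them rather than risk a stuck rebalance.
        if (reportedGeneration != currentGeneration) {
            return Collections.emptyList();
        }
        return owned;
    }

    public static void main(String[] args) {
        List<String> owned = List.of("topic-0", "topic-1");
        // Up-to-date report: keep the claimed partitions.
        assert validOwnedPartitions(owned, 5, 5).equals(owned);
        // Stale report (e.g. from an earlier generation): ignore it.
        assert validOwnedPartitions(owned, 3, 5).isEmpty();
    }
}
```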





Re: [DISCUSS] Apache Kafka 3.1.0 release

2021-10-29 Thread Dongjin Lee
Hi David,

Please update the components of the following KIPs:

- KIP-390: Support Compression Level - Core, Clients
- KIP-653: Upgrade log4j to log4j2 - Clients, Connect, Core, Streams (that
is, Log4j-appender, Tools, and Trogdor are excluded.)

Best,
Dongjin

On Fri, Oct 29, 2021 at 2:24 AM Chris Egerton 
wrote:

> Hi David,
>
> I've moved KIP-618 to the "postponed" section as it will not be merged in
> time due to lack of review.
>
> Cheers,
>
> Chris
>
> On Thu, Oct 28, 2021 at 1:07 PM David Jacot 
> wrote:
>
> > Hi team,
> >
> > Just a quick reminder that the Feature freeze is tomorrow (October 29th).
> > In order to be fair with everyone in all the time zones, I plan to cut
> the
> > release branch early next week.
> >
> > Cheers,
> > David
> >
> > On Mon, Oct 18, 2021 at 9:56 AM David Jacot  wrote:
> >
> > > Hi team,
> > >
> > > KIP freeze for the next major release of Apache Kafka was reached
> > > last week.
> > >
> > > I have updated the release plan with all the adopted KIPs which are
> > > considered
> > > for AK 3.1.0. Please, verify the plan and let me know if any KIP should
> > be
> > > added
> > > to or removed from the release plan.
> > >
> > > For the KIPs which are still in progress, please work closely with your
> > > reviewers
> > > to make sure that they land on time for the feature freeze.
> > >
> > > The next milestone for the AK 3.1.0 release is the feature freeze on
> > > October 29th,
> > > 2021.
> > >
> > > Cheers,
> > > David
> > >
> > > On Fri, Oct 15, 2021 at 9:05 AM David Jacot 
> wrote:
> > >
> > >> Hi folks,
> > >>
> > >> Just a quick reminder that the KIP freeze is today. Don't forget to
> > close
> > >> your ongoing votes.
> > >>
> > >> Best,
> > >> David
> > >>
> > >> On Thu, Oct 14, 2021 at 5:31 PM David Jacot 
> > wrote:
> > >>
> > >>> Hi Luke,
> > >>>
> > >>> Added it to the plan.
> > >>>
> > >>> Thanks,
> > >>> David
> > >>>
> > >>> On Thu, Oct 14, 2021 at 10:09 AM Luke Chen 
> wrote:
> > >>>
> >  Hi David,
> >  KIP-766 is merged into trunk. Please help add it into the release
> > plan.
> > 
> >  Thank you.
> >  Luke
> > 
> >  On Mon, Oct 11, 2021 at 10:50 PM David Jacot
> >  
> >  wrote:
> > 
> >  > Hi Michael,
> >  >
> >  > Sure. I have updated the release plan to include it. Thanks for
> the
> >  > heads up.
> >  >
> >  > Best,
> >  > David
> >  >
> >  > On Mon, Oct 11, 2021 at 4:39 PM Mickael Maison <
> >  mickael.mai...@gmail.com>
> >  > wrote:
> >  >
> >  > > Hi David,
> >  > >
> >  > > You can add KIP-690 to the release plan. The vote passed months
> > ago
> >  > > and I merged the PR today.
> >  > >
> >  > > Thanks
> >  > >
> >  > > On Fri, Oct 8, 2021 at 8:32 AM David Jacot
> >  
> >  > > wrote:
> >  > > >
> >  > > > Hi folks,
> >  > > >
> >  > > > Just a quick reminder that KIP Freeze is next Friday, October
> >  15th.
> >  > > >
> >  > > > Cheers,
> >  > > > David
> >  > > >
> >  > > > On Wed, Sep 29, 2021 at 3:52 PM Chris Egerton
> >  > > 
> >  > > > wrote:
> >  > > >
> >  > > > > Thanks David!
> >  > > > >
> >  > > > > On Wed, Sep 29, 2021 at 2:56 AM David Jacot
> >  > > 
> >  > > > > wrote:
> >  > > > >
> >  > > > > > Hi Chris,
> >  > > > > >
> >  > > > > > Sure thing. I have added KIP-618 to the release plan.
> Thanks
> >  for
> >  > the
> >  > > > > heads
> >  > > > > > up.
> >  > > > > >
> >  > > > > > Best,
> >  > > > > > David
> >  > > > > >
> >  > > > > > On Wed, Sep 29, 2021 at 8:53 AM David Jacot <
> >  dja...@confluent.io>
> >  > > wrote:
> >  > > > > >
> >  > > > > > > Hi Kirk,
> >  > > > > > >
> >  > > > > > > Yes, it is definitely possible if you can get the KIP
> > voted
> >  > before
> >  > > the
> >  > > > > > KIP
> >  > > > > > > freeze
> >  > > > > > > and the code committed before the feature freeze.
> Please,
> >  let me
> >  > > know
> >  > > > > > when
> >  > > > > > > the
> >  > > > > > > KIP is voted and I will add it to the release plan.
> >  > > > > > >
> >  > > > > > > Thanks,
> >  > > > > > > David
> >  > > > > > >
> >  > > > > > > On Tue, Sep 28, 2021 at 7:05 PM Chris Egerton
> >  > > > > > 
> >  > > > > > > wrote:
> >  > > > > > >
> >  > > > > > >> Hi David,
> >  > > > > > >>
> >  > > > > > >> Wondering if we can get KIP-618 included? The vote
> passed
> >  months
> >  > > ago
> >  > > > > > and a
> >  > > > > > >> PR has been available since mid-June.
> >  > > > > > >>
> >  > > > > > >> Cheers,
> >  > > > > > >>
> >  > > > > > >> Chris
> >  > > > > > >>
> >  > > > > > >> On Tue, Sep 28, 2021 at 12:53 PM Kirk True <
> >  > k...@mustardgrain.com
> >  > > >
> >  > > > > > wrote:
> >  > > > > > >>
> >  > > > > > 

Wiki Permissions Request

2021-10-29 Thread Ahmed Oubalas
Hello all,

Wiki ID:  ahmed.oulabas

I need to create a new KIP page.
Thank you for your help!

Ahmed


[jira] [Created] (KAFKA-13419) sync group failed with retriable error might cause out-of-date ownedPartition in Cooperative protocol

2021-10-29 Thread Luke Chen (Jira)
Luke Chen created KAFKA-13419:
-

 Summary: sync group failed with retriable error might cause 
out-of-date ownedPartition in Cooperative protocol
 Key: KAFKA-13419
 URL: https://issues.apache.org/jira/browse/KAFKA-13419
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 3.0.0
Reporter: Luke Chen
Assignee: Luke Chen


In KAFKA-13406, we found a user whose group got stuck in rebalancing with the 
cooperative sticky assignor. The reason is that the "ownedPartitions" info was 
out-of-date, which failed the cooperative assignment validation.

Investigating deeper, I found the root cause is that we don't reset the 
generation and state after a sync group failure. In KAFKA-12983, we fixed an 
issue where onJoinPrepare was not called in the resetStateAndRejoin method, 
which caused the ownedPartitions not to be cleared. But there's another case 
where the ownedPartitions can become out-of-date. Here's an example:
 # Consumer A joins and syncs the group successfully with generation 1.
 # A new rebalance starts with generation 2; consumer A joins successfully 
but, for some reason, does not send its SyncGroup request immediately.
 # All other consumers complete the sync successfully in generation 2, except 
consumer A.
 # By the time consumer A sends its SyncGroup request, a new rebalance has 
started with generation 3, so consumer A gets a REBALANCE_IN_PROGRESS error in 
the SyncGroup response.
 # On receiving REBALANCE_IN_PROGRESS, consumer A re-joins the group with 
generation 3 but with the assignment (ownedPartitions) from generation 1.
 # So now an out-of-date ownedPartitions list is sent, with unexpected results.
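
The steps above can be sketched as a small simulation (the field and method names are illustrative, not the actual AbstractCoordinator internals):

```java
// Without resetting state on a failed sync, the member re-joins
// generation 3 still advertising the assignment it got in generation 1.
public class StaleOwnedPartitionsSketch {
    int assignmentGeneration = -1;
    String ownedPartitions = "";

    void onSyncSuccess(int generation, String assignment) {
        assignmentGeneration = generation;
        ownedPartitions = assignment;
    }

    // What the member advertises in its next JoinGroup request.
    String rejoinOwnedPartitions(int currentGeneration, boolean resetOnSyncError) {
        if (resetOnSyncError && assignmentGeneration != currentGeneration) {
            ownedPartitions = ""; // resetStateAndRejoin: drop stale assignment
        }
        return ownedPartitions;
    }

    public static void main(String[] args) {
        StaleOwnedPartitionsSketch a = new StaleOwnedPartitionsSketch();
        a.onSyncSuccess(1, "t-0,t-1");  // step 1: synced in generation 1
        // Steps 2-4: generations 2 and 3 pass without a successful sync,
        // ending in REBALANCE_IN_PROGRESS on the delayed SyncGroup.
        // Step 5: today's behavior re-joins with the generation-1 assignment.
        assert a.rejoinOwnedPartitions(3, false).equals("t-0,t-1");
        // Proposed fix: reset state first, so nothing stale is advertised.
        assert a.rejoinOwnedPartitions(3, true).isEmpty();
    }
}
```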

 

We might want to call resetStateAndRejoin when retriable errors happen in *sync 
group*. When we get a sync group error, it means the join group succeeded, and 
other consumers (and the leader) might have already completed this round of 
rebalance, so the assignment this consumer holds is already out-of-date.

 

The sync group errors that should trigger resetStateAndRejoin are:
{code:java}
if (exception instanceof UnknownMemberIdException ||
exception instanceof IllegalGenerationException ||
exception instanceof RebalanceInProgressException ||
exception instanceof MemberIdRequiredException)
continue;
else if (!future.isRetriable())
throw exception;
{code}


