Re: Kafka capabilities

2024-04-04 Thread Kafka Life
Dear Kafka experts, could anyone who has this data share the details, please?

On Wed, Apr 3, 2024 at 3:42 PM Kafka Life  wrote:

> Hi Kafka users
>
> Does anyone have a document or PPT that showcases the capabilities of
> Kafka along with any cost management capability?
> I have a customer who is still using IBM MQM and RabbitMQ. I want the
> client to consider Kafka for messaging and data streaming. I wanted to seek
> your expert help: if you have any document or PPT I can propose as an
> example, could you please share it?
>
> thanks and regards
> KrisG
>


Kafka capabilities

2024-04-03 Thread Kafka Life
Hi Kafka users

Does anyone have a document or PPT that showcases the capabilities of
Kafka along with any cost management capability?
I have a customer who is still using IBM MQM and RabbitMQ. I want the
client to consider Kafka for messaging and data streaming. I wanted to seek
your expert help: if you have any document or PPT I can propose as an
example, could you please share it?

thanks and regards
KrisG


RE: Re: Kafka trunk test & build stability

2024-02-03 Thread kafka
I wonder if we've considered adding a Gradle task timeout [0] on unitTest and
integrationTest tasks. The timeout applies separately for each subproject and
marks the currently running test as SKIPPED on timeout. This helped me find
a test which stalls builds [1].

[0] 
https://docs.gradle.org/8.5/userguide/more_about_tasks.html#sec:task_timeouts
[1] https://issues.apache.org/jira/browse/KAFKA-16219
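
For reference, a minimal sketch (not the actual change proposed for the Kafka build) of what such a timeout looks like in a Gradle Groovy build script; the task selector and the two-hour limit here are illustrative assumptions:

// Hedged sketch only; the selector and the limit are placeholders.
subprojects {
  tasks.withType(Test).configureEach {
    // Gradle interrupts the task once it exceeds this wall-clock limit,
    // so a single stalled test cannot hang the whole build.
    timeout = java.time.Duration.ofHours(2)
  }
}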

Best,
Gaurav


On 2024/01/25 21:49:00 Justine Olshan wrote:
> It looks like there was some server maintenance that shut down Jenkins.
> Upon coming back up, the builds were expired but unable to stop.
> 
> They all had similar logs:
> 
> Cancelling nested steps due to timeout
> Cancelling nested steps due to timeout
> Body did not finish within grace period; terminating with extreme prejudice
> Body did not finish within grace period; terminating with extreme prejudice
> Pausing (Preparing for shutdown)
> Resuming build at Thu Jan 25 06:56:23 UTC 2024 after Jenkins restart
> Resuming build at Thu Jan 25 09:45:03 UTC 2024 after Jenkins restart
> Pausing (Preparing for shutdown)
> Resuming build at Thu Jan 25 10:37:41 UTC 2024 after Jenkins restart
> Timeout expired 7 hr 39 min ago
> Timeout expired 7 hr 39 min ago
> Cancelling nested steps due to timeout
> Cancelling nested steps due to timeout
> *02:37:41*  Waiting for reconnection of builds41 before proceeding with build
> *02:37:41*  Waiting for reconnection of builds32 before proceeding with build
> Still paused
> Body did not finish within grace period; terminating with extreme prejudice
> Body did not finish within grace period; terminating with extreme prejudice
> 
> 
> I forcibly killed the builds running over one day to free resources. I
> believe the rest are running as expected now.
> 
> Justine
> 
> On Thu, Jan 25, 2024 at 10:22 AM Justine Olshan 
> wrote:
> 
> > Hey folks -- I noticed some builds have been running for a day or more. I
> > thought we limited builds to 8 hours. Any ideas why this is happening?
> >
> >
> > https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/activity/
> > I tried to abort the build for PR-15257, and it also still seems to be
> > running.
> >
> > Justine
> >
> > On Sun, Jan 14, 2024 at 6:25 AM Qichao Chu 
> > wrote:
> >
> >> Hi Divij and all,
> >>
> >> Regarding the speeding up of the build & de-flaking tests, LinkedIn has
> >> done some great work which we probably can borrow ideas from.
> >> In the LinkedIn/Kafka repo, we can see one of their most recent PRs
> >> <https://github.com/linkedin/kafka/pull/500/checks> only took < 9 min (unit
> >> test) + < 12 min (integration-test) + < 9 min (code check) = < 30 min to
> >> finish all the checks:
> >>
> >>1. Similar to what David(mumrah) has mentioned/experimented with, the
> >>LinkedIn team used GitHub Actions, which displayed the results in a
> >> cleaner
> >>    way directly from GitHub.
> >>2. Each top-level package is checked separately to increase the
> >>concurrency. To further boost the speed for integration tests, the
> >> tests
> >>inside one package are divided into sub-groups(A-Z) based on their
> >>names(see this job
> >><https://github.com/linkedin/kafka/actions/runs/7303478151/> for
> >>details).
> >>3. Once the tests are running at a smaller granularity with a decent
> >>runner, heavy integration tests are less likely to be flaky, and flaky
> >>tests are easier to catch.
> >>
> >>
> >> --
> >> Qichao
> >>
> >>
> >> On Wed, Jan 10, 2024 at 2:57 PM Divij Vaidya 
> >> wrote:
> >>
> >> > Hey folks
> >> >
> >> > We seem to have a handle on the OOM issues with the multiple fixes
> >> > community members made. In
> >> > https://issues.apache.org/jira/browse/KAFKA-16052,
> >> > you can see the "before" profile in the description and the "after"
> >> profile
> >> > in the latest comment to see the difference. To prevent future
> >> recurrence,
> >> > we have an ongoing solution at
> >> https://github.com/apache/kafka/pull/15101
> >> > and after that we will start another one to get rid of Mockito mocks at
> >> > the end of every test suite using a similar extension. Note that this
> >> > doesn't solve the flaky test problems in the trunk but it removes the
> >> > aspect of build failures due to OOM (one of the many problems).

Re: [VOTE] 3.7.0 RC2

2024-01-26 Thread kafka
Apologies, I mentioned KAFKA-16157 twice in my previous message. I intended to
mention KAFKA-16195
with the PR at https://github.com/apache/kafka/pull/15262 as the second JIRA.

Thanks,
Gaurav

> On 26 Jan 2024, at 15:34, ka...@gnarula.com wrote:
> 
> Hi Stan,
> 
> I wanted to share some updates about the bugs you shared earlier.
> 
> - KAFKA-14616: I've reviewed and tested the PR from Colin and have observed
> the fix works as intended.
> - KAFKA-16162: I reviewed Proven's PR and found some gaps in the proposed 
> fix. I've
> therefore raised https://github.com/apache/kafka/pull/15270 following a 
> discussion with Luke in JIRA.
> - KAFKA-16082: I don't think this is marked as a blocker anymore. I'm awaiting
> feedback/reviews at https://github.com/apache/kafka/pull/15136
> 
> In addition to the above, there are 2 JIRAs I'd like to bring everyone's 
> attention to:
> 
> - KAFKA-16157: This is similar to KAFKA-14616 and is marked as a blocker. 
> I've raised
> https://github.com/apache/kafka/pull/15263 and am awaiting reviews on it.
> - KAFKA-16157: I raised this yesterday and have addressed feedback from Luke. 
> This should
> hopefully get merged soon.
> 
> Regards,
> Gaurav
> 
> 
>> On 24 Jan 2024, at 11:51, ka...@gnarula.com wrote:
>> 
>> Hi Stanislav,
>> 
>> Thanks for bringing these JIRAs/PRs up.
>> 
>> I'll be testing the open PRs for KAFKA-14616 and KAFKA-16162 this week and I 
>> hope to have some feedback
>> by Friday. I gather the latter JIRA is marked as a WIP by Proven and he's 
>> away. I'll try to build on his work in the meantime.
>> 
>> As for KAFKA-16082, we haven't been able to deduce a data loss scenario. 
>> There's a PR open
>> by me for promoting an abandoned future replica with approvals from Omnia 
>> and Proven,
>> so I'd appreciate a committer reviewing it.
>> 
>> Regards,
>> Gaurav
>> 
>> On 23 Jan 2024, at 20:17, Stanislav Kozlovski 
>>  wrote:
>>> 
>>> Hey all, I figured I'd give an update about what known blockers we have
>>> right now:
>>> 
>>> - KAFKA-16101: KRaft migration rollback documentation is incorrect -
>>> https://github.com/apache/kafka/pull/15193; This need not block RC
>>> creation, but we need the docs updated so that people can test properly
>>> - KAFKA-14616: Topic recreation with offline broker causes permanent URPs -
>>> https://github.com/apache/kafka/pull/15230 ; I am of the understanding that
>>> this is blocking JBOD for 3.7
>>> - KAFKA-16162: New created topics are unavailable after upgrading to 3.7 -
>>> a strict blocker with an open PR https://github.com/apache/kafka/pull/15232
>>> - although I understand Proveen is out of office
>>> - KAFKA-16082: JBOD: Possible dataloss when moving leader partition - I am
>>> hearing mixed opinions on whether this is a blocker (
>>> https://github.com/apache/kafka/pull/15136)
>>> 
>>> Given that there are 3 JBOD blocker bugs, and I am not confident they will
>>> all be merged this week - I am on the edge of voting to revert JBOD from
>>> this release, or mark it early access.
>>> 
>>> By all accounts, it seems that if we keep with JBOD the release will have
>>> to spill into February, which is a month extra from the time-based release
>>> plan we had of start of January.
>>> 
>>> Can I ask others for an opinion?
>>> 
>>> Best,
>>> Stan
>>> 
>>> On Thu, Jan 18, 2024 at 1:21 PM Luke Chen  wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I think I've found another blocker issue: KAFKA-16162
>>>> <https://issues.apache.org/jira/browse/KAFKA-16162> .
>>>> The impact is after upgrading to 3.7.0, any new created topics/partitions
>>>> will be unavailable.
>>>> I've put my findings in the JIRA.
>>>> 
>>>> Thanks.
>>>> Luke
>>>> 
>>>> On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax  wrote:
>>>> 
>>>>> Stan, thanks for driving this all forward! Excellent job.
>>>>> 
>>>>> About
>>>>> 
>>>>>> StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141
>>>>>> StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139
>>>>> 
>>>>> For `StreamsUpgradeTest` it was a test setup issue and should be fixed
>>>>> now in trunk and 3.7 (and actually also in 3.6...)
>>>>> 
>>>>> For `Stream

Re: [VOTE] 3.7.0 RC2

2024-01-26 Thread kafka
Hi Stan,

I wanted to share some updates about the bugs you shared earlier.

- KAFKA-14616: I've reviewed and tested the PR from Colin and have observed
the fix works as intended.
- KAFKA-16162: I reviewed Proven's PR and found some gaps in the proposed fix. 
I've
therefore raised https://github.com/apache/kafka/pull/15270 following a 
discussion with Luke in JIRA.
- KAFKA-16082: I don't think this is marked as a blocker anymore. I'm awaiting
feedback/reviews at https://github.com/apache/kafka/pull/15136

In addition to the above, there are 2 JIRAs I'd like to bring everyone's 
attention to:

- KAFKA-16157: This is similar to KAFKA-14616 and is marked as a blocker. I've 
raised
https://github.com/apache/kafka/pull/15263 and am awaiting reviews on it.
- KAFKA-16157: I raised this yesterday and have addressed feedback from Luke. 
This should
hopefully get merged soon.

Regards,
Gaurav


> On 24 Jan 2024, at 11:51, ka...@gnarula.com wrote:
> 
> Hi Stanislav,
> 
> Thanks for bringing these JIRAs/PRs up.
> 
> I'll be testing the open PRs for KAFKA-14616 and KAFKA-16162 this week and I 
> hope to have some feedback
> by Friday. I gather the latter JIRA is marked as a WIP by Proven and he's 
> away. I'll try to build on his work in the meantime.
> 
> As for KAFKA-16082, we haven't been able to deduce a data loss scenario. 
> There's a PR open
> by me for promoting an abandoned future replica with approvals from Omnia and 
> Proven,
> so I'd appreciate a committer reviewing it.
> 
> Regards,
> Gaurav
> 
> On 23 Jan 2024, at 20:17, Stanislav Kozlovski 
>  wrote:
>> 
>> Hey all, I figured I'd give an update about what known blockers we have
>> right now:
>> 
>> - KAFKA-16101: KRaft migration rollback documentation is incorrect -
>> https://github.com/apache/kafka/pull/15193; This need not block RC
>> creation, but we need the docs updated so that people can test properly
>> - KAFKA-14616: Topic recreation with offline broker causes permanent URPs -
>> https://github.com/apache/kafka/pull/15230 ; I am of the understanding that
>> this is blocking JBOD for 3.7
>> - KAFKA-16162: New created topics are unavailable after upgrading to 3.7 -
>> a strict blocker with an open PR https://github.com/apache/kafka/pull/15232
>> - although I understand Proveen is out of office
>> - KAFKA-16082: JBOD: Possible dataloss when moving leader partition - I am
>> hearing mixed opinions on whether this is a blocker (
>> https://github.com/apache/kafka/pull/15136)
>> 
>> Given that there are 3 JBOD blocker bugs, and I am not confident they will
>> all be merged this week - I am on the edge of voting to revert JBOD from
>> this release, or mark it early access.
>> 
>> By all accounts, it seems that if we keep with JBOD the release will have
>> to spill into February, which is a month extra from the time-based release
>> plan we had of start of January.
>> 
>> Can I ask others for an opinion?
>> 
>> Best,
>> Stan
>> 
>> On Thu, Jan 18, 2024 at 1:21 PM Luke Chen  wrote:
>> 
>>> Hi all,
>>> 
>>> I think I've found another blocker issue: KAFKA-16162
>>> <https://issues.apache.org/jira/browse/KAFKA-16162> .
>>> The impact is after upgrading to 3.7.0, any new created topics/partitions
>>> will be unavailable.
>>> I've put my findings in the JIRA.
>>> 
>>> Thanks.
>>> Luke
>>> 
>>> On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax  wrote:
>>> 
>>>> Stan, thanks for driving this all forward! Excellent job.
>>>> 
>>>> About
>>>> 
>>>>> StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141
>>>>> StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139
>>>> 
>>>> For `StreamsUpgradeTest` it was a test setup issue and should be fixed
>>>> now in trunk and 3.7 (and actually also in 3.6...)
>>>> 
>>>> For `StreamsStandbyTask` the failing test exposes a regression bug, so
>>>> it's a blocker. I updated the ticket accordingly. We already have an
>>>> open PR that reverts the code introducing the regression.
>>>> 
>>>> 
>>>> -Matthias
>>>> 
>>>> On 1/17/24 9:44 AM, Proven Provenzano wrote:
>>>>> We have another blocking issue for the RC :
>>>>> https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar
>>>> to
>>>>> https://issues.apache.org/jira/browse/KAFKA-14616. The new issue
>>> however
>>>>> can lead to the new topic having partitions th

Re: [VOTE] 3.7.0 RC2

2024-01-24 Thread kafka
Hi Stanislav,

Thanks for bringing these JIRAs/PRs up.

I'll be testing the open PRs for KAFKA-14616 and KAFKA-16162 this week and I 
hope to have some feedback
by Friday. I gather the latter JIRA is marked as a WIP by Proven and he's away. 
I'll try to build on his work in the meantime.

As for KAFKA-16082, we haven't been able to deduce a data loss scenario. 
There's a PR open
by me for promoting an abandoned future replica with approvals from Omnia and 
Proven,
so I'd appreciate a committer reviewing it.

Regards,
Gaurav

On 23 Jan 2024, at 20:17, Stanislav Kozlovski  
wrote:
> 
> Hey all, I figured I'd give an update about what known blockers we have
> right now:
> 
> - KAFKA-16101: KRaft migration rollback documentation is incorrect -
> https://github.com/apache/kafka/pull/15193; This need not block RC
> creation, but we need the docs updated so that people can test properly
> - KAFKA-14616: Topic recreation with offline broker causes permanent URPs -
> https://github.com/apache/kafka/pull/15230 ; I am of the understanding that
> this is blocking JBOD for 3.7
> - KAFKA-16162: New created topics are unavailable after upgrading to 3.7 -
> a strict blocker with an open PR https://github.com/apache/kafka/pull/15232
> - although I understand Proveen is out of office
> - KAFKA-16082: JBOD: Possible dataloss when moving leader partition - I am
> hearing mixed opinions on whether this is a blocker (
> https://github.com/apache/kafka/pull/15136)
> 
> Given that there are 3 JBOD blocker bugs, and I am not confident they will
> all be merged this week - I am on the edge of voting to revert JBOD from
> this release, or mark it early access.
> 
> By all accounts, it seems that if we keep with JBOD the release will have
> to spill into February, which is a month extra from the time-based release
> plan we had of start of January.
> 
> Can I ask others for an opinion?
> 
> Best,
> Stan
> 
> On Thu, Jan 18, 2024 at 1:21 PM Luke Chen  wrote:
> 
>> Hi all,
>> 
>> I think I've found another blocker issue: KAFKA-16162
>> <https://issues.apache.org/jira/browse/KAFKA-16162> .
>> The impact is after upgrading to 3.7.0, any new created topics/partitions
>> will be unavailable.
>> I've put my findings in the JIRA.
>> 
>> Thanks.
>> Luke
>> 
>> On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax  wrote:
>> 
>>> Stan, thanks for driving this all forward! Excellent job.
>>> 
>>> About
>>> 
>>>> StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141
>>>> StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139
>>> 
>>> For `StreamsUpgradeTest` it was a test setup issue and should be fixed
>>> now in trunk and 3.7 (and actually also in 3.6...)
>>> 
>>> For `StreamsStandbyTask` the failing test exposes a regression bug, so
>>> it's a blocker. I updated the ticket accordingly. We already have an
>>> open PR that reverts the code introducing the regression.
>>> 
>>> 
>>> -Matthias
>>> 
>>> On 1/17/24 9:44 AM, Proven Provenzano wrote:
>>>> We have another blocking issue for the RC :
>>>> https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar
>>> to
>>>> https://issues.apache.org/jira/browse/KAFKA-14616. The new issue
>> however
>>>> can lead to the new topic having partitions that a producer cannot
>> write
>>> to.
>>>> 
>>>> --Proven
>>>> 
>>>> On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano <
>>> pprovenz...@confluent.io>
>>>> wrote:
>>>> 
>>>>> 
>>>>> I have a PR https://github.com/apache/kafka/pull/15197 for
>>>>> https://issues.apache.org/jira/browse/KAFKA-16131 that is building
>> now.
>>>>> --Proven
>>>>> 
>>>>> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz  wrote:
>>>>> 
> >>>>>> > Hi Jakub,
> >>>>>> >
> >>>>>> > Thanks for trying the RC. I think what you found is a blocker bug because it
> >>>>>> > will generate huge amount of logspam. I guess we didn't find it in junit tests
> >>>>>> > since logspam doesn't fail the automated tests. But certainly it's not suitable
> >>>>>> > for production. Did you file a JIRA yet?
>>>>>> 
>>>>>> Hi Colin,
>>>>>> 
>>>>>> I opened https://issues.apache.org/jira/browse/KAFKA-161

Re: [ANNOUNCE] New committer: Greg Harris

2023-07-10 Thread kafka
Congrats Greg!

-
Gaurav

> On 10 Jul 2023, at 17:25, Hector Geraldino (BLOOMBERG/ 919 3RD A) 
>  wrote:
> 
> Congrats Greg! Well deserved
> 
> From: dev@kafka.apache.org At: 07/10/23 12:18:48 UTC-4:00To:  
> dev@kafka.apache.org
> Subject: Re: [ANNOUNCE] New committer: Greg Harris
> 
> Congratulations!
> 
> On Mon, Jul 10, 2023 at 9:17 AM Randall Hauch  wrote:
>> 
>> Congratulations, Greg.
>> 
>> On Mon, Jul 10, 2023 at 11:13 AM Mickael Maison 
>> wrote:
>> 
>>> Congratulations Greg!
>>> 
>>> On Mon, Jul 10, 2023 at 6:08 PM Bill Bejeck 
>>> wrote:
>>>> 
>>>> Congrats Greg!
>>>> 
>>>> -Bill
>>>> 
>>>> On Mon, Jul 10, 2023 at 11:53 AM Divij Vaidya 
>>>> wrote:
>>>> 
>>>>> Congratulations Greg! I am going through a new committer teething
>>> process
>>>>> right now and would be happy to get you up to speed. Looking forward to
>>>>> working with you in your new role.
>>>>> 
>>>>> --
>>>>> Divij Vaidya
>>>>> 
>>>>> 
>>>>> 
>>>>> On Mon, Jul 10, 2023 at 5:51 PM Josep Prat >>> 
>>>>> wrote:
>>>>> 
>>>>>> Congrats Greg!
>>>>>> 
>>>>>> 
>>>>>> ———
>>>>>> Josep Prat
>>>>>> 
>>>>>> Aiven Deutschland GmbH
>>>>>> 
>>>>>> Alexanderufer 3-7, 10117 Berlin
>>>>>> 
>>>>>> Amtsgericht Charlottenburg, HRB 209739 B
>>>>>> 
>>>>>> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>>>>>> 
>>>>>> m: +491715557497
>>>>>> 
>>>>>> w: aiven.io
>>>>>> 
>>>>>> e: josep.p...@aiven.io
>>>>>> 
>>>>>> On Mon, Jul 10, 2023, 17:47 Matthias J. Sax 
>>> wrote:
>>>>>> 
>>>>>>> Congrats!
>>>>>>> 
>>>>>>> On 7/10/23 8:45 AM, Chris Egerton wrote:
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> The PMC for Apache Kafka has invited Greg Harris to become a
>>>>> committer,
>>>>>>> and
>>>>>>>> we are happy to announce that he has accepted!
>>>>>>>> 
>>>>>>>> Greg has been contributing to Kafka since 2019. He has made over
>>> 50
>>>>>>> commits
>>>>>>>> mostly around Kafka Connect and Mirror Maker 2. His most notable
>>>>>>>> contributions include KIP-898: "Modernize Connect plugin
>>> discovery"
>>>>>> and a
>>>>>>>> deep overhaul of the offset syncing logic in MM2 that addressed
>>>>> several
>>>>>>>> technically-difficult, long-standing, high-impact issues.
>>>>>>>> 
>>>>>>>> He has also been an active participant in discussions and
>>> reviews on
>>>>>> the
>>>>>>>> mailing lists and on GitHub.
>>>>>>>> 
>>>>>>>> Thanks for all of your contributions, Greg. Congratulations!
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
> 
> 



Permission to contribute to Apache Kafka

2023-06-26 Thread kafka
Hi,

I'd like to request permissions to contribute to Apache Kafka. My account 
details are as follows:

# Wiki

Email: gaurav_naru...@apple.com <mailto:gaurav_naru...@apple.com>
Username: gnarula

# JIRA

Email: gaurav_naru...@apple.com <mailto:gaurav_naru...@apple.com>
Username: gnarula

Regards,
Gaurav

Consumer offset value -Apache kafka 3.2.3

2023-04-26 Thread Kafka Life
Dear Kafka Experts

How can we check for a particular offset number in Apache Kafka 3.2.3?
Could you please shed some light?
The kafka-console-consumer tool is throwing a class-not-found error.

./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
--topic your-topic
--group your-consumer-group
--zookeeper localhost:2181
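
For context, kafka.tools.ConsumerOffsetChecker was removed back in Kafka 1.0, which is why the class is not found on 3.2.3. A rough sketch of the newer CLI equivalents, assuming a broker at localhost:9092 and the group/topic names above:

# Committed offset, log-end offset and lag per partition for a group:
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group your-consumer-group

# Latest (--time -1) or earliest (--time -2) offset of each partition:
./kafka-get-offsets.sh --bootstrap-server localhost:9092 \
  --topic your-topic --time -1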


Re: Consumer Lag Metrics/ Topic level metrics

2023-04-26 Thread Kafka Life
Many thanks Samuel. Will go thru this.

On Tue, Apr 25, 2023 at 9:03 PM Samuel Delepiere <
samuel.delepi...@celer-tech.com> wrote:

> Hi,
>
> I use a combination of the Prometheus JMX exporter (
> https://github.com/prometheus/jmx_exporter) and the Prometheus Kafka
> exporter (https://github.com/danielqsj/kafka_exporter).
> The consumer lag metrics come from the latter.
>
> I can then output the data in Grafana
>
>
> Regards,
>
> Sam.
>
>
>
> On 25 Apr 2023, at 16:26, Kafka Life  wrote:
>
> Dear Kafka Experts
>
> Could you please suggest good metrics exporter for consumer lag and topic
> level metrics apart from Linkedin kafka burrow for the kafka broker
> cluster.
>
>


Consumer Lag Metrics/ Topic level metrics

2023-04-25 Thread Kafka Life
Dear Kafka Experts

Could you please suggest good metrics exporter for consumer lag and topic
level metrics apart from Linkedin kafka burrow for the kafka broker cluster.


Zookeeper upgrade 3.4.14 to 3.5.7

2023-04-12 Thread Kafka Life
Hi Kafka , zookeeper experts

Is it possible to upgrade a 3.4.14 ZooKeeper cluster in a
rolling fashion (one node at a time) to ZooKeeper 3.5.7? Would the
cluster work with a mix of 3.4.14 and 3.5.7 nodes? Please
advise.


Re: Kafka Metrics to Grafana Labs

2023-04-08 Thread Kafka Life
Any help from these experts ?

On Sat, Apr 8, 2023 at 2:23 PM Kafka Life  wrote:

> Hello Kafka Experts
>
> Need help, please. Currently the Grafana agent
> triggering kafka/3.2.3/config/kafka_metrics.yml is sending over 5 thousand
> metrics. Is there a way to limit how many metrics are sent and send
> only what is required? Any pointers or such a customized script would be much
> appreciated.
>


Kafka Metrics to Grafana Labs

2023-04-08 Thread Kafka Life
Hello Kafka Experts

Need help, please. Currently the Grafana agent
triggering kafka/3.2.3/config/kafka_metrics.yml is sending over 5 thousand
metrics. Is there a way to limit how many metrics are sent and send
only what is required? Any pointers or such a customized script would be much
appreciated.
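
If kafka_metrics.yml is a standard Prometheus JMX exporter config, one way to cut the volume is to export only a whitelist of MBeans instead of everything; a hedged sketch (the object names are examples, and the exact config keys depend on the jmx_exporter version):

lowercaseOutputName: true
whitelistObjectNames:
  - "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,*"
  - "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"
  - "kafka.controller:type=KafkaController,name=ActiveControllerCount"
rules:
  # only the whitelisted MBeans above ever reach this rule
  - pattern: ".*"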


Re: Kafka (apache or confluent) - Subject of Work (SOW)

2023-04-06 Thread Kafka Life
Hi experts, any pointers or guidance on this, please?

On Wed, Apr 5, 2023 at 8:35 PM Kafka Life  wrote:

> Respected Kafka experts/managers
>
> Do anyone have Subject of work -Activities related to Kafka cluster
> management for Apache or Confluent kafka . Something to assess and propose
> to an enterprise for kafka cluster management. Request you to kindly share
> any such documentation please.
>


Kafka (apache or confluent) - Subject of Work (SOW)

2023-04-05 Thread Kafka Life
Respected Kafka experts/managers

Does anyone have a Subject of Work (SOW) with activities related to Kafka cluster
management for Apache or Confluent Kafka? Something to assess and propose
to an enterprise for Kafka cluster management. I request you to kindly share
any such documentation, please.


Re: Kafka Cluster WITHOUT Zookeeper

2023-03-28 Thread Kafka Life
This is really great information Paul . Thank you .

On Tue, Mar 28, 2023 at 4:01 AM Brebner, Paul
 wrote:

> I have a recent 3 part blog series on Kraft (expanded version of ApacheCon
> 2022 talk):
>
>
>
>
> https://www.instaclustr.com/blog/apache-kafka-kraft-abandons-the-zookeeper-part-1-partitions-and-data-performance/
>
>
> https://www.instaclustr.com/blog/apache-kafka-kraft-abandons-the-zookeeper-part-2-partitions-and-meta-data-performance/
>
>
> https://www.instaclustr.com/blog/apache-kafka-kraft-abandons-the-zookeeper-part-3-maximum-partitions-and-conclusions/
>
>
>
> Regards, Paul
>
>
>
> *From: *Chia-Ping Tsai 
> *Date: *Monday, 27 March 2023 at 5:37 pm
> *To: *dev@kafka.apache.org 
> *Cc: *us...@kafka.apache.org ,
> mmcfarl...@cavulus.com , Israel Ekpo <
> israele...@gmail.com>, ranlupov...@gmail.com ,
> scante...@gmail.com , show...@gmail.com <
> show...@gmail.com>, sunilmchaudhar...@gmail.com <
> sunilmchaudhar...@gmail.com>
> *Subject: *Re: Kafka Cluster WITHOUT Zookeeper
>
>
> hi
>
>
>
> You can use the keyword “kraft” to get the answer by google or chatgpt.
> For example:
>
>
>
> Introduction: KRaft - Apache Kafka Without ZooKeeper
> <https://developer.confluent.io/learn/kraft/>
>
> QuickStart: Apache Kafka
> <https://kafka.apache.org/quickstart>
>
> —
>
> Chia-Ping
>
>
>
>
>
>
>
> Kafka Life wrote on 27 March 2023 at 1:33 PM:
>
> Hello  Kafka experts
>
> Is there a way where we can have Kafka Cluster be functional serving
> producers and consumers without having Zookeeper cluster manage the
> instance .
>
> Any particular version of kafka for this or how can we achieve this please
>
>


Re: Apache Kafka version - End of support

2023-03-27 Thread Kafka Life
Many thanks Joseph for your response

On Mon, Mar 27, 2023 at 4:50 PM Josep Prat 
wrote:

> Hello there,
>
> You can find the general policy here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan#TimeBasedReleasePlan-WhatIsOurEOLPolicy
>
> Basically, the last 3 releases are covered and one can upgrade to the
> latest version from the last previous 3. Bug fixes are ported to the last 3
> versions on a 'best effort basis'. Current latest release is 3.4.0.
> Version 0.11 reached "end of support" (or EOL) a long time ago.
>
> Hope this helps,
>
> On Mon, Mar 27, 2023 at 12:44 PM Kafka Life 
> wrote:
>
> > >
> > >
> > > Hello Kafka experts
> > >
> > > where can i see the end of support for apache kafka versions? i would
> > like
> > > to know about 0.11 Kafka version on when was this deprecated
> > >
> >
>
>
> --
> *Josep Prat*
> Open Source Engineering Director, *Aiven*
> josep.p...@aiven.io   |   +491715557497
> aiven.io <https://www.aiven.io>
> *Aiven Deutschland GmbH*
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>


Apache Kafka version - End of support

2023-03-27 Thread Kafka Life
>
>
> Hello Kafka experts
>
> Where can I see the end of support for Apache Kafka versions? I would like
> to know when the 0.11 Kafka version was deprecated.
>


Kafka Cluster WITHOUT Zookeeper

2023-03-26 Thread Kafka Life
Hello  Kafka experts

Is there a way we can have a Kafka cluster be functional, serving
producers and consumers, without having a ZooKeeper cluster manage the
instance?

Is there any particular version of Kafka for this, and how can we achieve it, please?
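
Since Kafka 2.8 this is what KRaft mode provides (early access in 2.8, production-ready for new clusters from 3.3): the brokers keep metadata in an internal Raft quorum and no ZooKeeper is needed. A minimal single-node sketch, assuming a 3.x tarball with the stock KRaft config:

KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties
bin/kafka-server-start.sh config/kraft/server.properties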


Re: Consumer Lag-Apache_kafka_JMX metrics

2022-08-16 Thread Kafka Life
Hello Experts, Any info or pointers on my query please.



On Mon, Aug 15, 2022 at 11:36 PM Kafka Life  wrote:

> Dear Kafka Experts
> we need to monitor the consumer lag in kafka clusters 2.5.1 and 2.8.0
> versions of kafka in Grafana.
>
> 1/ What is the correct path for JMX metrics to evaluate Consumer Lag in
> kafka cluster.
>
> 2/ I had thought it is FetcherLag  but it looks like it is not as per the
> link below.
>
> https://www.instaclustr.com/support/documentation/kafka/monitoring-information/fetcher-lag-metrics/#:~:text=Aggregated%20Fetcher%20Consumer%20Lag%20This%20metric%20aggregates%20lag,in%20sync%20with%20partitions%20that%20it%20is%20replicating
> .
>
> Could one of you experts please guide on which JMX i should use for
> consumer lag apart from kafka burrow or such intermediate tools
>
> Thanking you in advance
>
>


Consumer Lag-Apache_kafka_JMX metrics

2022-08-15 Thread Kafka Life
Dear Kafka Experts
We need to monitor consumer lag in Grafana for Kafka clusters on versions 2.5.1 and 2.8.0.

1/ What is the correct JMX metric path to evaluate consumer lag in a
Kafka cluster?

2/ I had thought it was FetcherLag, but it looks like it is not, as per the
link below.
https://www.instaclustr.com/support/documentation/kafka/monitoring-information/fetcher-lag-metrics/#:~:text=Aggregated%20Fetcher%20Consumer%20Lag%20This%20metric%20aggregates%20lag,in%20sync%20with%20partitions%20that%20it%20is%20replicating
.

Could one of you experts please advise which JMX metric I should use for
consumer lag, apart from Kafka Burrow or similar intermediate tools?

Thanking you in advance
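
One clarification that may help: the broker-side FetcherLag metrics track replica fetchers, not application consumers; consumer lag is exposed by the consumer client's own JMX. A sketch of the MBeans documented for 2.x Java clients (attribute placement can vary slightly by version):

kafka.consumer:type=consumer-fetch-manager-metrics,client-id={client-id}
  attribute: records-lag-max   (maximum lag across the partitions assigned to this consumer)

kafka.consumer:type=consumer-fetch-manager-metrics,client-id={client-id},topic={topic},partition={partition}
  attribute: records-lag       (current lag for a single partition)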


Re: HACKING vulnerability is SpringBoot (Java) for apache kafka

2022-04-04 Thread Kafka Life
Dear Luke , Thank you for your kind and prompt response.


On Mon, Apr 4, 2022 at 1:23 PM Luke Chen  wrote:

> Hi,
>
> The impact for the CVE-2022-22965? Since this is a RCE vulnerability, which
> means the whole system (including Kafka and ZK) is under the attackers'
> control, and can do whatever they want.
>
> The ideal fix for this is to upgrade Spring Framework 5.3.18 and 5.2.20 or
> greater. Alternatively, you can have workarounds:
> 1. Upgrading Tomcat
> 2. Downgrading to Java 8
> 3. Disallowed Fields
>
> I think this blog from Spring community is very clear:
> https://spring.io/blog/2022/03/31/spring-framework-rce-early-announcement
>
> Thank you.
> Luke
>
> On Mon, Apr 4, 2022 at 3:32 PM Kafka Life  wrote:
>
> > Hi Kafka Experts
> >
> > Regarding the recent threat of vulnerability in spring framework ,
> > CVE-2022-22965 vulnerability is SpringBoot (Java) for apache kafka and
> > Zookeeper. Could one of you suggest how Apache kafka and zk are impacted
> > and what should be the ideal fix for this .
> >
> > Vulnerability in the Spring Framework (CVE-2022-22965) | Information
> > Security Office (berkeley.edu)
> > <
> >
> https://security.berkeley.edu/news/vulnerability-spring-framework-cve-2022-22965
> > >
> >
> > Critical alert – Spring4Shell RCE (CVE-2022-22965 in Spring) | Acunetix
> > <
> >
> https://www.acunetix.com/blog/web-security-zone/critical-alert-spring4shell-rce-cve-2022-22965-in-spring/
> > >
> >
> >
> > Thanks in advance
> >
>


HACKING vulnerability is SpringBoot (Java) for apache kafka

2022-04-04 Thread Kafka Life
Hi Kafka Experts

Regarding the recent threat of a vulnerability in the Spring Framework,
CVE-2022-22965 (Spring4Shell) in Spring Boot (Java): could one of you suggest how Apache Kafka
and ZooKeeper are impacted and what the ideal fix for this would be?

Vulnerability in the Spring Framework (CVE-2022-22965) | Information
Security Office (berkeley.edu)
<https://security.berkeley.edu/news/vulnerability-spring-framework-cve-2022-22965>

Critical alert – Spring4Shell RCE (CVE-2022-22965 in Spring) | Acunetix
<https://www.acunetix.com/blog/web-security-zone/critical-alert-spring4shell-rce-cve-2022-22965-in-spring/>


Thanks in advance


Re: KAFKA- UNDER_REPLICATED_PARTIONS - AUTOMATE

2022-02-27 Thread Kafka Life
Thank you Malcolm. Will go through this.

On Sat, Feb 26, 2022 at 2:22 AM Malcolm McFarland 
wrote:

> Maybe this could help?
> https://github.com/dimas/kafka-reassign-tool
>
> Cheers,
> Malcolm McFarland
> Cavulus
>
>
> On Fri, Feb 25, 2022 at 9:00 AM Kafka Life  wrote:
>
> > Dear Experts
> >
> > do you have any solution for this please
> >
> > On Tue, Feb 22, 2022 at 8:31 PM Kafka Life 
> wrote:
> >
> > > Dear Kafka Experts
> > >
> > > Does anyone have a dynamically generated Json file based on the Under
> > > replicated partition in the kafka cluster.
> > > Everytime when the URP is increased to over 500 , it is a tedious job
> to
> > > manually create a Json file .
> > >
> > > I request you to share any such dynamically generated script /json
> file.
> > >
> > > Thanks in advance.
> > >
> > >>
> >
>


Re: KAFKA- UNDER_REPLICATED_PARTIONS - AUTOMATE

2022-02-25 Thread Kafka Life
Dear Experts

do you have any solution for this please

On Tue, Feb 22, 2022 at 8:31 PM Kafka Life  wrote:

> Dear Kafka Experts
>
> Does anyone have a dynamically generated Json file based on the Under
> replicated partition in the kafka cluster.
> Everytime when the URP is increased to over 500 , it is a tedious job to
> manually create a Json file .
>
> I request you to share any such dynamically generated script /json file.
>
> Thanks in advance.
>
>>


KAFKA- UNDER_REPLICATED_PARTIONS - AUTOMATE

2022-02-22 Thread Kafka Life
Dear Kafka Experts

Does anyone have a dynamically generated JSON file based on the under-replicated
partitions in the Kafka cluster?
Every time the URP count increases to over 500, it is a tedious job to
manually create a JSON file.

I request you to share any such dynamically generated script/JSON file.

Thanks in advance.
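
The file in question is the reassignment JSON consumed by kafka-reassign-partitions.sh, and the tool can generate a proposal itself; a hedged sketch with placeholder topic and broker ids:

# topics-to-move.json
{"version": 1, "topics": [{"topic": "your-topic"}]}

# ask the tool to propose an assignment across brokers 1,2,3
./kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "1,2,3" --generate

# the proposed output has this shape and can be saved and fed back via
# --reassignment-json-file ... --execute
{"version": 1,
 "partitions": [
   {"topic": "your-topic", "partition": 0, "replicas": [1, 2, 3]}
 ]}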



Mirror Maker Issue : Huge difference in topic size on disk

2022-02-06 Thread Kafka Life
Dear Kafka Experts , Need your advice please

I am running MirrorMaker on Kafka 2.8 to replicate a topic from a Kafka
0.11 instance.
The size of each partition for the topic on 0.11 is always around 5 to 6 GB, but
the replicated topic on the 2.8 instance is around 40 GB for the same partition.

The topic configuration in terms of retention and partition max size is the
same across both instances. Is there any issue with MirrorMaker on Kafka 2.8, or are there
any other pointers to look at, please?

Many thanks in advance .


Kafka : High avaialability settings

2021-11-08 Thread Kafka Life
Dear Kafka experts

I have a 10-broker Kafka cluster with all topics having a replication factor
of 3 and 50 partitions.

min.insync.replicas is 2.


One broker went down due to a hardware failure, but many applications
complained they were not able to produce/consume messages.

I request you to please suggest how I can overcome this problem and make
Kafka highly available even while a broker is down or during rolling
restarts.

Is there a topic-level configuration I can set so that new
partitions are created on active and running brokers when a node is down?
I read about acks=0/1/all being set at the application/producer end, but the
applications are not ready to change to acks=all.
Can you please suggest?

many thanks in advance
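
Two points that may frame this (stated generally, not as a diagnosis of the incident above): with a replication factor of 3 and min.insync.replicas=2, one broker being down should not stop acks=all writes, because two in-sync replicas remain; and Kafka has no topic-level setting that recreates partitions on other brokers when a node fails, leadership simply fails over among the existing replicas. A hedged sketch of the settings involved, using the values described above rather than recommendations:

# broker / topic level
default.replication.factor=3
min.insync.replicas=2
# keep false so an out-of-sync replica is never elected leader
unclean.leader.election.enable=false

# producer level: with the values above, acks=all only fails once fewer than
# two replicas of a partition are alive
acks=all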


Re: Automation Script : Kafka topic creation

2021-11-08 Thread Kafka Life
Thank you Men and Ran



On Sat, Nov 6, 2021 at 7:23 PM Men Lim  wrote:

> I'm currently using Kafka-gitops.
>
> On Sat, Nov 6, 2021 at 3:35 AM Kafka Life  wrote:
>
> > Dear Kafka experts
> >
> > does anyone have ready /automated script to create /delete /alter topics
> in
> > different environments?
> > taking Configuration parameter as input .
> >
> > if yes i request you to kindly share it with me .. please
> >
>


Automation Script : Kafka topic creation

2021-11-06 Thread Kafka Life
Dear Kafka experts

Does anyone have a ready/automated script to create/delete/alter topics in
different environments, taking configuration parameters as input?

If yes, I request you to kindly share it with me, please.
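
In case a starting point helps, a small hypothetical shell sketch (the CSV format, file names and bootstrap address are made up) that reads topic definitions and calls kafka-topics.sh for a given environment:

#!/usr/bin/env bash
# topics.csv columns: topic,partitions,replication_factor,extra_config
# e.g.  orders,50,3,retention.ms=604800000
BOOTSTRAP="$1"   # e.g. broker1:9092 of the target environment
while IFS=, read -r topic partitions rf extra; do
  ./kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --create --if-not-exists \
    --topic "$topic" --partitions "$partitions" --replication-factor "$rf" \
    ${extra:+--config "$extra"}
done < topics.csv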


Re: New Kafka Consumer : unknown member id

2021-11-05 Thread Kafka Life
Hello Luke

I have built a new Kafka environment with Kafka 2.8.0.

A new consumer set up against this environment is throwing the
error below, while the old consumers for the same applications in the same
2.8.0 environment are working fine.

Could you please advise?

2021-11-02 12:25:24 DEBUG AbstractCoordinator:557 - [Consumer
clientId=, groupId=] Attempt to join group failed due to unknown
member id.

On Fri, Oct 29, 2021 at 7:36 AM Luke Chen  wrote:

> Hi,
> Which version of kafka client are you using?
> I can't find this error message in the source code.
> When googling this error message, it showed the error is in Kafka v0.9.
>
> Could you try to use the V3.0.0 and see if that issue still exist?
>
> Thank you.
> Luke
>
> On Thu, Oct 28, 2021 at 11:15 PM Kafka Life 
> wrote:
>
> > Dear Kafka Experts
> >
> > We have set up a group.id (consumer ) = YYY
> > But when tried to connect to kafka instance : i get this error message. I
> > am sure this consumer (group id does not exist in kafka) .We user plain
> > text protocol to connect to kafka 2.8.0. Please suggest how to resolve
> this
> > issue.
> >
> > DEBUG AbstractCoordinator:557 - [Consumer clientId=X,
> groupId=YYY]
> > Attempt to join group failed due to unknown member id.
> >
>


New Kafka Consumer : unknown member id

2021-10-28 Thread Kafka Life
Dear Kafka Experts

We have set up a consumer group.id = YYY,
but when it tries to connect to the Kafka instance I get the error message below. I
am sure this consumer group id does not already exist in Kafka. We use the plain
text protocol to connect to Kafka 2.8.0. Please suggest how to resolve this
issue.

DEBUG AbstractCoordinator:557 - [Consumer clientId=X, groupId=YYY]
Attempt to join group failed due to unknown member id.


Apache Kafka : start up scripts

2021-10-27 Thread Kafka Life
Dear Kafka experts

When a broker is started using the start script, could any of you please let
me know the sequence of steps that happens in the background until the node
is up?

For example, when the script is initiated to start:
1/ Is it checking indexes?
2/ Is it checking the ISR?
3/ Is the URP count being brought to zero?

I tried to look in the server log but could not understand the sequence of
events performed until the node was up. Could someone please help?

Thanks


Re: Zookeeper : Throttling connections

2021-07-16 Thread Kafka Life
Thank you very much Mr. Israel Ekpo. Really appreciate it.

We are using the 0.10 version of Kafka and are in the process of upgrading to
2.6.1; planning is in progress. And yes, these connections to the ZooKeepers are
for Kafka functionality.

Frequently there are incidents where the ZooKeepers get bombarded with around
8,000 to 10,000 connections from multiple clients.
Do you suggest we configure globalOutstandingLimit to 8000?
Should this be set in the zoo.cfg file in ZooKeeper?
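
For reference, a hedged sketch of where the setting lives; it goes into zoo.cfg on each ZooKeeper server (or as the Java system property zookeeper.globalOutstandingLimit), and the 8000 here is only the number discussed above, not a recommendation:

# zoo.cfg on each ZooKeeper node
globalOutstandingLimit=8000
# related: maximum concurrent connections allowed from a single client IP
maxClientCnxns=60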


On Fri, Jul 16, 2021 at 7:11 PM Israel Ekpo  wrote:

> Hello,
>
> I am assuming you are using Zookeeper because of your Kafka brokers. What
> version of Kafka are you using.
>
> I would like to start by stating that very soon this will no longer be an
> issue as the project is taking steps to decouple Kafka from Zookeeper. Take
> a look at KIP-500 for additional information. In Kafka 2.8.0 you should be
> able to configure the early access version of running Kafka without
> Zookeeper.
>
> When running Kafka in KRaft Mode (Without Zookeeper) you do not need to
> worry about this throttling since Zookeeper is not part of the
> architecture.
>
> To answer you questions, some changes were made to prevent Zookeeper from
> getting overwhelmed hence to avoid recurring crashes during stressful loads
> restrictions can be configured to manage the load on Zookeeper via the
> throttling settings
>
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-3243
>
> You should be able to see additional information in the administration
> guides on how this is configured
>
> https://zookeeper.apache.org/doc/r3.2.2/zookeeperAdmin.html
>
> For example, the configuration setting below
>
> globalOutstandingLimit
>
> (Java system property: *zookeeper.globalOutstandingLimit.*)
>
> Clients can submit requests faster than ZooKeeper can process them,
> especially if there are a lot of clients. To prevent ZooKeeper from running
> out of memory due to queued requests, ZooKeeper will throttle clients so
> that there is no more than globalOutstandingLimit outstanding requests in
> the system. The default limit is 1,000.
>
> I hope this gets you started
>
> Feel free to reach out if you have additional questions
>
> Have a great day
>
>
> On Fri, Jul 16, 2021 at 7:44 AM Kafka Life  wrote:
>
> > Dear KAFKA & Zookeeper experts.
> >
> > 1/ What is zookeeper Throttling ? Is it done at zookeepr ? How is it set
> > configured ?
> > 2/ Is it helpful ?
> >
>


Zookeeper : Throttling connections

2021-07-16 Thread Kafka Life
Dear KAFKA & Zookeeper experts.

1/ What is zookeeper Throttling ? Is it done at zookeepr ? How is it set
configured ?
2/ Is it helpful ?


consumer group exit :explanation

2021-07-04 Thread Kafka Life
Dear kafka Experts

Could one of you please help explain what the below log lines in a broker
instance mean, and in what scenarios they would occur when no change has been
made?

 INFO [GroupCoordinator 9610]: Member
webhooks-retry-app-840d3107-833f-4908-90bc-ea8c394c07c3-StreamThread-2-consumer-f87c3b85-5aa1-40f5-a42f-58927421b89e
in group webhooks-retry-app has failed, removing it from the group
(kafka.coordinator.group.GroupCoordinator)


 INFO [GroupCoordinator 9611]: Member
cm.consumer.9-d65d39d3-703f-408b-bf4b-fbf087321d8c in group
cm_group_apac_sy_cu_01 has failed, removing it from the group
(kafka.coordinator.group.GroupCoordinator)


Please help to explain .

thanks
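
In case it helps anyone searching the archive: this GroupCoordinator message is typically logged when a member's heartbeats stop arriving within its session timeout, for example during long GC pauses, network blips, or application restarts and redeployments. The consumer settings usually involved are sketched below (defaults shown are for 2.x-era clients and are assumptions, not a diagnosis of this case):

session.timeout.ms=10000       # no heartbeat for this long -> coordinator removes the member
heartbeat.interval.ms=3000     # how often the client sends heartbeats
max.poll.interval.ms=300000    # poll() not called within this window -> member leaves the group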


Consumer Reporting error

2021-06-16 Thread Kafka Life
Hello Kafka experts

The consumer team is reporting an issue while consuming data from the
topic, described as a "singularity header" issue.
Can someone please advise on how to resolve this issue?
The error looks like:

Starting offset: 1226716

offset: 1226716 position: 0 CreateTime: 1583780622665 isvalid: true
keysize: -1 valuesize: 346 magic: 2 compresscodec: NONE producerId: -1
producerEpoch: -1 sequence: -1 isT*ransactional: false headerKeys:
[singularityheader]* payload: ^@^@^@^A^@^@^@

5.0.0.1246^T5.0.0.1246???\^B,AzLJC6OMQFemD1qRiUdTbg^B^@^BH0aa8db3c-6967-41a3-a7e5-395a5b70a59b^B^P79637040^@^@^@?^C^@^@^B$137@10
@avor-abb113???\??^A^@^@?^A^B^@^BH0186bf9c-53d8-4ec1-ae0b-3f9f6c98c4f4^Rundefined^B?8?>^B^FSIC


Apache Version upgrade

2021-06-08 Thread Kafka Life
Dear Kafka Experts

1- Can anyone share an upgrade plan with steps/a plan/a tracker or any
useful documentation, please?

2- We are upgrading Kafka from the old 0.11 version to 2.5. Any
suggestions/directions are highly appreciated.

Thanks


confluent-kafka python library is not working with ubuntu14 and python3

2020-08-28 Thread Kafka Shil
I am using "confluent-kafka==1.0.1". It works fine when I am using py3 and
ubuntu18, but fails with py3 and ubuntu14. I get the following error.

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/metrics_agent/kafka_writer.py",
line 147, in enqueue_for_topic
producer.produce(topic_name, msg,
partition=_get_partition(producer, topic_name))
  File 
"/usr/local/lib/python3.4/dist-packages/confluent_kafka/serializing_producer.py",
line 168, in produce
raise KeySerializationError(se)
confluent_kafka.error.KeySerializationError:
KafkaError{code=_KEY_SERIALIZATION,val=-162,str="'bytes' object has no
attribute 'encode'"}
Exception KafkaError{code=_KEY_SERIALIZATION,val=-162,str="'bytes'
object has no attribute 'encode'"}
Traceback (most recent call last):
  File 
"/usr/local/lib/python3.4/dist-packages/confluent_kafka/serializing_producer.py",
line 166, in produce
key = self._key_serializer(key, ctx)
  File 
"/usr/local/lib/python3.4/dist-packages/confluent_kafka/serialization/__init__.py",
line 369, in __call__
return obj.encode(self.codec)
AttributeError: 'bytes' object has no attribute 'encode'
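
Judging from the traceback, a string key serializer is being handed a bytes key; the names below are assumptions about the surrounding code, but a minimal sketch that avoids the error is to use the plain Producer, which performs no key/value serialization and accepts bytes directly:

from confluent_kafka import Producer

# plain Producer: no serializers, bytes keys/values pass through unchanged
producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("metrics", key=b"host-1", value=b'{"cpu": 0.42}')
producer.flush()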


Contribute to kafka

2020-05-25 Thread Kafka Shil
Hi
My jira id is sshil. I want to contribute to kafka. Can you please add me
to the contributors list?


Pristine Zookeeper and Kafka (via Docker) producing OFFSET_OUT_OF_RANGE for Fetch

2020-03-10 Thread kafka
Hi all,


I'm implementing a custom client.

I was wondering whether anyone could explain the OFFSET_OUT_OF_RANGE
error in this scenario.

My test suite tears down and spins up a fresh zookeeper and kafka every
time inside a pristine docker container.

The test suite runs as:

1. Producer runs first and finishes.
2. Consumer group members then runs later in 3 separate threads.

I write key/pairs of "fruit", "animal" and "vegetable" with a
round-robin algorithm for each partition.

The consumer group process runs an OffsetCommit with offset=0 for each
partition to kick off. (I found that if I just started with OffsetFetch I
would get UNKNOWN_TOPIC_OR_PARTITION, and couldn't find docs about
whether this was "normal" or not. But that's a tangent.)

This page shows writing to three partitions within a topic, and
each Produce request succeeding:

https://chrisdone.com/consumer-groups-sink-out-of-range.html [fine]

Each column is a thread in my test suite.

However, when trying to fetch those three partitions, for some reason on
partition 2, I get OFFSET_OUT_OF_RANGE. The other two partitions consume
successfully. This can be seen in the consumer side shown in this page:

https://chrisdone.com/consumer-groups-source-out-of-range.html [problem]

(scroll to about half way through, as the first half is three threads trying to 
join the group, then when they've joined the group. Three new threads spin up 
to the right, one for each consumer in the consumer group.)

Yet, this is a nondeterministic error that seems to depend on
timing. I have random 1-500ms waiting lag intentionally placed in
every message log so that the program might exhibit real-world cases
like this.

If I remove the random timeouts, this process works every time
(demonstrated here:
https://chrisdone.com/consumer-groups-source-working.html). So there is
some kind of timing issue that I cannot identify.

Upon receiving an OFFSET_OUT_OF_RANGE error, you can see in the log
that I wait, refresh metadata, and retry the request again (as I read
elsewhere[1] that this "typically implies a leader change") only to
get another OFFSET_OUT_OF_RANGE.

I'm receiving a OffsetFetch response of partitionIndex = 2 and
committedOffset = 0,

( ThreadId 46
, SourceRequestMsg
 (ReceivedResponse
 0.006022722
 s
 (OffsetFetchResponseV0
 { topicsArray =
 ARRAY
 [ OffsetFetchResponseV0Topics
 { name = STRING "355b26d6-ccab-4b28-bd05-a44ac6326cb7"
 , partitionsArray =
 ARRAY
 [ OffsetFetchResponseV0TopicsPartitions
 { partitionIndex = 2
 , committedOffset = 0
 , metadata = NULLABLE_STRING (Just "")
 , errorCode = NONE
 }
 ]
 }
 ]
 })))

I sent a fetch request,

( ThreadId 49
, ConsumerGroupConsumerFor
 "myclientid-e79931cc-d6d4-479b-90d6-1b61aab85198"
 [PartitionId 2]
 (KafkaSourceMsg
 (SourceRequestMsg
 (SendingRequest
 (FetchRequestV4
 { replicaId = -1
 , maxWaitTime = 200
 , minBytes = 5
 , maxBytes = 1048576
 , isolationLevel = 0
 , topicsArray =
 ARRAY
 [ FetchRequestV4Topics
 { topic =
 STRING "355b26d6-ccab-4b28-bd05-a44ac6326cb7"
 , partitionsArray =
 ARRAY
 [ FetchRequestV4TopicsPartitions
 { partition = 2
 , fetchOffset = 0
 , partitionMaxBytes = 1048576
 }
 ]
 }
 ]
 })

And yet it returns

FetchResponseV4Responses
 { topic = STRING "355b26d6-ccab-4b28-bd05-a44ac6326cb7"
 , partitionResponsesArray =
 ARRAY
 [ FetchResponseV4ResponsesPartitionResponses
 { partitionHeader =
 FetchResponseV4ResponsesPartitionResponsesPartitionHeader
 { partition = 2
 , errorCode = OFFSET_OUT_OF_RANGE
 , highWatermark = -1
 , lastStableOffset = -1
 , abortedTransactionsArray = ARRAY []
 }
 , recordSet = RecordBatchV2Sequence {recordBatchV2Sequence = []}
 }
 ]
 }

So I am very confused.

Can someone who is more familiar with this process hazard a guess as to
what's going on?

Cheers,

Chris

[1]: 
https://issues.apache.org/jira/browse/KAFKA-7395?focusedCommentId=16640313=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16640313

Re: How to change schema registry port

2019-12-25 Thread Kafka Shil
On Wed, Dec 25, 2019 at 5:51 PM Kafka Shil  wrote:

> Hi,
> I am using docker compose file to start schema-registry. I need to change
> default port to 8084 instead if 8081. I made the following changes in
> docker compose file.
>
> schema-registry:
> image: confluentinc/cp-schema-registry:5.3.1
> hostname: schema-registry
> container_name: schema-registry
> depends_on:
>   - zookeeper
>   - broker
> ports:
>   - "8084:8084"
> environment:
>   SCHEMA_REGISTRY_HOST_NAME: schema-registry
>   SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181'
>   SCHEMA_REGISTRY_LISTENERS: http://localhost:8084
>
> But If I execute "docker ps", I can see it still listens to 8081.
>
> 69511efd32d4confluentinc/cp-schema-registry:5.3.1
> "/etc/confluent/dock…"   9 minutes ago   Up 9 minutes
> 8081/tcp, 0.0.0.0:8084->8084/tcp schema-registry
>
> How do I change port to 8084?
>
> Thanks
>


How to change schema registry port

2019-12-25 Thread Kafka Shil
Hi,
I am using a docker compose file to start schema-registry. I need to change
the default port to 8084 instead of 8081. I made the following changes in the
docker compose file.

schema-registry:
image: confluentinc/cp-schema-registry:5.3.1
hostname: schema-registry
container_name: schema-registry
depends_on:
  - zookeeper
  - broker
ports:
  - "8084:8084"
environment:
  SCHEMA_REGISTRY_HOST_NAME: schema-registry
  SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181'
  SCHEMA_REGISTRY_LISTENERS: http://localhost:8084

But If I execute "docker ps", I can see it still listens to 8081.

69511efd32d4confluentinc/cp-schema-registry:5.3.1
"/etc/confluent/dock…"   9 minutes ago   Up 9 minutes
8081/tcp, 0.0.0.0:8084->8084/tcp schema-registry

How do I change port to 8084?

Thanks
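
Two things are probably at play here (hedged, since only this fragment of the setup is visible): the 8081/tcp shown by docker ps is just the port the image EXPOSEs in its Dockerfile, and SCHEMA_REGISTRY_LISTENERS pointing at localhost makes the service bind only to the loopback interface inside the container, so the published 8084 mapping cannot reach it. A sketch of the usual adjustment:

environment:
  SCHEMA_REGISTRY_HOST_NAME: schema-registry
  SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181'
  # bind on all interfaces inside the container, on the new port
  SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8084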


[jira] [Created] (KAFKA-7710) Poor Zookeeper ACL management with Kerberos

2018-12-05 Thread Mr Kafka (JIRA)
Mr Kafka created KAFKA-7710:
---

 Summary: Poor Zookeeper ACL management with Kerberos
 Key: KAFKA-7710
 URL: https://issues.apache.org/jira/browse/KAFKA-7710
 Project: Kafka
  Issue Type: Bug
Reporter: Mr Kafka


I have seen many organizations run many Kafka clusters. The simplest scenario 
is you may have a *kafka.dev.example.com* cluster and a 
*kafka.prod.example.com* cluster. A more extreme example is teams within an 
organization running their own individual clusters.

 

When you enable Zookeeper ACLs in Kafka the ACL looks to be set to the 
principal (SPN) that is used to authenticate against Zookeeper.

For example I have brokers:
 * *01.kafka.dev.example.com*
 * *02.kafka.dev.example.com*
 * *03.kafka.dev.example.com*

On *01.kafka.dev.example.com* I run the security-migration tool below:
{code:java}
KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/kafka_server_jaas.conf 
-Dzookeeper.sasl.clientconfig=ZkClient" zookeeper-security-migration 
--zookeeper.acl=secure --zookeeper.connect=a01.zookeeper.dev.example.com:2181
{code}
I end up with ACL's in Zookeeper as below:
{code:java}
# [zk: localhost:2181(CONNECTED) 2] getAcl /cluster
# 'sasl,'kafka/01.kafka.dev.example.com@EXAMPLE
# : cdrwa
{code}
This ACL means no other broker in the cluster can access the znode in Zookeeper 
except broker 01.

To resolve the issue you need to set the below properties in Zookeeper's config:
{code:java}
kerberos.removeHostFromPrincipal = true
kerberos.removeRealmFromPrincipal = true
{code}
Now when Kafka set ACL's they are stored as:

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7510) KStreams RecordCollectorImpl leaks data to logs on error

2018-10-15 Thread Mr Kafka (JIRA)
Mr Kafka created KAFKA-7510:
---

 Summary: KStreams RecordCollectorImpl leaks data to logs on error
 Key: KAFKA-7510
 URL: https://issues.apache.org/jira/browse/KAFKA-7510
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Mr Kafka


org.apache.kafka.streams.processor.internals.RecordCollectorImpl leaks data on 
error as it dumps the *value* / message payload to the logs.

This is problematic as it may contain personally identifiable information (pii) 
or other secret information to plain text log files which can then be 
propagated to other log systems i.e Splunk.

I suggest the *key*, and *value* fields be moved to debug level as it is useful 
for some people while error level contains the *errorMessage, timestamp, topic* 
and *stackTrace*.
{code:java}
private  void recordSendError(
final K key,
final V value,
final Long timestamp,
final String topic,
final Exception exception
) {
String errorLogMessage = LOG_MESSAGE;
String errorMessage = EXCEPTION_MESSAGE;
if (exception instanceof RetriableException) {
errorLogMessage += PARAMETER_HINT;
errorMessage += PARAMETER_HINT;
}
log.error(errorLogMessage, key, value, timestamp, topic, 
exception.toString());
sendException = new StreamsException(
String.format(
errorMessage,
logPrefix,
"an error caught",
key,
value,
timestamp,
topic,
exception.toString()
),
exception);
}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7349) Long Disk Writes cause Zookeeper Disconnects

2018-08-27 Thread Adam Kafka (JIRA)
Adam Kafka created KAFKA-7349:
-

 Summary: Long Disk Writes cause Zookeeper Disconnects
 Key: KAFKA-7349
 URL: https://issues.apache.org/jira/browse/KAFKA-7349
 Project: Kafka
  Issue Type: Bug
  Components: zkclient
Affects Versions: 0.11.0.1
Reporter: Adam Kafka
 Attachments: SpikeInWriteTime.png

We run our Kafka cluster on a cloud service provider. As a consequence, we 
notice a large tail latency write time that is out of our control. Some writes 
take on the order of seconds. We have noticed that often these long write times 
are correlated with subsequent Zookeeper disconnects from the brokers. It 
appears that during the long write time, the Zookeeper heartbeat thread does 
not get scheduled CPU time, resulting in a long gap of heartbeats sent. After 
the write, the ZK thread does get scheduled CPU time, but it detects that it 
has not received a heartbeat from Zookeeper in a while, so it drops its 
connection then rejoins the cluster.

Note that the timeout reported is inconsistent with the timeout as set by the 
configuration ({{zookeeper.session.timeout.ms}} = default of 6 seconds). We 
have seen a range of values reported here, including 5950ms (less than 
threshold), 12032ms (double the threshold), 25999ms (much larger than the 
threshold).

We noticed that during a service degradation of the storage service of our 
cloud provider, these Zookeeper disconnects increased drastically in frequency. 

We are hoping there is a way to decouple these components. Do you agree with 
our diagnosis that the ZK disconnects are occurring due to thread contention 
caused by long disk writes? Perhaps the ZK thread could be scheduled at a 
higher priority? Do you have any suggestions for how to avoid the ZK 
disconnects?

Here is an example of one of these events:
Logs on the Broker:
{code}
[2018-08-25 04:10:19,695] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:21,697] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:23,700] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:25,701] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:27,702] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:29,704] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:31,707] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:33,709] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:35,712] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:37,714] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:39,716] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:41,719] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
...
[2018-08-25 04:10:53,752] WARN Client session timed out, have not heard from 
server in 12032ms for sessionid 0x36202ab4337002c 
(org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:53,754] INFO Client session timed out, have not heard from 
server in 12032ms for sessionid 0x36202ab4337002c, closing socket connection 
and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:53,920] INFO zookeeper state changed (Disconnected) 
(org.I0Itec.zkclient.ZkClient)
[2018-08-25 04:10:53,920] INFO Waiting for keeper state SyncConnected 
(org.I0Itec.zkclient.ZkClient)
...
{code}

GC logs during the same time (demonstrating this is not just a long GC):
{code}
2018-08-25T04:10:36.434+: 35150.779: [GC (Allocation Failure)  
3074119K->2529089K(6223360K), 0.0137342 secs]
2018-08-25T04:10:37.367+: 35151.713: [GC (Allocation Failure)  
3074433K->2528524K(6223360K), 0.0127938 secs]
2018-08-25T04:10:38.274+: 35152.620: [GC (Allocation Failure)  
3073868K->2528357K(6223360K), 0.0131040 secs]
2018-08-25T04:10:39.220+: 35153.566: [GC (Allocation Failure)  
3073701K->2528885K(6223360K), 0.0133247 secs]
2018-08-25T04:10:40.175+: 35154.520: [GC (Allocation Failure)  
3074229K->2528639K(6223360K), 0.0127870 secs]
2018-08-25T04:10:41.084+: 35155.429: [GC (Allocation Failure)  
3073983K->2530769K(6223360K), 0.0135058 secs]
2018-08-25T04:10:42.012+: 35156.358: [GC (Allocation Failure)  
3076113K->2531772K(6223360K), 0.0165919 secs]
2018-08-25T04:10:5

Client uses high CPU, caused by the delayedFetch operation returning immediately

2016-10-08 Thread Kafka
Hi all,
we found that our consumers have a high CPU load in our production environment. 
As we know, fetch.min.bytes and fetch.wait.max.ms affect how often the broker 
returns a fetch response, so we set them very high so that the broker can hardly 
satisfy them. That did not solve the problem, so we checked Kafka's code and found 
that DelayedFetch's tryComplete() function contains this code:

if (endOffset.messageOffset != fetchOffset.messageOffset) {
  if (endOffset.onOlderSegment(fetchOffset)) {
    // Case C, this can happen when the new fetch operation is on a truncated leader
    debug("Satisfying fetch %s since it is fetching later segments of partition %s.".format(fetchMetadata, topicAndPartition))
    return forceComplete()
  } else if (fetchOffset.onOlderSegment(endOffset)) {
    // Case C, this can happen when the fetch operation is falling behind the current segment
    // or the partition has just rolled a new segment
    debug("Satisfying fetch %s immediately since it is fetching older segments.".format(fetchMetadata))
    return forceComplete()
  } else if (fetchOffset.messageOffset < endOffset.messageOffset) {
    // we need take the partition fetch size as upper bound when accumulating the bytes
    accumulatedSize += math.min(endOffset.positionDiff(fetchOffset), fetchStatus.fetchInfo.fetchSize)
  }
}

So we can be sure that our fetchOffset's segmentBaseOffset is not the same as 
endOffset's segmentBaseOffset. We then checked our topic-partition's segments and 
found that all the data in the segment had been cleaned up by Kafka due to 
log.retention, so we suspect that the fetchOffset's segmentBaseOffset being smaller 
than endOffset's segmentBaseOffset leads to this problem.

But my point is: should we use code like the following, so that the client uses 
less CPU?

if (endOffset.messageOffset != fetchOffset.messageOffset) {
  if (endOffset.onOlderSegment(fetchOffset)) {
    return false
  } else if (fetchOffset.onOlderSegment(endOffset)) {
    return false
  }
}

Then, in this scenario, it would respond after fetch.wait.max.ms instead of 
returning immediately.

Feedback is greatly appreciated. Thanks.





Re: Does Kafka 0.9 guarantee no data loss

2016-09-23 Thread Kafka
Oh, please ignore my last reply.
I see that if leaderReplica.highWatermark.messageOffset >= requiredOffset, this 
ensures that every replica's LEO in curInSyncReplicas is >= the requiredOffset, 
since the high watermark is the minimum LEO across the in-sync replicas.

> On Sep 23, 2016, at 3:39 PM, Kafka <kafka...@126.com> wrote:
> 
> OK, the example before is not enough to exposure problem.
> What will happen to the situation under the numAcks is 1,and 
> curInSyncReplica.size >= minIsr,but in fact the replica in curInSyncReplica 
> only have one replica has caught up to leader,
> and this replica is the leader replica itself,this is not safe when the 
> machine that deploys leader partition’s broker is restart. 
> 
> current code is as belows,
> if (minIsr <= curInSyncReplicas.size) {
>(true, ErrorMapping.NoError)
>  } else {
>(true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
>  }
> 
> why not the code as belows,
> if (minIsr <= curInSyncReplicas.size && minIsr <= numAcks) {
>(true, ErrorMapping.NoError)
>  } else {
>(true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
>  }
> 
> Its seems that only one condition in kafka broker’s code is not enough to 
> ensure safe,because replicas in curInSyncReplicas is not Strong 
> synchronization.
> 
> >> On Sep 23, 2016, at 1:45 PM, Becket Qin <becket@gmail.com> wrote:
>> 
>> In order to satisfy a produce response, there are two conditions:
>> A. The leader's high watermark should be higher than the requiredOffset
>> (max offset in that produce request of that partition)
>> B. The number of in sync replica is greater than min.isr.
>> 
>> The ultimate goal here is to make sure at least min.isr number of replicas
>> has caught up to requiredOffset. So the check is not only whether we have
>> enough number of replicas in the isr, but also whether those replicas in
>> the ISR has caught up to the required offset.
>> 
>> In your example, if numAcks is 0 and curInSyncReplica.size >= minIsr, the
>> produce response won't return if min.isr > 0, because
>> leaderReplica.highWatermark must be less than requiredOffset given the fact
>> that numAcks is 0. i.e. condition A is not met.
>> 
>> We are actually even doing a stronger than necessary check here.
>> Theoretically as long as min.isr number of replicas has caught up to
>> requiredOffset, we should be able to return the response, but we also
>> require those replicas to be in the ISR.
>> 
>> On Thu, Sep 22, 2016 at 8:15 PM, Kafka <kafka...@126.com> wrote:
>> 
>>> @wangguozhang,could you give me some advices.
>>> 
> >>>> On Sep 22, 2016, at 6:56 PM, Kafka <kafka...@126.com> wrote:
>>>> 
>>>> Hi all,
>>>> in terms of topic, we create a topic with 6 partition,and each
>>> with 3 replicas.
>>>>  in terms of producer,when we send message with ack -1 using sync
>>> interface.
>>>> in terms of brokers,we set min.insync.replicas to 2.
>>>> 
>>>> after we review the kafka broker’s code,we know that we send a message
>>> to broker with ack -1, then we can get response if ISR of this partition is
>>> great than or equal to min.insync.replicas,but what confused
>>>> me is replicas in ISR is not strongly consistent,in kafka 0.9 we use
>>> replica.lag.time.max.ms param to judge whether to shrink ISR, and the
>>> defaults is 1 ms, so replicas’ data in isr can lag 1ms at most,
>>>> we we restart broker which own this partitions’ leader, then controller
>>> will start a new leader election, which will choose the first replica in
>>> ISR that not equals to current leader as new leader, then this will loss
>>> data.
>>>> 
>>>> 
>>>> The main produce handle code shows below:
>>>> val numAcks = curInSyncReplicas.count(r => {
>>>>if (!r.isLocal)
>>>>  if (r.logEndOffset.messageOffset >= requiredOffset) {
>>>>trace("Replica %d of %s-%d received offset
>>> %d".format(r.brokerId, topic, partitionId, requiredOffset))
>>>>true
>>>>  }
>>>>  else
>>>>false
>>>>else
>>>>  true /* also count the local (leader) replica */
>>>>  })
>>>> 
>>>>  trace("%d acks satisfied for %s-%d with acks =
>>> -1".format(numAcks, topic, partitionId))
>>>> 
>>>>  val minIsr = leaderReplica.log.get.config.minInSyncReplicas

Re: Does Kafka 0.9 guarantee no data loss

2016-09-23 Thread Kafka
OK, the earlier example is not enough to expose the problem.
What happens in the situation where numAcks is 1 and curInSyncReplicas.size >= 
minIsr, but in fact only one replica in curInSyncReplicas has caught up to the 
leader, and that replica is the leader replica itself? This is not safe when the 
machine that hosts the leader partition's broker is restarted.

The current code is as below:
if (minIsr <= curInSyncReplicas.size) {
  (true, ErrorMapping.NoError)
} else {
  (true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
}

Why not code like the below?
if (minIsr <= curInSyncReplicas.size && minIsr <= numAcks) {
  (true, ErrorMapping.NoError)
} else {
  (true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
}

It seems that this single condition in the Kafka broker's code is not enough to 
ensure safety, because the replicas in curInSyncReplicas are not strongly 
synchronized.

> On Sep 23, 2016, at 1:45 PM, Becket Qin <becket@gmail.com> wrote:
> 
> In order to satisfy a produce response, there are two conditions:
> A. The leader's high watermark should be higher than the requiredOffset
> (max offset in that produce request of that partition)
> B. The number of in sync replica is greater than min.isr.
> 
> The ultimate goal here is to make sure at least min.isr number of replicas
> has caught up to requiredOffset. So the check is not only whether we have
> enough number of replicas in the isr, but also whether those replicas in
> the ISR has caught up to the required offset.
> 
> In your example, if numAcks is 0 and curInSyncReplica.size >= minIsr, the
> produce response won't return if min.isr > 0, because
> leaderReplica.highWatermark must be less than requiredOffset given the fact
> that numAcks is 0. i.e. condition A is not met.
> 
> We are actually even doing a stronger than necessary check here.
> Theoretically as long as min.isr number of replicas has caught up to
> requiredOffset, we should be able to return the response, but we also
> require those replicas to be in the ISR.
> 
> On Thu, Sep 22, 2016 at 8:15 PM, Kafka <kafka...@126.com> wrote:
> 
>> @wangguozhang,could you give me some advices.
>> 
> >>> On Sep 22, 2016, at 6:56 PM, Kafka <kafka...@126.com> wrote:
>>> 
>>> Hi all,
>>>  in terms of topic, we create a topic with 6 partition,and each
>> with 3 replicas.
>>>   in terms of producer,when we send message with ack -1 using sync
>> interface.
>>>  in terms of brokers,we set min.insync.replicas to 2.
>>> 
>>> after we review the kafka broker’s code,we know that we send a message
>> to broker with ack -1, then we can get response if ISR of this partition is
>> great than or equal to min.insync.replicas,but what confused
>>> me is replicas in ISR is not strongly consistent,in kafka 0.9 we use
>> replica.lag.time.max.ms param to judge whether to shrink ISR, and the
>> defaults is 1 ms, so replicas’ data in isr can lag 1ms at most,
>>> we we restart broker which own this partitions’ leader, then controller
>> will start a new leader election, which will choose the first replica in
>> ISR that not equals to current leader as new leader, then this will loss
>> data.
>>> 
>>> 
>>> The main produce handle code shows below:
>>> val numAcks = curInSyncReplicas.count(r => {
>>> if (!r.isLocal)
>>>   if (r.logEndOffset.messageOffset >= requiredOffset) {
>>> trace("Replica %d of %s-%d received offset
>> %d".format(r.brokerId, topic, partitionId, requiredOffset))
>>> true
>>>   }
>>>   else
>>> false
>>> else
>>>   true /* also count the local (leader) replica */
>>>   })
>>> 
>>>   trace("%d acks satisfied for %s-%d with acks =
>> -1".format(numAcks, topic, partitionId))
>>> 
>>>   val minIsr = leaderReplica.log.get.config.minInSyncReplicas
>>> 
>>>   if (leaderReplica.highWatermark.messageOffset >= requiredOffset
>> ) {
>>> /*
>>> * The topic may be configured not to accept messages if there
>> are not enough replicas in ISR
>>> * in this scenario the request was already appended locally and
>> then added to the purgatory before the ISR was shrunk
>>> */
>>> if (minIsr <= curInSyncReplicas.size) {
>>>   (true, ErrorMapping.NoError)
>>> } else {
>>>   (true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
>>> }
>>>   } else
>>> (false, ErrorMapping.NoError)
>>> 
>>> 
>>> why only logging unAcks and not use numAcks to compare with minIsr, if
>> numAcks is 0, but curInSyncReplicas.size >= minIsr, then this will return,
>> as ISR shrink procedure is not real time, does this will loss data after
>> leader election?
>>> 
>>> Feedback is greatly appreciated. Thanks.
>>> meituan.inf
>>> 
>>> 
>>> 
>> 
>> 
>> 




Re: Does Kafka 0.9 guarantee no data loss

2016-09-22 Thread Kafka
@wangguozhang, could you give me some advice?

> On Sep 22, 2016, at 6:56 PM, Kafka <kafka...@126.com> wrote:
> 
> Hi all,   
>   in terms of topic, we create a topic with 6 partition,and each with 3 
> replicas.
>in terms of producer,when we send message with ack -1 using sync 
> interface.
>   in terms of brokers,we set min.insync.replicas to 2.
> 
> after we review the kafka broker’s code,we know that we send a message to 
> broker with ack -1, then we can get response if ISR of this partition is 
> great than or equal to min.insync.replicas,but what confused
> me is replicas in ISR is not strongly consistent,in kafka 0.9 we use 
> replica.lag.time.max.ms param to judge whether to shrink ISR, and the 
> defaults is 1 ms, so replicas’ data in isr can lag 1ms at most,
> we we restart broker which own this partitions’ leader, then controller will 
> start a new leader election, which will choose the first replica in ISR that 
> not equals to current leader as new leader, then this will loss data.
> 
> 
> The main produce handle code shows below:
> val numAcks = curInSyncReplicas.count(r => {
>  if (!r.isLocal)
>if (r.logEndOffset.messageOffset >= requiredOffset) {
>  trace("Replica %d of %s-%d received offset 
> %d".format(r.brokerId, topic, partitionId, requiredOffset))
>  true
>}
>else
>  false
>  else
>true /* also count the local (leader) replica */
>})
> 
>trace("%d acks satisfied for %s-%d with acks = -1".format(numAcks, 
> topic, partitionId))
> 
>val minIsr = leaderReplica.log.get.config.minInSyncReplicas
> 
>if (leaderReplica.highWatermark.messageOffset >= requiredOffset ) {
>  /*
>  * The topic may be configured not to accept messages if there are 
> not enough replicas in ISR
>  * in this scenario the request was already appended locally and then 
> added to the purgatory before the ISR was shrunk
>  */
>  if (minIsr <= curInSyncReplicas.size) {
>(true, ErrorMapping.NoError)
>  } else {
>(true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
>  }
>} else
>  (false, ErrorMapping.NoError)
> 
> 
> why only logging unAcks and not use numAcks to compare with minIsr, if 
> numAcks is 0, but curInSyncReplicas.size >= minIsr, then this will return, as 
> ISR shrink procedure is not real time, does this will loss data after leader 
> election?
> 
> Feedback is greatly appreciated. Thanks.
> meituan.inf
> 
> 
> 




Does Kafka 0.9 guarantee no data loss

2016-09-22 Thread Kafka
Hi all,
In terms of the topic, we created a topic with 6 partitions, each with 3 replicas.
In terms of the producer, we send messages with acks = -1 using the sync interface.
In terms of the brokers, we set min.insync.replicas to 2.

After reviewing the Kafka broker's code, we know that when we send a message to 
the broker with acks = -1, we get a response once the ISR of this partition is 
greater than or equal to min.insync.replicas. What confuses me is that the replicas 
in the ISR are not strongly consistent: in Kafka 0.9 the replica.lag.time.max.ms 
parameter (default 10000 ms) decides whether to shrink the ISR, so a replica's data 
in the ISR can lag by up to that amount. If we restart the broker that owns this 
partition's leader, the controller will start a new leader election, which will 
choose the first replica in the ISR that is not the current leader as the new 
leader, and this can lose data.
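
For concreteness, a minimal sketch of the producer side described above, using the 
Java producer (bootstrap address and topic are illustrative; min.insync.replicas=2 
is a broker/topic setting, not a producer one):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksAllProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // illustrative
        props.put("acks", "-1");                           // wait for the ISR, same as acks=all
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Synchronous send: block on the future so each record is acknowledged before the next.
            producer.send(new ProducerRecord<>("test-topic", "key-1", "value-1")).get();
        }
    }
}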


The main produce-handling code is shown below:
val numAcks = curInSyncReplicas.count(r => {
  if (!r.isLocal)
    if (r.logEndOffset.messageOffset >= requiredOffset) {
      trace("Replica %d of %s-%d received offset %d".format(r.brokerId, topic, partitionId, requiredOffset))
      true
    }
    else
      false
  else
    true /* also count the local (leader) replica */
})

trace("%d acks satisfied for %s-%d with acks = -1".format(numAcks, topic, partitionId))

val minIsr = leaderReplica.log.get.config.minInSyncReplicas

if (leaderReplica.highWatermark.messageOffset >= requiredOffset) {
  /*
   * The topic may be configured not to accept messages if there are not enough replicas in ISR
   * in this scenario the request was already appended locally and then added to the purgatory before the ISR was shrunk
   */
  if (minIsr <= curInSyncReplicas.size) {
    (true, ErrorMapping.NoError)
  } else {
    (true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
  }
} else
  (false, ErrorMapping.NoError)


Why is numAcks only logged and never compared with minIsr? If numAcks is 0 but 
curInSyncReplicas.size >= minIsr, this will still return success; since the 
ISR-shrink procedure is not real time, can this lose data after a leader election?

Feedback is greatly appreciated. Thanks.
meituan.inf





Re: Compacted topic cannot accept message without key

2016-07-18 Thread Kafka
Thanks for your answer. I know a key is necessary for compacted topics, and as 
you know, __consumer_offsets is an internal compacted topic in Kafka whose key is 
a triple of <groupid, topic, partition>; these errors occur when the consumer 
client commits group offsets.
So why does this happen?
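
For ordinary application topics, the requirement Dustin explains below is simply 
that every record carries a non-null key; a minimal sketch (topic, key and value 
are hypothetical):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedSend {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // illustrative
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Compaction keeps the latest value per key, so the key must not be null.
            producer.send(new ProducerRecord<>("user-profiles", "user-42", "latest-state")).get();
        }
    }
}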


> On Jul 19, 2016, at 1:27 AM, Dustin Cote <dus...@confluent.io> wrote:
> 
> Compacted topics require keyed messages in order for compaction to
> function.  The solution is to provide a key for your messages.  I would
> suggest reading the wiki on log compaction.
> <https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction>
> 
> On Mon, Jul 18, 2016 at 12:03 PM, Kafka <kafka...@126.com> wrote:
> 
>> Hi,
>>The server log shows error as belows on broker 0.9.0.
>>ERROR [Replica Manager on Broker 0]: Error processing append
>> operation on partition [__consumer_offsets,5] (kafka.server.ReplicaManager)
>> kafka.message.InvalidMessageException: Compacted topic cannot accept
>> message without key.
>> 
>> Why does this happen and what’s the solution?
>> 
>> 
>> 
> 
> 
> -- 
> Dustin Cote
> confluent.io




Compacted topic cannot accept message without key

2016-07-18 Thread Kafka
Hi,
The server log shows the error below on broker 0.9.0.
ERROR [Replica Manager on Broker 0]: Error processing append operation 
on partition [__consumer_offsets,5] (kafka.server.ReplicaManager)
kafka.message.InvalidMessageException: Compacted topic cannot accept message 
without key.

Why does this happen and what’s the solution?




__consumer_offsets leader is -1

2016-07-07 Thread Kafka
Hi, for __consumer_offsets, partition 7 and partition 27 have leader -1 and an 
empty ISR. Can anyone tell me how to recover them? Thank you.

Topic: __consumer_offsets   Partition: 0    Leader: 3    Replicas: 3,4,5   Isr: 4,5,3
Topic: __consumer_offsets   Partition: 1    Leader: 4    Replicas: 4,5,0   Isr: 0,4,5
Topic: __consumer_offsets   Partition: 2    Leader: 5    Replicas: 5,0,1   Isr: 0,5,1
Topic: __consumer_offsets   Partition: 3    Leader: 0    Replicas: 0,1,3   Isr: 0,1,3
Topic: __consumer_offsets   Partition: 4    Leader: 1    Replicas: 1,3,4   Isr: 4,1,3
Topic: __consumer_offsets   Partition: 5    Leader: 3    Replicas: 3,5,0   Isr: 5,3,0
Topic: __consumer_offsets   Partition: 6    Leader: 4    Replicas: 4,0,1   Isr: 0,4,1
Topic: __consumer_offsets   Partition: 7    Leader: -1   Replicas: 5,1,3   Isr:
Topic: __consumer_offsets   Partition: 8    Leader: 0    Replicas: 0,3,4   Isr: 0,4,3
Topic: __consumer_offsets   Partition: 9    Leader: 1    Replicas: 1,4,5   Isr: 4,5,1
Topic: __consumer_offsets   Partition: 10   Leader: 3    Replicas: 3,0,1   Isr: 0,1,3
Topic: __consumer_offsets   Partition: 11   Leader: 4    Replicas: 4,1,3   Isr: 4,1,3
Topic: __consumer_offsets   Partition: 12   Leader: 5    Replicas: 5,3,4   Isr: 4,5,3
Topic: __consumer_offsets   Partition: 13   Leader: 0    Replicas: 0,4,5   Isr: 0,4,5
Topic: __consumer_offsets   Partition: 14   Leader: 1    Replicas: 1,5,0   Isr: 0,5,1
Topic: __consumer_offsets   Partition: 15   Leader: 3    Replicas: 3,1,4   Isr: 1,4,3
Topic: __consumer_offsets   Partition: 16   Leader: 4    Replicas: 4,3,5   Isr: 4,5,3
Topic: __consumer_offsets   Partition: 17   Leader: 5    Replicas: 5,4,0   Isr: 0,4,5
Topic: __consumer_offsets   Partition: 18   Leader: 0    Replicas: 0,5,1   Isr: 0,5,1
Topic: __consumer_offsets   Partition: 19   Leader: 1    Replicas: 1,0,3   Isr: 0,1,3
Topic: __consumer_offsets   Partition: 20   Leader: 3    Replicas: 3,4,5   Isr: 4,5,3
Topic: __consumer_offsets   Partition: 21   Leader: 4    Replicas: 4,5,0   Isr: 0,4,5
Topic: __consumer_offsets   Partition: 22   Leader: 5    Replicas: 5,0,1   Isr: 0,5,1
Topic: __consumer_offsets   Partition: 23   Leader: 0    Replicas: 0,1,3   Isr: 0,1,3
Topic: __consumer_offsets   Partition: 24   Leader: 1    Replicas: 1,3,4   Isr: 4,1,3
Topic: __consumer_offsets   Partition: 25   Leader: 3    Replicas: 3,5,0   Isr: 5,3,0
Topic: __consumer_offsets   Partition: 26   Leader: 4    Replicas: 4,0,1   Isr: 0,4,1
Topic: __consumer_offsets   Partition: 27   Leader: -1   Replicas: 5,1,3   Isr:


Re: delay of producer and consumer in kafka 0.9 is too big to be accepted

2016-06-27 Thread Kafka
Can someone please explain why the latency is so high for me? Thanks.

> On Jun 25, 2016, at 11:16 PM, Jay Kreps <j...@confluent.io> wrote:
> 
> Can you sanity check this with the end-to-end latency test that ships with
> Kafka in the tools package?
> 
> https://apache.googlesource.com/kafka/+/1769642bb779921267bd57d3d338591dbdf33842/core/src/main/scala/kafka/tools/TestEndToEndLatency.scala
> 
> On Saturday, June 25, 2016, Kafka <kafka...@126.com> wrote:
> 
>> Hi all,
>>my kafka cluster is composed of three brokers with each have 8core
>> cpu and 8g memory and 1g network card.
>>with java async client,I sent 100 messages with size of 1024
>> bytes per message ,the send gap between each sending is 20us,the consumer’s
>> config is like this,fetch.min.bytes is set to 1, fetch.wait.max.ms is set
>> to 100.
>>to avoid the inconformity bewteen two machines,I start producer
>> and consumer at the same machine,the machine’s configurations  have enough
>> resources to satisfy these two clients.
>> 
>>I start consumer before producer on each test,with the sending
>> timestamp in each message,when consumer receive the message,then I can got
>> the consumer delay through the substraction between current timesstamp and
>> sending timestamp.
>>when I set acks to 0,replica to 2,then the average producer delay
>> is 2.98ms, the average consumer delay is 52.23ms.
>>when I set acks to 1,replica to 2,then the average producer delay
>> is 3.9ms,the average consumer delay is 44.88ms.
>>when I set acks to -1, replica to 2, then the average producer
>> delay is 1782ms, the average consumer delay is 1786ms.
>> 
>>I have two doubts,the first is why my  consumer's delay with acks
>> settled to 0  is logger than the consumer delay witch acks settled to 1.
>> the second is why the delay of producer and consumer is so big when I set
>> acks to -1,I think this delay is can not be accepted.
>>and I found this delay is amplified with sending more messages.
>> 
>>any feedback is appreciated.
>> thanks
>> 
>> 
>> 
>> 




delay of producer and consumer in kafka 0.9 is too big to be accepted

2016-06-25 Thread Kafka
Hi all,
My Kafka cluster is composed of three brokers, each with an 8-core CPU, 8 GB of 
memory and a 1 Gb network card.
With the Java async client, I sent 100 messages of 1024 bytes each, with a gap of 
20 us between sends. The consumer's config is as follows: fetch.min.bytes is set 
to 1 and fetch.wait.max.ms is set to 100.
To avoid discrepancies between two machines, I start the producer and consumer on 
the same machine; that machine has enough resources to satisfy both clients.

I start the consumer before the producer in each test. Each message carries its 
sending timestamp; when the consumer receives a message, I obtain the consumer 
delay by subtracting the sending timestamp from the current timestamp.
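
A minimal sketch of this measurement on the consumer side, using the new Java 
consumer (the producer writes System.currentTimeMillis() as the message value; 
broker address, group id and topic are illustrative; the new consumer spells the 
wait property fetch.max.wait.ms):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerDelayProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // illustrative
        props.put("group.id", "delay-probe");              // illustrative
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("fetch.min.bytes", "1");
        props.put("fetch.max.wait.ms", "100");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("latency-test"));   // illustrative topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                long now = System.currentTimeMillis();
                for (ConsumerRecord<String, String> record : records) {
                    // The value holds the producer-side send timestamp in milliseconds.
                    long delayMs = now - Long.parseLong(record.value());
                    System.out.println("consumer delay: " + delayMs + " ms");
                }
            }
        }
    }
}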
When I set acks to 0 and replicas to 2, the average producer delay is 2.98 ms and 
the average consumer delay is 52.23 ms.
When I set acks to 1 and replicas to 2, the average producer delay is 3.9 ms and 
the average consumer delay is 44.88 ms.
When I set acks to -1 and replicas to 2, the average producer delay is 1782 ms and 
the average consumer delay is 1786 ms.

I have two doubts. The first is why my consumer delay with acks set to 0 is larger 
than the consumer delay with acks set to 1. The second is why the producer and 
consumer delay is so large when I set acks to -1; I think this delay cannot be 
accepted. I also found that this delay grows as more messages are sent.

any feedback is appreciated. 
thanks





Re: test of producer's delay and consumer's delay

2016-06-18 Thread Kafka
To Christian Posta,
I have taken the interpretation of time into account.
My producer and consumer are deployed on the same machine; the machine's 
configuration is very good, so it will not be the bottleneck.

> On Jun 18, 2016, at 10:29 PM, Kafka <kafka...@126.com> wrote:
> 
> I send every message with timestamp, and when I receive a message,I do a 
> subtraction between current timestamp and message’s timestamp.  then I get 
> the consumer’s delay.
> 
> >> On Jun 18, 2016, at 11:28 AM, Kafka <kafka...@126.com> wrote:
>> 
>> 
> 



Re: test of producer's delay and consumer's delay

2016-06-18 Thread Kafka
I send every message with a timestamp, and when I receive a message I subtract 
the message's timestamp from the current timestamp. That gives me the consumer's 
delay.

> On Jun 18, 2016, at 11:28 AM, Kafka <kafka...@126.com> wrote:
> 
> 




test of producer's delay and consumer's delay

2016-06-17 Thread Kafka
Hello, I have done a series of tests on Kafka 0.9.0, and one of the results 
confused me.

Test environment:
  Kafka cluster: 3 brokers, 8-core CPU / 8 GB memory / 1 Gb network card
  client: 4-core CPU / 4 GB memory
  topic: 6 partitions, 2 replicas

  total messages: 1
  single message size: 1024 bytes
  fetch.min.bytes: 1
  fetch.wait.max.ms: 100 ms

All send tests were run using the Scala sync interface.

When I set acks to 0, the producer's delay is 0.3 ms and the consumer's delay is 7.7 ms.
When I set acks to 1, the producer's delay is 1.6 ms and the consumer's delay is 3.7 ms.
When I set acks to -1, the producer's delay is 3.5 ms and the consumer's delay is 4.2 ms.

But why does the consumer's delay decrease when I change acks from 0 to 1? That 
confuses me.




kafka 0.8.2.1 High CPU load

2016-04-22 Thread Kafka
Hi, we are using kafka_2.10-0.8.2.1, and our VM machine config is 4 cores and 8 GB 
of memory.
Our cluster consists of three brokers, and our broker config defaults to 2 
replicas. Our broker load is often very high once in a while: the load is greater 
than 1.5 per core on average.

We have about 70 topics on this cluster.
When we use the top utility we can see 280% CPU utilization. Using jstack, I found 
that there are 4 threads which use the most CPU, shown below:
"kafka-network-thread-9092-0" prio=10 tid=0x7f46c8709000 nid=0x35dd 
runnable [0x7f46b73f2000]
  java.lang.Thread.State: RUNNABLE
"kafka-network-thread-9092-1" prio=10 tid=0x7f46c873c000 nid=0x35de 
runnable [0x7f46b75f4000]
   java.lang.Thread.State: RUNNABLE
"kafka-network-thread-9092-2" prio=10 tid=0x7f46c8756000 nid=0x35df 
runnable [0x7f46b7cfb000]
  java.lang.Thread.State: RUNNABLE
"kafka-network-thread-9092-3" prio=10 tid=0x7f46c876f800 nid=0x35e0 
runnable [0x7f46b5adb000]
  java.lang.Thread.State: RUNNABLE


I found one issue, https://issues.apache.org/jira/browse/KAFKA-493, that concerns 
this, but I don't think it fits our case, because we have other clusters deployed 
on VMs with the same configuration and they do not show this problem.

The brokers' config is the same, but we add:

replica.fetch.wait.max.ms=100
num.replica.fetchers=2


I want to ask whether this is the main reason, or whether there is some other 
reason that leads to the high CPU load?