Deleting topics in windows - fix estimate or workaround

2019-11-30 Thread Sachin Mittal
Hi All,
As many of us are aware, there is a critical bug on Windows where Kafka
crashes when we delete a topic. This also affects other areas, such as stream
processing, when trying to reset a stream using StreamsResetter.

Right now the only workaround I have found is to stop ZooKeeper and the Kafka
server and manually delete the directories and files containing the stream,
topic, and offset information.
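For what it's worth, the manual cleanup boils down to something like the sketch below. The paths are assumptions based on the default Linux configs; on Windows, substitute whatever your log.dirs, dataDir, and state.dir actually point to.

```shell
# A minimal sketch of the manual cleanup (paths are assumptions -- use the
# log.dirs value from server.properties, the dataDir from
# zookeeper.properties, and the Kafka Streams state.dir, which default to
# these locations on Linux):
KAFKA_LOG_DIRS="/tmp/kafka-logs"
ZK_DATA_DIR="/tmp/zookeeper"
STREAMS_STATE_DIR="/tmp/kafka-streams"

# Stop the broker and ZooKeeper first, then wipe the on-disk state,
# which removes all topic, stream, and offset data:
rm -rf "$KAFKA_LOG_DIRS" "$ZK_DATA_DIR" "$STREAMS_STATE_DIR"
```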

However, doing this every time is unproductive.

I see that multiple critical bugs have been logged in JIRA around this issue:
https://issues.apache.org/jira/browse/KAFKA-6203
https://issues.apache.org/jira/browse/KAFKA-1194

I would like to know when the fix will be available. I see that multiple pull
requests have been opened to address these issues.

I wanted to know whether one or more of those pull requests need to be merged
to get the fix out, or whether there is some configuration change I can try
as a workaround for this issue.

Please note that a few of us may be running Kafka on Windows in production
too, so this fix is highly important to us.

Please let me know what can be done to address this issue.

Thanks
Sachin


RE: More partitions => less throughput?

2019-11-30 Thread Eric Owhadi
What is happening, IMHO, is that when you have multiple partitions, each
consumer fetches from its own partition and finds only 1/64th the amount of
data (compared to the single-partition case) each time it is its turn. You
therefore end up with a chattier workload, where each request to the broker
carries only a small number of messages, whereas in the single-partition case
each request carries a much higher message count, so the per-request overhead
is amortized over a larger batch.
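If that explanation is right, a quick back-of-the-envelope sketch shows how the per-request payload shrinks as the partition count grows (the record count is made up for illustration):

```shell
# With a fixed send/fetch cadence and records spread evenly, the payload per
# request shrinks linearly with the partition count, so the number of
# requests (and the per-message overhead) grows correspondingly.
records=64000
for parts in 1 64; do
  per_request=$((records / parts))
  echo "partitions=$parts records_per_request=$per_request"
done
```

With 64 partitions each request carries 1/64th as many records, which is exactly the "chatty" pattern described above.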
Eric

-----Original Message-----
From: Craig Pastro  
Sent: Thursday, November 28, 2019 9:10 PM
To: users@kafka.apache.org
Subject: More partitions => less throughput?


Hello there,

I was wondering if anyone here could help me with some insight into a conundrum 
that I am facing.

Basically, the story is that I am running three Kafka brokers via docker on a 
single vm with log.flush.interval.messages = 1 and min.insync.replicas = 2. 
Then I create two topics: both with replication factor = 3, but one with one 
partition and the other with 64.
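For reference, the topic setup described above could be created with something like the following; the topic names and bootstrap address are placeholders, not taken from the actual benchmark repo:

```shell
# One topic with a single partition, one with 64; both replicated 3x
# (assumes a broker reachable at localhost:9092).
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
  --topic bench-1p --partitions 1 --replication-factor 3
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
  --topic bench-64p --partitions 64 --replication-factor 3
```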

Then I try to run a benchmark using these topics and what I find is as
follows:

1 partition, 1381.02 records/sec,  685.87 ms average latency
64 partitions, 601.00 records/sec, 1298.18 ms average latency

This is the opposite of what I expected. In neither case am I even close to
the IOPS the disk can handle. So what I would like to know is whether there is
an obvious reason I am missing for the slowdown with more partitions.

If it is helpful the docker-compose file and the code to do the benchmarking 
can be found at https://github.com/siyopao/kafka-benchmark.
(Any comments or advice on how to make the code better are greatly
appreciated!) The benchmarking code is inspired by and very similar to what the 
bin/kafka-producer-perf-test.sh script does.
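As a sanity check, the stock tool mentioned above can be run directly against the same topics; the topic name, record count, and record size below are placeholders:

```shell
# Baseline run with the bundled perf-test tool; acks=all is chosen to match
# the min.insync.replicas=2 setup described earlier.
bin/kafka-producer-perf-test.sh \
  --topic bench-1p \
  --num-records 100000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092 acks=all
```

Comparing its numbers against the custom benchmark would rule out the benchmarking code itself as the source of the difference.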

Thank you!

Best wishes,
Craig


[VOTE] 2.4.0 RC2

2019-11-30 Thread Manikumar
Hello Kafka users, developers and client-developers,

This is the third candidate for release of Apache Kafka 2.4.0.

This release includes many new features, including:
- Allow consumers to fetch from closest replica
- Support for incremental cooperative rebalancing to the consumer rebalance
protocol
- MirrorMaker 2.0 (MM2), a new multi-cluster, cross-datacenter replication
engine
- New Java authorizer Interface
- Support for non-key joins in KTable
- Administrative API for replica reassignment
- Sticky partitioner
- Return topic metadata and configs in CreateTopics response
- Securing internal Connect REST endpoints
- API to delete consumer offsets and expose it via the AdminClient.

Release notes for the 2.4.0 release:
https://home.apache.org/~manikumar/kafka-2.4.0-rc2/RELEASE_NOTES.html

*** Please download, test and vote by Thursday, December 5th, 9am PT

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~manikumar/kafka-2.4.0-rc2/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~manikumar/kafka-2.4.0-rc2/javadoc/

* Tag to be voted upon (off 2.4 branch) is the 2.4.0 tag:
https://github.com/apache/kafka/releases/tag/2.4.0-rc2

* Documentation:
https://kafka.apache.org/24/documentation.html

* Protocol:
https://kafka.apache.org/24/protocol.html

Thanks,
Manikumar


Re: More partitions => less throughput?

2019-11-30 Thread Peter Bukowinski
Testing multiple broker VMs on a single host won’t give you accurate 
performance numbers unless that is how you will be deploying Kafka in 
production. (Don’t do this.) All your Kafka networking is being handled by a 
single host, so instead of being spread out between machines to increase total 
possible throughput, the brokers are competing with each other.

Given that this is the test environment you settled on, you should tune the 
number of partitions taking into account the number of producers and 
consumers, as well as the average message size. If you have only one producer, 
a single consumer should be sufficient to read the data in real time. If you 
have multiple producers, you may need to scale up the consumer count and use 
consumer groups.

-- Peter

> On Nov 30, 2019, at 8:57 AM, Tom Brown  wrote:
> 
> I think the number of partitions needs to be tuned to the size of the
> cluster; 64 partitions on what is essentially a single box seems high. Do
> you know what hardware you will be deploying on in production? Can you run
> your benchmark on that instead of a vm?
> 
> —Tom
> 
>> On Thursday, November 28, 2019, Craig Pastro  wrote:
>> 
>> Hello there,
>> 
>> I was wondering if anyone here could help me with some insight into a
>> conundrum that I am facing.
>> 
>> Basically, the story is that I am running three Kafka brokers via docker on
>> a single vm with log.flush.interval.messages = 1 and min.insync.replicas =
>> 2. Then I create two topics: both with replication factor = 3, but one with
>> one partition and the other with 64.
>> 
>> Then I try to run a benchmark using these topics and what I find is as
>> follows:
>> 
>> 1 partition, 1381.02 records/sec,  685.87 ms average latency
>> 64 partitions, 601.00 records/sec, 1298.18 ms average latency
>> 
>> This is the opposite of what I expected. In neither case am I even close to
>> the IOPS of what the disk can handle. So what I would like to know is if
>> there is any obvious reason that I am missing for the slow down with more
>> partitions?
>> 
>> If it is helpful the docker-compose file and the code to do the
>> benchmarking can be found at https://github.com/siyopao/kafka-benchmark.
>> (Any comments or advice on how to make the code better are greatly
>> appreciated!) The benchmarking code is inspired by and very similar to what
>> the bin/kafka-producer-perf-test.sh script does.
>> 
>> Thank you!
>> 
>> Best wishes,
>> Craig
>> 


Re: More partitions => less throughput?

2019-11-30 Thread Tom Brown
I think the number of partitions needs to be tuned to the size of the
cluster; 64 partitions on what is essentially a single box seems high. Do
you know what hardware you will be deploying on in production? Can you run
your benchmark on that instead of a vm?

—Tom

On Thursday, November 28, 2019, Craig Pastro  wrote:

> Hello there,
>
> I was wondering if anyone here could help me with some insight into a
> conundrum that I am facing.
>
> Basically, the story is that I am running three Kafka brokers via docker on
> a single vm with log.flush.interval.messages = 1 and min.insync.replicas =
> 2. Then I create two topics: both with replication factor = 3, but one with
> one partition and the other with 64.
>
> Then I try to run a benchmark using these topics and what I find is as
> follows:
>
> 1 partition, 1381.02 records/sec,  685.87 ms average latency
> 64 partitions, 601.00 records/sec, 1298.18 ms average latency
>
> This is the opposite of what I expected. In neither case am I even close to
> the IOPS of what the disk can handle. So what I would like to know is if
> there is any obvious reason that I am missing for the slow down with more
> partitions?
>
> If it is helpful the docker-compose file and the code to do the
> benchmarking can be found at https://github.com/siyopao/kafka-benchmark.
> (Any comments or advice on how to make the code better are greatly
> appreciated!) The benchmarking code is inspired by and very similar to what
> the bin/kafka-producer-perf-test.sh script does.
>
> Thank you!
>
> Best wishes,
> Craig
>


Re: [VOTE] 2.4.0 RC1

2019-11-30 Thread Manikumar
Hi All,

We will consider KAFKA-9244
(https://issues.apache.org/jira/browse/KAFKA-9244) as a blocker and include
the fix in the 2.4 release.

I am canceling this VOTE and will create a third release candidate.

Thank you all for testing.

On Fri, Nov 29, 2019 at 10:52 AM Matthias J. Sax 
wrote:

> I did not find the bug -- it was reported by Kin Sui
> (https://issues.apache.org/jira/browse/KAFKA-9244)
>
> Whether the bug is a blocker is a judgment call though, because it's
> technically not a regression. However, if we don't include the fix in
> 2.4.0 then, as Adam pointed out, the new foreign-key join would compute
> incorrect results, and thus it's at least a critical issue.
>
>
> -Matthias
>
>
>
> On 11/28/19 11:48 AM, Adam Bellemare wrote:
> > mjsax found an important issue for the foreign-key joiner, which I think
> > should be a blocker (if it isn't already) since it is functionally
> > incorrect without the fix:
> >
> > https://github.com/apache/kafka/pull/7758
> >
> >
> >
> > On Tue, Nov 26, 2019 at 6:26 PM Sean Glover 
> > wrote:
> >
> >> Hi,
> >>
> >> I also used Eric's test script.  I had a few issues running it that I
> >> address below[0][1], otherwise looks good.
> >>
> >> - Signing keys all good
> >> - All md5, sha1sums and sha512sums are good
> >> - A couple transient test failures that passed on a second run
> >> (ReassignPartitionsClusterTest.shouldMoveSinglePartitionWithinBroker,
> >> SaslScramSslEndToEndAuthorizationTest.
> >> testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl)
> >> - Passes our own test suite for Alpakka Kafka (
> >> https://travis-ci.org/akka/alpakka-kafka/builds/616861540,
> >> https://github.com/akka/alpakka-kafka/pull/971)
> >>
> >> +1 (non-binding)
> >>
> >> ..
> >>
> >> Issues while running test script:
> >>
> >> [0] Error with Eric test script.  I had an issue running the script
> with my
> >> version of bash (TMPDIR was unassigned), which I provided a PR for (
> >> https://github.com/elalonde/kafka/pull/1)
> >> [1] Gradle incompatibility. I ran into difficulty running the gradle
> build
> >> with the latest version of gradle (6.0.1).  I had to revert to the last
> >> patch of version 5 (5.6.4):
> >>
> >>  ✘ seglo@slice 
> /tmp/verify-kafka-SP06GE1GpP/10169.out/kafka-2.4.0-src 
> >> gradle wrapper --warning-mode all
> >>
> >>> Configure project :
> >> The maven plugin has been deprecated. This is scheduled to be removed in
> >> Gradle 7.0. Please use the maven-publish plugin instead.
> >> at
> >>
> >>
> build_c0129pbfzzxjolwxmds3lsevz$_run_closure5.doCall(/tmp/verify-kafka-SP06GE1GpP/10169.out/kafka-2.4.0-src/build.gradle:160)
> >> (Run with --stacktrace to get the full stack trace of this
> >> deprecation warning.)
> >>
> >> FAILURE: Build failed with an exception.
> >>
> >> * Where:
> >> Build file
> >> '/tmp/verify-kafka-SP06GE1GpP/10169.out/kafka-2.4.0-src/build.gradle'
> line:
> >> 472
> >>
> >> * What went wrong:
> >> A problem occurred evaluating root project 'kafka-2.4.0-src'.
> >>> Could not create task ':clients:spotbugsMain'.
> >>> Could not create task of type 'SpotBugsTask'.
> >>   > Could not create an instance of type
> >> com.github.spotbugs.internal.SpotBugsReportsImpl.
> >>  >
> >>
> >>
> org.gradle.api.reporting.internal.TaskReportContainer.<init>(Ljava/lang/Class;Lorg/gradle/api/Task;)V
> >>
> >> * Try:
> >> Run with --stacktrace option to get the stack trace. Run with --info or
> >> --debug option to get more log output. Run with --scan to get full
> >> insights.
> >>
> >> * Get more help at https://help.gradle.org
> >>
> >> BUILD FAILED in 699ms
> >>
> >> On Tue, Nov 26, 2019 at 1:31 PM Manikumar 
> >> wrote:
> >>
> >>> Hi All,
> >>>
> >>> Please download, test, and vote on RC1 in order to provide quality
> >>> assurance for the forthcoming 2.4 release.
> >>>
> >>> Thanks.
> >>>
> >>> On Tue, Nov 26, 2019 at 8:11 PM Adam Bellemare <
> adam.bellem...@gmail.com
> >>>
> >>> wrote:
> >>>
>  Hello,
> 
>  Ran Eric's test script:
>  $ git clone https://github.com/elalonde/kafka
>  $ ./kafka/bin/verify-kafka-rc.sh 2.4.0
>  https://home.apache.org/~manikumar/kafka-2.4.0-rc1
>  
> 
>  - All PGP signatures are good
>  - All md5, sha1sums and sha512sums pass
>  - Had a few intermittent failures in tests that passed upon rerunning.
> 
>  +1 (non-binding) from me.
> 
>  Adam
> 
>  On Wed, Nov 20, 2019 at 10:37 AM Manikumar  >
>  wrote:
> 
> > Hello Kafka users, developers and client-developers,
> >
> > This is the second candidate for release of Apache Kafka 2.4.0.
> >
> > This release includes many new features, including:
> > - Allow consumers to fetch from closest replica
> > - Support for incremental cooperative rebalancing to the consumer
>  rebalance
> > protocol
> > - MirrorMaker 2.0 (MM2), a new multi-cluster, cross-datacenter
>  replication
> > engine
>