Problem with Kafka KRaft 3.1.X

2022-09-01 Thread Paul Brebner
Hi all,

I've been attempting to benchmark Kafka KRaft version for an ApacheCon talk
and have identified 2 problems:

1 - it's still impossible to create a large number of partitions/topics - I
can create more than the comparable Zookeeper version, but still not
"millions" - this is with RF=1 only (anything higher needs huge clusters to
cope with the replication CPU overhead), and no load on the clusters yet
(i.e. purely a topic/partition creation experiment).

2 - eventually the topic/partition creation command causes the Kafka
process to fail - looks like a memory error -

java.lang.OutOfMemoryError: Metaspace
OpenJDK 64-Bit Server VM warning: INFO:
os::commit_memory(0x7f4f554f9000, 65536, 1) failed; error='Not enough
space' (errno=12)

or similar error

This seems to happen consistently around 30,000+ partitions - this is on a test
EC2 instance with 32GB RAM, 500,000 file descriptors (increased from the
default) and 64GB disk (plenty spare). I'm not an OS expert, but the Kafka
process and the OS both seem to have plenty of RAM when this error occurs.

So there are 3 questions really: What's going wrong exactly? How can I achieve
more partitions? And should the topic create command (I'm just using the CLI at
present to create topics) really be capable of killing the Kafka instance, or
should it fail and throw an error, leaving the Kafka instance still
working?

Regards, Paul Brebner


Re: Problem with Kafka KRaft 3.1.X

2022-09-09 Thread Paul Brebner
Colin, hi, current max partitions reached is about 600,000 - I had to
increase Linux file descriptors, mmap, and tweak the JVM heap settings a
bit - heap error again.
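For reference, the limits I raised look roughly like this - the exact values
here are what I'd try on a similar instance and are assumptions, not a tested
recipe:

```shell
# Raise the open-file-descriptor limit for the Kafka process (each
# partition keeps log segment and index files open).
ulimit -n 500000

# Raise the memory-map count (segments and indexes are mmap'd, so the
# Linux default of 65530 maps is exhausted quickly at high partition counts).
sudo sysctl -w vm.max_map_count=262144

# Give the JVM a larger heap before starting the broker.
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
```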
This is a bit of a hack too, as it's RF=1 and only a single EC2 instance - a
proper 3 node cluster would in theory give >1M partitions, which is what I
really wanted to test out. I think I was also hitting this error attempting
to create a single topic with lots of partitions:
https://github.com/apache/kafka/pull/12595
The current approach is to create multiple topics with 1000 partitions each, or
a single topic and then increase the number of partitions.
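A sketch of the multiple-topics approach as a dry-run loop (the topic names,
counts, and broker address are illustrative; drop the leading echo to actually
issue the commands):

```shell
#!/bin/sh
# Print the create commands for 10 topics of 1000 partitions each (RF=1).
# Echoed rather than executed so the commands can be inspected first.
for i in $(seq 1 10); do
  echo bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
    --topic "bench-$i" --partitions 1000 --replication-factor 1
done
```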
I've also got some good numbers around the speed of metadata operations in
Zookeeper vs. KRaft mode (KRaft is lots faster - O(1) c.f. O(n) for ZK) etc.
Anyway I'm happy I've got some numbers to report for my talk now, thanks
for the info.

Regards, Paul

On Sat, 10 Sept 2022 at 02:43, Colin McCabe  wrote:

> Hi Paul,
>
> As Keith wrote, it does sound like you are hitting a separate Linux limit
> like the max mmap count.
>
> I'm curious how many partitions you can create if you change that config!
>
> best,
> Colin
>
>
> On Tue, Sep 6, 2022, at 14:02, Keith Paulson wrote:
> > I've had similar errors caused by mmap counts; try with
> > vm.max_map_count=262144


Re: Problem with Kafka KRaft 3.1.X

2022-09-11 Thread Paul Brebner
Thanks, that fix would be nice :-) Paul

On Mon, 12 Sept 2022 at 10:41, Colin McCabe  wrote:

> Thanks, Paul. I would be really curious to see the talk when you're done :)
>
> BTW, David Arthur posted a KIP recently that should avoid the upper limit
> on the number of elements in a batch for CreateTopics or CreatePartitions
> when it's done.
>
> best,
> Colin


[jira] [Created] (KAFKA-6973) setting invalid timestamp causes Kafka broker restart to fail

2018-05-30 Thread Paul Brebner (JIRA)
Paul Brebner created KAFKA-6973:
---

 Summary: setting invalid timestamp causes Kafka broker restart to 
fail
 Key: KAFKA-6973
 URL: https://issues.apache.org/jira/browse/KAFKA-6973
 Project: Kafka
  Issue Type: Bug
  Components: admin
Affects Versions: 1.1.0
Reporter: Paul Brebner


Setting the timestamp type to an invalid value causes the Kafka broker to fail
on the next restart. E.g.:

./kafka-topics.sh --create --zookeeper localhost --topic duck3 --partitions 1 
--replication-factor 1 --config message.timestamp.type=boom

 

Also note that the docs say the parameter name is log.message.timestamp.type,
but this is silently ignored.

The create command completes with no error despite the invalid timestamp
value. But the next time you restart Kafka:

 

[2018-05-29 13:09:05,806] FATAL [KafkaServer id=0] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.util.NoSuchElementException: Invalid timestamp type boom
at org.apache.kafka.common.record.TimestampType.forName(TimestampType.java:39)
at kafka.log.LogConfig.<init>(LogConfig.scala:94)
at kafka.log.LogConfig$.fromProps(LogConfig.scala:279)
at kafka.log.LogManager$$anonfun$17.apply(LogManager.scala:786)
at kafka.log.LogManager$$anonfun$17.apply(LogManager.scala:785)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.log.LogManager$.apply(LogManager.scala:785)
at kafka.server.KafkaServer.startup(KafkaServer.scala:222)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:92)
at kafka.Kafka.main(Kafka.scala)
[2018-05-29 13:09:05,811] INFO [KafkaServer id=0] shutting down



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)