Dear community,

I recently ran into an unexpected type of failure in the middle of an
incident related to the exhaustion of memory-map handles on a Kafka
broker.

The scenario is as follows: a broker, not overloaded, manages enough
index files to reach the per-process limit on the number of memory
mappings. This leads to file memory-mapping failures at broker
start-up. The incident was eventually mitigated by increasing that
limit or reducing the number of files to mmap.
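
For context, a quick way to see how close a JVM process is to that
limit would be something like the rough Java sketch below. It assumes
Linux, where the per-process cap is the vm.max_map_count sysctl and the
current mappings are listed in /proc/self/maps (read /proc/<pid>/maps
instead to inspect a running broker from outside):

    import java.io.BufferedReader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class MmapHeadroom {
        public static void main(String[] args) throws Exception {
            // Per-process limit on memory mappings (Linux sysctl vm.max_map_count).
            long limit = Long.parseLong(
                Files.readAllLines(Paths.get("/proc/sys/vm/max_map_count")).get(0).trim());

            // Each line of /proc/self/maps is one mapping held by this JVM.
            long current;
            try (BufferedReader r = Files.newBufferedReader(
                    Paths.get("/proc/self/maps"), StandardCharsets.ISO_8859_1)) {
                current = r.lines().count();
            }

            System.out.printf("mappings: %d / %d (%.1f%% of the limit)%n",
                current, limit, 100.0 * current / limit);
        }
    }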

But before I could mitigate the problem, I was trying to restart the
broker and faced the same failure every time - except once, when the
map I/O failures disappeared and, instead, every TLS connection
attempt started to fail with the following exception:

INFO Failed to create channel due to (org.apache.kafka.common.network.SslChannelBuilder)
java.lang.IllegalArgumentException: Cannot support TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384 with currently installed providers
    at sun.security.ssl.CipherSuiteList.<init>(CipherSuiteList.java:81)
    at sun.security.ssl.SSLEngineImpl.setEnabledCipherSuites(SSLEngineImpl.java:2027)
    at org.apache.kafka.common.security.ssl.SslFactory.createSslEngine(SslFactory.java:278)
    ...
    at java.lang.Thread.run(Thread.java:748)

However, there was absolutely no change to the certificates,
truststore or keystore files on the host, and neither the application
binaries nor the JRE used to run Kafka were changed. At the subsequent
restart, this particular type of failure disappeared and the map I/O
failures resumed.
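
For what it's worth, a quick way to check whether that suite is backed
by the installed providers at all would be something like the sketch
below. It uses the default SSLContext; Kafka's SslFactory builds its
own context from the configured keystore and truststore, so this is
only an approximation of what the broker does:

    import java.util.Arrays;
    import javax.net.ssl.SSLContext;
    import javax.net.ssl.SSLEngine;

    public class CipherCheck {
        public static void main(String[] args) throws Exception {
            SSLEngine engine = SSLContext.getDefault().createSSLEngine();

            String suite = "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384";
            boolean supported =
                Arrays.asList(engine.getSupportedCipherSuites()).contains(suite);
            System.out.println(suite + " supported: " + supported);

            // setEnabledCipherSuites throws the same IllegalArgumentException
            // as above when a suite is not supported by the installed providers.
            engine.setEnabledCipherSuites(new String[] { suite });
            System.out.println("suite accepted by setEnabledCipherSuites");
        }
    }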

I cannot explain the origin of these TLS failures, nor figure out
whether they could be rooted in the same (map or regular) I/O faults
as the surrounding failures.

Has anyone encountered this scenario in the past?
How strong would you estimate the correlation between the map I/O
failures and this TLS failure to be?

Many thanks,

Alexandre
