Re: KAFKA-3933: Kafka OOM During Log Recovery Due to Leaked Native Memory
Hi Ismael,

Many thanks for the quick reply. I checked the trunk tree on GitHub but did not see the merge; did I make a mistake somewhere? Sorry for that. For example, PR #1598 touches LogSegment.scala: one of its changes adds a try/catch to close the leaking resource, but I do not see that code on trunk. :-(

https://github.com/heroku/kafka/blob/trunk/core/src/main/scala/kafka/log/LogSegment.scala#L189
https://github.com/apache/kafka/pull/1598/files#r70071062

On Wed, Sep 14, 2016 at 5:06 AM, Ismael Juma <mli...@juma.me.uk> wrote:
> Hi,
>
> We did merge the PR to trunk and 0.10.0.
>
> Ismael
>
> On Wed, Sep 14, 2016 at 9:21 AM, feifei hsu <easyf...@gmail.com> wrote:
>
>> Hi Tom and Ismael,
>> I am following KAFKA-3933 (the memory leak), but I did not see that
>> PRs #1598, #1614, and #1660 were merged into trunk.
>> Do you know the current status?
>> Many thanks.
>> We are also thinking of backporting it to 0.9.0.1.
>>
>> --easy
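For anyone following the ticket, the shape of the fix is straightforward to illustrate outside the Kafka code base. Below is a minimal sketch of the pattern the PR describes, not the actual patch; the class and method names are hypothetical. The point is that a resource holding native memory, opened while recovering a log segment, must be closed in a finally block so the memory is released even when recovery throws on corrupt data.

    import java.io.Closeable

    object RecoveryCleanupSketch {

      // Stand-in for a resource that allocates native memory
      // (e.g. a decompression stream or mapped buffer) during recovery.
      final class NativeBackedResource(val name: String) extends Closeable {
        def recover(): Int = {
          // ... scan entries, possibly throwing on corrupt data ...
          0
        }
        override def close(): Unit = println(s"released native memory for $name")
      }

      // The pattern: wrap recovery in try/finally so close() always runs.
      def recoverSegment(name: String): Int = {
        val resource = new NativeBackedResource(name)
        try {
          resource.recover()
        } finally {
          resource.close() // runs on both success and failure
        }
      }

      def main(args: Array[String]): Unit = {
        val truncated = recoverSegment("00000000000000000000.log")
        println(s"truncated $truncated bytes")
      }
    }

Without the finally block (or an equivalent try/catch that closes before rethrowing), an exception thrown mid-recovery leaves that native allocation unreleased, which is the kind of leak KAFKA-3933 describes.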
KAFKA-3933: Kafka OOM During Log Recovery Due to Leaked Native Memory
Hi Tom and Ismael,

I am following KAFKA-3933 (the memory leak), but I did not see that PRs #1598, #1614, and #1660 were merged into trunk. Do you know the current status? Many thanks. We are also thinking of backporting it to 0.9.0.1.

--easy
Re: Rolling upgrade from 0.8.2.1 to 0.9.0.1 failing with ReplicaFetcherThread OOM errors
Please refer to KAFKA-3933. A workaround is -XX:MaxDirectMemorySize=1024m if your call stack shows direct-buffer issues (direct buffers are effectively off-heap memory); one way to pass the flag is sketched after the quoted message below.

On Wed, May 11, 2016 at 9:50 AM, Russ Lavoie wrote:
> Good Afternoon,
>
> I am currently trying to do a rolling upgrade from Kafka 0.8.2.1 to 0.9.0.1
> and am running into a problem when starting 0.9.0.1 with the protocol
> version 0.8.2.1 set in the server.properties.
>
> Here is my current Kafka topic setup, data retention and hardware used:
>
> 3 Zookeeper nodes
> 5 Broker nodes
> Topics have at least 2 replicas
> Topics have no more than 200 partitions
> 4,564 partitions across 61 topics
> 14 day retention
> Each Kafka node has between 2.1T - 2.9T of data
> Hardware is C4.2xlarge AWS instances
>   - 8 Core (Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz)
>   - 14G Ram
>   - 4TB EBS volume (10k IOPS [never gets maxed unless I up the num.io.threads])
>
> Here is my running broker configuration for 0.9.0.1:
>
> [2016-05-11 11:43:58,172] INFO KafkaConfig values:
>   advertised.host.name = server.domain
>   metric.reporters = []
>   quota.producer.default = 9223372036854775807
>   offsets.topic.num.partitions = 150
>   log.flush.interval.messages = 9223372036854775807
>   auto.create.topics.enable = false
>   controller.socket.timeout.ms = 3
>   log.flush.interval.ms = 1000
>   principal.builder.class = class org.apache.kafka.common.security.auth.DefaultPrincipalBuilder
>   replica.socket.receive.buffer.bytes = 65536
>   min.insync.replicas = 1
>   replica.fetch.wait.max.ms = 500
>   num.recovery.threads.per.data.dir = 1
>   ssl.keystore.type = JKS
>   default.replication.factor = 3
>   ssl.truststore.password = null
>   log.preallocate = false
>   sasl.kerberos.principal.to.local.rules = [DEFAULT]
>   fetch.purgatory.purge.interval.requests = 1000
>   ssl.endpoint.identification.algorithm = null
>   replica.socket.timeout.ms = 3
>   message.max.bytes = 10485760
>   num.io.threads = 8
>   offsets.commit.required.acks = -1
>   log.flush.offset.checkpoint.interval.ms = 6
>   delete.topic.enable = true
>   quota.window.size.seconds = 1
>   ssl.truststore.type = JKS
>   offsets.commit.timeout.ms = 5000
>   quota.window.num = 11
>   zookeeper.connect = zkserver:2181/kafka
>   authorizer.class.name =
>   num.replica.fetchers = 8
>   log.retention.ms = null
>   log.roll.jitter.hours = 0
>   log.cleaner.enable = false
>   offsets.load.buffer.size = 5242880
>   log.cleaner.delete.retention.ms = 8640
>   ssl.client.auth = none
>   controlled.shutdown.max.retries = 3
>   queued.max.requests = 500
>   offsets.topic.replication.factor = 3
>   log.cleaner.threads = 1
>   sasl.kerberos.service.name = null
>   sasl.kerberos.ticket.renew.jitter = 0.05
>   socket.request.max.bytes = 104857600
>   ssl.trustmanager.algorithm = PKIX
>   zookeeper.session.timeout.ms = 6000
>   log.retention.bytes = -1
>   sasl.kerberos.min.time.before.relogin = 6
>   zookeeper.set.acl = false
>   connections.max.idle.ms = 60
>   offsets.retention.minutes = 1440
>   replica.fetch.backoff.ms = 1000
>   inter.broker.protocol.version = 0.8.2.1
>   log.retention.hours = 168
>   num.partitions = 16
>   broker.id.generation.enable = false
>   listeners = null
>   ssl.provider = null
>   ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
>   log.roll.ms = null
>   log.flush.scheduler.interval.ms = 9223372036854775807
>   ssl.cipher.suites = null
>   log.index.size.max.bytes = 10485760
>   ssl.keymanager.algorithm = SunX509
>   security.inter.broker.protocol = PLAINTEXT
>   replica.fetch.max.bytes = 104857600
>   advertised.port = null
>   log.cleaner.dedupe.buffer.size = 134217728
>   replica.high.watermark.checkpoint.interval.ms = 5000
>   log.cleaner.io.buffer.size = 524288
>   sasl.kerberos.ticket.renew.window.factor = 0.8
>   zookeeper.connection.timeout.ms = 6000
>   controlled.shutdown.retry.backoff.ms = 5000
>   log.roll.hours = 168
>   log.cleanup.policy = delete
>   host.name =
>   log.roll.jitter.ms = null
>   max.connections.per.ip = 2147483647
>   offsets.topic.segment.bytes = 104857600
>   background.threads = 10
>   quota.consumer.default = 9223372036854775807
>   request.timeout.ms = 3
>   log.index.interval.bytes = 4096
>   log.dir = /tmp/kafka-logs
>   log.segment.bytes = 268435456
>   log.cleaner.backoff.ms = 15000
>   offset.metadata.max.bytes = 4096
>   ssl.truststore.location = null
>   group.max.session.timeout.ms = 3
>   ssl.keystore.password = null
>   zookeeper.sync.time.ms = 2000
>   port = 9092
>   log.retention.minutes = null
>   log.segment.delete.delay.ms = 6
>   log.dirs = /mnt/kafka/data
>   controlled.shutdown.enable = true
>   compression.type = producer
>   max.connections.per.ip.overrides =
>   sasl.kerberos.kinit.cmd = /usr/bin/kinit
>   log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
>   auto.leader.rebalance.enable = true
>   leader.imbalance.check.interval.seconds = 300
>   log.cleaner.min.cleanable.ratio = 0.5
>   replica.lag.time.max.ms = 1
>   num.network.threads = 8
>   ssl.key.password = null
>   reserved.broker.max.id = 1000
>   metrics.num.samples = 2
>   socket.send.buffer.bytes = 2097152
>   ssl.protocol = TLS
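A minimal sketch of one way to apply that workaround, assuming the stock startup scripts shipped with Kafka (kafka-server-start.sh delegates to kafka-run-class.sh, which appends KAFKA_OPTS to the broker's java command line). If you run the broker under a service manager, set the variable in that unit instead:

    # Cap the JVM's direct (off-heap) buffer allocations for the broker.
    # KAFKA_OPTS is picked up by kafka-run-class.sh and added to the java command line.
    export KAFKA_OPTS="-XX:MaxDirectMemorySize=1024m"
    bin/kafka-server-start.sh config/server.properties

The 1024m value is just the one suggested above; size it to whatever direct-buffer footprint your broker legitimately needs.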
Re: two questions
They also document that as of now. However, 0.9 brokers do work with 0.8.x clients. Does anyone have a large deployment in this scenario, e.g. 0.9 brokers + 0.8.x clients? How are your experience and results, especially in terms of system issues such as reliability, scalability, and performance?

We have invested heavily in the 0.8.x clients. Upgrading from 0.8.x to 0.9 brokers first, to get the expected features for a while, might be the best and most feasible choice given the 0.9 client incompatibility. We could upgrade the clients later, once we have more resources to work on our client code.

Your input will be really appreciated.

On Mon, Mar 21, 2016 at 10:42 AM, Alexis Midon <alexis.mi...@airbnb.com.invalid> wrote:
> Hi Ismael,
>
> Could you elaborate on "newer clients don't work with older brokers
> though."? Doc pointers are fine.
> I was under the impression that I could use the 0.9 clients with 0.8 brokers.
>
> thanks
>
> Alexis
>
> On Mon, Mar 21, 2016 at 2:05 AM Ismael Juma wrote:
>
> > Hi Allen,
> >
> > Answers inline.
> >
> > On Mon, Mar 21, 2016 at 5:56 AM, allen chan <allen.michael.c...@gmail.com> wrote:
> >
> > > 1) I am using the upgrade instructions to upgrade from 0.8 to 0.9. Can
> > > someone tell me if I need to continue to bump the
> > > inter.broker.protocol.version after each upgrade? Currently the broker
> > > code is 0.9.0.1 but I have the config file listing
> > > inter.broker.protocol.version=0.9.0.0
> >
> > When it comes to inter.broker.protocol.version, 0.9.0.0 and 0.9.0.1 are
> > the same, so you don't have to. Internally, they are both mapped to 0.9.0.X.
> >
> > > 2) Is it possible to use multiple variations of producers / consumers?
> > > My broker is on 0.9.0.1 and I am currently using the 0.8.x
> > > producer/consumer. I want to test the new producer first, then the new
> > > consumer. So would there be issues if the setup was:
> > > 0.9.x producer -> 0.9.x broker -> 0.8.x consumer
> >
> > Newer brokers support older clients, so this is fine. Note that newer
> > clients don't work with older brokers though.
> >
> > Ismael
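For question 2 (mixing client versions against a 0.9.0.1 broker), a minimal smoke test of the new 0.9 producer might look like the sketch below. It is written in Scala against the Java client API (org.apache.kafka.clients.producer); the broker address and topic name are placeholders. Existing 0.8.x consumers are left untouched, since the newer broker keeps supporting them.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object NewProducerSmokeTest {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092") // placeholder 0.9.0.1 broker list
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("acks", "all") // wait for all in-sync replicas before acknowledging

        val producer = new KafkaProducer[String, String](props)
        try {
          // send() is asynchronous; get() blocks until the broker acknowledges the record
          val metadata = producer.send(new ProducerRecord("test-topic", "key", "value")).get()
          println(s"wrote to ${metadata.topic}-${metadata.partition} at offset ${metadata.offset}")
        } finally {
          producer.close()
        }
      }
    }

Once the new producer is validated, the 0.8.x consumers can be migrated in a later step, matching the "brokers first, clients later" order described above.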
Kafka mirror maker issue. (data loss?)
Hi,

We are thinking of using MirrorMaker to replicate our Kafka data stream. However, I have heard that MirrorMaker may lose data, which we do not want. I am wondering if anyone has experience with MirrorMaker: how well does it work, and what are the best practices to prevent data loss when replicating data?

Thanks
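Not an authoritative answer, just a sketch of the producer settings most commonly cited for loss-averse mirroring with the new producer, passed to MirrorMaker via its --producer.config option. The file name and bootstrap address are placeholders, and target-cluster settings (replication factor, min.insync.replicas) matter just as much:

    # mm-producer.properties (hypothetical file name), used as:
    #   bin/kafka-mirror-maker.sh --consumer.config mm-consumer.properties \
    #     --producer.config mm-producer.properties --whitelist 'your-topics.*'
    # Placeholder target cluster:
    bootstrap.servers=target-broker1:9092
    # Wait for all in-sync replicas to acknowledge each mirrored record.
    acks=all
    # Retry transient send failures instead of dropping records.
    retries=2147483647
    # Avoid records being written out of order when retries kick in.
    max.in.flight.requests.per.connection=1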