Re: KAFKA-3933: Kafka OOM During Log Recovery Due to Leaked Native Memory

2016-09-14 Thread feifei hsu
Hi Ismael,
   So many thanks for the quick reply.
   I checked the trunk tree on GitHub, but I did not see the merge.
   Did I make some mistake? Sorry for that.

For example, take PR #1598 and the file LogSegment.scala: that part of the
PR adds a try/catch to close the leaking resource, but I do not see that
code on trunk. :-(

https://github.com/heroku/kafka/blob/trunk/core/src/main/scala/kafka/log/LogSegment.scala#L189
https://github.com/apache/kafka/pull/1598/files#r70071062
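
Just to make sure I am looking for the right thing on trunk: my understanding
of the fix is that the decompression stream used during recovery (which holds
native memory through its zlib Inflater) gets closed in a finally block instead
of being left for the GC. A rough sketch of that pattern in Scala, in my own
words and not the actual diff from the PR (the object and method names are
made up):

    import java.io.InputStream
    import java.util.zip.GZIPInputStream

    object RecoveryStreamSketch {
      // Run `process` over the decompressed view of `raw` and always release
      // the stream's native (off-heap) buffers, even if processing throws.
      def withDecompressed[T](raw: InputStream)(process: InputStream => T): T = {
        val decompressed = new GZIPInputStream(raw) // allocates native memory
        try {
          process(decompressed)
        } finally {
          decompressed.close() // frees native buffers promptly instead of waiting for GC
        }
      }
    }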


On Wed, Sep 14, 2016 at 5:06 AM, Ismael Juma <mli...@juma.me.uk> wrote:

> Hi,
>
> We did merge the PR to trunk and 0.10.0.
>
> Ismael
>
> On Wed, Sep 14, 2016 at 9:21 AM, feifei hsu <easyf...@gmail.com> wrote:
>
>> Hi Tom and Ismael.
>>    I am following KAFKA-3933 (the memory leak), but I did not see that
>> PRs #1598, #1614, and #1660 have been merged into trunk.
>> Do you know the current status?
>>   So many thanks.
>>    We are also thinking of backporting it to 0.9.0.1.
>>
>> --easy
>>
>
>


KAFKA-3933: Kafka OOM During Log Recovery Due to Leaked Native Memory

2016-09-14 Thread feifei hsu
Hi Tom and Ismael.
   I am following KAFKA-3933 (the memory leak), but I did not see that
PRs #1598, #1614, and #1660 have been merged into trunk.
Do you know the current status?
  So many thanks.
   We are also thinking of backporting it to 0.9.0.1.
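
If and when the fixes land on trunk, I assume the backport on our side would
be a cherry-pick onto the 0.9.0 release branch, roughly along these lines (the
branch name is an assumption and the commit sha is unknown until the PRs are
merged):

    git fetch origin
    git checkout -b kafka-3933-backport origin/0.9.0
    git cherry-pick <merge-commit-sha>
    # resolve conflicts if any, rebuild, and rerun the log recovery tests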

--easy


Re: Rolling upgrade from 0.8.2.1 to 0.9.0.1 failing with replicafetchthread OOM errors

2016-07-11 Thread feifei hsu
Please refer to KAFKA-3933.
A workaround is -XX:MaxDirectMemorySize=1024m
if your call stack shows direct buffer issues (effectively off-heap memory).
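
In case it helps, one way to pass the flag (assuming the broker is started
through the stock bin/kafka-server-start.sh wrapper, which picks up
KAFKA_OPTS) is roughly:

    # illustrative only: cap direct (off-heap) buffer allocations at 1 GiB
    export KAFKA_OPTS="-XX:MaxDirectMemorySize=1024m"
    bin/kafka-server-start.sh config/server.properties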

On Wed, May 11, 2016 at 9:50 AM, Russ Lavoie  wrote:

> Good Afternoon,
>
> I am currently trying to do a rolling upgrade from Kafka 0.8.2.1 to 0.9.0.1
> and am running into a problem when starting 0.9.0.1 with the protocol
> version 0.8.2.1 set in the server.properties.
>
> Here is my current Kafka topic setup, data retention and hardware used:
>
> 3 Zookeeper nodes
> 5 Broker nodes
> Topics have at least 2 replicas
> Topics have no more than 200 partitions
> 4,564 partitions across 61 topics
> 14 day retention
> Each Kafka node has between 2.1T - 2.9T of data
> Hardware is C4.2xlarge AWS instances
>  - 8 Core (Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz)
>  - 14G Ram
>  - 4TB EBS volume (10k IOPS [never gets maxed unless I up the
> num.io.threads])
>
> Here is my running broker configuration for 0.9.0.1:
> 
> [2016-05-11 11:43:58,172] INFO KafkaConfig values:
> advertised.host.name = server.domain
> metric.reporters = []
> quota.producer.default = 9223372036854775807
> offsets.topic.num.partitions = 150
> log.flush.interval.messages = 9223372036854775807
> auto.create.topics.enable = false
> controller.socket.timeout.ms = 3
> log.flush.interval.ms = 1000
> principal.builder.class = class
> org.apache.kafka.common.security.auth.DefaultPrincipalBuilder
> replica.socket.receive.buffer.bytes = 65536
> min.insync.replicas = 1
> replica.fetch.wait.max.ms = 500
> num.recovery.threads.per.data.dir = 1
> ssl.keystore.type = JKS
> default.replication.factor = 3
> ssl.truststore.password = null
> log.preallocate = false
> sasl.kerberos.principal.to.local.rules = [DEFAULT]
> fetch.purgatory.purge.interval.requests = 1000
> ssl.endpoint.identification.algorithm = null
> replica.socket.timeout.ms = 3
> message.max.bytes = 10485760
> num.io.threads = 8
> offsets.commit.required.acks = -1
> log.flush.offset.checkpoint.interval.ms = 6
> delete.topic.enable = true
> quota.window.size.seconds = 1
> ssl.truststore.type = JKS
> offsets.commit.timeout.ms = 5000
> quota.window.num = 11
> zookeeper.connect = zkserver:2181/kafka
> authorizer.class.name =
> num.replica.fetchers = 8
> log.retention.ms = null
> log.roll.jitter.hours = 0
> log.cleaner.enable = false
> offsets.load.buffer.size = 5242880
> log.cleaner.delete.retention.ms = 8640
> ssl.client.auth = none
> controlled.shutdown.max.retries = 3
> queued.max.requests = 500
> offsets.topic.replication.factor = 3
> log.cleaner.threads = 1
> sasl.kerberos.service.name = null
> sasl.kerberos.ticket.renew.jitter = 0.05
> socket.request.max.bytes = 104857600
> ssl.trustmanager.algorithm = PKIX
> zookeeper.session.timeout.ms = 6000
> log.retention.bytes = -1
> sasl.kerberos.min.time.before.relogin = 6
> zookeeper.set.acl = false
> connections.max.idle.ms = 60
> offsets.retention.minutes = 1440
> replica.fetch.backoff.ms = 1000
> inter.broker.protocol.version = 0.8.2.1
> log.retention.hours = 168
> num.partitions = 16
> broker.id.generation.enable = false
> listeners = null
> ssl.provider = null
> ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> log.roll.ms = null
> log.flush.scheduler.interval.ms = 9223372036854775807
> ssl.cipher.suites = null
> log.index.size.max.bytes = 10485760
> ssl.keymanager.algorithm = SunX509
> security.inter.broker.protocol = PLAINTEXT
> replica.fetch.max.bytes = 104857600
> advertised.port = null
> log.cleaner.dedupe.buffer.size = 134217728
> replica.high.watermark.checkpoint.interval.ms = 5000
> log.cleaner.io.buffer.size = 524288
> sasl.kerberos.ticket.renew.window.factor = 0.8
> zookeeper.connection.timeout.ms = 6000
> controlled.shutdown.retry.backoff.ms = 5000
> log.roll.hours = 168
> log.cleanup.policy = delete
> host.name =
> log.roll.jitter.ms = null
> max.connections.per.ip = 2147483647
> offsets.topic.segment.bytes = 104857600
> background.threads = 10
> quota.consumer.default = 9223372036854775807
> request.timeout.ms = 3
> log.index.interval.bytes = 4096
> log.dir = /tmp/kafka-logs
> log.segment.bytes = 268435456
> log.cleaner.backoff.ms = 15000
> offset.metadata.max.bytes = 4096
> ssl.truststore.location = null
> group.max.session.timeout.ms = 3
> ssl.keystore.password = null
> zookeeper.sync.time.ms = 2000
> port = 9092
> log.retention.minutes = null
> log.segment.delete.delay.ms = 6
> log.dirs = /mnt/kafka/data
> controlled.shutdown.enable = true
> compression.type = producer
> max.connections.per.ip.overrides =
> sasl.kerberos.kinit.cmd = /usr/bin/kinit
> log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
> auto.leader.rebalance.enable = true
> leader.imbalance.check.interval.seconds = 300
> log.cleaner.min.cleanable.ratio = 0.5
> replica.lag.time.max.ms = 1
> num.network.threads = 8
> ssl.key.password = null
> reserved.broker.max.id = 1000
> metrics.num.samples = 2
> socket.send.buffer.bytes = 2097152
> ssl.protocol = TLS

Re: two questions

2016-03-21 Thread feifei hsu
They also document that: as of now, 0.9 brokers work with 0.8.x clients.

However, has anyone run a large deployment in this scenario, e.g. 0.9
brokers + 0.8.x clients? How has your experience been, especially in terms
of system issues such as reliability, scalability, and performance? We have
invested heavily in the 0.8.x clients, so upgrading the brokers from 0.8.x
to 0.9 first, to take advantage of the expected features for a while, might
be the best and most feasible choice given the 0.9 client incompatibility.
We could upgrade the clients later, when we have more resources to develop
our client code.
Your input would be really appreciated.

On Mon, Mar 21, 2016 at 10:42 AM, Alexis Midon <
alexis.mi...@airbnb.com.invalid> wrote:

> Hi Ismael,
>
> could you elaborate on "newer clients don't work with older brokers
> though"? Doc pointers are fine.
> I was under the impression that I could use the 0.9 clients with 0.8
> brokers.
>
> thanks
>
> Alexis
>
> On Mon, Mar 21, 2016 at 2:05 AM Ismael Juma  wrote:
>
> > Hi Allen,
> >
> > Answers inline.
> >
> > On Mon, Mar 21, 2016 at 5:56 AM, allen chan <
> allen.michael.c...@gmail.com>
> > wrote:
> >
> > > 1) I am using the upgrade instructions to upgrade from 0.8 to 0.9. Can
> > > someone tell me if I need to continue to bump the
> > > inter.broker.protocol.version after each upgrade? Currently the broker
> > > code is 0.9.0.1 but I have the config file listing as
> > > inter.broker.protocol.version=0.9.0.0
> > >
> >
> > When it comes to inter.broker.protocol.version, 0.9.0.0 and 0.9.0.1 are
> > the same, so you don't have to. Internally, they are both mapped to
> > 0.9.0.X.
> >
> > > 2) Is it possible to use multiple variations of producers / consumers?
> > > My broker is on 0.9.0.1 and I am currently using the 0.8.x
> > > producer/consumer. I want to test the new producer first, then the new
> > > consumer. So would there be issues if the setup was:
> > > 0.9.x producer -> 0.9.x broker -> 0.8.x consumer
> > >
> >
> > Newer brokers support older clients, so this is fine. Note that newer
> > clients don't work with older brokers though.
> >
> > Ismael
> >
>
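
To make sure I am following the inter.broker.protocol.version part of the
thread above, my understanding of the rolling upgrade is a two-step bump in
server.properties (the values here are just illustrative, taken from the
versions mentioned above):

    # step 1: while upgrading broker binaries one at a time, keep the old
    # inter-broker protocol so mixed-version brokers can still talk
    inter.broker.protocol.version=0.8.2.1

    # step 2: once every broker runs the new code, raise the protocol and do
    # one more rolling restart
    inter.broker.protocol.version=0.9.0.0

Please correct me if that is off.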


Kafka mirror maker issue. (data loss?)

2016-03-14 Thread feifei hsu
Hi,
  We are thinking of using mirror maker to replicate our Kafka data stream.
However, I have heard that mirror maker may lose data, which we do not want.
I am wondering if anyone has experience with mirror maker: how well does it
work, and what are the best practices to prevent data loss when replicating?
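
For reference, the producer-side settings I have seen recommended for avoiding
loss on the target cluster are roughly the following (my own notes, with
illustrative broker addresses; please correct me if this is wrong or
incomplete):

    # producer.properties for the mirror maker's new producer (illustrative)
    bootstrap.servers=target-broker1:9092,target-broker2:9092
    # wait for all in-sync replicas to acknowledge each send
    acks=all
    # retry transient send failures instead of dropping the message
    retries=2147483647
    # keep ordering intact while retries are in flight
    max.in.flight.requests.per.connection=1

and, if I am reading the options right, running kafka-mirror-maker.sh with
--abort.on.send.failure=true so the mirror stops on a failed send instead of
silently skipping data.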

Thanks