Re: kafka user group in los angeles

2015-04-24 Thread Jon Bringhurst
Hey Alex,

It looks like this group might be appropriate to have a Kafka talk at:

http://www.meetup.com/Los-Angeles-Big-Data-Users-Group/

It might be worth showing up at one of their events and asking around.

-Jon

On Thu, Apr 23, 2015 at 11:40 AM, Alex Toth a...@purificator.net wrote:
 Hi,
 Sorry this isn't directly a Kafka question, but I was wondering if there are 
 any Kafka user groups in (or within driving range of) Los Angeles.  Looking 
 through meetup.com and the usual web search engines hasn't brought me much 
 outside of the LA Hadoop user group, and I was hoping for something more 
 specific.
 If I should have asked this somewhere else, again, sorry and let me know.


   alex


Re: Post on running Kafka at LinkedIn

2015-03-20 Thread Jon Bringhurst
Keep in mind that these brokers aren't really stressed too much at any given 
time -- we need to stay ahead of the capacity curve.

Your message throughput will really just depend on what hardware you're using. 
However, in the past, we've benchmarked at 400,000 to more than 800,000 
messages / broker / sec, depending on configuration 
(https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines).
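(As a rough sanity check against that post: its headline 2 million writes per
second was spread across three brokers, or roughly 670,000 messages / broker /
sec -- right in that range.)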

-Jon

On Mar 20, 2015, at 3:03 PM, Emmanuel ele...@msn.com wrote:

 800B messages / day = 9.26M messages / sec over 1100 brokers
 = ~8,400 messages / broker / sec
 Do I get this right?
 Trying to benchmark my own test cluster, and that's what I see with 2
 brokers... Just wondering if my numbers are good or bad...
 
 
 Subject: Re: Post on running Kafka at LinkedIn
 From: cl...@kafka.guru
 Date: Fri, 20 Mar 2015 14:27:58 -0700
 To: users@kafka.apache.org
 
 Yep! We are growing :)
 
 -Clark
 
 Sent from my iPhone
 
 On Mar 20, 2015, at 2:14 PM, James Cheng jch...@tivo.com wrote:
 
 Amazing growth numbers.
 
 At the meetup on 1/27, Clark Haskins presented their Kafka usage at the 
 time. It was:
 
 Bytes in: 120 TB/day
 Messages In: 585 billion/day
 Bytes out: 540 TB/day
 Total brokers: 704
 
 In Todd's post, the current numbers:
 
 Bytes in: 175 TB/day (45% growth)
 Messages In: 800 billion/day (36% growth)
 Bytes out: 650 TB/day (20% growth)
 Total brokers: 1100 (56% growth)
 
 That much growth in just 2 months? Wowzers.
 
 -James
 
 On Mar 20, 2015, at 11:30 AM, James Cheng jch...@tivo.com wrote:
 
 For those who missed it:
 
 The Kafka Audit tool was also presented at the 1/27 Kafka meetup:
 http://www.meetup.com/http-kafka-apache-org/events/219626780/
 
 Recorded video is here, starting around the 40 minute mark:
 http://www.ustream.tv/recorded/58109076
 
 Slides are here:
 http://www.ustream.tv/recorded/58109076
 
 -James
 
 On Mar 20, 2015, at 9:47 AM, Todd Palino tpal...@gmail.com wrote:
 
 For those who are interested in detail on how we've got Kafka set up at
 LinkedIn, I have just published a new post to our Engineering blog, titled
 "Running Kafka at Scale":
 
  https://engineering.linkedin.com/kafka/running-kafka-scale
 
 It's a general overview of our current Kafka install, tiered architecture,
 audit, and the libraries we use for producers and consumers. You'll also be
 seeing more posts from the SRE team here in the coming weeks, taking deeper
 looks into both Kafka and Samza.
 
 Additionally, I'll be giving a talk at ApacheCon next month on running
 tiered Kafka architectures. If you're in Austin for that, please come by
 and check it out.
 
 -Todd
 
 





Re: Anyone interested in speaking at Bay Area Kafka meetup @ LinkedIn on March 24?

2015-03-02 Thread Jon Bringhurst
The meetups are recorded. For example, here's a link to the January meetup:

http://www.ustream.tv/recorded/58109076

The links to the recordings are usually posted to the comments for each meetup 
on http://www.meetup.com/http-kafka-apache-org/

-Jon

On Feb 23, 2015, at 3:24 PM, Ruslan Khafizov ruslan.khafi...@gmail.com wrote:

 +1 For recording sessions.
 On 24 Feb 2015 07:22, Jiangjie Qin j...@linkedin.com.invalid wrote:
 
 +1, I'm very interested.
 
 On 2/23/15, 3:05 PM, Jay Kreps jay.kr...@gmail.com wrote:
 
 +1
 
 I think something like Kafka on AWS at Netflix would be hugely
 interesting to a lot of people.
 
 -Jay
 
 On Mon, Feb 23, 2015 at 3:02 PM, Allen Wang aw...@netflix.com.invalid
 wrote:
 
 We (Steven Wu and Allen Wang) can talk about Kafka use cases and
 operations
 in Netflix. Specifically, we can talk about how we scale and operate
 Kafka
 clusters in AWS and how we migrate our data pipeline to Kafka.
 
 Thanks,
 Allen
 
 
 On Mon, Feb 23, 2015 at 12:15 PM, Ed Yakabosky 
 eyakabo...@linkedin.com.invalid wrote:
 
 Hi Kafka Open Source -
 
 LinkedIn will host another Bay Area Kafka meetup in Mountain View on
 March
 24.  We are planning to present on Offset Management but are looking
 for
 additional speakers.  If you're interested in presenting a use case,
 operational plan, or your experience with a particular feature (REST
 interface, WebConsole), please reply-all to let us know.
 
 [BCC: Open Source lists]
 
 Thanks,
 Ed





Re: question about new consumer offset management in 0.8.2

2015-02-05 Thread Jon Bringhurst
There should probably be a wiki page started for this so we have the details in 
one place. The same question was asked on Freenode IRC a few minutes ago. :)

A summary of the migration procedure is:

1) Upgrade your brokers and set dual.commit.enabled=false and 
offsets.storage=zookeeper (commit offsets to ZooKeeper only).
2) Set dual.commit.enabled=true and offsets.storage=kafka and restart (commit 
offsets to ZooKeeper and Kafka).
3) Set dual.commit.enabled=false and offsets.storage=kafka and restart (commit 
offsets to Kafka only).
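For example, the consumer config during the intermediate stage (step 2) would 
include (a sketch -- these are the 0.8.2 consumer property names):

offsets.storage=kafka
dual.commit.enabled=true

Once things look healthy, flip dual.commit.enabled back to false to complete 
step 3.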

-Jon

On Feb 5, 2015, at 9:03 AM, Jason Rosenberg j...@squareup.com wrote:

 Hi,
 
 For 0.8.2, one of the features listed is:
  - Kafka-based offset storage.
 
 Is there documentation on this (I've heard discussion of it of course)?
 
 Also, is it something that will be used by existing consumers when they
 migrate up to 0.8.2?  What is the migration process?
 
 Thanks,
 
 Jason





Re: One or multiple instances of MM to aggregate kafka data to one hadoop

2015-01-29 Thread Jon Bringhurst
Hey Mingjie,

Here's how we have our mirror makers configured. For some context, let me try 
to describe this using the example datacenter layout as described in:

https://engineering.linkedin.com/samza/operating-apache-samza-scale

In that example, there are four data centers (A, B, C, and D). However, we only 
need Datacenters A and B to describe this.

Datacenter A mirrors data from local(A) to aggregate(A) as well as local(B) to 
aggregate(A).

Datacenter B mirrors data from local(B) to aggregate(B) as well as local(A) to 
aggregate(B).

The diagram in the article should make this easy to visualize. Note that the 
mirror makers run in the destination datacenter and pull the traffic in.

Let's say we have two physical machines in each datacenter dedicated to running 
mirror makers (call them servers 1 and 2 in datacenter A, and servers 3 and 4 
in datacenter B). This is what the layout of mirror maker processes would look 
like:

* Datacenter A MirrorMaker Cluster
  * Server 1
    * local(A) to aggregate(A) MirrorMaker Instance
    * local(B) to aggregate(A) MirrorMaker Instance
  * Server 2
    * local(A) to aggregate(A) MirrorMaker Instance
    * local(B) to aggregate(A) MirrorMaker Instance

* Datacenter B MirrorMaker Cluster
  * Server 3
    * local(B) to aggregate(B) MirrorMaker Instance
    * local(A) to aggregate(B) MirrorMaker Instance
  * Server 4
    * local(B) to aggregate(B) MirrorMaker Instance
    * local(A) to aggregate(B) MirrorMaker Instance

The benefit of this layout is that if the load becomes too high, we can add 
another server to each cluster that looks exactly like the others in the 
cluster (easy to provision). If you get really huge, you can start creating 
multiple mirror maker clusters that each handle a specific flow (but still have 
homogeneous processes within each cluster).
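Concretely, each instance above is just a MirrorMaker process started with a 
consumer config for the source cluster and a producer config for the local 
aggregate cluster. A rough sketch using the 0.8-era tool (the file names here 
are made up for illustration):

# On a server in datacenter A: pull datacenter B's local cluster into
# datacenter A's aggregate cluster.
bin/kafka-run-class.sh kafka.tools.MirrorMaker \
  --consumer.config local-B-consumer.properties \
  --producer.config aggregate-A-producer.properties \
  --whitelist '.*' \
  --num.streams 4

Here local-B-consumer.properties would point zookeeper.connect at datacenter 
B's local cluster, and aggregate-A-producer.properties would point 
metadata.broker.list at datacenter A's aggregate cluster.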

Of course, YMMV, but this is what works well for us. :)

-Jon

On Jan 28, 2015, at 3:54 PM, Daniel Compton daniel.compton.li...@gmail.com 
wrote:

 Hi Mingjie
 
 I would recommend the first option of running one mirrormaker instance
 pulling from multiple DC's.
 
 A single MM instance will be able to make more efficient use of the machine's
 resources in two ways:
 1. You will only have to run one process, which can be allocated the full
 amount of resources.
 2. Within the process, if you run enough consumer threads, I think that
 they should be able to rebalance and pick up the load if they don't have
 anything to do. I'm not 100% sure on this, but 1 still holds.
 
 A single MM instance should handle connectivity issues with one DC without
 affecting the rest of the consumer threads for other DC's.
 
 You would gain process isolation running a MM per DC, but this would raise
 the operational burden and resource requirements. I'm not sure what benefit
 you'd actually get from process isolation, so I'd recommend against it.
 However, I'd be interested to hear if others do things differently.
 
 Daniel.
 
 On Thu Jan 29 2015 at 11:14:29 AM Mingjie Lai m...@apache.org wrote:
 
 Hi.
 
 We have a pretty typical data ingestion use case: we use mirrormaker at
 one hadoop data center to mirror kafka data from multiple remote
 application data centers. I know one mirrormaker instance on one physical
 node can consume kafka data from multiple kafka sources -- we can give one
 instance of mm multiple consumer config files, so it can consume data from
 multiple places.
 
 Another option is to have multiple mirrormaker instances at one node, each
 mm instance dedicated to grabbing data from one single source data center.
 Certainly there will be multiple mm nodes to balance the load.
 
 The second option looks better since it provides some isolation between
 data centers.
 
 Any recommendation for this kind of data aggregation cases?
 
 Still new to kafka and mirrormaker. Any information is welcome.
 
 Thanks,
 Mingjie
 





LinkedIn Engineering Blog Post - Current and Future

2015-01-29 Thread Jon Bringhurst
Here's an overview of what LinkedIn plans to concentrate on in the upcoming 
year.

https://engineering.linkedin.com/kafka/kafka-linkedin-%E2%80%93-current-and-future

-Jon




Re: Production settings for JDK 7 + G1 GC

2015-01-15 Thread Jon Bringhurst
We're currently using JDK 8 update 5 with the following settings:

-server
-Xms4g
-Xmx4g
-XX:PermSize=96m
-XX:MaxPermSize=96m
-XX:+UseG1GC
-XX:MaxGCPauseMillis=20
-XX:InitiatingHeapOccupancyPercent=35
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-Xloggc:logs/gc.log
-XX:ErrorFile=logs/err.log

This works well for us, but you should customize it to your workload. :)
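(Side note: on JDK 8 the PermSize/MaxPermSize flags are ignored, since PermGen 
was removed, so they can be dropped.) If you use the stock scripts, one way to 
apply settings like these is through the environment variables that 
bin/kafka-run-class.sh reads. A sketch -- check your version's script for 
exactly which variables it honors:

export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35"
bin/kafka-server-start.sh config/server.properties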

-Jon

On Jan 14, 2015, at 5:51 PM, Albert Strasheim full...@gmail.com wrote:

 Greetings all
 
 We're expanding our Kafka cluster, and I thought this would be a good
 time to try the suggestions in
 
 http://www.slideshare.net/ToddPalino/enterprise-kafka-kafka-as-a-service
 
 slide #37 about running on JDK 7 with G1 GC.
 
 Anybody (Todd?) that could shed some light on a complete set of good
 GC flags to start with and what the best JDK version is to run with
 these days?
 
 Thanks!
 
 Regards
 
 Albert



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: Kafka 0.8.1.1 Leadership changes are happening very often

2015-01-05 Thread Jon Bringhurst
Several features in Zookeeper depend on server time. I would highly recommend 
that you properly set up ntpd (or an equivalent), then try to reproduce.
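A quick way to eyeball the drift on each host (assuming ntpd is what's 
running):

ntpq -p

The offset column shows each peer's clock offset in milliseconds; anything 
like the 30+ second gap mentioned below is far outside what you want.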

-Jon

On Jan 2, 2015, at 2:35 PM, Birla, Lokesh lokesh.bi...@verizon.com wrote:

 We don't see zookeeper expiration. However, I noticed that our servers'
 system time is NOT synced -- server1 and server2 had a 30+ second offset. Do
 you think that could cause leadership changes or any other issues?
 
 On 12/31/14, 4:03 PM, Jun Rao j...@confluent.io wrote:
 
 A typical cause of frequent leadership changes is GC-induced soft failure.
 Do you see ZK session expirations on the broker? If so, you may want to
 enable GC logging to see the GC times.
 
 Thanks,
 
 Jun
 
 On Tue, Dec 23, 2014 at 2:06 PM, Birla, Lokesh lokesh.bi...@verizon.com
 wrote:
 
 
 I was already using a 4 GB heap. I even changed to an 8 GB heap and could
 still see the leadership changing very often. In my 5 minute run, I saw the
 leaders change from 1,2,3 to 3,3,3 to 1,1,1.
 Also, my message rate is just 7k msg/sec and the total message count is only
 2,169,001.
 
 Does anyone have a clue about the leadership changes?
 
 -Lokesh
 
 
 
 From: Thunder Stumpges tstump...@ntent.com
 Date: Monday, December 22, 2014 at 6:31 PM
 To: users@kafka.apache.org
 Cc: Birla, Lokesh lokesh.bi...@one.verizon.com
 Subject: RE: Kafka 0.8.1.1 Leadership changes are happening very often
 
 Did you check the GC logs on the server? We ran into this, and the default
 setting of 1G max heap on the broker process was nowhere near enough. We
 currently have it set to 4G.
 -T
 
 -Original Message-
 From: Birla, Lokesh [lokesh.bi...@verizon.com]
 Received: Monday, 22 Dec 2014, 5:27PM
 To: users@kafka.apache.org
 CC: Birla, Lokesh [lokesh.bi...@verizon.com]
 Subject: Kafka 0.8.1.1 Leadership changes are happening very often
 
 Hello,
 
 I am running 3 brokers, one zookeeper, and a producer, all on separate
 machines. I am also sending a very low load, around 6K msg/sec. Each msg is
 only around 150 bytes.
 I ran the load for only 5 minutes, and during this time I saw the leadership
 change very often.
 
 I created 3 partitions.
 
 Here the leadership for each partition changed. All 3 brokers are running
 perfectly fine; no broker is down. Could someone let me know why the kafka
 leadership changes so often?
 
 Initially:
 
 Topic: mmetopic1  PartitionCount: 3  ReplicationFactor: 3  Configs:
 Topic: mmetopic1  Partition: 0  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1
 Topic: mmetopic1  Partition: 1  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2
 Topic: mmetopic1  Partition: 2  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
 
 
 Changed to:
 
 
 Topic: mmetopic1  PartitionCount: 3  ReplicationFactor: 3  Configs:
 Topic: mmetopic1  Partition: 0  Leader: 3  Replicas: 2,3,1  Isr: 3,1,2
 Topic: mmetopic1  Partition: 1  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2
 Topic: mmetopic1  Partition: 2  Leader: 1  Replicas: 1,2,3  Isr: 1,3,2
 
 
 Changed to:
 
 
 Topic: mmetopic1  PartitionCount: 3  ReplicationFactor: 3  Configs:
 Topic: mmetopic1  Partition: 0  Leader: 1  Replicas: 2,3,1  Isr: 1,2,3
 Topic: mmetopic1  Partition: 1  Leader: 1  Replicas: 3,1,2  Isr: 1,2,3
 Topic: mmetopic1  Partition: 2  Leader: 2  Replicas: 1,2,3  Isr: 2,1,3
 
 Changed to:
 
 
 Topic: mmetopic1  PartitionCount: 3  ReplicationFactor: 3  Configs:
 Topic: mmetopic1  Partition: 0  Leader: 3  Replicas: 2,3,1  Isr: 3,1,2
 Topic: mmetopic1  Partition: 1  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2
 Topic: mmetopic1  Partition: 2  Leader: 1  Replicas: 1,2,3  Isr: 1,3,2
 
 
 Thanks,
 Lokesh
 
 





Re: keyed-messages de-duplication

2014-05-14 Thread Jon Bringhurst
It looks like the log.cleanup.policy config option was changed from dedupe to 
compact.

https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/log/LogConfig.scala#L68
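So on 0.8.1.1, the server-side settings from your test would read (same 
settings as in your message, just with the renamed value):

log.cleaner.enable=true
log.cleanup.policy=compact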

-Jon

On May 13, 2014, at 1:08 PM, Jay Kreps jay.kr...@gmail.com wrote:

 Hi,
 
 Compaction is done to clean up space. It isn't done immediately, only
 periodically.
 
 I suspect the reason you see no compaction is that we never compact the
 active segment of the log (the most recent file), as it is still being
 written to. The compaction will not happen until a new segment file is
 rolled. If you want to see this happen, I recommend changing the file
 segment size configuration to something small (5 MB) and producing enough
 messages to roll a new segment file. You should then see logging about
 compaction in logs/log-cleaner.log.
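 As a concrete sketch of that (broker-side properties; the tiny segment size
 is just to make segments roll quickly in a test):
 
 log.cleaner.enable=true
 log.cleanup.policy=compact
 # roll a new segment every ~5 MB so older segments become eligible for
 # compaction
 log.segment.bytes=5242880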
 
 -Jay
 
 
 On Tue, May 13, 2014 at 11:52 AM, C 4.5 cfourf...@gmail.com wrote:
 
 I understand Kafka supports keyed messages (I am using 0.8.1.1) and it is
 possible to de-duplicate messages based on the message key.
 
 (The log compaction section of the on-line documentation describes how that
 works.)
 
 I am using a code example that comes with Kafka (namely
 KafkaConsumerProducerDemo) and running it in Kafka local mode. I write a
 set of messages with the same String key and then have a consumer that
 consumes the data.
 
 The consumer consumes messages *only* after the producer has produced all
 its messages.
 
 I would expect the consumer to retrieve only the latest message (as all
 messages have the same key) but it retrieves all messages the producer has
 emitted.
 
 I have also turned on these properties in the Kafka server:
 
 log.cleaner.enable=true
 log.cleanup.policy=dedupe
 
 - is de-duplication of messages guaranteed to take effect only after
 compaction?
 
 - I have tried to force compaction by setting log.cleaner.backoff.ms
 and log.cleaner.min.cleanable.ratio to very low values, but I still
 observe the same behavior.
 
 Any ideas or pointers?
 
 Thanks.
 


