Re: Kafka/Hadoop consumers and producers

2013-08-09 Thread Andrew Otto
For the last 6 months, we've been using this: https://github.com/wikimedia-incubator/kafka-hadoop-consumer In combination with this wrapper script: https://github.com/wikimedia/kraken/blob/master/bin/kafka-hadoop-consume It's not great, but it works! On Aug 9, 2013, at 2:06 PM, Felix GV

Re: Kafka 08 clients

2013-08-12 Thread Andrew Otto
This is the Kafka C client for 0.8 we are using at Wikimedia: https://github.com/edenhill/librdkafka If you're using Debian/Ubuntu, you can use the debian branch here to build a .deb: https://github.com/paravoid/librdkafka/tree/debian On Aug 12, 2013, at 12:06 AM, Jun Rao jun...@gmail.com

Re: Kafka/Hadoop consumers and producers

2013-08-12 Thread Andrew Otto
We've done a bit of work over at Wikimedia to debianize Kafka and make it behave like a regular service. https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian Most relevant, Ken, is an init script for Kafka:

Re: Kafka/Hadoop consumers and producers

2013-08-13 Thread Andrew Otto
Andrew, I'm about to dive into figuring out how to use Camus without Avro. Perhaps we should join forces? (Be warned though! My Java fu is low at the moment. :) ). -Ao On Aug 12, 2013, at 11:20 PM, Andrew Psaltis andrew.psal...@webtrends.com wrote: Kam, I am perfectly fine if you pick

Re: Kafka/Hadoop consumers and producers

2013-08-13 Thread Andrew Otto
support. From: Andrew Otto o...@wikimedia.org To: Kam Kasravi kamkasr...@yahoo.com Cc: d...@kafka.apache.org d...@kafka.apache.org; Ken Goodhope kengoodh...@gmail.com; Andrew Psaltis psaltis.and...@gmail.com; dibyendu.bhattacha...@pearson.com dibyendu.bhattacha...@pearson.com; camus_

Kafka Mirroring setup

2013-08-20 Thread Andrew Otto
Hi all! Wikimedia is investigating how best to set up Broker clusters in multiple data centers. Our main analytics Broker cluster is currently in our main datacenter. It is possible for all of the main DC's frontend producers to produce directly to our analytics cluster, but we're not sure

Re: Kafka Mirroring setup

2013-08-20 Thread Andrew Otto
On Tue, Aug 20, 2013 at 10:35 AM, Andrew Otto o...@wikimedia.org wrote: Hi all! Wikimedia is investigating how best to set up Broker clusters in multiple data centers. Our main analytics Broker cluster is currently in our main datacenter. It is possible for all of the main DC's

Re: Ganglia Metrics Reporter

2013-08-22 Thread Andrew Otto
Cool! At WMF, we use jmxtrans to do this: https://github.com/jmxtrans/jmxtrans And, if you use puppet, here's a nice little module to help generate jmxtrans json files, and an example of metrics we were sending to Ganglia from Kafka 0.7.2.

Re: Ganglia Metrics Reporter

2013-08-23 Thread Andrew Otto
Jun, Note that the puppet module README I linked to isn't a full jmxtrans example JSON query. It is a jmxtrans puppet module usage example. So, using that in puppet will generate a .json file containing the query. We'll be moving to 0.8 in the coming months, and I'll try to get some real

Re: Kafka - HDFS

2013-09-03 Thread Andrew Otto
Mark, I had the same question! Camus is super awesome, but doesn't have out of the box support for just writing Strings into HDFS. I submitted this pull request to support that: https://github.com/linkedin/camus/pull/28 You can clone this directly from the wikimedia branch of Camus:

Re: Ganglia Metrics Reporter

2013-10-29 Thread Andrew Otto
? :) -Andrew Otto (Thanks for writing this, btw!) On Aug 22, 2013, at 11:42 AM, Maxime Brugidou maxime.brugi...@gmail.com wrote: Hi all, Since I couldn't find any other way to publish kafka metrics to ganglia from kafka 0.8 (beta), I just published on github a super-simple ganglia metrics

Re: Ganglia Metrics Reporter

2013-11-01 Thread Andrew Otto
you plan to do. Cheers On Oct 29, 2013 2:00 PM, Andrew Otto o...@wikimedia.org wrote: Hi Maxime, I'm using this at the Wikimedia Foundation to send Kafka Broker metrics to Ganglia. However, we use Ganglia in multicast mode. This mostly seems to work with your code, but the ttl

Incorrect JMX MBean name on Kafka doc page

2013-11-01 Thread Andrew Otto
In http://kafka.apache.org/documentation.html#monitoring, the "ISR expansion rate" row lists kafka.server:name=ISRShrinksPerSec,type=ReplicaManager (with the note "See above"). I believe this should be kafka.server:name=IsrExpandsPerSec,type=ReplicaManager -Andrew Otto

Kafka 0.8 jmxtrans + puppet

2013-11-06 Thread Andrew Otto
. Hope this is useful to someone! -Andrew Otto

Kafka IPv6

2013-11-06 Thread Andrew Otto
Hm, Does Kafka support IPv6? I'm trying it now, but I'm just using the console-producer, which seems to not be able to read in --broker-list with IPv6 addresses. It looks like it is interpreting the colons in the address as the addy:port separator. -Andrew
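The ambiguity described in this message can be illustrated with a small Python sketch. This is not Kafka's actual parsing code; the function names are hypothetical, and the bracketed form ([address]:port) is assumed only because it is the usual convention for IPv6 literals in URLs. A parser that splits on the first colon misreads an IPv6 address, while a bracket-aware parser does not:

```python
def naive_split(broker):
    """Split on the first colon, as a parser that assumes host:port might do."""
    host, _, port = broker.partition(":")
    return host, port


def bracket_aware_split(broker):
    """Split host and port, accepting bracketed IPv6 literals such as [::1]:9092."""
    if broker.startswith("["):
        # Everything up to the closing bracket is the address.
        host, _, rest = broker[1:].partition("]")
        return host, rest.lstrip(":")
    # Hostnames and IPv4 addresses: the port follows the last colon.
    host, _, port = broker.rpartition(":")
    return host, port


if __name__ == "__main__":
    print(naive_split("2001:db8::1:9092"))
    print(bracket_aware_split("[2001:db8::1]:9092"))
    print(bracket_aware_split("kafka1.example.org:9092"))
```

Whether a given Kafka version accepts the bracketed form in --broker-list is not stated in the thread, so treat that as something to verify.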

Re: Kafka 0.8 jmxtrans + puppet

2013-11-06 Thread Andrew Otto
/ On Wed, Nov 6, 2013 at 9:29 AM, Neha Narkhede neha.narkh...@gmail.com wrote: Cool, thanks for sharing this! -Neha On Nov 6, 2013 6:16 AM, Andrew Otto o...@wikimedia.org wrote: Hi, I just got jmxtrans set up with Kafka 0.8 over at Wikimedia. We're

Re: Kafka IPv6

2013-11-06 Thread Andrew Otto
On Wed, Nov 6, 2013 at 7:43 AM, Andrew Otto o...@wikimedia.org wrote: Hm, Does Kafka support IPv6? I'm trying it now, but I'm just using the console-producer, which seems to not be able to read in --broker-list with IPv6 addresses. It looks like it is interpreting the colons in the address

Re: List of topics with JMX?

2013-11-19 Thread Andrew Otto
Would kafka.server:type=BrokerTopicMetrics,name=AllTopicsMessagesInPerSec count be easier? Also, correct me if I am wrong, but I believe that these count values are the total number of messages seen for a topic (or all topics) since the Broker was started, not the total number of messages
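If the count really is a running total since broker start, as this message suggests, a per-second rate has to be derived from two samples. A minimal Python sketch under that assumption; the function name and the reset handling are illustrative, not part of Kafka or jmxtrans:

```python
def rate_per_second(count_then, count_now, seconds_elapsed):
    """Derive a message rate from two samples of a cumulative counter."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    delta = count_now - count_then
    if delta < 0:
        # The counter went backwards, so assume the broker restarted and
        # the counter began again from zero between the two samples.
        delta = count_now
    return delta / seconds_elapsed


if __name__ == "__main__":
    # Two samples taken 60 seconds apart.
    print(rate_per_second(1000, 4000, 60))
    # A restart between samples: only the post-restart count is known.
    print(rate_per_second(5000, 600, 60))
```

After a restart the result is only a lower bound, since messages counted before the reset are not visible in the second sample.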

Re: List of topics with JMX?

2013-11-19 Thread Andrew Otto
...@gmail.com wrote: Hi, The count in the Mbean kafka.server:type=BrokerTopicMetrics,name=AllTopicsMessagesInPerSec gives the total no of messages for all the topics on the broker. On Tue, Nov 19, 2013 at 8:13 PM, Andrew Otto o...@wikimedia.org wrote: Would kafka.server:type=BrokerTopicMetrics

Re: How to get monitoring stats

2013-11-20 Thread Andrew Otto
For Kafka 0.8, you could use http://www.jmxtrans.org/ and a variation of this json file: https://github.com/wikimedia/puppet-kafka/blob/master/kafka-jmxtrans.json.md Just change the ganglia output writers to graphite ones. On Nov 19, 2013, at 8:25 PM, Benjamin Black b...@b3k.us wrote:

Re: Kafka Cluster Failover

2013-11-27 Thread Andrew Otto
would anyway need to ship your Kafka logs from the remote DC to the main DC correct? Joel On Wed, Nov 27, 2013 at 12:47:01PM -0500, Andrew Otto wrote: Wikimedia is close to using Kafka to collect webrequest access logs from multiple data centers. I know that MirrorMaker is the recommended

Re: How to design a robust producer?

2014-01-30 Thread Andrew Otto
Thibaud, I wouldn't say this is a 'robust' solution, but the Wikimedia Foundation uses a piece of software we wrote called udp2log. We are in the process of replacing it with more robust direct Kafka producers, but it has worked for us in the interim. udp2log is a C++ daemon that listens

Broker rejoin with big replica lag

2014-02-05 Thread Andrew Otto
(kafka data files and in zookeeper), but to do so properly with 0.8.0 I think I’d have to shut down the whole cluster, correct? I’d rather not do this, as another topic does have a consumer and I don’t want to lose messages for it. Thanks! -Andrew Otto

Re: Broker rejoin with big replica lag

2014-02-05 Thread Andrew Otto
- Increasing num.replica.fetchers (default is one) Awesome! I just tried this one, bumped it up to 8 (12 cores on this broker box). It is now catching up at around 17K msgs/sec, which will mean it will finish in about 4 or 5 hours. I’ll check up on it again tomorrow. That should do it,
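As a rough check on the estimate in this message: the replica lag itself is not given in the snippet, but the quoted rate and duration imply a backlog size. A small Python sketch of that arithmetic, assuming the 17K msgs/sec is the net catch-up rate (that is, ignoring new messages arriving in the meantime); the names are illustrative:

```python
CATCH_UP_RATE = 17_000  # msgs/sec, from the message above


def implied_backlog(hours, rate=CATCH_UP_RATE):
    """Messages that can be replicated in `hours` at `rate` msgs/sec."""
    return rate * hours * 3600


def hours_to_catch_up(backlog, rate=CATCH_UP_RATE):
    """Inverse of implied_backlog: hours needed to clear `backlog` messages."""
    return backlog / (rate * 3600)


if __name__ == "__main__":
    print(implied_backlog(4))  # backlog implied by a 4-hour estimate
    print(implied_backlog(5))  # backlog implied by a 5-hour estimate
```

So the estimate corresponds to a lag of roughly 245 to 306 million messages.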

Re: Description of jmx exposed metrics?

2014-02-11 Thread Andrew Otto
Here tis! https://kafka.apache.org/documentation.html#monitoring On Feb 11, 2014, at 6:50 AM, Tomas Nunez nu...@pythian.com wrote: Hi! Sorry if this question has already been answered, but I've searched the archives, the project page and the wiki unsuccessfully. I'd like to know the

Re: Description of jmx exposed metrics?

2014-02-11 Thread Andrew Otto
Although that is not all of them, just recommended ones to pay attention to :/ On Feb 11, 2014, at 6:50 AM, Tomas Nunez nu...@pythian.com wrote: Hi! Sorry if this question has already been answered, but I've searched the archives, the project page and the wiki unsuccessfully. I'd like to

Re: Unexpected broker election

2014-02-22 Thread Andrew Otto
On Fri, Feb 21, 2014 at 10:22 AM, Andrew Otto o...@wikimedia.org wrote: Hi all, This has happened a couple of times to me now in the past month, and I'm not entirely sure of the cause, although I have a suspicion. Early this morning (UTC), it looks like one of my two brokers (id 21) lost

Re: Puppet module for deploying Kafka released

2014-02-26 Thread Andrew Otto
Oh so many puppet modules! https://github.com/wikimedia/puppet-kafka This one requires a Kafka .deb built from https://github.com/wikimedia/operations-debs-kafka/tree/debian/debian, which can be found prebuilt here: http://apt.wikimedia.org/wikimedia/pool/universe/k/kafka/ :) On Feb

Re: Puppet module for deploying Kafka released

2014-02-26 Thread Andrew Otto
on RHEL). I really like that your module already supports Kafka mirroring and jmxtrans. :-) --Michael On 02/26/2014 03:41 PM, Andrew Otto wrote: Oh so many puppet modules! https://github.com/wikimedia/puppet-kafka This one requires a Kafka .deb built from https://github.com

Re: Using Kafka Metrics

2014-03-20 Thread Andrew Otto
I’m using jmxtrans to do this for Ganglia, but it should work the same for Graphite: http://www.jmxtrans.org/ Here’s an example Kafka jmxtrans json file. https://github.com/wikimedia/puppet-kafka/blob/master/kafka-jmxtrans.json.md You can change the output writers to use Graphite instead of

Zookeeper reconnect failed due to 'state changed (Expired)'

2014-03-20 Thread Andrew Otto
for all partitions. I can rebalance the leaders, but I’d prefer if this didn’t happen in the first place. 1. What does 'zookeeper state changed (Expired)’ mean? 2. Has anyone seen issues like this before? Where zookeeper connections are flaky enough to cause leader elections? Thanks! -Andrew

Re: Cluster design distribution and JBOD vs RAID

2014-04-18 Thread Andrew Otto
BOB We are using RAID10. It was a requirement from our Unix guys. The rationale for this was we didn't want to lose just a disk and to have to rebuild/re-replicate 20TB of data. We haven't experienced any drive failures that I am aware of. We have had complete server failures, but the data

Re: Cluster design distribution and JBOD vs RAID

2014-04-21 Thread Andrew Otto
Message- From: Andrew Otto [mailto:ao...@wikimedia.org] Sent: Friday, April 18, 2014 8:36 AM To: users@kafka.apache.org Subject: Re: Cluster design distribution and JBOD vs RAID BOB We are using RAID10. It was a requirement from our Unix guys. The rationale for this was we didn't want

Re: Kafka/Zookeeper co-location

2014-04-24 Thread Andrew Otto
Oo, I’m curious about this as well! Wikimedia is considering doing this if/when we install brokers in our web caching data centers. On Apr 24, 2014, at 11:49 AM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) skada...@bloomberg.net wrote: Are there any thoughts on running Zookeeper on the same

Re: Interested in contributing to Kafka?

2014-07-21 Thread Andrew Otto
Hm, curious! Would this be useful to contribute upstream? https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/bin/kafka Wikimedia uses it instead of the myriad of bin/*.sh scripts that come with Kafka. We didn’t want to build a .deb package that installed 16ish short shell

Re: Interested in contributing to Kafka?

2014-07-21 Thread Andrew Otto
Oh, BTW, I think Yelp is using this .deb packaging (and shell script) too. On Jul 21, 2014, at 10:16 AM, Andrew Otto ao...@wikimedia.org wrote: Hm, curious! Would this be useful to contribute upstream? https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/bin/kafka

Manual Leader Assignment

2014-09-03 Thread Andrew Otto
. Thanks! -Andrew Otto [1] https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-Howtousethetool?.1

Re: Manual Leader Assignment

2014-09-09 Thread Andrew Otto
, you can reduce the unavailability window by using controlled shutdown. See http://kafka.apache.org/documentation.html#basic_ops_restarting Thanks, Jun On Wed, Sep 3, 2014 at 11:48 AM, Andrew Otto ao...@wikimedia.org wrote: Hiya, During leader changes, we see short periods

Re: GenericJMX plugin file for Kafka 0.8.1

2014-09-09 Thread Andrew Otto
We use jmxtrans to pull data out of JMX. https://github.com/wikimedia/puppet-kafka/blob/master/kafka-jmxtrans.json.md On Sep 9, 2014, at 7:54 AM, Navneet Gupta (Tech - BLR) navneet.gu...@flipkart.com wrote: Hi, We plan to capture various metrics exposed in Kafka via JMX and wanted to

Re: ISR differs between Kafka Metadata and Zookeeper

2014-09-19 Thread Andrew Otto
I am seeing this behavior using librdkafka, as is another user. Listing the topic metadata with the tool provided with Kafka (kafka-topics.sh) shows all replicas in the ISR. However, using kafkacat[1] (built with librdkafka) shows that many ISRs are missing some replicas. I talked with Magnus

Re: Zookeeper reconnect failed due to 'state changed (Expired)'

2014-09-29 Thread Andrew Otto
real GCs. The fix is to tune dirty_expire_centisecs and dirty_writeback_centisecs to flush dirty pages more frequently to avoid such drafting. Thanks, Jun On Wed, Jul 2, 2014 at 1:32 PM, Andrew Otto o...@wikimedia.org wrote: Hi again! I've been having this issue consistently since I

Re: Zookeeper reconnect failed due to 'state changed (Expired)'

2014-10-01 Thread Andrew Otto
into the ISR? On Sep 30, 2014, at 7:17 PM, Jun Rao jun...@gmail.com wrote: With ack=1, acked messages could be lost when the leader fails. Thanks, Jun On Mon, Sep 29, 2014 at 8:04 AM, Andrew Otto o...@wikimedia.org wrote: This happened again to me this weekend. I've done some sleuthing

Re: Zookeeper reconnect failed due to 'state changed (Expired)'

2014-10-01 Thread Andrew Otto
messages that didn't reach the new leader are lost. If the old leader rejoins ISR, it will also truncate its log to follow the new leader's log. Thanks, Neha On Wed, Oct 1, 2014 at 5:48 AM, Andrew Otto ao...@wikimedia.org wrote: I understand that, but even if the leader quickly (within

Re: Cross-Data-Center Mirroring, and Guaranteed Minimum Time Period on Data

2014-10-16 Thread Andrew Otto
Check out Camus. It was built to do parallel loads from Kafka into time bucketed directories in HDFS. On Oct 16, 2014, at 9:32 AM, Gwen Shapira gshap...@cloudera.com wrote: I assume the messages themselves contain the timestamp? If you use Flume, you can configure a Kafka source to pull

Re: powered by kafka

2014-11-10 Thread Andrew Otto
Oo, add us too! The Wikimedia Foundation (http://wikimediafoundation.org/wiki/Our_projects) uses Kafka as a log transport for analytics data from production webservers and applications. This data is consumed into Hadoop using Camus and to other processors of analytics data. On Nov 10,

Re: How to push metrics to graphite - jmxtrans does not work

2014-12-02 Thread Andrew Otto
Maybe also set: -Dcom.sun.management.jmxremote.port= ? On Dec 2, 2014, at 02:59, David Montgomery davidmontgom...@gmail.com wrote: Hi, I am having a very difficult time trying to report kafka 8 metrics to Graphite. Nothing is listening on and and no data in graphite. If

Re: integrate Camus and Hive?

2015-03-11 Thread Andrew Otto
, Bhavesh On Wed, Mar 11, 2015 at 7:24 AM, Andrew Otto ao...@wikimedia.org wrote: Hive provides the ability to provide custom patterns for partitions. You can use this in combination with MSCK REPAIR TABLE to automatically detect and load the partitions into the metastore. I tried

Re: integrate Camus and Hive?

2015-03-11 Thread Andrew Otto
/java/com/linkedin/camus/etl/Partitioner.java and use configuration etl.partitioner.class=CLASSNAME then you can organize any way you like. I hope this helps. Thanks, Bhavesh On Wed, Mar 11, 2015 at 8:36 AM, Andrew Otto ao...@wikimedia.org wrote: e.g File produce by the camus job

Re: REST/Proxy Consumer access

2015-03-05 Thread Andrew Otto
BTW, Wikimedia uses varnishkafka to produce http requests to Kafka, and we are pretty happy with it. https://github.com/wikimedia/varnishkafka On Mar 5, 2015, at 13:09, Ewen Cheslack-Postava e...@confluent.io wrote: Yes, Confluent built a REST proxy that gives access to cluster metadata

Re: integrate Camus and Hive?

2015-03-12 Thread Andrew Otto
, int partitionId, String *encodedPartition*) { StringBuilder sb = new StringBuilder(); sb.append(Create your HDFS custom path here); return sb.toString(); } } I Thanks, Bhavesh On Wed, Mar 11, 2015 at 10:42 AM, Andrew Otto ao...@wikimedia.org wrote: Thanks

Re: Alternative to camus

2015-03-13 Thread Andrew Otto
We are currently using spark streaming 1.2.1 with kafka and write-ahead log. I will only say one thing : a nightmare. ;-) I’d be really interested in hearing about your experience here. I’m exploring streaming frameworks a bit, and Spark Streaming is just so easy to use and set up. I’d be

Re: integrate Camus and Hive?

2015-03-11 Thread Andrew Otto
Hive provides the ability to provide custom patterns for partitions. You can use this in combination with MSCK REPAIR TABLE to automatically detect and load the partitions into the metastore. I tried this yesterday, and as far as I can tell it doesn’t work with a custom partition layout. At

Re: Announcing the Confluent Platform built on Apache Kafka

2015-02-25 Thread Andrew Otto
Wow, .deb packages. I love you. On Feb 25, 2015, at 14:48, Joseph Lawson jlaw...@roomkey.com wrote: This is really awesome stuff. It's great to see y'all growing! Thank you and congratulations! From: Neha Narkhede n...@confluent.io Sent:

Kafka partitions unbalanced

2015-05-27 Thread Andrew Otto
Hi all, I’ve recently noticed that our broker log.dirs are using up different amounts of storage. We use JBOD for our brokers, with 12 log.dirs, 1 on each disk. One of our topics is larger than the others, and has 12 partitions. Replication factor is 3, and we have 4 brokers. Each broker
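The numbers in this message already suggest one source of imbalance. A small Python sketch of the arithmetic, assuming replicas of the large topic are spread evenly across brokers and that each replica lives in exactly one log.dir; the function and variable names are illustrative:

```python
def replica_layout(partitions, replication_factor, brokers, log_dirs_per_broker):
    """Return (total replicas, replicas per broker, log.dirs left without one)."""
    total_replicas = partitions * replication_factor
    per_broker = total_replicas // brokers
    # Each replica occupies one log.dir, so at most `per_broker` dirs are used.
    dirs_without_replica = max(0, log_dirs_per_broker - per_broker)
    return total_replicas, per_broker, dirs_without_replica


if __name__ == "__main__":
    # 12 partitions, replication factor 3, 4 brokers, 12 log.dirs each.
    print(replica_layout(12, 3, 4, 12))
```

With these figures each broker holds 9 replicas of the large topic across 12 disks, so at least 3 disks per broker carry none of it, and disk usage cannot be even.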

Re: [DISCUSSION] Kafka 0.8.2.2 release?

2015-08-18 Thread Andrew Otto
I agree: keep it simple :) The latest stable version of Kafka right now has a critical bug in it. Fixing that would be good enough. 0.8.2.2 should probably just be a maintenance/bugfix release. On Aug 18, 2015, at 14:29, Edward Ribeiro edward.ribe...@gmail.com wrote: I sort of follow

Decomissioning a broker

2015-07-30 Thread Andrew Otto
as I know that broker will still be registered in Zookeeper. Should I just delete the znode for that broker once it has been shut down? Thanks! -Andrew Otto

Re: Kafka Simple Consumer Replicas versus ISR

2015-08-05 Thread Andrew Otto
Hi, I’m not sure, but it is possible the discrepancy you are seeing is related to this: https://issues.apache.org/jira/browse/KAFKA-1367 I’m pretty sure the CLI talks to Zookeeper directly, whereas likely the SimpleConsumer talks to the Brokers. On Aug 5, 2015, at 13:13, d...@ariens.ca

Re: Decomissioning a broker

2015-08-04 Thread Andrew Otto
. Though, manual reassignment may be preferred in your case. Here is some extra information on controlled shutdowns: http://kafka.apache.org/documentation.html#basic_ops_restarting Thanks, Grant On Thu, Jul 30, 2015 at 4:37 PM, Andrew Otto ao...@wikimedia.org wrote: I’m sure this has been

Re: 0.8.2.1 upgrade causes much more IO

2015-08-13 Thread Andrew Otto
Bruce *From:* Andrew Otto [mailto:ao...@wikimedia.org] *Sent:* Tuesday, August 11, 2015 3:15 PM *To:* users@kafka.apache.org *Cc:* Dan Andreescu dandree...@wikimedia.org; Joseph Allemandou jalleman...@wikimedia.org *Subject:* Re: 0.8.2.1 upgrade causes much more IO Hi Todd, We

Re: Replica not available...when it is!

2015-08-15 Thread Andrew Otto
means that there is a replica that is down. If you get Leader not available that means the partition is offline. -Clark Sent from my iPhone On Aug 15, 2015, at 8:41 AM, Andrew Otto ao...@wikimedia.org wrote: Also strange: If I start this broker back up, and then issue a kafkacat

Re: Replica not available...when it is!

2015-08-15 Thread Andrew Otto
the latest version of Camus? -Clark Sent from my iPhone On Aug 15, 2015, at 10:25 AM, Andrew Otto ao...@wikimedia.org wrote: Hm, interesting. So my real issue is more with Camus than with cluster problems? It seems that Camus won’t consume if it encounters a ReplicaNotAvailableException

Replica not available...when it is!

2015-08-15 Thread Andrew Otto
I am having trouble with a single broker causing consumers to lag. As I am troubleshooting this issue, I have stopped this broker in the hopes that other replicas will take over as leader for this broker’s preferred partitions. However, when I do so, Camus reports: kafka.CamusJob: Skipping

Re: Kafka metadata

2015-08-10 Thread Andrew Otto
Note that broker metadata is not necessarily kept in sync with zookeeper on all brokers at all times: https://issues.apache.org/jira/browse/KAFKA-1367 This looks like it is fixed in the upcoming 0.8.3 On Aug 8, 2015, at 01:08, Abdoulaye Diallo abdoulaye...@gmail.com wrote: @Rahul If

Re: Controlled Shutdown Tool?

2015-07-27 Thread Andrew Otto
to broker to tell it to do the controlled shutdown. I also got confused before and had to look at the code to figure that out. I think it is better if we can add this to the document. -Binh On Mon, Jul 27, 2015 at 11:50 AM, Andrew Otto ao...@wikimedia.org wrote: Thanks! But how do I

Re: Controlled Shutdown Tool?

2015-07-27 Thread Andrew Otto
controlled.shutdown.max.retries defaults to 3 . Thanks, Harsha On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org mailto:ao...@wikimedia.org) wrote: I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed that kafka.admin.ShutdownBroker doesn’t

Controlled Shutdown Tool?

2015-07-27 Thread Andrew Otto
is supposed to set this for a running broker. Do I need a special JMX tool in order to flick this switch? I’d like to add a command to my kafka bin wrapper script so that I can easily use this when restarting brokers. What is the proper way to set controlled.shutdown.enable? Thanks! -Andrew

Replica Fetcher Reset Its Offset to beginning

2015-10-29 Thread Andrew Otto
Hi all, This morning I woke up to see a very high max replica lag on one of my brokers. I looked at logs, and it seems that one of the replica fetchers for a partition just decided that its offset was out of range, so it reset its offset to the beginning of the leader’s log and started

Re: API to query cluster metadata on-demand

2015-09-03 Thread Andrew Otto
If you don’t mind doing it with a C CLI: https://github.com/edenhill/kafkacat $ kafkacat -L -b mybroker But, uhhh, you probably want something in the Java API. :) > On Sep 3, 2015, at 13:58, Gwen Shapira wrote: > > Ah, I wish. > > We are working on it :) > > On Thu,

Re: best python library to use?

2016-01-12 Thread Andrew Otto
ail and any > > attachments are confidential and may also be privileged. If you are not > the > > intended recipient, please notify the sender immediately, and do > > not disclose > > the contents to another person, use it for any purpose, or store, or > > copy the

Re: Brokers changing mtime on data files during startup?

2016-06-07 Thread Andrew Otto
is value. > > > > > > This way you do not lose ~7 days of data and can be sure that your > disks > > > will not fill up. > > > > > > Maybe I should add a comment in > > > https://issues.apache.org/jira/browse/KAFKA-1379 > > > > > > Bye

Re: Relaying UDP packets into Kafka

2016-05-25 Thread Andrew Otto
Super old, but: https://github.com/atdt/UdpKafka On Wed, May 25, 2016 at 4:20 PM, Joe San wrote: > What about this one: https://github.com/agaoglu/udp-kafka-bridge > > On Wed, May 25, 2016 at 6:48 PM, Sunil Saggar > wrote: > > > Hi All, > > > >

Re: Changing default logger to RollingFileAppender (KAFKA-2394)

2016-06-02 Thread Andrew Otto
+1, this is what Wikimedia uses in production. On Thu, Jun 2, 2016 at 10:38 AM, Tauzell, Dave wrote: > I haven't started using this in production but this is how I will likely > setup the logging as it is easier to manage. > > -Dave > > -Original Message-

Re: best python library to use?

2016-01-11 Thread Andrew Otto
pykafka’s balanced consumer is very useful. pykafka also has Python bindings to the librdkafka C library that you can optionally enable, which might get you some speed boosts. python-kafka (oh, I just saw this 0.9x version, hm!) was better at producing than pykafka for us, so we are currently

Re: Apache Kafka Case Studies

2016-02-03 Thread Andrew Otto
Talk I gave about Kafka at the Wikimedia Foundation at Kafka NYC Meetup in 2014. https://www.hakkalabs.co/articles/apache-kafka-wikimedia On Wed, Feb 3, 2016 at 1:56 PM, Joe San wrote: > The OReilly online training seems to be interesting! Is there anything else > that

Re: MirrorMaker —new.producer

2016-01-19 Thread Andrew Otto
ass=kafka.serializer.StringEncoder. But still, should I be using —new.producer? On Tue, Jan 19, 2016 at 11:50 AM, Andrew Otto <o...@wikimedia.org> wrote: > Hi all, > > I finally have a need to understand MirrorMaker well. I’m running Kafka > 0.8.2.2. I see that my versi

MirrorMaker —new.producer

2016-01-19 Thread Andrew Otto
Hi all, I finally have a need to understand MirrorMaker well. I’m running Kafka 0.8.2.2. I see that my version of MirrorMaker has a —new.producer option, which uses NewShinyProducer instead of OldProducer. Without —new.producer, kafka-console-producer.sh seems to produce byte messages that

Re: Filter plugins in Kafka

2016-05-02 Thread Andrew Otto
If you want something really simple and hacky, you could use kafkatee[1] and kafkacat[2] together: kafkatee.conf: input [encoding=string] pipe tail -f a.log output pipe 1 grep -v 'not this' | kafkacat -P -b b1:9092 -t mytopic [1] https://github.com/wikimedia/analytics-kafkatee [2]

Re: Brokers changing mtime on data files during startup?

2016-05-25 Thread Andrew Otto
“We use the default log retention of 7 *days*" :)* On Wed, May 25, 2016 at 12:34 PM, Andrew Otto <o...@wikimedia.org> wrote: > Hiya, > > We’ve recently upgraded to 0.9. In 0.8, when we restarted a broker, data > log file mtimes were not changed. In 0.9, any data log

Brokers changing mtime on data files during startup?

2016-05-25 Thread Andrew Otto
Hiya, We’ve recently upgraded to 0.9. In 0.8, when we restarted a broker, data log file mtimes were not changed. In 0.9, any data log file that was on disk before the broker restarted has its mtime modified to the time of the broker restart. This causes problems with log retention, as all the files

Re: Is there any command can be used to get broker.id on one of broker with command?

2017-02-16 Thread Andrew Otto
Best I got for ya, using https://github.com/r4um/jmx-dump $ broker_hostname=localhost $ jmx_port= $ java -jar jmx-dump-0.4.2-standalone.jar -h $broker_hostname -p $jmx_port -m | grep 'kafka.server:id=.*type=app-info' kafka.server:id=12,type=app-info This will only work if a JMX port is

Re: [VOTE] Add REST Server to Apache Kafka

2016-10-26 Thread Andrew Otto
-1 for http kafka client in core Although a read only management interface, perhaps via http, sounds kinda useful for things like health checks as mentioned. On Wed, Oct 26, 2016 at 2:00 PM, Zakee wrote: > -1 > > Thanks. > > On Oct 25, 2016, at 2:16 PM, Harsha

Upgrade by replacing brokers?

2017-07-26 Thread Andrew Otto
it to? Is there anything I’m missing? Are there gotchas related with on disk log file formats that might cause some issues? Thanks! - Andrew Otto Systems Engineer, Wikimedia Foundation

super.users or kafka-acls —cluster for broker ACLs?

2017-12-04 Thread Andrew Otto
Hi all, Is there any reason not to list broker principals as super.users? I know there is the —cluster shortcut for adding broker ACL permissions via the kafka-acls CLI. Adding them to super.users would be simpler, as it can more easily be done via configuration management. Thanks! - Andrew

Re: Kafka Monitoring..

2017-11-09 Thread Andrew Otto
We’ve recently started using Prometheus, and use Prometheus JMX Exporter to get Kafka metrics into prometheus. Here’s our JMX Exporter config:

Re: Kafka mirror maker help

2018-04-27 Thread Andrew Otto
Hiya, Saravanan, I saw you emailed my colleague Alex about WMF’s old debian packaging. I’ll reply here. We now use Confluent’s Kafka debian packaging which does not (or did not?) ship with init scripts. We don’t use Sys V init.d scripts anymore either, but use systemd instead. Our systemd

Re: Cross-cluster mirror making

2018-02-12 Thread Andrew Otto
by the Kafka cluster. If the buffer gets too full, it will have to start dropping messages. - Andrew Otto On Thu, Feb 8, 2018 at 5:21 PM, Husna Hadi <hh...@adobe.com.invalid> wrote: > Hi, I read on The Definitive Guide to Kafka that when using cross-cluster > kafka mirroring, w

Re: Kafka Connect REST connector with additional logging to kafka

2018-08-23 Thread Andrew Otto
Hiya, this doesn’t help answer your question, but as an FYI, Wikimedia has implemented https://github.com/wikimedia/change-propagation to do what you are trying to do: issue HTTP requests (and other things) triggered by incoming messages in Kafka. On Thu, Aug 23, 2018 at 9:45 AM Andrea Spina

Re: offsetsForTimes API performance

2018-01-22 Thread Andrew Otto
Speaking of, has there been any talk of combining those two requests into a single API call? I’d assume that offsetsForTimes + consumer seek is probably the most common use case of offsetsForTimes. Maybe a round trip could be avoided if the broker could just auto-assign the consumer to the offset

JSONSchema Kafka Connect Converter

2018-01-23 Thread Andrew Otto
erted to a ConnectRecord, the messages could be used with any Connector out there, right? I might have space in the next year to work on something like this, but I thought I’d ask here first to see what others thought. Would this be useful? If so, is this something that might be upstreamed into Apache Kafka?

Kafka 0.9 MirrorMaker failing with Batch Expired when producing to Kafka 1.0 cluster

2018-03-12 Thread Andrew Otto
://gist.github.com/ottomata/5324fc3becdd20e9a678d5d37c2db872 Any help is appreciated, thanks! -Andrew Otto Senior Systems Engineer Wikimedia Foundation

Re: Kafka 0.9 MirrorMaker failing with Batch Expired when producing to Kafka 1.0 cluster

2018-03-13 Thread Andrew Otto
ng the configurations. > For the hanging MirrorMaker instances, I think looking at stack dumps would > help you get closer to the root cause. > > Best regards, > Andras > > On Mon, Mar 12, 2018 at 7:56 PM, Andrew Otto <o...@wikimedia.org> wrote: > > > Hi all, > >

Re: Kafka Mirrormaker issue

2018-04-09 Thread Andrew Otto
ith the new consumer, on > MirrorMaker 0.9 reading from a 0.9 Kafka cluster and producing to a 0.11 > Kafka cluster. > > On 3/30/18, 3:56 PM, "Andrew Otto" <o...@wikimedia.org> wrote: > > I’m currently stuck on MirrorMaker version 0.9, and I’m not sure

Re: Using Kafka CLI without specifying the URLs every single time?

2018-04-23 Thread Andrew Otto
Us too: https://github.com/wikimedia/puppet/blob/production/modules/confluent/files/kafka/kafka.sh This requires that the various kafka-* scripts are in your PATH. And then this gets rendered into /etc/profile.d to set env variables.

Re: Kafka Mirrormaker issue

2018-03-30 Thread Andrew Otto
. Not sure what the default in 0.10 is. On Fri, Mar 30, 2018 at 11:40 AM, Siva A <siva9940261...@gmail.com> wrote: > Any other update on this? > > On Mon, Mar 26, 2018, 7:42 PM Andrew Otto <o...@wikimedia.org> wrote: > > > I’ve had similar problems, but I do

Re: Kafka Mirrormaker issue

2018-03-26 Thread Andrew Otto
I’ve had similar problems, but I don’t have an explanation for ya :/ On Sun, Mar 25, 2018 at 12:19 PM, Siva A wrote: > Hi, > > We have 3 nodes Kafka cluster(0.10.0.1) and its mirroring the data from > another 3 node cluster of same Kafka version. > Both the clusters

Re: Exposing Kafka on WAN

2018-08-30 Thread Andrew Otto
The trouble is that the producer and consumer clients need to discover the broker hostnames and address the individual brokers directly. There is an advertised.listeners setting that will allow you to tell clients to connect to external proxy hostnames instead of your internal ones, but those

Re: Where to run MM2? Source or destination DC/region?

2020-01-09 Thread Andrew Otto
ource this event is happing. > > -----Original Message- > From: Andrew Otto > Sent: Thursday, January 9, 2020 8:32 AM > To: users@kafka.apache.org > Subject: Re: Where to run MM2? Source or destination DC/region? > > ---External Email--- > > Hi Peter, > > My unde

Re: Where to run MM2? Source or destination DC/region?

2020-01-09 Thread Andrew Otto
Hi Peter, My understanding here comes from MirrorMaker 1, but I believe it holds for MM2 (someone correct me if I am wrong!) For the most part, if you have no latency or connectivity issues, running MM at the source will be fine. However, the failure scenario is different if something goes

log.message.timestamp.difference.max.ms and future timestamps?

2020-04-16 Thread Andrew Otto
this case. log.message.timestamp.difference.max.ms - futureTimestamp == -bigNumber Will the message be rejected or accepted in this case? Thanks! -Andrew Otto @Wikimedia Foundation

Re: ACLs - How To Allow Anyone To Access of A Topic

2020-05-18 Thread Andrew Otto
If I understand correctly, if your client authenticates, there must be an ACL for that principal, otherwise it will fail authorization. If you are going to allow everything anyway, perhaps you don't need to authenticate?
