Re: AWS EC2 deployment best practices

2014-09-30 Thread James Cheng
I'm also interested in hearing more about deploying Kafka in AWS. I was also considering options like your 1a and 2. I ran some calculations and one interesting thing I ran across was bandwidth costs between AZs. In 1a, if you can have your producers and consumers in the same AZ as the "master"

If you run Kafka in AWS or Docker, how do you persist data?

2015-02-26 Thread James Cheng
Hi, I know that Netflix might be talking about "Kafka on AWS" at the March meetup, but I wanted to bring up the topic anyway. I'm sure that some people are running Kafka in AWS. Is anyone running Kafka within docker in production? How does that work? For both of these, how do you persist data?

Re: Kafka 0.8.2 log cleaner

2015-03-02 Thread James Cheng
Ivan, I think log.cleaner.delete.retention.ms does just that? "The amount of time to retain delete tombstone markers for log compacted topics. This setting also gives a bound on the time in which a consumer must complete a read if they begin from offset 0 to ensure that they get a valid snapsh

Re: Database Replication Question

2015-03-04 Thread James Cheng
Another thing to think about is delivery guarantees. Exactly once, at least once, etc. If you have a publisher that consumes from the database log and pushes out to Kafka, and then the publisher crashes, what happens when it starts back up? Depending on how you keep track of the database's tran

Re: Database Replication Question

2015-03-04 Thread James Cheng
> On Mar 3, 2015, at 4:18 PM, Guozhang Wang wrote: > > Additionally to Jay's recommendation, you also need to have some special > cares in error handling of the producer in order to preserve ordering since > producer uses batching and async sending. That is, if you already sent > messages 1,2,3,

Re: Database Replication Question

2015-03-05 Thread James Cheng
es at the same time. This > could improve your throughput and your consumers can easily identify if any > message is lost due to any reason. > > Best wishes, > > Xiao Li > > > On Mar 4, 2015, at 4:59 PM, James Cheng wrote: > >> Another thing to think about

Re: createMessageStreams vs createMessageStreamsByFilter

2015-03-10 Thread James Cheng
Hi, Sorry to bring up this old thread, but my question is about this exact thing: Guozhang, you said: > A more concrete example: say you have topic AC: 3 partitions, topic BC: 6 > partitions. > > With createMessageStreams("AC" => 3, "BC" => 2) a total of 5 threads will > be created, and consumin
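The thread count in the quoted example is the sum of the per-topic stream counts passed in the map. A minimal sketch of that arithmetic, assuming nothing about the consumer beyond the map quoted above (the class and method names are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StreamCountSketch {
    // Sums the requested stream counts across all topics in the map.
    static int totalStreams(Map<String, Integer> topicCountMap) {
        int total = 0;
        for (int count : topicCountMap.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // Same shape as the quoted example: "AC" => 3, "BC" => 2.
        Map<String, Integer> topicCountMap = new LinkedHashMap<>();
        topicCountMap.put("AC", 3);
        topicCountMap.put("BC", 2);
        System.out.println(totalStreams(topicCountMap)); // prints 5
    }
}
```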

Re: createMessageStreams vs createMessageStreamsByFilter

2015-03-11 Thread James Cheng
B and C do, how will the fetcher behave? Will the processing thread receive messages from partitions B and C? Thanks, -James > Guozhang > > On Tue, Mar 10, 2015 at 5:15 PM, James Cheng wrote: > >> Hi, >> >> Sorry to bring up this old thread, but my question is abou

Re: createMessageStreams vs createMessageStreamsByFilter

2015-03-12 Thread James Cheng
sent to broker with a fetch request to > ask for all partitions. So if A, B, C are in the same broker fetcher thread > is still able to fetch data from A, B, C even though A returns no data. > same logic is applied to different broker. > > On Thu, Mar 12, 2015 at 6:25 AM, James Che

Re: [ANN] sqlstream: Simple MySQL binlog to Kafka stream

2015-03-16 Thread James Cheng
Super cool, and super simple. I like how it is pretty much a pure translation of the binlog into Kafka, with no interpretation of the events. That means people can layer whatever they want on top of it. They would have to understand what the mysql binary events mean, but they would just have to

Re: [ANN] sqlstream: Simple MySQL binlog to Kafka stream

2015-03-17 Thread James Cheng
This is a great set of projects! We should put this list of projects on a site somewhere so people can more easily see and refer to it. These aren't Kafka-specific, but most seem to be "MySQL CDC." Does anyone have a place where they can host a page? Preferably a wiki, so we can keep it up to d

Re: Kafka 0.9 consumer API

2015-03-19 Thread James Cheng
Those are pretty much the best javadocs I've ever seen. :) Nice job, Kafka team. -James > On Mar 19, 2015, at 9:40 PM, Jay Kreps wrote: > > Err, here: > http://kafka.apache.org/083/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html > > -Jay > > On Thu, Mar 19, 2015 at 9:

Re: Post on running Kafka at LinkedIn

2015-03-20 Thread James Cheng
For those who missed it: The Kafka Audit tool was also presented at the 1/27 Kafka meetup: http://www.meetup.com/http-kafka-apache-org/events/219626780/ Recorded video is here, starting around the 40 minute mark: http://www.ustream.tv/recorded/58109076 Slides are here: http://www.ustream.tv/reco

Re: Post on running Kafka at LinkedIn

2015-03-20 Thread James Cheng
>> >> Thank you for sharing it! >> >> The links of videos and slides are the same. Could you check the link of >> slides? >> >> Xiao Li >> >> On Mar 20, 2015, at 11:30 AM, James Cheng wrote: >> >>> For those who missed it: >>

Re: Post on running Kafka at LinkedIn

2015-03-20 Thread James Cheng
rowth) Bytes out: 650 TB (20% growth) Total brokers: 1100 (56% growth) That much growth in just 2 months? Wowzers. -James > On Mar 20, 2015, at 11:30 AM, James Cheng wrote: > > For those who missed it: > > The Kafka Audit tool was also presented at the 1/27 Kafka meetup: > htt

Re: [ANN] sqlstream: Simple MySQL binlog to Kafka stream

2015-03-23 Thread James Cheng
o" wrote: >>> >>>> Hi, all, >>>> >>>> Do you know how Linkedin team publishes changed rows in Oracle to Kafka? I >>>> believe they already knew the whole problem very well. >>>> >>>> Using triggers? or

How to consume from a specific topic, as well as a wildcard of topics?

2015-04-03 Thread James Cheng
Hi, I want to consume from both a specific topic "a_topic" as well as all topics that match a certain prefix "prefix.*". When I do that using a single instance of a ConsumerConnector, I get a hang when creating the 2nd set of message streams. Code: ConsumerConnector consumer = Consume

Re: serveral questions about auto.offset.reset

2015-04-14 Thread James Cheng
"What to do when there is no initial offset in ZooKeeper or if an offset is out of range" I personally find the name "auto.offset.reset" to be somewhat confusing. That's mostly because I only knew of it as the "no initial offset" setting. An alternate name could be "auto.offset.initial", to ha

New and old producers partition messages differently

2015-04-24 Thread James Cheng
Hi, I was playing with the new producer in 0.8.2.1 using partition keys ("semantic partitioning" I believe is the phrase?). I noticed that the default partitioner in 0.8.2.1 does not partition items the same way as the old 0.8.1.1 default partitioner was doing. For a test item, the old producer
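To illustrate why two default partitioners can disagree, here is a rough sketch that maps the same key to a partition in two ways: by hashing the key object, and by hashing the serialized key bytes. The class and method names are hypothetical, and CRC32 is only a stand-in byte hash chosen because it ships with the JDK; it is not the hash either producer uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class PartitionerMismatchSketch {
    // Object-hash style: hash the key object, clear the sign bit, take it modulo the partition count.
    static int byObjectHash(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Byte-hash style: hash the serialized key bytes instead of the object.
    // CRC32 is an illustrative stand-in, not Kafka's actual hash function.
    static int byByteHash(String key, int numPartitions) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            System.out.println(key
                    + " -> object-hash partition " + byObjectHash(key, numPartitions)
                    + ", byte-hash partition " + byByteHash(key, numPartitions));
        }
    }
}
```

Both functions are deterministic, so each producer is self-consistent. The trouble only appears when records for one key are written by producers that use different schemes.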

Re: New and old producers partition messages differently

2015-04-27 Thread James Cheng
hash code >> so I think this is not quite as bad as it sounds. Though it would be good >> to standardize things. >> >> I think the most obvious thing we could do here would be to do a much >> better job of advertising this in the docs, though, so people don't get

Is there a way to know when I've reached the end of a partition (consumed all messages) when using the high-level consumer?

2015-05-08 Thread James Cheng
Hi, I want to use the high level consumer to read all partitions for a topic, and know when I have reached "the end". I know "the end" might be a little vague, since items keep showing up, but I'm trying to get as close as possible. I know that more messages might show up later, but I want to k

Re: Is there a way to know when I've reached the end of a partition (consumed all messages) when using the high-level consumer?

2015-05-11 Thread James Cheng
tition > attached to it so with that data. I suppose that information plus info from > a fetch response you could determine this with in an application. > https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-FetchResponse > > Does

Re: Log end offset

2015-05-11 Thread James Cheng
Vamsi, There is another thread going on right now about this exact topic: "Is there a way to know when I've reached the end of a partition (consumed all messages) when using the high-level consumer?" http://search-hadoop.com/m/uyzND1Eb3e42NMCWl -James On May 10, 2015, at 11:48 PM, Achanta Vams

Re: Is fetching from in-sync replicas possible?

2015-05-27 Thread James Cheng
On May 26, 2015, at 1:44 PM, Joel Koshy wrote: >> Apologies if this question has been asked before. If I understand things >> correctly a client can only fetch from the leader of a partition, not from >> an (in-sync) replica. I have a use case where it would be very beneficial >> if it were poss

Re: Is fetching from in-sync replicas possible?

2015-05-27 Thread James Cheng
nt >> the topic ideally needs more partitions. >> >> Aditya >> >> >> From: James Cheng [jch...@tivo.com] >> Sent: Wednesday, May 27, 2015 10:50 AM >> To: users@kafka.apache.org >> Subject: Re: Is fetching from in-sync rep

How to manage the consumer group id?

2015-06-10 Thread James Cheng
Hi, How are people specifying/persisting/resetting the consumer group identifier ("group.id") when using the high-level consumer? I understand how it works. I specify some string and all consumers that use that same string will help consume a topic. The partitions will be distributed amongst t

Re: Changing replication factor for an existing topic

2015-06-10 Thread James Cheng
AirBNB's kafkat tool has a "set-replication-factor" option. I've never tried it myself. https://github.com/airbnb/kafkat -James > On Jun 10, 2015, at 4:20 PM, Aditya Auradkar > wrote: > > The replica list that you specify can be used to increment/decrement the > replication factor. > http:/

Re: How to manage the consumer group id?

2015-06-10 Thread James Cheng
the consumer, and monitoring offsets for lag. I don't hear much about people deleting or changing/setting offsets by other means. How is it usually done? Are there tools to change the offsets, or do people go into zookeeper to change them directly? Or, for broker-stored offsets, use the

Re: Using Kafka as a persistent store

2015-07-13 Thread James Cheng
For what it's worth, I did something similar to Rad's suggestion of "cold-storage" to add long-term archiving when using Amazon Kinesis. Kinesis is also a message bus, but only has a 24 hour retention window. I wrote a Kinesis consumer that would take all messages from Kinesis and save them int

New producer and ordering of Callbacks when sending to multiple partitions

2015-07-13 Thread James Cheng
Hi, I'm trying to understand the new producer, and the order in which the Callbacks will be called. From my understanding, records are batched up per partition. So all records destined for a specific partition will be sent in order, and that means that their callbacks will be called in order.

Re: New producer in production

2015-07-17 Thread James Cheng
http://kafka.apache.org/documentation.html, Section 3.4. > 3.4 New Producer Configs > > We are working on a replacement for our existing producer. The code is > available in trunk now and can be considered beta quality. Below is the > configuration for the new producer. Sivananda might have se

Re: New producer in production

2015-07-17 Thread James Cheng
found it here: > http://kafka.apache.org/documentation.html#newproducerconfigs, the same > link is reported by James. > > @Joel: Thanks a lot for the info, I will use new producer > > Regards, > Siva. > > On Fri, Jul 17, 2015 at 12:02 PM, James Cheng wrote: > >>

Consuming from Kafka but don't need to save offsets

2015-07-20 Thread James Cheng
Hi, I have a web service that serves up some data that it obtains from a kafka topic. When the process starts up, it wants to load the entire kafka topic into memory, and serve the data up from an in-memory hashtable. The data in the topic has primary keys and is log compacted, and so the total

Re: New consumer - poll/seek javadoc confusing, need clarification

2015-07-21 Thread James Cheng
> On Jul 21, 2015, at 9:15 AM, Ewen Cheslack-Postava wrote: > > On Tue, Jul 21, 2015 at 2:38 AM, Stevo Slavić wrote: > >> Hello Apache Kafka community, >> >> I find new consumer poll/seek javadoc a bit confusing. Just by reading docs >> I'm not sure what the outcome will be, what is expected

Checkpointing with custom metadata

2015-08-03 Thread James Cheng
According to https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest, we can store custom metadata with our checkpoints. It looks like the high level consumer does not support committing offsets with metadata, and that in orde

Re: Checkpointing with custom metadata

2015-08-03 Thread James Cheng
of A to sorted-topic, that I would store "finished doing initial copy of A" into my checkpoint, and that upon restart, I would check that and know to start doing the merge sort of A B C. I have a couple other designs that seem cleaner, tho, so I might not actually need it. -James

Documentation typo for offsets.topic.replication.factor ?

2015-08-05 Thread James Cheng
Hi, My kafka cluster has a __consumer_offsets topic with 50 partitions (the default for offsets.topic.num.partitions) but with a replication factor of just 1 (the default for offsets.topic.replication.factor should be 3). From the docs http://kafka.apache.org/documentation.html: offsets.topic.

Re: Log Cleaner Thread Stops

2015-09-23 Thread James Cheng
On Sep 18, 2015, at 10:25 AM, Todd Palino wrote: > I think the last major issue with log compaction (that it couldn't handle > compressed messages) was committed as part of > https://issues.apache.org/jira/browse/KAFKA-1374 in August, but I'm not > certain what version this will end up in. It ma

Re: Log Cleaner Thread Stops

2015-09-24 Thread James Cheng
doesn’t it? I guess that means Burrow is only being used to monitor your mirror makers and auditor application, then? -James > -Todd > > > On Wed, Sep 23, 2015 at 3:21 PM, James Cheng wrote: > >> >> On Sep 18, 2015, at 10:25 AM, Todd Palino wrote: >> >>> I

Re: custom message handlers?

2015-09-28 Thread James Cheng
> On Sep 28, 2015, at 12:47 PM, Doug Tomm wrote: > > hello, > > i've noticed the addition of the custom message handler feature in the latest > code; a very useful feature. in what release will it be available, and when > might that be? at present i am building kafka from source to get this

Re: Dealing with large messages

2015-10-05 Thread James Cheng
Here’s an article that Gwen wrote earlier this year on handling large messages in Kafka. http://ingest.tips/2015/01/21/handling-large-messages-kafka/ -James > On Oct 5, 2015, at 11:20 AM, Pradeep Gollakota wrote: > > Fellow Kafkaers, > > We have a pretty heavyweight legacy event logging system

Re: New consumer client compatible with old broker

2015-10-15 Thread James Cheng
> On Oct 15, 2015, at 11:29 AM, tao xiao wrote: > > Hi team, > > Does new consumer client (the one in trunk) work with 0.8.2.x broker? I am > planning to use the new consumer in our development but don't want to > upgrade the broker to the latest. is it possible to do that? Tao, I recently trie

Where is replication factor stored?

2015-10-16 Thread James Cheng
Hi, Where is the replication factor for a topic stored? It isn't listed at https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper. But the kafka-topics --describe command returns something. Where is it finding that? Thanks, -James

Re: Where is replication factor stored?

2015-10-16 Thread James Cheng
mtime = Wed Aug 05 22:48:12 UTC 2015 pZxid = 0xc017a cversion = 0 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 79 numChildren = 0 I tried that for a number of different topics, and none of them have it. -James > Guozhang > > On Fri, Oct 16, 2015 at 12:33 PM, James Ch

0.8.2 high level consumer with one-time-use group.id's?

2015-12-15 Thread James Cheng
When using the 0.8.2 high level consumer, what is the impact of creating many one-time use groupIds and checkpointing offsets using those? I have a use case where upon every boot, I want to consume an entire topic from the very beginning, all partitions. We are using the high level consumer for

Re: kafka-connect-jdbc: ids, timestamps, and transactions

2015-12-18 Thread James Cheng
Mark, what database are you using? If you are using MySQL... There is a not-yet-finished Kafka MySQL Connector at https://github.com/wushujames/kafka-mysql-connector. It tails the MySQL binlog, and so will handle the situation you describe. But, as I mentioned, I haven't finished it yet. If

Re: how to reset kafka offset in zookeeper

2015-12-19 Thread James Cheng
This page describes what Kafka stores in Zookeeper: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper It looks like the info for a particular consumer groupId is stored at: /consumers// According to https://community.cloudera.com/t5/Cloudera-Labs/Kafka-Parcels

Re: Kafka + ZooKeeper on the same hardware?

2016-01-18 Thread James Cheng
> On Jan 18, 2016, at 12:21 PM, Dick Davies wrote: > > Started an Ansible playbook using the Confluent platform RPM distro, > and it seems that co-locates zookeepers > on the brokers. > > So I'm assuming it's fine (at least on 0.9.x for the reasons Todd mentioned). > > Does anyone know if the Con

Re: Offset storage issue with kafka(0.8.2.1)

2016-01-27 Thread James Cheng
> On Jan 27, 2016, at 8:25 PM, Sivananda Reddys Thummala Abbigari > wrote: > > Hi, > > # *Kafka Version*: 0.8.2.1 > > # *My consumer.propeties have the following properties*: >exclude.internal.topics=false >offsets.storage=kafka >dual.commit.enabled=false > > # With the above configu

Re: Accumulating data in Kafka Connect source tasks

2016-01-28 Thread James Cheng
> On Jan 28, 2016, at 5:06 PM, Ewen Cheslack-Postava wrote: > > Randall, > > Great question. Ideally you wouldn't need this type of state since it > should really be available in the source system. In your case, it might > actually make sense to be able to grab that information from the DB itself

Re: Accumulating data in Kafka Connect source tasks

2016-01-29 Thread James Cheng
> On Jan 29, 2016, at 7:06 AM, Randall Hauch wrote: > > On January 28, 2016 at 7:07:02 PM, Ewen Cheslack-Postava (e...@confluent.io) > wrote: > Randall, > > Great question. Ideally you wouldn't need this type of state since it > should really be available in the source system. In your case, it m

Re: MongoDB Kafka Connect driver

2016-01-29 Thread James Cheng
Not sure if this will help anything, but just throwing it out there. The Maxwell and mypipe projects both do CDC from MySQL and support bootstrapping. The way they do it is kind of "eventually consistent". 1) At time T1, record coordinates of the end of the binlog as of T1. 2) At time T2, do a f

Re: at-least-once delivery

2016-01-30 Thread James Cheng
> On Jan 30, 2016, at 4:21 AM, Franco Giacosa wrote: > > Sorry, this solved my questions: "Setting a value greater than zero will > cause the client to resend any record whose send fails with a potentially > transient error. Note that this retry is no different than if the client > resent the rec

Re: Detecting broker version programmatically

2016-02-04 Thread James Cheng
> On Feb 4, 2016, at 8:28 PM, Manikumar Reddy wrote: > > Currently it is available through JMX Mbean. It is not available on wire > protocol/requests. > The name of the JMX Mbean is kafka.server:type=app-info,id=4 Not sure what the id=4 means. -James > Pending JIRAs related to this: > https:/

Re: Discrepancy between JMX OfflinePartitionCount and kafka-topics.sh?

2016-02-09 Thread James Cheng
I ran into kind of a similar discrepancy, but about UnderReplicatedPartitions. kafka-topics.sh and zookeeper were saying that we had underreplicated partitions. But JMX said that there were none. I took one of the partitions that ZK was saying was under-replicated and I ran DumpLogSegments on t

Questions about unclean leader election and "Halting because log truncation is not allowed"

2016-02-25 Thread James Cheng
Hi, I ran into a scenario where one of my brokers would continually shutdown, with the error message: [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because log truncation is not allowed for topic test, Current leader 1's latest offset 0 is less than replica 2's latest offs

Re: Kafka Rest Proxy

2016-03-01 Thread James Cheng
Jan, I don't use the rest proxy, but Confluent has a mailing list where you can probably get more info: Here's the direct link: https://groups.google.com/forum/#!forum/confluent-platform And it is linked off of here: http://www.confluent.io/developer#documentation -James > On Mar 1, 2016, at

Unavailable partitions (Leader: -1 and ISR is empty) and we can't figure out how to get them back online

2016-03-01 Thread James Cheng
Hi, We have 44 partitions in our cluster that are unavailable. kafka-topics.sh is reporting them with Leader: -1, and with no brokers in the ISR. Zookeeper says that broker 5 should be the partition leader for this topic partition. These are topics with replication-factor 1. Most of the topics

Re: About the number of partitions

2016-03-02 Thread James Cheng
Kim, Here's a good blog post from Confluent with advice on how to choose the number of partitions. http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ -James > On Mar 1, 2016, at 4:11 PM, BYEONG-GI KIM wrote: > > Hello. > > I have questions about how

Re: Writing a Producer from Scratch

2016-03-03 Thread James Cheng
Stephen, There is a mailing list for kafka client developers that you may find useful: https://groups.google.com/forum/#!forum/kafka-clients The d...@kafka.apache.org mailing list might also be a good resource: http://kafka.apache.org/contact.html Lastly, do you h

Re: Uneven GC behavior between nodes

2016-03-05 Thread James Cheng
Your partitions are balanced, but is your data being evenly written across all the partitions? How are you producing data? Are you producing them with keys? Is it possible that the majority of the messages being written to just a few partitions, and so the brokers for those partitions are seeing

Re: Questions about unclean leader election and "Halting because log truncation is not allowed"

2016-03-15 Thread James Cheng
: > https://issues.apache.org/jira/browse/KAFKA-2143 > > Thank you, > > Tony > > On Thu, Feb 25, 2016 at 3:46 PM, James Cheng wrote: > >> Hi, >> >> I ran into a scenario where one of my brokers would continually shutdown, >> with the error messa

What happens if controlled shutdown can't complete within controlled.shutdown.max.retries attempts?

2016-03-20 Thread James Cheng
The broker has the following parameters related to controlled shutdown: controlled.shutdown.enable Enable controlled shutdown of the server boolean true medium controlled.shutdown.max.retries Controlled shutdown can fail for multiple reasons. This determines the number of

Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread James Cheng
Congrats Damian! -James > On Jun 9, 2017, at 1:34 PM, Guozhang Wang wrote: > > Hello all, > > > The PMC of Apache Kafka is pleased to announce that we have invited Damian > Guy as a committer to the project. > > Damian has made tremendous contributions to Kafka. He has not only > contributed

Re: Slow Consumer Group Startup

2017-06-13 Thread James Cheng
Bryan, This sounds related to https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance and https://issues.apache.org/jira/browse/KAFKA-4925. -James > On Jun 13, 2017, at 7:02 AM, Bryan Baugher wrote: > > The topics already exist prior to starting an

Re: mirroring Kafka while preserving the order

2017-06-29 Thread James Cheng
MirrorMaker acts as a consumer+producer. So it will consume from the source topic and produce to the destination topic. That means that the destination partition is chosen using the same technique as the normal producer: * if the source record has a key, the key will be hashed and the hash will
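The keyed case can be sketched as a toy model: records are consumed in order and each one is appended to the destination partition chosen by hashing its key, so records sharing a key stay in their original relative order. This assumes a deterministic key hash (String.hashCode stands in for the real one) and ignores retries and batching; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MirrorOrderingSketch {
    // Stand-in for the producer's key hash: any deterministic hash gives the same property.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Appends each {key, value} record, in consumed order, to the partition chosen by its key.
    static List<List<String>> mirror(List<String[]> records, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String[] record : records) {
            partitions.get(partitionFor(record[0], numPartitions)).add(record[0] + ":" + record[1]);
        }
        return partitions;
    }

    // A small source stream in which key "a" appears twice, with another key in between.
    static List<String[]> sample() {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"a", "1"});
        records.add(new String[] {"b", "1"});
        records.add(new String[] {"a", "2"});
        return records;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = mirror(sample(), 4);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

The destination partition number depends on the destination topic's partition count, so it need not match the source partition even though per-key order is kept.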

Re: Mirroring multiple clusters into one

2017-07-06 Thread James Cheng
I'm not sure what the "official" recommendation is. At TiVo, we *do* run all our mirrormakers near the target cluster. It works fine for us, but we're still fairly inexperienced, so I'm not sure how strong of a data point we should be. I think the thought process is, if you are mirroring from a

Re: Mirroring multiple clusters into one

2017-07-06 Thread James Cheng
art with "mirror." This prevents us from creating mirroring loops. > Thanks. > --Vahid > > > > From: James Cheng > To: users@kafka.apache.org > Cc: dev > Date: 07/06/2017 12:37 PM > Subject:Re: Mirroring multiple clusters into one > >

Re: Consumer offsets partitions size much bigger than others

2017-07-18 Thread James Cheng
It's possible that the log-cleaning thread has crashed. That is the thread that implements log compaction. Look in the log-cleaner.log file in your kafka debuglog directory to see if there is any indication that it has crashed (error messages, stack traces, etc). What version of kafka are you u

Re: Tuning up mirror maker for high thruput

2017-07-22 Thread James Cheng
Becket Qin from LinkedIn spoke at a meetup about how to tune the Kafka producer. One scenario that he described was tuning for situations where you had high network latency. See slides at https://www.slideshare.net/mobile/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600 and vid

Re: Tuning up mirror maker for high thruput

2017-07-24 Thread James Cheng
t’s called receive.buffer.bytes. Again, you can set this to -1 to use > the OS configuration. Make sure to restart the applications after making > all these changes, of course. > > -Todd > > > On Sat, Jul 22, 2017 at 1:27 AM, James Cheng wrote: > >> Becket Qin from Linke

Re: Consumer group metadata retention

2017-07-26 Thread James Cheng
The offsets.retention.minutes value (1440 = 24 hours = 1 day) is a broker level configuration, and can't be changed dynamically during runtime. You would have to modify the broker configurations, and restart the brokers. -James > On Jul 25, 2017, at 9:43 PM, Raghu Angadi wrote: > > I am writi
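As a worked version of the arithmetic, the sketch below converts the 1440-minute value to hours and applies a simple "older than the window" check to a commit timestamp. It is a toy model only: the class and method names are hypothetical, and the broker's real expiration logic (including when the clock starts) may differ by version.

```java
import java.time.Duration;
import java.time.Instant;

public class OffsetRetentionSketch {
    // The value discussed above: 1440 minutes.
    static final Duration RETENTION = Duration.ofMinutes(1440);

    // True if an offset last committed at lastCommit is older than the retention window at time now.
    static boolean isExpired(Instant lastCommit, Instant now) {
        return Duration.between(lastCommit, now).compareTo(RETENTION) > 0;
    }

    public static void main(String[] args) {
        System.out.println(RETENTION.toHours()); // prints 24
        Instant committed = Instant.parse("2017-07-25T00:00:00Z");
        System.out.println(isExpired(committed, committed.plus(Duration.ofHours(23)))); // prints false
        System.out.println(isExpired(committed, committed.plus(Duration.ofHours(25)))); // prints true
    }
}
```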

Re: Improving Kafka State Store performance

2017-09-16 Thread James Cheng
In addition to the measurements that you are doing yourself, Kafka Streams also has its own metrics. They are exposed via JMX, if you have that enabled: http://kafka.apache.org/documentation/#monitoring If you set metrics.recording.level="debu

Re: Kafka Internals Video/Blog

2017-09-20 Thread James Cheng
This recent meetup had a presentation of the internals of the Kafka Controller. https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/242656767/ The video is not yet available, but hopefully will be soon. -J

Re: Metrics: committed offset, client version

2017-09-20 Thread James Cheng
KIP-188 is expected to be in the upcoming 1.0.0 release. It will add client-side JMX metrics that show the client version number. https://cwiki.apache.org/confluence/display/KAFKA/KIP-188+-+Add+new+metrics+to+support+health+checks

Re: In which scenarios would "INVALID_REQUEST" be returned for "Offset Request"

2017-09-24 Thread James Cheng
Your client library might be sending a message that is too old or too new for your broker to understand. What version is your Kafka client library, and what version is your broker? -James Sent from my iPhone > On Sep 22, 2017, at 4:09 PM, Vignesh wrote: > > Hi, > > In which scenarios would

How do I instantiate a metrics reporter in Kafka Streams, with custom config?

2017-11-01 Thread James Cheng
Hi, we have a KafkaStreams app. We specify a custom metric reporter by doing: Properties config = new Properties(); config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092"); config.put(StreamsConfig.METRIC_REPORTER_CLASSES_CONFIG, "com.mycompany.MetricRepo

Re: [ANNOUNCE] Apache Kafka 1.0.0 Released

2017-11-01 Thread James Cheng
dy Chambers, > Apurva Mehta, Armin Braun, Attila Kreiner, Balint Molnar, Bart De Vylder, > Ben Stopford, Bharat Viswanadham, Bill Bejeck, Boyang Chen, Bryan Baugher, > Colin P. Mccabe, Koen De Groote, Dale Peakall, Damian Guy, Dana Powers, > Dejan Stojadinović, Derrick Or, Dong Lin, Zhen

Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread James Cheng
Congrats Onur! Well deserved! -James > On Nov 6, 2017, at 9:24 AM, Jun Rao wrote: > > Hi, everyone, > > The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur > Karaman. > > Onur's most significant work is the improvement of Kafka controller, which > is the brain of a Kafka

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-11 Thread James Cheng
We saw this as well, when updating from 0.10.1.1 to 0.11.0.1. Have you restarted your brokers since then? Did it take 8h to start up again, or did it take its normal 45 minutes? I don't think it's related to the crash/recovery. Rather, I think it's due to the upgrade from 0.10.1.1 to 0.11.0.1

Re: [ANNOUNCE] New committer: Matthias J. Sax

2018-01-12 Thread James Cheng
Congrats, Matthias!! Well deserved! -James > On Jan 12, 2018, at 2:59 PM, Guozhang Wang wrote: > > Hello everyone, > > The PMC of Apache Kafka is pleased to announce Matthias J. Sax as our > newest Kafka committer. > > Matthias has made tremendous contributions to Kafka Streams API since earl

Re: [VOTE] KIP-247: Add public test utils for Kafka Streams

2018-01-18 Thread James Cheng
+1 (non-binding) -James Sent from my iPhone > On Jan 17, 2018, at 6:09 PM, Matthias J. Sax wrote: > > Hi, > > I would like to start the vote for KIP-247: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams > > > -Matthias >

Re: [ANNOUNCE] New Kafka PMC Member: Rajini Sivaram

2018-01-18 Thread James Cheng
Congrats Rajini! -James Sent from my iPhone > On Jan 17, 2018, at 10:48 AM, Gwen Shapira wrote: > > Dear Kafka Developers, Users and Fans, > > Rajini Sivaram became a committer in April 2017. Since then, she remained > active in the community and contributed major patches, reviews and KIP >

Re: [ANNOUNCE] Apache Kafka 1.0.1 Released

2018-03-06 Thread James Cheng
Congrats, everyone! Thanks for driving the release, Ewen! -James > On Mar 6, 2018, at 1:22 PM, Guozhang Wang wrote: > > Ewen, thanks for driving the release!! > > > Guozhang > > On Tue, Mar 6, 2018 at 1:14 PM, Ewen Cheslack-Postava wrote: > >> The Apache Kafka community is pleased to annou

Re: [ANNOUNCE] New Committer: Dong Lin

2018-03-28 Thread James Cheng
Congrats, Dong! -James > On Mar 28, 2018, at 10:58 AM, Becket Qin wrote: > > Hello everyone, > > The PMC of Apache Kafka is pleased to announce that Dong Lin has accepted > our invitation to be a new Kafka committer. > > Dong started working on Kafka about four years ago, since which he has >

Re: [ANNOUNCE] Apache Kafka 1.1.0 Released

2018-03-29 Thread James Cheng
Thanks Damian and Rajini for running the release! Congrats and good job everyone! -James Sent from my iPhone > On Mar 29, 2018, at 2:27 AM, Rajini Sivaram wrote: > > The Apache Kafka community is pleased to announce the release for > > Apache Kafka 1.1.0. > > > Kafka 1.1.0 includes a numbe

Re: Multi-threaded consumer?

2016-03-22 Thread James Cheng
Here's a good introductory blog post on the 0.9.0 consumer: http://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client It shows the basics of using the consumer, as well as a section where they launch 3 threads, each with one consumer, to consume a single

Re: kafka 0.9.0.1: FATAL exception on startup

2016-03-22 Thread James Cheng
Hi, we ran into this problem too. The only way we were able to bypass this was by stopping Kafka and deleting the log directory of the affected partition. Which means, we lost data for that partition on this broker. -James > On Mar 8, 2016, at 1:07 AM, Anatoly Deyneka wrote: > > Hi, > > I need

Re: Reg. Partition Rebalancing

2016-03-29 Thread James Cheng
> On Mar 29, 2016, at 10:33 AM, Todd Palino wrote: > > There’s two things that people usually mean when they talk about > rebalancing. > > One is leader reelection, or preferred replica election, which is sometimes > confusingly referred to as “leader rebalance”. This is when we ask the > control

Re: Consumers disappearing form __consumer_offsets

2016-04-11 Thread James Cheng
This may be related to offsets.retention.minutes. offsets.retention.minutes: "Log retention window in minutes for offsets topic". It defaults to 1440 minutes = 24 hours. -James > On Apr 11, 2016, at 1:36 PM, Morellato, Wanny > wrote: > > Hi, > > I am trying to figure out why some of my consumers
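To make the arithmetic in the reply above concrete, here is a minimal sketch of how a retention window of 1440 minutes translates into an expiry check. The function name `offset_expired` and the idea of comparing against a single commit timestamp are assumptions for illustration only; the broker's real expiry logic is more involved and is not reproduced here.

```python
HOUR_MS = 60 * 60 * 1000


def offset_expired(commit_ts_ms, now_ms, retention_minutes=1440):
    """Toy check (hypothetical helper, not broker code): has a committed
    offset outlived the offsets.retention.minutes window?"""
    retention_ms = retention_minutes * 60 * 1000  # 1440 min = 24 h
    return now_ms - commit_ts_ms > retention_ms


if __name__ == "__main__":
    # A consumer group that committed 23 hours ago is still inside the window;
    # one that has been idle for 25 hours is not.
    print(offset_expired(0, 23 * HOUR_MS))
    print(offset_expired(0, 25 * HOUR_MS))
```

Under this simplified model, a consumer that stays idle for longer than the window would find its committed offsets gone, which matches the symptom described in the thread subject.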

Re: unknown (kafka) offsets after restart

2016-05-06 Thread James Cheng
Is the log compaction thread correctly working? The offsets are stored in a log compacted topic, and we have seen issues where the log cleaner thread dies and therefore the offsets topic just grows forever, which means it will take a long time to read in the topic. You can look in the log-clean

Do consumer offsets stored in zookeeper ever get cleaned up?

2016-05-19 Thread James Cheng
I know that when offsets get stored in Kafka, they get cleaned up based on the offsets.retention.minutes config setting. This happens when using the new consumer, or when using the old consumer but offsets.storage=kafka. If using the old consumer where offsets are stored in Zookeeper, do old off

Will segments on no-traffic topics get deleted/compacted?

2016-05-19 Thread James Cheng
Time-based log retention only happens on old log segments. And log compaction only happens on old segments as well. Currently, I believe segments only roll whenever a new record is written to the log. That is, during the write of the new record is when the current segment is evaluated to see if
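The behaviour described above can be illustrated with a toy model. This is only a sketch of the message's stated belief (the roll check happens during a write, and retention applies only to closed segments); the class `ToyLog`, its fields, and its method names are hypothetical and do not correspond to Kafka's actual code.

```python
class ToyLog:
    """Toy log (hypothetical, not Kafka code): the active segment is only
    evaluated for rolling inside append(), and only closed segments are
    eligible for time-based deletion."""

    def __init__(self, roll_ms, retention_ms):
        self.roll_ms = roll_ms
        self.retention_ms = retention_ms
        self.segments = []  # the last element is the active segment

    def append(self, record, now_ms):
        # The roll check happens here, and only here.
        if not self.segments or now_ms - self.segments[-1]["created_ms"] >= self.roll_ms:
            self.segments.append({"created_ms": now_ms, "last_ms": now_ms, "records": []})
        active = self.segments[-1]
        active["records"].append(record)
        active["last_ms"] = now_ms

    def delete_expired(self, now_ms):
        """Delete closed segments older than retention_ms; return how many were deleted."""
        if not self.segments:
            return 0
        closed, active = self.segments[:-1], self.segments[-1]
        kept = [s for s in closed if now_ms - s["last_ms"] < self.retention_ms]
        self.segments = kept + [active]
        return len(closed) - len(kept)


if __name__ == "__main__":
    log = ToyLog(roll_ms=1000, retention_ms=5000)
    log.append("a", now_ms=0)
    # No traffic: the only segment is still active, so nothing is deleted.
    print(log.delete_expired(now_ms=1_000_000))
    # One new write rolls the segment, and the old one becomes eligible.
    log.append("b", now_ms=1_000_000)
    print(log.delete_expired(now_ms=1_000_000))
```

In this model, record "a" stays on disk indefinitely until some later write triggers the roll, which is the concern the question raises for no-traffic topics.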

Re: Will segments on no-traffic topics get deleted/compacted?

2016-05-24 Thread James Cheng
> Tom Crayford > Heroku Kafka > > On Fri, May 20, 2016 at 12:49 AM, James Cheng wrote: > >> Time-based log retention only happens on old log segments. And log >> compaction only happens on old segments as well. >> >> Currently, I believe segments only roll whene

Re: 10MB message

2016-06-15 Thread James Cheng
Igor, This article talks about what to think about if putting large messages into Kafka: http://ingest.tips/2015/01/21/handling-large-messages-kafka/ The summary is that Kafka is not optimized for handling large messages, but if you really want to, it's possible to do it. That website is havin

Re: Halting because log truncation is not allowed for topic __consumer_offsets

2016-06-26 Thread James Cheng
Peter, can you add some of your observations to those JIRAs? You seem to have a good understanding of the problem. Maybe there is something that can be improved in the codebase to prevent this from happening, or reduce the impact of it. Wanny, you might want to add a "me too" to the JIRAs as we

Re: kafka + autoscaling groups fuckery

2016-07-03 Thread James Cheng
Charity, I'm not sure about the specific problem you are having, but about Kafka on AWS, Netflix did a talk at a meetup about their Kafka installation on AWS. There might be some useful information in there. There is a video stream as well as slides, and maybe you can get in touch with the spea

Re: Read all record from a Topic.

2016-07-13 Thread James Cheng
Jean-Baptiste, I wrote a blog post recently on this exact subject. https://logallthethings.com/2016/06/28/how-to-read-to-the-end-of-a-kafka-topic/ Let me know if you find it useful. -James Sent from my iPhone > On Jul 13, 2016, at 7:16 AM, g...@netcourrier.com wrote: > > Hi, > > > I'm usin

Re: [ANNOUNCE] New committer: Jiangjie (Becket) Qin

2016-10-31 Thread James Cheng
Congrats, Becket! -James > On Oct 31, 2016, at 10:35 AM, Joel Koshy wrote: > > The PMC for Apache Kafka has invited Jiangjie (Becket) Qin to join as a > committer and we are pleased to announce that he has accepted! > > Becket has made significant contributions to Kafka over the last two years

Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread James Cheng
Congrats and great job, everyone! Thanks Rajini for driving the release! -James Sent from my iPhone > On Jul 30, 2018, at 3:25 AM, Rajini Sivaram wrote: > > The Apache Kafka community is pleased to announce the release for > > Apache Kafka 2.0.0. > > > > > > This is a major release and i
