[jira] [Created] (KAFKA-13965) Document broker-side socket-server-metrics

2022-06-07 Thread James Cheng (Jira)
James Cheng created KAFKA-13965:
---

 Summary: Document broker-side socket-server-metrics
 Key: KAFKA-13965
 URL: https://issues.apache.org/jira/browse/KAFKA-13965
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.2.0
Reporter: James Cheng


There are a bunch of broker JMX metrics in the "socket-server-metrics" space 
that are not documented on kafka.apache.org/documentation

 
 * MBean: kafka.server:type=socket-server-metrics,listener=,networkProcessor=
 ** From KIP-188: https://cwiki.apache.org/confluence/display/KAFKA/KIP-188+-+Add+new+metrics+to+support+health+checks
 * kafka.server:type=socket-server-metrics,name=connection-accept-rate,listener={listenerName}
 ** From KIP-612: https://cwiki.apache.org/confluence/display/KAFKA/KIP-612%3A+Ability+to+Limit+Connection+Creation+Rate+on+Brokers

It would be helpful to get all the socket-server-metrics documented
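
In the meantime, a small sketch (not part of this ticket) of how one might enumerate 
these MBeans over JMX to see what needs documenting. The JMX URL and port are 
assumptions and should be adjusted to however the broker's remote JMX is configured:

import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListSocketServerMetrics {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with remote JMX enabled on port 9999.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Wildcard query over everything in the socket-server-metrics space.
            for (ObjectName name : conn.queryNames(
                    new ObjectName("kafka.server:type=socket-server-metrics,*"), null)) {
                System.out.println(name);
                for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
                    System.out.println("    " + attr.getName());
                }
            }
        }
    }
}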

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: Are timestamps available for records stored in Kafka Streams state stores?

2022-05-19 Thread James Cheng
Thanks Guozhang! Based on your comment, I searched through the repo and found 
the associated pull requests and JIRAs.

It looks like most of the support was added in 
https://issues.apache.org/jira/browse/KAFKA-6521

Can you add that to the KIP page for KIP-258? It would make it easier for other 
people to find when/where the timestamp support was added.
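
For anyone who lands on this thread later, here is a minimal sketch (assuming Kafka 
Streams 2.5 or newer; the store name "my-store" and the String types are placeholders 
for whatever the application actually materializes) of reading the stored timestamp 
through interactive queries:

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
import org.apache.kafka.streams.state.ValueAndTimestamp;

public class TimestampedStoreLookup {
    // Prints the value and the record timestamp stored alongside it for the given key.
    static void printValueAndTimestamp(KafkaStreams streams, String key) {
        ReadOnlyKeyValueStore<String, ValueAndTimestamp<String>> store =
                streams.store(StoreQueryParameters.fromNameAndType(
                        "my-store", QueryableStoreTypes.<String, String>timestampedKeyValueStore()));
        ValueAndTimestamp<String> vt = store.get(key);
        if (vt != null) {
            System.out.println(key + " -> " + vt.value() + " @ " + vt.timestamp());
        }
    }
}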

Thanks!
-James

> On May 19, 2022, at 1:24 PM, Guozhang Wang  wrote:
> 
> Hi James,
> 
> For kv / time-window stores, they have been geared with timestamps and you
> can access via the TimestampedKeyValueStore/TimestampedWindowStore.
> 
> What's not implemented yet are timestamped session stores.
> 
> Guozhang
> 
> On Thu, May 19, 2022 at 12:49 PM James Cheng  wrote:
> 
>> Hi,
>> 
>> I'm trying to see if timestamps are available for records that are stored
>> in Kafka Streams state stores.
>> 
>> I saw "KIP-258: Allow to Store Record Timestamps in RocksDB"
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-258%3A+Allow+to+Store+Record+Timestamps+in+RocksDB
>> 
>> But I am not sure if it has been fully implemented. The vote thread went
>> through, but it looks like the implementation is still in progress.
>> https://issues.apache.org/jira/browse/KAFKA-8382
>> 
>> The KIP page says "2.3.0 (partially implemented, inactive)"
>> 
>> Can someone share what the current thoughts are, around this KIP?
>> 
>> Thanks!
>> 
>> -James
> 
> 
> 
> -- 
> -- Guozhang



Are timestamps available for records stored in Kafka Streams state stores?

2022-05-19 Thread James Cheng
Hi,

I'm trying to see if timestamps are available for records that are stored in 
Kafka Streams state stores. 

I saw "KIP-258: Allow to Store Record Timestamps in RocksDB"
https://cwiki.apache.org/confluence/display/KAFKA/KIP-258%3A+Allow+to+Store+Record+Timestamps+in+RocksDB
 


But I am not sure if it has been fully implemented. The vote thread went 
through, but it looks like the implementation is still in progress.
https://issues.apache.org/jira/browse/KAFKA-8382 


The KIP page says "2.3.0 (partially implemented, inactive)"

Can someone share what the current thoughts are, around this KIP?

Thanks!

-James

Re: [ANNOUNCE] Apache Kafka 3.2.0

2022-05-19 Thread James Cheng
Bruno,

Congrats on the release!

There is a small typo on the page.
> KIP-791 
> 
>  adds method recordMetada() to the StateStoreContext,

Should be
> KIP-791 
> 
>  adds method recordMetadata() to the StateStoreContext,

I know that the page has already been published, but should we fix that typo?
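
(Side note for anyone searching for what that method is: a rough sketch of using it 
from a custom store, assuming the KIP-791 signature as I remember it, i.e. 
Optional<RecordMetadata> recordMetadata() on StateStoreContext.)

import org.apache.kafka.streams.processor.StateStoreContext;
import org.apache.kafka.streams.processor.api.RecordMetadata;

public class RecordMetadataExample {
    // Logs the topic/partition/offset of the record currently being processed,
    // if a record context is available at that point.
    static void logCurrentRecord(StateStoreContext context) {
        context.recordMetadata().ifPresent((RecordMetadata meta) ->
                System.out.println("write originated from " + meta.topic()
                        + "-" + meta.partition() + " @ offset " + meta.offset()));
    }
}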

Thanks!
-James


> On May 17, 2022, at 9:01 AM, Bruno Cadonna  wrote:
> 
> The Apache Kafka community is pleased to announce the release for Apache 
> Kafka 3.2.0
> 
> * log4j 1.x is replaced with reload4j (KAFKA-9366)
> * StandardAuthorizer for KRaft (KIP-801)
> * Send a hint to the partition leader to recover the partition (KIP-704)
> * Top-level error code field in DescribeLogDirsResponse (KIP-784)
> * kafka-console-producer writes headers and null values (KIP-798 and KIP-810)
> * JoinGroupRequest and LeaveGroupRequest have a reason attached (KIP-800)
> * Static membership protocol lets the leader skip assignment (KIP-814)
> * Rack-aware standby task assignment in Kafka Streams (KIP-708)
> * Interactive Query v2 (KIP-796, KIP-805, and KIP-806)
> * Connect APIs list all connector plugins and retrieve their configuration 
> (KIP-769)
> * TimestampConverter SMT supports different unix time precisions (KIP-808)
> * Connect source tasks handle producer exceptions (KIP-779)
> 
> All of the changes in this release can be found in the release notes:
> https://www.apache.org/dist/kafka/3.2.0/RELEASE_NOTES.html
> 
> 
> You can download the source and binary release (Scala 2.12 and 2.13) from:
> https://kafka.apache.org/downloads#3.2.0
> 
> ---
> 
> 
> Apache Kafka is a distributed streaming platform with four core APIs:
> 
> 
> ** The Producer API allows an application to publish a stream of records to
> one or more Kafka topics.
> 
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
> 
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming the
> input streams to output streams.
> 
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might
> capture every change to a table.
> 
> 
> With these APIs, Kafka can be used for two broad classes of application:
> 
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
> 
> ** Building real-time streaming applications that transform or react
> to the streams of data.
> 
> 
> Apache Kafka is in use at large and small companies worldwide, including
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
> 
> A big thank you for the following 113 contributors to this release!
> 
> A. Sophie Blee-Goldman, Adam Kotwasinski, Aleksandr Sorokoumov, Alexandre 
> Garnier, Alok Nikhil, aSemy, Bounkong Khamphousone, bozhao12, Bruno Cadonna, 
> Chang, Chia-Ping Tsai, Chris Egerton, Colin P. Mccabe, Colin Patrick McCabe, 
> Cong Ding, David Arthur, David Jacot, David Mao, defhacks, dengziming, Ed B, 
> Edwin, florin-akermann, GauthamM-official, GuoPhilipse, Guozhang Wang, Hao 
> Li, Haoze Wu, Idan Kamara, Ismael Juma, Jason Gustafson, Jason Koch, Jeff 
> Kim, jiangyuan, Joel Hamill, John Roesler, Jonathan Albrecht, Jorge Esteban 
> Quilcate Otoya, Josep Prat, Joseph (Ting-Chou) Lin, José Armando García 
> Sancio, Jules Ivanic, Julien Chanaud, Justin Lee, Justine Olshan, Kamal 
> Chandraprakash, Kate Stanley, keashem, Kirk True, Knowles Atchison, Jr, 
> Konstantine Karantasis, Kowshik Prakasam, kurtostfeld, Kvicii, Lee Dongjin, 
> Levani Kokhreidze, lhunyady, Liam Clarke-Hutchinson, liym, loboya~, Lucas 
> Bradstreet, Ludovic DEHON, Luizfrf3, Luke Chen, Marc Löhe, Matthew Wong, 
> Matthias J. Sax, Michal T, Mickael Maison, Mike Lothian, mkandaswamy, Márton 
> Sigmond, Nick Telford, Niket, Okada Haruki, Paolo Patierno, Patrick Stuedi, 
> Philip Nee, Prateek Agarwal, prince-mahajan, Rajini Sivaram, Randall Hauch, 
> Richard, RivenSun, Rob Leland, Ron Dagostino, Sayantanu Dey, Stanislav 
> Vodetskyi, sunshujie1990, Tamara Skokova, Tim Patterson, Tolga H. Dur, Tom 
> Bentley, Tomonari Yamashita, vamossagar12, Vicky Papavasileiou, Victoria Xia, 
> Vijay Krishna, Vincent Jiang, Walker Carlson, wangyap, Wenhao Ji, Wenjun 
> Ruan, Xiaobing Fang, Xiaoyue Xue, xuexiaoyue, Yang Yu, yasar03, Yu, Zhang 
> Hongyi, zzccctv, 工业废水, 彭小漪
> 
> We welcome your help and feedback. For more 

Re: [VOTE] KIP-831: Add metric for log recovery progress

2022-05-17 Thread James Cheng
+1 (non-binding)

-James

Sent from my iPhone

> On May 16, 2022, at 12:12 AM, Luke Chen  wrote:
> 
> Hi all,
> 
> I'd like to start a vote on KIP to expose metrics for log recovery
> progress. These metrics would let the admins have a way to monitor the log
> recovery progress.
> 
> Details can be found here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> 
> Any feedback is appreciated.
> 
> Thank you.
> Luke


Re: [DISCUSS] KIP-831: Add metric for log recovery progress

2022-05-11 Thread James Cheng
Hi Luke,

Thanks for the detailed explanation. I agree that the current proposal of 
RemainingLogs and RemainingSegments will greatly improve the situation, and 
that we can go ahead with the KIP as is.

If RemainingBytes were straight-forward to implement, then I’d like to have it. 
But we can live without it for now. And if people start using RemainingLogs and 
RemainingSegments and then REALLY FEEL like they need RemainingBytes, then we 
can always add it in the future.

Thanks Luke, for the detailed explanation, and for responding to my feedback!

-James

Sent from my iPhone

> On May 10, 2022, at 6:48 AM, Luke Chen  wrote:
> 
> Hi James and all,
> 
> I checked again, and I can see that when creating a UnifiedLog, we expect the
> logs/indexes/snapshots to be in a good state.
> So, I don't think we should break the current design to expose the
> `RemainingBytesToRecovery` metric.
> 
> If there is no other comments, I'll start a vote within this week.
> 
> Thank you.
> Luke
> 
>> On Fri, May 6, 2022 at 6:00 PM Luke Chen  wrote:
>> 
>> Hi James,
>> 
>> Thanks for your input.
>> 
>> For the `RemainingBytesToRecovery` metric proposal, I think there's one
>> thing I didn't make clear.
>> Currently, when the log manager starts up, we'll try to load all logs
>> (segments), and during the log loading, we'll try to recover logs if
>> necessary.
>> And the log loading uses a "thread pool", as you thought.
>> 
>> So, here's the problem:
>> All segments in each log folder (partition) will be loaded in a log
>> recovery thread, and only once they are loaded can we know how many segments
>> (or how many bytes) need to be recovered.
>> That means, if we have 10 partition logs in one broker, and we have 2 log
>> recovery threads (num.recovery.threads.per.data.dir=2), before the
>> threads load the segments in each log, we only know how many logs
>> (partitions) we have in the broker (i.e. the RemainingLogsToRecover metric).
>> We cannot know how many segments/bytes need to be recovered until each thread
>> starts to load the segments under one log (partition).
>> 
>> So, the example in the KIP, it shows:
>> Currently, there are still 5 logs (partitions) needed to recover under
>> /tmp/log1 dir. And there are 2 threads doing the jobs, where one thread has
>> 1 segments needed to recover, and the other one has 3 segments needed
>> to recover.
>> 
>>   - kafka.log
>>     - LogManager
>>       - RemainingLogsToRecover
>>         - /tmp/log1 => 5    ← there are 5 logs under /tmp/log1 needed to be recovered
>>         - /tmp/log2 => 0
>>       - RemainingSegmentsToRecover
>>         - /tmp/log1         ← 2 threads are doing log recovery for /tmp/log1
>>           - 0 => 1          ← there is 1 segment needed to be recovered for thread 0
>>           - 1 => 3
>>         - /tmp/log2
>>           - 0 => 0
>>           - 1 => 0
>> 
>> 
>> So, after a while, the metrics might look like this:
>> It said, now, there are only 4 logs needed to recover in /tmp/log1, and
>> the thread 0 has 9000 segments left, and thread 1 has 5 segments left
>> (which should imply the thread already completed 2 logs recovery in the
>> period)
>> 
>>   - kafka.log
>>     - LogManager
>>       - RemainingLogsToRecover
>>         - /tmp/log1 => 3    ← there are 3 logs under /tmp/log1 needed to be recovered
>>         - /tmp/log2 => 0
>>       - RemainingSegmentsToRecover
>>         - /tmp/log1         ← 2 threads are doing log recovery for /tmp/log1
>>           - 0 => 9000       ← there are 9000 segments needed to be recovered for thread 0
>>           - 1 => 5
>>         - /tmp/log2
>>           - 0 => 0
>>           - 1 => 0
>> 
>> 
>> That said, the `RemainingBytesToRecovery` metric is difficult to achieve
>> as you expected. I think the current proposal with `RemainingLogsToRecover`
>> and `RemainingSegmentsToRecover` should already provide enough info for
>> the log recovery progress.
>> 
>> I've also updated the KIP example to make it clear.
>> 
>> 
>> Thank you.
>> Luke
>> 
>> 
>>> On Thu, May 5, 2022 at 3:31 AM James Cheng  wrote:
>>> 
>>> Hi Luke,
>>> 
>>> Thanks for adding RemainingSegmentsToRecovery.
>>> 
>>> Ano

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

2022-05-04 Thread James Cheng
Hi Luke,

Thanks for adding RemainingSegmentsToRecovery.

Another thought: different topics can have different segment sizes. I don't 
know how common it is, but it is possible. Some topics might want small segment 
sizes to allow more granular expiration of data.

The downside of RemainingLogsToRecovery and RemainingSegmentsToRecovery is that 
the rate that they will decrement depends on the configuration and patterns of 
the topics and partitions and segment sizes. If someone is monitoring those 
metrics, they might see times where the metric decrements slowly, followed by a 
burst where it decrements quickly.

What about RemainingBytesToRecovery? This would not depend on the configuration 
of the topic or of the data. It would actually be a pretty good metric, because 
I think that this metric would change at a constant rate (based on the disk I/O 
speed that the broker allocates to recovery). Because it changes at a constant 
rate, you would be able to use the rate-of-change to predict when it hits zero, 
which will let you know when the broker is going to start up. Like, I would 
imagine if we graphed RemainingBytesToRecovery that we'd see a fairly straight 
line that is decrementing at a steady rate towards zero.
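
To make that concrete, here is a rough sketch of the kind of prediction I mean, 
assuming a hypothetical RemainingBytesToRecovery gauge (the MBean name below is made 
up for illustration; it is not part of the current KIP), sampled from inside the 
broker JVM or adapted to a remote JMX connection:

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RecoveryEtaEstimate {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Hypothetical gauge name, for illustration only.
        ObjectName gauge = new ObjectName(
                "kafka.log:type=LogManager,name=RemainingBytesToRecovery");
        long intervalMs = 60_000L;
        double first = ((Number) server.getAttribute(gauge, "Value")).doubleValue();
        Thread.sleep(intervalMs);
        double second = ((Number) server.getAttribute(gauge, "Value")).doubleValue();
        double bytesPerMs = (first - second) / intervalMs; // assumed roughly constant
        if (bytesPerMs > 0) {
            double etaMinutes = second / bytesPerMs / 60_000.0;
            System.out.printf("~%.1f minutes until log recovery completes%n", etaMinutes);
        }
    }
}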

What do you think about adding RemainingBytesToRecovery? 

Or, what would you think about making the primary metric be 
RemainingBytesToRecovery, and getting rid of the others?

I don't know if I personally would rather have all 3 metrics, or would just use 
RemainingBytesToRecovery. I, too, would like more community input on which of 
those metrics would be useful to people.

About the JMX metrics, you said that if num.recovery.threads.per.data.dir=2, 
that there might be a separate RemainingSegmentsToRecovery counter for each 
thread. Is that actually how the data is structured within the Kafka recovery 
threads? Does each thread get a fixed set of partitions, or is there just one 
big pool of partitions that the threads all work on?

As a more concrete example:
* If I have 9 small partitions and 1 big partition, and 
num.recovery.threads.per.data.dir=2
Does each thread get 5 partitions, which means one thread will finish much 
sooner than the other?
OR
Do both threads just work on the set of 10 partitions, which means likely 1 
thread will be busy with the big partition, while the other one ends up plowing 
through the 9 small partitions?

If each thread gets assigned 5 partitions, then it would make sense that each 
thread has its own counter.
If the threads works on a single pool of 10 partitions, then it would probably 
mean that the counter is on the pool of partitions itself, and not on each 
thread.

-James

> On May 4, 2022, at 5:55 AM, Luke Chen  wrote:
> 
> Hi devs,
> 
> If there are no other comments, I'll start a vote tomorrow.
> 
> Thank you.
> Luke
> 
> On Sun, May 1, 2022 at 5:08 PM Luke Chen  wrote:
> 
>> Hi James,
>> 
>> Sorry for the late reply.
>> 
>> Yes, this is a good point, to know how many segments to be recovered if
>> there are some large partitions.
>> I've updated the KIP, to add a `*RemainingSegmentsToRecover*` metric for
>> each log recovery thread, to show the value.
>> The example in the Proposed section here
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges>
>> shows what it will look like.
>> 
>> Thanks for the suggestion.
>> 
>> Thank you.
>> Luke
>> 
>> 
>> 
>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng  wrote:
>> 
>>> The KIP describes RemainingLogsToRecovery, which seems to be the number
>>> of partitions in each log.dir.
>>> 
>>> We have some partitions which are much much larger than others. Those
>>> large partitions have many many more segments than others.
>>> 
>>> Is there a way the metric can reflect partition size? Could it be
>>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
>>> 
>>> -James
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen  wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I'd like to propose a KIP to expose a metric for log recovery progress.
>>>> This metric would let the admins have a way to monitor the log recovery
>>>> progress.
>>>> Details can be found here:
>>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
>>>> 
>>>> Any feedback is appreciated.
>>>> 
>>>> Thank you.
>>>> Luke
>>> 
>> 



Re: [DISCUSS] KIP-831: Add metric for log recovery progress

2022-04-22 Thread James Cheng
The KIP describes RemainingLogsToRecovery, which seems to be the number of 
partitions in each log.dir. 

We have some partitions which are much much larger than others. Those large 
partitions have many many more segments than others. 

Is there a way the metric can reflect partition size? Could it be 
RemainingSegmentsToRecover? Or even RemainingBytesToRecover?

-James

Sent from my iPhone

> On Apr 20, 2022, at 2:01 AM, Luke Chen  wrote:
> 
> Hi all,
> 
> I'd like to propose a KIP to expose a metric for log recovery progress.
> This metric would let the admins have a way to monitor the log recovery
> progress.
> Details can be found here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> 
> Any feedback is appreciated.
> 
> Thank you.
> Luke


Re: [DISCUSS] KIP-729 Custom validation of records on the broker prior to log append

2021-07-06 Thread James Cheng
One use case we would like to support is requiring that producers send compressed 
messages. Would this KIP (or KIP-686) allow the broker to detect that? From looking 
at both KIPs, it doesn't look like it would help with my particular use case. Both 
of the KIPs operate at the record level.
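
To make the comparison concrete, here is a purely illustrative, self-contained sketch 
of the kind of record-level validator KIP-729 proposes (the interface is copied from 
the signature quoted further down this thread, with the return type as I read it; 
none of this is an existing Kafka API). It also shows why a batch-level property like 
compression isn't visible at this level: the hook only sees the key, value and 
headers of each record.

import java.nio.ByteBuffer;
import java.util.Optional;

import org.apache.kafka.common.InvalidRecordException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.header.Header;

public class RequireKeyValidator {

    // Proposed (not yet existing) broker-side hook, per KIP-729.
    interface BrokerRecordValidator {
        Optional<InvalidRecordException> validateRecord(
                TopicPartition topicPartition, ByteBuffer key, ByteBuffer value, Header[] headers);
    }

    // Example policy: records on a hypothetical "orders" topic must have a key.
    static final BrokerRecordValidator REQUIRE_KEY_ON_ORDERS = (tp, key, value, headers) -> {
        if ("orders".equals(tp.topic()) && key == null) {
            return Optional.of(new InvalidRecordException("records on 'orders' must have a key"));
        }
        return Optional.empty();
    };
}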

Thanks,
-James

> On Jun 30, 2021, at 10:05 AM, Soumyajit Sahu  wrote:
> 
> Hi Nikolay,
> Great to hear that. I'm ok with either one too.
> I had missed noticing the KIP-686. Thanks for bringing it up.
> 
> I have tried to keep this one simple, but hope it can cover all our
> enterprise needs.
> 
> Should we put this one for vote?
> 
> Regards,
> Soumyajit
> 
> 
> On Wed, Jun 30, 2021, 8:50 AM Nikolay Izhikov  wrote:
> 
>> Team, If we have support from committers for API to check records on the
>> broker side let’s choose one KIP to go with and move forward to vote and
>> implementation?
>> I’m ready to drive implementation of this API.
>> 
>> I’m ready to drive the implementation of this API.
>> It seems very useful to me.
>> 
>>> On June 30, 2021, at 18:04, Nikolay Izhikov 
>> wrote:
>>> 
>>> Hello.
>>> 
>>> I had a very similar proposal [1].
>>> So, yes, I think we should have one implementation of API in the product.
>>> 
>>> [1]
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-686%3A+API+to+ensure+Records+policy+on+the+broker
>>> 
 On June 30, 2021, at 17:57, Christopher Shannon <
>> christopher.l.shan...@gmail.com> wrote:
 
 I would find this feature very useful as well as adding custom
>> validation
 to incoming records would be nice to prevent bad data from making it to
>> the
 topic.
 
 On Wed, Apr 7, 2021 at 7:03 PM Soumyajit Sahu >> 
 wrote:
 
> Thanks Colin! Good call on the ApiRecordError. We could use
> InvalidRecordException instead, and have the broker convert it
> to ApiRecordError.
> Modified signature below.
> 
> interface BrokerRecordValidator {
>   /**
>    * Validate the record for a given topic-partition.
>    */
>   Optional<InvalidRecordException> validateRecord(TopicPartition topicPartition,
>       ByteBuffer key, ByteBuffer value, Header[] headers);
> }
> 
> On Tue, Apr 6, 2021 at 5:09 PM Colin McCabe 
>> wrote:
> 
>> Hi Soumyajit,
>> 
>> The difficult thing is deciding which fields to share and how to share
>> them.  Key and value are probably the minimum we need to make this
> useful.
>> If we do choose to go with byte buffer, it is not necessary to also
>> pass
>> the size, since ByteBuffer maintains that internally.
>> 
>> ApiRecordError is also an internal class, so it can't be used in a
>> public
>> API.  I think most likely if we were going to do this, we would just
> catch
>> an exception and use the exception text as the validation error.
>> 
>> best,
>> Colin
>> 
>> 
>> On Tue, Apr 6, 2021, at 15:57, Soumyajit Sahu wrote:
>>> Hi Tom,
>>> 
>>> Makes sense. Thanks for the explanation. I get what Colin had meant
>> earlier.
>>> 
>>> Would a different signature for the interface work? Example below,
>> but
>>> please feel free to suggest alternatives if there are any
>> possibilities
>> of
>>> such.
>>> 
>>> If needed, then deprecating this and introducing a new signature
>> would
> be
>>> straight-forward as both (old and new) calls could be made serially
>> in
>> the
>>> LogValidator allowing a coexistence for a transition period.
>>> 
>>> interface BrokerRecordValidator {
>>>   /**
>>>    * Validate the record for a given topic-partition.
>>>    */
>>>   Optional<ApiRecordError> validateRecord(TopicPartition topicPartition,
>>>       int keySize, ByteBuffer key, int valueSize, ByteBuffer value, Header[] headers);
>>> }
>>> 
>>> 
>>> On Tue, Apr 6, 2021 at 12:54 AM Tom Bentley 
> wrote:
>>> 
 Hi Soumyajit,
 
 Although that class does indeed have public access at the Java
>> level,
>> it
 does so only because it needs to be used by internal Kafka code
>> which
>> lives
 in other packages (there isn't any more restrictive access modifier
>> which
 would work). What the project considers public Java API is
>> determined
>> by
 what's included in the published Javadocs:
 https://kafka.apache.org/27/javadoc/index.html, which doesn't
> include
>> the
 org.apache.kafka.common.record package.
 
 One of the problems with making these internal classes public is it
>> ties
 the project into supporting them as APIs, which can make changing
> them
>> much
 harder and in the long run that can slow, or even prevent,
>> innovation
>> in
 the rest of Kafka.
 
 Kind regards,
 
 Tom
 
 
 
 On Sun, Apr 4, 2021 at 7:31 PM Soumyajit Sahu <
>> 

Re: [DISCUSS] KIP-760: Increase minimum value of segment.ms and segment.bytes

2021-07-06 Thread James Cheng
Badai,

Thanks for the KIP.

We sometimes want to force compaction on a topic. This might be because there 
is a bad record in the topic, and we want to force it to get deleted. The way 
we do this is, we set segment.ms to a small value and write a record, in order 
to force a segment roll. And we also set min.cleanable.dirty.ratio=0, in order 
to trigger compaction. It's rare that we need to do it, but it happens 
sometimes. This change would make it more difficult to do that. With this KIP, 
we would have to write up to 1MB of data before causing the segment roll, or 
wait an hour. 
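
For reference, the same two config changes can be made with the Admin client instead 
of kafka-configs.sh. Just a sketch, with the topic name and bootstrap server as 
placeholders; remember to revert both settings afterwards:

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class ForceCompaction {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-compacted-topic");
            Map<ConfigResource, Collection<AlterConfigOp>> changes = Map.of(topic, List.of(
                    new AlterConfigOp(new ConfigEntry("segment.ms", "100"), AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("min.cleanable.dirty.ratio", "0"), AlterConfigOp.OpType.SET)
            ));
            admin.incrementalAlterConfigs(changes).all().get();
            // After this, producing one more record rolls the active segment and the
            // cleaner becomes eligible to compact it almost immediately.
        }
    }
}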

Although come to think of it, if my goal is to trigger compaction, then I can 
just write my tombstone a couple thousand times. So maybe this KIP just makes 
it slightly more tedious, but doesn't make it impossible.

Another use case is when we want to truncate a topic, so we set a small segment 
size and set retention to almost zero, which will allow Kafka to delete what is 
in the topic. For that, though, we could also use kafka-delete-records.sh, so 
this KIP would not have impact on that particular use case.

-James

> On Jul 6, 2021, at 2:23 PM, Badai Aqrandista  
> wrote:
> 
> Hi all
> 
> I have just created KIP-760
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-760%3A+Increase+minimum+value+of+segment.ms+and+segment.bytes).
> 
> I created this KIP because I have seen so many Kafka brokers crash due
> to small segment.ms and/or segment.bytes.
> 
> Please let me know what you think.
> 
> -- 
> Thanks,
> Badai



Re: [ANNOUNCE] Apache Kafka 2.6.1

2021-01-11 Thread James Cheng
Thank you Mickael for running the release. Good job everyone!

-James

Sent from my iPhone

> On Jan 11, 2021, at 2:17 PM, Mickael Maison  wrote:
> 
> The Apache Kafka community is pleased to announce the release for
> Apache Kafka 2.6.1.
> 
> This is a bug fix release and it includes fixes and improvements from
> 41 JIRAs, including a few critical bugs.
> 
> All of the changes in this release can be found in the release notes:
> https://www.apache.org/dist/kafka/2.6.1/RELEASE_NOTES.html
> 
> 
> You can download the source and binary release (Scala 2.12 and Scala 2.13) 
> from:
> https://kafka.apache.org/downloads#2.6.1
> 
> 
> ---
> 
> 
> Apache Kafka is a distributed streaming platform with four core APIs:
> 
> ** The Producer API allows an application to publish a stream of records
> to one or more Kafka topics.
> 
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
> 
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming
> the input streams to output streams.
> 
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might
> capture every change to a table.
> 
> 
> With these APIs, Kafka can be used for two broad classes of application:
> 
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
> 
> ** Building real-time streaming applications that transform or react
> to the streams of data.
> 
> 
> Apache Kafka is in use at large and small companies worldwide,
> including Capital One, Goldman Sachs, ING, LinkedIn, Netflix,
> Pinterest, Rabobank, Target, The New York Times, Uber, Yelp, and
> Zalando, among others.
> 
> 
> A big thank you for the following 36 contributors to this release!
> 
> A. Sophie Blee-Goldman, John Roesler, Bruno Cadonna, Rajini Sivaram,
> Guozhang Wang, Matthias J. Sax, Chris Egerton, Mickael Maison, Randall
> Hauch, leah, Luke Chen, Jason Gustafson, Konstantine Karantasis,
> Michael Bingham, Lucas Bradstreet, Andrew Egelhofer, Micah Paul Ramos,
> Nikolay, Nitesh Mor, Alex Diachenko, xakassi, Shaik Zakir Hussain,
> Stanislav Kozlovski, Stanislav Vodetskyi, Thorsten Hake, Tom Bentley,
> Vikas Singh, feyman2016, high.lee, Dima Reznik, Colin Patrick McCabe,
> Edoardo Comar, Jim Galasyn, Chia-Ping Tsai, Justine Olshan, Levani
> Kokhreidze
> 
> 
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kafka.apache.org/
> 
> Thank you!
> 
> Regards,
> Mickael


Re: Can I get a review for a documentation update (KAFKA-10473)?

2020-12-02 Thread James Cheng
Hi,

Can I get a review for this small documentation update? 

Thanks,
-James

> On Oct 9, 2020, at 5:23 PM, James Cheng  wrote:
> 
> Hi,
> 
> Would someone be able to review this pull request for me?
> 
> This is a small documentation change.
> 
> Thanks,
> -James
> 
>> On Sep 24, 2020, at 11:53 PM, James Cheng <wushuja...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> Can I get a review from one of the commiters for this documentation update?
>> 
>> I am adding docs for the following JMX metrics:
>>  kafka.log,type=Log,name=Size
>>  kafka.log,type=Log,name=NumLogSegments
>>  kafka.log,type=Log,name=LogStartOffset
>>  kafka.log,type=Log,name=LogEndOffset
>> 
>> 
>> https://issues.apache.org/jira/browse/KAFKA-10473
>> https://github.com/apache/kafka/pull/9276
>> 
>> The pull request page lists lots of failed checks. However, this pull 
>> request only modifies an HTML file, and the test failures don't seem related 
>> to my changes.
>> 
>> Thanks,
>> -James
>> 
> 



[jira] [Created] (KAFKA-10801) Docs on configuration have multiple places using the same HTML anchor tag

2020-12-02 Thread James Cheng (Jira)
James Cheng created KAFKA-10801:
---

 Summary: Docs on configuration have multiple places using the same 
HTML anchor tag
 Key: KAFKA-10801
 URL: https://issues.apache.org/jira/browse/KAFKA-10801
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.5.1, 2.6.0
Reporter: James Cheng


The configuration option "compression.type" is a configuration option on the 
Kafka Producer as well as on the Kafka brokers.

 

The same HTML anchor #compression.type is used on both of those entries. So if 
you click or bookmark the link 
[http://kafka.apache.org/documentation/#compression.type] , it will always 
bring you to the first entry (the broker-side config). It will never bring you 
to the 2nd entry (producer config).

 

I've only noticed this for the compression.type config, but it is possible that 
it also applies to any other config option that is the same between the 
broker/producer/consumer.

 

We should at least fix it for compression.type, and we should possibly fix it 
across the entire document.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-640 Add log compression analysis tool

2020-11-28 Thread James Cheng
Chris,

This (understandably) requires access to the log segment files on disk. Managed 
Kafka services are becoming more popular (Confluent Cloud, Amazon MSK) and they 
do not expose the log segment files on disk. It’d be great to have equivalent 
functionality that would work on managed services.

But, that would be a completely different implementation, so maybe it’s not 
something you want to handle right now.
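
As a very rough client-side approximation of that idea (just a sketch, not a proposal 
for the KIP): sample some records with a consumer and compare raw vs gzip-compressed 
sizes. This ignores broker-side batching, codecs other than gzip, and headers; the 
topic and bootstrap server are placeholders.

import java.io.ByteArrayOutputStream;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.zip.GZIPOutputStream;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class CompressionEstimate {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "compression-estimate");
        props.put("auto.offset.reset", "earliest");
        try (KafkaConsumer<byte[], byte[]> consumer =
                     new KafkaConsumer<>(props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
            consumer.subscribe(List.of("my-topic"));
            long rawBytes = 0;
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(buf)) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<byte[], byte[]> r : records) {
                    byte[] value = r.value() == null ? new byte[0] : r.value();
                    rawBytes += value.length;
                    gzip.write(value);
                }
            }
            System.out.printf("sampled %d raw bytes, ~%d bytes gzip-compressed%n", rawBytes, buf.size());
        }
    }
}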

-James

Sent from my iPhone

> On Nov 27, 2020, at 10:14 AM, Christopher Beard  wrote:
> 
> Bump. I'd like to gather more feedback on this!
> 
> Chris
> 
>> On 2020/08/17 20:23:51, "Christopher Beard (BLOOMBERG/ 919 3RD A)" 
>>  wrote: 
>> Hi everyone,
>> 
>> I would like to start a discussion on KIP-640:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-640%3A+Add+log+compression+analysis+tool
>> 
>> This KIP outlines a new CLI tool which helps compare how the various 
>> compression types supported by Kafka reduce the size of a log (and therefore 
>> more broadly, of a topic).
>> 
>> I've put together a PR that might help serve as a starting point for 
>> comments and suggestions.
>> [WIP] PR: https://github.com/apache/kafka/pull/9193
>> 
>> Thanks,
>> Chris Beard


Re: [ANNOUNCE] New committer: David Jacot

2020-10-19 Thread James Cheng
Congratulations, David!

-James

> On Oct 16, 2020, at 9:01 AM, Gwen Shapira  wrote:
> 
> The PMC for Apache Kafka has invited David Jacot as a committer, and
> we are excited to say that he accepted!
> 
> David Jacot has been contributing to Apache Kafka since July 2015 (!)
> and has been very active since August 2019. He contributed several
> notable KIPs:
> 
> KIP-511: Collect and Expose Client Name and Version in Brokers
> KIP-559: Make the Kafka Protocol Friendlier with L7 Proxies:
> KIP-570: Add leader epoch in StopReplicaReques
> KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
> KIP-496 Added an API for the deletion of consumer offsets
> 
> In addition, David Jacot reviewed many community contributions and
> showed great technical and architectural taste. Great reviews are hard
> and often thankless work - but this is what makes Kafka a great
> product and helps us grow our community.
> 
> Thanks for all the contributions, David! Looking forward to more
> collaboration in the Apache Kafka community.
> 
> -- 
> Gwen Shapira



Re: [ANNOUNCE] New committer: Chia-Ping Tsai

2020-10-19 Thread James Cheng
Congratulations Chia-Ping!

-James

> On Oct 19, 2020, at 10:24 AM, Guozhang Wang  wrote:
> 
> Hello all,
> 
> I'm happy to announce that Chia-Ping Tsai has accepted his invitation to
> become an Apache Kafka committer.
> 
> Chia-Ping has been contributing to Kafka since March 2018 and has made 74
> commits:
> 
> https://github.com/apache/kafka/commits?author=chia7712
> 
> He's also authored several major improvements, participated in the KIP
> discussion and PR reviews as well. His major feature development includes:
> 
> * KAFKA-9654: Epoch based ReplicaAlterLogDirsThread creation.
> * KAFKA-8334: Spiky offsetCommit latency due to lock contention.
> * KIP-331: Add default implementation to close() and configure() for serde
> * KIP-367: Introduce close(Duration) to Producer and AdminClients
> * KIP-338: Support to exclude the internal topics in kafka-topics.sh command
> 
> In addition, Chia-Ping has demonstrated his great diligence fixing test
> failures, his impressive engineering attitude and taste in fixing tricky
> bugs while keeping simple designs.
> 
> Please join me to congratulate Chia-Ping for all the contributions!
> 
> 
> -- Guozhang



Re: [ANNOUNCE] New committer: A. Sophie Blee-Goldman

2020-10-19 Thread James Cheng
Congratulations, Sophie!

-James

> On Oct 19, 2020, at 9:40 AM, Matthias J. Sax  wrote:
> 
> Hi all,
> 
> I am excited to announce that A. Sophie Blee-Goldman has accepted her
> invitation to become an Apache Kafka committer.
> 
> Sophie is actively contributing to Kafka since Feb 2019 and has
> accumulated 140 commits. She authored 4 KIPs in the lead
> 
> - KIP-453: Add close() method to RocksDBConfigSetter
> - KIP-445: In-memory Session Store
> - KIP-428: Add in-memory window store
> - KIP-613: Add end-to-end latency metrics to Streams
> 
> and helped to implement two critical KIPs, 429 (incremental rebalancing)
> and 441 (smooth auto-scaling; not just implementation but also design).
> 
> In addition, she participates in basically every Kafka Streams related
> KIP discussion, reviewed 142 PRs, and is active on the user mailing list.
> 
> Thanks for all the contributions, Sophie!
> 
> 
> Please join me to congratulate her!
> -Matthias
> 



Re: Can I get a review for a documentation update (KAFKA-10473)?

2020-10-09 Thread James Cheng
Hi,

Would someone be able to review this pull request for me?

This is a small documentation change.

Thanks,
-James

> On Sep 24, 2020, at 11:53 PM, James Cheng  wrote:
> 
> Hi,
> 
> Can I get a review from one of the commiters for this documentation update?
> 
> I am adding docs for the following JMX metrics:
>   kafka.log,type=Log,name=Size
>   kafka.log,type=Log,name=NumLogSegments
>   kafka.log,type=Log,name=LogStartOffset
>   kafka.log,type=Log,name=LogEndOffset
> 
> 
> https://issues.apache.org/jira/browse/KAFKA-10473
> https://github.com/apache/kafka/pull/9276
> 
> The pull request page lists lots of failed checks. However, this pull request 
> only modifies an HTML file, and the test failures don't seem related to my 
> changes.
> 
> Thanks,
> -James
> 



Can I get a review for a documentation update (KAFKA-10473)?

2020-09-25 Thread James Cheng
Hi,

Can I get a review from one of the commiters for this documentation update?

I am adding docs for the following JMX metrics:
kafka.log,type=Log,name=Size
kafka.log,type=Log,name=NumLogSegments
kafka.log,type=Log,name=LogStartOffset
kafka.log,type=Log,name=LogEndOffset
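
For context, these are per-partition gauges. A quick sketch of reading one of them 
over JMX (written in the kafka.log:type=Log,... object-name form; the topic, 
partition, and JMX port below are placeholders):

import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PartitionSizeOnDisk {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            // Reads the size-on-disk gauge for one topic-partition.
            Object size = connector.getMBeanServerConnection().getAttribute(
                    new ObjectName("kafka.log:type=Log,name=Size,topic=my-topic,partition=0"),
                    "Value");
            System.out.println("partition size on disk: " + size + " bytes");
        }
    }
}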


https://issues.apache.org/jira/browse/KAFKA-10473 

https://github.com/apache/kafka/pull/9276 


The pull request page lists lots of failed checks. However, this pull request 
only modifies an HTML file, and the test failures don't seem related to my 
changes.

Thanks,
-James



Re: How do I place a JIRA in status "Patch Available"? (For KAFKA-10473)

2020-09-11 Thread James Cheng
Thanks John! I can access and edit the wiki now.

And I improved the instructions for how to change the status of a JIRA to 
"Patch Available". 
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=59689925=36=37

Thanks for the help,
-James

> On Sep 11, 2020, at 12:53 PM, John Roesler  wrote:
> 
> Hi James,
> 
> Sorry, I overlooked your reply until now. I've granted you
> access.
> 
> Thanks,
> -John
> 
> On Wed, 2020-09-09 at 21:44 -0700, James Cheng wrote:
>> Thanks John. My wiki user ID is wushujames 
>> 
>> -James
>> 
>> Sent from my iPhone
>> 
>>> On Sep 9, 2020, at 7:03 PM, John Roesler  wrote:
>>> 
>>> Hi James,
>>> 
>>> Good, I’m glad my incredibly vague response was helpful!
>>> 
>>> If you let me know your wiki user id, I can grant you edit permission. It’s 
>>> a separate account from Jira. 
>>> 
>>> Thanks,
>>> John
>>> 
>>>> On Wed, Sep 9, 2020, at 20:45, James Cheng wrote:
>>>> Thanks John. That worked.
>>>> 
>>>> I clicked the button that says "Submit Patch", and a dialog box popped 
>>>> up. I didn't fill out anything additional in the dialog, and clicked 
>>>> "Submit Patch" in the dialog.
>>>> 
>>>> The JIRA is now in status "Patch Available"
>>>> 
>>>> I would like to improve the docs at 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes
>>>>  to make this step clearer. It looks like I don't have permissions to edit 
>>>> the page.
>>>> 
>>>> Can someone grant me permissions to edit the page? 
>>>> 
>>>> Or, if that is too difficult, can someone edit the page as follows?
>>>> 
>>>> Change
>>>> 
>>>>   7. Change the status of the JIRA to "Patch Available" if it's ready for 
>>>> review.
>>>> to be
>>>> 
>>>>   7. Change the status of the JIRA to "Patch Available" if it's ready 
>>>> for review. Do this by clicking the "Submit Patch" button in JIRA, and 
>>>> then in the resulting dialog, click "Submit Patch".
>>>> 
>>>> -James
>>>> 
>>>>>> On Sep 9, 2020, at 6:24 PM, John Roesler  wrote:
>>>>> 
>>>>> Hi James,
>>>>> 
>>>>> I think the button on Jira says “Add Patch” or something confusing like 
>>>>> that. 
>>>>> 
>>>>> Thanks,
>>>>> John
>>>>> 
>>>>> 
>>>>> On Wed, Sep 9, 2020, at 17:34, James Cheng wrote:
>>>>>> I have a JIRA that I am working on, and a pull request available for it.
>>>>>> 
>>>>>> [KAFKA-10473] Website is missing docs on JMX metrics for partition 
>>>>>> size-on-disk (kafka.log:type=Log,name=*)
>>>>>> https://issues.apache.org/jira/browse/KAFKA-10473
>>>>>> https://github.com/apache/kafka/pull/9276
>>>>>> 
>>>>>> The "Contributing Code Changes" instructions say to
>>>>>>   7. Change the status of the JIRA to "Patch Available" if it's ready 
>>>>>> for review.
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes
>>>>>> 
>>>>>> How do I do that? 
>>>>>> * The title of my pull request starts with KAFKA-10473, so the JIRA 
>>>>>> does have a link to the pull request
>>>>>> * I *was* able to assign it to myself and then say "Start progress" and 
>>>>>> now the status says "In Progress".
>>>>>> * But I can't find how to set it to "Patch Available". In the JIRA 
>>>>>> website, I can't find a field or menu item that lets me change the 
>>>>>> status to "Patch Available" . 
>>>>>> 
>>>>>> Thanks,
>>>>>> -James
>>>>>> 
>>>>>> 
>>>>>> 
> 



Re: How do I place a JIRA in status "Patch Available"? (For KAFKA-10473)

2020-09-09 Thread James Cheng
Thanks John. My wiki user ID is wushujames 

-James

Sent from my iPhone

> On Sep 9, 2020, at 7:03 PM, John Roesler  wrote:
> 
> Hi James,
> 
> Good, I’m glad my incredibly vague response was helpful!
> 
> If you let me know your wiki user id, I can grant you edit permission. It’s a 
> separate account from Jira. 
> 
> Thanks,
> John
> 
>> On Wed, Sep 9, 2020, at 20:45, James Cheng wrote:
>> Thanks John. That worked.
>> 
>> I clicked the button that says "Submit Patch", and a dialog box popped 
>> up. I didn't fill out anything additional in the dialog, and clicked 
>> "Submit Patch" in the dialog.
>> 
>> The JIRA is now in status "Patch Available"
>> 
>> I would like to improve the docs at 
>> https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes
>>  to make this step clearer. It looks like I don't have permissions to edit 
>> the page.
>> 
>> Can someone grant me permissions to edit the page? 
>> 
>> Or, if that is too difficult, can someone edit the page as follows?
>> 
>> Change
>> 
>>7. Change the status of the JIRA to "Patch Available" if it's ready for 
>> review.
>> to be
>> 
>>7. Change the status of the JIRA to "Patch Available" if it's ready 
>> for review. Do this by clicking the "Submit Patch" button in JIRA, and 
>> then in the resulting dialog, click "Submit Patch".
>> 
>> -James
>> 
>>>> On Sep 9, 2020, at 6:24 PM, John Roesler  wrote:
>>> 
>>> Hi James,
>>> 
>>> I think the button on Jira says “Add Patch” or something confusing like 
>>> that. 
>>> 
>>> Thanks,
>>> John
>>> 
>>> 
>>> On Wed, Sep 9, 2020, at 17:34, James Cheng wrote:
>>>> I have a JIRA that I am working on, and a pull request available for it.
>>>> 
>>>> [KAFKA-10473] Website is missing docs on JMX metrics for partition 
>>>> size-on-disk (kafka.log:type=Log,name=*)
>>>> https://issues.apache.org/jira/browse/KAFKA-10473
>>>> https://github.com/apache/kafka/pull/9276
>>>> 
>>>> The "Contributing Code Changes" instructions say to
>>>>7. Change the status of the JIRA to "Patch Available" if it's ready for 
>>>> review.
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes
>>>> 
>>>> How do I do that? 
>>>> * The title of my pull request starts with KAFKA-10473, so the JIRA 
>>>> does have a link to the pull request
>>>> * I *was* able to assign it to myself and then say "Start progress" and 
>>>> now the status says "In Progress".
>>>> * But I can't find how to set it to "Patch Available". In the JIRA 
>>>> website, I can't find a field or menu item that lets me change the 
>>>> status to "Patch Available" . 
>>>> 
>>>> Thanks,
>>>> -James
>>>> 
>>>> 
>>>> 
>> 
>> 


Re: How do I place a JIRA in status "Patch Available"? (For KAFKA-10473)

2020-09-09 Thread James Cheng
Thanks John. That worked.

I clicked the button that says "Submit Patch", and a dialog box popped up. I 
didn't fill out anything additional in the dialog, and clicked "Submit Patch" 
in the dialog.

The JIRA is now in status "Patch Available"

I would like to improve the docs at 
https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes 
to make this step clearer. It looks like I don't have permissions to edit the 
page.

Can someone grant me permissions to edit the page? 

Or, if that is too difficult, can someone edit the page as follows?

Change

7. Change the status of the JIRA to "Patch Available" if it's ready for 
review.
to be

7. Change the status of the JIRA to "Patch Available" if it's ready for 
review. Do this by clicking the "Submit Patch" button in JIRA, and then in the 
resulting dialog, click "Submit Patch".

-James

> On Sep 9, 2020, at 6:24 PM, John Roesler  wrote:
> 
> Hi James,
> 
> I think the button on Jira says “Add Patch” or something confusing like that. 
> 
> Thanks,
> John
> 
> 
> On Wed, Sep 9, 2020, at 17:34, James Cheng wrote:
>> I have a JIRA that I am working on, and a pull request available for it.
>> 
>> [KAFKA-10473] Website is missing docs on JMX metrics for partition 
>> size-on-disk (kafka.log:type=Log,name=*)
>> https://issues.apache.org/jira/browse/KAFKA-10473
>> https://github.com/apache/kafka/pull/9276
>> 
>> The "Contributing Code Changes" instructions say to
>>  7. Change the status of the JIRA to "Patch Available" if it's ready for 
>> review.
>> https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes
>> 
>> How do I do that? 
>> * The title of my pull request starts with KAFKA-10473, so the JIRA 
>> does have a link to the pull request
>> * I *was* able to assign it to myself and then say "Start progress" and 
>> now the status says "In Progress".
>> * But I can't find how to set it to "Patch Available". In the JIRA 
>> website, I can't find a field or menu item that lets me change the 
>> status to "Patch Available" . 
>> 
>> Thanks,
>> -James
>> 
>> 
>> 



How do I place a JIRA in status "Patch Available"? (For KAFKA-10473)

2020-09-09 Thread James Cheng
I have a JIRA that I am working on, and a pull request available for it.

[KAFKA-10473] Website is missing docs on JMX metrics for partition size-on-disk 
(kafka.log:type=Log,name=*)
https://issues.apache.org/jira/browse/KAFKA-10473
https://github.com/apache/kafka/pull/9276

The "Contributing Code Changes" instructions say to
7. Change the status of the JIRA to "Patch Available" if it's ready for 
review.
https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes

How do I do that? 
* The title of my pull request starts with KAFKA-10473, so the JIRA does have a 
link to the pull request
* I *was* able to assign it to myself and then say "Start progress" and now the 
status says "In Progress".
* But I can't find how to set it to "Patch Available". In the JIRA website, I 
can't find a field or menu item that lets me change the status to "Patch 
Available" . 

Thanks,
-James




[jira] [Created] (KAFKA-10473) Website is missing docs on JMX metrics for partition size-on-disk

2020-09-09 Thread James Cheng (Jira)
James Cheng created KAFKA-10473:
---

 Summary: Website is missing docs on JMX metrics for partition 
size-on-disk
 Key: KAFKA-10473
 URL: https://issues.apache.org/jira/browse/KAFKA-10473
 Project: Kafka
  Issue Type: Improvement
  Components: docs, documentation
Affects Versions: 2.5.1, 2.6.0
Reporter: James Cheng


The website is missing docs on the following JMX metrics:

kafka.log,type=Log,name=Size

kafka.log,type=Log,name=NumLogSegments

kafka.log,type=Log,name=LogStartOffset

kafka.log,type=Log,name=LogEndOffset

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-579: new exception on min.insync.replicas > replication.factor

2020-04-04 Thread James Cheng



> On Apr 2, 2020, at 4:27 AM, Paolo Moriello  wrote:
> 
> Hi,
> 
> Thanks a lot for your feedback, I really appreciate your help on this.
> 
> Given what you suggested, I will take some time to update the kip with a
> proposal to make invalid configuration requests FAIL. This involves
> checking multiple code paths, as James was saying, to eg.: validate topic
> creation, topic-configuration changes, partition reassignment and broker
> configuration setup.
> 
> Regarding the latter, do you have any suggestion on what's the best thing
> to do? For instance, we know that we can specify min.insync.replicas at
> cluster level. At the same time, we can also specify
> default.replication.factor. If there is an inconsistency with this setup,
> do we want to make kafka fail at startup or allow the users to overwrite it
> at a later point? (I believe we should be strict here and fail at startup).
> A similar question applies to offsets.topic.replication.factor.
> 

Paolo,

I haven't had a chance to think about it deeply, but your idea of having Kafka 
fail at startup makes sense to me. I'd like one of the committers to chime in 
about that idea, too.

Some broker default settings can also be set dynamically during runtime, so you 
will also have to catch/reject those.

-James

> Thanks,
> Paolo
> 
> On Wed, 1 Apr 2020 at 05:29, James Cheng  wrote:
> 
>> I agree that we should prevent the creation of such a topic configuration.
>> That would mean catching it at topic-creation time, as well as catching it
>> on any topic-configuration changes that might make min.isr > replication
>> factor.
>> 
>> Not sure how we would detect things if someone changed the broker-default
>> configuration. That could be tricky.
>> 
>> Btw, I was the person who filed the original JIRA and as Mickael guessed,
>> it was done by mistake.
>> 
>> -James
>> 
>>> On Mar 31, 2020, at 9:30 AM, Ismael Juma  wrote:
>>> 
>>> Hi Paolo,
>>> 
>>> Thanks for the KIP. Why would one want to set min.isr to be higher than
>>> replication factor even in that case? Mickael's suggestion seems better
>> to
>>> me.
>>> 
>>> Ismael
>>> 
>>> On Fri, Mar 13, 2020 at 10:28 AM Paolo Moriello <
>> paolomoriell...@gmail.com>
>>> wrote:
>>> 
>>>> Hi Mickael,
>>>> 
>>>> Thanks for your interest in this. The main motivation to NOT make topic
>>>> creation fail when this mismatch happens is because at the moment it is
>>>> possible to produce/consume on topics if acks is not set to all. I'm not
>>>> sure we want to disable this behavior (as we would by failing at topic
>>>> creation). That's why I decided to go for a softer approach, which at
>> least
>>>> gives some more clarity to the users and avoids other issues mentioned
>> in
>>>> the KIP.
>>>> 
>>>> Let's see what others think!
>>>> 
>>>> On Fri, 13 Mar 2020 at 17:16, Mickael Maison 
>>>> wrote:
>>>> 
>>>>> Hi Paolo,
>>>>> 
>>>>> Thanks for looking at this issue. This can indeed be a source of
>>>> confusion.
>>>>> 
>>>>> I'm wondering if we should prevent the creation of topics with
>>>>> min.insync.replicas > replication.factor?
>>>>> You listed that as a rejected alternative because it requires more
>>>>> changes. However, I can't think of any scenarios where a user would
>>>>> want to create such a topic. I'm guessing it's probably always by
>>>>> mistake.
>>>>> 
>>>>> Let's see what other people think but I think it's worth checking what
>>>>> needs to be done if we wanted to prevent topics with bogus configs
>>>>> 
>>>>> On Fri, Mar 13, 2020 at 3:28 PM Paolo Moriello
>>>>>  wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Following this Jira ticket (
>>>>> https://issues.apache.org/jira/browse/KAFKA-4680),
>>>>>> I've created a proposal (
>>>>>> 
>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-579%3A+new+exception+on+min.insync.replicas+%3E+replication.factor
>>>>> )
>>>>>> to add a new exception/error to be used on min.insync.replicas >
>>>>>> replication.factor.
>>>>>> 
>>>>>> The proposal aims to introduce a new exception specific for the
>>>>>> configuration mismatch above to be used when producers requires acks =
>>>>> all.
>>>>>> At the moment we are using NotEnoughReplicaException, which is a
>>>>> retriable
>>>>>> exception and is used to fail on insync replicas < min isr. Plan is to
>>>>> have
>>>>>> a new, non-retriable exception, to separate the two cases.
>>>>>> 
>>>>>> I've also submitted a PR for the change mentioned above:
>>>>>> https://github.com/apache/kafka/pull/8225
>>>>>> 
>>>>>> Please have a look and let me know what you think.
>>>>>> 
>>>>>> Thanks,
>>>>>> Paolo
>>>>> 
>>>> 
>> 
>> 



Re: [DISCUSS] KIP-579: new exception on min.insync.replicas > replication.factor

2020-03-31 Thread James Cheng
I agree that we should prevent the creation of such a topic configuration. That 
would mean catching it at topic-creation time, as well as catching it on any 
topic-configuration changes that might make min.isr > replication factor.
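
Roughly, the check itself is trivial; a sketch of the kind of guard I have in mind 
(illustrative only, not actual broker code):

import org.apache.kafka.common.errors.InvalidConfigurationException;

public class MinIsrGuard {
    // Reject a topic create/alter whose min.insync.replicas exceeds its replication factor.
    static void validate(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas > replicationFactor) {
            throw new InvalidConfigurationException(
                    "min.insync.replicas (" + minInsyncReplicas
                            + ") must not exceed replication factor (" + replicationFactor + ")");
        }
    }
}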

Not sure how we would detect things if someone changed the broker-default 
configuration. That could be tricky.

Btw, I was the person who filed the original JIRA and as Mickael guessed, it 
was done by mistake.

-James

> On Mar 31, 2020, at 9:30 AM, Ismael Juma  wrote:
> 
> Hi Paolo,
> 
> Thanks for the KIP. Why would one want to set min.isr to be higher than
> replication factor even in that case? Mickael's suggestion seems better to
> me.
> 
> Ismael
> 
> On Fri, Mar 13, 2020 at 10:28 AM Paolo Moriello 
> wrote:
> 
>> Hi Mickael,
>> 
>> Thanks for your interest in this. The main motivation to NOT make topic
>> creation fail when this mismatch happens is because at the moment it is
>> possible to produce/consume on topics if acks is not set to all. I'm not
>> sure we want to disable this behavior (as we would by failing at topic
>> creation). That's why I decided to go for a softer approach, which at least
>> gives some more clarity to the users and avoids other issues mentioned in
>> the KIP.
>> 
>> Let's see what others think!
>> 
>> On Fri, 13 Mar 2020 at 17:16, Mickael Maison 
>> wrote:
>> 
>>> Hi Paolo,
>>> 
>>> Thanks for looking at this issue. This can indeed be a source of
>> confusion.
>>> 
>>> I'm wondering if we should prevent the creation of topics with
>>> min.insync.replicas > replication.factor?
>>> You listed that as a rejected alternative because it requires more
>>> changes. However, I can't think of any scenarios where a user would
>>> want to create such a topic. I'm guessing it's probably always by
>>> mistake.
>>> 
>>> Let's see what other people think but I think it's worth checking what
>>> needs to be done if we wanted to prevent topics with bogus configs
>>> 
>>> On Fri, Mar 13, 2020 at 3:28 PM Paolo Moriello
>>>  wrote:
 
 Hi,
 
 Following this Jira ticket (
>>> https://issues.apache.org/jira/browse/KAFKA-4680),
 I've created a proposal (
 
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-579%3A+new+exception+on+min.insync.replicas+%3E+replication.factor
>>> )
 to add a new exception/error to be used on min.insync.replicas >
 replication.factor.
 
 The proposal aims to introduce a new exception specific for the
 configuration mismatch above to be used when producers requires acks =
>>> all.
 At the moment we are using NotEnoughReplicaException, which is a
>>> retriable
 exception and is used to fail on insync replicas < min isr. Plan is to
>>> have
 a new, non-retriable exception, to separate the two cases.
 
 I've also submitted a PR for the change mentioned above:
 https://github.com/apache/kafka/pull/8225
 
 Please have a look and let me know what you think.
 
 Thanks,
 Paolo
>>> 
>> 



[jira] [Created] (KAFKA-9697) ControlPlaneNetworkProcessorAvgIdlePercent is always NaN

2020-03-10 Thread James Cheng (Jira)
James Cheng created KAFKA-9697:
--

 Summary: ControlPlaneNetworkProcessorAvgIdlePercent is always NaN
 Key: KAFKA-9697
 URL: https://issues.apache.org/jira/browse/KAFKA-9697
 Project: Kafka
  Issue Type: Improvement
  Components: metrics
Affects Versions: 2.3.0
Reporter: James Cheng


I have a broker running Kafka 2.3.0. The value of 
kafka.network:type=SocketServer,name=ControlPlaneNetworkProcessorAvgIdlePercent 
is always "NaN".

Is that normal, or is there a problem with the metric?

I am running Kafka 2.3.0. I have not checked this in newer/older versions.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9696) Document the control plane metrics that were added in KIP-402

2020-03-10 Thread James Cheng (Jira)
James Cheng created KAFKA-9696:
--

 Summary: Document the control plane metrics that were added in 
KIP-402
 Key: KAFKA-9696
 URL: https://issues.apache.org/jira/browse/KAFKA-9696
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.0, 2.3.0, 2.2.0
Reporter: James Cheng


KIP-402 (in https://issues.apache.org/jira/browse/KAFKA-7719) added new metrics 
of

 

kafka.network:type=SocketServer,name=ControlPlaneNetworkProcessorAvgIdlePercent

kafka.network:type=SocketServer,name=ControlPlaneExpiredConnectionsKilledCount

 

There is no documentation on these metrics on 
http://kafka.apache.org/documentation/. We should update the documentation to 
describe these new metrics.

 

I'm not 100% familiar with them, but it appears they are measuring the same 
thing as 

kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent

kafka.network:type=SocketServer,name=ExpiredConnectionsKilledCount

 

except for the control plane, instead of the data plane.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] New Kafka PMC Members: Colin, Vahid and Manikumar

2020-01-14 Thread James Cheng
Congrats Colin, Vahid, and Manikumar!

-James

> On Jan 14, 2020, at 10:59 AM, Tom Bentley  wrote:
> 
> Congratulations!
> 
> On Tue, Jan 14, 2020 at 6:57 PM Rajini Sivaram 
> wrote:
> 
>> Congratulations Colin, Vahid and Manikumar!
>> 
>> Regards,
>> Rajini
>> 
>> On Tue, Jan 14, 2020 at 6:32 PM Mickael Maison 
>> wrote:
>> 
>>> Congrats Colin, Vahid and Manikumar!
>>> 
>>> On Tue, Jan 14, 2020 at 5:43 PM Ismael Juma  wrote:
 
 Congratulations Colin, Vahid and Manikumar!
 
 Ismael
 
 On Tue, Jan 14, 2020 at 9:30 AM Gwen Shapira 
>> wrote:
 
> Hi everyone,
> 
> I'm happy to announce that Colin McCabe, Vahid Hashemian and
>> Manikumar
> Reddy are now members of Apache Kafka PMC.
> 
> Colin and Manikumar became committers on Sept 2018 and Vahid on Jan
> 2019. They all contributed many patches, code reviews and
>> participated
> in many KIP discussions. We appreciate their contributions and are
> looking forward to many more to come.
> 
> Congrats Colin, Vahid and Manikumar!
> 
> Gwen, on behalf of Apache Kafka PMC
> 
>>> 
>> 



[jira] [Created] (KAFKA-9389) Document how to use kafka-reassign-partitions.sh to change log dirs for a partition

2020-01-08 Thread James Cheng (Jira)
James Cheng created KAFKA-9389:
--

 Summary: Document how to use kafka-reassign-partitions.sh to 
change log dirs for a partition
 Key: KAFKA-9389
 URL: https://issues.apache.org/jira/browse/KAFKA-9389
 Project: Kafka
  Issue Type: Improvement
Reporter: James Cheng


KIP-113 introduced support for moving replicas between log directories. As part 
of it, support was added to kafka-reassign-partitions.sh so that users can move 
replicas between log directories. Specifically, when you run 
"kafka-reassign-partitions.sh --reassignment-json-file reassign.json --execute", 
you can specify a "log_dirs" key for each partition in the reassignment JSON file, and 
kafka-reassign-partitions.sh will then move those replicas to those directories (see 
the sketch at the end of this description).

 

However, when working on that KIP, we didn't update the docs on 
kafka.apache.org to describe how to use the new functionality. We should add 
documentation on that.

 

I haven't used it before, but whoever works on this Jira can probably figure it 
out by experimenting with kafka-reassign-partitions.sh, or by reading the KIP-113 
page or the associated JIRAs.
 * 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-113%3A+Support+replicas+movement+between+log+directories]
 * KAFKA-5163
 * KAFKA-5694
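
For reference, a sketch of what the reassignment JSON with "log_dirs" might look like (the 
topic name, broker ids, and paths below are made up for illustration):

{code}
{
  "version": 1,
  "partitions": [
    { "topic": "my-topic",
      "partition": 0,
      "replicas": [1, 2],
      "log_dirs": ["/data/kafka-logs-1", "any"] }
  ]
}
{code}

Each entry in "log_dirs" lines up positionally with the broker id at the same index in 
"replicas"; "any" means the broker keeps (or chooses) the log directory itself. The file is 
then passed to the tool, roughly like:

{code}
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassign.json --execute
{code}

The exact flags vary by version (older releases use --zookeeper, and moving replicas between 
log dirs needs a --bootstrap-server connection), so whoever writes the docs should verify 
against the tool's --help output.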

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8745) DumpLogSegments doesn't show keys, when the message is null

2019-09-18 Thread James Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Cheng resolved KAFKA-8745.

Fix Version/s: 2.4.0
 Reviewer: Guozhang Wang
   Resolution: Fixed

> DumpLogSegments doesn't show keys, when the message is null
> ---
>
> Key: KAFKA-8745
> URL: https://issues.apache.org/jira/browse/KAFKA-8745
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.3.0
>    Reporter: James Cheng
>Assignee: James Cheng
>Priority: Major
> Fix For: 2.4.0
>
>
> When DumpLogSegments encounters a message with a message key, but no message 
> value, it doesn't print out the message key.
>  
> {noformat}
> $ ~/kafka_2.11-2.2.0/bin/kafka-run-class.sh kafka.tools.DumpLogSegments 
> --files compacted-0/.log --print-data-log
> Dumping compacted-0/.log
> Starting offset: 0
> baseOffset: 0 lastOffset: 3 count: 4 baseSequence: -1 lastSequence: -1 
> producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: 
> false isControl: false position: 0 CreateTime: 1564696640073 size: 113 magic: 
> 2 compresscodec: NONE crc: 206507478 isvalid: true
> | offset: 2 CreateTime: 1564696640073 keysize: 4 valuesize: -1 sequence: -1 
> headerKeys: []
> | offset: 3 CreateTime: 1564696640073 keysize: 4 valuesize: -1 sequence: -1 
> headerKeys: []
> {noformat}
> It should print out the message key.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: UnderReplicatedPartitions = 0 and UnderMinPartitionIsrCount > 0

2019-08-13 Thread James Cheng
Alexandre,

You are right that this is a problem. There is a JIRA on this from a while 
back. 

https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-4680

I don’t think anyone is currently working on it right now. 
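
As a rough sketch of how the misconfiguration arises (topic name and addresses below are 
placeholders):

  # The broker happily creates a topic whose replication factor is below min.insync.replicas
  bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic isr-demo \
    --partitions 1 --replication-factor 1 --config min.insync.replicas=2

  # A producer using acks=all should then fail with NOT_ENOUGH_REPLICAS, while
  # UnderReplicatedPartitions stays at 0 and the under-min-ISR count goes above 0
  bin/kafka-console-producer.sh --broker-list localhost:9092 --topic isr-demo \
    --producer-property acks=all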

-James

Sent from my iPhone

> On Aug 13, 2019, at 1:17 AM, Alexandre Dupriez  
> wrote:
> 
> Hello all,
> 
> We run into a scenario where we had misconfigured the replication factor
> and the minimum in-sync replicas count in such a way that the replication
> factor (either default or defined at the topic level) is strictly lower
> than the property min.insync.replicas.
> 
> We observed broker metrics reporting UnderReplicatedPartitions = 0 and
> UnderMinPartitionIsrCount > 0, and the topic’s partitions were unavailable
> for producers (with ack=all) and consumers.
> 
> Since it seems to be impossible in this scenario to ever reach the number
> of in-sync replicas, making partitions permanently unavailable, it could be
> worth to prevent this misconfiguration to make its way to the broker, e.g.
> a check could be added when a topic is created to ensure the replication
> factor is greater than or equals to the minimum number of in-sync replicas.
> 
> I may have missed something though. What do you think?
> 
> Thank you,
> Alexandre


[jira] [Created] (KAFKA-8745) DumpLogSegments doesn't show keys, when the message is null

2019-08-01 Thread James Cheng (JIRA)
James Cheng created KAFKA-8745:
--

 Summary: DumpLogSegments doesn't show keys, when the message is 
null
 Key: KAFKA-8745
 URL: https://issues.apache.org/jira/browse/KAFKA-8745
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.3.0
Reporter: James Cheng


When DumpLogSegments encounters a message with a message key, but no message 
value, it doesn't print out the message key.

 
{noformat}
$ ~/kafka_2.11-2.2.0/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 
compacted-0/.log --print-data-log
Dumping compacted-0/.log
Starting offset: 0
baseOffset: 0 lastOffset: 3 count: 4 baseSequence: -1 lastSequence: -1 
producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false 
isControl: false position: 0 CreateTime: 1564696640073 size: 113 magic: 2 
compresscodec: NONE crc: 206507478 isvalid: true
| offset: 2 CreateTime: 1564696640073 keysize: 4 valuesize: -1 sequence: -1 
headerKeys: []
| offset: 3 CreateTime: 1564696640073 keysize: 4 valuesize: -1 sequence: -1 
headerKeys: []
{noformat}
It should print out the message key.

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [ANNOUNCE] New Kafka PMC member: Matthias J. Sax

2019-04-19 Thread James Cheng
Congrats!!

-James

Sent from my iPhone

> On Apr 18, 2019, at 2:35 PM, Guozhang Wang  wrote:
> 
> Hello Everyone,
> 
> I'm glad to announce that Matthias J. Sax is now a member of Kafka PMC.
> 
> Matthias has been a committer since Jan. 2018, and since then he continued
> to be active in the community and made significant contributions the
> project.
> 
> 
> Congratulations to Matthias!
> 
> -- Guozhang


Re: [ANNOUNCE] New Committer: Randall Hauch

2019-02-14 Thread James Cheng
Congrats, Randall! Well deserved!

-James

Sent from my iPhone

> On Feb 14, 2019, at 6:16 PM, Guozhang Wang  wrote:
> 
> Hello all,
> 
> The PMC of Apache Kafka is happy to announce another new committer joining
> the project today: we have invited Randall Hauch as a project committer and
> he has accepted.
> 
> Randall has been participating in the Kafka community for the past 3 years,
> and is well known as the founder of the Debezium project, a popular project
> for database change-capture streams using Kafka (https://debezium.io). More
> recently he has become the main person keeping Kafka Connect moving
> forward, participated in nearly all KIP discussions and QAs on the mailing
> list. He's authored 6 KIPs and authored 50 pull requests and conducted over
> a hundred reviews around Kafka Connect, and has also been evangelizing
> Kafka Connect at several Kafka Summit venues.
> 
> 
> Thank you very much for your contributions to the Connect community Randall
> ! And looking forward to many more :)
> 
> 
> Guozhang, on behalf of the Apache Kafka PMC


Re: [DISCUSS] Kafka 2.2.0 in February 2018

2019-02-14 Thread James Cheng
Matthias, 

You said “Friday 2/14”, but 2/14 is this Thursday. 

-James

Sent from my iPhone

> On Feb 11, 2019, at 2:31 PM, Matthias J. Sax  wrote:
> 
> Hello,
> 
> this is a short reminder, that feature freeze for AK 2.2 release is end
> of this week, Friday 2/14.
> 
> Currently, there are two blocker issues
> 
> - https://issues.apache.org/jira/browse/KAFKA-7909
> - https://issues.apache.org/jira/browse/KAFKA-7481
> 
> and five critical issues
> 
> - https://issues.apache.org/jira/browse/KAFKA-7915
> - https://issues.apache.org/jira/browse/KAFKA-7565
> - https://issues.apache.org/jira/browse/KAFKA-7556
> - https://issues.apache.org/jira/browse/KAFKA-7304
> - https://issues.apache.org/jira/browse/KAFKA-3955
> 
> marked with "fixed version" 2.2. Please let me know, if I missed any
> other blocker/critical issue that is relevant for 2.2 release.
> 
> I will start to move out all other non-closed Jiras out of the release
> after code freeze and check again on the critical issues.
> 
> After code freeze, only blocker issues can be merged to 2.2 branch.
> 
> 
> Thanks a lot!
> 
> -Matthias
> 
>> On 1/19/19 11:09 AM, Matthias J. Sax wrote:
>> Thanks you all!
>> 
>> Added 291, 379, 389, and 420 for tracking.
>> 
>> 
>> -Matthias
>> 
>> 
>>> On 1/19/19 6:32 AM, Dongjin Lee wrote:
>>> Hi Matthias,
>>> 
>>> Thank you for taking the lead. KIP-389[^1] was accepted last week[^2], so
>>> it seems like to be included.
>>> 
>>> Thanks,
>>> Dongjin
>>> 
>>> [^1]:
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
>>> [^2]:
>>> https://lists.apache.org/thread.html/53b84cc35c93eddbc67c8d0dd86aedb93050e45016dfe0fc7b82caaa@%3Cdev.kafka.apache.org%3E
>>> 
 On Sat, Jan 19, 2019 at 9:04 PM Alex D  wrote:
 
 KIP-379?
 
> On Fri, 18 Jan 2019, 22:33 Matthias J. Sax  
> Just a quick update on the release.
> 
> 
> We have 22 KIP atm:
> 
> 81, 207, 258, 289, 313, 328, 331, 339, 341, 351, 359, 361, 367, 368,
> 371, 376, 377, 380, 386, 393, 394, 414
> 
> Let me know if I missed any KIP that is targeted for 2.2 release.
> 
> 21 of those KIPs are accepted, and the vote for the last one is open and
> can be closed on time.
> 
> The KIP deadline is Jan 24, so if any late KIPs are coming in, the vote
> must be started latest next Monday Jan 21, to be open for at least 72h
> and to meet the deadline.
> 
> Also keep the feature freeze deadline in mind (31 Jan).
> 
> 
> Besides this, there are 91 open tickets and 41 ticket in progress. I
> will start to go through those tickets soon to see what will make it
> into 2.2 and what we need to defer. If you have any tickets assigned to
> yourself that are target for 2.2 and you know you cannot make it, I
> would appreciate if you could update those ticket yourself to help
> streamlining the release process. Thanks a lot for you support!
> 
> 
> -Matthias
> 
> 
>> On 1/8/19 7:27 PM, Ismael Juma wrote:
>> Thanks for volunteering Matthias! The plan sounds good to me.
>> 
>> Ismael
>> 
>>> On Tue, Jan 8, 2019, 1:07 PM Matthias J. Sax > wrote:
>> 
>>> Hi all,
>>> 
>>> I would like to propose a release plan (with me being release manager)
>>> for the next time-based feature release 2.2.0 in February.
>>> 
>>> The recent Kafka release history can be found at
>>> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
 .
>>> The release plan (with open issues and planned KIPs) for 2.2.0 can be
>>> found at
 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827512
>>> .
>>> 
>>> 
>>> Here are the suggested dates for Apache Kafka 2.2.0 release:
>>> 
>>> 1) KIP Freeze: Jan 24, 2019.
>>> 
>>> A KIP must be accepted by this date in order to be considered for this
>>> release)
>>> 
>>> 2) Feature Freeze: Jan 31, 2019
>>> 
>>> Major features merged & working on stabilization, minor features have
>>> PR, release branch cut; anything not in this state will be
 automatically
>>> moved to the next release in JIRA.
>>> 
>>> 3) Code Freeze: Feb 14, 2019
>>> 
>>> The KIP and feature freeze date is about 2-3 weeks from now. Please
 plan
>>> accordingly for the features you want push into Apache Kafka 2.2.0
> release.
>>> 
>>> 4) Release Date: Feb 28, 2019 (tentative)
>>> 
>>> 
>>> -Matthias
> 


[jira] [Created] (KAFKA-7884) Docs for message.format.version and log.message.format.version show invalid (corrupt?) "valid values"

2019-01-29 Thread James Cheng (JIRA)
James Cheng created KAFKA-7884:
--

 Summary: Docs for message.format.version and 
log.message.format.version show invalid (corrupt?) "valid values"
 Key: KAFKA-7884
 URL: https://issues.apache.org/jira/browse/KAFKA-7884
 Project: Kafka
  Issue Type: Bug
  Components: documentation
Reporter: James Cheng


In the docs for message.format.version and log.message.format.version, the list 
of valid values is

 
{code:java}
kafka.api.ApiVersionValidator$@56aac163 
{code}
 

It appears it's simply doing a .toString on the class/instance.

At a minimum, we should remove this java-y-ness.

Even better, it should show all the valid values.
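
For illustration, the kind of change that would fix this is overriding toString() on the 
validator so that the generated "Valid Values" column shows something readable. Below is a 
rough sketch against the Java ConfigDef.Validator interface; the real validator here is the 
Scala ApiVersionValidator, and the version list is abbreviated, so treat this as illustrative 
only.

{code:java}
import java.util.Arrays;
import java.util.List;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigException;

// Hypothetical validator: ensureValid() checks membership, and toString() is what ends up
// in the "Valid Values" column, so it lists the versions instead of falling back to
// Object.toString() (ClassName@hashcode).
public class MessageFormatVersionValidator implements ConfigDef.Validator {
    private static final List<String> VALID_VERSIONS = Arrays.asList(
        "0.8.2", "0.9.0", "0.10.0", "0.10.1", "0.10.2", "0.11.0", "1.0", "1.1", "2.0", "2.1");

    @Override
    public void ensureValid(String name, Object value) {
        if (value == null || !VALID_VERSIONS.contains(value.toString()))
            throw new ConfigException(name, value, "Version not recognized");
    }

    @Override
    public String toString() {
        return "[" + String.join(", ", VALID_VERSIONS) + "]";
    }
}
{code}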

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Kafka 2.2.0 in February 2018

2019-01-18 Thread James Cheng
I think KIP-291 is supposed to go into 2.2.0. 

https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Separating+controller+connections+and+requests+from+the+data+plane

At least, according to the associated JIRA, the pull request seems to have been 
merged just a couple days ago.

https://issues.apache.org/jira/browse/KAFKA-4453

-James


> On Jan 18, 2019, at 11:33 AM, Matthias J. Sax  wrote:
> 
> Just a quick update on the release.
> 
> 
> We have 22 KIP atm:
> 
> 81, 207, 258, 289, 313, 328, 331, 339, 341, 351, 359, 361, 367, 368,
> 371, 376, 377, 380, 386, 393, 394, 414
> 
> Let me know if I missed any KIP that is targeted for 2.2 release.
> 
> 21 of those KIPs are accepted, and the vote for the last one is open and
> can be closed on time.
> 
> The KIP deadline is Jan 24, so if any late KIPs are coming in, the vote
> must be started latest next Monday Jan 21, to be open for at least 72h
> and to meet the deadline.
> 
> Also keep the feature freeze deadline in mind (31 Jan).
> 
> 
> Besides this, there are 91 open tickets and 41 ticket in progress. I
> will start to go through those tickets soon to see what will make it
> into 2.2 and what we need to defer. If you have any tickets assigned to
> yourself that are target for 2.2 and you know you cannot make it, I
> would appreciate if you could update those ticket yourself to help
> streamlining the release process. Thanks a lot for you support!
> 
> 
> -Matthias
> 
> 
> On 1/8/19 7:27 PM, Ismael Juma wrote:
>> Thanks for volunteering Matthias! The plan sounds good to me.
>> 
>> Ismael
>> 
>> On Tue, Jan 8, 2019, 1:07 PM Matthias J. Sax > 
>>> Hi all,
>>> 
>>> I would like to propose a release plan (with me being release manager)
>>> for the next time-based feature release 2.2.0 in February.
>>> 
>>> The recent Kafka release history can be found at
>>> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan.
>>> The release plan (with open issues and planned KIPs) for 2.2.0 can be
>>> found at
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827512
>>> .
>>> 
>>> 
>>> Here are the suggested dates for Apache Kafka 2.2.0 release:
>>> 
>>> 1) KIP Freeze: Jan 24, 2019.
>>> 
>>> A KIP must be accepted by this date in order to be considered for this
>>> release)
>>> 
>>> 2) Feature Freeze: Jan 31, 2019
>>> 
>>> Major features merged & working on stabilization, minor features have
>>> PR, release branch cut; anything not in this state will be automatically
>>> moved to the next release in JIRA.
>>> 
>>> 3) Code Freeze: Feb 14, 2019
>>> 
>>> The KIP and feature freeze date is about 2-3 weeks from now. Please plan
>>> accordingly for the features you want push into Apache Kafka 2.2.0 release.
>>> 
>>> 4) Release Date: Feb 28, 2019 (tentative)
>>> 
>>> 
>>> -Matthias
>>> 
>>> 
>> 
> 



Re: [ANNOUNCE] New Committer: Vahid Hashemian

2019-01-15 Thread James Cheng
Congrats, Vahid!!

-James

> On Jan 15, 2019, at 2:44 PM, Jason Gustafson  wrote:
> 
> Hi All,
> 
> The PMC for Apache Kafka has invited Vahid Hashemian as a project committer 
> and
> we are
> pleased to announce that he has accepted!
> 
> Vahid has made numerous contributions to the Kafka community over the past
> few years. He has authored 13 KIPs with core improvements to the consumer
> and the tooling around it. He has also contributed nearly 100 patches
> affecting all parts of the codebase. Additionally, Vahid puts a lot of
> effort into community engagement, helping others on the mail lists and
> sharing his experience at conferences and meetups.
> 
> We appreciate the contributions and we are looking forward to more.
> Congrats Vahid!
> 
> Jason, on behalf of the Apache Kafka PMC



[jira] [Created] (KAFKA-7797) Replication throttling configs aren't in the docs

2019-01-08 Thread James Cheng (JIRA)
James Cheng created KAFKA-7797:
--

 Summary: Replication throttling configs aren't in the docs
 Key: KAFKA-7797
 URL: https://issues.apache.org/jira/browse/KAFKA-7797
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.1.0
Reporter: James Cheng


Docs for the following configs are not on the website:
 * leader.replication.throttled.rate
 * follower.replication.throttled.rate
 * replica.alter.log.dirs.io.max.bytes.per.second

 

They are available in the Operations section titled "Limiting Bandwidth Usage 
during Data Migration", but they are not in the general config section.

I think these are generally applicable, right? Not just during 
kafka-reassign-partitions.sh? If so, then they should be in the auto-generated 
docs.

Related: I think none of the configs in 
core/src/main/scala/kafka/server/DynamicConfig.scala are in the generated docs.
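
For context, these appear to be dynamic-only configs that get applied via kafka-configs.sh, 
roughly like the following (broker id, topic name, and rates are placeholders; the 
authoritative steps are in the "Limiting Bandwidth Usage during Data Migration" section):

{code}
# Throttle replication traffic on broker 0 to roughly 10 MB/s
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type brokers --entity-name 0 \
  --add-config 'leader.replication.throttled.rate=10485760,follower.replication.throttled.rate=10485760'

# Mark which replicas of a topic the throttle applies to ('*' means all of them)
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config 'leader.replication.throttled.replicas=*,follower.replication.throttled.replicas=*'
{code}

Since they are not regular KafkaConfig entries, the auto-generated broker config table never 
picks them up, which is probably why they are missing from the docs.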

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] Apache Kafka 2.1.0

2018-11-21 Thread James Cheng
Thanks Dong for running the release, and congrats to everyone in the community!

-James

Sent from my iPhone

> On Nov 21, 2018, at 10:09 AM, Dong Lin  wrote:
> 
> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 2.1.0
> 
> 
> This is a major release and includes significant features from 28 KIPs. It
> contains fixes and improvements from 179 JIRAs, including a few critical
> bug fixes. Here is a summary of some notable changes
> 
> ** Java 11 support
> ** Support for Zstandard, which achieves compression comparable to gzip
> with higher compression and especially decompression speeds (KIP-110)
> ** Avoid expiring committed offsets for active consumer group (KIP-211)
> ** Provide Intuitive User Timeouts in The Producer (KIP-91)
> ** Kafka's replication protocol now supports improved fencing of zombies.
> Previously, under certain rare conditions, if a broker became partitioned
> from Zookeeper but not the rest of the cluster, then the logs of replicated
> partitions could diverge and cause data loss in the worst case (KIP-320)
> ** Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353, KIP-356)
> ** Admin script and admin client API improvements to simplify admin
> operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
> ** DNS handling improvements (KIP-235, KIP-302)
> 
> 
> All of the changes in this release can be found in the release notes:
> https://www.apache.org/dist/kafka/2.1.0/RELEASE_NOTES.html
> 
> 
> You can download the source and binary release (Scala ) from:
> https://kafka.apache.org/downloads#2.1.0
> 
> ---
> 
> 
> Apache Kafka is a distributed streaming platform with four core APIs:
> 
> 
> ** The Producer API allows an application to publish a stream of records to
> one or more Kafka topics.
> 
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
> 
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming the
> input streams to output streams.
> 
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might
> capture every change to a table.
> 
> 
> With these APIs, Kafka can be used for two broad classes of application:
> 
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
> 
> ** Building real-time streaming applications that transform or react
> to the streams of data.
> 
> 
> Apache Kafka is in use at large and small companies worldwide, including
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
> 
> A big thank you for the following 100 contributors to this release!
> 
> Ahmed Al Mehdi, Aleksei Izmalkin, Alex Dunayevsky, Amit Sela, Andras
> Katona, Andy Coates, Anna Povzner, Arjun Satish, Attila Sasvari, Aviem Zur,
> Bibin Sebastian, Bill Bejeck, Bob Barrett, Brandon Kirchner, Bridger
> Howell, Chia-Ping Tsai, Colin Hicks, Colin Patrick McCabe, Dhruvil Shah,
> Dong Lin, Edoardo Comar, Eugen Feller, Ewen Cheslack-Postava, Filipe
> Agapito, Flavien Raynaud, Gantigmaa Selenge, Gardner Vickers, Gitomain,
> Gunnar Morling, Guozhang Wang, hashangayasri, huxi, huxihx, Ismael Juma,
> Jagadesh Adireddi, Jason Gustafson, Jim Galasyn, Jimin Hsieh, Jimmy Casey,
> Joan Goyeau, John Roesler, Jon Lee, jonathanskrzypek, Jun Rao, Kamal
> Chandraprakash, Kevin Lafferty, Kevin Lu, Koen De Groote, Konstantine
> Karantasis, lambdaliu, Lee Dongjin, Lincong Li, Liquan Pei, lucapette,
> Lucas Wang, Maciej Bryński, Magesh Nandakumar, Manikumar Reddy, Manikumar
> Reddy O, Mario Molina, Marko Stanković, Matthias J. Sax, Matthias
> Wessendorf, Max Zheng, Mayank Tankhiwale, mgharat, Michal Dziemianko,
> Michał Borowiecki, Mickael Maison, Mutasem Aldmour, Nikolay, nixsticks,
> nprad, okumin, Radai Rosenblatt, radai-rosenblatt, Rajini Sivaram, Randall
> Hauch, Robert Yokota, Rohan, Ron Dagostino, Sam Lendle, Sandor Murakozi,
> Simon Clark, Stanislav Kozlovski, Stephane Maarek, Sébastien Launay, Sönke
> Liebau, Ted Yu, uncleGen, Vahid Hashemian, Viktor Somogyi, wangshao,
> xinzhg, Xiongqi Wesley Wu, Xiongqi Wu, ying-zheng, Yishun Guan, Yu Yang,
> Zhanxiang (Patrick) Huang
> 
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kafka.apache.org/
> 
> Thank you!
> 
> Regards,
> Dong


Re: [ANNOUNCE] New committer: Colin McCabe

2018-10-02 Thread James Cheng
Congrats, Colin!

-James

> On Sep 25, 2018, at 1:39 AM, Ismael Juma  wrote:
> 
> Hi all,
> 
> The PMC for Apache Kafka has invited Colin McCabe as a committer and we are
> pleased to announce that he has accepted!
> 
> Colin has contributed 101 commits and 8 KIPs including significant
> improvements to replication, clients, code quality and testing. A few
> highlights were KIP-97 (Improved Clients Compatibility Policy), KIP-117
> (AdminClient), KIP-227 (Incremental FetchRequests to Increase Partition
> Scalability), the introduction of findBugs and adding Trogdor (fault
> injection and benchmarking tool).
> 
> In addition, Colin has reviewed 38 pull requests and participated in more
> than 50 KIP discussions.
> 
> Thank you for your contributions Colin! Looking forward to many more. :)
> 
> Ismael, for the Apache Kafka PMC



Re: [ANNOUNCE] New Kafka PMC member: Dong Lin

2018-08-21 Thread James Cheng
Congrats Dong!

-James

> On Aug 20, 2018, at 3:54 AM, Ismael Juma  wrote:
> 
> Hi everyone,
> 
> Dong Lin became a committer in March 2018. Since then, he has remained
> active in the community and contributed a number of patches, reviewed
> several pull requests and participated in numerous KIP discussions. I am
> happy to announce that Dong is now a member of the
> Apache Kafka PMC.
> 
> Congratulation Dong! Looking forward to your future contributions.
> 
> Ismael, on behalf of the Apache Kafka PMC



Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-08-07 Thread James Cheng
Guozhang, in a previous message, you proposed this:

> On Jul 30, 2018, at 3:56 PM, Guozhang Wang  wrote:
> 
> 1. We bump up the JoinGroupRequest with additional fields:
> 
>  1.a) a flag indicating "static" or "dynamic" membership protocols.
>  1.b) with "static" membership, we also add the pre-defined member id.
>  1.c) with "static" membership, we also add an optional
> "group-change-timeout" value.
> 
> 2. On the broker side, we enforce only one of the two protocols for all
> group members: we accept the protocol on the first joined member of the
> group, and if later joining members indicate a different membership
> protocol, we reject it. If the group-change-timeout value was different to
> the first joined member, we reject it as well.


What will happen if we have an already-deployed application that wants to 
switch to using static membership? Let’s say there are 10 instances of it. As 
the instances go through a rolling restart, they will switch from dynamic 
membership (the default?) to static membership. As each one leaves the group 
and restarts, they will be rejected from the group (because the group is 
currently using dynamic membership). The group will shrink down until there is 
1 node handling all the traffic. After that one restarts, the group will switch 
over to static membership.

Is that right? That means that the transition plan from dynamic to static 
membership isn’t very smooth. 

I’m not really sure what can be done in this case. This reminds me of the 
transition plans that were discussed for moving from zookeeper-based consumers 
to kafka-coordinator-based consumers. That was also hard, and ultimately we 
decided not to build that.

-James



Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure

2018-08-01 Thread James Cheng
I’m a little confused about something. Is this KIP focused on log cleaner 
exceptions in general, or focused on log cleaner exceptions due to disk 
failures?

Will max.uncleanable.partitions apply to all exceptions (including log cleaner 
logic errors) or will it apply only to disk I/O exceptions?

I can understand taking the disk offline if there have been “N” I/O exceptions. 
Disk errors are user fixable (by replacing the affected disk). It turns an 
invisible (soft?) failure into a visible hard failure. And the I/O exceptions 
are possibly already causing problems, so it makes sense to limit their impact.

But I’m not sure if it makes sense to take a disk offline after “N” logic errors 
in the log cleaner. If a log cleaner logic error happens, it’s rarely user 
fixable. And it will likely affect several partitions at once, so you’re likely to 
bump up against the max.uncleanable.partitions limit more quickly. If a disk 
was taken offline due to logic errors, I’m not sure what the user would do.

-James

Sent from my iPhone

> On Aug 1, 2018, at 9:11 AM, Stanislav Kozlovski  
> wrote:
> 
> Yes, good catch. Thank you, James!
> 
> Best,
> Stanislav
> 
>> On Wed, Aug 1, 2018 at 5:05 PM James Cheng  wrote:
>> 
>> Can you update the KIP to say what the default is for
>> max.uncleanable.partitions?
>> 
>> -James
>> 
>> Sent from my iPhone
>> 
>>> On Jul 31, 2018, at 9:56 AM, Stanislav Kozlovski 
>> wrote:
>>> 
>>> Hey group,
>>> 
>>> I am planning on starting a voting thread tomorrow. Please do reply if
>> you
>>> feel there is anything left to discuss.
>>> 
>>> Best,
>>> Stanislav
>>> 
>>> On Fri, Jul 27, 2018 at 11:05 PM Stanislav Kozlovski <
>> stanis...@confluent.io>
>>> wrote:
>>> 
>>>> Hey, Ray
>>>> 
>>>> Thanks for pointing that out, it's fixed now
>>>> 
>>>> Best,
>>>> Stanislav
>>>> 
>>>>> On Fri, Jul 27, 2018 at 9:43 PM Ray Chiang  wrote:
>>>>> 
>>>>> Thanks.  Can you fix the link in the "KIPs under discussion" table on
>>>>> the main KIP landing page
>>>>> <
>>>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#
>>> ?
>>>>> 
>>>>> I tried, but the Wiki won't let me.
>>>>> 
>>>>> -Ray
>>>>> 
>>>>>> On 7/26/18 2:01 PM, Stanislav Kozlovski wrote:
>>>>>> Hey guys,
>>>>>> 
>>>>>> @Colin - good point. I added some sentences mentioning recent
>>>>> improvements
>>>>>> in the introductory section.
>>>>>> 
>>>>>> *Disk Failure* - I tend to agree with what Colin said - once a disk
>>>>> fails,
>>>>>> you don't want to work with it again. As such, I've changed my mind
>> and
>>>>>> believe that we should mark the LogDir (assume its a disk) as offline
>> on
>>>>>> the first `IOException` encountered. This is the LogCleaner's current
>>>>>> behavior. We shouldn't change that.
>>>>>> 
>>>>>> *Respawning Threads* - I believe we should never re-spawn a thread.
>> The
>>>>>> correct approach in my mind is to either have it stay dead or never
>> let
>>>>> it
>>>>>> die in the first place.
>>>>>> 
>>>>>> *Uncleanable-partition-names metric* - Colin is right, this metric is
>>>>>> unneeded. Users can monitor the `uncleanable-partitions-count` metric
>>>>> and
>>>>>> inspect logs.
>>>>>> 
>>>>>> 
>>>>>> Hey Ray,
>>>>>> 
>>>>>>> 2) I'm 100% with James in agreement with setting up the LogCleaner to
>>>>>>> skip over problematic partitions instead of dying.
>>>>>> I think we can do this for every exception that isn't `IOException`.
>>>>> This
>>>>>> will future-proof us against bugs in the system and potential other
>>>>> errors.
>>>>>> Protecting yourself against unexpected failures is always a good thing
>>>>> in
>>>>>> my mind, but I also think that protecting yourself against bugs in the
>>>>>> software is sort of clunky. What does everybody think about this?
>>>>>> 
>>>>>>> 4) The only improveme

Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure

2018-08-01 Thread James Cheng
Can you update the KIP to say what the default is for 
max.uncleanable.partitions?

-James

Sent from my iPhone

> On Jul 31, 2018, at 9:56 AM, Stanislav Kozlovski  
> wrote:
> 
> Hey group,
> 
> I am planning on starting a voting thread tomorrow. Please do reply if you
> feel there is anything left to discuss.
> 
> Best,
> Stanislav
> 
> On Fri, Jul 27, 2018 at 11:05 PM Stanislav Kozlovski 
> wrote:
> 
>> Hey, Ray
>> 
>> Thanks for pointing that out, it's fixed now
>> 
>> Best,
>> Stanislav
>> 
>>> On Fri, Jul 27, 2018 at 9:43 PM Ray Chiang  wrote:
>>> 
>>> Thanks.  Can you fix the link in the "KIPs under discussion" table on
>>> the main KIP landing page
>>> <
>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#>?
>>> 
>>> I tried, but the Wiki won't let me.
>>> 
>>> -Ray
>>> 
>>>> On 7/26/18 2:01 PM, Stanislav Kozlovski wrote:
>>>> Hey guys,
>>>> 
>>>> @Colin - good point. I added some sentences mentioning recent
>>> improvements
>>>> in the introductory section.
>>>> 
>>>> *Disk Failure* - I tend to agree with what Colin said - once a disk
>>> fails,
>>>> you don't want to work with it again. As such, I've changed my mind and
>>>> believe that we should mark the LogDir (assume its a disk) as offline on
>>>> the first `IOException` encountered. This is the LogCleaner's current
>>>> behavior. We shouldn't change that.
>>>> 
>>>> *Respawning Threads* - I believe we should never re-spawn a thread. The
>>>> correct approach in my mind is to either have it stay dead or never let
>>> it
>>>> die in the first place.
>>>> 
>>>> *Uncleanable-partition-names metric* - Colin is right, this metric is
>>>> unneeded. Users can monitor the `uncleanable-partitions-count` metric
>>> and
>>>> inspect logs.
>>>> 
>>>> 
>>>> Hey Ray,
>>>> 
>>>>> 2) I'm 100% with James in agreement with setting up the LogCleaner to
>>>>> skip over problematic partitions instead of dying.
>>>> I think we can do this for every exception that isn't `IOException`.
>>> This
>>>> will future-proof us against bugs in the system and potential other
>>> errors.
>>>> Protecting yourself against unexpected failures is always a good thing
>>> in
>>>> my mind, but I also think that protecting yourself against bugs in the
>>>> software is sort of clunky. What does everybody think about this?
>>>> 
>>>>> 4) The only improvement I can think of is that if such an
>>>>> error occurs, then have the option (configuration setting?) to create a
>>>>> .skip file (or something similar).
>>>> This is a good suggestion. Have others also seen corruption be generally
>>>> tied to the same segment?
>>>> 
>>>> On Wed, Jul 25, 2018 at 11:55 AM Dhruvil Shah 
>>> wrote:
>>>> 
>>>>> For the cleaner thread specifically, I do not think respawning will
>>> help at
>>>>> all because we are more than likely to run into the same issue again
>>> which
>>>>> would end up crashing the cleaner. Retrying makes sense for transient
>>>>> errors or when you believe some part of the system could have healed
>>>>> itself, both of which I think are not true for the log cleaner.
>>>>> 
>>>>> On Wed, Jul 25, 2018 at 11:08 AM Ron Dagostino 
>>> wrote:
>>>>> 
>>>>>> <<>> in
>>>>> an
>>>>>> infinite loop which consumes resources and fires off continuous log
>>>>>> messages.
>>>>>> Hi Colin.  In case it could be relevant, one way to mitigate this
>>> effect
>>>>> is
>>>>>> to implement a backoff mechanism (if a second respawn is to occur then
>>>>> wait
>>>>>> for 1 minute before doing it; then if a third respawn is to occur wait
>>>>> for
>>>>>> 2 minutes before doing it; then 4 minutes, 8 minutes, etc. up to some
>>> max
>>>>>> wait time).
>>>>>> 
>>>>>> I have no opinion on whether respawn is appropriate or not in this
>>>>> context,
>>>>>> but a mitigation like the incr

Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread James Cheng
Congrats and great job, everyone! Thanks Rajini for driving the release!

-James

Sent from my iPhone

> On Jul 30, 2018, at 3:25 AM, Rajini Sivaram  wrote:
> 
> The Apache Kafka community is pleased to announce the release for
> 
> Apache Kafka 2.0.0.
> 
> 
> 
> 
> 
> This is a major release and includes significant new features from
> 
> 40 KIPs. It contains fixes and improvements from 246 JIRAs, including
> 
> a few critical bugs. Here is a summary of some notable changes:
> 
> ** KIP-290 adds support for prefixed ACLs, simplifying access control
> management in large secure deployments. Bulk access to topics,
> consumer groups or transactional ids with a prefix can now be granted
> using a single rule. Access control for topic creation has also been
> improved to enable access to be granted to create specific topics or
> topics with a prefix.
> 
> ** KIP-255 adds a framework for authenticating to Kafka brokers using
> OAuth2 bearer tokens. The SASL/OAUTHBEARER implementation is
> customizable using callbacks for token retrieval and validation.
> 
> ** Host name verification is now enabled by default for SSL connections
> to ensure that the default SSL configuration is not susceptible to
> man-in-the middle attacks. You can disable this verification for
> deployments where validation is performed using other mechanisms.
> 
> ** You can now dynamically update SSL trust stores without broker restart.
> You can also configure security for broker listeners in ZooKeeper before
> starting brokers, including SSL key store and trust store passwords and
> JAAS configuration for SASL. With this new feature, you can store sensitive
> password configs in encrypted form in ZooKeeper rather than in cleartext
> in the broker properties file.
> 
> ** The replication protocol has been improved to avoid log divergence
> between leader and follower during fast leader failover. We have also
> improved resilience of brokers by reducing the memory footprint of
> message down-conversions. By using message chunking, both memory
> usage and memory reference time have been reduced to avoid
> OutOfMemory errors in brokers.
> 
> ** Kafka clients are now notified of throttling before any throttling is
> applied
> when quotas are enabled. This enables clients to distinguish between
> network errors and large throttle times when quotas are exceeded.
> 
> ** We have added a configuration option for Kafka consumer to avoid
> indefinite blocking in the consumer.
> 
> ** We have dropped support for Java 7 and removed the previously
> deprecated Scala producer and consumer.
> 
> ** Kafka Connect includes a number of improvements and features.
> KIP-298 enables you to control how errors in connectors, transformations
> and converters are handled by enabling automatic retries and controlling the
> number of errors that are tolerated before the connector is stopped. More
> contextual information can be included in the logs to help diagnose problems
> and problematic messages consumed by sink connectors can be sent to a
> dead letter queue rather than forcing the connector to stop.
> 
> ** KIP-297 adds a new extension point to move secrets out of connector
> configurations and integrate with any external key management system.
> The placeholders in connector configurations are only resolved before
> sending the configuration to the connector, ensuring that secrets are stored
> and managed securely in your preferred key management system and
> not exposed over the REST APIs or in log files.
> 
> ** We have added a thin Scala wrapper API for our Kafka Streams DSL,
> which provides better type inference and better type safety during compile
> time. Scala users can have less boilerplate in their code, notably regarding
> Serdes with new implicit Serdes.
> 
> ** Message headers are now supported in the Kafka Streams Processor API,
> allowing users to add and manipulate headers read from the source topics
> and propagate them to the sink topics.
> 
> ** Windowed aggregations performance in Kafka Streams has been largely
> improved (sometimes by an order of magnitude) thanks to the new
> single-key-fetch API.
> 
> ** We have further improved unit testability of Kafka Streams with the
> kafka-streams-testutil artifact.
> 
> 
> 
> 
> 
> All of the changes in this release can be found in the release notes:
> 
> https://www.apache.org/dist/kafka/2.0.0/RELEASE_NOTES.html
> 
> 
> 
> 
> 
> You can download the source and binary release (Scala 2.11 and Scala 2.12)
> from:
> 
> https://kafka.apache.org/downloads#2.0.0
> 
> 
> 
> 
> ---
> 
> 
> 
> 
> 
> Apache Kafka is a distributed streaming platform with four core APIs:
> 
> 
> 
> ** The Producer API allows an application to publish a stream of records to
> 
> one or more Kafka topics.
> 
> 
> 
> ** The Consumer API allows an application to subscribe to one or more
> 
> 

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-27 Thread James Cheng
When you say that it will "break", what does this breakage look like? Will the 
consumer-group be non-functional? Will just those instances be non-functional? 
Or will the group be functional, but the rebalancing be non-optimal and require 
more round-trips/data-transfer? (similar to the current algorithm)

I'm trying to assess the potential for user-error and the impact of user-error.

-James

> On Jul 27, 2018, at 11:25 AM, Boyang Chen  wrote:
> 
> Hey James,
> 
> 
> the algorithm is relying on client side to provide unique consumer member id. 
> It will break unless we enforce some sort of validation (host + port) on the 
> server side. To simplify the first version, we do not plan to enforce 
> validation. A good comparison would be the EOS producer which is in charge of 
> generating unique transaction id sequence. IMO for broker logic, the 
> tolerance of client side error is not unlimited.
> 
> 
> Thank you!
> 
> 
> 
> From: James Cheng 
> Sent: Saturday, July 28, 2018 1:26 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by 
> specifying member id
> 
> 
>> On Jul 26, 2018, at 11:09 PM, Guozhang Wang  wrote:
>> 
>> Hi Boyang,
>> 
>> Thanks for the proposed KIP. I made a pass over the wiki and here are some
>> comments / questions:
>> 
>> 1. In order to preserve broker compatibility, we need to make sure the
>> broker version discovery logic can be integrated with this new logic. I.e.
>> if a newer versioned consumer is talking to an older versioned broker who
>> does not recognize V4, the client needs to downgrade its JoinGroupRequest
>> version to V3 and not setting the member-id specifically. You can take a
>> look at the ApiVersionsRequest and see how to work with it.
>> 
>> 2. There may exist some manners to validate that two different clients do
>> not send with the same member id, for example if we pass along the
>> host:port information from KafkaApis to the GroupCoordinator interface. But
>> I think this overly complicates the logic and may not be worthwhile compared to
>> relying on users to specify unique member ids.
> 
> Boyang,
> 
> Thanks for the KIP! How will the algorithm behave if multiple consumers 
> provide the same member id?
> 
> -James
> 
>> 3. Minor: you would need to bump up the version of JoinGroupResponse to
>> V4 as well.
>> 
>> 4. Minor: in the wiki page, you need to specify the actual string value for
>> `MEMBER_ID`, for example "member.id".
>> 
>> 5. When this additional config is specified by users, we should consider
>> setting the default of internal `LEAVE_GROUP_ON_CLOSE_CONFIG` to false,
>> since otherwise its effectiveness would be less.
>> 
>> 
>> Guozhang
>> 
>> 
>> 
>>> On Thu, Jul 26, 2018 at 9:20 PM, Boyang Chen  wrote:
>>> 
>>> Hey friends,
>>> 
>>> 
>>> I would like to open a discussion thread on KIP-345:
>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A
>>> +Reduce+multiple+consumer+rebalances+by+specifying+member+id
>>> 
>>> 
>>> This KIP is trying to resolve multiple rebalances by maintaining the
>>> consumer member id across rebalance generations. I have verified the theory
>>> on our internal Stream application, and it could reduce rebalance time to a
>>> few seconds when the service does a rolling restart.
>>> 
>>> 
>>> Let me know your thoughts, thank you!
>>> 
>>> 
>>> Best,
>>> 
>>> Boyang
>>> 
>> 
>> 
>> 
>> --
>> -- Guozhang



Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-27 Thread James Cheng


> On Jul 26, 2018, at 11:09 PM, Guozhang Wang  wrote:
> 
> Hi Boyang,
> 
> Thanks for the proposed KIP. I made a pass over the wiki and here are some
> comments / questions:
> 
> 1. In order to preserve broker compatibility, we need to make sure the
> broker version discovery logic can be integrated with this new logic. I.e.
> if a newer versioned consumer is talking to an older versioned broker who
> does not recognize V4, the client needs to downgrade its JoinGroupRequest
> version to V3 and not setting the member-id specifically. You can take a
> look at the ApiVersionsRequest and see how to work with it.
> 
> 2. There may exist some manners to validate that two different clients do
> not send with the same member id, for example if we pass along the
> host:port information from KafkaApis to the GroupCoordinator interface. But
> I think this overly complicates the logic and may not be worthwhile compared to
> relying on users to specify unique member ids.

Boyang,

Thanks for the KIP! How will the algorithm behave if multiple consumers provide 
the same member id?

-James

> 3. Minor: you would need to bump up the version of JoinGroupResponse to
> V4 as well.
> 
> 4. Minor: in the wiki page, you need to specify the actual string value for
> `MEMBER_ID`, for example "member.id".
> 
> 5. When this additional config is specified by users, we should consider
> setting the default of internal `LEAVE_GROUP_ON_CLOSE_CONFIG` to false,
> since otherwise its effectiveness would be less.
> 
> 
> Guozhang
> 
> 
> 
>> On Thu, Jul 26, 2018 at 9:20 PM, Boyang Chen  wrote:
>> 
>> Hey friends,
>> 
>> 
>> I would like to open a discussion thread on KIP-345:
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A
>> +Reduce+multiple+consumer+rebalances+by+specifying+member+id
>> 
>> 
>> This KIP is trying to resolve multiple rebalances by maintaining the
>> consumer member id across rebalance generations. I have verified the theory
>> on our internal Stream application, and it could reduce rebalance time to a
>> few seconds when the service does a rolling restart.
>> 
>> 
>> Let me know your thoughts, thank you!
>> 
>> 
>> Best,
>> 
>> Boyang
>> 
> 
> 
> 
> -- 
> -- Guozhang


Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure

2018-07-24 Thread James Cheng
Hi Stanislav! Thanks for this KIP!

I agree that it would be good if the LogCleaner were more tolerant of errors. 
Currently, as you said, once it dies, it stays dead. 

Things are better now than they used to be. We have the metric
kafka.log:type=LogCleanerManager,name=time-since-last-run-ms
which we can use to tell us if the threads are dead. And as of 1.1.0, we have 
KIP-226, which allows you to restart the log cleaner thread, without requiring 
a broker restart. 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration
 

I've only read about this, but I haven't personally tried it.

Some comments:
* I like the idea of having the log cleaner continue to clean as many 
partitions as it can, skipping over the problematic ones if possible.

* If the log cleaner thread dies, I think it should automatically be revived. 
Your KIP attempts to do that by catching exceptions during execution, but I 
think we should go all the way and make sure that a new one gets created, if 
the thread ever dies.

* It might be worth trying to re-clean the uncleanable partitions. I've seen 
cases where an uncleanable partition later became cleanable. I unfortunately 
don't remember how that happened, but I remember being surprised when I 
discovered it. It might have been something like a follower was uncleanable but 
after a leader election happened, the log truncated and it was then cleanable 
again. I'm not sure.

* For your metrics, can you spell out the full metric in JMX-style format, such 
as:
kafka.log:type=LogCleanerManager,name=uncleanable-partitions-count
value=4

* For "uncleanable-partitions": topic-partition names can be very long. I think 
the current max size is 210 characters (or maybe 240-ish?). Having 
"uncleanable-partitions" be a list could make for a very large metric value. Also, having 
the metric come out as a CSV might be difficult to work with for monitoring 
systems. If we *did* want the topic names to be accessible, what do you think 
of having the 
kafka.log:type=LogCleanerManager,topic=topic1,partition=2
I'm not sure if LogCleanerManager is the right type, but my example was that 
the topic and partition can be tags in the metric. That will allow monitoring 
systems to more easily slice and dice the metric. I'm not sure what the 
attribute for that metric would be. Maybe something like  "uncleaned bytes" for 
that topic-partition? Or time-since-last-clean? Or maybe even just "Value=1".

* About `max.uncleanable.partitions`, you said that this likely indicates that 
the disk is having problems. I'm not sure that is the case. For the 4 JIRAs 
that you mentioned about log cleaner problems, all of them are partition-level 
scenarios that happened during normal operation. None of them were indicative 
of disk problems.

* About marking disks as offline when exceeding a certain threshold, that 
actually increases the blast radius of log compaction failures. Currently, the 
uncleaned partitions are still readable and writable. Taking the disks offline 
would impact availability of the uncleanable partitions, as well as impact all 
other partitions that are on the disk.

-James


> On Jul 23, 2018, at 5:46 PM, Stanislav Kozlovski  
> wrote:
> 
> I renamed the KIP and that changed the link. Sorry about that. Here is the
> new link:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-346+-+Improve+LogCleaner+behavior+on+error
> 
> On Mon, Jul 23, 2018 at 5:11 PM Stanislav Kozlovski 
> wrote:
> 
>> Hey group,
>> 
>> I created a new KIP about making log compaction more fault-tolerant.
>> Please give it a look here and please share what you think, especially in
>> regards to the points in the "Needs Discussion" paragraph.
>> 
>> KIP: KIP-346
>> 
>> --
>> Best,
>> Stanislav
>> 
> 
> 
> -- 
> Best,
> Stanislav



[jira] [Created] (KAFKA-7144) Kafka Streams doesn't properly balance partition assignment

2018-07-09 Thread James Cheng (JIRA)
James Cheng created KAFKA-7144:
--

 Summary: Kafka Streams doesn't properly balance partition 
assignment
 Key: KAFKA-7144
 URL: https://issues.apache.org/jira/browse/KAFKA-7144
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 1.1.0
Reporter: James Cheng
 Attachments: OneThenTwelve.java

Kafka Streams doesn't always spread the tasks across all available 
instances/threads

I have a topology which consumes a single partition topic and goes .through() a 
12 partition topic. That makes 13 partitions.

 

I then started 2 instances of the application. I would have expected the 13 
partitions to be split across the 2 instances roughly evenly (7 partitions on 
one, 6 partitions on the other).

Instead, one instance gets 12 partitions, and the other instance gets 1 
partition.

 

Repro case attached. I ran it a couple times, and it was fairly repeatable.

Setup for the repro:
{code:java}
$ ./bin/kafka-topics.sh --zookeeper localhost --create --topic one --partitions 
1 --replication-factor 1 
$ ./bin/kafka-topics.sh --zookeeper localhost --create --topic twelve 
--partitions 12 --replication-factor 1
$ echo foo | kafkacat -P -b 127.0.0.1 -t one
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-325: Extend Consumer Group Command to Show Beginning Offsets

2018-06-27 Thread James Cheng
The “Motivation” section of the KIP says that the starting offset will be 
useful but doesn’t say why. Can you add a use-case or two to describe how it 
will be useful?

In our case (see 
https://github.com/wushujames/kafka-utilities/blob/master/ConsumerGroupLag/README.md),
 we found the starting offset useful so that we could calculate partition size 
so that we could identify empty partitions (partitions where all the data had 
expired). In particular, we needed that info so that we could calculate “lag”. 
Consider the case where we are asked to mirror an abandoned topic where 
startOffset==endOffset==100. We would have no committed offset on it, and 
the topic has no data in it, so we won’t soon get any committed offset, and so 
“lag” is kind of undefined. We used the additional startOffset to allow us to 
detect this case.
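
For reference, here is a minimal sketch (not the actual kafka-utilities code; topic name and 
bootstrap address are placeholders) of how the beginning/end offsets expose this, using the 
plain Java consumer:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class PartitionSizeCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            Map<TopicPartition, Long> begin = consumer.beginningOffsets(Collections.singleton(tp));
            Map<TopicPartition, Long> end = consumer.endOffsets(Collections.singleton(tp));
            long span = end.get(tp) - begin.get(tp);
            if (span == 0) {
                // startOffset == endOffset: everything has expired (or nothing was written),
                // so a group with no committed offset can reasonably be reported as lag 0
                System.out.println(tp + " is empty");
            } else {
                System.out.println(tp + " spans " + span + " offsets");
            }
        }
    }
}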

-James

Sent from my iPhone

> On Jun 26, 2018, at 11:23 AM, Vahid S Hashemian  
> wrote:
> 
> Hi everyone,
> 
> I have created a trivial KIP to improve the offset reporting of the 
> consumer group command: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-325%3A+Extend+Consumer+Group+Command+to+Show+Beginning+Offsets
> Looking forward to your feedback!
> 
> Thanks.
> --Vahid
> 
> 


Re: kafka ack=all and min-isr

2018-06-08 Thread James Cheng
I wrote a blog post on min.isr that explains it in more detail:
https://logallthethings.com/2016/07/11/min-insync-replicas-what-does-it-actually-do/

The post is 2 years old, but I think it's still correct.
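
The short version, as a minimal producer-side sketch (topic name, addresses, and class name 
below are just placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksAllExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=all: the leader acks only once every replica currently in the ISR has the
        // record. min.insync.replicas (a broker/topic config) is a separate floor: if the
        // ISR shrinks below it, the produce fails with NotEnoughReplicas instead of being acked.
        props.put("acks", "all");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"));
        }
    }
}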

-James

> On Jun 7, 2018, at 10:31 PM, Carl Samuelson  wrote:
> 
> Hi
> 
> Hopefully this is the correct email address and forum for this.
> I asked this question on stack overflow, but did not get an answer:
> https://stackoverflow.com/questions/50689177/kafka-ack-all-and-min-isr
> 
> *Summary*
> 
> The docs and code comments for Kafka suggest that when the producer setting
> acks is set to all, then an ack will only be sent to the producer when *all
> in-sync replicas have caught up*, but the code (Partition.Scala,
> checkEnoughReplicasReachOffset) seems to suggest that the ack is sent as
> soon as *min in-sync replicas have caught up*.
> 
> *Details*
> 
> The kafka docs have this:
> 
> acks=all This means the leader will wait for the full set of in-sync
> replicas to acknowledge the record. source
> 
> 
> Also, looking at the Kafka source code - partition.scala
> checkEnoughReplicasReachOffset() has the following comment (emphasis mine):
> 
> Note that this method will only be called if requiredAcks = -1 and we are
> waiting for *all replicas* in ISR to be fully caught up to the (local)
> leader's offset corresponding to this produce request before we acknowledge
> the produce request.
> 
> Finally, this answer  on Stack
> Overflow (emphasis mine again)
> 
> Also the min in-sync replica setting specifies the minimum number of
> replicas that need to be in-sync for the partition to remain available for
> writes. When a producer specifies ack (-1 / all config) it will still wait
> for acks from *all in sync replicas* at that moment (independent of the
> setting for min in-sync replicas).
> 
> But when I look at the code in Partition.scala (note minIsr <=
> curInSyncReplicas.size):
> 
> def checkEnoughReplicasReachOffset(requiredOffset: Long): (Boolean, Errors) = {
>   ...
>   val minIsr = leaderReplica.log.get.config.minInSyncReplicas
>   if (leaderReplica.highWatermark.messageOffset >= requiredOffset) {
>     if (minIsr <= curInSyncReplicas.size)
>       (true, Errors.NONE)
> 
> The code that calls this returns the ack:
> 
> if (error != Errors.NONE || hasEnough) {
>  status.acksPending = false
>  status.responseStatus.error = error
> }
> 
> So, the code looks like it returns an ack as soon as the in-sync replica
> set is at least as large as min in-sync replicas. However, the documentation and
> comments suggest that the ack is only sent once all in-sync replicas have
> caught up. What am I missing? At the very least, the comment above
> checkEnoughReplicasReachOffset looks like it should be changed.
> Regards,
> 
> Carl



[jira] [Created] (KAFKA-6970) Kafka streams lets the user call init() and close() on a state store, when inside Processors

2018-05-30 Thread James Cheng (JIRA)
James Cheng created KAFKA-6970:
--

 Summary: Kafka streams lets the user call init() and close() on a 
state store, when inside Processors
 Key: KAFKA-6970
 URL: https://issues.apache.org/jira/browse/KAFKA-6970
 Project: Kafka
  Issue Type: Bug
Reporter: James Cheng


When using a state store within Transform (and Processor and TransformValues), 
the user is able to call init() and close() on the state stores. Those APIs 
should only be called by Kafka Streams itself.

If possible, it would be good to guard those APIs so that the user cannot call 
them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6967) TopologyTestDriver does not allow pre-populating state stores that have change logging

2018-05-29 Thread James Cheng (JIRA)
James Cheng created KAFKA-6967:
--

 Summary: TopologyTestDriver does not allow pre-populating state 
stores that have change logging
 Key: KAFKA-6967
 URL: https://issues.apache.org/jira/browse/KAFKA-6967
 Project: Kafka
  Issue Type: Bug
Reporter: James Cheng


TopologyTestDriver does not allow pre-populating a state store that has logging 
enabled. If you try to do it, you will get the following error message:

 
{code:java}
java.lang.IllegalStateException: This should not happen as timestamp() should 
only be called while a record is processed
at 
org.apache.kafka.streams.processor.internals.AbstractProcessorContext.timestamp(AbstractProcessorContext.java:153)
at 
org.apache.kafka.streams.state.internals.StoreChangeLogger.logChange(StoreChangeLogger.java:59)
at 
org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.put(ChangeLoggingKeyValueBytesStore.java:69)
at 
org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.put(ChangeLoggingKeyValueBytesStore.java:29)
at 
org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.put(InnerMeteredKeyValueStore.java:198)
at 
org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.put(MeteredKeyValueBytesStore.java:117)
{code}

Also see:
https://github.com/apache/kafka/blob/trunk/streams/test-utils/src/test/java/org/apache/kafka/streams/TopologyTestDriverTest.java#L723-L740



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Kafka KIP meeting on Apr. 9 at 9:00am PDT

2018-04-05 Thread James Cheng
Jun, 

Can you add me as well? wushuja...@gmail.com 

Thanks,
-James

> On Apr 5, 2018, at 1:34 PM, Jun Rao  wrote:
> 
> Hi, Ted, Vahid,
> 
> Added you to the invite.
> 
> Thanks,
> 
> Jun
> 
> 
> On Thu, Apr 5, 2018 at 10:42 AM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
> 
>> Hi Jun,
>> 
>> I used to receive these invites, but didn't get this one.
>> Please send me an invite. Thanks.
>> 
>> Regards,
>> --Vahid
>> 
>> 
>> 
>> From:   Jun Rao 
>> To: dev 
>> Date:   04/05/2018 10:25 AM
>> Subject:Kafka KIP meeting on Apr. 9 at 9:00am PDT
>> 
>> 
>> 
>> Hi, Everyone,
>> 
>> We plan to have a Kafka KIP meeting this coming Monday (Apr. 9)  at 9:00am
>> PDT. If you plan to attend but haven't received an invite, please let me
>> know. The following is the agenda.
>> 
>> Agenda:
>> KIP-253: Partition expansion
>> 
>> Thanks,
>> 
>> Jun
>> 
>> 
>> 
>> 
>> 



Re: [ANNOUNCE] Apache Kafka 1.1.0 Released

2018-03-29 Thread James Cheng
Thanks Damian and Rajini for running the release! Congrats and good job 
everyone!

-James

Sent from my iPhone

> On Mar 29, 2018, at 2:27 AM, Rajini Sivaram  wrote:
> 
> The Apache Kafka community is pleased to announce the release for
> 
> Apache Kafka 1.1.0.
> 
> 
> Kafka 1.1.0 includes a number of significant new features.
> 
> Here is a summary of some notable changes:
> 
> 
> ** Kafka 1.1.0 includes significant improvements to the Kafka Controller
> 
>   that speed up controlled shutdown. ZooKeeper session expiration edge
> cases
> 
>   have also been fixed as part of this effort.
> 
> 
> ** Controller improvements also enable more partitions to be supported on a
> 
>   single cluster. KIP-227 introduced incremental fetch requests, providing
> 
>   more efficient replication when the number of partitions is large.
> 
> 
> ** KIP-113 added support for replica movement between log directories to
> 
>   enable data balancing with JBOD.
> 
> 
> ** Some of the broker configuration options like SSL keystores can now be
> 
>   updated dynamically without restarting the broker. See KIP-226 for
> details
> 
>   and the full list of dynamic configs.
> 
> 
> ** Delegation token based authentication (KIP-48) has been added to Kafka
> 
>   brokers to support large number of clients without overloading Kerberos
> 
>   KDCs or other authentication servers.
> 
> 
> ** Several new features have been added to Kafka Connect, including header
> 
>   support (KIP-145), SSL and Kafka cluster identifiers in the Connect REST
> 
>   interface (KIP-208 and KIP-238), validation of connector names (KIP-212)
> 
>   and support for topic regex in sink connectors (KIP-215). Additionally,
> 
>   the default maximum heap size for Connect workers was increased to 2GB.
> 
> 
> ** Several improvements have been added to the Kafka Streams API, including
> 
>   reducing repartition topic partitions footprint, customizable error
> 
>   handling for produce failures and enhanced resilience to broker
> 
>   unavailability.  See KIPs 205, 210, 220, 224 and 239 for details.
> 
> 
> All of the changes in this release can be found in the release notes:
> 
> 
> 
> https://dist.apache.org/repos/dist/release/kafka/1.1.0/RELEASE_NOTES.html
> 
> 
> 
> 
> You can download the source release from:
> 
> 
> 
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka-1.1.0-src.tgz
> 
> 
> 
> and binary releases from:
> 
> 
> 
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka_2.11-1.1.0.tgz
> 
> (Scala 2.11)
> 
> 
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka_2.12-1.1.0.tgz
> 
> (Scala 2.12)
> 
> 
> --
> 
> 
> 
> Apache Kafka is a distributed streaming platform with four core APIs:
> 
> 
> 
> ** The Producer API allows an application to publish a stream records to
> 
> one or more Kafka topics.
> 
> 
> 
> ** The Consumer API allows an application to subscribe to one or more
> 
> topics and process the stream of records produced to them.
> 
> 
> 
> ** The Streams API allows an application to act as a stream processor,
> 
> consuming an input stream from one or more topics and producing an output
> 
> stream to one or more output topics, effectively transforming the input
> 
> streams to output streams.
> 
> 
> 
> ** The Connector API allows building and running reusable producers or
> 
> consumers that connect Kafka topics to existing applications or data
> 
> systems. For example, a connector to a relational database might capture
> 
> every change to a table.
> 
> 
> 
> 
> With these APIs, Kafka can be used for two broad classes of application:
> 
> ** Building real-time streaming data pipelines that reliably get data
> 
> between systems or applications.
> 
> 
> 
> ** Building real-time streaming applications that transform or react to the
> 
> streams of data.
> 
> 
> 
> 
> Apache Kafka is in use at large and small companies worldwide, including
> 
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> 
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
> 
> 
> 
> 
> A big thank you for the following 120 contributors to this release!
> 
> 
> Adem Efe Gencer, Alex Good, Andras Beni, Andy Bryant, Antony Stubbs,
> 
> Apurva Mehta, Arjun Satish, bartdevylder, Bill Bejeck, Charly Molter,
> 
> Chris Egerton, Clemens Valiente, cmolter, Colin P. Mccabe,
> 
> Colin Patrick McCabe, ConcurrencyPractitioner, Damian Guy, dan norwood,
> 
> Daniel Wojda, Derrick Or, Dmitry Minkovsky, Dong Lin, Edoardo Comar,
> 
> ekenny, Elyahou, Eugene Sevastyanov, Ewen Cheslack-Postava, Filipe Agapito,
> 
> fredfp, Gavrie Philipson, Gunnar Morling, Guozhang Wang, hmcl, Hugo Louro,
> 
> huxi, huxihx, Igor Kostiakov, Ismael Juma, Ivan Babrou, Jacek Laskowski,
> 
> Jakub Scholz, Jason Gustafson, Jeff Klukas, Jeff Widman, Jeremy
> Custenborder,
> 
> Jeyhun Karimov, Jiangjie (Becket) Qin, 

Re: [VOTE] KIP-211: Revise Expiration Semantics of Consumer Group Offsets

2018-03-28 Thread James Cheng
+1 (non-binding)

Thanks for all the hard work on this, Vahid!

-James

> On Mar 28, 2018, at 10:34 AM, Vahid S Hashemian  
> wrote:
> 
> Hi all,
> 
> As I believe the feedback and suggestions on this KIP have been addressed 
> so far, I'd like to start a vote.
> The KIP can be found at 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets
> 
> Thanks in advance for voting :)
> 
> --Vahid
> 



Re: [ANNOUNCE] New Committer: Dong Lin

2018-03-28 Thread James Cheng
Congrats, Dong!

-James

> On Mar 28, 2018, at 10:58 AM, Becket Qin  wrote:
> 
> Hello everyone,
> 
> The PMC of Apache Kafka is pleased to announce that Dong Lin has accepted
> our invitation to be a new Kafka committer.
> 
> Dong started working on Kafka about four years ago, since which he has
> contributed numerous features and patches. His work on Kafka core has been
> consistent and important. Among his contributions, most noticeably, Dong
> developed JBOD (KIP-112, KIP-113) to handle disk failures and to reduce
> overall cost, added deleteDataBefore() API (KIP-107) to allow users
> actively remove old messages. Dong has also been active in the community,
> participating in KIP discussions and doing code reviews.
> 
> Congratulations and looking forward to your future contribution, Dong!
> 
> Jiangjie (Becket) Qin, on behalf of Apache Kafka PMC



Re: [VOTE] KIP-268: Simplify Kafka Streams Rebalance Metadata Upgrade

2018-03-22 Thread James Cheng
+1 (non-binding)

-James

> On Mar 21, 2018, at 2:28 AM, Damian Guy  wrote:
> 
> +1
> 
> On Wed, 21 Mar 2018 at 01:44 abel-sun  wrote:
> 
>> 
>>   Thank you for your offer, I agree with you!
>> 
>> On 2018/03/21 00:56:11, Richard Yu  wrote:
>>> Hi Matthias,
>>> Thanks for setting up the upgrade path.
>>> 
>>> +1 (non-binding)
>>> 
>>> On Tue, Mar 20, 2018 at 3:42 PM, Matthias J. Sax 
>>> wrote:
>>> 
 Hi,
 
 I would like to start the vote for KIP-268:
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-
 268%3A+Simplify+Kafka+Streams+Rebalance+Metadata+Upgrade
 
 PR https://github.com/apache/kafka/pull/4636 contains the fixes to
 upgrade from metadata version 1 to 2. Some tests are still missing but
 I'll add them asap.
 
 For "version probing" including new metadata version 3 I plan to do a
 follow-up PR after PR-4636 is merged.
 
 
 -Matthias
 
 
>>> 
>> 



Re: [DISCUSS] KIP-268: Simplify Kafka Streams Rebalance Metadata Upgrade

2018-03-22 Thread James Cheng


> On Mar 21, 2018, at 11:45 PM, Matthias J. Sax <matth...@confluent.io> wrote:
> 
> Yes, it only affects the metadata. KIP-268 targets metadata upgrade
> without store upgrade.
> 
> We can discuss store upgrade further in KIP-258: I think in general, the
> upgrade/downgrade behavior might be an issue for upgrading stores.
> However, this upgrade/downgrade can only happen when upgrading from 1.2
> to a future version. Thus, it won't affect an upgrade to 1.2.
> 
> For an upgrade to 1.2, we introduce the "upgrade.from" parameter
> (because we don't have "version probing" for 1.1 yet) and this ensures
> that upgrading cannot happen "too early", and no downgrade can happen
> either for this case.
> 
> Let me know what you think.
> 

I think yes, we can discuss upgrade/downgrade issues (to versions after 1.2) in 
the other KIP (KIP-258).

However, this KIP-268 looks fine. It gives us the mechanism to properly detect 
and automatically upgrade/downgrade the topology and allows the new/old code to 
co-exist within a topology, which is something we didn't have before.

KIP-268 looks good to me.

Thanks for all the answers to my questions.

-James

> 
> -Matthias
> 
> On 3/21/18 11:16 PM, James Cheng wrote:
>> 
>> 
>> 
>>> On Mar 21, 2018, at 11:18 AM, Matthias J. Sax <matth...@confluent.io> wrote:
>>> 
>>> Thanks for following up James.
>>> 
>>>> Is this the procedure that happens during every rebalance? The reason I 
>>>> ask is that this step:
>>>>>>> As long as the leader (before or after upgrade) receives at least
>>> one old version X Subscription it always sends version Assignment X back
>>> (the encoded supported version is X before the leader is upgraded and Y
>>> after the leader is upgraded).
>>> 
>>> Yes, that would be the consequence.
>>> 
>>>> This implies that the leader receives all Subscriptions before sending 
>>>> back any responses. Is that what actually happens? Is it possible that it 
>>>> would receive say 4 out of 5 Subscriptions of Y, send back a response Y, 
>>>> and then later receive a Subscription X? What happens in that case? Would 
>>>> that Subscription X then trigger another rebalance, and the whole thing 
>>>> starts again?
>>> 
>>> That sounds correct. A 'delayed' Subscription could always happen --
>>> even before KIP-268 -- and would trigger a new rebalance. With this
>>> regard, the behavior does not change. The difference is, that we would
>>> automatically downgrade the Assignment from Y to X again -- but the
>>> application would not fail (as it would before the KIP).
>>> 
>>> Do you see an issue with this behavior? The idea of the design is to
>>> make Kafka Streams robust against those scenarios. Thus, if 4 apps are
>>> upgraded but no.5 is not yet and no.5 is late, Kafka Streams would first
>>> upgrade from X to Y and downgrade from Y to X in the second rebalance
>>> when no.5 joins the group. If no.5 gets upgraded, a third rebalance
>>> would upgrade to Y again.
>>> 
>> 
>> Sounds good. 
>> 
>> 
>>> Thus, as long as not all instances are on the newest version,
>>> upgrades/donwgrades of the exchanged rebalance metadata could happen
>>> multiple times. However, this should not be an issue from my understanding.
>> 
>> About “this should not be an issue”: this upgrade/downgrade is just about 
>> the rebalance metadata, right? Are there other associated things that will 
>> also have to upgrade/downgrade in sync with the rebalance metadata? For 
>> example, the idea for this KIP originally came up during the discussion 
>> about adding timestamps to RocksDB state stores, which required updating the 
>> on-disk schema. In the case of an updated RocksDB state store but with a 
>> downgraded rebalance metadata... that should work, right? Because we still 
>> have updated code (which understands the on-disk format) but that it simply 
>> gets its partition assignments via the downgraded rebalance metadata?
>> 
>> Thanks,
>> -James
>> 
>> Sent from my iPhone
>> 
>>> Let us know what you think about it.
>>> 
>>> 
>>> -Matthias
>>> 
>>> 
>>>> On 3/20/18 11:10 PM, James Cheng wrote:
>>>> Sorry, I see that the VOTE started already, but I have a late question on 
>>>> this KIP.
>>>> 
>>>> In the "version probing" protocol:
>>>>> Detailed upgrade

Re: [DISCUSS] KIP-272: Add API version tag to broker's RequestsPerSec metric

2018-03-22 Thread James Cheng
Regardless of what we decide, Allen, can you update the KIP and provide a 
concrete example of what an mbean will look like with the proposed change? 
Similar to what I said here:

> An example of the metric we are thinking of changing is this:
> 
> kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce
> 
> And the KIP is proposing to change it to:
> 
> kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=1.0.0


I think that would make the KIP easier to understand.

Thanks,
-James

Sent from my iPhone

> On Mar 21, 2018, at 11:49 PM, James Cheng <wushuja...@gmail.com> wrote:
> 
> An example of the metric we are thinking of changing is this:
> 
> kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce
> 
> And the KIP is proposing to change it to:
> 
> kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=1.0.0
> 
> Is it possible for the broker to have BOTH metrics? That way, we don’t have 
> to change the name.
> 
> Would that make querying/aggregating too annoying (since a query for 
> name=RequestsPerSec and request=Produce would return BOTH metrics)? 
> 
> Also, it might be hard to query for “name=RequestsPerSec and request=Produce 
> and version field NOT present”
> 
> -James
> 
> Sent from my iPhone
> 
>> On Mar 21, 2018, at 10:17 PM, Jeff Widman <j...@jeffwidman.com> wrote:
>> 
>> I agree with Allen.
>> 
>> Go with the intuitive name, even if it means not deprecating. The impact of
>> breakage here is small since it only breaks monitoring and the folks who
>> watch their dashboards closely are the ones likely to read the release
>> notes carefully and see this change.
>> 
>>> On Wed, Mar 21, 2018, 3:24 PM Allen Wang <allenxw...@gmail.com> wrote:
>>> 
>>> I understand the impact to jmx based tools. But adding a new metric is
>>> unnecessary for more advanced monitoring systems that can aggregate with or
>>> without tags. Duplicating the metric (including the "request" tag) also
>>> adds performance cost to the broker, especially for the metric that needs
>>> to be updated for every request.
>>> 
>>> For KIP-225, the choice makes sense because the deprecated metric's name is
>>> undesirable anyway and the new metric name is much better than the prefixed
>>> metric name. Not the case for RequestsPerSec. It is hard for me to come up
>>> with another intuitive name.
>>> 
>>> Thanks,
>>> Allen
>>> 
>>> 
>>> 
>>>> On Wed, Mar 21, 2018 at 2:01 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>> 
>>>> Creating new metric and deprecating existing one seems better from
>>>> compatibility point of view.
>>>>  Original message From: James Cheng <
>>> wushuja...@gmail.com>
>>>> Date: 3/21/18  1:39 AM  (GMT-08:00) To: dev@kafka.apache.org Subject:
>>> Re:
>>>> [DISCUSS] KIP-272: Add API version tag to broker's RequestsPerSec metric
>>>> Manikumar brings up a good point. This is a breaking change to the
>>>> existing metric. Do we want to break compatibility, or do we want to add
>>> a
>>>> new metric and (optionally) deprecate the existing metric?
>>>> 
>>>> For reference, in KIP-153 [1], we changed an existing metric without
>>> doing
>>>> proper deprecation.
>>>> 
>>>> However, in KIP-225, we noticed that we maybe shouldn't have done
>>>> that. For KIP-225 [2], we instead decided to create a new metric, and
>>>> deprecate (but not remove) the old one.
>>>> 
>>>> -James
>>>> 
>>>> [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>> 153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric
>>>> 
>>>> [2] https://cwiki.apache.org/confluence/pages/viewpage.
>>>> action?pageId=74686649
>>>> 
>>>> 
>>>>> On Mar 21, 2018, at 12:14 AM, Manikumar <manikumar.re...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Can we retain total RequestsPerSec metric and add new version tag
>>> metric?
>>>>> When monitoring with simple jconsole/jmx based tools, It is useful to
>>>> have
>>>>> total metric
>>>>> to monitor request rate.
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> On Wed, Mar 21, 2018 at 11:01 AM, Gwen Shapira <g...@confluent.i

Re: [DISCUSS] KIP-272: Add API version tag to broker's RequestsPerSec metric

2018-03-22 Thread James Cheng
An example of the metric we are thinking of changing is this:

kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce

And the KIP is proposing to change it to:

kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=1.0.0

Is it possible for the broker to have BOTH metrics? That way, we don’t have to 
change the name.

Would that make querying/aggregating too annoying (since a query for 
name=RequestsPerSec and request=Produce would return BOTH metrics)? 

Also, it might be hard to query for “name=RequestsPerSec and request=Produce 
and version field NOT present”
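
To make the querying concern concrete, here is a rough sketch of the kind of JMX 
wildcard query I mean (it assumes you are connected to the broker's MBean server, 
e.g. from inside the JVM or adapted to use a remote JMXConnector):

import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RequestsPerSecQuery {
    public static void main(final String[] args) throws Exception {
        final MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // The trailing ",*" makes this a property-list pattern, so it matches the
        // existing un-versioned mbean AND any new version-tagged mbeans. A dashboard
        // that sums the results would double-count unless it filters on the version tag.
        final ObjectName pattern = new ObjectName(
            "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,*");
        final Set<ObjectName> matches = server.queryNames(pattern, null);
        matches.forEach(System.out::println);
    }
}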

-James

Sent from my iPhone

> On Mar 21, 2018, at 10:17 PM, Jeff Widman <j...@jeffwidman.com> wrote:
> 
> I agree with Allen.
> 
> Go with the intuitive name, even if it means not deprecating. The impact of
> breakage here is small since it only breaks monitoring and the folks who
> watch their dashboards closely are the ones likely to read the release
> notes carefully and see this change.
> 
>> On Wed, Mar 21, 2018, 3:24 PM Allen Wang <allenxw...@gmail.com> wrote:
>> 
>> I understand the impact to jmx based tools. But adding a new metric is
>> unnecessary for more advanced monitoring systems that can aggregate with or
>> without tags. Duplicating the metric (including the "request" tag) also
>> adds performance cost to the broker, especially for the metric that needs
>> to be updated for every request.
>> 
>> For KIP-225, the choice makes sense because the deprecated metric's name is
>> undesirable anyway and the new metric name is much better than the prefixed
>> metric name. Not the case for RequestsPerSec. It is hard for me to come up
>> with another intuitive name.
>> 
>> Thanks,
>> Allen
>> 
>> 
>> 
>>> On Wed, Mar 21, 2018 at 2:01 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>> 
>>> Creating new metric and deprecating existing one seems better from
>>> compatibility point of view.
>>>  Original message From: James Cheng <
>> wushuja...@gmail.com>
>>> Date: 3/21/18  1:39 AM  (GMT-08:00) To: dev@kafka.apache.org Subject:
>> Re:
>>> [DISCUSS] KIP-272: Add API version tag to broker's RequestsPerSec metric
>>> Manikumar brings up a good point. This is a breaking change to the
>>> existing metric. Do we want to break compatibility, or do we want to add
>> a
>>> new metric and (optionally) deprecate the existing metric?
>>> 
>>> For reference, in KIP-153 [1], we changed an existing metric without
>> doing
>>> proper deprecation.
>>> 
>>> However, in KIP-225, we noticed that we maybe shouldn't have done
>>> that. For KIP-225 [2], we instead decided to create a new metric, and
>>> deprecate (but not remove) the old one.
>>> 
>>> -James
>>> 
>>> [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> 153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric
>>> 
>>> [2] https://cwiki.apache.org/confluence/pages/viewpage.
>>> action?pageId=74686649
>>> 
>>> 
>>>> On Mar 21, 2018, at 12:14 AM, Manikumar <manikumar.re...@gmail.com>
>>> wrote:
>>>> 
>>>> Can we retain total RequestsPerSec metric and add new version tag
>> metric?
>>>> When monitoring with simple jconsole/jmx based tools, It is useful to
>>> have
>>>> total metric
>>>> to monitor request rate.
>>>> 
>>>> 
>>>> Thanks,
>>>> 
>>>> On Wed, Mar 21, 2018 at 11:01 AM, Gwen Shapira <g...@confluent.io>
>>> wrote:
>>>> 
>>>>> I love this. Not much to add - it is an elegant solution, clean
>>>>> implementation and it addresses a real need, especially during
>> upgrades.
>>>>> 
>>>>>> On Tue, Mar 20, 2018 at 2:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>> 
>>>>>> Thanks for the response.
>>>>>> 
>>>>>> Assuming number of client versions is limited in a cluster, memory
>>>>>> consumption is not a concern.
>>>>>> 
>>>>>> Cheers
>>>>>> 
>>>>>> On Tue, Mar 20, 2018 at 10:47 AM, Allen Wang <allenxw...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>>> Hi Ted,
>>>>>>> 
>>>>>>> The additional hash map is very small, possibly a few KB. Each
>> request
>>>>>> type
>>>&

Re: [DISCUSS] KIP-268: Simplify Kafka Streams Rebalance Metadata Upgrade

2018-03-22 Thread James Cheng



> On Mar 21, 2018, at 11:18 AM, Matthias J. Sax <matth...@confluent.io> wrote:
> 
> Thanks for following up James.
> 
>> Is this the procedure that happens during every rebalance? The reason I ask 
>> is that this step:
>>>>> As long as the leader (before or after upgrade) receives at least
> one old version X Subscription it always sends version Assignment X back
> (the encoded supported version is X before the leader is upgraded and Y
> after the leader is upgraded).
> 
> Yes, that would be the consequence.
> 
>> This implies that the leader receives all Subscriptions before sending back 
>> any responses. Is that what actually happens? Is it possible that it would 
>> receive say 4 out of 5 Subscriptions of Y, send back a response Y, and then 
>> later receive a Subscription X? What happens in that case? Would that 
>> Subscription X then trigger another rebalance, and the whole thing starts 
>> again?
> 
> That sounds correct. A 'delayed' Subscription could always happen --
> even before KIP-268 -- and would trigger a new rebalance. With this
> regard, the behavior does not change. The difference is, that we would
> automatically downgrade the Assignment from Y to X again -- but the
> application would not fail (as it would before the KIP).
> 
> Do you see an issue with this behavior? The idea of the design is to
> make Kafka Streams robust against those scenarios. Thus, if 4 apps are
> upgraded but no.5 is not yet and no.5 is late, Kafka Streams would first
> upgrade from X to Y and downgrade from Y to X in the second rebalance
> when no.5 joins the group. If no.5 gets upgraded, a third rebalance
> would upgrade to Y again.
> 

Sounds good. 


> Thus, as long as not all instances are on the newest version,
> upgrades/donwgrades of the exchanged rebalance metadata could happen
> multiple times. However, this should not be an issue from my understanding.

About “this should not be an issue”: this upgrade/downgrade is just about the 
rebalance metadata, right? Are there other associated things that will also 
have to upgrade/downgrade in sync with the rebalance metadata? For example, the 
idea for this KIP originally came up during the discussion about adding 
timestamps to RocksDB state stores, which required updating the on-disk schema. 
In the case of an updated RocksDB state store but with a downgraded rebalance 
metadata... that should work, right? Because we still have updated code (which 
understands the on-disk format) but that it simply gets its partition 
assignments via the downgraded rebalance metadata?

Thanks,
-James

Sent from my iPhone

> Let us know what you think about it.
> 
> 
> -Matthias
> 
> 
>> On 3/20/18 11:10 PM, James Cheng wrote:
>> Sorry, I see that the VOTE started already, but I have a late question on 
>> this KIP.
>> 
>> In the "version probing" protocol:
>>> Detailed upgrade protocol from metadata version X to Y (with X >= 1.2):
>>> On startup/rolling-bounce, an instance does not know what version the 
>>> leader understands and (optimistically) sends a Subscription with the 
>>> latest version Y
>>> (Old, ie, not yet upgraded) Leader sends empty Assignment back to the 
>>> corresponding instance that sent the newer Subscription it does not 
>>> understand. The Assignment metadata only encodes both version numbers 
>>> (used-version == supported-version) as leader's supported-version X.
>>> For all other instances the leader sends a regular Assignment in version X 
>>> back.
>>> If an upgraded follower sends new version number Y Subscription and receives 
>>> version X  Assignment with "supported-version = X", it can downgrade to X 
>>> (in-memory flag) and resends a new Subscription with old version X to retry 
>>> joining the group. To force an immediate second rebalance, the follower 
>>> does an "unsubscribe()/subscribe()/poll()" sequence.
>>> As long as the leader (before or after upgrade) receives at least one old 
>>> version X Subscription it always sends version Assignment X back (the 
>>> encoded supported version is X before the leader is upgraded and Y after the 
>>> leader is upgraded).
>>> If an upgraded instance receives an Assignment it always checks the leader's 
>>> supported-version and updates its downgraded "used-version" if possible
>> 
>> Is this the procedure that happens during every rebalance? The reason I ask 
>> is that this step:
>>>> As long as the leader (before or after upgrade) receives at least one old 
>>>> version X Subscription it always sends version Assignment X ba

Re: [DISCUSS] KIP-272: Add API version tag to broker's RequestsPerSec metric

2018-03-21 Thread James Cheng
Manikumar brings up a good point. This is a breaking change to the existing 
metric. Do we want to break compatibility, or do we want to add a new metric 
and (optionally) deprecate the existing metric?

For reference, in KIP-153 [1], we changed an existing metric without doing 
proper deprecation.

However, in KIP-225, we noticed that we maybe shouldn't have done that. 
For KIP-225 [2], we instead decided to create a new metric, and deprecate (but 
not remove) the old one.

-James

[1] 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric

[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74686649


> On Mar 21, 2018, at 12:14 AM, Manikumar  wrote:
> 
> Can we retain total RequestsPerSec metric and add new version tag metric?
> When monitoring with simple jconsole/jmx based tools, it is useful to have
> a total metric to monitor request rate.
> 
> 
> Thanks,
> 
> On Wed, Mar 21, 2018 at 11:01 AM, Gwen Shapira  wrote:
> 
>> I love this. Not much to add - it is an elegant solution, clean
>> implementation and it addresses a real need, especially during upgrades.
>> 
>> On Tue, Mar 20, 2018 at 2:49 PM, Ted Yu  wrote:
>> 
>>> Thanks for the response.
>>> 
>>> Assuming number of client versions is limited in a cluster, memory
>>> consumption is not a concern.
>>> 
>>> Cheers
>>> 
>>> On Tue, Mar 20, 2018 at 10:47 AM, Allen Wang 
>> wrote:
>>> 
 Hi Ted,
 
 The additional hash map is very small, possibly a few KB. Each request type
 ("produce", "fetch", etc.) will have such a map, which has a few entries
 depending on the client API versions the broker will encounter. So if the
 broker encounters two client versions for "produce", there will be two
 entries in the map for "produce" requests, mapping from version to meter. Of
 course, hash maps always have additional memory overhead.
 
 Thanks,
 Allen
 
 
 On Mon, Mar 19, 2018 at 3:49 PM, Ted Yu  wrote:
 
> bq. *additional hash lookup is needed when updating the metric to locate
> the metric *
> 
> *Do you have an estimate of how much memory is needed for maintaining the
> hash map?*
> 
> *Thanks*
> 
> On Mon, Mar 19, 2018 at 3:19 PM, Allen Wang 
 wrote:
> 
>> Hi all,
>> 
>> I have created KIP-272: Add API version tag to broker's
>>> RequestsPerSec
>> metric.
>> 
>> Here is the link to the KIP:
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 272%3A+Add+API+version+tag+to+broker%27s+RequestsPerSec+metric
>> 
>> Looking forward to the discussion.
>> 
>> Thanks,
>> Allen
>> 
> 
 
>>> 
>> 
>> 
>> 
>> --
>> *Gwen Shapira*
>> Product Manager | Confluent
>> 650.450.2760 | @gwenshap
>> Follow us: Twitter  | blog
>> 
>> 



Re: [DISCUSS] KIP-268: Simplify Kafka Streams Rebalance Metadata Upgrade

2018-03-21 Thread James Cheng
Sorry, I see that the VOTE started already, but I have a late question on this 
KIP.

In the "version probing" protocol:
> Detailed upgrade protocol from metadata version X to Y (with X >= 1.2):
> On startup/rolling-bounce, an instance does not know what version the leader 
> understands and (optimistically) sends a Subscription with the latest 
> version Y
> (Old, ie, not yet upgraded) Leader sends empty Assignment back to the 
> corresponding instance that sent the newer Subscription it does not 
> understand. The Assignment metadata only encodes both version numbers 
> (used-version == supported-version) as leader's supported-version X.
> For all other instances the leader sends a regular Assignment in version X 
> back.
> If an upgraded follower sends new version number Y Subscription and receives 
> version X  Assignment with "supported-version = X", it can downgrade to X 
> (in-memory flag) and resends a new Subscription with old version X to retry 
> joining the group. To force an immediate second rebalance, the follower does 
> an "unsubscribe()/subscribe()/poll()" sequence.
> As long as the leader (before or after upgrade) receives at least one old 
> version X Subscription it always sends version Assignment X back (the encoded 
> supported version is X before the leader is upgraded and Y after the leader is 
> upgraded).
> If an upgraded instance receives an Assignment it always checks the leader's 
> supported-version and updates its downgraded "used-version" if possible

Is this the procedure that happens during every rebalance? The reason I ask is 
that this step:
>> As long as the leader (before or after upgrade) receives at least one old 
>> version X Subscription it always sends version Assignment X back (the 
>> encoded supported version is X before the leader is upgraded and Y after the 
>> leader is upgraded).

This implies that the leader receives all Subscriptions before sending back any 
responses. Is that what actually happens? Is it possible that it would receive 
say 4 out of 5 Subscriptions of Y, send back a response Y, and then later 
receive a Subscription X? What happens in that case? Would that Subscription X 
then trigger another rebalance, and the whole thing starts again?

Thanks,
-James

> On Mar 19, 2018, at 5:04 PM, Matthias J. Sax  wrote:
> 
> Guozhang,
> 
> thanks for your comments.
> 
> 2: I think my main concern is, that 1.2 would be "special" release that
> everybody need to use to upgrade. As an alternative, we could say that
> we add the config in 1.2 and keep it for 2 additional releases (1.3 and
> 1.4) but remove it in 1.5. This gives users more flexibility and does
> force not force user to upgrade to a specific version but also allows us
> to not carry the tech debt forever. WDYT about this? If users upgrade on
> an regular basis, this approach could avoid a forces update with high
> probability as the will upgrade to either 1.2/1.3/1.4 anyway at some
> point. Thus, only if users don't upgrade for a very long time, they are
> forces to do 2 upgrades with an intermediate version.
> 
> 4. Updated the KIP to remove the ".x" suffix
> 
> 5. Updated the KIP accordingly.
> 
> -Matthias
> 
> On 3/19/18 10:33 AM, Guozhang Wang wrote:
>> Yup :)
>> 
>> On Mon, Mar 19, 2018 at 10:01 AM, Ted Yu  wrote:
>> 
>>> bq. some snippet like ProduceRequest / ProduceRequest
>>> 
>>> Did you mean ProduceRequest / Response ?
>>> 
>>> Cheers
>>> 
>>> On Mon, Mar 19, 2018 at 9:51 AM, Guozhang Wang  wrote:
>>> 
 Hi Matthias,
 
 About 2: yeah I guess this is a subjective preference. My main concern
 about keeping the config / handling code beyond 1.2 release is that it
>>> will
 become a non-cleanable tech debt forever, as fewer and fewer users would
 need to upgrade from 0.10.x and 1.1.x, and eventually we will need to
 maintain this for nearly no one. On the other hand, I agree that this
>>> tech
 debt is not too large. So if more people feel this is a good tradeoff to
 pay for not enforcing users from older versions to upgrade twice I'm
>>> happen
 to change my opinion.
 
 A few more minor comments:
 
 4. For the values of "upgrade.from", could we simply to only major.minor?
 I.e. "0.10.0" than "0.10.0.x" ? Since we never changed compatibility
 behavior in bug fix releases we would not need to specify a bug-fix
>>> version
 to distinguish ever.
 
 5. Could you also present the encoding format in subscription /
>>> assignment
 metadata bytes in version 2, and in future versions (i.e. which first
>>> bytes
 would be kept moving forward), for readers to better understand the
 proposal? some snippet like ProduceRequest / ProduceRequest in
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-
 98+-+Exactly+Once+Delivery+and+Transactional+Messaging
 would be very helpful.
 
 
 
 Guozhang
 
 
 On Fri, Mar 

Re: Log Retention Period of Kafka Messages

2018-03-12 Thread James Cheng


> On Mar 12, 2018, at 10:13 AM, Kyle Tinker  
> wrote:
> 
> You have a couple options:
> 1) You can adjust log.segment.bytes to make the segments smaller so that 
> individual segments can be trimmed
> 2) You can set log.roll.hours, which will roll to a new log segment even if 
> the size hasn't been reached
> Note: #1 and #2 also have per-topic setting controls
> 

About #2, it's possible that segments will only roll when a message has been 
received, after log.roll.hours (or log.roll.ms, etc). That is, if the time has 
passed but no new message has been received, the log roll might not happen.

You should do some testing to double check.
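
If it helps, here's a rough sketch of applying the per-topic overrides Kyle mentioned, 
using the AdminClient (topic name and values are made up; note that the topic-level 
equivalents are segment.bytes, segment.ms, and retention.ms, rather than the 
broker-level log.* settings):

import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TopicRetentionOverride {
    public static void main(final String[] args) throws Exception {
        final Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            final ConfigResource topic =
                new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            // Caveat: alterConfigs replaces the full set of overrides for the topic,
            // so include any existing overrides you want to keep.
            final Config overrides = new Config(Arrays.asList(
                new ConfigEntry("retention.ms", "86400000"),  // delete data older than ~1 day
                new ConfigEntry("segment.ms", "3600000")));   // roll segments roughly hourly
            admin.alterConfigs(Collections.singletonMap(topic, overrides)).all().get();
        }
    }
}

With segments rolling more often, the retention check can then delete old segments 
on roughly the schedule you want. But again, test it against your broker version.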

-James


> https://kafka.apache.org/08/documentation/#brokerconfigs
> 
> Thanks,
> --
> 
> 
> KYLE TINKER
> Lead Architect/Team Lead | WorkForce Software 
> 
> T: +1 734 742 2616 | E: ktin...@workforcesoftware.com
> 
> 
> -Original Message-
> From: Muruganandham, Ashokkumar [mailto:ashokkumar.muruganand...@fluke.com] 
> Sent: Monday, March 12, 2018 1:19 AM
> To: dev@kafka.apache.org
> Subject: Log Retention Period of Kafka Messages
> 
> Hi Team,
> 
> From the documentation I could see that log retention period is relative to 
> the partition segments. So unless the segment is closed (1Gb/1 Week) the 
> message will never be deleted.
> 
> Consider I have a very critical use case where I will need to delete the data 
> every day and my segment reaches 1GB after three days. So according to the above 
> statement, there is no way I could configure Kafka to delete the data in 
> one day.
> 
> Is my understanding correct ?
> 
> Regards,
> AK
> 
> 
> Please be advised that this email may contain confidential information. If 
> you are not the intended recipient, please notify us by email by replying to 
> the sender and delete this message. The sender disclaims that the content of 
> this email constitutes an offer to enter into, or the acceptance of, any 
> agreement; provided that the foregoing does not invalidate the binding effect 
> of any digital or other electronic reproduction of a manual signature that is 
> included in any attachment.
> 
> 
> 
> This message is intended exclusively for the individual or entity to which it 
> is addressed. This communication may contain information that is proprietary, 
> privileged, confidential or otherwise legally exempt from disclosure. If you 
> are not the named addressee, or have been inadvertently and erroneously 
> referenced in the address line, you are not authorized to read, print, 
> retain, copy or disseminate this message or any part of it. If you have 
> received this message in error, please notify the sender immediately by 
> e-mail and delete all copies of the message. (ID m031214)



Re: [DISCUSS] KIP-258: Allow to Store Record Timestamps in RocksDB

2018-03-09 Thread James Cheng
Matthias,

For all the upgrade paths, is it possible to get rid of the 2nd rolling bounce?

For the in-place upgrade, it seems like the primary difference between the 1st 
rolling bounce and the 2nd rolling bounce is to decide whether to send 
Subscription Version 2 or Subscription Version 3.  (Actually, there is another 
difference mentioned: the KIP says that the 2nd rolling bounce should 
happen after all new state stores are created by the background thread. 
However, within the 2nd rolling bounce, we say that there is still a background 
thread, so it seems like there is no actual requirement to wait for the new state 
stores to be created.)

The 2nd rolling bounce already knows how to deal with mixed-mode (having both 
Version 2 and Version 3 in the same consumer group). It seems like we could get 
rid of the 2nd bounce if we added logic (somehow/somewhere) such that:
* Instances send Subscription Version 2 until all instances are running the new 
code.
* Once all the instances are running the new code, then one at a time, the 
instances start sending Subscription V3. Leader still hands out Assignment 
Version 2, until all new state stores are ready.
* Once all instances report that new stores are ready, Leader sends out 
Assignment Version 3.
* Once an instance receives an Assignment Version 3, it can delete the old 
state store.

Doing it that way seems like it would reduce a lot of operator/deployment 
overhead. No need to do 2 rolling restarts. No need to monitor logs for state 
store rebuild. You just deploy it, and the instances update themselves.

What do you think?

The thing that made me think of this is that the "2 rolling bounces" is similar 
to what Kafka brokers have to do for changes in inter.broker.protocol.version and 
log.message.format.version. And in the broker case, it seems like it would be 
possible (with some work of course) to modify kafka to allow us to do similar 
auto-detection of broker capabilities and automatically do a switchover from 
old/new versions. 

-James


> On Mar 9, 2018, at 10:38 AM, Bill Bejeck  wrote:
> 
> Matthias,
> 
> Thanks for the KIP, it's a +1 from me.
> 
> I do have one question regarding the retrieval methods on the new
> interfaces.
> 
> Would you want to consider adding one method with a Predicate that would allow
> for filtering records by the timestamp stored with the record?  Or is this
> better left for users to implement themselves once the data has been
> retrieved?
> 
> Thanks,
> Bill
> 
> On Thu, Mar 8, 2018 at 7:14 PM, Ted Yu  wrote:
> 
>> Matthias:
>> For my point #1, I don't have preference as to which separator is chosen.
>> Given the background you mentioned, current choice is good.
>> 
>> For #2, I think my proposal is better since it is closer to English
>> grammar.
>> 
>> Would be good to listen to what other people think.
>> 
>> On Thu, Mar 8, 2018 at 4:02 PM, Matthias J. Sax 
>> wrote:
>> 
>>> Thanks for the comments!
>>> 
>>> @Guozhang:
>>> 
>>> So far, there is one PR for the rebalance metadata upgrade fix
>>> (addressing the mentioned
>>> https://issues.apache.org/jira/browse/KAFKA-6054) It give a first
>>> impression how the metadata upgrade works including a system test:
>>> https://github.com/apache/kafka/pull/4636
>>> 
>>> I can share other PRs as soon as they are ready. I agree that the KIP is
>>> complex am I ok with putting out more code to give better discussion
>>> context.
>>> 
>>> @Ted:
>>> 
>>> I picked `_` instead of `-` to align with the `processing.guarantee`
>>> parameter that accepts `at_least_once` and `exactly_once` as values.
>>> Personally, I don't care about underscore vs dash but I prefer
>>> consistency. If you feel strong about it, we can also change it to `-`.
>>> 
>>> About the interface name: I am fine either way -- I stripped the `With`
>>> to keep the name a little shorter. Would be good to get feedback from
>>> others and pick the name the majority prefers.
>>> 
>>> @John:
>>> 
>>> We can certainly change it. I agree that it would not make a difference.
>>> I'll dig into the code to see if any of the two version might introduce
>>> undesired complexity and update the KIP if I don't hit an issue with
>>> putting the `-v2` to the store directory instead of `rocksdb-v2`
>>> 
>>> 
>>> -Matthias
>>> 
>>> 
>>> On 3/8/18 2:44 PM, John Roesler wrote:
 Hey Matthias,
 
 The KIP looks good to me. I had several questions queued up, but they were
 all in the "rejected alternatives" section... oh, well.
 
 One very minor thought re changing the state directory from
 "/<state.dir>/<application.id>/<task.id>/rocksdb/storeName/" to
 "/<state.dir>/<application.id>/<task.id>/rocksdb-v2/storeName/": if you put
 the "v2" marker on the storeName part of the path (i.e.,
 "/<state.dir>/<application.id>/<task.id>/rocksdb/storeName-v2/"), then you
 get the same benefits without altering the high-level directory structure.
 
 It may not matter, but I could imagine 

Re: [VOTE] KIP-186: Increase offsets retention default to 7 days

2018-03-07 Thread James Cheng
+1 (non-binding)

-James

> On Mar 7, 2018, at 1:20 PM, Jay Kreps  wrote:
> 
> +1
> 
> I think we can improve this in the future, but this simple change will
> avoid a lot of pain. Thanks for reviving it Ewen.
> 
> -Jay
> 
> On Mon, Mar 5, 2018 at 11:35 AM, Ewen Cheslack-Postava 
> wrote:
> 
>> I'd like to kick off voting for KIP-186:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 186%3A+Increase+offsets+retention+default+to+7+days
>> 
>> This is the trivial fix that people in the DISCUSS thread were in favor of.
>> There are some ideas for further refinements, but I think we can follow up
>> with those in subsequent KIPs, see the discussion thread for details. Also
>> note this is related, but complementary, to
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets
>> .
>> 
>> And of course +1 (binding) from me.
>> 
>> Thanks,
>> Ewen
>> 



Re: [ANNOUNCE] Apache Kafka 1.0.1 Released

2018-03-06 Thread James Cheng
Congrats, everyone! Thanks for driving the release, Ewen!

-James

> On Mar 6, 2018, at 1:22 PM, Guozhang Wang  wrote:
> 
> Ewen, thanks for driving the release!!
> 
> 
> Guozhang
> 
> On Tue, Mar 6, 2018 at 1:14 PM, Ewen Cheslack-Postava  wrote:
> 
>> The Apache Kafka community is pleased to announce the release for Apache
>> Kafka
>> 1.0.1.
>> 
>> This is a bugfix release for the 1.0 branch that was first released with
>> 1.0.0 about 4 months ago. We've fixed 49 issues since that release. Most of
>> these are non-critical, but in aggregate these fixes will have significant
>> impact. A few of the more significant fixes include:
>> 
>> * KAFKA-6277: Make loadClass thread-safe for class loaders of Connect
>> plugins
>> * KAFKA-6185: Selector memory leak with high likelihood of OOM in case of
>> down conversion
>> * KAFKA-6269: KTable state restore fails after rebalance
>> * KAFKA-6190: GlobalKTable never finishes restoring when consuming
>> transactional messages
>> * KAFKA-6529: Stop file descriptor leak when client disconnects with
>> staged receives
>> * KAFKA-6238: Issues with protocol version when applying a rolling upgrade
>> to 1.0.0
>> 
>> 
>> All of the changes in this release can be found in the release notes:
>> 
>> 
>> https://dist.apache.org/repos/dist/release/kafka/1.0.1/RELEASE_NOTES.html
>> 
>> 
>> 
>> You can download the source release from:
>> 
>> 
>> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.1/
>> kafka-1.0.1-src.tgz
>> 
>> 
>> and binary releases from:
>> 
>> 
>> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.1/
>> kafka_2.11-1.0.1.tgz
>> (Scala 2.11)
>> 
>> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.1/
>> kafka_2.12-1.0.1.tgz
>> (Scala 2.12)
>> 
>> 
>> ---
>> 
>> 
>> Apache Kafka is a distributed streaming platform with four core APIs:
>> 
>> 
>> ** The Producer API allows an application to publish a stream records to
>> one or more Kafka topics.
>> 
>> 
>> ** The Consumer API allows an application to subscribe to one or more
>> topics and process the stream of records produced to them.
>> 
>> 
>> ** The Streams API allows an application to act as a stream processor,
>> consuming an input stream from one or more topics and producing an output
>> stream to one or more output topics, effectively transforming the input
>> streams to output streams.
>> 
>> 
>> ** The Connector API allows building and running reusable producers or
>> consumers that connect Kafka topics to existing applications or data
>> systems. For example, a connector to a relational database might capture
>> every change to a table.
>> 
>> 
>> 
>> With these APIs, Kafka can be used for two broad classes of application:
>> 
>> 
>> ** Building real-time streaming data pipelines that reliably get data
>> between systems or applications.
>> 
>> 
>> ** Building real-time streaming applications that transform or react to the
>> streams of data.
>> 
>> 
>> 
>> Apache Kafka is in use at large and small companies worldwide, including
>> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
>> Target, The New York Times, Uber, Yelp, and Zalando, among others.
>> 
>> 
>> 
>> A big thank you for the following 36 contributors to this release!
>> 
>> Alex Good, Andras Beni, Andy Bryant, Arjun Satish, Bill Bejeck, Colin P.
>> Mccabe, Colin Patrick McCabe, ConcurrencyPractitioner, Damian Guy, Daniel
>> Wojda, Dong Lin, Edoardo Comar, Ewen Cheslack-Postava, Filipe Agapito,
>> fredfp, Guozhang Wang, huxihx, Ismael Juma, Jason Gustafson, Jeremy
>> Custenborder, Jiangjie (Becket) Qin, Joel Hamill, Konstantine Karantasis,
>> lisa2lisa, Logan Buckley, Manjula K, Matthias J. Sax, Nick Chiu, parafiend,
>> Rajini Sivaram, Randall Hauch, Robert Yokota, Ron Dagostino, tedyu,
>> Yaswanth Kumar, Yu.
>> 
>> 
>> We welcome your help and feedback. For more information on how to
>> report problems,
>> and to get involved, visit the project website at http://kafka.apache.org/
>> 
>> 
>> Thank you!
>> Ewen
>> 
> 
> 
> 
> -- 
> -- Guozhang



Re: [DISCUSS] KIP-211: Revise Expiration Semantics of Consumer Group Offsets

2018-02-01 Thread James Cheng
Vahid,

Under rejected alternatives, we had decided that we did NOT want to do 
per-partition expiration, and instead we wait until the entire group is empty 
and then (after the right time has passed) expire the entire group at once.

I thought of one scenario that might benefit from per-partition expiration.

Let's say I have topics A B C... Z. So, I have 26 topics, all of them single 
partition, so 26 partitions. Let's say I have mirrormaker mirroring those 26 
topics. The group will then have 26 committed offsets.

Let's say I then change the whitelist on mirrormaker so that it only mirrors 
topic Z, but I keep the same consumer group name. (I imagine that is a common 
thing to do?)

With the proposed design for this KIP, the committed offsets for topics A 
through Y will stay around as long as this mirroring group name exists.

In the current implementation that already exists (prior to this KIP), I believe 
that committed offsets for topics A through Y will expire.

How much do we care about this case?

-James

> On Jan 23, 2018, at 11:44 PM, Jeff Widman  wrote:
> 
> Bumping this as I'd like to see it land...
> 
> It's one of the "features" that tends to catch Kafka n00bs unawares and
> typically results in message skippage/loss, vs the proposed solution is
> much more intuitive behavior.
> 
> Plus it's more wire efficient because consumers no longer need to commit
> offsets for partitions that have no new messages just to keep those offsets
> alive.
> 
> On Fri, Jan 12, 2018 at 10:21 AM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
> 
>> There has been no further discussion on this KIP for about two months.
>> So I thought I'd provide the scoop hoping it would spark additional
>> feedback and move the KIP forward.
>> 
>> The KIP proposes a method to preserve group offsets as long as the group
>> is not in Empty state (even when offsets are committed very rarely), and
>> start the offset expiration of the group as soon as the group becomes
>> Empty.
>> It suggests dropping the `retention_time` field from the `OffsetCommit`
>> request and, instead, enforcing it via the broker config
>> `offsets.retention.minutes` for all groups. In other words, all groups
>> will have the same retention time.
>> The KIP presumes that this global retention config would suffice common
>> use cases and does not lead to, e.g., unmanageable offset cache size (for
>> groups that don't need to stay around that long). It suggests opening
>> another KIP if this global retention setting proves to be problematic in
>> the future. It was suggested earlier in the discussion thread that the KIP
>> should propose a per-group retention config to circumvent this risk.
>> 
>> I look forward to hearing your thoughts. Thanks!
>> 
>> --Vahid
>> 
>> 
>> 
>> 
>> From:   "Vahid S Hashemian" 
>> To: dev 
>> Date:   10/18/2017 04:45 PM
>> Subject:[DISCUSS] KIP-211: Revise Expiration Semantics of Consumer
>> Group Offsets
>> 
>> 
>> 
>> Hi all,
>> 
>> I created a KIP to address the group offset expiration issue reported in
>> KAFKA-4682:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets
>> 
>> 
>> Your feedback is welcome!
>> 
>> Thanks.
>> --Vahid
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> 
> *Jeff Widman*
> jeffwidman.com  | 740-WIDMAN-J (943-6265)
> <><



Re: [ANNOUNCE] New Kafka PMC Member: Rajini Sivaram

2018-01-18 Thread James Cheng
Congrats Rajini!

-James

Sent from my iPhone

> On Jan 17, 2018, at 10:48 AM, Gwen Shapira  wrote:
> 
> Dear Kafka Developers, Users and Fans,
> 
> Rajini Sivaram became a committer in April 2017.  Since then, she remained
> active in the community and contributed major patches, reviews and KIP
> discussions. I am glad to announce that Rajini is now a member of the
> Apache Kafka PMC.
> 
> Congratulations, Rajini and looking forward to your future contributions.
> 
> Gwen, on behalf of Apache Kafka PMC


Re: [VOTE] KIP-247: Add public test utils for Kafka Streams

2018-01-18 Thread James Cheng
+1 (non-binding)

-James

Sent from my iPhone

> On Jan 17, 2018, at 6:09 PM, Matthias J. Sax  wrote:
> 
> Hi,
> 
> I would like to start the vote for KIP-247:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams
> 
> 
> -Matthias
> 


Re: [ANNOUNCE] New committer: Matthias J. Sax

2018-01-12 Thread James Cheng
Congrats, Matthias!! Well deserved!

-James

> On Jan 12, 2018, at 2:59 PM, Guozhang Wang  wrote:
> 
> Hello everyone,
> 
> The PMC of Apache Kafka is pleased to announce Matthias J. Sax as our
> newest Kafka committer.
> 
> Matthias has made tremendous contributions to Kafka Streams API since early
> 2016. His footprint has been all over the places in Streams: in the past
> two years he has been the main driver on improving the join semantics
> inside Streams DSL, summarizing all their shortcomings and bridging the
> gaps; he has also been largely working on the exactly-once semantics of
> Streams by leveraging on the transaction messaging feature in 0.11.0. In
> addition, Matthias have been very active in community activity that goes
> beyond mailing list: he's getting the close to 1000 up votes and 100
> helpful flags on SO for answering almost all questions about Kafka Streams.
> 
> Thank you for your contribution and welcome to Apache Kafka, Matthias!
> 
> 
> 
> Guozhang, on behalf of the Apache Kafka PMC



Pull Request emails are not being sent to the mailing list

2018-01-02 Thread James Cheng
Since Dec 21st, it seems that email notifications of Github pull requests are 
not being sent to the dev mailing list. Is this a bug, or have PR emails moved 
to somewhere else? If it’s a bug, can we look into this?

The last email PR that was sent was this one: 
http://mail-archives.apache.org/mod_mbox/kafka-dev/201712.mbox/%3cgit-pr-4351-ka...@git.apache.org%3e
 


And there have been a bunch of PR’s created since then, and their emails don’t 
appear to be in the mailing list. https://github.com/apache/kafka/pulls 


Is this related to the Gitbox transition? It looks like the the Gitbox 
transition happened just 3 hours after that last PR email. 
http://mail-archives.apache.org/mod_mbox/kafka-dev/201712.mbox/%3cCAD5tkZZqHcM009kEPx7ccSy=c_bnaol4mok1pgsesei-vnh...@mail.gmail.com%3e
 


Thanks,
-James



[jira] [Created] (KAFKA-6312) Add documentation about kafka-consumer-groups.sh's ability to set/change offsets

2017-12-05 Thread James Cheng (JIRA)
James Cheng created KAFKA-6312:
--

 Summary: Add documentation about kafka-consumer-groups.sh's 
ability to set/change offsets
 Key: KAFKA-6312
 URL: https://issues.apache.org/jira/browse/KAFKA-6312
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Reporter: James Cheng


KIP-122 added the ability for kafka-consumer-groups.sh to reset/change consumer 
offsets, at a fine grained level.

There is documentation on it in the kafka-consumer-groups.sh usage text. 

There is no such documentation on the kafka.apache.org website. We should add 
some documentation to the website, so that users can read about the 
functionality without having the tools installed.





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] KIP-227: Introduce Incremental FetchRequests to Increase Partition Scalability

2017-11-22 Thread James Cheng
I think the discussion may have already cover this but just in case...

How does the leader decide when a newly written message is "committed" enough 
to hand out to consumers?

When a message is produced and is stored to the disk of the leader, the message 
is not considered "committed" until it has hit all replicas in the ISR. Only at 
that point will the leader decide to hand out the message to normal consumers.

In the current protocol, I believe the leader has to wait for 2 fetch requests 
from a follower before it considers the message committed: One to fetch the 
uncommitted message, and another to fetch anything after that. It is the fetch 
offset in the 2nd fetch that tells the leader that the follower now has the 
uncommitted message.

As an example:
1a. Newly produced messages at offsets 10,11,12. Saved to leader, not yet 
replicated to followers.
2a. Follower asks for messages starting at offset 10. Leader hands out messages 
10,11,12
3a. Follower asks for messages starting at offset 13. Based on that fetch 
request, the leader concludes that the follower already has messages 10,11,12, 
and so will now hand messages 10,11,12 out to consumers.

How will the new protocol handle that? How will the leader know that the 
follower already has messages 10,11,12?

In particular, how will the new protocol handle the case when not all 
partitions are returned in each request?

Another example:
1b. Newly produced messages to topic A at offsets 10,11,12. Saved to leader, 
not yet replicated to followers.
2b. Newly produced 1MB message to topic B at offset 100. Saved to leader, not 
yet replicated to follower.
3b. Follower asks for messages from topic A starting at offset 10, and messages 
from topic B starting at offset 100.
4b. Leader decides to send to the follower the 1MB message at topic B offset 
100. Due to replica.fetch.max.bytes, it only sends that single message to the 
follower.
5b. Follower asks for messages from topic A starting at offset 10, and messages 
from topic B starting at offset 101. Leader concludes that topic B offset 100 
has been replicated and so can be handed out to consumers. Topic A messages 
10,11,12 are not yet replicated and so cannot yet be handed out to consumers.

In this particular case, the follower made no progress on replicating the new 
messages from topic A.

How will the new protocol handle this scenario?

-James

> On Nov 22, 2017, at 7:54 PM, Colin McCabe  wrote:
> 
> Oh, I see the issue now.  The broker uses sendfile() and sends some
> message data without knowing what the ending offset is.  To learn that,
> we would need another index access.
> However, when we do that index->offset lookup, we know that the next
> offset->index lookup (done in the following fetch request) will be for the
> same offset.  So we should be able to cache the result (the index).  Also:
> Does the operating system’s page cache help us here?
> Best,
> Colin
> 
> On Wed, Nov 22, 2017, at 16:53, Jun Rao wrote:
>> Hi, Colin,
>> 
>> After step 3a, do we need to update the cached offset in the
>> leader to be the last offset in the data returned in the fetch response? If
>> so, we need another offset index lookup since the leader only knows that it
>> gives out X bytes in the fetch response, but not the last offset in those X
>> bytes.
>> Thanks,
>> 
>> Jun
>> 
>> On Wed, Nov 22, 2017 at 4:01 PM, Colin McCabe
>>  wrote:>
>>> On Wed, Nov 22, 2017, at 14:09, Jun Rao wrote:
 Hi, Colin,
 
 When fetching data for a partition, the leader needs to
 translate the> > > fetch offset to a position in a log segment with an 
 index lookup.
 If the> > fetch
 request now also needs to cache the offset for the next fetch
 request,> > > there will be an extra offset index lookup.
>>> 
>>> Hmm.  So the way I was thinking about it was, with an
>>> incremental fetch> > request, for each partition:
>>> 
>>> 1a. the leader consults its cache to find the offset it needs to
>>> use for> > the fetch request
>>> 2a. the leader performs a lookup to translate the offset to a
>>> file index> > 3a. the leader reads the data from the file
>>> 
>>> In contrast, with a full fetch request, for each partition:
>>> 
>>> 1b. the leader looks at the FetchRequest to find the offset it
>>> needs to> > use for the fetch request
>>> 2b. the leader performs a lookup to translate the offset to a
>>> file index> > 3b. the leader reads the data from the file
>>> 
>>> It seems like there is only one offset index lookup in both
>>> cases?  The> > key point is that the cache in step #1a is not stored on 
>>> disk.
>>> Or maybe> > I'm missing something here.
>>> 
>>> best,
>>> Colin
>>> 
>>> 
 The offset index lookup can
 potentially be expensive since it could require disk I/Os. One
 way to> > > optimize this a bit is to further cache the log segment 
 position
 for the> > > next offset. The tricky issue is that for a compacted topic, 

Re: [DISCUSS] KIP-225 - Use tags for consumer “records.lag” metrics

2017-11-16 Thread James Cheng
Ah, that's a great point. KIP-153 didn't *rename* the metric but changed its 
meaning, yet we didn't seem to discuss compatibility much when we made that 
change.

If the Kafka devs can comment on the backwards-compatibility guarantees for metrics 
and how we treat them, that would be helpful.

-James

> On Nov 16, 2017, at 2:06 AM, charly molter <charly.mol...@gmail.com> wrote:
> 
> Yes James you are right.
> I wasn't sure what to do about it and followed what happened with BytesOut
> in KIP-153 which completely changed meaning without any deprecation window.
> I'm happy to adapt my KIP if the community thinks we should duplicate the
> metric for a while.
> 
> Thanks!
> 
> On Thu, Nov 16, 2017 at 8:13 AM, James Cheng <wushuja...@gmail.com> wrote:
> 
>> This KIP will break backwards compatibility for anyone who is using the
>> existing attribute names.
>> 
>> Kafka devs, I believe that metrics are a supported interface, and so this
>> would be a breaking change. In order to do this, we would need a
>> deprecation timeframe for the old metric, and a transition plan to the new
>> name. Is that right? I'm not sure how we deprecate metrics...
>> 
>> During the deprecation timeframe, we could duplicate the metric to the new
>> name.
>> 
>> -James
>> 
>> On Nov 13, 2017, at 6:09 AM, charly molter <charly.mol...@gmail.com>
>> wrote:
>>> 
>>> Hi,
>>> 
>>> There doesn't seem to be much opposition to this KIP, I'll leave a couple
>>> more days before starting the vote.
>>> 
>>> Thanks!
>>> 
>>> On Thu, Nov 9, 2017 at 1:59 PM, charly molter <charly.mol...@gmail.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I'd like to start the discussion on KIP-225.
>>>> 
>>>> This KIP tries to correct the way the consumer lag metrics are reported
>> to
>>>> use built in tags from MetricName.
>>>> 
>>>> Here's the link:
>>>> https://cwiki.apache.org/confluence/pages/viewpage.
>> action?pageId=74686649
>>>> 
>>>> Thanks!
>>>> --
>>>> Charly Molter
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Charly Molter
>> 
>> 
> 
> 
> -- 
> Charly Molter



Re: [DISCUSS] KIP-225 - Use tags for consumer “records.lag” metrics

2017-11-16 Thread James Cheng
This KIP will break backwards compatibility for anyone who is using the 
existing attribute names.

Kafka devs, I believe that metrics are a supported interface, and so this would 
be a breaking change. In order to do this, we would need a deprecation 
timeframe for the old metric, and a transition plan to the new name. Is that 
right? I'm not sure how we deprecate metrics...

During the deprecation timeframe, we could duplicate the metric to the new name.

-James

On Nov 13, 2017, at 6:09 AM, charly molter  wrote:
> 
> Hi,
> 
> There doesn't seem to be much opposition to this KIP, I'll leave a couple
> more days before starting the vote.
> 
> Thanks!
> 
> On Thu, Nov 9, 2017 at 1:59 PM, charly molter 
> wrote:
> 
>> Hi,
>> 
>> I'd like to start the discussion on KIP-225.
>> 
>> This KIP tries to correct the way the consumer lag metrics are reported to
>> use built in tags from MetricName.
>> 
>> Here's the link:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74686649
>> 
>> Thanks!
>> --
>> Charly Molter
>> 
> 
> 
> 
> -- 
> Charly Molter



Re: [DISCUSS] KIP-211: Revise Expiration Semantics of Consumer Group Offsets

2017-11-16 Thread James Cheng
How fast does the in-memory cache grow?

As a random datapoint...

10 months ago we set our offsets.retention.minutes to 1 year. So, for the past 
10 months, we essentially have not expired any offsets.

Via JMX, one of our brokers says 
kafka.coordinator.group:type=GroupMetadataManager,name=NumOffsets
Value=153552

I don't know how that maps into memory usage. Are the keys dependent on topic names 
and group names?

And, of course, that number is highly dependent on cluster usage, so I'm not 
sure if we are able to generalize anything from it.

-James

> On Nov 15, 2017, at 5:05 PM, Vahid S Hashemian  
> wrote:
> 
> Thanks Jeff.
> 
> I believe the in-memory cache size is currently unbounded.
> As you mentioned the size of this cache on each broker is a factor of the 
> number of consumer groups (whose coordinator is on that broker) and the 
> number of partitions in each group.
> With compaction in mind, the cache size could be manageable even with the 
> current KIP.
> We could also consider implementing KAFKA-4664 to minimize the cache size: 
> https://issues.apache.org/jira/browse/KAFKA-5664.
> 
> It would be great to hear feedback from others (and committers) on this.
> 
> --Vahid
> 
> 
> 
> 
> From:   Jeff Widman 
> To: dev@kafka.apache.org
> Date:   11/15/2017 01:04 PM
> Subject:Re: [DISCUSS] KIP-211: Revise Expiration Semantics of 
> Consumer Group Offsets
> 
> 
> 
> I thought about this scenario as well.
> 
> However, my conclusion was that because __consumer_offsets is a compacted
> topic, this extra clutter from short-lived consumer groups is negligible.
> 
> The disk size is the product of the number of consumer groups and the
> number of partitions in the group's subscription. Typically I'd expect 
> that
> for short-lived consumer groups, that number < 100K.
> 
> The one area I wasn't sure of was how the group coordinator's in-memory
> cache of offsets works. Is it a pull-through cache of unbounded size or
> does it contain all offsets of all groups that use that broker as their
> coordinator? If the latter, possibly there's an OOM risk there. If so,
> might be worth investigating changing the cache design to a bounded size.
> 
> Also, switching to this design means that consumer groups no longer need 
> to
> commit all offsets, they only need to commit the ones that changed. I
> expect in certain cases there will be broker-side performance gains due to
> parsing smaller OffsetCommit requests. For example, due to some bad design
> decisions we have some a couple of topics that have 1500 partitions of
> which ~10% are regularly used. So 90% of the OffsetCommit request
> processing is unnecessary.
> 
> 
> 
> On Wed, Nov 15, 2017 at 11:27 AM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
> 
>> I'm forwarding this feedback from John to the mailing list, and 
> responding
>> at the same time:
>> 
>> John, thanks for the feedback. I agree that the scenario you described
>> could lead to unnecessary long offset retention for other consumer 
> groups.
>> If we want to address that in this KIP we could either keep the
>> 'retention_time' field in the protocol, or propose a per group retention
>> configuration.
>> 
>> I'd like to ask for feedback from the community on whether we should
>> design and implement a per-group retention configuration as part of this
>> KIP; or keep it simple at this stage and go with one broker level 
> setting
>> only.
>> Thanks in advance for sharing your opinion.
>> 
>> --Vahid
>> 
>> 
>> 
>> 
>> From:   John Crowley 
>> To: vahidhashem...@us.ibm.com
>> Date:   11/15/2017 10:16 AM
>> Subject:[DISCUSS] KIP-211: Revise Expiration Semantics of 
> Consumer
>> Group Offsets
>> 
>> 
>> 
>> Sorry for the clutter, first found KAFKA-3806, then -4682, and finally
>> this KIP - they have more detail which I’ll avoid duplicating here.
>> 
>> Think that not starting the expiration until all consumers have ceased,
>> and clearing all offsets at the same time, does clean things up and 
> solves
>> 99% of the original issues - and 100% of my particular concern.
>> 
>> A valid use-case may still have a periodic application - say production
>> applications posting to Topics all week, and then a weekend batch job
>> which consumes all new messages.
>> 
>> Setting offsets.retention.minutes = 10 days does cover this but at the
>> cost of extra clutter if there are other consumer groups which are truly
>> created/used/abandoned on a frequent basis. Being able to set
>> offsets.retention.minutes on a per groupId basis allows this to also be
>> covered cleanly, and makes it visible that these groupIds are a special
>> case.
>> 
>> But relatively minor, and should not delay the original KIP.
>> 
>> Thanks,
>> 
>> John Crowley
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> 
> *Jeff Widman*
> jeffwidman.com <
> 

[jira] [Created] (KAFKA-6197) Difficult to get to the Kafka Streams javadocs

2017-11-09 Thread James Cheng (JIRA)
James Cheng created KAFKA-6197:
--

 Summary: Difficult to get to the Kafka Streams javadocs
 Key: KAFKA-6197
 URL: https://issues.apache.org/jira/browse/KAFKA-6197
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Reporter: James Cheng


In order to get to the javadocs for the Kafka producer/consumer/streams, I 
typically go to http://kafka.apache.org/documentation/ and click on either 2.1, 
2.2, or 2.3 in the table of contents to go right to the appropriate section.

The link for "Streams API" now goes to the (very nice) 
http://kafka.apache.org/10/documentation/streams/. That page doesn't have a 
direct link to the Javadocs anywhere. The examples and guides actually 
frequently mention "See javadocs for details" but there are no direct links to 
it.

If I instead go back to the main page and scroll directly to section 2.3, there 
is still the link to get to the javadocs. But it's harder to jump immediately 
to it. And it's a little confusing that section 2.3 in the table of contents 
does not link you to section 2.3 of the page.

It would be nice if the link to the Streams javadocs was easier to get to.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread James Cheng
Congrats Onur! Well deserved!

-James

> On Nov 6, 2017, at 9:24 AM, Jun Rao  wrote:
> 
> Hi, everyone,
> 
> The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur
> Karaman.
> 
> Onur's most significant work is the improvement of Kafka controller, which
> is the brain of a Kafka cluster. Over time, we have accumulated quite a few
> correctness and performance issues in the controller. There have been
> attempts to fix controller issues in isolation, which would make the code
> base more complicated without a clear path of solving all problems. Onur is
> the one who took a holistic approach, by first documenting all known
> issues, writing down a new design, coming up with a plan to deliver the
> changes in phases and executing on it. At this point, Onur has completed
> the two most important phases: making the controller single threaded and
> changing the controller to use the async ZK api. The former fixed multiple
> deadlocks and race conditions. The latter significantly improved the
> performance when there are many partitions. Experimental results show that
> Onur's work reduced the controlled shutdown time by a factor of 100 times
> and the controller failover time by a factor of 3 times.
> 
> Congratulations, Onur!
> 
> Thanks,
> 
> Jun (on behalf of the Apache Kafka PMC)



Re: [ANNOUNCE] Apache Kafka 1.0.0 Released

2017-11-01 Thread James Cheng
ilities:
> 
> 
> With these APIs, Kafka can be used for two broad classes of application:
> 
> ** Building real-time streaming data pipelines that reliably get data between
> systems or applications.
> 
> ** Building real-time streaming applications that transform or react
> to the streams
> of data.
> 
> 
> Apache Kafka is in use at large and small companies worldwide, including
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
> 
> 
> A big thank you for the following 108 contributors to this release!
> 
> Abhishek Mendhekar, Xi Hu, Andras Beni, Andrey Dyachkov, Andy Chambers,
> Apurva Mehta, Armin Braun, Attila Kreiner, Balint Molnar, Bart De Vylder,
> Ben Stopford, Bharat Viswanadham, Bill Bejeck, Boyang Chen, Bryan Baugher,
> Colin P. Mccabe, Koen De Groote, Dale Peakall, Damian Guy, Dana Powers,
> Dejan Stojadinović, Derrick Or, Dong Lin, Zhendong Liu, Dustin Cote,
> Edoardo Comar, Eno Thereska, Erik Kringen, Erkan Unal, Evgeny Veretennikov,
> Ewen Cheslack-Postava, Florian Hussonnois, Janek P, Gregor Uhlenheuer,
> Guozhang Wang, Gwen Shapira, Hamidreza Afzali, Hao Chen, Jiefang He, Holden
> Karau, Hooman Broujerdi, Hugo Louro, Ismael Juma, Jacek Laskowski, Jakub
> Scholz, James Cheng, James Chien, Jan Burkhardt, Jason Gustafson, Jeff
> Chao, Jeff Klukas, Jeff Widman, Jeremy Custenborder, Jeyhun Karimov,
> Jiangjie Qin, Joel Dice, Joel Hamill, Jorge Quilcate Otoya, Kamal C, Kelvin
> Rutt, Kevin Lu, Kevin Sweeney, Konstantine Karantasis, Perry Lee, Magnus
> Edenhill, Manikumar Reddy, Manikumar Reddy O, Manjula Kumar, Mariam John,
> Mario Molina, Matthias J. Sax, Max Zheng, Michael Andre Pearce, Michael
> André Pearce, Michael G. Noll, Michal Borowiecki, Mickael Maison, Nick
> Pillitteri, Oleg Prozorov, Onur Karaman, Paolo Patierno, Pranav Maniar,
> Qihuang Zheng, Radai Rosenblatt, Alex Radzish, Rajini Sivaram, Randall
> Hauch, Richard Yu, Robin Moffatt, Sean McCauliff, Sebastian Gavril, Siva
> Santhalingam, Soenke Liebau, Stephane Maarek, Stephane Roset, Ted Yu,
> Thibaud Chardonnens, Tom Bentley, Tommy Becker, Umesh Chaudhary, Vahid
> Hashemian, Vladimír Kleštinec, Xavier Léauté, Xianyang Liu, Xin Li, Linhua
> Xin
> 
> 
> We welcome your help and feedback. For more information on how to report
> problems, and to get involved, visit the project website at
> http://kafka.apache.org/
> 
> 
> 
> 
> Thanks,
> Guozhang Wang



[jira] [Resolved] (KAFKA-6088) Kafka Consumer slows down when reading from highly compacted topics

2017-10-18 Thread James Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Cheng resolved KAFKA-6088.

Resolution: Won't Fix

This is fixed in the 0.11.0.0 Kafka client, and 0.11.0.0 clients can be used against 
brokers as far back as 0.10.0.0. Anyone who is affected can therefore update 
their Kafka clients to get the fix, so we won't issue a patch fix to 
older releases.

> Kafka Consumer slows down when reading from highly compacted topics
> ---
>
> Key: KAFKA-6088
> URL: https://issues.apache.org/jira/browse/KAFKA-6088
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.2.1
>    Reporter: James Cheng
> Fix For: 0.11.0.0
>
>
> Summary of the issue
> -
> We found a performance issue with the Kafka Consumer where it gets less 
> efficient if you have frequent gaps in offsets (which happens when there is 
> lots of compaction on the topic).
> The issue is present in 0.10.2.1 and possibly prior.
> It is fixed in 0.11.0.0.
> Summary of cause
> -
> The fetcher code assumes that there will be no gaps in message offsets. If 
> there are, it does an additional round trip to the broker. For topics with 
> large gaps in offsets, it is possible that most calls to {{poll()}} will 
> generate a roundtrip to the broker.
> Background and details 
> -
> We have a topic with roughly 8 million records. The topic is log compacted. 
> It turns out that most of the initial records in the topic were never 
> overwritten, whereas in the 2nd half of the topic we had lots of overwritten 
> records. That means that for the first part of the topic, there are no gaps 
> in offsets. But in the 2nd part of the topic, there are frequent gaps in the 
> offsets (due to records being compacted away).
> We have a consumer that starts up and reads the entire topic from beginning 
> to end. We noticed that the consumer would read through the first part of the 
> topic very quickly. When it got to the part of the topic with frequent gaps 
> in offsets, consumption rate slowed down dramatically. This slowdown was 
> consistent across multiple runs.
> What is happening is this:
> 1) A call to {{poll()}} happens. The consumer goes to the broker and returns 
> 1MB of data (the default of {{max.partition.fetch.bytes}}). It then returns 
> to the caller just 500 records (the default of {{max.poll.records}}), and 
> keeps the rest of the data in memory to use in future calls to {{poll()}}. 
> 2) Before returning the 500 records, the consumer library records the *next* 
> offset it should return. It does so by taking the offset of the last record, 
> and adds 1 to it. (The offset of the 500th message from the set, plus 1). It 
> calls this the {{nextOffset}}
> 3) The application finishes processing the 500 messages, and makes another 
> call to {{poll()}}. During this call, the consumer library does a 
> sanity check. It checks that the first message of the set *it is about to 
> return* has an offset that matches the value of {{nextOffset}}. That is, it 
> checks whether the 501st record has an offset that is 1 greater than the 500th 
> record.
>   a. If it matches, then it returns an additional 500 records, and 
> increments the {{nextOffset}} to (offset of the 1000th record, plus 1)
>   b. If it doesn't match, then it throws away the remainder of the 1MB of 
> data that it stored in memory in step 1, and it goes back to the broker to 
> fetch an additional 1MB of data, starting at the offset {{nextOffset}}.
> If a topic has no gaps (a non-compacted topic), then the code will always hit 
> the 3a code path.
> If the topic has gaps in offsets and the call to {{poll()}} happens to fall 
> onto a gap, then the code will hit code path 3b.
> If the gaps are frequent, then it will frequently hit code path 3b.
> The worst case scenario that can happen is if you have a large number of 
> gaps, and you run with {{max.poll.records=1}}. Every gap will result in a new 
> fetch to the broker. You may possibly end up only processing one message per 
> fetch. Or, said another way, you will end up doing a single fetch for every 
> single message in the partition.
> Repro
> -
> We created a repro. It appears that the bug is in 0.10.2.1, but was fixed in 
> 0.11. I've attached the tarball with all the code and instructions. 
> The repro is:
> 1) Create a single partition topic with log compaction turned on 
> 2) Write messages with the following keys: 1 1 2 2 3 3 4 4 5 5 ... (each 
> message key written twice in a row) 
> 3) Let com

[jira] [Created] (KAFKA-6088) Kafka Consumer slows down when reading from highly compacted topics

2017-10-18 Thread James Cheng (JIRA)
James Cheng created KAFKA-6088:
--

 Summary: Kafka Consumer slows down when reading from highly 
compacted topics
 Key: KAFKA-6088
 URL: https://issues.apache.org/jira/browse/KAFKA-6088
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.10.2.1
Reporter: James Cheng
 Fix For: 0.11.0.0


Summary of the issue
-
We found a performance issue with the Kafka Consumer where it gets less 
efficient if you have frequent gaps in offsets (which happens when there is 
lots of compaction on the topic).

The issue is present in 0.10.2.1 and possibly prior.

It is fixed in 0.11.0.0.

Summary of cause
-
The fetcher code assumes that there will be no gaps in message offsets. If 
there are, it does an additional round trip to the broker. For topics with 
large gaps in offsets, it is possible that most calls to {{poll()}} will 
generate a roundtrip to the broker.

Background and details 
-
We have a topic with roughly 8 million records. The topic is log compacted. It 
turns out that most of the initial records in the topic were never overwritten, 
whereas in the 2nd half of the topic we had lots of overwritten records. That 
means that for the first part of the topic, there are no gaps in offsets. But 
in the 2nd part of the topic, there are frequent gaps in the offsets (due to 
records being compacted away).

We have a consumer that starts up and reads the entire topic from beginning to 
end. We noticed that the consumer would read through the first part of the 
topic very quickly. When it got to the part of the topic with frequent gaps in 
offsets, consumption rate slowed down dramatically. This slowdown was 
consistent across multiple runs.

What is happening is this:
1) A call to {{poll()}} happens. The consumer goes to the broker and returns 
1MB of data (the default of {{max.partition.fetch.bytes}}). It then returns to 
the caller just 500 records (the default of {{max.poll.records}}), and keeps 
the rest of the data in memory to use in future calls to {{poll()}}. 
2) Before returning the 500 records, the consumer library records the *next* 
offset it should return. It does so by taking the offset of the last record, 
and adds 1 to it. (The offset of the 500th message from the set, plus 1). It 
calls this the {{nextOffset}}
3) The application finishes processing the 500 messages, and makes another call 
to {{poll()}}. During this call, the consumer library does a sanity 
check. It checks that the first message of the set *it is about to return* has 
an offset that matches the value of {{nextOffset}}. That is, it checks whether the 
501st record has an offset that is 1 greater than the 500th record.
a. If it matches, then it returns an additional 500 records, and 
increments the {{nextOffset}} to (offset of the 1000th record, plus 1)
b. If it doesn't match, then it throws away the remainder of the 1MB of 
data that it stored in memory in step 1, and it goes back to the broker to 
fetch an additional 1MB of data, starting at the offset {{nextOffset}}.

If a topic has no gaps (a non-compacted topic), then the code will always hit 
the 3a code path.
If the topic has gaps in offsets and the call to {{poll()}} happens to fall 
onto a gap, then the code will hit code path 3b.

If the gaps are frequent, then it will frequently hit code path 3b.

The worst case scenario that can happen is if you have a large number of gaps, 
and you run with {{max.poll.records=1}}. Every gap will result in a new fetch 
to the broker. You may possibly end up only processing one message per fetch. 
Or, said another way, you will end up doing a single fetch for every single 
message in the partition.
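
To illustrate the sanity check in step 3, here is a simplified sketch of the behavior described above. This is not the actual Fetcher code; the class and the simulated fetch are made up purely to show why every gap can turn into an extra round trip.

{code}
// Illustrative sketch only -- not the actual consumer Fetcher code.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class GapSensitiveFetcherSketch {
    private final Deque<Long> buffered = new ArrayDeque<>(); // offsets left over from the last broker fetch
    private long nextOffset = 0L;                            // the offset the next poll() expects to see first
    int roundTrips = 0;                                      // counts simulated broker fetches

    List<Long> poll(int maxPollRecords) {
        if (!buffered.isEmpty() && buffered.peekFirst() != nextOffset) {
            // Step 3b: the buffered data does not start at nextOffset (a gap left by
            // compaction), so the rest of the buffered data is thrown away.
            buffered.clear();
        }
        if (buffered.isEmpty()) {
            fetchFromBroker(nextOffset); // extra round trip to the broker
        }
        // Step 3a: return up to maxPollRecords from the buffer and advance nextOffset.
        List<Long> result = new ArrayList<>();
        while (result.size() < maxPollRecords && !buffered.isEmpty()) {
            result.add(buffered.pollFirst());
        }
        if (!result.isEmpty()) {
            nextOffset = result.get(result.size() - 1) + 1;
        }
        return result;
    }

    private void fetchFromBroker(long fromOffset) {
        roundTrips++;
        // Stand-in for a network fetch: simulate a compacted topic where every
        // even offset has been compacted away, leaving a gap at every other offset.
        for (long offset = fromOffset; offset < fromOffset + 1000; offset++) {
            if (offset % 2 == 1) {
                buffered.add(offset);
            }
        }
    }
}
{code}

With {{max.poll.records=1}} and a gap at every other offset, every call to {{poll()}} on this sketch discards the buffer and triggers a new fetch, which matches the worst case described above.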


Repro
-

We created a repro. It appears that the bug is in 0.10.2.1, but was fixed in 
0.11. I've attached the tarball with all the code and instructions. 

The repro is:
1) Create a single partition topic with log compaction turned on 
2) Write messages with the following keys: 1 1 2 2 3 3 4 4 5 5 ... (each 
message key written twice in a row) 
3) Let compaction happen. This would mean that offsets 0, 2, 4, 6, 8, 10, ... 
would be compacted away 
4) Consume from this topic with {{max.poll.records=1}}

More concretely,

Here is the producer code:
{code}
Producer<String, String> producer = new KafkaProducer<String, String>(props); 
for (int i = 0; i < 100; i++) { 
producer.send(new ProducerRecord<String, String>("compacted", 
Integer.toString(i), Integer.toString(i))); 
producer.send(new ProducerRecord<String, String>("compacted", 
Integer.toString(i), Integer.toString(i))); 
} 
producer.flush(); 
producer.close();
{code}
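
For step 4, the consumer side is just a plain consumer with {{max.poll.records=1}}. For reference, it could look something like this (configuration values are only illustrative; this mirrors the style of the producer snippet above):

{code}
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // illustrative value
props.put("group.id", "compaction-repro");          // illustrative value
props.put("max.poll.records", "1");
props.put("auto.offset.reset", "earliest");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Collections.singletonList("compacted"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.println("offset = " + record.offset() + ", key = " + record.key());
    }
}
{code}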


When consuming with a 0.10.2.1 consumer, you can see this pattern (with Fetcher 
logs at DEBUG, see file consumer_0.10.2/debug.log):

{code}
offset = 1

[jira] [Created] (KAFKA-6054) ERROR "SubscriptionInfo - unable to decode subscription data: version=2" when upgrading from 0.10.0.0 to 0.10.2.1

2017-10-11 Thread James Cheng (JIRA)
James Cheng created KAFKA-6054:
--

 Summary: ERROR "SubscriptionInfo - unable to decode subscription 
data: version=2" when upgrading from 0.10.0.0 to 0.10.2.1
 Key: KAFKA-6054
 URL: https://issues.apache.org/jira/browse/KAFKA-6054
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.2.1
Reporter: James Cheng


We upgraded an app from kafka-streams 0.10.0.0 to 0.10.2.1. We did a rolling 
upgrade of the app, so at one point there were both 0.10.0.0-based instances 
and 0.10.2.1-based instances running. 

We observed the following stack trace:

{code}
2017-10-11 07:02:19.964 [StreamThread-3] ERROR o.a.k.s.p.i.a.SubscriptionInfo -
unable to decode subscription data: version=2
org.apache.kafka.streams.errors.TaskAssignmentException: unable to decode
subscription data: version=2
at 
org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfo.decode(SubscriptionInfo.java:113)
at 
org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.assign(StreamPartitionAssignor.java:235)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:260)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:404)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$900(AbstractCoordinator.java:81)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:358)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:340)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)

{code}

I spoke with [~mjsax] and he said this is a known issue that happens when you 
have both 0.10.0.0 instances and 0.10.2.1 instances running at the same time, 
because the internal version number of the protocol changed when adding 
Interactive Queries. Matthias asked me to file this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5952) Refactor Consumer Fetcher metrics

2017-09-21 Thread James Cheng (JIRA)
James Cheng created KAFKA-5952:
--

 Summary: Refactor Consumer Fetcher metrics
 Key: KAFKA-5952
 URL: https://issues.apache.org/jira/browse/KAFKA-5952
 Project: Kafka
  Issue Type: Sub-task
Reporter: James Cheng
Assignee: James Cheng






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5951) Autogenerate Producer RecordAccumulator metrics

2017-09-21 Thread James Cheng (JIRA)
James Cheng created KAFKA-5951:
--

 Summary: Autogenerate Producer RecordAccumulator metrics
 Key: KAFKA-5951
 URL: https://issues.apache.org/jira/browse/KAFKA-5951
 Project: Kafka
  Issue Type: Sub-task
Reporter: James Cheng






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: 1.0.0 KIPs Update

2017-09-20 Thread James Cheng
Thanks Ismael. Yes, you're right, PR 3799 is purely a refactoring.

There are new subtasks of https://issues.apache.org/jira/browse/KAFKA-3480 
that I haven't yet filed, that I want to work on. They will add auto-generation 
of metrics documentation for more areas of the codebase. So I was trying to 
figure out what my chances were of getting them into 1.0.0, to help me 
prioritize when I should work on them.

But to your general point, it sounds like if I make my case and the community 
agrees that its worth the cost/benefit, that it might be eligible to get into 
1.0.0. Sounds perfect. Thanks!

-James

> On Sep 20, 2017, at 3:14 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> 
> Hi James,
> 
> Isn't PR 3799 purely a code refactoring (i.e. no change in behaviour)? If
> so, it would probably land in trunk only after today. If it's new
> functionality, then it depends on size, risk and importance. If you think
> it's important and should target 1.0.0, feel free to make your case in the
> JIRA or PR.
> 
> Thanks,
> Ismael
> 
> On Wed, Sep 20, 2017 at 10:59 PM, James Cheng <wushuja...@gmail.com> wrote:
> 
>> Hi. I'm a little unclear on what types of things are subject to the
>> "Feature Freeze" today (Sept 20th).
>> 
>> For changes such as https://github.com/apache/kafka/pull/3799 <
>> https://github.com/apache/kafka/pull/3799> and https://issues.apache.org/
>> jira/browse/KAFKA-3480 <https://issues.apache.org/jira/browse/KAFKA-3480>
>> that don't require KIPs, is today the freeze date? If I have PR created but
>> not yet reviewed, does that mean it is eligible to get into 1.0.0?
>> 
>> And if I have a PR that comes AFTER today, does that mean it is NOT
>> eligible for 1.0.0?
>> 
>> Thanks!
>> 
>> -James
>> 
>>> On Sep 11, 2017, at 12:14 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>>> 
>>> Sure! Please feel free to update the wiki.
>>> 
>>> 
>>> Guozhang
>>> 
>>> On Mon, Sep 11, 2017 at 9:28 AM, Rajini Sivaram <rajinisiva...@gmail.com
>>> 
>>> wrote:
>>> 
>>>> Hi Guozhang,
>>>> 
>>>> Can KIP-188 be added to the list, please? The vote has passed and PR
>> should
>>>> be ready soon.
>>>> 
>>>> Thank you,
>>>> 
>>>> Rajini
>>>> 
>>>> On Thu, Sep 7, 2017 at 10:28 PM, Guozhang Wang <wangg...@gmail.com>
>> wrote:
>>>> 
>>>>> Actually my bad, there is already a voting thread and you asked people
>> to
>>>>> recast a vote on a small change.
>>>>> 
>>>>> On Thu, Sep 7, 2017 at 2:27 PM, Guozhang Wang <wangg...@gmail.com>
>>>> wrote:
>>>>> 
>>>>>> Hi Tom,
>>>>>> 
>>>>>> It seems KIP-183 is still in the discussion phase, and voting has not
>>>>> been
>>>>>> started?
>>>>>> 
>>>>>> 
>>>>>> Guozhang
>>>>>> 
>>>>>> 
>>>>>> On Thu, Sep 7, 2017 at 1:13 AM, Tom Bentley <t.j.bent...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>>> Would it be possible to add KIP-183 to the list too, please?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Tom
>>>>>>> 
>>>>>>> On 6 September 2017 at 22:04, Guozhang Wang <wangg...@gmail.com>
>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Vahid,
>>>>>>>> 
>>>>>>>> Yes I have just added it while sending this email :)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Guozhang
>>>>>>>> 
>>>>>>>> On Wed, Sep 6, 2017 at 1:54 PM, Vahid S Hashemian <
>>>>>>>> vahidhashem...@us.ibm.com
>>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Guozhang,
>>>>>>>>> 
>>>>>>>>> Thanks for the heads-up.
>>>>>>>>> 
>>>>>>>>> Can KIP-163 be added to the list?
>>>>>>>>> The proposal for this KIP is accepted, and the PR is ready for
>>>>> review.
>>>>>>>>> 
>>>>>>>>> Thanks.
>>>>>>>>> --Vahid
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> From:   Guozhang Wang <wangg...@gmail.com>
>>>>>>>>> To: "dev@kafka.apache.org" <dev@kafka.apache.org>
>>>>>>>>> Date:   09/06/2017 01:45 PM
>>>>>>>>> Subject:1.0.0 KIPs Update
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hello folks,
>>>>>>>>> 
>>>>>>>>> This is a heads up on 1.0.0 progress:
>>>>>>>>> 
>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.a
>>>>>>>>> pache.org_confluence_pages_viewpage.action-3FpageId-3D717649
>>>>>>>>> 13=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_
>>>>>>>>> xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=bLvgeykOujjty9joOuWXD4wZab
>>>>>>>>> o1CV0pULY4eqBxqzk=90UN7ejzCQmdPOyRR_
>>>> 2z304xLUSBCtOYi0KqhAo4EyU=
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> We have one week left towards the KIP deadline, which is Sept.
>>>> 13th.
>>>>>>>> There
>>>>>>>>> are still a lot of KIPs that under discussion / voting process.
>>>> For
>>>>>>> the
>>>>>>>>> KIP
>>>>>>>>> proposer, please keep in mind that the voting has to be done
>>>> before
>>>>>>> the
>>>>>>>>> deadline in order to be added into the coming release.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> -- Guozhang
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> -- Guozhang
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> -- Guozhang
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> -- Guozhang
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> -- Guozhang
>> 
>> 



Re: 1.0.0 KIPs Update

2017-09-20 Thread James Cheng
Hi. I'm a little unclear on what types of things are subject to the "Feature 
Freeze" today (Sept 20th). 

For changes such as https://github.com/apache/kafka/pull/3799 and 
https://issues.apache.org/jira/browse/KAFKA-3480 that don't require KIPs, is 
today the freeze date? If I have a PR created but not yet reviewed, does that 
mean it is eligible to get into 1.0.0?

And if I have a PR that comes AFTER today, does that mean it is NOT eligible 
for 1.0.0?

Thanks!

-James

> On Sep 11, 2017, at 12:14 PM, Guozhang Wang  wrote:
> 
> Sure! Please feel free to update the wiki.
> 
> 
> Guozhang
> 
> On Mon, Sep 11, 2017 at 9:28 AM, Rajini Sivaram 
> wrote:
> 
>> Hi Guozhang,
>> 
>> Can KIP-188 be added to the list, please? The vote has passed and PR should
>> be ready soon.
>> 
>> Thank you,
>> 
>> Rajini
>> 
>> On Thu, Sep 7, 2017 at 10:28 PM, Guozhang Wang  wrote:
>> 
>>> Actually my bad, there is already a voting thread and you asked people to
>>> recast a vote on a small change.
>>> 
>>> On Thu, Sep 7, 2017 at 2:27 PM, Guozhang Wang 
>> wrote:
>>> 
 Hi Tom,
 
 It seems KIP-183 is still in the discussion phase, and voting has not
>>> been
 started?
 
 
 Guozhang
 
 
 On Thu, Sep 7, 2017 at 1:13 AM, Tom Bentley 
>>> wrote:
 
> Would it be possible to add KIP-183 to the list too, please?
> 
> Thanks,
> 
> Tom
> 
> On 6 September 2017 at 22:04, Guozhang Wang 
>> wrote:
> 
>> Hi Vahid,
>> 
>> Yes I have just added it while sending this email :)
>> 
>> 
>> Guozhang
>> 
>> On Wed, Sep 6, 2017 at 1:54 PM, Vahid S Hashemian <
>> vahidhashem...@us.ibm.com
>>> wrote:
>> 
>>> Hi Guozhang,
>>> 
>>> Thanks for the heads-up.
>>> 
>>> Can KIP-163 be added to the list?
>>> The proposal for this KIP is accepted, and the PR is ready for
>>> review.
>>> 
>>> Thanks.
>>> --Vahid
>>> 
>>> 
>>> 
>>> From:   Guozhang Wang 
>>> To: "dev@kafka.apache.org" 
>>> Date:   09/06/2017 01:45 PM
>>> Subject:1.0.0 KIPs Update
>>> 
>>> 
>>> 
>>> Hello folks,
>>> 
>>> This is a heads up on 1.0.0 progress:
>>> 
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.a
>>> pache.org_confluence_pages_viewpage.action-3FpageId-3D717649
>>> 13=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_
>>> xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=bLvgeykOujjty9joOuWXD4wZab
>>> o1CV0pULY4eqBxqzk=90UN7ejzCQmdPOyRR_
>> 2z304xLUSBCtOYi0KqhAo4EyU=
>>> 
>>> 
>>> We have one week left towards the KIP deadline, which is Sept.
>> 13th.
>> There
>>> are still a lot of KIPs that under discussion / voting process.
>> For
> the
>>> KIP
>>> proposer, please keep in mind that the voting has to be done
>> before
> the
>>> deadline in order to be added into the coming release.
>>> 
>>> 
>>> Thanks,
>>> -- Guozhang
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> --
>> -- Guozhang
>> 
> 
 
 
 
 --
 -- Guozhang
 
>>> 
>>> 
>>> 
>>> --
>>> -- Guozhang
>>> 
>> 
> 
> 
> 
> -- 
> -- Guozhang



Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-12 Thread James Cheng
Thanks for the KIP, Randall.

The KIP has one MBean per metric name. Can I suggest an alternate grouping?

kafka.connect:type=connector-metrics,connector=([-.\w]+)
connector-type
connector-class
connector-version
status

kafka.connect:type=task-metrics,connector=([-.\w]+),task=([\d]+)
status
pause-ratio
offset-commit-success-percentage
offset-commit-failure-percentage
offset-commit-max-time
offset-commit-99p-time
offset-commit-95p-time
offset-commit-90p-time
offset-commit-75p-time
offset-commit-50p-time
batch-size-max
batch-size-avg

kafka.connect:type=source-task-metrics,connector=([-.\w]+),task=([\d]+)
source-record-poll-rate
source-record-write-rate

kafka.connect:type=sink-task-metrics,connector=([-.\w]+),task=([\d]+)
sink-record-read-rate
sink-record-send-rate
sink-record-lag-max
partition-count
offset-commit-95p-time
offset-commit-90p-time
offset-commit-75p-time
offset-commit-50p-time
batch-size-max
batch-size-avg

kafka.connect:type=sink-task-metrics,connector=([-.\w]+),task=([\d]+),topic=([-.\w]+),partition=([\d]+)
sink-record-lag
sink-record-lag-avg
sink-record-lag-max

kafka.connect:type=connect-coordinator-metrics
task-count
connector-count
leader-name
state
rest-request-rate

kafka.connect:type=connect-coordinator-metrics,name=assigned-tasks 
assigned-tasks (existing metric, so can't merge in above without 
breaking compatibility)
kafka.connect:type=connect-coordinator-metrics,name=assigned-connectors 
(existing metric, so can't merge in above without breaking compatibility)
assigned-connectors (existing metric, so can't merge in above without 
breaking compatibility)

kafka.connect:type=connect-worker-rebalance-metrics
rebalance-success-total
rebalance-success-percentage
rebalance-failure-total
rebalance-failure-percentage
rebalance-max-time
rebalance-99p-time
rebalance-95p-time
rebalance-90p-time
rebalance-75p-time
rebalance-50p-time
time-since-last-rebalance
task-failure-rate

This lets you use a single MBean selector to select multiple related attributes 
all at once. You can use JMX's wildcards to target which connectors or tasks or 
topics or partitions you care about.
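
For example, assuming the MBean names proposed above were registered, a monitoring client could pick up every task of a single connector with one pattern. This is just a sketch; the connector name "my-sink" is made up:

{code}
// Sketch: querying the proposed MBeans with a single JMX ObjectName pattern.
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ConnectMetricsQuery {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // One wildcard pattern matches every task of the "my-sink" connector.
        ObjectName pattern =
                new ObjectName("kafka.connect:type=sink-task-metrics,connector=my-sink,task=*");
        Set<ObjectName> names = server.queryNames(pattern, null);
        for (ObjectName name : names) {
            Object lag = server.getAttribute(name, "sink-record-lag-max");
            System.out.println(name + " sink-record-lag-max=" + lag);
        }
    }
}
{code}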

Also notice that for the topic and partition level metrics, the attributes are 
named identically ("sink-record-lag-avg" instead of 
"sink-record-{topic}-{partition}.records-lag-avg"), so monitoring systems have 
a consistent string they can use, instead of needing to do prefix-and-suffix 
matching against the attribute name. And TBH, it integrates better with the 
work I'm doing in https://issues.apache.org/jira/browse/KAFKA-3480

-James

> On Sep 7, 2017, at 4:50 PM, Randall Hauch  wrote:
> 
> Hi everyone.
> 
> I've created a new KIP to add metrics to the Kafka Connect framework:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-196%3A+Add+metrics+to+Kafka+Connect+framework
> 
> The KIP approval deadline is looming, so if you're interested in Kafka
> Connect metrics please review and provide feedback as soon as possible. I'm
> interested not only in whether the metrics are sufficient and appropriate,
> but also in whether the MBean naming conventions are okay.
> 
> Best regards,
> 
> Randall



Re: [ANNOUNCE] New Kafka PMC member: Jiangjie (Becket) Qin

2017-08-25 Thread James Cheng
Congrats Becket!

-James

> On Aug 23, 2017, at 10:20 PM, Joel Koshy  wrote:
> 
> Hi everyone,
> 
> Jiangjie (Becket) Qin has been a Kafka committer in October 2016 and has
> contributed significantly to several major patches, reviews and discussions
> since. I am glad to announce that Becket is now a member of the Apache Kafka
> PMC.
> 
> Congratulations Becket!
> 
> Joel



Re: [DISCUSS] KIP-186: Increase offsets retention default to 7 days

2017-08-10 Thread James Cheng
+1 from me!

-James

> On Aug 8, 2017, at 5:24 PM, Ewen Cheslack-Postava  wrote:
> 
> Hi all,
> 
> I posted a simple new KIP for a problem we see with a lot of users:
> KIP-186: Increase offsets retention default to 7 days
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-186%3A+Increase+offsets+retention+default+to+7+days
> 
> Note that in addition to the KIP text itself, the linked JIRA already
> existed and has a bunch of discussion on the subject.
> 
> -Ewen



Re: Consumer offsets partitions size much bigger than others

2017-07-18 Thread James Cheng
It's possible that the log-cleaning thread has crashed. That is the thread that 
implements log compaction.

Look in the log-cleaner.log file in your kafka debuglog directory to see if 
there is any indication that it has crashed (error messages, stack traces, etc).

What version of kafka are you using? 0.10 and prior had some bugs in the 
log-cleaner thread that might sometimes cause it to crash. Those were fixed in 
later versions, but it's always possible there might still be more bugs there.

I notice that your __consumer_offsets topic only has replication-factor=1. How 
many brokers are in your cluster? You should increase the replication factor to 
3. 

Older versions of Kafka would try to auto-create the __consumer_offsets topic 
with replication-factor 3, but if there were fewer than 3 brokers in the 
cluster, they would simply use the number of brokers in the cluster. That 
means that if your cluster only had 1 broker running at the time the 
topic was auto-created, it would have been created with replication-factor 1. 
This has been fixed in later brokers, so that they will always create topics with 
the specified number of replicas, or will throw loud errors if you 
don't have enough brokers.

-James

> On Jul 18, 2017, at 8:44 AM, Luciano Afranllie  
> wrote:
> 
> Hi
> 
> One of our Kafka brokers was running out of disk space and when we checked
> the file size in the kafka log dir we observed the following
> 
> $ du -h . --max-depth=2 | grep '__consumer_offsets'
> 4.0K./kafka-logs/__consumer_offsets-16
> 4.0K./kafka-logs/__consumer_offsets-40
> 35G ./kafka-logs/__consumer_offsets-44
> 4.0K./kafka-logs/__consumer_offsets-8
> 4.0K./kafka-logs/__consumer_offsets-38
> 4.0K./kafka-logs/__consumer_offsets-20
> 4.0K./kafka-logs/__consumer_offsets-34
> 4.0K./kafka-logs/__consumer_offsets-18
> 4.0K./kafka-logs/__consumer_offsets-32
> 251G./kafka-logs/__consumer_offsets-14
> 4.0K./kafka-logs/__consumer_offsets-4
> 4.0K./kafka-logs/__consumer_offsets-26
> 4.0K./kafka-logs/__consumer_offsets-12
> 4.0K./kafka-logs/__consumer_offsets-30
> 4.0K./kafka-logs/__consumer_offsets-6
> 4.0K./kafka-logs/__consumer_offsets-2
> 4.0K./kafka-logs/__consumer_offsets-24
> 4.0K./kafka-logs/__consumer_offsets-36
> 4.0K./kafka-logs/__consumer_offsets-46
> 4.0K./kafka-logs/__consumer_offsets-42
> 4.0K./kafka-logs/__consumer_offsets-22
> 4.0K./kafka-logs/__consumer_offsets-0
> 4.0K./kafka-logs/__consumer_offsets-28
> 4.0K./kafka-logs/__consumer_offsets-10
> 4.0K./kafka-logs/__consumer_offsets-48
> 
> As you can see, two of the log files (partition 44 and 14) have a huge
> size. Do you have a hint to understand what could be happening here? May be
> for some reason this partitions are not being compacted?
> 
> By the way, this is the description of the __consumer_offsets topic.
> 
> # ./bin/kafka-topics.sh --describe --zookeeper x.x.x.x:2181 --topic
> __consumer_offsets
> Topic:__consumer_offsetsPartitionCount:50   ReplicationFactor:1
> 
> Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=uncompressed
>Topic: __consumer_offsets   Partition: 0Leader: 1
> Replicas: 1 Isr: 1
>Topic: __consumer_offsets   Partition: 1Leader: 2
> Replicas: 2 Isr: 2
>Topic: __consumer_offsets   Partition: 2Leader: 1
> Replicas: 1 Isr: 1
>Topic: __consumer_offsets   Partition: 3Leader: 2
> Replicas: 2 Isr: 2
>Topic: __consumer_offsets   Partition: 4Leader: 1
> Replicas: 1 Isr: 1
>Topic: __consumer_offsets   Partition: 5Leader: 2
> Replicas: 2 Isr: 2
>Topic: __consumer_offsets   Partition: 6Leader: 1
> Replicas: 1 Isr: 1
>Topic: __consumer_offsets   Partition: 7Leader: 2
> Replicas: 2 Isr: 2
>Topic: __consumer_offsets   Partition: 8Leader: 1
> Replicas: 1 Isr: 1
>Topic: __consumer_offsets   Partition: 9Leader: 2
> Replicas: 2 Isr: 2
>Topic: __consumer_offsets   Partition: 10   Leader: 1
> Replicas: 1 Isr: 1
>Topic: __consumer_offsets   Partition: 11   Leader: 2
> Replicas: 2 Isr: 2
>Topic: __consumer_offsets   Partition: 12   Leader: 1
> Replicas: 1 Isr: 1
>Topic: __consumer_offsets   Partition: 13   Leader: 2
> Replicas: 2 Isr: 2
>Topic: __consumer_offsets   Partition: 14   Leader: 1
> Replicas: 1 Isr: 1
>Topic: __consumer_offsets   Partition: 15   Leader: 2
> Replicas: 2 Isr: 2
>Topic: __consumer_offsets   Partition: 16   Leader: 1
> Replicas: 1 Isr: 1
>Topic: __consumer_offsets   Partition: 17   Leader: 2
> Replicas: 2 Isr: 2
>Topic: __consumer_offsets   Partition: 18   Leader: 1
> Replicas: 1 Isr: 1
>Topic: __consumer_offsets 

[jira] [Created] (KAFKA-5597) Autogenerate Producer sender metrics

2017-07-16 Thread James Cheng (JIRA)
James Cheng created KAFKA-5597:
--

 Summary: Autogenerate Producer sender metrics
 Key: KAFKA-5597
 URL: https://issues.apache.org/jira/browse/KAFKA-5597
 Project: Kafka
  Issue Type: Sub-task
Reporter: James Cheng






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [ANNOUNCE] New Kafka PMC member Ismael Juma

2017-07-13 Thread James Cheng
Congrats Ismael!

-James

> On Jul 5, 2017, at 1:55 PM, Jun Rao  wrote:
> 
> Hi, Everyone,
> 
> Ismael Juma has been active in the Kafka community since he became
> a Kafka committer about a year ago. I am glad to announce that Ismael is
> now a member of Kafka PMC.
> 
> Congratulations, Ismael!
> 
> Jun



Re: [ANNOUNCE] New Kafka PMC member Jason Gustafson

2017-07-12 Thread James Cheng
Congrats Jason!

-James

> On Jul 11, 2017, at 10:32 PM, Guozhang Wang  wrote:
> 
> Hi Everyone,
> 
> Jason Gustafson has been very active in contributing to the Kafka community
> since he became a Kafka committer last September and has done lots of
> significant work including the most recent exactly-once project. In
> addition, Jason has initiated or participated in the design discussion of
> more than 30 KIPs in which he has consistently brought in great judgement
> and insights throughout his communication. I am glad to announce that Jason
> has now become a PMC member of the project.
> 
> Congratulations, Jason!
> 
> -- Guozhang



Re: Aggressive log compaction ratio appears to have no negative effect on log-compacted topics

2017-07-10 Thread James Cheng
Jeff,

The discussion thread from a while back on KIP-58 has some discussion around 
"log.cleaner.min.cleanable.ratio".

KIP-58 page: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable
Discussion thread (linked off that page): 
http://mail-archives.apache.org/mod_mbox/kafka-dev/201605.mbox/%3CCAAWiU2VzPXdK1fW3FacfDsVQc-1sphNMjEqtkRSHZEYaN1Wr-w%40mail.gmail.com%3E

The summary is that "log.cleaner.min.cleanable.ratio" seems like it was 
designed to limit how much disk I/O to spend on compaction. Your JIRA indicates 
you benchmarked CPU and memory, but did you look at disk I/O?

-James

> On Jul 7, 2017, at 1:24 PM, Jeff Chao  wrote:
> 
> Hi,
> 
> I filed a jira a few weeks ago around some log compaction ratio behavior we
> were seeing. Now that the 0.11 vote done and release is out, I wanted to
> follow up on it. Jira is here:
> https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-5452.
> 
> Details are in the jira, but long story short, after much testing, we were
> seeing that aggressive log compaction ratios were performing just as well
> as more conservative ratios. Fundamentally I would expect there to be some
> sort of hit, but seeing that the data shows there wasn't, we wanted to
> raise this to the rest of the community and see if anyone else has observed
> similar behavior. The motivation behind this is to see if we might consider
> changing the default from 0.5. This could help in preventing confusion
> around duplicate keys in low volume log-compacted topics use cases.
> 
> Thanks,
> 
> Jeff Chao
> Heroku Kafka



Re: Mirroring multiple clusters into one

2017-07-06 Thread James Cheng
Answers inline below.

-James

Sent from my iPhone

> On Jul 7, 2017, at 1:18 AM, Vahid S Hashemian <vahidhashem...@us.ibm.com> 
> wrote:
> 
> James,
> 
> Thanks for sharing your thoughts and experience.
> Could you please also confirm whether
> - you do any encryption for the mirrored data?
Not at the Kafka level. The data goes over a VPN.

> - you have a many-to-one mirroring similar to what I described?
> 

Yes, we mirror multiple source clusters to a single target cluster. We have a 
topic naming convention where our topics are prefixed with their cluster name, 
so as long as we follow that convention, each source topic gets mirrored to a 
unique target topic. That is, we try not to have multiple mirrormakers writing 
to a single target topic. 

Our topic names in the target cluster get prefixed with the string "mirror." 
And then we never mirror topics that start with "mirror." This prevents us from 
creating mirroring loops.

> Thanks.
> --Vahid
> 
> 
> 
> From:   James Cheng <wushuja...@gmail.com>
> To: us...@kafka.apache.org
> Cc: dev <dev@kafka.apache.org>
> Date:   07/06/2017 12:37 PM
> Subject:Re: Mirroring multiple clusters into one
> 
> 
> 
> I'm not sure what the "official" recommendation is. At TiVo, we *do* run 
> all our mirrormakers near the target cluster. It works fine for us, but 
> we're still fairly inexperienced, so I'm not sure how strong of a data 
> point we should be.
> 
> I think the thought process is, if you are mirroring from a source cluster 
> to a target cluster where there is a WAN between the two, then whichever 
> request goes across the WAN has a higher chance of intermittent failure 
> than the one over the LAN. That means that if mirrormaker is near the 
> source cluster, the produce request over the WAN to the target cluster may 
> fail. If the mirrormaker is near the target cluster, then the fetch 
> request over the WAN to the source cluster may fail.
> 
> Failed fetch requests don't have much impact on data replication, it just 
> delays it. Whereas a failure during a produce request may introduce 
> duplicates.
> 
> Becket Qin from LinkedIn did a presentation on tuning producer performance 
> at a meetup last year, and I remember he specifically talked about 
> producing over a WAN as one of the cases where you have to tune settings. 
> Maybe that presentation will give more ideas about what to look at. 
> https://www.slideshare.net/mobile/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600
> 
> 
> -James
> 
> Sent from my iPhone
> 
>> On Jul 6, 2017, at 1:00 AM, Vahid S Hashemian 
> <vahidhashem...@us.ibm.com> wrote:
>> 
>> The literature suggests running the MM on the target cluster when 
> possible 
>> (with the exception of when encryption is required for transferred 
> data).
>> I am wondering if this is still the recommended approach when mirroring 
>> from multiple clusters to a single cluster (i.e. multiple MM instances).
>> Is there anything in particular (metric, specification, etc.) to 
> consider 
>> before making a decision?
>> 
>> Thanks.
>> --Vahid
>> 
>> 
> 
> 
> 
> 


Re: Mirroring multiple clusters into one

2017-07-06 Thread James Cheng
I'm not sure what the "official" recommendation is. At TiVo, we *do* run all 
our mirrormakers near the target cluster. It works fine for us, but we're still 
fairly inexperienced, so I'm not sure how strong of a data point we should be.

I think the thought process is, if you are mirroring from a source cluster to a 
target cluster where there is a WAN between the two, then whichever request 
goes across the WAN has a higher chance of intermittent failure than the one 
over the LAN. That means that if mirrormaker is near the source cluster, the 
produce request over the WAN to the target cluster may fail. If the mirrormaker 
is near the target cluster, then the fetch request over the WAN to the source 
cluster may fail.

Failed fetch requests don't have much impact on data replication, it just 
delays it. Whereas a failure during a produce request may introduce duplicates.

Becket Qin from LinkedIn did a presentation on tuning producer performance at a 
meetup last year, and I remember he specifically talked about producing over a 
WAN as one of the cases where you have to tune settings. Maybe that 
presentation will give more ideas about what to look at. 
https://www.slideshare.net/mobile/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600

-James

Sent from my iPhone

> On Jul 6, 2017, at 1:00 AM, Vahid S Hashemian  
> wrote:
> 
> The literature suggests running the MM on the target cluster when possible 
> (with the exception of when encryption is required for transferred data).
> I am wondering if this is still the recommended approach when mirroring 
> from multiple clusters to a single cluster (i.e. multiple MM instances).
> Is there anything in particular (metric, specification, etc.) to consider 
> before making a decision?
> 
> Thanks.
> --Vahid
> 
> 


  1   2   3   4   >