Deduplicating a topic in the face of producer crashes over a time window?

2018-11-01 Thread Andrew Wilcox
Suppose I have a producer which is ingesting a data stream with unique keys
from an external service and sending it to a Kafka topic.  In my producer I
can set enable.idempotence and get exactly-once delivery in the presence of
broker crashes.  However my producer might crash after it delivers a batch
of messages to Kafka but before it records that the batch was delivered.
After restarting the crashed producer it would re-deliver the same batch,
resulting in duplicate messages in the topic.

With a streams transformer I can deduplicate the topic by using a state
store to record previously seen keys and then only creating an output
record if the key hasn't been seen before.  However without a mechanism to
remove old keys the state store will grow without bound.

Say I only want to deduplicate over a time period such as one day.  (I'm
confident that I'll be able to restart a crashed producer sooner).  Thus
I'd like keys older than a day to expire out of the state store, so the
store only needs to keep track of keys seen in the last day or so.

Is there a way to do this with Kafka streams?  Or is there another
recommended mechanism to keep messages with unique keys unduplicated in the
presence of producer crashes?

Thanks!

Andrew


Re: Bemchmarks for KTable Joins and Queries

2018-11-01 Thread Matthias J. Sax
I am not aware if benchmarks, but want to point out, that KTables work
somewhat different to relational database system. Thus, you might want
to evaluate not base on performance, but on the semantics KTable provide.

Recall, that Kafka Streams is a stream processing library while a
database system is a "batch processing" system. It's two quite different
types of systems and benchmarking them to compare each other is
questionable.


-Matthias

On 10/31/18 12:18 PM, Tom Szumowski wrote:
> I was wonderinf if anyone had or knew of benchmark tests for KTable or
> GlobalKTable queries/joins, as compared to alternatives such as distributed
> databases.
> 



signature.asc
Description: OpenPGP digital signature


Re: consumer fetch multiple topic partitions committed offset

2018-11-01 Thread Matthias J. Sax
You need to call `committed()` multiple times.

-Matthias

On 11/1/18 12:28 AM, hacker win7 wrote:
> Hi,
> 
> After reviewing the KafkaConsumer source about API of *committed():*
> I found that old consumer support committed(mutipleTopicPartitions) to
> return multiple committed offset, while in new consumer, there is only
> committed(singleTopicPartition) and return only one committed offset.
> 
> It is a little weird for me that why new consumer only support fetch single
> topic partition committed offset. I search some KIPs but didn't find the
> reason about this. Anyway, How to fetch multiple topic partitions committed
> offset in new consumer?
> 



signature.asc
Description: OpenPGP digital signature


Re: Despite of log.retention.hours=168 the log files of messages sente yesterday are not present

2018-11-01 Thread Matthias J. Sax
/tmp/kafka-logs is just a convenient default, but not a reliable folder
to store data. /tmp/ might be cleared by the operation system. Note that
the quickstart is not designed to give "production ready" configuration
etc. It's just to play with the system.

You should change the config accordingly.


-Matthias


On 11/1/18 1:57 AM, Marco Ippolito wrote:
> Hi all,
> yesterday I installed Kafka 2.0 in my Ubuntu 18.04.01 Server Edition and
> followed the QuickStart instructions and explanations till successfully
> deploying the Step 7 : https://kafka.apache.org/quickstart
> 
> Today I discovered that there are no kafka-logs- files  in /tmp , despite
> being directly explicitated in the related server.properties files:
> server.properties :
>   # A comma separated list of directories under which to store log files
>  log.dirs=/tmp/kafka-logs
> # The minimum age of a log file to be eligible for deletion due to age
> log.retention.hours=168
> 
>  server-1.properties :
> # A comma separated list of directories under which to store log files
> log.dirs=/tmp/kafka-logs-1
> # The minimum age of a log file to be eligible for deletion due to age
> log.retention.hours=168
> 
> server-2.properties :
> # A comma separated list of directories under which to store log files
> log.dirs=/tmp/kafka-logs-2
> # The minimum age of a log file to be eligible for deletion due to age
> log.retention.hours=168
> 
> Why did it happen, despite of the successfull follow-up of the first seven
> steps of the QuickStart tutorial? And what to do in order to keep the Kafka
> Log files, which contain the messages sent between the producers and the
> receivers though the Kafka brokerage?
> 
> Looking forward to your kind help.
> Marco
> 



signature.asc
Description: OpenPGP digital signature


Re: [VOTE] 2.0.1 RC0

2018-11-01 Thread Manikumar
We were waiting for the system test results. There were few failures:
KAFKA-7579,  KAFKA-7559, KAFKA-7561
they are not blockers for 2.0.1 release. We need more votes from
PMC/committers :)

Thanks Stanislav! for the system test results.

Thanks,
Manikumar

On Thu, Nov 1, 2018 at 10:20 PM Eno Thereska  wrote:

> Anything else holding this up?
>
> Thanks
> Eno
>
> On Thu, Nov 1, 2018 at 10:27 AM Jakub Scholz  wrote:
>
> > +1 (non-binding) ... I used the staged binaries and run tests with
> > different clients.
> >
> > On Fri, Oct 26, 2018 at 4:29 AM Manikumar 
> > wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the first candidate for release of Apache Kafka 2.0.1.
> > >
> > > This is a bug fix release closing 49 tickets:
> > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.0.1
> > >
> > > Release notes for the 2.0.1 release:
> > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by  Tuesday, October 30, end of day
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/javadoc/
> > >
> > > * Tag to be voted upon (off 2.0 branch) is the 2.0.1 tag:
> > > https://github.com/apache/kafka/releases/tag/2.0.1-rc0
> > >
> > > * Documentation:
> > > http://kafka.apache.org/20/documentation.html
> > >
> > > * Protocol:
> > > http://kafka.apache.org/20/protocol.html
> > >
> > > * Successful Jenkins builds for the 2.0 branch:
> > > Unit/integration tests:
> > https://builds.apache.org/job/kafka-2.0-jdk8/177/
> > >
> > > /**
> > >
> > > Thanks,
> > > Manikumar
> > >
> >
>


Re: [VOTE] 2.0.1 RC0

2018-11-01 Thread Harsha Chintalapani
+1.
Ran a 3 node cluster with few simple tests.

Thanks,
Harsha
On Nov 1, 2018, 9:50 AM -0700, Eno Thereska , wrote:
> Anything else holding this up?
>
> Thanks
> Eno
>
> On Thu, Nov 1, 2018 at 10:27 AM Jakub Scholz  wrote:
>
> > +1 (non-binding) ... I used the staged binaries and run tests with
> > different clients.
> >
> > On Fri, Oct 26, 2018 at 4:29 AM Manikumar 
> > wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the first candidate for release of Apache Kafka 2.0.1.
> > >
> > > This is a bug fix release closing 49 tickets:
> > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.0.1
> > >
> > > Release notes for the 2.0.1 release:
> > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by Tuesday, October 30, end of day
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/javadoc/
> > >
> > > * Tag to be voted upon (off 2.0 branch) is the 2.0.1 tag:
> > > https://github.com/apache/kafka/releases/tag/2.0.1-rc0
> > >
> > > * Documentation:
> > > http://kafka.apache.org/20/documentation.html
> > >
> > > * Protocol:
> > > http://kafka.apache.org/20/protocol.html
> > >
> > > * Successful Jenkins builds for the 2.0 branch:
> > > Unit/integration tests:
> > https://builds.apache.org/job/kafka-2.0-jdk8/177/
> > >
> > > /**
> > >
> > > Thanks,
> > > Manikumar
> > >
> >


Re: [VOTE] 2.0.1 RC0

2018-11-01 Thread Eno Thereska
Anything else holding this up?

Thanks
Eno

On Thu, Nov 1, 2018 at 10:27 AM Jakub Scholz  wrote:

> +1 (non-binding) ... I used the staged binaries and run tests with
> different clients.
>
> On Fri, Oct 26, 2018 at 4:29 AM Manikumar 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 2.0.1.
> >
> > This is a bug fix release closing 49 tickets:
> > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.0.1
> >
> > Release notes for the 2.0.1 release:
> > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by  Tuesday, October 30, end of day
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > http://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/
> >
> > * Javadoc:
> > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/javadoc/
> >
> > * Tag to be voted upon (off 2.0 branch) is the 2.0.1 tag:
> > https://github.com/apache/kafka/releases/tag/2.0.1-rc0
> >
> > * Documentation:
> > http://kafka.apache.org/20/documentation.html
> >
> > * Protocol:
> > http://kafka.apache.org/20/protocol.html
> >
> > * Successful Jenkins builds for the 2.0 branch:
> > Unit/integration tests:
> https://builds.apache.org/job/kafka-2.0-jdk8/177/
> >
> > /**
> >
> > Thanks,
> > Manikumar
> >
>


Re: [VOTE] 2.1.0 RC0

2018-11-01 Thread Jakub Scholz
+1 (non-binding) ... I used the staged binaries and checked it with
different clients.

On Wed, Oct 24, 2018 at 10:17 AM Dong Lin  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for feature release of Apache Kafka 2.1.0.
>
> This is a major version release of Apache Kafka. It includes 28 new KIPs
> and
>
> critical bug fixes. Please see the Kafka 2.1.0 release plan for more
> details:
>
> *
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044*
>  >
>
> Here are a few notable highlights:
>
> - Java 11 support
> - Support for Zstandard, which achieves compression comparable to gzip with
> higher compression and especially decompression speeds(KIP-110)
> - Avoid expiring committed offsets for active consumer group (KIP-211)
> - Provide Intuitive User Timeouts in The Producer (KIP-91)
> - Kafka's replication protocol now supports improved fencing of zombies.
> Previously, under certain rare conditions, if a broker became partitioned
> from Zookeeper but not the rest of the cluster, then the logs of replicated
> partitions could diverge and cause data loss in the worst case (KIP-320)
> - Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353, KIP-356)
> - Admin script and admin client API improvements to simplify admin
> operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
> - DNS handling improvements (KIP-235, KIP-302)
>
> Release notes for the 2.1.0 release:
> http://home.apache.org/~lindong/kafka-2.1.0-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote ***
>
> * Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~lindong/kafka-2.1.0-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~lindong/kafka-2.1.0-rc0/javadoc/
>
> * Tag to be voted upon (off 2.1 branch) is the 2.1.0-rc0 tag:
> https://github.com/apache/kafka/tree/2.1.0-rc0
>
> * Documentation:
> *http://kafka.apache.org/21/documentation.html*
> 
>
> * Protocol:
> http://kafka.apache.org/21/protocol.html
>
> * Successful Jenkins builds for the 2.1 branch:
> Unit/integration tests: *https://builds.apache.org/job/kafka-2.1-jdk8/38/
> *
>
> Please test and verify the release artifacts and submit a vote for this RC,
> or report any issues so we can fix them and get a new RC out ASAP. Although
> this release vote requires PMC votes to pass, testing, votes, and bug
> reports are valuable and appreciated from everyone.
>
> Cheers,
> Dong
>


Re: [VOTE] 2.0.1 RC0

2018-11-01 Thread Jakub Scholz
+1 (non-binding) ... I used the staged binaries and run tests with
different clients.

On Fri, Oct 26, 2018 at 4:29 AM Manikumar  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for release of Apache Kafka 2.0.1.
>
> This is a bug fix release closing 49 tickets:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.0.1
>
> Release notes for the 2.0.1 release:
> http://home.apache.org/~manikumar/kafka-2.0.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by  Tuesday, October 30, end of day
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~manikumar/kafka-2.0.1-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~manikumar/kafka-2.0.1-rc0/javadoc/
>
> * Tag to be voted upon (off 2.0 branch) is the 2.0.1 tag:
> https://github.com/apache/kafka/releases/tag/2.0.1-rc0
>
> * Documentation:
> http://kafka.apache.org/20/documentation.html
>
> * Protocol:
> http://kafka.apache.org/20/protocol.html
>
> * Successful Jenkins builds for the 2.0 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-2.0-jdk8/177/
>
> /**
>
> Thanks,
> Manikumar
>


consumer fetch multiple topic partitions committed offset

2018-11-01 Thread hacker win7
Hi,

After reviewing the KafkaConsumer source about API of *committed():*
I found that old consumer support committed(mutipleTopicPartitions) to
return multiple committed offset, while in new consumer, there is only
committed(singleTopicPartition) and return only one committed offset.

It is a little weird for me that why new consumer only support fetch single
topic partition committed offset. I search some KIPs but didn't find the
reason about this. Anyway, How to fetch multiple topic partitions committed
offset in new consumer?