We are looking for a way to guarantee no "publisher" loss where waiting
until the cluster/network health is resolved is fine.
What we are doing right now is to at avoid this loss in favor of at least
once guarantee by doing a *future.get() on say 1000th record or 1s
(whatever comes first)* and if
Hi Wim,
this topic (processing order) has been cropping up for a while, several
article, benchmark, and other think on the subject
reaching this conclusion.
After that you can, you can ask someone else another opinion on the subject.
regards,
Adrien
De : Wim
Hello Guozhang,
we ingest messages that we outgest is user facing datastore, after some
additional processing. Due to GDPR, we can only retain that unanonymised
data for a maximum period of time. Let's say 6 months.
So, right before sending the data to the out topic, we'll branch the data
into
Hey Adrien,
thank you for the elaborate explanation!
We are ingesting call data records here, which due to the nature of a telco
network might not arrive in absolute logical order.
If I understand your explanation correctly, you are saying that with your
setup, Kafka guarantees the processing
I think what you're suggesting is to:
1. compile the main streams code, but not the tests
2. compile test-utils (and compile and run the test-utils tests)
3. compile and run the streams tests
This works in theory, since the test-utils depends on the main streams
code, but not the streams tests.
MockProcessorContext is only used in unit tests, and hence we should be
able to declare it as a test dependency of `streams` in gradle build file,
which is OK.
Guozhang
On Thu, Mar 8, 2018 at 3:32 PM, John Roesler wrote:
> Actually, replacing the MockProcessorContext in
Thanks for the explanation.
Not sure if setting the metadata you want to get committed in
punctuation() would be sufficient. But I would think about it in more
details if we get a KIP for this.
It's correct that flushing and committing offsets is correlated. But
it's not related to punctuation.
Thanks, Matthias,
1. I can move it into the o.a.k.streams.processor package; that makes sense.
2. I'm expecting most users to use in-memory state stores, so they won't
need a state directory. In the "real" code path, the stateDir is extracted
from the config
by
Hello Wim,
Not sure if I understand your motivations for delayed processing, could you
elaborate a bit more? Do you want to process raw messages, or do you want
to process anonymised messages?
Guozhang
On Thu, Mar 8, 2018 at 12:35 PM, Wim Van Leuven <
wim.vanleu...@highestpoint.biz> wrote:
>
Isn't MockProcessorContext in o.a.k.test part of the unit-test package
but not the main package?
This should resolve the dependency issue.
-Matthias
On 3/8/18 3:32 PM, John Roesler wrote:
> Actually, replacing the MockProcessorContext in o.a.k.test could be a bit
> tricky, since it would make
Thanks for the KIP John.
Couple of minor questions:
- What about putting the mock into sub-package `processor` so it's in
the same package name as the interface it implements?
- What is the purpose of the constructor talking the `File stateDir`
argument? The state directory should be encoded in
Actually, replacing the MockProcessorContext in o.a.k.test could be a bit
tricky, since it would make the "streams" module depend on
"streams:test-utils", but "streams:test-utils" already depends on "streams".
At first glance, it seems like the options are:
1. leave the two separate
Thanks for the review, Guozhang,
In response:
1. I missed that! I'll look into it and update the KIP.
2. I was planning to use the real implementation, since folks might
register some metrics in the processors and want to verify the values that
get recorded. If the concern is about initializing
0.10.2.2 was not release. It's unclear atm if it will be. Maybe we need
to upgrade your brokers to 0.11, 1.0, or upcoming 1.1 release.
-Matthias
On 3/8/18 10:58 AM, Ted Yu wrote:
> Only found the following from brief search:
>
>
Hello Wim,
does it matter (I think), because one of the big and principal features of
Kafka is:
Kafka is to do load balancing of messages and guarantee ordering in a
distributed cluster.
The order of the messages should be guaranteed, unless several cases:
1] Producer can cause data loss
Hello,
I'm wondering how to design a KStreams or regular Kafka application that
can hold of processing of messages until a future time.
This related to EU's data protection regulation: we can store raw messages
for a given time; afterwards we have to store the anonymised message. So, I
was
Hi,
We have a setup of 10 node Kafka cluster + 5 node ZK cluster . Kafka version
is 0.10.2.1 .
We had an issue where a ZK follower lost connection from the ZK leader which
triggered a series of ISR Shrink and ISR Expands .
This caused some of the partitions to have less number of
Only found the following from brief search:
http://search-hadoop.com/m/Kafka/uyzND1kSw4Xm3wyj1?subj=0+10+2+2+bug+fix+release
FYI
On Thu, Mar 8, 2018 at 10:13 AM, Devendar Rao
wrote:
> Hi,
>
> We're hitting an issue with log cleaner and I see this is fixed in 0.10.2.2
Hi,
We're hitting an issue with log cleaner and I see this is fixed in 0.10.2.2
release as per https://issues.apache.org/jira/browse/KAFKA-5413 . But I
don't see 0.10.2.2 release notes in https://kafka.apache.org/downloads
was 0.10.2.2 ever released?
Thanks,
Devendar
19 matches
Mail list logo