Avi,
I have granted you the permissions under your Apache ID (avi).
Guozhang
On Thu, Dec 15, 2016 at 3:40 PM, Matthias J. Sax wrote:
> What is your wiki ID? We can grant you permission.
>
> -Matthias
>
> On 12/15/16 3:27 PM, Avi Flax wrote:
> >
> >> On Dec 13, 2016, at 21:02, Matthias J. Sax wrote:
What is your wiki ID? We can grant you permission.
-Matthias
On 12/15/16 3:27 PM, Avi Flax wrote:
>
>> On Dec 13, 2016, at 21:02, Matthias J. Sax wrote:
>>
>> thanks for your feedback.
>
> My pleasure!
>
>> We want to enlarge the scope for Streams
>> applications and started to collect use cases
> On Dec 13, 2016, at 21:02, Matthias J. Sax wrote:
>
> thanks for your feedback.
My pleasure!
> We want to enlarge the scope for Streams
> applications and started to collect use cases in the Wiki:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Data+%28Re%29Processing+Scenarios
Avi,
thanks for your feedback. We want to enlarge the scope for Streams
applications and started to collect use cases in the Wiki:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Data+%28Re%29Processing+Scenarios
Feel free to add to it by editing the page or writing a comment.
On 2016-11-28 13:47 (-0500), "Matthias J. Sax" wrote:
>
> I want to start a discussion about KIP-95:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams
>
> Looking forward to your feedback.
Hi Matthias,
I’d just like to share some feedback.
A few thoughts:
> introduce a Streams metadata topic (single partitioned and log compacted;
one for each application-ID)
I would consider whether the single partition is a strict requirement. A
single partition increases the likelihood of outages. This is something I'd
like to see changed in Conn
About using the offset-topic metadata field:
Even with KIP-98, I think this would not work (or maybe in a weird way).
If we have the following topology:
topic1 -> subTopologyA -> topic2 -> subTopologyB
If the producer of subTopologyA commits, it will commit its input offsets
from topic1. Thus, the stop-
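(For illustration only: a minimal Streams program with the shape described above, two sub-topologies connected by topic2 via through(). The API calls assume a Kafka Streams version where KStream#through() still exists, roughly 1.0-2.x; topic names follow the example, everything else is made up and not part of the KIP.)

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class TwoSubTopologies {

    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();

        // subTopologyA: consumes topic1, does some per-record work, and writes to
        // the intermediate topic "topic2" via through(), which also reads it back.
        final KStream<String, String> a =
                builder.stream("topic1", Consumed.with(Serdes.String(), Serdes.String()));
        final KStream<String, String> b =
                a.mapValues(v -> v.toUpperCase())
                 .through("topic2", Produced.with(Serdes.String(), Serdes.String()));

        // subTopologyB: continues from topic2. Its committed input offsets are
        // topic2 offsets, not topic1 offsets -- which is why a stop point derived
        // from subTopologyA's commits says nothing about where B has to stop.
        b.foreach((key, value) -> System.out.println(key + " -> " + value));

        // Printing the description shows the two sub-topologies split at topic2.
        System.out.println(builder.build().describe());
    }
}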
I will read through the KIP doc once again to provide more detailed
feedback, but let me throw in my two cents just on the above email.
There are a few motivations to have a "consistent" stop-point across tasks
belonging to different sub-topologies. One of them is for interactive
queries: say you
Thanks for the input, Jay.
From my understanding, your question boils down to how fuzzy the stop
point can/should be, and what guarantees we want to provide to the user.
This has multiple dimensions:
1. Using a timestamp has the main issue that, if an input topic has no
data with this timestamp, t
I'd like to second the discouragement of adding a new topic per job. We
went down this path in Samza and I think the result was quite a mess. You
had to read the full topic every time a job started and so it added a lot
of overhead and polluted the topic space.
What if we did the following:
1.
1) The change to consume the metadata topic by all instances should not
be big. We want to leverage the "restore consumer" that does manual
partition assignment already.
2) I understand your concern about adding the metadata topic... For a
single instance with one thread, it would be easy to go
A couple of remaining questions:
- it says in the KIP: "because the metadata topic must be consumed by all
instances, we need to assign the topic’s partitions manually and do not commit
offsets -- we also need to seekToBeginning() each time we consume the metadata
topic)" . How big of a change
> Instead, why would we not have batch.mode=true/false?
Matthias already replied in detail, but let me also throw in my thoughts:
I'd argue that "there is no batch [mode]" (just like "there is no spoon"
for the Matrix fans out there :-P). What would a batch mode look like? It
is primarily defin
Right now, there is only one config value and I am open to better
suggestions.
We did not go for batch.mode=true/false because we might want to have
auto-stop at specific stop-offsets or a stop-timestamp later on. So we can
extend the parameter with new values like autostop.at=timestamp in
combinati
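(Purely as a hypothetical illustration of that extensibility argument, the proposed parameter would sit next to the usual Streams settings roughly like this; neither the key nor its values are part of any released API.)

import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class BatchModeConfig {

    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-batch-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Proposed in KIP-95 (not a released config): stop at end-of-log,
        // i.e. the high watermarks captured when the application starts.
        props.put("autostop.at", "eol");

        // A boolean batch.mode=true/false could not grow into values like these,
        // which is the argument for a value-based parameter:
        // props.put("autostop.at", "timestamp");  // hypothetical future value
        // props.put("autostop.at", "offsets");    // hypothetical future value

        System.out.println(props);
    }
}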
I agree that the metadata topic is required to build a batching semantic
that is intuitive.
One question on the config -
autostop.at
I see one value for it - eol. What other values can be used? Instead, why
would we not have batch.mode=true/false?
On Wed, Nov 30, 2016 at 1:51 PM, Matthias J. Sax wrote:
Both types of intermediate topics are handled the exact same way and
both types do connect different subtopologies (even if the user might
not be aware that there are multiple subtopologies in case of internal
data repartitioning). So there is no distinction between user
intermediate topics (via through()) and internal repartitioning topics.
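(A small sketch of the two kinds of intermediate topics being discussed: one named explicitly by the user via through(), one created internally when a key change forces repartitioning. Names and API version are illustrative only.)

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class IntermediateTopics {

    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();
        final KStream<String, String> source =
                builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()));

        // 1) User intermediate topic: created and named explicitly via through().
        final KStream<String, String> viaThrough =
                source.through("user-intermediate", Produced.with(Serdes.String(), Serdes.String()));

        // 2) Internal repartitioning topic: the key change before the aggregation makes
        // Streams create an <application.id>-...-repartition topic on its own.
        final KTable<String, Long> counts =
                viaThrough.selectKey((key, value) -> value)
                          .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                          .count();

        // Both topics split the program into separate sub-topologies, visible here.
        System.out.println(builder.build().describe());
    }
}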
With the marker we have substituted a failure problem for a liveness problem in
that under repeated failure the other instances will not do any useful work.
As mentioned in the other email, I don't know if we need to worry about that
corner case just yet.
Eno
> On 30 Nov 2016, at 20:06, Matthias J. Sax wrote:
In the KIP, two types of intermediate topics are described, 1) ones that
connect two sub-topologies, and 2) others that are internal repartitioning
topics (e.g., for joins).
I wasn't envisioning stopping the consumption of (2) at the HWM. The HWM can be
used for the source topics only (so I agre
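(For reference, capturing the high watermarks of the source topics is essentially an endOffsets() lookup; a minimal sketch, with the partition list and config invented:)

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class CaptureHighWatermarks {

    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // Source-topic partitions whose current end offsets (high watermarks)
            // become the stop offsets for this batch run.
            final List<TopicPartition> sources = Arrays.asList(
                    new TopicPartition("topic1", 0),
                    new TopicPartition("topic1", 1));

            final Map<TopicPartition, Long> stopOffsets = consumer.endOffsets(sources);
            stopOffsets.forEach((tp, offset) ->
                    System.out.println(tp + " -> stop at offset " + offset));
        }
    }
}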
Eno,
> So in general, we have the problem of failures during writes to the
metadata topic itself.
The KIP suggests using marker messages for this case. The marker is
either written (indicating success) or not. If not, after a
failure/rebalance the (new) group leader will collect the HW again. As long
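(A sketch of how such a marker convention could look on the wire; the record layout, keys, and topic name are entirely invented, since the KIP only speaks of a marker message, not its encoding.)

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class StopOffsetWriter {

    private static final String METADATA_TOPIC = "my-app-streams-metadata"; // invented name

    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // 1) The leader writes one stop-offset (high watermark) record per input partition ...
            producer.send(new ProducerRecord<>(METADATA_TOPIC, "topic1-0", "stopOffset=42"));
            producer.send(new ProducerRecord<>(METADATA_TOPIC, "topic1-1", "stopOffset=17"));

            // 2) ... and only then the marker. Readers trust stop offsets only if the marker
            // is present; if the leader dies before this point, the (new) leader simply
            // collects the high watermarks again after the rebalance.
            producer.send(new ProducerRecord<>(METADATA_TOPIC, "marker", "generation=5"));
            producer.flush();
        }
    }
}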
I think the KIP looks good. I also think we need the metadata topic
in order to provide sane guarantees on what data will be processed.
As Matthias has outlined in the KIP we need to know when to stop consuming
from intermediate topics, i.e., topics that are part of the same application
but are use
Hi Matthias,
I like the first part of the KIP. However, the second part with the failure
modes and metadata topic is quite complex and I'm worried it doesn't solve the
problems you mention under failure. For example, the application can fail
before writing to the metadata topic. In that case, i
Thanks for your input.
To clarify: the main reason to add the metadata topic is to cope with
subtopologies that are connected via intermediate topics (either
user-defined via through() or internally created for data repartitioning).
Without this handling, the behavior would be odd and user experie
Thanks for initiating this. I think this is a good first step towards
unifying batch and stream processing in Kafka.
I understood this capability to be simple yet very useful; it allows a
Streams program to process a log, in batch, in arbitrary windows defined by
the difference between the HW and