I think Marton has some good points here.
1) Is KeyedDataStream a better name if this is only a renaming?
2) The discretize semantics are unclear indeed. Are we operating on a single
dataset or on a sequence of datasets? If the latter, why not call it something
else (dstream)? How are joins and other binary
There is no inconsistency between the Batch and Streaming APIs. They have
different semantics: the batch API is implicitly always windowed.
There is a naming difference between the two APIs.
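Here is a toy Python model of the "implicitly windowed" point. It is not Flink code: `batch_group_reduce` and `windowed_group_reduce` are hypothetical helpers, and events are assumed to carry integer timestamps. A batch groupBy + reduce treats the whole finite input as one window, while on a stream a grouped result is only defined per (window, key). For simplicity it evaluates a finite list at once instead of emitting each window when it closes.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical toy model, not the Flink API.
Event = Tuple[str, int, int]  # (key, value, timestamp)


def batch_group_reduce(events: Iterable[Event],
                       fn: Callable[[int, int], int]) -> Dict[str, int]:
    """Batch groupBy + reduce: the whole finite input is one implicit window."""
    result: Dict[str, int] = {}
    for key, value, _ts in events:
        result[key] = fn(result[key], value) if key in result else value
    return result


def windowed_group_reduce(events: Iterable[Event], size: int,
                          fn: Callable[[int, int], int]) -> Dict[Tuple[int, str], int]:
    """Stream-style grouping: results are defined per (window start, key),
    where an event's tumbling window starts at ts - ts % size."""
    result: Dict[Tuple[int, str], int] = {}
    for key, value, ts in events:
        slot = (ts - ts % size, key)
        result[slot] = fn(result[slot], value) if slot in result else value
    return result
```

With events [("a", 1, 0), ("b", 2, 1), ("a", 3, 5), ("b", 4, 12)], a window size of 100 yields the same per-key sums as the batch version (all under window start 0), while a size of 10 splits b's values into two windows.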
There is a strong inconsistency within the Streaming API right now.
Grouping and aggregating without
I think the thought was to deliberately not use the same terminology as the
batch API, so as not to confuse people.
But this is a minor naming issue, IMO.
On Tue, Jul 14, 2015 at 12:40 PM, Gyula Fóra gyula.f...@gmail.com wrote:
I see your point, reduceByKey is much clearer.
The question is whether we want to introduce this inconsistency across the
two APIs or stick with what we have.
On Tue, Jul 14, 2015 at 10:57 AM Aljoscha Krettek aljos...@apache.org wrote:
I agree, the groupBy in the batch API is misleading,
Concerning your comments:
1) In the new design, there is no grouping without windowing. The
KeyedDataStream subsumes the grouping and keying for partitioned state.
The keyBy() + window() combination makes a parallel grouped window;
keyBy() alone allows access to partitioned state.
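To illustrate the keyBy()-alone case, here is a hypothetical toy model in Python. It is not Flink's KeyedDataStream; `ToyKeyedStream`, its crc32-based routing and the `process` method are assumptions for illustration. Each key is routed to exactly one parallel instance, which keeps state for that key only. Without a window there is no final group result, just a rolling per-key update for every element.

```python
import zlib
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]  # (key, value)


class ToyKeyedStream:
    """Hypothetical toy model of a keyed stream, not the Flink API.
    Records are routed to one of `parallelism` instances by key, and each
    instance keeps state only for the keys routed to it."""

    def __init__(self, parallelism: int) -> None:
        self.parallelism = parallelism
        # One state table per parallel instance: key -> current state.
        self.instances: List[Dict[str, int]] = [{} for _ in range(parallelism)]

    def instance_for(self, key: str) -> int:
        # Stable hash, so the same key always lands on the same instance.
        return zlib.crc32(key.encode("utf-8")) % self.parallelism

    def process(self, records: List[Record],
                fn: Callable[[int, int], int], initial: int = 0) -> List[Record]:
        """Rolling per-key update: emits the new state for every element.
        There is no window, so no final 'group' result is ever produced."""
        out: List[Record] = []
        for key, value in records:
            state = self.instances[self.instance_for(key)]
            state[key] = fn(state.get(key, initial), value)
            out.append((key, state[key]))
        return out
```

For example, `ToyKeyedStream(2).process([("a", 1), ("b", 2), ("a", 3)], lambda s, v: s + v)` returns [("a", 1), ("b", 2), ("a", 4)]. Adding a window on top is what turns these rolling updates into a per-group result.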
My thought
It is a bit different from the batch API, because streaming semantics
are a bit different ;-)
One good thing is that we can improve things that were sub-optimal in
the batch API.
On Tue, Jul 14, 2015 at 10:55 AM, Stephan Ewen se...@apache.org wrote:
keyBy() does not do any grouping. Grouping in streams is not defined
without windows.
On Tue, Jul 14, 2015 at 10:48 AM, Gyula Fóra gyula.f...@gmail.com wrote:
If we only want to have either keyBy or groupBy, why not keep groupBy? That
would be more consistent with the batch API.
On Tue, Jul 14, 2015 at 10:35 AM Stephan Ewen se...@apache.org wrote:
Concerning your comments:
1) In the new design, there is no grouping without windowing. The
+1 I like it as well.
On Mon, 13 Jul 2015 at 16:17 Kostas Tzoumas ktzou...@apache.org wrote:
+1 from my side
On Mon, Jul 13, 2015 at 4:15 PM, Stephan Ewen se...@apache.org wrote:
Do we have consensus on these designs?
If we have, we should get to implementing this soon, because
In general I like it, although the main difference between the current and
the new design is the windowing, and that is still not very clear.
Where do we have the full-stream time windows, for instance (which are
parallel but not keyed)?
On Mon, Jul 13, 2015 at 4:28 PM Aljoscha Krettek
Do we have consensus on these designs?
If we have, we should get to implementing this soon, because basically all
streaming patches will have to be revisited in light of this...
On Tue, Jul 7, 2015 at 3:41 PM, Gyula Fóra gyula.f...@gmail.com wrote:
You are right, that's an important issue.
And
+1
No further concerns from my side either
On 13 Jul 2015, at 18:30, Gyula Fóra gyula.f...@gmail.com wrote:
+1
On Mon, Jul 13, 2015 at 6:23 PM Stephan Ewen se...@apache.org wrote:
If naming is the only concern, then we should go ahead, because we can
change names easily (before the release).
In fact, I don't think it leaves a bad impression. Global windows are
non-parallel windows. There are also parallel windows. Pick what you need
and what works.
On Mon, Jul 13, 2015
I think we agree on everything; it's more of a naming issue :)
I thought it might be misleading that global time windows are
non-parallel windows. We don't want to give a bad impression. (We also
don't want users to think that every global window is parallel, but that's
not a problem here.)
Gyula
Okay, what is missing about the windowing in your opinion?
The core points of the document are:
- The parallel windows are per group only.
- The implementation of the parallel windows holds window data in the
group buffers.
- The global windows are non-parallel. May have parallel
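The difference between per-group (parallel) windows and global (non-parallel) windows can be sketched with a toy Python model. Count windows and the function names `keyed_count_windows` and `global_count_windows` are assumptions chosen for brevity, not the document's design. Keyed windows keep one buffer per key, so different keys can live on different parallel instances; a global window has a single buffer that has to see every element.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy model, not the Flink API.
Record = Tuple[str, int]  # (key, value)


def keyed_count_windows(records: List[Record], size: int) -> List[Tuple[str, List[int]]]:
    """Per-group windows: each key has its own buffer (the 'group buffer'),
    and a window fires when that key's buffer holds `size` elements."""
    buffers: Dict[str, List[int]] = defaultdict(list)
    fired: List[Tuple[str, List[int]]] = []
    for key, value in records:
        buffers[key].append(value)
        if len(buffers[key]) == size:
            fired.append((key, buffers.pop(key)))
    return fired  # partially filled buffers stay pending


def global_count_windows(records: List[Record], size: int) -> List[List[Record]]:
    """Global windows: one buffer sees every element regardless of key,
    so a single (non-parallel) instance must process the whole stream."""
    buffer: List[Record] = []
    fired: List[List[Record]] = []
    for record in records:
        buffer.append(record)
        if len(buffer) == size:
            fired.append(buffer)
            buffer = []
    return fired
```

With [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)] and size 2, the keyed version fires ("a", [1, 3]) and ("b", [2, 4]) and keeps a's 5 buffered, while the global version fires [("a", 1), ("b", 2)] and [("a", 3), ("b", 4)].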
Hi,
I just noticed that we don't have anything about how iterations and
timestamps/watermarks should interact.
Cheers,
Aljoscha
On Mon, 6 Jul 2015 at 23:56 Stephan Ewen se...@apache.org wrote:
Hi all!
As many of you know, there are ongoing efforts to consolidate the
streaming API for the