On Wed, Sep 16, 2020 at 8:48 AM Tyson Hamilton wrote:
> The use case is to support an unbounded stream-stream join, where the
> elements are arriving in roughly time sorted order. Removing a specific
> element from the timestamp indexed collection is necessary when a match is
> found.
>
Just
The use case is to support an unbounded stream-stream join, where the
elements are arriving in roughly time sorted order. Removing a specific
element from the timestamp indexed collection is necessary when a match is
found. Having clearRange is helpful to expire elements that are no longer
Hi,
Currently we only support removing a timestamp range. You can remove a
single timestamp of course by removing [ts, ts+1), however if there are
multiple elements with the same timestamp this will remove all of those
elements.
Does this fit your use case? If not, I wonder if MapState is closer
Hi Reuven,
I noticed that there was an implementation of the in-memory OrderedListState
introduced [1]. Where can I find out more regarding the plan and design? Is
there a design doc? I'd like to know more details about the implementation to
see if it fits my use case. I was hoping it would
Hey folks,
Sry I'm late to this thread but this might be very helpful for the problem
we're dealing with. Do we have a design doc or a jira ticket I can follow?
Cheers,
Catlyn
On Thu, Jun 18, 2020 at 1:11 PM Jan Lukavský wrote:
> My questions were just an example. I fully agree there is a
My questions were just an example. I fully agree there is a fundamental
need for a sorted state (of some form, and I also think this links to
efficient implementation of retrations) - I was reacting to Kenn's
question about BIP. This one would be pretty nice example why it would
be good to
Jan - my proposal is exactly TimeSortedBagState (more accurately -
TimeSortedListState), though I went a bit further and also proposed a way
to have a dynamic number of tagged TimeSortedBagStates.
You are correct that the runner doesn't really have to store the data time
sorted - what's actually
Big +1 for a BIP, as this might really help clarify all the pros and
cons of all possibilities. There seem to be questions that need
answering and motivating use cases - do we need sorted map state or can
we solve our use cases by something simpler - e.g. the mentioned
TimeSortedBagState? Does
Zooming in from generic philosophy to be clear: adding time ordered buffer
to the Fn state API is *not* a shortcut.It has benefits that will not be
achieved by SDK-side implementation on top of either ordered or unordered
multimap. Are those benefits worth expanding the API? I don't know.
A
It might help for me to describe what I have in mind. I'm still proposing
that we build multimap, just not a globally-sorted multimap.
My previous proposal was that we provide a Multimap state type,
sorted by key. this would have two additional operations -
multimap.getRange(startKey, endKey) and
The portability layer is meant to live across multiple versions of Beam and
I don't think it should be treated by doing the simple and useful thing now
since I believe it will lead to a proliferation of the API.
On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles wrote:
> I have thoughts on the
I have thoughts on the subject of whether to have APIs just for the
lowest-level building blocks versus having APIs for higher-level
constructs. Specifically this applies to providing only unsorted multimap
vs what I will call "time-ordered buffer". TL;DR: I'd vote to focus on
time-ordered buffer;
I would be glad to take a stab at how to provide sorting on top of unsorted
multimap state.
Based upon your description, you want integer keys representing timestamps
and arbitrary user value for the values, is that correct?
What kinds of operations do you need on the sorted map state in order of
This sounds to me like a potential runner strategy. However if a runner can
natively support sorted maps (e.g. we expect the Dataflow runner to be able
to do so, and I think it would be useful for other runners as well), then
it's probably preferable to allow the runner to use its native
On Tue, May 28, 2019 at 2:59 AM Robert Bradshaw wrote:
> On Fri, May 24, 2019 at 6:57 PM Kenneth Knowles wrote:
> >
> > On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles wrote:
> >>
> >> On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote:
> >>>
> >>> Some great comments!
> >>>
> >>> Aljoscha:
On Fri, May 24, 2019 at 6:57 PM Kenneth Knowles wrote:
>
> On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles wrote:
>>
>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote:
>>>
>>> Some great comments!
>>>
>>> Aljoscha: absolutely this would have to be implemented by runners to be
>>> efficient.
In my look for, it should have said "void". instead of "key". when
explaining how to do it.
On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik wrote:
> For the API that you proposed, the map key is always "void" and the sort
> key == user key. So in my example of
> key: dummy value
> key.000: token,
For the API that you proposed, the map key is always "void" and the sort
key == user key. So in my example of
key: dummy value
key.000: token, (0001, value4)
key.001: token, (0010, value1), (0011, value2)
key.01: token
key.1: token, (1011, value3)
you would have:
"void": dummy value
"void".000:
Can you explain how fetching and deleting ranges of keys would work with
this data structure?
On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik wrote:
> Reuven, for the example, I assume that we never want to store more then 2
> values at a given sort key prefix, and if we do then we will create a
On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles wrote:
>
>
> On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote:
>
>> Some great comments!
>>
>> *Aljoscha*: absolutely this would have to be implemented by runners to
>> be efficient. We can of course provide a default (inefficient)
>>
On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote:
> Some great comments!
>
> *Aljoscha*: absolutely this would have to be implemented by runners to be
> efficient. We can of course provide a default (inefficient) implementation,
> but ideally runners would provide better ones.
>
> *Jan* Exactly.
On Fri, May 24, 2019 at 9:36 AM Rui Wang wrote:
>
>> *Jan* Exactly. I think MapState can be dropped or backed by this. E.g.
>>
>>>
>>> Regarding to performance, I have a concern to drop MapSate or use
> SortedMapState to back it. Although SortedMapState provide API to remove a
> single key, I
Reuven, for the example, I assume that we never want to store more then 2
values at a given sort key prefix, and if we do then we will create a new
longer prefix splitting up the values based upon the sort key.
Tuple representation in examples below is (key, sort key, value) and . is a
character
>
>
> *Jan* Exactly. I think MapState can be dropped or backed by this. E.g.
>
>>
>> Regarding to performance, I have a concern to drop MapSate or use
SortedMapState to back it. Although SortedMapState provide API to remove a
single key, I would imagine its implementation in runners will different
Some great comments!
*Aljoscha*: absolutely this would have to be implemented by runners to be
efficient. We can of course provide a default (inefficient) implementation,
but ideally runners would provide better ones.
*Jan* Exactly. I think MapState can be dropped or backed by this. E.g.
On Fri, May 24, 2019 at 5:32 AM Reuven Lax wrote:
>
> On Thu, May 23, 2019 at 1:53 PM Ahmet Altay wrote:
>>
>>
>>
>> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik wrote:
>>>
>>>
>>>
>>> On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote:
>
> A few obvious problems with this code:
> 1.
. 5. 2019 10:56:45
Předmět: Re: DISCUSS: Sorted MapState API
"
This is quite interesting! The Flink Table API (relational and SQL) has an
implementation for the type of join you mention in the example. We call it
Temporal Table Join, and it works on something we call Temporal Tables:
This is quite interesting! The Flink Table API (relational and SQL) has an
implementation for the type of join you mention in the example. We call it
Temporal Table Join, and it works on something we call Temporal Tables:
On Thu, May 23, 2019 at 1:53 PM Ahmet Altay wrote:
>
>
> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik wrote:
>
>>
>>
>> On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote:
>>
>>> A few obvious problems with this code:
1. Removing the elements already processed from the bag requires
On Thu, May 23, 2019 at 11:23 AM Lukasz Cwik wrote:
> I would suggest that we drop MapState and instead support MultimapState
> without ordering as a first pass and potentially add ordering later.
>
While I agree that MultimapState is useful on its own, for these
aggregation use cases ordering
+100 As someone who has just spent a lot of time coding all the "GC,
caching of BagState fetches, etc" this would make life a lot easier!
Its also generally valuable for a lot of timeseries work.
On Fri, 24 May 2019 at 04:59, Lukasz Cwik wrote:
>
>
> On Thu, May 23, 2019 at 1:53 PM Ahmet Altay
On Thu, May 23, 2019 at 1:53 PM Ahmet Altay wrote:
>
>
> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik wrote:
>
>>
>>
>> On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote:
>>
>>> A few obvious problems with this code:
1. Removing the elements already processed from the bag requires
On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik wrote:
>
>
> On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote:
>
>> A few obvious problems with this code:
>>> 1. Removing the elements already processed from the bag requires
>>> clearing and rewriting the entire bag. This is O(n^2) in the number of
On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote:
> A few obvious problems with this code:
>> 1. Removing the elements already processed from the bag requires
>> clearing and rewriting the entire bag. This is O(n^2) in the number of
>> input trades.
>>
> why it's not O(2 * n) to clearing and
>
> A few obvious problems with this code:
> 1. Removing the elements already processed from the bag requires
> clearing and rewriting the entire bag. This is O(n^2) in the number of
> input trades.
>
why it's not O(2 * n) to clearing and rewriting trade state?
> public interface
I would suggest that we drop MapState and instead support MultimapState
without ordering as a first pass and potentially add ordering later.
Inside an SDK we would be able to build a "sorted" multimap state that uses
a prefix of the key and implement radix based sorting. Any time a prefix
has too
36 matches
Mail list logo