Aggregate size should be around 500 bytes max (I will try a first implementation).
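For reference, a first implementation along the lines of the [aggregateKey, txId] keying discussed in the thread below could be as simple as a small composite key class. The name AggregateKey is a placeholder, not something from this thread; Geode region keys mainly need stable equals()/hashCode() and, for partitioned or client/server use, to be serializable.

// Sketch only: a composite region key along the lines of [a.b.c, txID1].
// AggregateKey is a placeholder name for illustration.
import java.io.Serializable;
import java.util.Objects;

public class AggregateKey implements Serializable {

    private final String aggregateKey; // e.g. "a.b.c"
    private final long txId;           // snapshot / transaction id

    public AggregateKey(String aggregateKey, long txId) {
        this.aggregateKey = aggregateKey;
        this.txId = txId;
    }

    public String getAggregateKey() { return aggregateKey; }

    public long getTxId() { return txId; }

    // Region keys must have stable equals()/hashCode().
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AggregateKey)) return false;
        AggregateKey other = (AggregateKey) o;
        return txId == other.txId && aggregateKey.equals(other.aggregateKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(aggregateKey, txId);
    }
}

Each value would then be a single aggregate of roughly 500 bytes, written with region.put(new AggregateKey("a.b.c", txId), aggregate), and the latest version could still be exposed to clients through a separately keyed "Latest" entry that they register interest on, as Mike suggests below.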
Thanks.

On Saturday, 7 May 2016, Michael Stolz <[email protected]> wrote:

> For the CQ to deliver only the latest, you would need to have a
> separately keyed "Latest" entity that they can register interest on.
>
> How big is each of the aggregates? If they are large, you will not get
> much benefit from my array model.
>
> The array model is ideal for fixed numbers of doubles or integers, like
> availability counts and rates in hotel systems or 5-minute prices on
> financial instruments.
>
> --
> Mike Stolz
> Principal Engineer, GemFire Product Manager
> Mobile: 631-835-4771
>
> On Fri, May 6, 2016 at 12:27 PM, Olivier Mallassi <[email protected]> wrote:
>
>> Hi all,
>>
>> Thank you for your answer.
>>
>> Mike, this is not for market data (one day...); it is more related to
>> our Geode / Storm integration (as you know).
>>
>> At one point, I need to snapshot my aggregates: every xx minutes, a
>> specific event is emitted. This event specifies a txId (long), and, in
>> the end, every txId maps to a snapshot (i.e. a well-known version of
>> the aggregates).
>>
>> I was thinking about using regions like MyRegion/txID1, MyRegion/txID2,
>> etc.
>>
>> I like your pattern; it could work and be modeled like this:
>>
>> key: aggregateKey = a.b.c
>> value: aggregates[] where index 0 is the latest txId, index 1 the
>> previous txId, and so on
>>
>> The thing with this model (and this is maybe not a real issue) is that,
>> as I have CQs, the client will be notified with the whole aggregates[]
>> and not only the latest objects (but what if I implement delta
>> propagation?).
>>
>> Maybe another option (in my case) would be to use the txId in the key:
>>
>> key: aggregateKey = [a.b.c, txID1]
>> value: aggregate
>>
>> If you have any ideas :) but in all cases, thank you.
>>
>> oliv/
>>
>> On Thu, May 5, 2016 at 12:52 AM, Michael Stolz <[email protected]> wrote:
>>
>>> Yes, the lists can be first-class objects with the same key as the
>>> description object and possibly some sort of date stamp appended,
>>> depending on how many observations over how many days you want to keep.
>>>
>>> Yes, I think this model can be used very well for any periodic
>>> time-series data, and would therefore be a very useful pattern.
>>>
>>> --
>>> Mike Stolz
>>> Principal Engineer, GemFire Product Manager
>>> Mobile: 631-835-4771
>>>
>>> On Wed, May 4, 2016 at 10:45 AM, Alan Kash <[email protected]> wrote:
>>>
>>>> Mike,
>>>>
>>>> The model you just described: are you referring to one parent object
>>>> which describes an entity, plus multiple List objects describing
>>>> measurable metrics (e.g. stock price, temperature), with fixed-size
>>>> array objects to store time slices?
>>>>
>>>> Metadata-Object
>>>> - List of [metric1 timeslice array] - List<Array>
>>>> - List of [metric2 timeslice array]
>>>>
>>>> How will the indexes work in this case?
>>>>
>>>> This model can be used as a general time-series pattern for Geode.
>>>>
>>>> Thanks,
>>>> Alan
>>>>
>>>> On Wed, May 4, 2016 at 9:56 AM, Michael Stolz <[email protected]> wrote:
>>>>
>>>>> If what you are trying to do is get a consistent picture of market
>>>>> data and trade data at a point in time, then maybe some form of
>>>>> temporal storage organization would give you the best approach.
>>>>>
>>>>> If you can define a regular interval, we can do a very elegant
>>>>> mechanism based on fixed-length arrays in GemFire that contain
>>>>> point-in-time snapshots of the rapidly changing elements. For
>>>>> instance, you might want a single top-level market data description
>>>>> object and then a price object with individual prices at 5-minute
>>>>> intervals built as a simple array of doubles.
>>>>>
>>>>> Does that sound like it might be a workable pattern for you?
>>>>>
>>>>> --
>>>>> Mike Stolz
>>>>> Principal Engineer, GemFire Product Manager
>>>>> Mobile: 631-835-4771
>>>>>
>>>>> On Wed, May 4, 2016 at 4:34 AM, Olivier Mallassi <[email protected]> wrote:
>>>>>
>>>>>> Hi everybody,
>>>>>>
>>>>>> I am facing an issue and do not know what the right pattern would
>>>>>> be. I guess you can help.
>>>>>>
>>>>>> The need is to create snapshots of data:
>>>>>> - Let's say you have a stream of incoming objects that you want to
>>>>>> store in a region, say *MyRegion*. Clients are listening (via CQ)
>>>>>> to updates on *MyRegion*.
>>>>>> - At a fixed period (e.g. every 3 seconds or every hour, depending
>>>>>> on the case) you want to snapshot this data (while keeping
>>>>>> *MyRegion* updated with incoming objects). Let's say the snapshotted
>>>>>> regions follow the convention *MyRegion/snapshot-id1*,
>>>>>> *MyRegion/snapshot-id2*, ... I am currently thinking about keeping a
>>>>>> fixed number of snapshots and rolling over them.
>>>>>>
>>>>>> I see several options to implement this:
>>>>>> - *option#1*: at a fixed period, I execute a function to copy data
>>>>>> from *MyRegion* to *MyRegion/snapshot-id1*. I am not sure it works
>>>>>> well with large amounts of data, nor how to properly handle new
>>>>>> objects arriving in *MyRegion* while I am snapshotting it.
>>>>>>
>>>>>> - *option#2*: I write the object twice: once in *MyRegion* and also
>>>>>> in *MyRegion/snapshot-idN*, assuming *snapshot-idN* is the latest
>>>>>> one. Switching to a new snapshot is then just a matter of writing
>>>>>> the objects to *MyRegion* and *MyRegion/snapshot-idN+1*.
>>>>>>
>>>>>> Regarding option#2 (which is my preferred one, but I may be wrong),
>>>>>> I see two implementations:
>>>>>> - *implem#1*: use a custom function that writes the object twice
>>>>>> (the regions can be collocated, etc.); see the sketch below the
>>>>>> quoted thread. I can use a local transaction within the function in
>>>>>> order to guarantee consistency between both regions.
>>>>>> - *implem#2*: use listeners, i.e. an AsyncEventListener. If they are
>>>>>> declared on multiple nodes, I assume there is no risk of losing data
>>>>>> in case of failure (e.g. a node crashing before all the "objects" in
>>>>>> the AsyncEventListener's queue are processed)?
>>>>>>
>>>>>> Implem#1 looks easier to me (and I do not think it costs much more
>>>>>> in terms of performance than the HA AsyncEventListener).
>>>>>>
>>>>>> What would be your opinions? Favorite options? Alternative options?
>>>>>>
>>>>>> I hope my email is clear enough. Many thanks for your help.
>>>>>>
>>>>>> olivier.
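To make option#2 / implem#1 concrete, here is a minimal sketch of a function that writes the same object to *MyRegion* and to the current snapshot region inside one local transaction. It assumes current Apache Geode packages (org.apache.geode), collocated partitioned regions, and illustrative names (SnapshotWriteFunction, SnapshotWriteArgs, and a flat "MyRegion-snapshot-<id>" naming instead of subregions); treat it as a sketch rather than a drop-in implementation.

// Sketch of option#2 / implem#1: write to the live region and to the
// current snapshot region inside one local transaction.
import java.io.Serializable;

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.CacheTransactionManager;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;

public class SnapshotWriteFunction implements Function {

    // Argument holder: the entry to write plus the id of the current snapshot.
    public static class SnapshotWriteArgs implements Serializable {
        final Object key;
        final Object value;
        final long snapshotId;

        public SnapshotWriteArgs(Object key, Object value, long snapshotId) {
            this.key = key;
            this.value = value;
            this.snapshotId = snapshotId;
        }
    }

    @Override
    public void execute(FunctionContext context) {
        SnapshotWriteArgs args = (SnapshotWriteArgs) context.getArguments();
        Cache cache = CacheFactory.getAnyInstance();
        Region<Object, Object> live = cache.getRegion("MyRegion");
        Region<Object, Object> snapshot =
                cache.getRegion("MyRegion-snapshot-" + args.snapshotId);

        // Local transaction: both puts land on the same member because the
        // regions are assumed to be collocated and the function is routed
        // to the key's primary bucket.
        CacheTransactionManager txMgr = cache.getCacheTransactionManager();
        txMgr.begin();
        try {
            live.put(args.key, args.value);      // keep MyRegion up to date
            snapshot.put(args.key, args.value);  // and the current snapshot
            txMgr.commit();
        } catch (RuntimeException e) {
            if (txMgr.exists()) {
                txMgr.rollback();
            }
            throw e;
        }
        context.getResultSender().lastResult(Boolean.TRUE);
    }

    @Override
    public String getId() {
        return SnapshotWriteFunction.class.getSimpleName();
    }

    @Override
    public boolean hasResult() {
        return true;
    }

    @Override
    public boolean optimizeForWrite() {
        return true; // run on primary buckets, since we write
    }

    @Override
    public boolean isHA() {
        return false; // keep the sketch simple: no automatic re-execution
    }
}

It would typically be registered on the servers and invoked with FunctionService.onRegion(...).withFilter(...) keyed on the entry being written, so that it executes on the member hosting that key's primary bucket and the transaction stays local.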

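For completeness, the fixed-length array model Mike describes above (one top-level description object plus a fixed-size array of doubles, one slot per 5-minute interval) could be sketched as follows; the PriceSeries name, the 96-slot sizing (an 8-hour trading window) and the slot arithmetic are illustrative assumptions, not anything prescribed in the thread.

// Sketch of the fixed-length array model: a per-instrument value holding
// one double per 5-minute interval of the trading day.
import java.io.Serializable;
import java.util.Arrays;

public class PriceSeries implements Serializable {

    // e.g. 8 hours of trading at 5-minute intervals = 96 slots
    public static final int SLOTS = 96;

    private final String instrumentId;   // same key as the description object
    private final double[] prices = new double[SLOTS];

    public PriceSeries(String instrumentId) {
        this.instrumentId = instrumentId;
        Arrays.fill(prices, Double.NaN);  // NaN marks "no observation yet"
    }

    // Map a minute of the trading day (0..479) to its 5-minute slot.
    public void record(int minuteOfTradingDay, double price) {
        prices[minuteOfTradingDay / 5] = price;
    }

    // Most recent observed price, or NaN if none yet.
    public double latest() {
        for (int i = SLOTS - 1; i >= 0; i--) {
            if (!Double.isNaN(prices[i])) {
                return prices[i];
            }
        }
        return Double.NaN;
    }

    public String getInstrumentId() { return instrumentId; }

    public double[] getPrices() { return prices.clone(); }
}

Stored under the same key as the description object (with a date stamp appended if several days are kept, as suggested above), such a value could also pair with delta propagation, which Olivier mentions, so that CQ clients are not re-sent the whole array on every update.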