Re: [DISCUSS] Atomic Counters on cluster

Thomas Mueller Thu, 05 Feb 2015 08:45:01 -0800

Hi,

I would use the "index" approach, at least for now. By the way, I have
used the same idea for the approximate node counter mechanism (OAK-1907).

A generalisation of the "atomic counter" problem is the "atomic sum"
problem. There you want to support not just +1 and -1, but +x and -x,
possibly with a constraint (an allowed range), so that it could be used
for reservation systems (in synchronous mode only, not in a cluster). One
possible solution is:

The data itself is stored in the content:

/content/flight_345/seats/@seat:count = 100
/content/flight_345/reservation_1/@seat:count = -1
/content/flight_345/reservation_2/@seat:count = -3
/content/flight_1001/seats/@seat:count = 200
/content/flight_1001/reservation_1/@seat:count = -3
/content/flight_1001/reservation_2/@seat:count = -2

The query to get the current count would be:

    select sum([seat:count]) from [nt:base]
    where isdescendantnode('/content/flight_1001')
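To make the semantics of that query concrete, here is a minimal Java sketch of what it computes when no index is available. It assumes the content is modeled as a plain map from node path to seat:count value; SeatCountQuery and sumDescendants are hypothetical names, not Oak APIs. It has to scan every node, which is exactly the O(n) cost the index is meant to avoid.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the content above: node path -> seat:count.
// Only illustrates what the proposed sum(...) query returns; not an Oak API.
class SeatCountQuery {

    static final Map<String, Long> CONTENT = new TreeMap<>();
    static {
        CONTENT.put("/content/flight_345/seats", 100L);
        CONTENT.put("/content/flight_345/reservation_1", -1L);
        CONTENT.put("/content/flight_345/reservation_2", -3L);
        CONTENT.put("/content/flight_1001/seats", 200L);
        CONTENT.put("/content/flight_1001/reservation_1", -3L);
        CONTENT.put("/content/flight_1001/reservation_2", -2L);
    }

    // Without an index: visit every node and add up the values of the
    // descendants of 'ancestor' (the ancestor itself is excluded). O(n).
    static long sumDescendants(Map<String, Long> content, String ancestor) {
        // Trailing slash so "/content/flight_1001" does not match "/content/flight_10010".
        String prefix = ancestor.endsWith("/") ? ancestor : ancestor + "/";
        long sum = 0;
        for (Map.Entry<String, Long> e : content.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumDescendants(CONTENT, "/content/flight_1001")); // 195
        System.out.println(sumDescendants(CONTENT, "/content/flight_345"));  // 96
    }
}
```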

With an index, that query would be very fast, O(1), because the index
keeps the sum per flight up to date. The index definition would be:


# index of type "sum", on property "seat:count"
/oak:index/flights/@type = sum
/oak:index/flights/@propertyName = seat:count

# restriction: sum must be at least 0, otherwise the commit fails (similar to a unique index)
/oak:index/flights/@min = 0

# aggregate in the parent, so keep one sum(count) per flight
/oak:index/flights/@aggregationLevel = 1
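A minimal sketch of how such a "sum" index could behave, assuming a hypothetical in-memory SumIndex class (this index type does not exist in Oak; all names are illustrative). Each commit is given as a map from node path to the change of seat:count. The changes are aggregated aggregationLevel levels above the changed node, checked against min, and applied all-or-nothing, so reading the sum for a flight is a single map lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed "sum" index type (not existing Oak code).
// Mirrors the settings min and aggregationLevel from the definition above.
class SumIndex {
    private final long min;
    private final int aggregationLevel;
    // aggregation key (e.g. "/content/flight_345") -> current sum of seat:count
    private final Map<String, Long> sums = new HashMap<>();

    SumIndex(long min, int aggregationLevel) {
        this.min = min;
        this.aggregationLevel = aggregationLevel;
    }

    // Walks 'levels' steps up: ("/content/flight_345/seats", 1) -> "/content/flight_345".
    static String ancestor(String path, int levels) {
        String p = path;
        for (int i = 0; i < levels; i++) {
            int slash = p.lastIndexOf('/');
            p = slash <= 0 ? "/" : p.substring(0, slash);
        }
        return p;
    }

    // Applies all changes of one commit (path -> delta). If any aggregated sum
    // would drop below 'min', nothing is applied and the commit fails,
    // similar to a unique index rejecting a duplicate value.
    synchronized void commit(Map<String, Long> deltas) {
        Map<String, Long> updated = new HashMap<>();
        for (Map.Entry<String, Long> e : deltas.entrySet()) {
            String key = ancestor(e.getKey(), aggregationLevel);
            long base = updated.getOrDefault(key, sums.getOrDefault(key, 0L));
            updated.put(key, base + e.getValue());
        }
        for (Map.Entry<String, Long> e : updated.entrySet()) {
            if (e.getValue() < min) {
                throw new IllegalStateException(
                        "sum for " + e.getKey() + " would be " + e.getValue());
            }
        }
        sums.putAll(updated);
    }

    // O(1) read of the aggregated value, e.g. for one flight.
    synchronized long sum(String key) {
        return sums.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        SumIndex index = new SumIndex(0, 1);
        index.commit(Map.of("/content/flight_345/seats", 100L));
        index.commit(Map.of("/content/flight_345/reservation_1", -1L,
                            "/content/flight_345/reservation_2", -3L));
        System.out.println(index.sum("/content/flight_345")); // 96
        try {
            index.commit(Map.of("/content/flight_345/reservation_3", -97L));
        } catch (IllegalStateException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
        System.out.println(index.sum("/content/flight_345")); // still 96
    }
}
```

In this model, removing a reservation node would be committed as a positive delta (for example +3 when reservation_2 of flight_345 is deleted).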

For an async index, the "min" constraint can't be guaranteed: the index
is only updated after the commit has succeeded, so it can't make that
commit fail.
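The following hypothetical model (AsyncMinCheck is illustrative, not Oak code) shows the problem: each commit can only be checked against the last indexed sum, so two reservations committed before the next async update both pass, even though together they overbook.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: commits are checked against the last *indexed* sum,
// which does not yet include other pending commits (e.g. from another
// cluster node). Not Oak code.
class AsyncMinCheck {
    long indexedSum;                                // what the async index last computed
    final List<Long> pending = new ArrayList<>();   // committed, not yet indexed

    AsyncMinCheck(long indexedSum) {
        this.indexedSum = indexedSum;
    }

    // Commit-time check sees only the stale indexed value.
    boolean tryCommit(long delta, long min) {
        if (indexedSum + delta < min) {
            return false;
        }
        pending.add(delta);
        return true;
    }

    // Later, the background index update folds in everything already committed.
    void asyncIndexUpdate() {
        for (long d : pending) {
            indexedSum += d;
        }
        pending.clear();
    }

    public static void main(String[] args) {
        AsyncMinCheck seats = new AsyncMinCheck(3); // 3 seats left
        boolean a = seats.tryCommit(-2, 0);         // passes: 3 - 2 >= 0
        boolean b = seats.tryCommit(-2, 0);         // also passes: index still says 3
        seats.asyncIndexUpdate();
        System.out.println(a + " " + b + " " + seats.indexedSum); // true true -1
    }
}
```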

For fast counters whose count can get very high (a page access count, for
example), you probably want to avoid creating many nodes. In that case, a
background thread should update the content only periodically (for example
once every 10 seconds), and once in a while aggregate the content (replace
all nodes in /content/x/* with just one node).
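A minimal sketch of that background approach, assuming the repository is stood in for by a sorted map from node path to value. BufferedCounter, the delta_N and total node names, and the 10-minute consolidation interval are all hypothetical choices. Increments only touch an in-memory AtomicLong; a single background thread writes one delta node per flush and occasionally folds all nodes under /content/x into one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not an Oak API: 'content' stands in for the repository
// (node path -> counter value). Increments stay in memory; a background thread
// writes one delta node per flush and, once in a while, replaces all nodes
// under 'base' with a single "total" node.
class BufferedCounter {
    final Map<String, Long> content = new ConcurrentSkipListMap<>();
    private final String base;                          // e.g. "/content/x"
    private final AtomicLong buffered = new AtomicLong();
    private final AtomicLong seq = new AtomicLong();    // names the delta nodes

    BufferedCounter(String base) {
        this.base = base;
    }

    // Hot path: no repository write at all.
    void increment() {
        buffered.incrementAndGet();
    }

    // Periodic flush: atomically take the buffered count and persist it as one node.
    synchronized void flush() {
        long delta = buffered.getAndSet(0);
        if (delta != 0) {
            content.put(base + "/delta_" + seq.incrementAndGet(), delta);
        }
    }

    // Occasional consolidation: replace all nodes under base with one node.
    synchronized void consolidate() {
        long total = persisted();
        String prefix = base + "/";
        content.keySet().removeIf(path -> path.startsWith(prefix));
        content.put(prefix + "total", total);
    }

    // Sum of all persisted nodes under base.
    private long persisted() {
        String prefix = base + "/";
        return content.entrySet().stream()
                .filter(e -> e.getKey().startsWith(prefix))
                .mapToLong(e -> e.getValue())
                .sum();
    }

    // Current value; only approximate while a flush or consolidation is running.
    long value() {
        return persisted() + buffered.get();
    }

    // Background thread: flush every 10 seconds, consolidate every 10 minutes (assumed interval).
    ScheduledExecutorService start() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(this::flush, 10, 10, TimeUnit.SECONDS);
        executor.scheduleAtFixedRate(this::consolidate, 10, 10, TimeUnit.MINUTES);
        return executor;
    }

    public static void main(String[] args) {
        BufferedCounter hits = new BufferedCounter("/content/x");
        for (int i = 0; i < 5; i++) hits.increment();
        hits.flush();                       // writes /content/x/delta_1 = 5
        for (int i = 0; i < 3; i++) hits.increment();
        hits.flush();                       // writes /content/x/delta_2 = 3
        System.out.println(hits.content);   // {/content/x/delta_1=5, /content/x/delta_2=3}
        hits.consolidate();
        System.out.println(hits.content);   // {/content/x/total=8}
    }
}
```

Using getAndSet(0) means an increment that races with a flush is never lost (it lands either in this flush or the next one), and running flush and consolidate on the same single-threaded executor keeps them from interleaving.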

Regards,
Thomas

On 04/02/15 10:33, "Davide Giannella" <dav...@apache.org> wrote:

>On 03/02/2015 15:41, Michael Dürig wrote:
>>
>> Hi,
>>
>> I think we should keep this independent from indexing. Running the
>> counter consolidation asynchronously might have commonalities with
>> async indexing. If so, I'd first implement the former separately and
>> then factor out the commonalities.
>Discussed off-list with Michael; we came up with the following approach:
>
>- try out the index configuration approach as it should be a quick win.
>- refactor later on
>
>About the refactoring, here's my thinking. I'd like some feedback on it,
>as I didn't look at the details and something could be wrong.
>
>Right now we have a reliable way to run async processes in Oak (indexes,
>in this specific case): the async index.
>
>This has a known way of configuring the aspects by acting on
>oak:index/oak:queryIndexDefinition.
>
>Proposal:
>
>- rename AsyncIndexUpdate to something like AsyncProcess
>- create a new area in the repo, beside oak:index, where we can
>configure these aspects, for example oak:processes/oak:processDefinition
>- instruct AsyncProcess to understand this new area as well.
>
>Thoughts?
>
>D.