Re: Remove cache groups in AI 3.0

2018-04-11 Thread Anton Vinogradov
Vova,
thanks for the explanations.

Your comments are really valuable to me.

>> 1) Please see my original message explaining how this could be fixed
>> without cache groups.

I have questions about your initial statements.

  >> 1) "Merge" partition data from different caches
  Is the proposal just to automate grouping?

  >> 2) Employ segment-extent based approach instead of file-per-partition
  Is the idea to keep all colocated partitions in one or a few files?
  Something like: keep some colocated partitions together (for example, to
have files of ~2 GB) with automatic grouping/splitting?

If both answers are "yes", there is no need to wait for 3.0 to implement this.

1) #2 sounds like a storage optimization and can be implemented not as a
cache groups replacement, but as a "too many fsyncs" solution.
It looks like a good idea to keep all of a replicated cache's partitions
together.
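To make the grouping idea above concrete, here is a minimal sketch (the ~2 GB budget and all names are hypothetical illustrations, not Ignite API): colocated partitions are packed greedily into shared data files, which also caps the number of files that need fsync.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: greedily pack colocated partitions into shared data
// files so each file stays under a size budget (e.g. ~2 GB). Not Ignite API.
public class PartitionPacker {
    static final long BUDGET = 2L * 1024 * 1024 * 1024; // ~2 GB per file

    /** Returns groups of partition sizes; each group maps to one data file. */
    static List<List<Long>> pack(long[] partitionSizes) {
        List<List<Long>> files = new ArrayList<>();
        List<Long> cur = new ArrayList<>();
        long used = 0;
        for (long size : partitionSizes) {
            if (!cur.isEmpty() && used + size > BUDGET) {
                files.add(cur);          // current file is full, start a new one
                cur = new ArrayList<>();
                used = 0;
            }
            cur.add(size);
            used += size;
        }
        if (!cur.isEmpty())
            files.add(cur);
        return files;
    }

    public static void main(String[] args) {
        long GB = 1024L * 1024 * 1024;
        // Four colocated partitions: 1.5 GB, 1 GB, 0.8 GB, 0.3 GB.
        List<List<Long>> files = pack(new long[] {3 * GB / 2, GB, 4 * GB / 5, 3 * GB / 10});
        System.out.println(files.size() + " data files instead of 4"); // 3 data files instead of 4
    }
}
```

A real implementation would also need splitting when a partition grows past the budget, as the message suggests; this only shows the grouping half.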

2) We can just deprecate cache groups, since caches will be grouped
automatically; there is no need to atomically replace groups with the proposed solution.

>> Once we have p.1 and p.2 ready cache groups could be removed, couldn't
they?
Sounds correct.


2018-04-11 14:32 GMT+03:00 Vladimir Ozerov :


Re: Remove cache groups in AI 3.0

2018-04-11 Thread Vladimir Ozerov
Anton,

Your example is an extremely unlikely use case which we have never seen in
the wild. But nevertheless:
1) Please see my original message explaining how this could be fixed
without cache groups.
2) Logical cache creation also causes PME.
3) Yes, it is realistic. There are no fundamental limitations. In addition,
removal of a logical cache is a costly operation with O(N) complexity, where N
is the number of records in the cache. Removal of a physical cache is a constant-time operation.
4) I do not see how monitoring is related to cache groups.
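A toy model of the point-3 cost difference (hypothetical names, not Ignite code): a logical cache shares its group's storage, so destroying it must visit every record, while a physical cache owns its storage and can be dropped wholesale.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Toy model of the destroy cost difference (not Ignite code).
public class DestroyCost {
    // Shared group storage: key -> owning logical cache id.
    static Map<String, Integer> groupStore = new HashMap<>();

    /** O(N): must scan the shared storage and remove this cache's records. */
    static int destroyLogical(int cacheId) {
        int removed = 0;
        for (Iterator<Map.Entry<String, Integer>> it = groupStore.entrySet().iterator(); it.hasNext(); ) {
            if (it.next().getValue() == cacheId) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** O(1): a physical cache owns its files; destroy drops them wholesale. */
    static Map<String, Integer> destroyPhysical(Map<String, Integer> ownStore) {
        return null; // the whole store (its files) is discarded, no per-record work
    }

    public static void main(String[] args) {
        groupStore.put("a", 1);
        groupStore.put("b", 2);
        groupStore.put("c", 1);
        System.out.println(destroyLogical(1)); // prints 2: scanned all 3 entries, removed 2
    }
}
```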

On Wed, Apr 11, 2018 at 2:02 PM, Anton Vinogradov  wrote:


Re: Remove cache groups in AI 3.0

2018-04-11 Thread Vladimir Ozerov
Dima,

The question is: would we need cache groups at all if physical caches had the
same performance as logical ones?

On Wed, Apr 11, 2018 at 1:45 PM, Dmitry Pavlov 
wrote:


Re: Remove cache groups in AI 3.0

2018-04-11 Thread Anton Vinogradov
Vova,

1) Each real cache has an overhead of several megabytes of memory for
affinity data on each node.
A virtual cache inside a cache group consumes much less memory (~0 MB).

2) Real cache creation causes a PME;
virtual cache creation just causes a minor topology version increment and does
not stop transactions.

I am not sure about this statement, is it correct?

3) If we are talking about a multi-tenant environment, we can have
10,000+ organisations (or even some millions) inside one cluster, and each can
have ~20 caches.
Is it realistic to have 200,000+ caches? I don't think so. Rebalancing would
freeze the cluster in that case.
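A back-of-envelope estimate of why 200,000+ real caches are problematic (all constants below are illustrative assumptions, not measured Ignite numbers): even one node reference per partition copy per cache adds up to gigabytes of heap.

```java
// Illustrative arithmetic only: assumed constants, not measured Ignite numbers.
public class AffinityOverhead {
    public static void main(String[] args) {
        long caches = 200_000;     // 10,000+ organisations x ~20 caches each
        long partitions = 1024;    // assumed partition count per cache
        long copies = 3;           // primary + 2 backups
        long refBytes = 8;         // one node reference per partition copy

        long bytes = caches * partitions * copies * refBytes;
        // 200_000 * 1024 * 3 * 8 = 4_915_200_000 bytes, i.e. roughly 4.6 GB
        // of heap per node just for affinity assignment references.
        System.out.println(bytes); // prints 4915200000
    }
}
```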

Also, organisation creation/removal is a regular operation (e.g. 100+ per
day) and it should be fast and not cause performance degradation.

4) It is very useful to have monitoring based on cache groups in a
multi-tenant environment.
Each organisation will consume only some megabytes, but, for example, all Loans
together may require terabytes or have an update rate over 9000 per second, and
you will see that.

The main idea is that a virtual cache inside a cache group requires almost zero
space, but works as well as a real cache, and even better.


2018-04-11 13:45 GMT+03:00 Dmitry Pavlov :


Re: Remove cache groups in AI 3.0

2018-04-11 Thread Dmitry Pavlov
Hi Igniters,

Actually, I do not fully understand either point of view: that we need to
keep, or to remove, cache groups.

The only reason for refactoring I see is 'too many fsyncs', but it may be
solved at the level of FilePageStoreV2 with a new virtual FS for partition/index
data, without any other changes.

Sincerely,
Dmitriy Pavlov

Wed, Apr 11, 2018 at 13:30, Vladimir Ozerov :


Re: Remove cache groups in AI 3.0

2018-04-11 Thread Vladimir Ozerov
Denis,

Normally, every database object, whether it is a table or an index, is kept
in its own exclusive segment. A segment can span one or more real files.
A segment always has a kind of allocation map, allowing one to quickly get the
number of allocated pages for a specific object.
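A sketch of such an allocation map (a hypothetical structure, not actual Ignite/GridGain code): one bit per page in the segment, so the per-object data size query is answered from the map alone, without scanning the data.

```java
import java.util.BitSet;

// Hypothetical sketch of a per-segment allocation map: one bit per page.
// The data size of the segment's object (cache or index) is then
// allocated pages * page size, computed from the map alone.
public class SegmentAllocMap {
    static final int PAGE_SIZE = 4096;
    private final BitSet allocated = new BitSet();

    void allocate(int pageIdx) { allocated.set(pageIdx); }
    void free(int pageIdx)     { allocated.clear(pageIdx); }

    /** Bytes currently allocated to the segment's object. */
    long dataSize() { return (long) allocated.cardinality() * PAGE_SIZE; }

    public static void main(String[] args) {
        SegmentAllocMap seg = new SegmentAllocMap();
        for (int i = 0; i < 1000; i++) seg.allocate(i);
        seg.free(10);
        System.out.println(seg.dataSize()); // prints 4091904 (999 pages * 4096)
    }
}
```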

On Wed, Apr 11, 2018 at 12:24 AM, Denis Magda  wrote:

> Vladimir,
>
> - Data size per-cache
>
>
> Could you elaborate how the data size per-cache/table task will be
> addressed with proposed architecture? Are you going to store data of a
> specific cache in dedicated pages/segments? What's about index size?
>
> --
> Denis
>
> On Tue, Apr 10, 2018 at 2:31 AM, Vladimir Ozerov 
> wrote:
>
> > Dima,
> >
> > 1) Easy to understand for users
> > AI 2.x: cluster -> cache group -> cache -> table
> > AI 3.x: cluster -> cache(==table)
> >
> > 2) Fine grained cache management
> > - MVCC on/off per-cache
> > - WAL mode on/off per-cache
> > - Data size per-cache
> >
> > 3) Performance:
> > - Efficient scans are not possible with cache groups
> > - Efficient destroy/DROP - O(N) now, O(1) afterwards
> >
> > "Huge refactoring" is not precise estimate. Let's think on how to do that
> > instead of how not to do :-)
> >
> > On Tue, Apr 10, 2018 at 11:41 AM, Dmitriy Setrakyan <
> dsetrak...@apache.org
> > >
> > wrote:
> >
> > > Vladimir, sounds like a huge refactoring. Other than "cache groups are
> > > confusing", are we solving any other big issues with the new proposed
> > > approach?
> > >
> > > (every time we try to refactor rebalancing, I get goose bumps)
> > >
> > > D.
> > >
> > > On Tue, Apr 10, 2018 at 1:32 AM, Vladimir Ozerov  >
> > > wrote:
> > >
> > > > Igniters,
> > > >
> > > > Cache groups were implemented for a sole purpose - to hide internal
> > > > inefficiencies. Namely (add more if I missed something):
> > > > 1) Excessive heap usage for affinity/partition data
> > > > 2) Too many data files, as we employ a file-per-partition approach.
> > > >
> > > > These problems were resolved, but now cache groups are a great source
> > > > of confusion both for users and us - hard to understand, with no way to
> > > > configure them in a deterministic way. Had we resolved the mentioned
> > > > performance issues, we would never have had cache groups. I propose to
> > > > think about what it would take for us to get rid of cache groups.
> > > >
> > > > Please provide your inputs to suggestions below.
> > > >
> > > > 1) "Merge" partition data from different caches
> > > > Consider that we start a new cache with the same affinity
> > > > configuration (cache mode, partition number, affinity function) as some
> > > > of the already existing caches. Is it possible to re-use the partition
> > > > distribution and history of an existing cache for a new cache? Think of
> > > > it as a kind of automatic cache grouping which is transparent to the
> > > > user. This would remove the heap pressure. Also it could resolve our
> > > > long-standing issue with FairAffinityFunction, when two caches with the
> > > > same affinity configuration are not co-located when started on
> > > > different topology versions.
> > > >
> > > > 2) Employ a segment-extent based approach instead of file-per-partition
> > > > - Every object (cache, index) resides in a dedicated segment
> > > > - A segment consists of extents (minimal allocation units)
> > > > - Extents are allocated and deallocated as needed
> > > > - *Ignite specific*: a particular extent can be used by only one
> > > >   partition
> > > > - Segments may be located in any number of data files we find convenient
> > > > With this approach the "too many fsyncs" problem goes away
> > > > automatically. At the same time it would still be possible to implement
> > > > efficient rebalance, as partition data will be split across a moderate
> > > > number of extents, not chaotically.
> > > >
> > > > Once we have p.1 and p.2 ready, cache groups could be removed,
> > > > couldn't they?
> > > >
> > > > Vladimir.
> > > >
> > >
> >
>
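The segment-extent scheme described in the quoted proposal (point 2) can be sketched roughly as follows (hypothetical names, not Ignite code): segments draw fixed-size extents from a shared pool inside a few large data files, and destroying a segment just returns its extents, without touching records.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of segment/extent allocation (not Ignite code):
// a shared pool of fixed-size extents inside a few large data files; each
// segment (cache or index) grabs extents as needed and returns them on
// destroy, so DROP is O(extents), not O(records).
public class ExtentPool {
    static final int EXTENT_PAGES = 64;        // minimal allocation unit
    final Deque<Integer> free = new ArrayDeque<>();

    ExtentPool(int totalExtents) {
        for (int i = 0; i < totalExtents; i++) free.push(i);
    }

    /** A segment owns a list of extent ids inside the shared data files. */
    class Segment {
        final List<Integer> extents = new ArrayList<>();

        void grow() { extents.add(free.pop()); } // allocate one more extent

        void destroy() {                          // constant work per extent
            for (int id : extents) free.push(id);
            extents.clear();
        }
    }

    public static void main(String[] args) {
        ExtentPool pool = new ExtentPool(1024);
        ExtentPool.Segment cacheSeg = pool.new Segment();
        for (int i = 0; i < 10; i++) cacheSeg.grow();
        cacheSeg.destroy();                       // all extents returned
        System.out.println(pool.free.size());     // prints 1024
    }
}
```

The "one extent per partition" constraint from the proposal is omitted here; it would amount to tagging each extent with its owning partition.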


Re: Remove cache groups in AI 3.0

2018-04-11 Thread Vladimir Ozerov
Anton,

I do not see the point. What is the problem with the creation or removal of
a real cache?

On Wed, Apr 11, 2018 at 1:05 PM, Anton Vinogradov  wrote:

> Vova,
>
> Cache groups are very useful.
>
> For example, you can develop multi-tenant applications using cache groups
> as templates.
> In case you have some cache groups, e.g. Users, Loans, Deposits, you can
> keep records for Organisation_A, Organisation_B and Organisation_C in the
> same data structures, but logically separated.
> Addition/removal of an organisation will not cause creation or removal of
> real caches.
>
> AFAIK, you can use GridSecurity [1] over caches inside cache groups, and
> gain a secured multi-tenant environment as a result.
>
> Can you propose a better solution without using cache groups?
>
> [1] https://docs.gridgain.com/docs/security-concepts
>


Re: Remove cache groups in AI 3.0

2018-04-11 Thread Vladimir Ozerov
Dmitry,

If you do this, why would you need cache groups at all?

On Tue, Apr 10, 2018 at 1:58 PM, Dmitry Pavlov 
wrote:

> Hi Vladimir,
>
> We can solve the "too many fsyncs" or "too many small files" problem by
> placing several partitions of a cache group in one file.
>
> We don't need to get rid of cache groups in this case.
>
> It is not a trivial task, but it is doable. We would need to create a
> simple FS for partition chunks inside one file.
>
> Sincerely,
> Dmitriy Pavlov
>
> вт, 10 апр. 2018 г. в 12:31, Vladimir Ozerov :
>
> > Dima,
> >
> > 1) Easy to understand for users
> > AI 2.x: cluster -> cache group -> cache -> table
> > AI 3.x: cluster -> cache(==table)
> >
> > 2) Fine grained cache management
> > - MVCC on/off per-cache
> > - WAL mode on/off per-cache
> > - Data size per-cache
> >
> > 3) Performance:
> > - Efficient scans are not possible with cache groups
> > - Efficient destroy/DROP - O(N) now, O(1) afterwards
> >
> > "Huge refactoring" is not a precise estimate. Let's think about how to do
> > that instead of how not to :-)
> >


Re: Remove cache groups in AI 3.0

2018-04-11 Thread Anton Vinogradov
Vova,

Cache groups are very useful.

For example, you can develop multi-tenant applications using cache groups
as templates.
If you have cache groups such as Users, Loans and Deposits, you can keep
records for Organisation_A, Organisation_B and Organisation_C in the same
data structures, but logically separated.
Adding or removing an organisation will not cause creation or removal of
real caches.

AFAIK, you can use GridSecurity [1] over the caches inside cache groups,
and gain a secured multi-tenant environment as a result.

Can you propose a better solution that does not use cache groups?

[1] https://docs.gridgain.com/docs/security-concepts
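As a rough illustration of the multi-tenancy pattern described above (a hypothetical sketch, not the Ignite API - `CacheGroup`, `add_cache` and `drop_cache` are invented names), logical caches inside a group share one physical structure, so adding or removing a tenant touches only metadata:

```python
# Hypothetical sketch (not Ignite internals): logical per-tenant "caches"
# share one physical structure keyed by (cache, key), mirroring how caches
# in a cache group share partition data structures.
class CacheGroup:
    def __init__(self, name):
        self.name = name
        self.data = {}           # one shared physical structure
        self.caches = set()      # logical caches are metadata only

    def add_cache(self, cache):
        # No physical allocation happens here - just bookkeeping.
        self.caches.add(cache)

    def put(self, cache, key, value):
        self.data[(cache, key)] = value

    def get(self, cache, key):
        return self.data.get((cache, key))

    def drop_cache(self, cache):
        # Note: removal must scan the shared structure - O(N) in records.
        self.caches.discard(cache)
        self.data = {k: v for k, v in self.data.items() if k[0] != cache}

users = CacheGroup("Users")
for org in ("Organisation_A", "Organisation_B"):
    users.add_cache(org)
users.put("Organisation_A", 1, "alice")
users.put("Organisation_B", 1, "bob")
```

Records stay logically separated per tenant even though they live in one structure; the O(N) scan in `drop_cache` is exactly the destroy cost Vladimir criticises elsewhere in this thread.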

2018-04-11 0:24 GMT+03:00 Denis Magda :

> Vladimir,
>
> - Data size per-cache
>
>
> Could you elaborate on how the data-size-per-cache/table task will be
> addressed with the proposed architecture? Are you going to store data of a
> specific cache in dedicated pages/segments? What about index size?
>
> --
> Denis
>
> On Tue, Apr 10, 2018 at 2:31 AM, Vladimir Ozerov 
> wrote:
>
> > Dima,
> >
> > 1) Easy to understand for users
> > AI 2.x: cluster -> cache group -> cache -> table
> > AI 3.x: cluster -> cache(==table)
> >
> > 2) Fine grained cache management
> > - MVCC on/off per-cache
> > - WAL mode on/off per-cache
> > - Data size per-cache
> >
> > 3) Performance:
> > - Efficient scans are not possible with cache groups
> > - Efficient destroy/DROP - O(N) now, O(1) afterwards
> >
> > "Huge refactoring" is not a precise estimate. Let's think about how to do
> > that instead of how not to :-)
> >


Re: Remove cache groups in AI 3.0

2018-04-10 Thread Denis Magda
Vladimir,

- Data size per-cache


Could you elaborate on how the data-size-per-cache/table task will be
addressed with the proposed architecture? Are you going to store data of a
specific cache in dedicated pages/segments? What about index size?
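One hedged reading of the proposal (an assumption about the design, not a committed answer): with a dedicated segment per cache and per index, size accounting reduces to summing that segment's extents, and index size is simply the index segment's total:

```python
# Hypothetical sketch: if every cache and every index owns its own
# segment, per-cache data size and index size fall out of simple extent
# counts - no shared pages to apportion between caches.
EXTENT = 1 << 20  # illustrative extent size in bytes, not an Ignite value

extents_per_segment = {
    "cache:Users":    3,  # extents allocated to the Users data segment
    "index:Users_pk": 1,  # extents allocated to its primary-key index
}

def segment_size(obj):
    """Size of one object's storage: extent count times extent size."""
    return extents_per_segment[obj] * EXTENT

print(segment_size("cache:Users"))   # 3145728
```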

--
Denis

On Tue, Apr 10, 2018 at 2:31 AM, Vladimir Ozerov 
wrote:

> Dima,
>
> 1) Easy to understand for users
> AI 2.x: cluster -> cache group -> cache -> table
> AI 3.x: cluster -> cache(==table)
>
> 2) Fine grained cache management
> - MVCC on/off per-cache
> - WAL mode on/off per-cache
> - Data size per-cache
>
> 3) Performance:
> - Efficient scans are not possible with cache groups
> - Efficient destroy/DROP - O(N) now, O(1) afterwards
>
> "Huge refactoring" is not a precise estimate. Let's think about how to do
> that instead of how not to :-)
>


Re: Remove cache groups in AI 3.0

2018-04-10 Thread Dmitry Pavlov
Hi Vladimir,

We can solve the "too many fsyncs" or "too many small files" problem by
placing several partitions of a cache group in one file.

We don't need to get rid of cache groups in this case.

It is not a trivial task, but it is doable. We would need to create a
simple FS for partition chunks inside one file.
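A minimal in-memory sketch of that "simplest FS" idea (invented names and a fixed illustrative chunk size; the real Ignite page store layout differs): a bookkeeping table hands out fixed-size chunk offsets inside one file and records which partition owns each chunk, so many partitions share one file descriptor and one fsync target:

```python
# Toy "simplest FS" for partition chunks inside one data file: fixed-size
# chunks are allocated from a single file, and a table records which
# chunk offsets belong to which partition.
CHUNK = 4096  # bytes; illustrative, not an Ignite constant

class ChunkFile:
    def __init__(self):
        self.next_off = 0
        self.free = []     # offsets of deallocated chunks, ready for reuse
        self.owner = {}    # chunk offset -> owning partition id

    def alloc(self, part):
        off = self.free.pop() if self.free else self._grow()
        self.owner[off] = part
        return off

    def _grow(self):
        off = self.next_off
        self.next_off += CHUNK
        return off

    def free_partition(self, part):
        """Release all chunks of one partition; the file itself stays."""
        for off, p in list(self.owner.items()):
            if p == part:
                del self.owner[off]
                self.free.append(off)

f = ChunkFile()
a = [f.alloc(part=0) for _ in range(3)]
b = [f.alloc(part=1) for _ in range(2)]
f.free_partition(0)
c = f.alloc(part=1)   # reuses a chunk freed by partition 0
```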

Sincerely,
Dmitriy Pavlov

вт, 10 апр. 2018 г. в 12:31, Vladimir Ozerov :

> Dima,
>
> 1) Easy to understand for users
> AI 2.x: cluster -> cache group -> cache -> table
> AI 3.x: cluster -> cache(==table)
>
> 2) Fine grained cache management
> - MVCC on/off per-cache
> - WAL mode on/off per-cache
> - Data size per-cache
>
> 3) Performance:
> - Efficient scans are not possible with cache groups
> - Efficient destroy/DROP - O(N) now, O(1) afterwards
>
> "Huge refactoring" is not a precise estimate. Let's think about how to do
> that instead of how not to :-)
>


Re: Remove cache groups in AI 3.0

2018-04-10 Thread Vladimir Ozerov
Dima,

1) Easy to understand for users
AI 2.x: cluster -> cache group -> cache -> table
AI 3.x: cluster -> cache(==table)

2) Fine-grained cache management
- MVCC on/off per-cache
- WAL mode on/off per-cache
- Data size per-cache

3) Performance:
- Efficient scans are not possible with cache groups
- Efficient destroy/DROP - O(N) now, O(1) afterwards

"Huge refactoring" is not a precise estimate. Let's think about how to do
that instead of how not to :-)
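The destroy/DROP point can be made concrete with a toy model (an assumption about the mechanism, not Ignite internals): a logical cache inside a group must be removed record by record, while a standalone physical cache can be unlinked wholesale:

```python
# Illustrative contrast: destroying a logical cache inside a group visits
# every record that belongs to it (O(N)); destroying a physical cache
# just drops its whole storage in one step (O(1)).

def destroy_logical(group_data, cache):
    # O(N): scan the shared structure, remove this cache's records.
    for k in [k for k in group_data if k[0] == cache]:
        del group_data[k]

def destroy_physical(caches, cache):
    # O(1): unlink the entire per-cache storage.
    caches.pop(cache, None)

group = {("A", i): i for i in range(1000)}
group.update({("B", i): i for i in range(1000)})
destroy_logical(group, "A")          # had to touch 1000 records

physical = {"A": dict.fromkeys(range(1000)),
            "B": dict.fromkeys(range(1000))}
destroy_physical(physical, "A")      # constant-time unlink
```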

On Tue, Apr 10, 2018 at 11:41 AM, Dmitriy Setrakyan 
wrote:

> Vladimir, sounds like a huge refactoring. Other than "cache groups are
> confusing", are we solving any other big issues with the new proposed
> approach?
>
> (every time we try to refactor rebalancing, I get goose bumps)
>
> D.
>


Re: Remove cache groups in AI 3.0

2018-04-10 Thread Dmitriy Setrakyan
Vladimir, sounds like a huge refactoring. Other than "cache groups are
confusing", are we solving any other big issues with the new proposed
approach?

(every time we try to refactor rebalancing, I get goose bumps)

D.

On Tue, Apr 10, 2018 at 1:32 AM, Vladimir Ozerov 
wrote:

> Igniters,
>
> Cache groups were implemented for a sole purpose - to hide internal
> inefficiencies. Namely (add more if I missed something):
> 1) Excessive heap usage for affinity/partition data
> 2) Too many data files, as we employ a file-per-partition approach.
>
> These problems were resolved, but now cache groups are a great source of
> confusion both for users and us - hard to understand, with no way to
> configure them in a deterministic way. Had we resolved the mentioned
> performance issues, we would never have had cache groups. I propose to
> think about what it would take for us to get rid of cache groups.
>
> Please provide your inputs to suggestions below.
>
> 1) "Merge" partition data from different caches
> Consider that we start a new cache with the same affinity configuration
> (cache mode, partition number, affinity function) as some already
> existing cache. Is it possible to re-use the partition distribution and
> history of the existing cache for the new cache? Think of it as a kind of
> automatic cache grouping which is transparent to the user. This would
> remove heap pressure. It could also resolve our long-standing issue with
> FairAffinityFunction, where two caches with the same affinity
> configuration are not co-located when started on different topology
> versions.
>
> 2) Employ segment-extent based approach instead of file-per-partition
> - Every object (cache, index) resides in a dedicated segment
> - A segment consists of extents (minimal allocation units)
> - Extents are allocated and deallocated as needed
> - *Ignite specific*: a particular extent can be used by only one partition
> - Segments may be located in any number of data files we find convenient
> With this approach the "too many fsyncs" problem goes away automatically.
> At the same time it would still be possible to implement efficient
> rebalance, as partition data will be split across a moderate number of
> extents, not chaotically.
>
> Once we have p.1 and p.2 ready, cache groups could be removed, couldn't
> they?
>
> Vladimir.
>
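For reference, the segment/extent scheme quoted above can be sketched as follows (all names invented and the extent size illustrative; this is a reading of the proposal, not an implementation): every object owns a segment, each extent serves exactly one partition, and extents may land in any data file:

```python
# Sketch of the proposed layout: one segment per object (cache or index),
# a segment is a list of extents, an extent is the minimal allocation
# unit and belongs to exactly one partition, and extents may be placed
# in any data file.
class Extent:
    def __init__(self, file_id, offset, partition):
        self.file_id, self.offset, self.partition = file_id, offset, partition

class Allocator:
    EXTENT = 1 << 20  # 1 MiB; illustrative, not an Ignite value

    def __init__(self, files=2):
        self.tops = [0] * files  # next free offset per data file

    def alloc(self, partition):
        # Place the extent in the least-filled data file.
        fid = min(range(len(self.tops)), key=lambda i: self.tops[i])
        off = self.tops[fid]
        self.tops[fid] += self.EXTENT
        return Extent(fid, off, partition)

class Segment:
    """One segment per object (cache or index)."""
    def __init__(self, obj):
        self.obj = obj
        self.extents = []

    def alloc_extent(self, allocator, partition):
        ext = allocator.alloc(partition)
        self.extents.append(ext)
        return ext

    def extents_of(self, partition):
        # Rebalance can stream a partition as a moderate list of extents.
        return [e for e in self.extents if e.partition == partition]

alloc = Allocator()
seg = Segment("cache:Users")
for part in (0, 1, 0):
    seg.alloc_extent(alloc, part)
```

Because a partition maps to a short list of whole extents rather than to scattered pages, per-partition rebalance stays efficient even though the file-per-partition layout is gone.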