Re: Re[2]: NUMA aware allocator, PR review request

2021-12-13 Thread Ivan Daschinsky
Thanks all for the review; merged into master.

Mon, Dec 6, 2021 at 23:26, Ivan Daschinsky:

> You are not wrong, it is built from source every night and on every TC run.
> I don't understand why the numa allocator cannot be treated the same. Moreover,
> it is built using maven, with a maven plugin, and just needs gcc and
> libnuma-dev. All of these are already on TC agents and the builds are ready. I
> haven't seen any difficulties in building.
>
> But I still don't understand how to organize Anton's proposal. It seems
> that if we follow this way, we cannot release the allocator in 2.13
>
> пн, 6 дек. 2021 г., 23:18 Ilya Kasnacheev :
>
>> Hello!
>>
>> Maybe I am wrong, but ODBC installer is built from source and may be
>> improved from release to release.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> пн, 6 дек. 2021 г. в 20:41, Ivan Daschinsky :
>>
>> > Only one reason -- nowadays almost all hardware platforms use NUMA.
>> >
>> > Another reason -- there is no release process for extensions.
>> >
>> >
>> > BTW, the apache ignite release is shipped with an odbc binary installer for
>> > windows. And nobody complains about it.
>> >
>> > But maybe we should listen to others?
>> >
>> > пн, 6 дек. 2021 г., 19:49 Anton Vinogradov :
>> >
>> > > Any reason to release the same cpp sources for each release?
>> > > Any reason to increase the requirements amount for each new release?
>> > > Any reason to increase release complexity and duration?
>> > > All answers are "definitely no"
>> > >
>> > > What we should do is to release cpp part once and use it as a
>> dependency.
>> > > Extensions are a good location.
>> > >
>> > > On Mon, Dec 6, 2021 at 3:11 PM Zhenya Stanilovsky
>> > >  wrote:
>> > >
>> > > >
>> > > >
>> > > > +1 with Ivan, let's store it in the core product just because it looks
>> > > > like inalienable functionality and the release cycle of extensions is
>> > > > a little bit different.
>> > > >
>> > > >
>> > > >
>> > > > >Anton, I disagree.
>> > > > >
>> > > > >1. This should be released with main distro.
>> > > > >2. This should not be abandoned.
>> > > > >3. There is no release process in ignite-extensions.
>> > > > >4. Everything is working now and working well.
>> > > > >
>> > > > >
>> > > > >So let's not do this :)
>> > > > >
>> > > > >пн, 6 дек. 2021 г. в 14:49, Anton Vinogradov < a...@apache.org >:
>> > > > >
>> > > > >> Let's move all GCC-related parts to ignite-extensions, release,
>> and
>> > > use
>> > > > >> them as a maven dependency.
>> > > > >>
>> > > > >>
>> > > > >> On Fri, Dec 3, 2021 at 1:08 PM Ivan Daschinsky <
>> > ivanda...@gmail.com
>> > > >
>> > > > >> wrote:
>> > > > >>
>> > > > >> > Ok, TC suite is ready [1].
>> > > > >> > If there are no objections, I will merge it soon.
>> > > > >> >
>> > > > >> > Possible concerns -- now it is required to install
>> > build-essentials
>> > > > and
>> > > > >> > libnuma-dev in order to build ignite on 64 bit linux.
>> > > > >> > I suppose that this is not a big deal, but maybe someone will
>> > > > contradict?
>> > > > >> >
>> > > > >> >
>> > > > >> > [1] --
>> > > > >> >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_NumaAllocator/?mode=builds
>> > > > >> >
>> > > > >> > чт, 2 дек. 2021 г. в 12:03, Ivan Daschinsky <
>> ivanda...@gmail.com
>> > > >:
>> > > > >> >
>> > > > >> > > >> Our runs show about 7-10 speedup,
>> > > > >> > > Sorry, typo 7-10% speedup
>> > > > >> > >
>> > > > >> > > чт, 2 дек. 2021 г. в 12:01, Ivan Daschinsky <
>> > ivanda...@gmail.com
>> > > > >:
>> > > > >> > >
>> > > > >> > >> Andrey, thanks!
>> > > > >> > >>
>> > > > >> > >> This allocator can be tested on every NUMA system.
>> > > > >> > >> Our runs show about 7-10 speedup, if we use the allocator with
>> > > > >> > >> interleaved strategy + -XX:+UseNUMA.
>> > > > >> > >> But unfortunately our yardstick benches don't use a lot of
>> > > > >> > >> offheap, usually not above one GB.
>> > > > >> > >> We are trying to do more benches with real data and will share
>> > > > >> > >> them, possibly in a meetup.
>> > > > >> > >>
>> > > > >> > >> AFAIK, GG lab servers are two-socket machines, aren't they?
>> > > > >> > >> So it is worth running benches with a lot of data on them,
>> > > > >> > >> using the allocator with interleaved strategy (you can skip
>> > > > >> > >> specifying numa nodes; by default it will use all available)
>> > > > >> > >> and the -XX:+UseNUMA jvm flag.
>> > > > >> > >>
>> > > > >> > >>
>> > > > >> > >>
>> > > > >> > >> чт, 2 дек. 2021 г. в 11:48, Andrey Mashenkov <
>> > > > >> >  andrey.mashen...@gmail.com
>> > > > >> > >> >:
>> > > > >> > >>
>> > > > >> > >>> Ivan,
>> > > > >> > >>>
>> > > > >> > >>> Great job. PR looks good.
>> > > > >> > >>>
>> > > > >> > >>> This allocator in interleaved mode and passing
>> `-XX:+UseNUMA`
>> > > > flag to
>> > > > >> > jvm
>> > > > >> > >>> > show promising results on yardstick benches.
>> Technically, G1
>> > 

Re: Re[2]: NUMA aware allocator, PR review request

2021-12-06 Thread Ivan Daschinsky
You are not wrong, it is built from source every night and on every TC run.
I don't understand why the numa allocator cannot be treated the same. Moreover,
it is built using maven, with a maven plugin, and just needs gcc and
libnuma-dev. All of these are already on TC agents and the builds are ready. I
haven't seen any difficulties in building.

But I still don't understand how to organize Anton's proposal. It seems
that if we follow this way, we cannot release the allocator in 2.13
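[Editor's note] As background to the build setup described above: a maven module can compile a small native library during the normal build, which is why only gcc and libnuma-dev need to be present on the agents. A hypothetical sketch using the standard exec-maven-plugin (the plugin choice, source path, and output name are illustrative assumptions, not the actual module pom):

```xml
<!-- Hypothetical sketch: compiling the native allocator library with gcc
     during the maven `compile` phase. Paths and the library name are
     invented for illustration only. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>build-numa-lib</id>
      <phase>compile</phase>
      <goals><goal>exec</goal></goals>
      <configuration>
        <executable>gcc</executable>
        <arguments>
          <argument>-shared</argument>
          <argument>-fPIC</argument>
          <argument>-o</argument>
          <argument>${project.build.directory}/libnuma_alloc.so</argument>
          <argument>src/main/c/numa_alloc.c</argument>
          <argument>-lnuma</argument>  <!-- links against libnuma-dev -->
        </arguments>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this kind of wiring, `mvn package` on a TC agent needs no extra manual build steps beyond the two system packages.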

Mon, Dec 6, 2021, 23:18 Ilya Kasnacheev:

> Hello!
>
> Maybe I am wrong, but ODBC installer is built from source and may be
> improved from release to release.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 6 дек. 2021 г. в 20:41, Ivan Daschinsky :
>
> > Only one reason -- nowadays amost all hardware platforms uses NUMA
> >
> > Another reason -- there is no any release process of extensions.
> >
> >
> > BTW, apache ignite release is shipped with odbc binary installer for
> > windows. And nobody complains about it.
> >
> > But may be listen to others?
> >
> > пн, 6 дек. 2021 г., 19:49 Anton Vinogradov :
> >
> > > Any reason to release the same cpp sources for each release?
> > > Any reason to increase the requirements amount for each new release?
> > > Any reason to increase release complexity and duration?
> > > All answers are "definitely no"
> > >
> > > What we should do is to release cpp part once and use it as a
> dependency.
> > > Extensions are a good location.
> > >
> > > On Mon, Dec 6, 2021 at 3:11 PM Zhenya Stanilovsky
> > >  wrote:
> > >
> > > >
> > > >
> > > > +1 with Ivan, let`s store it in core product just because it looks
> > > > like inalienable functionality and release cycle of extensions a
> little
> > > bit
> > > > different.
> > > >
> > > >
> > > >
> > > > >Anton, I disagree.
> > > > >
> > > > >1. This should be released with main distro.
> > > > >2. This should not be abandoned.
> > > > >3. There is not any release process in ignite-extensions.
> > > > >4. Everything is working now and working good.
> > > > >
> > > > >
> > > > >So lets do not do this :)
> > > > >
> > > > >пн, 6 дек. 2021 г. в 14:49, Anton Vinogradov < a...@apache.org >:
> > > > >
> > > > >> Let's move all GCC-related parts to ignite-extensions, release,
> and
> > > use
> > > > >> them as a maven dependency.
> > > > >>
> > > > >>
> > > > >> On Fri, Dec 3, 2021 at 1:08 PM Ivan Daschinsky <
> > ivanda...@gmail.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >> > Ok, TC suite is ready [1].
> > > > >> > If there is no objections, I will merge it soon.
> > > > >> >
> > > > >> > Possible concerns -- now it is required to install
> > build-essentials
> > > > and
> > > > >> > libnuma-dev in order to build ignite on 64 bit linux.
> > > > >> > I suppose that this is not a big deal, but maybe someone will
> > > > contradict?
> > > > >> >
> > > > >> >
> > > > >> > [1] --
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_NumaAllocator/?mode=builds
> > > > >> >
> > > > >> > чт, 2 дек. 2021 г. в 12:03, Ivan Daschinsky <
> ivanda...@gmail.com
> > > >:
> > > > >> >
> > > > >> > > >> Our runs show about 7-10 speedup,
> > > > >> > > Sorry, typo 7-10% speedup
> > > > >> > >
> > > > >> > > чт, 2 дек. 2021 г. в 12:01, Ivan Daschinsky <
> > ivanda...@gmail.com
> > > > >:
> > > > >> > >
> > > > >> > >> Andrey, thanks!
> > > > >> > >>
> > > > >> > >> This allocator can be tested on every NUMA system.
> > > > >> > >> Our runs show about 7-10 speedup, if we use allocattor with
> > > > >> interleaved
> > > > >> > >> strategy + -XX:+UseNUMA.
> > > > >> > >> But unfortunately our yardstick benches doesn't use offheap a
> > > lot,
> > > > >> > >> usually above one Gb.
> > > > >> > >> We trying to do more benches with real data and share them,
> > > > possibly
> > > > >> in
> > > > >> > >> meetup.
> > > > >> > >>
> > > > >> > >> AFAIK, GG lab servers are two-sockets machines, aren't they?
> So
> > > it
> > > > is
> > > > >> > >> worth to run benches with a lot data on them, using
> > > > >> > >> allocator with interleaved strategy (you can skip specifying
> > numa
> > > > >> nodes,
> > > > >> > >> by default it will use all available) and use -XX:+UseNUMA
> jvm
> > > > >> > >> flag.
> > > > >> > >>
> > > > >> > >>
> > > > >> > >>
> > > > >> > >> чт, 2 дек. 2021 г. в 11:48, Andrey Mashenkov <
> > > > >> >  andrey.mashen...@gmail.com
> > > > >> > >> >:
> > > > >> > >>
> > > > >> > >>> Ivan,
> > > > >> > >>>
> > > > >> > >>> Great job. PR looks good.
> > > > >> > >>>
> > > > >> > >>> This allocator in interleaved mode and passing
> `-XX:+UseNUMA`
> > > > flag to
> > > > >> > jvm
> > > > >> > >>> > show promising results on yardstick benches. Technically,
> G1
> > > is
> > > > >> not a
> > > > >> > >>> numa
> > > > >> > >>> > aware collector for java versions less than 14, but
> > allocation
> > > > of
> > > > >> > heap
> > > > >> > >>> in
> > > > >> > >>> > interleaved mode shows good results even on java 11.
> > > 

Re: Re[2]: NUMA aware allocator, PR review request

2021-12-06 Thread Ilya Kasnacheev
Hello!

Maybe I am wrong, but ODBC installer is built from source and may be
improved from release to release.

Regards,
-- 
Ilya Kasnacheev


Mon, Dec 6, 2021 at 20:41, Ivan Daschinsky:

> Only one reason -- nowadays amost all hardware platforms uses NUMA
>
> Another reason -- there is no any release process of extensions.
>
>
> BTW, apache ignite release is shipped with odbc binary installer for
> windows. And nobody complains about it.
>
> But may be listen to others?
>
> пн, 6 дек. 2021 г., 19:49 Anton Vinogradov :
>
> > Any reason to release the same cpp sources for each release?
> > Any reason to increase the requirements amount for each new release?
> > Any reason to increase release complexity and duration?
> > All answers are "definitely no"
> >
> > What we should do is to release cpp part once and use it as a dependency.
> > Extensions are a good location.
> >
> > On Mon, Dec 6, 2021 at 3:11 PM Zhenya Stanilovsky
> >  wrote:
> >
> > >
> > >
> > > +1 with Ivan, let`s store it in core product just because it looks
> > > like inalienable functionality and release cycle of extensions a little
> > bit
> > > different.
> > >
> > >
> > >
> > > >Anton, I disagree.
> > > >
> > > >1. This should be released with main distro.
> > > >2. This should not be abandoned.
> > > >3. There is not any release process in ignite-extensions.
> > > >4. Everything is working now and working good.
> > > >
> > > >
> > > >So lets do not do this :)
> > > >
> > > >пн, 6 дек. 2021 г. в 14:49, Anton Vinogradov < a...@apache.org >:
> > > >
> > > >> Let's move all GCC-related parts to ignite-extensions, release, and
> > use
> > > >> them as a maven dependency.
> > > >>
> > > >>
> > > >> On Fri, Dec 3, 2021 at 1:08 PM Ivan Daschinsky <
> ivanda...@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >> > Ok, TC suite is ready [1].
> > > >> > If there is no objections, I will merge it soon.
> > > >> >
> > > >> > Possible concerns -- now it is required to install
> build-essentials
> > > and
> > > >> > libnuma-dev in order to build ignite on 64 bit linux.
> > > >> > I suppose that this is not a big deal, but maybe someone will
> > > contradict?
> > > >> >
> > > >> >
> > > >> > [1] --
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_NumaAllocator/?mode=builds
> > > >> >
> > > >> > чт, 2 дек. 2021 г. в 12:03, Ivan Daschinsky < ivanda...@gmail.com
> > >:
> > > >> >
> > > >> > > >> Our runs show about 7-10 speedup,
> > > >> > > Sorry, typo 7-10% speedup
> > > >> > >
> > > >> > > чт, 2 дек. 2021 г. в 12:01, Ivan Daschinsky <
> ivanda...@gmail.com
> > > >:
> > > >> > >
> > > >> > >> Andrey, thanks!
> > > >> > >>
> > > >> > >> This allocator can be tested on every NUMA system.
> > > >> > >> Our runs show about 7-10 speedup, if we use allocattor with
> > > >> interleaved
> > > >> > >> strategy + -XX:+UseNUMA.
> > > >> > >> But unfortunately our yardstick benches doesn't use offheap a
> > lot,
> > > >> > >> usually above one Gb.
> > > >> > >> We trying to do more benches with real data and share them,
> > > possibly
> > > >> in
> > > >> > >> meetup.
> > > >> > >>
> > > >> > >> AFAIK, GG lab servers are two-sockets machines, aren't they? So
> > it
> > > is
> > > >> > >> worth to run benches with a lot data on them, using
> > > >> > >> allocator with interleaved strategy (you can skip specifying
> numa
> > > >> nodes,
> > > >> > >> by default it will use all available) and use -XX:+UseNUMA jvm
> > > >> > >> flag.
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >> чт, 2 дек. 2021 г. в 11:48, Andrey Mashenkov <
> > > >> >  andrey.mashen...@gmail.com
> > > >> > >> >:
> > > >> > >>
> > > >> > >>> Ivan,
> > > >> > >>>
> > > >> > >>> Great job. PR looks good.
> > > >> > >>>
> > > >> > >>> This allocator in interleaved mode and passing `-XX:+UseNUMA`
> > > flag to
> > > >> > jvm
> > > >> > >>> > show promising results on yardstick benches. Technically, G1
> > is
> > > >> not a
> > > >> > >>> numa
> > > >> > >>> > aware collector for java versions less than 14, but
> allocation
> > > of
> > > >> > heap
> > > >> > >>> in
> > > >> > >>> > interleaved mode shows good results even on java 11.
> > > >> > >>>
> > > >> > >>> Can you share benchmark results?
> > > >> > >>> I'm not sure I'll have an Optane on my notebook in a
> reasonable
> > > time
> > > >> ;)
> > > >> > >>>
> > > >> > >>>
> > > >> > >>> On Thu, Dec 2, 2021 at 10:41 AM Ivan Daschinsky <
> > > ivanda...@gmail.com
> > > >> >
> > > >> > >>> wrote:
> > > >> > >>>
> > > >> > >>> > Semyon D. and Maks T. -- thanks a lot for review.
> > > >> > >>> >
> > > >> > >>> > BTW, Igniters, I will appreciate all opinions and feedback.
> > > >> > >>> >
> > > >> > >>> > пн, 29 нояб. 2021 г. в 10:13, Ivan Daschinsky <
> > > >>  ivanda...@apache.org
> > > >> > >:
> > > >> > >>> >
> > > >> > >>> > > Hi, igniters!
> > > >> > >>> > >
> > > >> > >>> > > There is not a big secret that nowadays NUMA is quite
> common
> > > in
> > > >> > >>> > > multiprocessor 

Re: Re[2]: NUMA aware allocator, PR review request

2021-12-06 Thread Ivan Daschinsky
Only one reason -- nowadays almost all hardware platforms use NUMA.

Another reason -- there is no release process for extensions.


BTW, the apache ignite release is shipped with an odbc binary installer for
windows. And nobody complains about it.

But maybe we should listen to others?

Mon, Dec 6, 2021, 19:49 Anton Vinogradov:

> Any reason to release the same cpp sources for each release?
> Any reason to increase the requirements amount for each new release?
> Any reason to increase release complexity and duration?
> All answers are "definitely no"
>
> What we should do is to release cpp part once and use it as a dependency.
> Extensions are a good location.
>
> On Mon, Dec 6, 2021 at 3:11 PM Zhenya Stanilovsky
>  wrote:
>
> >
> >
> > +1 with Ivan, let`s store it in core product just because it looks
> > like inalienable functionality and release cycle of extensions a little
> bit
> > different.
> >
> >
> >
> > >Anton, I disagree.
> > >
> > >1. This should be released with main distro.
> > >2. This should not be abandoned.
> > >3. There is not any release process in ignite-extensions.
> > >4. Everything is working now and working good.
> > >
> > >
> > >So lets do not do this :)
> > >
> > >пн, 6 дек. 2021 г. в 14:49, Anton Vinogradov < a...@apache.org >:
> > >
> > >> Let's move all GCC-related parts to ignite-extensions, release, and
> use
> > >> them as a maven dependency.
> > >>
> > >>
> > >> On Fri, Dec 3, 2021 at 1:08 PM Ivan Daschinsky < ivanda...@gmail.com
> >
> > >> wrote:
> > >>
> > >> > Ok, TC suite is ready [1].
> > >> > If there is no objections, I will merge it soon.
> > >> >
> > >> > Possible concerns -- now it is required to install build-essentials
> > and
> > >> > libnuma-dev in order to build ignite on 64 bit linux.
> > >> > I suppose that this is not a big deal, but maybe someone will
> > contradict?
> > >> >
> > >> >
> > >> > [1] --
> > >> >
> > >> >
> > >>
> >
> https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_NumaAllocator/?mode=builds
> > >> >
> > >> > чт, 2 дек. 2021 г. в 12:03, Ivan Daschinsky < ivanda...@gmail.com
> >:
> > >> >
> > >> > > >> Our runs show about 7-10 speedup,
> > >> > > Sorry, typo 7-10% speedup
> > >> > >
> > >> > > чт, 2 дек. 2021 г. в 12:01, Ivan Daschinsky < ivanda...@gmail.com
> > >:
> > >> > >
> > >> > >> Andrey, thanks!
> > >> > >>
> > >> > >> This allocator can be tested on every NUMA system.
> > >> > >> Our runs show about 7-10 speedup, if we use allocattor with
> > >> interleaved
> > >> > >> strategy + -XX:+UseNUMA.
> > >> > >> But unfortunately our yardstick benches doesn't use offheap a
> lot,
> > >> > >> usually above one Gb.
> > >> > >> We trying to do more benches with real data and share them,
> > possibly
> > >> in
> > >> > >> meetup.
> > >> > >>
> > >> > >> AFAIK, GG lab servers are two-sockets machines, aren't they? So
> it
> > is
> > >> > >> worth to run benches with a lot data on them, using
> > >> > >> allocator with interleaved strategy (you can skip specifying numa
> > >> nodes,
> > >> > >> by default it will use all available) and use -XX:+UseNUMA jvm
> > >> > >> flag.
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> чт, 2 дек. 2021 г. в 11:48, Andrey Mashenkov <
> > >> >  andrey.mashen...@gmail.com
> > >> > >> >:
> > >> > >>
> > >> > >>> Ivan,
> > >> > >>>
> > >> > >>> Great job. PR looks good.
> > >> > >>>
> > >> > >>> This allocator in interleaved mode and passing `-XX:+UseNUMA`
> > flag to
> > >> > jvm
> > >> > >>> > show promising results on yardstick benches. Technically, G1
> is
> > >> not a
> > >> > >>> numa
> > >> > >>> > aware collector for java versions less than 14, but allocation
> > of
> > >> > heap
> > >> > >>> in
> > >> > >>> > interleaved mode shows good results even on java 11.
> > >> > >>>
> > >> > >>> Can you share benchmark results?
> > >> > >>> I'm not sure I'll have an Optane on my notebook in a reasonable
> > time
> > >> ;)
> > >> > >>>
> > >> > >>>
> > >> > >>> On Thu, Dec 2, 2021 at 10:41 AM Ivan Daschinsky <
> > ivanda...@gmail.com
> > >> >
> > >> > >>> wrote:
> > >> > >>>
> > >> > >>> > Semyon D. and Maks T. -- thanks a lot for review.
> > >> > >>> >
> > >> > >>> > BTW, Igniters, I will appreciate all opinions and feedback.
> > >> > >>> >
> > >> > >>> > пн, 29 нояб. 2021 г. в 10:13, Ivan Daschinsky <
> > >>  ivanda...@apache.org
> > >> > >:
> > >> > >>> >
> > >> > >>> > > Hi, igniters!
> > >> > >>> > >
> > >> > >>> > > There is not a big secret that nowadays NUMA is quite common
> > in
> > >> > >>> > > multiprocessor systems.
> > >> > >>> > > And this memory architecture should be treated in specific
> > ways.
> > >> > >>> > >
> > >> > >>> > > Support for NUMA is present in many commercial and
> open-source
> > >> > >>> products.
> > >> > >>> > >
> > >> > >>> > > I've implemented a NUMA aware allocator for Apache Ignite
> [1]
> > >> > >>> > > It is a JNI wrapper around `libnuma` and supports different
> > >> > >>> allocation
> > >> > >>> > > options.
> > >> > >>> > > I.e. interleaved, local, 

Re: Re[2]: NUMA aware allocator, PR review request

2021-12-06 Thread Anton Vinogradov
Any reason to release the same cpp sources for each release?
Any reason to increase the number of requirements with each new release?
Any reason to increase release complexity and duration?
All answers are "definitely no".

What we should do is to release the cpp part once and use it as a dependency.
Extensions are a good location.
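[Editor's note] Sketched as a consumer would see Anton's proposal, the once-released native part would be pulled in like any other artifact. The coordinates and version below are invented for illustration; no actual artifact name is given in the thread:

```xml
<!-- Hypothetical dependency on a separately released allocator artifact. -->
<dependency>
  <groupId>org.apache.ignite.extensions</groupId>
  <artifactId>ignite-numa-allocator</artifactId>
  <version>1.0.0</version>
</dependency>
```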

On Mon, Dec 6, 2021 at 3:11 PM Zhenya Stanilovsky
 wrote:

>
>
> +1 with Ivan, let`s store it in core product just because it looks
> like inalienable functionality and release cycle of extensions a little bit
> different.
>
>
>
> >Anton, I disagree.
> >
> >1. This should be released with main distro.
> >2. This should not be abandoned.
> >3. There is not any release process in ignite-extensions.
> >4. Everything is working now and working good.
> >
> >
> >So lets do not do this :)
> >
> >пн, 6 дек. 2021 г. в 14:49, Anton Vinogradov < a...@apache.org >:
> >
> >> Let's move all GCC-related parts to ignite-extensions, release, and use
> >> them as a maven dependency.
> >>
> >>
> >> On Fri, Dec 3, 2021 at 1:08 PM Ivan Daschinsky < ivanda...@gmail.com >
> >> wrote:
> >>
> >> > Ok, TC suite is ready [1].
> >> > If there is no objections, I will merge it soon.
> >> >
> >> > Possible concerns -- now it is required to install build-essentials
> and
> >> > libnuma-dev in order to build ignite on 64 bit linux.
> >> > I suppose that this is not a big deal, but maybe someone will
> contradict?
> >> >
> >> >
> >> > [1] --
> >> >
> >> >
> >>
> https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_NumaAllocator/?mode=builds
> >> >
> >> > чт, 2 дек. 2021 г. в 12:03, Ivan Daschinsky < ivanda...@gmail.com >:
> >> >
> >> > > >> Our runs show about 7-10 speedup,
> >> > > Sorry, typo 7-10% speedup
> >> > >
> >> > > чт, 2 дек. 2021 г. в 12:01, Ivan Daschinsky < ivanda...@gmail.com
> >:
> >> > >
> >> > >> Andrey, thanks!
> >> > >>
> >> > >> This allocator can be tested on every NUMA system.
> >> > >> Our runs show about 7-10 speedup, if we use allocattor with
> >> interleaved
> >> > >> strategy + -XX:+UseNUMA.
> >> > >> But unfortunately our yardstick benches doesn't use offheap a lot,
> >> > >> usually above one Gb.
> >> > >> We trying to do more benches with real data and share them,
> possibly
> >> in
> >> > >> meetup.
> >> > >>
> >> > >> AFAIK, GG lab servers are two-sockets machines, aren't they? So it
> is
> >> > >> worth to run benches with a lot data on them, using
> >> > >> allocator with interleaved strategy (you can skip specifying numa
> >> nodes,
> >> > >> by default it will use all available) and use -XX:+UseNUMA jvm
> >> > >> flag.
> >> > >>
> >> > >>
> >> > >>
> >> > >> чт, 2 дек. 2021 г. в 11:48, Andrey Mashenkov <
> >> >  andrey.mashen...@gmail.com
> >> > >> >:
> >> > >>
> >> > >>> Ivan,
> >> > >>>
> >> > >>> Great job. PR looks good.
> >> > >>>
> >> > >>> This allocator in interleaved mode and passing `-XX:+UseNUMA`
> flag to
> >> > jvm
> >> > >>> > show promising results on yardstick benches. Technically, G1 is
> >> not a
> >> > >>> numa
> >> > >>> > aware collector for java versions less than 14, but allocation
> of
> >> > heap
> >> > >>> in
> >> > >>> > interleaved mode shows good results even on java 11.
> >> > >>>
> >> > >>> Can you share benchmark results?
> >> > >>> I'm not sure I'll have an Optane on my notebook in a reasonable
> time
> >> ;)
> >> > >>>
> >> > >>>
> >> > >>> On Thu, Dec 2, 2021 at 10:41 AM Ivan Daschinsky <
> ivanda...@gmail.com
> >> >
> >> > >>> wrote:
> >> > >>>
> >> > >>> > Semyon D. and Maks T. -- thanks a lot for review.
> >> > >>> >
> >> > >>> > BTW, Igniters, I will appreciate all opinions and feedback.
> >> > >>> >
> >> > >>> > пн, 29 нояб. 2021 г. в 10:13, Ivan Daschinsky <
> >>  ivanda...@apache.org
> >> > >:
> >> > >>> >
> >> > >>> > > Hi, igniters!
> >> > >>> > >
> >> > >>> > > There is not a big secret that nowadays NUMA is quite common
> in
> >> > >>> > > multiprocessor systems.
> >> > >>> > > And this memory architecture should be treated in specific
> ways.
> >> > >>> > >
> >> > >>> > > Support for NUMA is present in many commercial and open-source
> >> > >>> products.
> >> > >>> > >
> >> > >>> > > I've implemented a NUMA aware allocator for Apache Ignite [1]
> >> > >>> > > It is a JNI wrapper around `libnuma` and supports different
> >> > >>> allocation
> >> > >>> > > options.
> >> > >>> > > I.e. interleaved, local, interleved_mask and so on. For more
> >> > >>> information,
> >> > >>> > > see
> >> > >>> > > [2], [3].
> >> > >>> > > This allocator in interleaved mode and passing `-XX:+UseNUMA`
> >> flag
> >> > >>> to jvm
> >> > >>> > > show promising results on yardstick benches. Technically, G1
> is
> >> > not a
> >> > >>> > numa
> >> > >>> > > aware collector for java versions less than 14, but
> allocation of
> >> > >>> heap in
> >> > >>> > > interleaved mode shows good results even on java 11.
> >> > >>> > >
> >> > >>> > > Currently, all needed libraries and tools for building this
> >> module
> >> > >>> are
> >> > >>> > > 
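[Editor's note] As background to the interleaved strategy discussed in the quoted message: interleaving places successive memory pages round-robin across the chosen NUMA nodes, so allocation pressure is balanced between sockets instead of filling one node's local memory first. A minimal, self-contained Java sketch of that placement rule (node ids, page size, and region size are illustrative; the real allocator applies the policy natively through libnuma):

```java
import java.util.HashMap;
import java.util.Map;

// Round-robin page placement: the core idea behind libnuma's interleaved policy.
public class InterleavedPlacement {
    static Map<Integer, Integer> placePages(long regionBytes, int pageSize, int[] nodes) {
        Map<Integer, Integer> pagesPerNode = new HashMap<>();
        long pages = (regionBytes + pageSize - 1) / pageSize; // round up to whole pages
        for (long p = 0; p < pages; p++) {
            int node = nodes[(int) (p % nodes.length)];       // next node in the cycle
            pagesPerNode.merge(node, 1, Integer::sum);
        }
        return pagesPerNode;
    }

    public static void main(String[] args) {
        // A 1 GiB region, 4 KiB pages, two NUMA nodes (a typical two-socket box).
        Map<Integer, Integer> placement = placePages(1L << 30, 4096, new int[] {0, 1});
        System.out.println(placement); // each node gets 131072 of the 262144 pages
    }
}
```

The even split is what makes `-XX:+UseNUMA` on the JVM side pair well with an interleaved offheap region: neither socket becomes the single hot memory target.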

Re[2]: NUMA aware allocator, PR review request

2021-12-06 Thread Zhenya Stanilovsky


+1 with Ivan, let's store it in the core product just because it looks like
inalienable functionality and the release cycle of extensions is a little bit
different.


 
>Anton, I disagree.
>
>1. This should be released with main distro.
>2. This should not be abandoned.
>3. There is not any release process in ignite-extensions.
>4. Everything is working now and working good.
>
>
>So lets do not do this :)
>
>пн, 6 дек. 2021 г. в 14:49, Anton Vinogradov < a...@apache.org >:
> 
>> Let's move all GCC-related parts to ignite-extensions, release, and use
>> them as a maven dependency.
>>
>>
>> On Fri, Dec 3, 2021 at 1:08 PM Ivan Daschinsky < ivanda...@gmail.com >
>> wrote:
>>
>> > Ok, TC suite is ready [1].
>> > If there is no objections, I will merge it soon.
>> >
>> > Possible concerns -- now it is required to install build-essentials and
>> > libnuma-dev in order to build ignite on 64 bit linux.
>> > I suppose that this is not a big deal, but maybe someone will contradict?
>> >
>> >
>> > [1] --
>> >
>> >
>>  
>> https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_NumaAllocator/?mode=builds
>> >
>> > чт, 2 дек. 2021 г. в 12:03, Ivan Daschinsky < ivanda...@gmail.com >:
>> >
>> > > >> Our runs show about 7-10 speedup,
>> > > Sorry, typo 7-10% speedup
>> > >
>> > > чт, 2 дек. 2021 г. в 12:01, Ivan Daschinsky < ivanda...@gmail.com >:
>> > >
>> > >> Andrey, thanks!
>> > >>
>> > >> This allocator can be tested on every NUMA system.
>> > >> Our runs show about 7-10 speedup, if we use allocattor with
>> interleaved
>> > >> strategy + -XX:+UseNUMA.
>> > >> But unfortunately our yardstick benches doesn't use offheap a lot,
>> > >> usually above one Gb.
>> > >> We trying to do more benches with real data and share them, possibly
>> in
>> > >> meetup.
>> > >>
>> > >> AFAIK, GG lab servers are two-sockets machines, aren't they? So it is
>> > >> worth to run benches with a lot data on them, using
>> > >> allocator with interleaved strategy (you can skip specifying numa
>> nodes,
>> > >> by default it will use all available) and use -XX:+UseNUMA jvm
>> > >> flag.
>> > >>
>> > >>
>> > >>
>> > >> чт, 2 дек. 2021 г. в 11:48, Andrey Mashenkov <
>> >  andrey.mashen...@gmail.com
>> > >> >:
>> > >>
>> > >>> Ivan,
>> > >>>
>> > >>> Great job. PR looks good.
>> > >>>
>> > >>> This allocator in interleaved mode and passing `-XX:+UseNUMA` flag to
>> > jvm
>> > >>> > show promising results on yardstick benches. Technically, G1 is
>> not a
>> > >>> numa
>> > >>> > aware collector for java versions less than 14, but allocation of
>> > heap
>> > >>> in
>> > >>> > interleaved mode shows good results even on java 11.
>> > >>>
>> > >>> Can you share benchmark results?
>> > >>> I'm not sure I'll have an Optane on my notebook in a reasonable time
>> ;)
>> > >>>
>> > >>>
>> > >>> On Thu, Dec 2, 2021 at 10:41 AM Ivan Daschinsky < ivanda...@gmail.com
>> >
>> > >>> wrote:
>> > >>>
>> > >>> > Semyon D. and Maks T. -- thanks a lot for review.
>> > >>> >
>> > >>> > BTW, Igniters, I will appreciate all opinions and feedback.
>> > >>> >
>> > >>> > пн, 29 нояб. 2021 г. в 10:13, Ivan Daschinsky <
>>  ivanda...@apache.org
>> > >:
>> > >>> >
>> > >>> > > Hi, igniters!
>> > >>> > >
>> > >>> > > There is not a big secret that nowadays NUMA is quite common in
>> > >>> > > multiprocessor systems.
>> > >>> > > And this memory architecture should be treated in specific ways.
>> > >>> > >
>> > >>> > > Support for NUMA is present in many commercial and open-source
>> > >>> products.
>> > >>> > >
>> > >>> > > I've implemented a NUMA aware allocator for Apache Ignite [1]
>> > >>> > > It is a JNI wrapper around `libnuma` and supports different
>> > >>> allocation
>> > >>> > > options.
>> > >>> > > I.e. interleaved, local, interleved_mask and so on. For more
>> > >>> information,
>> > >>> > > see
>> > >>> > > [2], [3].
>> > >>> > > This allocator in interleaved mode and passing `-XX:+UseNUMA`
>> flag
>> > >>> to jvm
>> > >>> > > show promising results on yardstick benches. Technically, G1 is
>> > not a
>> > >>> > numa
>> > >>> > > aware collector for java versions less than 14, but allocation of
>> > >>> heap in
>> > >>> > > interleaved mode shows good results even on java 11.
>> > >>> > >
>> > >>> > > Currently, all needed libraries and tools for building this
>> module
>> > >>> are
>> > >>> > > available on TC agents
>> > >>> > > setup of specific test suite is in progress [4]
>> > >>> > >
>> > >>> > > So I am asking for a review of my patch.
>> > >>> > >
>> > >>> > > [1] --  https://issues.apache.org/jira/browse/IGNITE-15922
>> > >>> > > [2] --  https://man7.org/linux/man-pages/man3/numa.3.html
>> > >>> > > [3] --  https://man7.org/linux/man-pages/man2/mbind.2.html
>> > >>> > > [4] --  https://issues.apache.org/jira/browse/IGNITE-15994
>> > >>> > >
>> > >>> >
>> > >>> >
>> > >>> > --
>> > >>> > Sincerely yours, Ivan Daschinskiy
>> > >>> >
>> > >>>
>> > >>>
>> > >>> --
>> > >>> Best regards,
>> > >>> Andrey V. Mashenkov
>> > >>>
>> > >>
>> > >>
>> > >> --

Re: NUMA aware allocator, PR review request

2021-12-06 Thread Ivan Daschinsky
Anton, I disagree.

1. This should be released with the main distro.
2. This should not be abandoned.
3. There is no release process in ignite-extensions.
4. Everything is working now and working well.


So let's not do this :)

Mon, Dec 6, 2021 at 14:49, Anton Vinogradov:

> Let's move all GCC-related parts to ignite-extensions, release, and use
> them as a maven dependency.
>
>
> On Fri, Dec 3, 2021 at 1:08 PM Ivan Daschinsky 
> wrote:
>
> > Ok, TC suite is ready [1].
> > If there is no objections, I will merge it soon.
> >
> > Possible concerns -- now it is required to install build-essentials and
> > libnuma-dev in order to build ignite on 64 bit linux.
> > I suppose that this is not a big deal, but maybe someone will contradict?
> >
> >
> > [1] --
> >
> >
> https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_NumaAllocator/?mode=builds
> >
> > чт, 2 дек. 2021 г. в 12:03, Ivan Daschinsky :
> >
> > > >> Our runs show about 7-10 speedup,
> > > Sorry, typo 7-10%  speedup
> > >
> > > чт, 2 дек. 2021 г. в 12:01, Ivan Daschinsky :
> > >
> > >> Andrey, thanks!
> > >>
> > >> This allocator can be tested on every NUMA system.
> > >> Our runs show about 7-10 speedup, if we use allocattor with
> interleaved
> > >> strategy + -XX:+UseNUMA.
> > >> But unfortunately our yardstick benches doesn't use offheap a lot,
> > >> usually above one Gb.
> > >> We trying to do more benches with real data and share them, possibly
> in
> > >> meetup.
> > >>
> > >> AFAIK, GG lab servers are two-sockets machines, aren't they? So it is
> > >> worth to run benches with a lot data on them, using
> > >> allocator with interleaved strategy (you can skip specifying numa
> nodes,
> > >> by default it will use all available) and use -XX:+UseNUMA jvm
> > >> flag.
> > >>
> > >>
> > >>
> > >> чт, 2 дек. 2021 г. в 11:48, Andrey Mashenkov <
> > andrey.mashen...@gmail.com
> > >> >:
> > >>
> > >>> Ivan,
> > >>>
> > >>> Great job. PR looks good.
> > >>>
> > >>> This allocator in interleaved mode and passing `-XX:+UseNUMA` flag to
> > jvm
> > >>> > show promising results on yardstick benches. Technically, G1 is
> not a
> > >>> numa
> > >>> > aware collector for java versions less than 14, but allocation of
> > heap
> > >>> in
> > >>> > interleaved mode shows good results even on java 11.
> > >>>
> > >>> Can you share benchmark results?
> > >>> I'm not sure I'll have an Optane on my notebook in a reasonable time
> ;)
> > >>>
> > >>>
> > >>> On Thu, Dec 2, 2021 at 10:41 AM Ivan Daschinsky  >
> > >>> wrote:
> > >>>
> > >>> > Semyon D. and Maks T. -- thanks a lot for review.
> > >>> >
> > >>> > BTW, Igniters, I will appreciate all opinions and feedback.
> > >>> >
> > >>> > пн, 29 нояб. 2021 г. в 10:13, Ivan Daschinsky <
> ivanda...@apache.org
> > >:
> > >>> >
> > >>> > > Hi, igniters!
> > >>> > >
> > >>> > > There is not a big secret that nowadays NUMA is quite common in
> > >>> > > multiprocessor systems.
> > >>> > > And this memory architecture should be treated in specific ways.
> > >>> > >
> > >>> > > Support for NUMA is present in many commercial and open-source
> > >>> products.
> > >>> > >
> > >>> > > I've implemented a NUMA aware allocator for Apache Ignite [1]
> > >>> > > It is a JNI wrapper around `libnuma` and supports different
> > >>> allocation
> > >>> > > options.
> > >>> > > I.e. interleaved, local, interleved_mask and so on. For more
> > >>> information,
> > >>> > > see
> > >>> > > [2], [3].
> > >>> > > This allocator in interleaved mode and passing `-XX:+UseNUMA`
> flag
> > >>> to jvm
> > >>> > > show promising results on yardstick benches. Technically, G1 is
> > not a
> > >>> > numa
> > >>> > > aware collector for java versions less than 14, but allocation of
> > >>> heap in
> > >>> > > interleaved mode shows good results even on java 11.
> > >>> > >
> > >>> > > Currently, all needed libraries and tools for building this
> module
> > >>> are
> > >>> > > available on TC agents
> > >>> > > setup of specific test suite is in progress [4]
> > >>> > >
> > >>> > > So I am asking for a review of my patch.
> > >>> > >
> > >>> > > [1] --  https://issues.apache.org/jira/browse/IGNITE-15922
> > >>> > > [2] -- https://man7.org/linux/man-pages/man3/numa.3.html
> > >>> > > [3] -- https://man7.org/linux/man-pages/man2/mbind.2.html
> > >>> > > [4] -- https://issues.apache.org/jira/browse/IGNITE-15994
> > >>> > >
> > >>> >
> > >>> >
> > >>> > --
> > >>> > Sincerely yours, Ivan Daschinskiy
> > >>> >
> > >>>
> > >>>
> > >>> --
> > >>> Best regards,
> > >>> Andrey V. Mashenkov
> > >>>
> > >>
> > >>
> > >> --
> > >> Sincerely yours, Ivan Daschinskiy
> > >>
> > >
> > >
> > > --
> > > Sincerely yours, Ivan Daschinskiy
> > >
> >
> >
> > --
> > Sincerely yours, Ivan Daschinskiy
> >
>


-- 
Sincerely yours, Ivan Daschinskiy


Re: NUMA aware allocator, PR review request

2021-12-06 Thread Anton Vinogradov
Let's move all GCC-related parts to ignite-extensions, release, and use
them as a maven dependency.


On Fri, Dec 3, 2021 at 1:08 PM Ivan Daschinsky  wrote:

> Ok, TC suite is ready [1].
> If there is no objections, I will merge it soon.
>
> Possible concerns -- now it is required to install build-essentials and
> libnuma-dev in order to build ignite on 64 bit linux.
> I suppose that this is not a big deal, but maybe someone will contradict?
>
>
> [1] --
>
> https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_NumaAllocator/?mode=builds
>
> чт, 2 дек. 2021 г. в 12:03, Ivan Daschinsky :
>
> > >> Our runs show about 7-10 speedup,
> > Sorry, typo 7-10%  speedup
> >
> > чт, 2 дек. 2021 г. в 12:01, Ivan Daschinsky :
> >
> >> Andrey, thanks!
> >>
> >> This allocator can be tested on every NUMA system.
> >> Our runs show about 7-10 speedup, if we use allocattor with interleaved
> >> strategy + -XX:+UseNUMA.
> >> But unfortunately our yardstick benches doesn't use offheap a lot,
> >> usually above one Gb.
> >> We trying to do more benches with real data and share them, possibly in
> >> meetup.
> >>
> >> AFAIK, GG lab servers are two-sockets machines, aren't they? So it is
> >> worth to run benches with a lot data on them, using
> >> allocator with interleaved strategy (you can skip specifying numa nodes,
> >> by default it will use all available) and use -XX:+UseNUMA jvm
> >> flag.
> >>
> >>
> >>
> >> чт, 2 дек. 2021 г. в 11:48, Andrey Mashenkov <
> andrey.mashen...@gmail.com
> >> >:
> >>
> >>> Ivan,
> >>>
> >>> Great job. PR looks good.
> >>>
> >>> This allocator in interleaved mode and passing `-XX:+UseNUMA` flag to
> jvm
> >>> > show promising results on yardstick benches. Technically, G1 is not a
> >>> numa
> >>> > aware collector for java versions less than 14, but allocation of
> heap
> >>> in
> >>> > interleaved mode shows good results even on java 11.
> >>>
> >>> Can you share benchmark results?
> >>> I'm not sure I'll have an Optane on my notebook in a reasonable time ;)
> >>>
> >>>
> >>> On Thu, Dec 2, 2021 at 10:41 AM Ivan Daschinsky 
> >>> wrote:
> >>>
> >>> > Semyon D. and Maks T. -- thanks a lot for review.
> >>> >
> >>> > BTW, Igniters, I will appreciate all opinions and feedback.
> >>> >
> >>> > пн, 29 нояб. 2021 г. в 10:13, Ivan Daschinsky  >:
> >>> >
> >>> > > Hi, igniters!
> >>> > >
> >>> > > There is not a big secret that nowadays NUMA is quite common in
> >>> > > multiprocessor systems.
> >>> > > And this memory architecture should be treated in specific ways.
> >>> > >
> >>> > > Support for NUMA is present in many commercial and open-source
> >>> products.
> >>> > >
> >>> > > I've implemented a NUMA aware allocator for Apache Ignite [1]
> >>> > > It is a JNI wrapper around `libnuma` and supports different
> >>> allocation
> >>> > > options.
> >>> > > I.e. interleaved, local, interleved_mask and so on. For more
> >>> information,
> >>> > > see
> >>> > > [2], [3].
> >>> > > This allocator in interleaved mode and passing `-XX:+UseNUMA` flag
> >>> to jvm
> >>> > > show promising results on yardstick benches. Technically, G1 is
> not a
> >>> > numa
> >>> > > aware collector for java versions less than 14, but allocation of
> >>> heap in
> >>> > > interleaved mode shows good results even on java 11.
> >>> > >
> >>> > > Currently, all needed libraries and tools for building this module
> >>> are
> >>> > > available on TC agents
> >>> > > setup of specific test suite is in progress [4]
> >>> > >
> >>> > > So I am asking for a review of my patch.
> >>> > >
> >>> > > [1] --  https://issues.apache.org/jira/browse/IGNITE-15922
> >>> > > [2] -- https://man7.org/linux/man-pages/man3/numa.3.html
> >>> > > [3] -- https://man7.org/linux/man-pages/man2/mbind.2.html
> >>> > > [4] -- https://issues.apache.org/jira/browse/IGNITE-15994
> >>> > >
> >>> >
> >>> >
> >>> > --
> >>> > Sincerely yours, Ivan Daschinskiy
> >>> >
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Andrey V. Mashenkov
> >>>
> >>
> >>
> >> --
> >> Sincerely yours, Ivan Daschinskiy
> >>
> >
> >
> > --
> > Sincerely yours, Ivan Daschinskiy
> >
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>


Re: NUMA aware allocator, PR review request

2021-12-03 Thread Ivan Daschinsky
Ok, the TC suite is ready [1].
If there are no objections, I will merge it soon.

A possible concern: it is now required to install build-essential and
libnuma-dev in order to build Ignite on 64-bit Linux.
I suppose this is not a big deal, but maybe someone will object?


[1] --
https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_NumaAllocator/?mode=builds

чт, 2 дек. 2021 г. в 12:03, Ivan Daschinsky :

> >> Our runs show about 7-10 speedup,
> Sorry, typo 7-10%  speedup
>
> чт, 2 дек. 2021 г. в 12:01, Ivan Daschinsky :
>
>> Andrey, thanks!
>>
>> This allocator can be tested on every NUMA system.
>> Our runs show about 7-10 speedup, if we use allocattor with interleaved
>> strategy + -XX:+UseNUMA.
>> But unfortunately our yardstick benches doesn't use offheap a lot,
>> usually above one Gb.
>> We trying to do more benches with real data and share them, possibly in
>> meetup.
>>
>> AFAIK, GG lab servers are two-sockets machines, aren't they? So it is
>> worth to run benches with a lot data on them, using
>> allocator with interleaved strategy (you can skip specifying numa nodes,
>> by default it will use all available) and use -XX:+UseNUMA jvm
>> flag.
>>
>>
>>
>> чт, 2 дек. 2021 г. в 11:48, Andrey Mashenkov > >:
>>
>>> Ivan,
>>>
>>> Great job. PR looks good.
>>>
>>> This allocator in interleaved mode and passing `-XX:+UseNUMA` flag to jvm
>>> > show promising results on yardstick benches. Technically, G1 is not a
>>> numa
>>> > aware collector for java versions less than 14, but allocation of heap
>>> in
>>> > interleaved mode shows good results even on java 11.
>>>
>>> Can you share benchmark results?
>>> I'm not sure I'll have an Optane on my notebook in a reasonable time ;)
>>>
>>>
>>> On Thu, Dec 2, 2021 at 10:41 AM Ivan Daschinsky 
>>> wrote:
>>>
>>> > Semyon D. and Maks T. -- thanks a lot for review.
>>> >
>>> > BTW, Igniters, I will appreciate all opinions and feedback.
>>> >
>>> > пн, 29 нояб. 2021 г. в 10:13, Ivan Daschinsky :
>>> >
>>> > > Hi, igniters!
>>> > >
>>> > > There is not a big secret that nowadays NUMA is quite common in
>>> > > multiprocessor systems.
>>> > > And this memory architecture should be treated in specific ways.
>>> > >
>>> > > Support for NUMA is present in many commercial and open-source
>>> products.
>>> > >
>>> > > I've implemented a NUMA aware allocator for Apache Ignite [1]
>>> > > It is a JNI wrapper around `libnuma` and supports different
>>> allocation
>>> > > options.
>>> > > I.e. interleaved, local, interleved_mask and so on. For more
>>> information,
>>> > > see
>>> > > [2], [3].
>>> > > This allocator in interleaved mode and passing `-XX:+UseNUMA` flag
>>> to jvm
>>> > > show promising results on yardstick benches. Technically, G1 is not a
>>> > numa
>>> > > aware collector for java versions less than 14, but allocation of
>>> heap in
>>> > > interleaved mode shows good results even on java 11.
>>> > >
>>> > > Currently, all needed libraries and tools for building this module
>>> are
>>> > > available on TC agents
>>> > > setup of specific test suite is in progress [4]
>>> > >
>>> > > So I am asking for a review of my patch.
>>> > >
>>> > > [1] --  https://issues.apache.org/jira/browse/IGNITE-15922
>>> > > [2] -- https://man7.org/linux/man-pages/man3/numa.3.html
>>> > > [3] -- https://man7.org/linux/man-pages/man2/mbind.2.html
>>> > > [4] -- https://issues.apache.org/jira/browse/IGNITE-15994
>>> > >
>>> >
>>> >
>>> > --
>>> > Sincerely yours, Ivan Daschinskiy
>>> >
>>>
>>>
>>> --
>>> Best regards,
>>> Andrey V. Mashenkov
>>>
>>
>>
>> --
>> Sincerely yours, Ivan Daschinskiy
>>
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>


-- 
Sincerely yours, Ivan Daschinskiy


Re: NUMA aware allocator, PR review request

2021-12-02 Thread Ivan Daschinsky
>> Our runs show about 7-10 speedup,
Sorry, typo: 7-10% speedup.

чт, 2 дек. 2021 г. в 12:01, Ivan Daschinsky :

> Andrey, thanks!
>
> This allocator can be tested on every NUMA system.
> Our runs show about 7-10 speedup, if we use allocattor with interleaved
> strategy + -XX:+UseNUMA.
> But unfortunately our yardstick benches doesn't use offheap a lot, usually
> above one Gb.
> We trying to do more benches with real data and share them, possibly in
> meetup.
>
> AFAIK, GG lab servers are two-sockets machines, aren't they? So it is
> worth to run benches with a lot data on them, using
> allocator with interleaved strategy (you can skip specifying numa nodes,
> by default it will use all available) and use -XX:+UseNUMA jvm
> flag.
>
>
>
> чт, 2 дек. 2021 г. в 11:48, Andrey Mashenkov :
>
>> Ivan,
>>
>> Great job. PR looks good.
>>
>> This allocator in interleaved mode and passing `-XX:+UseNUMA` flag to jvm
>> > show promising results on yardstick benches. Technically, G1 is not a
>> numa
>> > aware collector for java versions less than 14, but allocation of heap
>> in
>> > interleaved mode shows good results even on java 11.
>>
>> Can you share benchmark results?
>> I'm not sure I'll have an Optane on my notebook in a reasonable time ;)
>>
>>
>> On Thu, Dec 2, 2021 at 10:41 AM Ivan Daschinsky 
>> wrote:
>>
>> > Semyon D. and Maks T. -- thanks a lot for review.
>> >
>> > BTW, Igniters, I will appreciate all opinions and feedback.
>> >
>> > пн, 29 нояб. 2021 г. в 10:13, Ivan Daschinsky :
>> >
>> > > Hi, igniters!
>> > >
>> > > There is not a big secret that nowadays NUMA is quite common in
>> > > multiprocessor systems.
>> > > And this memory architecture should be treated in specific ways.
>> > >
>> > > Support for NUMA is present in many commercial and open-source
>> products.
>> > >
>> > > I've implemented a NUMA aware allocator for Apache Ignite [1]
>> > > It is a JNI wrapper around `libnuma` and supports different allocation
>> > > options.
>> > > I.e. interleaved, local, interleved_mask and so on. For more
>> information,
>> > > see
>> > > [2], [3].
>> > > This allocator in interleaved mode and passing `-XX:+UseNUMA` flag to
>> jvm
>> > > show promising results on yardstick benches. Technically, G1 is not a
>> > numa
>> > > aware collector for java versions less than 14, but allocation of
>> heap in
>> > > interleaved mode shows good results even on java 11.
>> > >
>> > > Currently, all needed libraries and tools for building this module are
>> > > available on TC agents
>> > > setup of specific test suite is in progress [4]
>> > >
>> > > So I am asking for a review of my patch.
>> > >
>> > > [1] --  https://issues.apache.org/jira/browse/IGNITE-15922
>> > > [2] -- https://man7.org/linux/man-pages/man3/numa.3.html
>> > > [3] -- https://man7.org/linux/man-pages/man2/mbind.2.html
>> > > [4] -- https://issues.apache.org/jira/browse/IGNITE-15994
>> > >
>> >
>> >
>> > --
>> > Sincerely yours, Ivan Daschinskiy
>> >
>>
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
>>
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>


-- 
Sincerely yours, Ivan Daschinskiy


Re: NUMA aware allocator, PR review request

2021-12-02 Thread Ivan Daschinsky
Andrey, thanks!

This allocator can be tested on any NUMA system.
Our runs show about a 7-10% speedup if we use the allocator with the
interleaved strategy + -XX:+UseNUMA.
But unfortunately our yardstick benches don't use offheap much, usually
not above one GB.
We are trying to do more benches with real data and will share them,
possibly at a meetup.

AFAIK, the GG lab servers are two-socket machines, aren't they? So it is
worth running benches with a lot of data on them, using the
allocator with the interleaved strategy (you can skip specifying NUMA
nodes; by default it will use all available ones) and the -XX:+UseNUMA
JVM flag.



чт, 2 дек. 2021 г. в 11:48, Andrey Mashenkov :

> Ivan,
>
> Great job. PR looks good.
>
> This allocator in interleaved mode and passing `-XX:+UseNUMA` flag to jvm
> > show promising results on yardstick benches. Technically, G1 is not a
> numa
> > aware collector for java versions less than 14, but allocation of heap in
> > interleaved mode shows good results even on java 11.
>
> Can you share benchmark results?
> I'm not sure I'll have an Optane on my notebook in a reasonable time ;)
>
>
> On Thu, Dec 2, 2021 at 10:41 AM Ivan Daschinsky 
> wrote:
>
> > Semyon D. and Maks T. -- thanks a lot for review.
> >
> > BTW, Igniters, I will appreciate all opinions and feedback.
> >
> > пн, 29 нояб. 2021 г. в 10:13, Ivan Daschinsky :
> >
> > > Hi, igniters!
> > >
> > > There is not a big secret that nowadays NUMA is quite common in
> > > multiprocessor systems.
> > > And this memory architecture should be treated in specific ways.
> > >
> > > Support for NUMA is present in many commercial and open-source
> products.
> > >
> > > I've implemented a NUMA aware allocator for Apache Ignite [1]
> > > It is a JNI wrapper around `libnuma` and supports different allocation
> > > options.
> > > I.e. interleaved, local, interleved_mask and so on. For more
> information,
> > > see
> > > [2], [3].
> > > This allocator in interleaved mode and passing `-XX:+UseNUMA` flag to
> jvm
> > > show promising results on yardstick benches. Technically, G1 is not a
> > numa
> > > aware collector for java versions less than 14, but allocation of heap
> in
> > > interleaved mode shows good results even on java 11.
> > >
> > > Currently, all needed libraries and tools for building this module are
> > > available on TC agents
> > > setup of specific test suite is in progress [4]
> > >
> > > So I am asking for a review of my patch.
> > >
> > > [1] --  https://issues.apache.org/jira/browse/IGNITE-15922
> > > [2] -- https://man7.org/linux/man-pages/man3/numa.3.html
> > > [3] -- https://man7.org/linux/man-pages/man2/mbind.2.html
> > > [4] -- https://issues.apache.org/jira/browse/IGNITE-15994
> > >
> >
> >
> > --
> > Sincerely yours, Ivan Daschinskiy
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


-- 
Sincerely yours, Ivan Daschinskiy


Re: NUMA aware allocator, PR review request

2021-12-02 Thread Andrey Mashenkov
Ivan,

Great job. PR looks good.

This allocator in interleaved mode and passing `-XX:+UseNUMA` flag to jvm
> show promising results on yardstick benches. Technically, G1 is not a numa
> aware collector for java versions less than 14, but allocation of heap in
> interleaved mode shows good results even on java 11.

Can you share benchmark results?
I'm not sure I'll have an Optane on my notebook in a reasonable time ;)


On Thu, Dec 2, 2021 at 10:41 AM Ivan Daschinsky  wrote:

> Semyon D. and Maks T. -- thanks a lot for review.
>
> BTW, Igniters, I will appreciate all opinions and feedback.
>
> пн, 29 нояб. 2021 г. в 10:13, Ivan Daschinsky :
>
> > Hi, igniters!
> >
> > There is not a big secret that nowadays NUMA is quite common in
> > multiprocessor systems.
> > And this memory architecture should be treated in specific ways.
> >
> > Support for NUMA is present in many commercial and open-source products.
> >
> > I've implemented a NUMA aware allocator for Apache Ignite [1]
> > It is a JNI wrapper around `libnuma` and supports different allocation
> > options.
> > I.e. interleaved, local, interleved_mask and so on. For more information,
> > see
> > [2], [3].
> > This allocator in interleaved mode and passing `-XX:+UseNUMA` flag to jvm
> > show promising results on yardstick benches. Technically, G1 is not a
> numa
> > aware collector for java versions less than 14, but allocation of heap in
> > interleaved mode shows good results even on java 11.
> >
> > Currently, all needed libraries and tools for building this module are
> > available on TC agents
> > setup of specific test suite is in progress [4]
> >
> > So I am asking for a review of my patch.
> >
> > [1] --  https://issues.apache.org/jira/browse/IGNITE-15922
> > [2] -- https://man7.org/linux/man-pages/man3/numa.3.html
> > [3] -- https://man7.org/linux/man-pages/man2/mbind.2.html
> > [4] -- https://issues.apache.org/jira/browse/IGNITE-15994
> >
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>


-- 
Best regards,
Andrey V. Mashenkov


Re: NUMA aware allocator, PR review request

2021-12-01 Thread Ivan Daschinsky
Semyon D. and Maks T. -- thanks a lot for review.

BTW, Igniters, I will appreciate all opinions and feedback.

пн, 29 нояб. 2021 г. в 10:13, Ivan Daschinsky :

> Hi, igniters!
>
> There is not a big secret that nowadays NUMA is quite common in
> multiprocessor systems.
> And this memory architecture should be treated in specific ways.
>
> Support for NUMA is present in many commercial and open-source products.
>
> I've implemented a NUMA aware allocator for Apache Ignite [1]
> It is a JNI wrapper around `libnuma` and supports different allocation
> options.
> I.e. interleaved, local, interleved_mask and so on. For more information,
> see
> [2], [3].
> This allocator in interleaved mode and passing `-XX:+UseNUMA` flag to jvm
> show promising results on yardstick benches. Technically, G1 is not a numa
> aware collector for java versions less than 14, but allocation of heap in
> interleaved mode shows good results even on java 11.
>
> Currently, all needed libraries and tools for building this module are
> available on TC agents
> setup of specific test suite is in progress [4]
>
> So I am asking for a review of my patch.
>
> [1] --  https://issues.apache.org/jira/browse/IGNITE-15922
> [2] -- https://man7.org/linux/man-pages/man3/numa.3.html
> [3] -- https://man7.org/linux/man-pages/man2/mbind.2.html
> [4] -- https://issues.apache.org/jira/browse/IGNITE-15994
>


-- 
Sincerely yours, Ivan Daschinskiy


NUMA aware allocator, PR review request

2021-11-28 Thread Ivan Daschinsky
Hi, igniters!

It is no big secret that nowadays NUMA is quite common in
multiprocessor systems,
and this memory architecture needs to be treated in specific ways.

Support for NUMA is present in many commercial and open-source products.

I've implemented a NUMA aware allocator for Apache Ignite [1].
It is a JNI wrapper around `libnuma` and supports different allocation
options,
e.g. interleaved, local, interleaved_mask and so on. For more information,
see [2], [3].
This allocator in interleaved mode, combined with passing the
`-XX:+UseNUMA` flag to the JVM, shows promising results on yardstick
benches. Technically, G1 is not a NUMA-aware collector for Java versions
below 14, but allocating the heap in interleaved mode shows good results
even on Java 11.

Currently, all libraries and tools needed to build this module are
available on TC agents;
setup of a dedicated test suite is in progress [4].

So I am asking for a review of my patch.

[1] --  https://issues.apache.org/jira/browse/IGNITE-15922
[2] -- https://man7.org/linux/man-pages/man3/numa.3.html
[3] -- https://man7.org/linux/man-pages/man2/mbind.2.html
[4] -- https://issues.apache.org/jira/browse/IGNITE-15994
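
The interleaved strategy mentioned above spreads the pages of an offheap
region round-robin across NUMA nodes, so no single node's memory
controller becomes the bottleneck. Below is a minimal, illustrative
sketch of that placement logic only — the actual module is a JNI wrapper
that delegates the real placement to libnuma (numa_alloc_interleaved /
mbind); the class name and API here are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative model of the "interleaved" placement strategy: pages of a
 * region are assigned to NUMA nodes round-robin. The real allocator
 * delegates this to libnuma; this class only demonstrates the resulting
 * page-to-node distribution.
 */
public class InterleavedPlacement {
    private final int numaNodes;

    public InterleavedPlacement(int numaNodes) {
        this.numaNodes = numaNodes;
    }

    /** Node that page {@code pageIdx} lands on under interleaving. */
    public int nodeForPage(long pageIdx) {
        return (int)(pageIdx % numaNodes);
    }

    /** Count of pages per node for a region of {@code pages} pages. */
    public Map<Integer, Long> distribution(long pages) {
        Map<Integer, Long> res = new HashMap<>();

        for (long p = 0; p < pages; p++)
            res.merge(nodeForPage(p), 1L, Long::sum);

        return res;
    }

    public static void main(String[] args) {
        InterleavedPlacement pl = new InterleavedPlacement(2);

        // 1000 pages interleaved across 2 nodes: 500 land on each node.
        System.out.println(pl.distribution(1000));
    }
}
```

The even per-node split is what makes interleaving pair well with
`-XX:+UseNUMA`: both the Java heap and the offheap region end up spread
across nodes instead of piling onto the node of the allocating thread.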


Review request - allocated RAM is always 0 for non-persistent regions

2021-03-16 Thread Ilya Kasnacheev
Hello!

Guys, can you please review
https://issues.apache.org/jira/browse/IGNITE-13878 ?

I could not find a reviewer for this ticket.

Regards,
-- 
Ilya Kasnacheev


Re: Review Request: IGNITE-8635: Add a Method to Inspect BinaryObject Size

2021-03-12 Thread Atri Sharma
Gentle ping

On Thu, 11 Mar 2021 at 2:56 PM, Atri Sharma  wrote:

> Hi All,
>
> I have raised a PR for the above mentioned issue. Please see and help
> review:
>
> https://github.com/apache/ignite/pull/8868
>
> Regards,
>
> Atri
>
> --
> Regards,
>
> Atri
> Apache Concerted
>
-- 
Regards,

Atri
Apache Concerted


Review Request: IGNITE-8635: Add a Method to Inspect BinaryObject Size

2021-03-11 Thread Atri Sharma
Hi All,

I have raised a PR for the above mentioned issue. Please see and help review:

https://github.com/apache/ignite/pull/8868

Regards,

Atri

-- 
Regards,

Atri
Apache Concerted


Re: Review Request

2021-03-03 Thread Valentin Kulichenko
Atri,

I've added my comments in the PR.

-Val

On Wed, Mar 3, 2021 at 7:29 AM Denis Magda  wrote:

> @Valentin Kulichenko , @Nikolay Izhikov
> , @samvi...@yandex.ru ,
>
> I saw you reviewing the ticket. Could you please double-check the changes?
> "IGNITE-2399: Implement acquireAndExecute In IgniteSemaphore"
>
>
> Atri, please put a ticket number and name in the title of an email, so that
> community member can see right away what change is being contributed.
>
> -
> Denis
>
>
> On Wed, Mar 3, 2021 at 7:27 AM Atri Sharma  wrote:
>
> > Hi,
> >
> > Please help in reviewing:
> >
> > https://github.com/apache/ignite/pull/8820
> >
> > Atri
> >
>


Re: Review Request

2021-03-03 Thread Denis Magda
@Valentin Kulichenko , @Nikolay Izhikov
, @samvi...@yandex.ru ,

I saw you reviewing the ticket. Could you please double-check the changes?
"IGNITE-2399: Implement acquireAndExecute In IgniteSemaphore"


Atri, please put the ticket number and name in the title of the email, so that
community members can see right away what change is being contributed.

-
Denis


On Wed, Mar 3, 2021 at 7:27 AM Atri Sharma  wrote:

> Hi,
>
> Please help in reviewing:
>
> https://github.com/apache/ignite/pull/8820
>
> Atri
>
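
The change under review, IGNITE-2399, adds an acquireAndExecute
convenience to IgniteSemaphore. A hedged sketch of the general
acquire-then-execute pattern such a method wraps, shown here with the
JDK's java.util.concurrent.Semaphore rather than Ignite's distributed
semaphore (the method signature here is illustrative, not the actual
Ignite API):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

/**
 * Illustration of the acquire-then-execute pattern behind IGNITE-2399,
 * sketched with the JDK Semaphore. The key point is that the permit is
 * always released, even if the guarded task throws.
 */
public class AcquireAndExecute {
    public static <T> T acquireAndExecute(Semaphore sem, Callable<T> task) throws Exception {
        sem.acquire();          // block until a permit is available

        try {
            return task.call(); // run the guarded task
        }
        finally {
            sem.release();      // always return the permit
        }
    }

    public static void main(String[] args) throws Exception {
        Semaphore sem = new Semaphore(1);

        Integer res = acquireAndExecute(sem, () -> 21 * 2);

        System.out.println(res + ", permits=" + sem.availablePermits());
    }
}
```

Wrapping acquire/release in one call removes the most common misuse of a
semaphore: forgetting the release on an exceptional path.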


Review Request

2021-03-03 Thread Atri Sharma
Hi,

Please help in reviewing:

https://github.com/apache/ignite/pull/8820

Atri


Re: [REVIEW REQUEST] IEP-47 Native Persistence Defragmentation, core logic

2020-11-17 Thread Ivan Bessonov
But maybe I just don't know the date. In short: right now,
defragmentation is my first priority.

вт, 17 нояб. 2020 г. в 15:18, Ivan Bessonov :

> Denis,
>
> chances that feature will be fully complete is a bit low. We still make
> adjustments to the API
> and we need a few optimizations so that it would work faster.
>
> чт, 12 нояб. 2020 г. в 19:11, Denis Magda :
>
>> Ivan,
>>
>> Nice! Is the plan to get it added to Ignite 2.10?
>>
>> -
>> Denis
>>
>>
>> On Thu, Nov 12, 2020 at 7:11 AM Ivan Bessonov 
>> wrote:
>>
>> > Hi Igniters,
>> >
>> > Core functionality of defragmentation is finally implemented in [1].
>> > There's no public API in it
>> > for now, patch is already very big and had to be split into smaller
>> tasks
>> > (that consist mostly of refactoring).
>> >
>> > Code is a little rough right now, I'm gonna go through all the remaining
>> > TODO, but you can already
>> > start reviewing it. PR is here: [2].
>> >
>> > First control.sh commands are here, but I don't have TC test results
>> yet:
>> > [3].
>> > There will be more API related issues later, but now I'd like to polish
>> > core classes.
>> >
>> > Please leave your thoughts here and in PR.
>> >
>> > Thank you!
>> >
>> > [0]
>> >
>> >
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-47%3A+Native+persistence+defragmentation
>> > [1] https://issues.apache.org/jira/browse/IGNITE-13190
>> > [2] https://github.com/apache/ignite/pull/7984/files
>> > [3] https://issues.apache.org/jira/browse/IGNITE-13697
>> >
>> > --
>> > Sincerely yours,
>> > Ivan Bessonov
>> >
>>
>
>
> --
> Sincerely yours,
> Ivan Bessonov
>


-- 
Sincerely yours,
Ivan Bessonov


Re: [REVIEW REQUEST] IEP-47 Native Persistence Defragmentation, core logic

2020-11-17 Thread Ivan Bessonov
Denis,

the chances that the feature will be fully complete are somewhat low. We
are still making adjustments to the API,
and we need a few optimizations so that it will work faster.

чт, 12 нояб. 2020 г. в 19:11, Denis Magda :

> Ivan,
>
> Nice! Is the plan to get it added to Ignite 2.10?
>
> -
> Denis
>
>
> On Thu, Nov 12, 2020 at 7:11 AM Ivan Bessonov 
> wrote:
>
> > Hi Igniters,
> >
> > Core functionality of defragmentation is finally implemented in [1].
> > There's no public API in it
> > for now, patch is already very big and had to be split into smaller tasks
> > (that consist mostly of refactoring).
> >
> > Code is a little rough right now, I'm gonna go through all the remaining
> > TODO, but you can already
> > start reviewing it. PR is here: [2].
> >
> > First control.sh commands are here, but I don't have TC test results yet:
> > [3].
> > There will be more API related issues later, but now I'd like to polish
> > core classes.
> >
> > Please leave your thoughts here and in PR.
> >
> > Thank you!
> >
> > [0]
> >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-47%3A+Native+persistence+defragmentation
> > [1] https://issues.apache.org/jira/browse/IGNITE-13190
> > [2] https://github.com/apache/ignite/pull/7984/files
> > [3] https://issues.apache.org/jira/browse/IGNITE-13697
> >
> > --
> > Sincerely yours,
> > Ivan Bessonov
> >
>


-- 
Sincerely yours,
Ivan Bessonov


Re: [REVIEW REQUEST] IEP-47 Native Persistence Defragmentation, core logic

2020-11-12 Thread Denis Magda
Ivan,

Nice! Is the plan to get it added to Ignite 2.10?

-
Denis


On Thu, Nov 12, 2020 at 7:11 AM Ivan Bessonov  wrote:

> Hi Igniters,
>
> Core functionality of defragmentation is finally implemented in [1].
> There's no public API in it
> for now, patch is already very big and had to be split into smaller tasks
> (that consist mostly of refactoring).
>
> Code is a little rough right now, I'm gonna go through all the remaining
> TODO, but you can already
> start reviewing it. PR is here: [2].
>
> First control.sh commands are here, but I don't have TC test results yet:
> [3].
> There will be more API related issues later, but now I'd like to polish
> core classes.
>
> Please leave your thoughts here and in PR.
>
> Thank you!
>
> [0]
>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-47%3A+Native+persistence+defragmentation
> [1] https://issues.apache.org/jira/browse/IGNITE-13190
> [2] https://github.com/apache/ignite/pull/7984/files
> [3] https://issues.apache.org/jira/browse/IGNITE-13697
>
> --
> Sincerely yours,
> Ivan Bessonov
>


[REVIEW REQUEST] IEP-47 Native Persistence Defragmentation, core logic

2020-11-12 Thread Ivan Bessonov
Hi Igniters,

The core functionality of defragmentation is finally implemented in [1].
There is no public API in it
for now; the patch is already very big and had to be split into smaller
tasks (consisting mostly of refactoring).

The code is a little rough right now. I'm going to go through all the
remaining TODOs, but you can already
start reviewing it. The PR is here: [2].

The first control.sh commands are here, but I don't have TC test results
yet: [3].
There will be more API-related issues later, but for now I'd like to
polish the core classes.

Please leave your thoughts here and in the PR.

Thank you!

[0]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-47%3A+Native+persistence+defragmentation
[1] https://issues.apache.org/jira/browse/IGNITE-13190
[2] https://github.com/apache/ignite/pull/7984/files
[3] https://issues.apache.org/jira/browse/IGNITE-13697

-- 
Sincerely yours,
Ivan Bessonov
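
At its core, defragmentation rewrites the live entries of a fragmented
store into a new, compact one so the free space between them is
reclaimed. A simplified, self-contained sketch of that compaction idea
follows; it is only a conceptual illustration, not the actual IEP-47
implementation, which operates on partition files and page memory:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Conceptual sketch of defragmentation: copy live slots of a fragmented
 * store into a new compact store, dropping the gaps left by removals.
 * The real IEP-47 logic works on partition files, not in-memory lists.
 */
public class CompactSketch {
    /** {@code null} entries model freed slots in the fragmented store. */
    public static List<String> defragment(List<String> fragmented) {
        List<String> compact = new ArrayList<>();

        for (String entry : fragmented)
            if (entry != null) // keep only live entries
                compact.add(entry);

        return compact;
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        store.add("a"); store.add(null); store.add("b");
        store.add(null); store.add("c");

        System.out.println(defragment(store)); // [a, b, c]
    }
}
```

The real implementation has to do this copy while preserving indexes and
checkpoint consistency, which is why the patch is large and the node is
offline during defragmentation.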


Re: [REVIEW REQUEST] IGNITE-12630 Remove developers sections from parent pom.xml

2020-02-09 Thread Ivan Pavlukhin
Merged to master.

Best regards,
Ivan Pavlukhin

чт, 6 февр. 2020 г. в 11:57, Anton Vinogradov :
>
> Looks good to me.
>
> On Thu, Feb 6, 2020 at 11:45 AM Ivan Pavlukhin  wrote:
>
> > Igniters,
> >
> > I raised a PR for a ticket [1] removing  section from
> > parent pom.xml. I described the motivation in the ticket. Shortly,
> > this section has a little meaning today and even worse is misleading.
> >
> > Please review.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-12630
> >
> > Best regards,
> > Ivan Pavlukhin
> >


Re: [REVIEW REQUEST] IGNITE-12630 Remove developers sections from parent pom.xml

2020-02-06 Thread Anton Vinogradov
Looks good to me.

On Thu, Feb 6, 2020 at 11:45 AM Ivan Pavlukhin  wrote:

> Igniters,
>
> I raised a PR for a ticket [1] removing  section from
> parent pom.xml. I described the motivation in the ticket. Shortly,
> this section has a little meaning today and even worse is misleading.
>
> Please review.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12630
>
> Best regards,
> Ivan Pavlukhin
>


[REVIEW REQUEST] IGNITE-12630 Remove developers sections from parent pom.xml

2020-02-06 Thread Ivan Pavlukhin
Igniters,

I raised a PR for ticket [1], removing the `developers` section from the
parent pom.xml. I described the motivation in the ticket. In short,
this section has little meaning today and, even worse, is misleading.

Please review.

[1] https://issues.apache.org/jira/browse/IGNITE-12630

Best regards,
Ivan Pavlukhin


Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-19 Thread Павлухин Иван
> > > > > > > > > Slava,
> > > > > > > > >
> > > > > > > > > Thanks for your review first!
> > > > > > > > >
> > > > > > > > > >> Anyway, I left some comments in your pull-request at
> > github.
> > > > > > Please
> > > > > > > > take
> > > > > > > > > a
> > > > > > > > > >> look. The most annoying thing is poorly documented code :(
> > > > > > > > > Since your comments mostly about Javadoc (does this mean
> > that my
> > > > > > > solution
> > > > > > > > > is so great that you ask me only to fix Javadocs :) ?),
> > > > > > > > > I'd like to propose you to provide fixes as a PR since you
> > have a
> > > > > > > vision
> > > > > > > > of
> > > > > > > > > how it should be made. I'll review them and merge shortly.
> > > > > > > > >
> > > > > > > > > >> By the way, is it required to add test related to
> > fail-over
> > > > > > > scenarios?
> > > > > > > > > The best check is to use RR at real code.
> > > > > > > > > For example, I'm injecting RR now to the test with concurrent
> > > > > > > > modifications
> > > > > > > > > and restarts [1].
> > > > > > > > >
> > > > > > > > > >> I just checked, the IEP page and I still cannot find Jira
> > > > > tickets
> > > > > > > that
> > > > > > > > > >> should cover existing limitations/improvements.
> > > > > > > > > >> I would suggest creating the following tasks, at least:
> > > > > > > > > Mostly agree with you, but
> > > > > > > > > - MVCC is not production ready,
> > > > > > > > > - not sure near support really required,
> > > > > > > > > - metrics are better for monitoring, but the Event is enough
> > for
> > > > my
> > > > > > > wish
> > > > > > > > to
> > > > > > > > > cover AI with consistency check,
> > > > > > > > > - do we really need Platforms and Thin Client support?
> > > > > > > > > Also, we should not produce stillborn issue.
> > > > > > > > > All limitations listed at proxy creation method and they
> > > > definitely
> > > > > > are
> > > > > > > > not
> > > > > > > > > showstoppers and may be fixed later if someone interested.
> > > > > > > > > Coming back to AI 3.0 discussion, we have A LOT of features
> > and
> > > > > it's
> > > > > > > > almost
> > > > > > > > > impossible (require much more time that feature's cost) to
> > > > support
> > > > > > them
> > > > > > > > > all.
> > > > > > > > > I will be pretty happy in case someone will do this and
> > provide
> > > > > help
> > > > > > if
> > > > > > > > > necessary!
> > > > > > > > >
> > > > > > > > > >> Perhaps, it would be a good idea to think about the
> > recovery
> > > > too
> > > > > > > > > Do you mean per partition check and recovery?
> > > > > > > > > That's a good idea, but I found it's not easy to imagine API
> > to
> > > > for
> > > > > > > such
> > > > > > > > > tool.
> > > > > > > > > In case you ready to assist with proper API/design this will
> > > > > > definitely
> > > > > > > > > help.
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-11973
> > > > > > > > >
> > > > > > > > > On Wed, Jul 10, 2019 at 3:43 PM Вячеслав Коптилин <
> > > > > > > > > slava.kopti...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > >

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-18 Thread Anton Vinogradov
2. What should be done here in order to check/resolve the issue?
> > > > > > Perhaps, I
> > > > > > > should restart a node (which one?), restart the whole cluster,
> put
> > > a
> > > > > new
> > > > > > > value...
> > > > > > > 3. IgniteConsistencyViolationException is absolutely useless.
> It
> > > does
> > > > > not
> > > > > > > provide any information about the issue and possible way to
> fix it.
> > > > > > >
> > > > > > > It seems that transactional caches are covered much better.
> > > > > > >
> > > > > > > > Mostly agree with you, but
> > > > > > > > - MVCC is not production ready,
> > > > > > > > - not sure near support really required,
> > > > > > > > - metrics are better for monitoring, but the Event is enough
> for
> > > my
> > > > > > wish
> > > > > > > to
> > > > > > > > cover AI with consistency check,
> > > > > > > > - do we really need Platforms and Thin Client support?
> > > > > > > Well, near caches are widely used and fully transactional, so I
> > > think
> > > > > it
> > > > > > > makes sense to support the feature for near caches too.
> > > > > > > .Net is already aware of 'ReadRepair'. It seems to me, that it
> can
> > > be
> > > > > > > easily supported for C++. I don't see a reason why it should
> not be
> > > > > done
> > > > > > :)
> > > > > > >
> > > > > > > > Do you mean per partition check and recovery? That's a good
> idea,
> > > > > but I
> > > > > > > found it's not easy to imagine API to for such tool.
> > > > > > > Yep, perhaps it can be done on the idle cluster via
> `idle-verify`
> > > > > command
> > > > > > > with additional flag. Agreed, that this approach is not the
> best
> > > one
> > > > > > > definitely.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > S.
> > > > > > >
> > > > > > > чт, 11 июл. 2019 г. в 09:53, Anton Vinogradov :
> > > > > > >
> > > > > > > > Slava,
> > > > > > > >
> > > > > > > > Thanks for your review first!
> > > > > > > >
> > > > > > > > >> Anyway, I left some comments in your pull-request at
> github.
> > > > > Please
> > > > > > > take
> > > > > > > > a
> > > > > > > > >> look. The most annoying thing is poorly documented code :(
> > > > > > > > Since your comments mostly about Javadoc (does this mean
> that my
> > > > > > solution
> > > > > > > > is so great that you ask me only to fix Javadocs :) ?),
> > > > > > > > I'd like to propose you to provide fixes as a PR since you
> have a
> > > > > > vision
> > > > > > > of
> > > > > > > > how it should be made. I'll review them and merge shortly.
> > > > > > > >
> > > > > > > > >> By the way, is it required to add test related to
> fail-over
> > > > > > scenarios?
> > > > > > > > The best check is to use RR at real code.
> > > > > > > > For example, I'm injecting RR now to the test with concurrent
> > > > > > > modifications
> > > > > > > > and restarts [1].
> > > > > > > >
> > > > > > > > >> I just checked, the IEP page and I still cannot find Jira
> > > > tickets
> > > > > > that
> > > > > > > > >> should cover existing limitations/improvements.
> > > > > > > > >> I would suggest creating the following tasks, at least:
> > > > > > > > Mostly agree with you, but
> > > > > > > > - MVCC is not production ready,
> > > > > > > > - not sure near support really required,
> > > > > > > > - metrics are better for monitoring, but the Event is enough
> for
> > > my
> > > > > > wish
> > 

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-18 Thread Павлухин Иван
Do you mean per partition check and recovery? That's a good idea,
> > > > but I
> > > > > > found it's not easy to imagine API to for such tool.
> > > > > > Yep, perhaps it can be done on the idle cluster via `idle-verify`
> > > > command
> > > > > > with additional flag. Agreed, that this approach is not the best
> > one
> > > > > > definitely.
> > > > > >
> > > > > > Thanks,
> > > > > > S.
> > > > > >
> > > > > > чт, 11 июл. 2019 г. в 09:53, Anton Vinogradov :
> > > > > >
> > > > > > > Slava,
> > > > > > >
> > > > > > > Thanks for your review first!
> > > > > > >
> > > > > > > >> Anyway, I left some comments in your pull-request at github.
> > > > Please
> > > > > > take
> > > > > > > a
> > > > > > > >> look. The most annoying thing is poorly documented code :(
> > > > > > > Since your comments mostly about Javadoc (does this mean that my
> > > > > solution
> > > > > > > is so great that you ask me only to fix Javadocs :) ?),
> > > > > > > I'd like to propose you to provide fixes as a PR since you have a
> > > > > vision
> > > > > > of
> > > > > > > how it should be made. I'll review them and merge shortly.
> > > > > > >
> > > > > > > >> By the way, is it required to add test related to fail-over
> > > > > scenarios?
> > > > > > > The best check is to use RR at real code.
> > > > > > > For example, I'm injecting RR now to the test with concurrent
> > > > > > modifications
> > > > > > > and restarts [1].
> > > > > > >
> > > > > > > >> I just checked, the IEP page and I still cannot find Jira
> > > tickets
> > > > > that
> > > > > > > >> should cover existing limitations/improvements.
> > > > > > > >> I would suggest creating the following tasks, at least:
> > > > > > > Mostly agree with you, but
> > > > > > > - MVCC is not production ready,
> > > > > > > - not sure near support really required,
> > > > > > > - metrics are better for monitoring, but the Event is enough for
> > my
> > > > > wish
> > > > > > to
> > > > > > > cover AI with consistency check,
> > > > > > > - do we really need Platforms and Thin Client support?
> > > > > > > Also, we should not produce stillborn issue.
> > > > > > > All limitations listed at proxy creation method and they
> > definitely
> > > > are
> > > > > > not
> > > > > > > showstoppers and may be fixed later if someone interested.
> > > > > > > Coming back to AI 3.0 discussion, we have A LOT of features and
> > > it's
> > > > > > almost
> > > > > > > impossible (require much more time that feature's cost) to
> > support
> > > > them
> > > > > > > all.
> > > > > > > I will be pretty happy in case someone will do this and provide
> > > help
> > > > if
> > > > > > > necessary!
> > > > > > >
> > > > > > > >> Perhaps, it would be a good idea to think about the recovery
> > too
> > > > > > > Do you mean per partition check and recovery?
> > > > > > > That's a good idea, but I found it's not easy to imagine API to
> > for
> > > > > such
> > > > > > > tool.
> > > > > > > In case you ready to assist with proper API/design this will
> > > > definitely
> > > > > > > help.
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-11973
> > > > > > >
> > > > > > > On Wed, Jul 10, 2019 at 3:43 PM Вячеслав Коптилин <
> > > > > > > slava.kopti...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Perhaps, it would be a good idea to think about the recovery
> > > tool/
> > > > > > 

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-17 Thread Вячеслав Коптилин
I'd like to propose you to provide fixes as a PR since you have a
> > > > vision
> > > > > of
> > > > > > how it should be made. I'll review them and merge shortly.
> > > > > >
> > > > > > >> By the way, is it required to add test related to fail-over
> > > > scenarios?
> > > > > > The best check is to use RR at real code.
> > > > > > For example, I'm injecting RR now to the test with concurrent
> > > > > modifications
> > > > > > and restarts [1].
> > > > > >
> > > > > > >> I just checked, the IEP page and I still cannot find Jira
> > tickets
> > > > that
> > > > > > >> should cover existing limitations/improvements.
> > > > > > >> I would suggest creating the following tasks, at least:
> > > > > > Mostly agree with you, but
> > > > > > - MVCC is not production ready,
> > > > > > - not sure near support really required,
> > > > > > - metrics are better for monitoring, but the Event is enough for
> my
> > > > wish
> > > > > to
> > > > > > cover AI with consistency check,
> > > > > > - do we really need Platforms and Thin Client support?
> > > > > > Also, we should not produce stillborn issue.
> > > > > > All limitations listed at proxy creation method and they
> definitely
> > > are
> > > > > not
> > > > > > showstoppers and may be fixed later if someone interested.
> > > > > > Coming back to AI 3.0 discussion, we have A LOT of features and
> > it's
> > > > > almost
> > > > > > impossible (require much more time that feature's cost) to
> support
> > > them
> > > > > > all.
> > > > > > I will be pretty happy in case someone will do this and provide
> > help
> > > if
> > > > > > necessary!
> > > > > >
> > > > > > >> Perhaps, it would be a good idea to think about the recovery
> too
> > > > > > Do you mean per partition check and recovery?
> > > > > > That's a good idea, but I found it's not easy to imagine API to
> for
> > > > such
> > > > > > tool.
> > > > > > In case you ready to assist with proper API/design this will
> > > definitely
> > > > > > help.
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-11973
> > > > > >
> > > > > > On Wed, Jul 10, 2019 at 3:43 PM Вячеслав Коптилин <
> > > > > > slava.kopti...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Perhaps, it would be a good idea to think about the recovery
> > tool/
> > > > > > > control-utility command that will allow achieving the same
> goal.
> > > > > > > If I am not mistaken it was already proposed in the email
> thread.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > S.
> > > > > > >
> > > > > > > ср, 10 июл. 2019 г. в 15:33, Вячеслав Коптилин <
> > > > > slava.kopti...@gmail.com
> > > > > > >:
> > > > > > >
> > > > > > > > Hi Anton,
> > > > > > > >
> > > > > > > > Well, the ReadRepair feature is finally merged and that is
> good
> > > > news
> > > > > :)
> > > > > > > >
> > > > > > > > Unfortunately, I cannot find a consensus about the whole
> > > > > functionality
> > > > > > in
> > > > > > > > any of these topics:
> > > > > > > >  -
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html
> > > > > > > >  -
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html
> > > > > > >

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-16 Thread Anton Vinogradov
2. What should be done here in order to check/resolve the issue?
> > > Perhaps, I
> > > > should restart a node (which one?), restart the whole cluster, put a
> > new
> > > > value...
> > > > 3. IgniteConsistencyViolationException is absolutely useless. It does
> > not
> > > > provide any information about the issue and possible way to fix it.
> > > >
> > > > It seems that transactional caches are covered much better.
> > > >
> > > > > Mostly agree with you, but
> > > > > - MVCC is not production ready,
> > > > > - not sure near support really required,
> > > > > - metrics are better for monitoring, but the Event is enough for my
> > > wish
> > > > to
> > > > > cover AI with consistency check,
> > > > > - do we really need Platforms and Thin Client support?
> > > > Well, near caches are widely used and fully transactional, so I think
> > it
> > > > makes sense to support the feature for near caches too.
> > > > .Net is already aware of 'ReadRepair'. It seems to me, that it can be
> > > > easily supported for C++. I don't see a reason why it should not be
> > done
> > > :)
> > > >
> > > > > Do you mean per partition check and recovery? That's a good idea,
> > but I
> > > > found it's not easy to imagine API to for such tool.
> > > > Yep, perhaps it can be done on the idle cluster via `idle-verify`
> > command
> > > > with additional flag. Agreed, that this approach is not the best one
> > > > definitely.
> > > >
> > > > Thanks,
> > > > S.
> > > >
> > > > чт, 11 июл. 2019 г. в 09:53, Anton Vinogradov :
> > > >
> > > > > Slava,
> > > > >
> > > > > Thanks for your review first!
> > > > >
> > > > > >> Anyway, I left some comments in your pull-request at github.
> > Please
> > > > take
> > > > > a
> > > > > >> look. The most annoying thing is poorly documented code :(
> > > > > Since your comments mostly about Javadoc (does this mean that my
> > > solution
> > > > > is so great that you ask me only to fix Javadocs :) ?),
> > > > > I'd like to propose you to provide fixes as a PR since you have a
> > > vision
> > > > of
> > > > > how it should be made. I'll review them and merge shortly.
> > > > >
> > > > > >> By the way, is it required to add test related to fail-over
> > > scenarios?
> > > > > The best check is to use RR at real code.
> > > > > For example, I'm injecting RR now to the test with concurrent
> > > > modifications
> > > > > and restarts [1].
> > > > >
> > > > > >> I just checked, the IEP page and I still cannot find Jira
> tickets
> > > that
> > > > > >> should cover existing limitations/improvements.
> > > > > >> I would suggest creating the following tasks, at least:
> > > > > Mostly agree with you, but
> > > > > - MVCC is not production ready,
> > > > > - not sure near support really required,
> > > > > - metrics are better for monitoring, but the Event is enough for my
> > > wish
> > > > to
> > > > > cover AI with consistency check,
> > > > > - do we really need Platforms and Thin Client support?
> > > > > Also, we should not produce stillborn issue.
> > > > > All limitations listed at proxy creation method and they definitely
> > are
> > > > not
> > > > > showstoppers and may be fixed later if someone interested.
> > > > > Coming back to AI 3.0 discussion, we have A LOT of features and
> it's
> > > > almost
> > > > > impossible (require much more time that feature's cost) to support
> > them
> > > > > all.
> > > > > I will be pretty happy in case someone will do this and provide
> help
> > if
> > > > > necessary!
> > > > >
> > > > > >> Perhaps, it would be a good idea to think about the recovery too
> > > > > Do you mean per partition check and recovery?
> > > > > That's a good idea, but I found it's not easy to imagine API to for
> > > such
> > > > > tool.
> > > > > In case you ready to assist with proper API/design this will
> > > > > definitely help.

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-15 Thread Dmitriy Pavlov
> > that
> > > > >> should cover existing limitations/improvements.
> > > > >> I would suggest creating the following tasks, at least:
> > > > Mostly agree with you, but
> > > > - MVCC is not production ready,
> > > > - not sure near support really required,
> > > > - metrics are better for monitoring, but the Event is enough for my
> > wish
> > > to
> > > > cover AI with consistency check,
> > > > - do we really need Platforms and Thin Client support?
> > > > Also, we should not produce stillborn issue.
> > > > All limitations listed at proxy creation method and they definitely
> are
> > > not
> > > > showstoppers and may be fixed later if someone interested.
> > > > Coming back to AI 3.0 discussion, we have A LOT of features and it's
> > > almost
> > > > impossible (require much more time that feature's cost) to support
> them
> > > > all.
> > > > I will be pretty happy in case someone will do this and provide help
> if
> > > > necessary!
> > > >
> > > > >> Perhaps, it would be a good idea to think about the recovery too
> > > > Do you mean per partition check and recovery?
> > > > That's a good idea, but I found it's not easy to imagine API to for
> > such
> > > > tool.
> > > > In case you ready to assist with proper API/design this will
> definitely
> > > > help.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-11973
> > > >
> > > > On Wed, Jul 10, 2019 at 3:43 PM Вячеслав Коптилин <
> > > > slava.kopti...@gmail.com>
> > > > wrote:
> > > >
> > > > > Perhaps, it would be a good idea to think about the recovery tool/
> > > > > control-utility command that will allow achieving the same goal.
> > > > > If I am not mistaken it was already proposed in the email thread.
> > > > >
> > > > > Thanks,
> > > > > S.
> > > > >
> > > > > ср, 10 июл. 2019 г. в 15:33, Вячеслав Коптилин <
> > > slava.kopti...@gmail.com
> > > > >:
> > > > >
> > > > > > Hi Anton,
> > > > > >
> > > > > > Well, the ReadRepair feature is finally merged and that is good
> > news
> > > :)
> > > > > >
> > > > > > Unfortunately, I cannot find a consensus about the whole
> > > functionality
> > > > in
> > > > > > any of these topics:
> > > > > >  -
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html
> > > > > >  -
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html
> > > > > > Also, there are no comments/discussion in JIRA. That makes me sad
> > :(
> > > > > > especially when a feature is huge, not obvious and involves
> > changing
> > > > > public
> > > > > > API (and that is the case, I think).
> > > > > >
> > > > > > Anyway, I left some comments in your pull-request at github.
> Please
> > > > take
> > > > > a
> > > > > > look. The most annoying thing is poorly documented code :(
> > > > > > By the way, is it required to add test related to fail-over
> > > scenarios?
> > > > > >
> > > > > > I just checked, the IEP page and I still cannot find Jira tickets
> > > that
> > > > > > should cover existing limitations/improvements.
> > > > > > I would suggest creating the following tasks, at least:
> > > > > >  - MVCC support
> > > > > >  - Near caches
> > > > > >  - Additional metrics (number of violations, number of repaired
> > > entries
> > > > > > etc)
> > > > > >  - Ignite C++ (It looks like, .Net is already aware of that
> > feature)
> > > > > >  - Thin clients support
> > > > > >  - Perhaps, it would be useful to support different strategies to
> > > > resolve
> > > > > > inconsistencies

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-15 Thread Nikolay Izhikov
All limitations listed at proxy creation method and they definitely are
> > not
> > > showstoppers and may be fixed later if someone interested.
> > > Coming back to AI 3.0 discussion, we have A LOT of features and it's
> > almost
> > > impossible (require much more time that feature's cost) to support them
> > > all.
> > > I will be pretty happy in case someone will do this and provide help if
> > > necessary!
> > >
> > > >> Perhaps, it would be a good idea to think about the recovery too
> > > Do you mean per partition check and recovery?
> > > That's a good idea, but I found it's not easy to imagine API to for
> such
> > > tool.
> > > In case you ready to assist with proper API/design this will definitely
> > > help.
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-11973
> > >
> > > On Wed, Jul 10, 2019 at 3:43 PM Вячеслав Коптилин <
> > > slava.kopti...@gmail.com>
> > > wrote:
> > >
> > > > Perhaps, it would be a good idea to think about the recovery tool/
> > > > control-utility command that will allow achieving the same goal.
> > > > If I am not mistaken it was already proposed in the email thread.
> > > >
> > > > Thanks,
> > > > S.
> > > >
> > > > ср, 10 июл. 2019 г. в 15:33, Вячеслав Коптилин <
> > slava.kopti...@gmail.com
> > > >:
> > > >
> > > > > Hi Anton,
> > > > >
> > > > > Well, the ReadRepair feature is finally merged and that is good
> news
> > :)
> > > > >
> > > > > Unfortunately, I cannot find a consensus about the whole
> > functionality
> > > in
> > > > > any of these topics:
> > > > >  -
> > > > >
> > > >
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html
> > > > >  -
> > > > >
> > > >
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html
> > > > > Also, there are no comments/discussion in JIRA. That makes me sad
> :(
> > > > > especially when a feature is huge, not obvious and involves
> changing
> > > > public
> > > > > API (and that is the case, I think).
> > > > >
> > > > > Anyway, I left some comments in your pull-request at github. Please
> > > take
> > > > a
> > > > > look. The most annoying thing is poorly documented code :(
> > > > > By the way, is it required to add test related to fail-over
> > scenarios?
> > > > >
> > > > > I just checked, the IEP page and I still cannot find Jira tickets
> > that
> > > > > should cover existing limitations/improvements.
> > > > > I would suggest creating the following tasks, at least:
> > > > >  - MVCC support
> > > > >  - Near caches
> > > > >  - Additional metrics (number of violations, number of repaired
> > entries
> > > > > etc)
> > > > >  - Ignite C++ (It looks like, .Net is already aware of that
> feature)
> > > > >  - Thin clients support
> > > > >  - Perhaps, it would be useful to support different strategies to
> > > resolve
> > > > > inconsistencies
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Thanks,
> > > > > S.
> > > > >
> > > > >
> > > > > ср, 10 июл. 2019 г. в 10:16, Anton Vinogradov :
> > > > >
> > > > >> Folks,
> > > > >>
> > > > >> Thanks to everyone for tips and reviews.
> > > > >> Yardstick checked, no performance drop found.
> > > > >> Additional measurement: RR get() is just up to 7% slower than
> > regular
> > > > get
> > > > >> on real benchmarks (8 clients, 4 servers, 3 backups)
> > > > >> Code merged to the master.
> > > > >> "Must have" tasks created and attached to the IEP.
> > > > >>
> > > > >> On Thu, Jul 4, 2019 at 12:18 PM Anton Vinogradov 
> > > wrote:
> > > > >>
> > > > >> > Folks,
> > > > >> >
> > > > >>

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-15 Thread Dmitriy Pavlov
Do you mean per partition check and recovery? That's a good idea, but I found it's not easy to imagine API to for such
> > tool.
> > In case you ready to assist with proper API/design this will definitely
> > help.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-11973
> >
> > On Wed, Jul 10, 2019 at 3:43 PM Вячеслав Коптилин <
> > slava.kopti...@gmail.com>
> > wrote:
> >
> > > Perhaps, it would be a good idea to think about the recovery tool/
> > > control-utility command that will allow achieving the same goal.
> > > If I am not mistaken it was already proposed in the email thread.
> > >
> > > Thanks,
> > > S.
> > >
> > > ср, 10 июл. 2019 г. в 15:33, Вячеслав Коптилин <
> slava.kopti...@gmail.com
> > >:
> > >
> > > > Hi Anton,
> > > >
> > > > Well, the ReadRepair feature is finally merged and that is good news
> :)
> > > >
> > > > Unfortunately, I cannot find a consensus about the whole
> functionality
> > in
> > > > any of these topics:
> > > >  -
> > > >
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html
> > > >  -
> > > >
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html
> > > > Also, there are no comments/discussion in JIRA. That makes me sad :(
> > > > especially when a feature is huge, not obvious and involves changing
> > > public
> > > > API (and that is the case, I think).
> > > >
> > > > Anyway, I left some comments in your pull-request at github. Please
> > take
> > > a
> > > > look. The most annoying thing is poorly documented code :(
> > > > By the way, is it required to add test related to fail-over
> scenarios?
> > > >
> > > > I just checked, the IEP page and I still cannot find Jira tickets
> that
> > > > should cover existing limitations/improvements.
> > > > I would suggest creating the following tasks, at least:
> > > >  - MVCC support
> > > >  - Near caches
> > > >  - Additional metrics (number of violations, number of repaired
> entries
> > > > etc)
> > > >  - Ignite C++ (It looks like, .Net is already aware of that feature)
> > > >  - Thin clients support
> > > >  - Perhaps, it would be useful to support different strategies to
> > resolve
> > > > inconsistencies
> > > >
> > > > What do you think?
> > > >
> > > > Thanks,
> > > > S.
> > > >
> > > >
> > > > ср, 10 июл. 2019 г. в 10:16, Anton Vinogradov :
> > > >
> > > >> Folks,
> > > >>
> > > >> Thanks to everyone for tips and reviews.
> > > >> Yardstick checked, no performance drop found.
> > > >> Additional measurement: RR get() is just up to 7% slower than
> regular
> > > get
> > > >> on real benchmarks (8 clients, 4 servers, 3 backups)
> > > >> Code merged to the master.
> > > >> "Must have" tasks created and attached to the IEP.
> > > >>
> > > >> On Thu, Jul 4, 2019 at 12:18 PM Anton Vinogradov 
> > wrote:
> > > >>
> > > >> > Folks,
> > > >> >
> > > >> > Just a minor update.
> > > >> >
> > > >> > RunAll [1] with enabled ReadRepair proxy is almost green now (~10
> > > tests
> > > >> > left, started with 6k :)).
> > > >> > During the analisys, I've found some tests with
> > > >> > - unexpected repairs at tx caches
> > > >> > - inconsistent state after the test finished (different entries
> > across
> > > >> the
> > > >> > topology)
> > > >> > For example,
> > > >> > - testInvokeAllAppliedOnceOnBinaryTypeRegistration generates
> > obsolete
> > > >> > versions on backups in case of retry, fixed [2]
> > > >> > - initial cache load generates not equal versions on backups,
> fixed
> > > [3]
> > > >> > - testAccountTxNodeRestart causes unexpected repairs (entries have
> > > >> > different versions), to be investigated.
> > > >> >
> > > >> > What's next?
> > > >> >

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-15 Thread Вячеслав Коптилин
Hello Anton,

> I'd like to propose you to provide fixes as a PR since you have a vision
of how it should be made. I'll review them and merge shortly.
Could you please take a look at PR:
https://github.com/apache/ignite/pull/6689

> Since your comments mostly about Javadoc (does this mean that my solution
is so great that you ask me only to fix Javadocs :) ?),
In my humble opinion, I would consider this feature an experimental one (it
does not seem production-ready).
Let me clarify this with the following simple example:

try {
    atomicCache.withReadRepair().getAll(keys);
}
catch (CacheException e) {
    // What should be done here from the end-user point of view?
}

1. Should I consider that my cluster is broken? There is no answer! A
false-positive result is possible.
2. What should be done here in order to check/resolve the issue? Perhaps, I
should restart a node (which one?), restart the whole cluster, put a new
value...
3. IgniteConsistencyViolationException is absolutely useless. It does not
provide any information about the issue or possible ways to fix it.

It seems that transactional caches are covered much better.
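[Editor's note] To make the check being discussed concrete, here is a minimal, self-contained sketch of what a read-repair get conceptually does: fetch the value for a key from every owner copy and raise a violation when the copies disagree. The types and names below are hypothetical illustrations, not the real Ignite API:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReadRepairSketch {
    // One "copy" of an entry as seen on a primary or backup node
    // (hypothetical type, for illustration only).
    public record Copy(String node, Object value) {}

    /** Returns the consistent value, or throws when the copies disagree. */
    public static Object checkedGet(List<Copy> copies) {
        Set<Object> distinct = new HashSet<>();
        for (Copy c : copies)
            distinct.add(c.value());

        if (distinct.size() > 1)
            // Unlike the real IgniteConsistencyViolationException criticized
            // above, this message at least names the diverging copies.
            throw new IllegalStateException("Consistency violation across copies: " + copies);

        return copies.get(0).value();
    }

    public static void main(String[] args) {
        // Consistent read: primary and both backups agree.
        Object v = checkedGet(List.of(
            new Copy("primary", 42), new Copy("backup1", 42), new Copy("backup2", 42)));
        System.out.println("consistent value = " + v);

        // Diverging copies: the violation is reported with the offending nodes.
        try {
            checkedGet(List.of(new Copy("primary", 42), new Copy("backup1", 41)));
        }
        catch (IllegalStateException e) {
            System.out.println("violation detected: " + e.getMessage());
        }
    }
}
```

For atomic caches under concurrent load, two copies may legitimately differ for a moment mid-update, which is exactly the false-positive problem raised in point 1 above.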

> Mostly agree with you, but
> - MVCC is not production ready,
> - not sure near support really required,
> - metrics are better for monitoring, but the Event is enough for my wish
to
> cover AI with consistency check,
> - do we really need Platforms and Thin Client support?
Well, near caches are widely used and fully transactional, so I think it
makes sense to support the feature for near caches too.
.Net is already aware of 'ReadRepair'. It seems to me that it can be
easily supported for C++. I don't see a reason why it should not be done :)

> Do you mean per partition check and recovery? That's a good idea, but I
found it's not easy to imagine an API for such a tool.
Yep, perhaps it can be done on an idle cluster via the `idle-verify` command
with an additional flag. Agreed, this approach is definitely not the best one.
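[Editor's note] The `idle-verify`-style idea could be sketched as follows — a self-contained illustration under assumed data shapes, not the real command's implementation. Each owner reports a hash of its partition contents, and partitions whose hashes diverge across owners are flagged for recovery:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class PartitionVerifySketch {
    /**
     * Input: partitionId -> (nodeId -> hash of that node's partition contents).
     * Output: sorted ids of partitions whose owners disagree.
     */
    public static List<Integer> findBrokenPartitions(Map<Integer, Map<String, Long>> hashes) {
        List<Integer> broken = new ArrayList<>();

        for (Map.Entry<Integer, Map<String, Long>> e : hashes.entrySet()) {
            // More than one distinct hash among owners means the copies diverge.
            if (new HashSet<>(e.getValue().values()).size() > 1)
                broken.add(e.getKey());
        }

        Collections.sort(broken);
        return broken;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, Long>> hashes = Map.of(
            0, Map.of("nodeA", 111L, "nodeB", 111L),  // owners agree
            1, Map.of("nodeA", 222L, "nodeB", 999L)); // owners diverge

        System.out.println("broken partitions: " + findBrokenPartitions(hashes));
        // prints: broken partitions: [1]
    }
}
```

The "idle" precondition matters: the hashes are only comparable when no concurrent updates are in flight, which is why the discussion above ties this to an idle cluster.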

Thanks,
S.

чт, 11 июл. 2019 г. в 09:53, Anton Vinogradov :

> Slava,
>
> Thanks for your review first!
>
> >> Anyway, I left some comments in your pull-request at github. Please take
> a
> >> look. The most annoying thing is poorly documented code :(
> Since your comments are mostly about Javadoc (does this mean that my solution
> is so great that you ask me only to fix Javadocs :) ?),
> I'd like to propose that you provide fixes as a PR since you have a vision of
> how it should be made. I'll review them and merge shortly.
>
> >> By the way, is it required to add test related to fail-over scenarios?
> The best check is to use RR in real code.
> For example, I'm now injecting RR into a test with concurrent modifications
> and restarts [1].
>
> >> I just checked, the IEP page and I still cannot find Jira tickets that
> >> should cover existing limitations/improvements.
> >> I would suggest creating the following tasks, at least:
> Mostly agree with you, but
> - MVCC is not production ready,
> - not sure near support really required,
> - metrics are better for monitoring, but the Event is enough for my wish to
> cover AI with consistency check,
> - do we really need Platforms and Thin Client support?
> Also, we should not produce stillborn issues.
> All limitations are listed at the proxy creation method, and they are definitely not
> showstoppers and may be fixed later if someone is interested.
> Coming back to the AI 3.0 discussion, we have A LOT of features and it's almost
> impossible (it requires much more time than the feature's cost) to support them
> all.
> I will be pretty happy if someone does this, and I will provide help if
> necessary!
>
> >> Perhaps, it would be a good idea to think about the recovery too
> Do you mean per partition check and recovery?
> That's a good idea, but I found it's not easy to imagine API to for such
> tool.
> In case you ready to assist with proper API/design this will definitely
> help.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-11973
>
> On Wed, Jul 10, 2019 at 3:43 PM Вячеслав Коптилин <
> slava.kopti...@gmail.com>
> wrote:
>
> > Perhaps, it would be a good idea to think about the recovery tool/
> > control-utility command that will allow achieving the same goal.
> > If I am not mistaken it was already proposed in the email thread.
> >
> > Thanks,
> > S.
> >
> > ср, 10 июл. 2019 г. в 15:33, Вячеслав Коптилин  >:
> >
> > > Hi Anton,
> > >
> > > Well, the ReadRepair feature is finally merged and that is good news :)
> > >
> > > Unfortunately, I cannot find a consensus about the whole functionality
> in
> > > any of these topics:
> > >  -
> > >
> >
> http://apache

Re: [REVIEW REQUEST] IGNITE-11951 Improvements in JdkMarshaller

2019-07-11 Thread Павлухин Иван
Merged the patch to master [1]. Thank Alex Plekhanov for a review.

[1] https://issues.apache.org/jira/browse/IGNITE-11951

ср, 10 июл. 2019 г. в 08:42, Павлухин Иван :
>
> Hi,
>
> I made some small improvements in JdkMarshaller [1]. I will be happy
> if someone reviews it. Changes are quite simple.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-11951
>
> --
> Best regards,
> Ivan Pavlukhin



-- 
Best regards,
Ivan Pavlukhin


Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-11 Thread Anton Vinogradov
Nikolay,

The initial idea is to count fixes per cache.
In other words, recording the event should also increment a metric.
Could you help me with the RR metrics implementation as a part of your metrics
journey?
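A minimal sketch of the idea above, assuming a hypothetical per-cache counter
rather than the actual Ignite metrics API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hedged sketch: recording a consistency-violation event also increments a
 * per-cache "fixes" metric. All names are illustrative, not Ignite's
 * metrics framework.
 */
public class RepairMetricsSketch {
    private final Map<String, LongAdder> fixesPerCache = new ConcurrentHashMap<>();

    /** Called wherever Read Repair records its violation event. */
    void onRepairEvent(String cacheName) {
        // ... record the event itself here ...
        fixesPerCache.computeIfAbsent(cacheName, c -> new LongAdder()).increment();
    }

    /** Current fix count for a cache; zero if nothing was repaired. */
    long fixes(String cacheName) {
        LongAdder adder = fixesPerCache.get(cacheName);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        RepairMetricsSketch metrics = new RepairMetricsSketch();
        metrics.onRepairEvent("accounts");
        metrics.onRepairEvent("accounts");
        metrics.onRepairEvent("orders");
        System.out.println(metrics.fixes("accounts")); // 2
        System.out.println(metrics.fixes("orders"));   // 1
    }
}
```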

On Thu, Jul 11, 2019 at 10:18 AM Nikolay Izhikov 
wrote:

> Anton,
>
> > - metrics are better for monitoring, but the Event is enough for my wish
> to cover AI with consistency check,
>
> Can you clarify, do you have plans to add metrics of RR events?
>
> I think it should be count of incosistency events per cache(maybe per
> partition).
>
>
> В Чт, 11/07/2019 в 09:53 +0300, Anton Vinogradov пишет:
> > Slava,
> >
> > Thanks for your review first!
> >
> > > > Anyway, I left some comments in your pull-request at github. Please
> take
> >
> > a
> > > > look. The most annoying thing is poorly documented code :(
> >
> > Since your comments mostly about Javadoc (does this mean that my solution
> > is so great that you ask me only to fix Javadocs :) ?),
> > I'd like to propose you to provide fixes as a PR since you have a vision
> of
> > how it should be made. I'll review them and merge shortly.
> >
> > > > By the way, is it required to add test related to fail-over
> scenarios?
> >
> > The best check is to use RR at real code.
> > For example, I'm injecting RR now to the test with concurrent
> modifications
> > and restarts [1].
> >
> > > > I just checked, the IEP page and I still cannot find Jira tickets
> that
> > > > should cover existing limitations/improvements.
> > > > I would suggest creating the following tasks, at least:
> >
> > Mostly agree with you, but
> > - MVCC is not production ready,
> > - not sure near support really required,
> > - metrics are better for monitoring, but the Event is enough for my wish
> to
> > cover AI with consistency check,
> > - do we really need Platforms and Thin Client support?
> > Also, we should not produce stillborn issue.
> > All limitations listed at proxy creation method and they definitely are
> not
> > showstoppers and may be fixed later if someone interested.
> > Сoming back to AI 3.0 discussion, we have A LOT of features and it's
> almost
> > impossible (require much more time that feature's cost) to support them
> all.
> > I will be pretty happy in case someone will do this and provide help if
> > necessary!
> >
> > > > Perhaps, it would be a good idea to think about the recovery too
> >
> > Do you mean per partition check and recovery?
> > That's a good idea, but I found it's not easy to imagine API to for such
> > tool.
> > In case you ready to assist with proper API/design this will definitely
> > help.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-11973
> >
> > On Wed, Jul 10, 2019 at 3:43 PM Вячеслав Коптилин <
> slava.kopti...@gmail.com>
> > wrote:
> >
> > > Perhaps, it would be a good idea to think about the recovery tool/
> > > control-utility command that will allow achieving the same goal.
> > > If I am not mistaken it was already proposed in the email thread.
> > >
> > > Thanks,
> > > S.
> > >
> > > ср, 10 июл. 2019 г. в 15:33, Вячеслав Коптилин <
> slava.kopti...@gmail.com>:
> > >
> > > > Hi Anton,
> > > >
> > > > Well, the ReadRepair feature is finally merged and that is good news
> :)
> > > >
> > > > Unfortunately, I cannot find a consensus about the whole
> functionality in
> > > > any of these topics:
> > > >  -
> > > >
> > >
> > >
> http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html
> > > >  -
> > > >
> > >
> > >
> http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html
> > > > Also, there are no comments/discussion in JIRA. That makes me sad :(
> > > > especially when a feature is huge, not obvious and involves changing
> > >
> > > public
> > > > API (and that is the case, I think).
> > > >
> > > > Anyway, I left some comments in your pull-request at github. Please
> take
> > >
> > > a
> > > > look. The most annoying thing is poorly documented code :(
> > > > By the way, is it required to add test related to fail-over
> scenarios?
> > > >
> > > > I just checked, the IEP page and I still cannot find Jira 

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-11 Thread Nikolay Izhikov
Anton,

> - metrics are better for monitoring, but the Event is enough for my wish to 
> cover AI with consistency check,

Can you clarify, do you have plans to add metrics for RR events?

I think it should be a count of inconsistency events per cache (maybe per
partition).


В Чт, 11/07/2019 в 09:53 +0300, Anton Vinogradov пишет:
> Slava,
> 
> Thanks for your review first!
> 
> > > Anyway, I left some comments in your pull-request at github. Please take
> 
> a
> > > look. The most annoying thing is poorly documented code :(
> 
> Since your comments mostly about Javadoc (does this mean that my solution
> is so great that you ask me only to fix Javadocs :) ?),
> I'd like to propose you to provide fixes as a PR since you have a vision of
> how it should be made. I'll review them and merge shortly.
> 
> > > By the way, is it required to add test related to fail-over scenarios?
> 
> The best check is to use RR at real code.
> For example, I'm injecting RR now to the test with concurrent modifications
> and restarts [1].
> 
> > > I just checked, the IEP page and I still cannot find Jira tickets that
> > > should cover existing limitations/improvements.
> > > I would suggest creating the following tasks, at least:
> 
> Mostly agree with you, but
> - MVCC is not production ready,
> - not sure near support really required,
> - metrics are better for monitoring, but the Event is enough for my wish to
> cover AI with consistency check,
> - do we really need Platforms and Thin Client support?
> Also, we should not produce stillborn issue.
> All limitations listed at proxy creation method and they definitely are not
> showstoppers and may be fixed later if someone interested.
> Сoming back to AI 3.0 discussion, we have A LOT of features and it's almost
> impossible (require much more time that feature's cost) to support them all.
> I will be pretty happy in case someone will do this and provide help if
> necessary!
> 
> > > Perhaps, it would be a good idea to think about the recovery too
> 
> Do you mean per partition check and recovery?
> That's a good idea, but I found it's not easy to imagine API to for such
> tool.
> In case you ready to assist with proper API/design this will definitely
> help.
> 
> [1] https://issues.apache.org/jira/browse/IGNITE-11973
> 
> On Wed, Jul 10, 2019 at 3:43 PM Вячеслав Коптилин 
> wrote:
> 
> > Perhaps, it would be a good idea to think about the recovery tool/
> > control-utility command that will allow achieving the same goal.
> > If I am not mistaken it was already proposed in the email thread.
> > 
> > Thanks,
> > S.
> > 
> > ср, 10 июл. 2019 г. в 15:33, Вячеслав Коптилин :
> > 
> > > Hi Anton,
> > > 
> > > Well, the ReadRepair feature is finally merged and that is good news :)
> > > 
> > > Unfortunately, I cannot find a consensus about the whole functionality in
> > > any of these topics:
> > >  -
> > > 
> > 
> > http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html
> > >  -
> > > 
> > 
> > http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html
> > > Also, there are no comments/discussion in JIRA. That makes me sad :(
> > > especially when a feature is huge, not obvious and involves changing
> > 
> > public
> > > API (and that is the case, I think).
> > > 
> > > Anyway, I left some comments in your pull-request at github. Please take
> > 
> > a
> > > look. The most annoying thing is poorly documented code :(
> > > By the way, is it required to add test related to fail-over scenarios?
> > > 
> > > I just checked, the IEP page and I still cannot find Jira tickets that
> > > should cover existing limitations/improvements.
> > > I would suggest creating the following tasks, at least:
> > >  - MVCC support
> > >  - Near caches
> > >  - Additional metrics (number of violations, number of repaired entries
> > > etc)
> > >  - Ignite C++ (It looks like, .Net is already aware of that feature)
> > >  - Thin clients support
> > >  - Perhaps, it would be useful to support different strategies to resolve
> > > inconsistencies
> > > 
> > > What do you think?
> > > 
> > > Thanks,
> > > S.
> > > 
> > > 
> > > ср, 10 июл. 2019 г. в 10:16, Anton Vinogradov :
> > > 
> > > > Folks,
> > > > 
> > > > Th

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-11 Thread Anton Vinogradov
Slava,

Thanks for your review first!

>> Anyway, I left some comments in your pull-request at github. Please take
a
>> look. The most annoying thing is poorly documented code :(
Since your comments are mostly about Javadocs (does this mean that my solution
is so great that you only ask me to fix Javadocs :) ?),
I'd like to propose that you provide the fixes as a PR, since you have a vision
of how it should be made. I'll review them and merge shortly.

>> By the way, is it required to add test related to fail-over scenarios?
The best check is to use RR in real code.
For example, I'm now injecting RR into a test with concurrent modifications
and restarts [1].

>> I just checked, the IEP page and I still cannot find Jira tickets that
>> should cover existing limitations/improvements.
>> I would suggest creating the following tasks, at least:
Mostly agree with you, but:
- MVCC is not production ready,
- not sure near-cache support is really required,
- metrics are better for monitoring, but the Event is enough for my wish to
cover AI with a consistency check,
- do we really need Platforms and Thin Client support?
Also, we should not produce stillborn issues.
All limitations are listed at the proxy creation method; they are definitely
not showstoppers and may be fixed later if someone is interested.
Coming back to the AI 3.0 discussion, we have A LOT of features and it's
almost impossible (it requires much more time than the features are worth) to
support them all.
I will be pretty happy if someone does this, and will provide help if
necessary!

>> Perhaps, it would be a good idea to think about the recovery too
Do you mean per-partition check and recovery?
That's a good idea, but I found it's not easy to imagine an API for such a
tool.
If you are ready to assist with a proper API/design, that will definitely
help.

[1] https://issues.apache.org/jira/browse/IGNITE-11973
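A hedged sketch of what such a per-partition check-and-recovery could do
conceptually: pick the newest versioned value per key across the owners and
push it to the stale copies. The names and the `Versioned` type here are
purely illustrative, not a proposed Ignite API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative per-partition check-and-repair, not an actual Ignite API. */
public class PartitionRepairSketch {
    /** A value with its entry version, as each owner would store it. */
    record Versioned(long version, String value) {}

    /** Repairs all owner copies of one partition in place; returns repaired key count. */
    static int repairPartition(List<Map<String, Versioned>> owners) {
        // Collect the newest version seen for every key across all owners.
        Map<String, Versioned> newest = new HashMap<>();
        for (Map<String, Versioned> owner : owners)
            owner.forEach((k, v) -> newest.merge(k, v,
                (a, b) -> a.version() >= b.version() ? a : b));

        // Overwrite stale/missing copies and count keys that needed a fix.
        int repaired = 0;
        for (String key : newest.keySet()) {
            boolean fixed = false;
            for (Map<String, Versioned> owner : owners) {
                Versioned cur = owner.get(key);
                if (cur == null || cur.version() < newest.get(key).version()) {
                    owner.put(key, newest.get(key));
                    fixed = true;
                }
            }
            if (fixed) repaired++;
        }
        return repaired;
    }

    public static void main(String[] args) {
        Map<String, Versioned> primary = new HashMap<>(Map.of("k", new Versioned(2, "new")));
        Map<String, Versioned> backup = new HashMap<>(Map.of("k", new Versioned(1, "old")));
        System.out.println(repairPartition(List.of(primary, backup))); // 1
        System.out.println(backup.get("k").value()); // new
    }
}
```

This mirrors the per-key mechanism Read Repair already applies on gets, lifted
to a whole partition; a real tool would also have to deal with the version
ambiguities (data streamer loads, atomic caches) listed in the thread.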

On Wed, Jul 10, 2019 at 3:43 PM Вячеслав Коптилин 
wrote:

> Perhaps, it would be a good idea to think about the recovery tool/
> control-utility command that will allow achieving the same goal.
> If I am not mistaken it was already proposed in the email thread.
>
> Thanks,
> S.
>
> ср, 10 июл. 2019 г. в 15:33, Вячеслав Коптилин :
>
> > Hi Anton,
> >
> > Well, the ReadRepair feature is finally merged and that is good news :)
> >
> > Unfortunately, I cannot find a consensus about the whole functionality in
> > any of these topics:
> >  -
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html
> >  -
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html
> > Also, there are no comments/discussion in JIRA. That makes me sad :(
> > especially when a feature is huge, not obvious and involves changing
> public
> > API (and that is the case, I think).
> >
> > Anyway, I left some comments in your pull-request at github. Please take
> a
> > look. The most annoying thing is poorly documented code :(
> > By the way, is it required to add test related to fail-over scenarios?
> >
> > I just checked, the IEP page and I still cannot find Jira tickets that
> > should cover existing limitations/improvements.
> > I would suggest creating the following tasks, at least:
> >  - MVCC support
> >  - Near caches
> >  - Additional metrics (number of violations, number of repaired entries
> > etc)
> >  - Ignite C++ (It looks like, .Net is already aware of that feature)
> >  - Thin clients support
> >  - Perhaps, it would be useful to support different strategies to resolve
> > inconsistencies
> >
> > What do you think?
> >
> > Thanks,
> > S.
> >
> >
> > ср, 10 июл. 2019 г. в 10:16, Anton Vinogradov :
> >
> >> Folks,
> >>
> >> Thanks to everyone for tips and reviews.
> >> Yardstick checked, no performance drop found.
> >> Additional measurement: RR get() is just up to 7% slower than regular
> get
> >> on real benchmarks (8 clients, 4 servers, 3 backups)
> >> Code merged to the master.
> >> "Must have" tasks created and attached to the IEP.
> >>
> >> On Thu, Jul 4, 2019 at 12:18 PM Anton Vinogradov  wrote:
> >>
> >> > Folks,
> >> >
> >> > Just a minor update.
> >> >
> >> > RunAll [1] with enabled ReadRepair proxy is almost green now (~10
> tests
> >> > left, started with 6k :)).
> >> > During the analisys, I've found some tests with
> >> > - unexpected repairs at tx caches
> >> > - inconsistent state after the test finished (different entries across
> >> the
> 

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-10 Thread Вячеслав Коптилин
Perhaps, it would be a good idea to think about a recovery tool /
control-utility command that will allow achieving the same goal.
If I am not mistaken, it was already proposed in the email thread.

Thanks,
S.

ср, 10 июл. 2019 г. в 15:33, Вячеслав Коптилин :

> Hi Anton,
>
> Well, the ReadRepair feature is finally merged and that is good news :)
>
> Unfortunately, I cannot find a consensus about the whole functionality in
> any of these topics:
>  -
> http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html
>  -
> http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html
> Also, there are no comments/discussion in JIRA. That makes me sad :(
> especially when a feature is huge, not obvious and involves changing public
> API (and that is the case, I think).
>
> Anyway, I left some comments in your pull-request at github. Please take a
> look. The most annoying thing is poorly documented code :(
> By the way, is it required to add test related to fail-over scenarios?
>
> I just checked, the IEP page and I still cannot find Jira tickets that
> should cover existing limitations/improvements.
> I would suggest creating the following tasks, at least:
>  - MVCC support
>  - Near caches
>  - Additional metrics (number of violations, number of repaired entries
> etc)
>  - Ignite C++ (It looks like, .Net is already aware of that feature)
>  - Thin clients support
>  - Perhaps, it would be useful to support different strategies to resolve
> inconsistencies
>
> What do you think?
>
> Thanks,
> S.
>
>
> ср, 10 июл. 2019 г. в 10:16, Anton Vinogradov :
>
>> Folks,
>>
>> Thanks to everyone for tips and reviews.
>> Yardstick checked, no performance drop found.
>> Additional measurement: RR get() is just up to 7% slower than regular get
>> on real benchmarks (8 clients, 4 servers, 3 backups)
>> Code merged to the master.
>> "Must have" tasks created and attached to the IEP.
>>
>> On Thu, Jul 4, 2019 at 12:18 PM Anton Vinogradov  wrote:
>>
>> > Folks,
>> >
>> > Just a minor update.
>> >
>> > RunAll [1] with enabled ReadRepair proxy is almost green now (~10 tests
>> > left, started with 6k :)).
>> > During the analisys, I've found some tests with
>> > - unexpected repairs at tx caches
>> > - inconsistent state after the test finished (different entries across
>> the
>> > topology)
>> > For example,
>> > - testInvokeAllAppliedOnceOnBinaryTypeRegistration generates obsolete
>> > versions on backups in case of retry, fixed [2]
>> > - initial cache load generates not equal versions on backups, fixed [3]
>> > - testAccountTxNodeRestart causes unexpected repairs (entries have
>> > different versions), to be investigated.
>> >
>> > What's next?
>> >
>> > 1) Going to merge the solution once "RunAll with ReadRepair enabled"
>> > becomes fully green.
>> > 2) Going to add special check after each test which will ensure caches
>> > content after the test is consistent.
>> > 2.1) The Same check can (should?) be injected to
>> > awaitPartitionMapExchange() and similar methods.
>> > 3) Update Jepsen tests with RR checks.
>> > 4) Introduce per partition RR.
>> >
>> > So, the final goal is to be sure that Ignite produces only consistent
>> data
>> > and to have a feature to solve consistency in case we gain inconsistent
>> > state somehow.
>> >
>> > Limitations?
>> >
>> > Currently, RR has some limitations, but they are not related to real
>> > production cases.
>> > In case someone interested to support, for example, MVCC or near caches,
>> > please, feel free to contribute.
>> >
>> > [1]
>> >
>> https://mtcga.gridgain.com/pr.html?serverId=apache=IgniteTests24Java8_RunAll=pull/6575/head=Latest
>> > [2]
>> >
>> https://github.com/apache/ignite/pull/5656/commits/6f6ec4434095e692af209c61833a350f3013408c
>> > [3]
>> >
>> https://github.com/apache/ignite/pull/5656/commits/255e552b474839e470c66a77e74e3c807bc76f13
>> >
>> > On Fri, Jun 28, 2019 at 2:41 PM Anton Vinogradov  wrote:
>> >
>> >> Slava,
>> >>
>> >> >> I will take a look at your pull request if you don't mind.
>> >> Great news!
>> >>
>> >> >> In any way, could you please update the IEP page with the list of
>> >> >> const

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-10 Thread Вячеслав Коптилин
Hi Anton,

Well, the ReadRepair feature is finally merged and that is good news :)

Unfortunately, I cannot find a consensus about the whole functionality in
any of these topics:
 -
http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html
 -
http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html
Also, there are no comments/discussion in JIRA. That makes me sad :(
especially when a feature is huge, not obvious, and involves changing the
public API (and that is the case, I think).

Anyway, I left some comments in your pull request on GitHub. Please take a
look. The most annoying thing is poorly documented code :(
By the way, is it required to add tests related to fail-over scenarios?

I just checked the IEP page, and I still cannot find Jira tickets that
should cover existing limitations/improvements.
I would suggest creating the following tasks, at least:
 - MVCC support
 - Near caches
 - Additional metrics (number of violations, number of repaired entries etc)
 - Ignite C++ (It looks like, .Net is already aware of that feature)
 - Thin clients support
 - Perhaps, it would be useful to support different strategies to resolve
inconsistencies

What do you think?

Thanks,
S.


ср, 10 июл. 2019 г. в 10:16, Anton Vinogradov :

> Folks,
>
> Thanks to everyone for tips and reviews.
> Yardstick checked, no performance drop found.
> Additional measurement: RR get() is just up to 7% slower than regular get
> on real benchmarks (8 clients, 4 servers, 3 backups)
> Code merged to the master.
> "Must have" tasks created and attached to the IEP.
>
> On Thu, Jul 4, 2019 at 12:18 PM Anton Vinogradov  wrote:
>
> > Folks,
> >
> > Just a minor update.
> >
> > RunAll [1] with enabled ReadRepair proxy is almost green now (~10 tests
> > left, started with 6k :)).
> > During the analisys, I've found some tests with
> > - unexpected repairs at tx caches
> > - inconsistent state after the test finished (different entries across
> the
> > topology)
> > For example,
> > - testInvokeAllAppliedOnceOnBinaryTypeRegistration generates obsolete
> > versions on backups in case of retry, fixed [2]
> > - initial cache load generates not equal versions on backups, fixed [3]
> > - testAccountTxNodeRestart causes unexpected repairs (entries have
> > different versions), to be investigated.
> >
> > What's next?
> >
> > 1) Going to merge the solution once "RunAll with ReadRepair enabled"
> > becomes fully green.
> > 2) Going to add special check after each test which will ensure caches
> > content after the test is consistent.
> > 2.1) The Same check can (should?) be injected to
> > awaitPartitionMapExchange() and similar methods.
> > 3) Update Jepsen tests with RR checks.
> > 4) Introduce per partition RR.
> >
> > So, the final goal is to be sure that Ignite produces only consistent
> data
> > and to have a feature to solve consistency in case we gain inconsistent
> > state somehow.
> >
> > Limitations?
> >
> > Currently, RR has some limitations, but they are not related to real
> > production cases.
> > In case someone interested to support, for example, MVCC or near caches,
> > please, feel free to contribute.
> >
> > [1]
> >
> https://mtcga.gridgain.com/pr.html?serverId=apache=IgniteTests24Java8_RunAll=pull/6575/head=Latest
> > [2]
> >
> https://github.com/apache/ignite/pull/5656/commits/6f6ec4434095e692af209c61833a350f3013408c
> > [3]
> >
> https://github.com/apache/ignite/pull/5656/commits/255e552b474839e470c66a77e74e3c807bc76f13
> >
> > On Fri, Jun 28, 2019 at 2:41 PM Anton Vinogradov  wrote:
> >
> >> Slava,
> >>
> >> >> I will take a look at your pull request if you don't mind.
> >> Great news!
> >>
> >> >> In any way, could you please update the IEP page with the list of
> >> >> constraints/limitations of the proposed approach, TODOs, etc?
> >> Not sure we should keep this at IEP until list (#4 from original letter)
> >> is not confirmed.
> >>
> >> Currently, I'm checking RunAll with RR enabled to almost each get
> request.
> >> "Almost" means: readRepair = !ctx.readThrough() &&
> >> ctx.config().getBackups() > 0 && !ctx.isNear() && !ctx.mvccEnabled()
> >> For now I have 60 failed tests and amount decreasing.
> >>
> >> >> For instance, I would like to see all these limitations on the IEP
> >> page as
> >> >> JIRA tickets. Perhaps, it would be good to cre

Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-10 Thread Anton Vinogradov
Folks,

Thanks to everyone for tips and reviews.
Yardstick checked, no performance drop found.
Additional measurement: RR get() is at most 7% slower than a regular get
in real benchmarks (8 clients, 4 servers, 3 backups).
Code merged to the master.
"Must have" tasks created and attached to the IEP.

On Thu, Jul 4, 2019 at 12:18 PM Anton Vinogradov  wrote:

> Folks,
>
> Just a minor update.
>
> RunAll [1] with enabled ReadRepair proxy is almost green now (~10 tests
> left, started with 6k :)).
> During the analisys, I've found some tests with
> - unexpected repairs at tx caches
> - inconsistent state after the test finished (different entries across the
> topology)
> For example,
> - testInvokeAllAppliedOnceOnBinaryTypeRegistration generates obsolete
> versions on backups in case of retry, fixed [2]
> - initial cache load generates not equal versions on backups, fixed [3]
> - testAccountTxNodeRestart causes unexpected repairs (entries have
> different versions), to be investigated.
>
> What's next?
>
> 1) Going to merge the solution once "RunAll with ReadRepair enabled"
> becomes fully green.
> 2) Going to add special check after each test which will ensure caches
> content after the test is consistent.
> 2.1) The Same check can (should?) be injected to
> awaitPartitionMapExchange() and similar methods.
> 3) Update Jepsen tests with RR checks.
> 4) Introduce per partition RR.
>
> So, the final goal is to be sure that Ignite produces only consistent data
> and to have a feature to solve consistency in case we gain inconsistent
> state somehow.
>
> Limitations?
>
> Currently, RR has some limitations, but they are not related to real
> production cases.
> In case someone interested to support, for example, MVCC or near caches,
> please, feel free to contribute.
>
> [1]
> https://mtcga.gridgain.com/pr.html?serverId=apache=IgniteTests24Java8_RunAll=pull/6575/head=Latest
> [2]
> https://github.com/apache/ignite/pull/5656/commits/6f6ec4434095e692af209c61833a350f3013408c
> [3]
> https://github.com/apache/ignite/pull/5656/commits/255e552b474839e470c66a77e74e3c807bc76f13
>
> On Fri, Jun 28, 2019 at 2:41 PM Anton Vinogradov  wrote:
>
>> Slava,
>>
>> >> I will take a look at your pull request if you don't mind.
>> Great news!
>>
>> >> In any way, could you please update the IEP page with the list of
>> >> constraints/limitations of the proposed approach, TODOs, etc?
>> Not sure we should keep this at IEP until list (#4 from original letter)
>> is not confirmed.
>>
>> Currently, I'm checking RunAll with RR enabled to almost each get request.
>> "Almost" means: readRepair = !ctx.readThrough() &&
>> ctx.config().getBackups() > 0 && !ctx.isNear() && !ctx.mvccEnabled()
>> For now I have 60 failed tests and amount decreasing.
>>
>> >> For instance, I would like to see all these limitations on the IEP
>> page as
>> >> JIRA tickets. Perhaps, it would be good to create an epic/umbrella
>> ticket
>> >> in order to track all activities related to `Read Repair` feature.
>> Let's do this at merge day to be sure useless issues will not be created.
>>
>> On Fri, Jun 28, 2019 at 2:01 PM Вячеслав Коптилин <
>> slava.kopti...@gmail.com> wrote:
>>
>>> Hi Anton,
>>>
>>> I will take a look at your pull request if you don't mind.
>>>
>>> In any way, could you please update the IEP page with the list of
>>> constraints/limitations of the proposed approach, TODOs, etc?
>>> For instance, I would like to see all these limitations on the IEP page
>>> as
>>> JIRA tickets. Perhaps, it would be good to create an epic/umbrella ticket
>>> in order to track all activities related to `Read Repair` feature.
>>>
>>> Thanks,
>>> S.
>>>
>>> чт, 20 июн. 2019 г. в 14:15, Anton Vinogradov :
>>>
>>> > Igniters,
>>> > I'm glad to introduce Read Repair feature [0] provides additional
>>> > consistency guarantee for Ignite.
>>> >
>>> > 1) Why we need it?
>>> > The detailed explanation can be found at IEP-31 [1].
>>> > In short, because of bugs, it's possible to gain an inconsistent state.
>>> > We need additional features to handle this case.
>>> >
>>> > Currently we able to check cluster using Idle_verify [2] feature, but
>>> it
>>> > will not fix the data, will not even tell which entries are broken.
>>> > Read Repair is a feature to understand which entries are broken and to
>>> fix
>>> > them.
>>> >
>>> > 1) How it works?
>>> > IgniteCache now able to provide special proxy [3] withReadRepair().
>>> > This proxy guarantee that data will be gained from all owners and
>>> compared.
>>> > In the case of consistency violation situation, data will be recovered
>>> and
>>> > a special event recorded.
>>> >
>>> > 3) Naming?
>>> > Feature name based on Cassandra's Read Repair feature [4], which is
>>> pretty
>>> > similar.
>>> >
>>> > 4) Limitations which can be fixed in the future?
>>> >   * MVCC and Near caches are not supported.
>>> >   * Atomic caches can be checked (false positive case is possible on
>>> this
>>> > check), but can't be recovered.
>>> >   * 

[REVIEW REQUEST] IGNITE-11951 Improvements in JdkMarshaller

2019-07-09 Thread Павлухин Иван
Hi,

I made some small improvements in JdkMarshaller [1]. I will be happy
if someone reviews it. Changes are quite simple.

[1] https://issues.apache.org/jira/browse/IGNITE-11951

-- 
Best regards,
Ivan Pavlukhin


Re: Read Repair (ex. Consistency Check) - review request #2

2019-07-04 Thread Anton Vinogradov
Folks,

Just a minor update.

RunAll [1] with enabled ReadRepair proxy is almost green now (~10 tests
left, started with 6k :)).
During the analysis, I've found some tests with
- unexpected repairs at tx caches
- inconsistent state after the test finished (different entries across the
topology)
For example,
- testInvokeAllAppliedOnceOnBinaryTypeRegistration generates obsolete
versions on backups in case of retry, fixed [2]
- initial cache load generates unequal versions on backups, fixed [3]
- testAccountTxNodeRestart causes unexpected repairs (entries have
different versions), to be investigated.

What's next?

1) Going to merge the solution once "RunAll with ReadRepair enabled"
becomes fully green.
2) Going to add a special check after each test which will ensure cache
content is consistent after the test.
2.1) The same check can (should?) be injected into
awaitPartitionMapExchange() and similar methods.
3) Update Jepsen tests with RR checks.
4) Introduce per partition RR.
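The per-test check from step 2 could conceptually look like the following
plain-Java sketch (illustrative only; the real check would walk actual cache
partitions across the topology):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hedged sketch of a post-test consistency check: compare every owner's
 * copy of the cache and report keys whose values differ across the
 * topology. Names are illustrative, not Ignite test-framework APIs.
 */
public class PostTestConsistencyCheck {
    /** Returns the (sorted) set of keys that are inconsistent across owner copies. */
    static Set<String> inconsistentKeys(List<Map<String, String>> owners) {
        Set<String> allKeys = new HashSet<>();
        owners.forEach(o -> allKeys.addAll(o.keySet()));

        Set<String> broken = new TreeSet<>();
        for (String key : allKeys) {
            Set<String> values = new HashSet<>();
            for (Map<String, String> owner : owners)
                values.add(owner.get(key)); // null marks a missing copy
            if (values.size() > 1)
                broken.add(key);
        }
        return broken;
    }

    public static void main(String[] args) {
        List<Map<String, String>> owners = List.of(
            Map.of("a", "1", "b", "2"),
            Map.of("a", "1", "b", "99")); // "b" diverged after the test

        System.out.println(inconsistentKeys(owners)); // [b]
    }
}
```

A test harness would fail the test when the returned set is non-empty, which
is exactly the kind of signal used above to find tests leaving inconsistent
state behind.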

So, the final goal is to be sure that Ignite produces only consistent data
and to have a feature that restores consistency in case we somehow end up in
an inconsistent state.

Limitations?

Currently, RR has some limitations, but they are not related to real
production cases.
If someone is interested in supporting, for example, MVCC or near caches,
please feel free to contribute.

[1]
https://mtcga.gridgain.com/pr.html?serverId=apache=IgniteTests24Java8_RunAll=pull/6575/head=Latest
[2]
https://github.com/apache/ignite/pull/5656/commits/6f6ec4434095e692af209c61833a350f3013408c
[3]
https://github.com/apache/ignite/pull/5656/commits/255e552b474839e470c66a77e74e3c807bc76f13

On Fri, Jun 28, 2019 at 2:41 PM Anton Vinogradov  wrote:

> Slava,
>
> >> I will take a look at your pull request if you don't mind.
> Great news!
>
> >> In any way, could you please update the IEP page with the list of
> >> constraints/limitations of the proposed approach, TODOs, etc?
> Not sure we should keep this at IEP until list (#4 from original letter)
> is not confirmed.
>
> Currently, I'm checking RunAll with RR enabled to almost each get request.
> "Almost" means: readRepair = !ctx.readThrough() &&
> ctx.config().getBackups() > 0 && !ctx.isNear() && !ctx.mvccEnabled()
> For now I have 60 failed tests and amount decreasing.
>
> >> For instance, I would like to see all these limitations on the IEP page
> as
> >> JIRA tickets. Perhaps, it would be good to create an epic/umbrella
> ticket
> >> in order to track all activities related to `Read Repair` feature.
> Let's do this at merge day to be sure useless issues will not be created.
>
> On Fri, Jun 28, 2019 at 2:01 PM Вячеслав Коптилин <
> slava.kopti...@gmail.com> wrote:
>
>> Hi Anton,
>>
>> I will take a look at your pull request if you don't mind.
>>
>> In any way, could you please update the IEP page with the list of
>> constraints/limitations of the proposed approach, TODOs, etc?
>> For instance, I would like to see all these limitations on the IEP page as
>> JIRA tickets. Perhaps, it would be good to create an epic/umbrella ticket
>> in order to track all activities related to `Read Repair` feature.
>>
>> Thanks,
>> S.
>>
>> чт, 20 июн. 2019 г. в 14:15, Anton Vinogradov :
>>
>> > Igniters,
>> > I'm glad to introduce Read Repair feature [0] provides additional
>> > consistency guarantee for Ignite.
>> >
>> > 1) Why we need it?
>> > The detailed explanation can be found at IEP-31 [1].
>> > In short, because of bugs, it's possible to gain an inconsistent state.
>> > We need additional features to handle this case.
>> >
>> > Currently we able to check cluster using Idle_verify [2] feature, but it
>> > will not fix the data, will not even tell which entries are broken.
>> > Read Repair is a feature to understand which entries are broken and to
>> fix
>> > them.
>> >
>> > 1) How it works?
>> > IgniteCache now able to provide special proxy [3] withReadRepair().
>> > This proxy guarantee that data will be gained from all owners and
>> compared.
>> > In the case of consistency violation situation, data will be recovered
>> and
>> > a special event recorded.
>> >
>> > 3) Naming?
>> > Feature name based on Cassandra's Read Repair feature [4], which is
>> pretty
>> > similar.
>> >
>> > 4) Limitations which can be fixed in the future?
>> >   * MVCC and Near caches are not supported.
>> >   * Atomic caches can be checked (false positive case is possible on
>> this
>> > check), but can't be recovered.
>> >   * Partial entry removal can't be recovered.
>> >   * Entries streamed using data streamer (using not a "cache.put" based
>> > updater) and loaded by cache.load
>> >   are perceived as inconsistent since they may have different versions
>> for
>> > same keys.
>> >   * Only explicit get operations are supported (getAndReplace,
>> getAndPut,
>> > etc can be supported in future).
>> >
>> > 5) What's left?
>> >   * SQL/ThinClient/etc support.
>> >   * Metrics (found/repaired).
>> >   * Simple per-partition recovery feature able to work in the background
>> > in addition to per-entry recovery feature.

Re: Read Repair (ex. Consistency Check) - review request #2

2019-06-28 Thread Anton Vinogradov
Slava,

>> I will take a look at your pull request if you don't mind.
Great news!

>> In any way, could you please update the IEP page with the list of
>> constraints/limitations of the proposed approach, TODOs, etc?
I'm not sure we should put this on the IEP page until the list (#4 from the
original letter) is confirmed.

Currently, I'm running RunAll with RR enabled for almost every get request.
"Almost" means: readRepair = !ctx.readThrough() &&
ctx.config().getBackups() > 0 && !ctx.isNear() && !ctx.mvccEnabled()
For now I have 60 failed tests, and the number is decreasing.
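For clarity, that eligibility condition can be written as a standalone predicate. This is a sketch: the GridCacheContext accessors are replaced with plain parameters, and the class and method names are ours, not Ignite internals.

```java
/** Sketch of the RR-eligibility condition quoted above, with the
 *  GridCacheContext accessors replaced by plain parameters. */
public final class ReadRepairEligibility {
    /** @return true if Read Repair can be applied to a get on this cache. */
    public static boolean readRepairApplicable(boolean readThrough,
                                               int backups,
                                               boolean nearCache,
                                               boolean mvccEnabled) {
        // Mirrors: !ctx.readThrough() && ctx.config().getBackups() > 0
        //          && !ctx.isNear() && !ctx.mvccEnabled()
        return !readThrough && backups > 0 && !nearCache && !mvccEnabled;
    }
}
```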

>> For instance, I would like to see all these limitations on the IEP page
as
>> JIRA tickets. Perhaps, it would be good to create an epic/umbrella ticket
>> in order to track all activities related to `Read Repair` feature.
Let's do this on merge day, to make sure we don't create issues that turn out
to be unnecessary.

On Fri, Jun 28, 2019 at 2:01 PM Вячеслав Коптилин 
wrote:

> Hi Anton,
>
> I will take a look at your pull request if you don't mind.
>
> In any way, could you please update the IEP page with the list of
> constraints/limitations of the proposed approach, TODOs, etc?
> For instance, I would like to see all these limitations on the IEP page as
> JIRA tickets. Perhaps, it would be good to create an epic/umbrella ticket
> in order to track all activities related to `Read Repair` feature.
>
> Thanks,
> S.
>
> чт, 20 июн. 2019 г. в 14:15, Anton Vinogradov :
>
> > Igniters,
> > I'm glad to introduce Read Repair feature [0] provides additional
> > consistency guarantee for Ignite.
> >
> > 1) Why we need it?
> > The detailed explanation can be found at IEP-31 [1].
> > In short, because of bugs, it's possible to gain an inconsistent state.
> > We need additional features to handle this case.
> >
> > Currently we able to check cluster using Idle_verify [2] feature, but it
> > will not fix the data, will not even tell which entries are broken.
> > Read Repair is a feature to understand which entries are broken and to
> fix
> > them.
> >
> > 1) How it works?
> > IgniteCache now able to provide special proxy [3] withReadRepair().
> > This proxy guarantee that data will be gained from all owners and
> compared.
> > In the case of consistency violation situation, data will be recovered
> and
> > a special event recorded.
> >
> > 3) Naming?
> > Feature name based on Cassandra's Read Repair feature [4], which is
> pretty
> > similar.
> >
> > 4) Limitations which can be fixed in the future?
> >   * MVCC and Near caches are not supported.
> >   * Atomic caches can be checked (false positive case is possible on this
> > check), but can't be recovered.
> >   * Partial entry removal can't be recovered.
> >   * Entries streamed using data streamer (using not a "cache.put" based
> > updater) and loaded by cache.load
> >   are perceived as inconsistent since they may have different versions
> for
> > same keys.
> >   * Only explicit get operations are supported (getAndReplace, getAndPut,
> > etc can be supported in future).
> >
> > 5) What's left?
> >   * SQL/ThinClient/etc support.
> >   * Metrics (found/repaired).
> >   * Simple per-partition recovery feature able to work in the background
> in
> > addition to per-entry recovery feature.
> >
> > 6) Is code checked?
> >   * Pull Request #5656 [5] (feature) - has green TC.
> >   * Pull Request #6575 [6] (RunAll with the feature enabled for every
> get()
> > request) - has a limited amount of failures (because of data streamer,
> > cache.load, etc).
> >
> > Thoughts?
> >
> > [0] https://issues.apache.org/jira/browse/IGNITE-10663
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-31+Consistency+check+and+fix
> > [2]
> >
> >
> https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
> > [3]
> >
> >
> https://github.com/apache/ignite/blob/27b6105ecc175b61e0aef59887830588dfc388ef/modules/core/src/main/java/org/apache/ignite/IgniteCache.java#L140
> > [4]
> >
> >
> https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
> > [5] https://github.com/apache/ignite/pull/5656
> > [6] https://github.com/apache/ignite/pull/6575
> >
>


Re: Read Repair (ex. Consistency Check) - review request #2

2019-06-28 Thread Вячеслав Коптилин
Hi Anton,

I will take a look at your pull request if you don't mind.

In any case, could you please update the IEP page with the list of
constraints/limitations of the proposed approach, TODOs, etc?
For instance, I would like to see all these limitations on the IEP page as
JIRA tickets. Perhaps, it would be good to create an epic/umbrella ticket
in order to track all activities related to `Read Repair` feature.

Thanks,
S.

чт, 20 июн. 2019 г. в 14:15, Anton Vinogradov :

> Igniters,
> I'm glad to introduce Read Repair feature [0] provides additional
> consistency guarantee for Ignite.
>
> 1) Why we need it?
> The detailed explanation can be found at IEP-31 [1].
> In short, because of bugs, it's possible to gain an inconsistent state.
> We need additional features to handle this case.
>
> Currently we able to check cluster using Idle_verify [2] feature, but it
> will not fix the data, will not even tell which entries are broken.
> Read Repair is a feature to understand which entries are broken and to fix
> them.
>
> 1) How it works?
> IgniteCache now able to provide special proxy [3] withReadRepair().
> This proxy guarantee that data will be gained from all owners and compared.
> In the case of consistency violation situation, data will be recovered and
> a special event recorded.
>
> 3) Naming?
> Feature name based on Cassandra's Read Repair feature [4], which is pretty
> similar.
>
> 4) Limitations which can be fixed in the future?
>   * MVCC and Near caches are not supported.
>   * Atomic caches can be checked (false positive case is possible on this
> check), but can't be recovered.
>   * Partial entry removal can't be recovered.
>   * Entries streamed using data streamer (using not a "cache.put" based
> updater) and loaded by cache.load
>   are perceived as inconsistent since they may have different versions for
> same keys.
>   * Only explicit get operations are supported (getAndReplace, getAndPut,
> etc can be supported in future).
>
> 5) What's left?
>   * SQL/ThinClient/etc support.
>   * Metrics (found/repaired).
>   * Simple per-partition recovery feature able to work in the background in
> addition to per-entry recovery feature.
>
> 6) Is code checked?
>   * Pull Request #5656 [5] (feature) - has green TC.
>   * Pull Request #6575 [6] (RunAll with the feature enabled for every get()
> request) - has a limited amount of failures (because of data streamer,
> cache.load, etc).
>
> Thoughts?
>
> [0] https://issues.apache.org/jira/browse/IGNITE-10663
> [1]
>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-31+Consistency+check+and+fix
> [2]
>
> https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
> [3]
>
> https://github.com/apache/ignite/blob/27b6105ecc175b61e0aef59887830588dfc388ef/modules/core/src/main/java/org/apache/ignite/IgniteCache.java#L140
> [4]
>
> https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
> [5] https://github.com/apache/ignite/pull/5656
> [6] https://github.com/apache/ignite/pull/6575
>


Re: Read Repair (ex. Consistency Check) - review request #2

2019-06-20 Thread Anton Vinogradov
Nikolay,

>> Should we consider it's removal in Ignite 3
Don't think so.

My initial ReadRepair implementation uses the entry version to detect
inconsistency. The strategy can be changed later (most likely it will be), or
we can even provide the ability to plug in a custom strategy.
The data streamer and cache.load cases can be supported by a simple
value-equality check as a second phase.
As I mentioned privately, all the limitations apply only to the initial
version and can be eliminated in the future.
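The "value check as the second phase" idea can be sketched in a self-contained way. The types below are hypothetical stand-ins, not Ignite internals: compare versions first, and if versions differ but all values are equal (the data-streamer / cache.load case), treat the entry as consistent anyway.

```java
import java.util.Collection;

/** Hypothetical sketch of the two-phase consistency check described above. */
final class TwoPhaseCheck {
    /** A copy of an entry as seen on one owner: (version, value). */
    record Copy(long version, Object value) {}

    /** @return true if the copies are consistent under the two-phase rule. */
    static boolean consistent(Collection<Copy> copies) {
        // Phase 1: identical versions on all owners -> consistent.
        if (copies.stream().map(Copy::version).distinct().count() <= 1)
            return true;

        // Phase 2 (streamer/cache.load case): versions differ, but if all
        // values are equal the entry can still be treated as consistent.
        return copies.stream().map(Copy::value).distinct().count() <= 1;
    }
}
```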

Currently, the main goal is to merge the solution so the work can continue.
* After the merge, it will be possible to split the remaining work into
separate tasks (metrics, whole-partition fix, data streamers, etc.).
* It also seems I found a consistency issue while checking my feature (a
backup can stay outdated for some time after a tx finishes in FULL_SYNC).
The merge will make it possible to investigate the bug more deeply and fix it
if confirmed.

On Thu, Jun 20, 2019 at 2:22 PM Nikolay Izhikov  wrote:

> Anton.
>
> I worried about this limitation:
>
> > Entries streamed using data streamer (using not a "cache.put" based
> updater) and loaded by cache.load.
>
> As we discussed privately in this modes
>
> *ALL ENTRIES ON ALL OWNERS WILL HAVE DIFFERENT VERSIONS*
>
> Why we need this modes, in the first place?
> Should we consider it's removal in Ignite 3 or should we fix them?
>
> В Чт, 20/06/2019 в 14:15 +0300, Anton Vinogradov пишет:
> > Igniters,
> > I'm glad to introduce Read Repair feature [0] provides additional
> > consistency guarantee for Ignite.
> >
> > 1) Why we need it?
> > The detailed explanation can be found at IEP-31 [1].
> > In short, because of bugs, it's possible to gain an inconsistent state.
> > We need additional features to handle this case.
> >
> > Currently we able to check cluster using Idle_verify [2] feature, but it
> > will not fix the data, will not even tell which entries are broken.
> > Read Repair is a feature to understand which entries are broken and to
> fix
> > them.
> >
> > 1) How it works?
> > IgniteCache now able to provide special proxy [3] withReadRepair().
> > This proxy guarantee that data will be gained from all owners and
> compared.
> > In the case of consistency violation situation, data will be recovered
> and
> > a special event recorded.
> >
> > 3) Naming?
> > Feature name based on Cassandra's Read Repair feature [4], which is
> pretty
> > similar.
> >
> > 4) Limitations which can be fixed in the future?
> >   * MVCC and Near caches are not supported.
> >   * Atomic caches can be checked (false positive case is possible on this
> > check), but can't be recovered.
> >   * Partial entry removal can't be recovered.
> >   * Entries streamed using data streamer (using not a "cache.put" based
> > updater) and loaded by cache.load
> >   are perceived as inconsistent since they may have different versions
> for
> > same keys.
> >   * Only explicit get operations are supported (getAndReplace, getAndPut,
> > etc can be supported in future).
> >
> > 5) What's left?
> >   * SQL/ThinClient/etc support.
> >   * Metrics (found/repaired).
> >   * Simple per-partition recovery feature able to work in the background
> in
> > addition to per-entry recovery feature.
> >
> > 6) Is code checked?
> >   * Pull Request #5656 [5] (feature) - has green TC.
> >   * Pull Request #6575 [6] (RunAll with the feature enabled for every
> get()
> > request) - has a limited amount of failures (because of data streamer,
> > cache.load, etc).
> >
> > Thoughts?
> >
> > [0] https://issues.apache.org/jira/browse/IGNITE-10663
> > [1]
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-31+Consistency+check+and+fix
> > [2]
> >
> https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
> > [3]
> >
> https://github.com/apache/ignite/blob/27b6105ecc175b61e0aef59887830588dfc388ef/modules/core/src/main/java/org/apache/ignite/IgniteCache.java#L140
> > [4]
> >
> https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
> > [5] https://github.com/apache/ignite/pull/5656
> > [6] https://github.com/apache/ignite/pull/6575
>


Re: Read Repair (ex. Consistency Check) - review request #2

2019-06-20 Thread Nikolay Izhikov
Anton.

I'm worried about this limitation:

> Entries streamed using data streamer (using not a "cache.put" based updater) 
> and loaded by cache.load.

As we discussed privately, in these modes

*ALL ENTRIES ON ALL OWNERS WILL HAVE DIFFERENT VERSIONS*

Why do we need these modes in the first place?
Should we consider their removal in Ignite 3, or should we fix them?

В Чт, 20/06/2019 в 14:15 +0300, Anton Vinogradov пишет:
> Igniters,
> I'm glad to introduce Read Repair feature [0] provides additional
> consistency guarantee for Ignite.
> 
> 1) Why we need it?
> The detailed explanation can be found at IEP-31 [1].
> In short, because of bugs, it's possible to gain an inconsistent state.
> We need additional features to handle this case.
> 
> Currently we able to check cluster using Idle_verify [2] feature, but it
> will not fix the data, will not even tell which entries are broken.
> Read Repair is a feature to understand which entries are broken and to fix
> them.
> 
> 1) How it works?
> IgniteCache now able to provide special proxy [3] withReadRepair().
> This proxy guarantee that data will be gained from all owners and compared.
> In the case of consistency violation situation, data will be recovered and
> a special event recorded.
> 
> 3) Naming?
> Feature name based on Cassandra's Read Repair feature [4], which is pretty
> similar.
> 
> 4) Limitations which can be fixed in the future?
>   * MVCC and Near caches are not supported.
>   * Atomic caches can be checked (false positive case is possible on this
> check), but can't be recovered.
>   * Partial entry removal can't be recovered.
>   * Entries streamed using data streamer (using not a "cache.put" based
> updater) and loaded by cache.load
>   are perceived as inconsistent since they may have different versions for
> same keys.
>   * Only explicit get operations are supported (getAndReplace, getAndPut,
> etc can be supported in future).
> 
> 5) What's left?
>   * SQL/ThinClient/etc support.
>   * Metrics (found/repaired).
>   * Simple per-partition recovery feature able to work in the background in
> addition to per-entry recovery feature.
> 
> 6) Is code checked?
>   * Pull Request #5656 [5] (feature) - has green TC.
>   * Pull Request #6575 [6] (RunAll with the feature enabled for every get()
> request) - has a limited amount of failures (because of data streamer,
> cache.load, etc).
> 
> Thoughts?
> 
> [0] https://issues.apache.org/jira/browse/IGNITE-10663
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-31+Consistency+check+and+fix
> [2]
> https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
> [3]
> https://github.com/apache/ignite/blob/27b6105ecc175b61e0aef59887830588dfc388ef/modules/core/src/main/java/org/apache/ignite/IgniteCache.java#L140
> [4]
> https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
> [5] https://github.com/apache/ignite/pull/5656
> [6] https://github.com/apache/ignite/pull/6575




Read Repair (ex. Consistency Check) - review request #2

2019-06-20 Thread Anton Vinogradov
Igniters,
I'm glad to introduce the Read Repair feature [0], which provides an
additional consistency guarantee for Ignite.

1) Why do we need it?
The detailed explanation can be found at IEP-31 [1].
In short, because of bugs, it's possible to end up in an inconsistent state.
We need additional features to handle this case.

Currently, we are able to check the cluster using the idle_verify [2]
feature, but it will not fix the data, and will not even tell which entries
are broken.
Read Repair is a feature that identifies which entries are broken and fixes
them.

2) How does it work?
IgniteCache is now able to provide a special proxy [3], withReadRepair().
This proxy guarantees that data will be fetched from all owners and compared.
In case of a consistency violation, the data is recovered and a special event
is recorded.
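The proxy semantics described above (fetch from all owners, compare, recover) can be illustrated with a self-contained mock. This is only an illustration of the behavior, not the Ignite API: plain maps stand in for the primary and backup copies, and all names are ours.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Self-contained illustration of Read Repair semantics: read a key from
 *  all owners, compare the copies, and overwrite diverged owners with the
 *  latest-version copy. Plain maps stand in for primary/backup partitions. */
final class ReadRepairMock {
    record Versioned(long version, String value) {}

    /** @return the repaired value; all owner maps hold it afterwards. */
    static String getWithReadRepair(String key, List<Map<String, Versioned>> owners) {
        // Gather the copy from every owner (null means a missing entry).
        List<Versioned> copies = new ArrayList<>();
        for (Map<String, Versioned> owner : owners)
            copies.add(owner.get(key));

        // Pick the copy with the highest version as the "correct" one.
        Versioned latest = copies.stream()
            .filter(Objects::nonNull)
            .max((a, b) -> Long.compare(a.version(), b.version()))
            .orElse(null);

        // Repair: overwrite every owner that diverges from the latest copy
        // (a real implementation would also record a consistency-violation event).
        for (Map<String, Versioned> owner : owners)
            if (!Objects.equals(owner.get(key), latest))
                owner.put(key, latest);

        return latest == null ? null : latest.value();
    }
}
```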

3) Naming?
The feature name is based on Cassandra's Read Repair feature [4], which is
pretty similar.

4) Limitations which can be fixed in the future?
  * MVCC and Near caches are not supported.
  * Atomic caches can be checked (a false positive is possible on this
check), but can't be recovered.
  * Partial entry removal can't be recovered.
  * Entries streamed using the data streamer (i.e. with a non-"cache.put"
based updater) or loaded by cache.load are perceived as inconsistent, since
they may have different versions for the same keys.
  * Only explicit get operations are supported (getAndReplace, getAndPut,
etc. can be supported in the future).

5) What's left?
  * SQL/ThinClient/etc support.
  * Metrics (found/repaired).
  * A simple per-partition recovery feature able to work in the background,
in addition to the per-entry recovery feature.

6) Is the code checked?
  * Pull Request #5656 [5] (the feature) has a green TC run.
  * Pull Request #6575 [6] (RunAll with the feature enabled for every get()
request) has a limited number of failures (because of the data streamer,
cache.load, etc.).

Thoughts?

[0] https://issues.apache.org/jira/browse/IGNITE-10663
[1]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-31+Consistency+check+and+fix
[2]
https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
[3]
https://github.com/apache/ignite/blob/27b6105ecc175b61e0aef59887830588dfc388ef/modules/core/src/main/java/org/apache/ignite/IgniteCache.java#L140
[4]
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
[5] https://github.com/apache/ignite/pull/5656
[6] https://github.com/apache/ignite/pull/6575


Re: Consistency check and fix (review request)

2019-05-15 Thread Anton Vinogradov
Ivan,

1) Currently, we have the idle_verify [1] feature, which allows us to check
that the consistency guarantee is still respected.
But idle_verify has a big constraint: the cluster should be idle
(no load).
This feature will do almost the same, but allows you to run it under any
load.
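For reference, the idle_verify check is invoked via the Ignite control script; the shape is roughly the following (the cache names are placeholders, and [1] documents the exact options):

```shell
# Verify partition checksums across the whole (idle!) cluster.
./control.sh --cache idle_verify

# Restrict the check to specific caches (cache names are placeholders).
./control.sh --cache idle_verify cache1,cache2
```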

2) Why do we need this? Because of bugs.
For example, we currently have an issue [2] with consistency violation on
node failure under load.
The issue will definitely be fixed, but is there any guarantee we won't face
another one?
This feature is a simple way to check (and fix) consistency in case you want
such an additional check.
Just additional failover. Trust, but check and recheck :)

3) Use cases?
There are two approaches:
- you can permanently rescan (in a cyclic way) all the entries you have,
using this feature, to make sure they are not broken (the background case)
- you can use this feature on every request, or on every N-th request (the
foreground case)
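The foreground case boils down to a small sampling decision per request. A minimal sketch follows; the class and method names are ours, not an Ignite API, and the caller would route a request through the consistency-checking proxy whenever the sampler says so.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of the foreground use case: route every N-th get through the
 *  consistency-checking proxy, and the rest through the plain cache. */
final class ReadRepairSampler {
    private final int everyNth;                        // check every N-th request
    private final AtomicLong counter = new AtomicLong();

    ReadRepairSampler(int everyNth) {
        this.everyNth = everyNth;
    }

    /** @return true if this request should be verified against all owners. */
    boolean shouldCheck() {
        return counter.incrementAndGet() % everyNth == 0;
    }
}
```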

[1]
https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
[2] https://issues.apache.org/jira/browse/IGNITE-10078

On Thu, May 9, 2019 at 9:09 AM Ivan Pavlukhina  wrote:

> Hi Anton,
>
> Meanwhile can we extend a feature description from a user point of view?
> It would be good to provide some use cases when it can used.
>
> The thing that is yet understood by me is a conflict resolving. E.g. in
> systems inspired by Dynamo (sorry that no references, writing from phone)
> inconsistency is an expected system behavior and users are aware of it and
> choose reconciliation strategy accordingly. But in our case inconsistency
> is an exceptional case. And it is hard for me to imagine a case when we can
> resolve conflicts automatically while expecting no conflicts. Can you help
> me with it?
>
> Sent from my iPhone
>
> > On 25 Apr 2019, at 16:25, Anton Vinogradov  wrote:
> >
> > Folks,
> >
> > Just an update.
> > According to all your tips I decided to refactor API, logic, and approach
> > (mostly everything :)),
> > so, currently refactoring is in progress and you may see inconsistent PR
> > state.
> >
> > Thanks to everyone involved for your tips, review and etc.
> > I'll provide a proper presentation once refactoring will be finished.
> >
> >> On Tue, Apr 16, 2019 at 2:20 PM Anton Vinogradov  wrote:
> >>
> >> Nikolay, that was the first approach
> >>
>  I think we should allow to the administrator to enable/disable
> >> Consistency check.
> >> In that case, we have to introduce cluster-wide change-strategy
> operation,
> >> since every client node should be aware of the change.
> >> Also, we have to specify caches list, and for each - should we check
> each
> >> request or only 5-th and so on.
> >> Procedure and configuration become overcomplicated in this case.
> >>
> >> My idea that specific service will be able to use a special proxy
> >> according to its own strategy
> >> (eg. when administrator inside the building and boss is sleeping - all
> >> operations on "cache[a,b,c]ed*" should check the consistency).
> >> All service clients will have the same guarantees in that case.
> >>
> >> So in other words, consistency should be guaranteed by service, not by
> >> Ignite.
> >> Service should guarantee consistency not only using new proxy but, for
> >> example, using correct isolation fo txs.
> >> That's not a good Idea to specify isolation mode for Ignite, same
> >> situation with get-with-consistency-check.
> >>
> >> On Tue, Apr 16, 2019 at 12:56 PM Nikolay Izhikov 
> >> wrote:
> >>
> >>> Hello, Anton.
> >>>
>  Customer should be able to change strategy on the fly according to
> >>> time> periods or load.
> >>>
> >>> I think we should allow to administrator to enable/disable Consistency
> >>> check.
> >>> This option shouldn't be related to application code because
> "Consistency
> >>> check" is some kind of maintance procedure.
> >>>
> >>> What do you think?
> >>>
> >>> В Вт, 16/04/2019 в 12:47 +0300, Anton Vinogradov пишет:
>  Andrey, thanks for tips
> 
> >> You can perform consistency check using idle verify utility.
> 
>  Could you please point to utility's page?
>  According to its name, it requires to stop the cluster to perform the
> >>> check?
>  That's impossible at real production when you should have downtime
> less
>  that some minutes per year.
>  So, the only case I see is to use online check during periods of
> >>> moderate
>  activity.
> 
> >> Recovery tool is good idea
> 
>  This tool is a part of my IEP.
>  But recovery tool (process)
>  - will allow you to check entries in memory only (otherwise, you will
> >>> warm
>  up the cluster incorrectly), and that's a problem when you have
>  persisted/in_memory rate > 10:1
>  - will cause latency drop for some (eg. 90+ percentile) requests,
> which
> >>> is
>  not acceptable for real production, when we have strict SLA.
>  - will not guarantee that each operation will use consistent data,
> sometimes it's extremely essential

Re: Consistency check and fix (review request)

2019-05-09 Thread Ivan Pavlukhina
Hi Anton,

Meanwhile, can we extend the feature description from a user's point of view?
It would be good to provide some use cases where it can be used.

The thing that is not yet clear to me is conflict resolution. E.g., in systems
inspired by Dynamo (sorry for the lack of references, writing from my phone),
inconsistency is expected system behavior: users are aware of it and choose a
reconciliation strategy accordingly. But in our case inconsistency is an
exceptional case. And it is hard for me to imagine how we can resolve
conflicts automatically while expecting no conflicts. Can you help me with it?

Sent from my iPhone

> On 25 Apr 2019, at 16:25, Anton Vinogradov  wrote:
> 
> Folks,
> 
> Just an update.
> According to all your tips I decided to refactor API, logic, and approach
> (mostly everything :)),
> so, currently refactoring is in progress and you may see inconsistent PR
> state.
> 
> Thanks to everyone involved for your tips, review and etc.
> I'll provide a proper presentation once refactoring will be finished.
> 
>> On Tue, Apr 16, 2019 at 2:20 PM Anton Vinogradov  wrote:
>> 
>> Nikolay, that was the first approach
>> 
 I think we should allow to the administrator to enable/disable
>> Consistency check.
>> In that case, we have to introduce cluster-wide change-strategy operation,
>> since every client node should be aware of the change.
>> Also, we have to specify caches list, and for each - should we check each
>> request or only 5-th and so on.
>> Procedure and configuration become overcomplicated in this case.
>> 
>> My idea that specific service will be able to use a special proxy
>> according to its own strategy
>> (eg. when administrator inside the building and boss is sleeping - all
>> operations on "cache[a,b,c]ed*" should check the consistency).
>> All service clients will have the same guarantees in that case.
>> 
>> So in other words, consistency should be guaranteed by service, not by
>> Ignite.
>> Service should guarantee consistency not only using new proxy but, for
>> example, using correct isolation fo txs.
>> That's not a good Idea to specify isolation mode for Ignite, same
>> situation with get-with-consistency-check.
>> 
>> On Tue, Apr 16, 2019 at 12:56 PM Nikolay Izhikov 
>> wrote:
>> 
>>> Hello, Anton.
>>> 
 Customer should be able to change strategy on the fly according to
>>> time> periods or load.
>>> 
>>> I think we should allow to administrator to enable/disable Consistency
>>> check.
>>> This option shouldn't be related to application code because "Consistency
>>> check" is some kind of maintance procedure.
>>> 
>>> What do you think?
>>> 
>>> В Вт, 16/04/2019 в 12:47 +0300, Anton Vinogradov пишет:
 Andrey, thanks for tips
 
>> You can perform consistency check using idle verify utility.
 
 Could you please point to utility's page?
 According to its name, it requires to stop the cluster to perform the
>>> check?
 That's impossible at real production when you should have downtime less
 that some minutes per year.
 So, the only case I see is to use online check during periods of
>>> moderate
 activity.
 
>> Recovery tool is good idea
 
 This tool is a part of my IEP.
 But recovery tool (process)
 - will allow you to check entries in memory only (otherwise, you will
>>> warm
 up the cluster incorrectly), and that's a problem when you have
 persisted/in_memory rate > 10:1
 - will cause latency drop for some (eg. 90+ percentile) requests, which
>>> is
 not acceptable for real production, when we have strict SLA.
 - will not guarantee that each operation will use consistent data,
 sometimes it's extremely essential
 so, the process is a cool idea, but, sometime you may need more.
 
 Ivan, thanks for analysis
 
>> why it comes as an on-demand enabled proxy but not as a mode
>>> enabled by
 
 some configuration property
 It's a bad idea to have this feature permanently enabled, it slows down
>>> the
 system by design.
 Customer should be able to change strategy on the fly according to time
 periods or load.
 Also, we're going to use this proxy for odd requests or for every 5-th,
 10-th, 100-th request depends on the load/time/SLA/etc.
 The goal is to perform as much as possible gets-with-consistency
>>> operations
 without stopping the cluster and never find a problem :)
 
>> As for me it will be great if we can improve consistency guarantees
 
 provided by default.
 Once you checked backups you decreased throughput and increased latency.
 This feature requred only for some financial, nuclear, health systems
>>> when
 you should be additionally sure about consistency.
 It's like a
 - read from backups
 - data modification outside the transaction
 - using FULL_ASYNC instead of FULL_SYNC,
 sometimes it's possible, sometimes not.
 
>> 1. It sounds suspicious that reads can cause writes (unexpected deadlocks
>> might be possible).

Re: Consistency check and fix (review request)

2019-04-25 Thread Anton Vinogradov
Folks,

Just an update.
Following all your tips, I decided to refactor the API, logic, and approach
(mostly everything :)),
so refactoring is currently in progress and you may see an inconsistent PR
state.

Thanks to everyone involved for your tips, reviews, etc.
I'll provide a proper presentation once the refactoring is finished.

On Tue, Apr 16, 2019 at 2:20 PM Anton Vinogradov  wrote:

> Nikolay, that was the first approach
>
> >> I think we should allow to the administrator to enable/disable
> Consistency check.
> In that case, we have to introduce cluster-wide change-strategy operation,
> since every client node should be aware of the change.
> Also, we have to specify caches list, and for each - should we check each
> request or only 5-th and so on.
> Procedure and configuration become overcomplicated in this case.
>
> My idea that specific service will be able to use a special proxy
> according to its own strategy
> (eg. when administrator inside the building and boss is sleeping - all
> operations on "cache[a,b,c]ed*" should check the consistency).
> All service clients will have the same guarantees in that case.
>
> So in other words, consistency should be guaranteed by service, not by
> Ignite.
> Service should guarantee consistency not only using new proxy but, for
> example, using correct isolation fo txs.
> That's not a good Idea to specify isolation mode for Ignite, same
> situation with get-with-consistency-check.
>
> On Tue, Apr 16, 2019 at 12:56 PM Nikolay Izhikov 
> wrote:
>
>> Hello, Anton.
>>
>> > Customer should be able to change strategy on the fly according to
>> time> periods or load.
>>
>> I think we should allow to administrator to enable/disable Consistency
>> check.
>> This option shouldn't be related to application code because "Consistency
>> check" is some kind of maintance procedure.
>>
>> What do you think?
>>
>> В Вт, 16/04/2019 в 12:47 +0300, Anton Vinogradov пишет:
>> > Andrey, thanks for tips
>> >
>> > > > You can perform consistency check using idle verify utility.
>> >
>> > Could you please point to utility's page?
>> > According to its name, it requires to stop the cluster to perform the
>> check?
>> > That's impossible at real production when you should have downtime less
>> > that some minutes per year.
>> > So, the only case I see is to use online check during periods of
>> moderate
>> > activity.
>> >
>> > > > Recovery tool is good idea
>> >
>> > This tool is a part of my IEP.
>> > But recovery tool (process)
>> > - will allow you to check entries in memory only (otherwise, you will
>> warm
>> > up the cluster incorrectly), and that's a problem when you have
>> > persisted/in_memory rate > 10:1
>> > - will cause latency drop for some (eg. 90+ percentile) requests, which
>> is
>> > not acceptable for real production, when we have strict SLA.
>> > - will not guarantee that each operation will use consistent data,
>> > sometimes it's extremely essential
>> > so, the process is a cool idea, but, sometime you may need more.
>> >
>> > Ivan, thanks for analysis
>> >
>> > > > why it comes as an on-demand enabled proxy but not as a mode
>> enabled by
>> >
>> > some configuration property
>> > It's a bad idea to have this feature permanently enabled, it slows down
>> the
>> > system by design.
>> > Customer should be able to change strategy on the fly according to time
>> > periods or load.
>> > Also, we're going to use this proxy for odd requests or for every 5-th,
>> > 10-th, 100-th request depends on the load/time/SLA/etc.
>> > The goal is to perform as much as possible gets-with-consistency
>> operations
>> > without stopping the cluster and never find a problem :)
>> >
>> > > > As for me it will be great if we can improve consistency guarantees
>> >
>> > provided by default.
>> > Once you checked backups you decreased throughput and increased latency.
>> > This feature requred only for some financial, nuclear, health systems
>> when
>> > you should be additionally sure about consistency.
>> > It's like a
>> > - read from backups
>> > - data modification outside the transaction
>> > - using FULL_ASYNC instead of FULL_SYNC,
>> > sometimes it's possible, sometimes not.
>> >
>> > > > 1. It sounds suspicious that reads can cause writes (unexpected
>> >
>> > deadlocks might be possible).
>> > Code performs writes
>> > - key per additional transaction in case original tx was OPTIMISTIC ||
>> > READ_COMMITTED,
>> > - all keys per same tx in case original tx was PESSIMISTIC &&
>> > !READ_COMMITTED, since you already obtain the locks,
>> > so, deadlock should be impossible.
>> >
>> > > > 2. I do not believe that it is possible to implement a (bugless?)
>> >
>> > feature which will fix other bugs.
>> > It does not fix the bugs, it looks for inconsistency (no matter how it
>> > happened) and reports using events (previous state and how it was
>> fixed).
>> > This allows continuing processing for all the entries, even
>> inconsistent.
>> > But, each such fix should be rechecked 

Re: Consistency check and fix (review request)

2019-04-16 Thread Anton Vinogradov
Nikolay, that was the first approach

>> I think we should allow to the administrator to enable/disable
Consistency check.
In that case, we have to introduce a cluster-wide change-strategy operation,
since every client node should be aware of the change.
Also, we would have to specify a list of caches and, for each one, decide
whether to check every request or only every 5th, and so on.
The procedure and configuration become overcomplicated in this case.

My idea is that a specific service will be able to use a special proxy
according to its own strategy
(e.g. when the administrator is inside the building and the boss is sleeping,
all operations on "cache[a,b,c]ed*" should check consistency).
All service clients will have the same guarantees in that case.

So, in other words, consistency should be guaranteed by the service, not by
Ignite.
The service should guarantee consistency not only by using the new proxy but
also, for example, by using correct transaction isolation.
It's not a good idea to make Ignite specify the isolation mode; the same goes
for get-with-consistency-check.
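To make the service-side idea concrete, here is a rough sketch in plain Java. All names here (CacheView, ServiceSideStrategy, the 01:00-05:00 "quiet window") are illustrative assumptions, not Ignite API:

```java
/**
 * Illustrative sketch: the *service*, not Ignite, decides when reads must go
 * through the consistency-checking proxy. CacheView stands in for a cache
 * facade; the quiet-window rule models "administrator is in the building and
 * the boss is sleeping".
 */
interface CacheView {
    Object get(String key);
}

final class ServiceSideStrategy {
    /** Hypothetical strategy: check only during a nightly quiet window. */
    static boolean inQuietWindow(int hourOfDay) {
        return hourOfDay >= 1 && hourOfDay < 5;
    }

    /** Every client of the service gets the same guarantee automatically. */
    static CacheView choose(CacheView plain, CacheView checking, int hourOfDay) {
        return inQuietWindow(hourOfDay) ? checking : plain;
    }
}
```

The point of the sketch is that the strategy lives in one place (the service), so no cluster-wide change-strategy operation is needed.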

On Tue, Apr 16, 2019 at 12:56 PM Nikolay Izhikov 
wrote:

> Hello, Anton.
>
> > Customer should be able to change strategy on the fly according to time>
> periods or load.
>
> I think we should allow to administrator to enable/disable Consistency
> check.
> This option shouldn't be related to application code because "Consistency
> check" is some kind of maintance procedure.
>
> What do you think?
>
> On Tue, 16/04/2019 at 12:47 +0300, Anton Vinogradov wrote:
> > Andrey, thanks for tips
> >
> > > > You can perform consistency check using idle verify utility.
> >
> > Could you please point to utility's page?
> > According to its name, it requires to stop the cluster to perform the
> check?
> > That's impossible at real production when you should have downtime less
> > that some minutes per year.
> > So, the only case I see is to use online check during periods of moderate
> > activity.
> >
> > > > Recovery tool is good idea
> >
> > This tool is a part of my IEP.
> > But recovery tool (process)
> > - will allow you to check entries in memory only (otherwise, you will
> warm
> > up the cluster incorrectly), and that's a problem when you have
> > persisted/in_memory rate > 10:1
> > - will cause latency drop for some (eg. 90+ percentile) requests, which
> is
> > not acceptable for real production, when we have strict SLA.
> > - will not guarantee that each operation will use consistent data,
> > sometimes it's extremely essential
> > so, the process is a cool idea, but, sometime you may need more.
> >
> > Ivan, thanks for analysis
> >
> > > > why it comes as an on-demand enabled proxy but not as a mode enabled
> by
> >
> > some configuration property
> > It's a bad idea to have this feature permanently enabled, it slows down
> the
> > system by design.
> > Customer should be able to change strategy on the fly according to time
> > periods or load.
> > Also, we're going to use this proxy for odd requests or for every 5-th,
> > 10-th, 100-th request depends on the load/time/SLA/etc.
> > The goal is to perform as much as possible gets-with-consistency
> operations
> > without stopping the cluster and never find a problem :)
> >
> > > > As for me it will be great if we can improve consistency guarantees
> >
> > provided by default.
> > Once you checked backups you decreased throughput and increased latency.
> > This feature requred only for some financial, nuclear, health systems
> when
> > you should be additionally sure about consistency.
> > It's like a
> > - read from backups
> > - data modification outside the transaction
> > - using FULL_ASYNC instead of FULL_SYNC,
> > sometimes it's possible, sometimes not.
> >
> > > > 1. It sounds suspicious that reads can cause writes (unexpected
> >
> > deadlocks might be possible).
> > Code performs writes
> > - key per additional transaction in case original tx was OPTIMISTIC ||
> > READ_COMMITTED,
> > - all keys per same tx in case original tx was PESSIMISTIC &&
> > !READ_COMMITTED, since you already obtain the locks,
> > so, deadlock should be impossible.
> >
> > > > 2. I do not believe that it is possible to implement a (bugless?)
> >
> > feature which will fix other bugs.
> > It does not fix the bugs, it looks for inconsistency (no matter how it
> > happened) and reports using events (previous state and how it was fixed).
> > This allows continuing processing for all the entries, even inconsistent.
> > But, each such fix should be rechecked manually, for sure.
> >
> > On Tue, Apr 16, 2019 at 11:39 AM Павлухин Иван 
> wrote:
> >
> > > Anton,
> > >
> > > Thank you for your effort for improving consistency guarantees
> > > provided by Ignite.
> > >
> > > The subject sounds really vital. Could you please elaborate why it
> > > comes as an on-demand enabled proxy but not as a mode enabled by some
> > > configuration property (or even as a default behavior)? How do you see
> > > the future development of such consistency checks? As for me it will
> > > be great if we can 

Re: Consistency check and fix (review request)

2019-04-16 Thread Nikolay Izhikov
Hello, Anton.

> Customer should be able to change strategy on the fly according to time> 
> periods or load.

I think we should allow the administrator to enable/disable the consistency check.
This option shouldn't be related to application code because a "consistency
check" is a kind of maintenance procedure.

What do you think?

On Tue, 16/04/2019 at 12:47 +0300, Anton Vinogradov wrote:
> Andrey, thanks for tips
> 
> > > You can perform consistency check using idle verify utility.
> 
> Could you please point to utility's page?
> According to its name, it requires to stop the cluster to perform the check?
> That's impossible at real production when you should have downtime less
> that some minutes per year.
> So, the only case I see is to use online check during periods of moderate
> activity.
> 
> > > Recovery tool is good idea
> 
> This tool is a part of my IEP.
> But recovery tool (process)
> - will allow you to check entries in memory only (otherwise, you will warm
> up the cluster incorrectly), and that's a problem when you have
> persisted/in_memory rate > 10:1
> - will cause latency drop for some (eg. 90+ percentile) requests, which is
> not acceptable for real production, when we have strict SLA.
> - will not guarantee that each operation will use consistent data,
> sometimes it's extremely essential
> so, the process is a cool idea, but, sometime you may need more.
> 
> Ivan, thanks for analysis
> 
> > > why it comes as an on-demand enabled proxy but not as a mode enabled by
> 
> some configuration property
> It's a bad idea to have this feature permanently enabled, it slows down the
> system by design.
> Customer should be able to change strategy on the fly according to time
> periods or load.
> Also, we're going to use this proxy for odd requests or for every 5-th,
> 10-th, 100-th request depends on the load/time/SLA/etc.
> The goal is to perform as much as possible gets-with-consistency operations
> without stopping the cluster and never find a problem :)
> 
> > > As for me it will be great if we can improve consistency guarantees
> 
> provided by default.
> Once you checked backups you decreased throughput and increased latency.
> This feature requred only for some financial, nuclear, health systems when
> you should be additionally sure about consistency.
> It's like a
> - read from backups
> - data modification outside the transaction
> - using FULL_ASYNC instead of FULL_SYNC,
> sometimes it's possible, sometimes not.
> 
> > > 1. It sounds suspicious that reads can cause writes (unexpected
> 
> deadlocks might be possible).
> Code performs writes
> - key per additional transaction in case original tx was OPTIMISTIC ||
> READ_COMMITTED,
> - all keys per same tx in case original tx was PESSIMISTIC &&
> !READ_COMMITTED, since you already obtain the locks,
> so, deadlock should be impossible.
> 
> > > 2. I do not believe that it is possible to implement a (bugless?)
> 
> feature which will fix other bugs.
> It does not fix the bugs, it looks for inconsistency (no matter how it
> happened) and reports using events (previous state and how it was fixed).
> This allows continuing processing for all the entries, even inconsistent.
> But, each such fix should be rechecked manually, for sure.
> 
> On Tue, Apr 16, 2019 at 11:39 AM Павлухин Иван  wrote:
> 
> > Anton,
> > 
> > Thank you for your effort for improving consistency guarantees
> > provided by Ignite.
> > 
> > The subject sounds really vital. Could you please elaborate why it
> > comes as an on-demand enabled proxy but not as a mode enabled by some
> > configuration property (or even as a default behavior)? How do you see
> > the future development of such consistency checks? As for me it will
> > be great if we can improve consistency guarantees provided by default.
> > 
> > Also thinking loud a bit:
> > 1. It sounds suspicious that reads can cause writes (unexpected
> > deadlocks might be possible).
> > 2. I do not believe that it is possible to implement a (bugless?)
> > feature which will fix other bugs.
> > 3. A storage (or database) product (Ignite in our case) consistency is
> > not equal to a user application consistency. So, it might be that
> > introduced checks are insufficient to make business applications
> > happy.
> > 
> > Mon, Apr 15, 2019 at 19:27, Andrey Gura :
> > > 
> > > Anton,
> > > 
> > > I'm trying tell you that this proxy can produce false positive result,
> > > incorrect result and just hide bugs. What will the next solution?
> > > withNoBugs proxy?
> > > 
> > > You can perform consistency check using idle verify utility. Recovery
> > > tool is good idea but user should trigger this process, not some cache
> > > proxy implementation.
> > > 
> > > On Mon, Apr 15, 2019 at 5:34 PM Anton Vinogradov  wrote:
> > > > 
> > > > Seems, we already fixed all bugs caused this feature, but there is no
> > > > warranty we will not create new :)
> > > > This proxy is just checker that consistency is ok.
> > > > 
> > > > > > reaching 

Re: Consistency check and fix (review request)

2019-04-16 Thread Anton Vinogradov
Andrey, thanks for the tips

>> You can perform consistency check using idle verify utility.
Could you please point me to the utility's page?
Judging by its name, it requires stopping the cluster to perform the check?
That's impossible in real production, where downtime must be less
than a few minutes per year.
So, the only option I see is an online check during periods of moderate
activity.

>> Recovery tool is good idea
This tool is a part of my IEP.
But a recovery tool (process)
- will allow you to check entries in memory only (otherwise, you will warm
up the cluster incorrectly), which is a problem when your
persisted/in-memory ratio is > 10:1;
- will cause a latency drop for some (e.g. 90+ percentile) requests, which is
not acceptable in real production where we have a strict SLA;
- will not guarantee that each operation uses consistent data, which is
sometimes extremely essential.
So, the process is a cool idea, but sometimes you may need more.

Ivan, thanks for the analysis

>> why it comes as an on-demand enabled proxy but not as a mode enabled by
some configuration property
It's a bad idea to have this feature permanently enabled; it slows down the
system by design.
The customer should be able to change the strategy on the fly according to
time periods or load.
Also, we're going to use this proxy for odd requests, or for every 5th,
10th, or 100th request, depending on the load/time/SLA/etc.
The goal is to perform as many gets-with-consistency operations as possible
without stopping the cluster, and to never find a problem :)
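A minimal sketch of that sampling, assuming a simple counter-based rule (the class and its API are illustrative, not part of Ignite):

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Illustrative sketch of the sampling idea: route only every N-th get
 * through the consistency-checking proxy, so the cost stays bounded while
 * the key space is still covered over time.
 */
final class ConsistencySampler {
    private final int every;
    private final AtomicLong counter = new AtomicLong();

    ConsistencySampler(int every) {
        this.every = every;
    }

    /** True for request numbers N, 2N, 3N, ... (thread-safe counter). */
    boolean shouldCheck() {
        return counter.incrementAndGet() % every == 0;
    }
}
```

With `every = 5` only 20% of gets pay the extra cost; the ratio can be retuned on the fly according to load, time period, or SLA.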

>> As for me it will be great if we can improve consistency guarantees
provided by default.
Once you check backups, you decrease throughput and increase latency.
This feature is required only for some financial, nuclear, or health systems
where you must be additionally sure about consistency.
It's like
- reading from backups,
- modifying data outside the transaction,
- using FULL_ASYNC instead of FULL_SYNC:
sometimes it's possible, sometimes not.

>> 1. It sounds suspicious that reads can cause writes (unexpected
deadlocks might be possible).
The code performs writes
- one key per additional transaction in case the original tx was OPTIMISTIC ||
READ_COMMITTED,
- all keys within the same tx in case the original tx was PESSIMISTIC &&
!READ_COMMITTED, since you have already obtained the locks,
so a deadlock should be impossible.
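That rule can be written down as plain logic; the enums below mirror Ignite's TransactionConcurrency/TransactionIsolation names, but this class itself is an illustrative sketch, not Ignite code:

```java
/**
 * Illustrative sketch of the repair-write rule described above: locks are
 * already held only in a PESSIMISTIC, non-READ_COMMITTED transaction, so
 * only then can the fix be written inside the same transaction; otherwise
 * each key is repaired in its own separate transaction.
 */
final class RepairWriteMode {
    enum Concurrency { OPTIMISTIC, PESSIMISTIC }
    enum Isolation { READ_COMMITTED, REPEATABLE_READ, SERIALIZABLE }

    /** True when the fix may reuse the original transaction. */
    static boolean repairInSameTx(Concurrency c, Isolation i) {
        return c == Concurrency.PESSIMISTIC && i != Isolation.READ_COMMITTED;
    }
}
```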

>> 2. I do not believe that it is possible to implement a (bugless?)
feature which will fix other bugs.
It does not fix the bugs; it looks for inconsistency (no matter how it
happened) and reports it via events (the previous state and how it was fixed).
This allows processing to continue for all entries, even inconsistent ones.
But each such fix should be rechecked manually, for sure.
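For illustration, the fix step picks a winner per the LWW (Last Write Wins) strategy mentioned later in this thread (detailed in IEP-31). This sketch assumes replica versions can be compared as plain longs; Ignite actually compares its own version objects:

```java
import java.util.List;

/**
 * Illustrative sketch of the LWW pick: among the values observed on the
 * primary and the backups, the value carried by the highest version wins.
 */
final class LwwResolver {
    record Replica(String node, long version, String value) {}

    /** Returns the value of the replica with the highest version. */
    static String resolve(List<Replica> replicas) {
        Replica winner = replicas.get(0);
        for (Replica r : replicas)
            if (r.version() > winner.version())
                winner = r;
        return winner.value();
    }
}
```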

On Tue, Apr 16, 2019 at 11:39 AM Павлухин Иван  wrote:

> Anton,
>
> Thank you for your effort for improving consistency guarantees
> provided by Ignite.
>
> The subject sounds really vital. Could you please elaborate why it
> comes as an on-demand enabled proxy but not as a mode enabled by some
> configuration property (or even as a default behavior)? How do you see
> the future development of such consistency checks? As for me it will
> be great if we can improve consistency guarantees provided by default.
>
> Also thinking loud a bit:
> 1. It sounds suspicious that reads can cause writes (unexpected
> deadlocks might be possible).
> 2. I do not believe that it is possible to implement a (bugless?)
> feature which will fix other bugs.
> 3. A storage (or database) product (Ignite in our case) consistency is
> not equal to a user application consistency. So, it might be that
> introduced checks are insufficient to make business applications
> happy.
>
> Mon, Apr 15, 2019 at 19:27, Andrey Gura :
> >
> > Anton,
> >
> > I'm trying tell you that this proxy can produce false positive result,
> > incorrect result and just hide bugs. What will the next solution?
> > withNoBugs proxy?
> >
> > You can perform consistency check using idle verify utility. Recovery
> > tool is good idea but user should trigger this process, not some cache
> > proxy implementation.
> >
> > On Mon, Apr 15, 2019 at 5:34 PM Anton Vinogradov  wrote:
> > >
> > > Seems, we already fixed all bugs caused this feature, but there is no
> > > warranty we will not create new :)
> > > This proxy is just checker that consistency is ok.
> > >
> > > >> reaching bugless implementation
> > > Not sure it's possible. Once you have software it contains bugs.
> > > This proxy will tell you whether these bugs lead to inconsistency.
> > >
> > > On Mon, Apr 15, 2019 at 5:19 PM Andrey Gura  wrote:
> > >
> > > > Method name is minor problem. I still believe that there is no need
> > > > for this proxy because there are no any guarantees about bugless
> > > > implementation this functionality. Better way is reaching bugless
> > > > implementation of current functionality.
> > > >
> > > > On Mon, Apr 15, 2019 at 4:51 PM Anton Vinogradov 
> wrote:
> > > > >
> > > > > Andrey,
> > > > >
> > > > > 

Re: Consistency check and fix (review request)

2019-04-16 Thread Павлухин Иван
Anton,

Thank you for your effort to improve the consistency guarantees
provided by Ignite.

The subject sounds really vital. Could you please elaborate why it
comes as an on-demand enabled proxy but not as a mode enabled by some
configuration property (or even as a default behavior)? How do you see
the future development of such consistency checks? As for me it will
be great if we can improve consistency guarantees provided by default.

Also, thinking out loud a bit:
1. It sounds suspicious that reads can cause writes (unexpected
deadlocks might be possible).
2. I do not believe that it is possible to implement a (bugless?)
feature which will fix other bugs.
3. A storage (or database) product (Ignite in our case) consistency is
not equal to a user application consistency. So, it might be that
introduced checks are insufficient to make business applications
happy.

Mon, Apr 15, 2019 at 19:27, Andrey Gura :
>
> Anton,
>
> I'm trying tell you that this proxy can produce false positive result,
> incorrect result and just hide bugs. What will the next solution?
> withNoBugs proxy?
>
> You can perform consistency check using idle verify utility. Recovery
> tool is good idea but user should trigger this process, not some cache
> proxy implementation.
>
> On Mon, Apr 15, 2019 at 5:34 PM Anton Vinogradov  wrote:
> >
> > Seems, we already fixed all bugs caused this feature, but there is no
> > warranty we will not create new :)
> > This proxy is just checker that consistency is ok.
> >
> > >> reaching bugless implementation
> > Not sure it's possible. Once you have software it contains bugs.
> > This proxy will tell you whether these bugs lead to inconsistency.
> >
> > On Mon, Apr 15, 2019 at 5:19 PM Andrey Gura  wrote:
> >
> > > Method name is minor problem. I still believe that there is no need
> > > for this proxy because there are no any guarantees about bugless
> > > implementation this functionality. Better way is reaching bugless
> > > implementation of current functionality.
> > >
> > > On Mon, Apr 15, 2019 at 4:51 PM Anton Vinogradov  wrote:
> > > >
> > > > Andrey,
> > > >
> > > > >> It means also that at least method name is bad.
> > > > Agreed, already discussed with Aleksey Plekhanov.
> > > > Decided that ".withConsistencyCheck()" is a proper name.
> > > >
> > > > >> What is the profit?
> > > > This proxy allows to check (and fix) is there any consistency violation
> > > > across the topology.
> > > > The proxy will check all backups contain the same values as primary.
> > > > So, when it's possible (you're ready to spend resources for this check)
> > > you
> > > > will be able to read-with-consistency-check.
> > > > This will decrease the amount of "inconsistency caused
> > > > war/strikes/devastation" situations, which is important for financial
> > > > systems.
> > > >
> > > > On Mon, Apr 15, 2019 at 3:58 PM Andrey Gura  wrote:
> > > >
> > > > > Anton,
> > > > >
> > > > > what does expression "withConsistency" mean? From user's standpoint it
> > > > > means that all operations performed without this proxy are not
> > > > > consistent. It means also that at least method name is bad.
> > > > >
> > > > > Are there any guarantees that withConsistency proxy will not contain
> > > > > bugs that will lead to inconsistent write after inconsistency was
> > > > > found? I think there are no such guarantees. Bugs still are possible.
> > > > > So I always must use withConsistency proxy because I doesn't have
> > > > > other choice - all ways are unreliable and withConsistency just sounds
> > > > > better.
> > > > >
> > > > > Eventually we will have two different ways for working with cache
> > > > > values with different bugs set. What is the profit?
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Apr 12, 2019 at 2:49 PM Anton Vinogradov 
> > > wrote:
> > > > > >
> > > > > > Folks,
> > > > > >
> > > > > > I've checked the tx benchmarks and found no performance drop.
> > > > > > Also, see no issues at TC results.
> > > > > > So, seems, code ready to be merged.
> > > > > >
> > > > > > Everyone interested, please share any objections about
> > > > > > - public API
> > > > > > - test coverage
> > > > > > - implementation approach
> > > > > >
> > > > > > On Wed, Apr 3, 2019 at 5:46 PM Anton Vinogradov 
> > > wrote:
> > > > > >
> > > > > > > Nikolay,
> > > > > > >
> > > > > > > This is not a PoC, but the final solution (I hope so:) ) required
> > > the
> > > > > > > review.
> > > > > > > LWW means Last Write Wins, detailed explanation can be found at
> > > IEP-31.
> > > > > > >
> > > > > > > On Wed, Apr 3, 2019 at 5:24 PM Nikolay Izhikov <
> > > nizhi...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hello, Anton.
> > > > > > >>
> > > > > > >> Thanks for the PoC.
> > > > > > >>
> > > > > > >> > finds correct values according to LWW strategy
> > > > > > >>
> > > > > > >> Can you, please, clarify what is LWW strategy?
> > > > > > >>
> > > > > > >> On Wed, 03/04/2019 at 17:19 +0300, Anton Vinogradov wrote:
> > > 

Re: Consistency check and fix (review request)

2019-04-15 Thread Andrey Gura
Anton,

I'm trying to tell you that this proxy can produce false positives or
incorrect results and simply hide bugs. What will the next solution be?
A withNoBugs proxy?

You can perform a consistency check using the idle verify utility. A recovery
tool is a good idea, but the user should trigger this process, not some cache
proxy implementation.

On Mon, Apr 15, 2019 at 5:34 PM Anton Vinogradov  wrote:
>
> Seems, we already fixed all bugs caused this feature, but there is no
> warranty we will not create new :)
> This proxy is just checker that consistency is ok.
>
> >> reaching bugless implementation
> Not sure it's possible. Once you have software it contains bugs.
> This proxy will tell you whether these bugs lead to inconsistency.
>
> On Mon, Apr 15, 2019 at 5:19 PM Andrey Gura  wrote:
>
> > Method name is minor problem. I still believe that there is no need
> > for this proxy because there are no any guarantees about bugless
> > implementation this functionality. Better way is reaching bugless
> > implementation of current functionality.
> >
> > On Mon, Apr 15, 2019 at 4:51 PM Anton Vinogradov  wrote:
> > >
> > > Andrey,
> > >
> > > >> It means also that at least method name is bad.
> > > Agreed, already discussed with Aleksey Plekhanov.
> > > Decided that ".withConsistencyCheck()" is a proper name.
> > >
> > > >> What is the profit?
> > > This proxy allows to check (and fix) is there any consistency violation
> > > across the topology.
> > > The proxy will check all backups contain the same values as primary.
> > > So, when it's possible (you're ready to spend resources for this check)
> > you
> > > will be able to read-with-consistency-check.
> > > This will decrease the amount of "inconsistency caused
> > > war/strikes/devastation" situations, which is important for financial
> > > systems.
> > >
> > > On Mon, Apr 15, 2019 at 3:58 PM Andrey Gura  wrote:
> > >
> > > > Anton,
> > > >
> > > > what does expression "withConsistency" mean? From user's standpoint it
> > > > means that all operations performed without this proxy are not
> > > > consistent. It means also that at least method name is bad.
> > > >
> > > > Are there any guarantees that withConsistency proxy will not contain
> > > > bugs that will lead to inconsistent write after inconsistency was
> > > > found? I think there are no such guarantees. Bugs still are possible.
> > > > So I always must use withConsistency proxy because I doesn't have
> > > > other choice - all ways are unreliable and withConsistency just sounds
> > > > better.
> > > >
> > > > Eventually we will have two different ways for working with cache
> > > > values with different bugs set. What is the profit?
> > > >
> > > >
> > > >
> > > > On Fri, Apr 12, 2019 at 2:49 PM Anton Vinogradov 
> > wrote:
> > > > >
> > > > > Folks,
> > > > >
> > > > > I've checked the tx benchmarks and found no performance drop.
> > > > > Also, see no issues at TC results.
> > > > > So, seems, code ready to be merged.
> > > > >
> > > > > Everyone interested, please share any objections about
> > > > > - public API
> > > > > - test coverage
> > > > > - implementation approach
> > > > >
> > > > > On Wed, Apr 3, 2019 at 5:46 PM Anton Vinogradov 
> > wrote:
> > > > >
> > > > > > Nikolay,
> > > > > >
> > > > > > This is not a PoC, but the final solution (I hope so:) ) required
> > the
> > > > > > review.
> > > > > > LWW means Last Write Wins, detailed explanation can be found at
> > IEP-31.
> > > > > >
> > > > > > On Wed, Apr 3, 2019 at 5:24 PM Nikolay Izhikov <
> > nizhi...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> Hello, Anton.
> > > > > >>
> > > > > >> Thanks for the PoC.
> > > > > >>
> > > > > >> > finds correct values according to LWW strategy
> > > > > >>
> > > > > >> Can you, please, clarify what is LWW strategy?
> > > > > >>
> > > > > >> On Wed, 03/04/2019 at 17:19 +0300, Anton Vinogradov wrote:
> > > > > >> > Ilya,
> > > > > >> >
> > > > > >> > This is impossible due to a conflict between some isolation
> > levels
> > > > and
> > > > > >> > get-with-consistency expectations.
> > > > > >> > Basically, it's impossible to perform get-with-consistency
> > after the
> > > > > >> other
> > > > > >> > get at !READ_COMMITTED transaction.
> > > > > >> > The problem here is that value should be cached according to the
> > > > > >> isolation
> > > > > >> > level, so get-with-consistency is restricted in this case.
> > > > > >> > Same problem we have at case get-with-consistency after put, so
> > we
> > > > have
> > > > > >> > restriction here too.
> > > > > >> > So, the order matter. :)
> > > > > >> >
> > > > > >> > See OperationRestrictionsCacheConsistencyTest [1] for details.
> > > > > >> >
> > > > > >> > [1]
> > > > > >> >
> > > > > >>
> > > >
> > https://github.com/apache/ignite/blob/8b0b0c3e1bde93ff9c4eb5667d794dd64a8b06f0/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/consistency/OperationRestrictionsCacheConsistencyTest.java
> > > > > >> >
> > > > > >> > On Wed, Apr 3, 2019 at 4:54 

Re: Consistency check and fix (review request)

2019-04-15 Thread Anton Vinogradov
It seems we have already fixed all the bugs that motivated this feature, but
there is no guarantee that we will not create new ones :)
This proxy is just a checker that consistency is OK.

>> reaching bugless implementation
Not sure that's possible. Once you have software, it contains bugs.
This proxy will tell you whether these bugs lead to inconsistency.

On Mon, Apr 15, 2019 at 5:19 PM Andrey Gura  wrote:

> Method name is minor problem. I still believe that there is no need
> for this proxy because there are no any guarantees about bugless
> implementation this functionality. Better way is reaching bugless
> implementation of current functionality.
>
> On Mon, Apr 15, 2019 at 4:51 PM Anton Vinogradov  wrote:
> >
> > Andrey,
> >
> > >> It means also that at least method name is bad.
> > Agreed, already discussed with Aleksey Plekhanov.
> > Decided that ".withConsistencyCheck()" is a proper name.
> >
> > >> What is the profit?
> > This proxy allows to check (and fix) is there any consistency violation
> > across the topology.
> > The proxy will check all backups contain the same values as primary.
> > So, when it's possible (you're ready to spend resources for this check)
> you
> > will be able to read-with-consistency-check.
> > This will decrease the amount of "inconsistency caused
> > war/strikes/devastation" situations, which is important for financial
> > systems.
> >
> > On Mon, Apr 15, 2019 at 3:58 PM Andrey Gura  wrote:
> >
> > > Anton,
> > >
> > > what does expression "withConsistency" mean? From user's standpoint it
> > > means that all operations performed without this proxy are not
> > > consistent. It means also that at least method name is bad.
> > >
> > > Are there any guarantees that withConsistency proxy will not contain
> > > bugs that will lead to inconsistent write after inconsistency was
> > > found? I think there are no such guarantees. Bugs still are possible.
> > > So I always must use withConsistency proxy because I doesn't have
> > > other choice - all ways are unreliable and withConsistency just sounds
> > > better.
> > >
> > > Eventually we will have two different ways for working with cache
> > > values with different bugs set. What is the profit?
> > >
> > >
> > >
> > > On Fri, Apr 12, 2019 at 2:49 PM Anton Vinogradov 
> wrote:
> > > >
> > > > Folks,
> > > >
> > > > I've checked the tx benchmarks and found no performance drop.
> > > > Also, see no issues at TC results.
> > > > So, seems, code ready to be merged.
> > > >
> > > > Everyone interested, please share any objections about
> > > > - public API
> > > > - test coverage
> > > > - implementation approach
> > > >
> > > > On Wed, Apr 3, 2019 at 5:46 PM Anton Vinogradov 
> wrote:
> > > >
> > > > > Nikolay,
> > > > >
> > > > > This is not a PoC, but the final solution (I hope so:) ) required
> the
> > > > > review.
> > > > > LWW means Last Write Wins, detailed explanation can be found at
> IEP-31.
> > > > >
> > > > > On Wed, Apr 3, 2019 at 5:24 PM Nikolay Izhikov <
> nizhi...@apache.org>
> > > > > wrote:
> > > > >
> > > > >> Hello, Anton.
> > > > >>
> > > > >> Thanks for the PoC.
> > > > >>
> > > > >> > finds correct values according to LWW strategy
> > > > >>
> > > > >> Can you, please, clarify what is LWW strategy?
> > > > >>
> > > > >> On Wed, 03/04/2019 at 17:19 +0300, Anton Vinogradov wrote:
> > > > >> > Ilya,
> > > > >> >
> > > > >> > This is impossible due to a conflict between some isolation
> levels
> > > and
> > > > >> > get-with-consistency expectations.
> > > > >> > Basically, it's impossible to perform get-with-consistency
> after the
> > > > >> other
> > > > >> > get at !READ_COMMITTED transaction.
> > > > >> > The problem here is that value should be cached according to the
> > > > >> isolation
> > > > >> > level, so get-with-consistency is restricted in this case.
> > > > >> > Same problem we have at case get-with-consistency after put, so
> we
> > > have
> > > > >> > restriction here too.
> > > > >> > So, the order matter. :)
> > > > >> >
> > > > >> > See OperationRestrictionsCacheConsistencyTest [1] for details.
> > > > >> >
> > > > >> > [1]
> > > > >> >
> > > > >>
> > >
> https://github.com/apache/ignite/blob/8b0b0c3e1bde93ff9c4eb5667d794dd64a8b06f0/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/consistency/OperationRestrictionsCacheConsistencyTest.java
> > > > >> >
> > > > >> > On Wed, Apr 3, 2019 at 4:54 PM Ilya Kasnacheev <
> > > > >> ilya.kasnach...@gmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hello!
> > > > >> > >
> > > > >> > > Sounds useful especially for new feature development.
> > > > >> > >
> > > > >> > > Can you do a run of all tests with cache.forConsistency(),
> see if
> > > > >> there are
> > > > >> > > cases that fail?
> > > > >> > >
> > > > >> > > Regards,
> > > > >> > > --
> > > > >> > > Ilya Kasnacheev
> > > > >> > >
> > > > >> > >
> > > > >> > > ср, 3 апр. 2019 г. в 16:17, Anton Vinogradov :
> > > > >> > >
> > > > >> > > > Igniters,
> > > > >> > > >
> > > > >> > > > Sometimes, at real 

Re: Consistency check and fix (review request)

2019-04-15 Thread Andrey Gura
The method name is a minor problem. I still believe that there is no need
for this proxy, because there are no guarantees that its implementation
is bug-free. A better way is to reach a bug-free implementation of the
current functionality.


Re: Consistency check and fix (review request)

2019-04-15 Thread Anton Vinogradov
Andrey,

>> It means also that at least method name is bad.
Agreed; I already discussed this with Aleksey Plekhanov.
We decided that ".withConsistencyCheck()" is a proper name.

>> What is the profit?
This proxy allows checking (and fixing) any consistency violation
across the topology.
The proxy will check that all backups contain the same values as the primary.
So, when it's feasible (you're ready to spend resources on this check), you
will be able to read with a consistency check.
This will decrease the number of "inconsistency caused
war/strikes/devastation" situations, which is important for financial
systems.


Re: Consistency check and fix (review request)

2019-04-15 Thread Andrey Gura
Anton,

what does the expression "withConsistency" mean? From the user's standpoint
it means that all operations performed without this proxy are not
consistent. It also means that at least the method name is bad.

Are there any guarantees that the withConsistency proxy will not contain
bugs that lead to an inconsistent write after an inconsistency was
found? I think there are no such guarantees. Bugs are still possible.
So I must always use the withConsistency proxy, because I don't have any
other choice - all ways are unreliable and withConsistency just sounds
better.

Eventually we will have two different ways of working with cache
values, each with its own set of bugs. What is the profit?




Re: Consistency check and fix (review request)

2019-04-12 Thread Anton Vinogradov
Folks,

I've checked the tx benchmarks and found no performance drop.
Also, I see no issues in the TC results.
So it seems the code is ready to be merged.

Everyone interested, please share any objections about
- public API
- test coverage
- implementation approach



Re: Consistency check and fix (review request)

2019-04-03 Thread Nikolay Izhikov
Hello, Anton.

Thanks for the PoC.

> finds correct values according to LWW strategy

Can you please clarify what the LWW strategy is?



Re: Consistency check and fix (review request)

2019-04-03 Thread Anton Vinogradov
Ilya,

This is impossible due to a conflict between some isolation levels and
get-with-consistency expectations.
Basically, it's impossible to perform get-with-consistency after another
plain get in a !READ_COMMITTED transaction.
The problem here is that the value must be cached according to the isolation
level, so get-with-consistency is restricted in this case.
We have the same problem with get-with-consistency after put, so there is a
restriction here too.
So, the order matters. :)

See OperationRestrictionsCacheConsistencyTest [1] for details.

[1]
https://github.com/apache/ignite/blob/8b0b0c3e1bde93ff9c4eb5667d794dd64a8b06f0/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/consistency/OperationRestrictionsCacheConsistencyTest.java
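The restriction can be illustrated with a tiny, Ignite-independent sketch (all names here are invented for illustration, not Ignite code): under a !READ_COMMITTED isolation level a transaction caches the first read of a key, so a later get-with-consistency would be forced to return the cached value and could not go to the other nodes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a per-transaction read cache under REPEATABLE_READ-like
// isolation: once a key is read, subsequent reads must return the cached value.
class TxReadCache<K, V> {
    private final Map<K, V> reads = new HashMap<>();

    // First read loads the value; repeated reads return the cached one.
    V get(K key, Function<K, V> loader) {
        return reads.computeIfAbsent(key, loader);
    }

    // A get-with-consistency is only meaningful before the key was read;
    // afterwards the isolation level pins the result to the cached value.
    boolean consistentGetAllowed(K key) {
        return !reads.containsKey(key);
    }
}
```

This is why the order of operations matters: once a plain get (or a put) has fixed the value for the transaction, a cross-node check could not change what the transaction observes.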



Re: Consistency check and fix (review request)

2019-04-03 Thread Ilya Kasnacheev
Hello!

Sounds useful, especially for new feature development.

Can you do a run of all tests with cache.forConsistency() and see if there
are cases that fail?

Regards,
-- 
Ilya Kasnacheev




Consistency check and fix (review request)

2019-04-03 Thread Anton Vinogradov
Igniters,

Sometimes, in real deployments, we're faced with an inconsistent state across
the topology.
This means that somehow we have different values for the same key at
different nodes.
This is an extremely rare situation, but when you have thousands of
terabytes of data, it can be a real problem.

Apache Ignite provides a consistency guarantee: each affinity node should
contain the same value for the same key, at least eventually.
But this guarantee can be violated because of bugs; see IEP-31 [1] for
details.

So, I created the issue [2] to handle such situations.
The main idea is to have a special cache.withConsistency() proxy that allows
checking and fixing inconsistency on get operations.
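A toy, Ignite-independent sketch of the check-on-get idea (class and method names are invented for illustration; this is not the PR code): a plain get trusts the primary copy, while the "with consistency" view verifies every replica on read.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy check-on-get proxy over an in-memory replica map (index 0 = primary).
class ToyConsistencyProxy {
    final Map<String, List<String>> replicas;

    ToyConsistencyProxy(Map<String, List<String>> replicas) {
        this.replicas = replicas;
    }

    // A plain get returns the primary copy without any cross-replica check.
    String getPlain(String key) {
        return replicas.get(key).get(0);
    }

    // The consistency view additionally verifies that all replicas agree.
    String getWithConsistency(String key) {
        List<String> copies = replicas.get(key);
        for (String copy : copies)
            if (!Objects.equals(copies.get(0), copy))
                throw new IllegalStateException("Inconsistent copies for key: " + key);
        return copies.get(0);
    }
}
```

In the actual proposal the repair side (writing the winning value back on commit) happens inside the transaction, which the toy above does not model.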

I've created PR [3] with the following improvements (when the
cache.withConsistency() proxy is used):

- PESSIMISTIC && !READ_COMMITTED transaction
-- checks values across the topology (under locks),
-- finds correct values according to LWW strategy,
-- records a special event in case a consistency violation is found (it
contains the map of inconsistent values and the last values),
-- enlists writes with latest value for each inconsistent key, so it will
be written on tx.commit().

- OPTIMISTIC || READ_COMMITTED transactions
-- checks values across the topology (not under locks, so false positives
are possible),
-- starts a PESSIMISTIC && SERIALIZABLE transaction (in a separate thread)
for each possibly broken key and fixes it on commit if necessary,
-- the original transaction performs a get-after-fix and can be continued if
the fix does not conflict with its isolation level.
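
The LWW step above can be sketched as a tiny, self-contained example. This is
an illustration only, not the actual Ignite code: the NodeValue record, the
resolve method, and the version field are invented for this sketch.

```java
import java.util.*;

// Hypothetical sketch of LWW (last-write-wins) reconciliation: given the
// copy of a key that each affinity node holds, pick the copy with the
// highest version as "correct" and report which nodes need a fix-up write.
public class LwwFixSketch {
    // Simplified stand-in for one node's versioned cache entry.
    record NodeValue(String nodeId, long version, String value) {}

    static String resolve(List<NodeValue> copies, List<String> nodesToFix) {
        // The entry with the highest version wins under LWW.
        NodeValue latest = copies.stream()
            .max(Comparator.comparingLong(NodeValue::version))
            .orElseThrow();

        // Every node holding a stale version gets the latest value on tx.commit().
        for (NodeValue nv : copies)
            if (nv.version() != latest.version())
                nodesToFix.add(nv.nodeId());

        return latest.value();
    }

    public static void main(String[] args) {
        List<NodeValue> copies = List.of(
            new NodeValue("node1", 5, "A"),
            new NodeValue("node2", 7, "B"),   // latest write wins
            new NodeValue("node3", 5, "A"));

        List<String> toFix = new ArrayList<>();
        String winner = resolve(copies, toFix);

        System.out.println(winner + " " + toFix); // B [node1, node3]
    }
}
```

In the real proposal this resolution happens under locks for PESSIMISTIC &&
!READ_COMMITTED transactions, so the "latest" value cannot change mid-check.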

Future plans
- Consistency guard (special process periodically checks we have no
inconsistency).
- MVCC support.
- Atomic caches support.
- Thin client support.
- SQL support.
- Read-with-consistency before write operations.
- Single-key read-with-consistency optimization (currently the
collection-based approach is used every time).
- Skipping read-with-consistency for a key that was consistently read
recently.

[1]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-31+Consistency+check+and+fix
[2] https://issues.apache.org/jira/browse/IGNITE-10663
[3] https://github.com/apache/ignite/pull/5656


Re: Review request for IGNITE-8740: Support reuse of already initialized Ignite in IgniteSpringBean

2018-06-23 Thread Valentin Kulichenko
Dmitry,

Thanks for pointing this out. Fixed in master and 2.6.

-Val

On Sat, Jun 23, 2018 at 10:49 AM Dmitriy Govorukhin <
dmitriy.govoruk...@gmail.com> wrote:

> Valentin,
>
> Seems that these changes have classes without license head. TC link
> 
>
> /data/teamcity/work/c182b70f2dfa6507/modules/spring/src/test/java/org/apache/ignite/transactions/spring/GridSpringTransactionManagerSpringBeanSelfTest.java
> /data/teamcity/work/c182b70f2dfa6507/modules/spring/src/test/java/org/apache/ignite/transactions/spring/GridSpringTransactionManagerAbstractTest.java
>
> Please, add headers for these classes.
>
> On Sat, Jun 23, 2018 at 2:33 AM Amir Akhmedov 
> wrote:
>
>> Great, thanks! As always, happy to contribute!
>>
>> Thanks,
>> Amir
>>
>>
>> On Fri, Jun 22, 2018 at 7:32 PM Valentin Kulichenko <
>> valentin.kuliche...@gmail.com> wrote:
>>
>> > Amir,
>> >
>> > I merged you change to master and 2.6. Thanks!
>> >
>> > -Val
>> >
>> > On Fri, Jun 22, 2018 at 4:21 PM Amir Akhmedov 
>> > wrote:
>> >
>> >> Val,
>> >> I replied to it already :)
>> >>
>> >> Thanks,
>> >> Amir
>> >>
>> >>
>> >> On Fri, Jun 22, 2018 at 7:20 PM Valentin Kulichenko <
>> >> valentin.kuliche...@gmail.com> wrote:
>> >>
>> >>> Amir,
>> >>>
>> >>> Thanks for quick reaction. I added a follow up question in the ticket.
>> >>>
>> >>> -Val
>> >>>
>> >>> On Fri, Jun 22, 2018 at 3:48 PM Amir Akhmedov <
>> amir.akhme...@gmail.com>
>> >>> wrote:
>> >>>
>>  Hi Val,
>>  Thanks for your comments. I replied in the ticket with my vision of
>> the
>>  issue and how I tried to solve it. Please check it and let me know.
>> 
>>  Thanks,
>>  Amir
>> 
>> 
>>  On Fri, Jun 22, 2018 at 5:42 PM Valentin Kulichenko <
>>  valentin.kuliche...@gmail.com> wrote:
>> 
>> > Hi Amir,
>> >
>> > I reviewed the changes and I'm not sure I understood how they fix
>> they
>> > issue. I left more detailed comment in the ticket, can you please
>> clarify?
>> >
>> > -Val
>> >
>> > On Fri, Jun 22, 2018 at 9:53 AM Dmitry Pavlov <
>> dpavlov@gmail.com>
>> > wrote:
>> >
>> >> Hi Amir,
>> >>
>> >> let me say sincere thank you for continuing to contribute.
>> >>
>> >> Bumping up this thread.
>> >>
>> >> Igniters, who has an expertise here?
>> >>
>> >>
>> >> вс, 17 июн. 2018 г. в 17:59, Amir Akhmedov <
>> amir.akhme...@gmail.com>:
>> >>
>> >> > Hi All,
>> >> > Can you please review my changes for IGNITE-8740.
>> >> >
>> >> > PR: https://github.com/apache/ignite/pull/4208
>> >> > TC:
>> >> >
>> >> >
>> >>
>> https://ci.ignite.apache.org/viewLog.html?buildId=1397283=buildResultsDiv=IgniteTests24Java8_RunAll
>> >> >
>> >> > Thanks,
>> >> > Amir
>> >> >
>> >>
>> >
>>
>


Re: Review request for IGNITE-8740: Support reuse of already initialized Ignite in IgniteSpringBean

2018-06-23 Thread Dmitriy Govorukhin
Valentin,

It seems that these changes have classes without a license header. TC link


/data/teamcity/work/c182b70f2dfa6507/modules/spring/src/test/java/org/apache/ignite/transactions/spring/GridSpringTransactionManagerSpringBeanSelfTest.java
/data/teamcity/work/c182b70f2dfa6507/modules/spring/src/test/java/org/apache/ignite/transactions/spring/GridSpringTransactionManagerAbstractTest.java

Please add license headers to these classes.

On Sat, Jun 23, 2018 at 2:33 AM Amir Akhmedov 
wrote:

> Great, thanks! As always, happy to contribute!
>
> Thanks,
> Amir
>
>
> On Fri, Jun 22, 2018 at 7:32 PM Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
> > Amir,
> >
> > I merged you change to master and 2.6. Thanks!
> >
> > -Val
> >
> > On Fri, Jun 22, 2018 at 4:21 PM Amir Akhmedov 
> > wrote:
> >
> >> Val,
> >> I replied to it already :)
> >>
> >> Thanks,
> >> Amir
> >>
> >>
> >> On Fri, Jun 22, 2018 at 7:20 PM Valentin Kulichenko <
> >> valentin.kuliche...@gmail.com> wrote:
> >>
> >>> Amir,
> >>>
> >>> Thanks for quick reaction. I added a follow up question in the ticket.
> >>>
> >>> -Val
> >>>
> >>> On Fri, Jun 22, 2018 at 3:48 PM Amir Akhmedov  >
> >>> wrote:
> >>>
>  Hi Val,
>  Thanks for your comments. I replied in the ticket with my vision of
> the
>  issue and how I tried to solve it. Please check it and let me know.
> 
>  Thanks,
>  Amir
> 
> 
>  On Fri, Jun 22, 2018 at 5:42 PM Valentin Kulichenko <
>  valentin.kuliche...@gmail.com> wrote:
> 
> > Hi Amir,
> >
> > I reviewed the changes and I'm not sure I understood how they fix
> they
> > issue. I left more detailed comment in the ticket, can you please
> clarify?
> >
> > -Val
> >
> > On Fri, Jun 22, 2018 at 9:53 AM Dmitry Pavlov  >
> > wrote:
> >
> >> Hi Amir,
> >>
> >> let me say sincere thank you for continuing to contribute.
> >>
> >> Bumping up this thread.
> >>
> >> Igniters, who has an expertise here?
> >>
> >>
> >> вс, 17 июн. 2018 г. в 17:59, Amir Akhmedov  >:
> >>
> >> > Hi All,
> >> > Can you please review my changes for IGNITE-8740.
> >> >
> >> > PR: https://github.com/apache/ignite/pull/4208
> >> > TC:
> >> >
> >> >
> >>
> https://ci.ignite.apache.org/viewLog.html?buildId=1397283=buildResultsDiv=IgniteTests24Java8_RunAll
> >> >
> >> > Thanks,
> >> > Amir
> >> >
> >>
> >
>


Re: Review request for IGNITE-8740: Support reuse of already initialized Ignite in IgniteSpringBean

2018-06-22 Thread Amir Akhmedov
Great, thanks! As always, happy to contribute!

Thanks,
Amir


On Fri, Jun 22, 2018 at 7:32 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Amir,
>
> I merged you change to master and 2.6. Thanks!
>
> -Val
>
> On Fri, Jun 22, 2018 at 4:21 PM Amir Akhmedov 
> wrote:
>
>> Val,
>> I replied to it already :)
>>
>> Thanks,
>> Amir
>>
>>
>> On Fri, Jun 22, 2018 at 7:20 PM Valentin Kulichenko <
>> valentin.kuliche...@gmail.com> wrote:
>>
>>> Amir,
>>>
>>> Thanks for quick reaction. I added a follow up question in the ticket.
>>>
>>> -Val
>>>
>>> On Fri, Jun 22, 2018 at 3:48 PM Amir Akhmedov 
>>> wrote:
>>>
 Hi Val,
 Thanks for your comments. I replied in the ticket with my vision of the
 issue and how I tried to solve it. Please check it and let me know.

 Thanks,
 Amir


 On Fri, Jun 22, 2018 at 5:42 PM Valentin Kulichenko <
 valentin.kuliche...@gmail.com> wrote:

> Hi Amir,
>
> I reviewed the changes and I'm not sure I understood how they fix they
> issue. I left more detailed comment in the ticket, can you please clarify?
>
> -Val
>
> On Fri, Jun 22, 2018 at 9:53 AM Dmitry Pavlov 
> wrote:
>
>> Hi Amir,
>>
>> let me say sincere thank you for continuing to contribute.
>>
>> Bumping up this thread.
>>
>> Igniters, who has an expertise here?
>>
>>
>> вс, 17 июн. 2018 г. в 17:59, Amir Akhmedov :
>>
>> > Hi All,
>> > Can you please review my changes for IGNITE-8740.
>> >
>> > PR: https://github.com/apache/ignite/pull/4208
>> > TC:
>> >
>> >
>> https://ci.ignite.apache.org/viewLog.html?buildId=1397283=buildResultsDiv=IgniteTests24Java8_RunAll
>> >
>> > Thanks,
>> > Amir
>> >
>>
>


Re: Review request for IGNITE-8740: Support reuse of already initialized Ignite in IgniteSpringBean

2018-06-22 Thread Valentin Kulichenko
Amir,

I merged your change to master and 2.6. Thanks!

-Val

On Fri, Jun 22, 2018 at 4:21 PM Amir Akhmedov 
wrote:

> Val,
> I replied to it already :)
>
> Thanks,
> Amir
>
>
> On Fri, Jun 22, 2018 at 7:20 PM Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
>> Amir,
>>
>> Thanks for quick reaction. I added a follow up question in the ticket.
>>
>> -Val
>>
>> On Fri, Jun 22, 2018 at 3:48 PM Amir Akhmedov 
>> wrote:
>>
>>> Hi Val,
>>> Thanks for your comments. I replied in the ticket with my vision of the
>>> issue and how I tried to solve it. Please check it and let me know.
>>>
>>> Thanks,
>>> Amir
>>>
>>>
>>> On Fri, Jun 22, 2018 at 5:42 PM Valentin Kulichenko <
>>> valentin.kuliche...@gmail.com> wrote:
>>>
 Hi Amir,

 I reviewed the changes and I'm not sure I understood how they fix they
 issue. I left more detailed comment in the ticket, can you please clarify?

 -Val

 On Fri, Jun 22, 2018 at 9:53 AM Dmitry Pavlov 
 wrote:

> Hi Amir,
>
> let me say sincere thank you for continuing to contribute.
>
> Bumping up this thread.
>
> Igniters, who has an expertise here?
>
>
> вс, 17 июн. 2018 г. в 17:59, Amir Akhmedov :
>
> > Hi All,
> > Can you please review my changes for IGNITE-8740.
> >
> > PR: https://github.com/apache/ignite/pull/4208
> > TC:
> >
> >
> https://ci.ignite.apache.org/viewLog.html?buildId=1397283=buildResultsDiv=IgniteTests24Java8_RunAll
> >
> > Thanks,
> > Amir
> >
>



Re: Review request for IGNITE-8740: Support reuse of already initialized Ignite in IgniteSpringBean

2018-06-22 Thread Amir Akhmedov
Val,
I replied to it already :)

Thanks,
Amir


On Fri, Jun 22, 2018 at 7:20 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Amir,
>
> Thanks for quick reaction. I added a follow up question in the ticket.
>
> -Val
>
> On Fri, Jun 22, 2018 at 3:48 PM Amir Akhmedov 
> wrote:
>
>> Hi Val,
>> Thanks for your comments. I replied in the ticket with my vision of the
>> issue and how I tried to solve it. Please check it and let me know.
>>
>> Thanks,
>> Amir
>>
>>
>> On Fri, Jun 22, 2018 at 5:42 PM Valentin Kulichenko <
>> valentin.kuliche...@gmail.com> wrote:
>>
>>> Hi Amir,
>>>
>>> I reviewed the changes and I'm not sure I understood how they fix they
>>> issue. I left more detailed comment in the ticket, can you please clarify?
>>>
>>> -Val
>>>
>>> On Fri, Jun 22, 2018 at 9:53 AM Dmitry Pavlov 
>>> wrote:
>>>
 Hi Amir,

 let me say sincere thank you for continuing to contribute.

 Bumping up this thread.

 Igniters, who has an expertise here?


 вс, 17 июн. 2018 г. в 17:59, Amir Akhmedov :

 > Hi All,
 > Can you please review my changes for IGNITE-8740.
 >
 > PR: https://github.com/apache/ignite/pull/4208
 > TC:
 >
 >
 https://ci.ignite.apache.org/viewLog.html?buildId=1397283=buildResultsDiv=IgniteTests24Java8_RunAll
 >
 > Thanks,
 > Amir
 >

>>>


Re: Review request for IGNITE-8740: Support reuse of already initialized Ignite in IgniteSpringBean

2018-06-22 Thread Valentin Kulichenko
Amir,

Thanks for the quick reaction. I added a follow-up question in the ticket.

-Val

On Fri, Jun 22, 2018 at 3:48 PM Amir Akhmedov 
wrote:

> Hi Val,
> Thanks for your comments. I replied in the ticket with my vision of the
> issue and how I tried to solve it. Please check it and let me know.
>
> Thanks,
> Amir
>
>
> On Fri, Jun 22, 2018 at 5:42 PM Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
>> Hi Amir,
>>
>> I reviewed the changes and I'm not sure I understood how they fix they
>> issue. I left more detailed comment in the ticket, can you please clarify?
>>
>> -Val
>>
>> On Fri, Jun 22, 2018 at 9:53 AM Dmitry Pavlov 
>> wrote:
>>
>>> Hi Amir,
>>>
>>> let me say sincere thank you for continuing to contribute.
>>>
>>> Bumping up this thread.
>>>
>>> Igniters, who has an expertise here?
>>>
>>>
>>> вс, 17 июн. 2018 г. в 17:59, Amir Akhmedov :
>>>
>>> > Hi All,
>>> > Can you please review my changes for IGNITE-8740.
>>> >
>>> > PR: https://github.com/apache/ignite/pull/4208
>>> > TC:
>>> >
>>> >
>>> https://ci.ignite.apache.org/viewLog.html?buildId=1397283=buildResultsDiv=IgniteTests24Java8_RunAll
>>> >
>>> > Thanks,
>>> > Amir
>>> >
>>>
>>


Re: Review request for IGNITE-8740: Support reuse of already initialized Ignite in IgniteSpringBean

2018-06-22 Thread Amir Akhmedov
Hi Val,
Thanks for your comments. I replied in the ticket with my vision of the
issue and how I tried to solve it. Please check it and let me know.

Thanks,
Amir


On Fri, Jun 22, 2018 at 5:42 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Hi Amir,
>
> I reviewed the changes and I'm not sure I understood how they fix they
> issue. I left more detailed comment in the ticket, can you please clarify?
>
> -Val
>
> On Fri, Jun 22, 2018 at 9:53 AM Dmitry Pavlov 
> wrote:
>
>> Hi Amir,
>>
>> let me say sincere thank you for continuing to contribute.
>>
>> Bumping up this thread.
>>
>> Igniters, who has an expertise here?
>>
>>
>> вс, 17 июн. 2018 г. в 17:59, Amir Akhmedov :
>>
>> > Hi All,
>> > Can you please review my changes for IGNITE-8740.
>> >
>> > PR: https://github.com/apache/ignite/pull/4208
>> > TC:
>> >
>> >
>> https://ci.ignite.apache.org/viewLog.html?buildId=1397283=buildResultsDiv=IgniteTests24Java8_RunAll
>> >
>> > Thanks,
>> > Amir
>> >
>>
>


Re: Review request for IGNITE-8740: Support reuse of already initialized Ignite in IgniteSpringBean

2018-06-22 Thread Valentin Kulichenko
Hi Amir,

I reviewed the changes, and I'm not sure I understand how they fix the
issue. I left a more detailed comment in the ticket; can you please clarify?

-Val

On Fri, Jun 22, 2018 at 9:53 AM Dmitry Pavlov  wrote:

> Hi Amir,
>
> let me say sincere thank you for continuing to contribute.
>
> Bumping up this thread.
>
> Igniters, who has an expertise here?
>
>
> вс, 17 июн. 2018 г. в 17:59, Amir Akhmedov :
>
> > Hi All,
> > Can you please review my changes for IGNITE-8740.
> >
> > PR: https://github.com/apache/ignite/pull/4208
> > TC:
> >
> >
> https://ci.ignite.apache.org/viewLog.html?buildId=1397283=buildResultsDiv=IgniteTests24Java8_RunAll
> >
> > Thanks,
> > Amir
> >
>


Re: Review request for IGNITE-8740: Support reuse of already initialized Ignite in IgniteSpringBean

2018-06-22 Thread Dmitry Pavlov
Hi Amir,

let me say a sincere thank you for continuing to contribute.

Bumping up this thread.

Igniters, who has expertise here?


On Sun, Jun 17, 2018 at 17:59, Amir Akhmedov :

> Hi All,
> Can you please review my changes for IGNITE-8740.
>
> PR: https://github.com/apache/ignite/pull/4208
> TC:
>
> https://ci.ignite.apache.org/viewLog.html?buildId=1397283=buildResultsDiv=IgniteTests24Java8_RunAll
>
> Thanks,
> Amir
>


Review request for IGNITE-8740: Support reuse of already initialized Ignite in IgniteSpringBean

2018-06-17 Thread Amir Akhmedov
Hi All,
Can you please review my changes for IGNITE-8740.

PR: https://github.com/apache/ignite/pull/4208
TC:
https://ci.ignite.apache.org/viewLog.html?buildId=1397283=buildResultsDiv=IgniteTests24Java8_RunAll

Thanks,
Amir


Re: IGNITE-8685 review request

2018-06-05 Thread Alexey Goncharuk
Thanks, Dmitriy, I'll take a look.

вт, 5 июн. 2018 г. в 15:24, Dmitriy Govorukhin :

> Igniters,
>
> I prepared a rather important patch [1] related to WAL.
> In the current implementation, we calculate an incorrect size for the
> SEGMENT_SWITCH record, which stops iteration at the end of a segment, so
> the iterator does not advance to the next segment.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-8685
>


IGNITE-8685 review request

2018-06-05 Thread Dmitriy Govorukhin
Igniters,

I prepared a rather important patch [1] related to WAL.
In the current implementation, we calculate an incorrect size for the
SEGMENT_SWITCH record, which stops iteration at the end of a segment, so the
iterator does not advance to the next segment.

[1] https://issues.apache.org/jira/browse/IGNITE-8685


Re: Review request for IGNITE-8534 Upgrade Ignite Spark Module's Spark version to 2.3.

2018-05-30 Thread Ray
Before Spark 2.3, Spark was compiled separately against Scala 2.11 and 2.10.
The spark-2.10 module in Ignite exists to accommodate this; it is not used
anywhere else in the project.
Spark 2.3 dropped support for Scala 2.10, so we can safely remove the
spark-2.10 module from Ignite.

It won't affect the visor-console-2.10 and scala-2.10 modules, because these
two are internal Ignite modules that support legacy users on Scala 2.10.



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


Re: Review request for IGNITE-8534 Upgrade Ignite Spark Module's Spark version to 2.3.

2018-05-30 Thread Petr Ivanov
Ray, could you share the details of your investigation regarding removal of
the spark-2.10 module?

Is it unused everywhere, so that it can be safely removed?
And what about the visor-console-2.10 and scala-2.10 modules?



> On 30 May 2018, at 08:41, Ray  wrote:
> 
> Hi Dmitriy,
> 
> Thanks for the reply.
> 
> I have resolved conflicts in PR and changed the ticket status to path
> available.
> Please review and leave comments.
> 
> Thanks
> 
> 
> 
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/



Re: Review request for IGNITE-8534 Upgrade Ignite Spark Module's Spark version to 2.3.

2018-05-29 Thread Dmitry Pavlov
Hi Ray,

Could you also please resolve conflicts in PR?

Conflicting files
modules/spark-2.10/pom.xml

вт, 29 мая 2018 г. в 18:19, Dmitry Pavlov :

> Hi Ray,
>
> Status of this ticket is In Progress, so it is not displayed in filters.
>
> Could you please set status to Patch Available if PR is ready for review?
>
> Sincerely,
> Dmitriy Pavlov
>
> пн, 28 мая 2018 г. в 8:40, Ray :
>
>> Valentin Kulichenko and Nikolay Izhikov can you please take a look at PR
>> and
>> provide comments? Other reviewers are welcome as well.
>>
>> https://github.com/apache/ignite/pull/4033
>>
>> I modified Ignite Spark module to fit Spark 2.3 APIs and removed
>> spark-2.10
> >> module because Spark 2.3 stopped support for Scala 2.10.
>> https://issues.apache.org/jira/browse/SPARK-19810
>>
>> Also, please find the dev list discussion thread here.
>>
>> http://apache-ignite-developers.2346864.n4.nabble.com/Discussion-Upgrade-Ignite-Spark-Module-s-Spark-version-to-2-3-0-td30762.html
>>
>> Ray
>>
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>>
>


Re: Review request for IGNITE-8534 Upgrade Ignite Spark Module's Spark version to 2.3.

2018-05-29 Thread Dmitry Pavlov
Hi Ray,

Status of this ticket is In Progress, so it is not displayed in filters.

Could you please set status to Patch Available if PR is ready for review?

Sincerely,
Dmitriy Pavlov

пн, 28 мая 2018 г. в 8:40, Ray :

> Valentin Kulichenko and Nikolay Izhikov can you please take a look at PR
> and
> provide comments? Other reviewers are welcome as well.
>
> https://github.com/apache/ignite/pull/4033
>
> I modified Ignite Spark module to fit Spark 2.3 APIs and removed spark-2.10
> module because Spark 2.3 stopped support for Scala 2.10.
> https://issues.apache.org/jira/browse/SPARK-19810
>
> Also, please find the dev list discussion thread here.
>
> http://apache-ignite-developers.2346864.n4.nabble.com/Discussion-Upgrade-Ignite-Spark-Module-s-Spark-version-to-2-3-0-td30762.html
>
> Ray
>
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>


Review request for IGNITE-8534 Upgrade Ignite Spark Module's Spark version to 2.3.

2018-05-27 Thread Ray
Valentin Kulichenko and Nikolay Izhikov can you please take a look at PR and
provide comments? Other reviewers are welcome as well.

https://github.com/apache/ignite/pull/4033

I modified the Ignite Spark module to fit the Spark 2.3 APIs and removed the
spark-2.10 module, because Spark 2.3 stopped supporting Scala 2.10.
https://issues.apache.org/jira/browse/SPARK-19810

Also, please find the dev list discussion thread here.
http://apache-ignite-developers.2346864.n4.nabble.com/Discussion-Upgrade-Ignite-Spark-Module-s-Spark-version-to-2-3-0-td30762.html

Ray




--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


Re: Review request

2016-02-03 Thread Vladimir Ozerov
As per the cache part - I barely understand the affected logic, so my review
wouldn't help much here.

As per the rest of the changes - they look good to me. I also see garbage
from NIO and "force keys" as huge memory hotspots. The only problem is
GridCompoundFuture:

if (futs == null)
    futs = new ArrayList<>();

futs.add(fut);

Things like this are proven to be an anti-pattern in terms of memory
allocation, because on the last line you effectively allocate an Object[10],
while you will usually have far fewer child futures. We could delay
allocation of the array by storing the very first child future as a direct
reference; only the second added future would then trigger the ArrayList
allocation. This should positively affect many operations with "compound
semantics" and a single cache key involved (e.g. single puts/gets).

On Mon, Feb 1, 2016 at 9:08 PM, Yakov Zhdanov  wrote:

> No visible changes to throughput and latency on our common configuration,
> but allocation pressure reduced up to 20% in put-get benchmarks.
>
> --Yakov
>
> 2016-02-01 20:02 GMT+03:00 Dmitriy Setrakyan :
>
> > Any preliminary performance numbers?
> >
> > On Mon, Feb 1, 2016 at 8:52 AM, Yakov Zhdanov 
> wrote:
> >
> > > Vladimir Ozerov and Alex Goncharuk, can you please take a look at PR
> and
> > > provide comments? Other reviewers are welcome, too! =)
> > >
> > > https://github.com/apache/ignite/pull/422
> > >
> > > I did some changes to decrease allocation pressure and fixed force keys
> > > request not to be sent if rebalancing has already been successfully
> > > completed on operation's topology version.
> > >
> > > --Yakov
> > >
> >
>


Re: Review request

2016-02-03 Thread Dmitriy Setrakyan
On Wed, Feb 3, 2016 at 7:02 AM, Vladimir Ozerov 
wrote:

> As per cache - I hardly understand affected logic, so my review wouldn't
> help much here.
>
> As per the rest changes - looks good for me. I also see garbage from NIO
> and "force keys" as huge memory hotspots. The only problem is
> GridCompoundFuture:
>
> if (futs == null)
> futs = new ArrayList<>();
>
> futs.add(fut);
>
>
> Things like this are proven to be anti-pattern in terms of memory
> allocations, because on the last line you effectively allocate Object[10],
> while usually you will have much less child futures. We could delay
> allocation of array if we store the very first child future as direct
> reference. And only second added future should lead to ArrayList
> allocation. This should positively affect lots operations with "compound
> semantics" and single cache key involved (e.g. single puts/gets).
>

Vova,
I couldn’t agree more. Can we try it out and see what kind of GC or
performance improvement we get?


> On Mon, Feb 1, 2016 at 9:08 PM, Yakov Zhdanov  wrote:
>
> > No visible changes to throughput and latency on our common configuration,
> > but allocation pressure reduced up to 20% in put-get benchmarks.
> >
> > --Yakov
> >
> > 2016-02-01 20:02 GMT+03:00 Dmitriy Setrakyan :
> >
> > > Any preliminary performance numbers?
> > >
> > > On Mon, Feb 1, 2016 at 8:52 AM, Yakov Zhdanov 
> > wrote:
> > >
> > > > Vladimir Ozerov and Alex Goncharuk, can you please take a look at PR
> > and
> > > > provide comments? Other reviewers are welcome, too! =)
> > > >
> > > > https://github.com/apache/ignite/pull/422
> > > >
> > > > I did some changes to decrease allocation pressure and fixed force
> keys
> > > > request not to be sent if rebalancing has already been successfully
> > > > completed on operation's topology version.
> > > >
> > > > --Yakov
> > > >
> > >
> >
>


Re: Review request

2016-02-03 Thread Yakov Zhdanov
I think we should not allocate the list at all! We can just add a listener to
the added futures! :) And the semantics will be preserved!

However, this will not work if we want to iterate over the added ones. The
number of unfinished futures will still be available through the listener
call count. Let me review.

If this doesn't work, we can maintain an array on our own.

And the last point: the GC impact should be measured. What happens if you add
a reference to the future and a reference to the collection?
Yakov
On Feb 3, 2016 7:44 PM, "Dmitriy Setrakyan"  wrote:

> On Wed, Feb 3, 2016 at 7:02 AM, Vladimir Ozerov 
> wrote:
>
> > As per cache - I hardly understand affected logic, so my review wouldn't
> > help much here.
> >
> > As per the rest changes - looks good for me. I also see garbage from NIO
> > and "force keys" as huge memory hotspots. The only problem is
> > GridCompoundFuture:
> >
> > if (futs == null)
> > futs = new ArrayList<>();
> >
> > futs.add(fut);
> >
> >
> > Things like this are proven to be anti-pattern in terms of memory
> > allocations, because on the last line you effectively allocate
> Object[10],
> > while usually you will have much less child futures. We could delay
> > allocation of array if we store the very first child future as direct
> > reference. And only second added future should lead to ArrayList
> > allocation. This should positively affect lots operations with "compound
> > semantics" and single cache key involved (e.g. single puts/gets).
> >
>
> Vova,
> I couldn’t agree more. Can we try it out and see what kind of GC or
> performance improvement we get?
>
>
> > On Mon, Feb 1, 2016 at 9:08 PM, Yakov Zhdanov 
> wrote:
> >
> > > No visible changes to throughput and latency on our common
> configuration,
> > > but allocation pressure reduced up to 20% in put-get benchmarks.
> > >
> > > --Yakov
> > >
> > > 2016-02-01 20:02 GMT+03:00 Dmitriy Setrakyan :
> > >
> > > > Any preliminary performance numbers?
> > > >
> > > > On Mon, Feb 1, 2016 at 8:52 AM, Yakov Zhdanov 
> > > wrote:
> > > >
> > > > > Vladimir Ozerov and Alex Goncharuk, can you please take a look at
> PR
> > > and
> > > > > provide comments? Other reviewers are welcome, too! =)
> > > > >
> > > > > https://github.com/apache/ignite/pull/422
> > > > >
> > > > > I did some changes to decrease allocation pressure and fixed force
> > keys
> > > > > request not to be sent if rebalancing has already been successfully
> > > > > completed on operation's topology version.
> > > > >
> > > > > --Yakov
> > > > >
> > > >
> > >
> >
>


Re: Review request

2016-02-01 Thread Dmitriy Setrakyan
On Mon, Feb 1, 2016 at 10:08 AM, Yakov Zhdanov  wrote:

> No visible changes to throughput and latency on our common configuration,
> but allocation pressure reduced up to 20% in put-get benchmarks.
>

Nice!


>
> --Yakov
>
> 2016-02-01 20:02 GMT+03:00 Dmitriy Setrakyan :
>
> > Any preliminary performance numbers?
> >
> > On Mon, Feb 1, 2016 at 8:52 AM, Yakov Zhdanov 
> wrote:
> >
> > > Vladimir Ozerov and Alex Goncharuk, can you please take a look at PR
> and
> > > provide comments? Other reviewers are welcome, too! =)
> > >
> > > https://github.com/apache/ignite/pull/422
> > >
> > > I did some changes to decrease allocation pressure and fixed force keys
> > > request not to be sent if rebalancing has already been successfully
> > > completed on operation's topology version.
> > >
> > > --Yakov
> > >
> >
>


Re: Review request

2016-02-01 Thread Dmitriy Setrakyan
Any preliminary performance numbers?

On Mon, Feb 1, 2016 at 8:52 AM, Yakov Zhdanov  wrote:

> Vladimir Ozerov and Alex Goncharuk, can you please take a look at PR and
> provide comments? Other reviewers are welcome, too! =)
>
> https://github.com/apache/ignite/pull/422
>
> I did some changes to decrease allocation pressure and fixed force keys
> request not to be sent if rebalancing has already been successfully
> completed on operation's topology version.
>
> --Yakov
>


Review request

2016-02-01 Thread Yakov Zhdanov
Vladimir Ozerov and Alex Goncharuk, can you please take a look at PR and
provide comments? Other reviewers are welcome, too! =)

https://github.com/apache/ignite/pull/422

I did some changes to decrease allocation pressure and fixed force keys
request not to be sent if rebalancing has already been successfully
completed on operation's topology version.

--Yakov


Re: Review request

2016-02-01 Thread Yakov Zhdanov
No visible changes to throughput and latency on our common configuration,
but allocation pressure is reduced by up to 20% in put-get benchmarks.

--Yakov

2016-02-01 20:02 GMT+03:00 Dmitriy Setrakyan :

> Any preliminary performance numbers?
>
> On Mon, Feb 1, 2016 at 8:52 AM, Yakov Zhdanov  wrote:
>
> > Vladimir Ozerov and Alex Goncharuk, can you please take a look at PR and
> > provide comments? Other reviewers are welcome, too! =)
> >
> > https://github.com/apache/ignite/pull/422
> >
> > I did some changes to decrease allocation pressure and fixed force keys
> > request not to be sent if rebalancing has already been successfully
> > completed on operation's topology version.
> >
> > --Yakov
> >
>