Re: IgniteTxStateImpl's non-threadsafe fields may cause crashes and/or data loss

2023-06-21 Thread Alexei Scherbakov
Do we have a real reproducer for thread-unsafe behavior that causes data
inconsistency?
AFAIK all required parts of txn processing are already properly linearized,
and other parts are ready to be processed in parallel (like txn recovery)
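
For reference, the per-transaction lock idea from the quoted message below could be sketched roughly as follows. This is only an illustration under stated assumptions: SketchTxState and PerTxLockSketch are hypothetical names, not Ignite classes, and the plain HashMap merely stands in for an unsynchronized field such as txMap.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a transaction whose state is a plain, unsynchronized HashMap. */
final class SketchTxState {
    private final Map<String, Integer> txMap = new HashMap<>();

    void write(String key, int val) {
        txMap.put(key, val);
    }

    int size() {
        return txMap.size();
    }
}

/** Hypothetical driver: several threads touch the same tx, always under the tx monitor. */
final class PerTxLockSketch {
    /** Writes threads * perThread distinct keys concurrently and returns the resulting entry count. */
    static int writeConcurrently(int threads, int perThread) {
        SketchTxState tx = new SketchTxState();
        Thread[] workers = new Thread[threads];

        for (int t = 0; t < threads; t++) {
            int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // Every step that touches the tx runs under the tx monitor,
                    // which gives both mutual exclusion and visibility.
                    synchronized (tx) {
                        tx.write(id + ":" + i, i);
                    }
                }
            });
            workers[t].start();
        }

        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }

        synchronized (tx) {
            return tx.size();
        }
    }
}
```

With the synchronized block in place no update is lost; without it, concurrent HashMap.put calls may drop entries or corrupt the map. The sketch does not address the deadlock concern raised in the quoted message.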

On Mon, Jun 19, 2023 at 10:25 PM Anton Vinogradov wrote:

> Folks, the idea of synchronizing all methods unfortunately failed :(
> 1) TxState has 4 implementations, so a lot of changes are required
> 2) IgniteTxEntry is not synchronized either ...
> 3) And even the IgniteInternalTx implementations (1+ lines) are not
> synchronized either ...
> It seems unrealistic to refactor this properly.
>
> Also, synchronizing the methods only guarantees that current data is read;
> it does not provide thread safety.
>
> If I understand correctly, the only proper fix is to keep everything
> unsynchronized, but guarantee that each tx is processed by only one thread
> at a time, plus data visibility.
> A possible fix is to process the same tx on the same thread each time, but we
> already found that a tx can be created on the user thread and can be, for
> example, suspended or committed from the user thread again. So it seems
> impossible to provide such a guarantee.
>
> But a possible solution is to wrap each tx processing step in a lock or
> synchronized section, like:
> synchronized (tx) {
>     var aaa = tx.getAAA();
>     tx.updateXXX();
>     tx.updateYYY();
> }
> This will guarantee field visibility as well as strict, step-by-step tx
> processing.
> A single lock/synchronized block should not cause a performance problem, I think.
>
> But this may cause a deadlock in case some of these executions require
> another one on a different thread that is related to the same tx.
>
> And the current question is:
> Do we expect that Ignite is not required to process something related to
> the same tx at different threads simultaneously?
>
> On Wed, May 24, 2023 at 4:11 PM Anton Vinogradov  wrote:
>
> > >> could you please point to this in the code? It may not be needed after the
> > >> fix, and removing it could bring a performance gain.
> > BTW, found the trick.
> > It is still necessary to keep copying.
> >
> > On Wed, May 24, 2023 at 2:44 PM Anton Vinogradov  wrote:
> >
> >> Andrey,
> >>
> >> Thanks for the tip.
> >> Of course, I'll benchmark the fix before the merge.
> >>
> >> Regarding the comment,
> >> >>  and return entries copy from tx state in order to avoid
> >> >>  ConcurrentModificationException.
> >> , could you please point to this in the code? It may not be needed after the
> >> fix, and removing it could bring a performance gain.
> >>
> >> >> I believe that mentioned invariants were broken later but ...
> >> >> ... this state should be accessed mostly from one thread
> >> The code was never designed to fit this statement.
> >> For example, most of the cctx.tm().newTx(...) calls are dated 2014
> >> (which means "before 2014").
> >> Currently, almost all tx creations, as well as tx preparations, happen
> >> outside the striped pool.
> >> Only 1/2 of the messages are now striped correctly.
> >> Of course, it's theoretically possible to process a tx on the same thread
> >> each time, but a global refactoring with a performance drop would be
> >> required in this case, I think.
> >>
> >> My current idea is to finish the synchronization you started.
> >> I've prepared the fix [1], got the visa, and am going to benchmark it.
> >>
> >> [1] https://github.com/apache/ignite/pull/10732/files
> >>
> >> On Tue, May 23, 2023 at 8:54 PM Andrey Gura  wrote:
> >>
> >>> Please run benchmarks after fixing the problem. E.g. replacing HashMap
> >>> with ConcurrentHashMap can significantly affect performance.
> >>>
> >>> See for example comments to IGNITE-2968 issue (
> >>> https://issues.apache.org/jira/browse/IGNITE-2968?focusedCommentId=15415170&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15415170
> >>> ).
> >>>
> >>> I believe that the mentioned invariants were broken later, but in general I
> >>> agree with Alexey: this state should be accessed mostly from one thread.
> >>> Exceptional cases should be synchronized or redesigned. E.g. if metrics
> >>> read a transaction's state, I would prefer to remove these metrics or
> >>> tolerate some inaccuracy rather than reduce performance.
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, May 19, 2023 at 7:32 PM Ivan Daschinsky 

Re: IgniteTxStateImpl's non-threadsafe fields may cause crashes and/or data loss

2023-05-19 Thread Alexei Scherbakov
Tx processing is supposed to be thread-bound by hashing the version to a
partition, see methods like [1].
If this invariant is broken in some cases, that should be fixed.

[1] 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareRequest#partition
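
A minimal sketch of what such thread binding amounts to, assuming the striped pool picks a stripe as the message partition modulo the stripe count. StripeSelectionSketch and its methods are hypothetical illustrations, not Ignite's actual implementation; safeAbs here only mimics the intent of a non-negative absolute value.

```java
/** Hypothetical illustration of binding a tx to one stripe; not Ignite code. */
final class StripeSelectionSketch {
    /** Non-negative absolute value (Math.abs(Integer.MIN_VALUE) is still negative). */
    static int safeAbs(int i) {
        i = Math.abs(i);

        return i < 0 ? 0 : i;
    }

    /** Partition derived only from the transaction version, so it is stable for the whole tx. */
    static int partitionByVersion(Object txVersion) {
        return safeAbs(txVersion.hashCode());
    }

    /** Stripe index for a message partition; a negative partition means "no stripe affinity". */
    static int stripe(int partition, int stripes) {
        return partition < 0 ? -1 : partition % stripes;
    }

    public static void main(String[] args) {
        String ver = "tx-version-1";

        // Two messages of the same tx map to the same stripe as long as both
        // compute the partition from the version.
        System.out.println(stripe(partitionByVersion(ver), 8) == stripe(partitionByVersion(ver), 8));
    }
}
```

The binding holds only if every message of a transaction derives its partition from the same input; a message that derives it from something else (for example, its first key) may land on a different stripe.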

On Fri, May 19, 2023 at 3:57 PM Anton Vinogradov wrote:

> Igniters,
>
> My team faced a node failure [1] caused by the use of non-thread-safe
> collections.
>
> IgniteTxStateImpl's fields
> - activeCacheIds
> - txMap
> are not thread-safe, but are widely used from different threads without
> proper synchronization.
>
> The main question is ... why?
>
> According to my research, we have no guarantee that a tx will be processed
> on a single thread.
> It may be processed on several(!) threads of the striped pool and on the
> tx recovery thread as well.
>
> The striped pool thread is selected by the message's partition()
> method, which can be calculated like this:
> - return keys != null && !keys.isEmpty() ? keys.get(0).partition() : -1;
> - return U.safeAbs(version().hashCode());
> - ...,
> so there is no guarantee that a tx is processed on the same thread (proven
> by tests).
>
> It seems we MAY lose data.
> For example, by ignoring some or all keys from txMap at commit.
>
> If anyone knows why this is not a problem (I mean the lack of sync, not data
> loss) or how to fix this properly, please give me a hint, or correct my
> conclusions if necessary.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-19445
>


-- 

Best regards,
Alexei Scherbakov


Re: Naming style for configuration properties defining durations in Ignite 3

2022-12-26 Thread Alexei Scherbakov
What about specifying units in the config value, like

joinTimeout=5s

Internally it is always stored in millis.
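
A minimal sketch of how such a value could be normalized to millis, assuming a small fixed set of suffixes (ms, s, m, h, d). DurationValueSketch, its method name, and the suffix set are hypothetical and not part of the Ignite 3 configuration framework.

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for values such as "5s" or "250ms"; the suffix set is an assumption. */
final class DurationValueSketch {
    private static final Pattern FORMAT = Pattern.compile("(\\d+)(ms|s|m|h|d)");

    /** Converts a value with a unit suffix to milliseconds. */
    static long toMillis(String value) {
        Matcher m = FORMAT.matcher(value.trim());

        if (!m.matches())
            throw new IllegalArgumentException("Unsupported duration: " + value);

        long amount = Long.parseLong(m.group(1));

        switch (m.group(2)) {
            case "ms":
                return amount;
            case "s":
                return TimeUnit.SECONDS.toMillis(amount);
            case "m":
                return TimeUnit.MINUTES.toMillis(amount);
            case "h":
                return TimeUnit.HOURS.toMillis(amount);
            default: // "d" is the only remaining suffix the pattern admits.
                return TimeUnit.DAYS.toMillis(amount);
        }
    }
}
```

With this shape, joinTimeout=5s and joinTimeout=5000ms describe the same stored value, and a property naturally measured in hours does not need a long run of zeros.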


On Fri, Dec 23, 2022 at 11:53 AM Pavel Tupitsyn wrote:

> Agree with the proposal.
> Those properties go into config files (json/hocon), and it is difficult to
> navigate to relevant docs from there.
>
> I would go with point 3 - longer suffixes like Millis - to make it
> unambiguous.
>
> On Fri, Dec 23, 2022 at 10:06 AM Roman Puchkovskiy <
> roman.puchkovs...@gmail.com> wrote:
>
> > Hi Igniters!
> >
> > In Ignite 3, in configuration schemas, we have some properties that
> > define durations/intervals (usually, these are timeout durations).
> > Almost always they are defined without mentioning the duration unit,
> > like this:
> >
> > /** Node join timeout. */
> > @Value(hasDefault = true)
> > public int joinTimeout = 5_000;
> >
> > This makes a user puzzled about the unit of measure: it might be
> > milliseconds (as it often is), but who knows.
> > It seems beneficial to include the unit of measure in names of such
> > properties, like joinTimeoutMs or joinTimeoutMillis.
> >
> > We might try to solve this via documentation, but, when a property
> > name speaks by itself, we don't need to reach for the documentation at
> > all, so it still seems better to have it in a name (and, if you edit a
> > config file, reaching for documentation might be a bit more
> > time-consuming than navigating to a method javadoc in an IDE).
> >
> > We could do one of the following:
> >
> > 1. Do not change the naming schema, just improve the documentation; we
> > might also establish a strong convention of always having all
> > durations in the same unit (like milliseconds) [there is a
> > disadvantage: if we add a property that is naturally measured in, say,
> > hours, we would still have to use millis, which would make
> > configuration very cumbersome with all these zeros]
> > 2. Use short suffixes (ns/ms/sec/min/hr/days): joinTimeoutMs. There is
> > a problem with microseconds if we ever need them (how do we abbreviate
> > it according to this style?)
> > 3. Use longer suffixes (nanos/micros/millis/secs/mins/hours/days):
> > joinTimeoutMillis. A bit longer than the previous one.
> > 4. ... or something else?
> >
> > What do you think?
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSSION] Change default behaviour of atomic operations inside transactions

2022-10-28 Thread Alexei Scherbakov
Changing the atomic caches in a transaction is really weird, so I would
remove the property as well and retain only the safe behavior.
If you want to update an atomic cache, do it before or after a transaction.

On Fri, Oct 28, 2022 at 1:06 PM Maksim Timonin wrote:

> Hi, all!
>
> In private discussion with Ivan Daschinsky and Anton Vinogradov we
> discussed optional scenarios when such a situation is possible.
>
> Then I agree with Stan's proposal:
> 1. Revert deprecation.
> 2. Change default value in 2.15.
> 3. Notify users, in the release notes and in the exception message, how to
> change the behavior back.
>
> If there are no objections, I'll revert a commit on Monday.
>
> Thanks!
>
>
> On Tue, Oct 25, 2022 at 3:43 PM Maksim Timonin 
> wrote:
>
> > Hi, Stan!
> >
> > >> Say, I have an ATOMIC and TRANSACTIONAL caches in my system, and I need
> > >> to change them at the same time
> >
> > This looks very unreliable. Which guarantees do users expect from Ignite in
> > this case? For example, the transaction rolls back but the atomic change
> > (within this tx) succeeds, or vice versa. I'm not sure Ignite should allow
> > this behaviour. Do you know of real cases where Ignite is used in such a way?
> >
> >
> > On Tue, Oct 25, 2022 at 3:27 PM Anton Vinogradov  wrote:
> >
> >> >> WDYT?
> >> +1
> >>
> >> On Tue, Oct 25, 2022 at 11:40 AM Stanislav Lukyanov <
> >> stanlukya...@gmail.com>
> >> wrote:
> >>
> >> > Nikita,
> >> >
> >> > The system property isn't really the problem, right? The problem is the
> >> > default behavior?
> >> > Do you suggest that the future behavior change will be added to the
> >> > release notes?
> >> > Can you add a proposed release note text to the ticket so that we are on
> >> > the same page about what will be announced?
> >> >
> >> > Also, should there be something like run-time warning for the operations
> >> > that will later become forbidden?
> >> >
> >> > About the 2.16 change. I agree with the default - makes sense. But can we
> >> > please keep a way to revert this?
> >> > I think just changing the default behavior but keeping the property is the
> >> > best.
> >> >
> >> > The problem is that there can be people that understand the behavior but
> >> > want to do that anyway. Say, I have an ATOMIC and TRANSACTIONAL caches in
> >> > my system,
> >> > and I need to change them at the same time. How would I do that after the
> >> > change is implemented?
> >> >
> >> >
> >> > Thinking about this, I believe a more aggressive change would be better -
> >> > but with a possibility to opt-out.
> >> > My proposal:
> >> > - Don't deprecate the property - revert the commit.
> >> > - Change the default of the property.
> >> > - Make sure the error message explains how to return the old behavior
> >> > (IGNITE_ALLOW_ATOMIC_OPS_IN_TX=true).
> >> > - Make sure this is mentioned in the release notes.
> >> > - Do this in 2.15, not 2.16.
> >> >
> >> > WDYT?
> >> >
> >> >
> >> > Sorry for diving deep into this - this is a breaking change that
> >> > potentially impacts many users, that's why I'm a bit anxious :)
> >> >
> >> > Thanks,
> >> > Stan
> >> >
> >> > > On 24 Oct 2022, at 21:25, Nikita Amelchev wrote:
> >> > >
> >> > > Stanislav,
> >> > >
> >> > > 2.15: The system property will be deprecated. Release notes will
> >> > > contain warning info about deprecation and behavior in future
> >> > > releases.
> >> > >
> >> > > 2.16: The system property will be removed. All atomic operations
> >> > > within transactions will be forbidden.
> >> > >
> >> > > See merged PR: https://github.com/apache/ignite/pull/10327/files
> >> > >
> >> > > On Sat, Oct 22, 2022 at 5:42 PM Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> >> > >>
> >> > >> Hi all,
> >> > >>
> >> > >> Can someone please clarify what specific changes will be implemented in
> >> > >> 2.15 and 2.16? Wha

Re: [DISCUSSION] Change default behaviour of atomic operations inside transactions

2022-10-17 Thread Alexei Scherbakov
By placing the @Deprecated annotation on the property.
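
A minimal sketch of what that looks like on a property-name constant. SystemPropertiesSketch is a hypothetical holder class used only for illustration; the property name is the one mentioned in this discussion, and the Javadoc wording is an assumption.

```java
/** Hypothetical holder of system property names; not the actual Ignite class. */
final class SystemPropertiesSketch {
    /**
     * Allows atomic cache operations inside a transaction.
     *
     * @deprecated The property is planned for removal in a later release.
     */
    @Deprecated
    static final String ALLOW_ATOMIC_OPS_IN_TX = "IGNITE_ALLOW_ATOMIC_OPS_IN_TX";

    private SystemPropertiesSketch() {
        // No instances.
    }
}
```

Code that references the constant then gets a compiler deprecation warning, and the @deprecated Javadoc tag is the natural place to describe the replacement behavior.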

On Mon, Oct 17, 2022 at 7:07 PM Anton Vinogradov wrote:

> How can we deprecate this?
>
> On Mon, Oct 17, 2022 at 5:30 PM Alexei Scherbakov <
> alexey.scherbak...@gmail.com> wrote:
>
> > We can make breaking changes by following the approved procedure: 1)
> > deprecate in the next release 2) remove in some release after the next
> >
> > The ticket looks fine to me.
> >
> > On Mon, Oct 17, 2022 at 3:50 PM Anton Vinogradov wrote:
> >
> > > We MUST break this, of course!
> > > Atomic operations inside the transaction is a wrong and unexpected
> > > behaviour and MUST be restricted for every user.
> > >
> > > On Mon, Oct 17, 2022 at 3:05 PM Julia Bakulina wrote:
> > >
> > > > Hi Team,
> > > >
> > > > I have found this ticket
> > > > https://issues.apache.org/jira/browse/IGNITE-8801 -
> > > > Change default behaviour of atomic operations inside transactions - in
> > > > backlog and created a PR with changes. The ticket relates to
> > > > https://issues.apache.org/jira/browse/IGNITE-2313.
> > > > During the review process it appeared that probably there is no need in
> > > > this ticket as it includes the changes of the default API and we should
> > > > not break backward compatibility.
> > > >
> > > > Do we need these changes? Should the ticket be closed with "won't fix"?
> > > >
> > > > Have a nice day,
> > > > Julia
> > > >
> > >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSSION] Change default behaviour of atomic operations inside transactions

2022-10-17 Thread Alexei Scherbakov
We can make breaking changes by following the approved procedure: 1)
deprecate in the next release 2) remove in some release after the next

The ticket looks fine to me.

On Mon, Oct 17, 2022 at 3:50 PM Anton Vinogradov wrote:

> We MUST break this, of course!
> Atomic operations inside the transaction is a wrong and unexpected
> behaviour and MUST be restricted for every user.
>
> On Mon, Oct 17, 2022 at 3:05 PM Julia Bakulina wrote:
>
> > Hi Team,
> >
> > I have found this ticket
> > https://issues.apache.org/jira/browse/IGNITE-8801 -
> > Change default behaviour of atomic operations inside transactions - in
> > backlog and created a PR with changes. The ticket relates to
> > https://issues.apache.org/jira/browse/IGNITE-2313.
> > During the review process it appeared that probably there is no need in
> > this ticket as it includes the changes of the default API and we should
> > not break backward compatibility.
> >
> > Do we need these changes? Should the ticket be closed with "won't fix"?
> >
> > Have a nice day,
> > Julia
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSSION] IEP-91 Transaction protocol

2022-10-12 Thread Alexei Scherbakov
Hi Roman, good catch.

These methods should be in Transaction, I've fixed that.

On Tue, Oct 11, 2022 at 1:12 PM Roman Puchkovskiy wrote:

> Hi Alexei.
>
> Great proposal, thank you for your efforts.
>
> One thing that I noted is that the proposed IgniteInstructions
> interface contains parameterless commit*() and rollback*() methods.
> The transactions are not thread-bound, so there seems to be no
> implicit mechanism for the global IgniteInstructions to know about the
> current transaction, hence it looks like either these methods need to
> belong to the Transaction interface, or that they need Transaction to
> be passed as a parameter. Or, if there is still some other mechanism,
> I missed it while reading.
>
> On Mon, Sep 26, 2022 at 4:32 PM Alexei Scherbakov <
> alexey.scherbak...@gmail.com> wrote:
> >
> > Hello, igniters.
> >
> > After long and hard work I've finally brought the IEP
> > <https://cwiki.apache.org/confluence/display/IGNITE/IEP-91%3A+Transaction+protocol>
> > to the necessary level of detail.
> > Any useful feedback would be greatly appreciated.
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
>


-- 

Best regards,
Alexei Scherbakov


Re: Apache Ignite 3.0.0 beta 1 RELEASE [Time, Scope, Manager]

2022-10-08 Thread Alexei Scherbakov
+1 Sounds good

On Fri, Oct 7, 2022 at 6:07 PM Andrey Gura wrote:

> Hi, Igniters!
>
> It's time for a new release of Apache Ignite 3 beta 1. The expected
> feature list consists of:
>
> - RPM and DEB packages: simplified installation and node management
> with system services.
> - Client's Partition Awareness: Clients are now aware of data
> distribution over the cluster nodes which helps avoid additional
> network transmissions and lowers operations latency.
> - C++ client:  Basic C++ client, able to perform operations on data.
> - Autogenerated values: now a function can be specified as a default
> value generator during a table creation. Currently only
> gen_random_uuid is supported.
> - SQL Transactions.
> - Transactional Protocol: improved locking model, multi-version based
> lock-free read-only transactions.
> - Storage: A number of improvements to memory-only and on-disk engines
> based on Page Memory.
> - Indexes: Basic functionality, hash and sorted indexes.
> - Client logging: A LoggerFactory may be provided during client
> creation to specify a custom logger for logs generated by the client.
> - Metrics framework: Collection and export of cluster metrics.
>
> I want to propose myself to be the release manager of the Apache
> Ignite 3 beta 1.
>
> Also I propose the following milestones for the release:
>
> Scope Freeze: October 12, 2022
> Code Freeze: October 20, 2022
> Voting Date: October 31, 2022
> Release Date: November 5, 2022
>
> WDYT?
>


-- 

Best regards,
Alexei Scherbakov


[DISCUSSION] IEP-91 Transaction protocol

2022-09-26 Thread Alexei Scherbakov
Hello, igniters.

After long and hard work I've finally brought the IEP
<https://cwiki.apache.org/confluence/display/IGNITE/IEP-91%3A+Transaction+protocol>
to the necessary level of detail.
Any useful feedback would be greatly appreciated.

-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSSION] Thin client: Colocated Data Transactional Get

2022-08-11 Thread Alexei Scherbakov
Moving transaction logic to the client seems to me a bad idea.

I would instead send the transaction's code close to data using Compute API.

On Thu, Aug 11, 2022 at 5:54 PM Maxim Muzafarov wrote:

> Igniters,
>
>
> I'd like to discuss with you some thoughts about getting colocated
> data [1] from nodes via thin client (mostly the java thin client).
>
> - We do have partition awareness enabled for non-transactional data [2].
> - We do have from now on partition awareness for caches with custom
> affinity functions [3].
> - We do NOT have an option to execute a transactional operation on the
> affinity node [4]; however, I think it is possible to send a
> transactional get request over the affinity channels instead of the
> default one.
>
>
> The process execution on the thin client side may look like this:
>
> open transaction (colocated get request) -> set the right client
> context -> pick up the affinity node channel (instead of the default
> one) -> send the request over this channel right to the destination
> primaries.
>
> The interface improvement may look like this:
>
> interface IgniteClient {
>     // Proposed method; it must return IgniteClient for the chained call
>     // below to compile.
>     IgniteClient withAffinityNode(String cacheName, Object affKey);
> }
>
> IgniteClient affClient = Ignition.startClient(new ClientConfiguration()
>     .setAddresses("node1_address:10800", "node2_address:10800",
>         "node3_address:10800"))
>     .withAffinityNode("person", "affKey");
>
> try (ClientTransaction tx = affClient.transactions().txStart()) {
>     ClientCache personCache = affClient.cache("person");
>     ClientCache paymentsCache = affClient.cache("payments");
>
>     // personCache.get("affKey");
>     // paymentsCache.get(..);
> }
> catch (ClientException e) {
>     // Ignore.
> }
>
>
> Additional benefits:
>
> - ScanQuery, SqlQuery with #setLocal(true) flag right on the required
> affinity channel;
> - ClientCompute task right on the required affinity channel;
>
>
> WDYT?
>
>
> [1]
> https://ignite.apache.org/docs/latest/data-modeling/affinity-collocation#affinity-colocation
> [2]
> https://ignite.apache.org/docs/latest/thin-clients/getting-started-with-thin-clients#partition-awareness
> [3] https://issues.apache.org/jira/browse/IGNITE-17316
> [4]
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/client/thin/TcpClientCache.java#L994
>


-- 

Best regards,
Alexei Scherbakov


Re: [ANNOUNCE] Welcome Alexander Lapin as a new committer

2022-03-11 Thread Alexei Scherbakov
At last. Congrats.

On Thu, Mar 10, 2022 at 3:24 PM Roman Kondakov wrote:

> Alexander, congratulations!
>
> --
> Roman Kondakov
>
>
> On 10.03.2022 19:32, Вячеслав Коптилин wrote:
> > Hi,
> >
> > Congratulations, Alexander!
> >
> > Thanks,
> > S.
> >
> > On Wed, Mar 9, 2022 at 6:57 PM Ivan Pavlukhin wrote:
> >
> >> Alex, congratulations, well deserved!
> >>
> >> 2022-03-09 18:18 GMT+03:00, Petr Ivanov :
> >>> Congratulations!
> >>>
> >>>> On 9 Mar 2022, at 17:46, Kseniya Romanova wrote:
> >>>> Igniters! The Apache Ignite Project Management Committee (PMC) has invited
> >>>> Alexander Lapin to become a new committer and are happy to announce that he
> >>>> has accepted.
> >>>>
> >>>> Alexander contributed to the Ignite several important improvements of the
> >>>> JDBC thin driver, and tracing framework.  Also, he created some valuable
> >>>> content about Tracing and Data Rebalance. Now, he is deeply involved in
> >>>> developing Apache Ignite 3.0.
> >>>>
> >>>> Being a committer enables easier contribution to the project since there is
> >>>> no need to go via the patch submission process. This should enable better
> >>>> productivity.
> >>>>
> >>>> Please join me in welcoming Alexander, and congratulating him on the new
> >>>> role in the Apache Ignite Community.
> >>>>
> >>>> Cheers,
> >>>> Kseniya Romanova
> >>>
> >>
> >> --
> >>
> >> Best regards,
> >> Ivan Pavlukhin
> >>
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSSION] @Nullable/@NotNull annotation usage in Ignite 3

2021-12-16 Thread Alexei Scherbakov
+1 for 2

On Thu, Dec 16, 2021 at 6:50 PM Pavel Tupitsyn wrote:

> Option 2 seems the most sensible.
> It translates to/from other languages and should be already familiar to
> some developers.
>
> For example, with nullable checks enabled, C# treats everything as not
> null, unless specified otherwise with "?".
> Same for other languages where Option/Maybe type is present. Nothing is
> null by default.
>
> On Thu, Dec 16, 2021 at 6:14 PM Alexander Polovtcev <
> alexpolovt...@gmail.com>
> wrote:
>
> > Dear Igniters!
> >
> > I would like to propose a discussion about defining a policy regarding
> > where and how to use @Nullable/@NotNull annotations. These annotations are
> > used in conjunction with a static analysis engine (e.g. built in IDEA) and
> > are useful for avoiding null dereferencing and specifying method contracts.
> >
> > I can see the following possible options:
> >
> > 1. *Use both @Nullable and @NotNull annotations everywhere* (i.e. method
> > parameters and return types, class fields). Pros: the most robust and
> > expressive variant; easy to agree on and specify. Cons: very verbose; may
> > lead to code cluttering;
> > 2. *Use only @Nullable* for specifying method parameters that accept null
> > or class fields that can be null, treating @NotNull as an implicit default.
> > Pros: correlates with the default IDEA settings (with all corresponding
> > inspections enabled); not as verbose as option 1, since nullable parameters
> > are quite rare. Cons: less sound and complete, especially when working with
> > third-party libraries that are not annotated, since we cannot apply the
> > implicit @NotNull there;
> > 3. *Use only @NotNull* and treat @Nullable as an implicit default. Pros:
> > less verbose than option 1, better correlates with Java language semantics
> > (since all Java references are nullable). Cons: more verbose than option 2;
> > may be impossible to properly set up the analysis engine or may require
> > switching to a different annotation provider that supports jsr-305
> > @ParametersAreNullableByDefault;
> > 4. *Do not use @Nullable nor @NotNull*. The most radical option in case we
> > will not be able to agree on any of the above three. No annotations - no
> > need for the discussion.
> >
> > What do you think? Are there any other options out there? I would like to
> > collect as many options as possible and organize a vote some time later.
> >
> > --
> > With regards,
> > Aleksandr Polovtcev
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: IEP-61 Transaction API desing for Ignite 3

2021-11-30 Thread Alexei Scherbakov
Pavel, I agree with Val to avoid overloading due to a loss of API
transparency.

Val, moving the tx argument at the first position seems good to me.
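
A rough sketch of the shape being discussed, with the transaction as an explicit first argument and null meaning an implicit (auto-commit) operation. SketchTransaction and SketchTableView are hypothetical, single-threaded toy classes, not the Ignite 3 API; they only illustrate why the first position composes with varargs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical transaction handle that buffers writes until commit. */
final class SketchTransaction {
    final Map<String, Integer> pending = new HashMap<>();

    private final Map<String, Integer> store;

    SketchTransaction(Map<String, Integer> store) {
        this.store = store;
    }

    void commit() {
        store.putAll(pending);
        pending.clear();
    }
}

/** Hypothetical table view: every operation takes the tx as its first argument. */
final class SketchTableView {
    private final Map<String, Integer> store = new HashMap<>();

    SketchTransaction begin() {
        return new SketchTransaction(store);
    }

    /** A null tx applies the write immediately (implicit transaction). */
    void upsert(SketchTransaction tx, String key, int val) {
        if (tx == null)
            store.put(key, val);
        else
            tx.pending.put(key, val);
    }

    /** Reads the tx's own pending write first, then the committed value. */
    Integer get(SketchTransaction tx, String key) {
        if (tx != null && tx.pending.containsKey(key))
            return tx.pending.get(key);

        return store.get(key);
    }

    /** Varargs must be the last parameter, so the tx can only go first here. */
    List<Integer> getAll(SketchTransaction tx, String... keys) {
        List<Integer> res = new ArrayList<>();

        for (String key : keys)
            res.add(get(tx, key));

        return res;
    }
}
```

Because the argument is mandatory, a caller has to write either a transaction or an explicit null at every call site, which is the "always aware" property mentioned in the quoted discussion.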

On Mon, Nov 29, 2021 at 10:03 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Alexei,
>
> One more comment: I actually think that the transaction should be the first
> argument, not the last. This way it's easier to keep the API consistent.
> For example, if a method uses varargs as one of the parameters, you won't
> be able to put the tx parameter at the end. There might be other cases as
> well. What do you think?
>
> -Val
>
> On Mon, Nov 29, 2021 at 10:59 AM Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
> > I like Alexei's suggestion. This seems to be the most transparent and
> > explicit approach. Basically, this ensures that the user is always aware of
> > whether an operation is enlisted in a transaction or not. Any other option
> > is either error-prone, or introduces unnecessary counter-intuitive
> > limitations.
> >
> > I don't think we should keep overloads without the tx parameter, because
> > that will pretty much eliminate the value of this change. One thing we can
> > do to address this is to have separate "non-tx" views, which can only be
> > used to execute implicit transactions. But I would look at this after we
> > more or less stabilize the primary API.
> >
> > -Val
> >
> > On Mon, Nov 29, 2021 at 5:03 AM Pavel Tupitsyn 
> > wrote:
> >
> >> Alexei,
> >>
> >> Are we going to offer an overload without tx parameter?
> >>
> >> getAsync(K key);
> >> getAsync(K key, Transaction tx);
> >>
> >> On Mon, Nov 29, 2021 at 3:43 PM Alexei Scherbakov <
> >> alexey.scherbak...@gmail.com> wrote:
> >>
> >> > Pavel,
> >> >
> >> > The problem with a current approach to me is the possibility of forgetting
> >> > to enlist a table into a transaction, because it is not enforced.
> >> > Having the explicit argument for this purpose seems less error-prone to me.
> >> >
> >> > On Mon, Nov 29, 2021 at 3:13 PM Pavel Tupitsyn wrote:
> >> >
> >> > > Taras, yes, yours is the actual syntax in main branch right now,
> >> > > I've skipped the tx argument in my code accidentally.
> >> > >
> >> > > On Mon, Nov 29, 2021 at 3:03 PM Taras Ledkov wrote:
> >> > >
> >> > > > Hi colleagues,
> >> > > >
> >> > > > 2Pavel:
> >> > > > > RecordView txView = view.withTransaction();
> >> > > > Can we use the syntax (see below) to attach the table / operation to the
> >> > > > started transaction?
> >> > > > RecordView  txPersonView =
> >> > > > person.recordView().withTransaction(txView.transaction());
> >> > > >
> >> > > >
> >> > > > On Mon, Nov 29, 2021 at 1:34 PM Pavel Tupitsyn <ptupit...@apache.org>
> >> > > > wrote:
> >> > > >
> >> > > > > Alexei,
> >> > > > >
> >> > > > > I agree that runInTransaction is confusing and error-prone.
> >> > > > >
> >> > > > > But we already have view.withTransaction(), which seems to be the most
> >> > > > > boilerplate-free approach.
> >> > > > > The example above will look like this:
> >> > > > >
> >> > > > > public void testMixedPutGet() throws TransactionException {
> >> > > > > RecordView view = accounts.recordView();
> >> > > > >
> >> > > > > view.upsert(makeValue(1, BALANCE_1));
> >> > > > >
> >> > > > > RecordView txView = view.withTransaction();
> >> > > > >
> >> > > > > txView.getAsync(makeKey(1)).thenCompose(r ->
> >> > > > > txView.upsertAsync(makeValue(1, r.doubleValue("balance") + DELTA),
> >> > > > > tx)).thenCompose(txView.transaction().commitAsync()).join();
> >> > > > >
> >> > > > > assertEquals(BALANCE_1 + DELTA,
> >> > > > > view.get(makeKey(1)).doubleValue("balance"));
> >> > >

Re: IEP-61 Transaction API desing for Ignite 3

2021-11-29 Thread Alexei Scherbakov
Pavel,

The problem with a current approach to me is the possibility of forgetting
to enlist a table into a transaction, because it is not enforced.
Having the explicit argument for this purpose seems less error-prone to me.

On Mon, Nov 29, 2021 at 3:13 PM Pavel Tupitsyn wrote:

> Taras, yes, yours is the actual syntax in main branch right now,
> I've skipped the tx argument in my code accidentally.
>
> On Mon, Nov 29, 2021 at 3:03 PM Taras Ledkov  wrote:
>
> > Hi colleagues,
> >
> > 2Pavel:
> > > RecordView txView = view.withTransaction();
> > Can we use the syntax (see below) to attach the table / operation to the
> > started transaction?
> > RecordView  txPersonView =
> > person.recordView().withTransaction(txView.transaction());
> >
> >
> > On Mon, Nov 29, 2021 at 1:34 PM Pavel Tupitsyn 
> > wrote:
> >
> > > Alexei,
> > >
> > > I agree that runInTransaction is confusing and error-prone.
> > >
> > > But we already have view.withTransaction(), which seems to be the most
> > > boilerplate-free approach.
> > > The example above will look like this:
> > >
> > > public void testMixedPutGet() throws TransactionException {
> > >     RecordView view = accounts.recordView();
> > >
> > >     view.upsert(makeValue(1, BALANCE_1));
> > >
> > >     RecordView txView = view.withTransaction();
> > >
> > >     txView.getAsync(makeKey(1)).thenCompose(r ->
> > >         txView.upsertAsync(makeValue(1, r.doubleValue("balance") + DELTA),
> > >         tx)).thenCompose(txView.transaction().commitAsync()).join();
> > >
> > >     assertEquals(BALANCE_1 + DELTA,
> > >         view.get(makeKey(1)).doubleValue("balance"));
> > > }
> > >
> > > Is there any problem with this?
> > >
> > > On Mon, Nov 29, 2021 at 10:45 AM Alexei Scherbakov <
> > > alexey.scherbak...@gmail.com> wrote:
> > >
> > > > Folks,
> > > >
> > > > Recently I've pushed transactions support phase 1 for Ignite 3, see [1].
> > > > Feel free to give feedback.
> > > > Current implementation attempts to automatically enlist a table into
> > > > transaction if it's started using [2] or [3] by using thread local context,
> > > > similar to Ignite 2 approach, to reduce the amount of boilerplate code.
> > > > But it turns out such an approach still has unacceptable drawbacks from a
> > > > user experience point of view.
> > > >
> > > > Consider the example [4]:
> > > >
> > > > public void testMixedPutGet() throws TransactionException {
> > > >     accounts.recordView().upsert(makeValue(1, BALANCE_1));
> > > >
> > > >     igniteTransactions.runInTransaction(tx -> {
> > > >         var txAcc = accounts.recordView().withTransaction(tx);
> > > >
> > > >         txAcc.getAsync(makeKey(1)).thenCompose(r ->
> > > >             txAcc.upsertAsync(makeValue(1, r.doubleValue("balance") + DELTA))).join();
> > > >     });
> > > >
> > > >     assertEquals(BALANCE_1 + DELTA,
> > > >         accounts.recordView().get(makeKey(1)).doubleValue("balance"));
> > > > }
> > > >
> > > > Here we *have to* manually enlist a table if it's used in an async chain
> > > > call, because the caller thread will be different and the chained operation
> > > > will be executed in a separate tx.
> > > > This works similarly in Ignite 2 and is very confusing.
> > > >
> > > > To avoid this, I propose to add an explicit Transaction argument to each
> > > > table API method. Null value means to start the implicit transaction
> > > > (autocommit mode). For example:
> > > >
> > > > /**
> > > >  * Asynchronously inserts a record into the table if it doesn't
> > exist
> > > > or replaces the existed one.
> > > >  *
> > > >  * @param rec A record to insert into the table. The record
> cannot
> > be
> > > > {@code null}.
> > > >  * @param tx The transaction or {@code null} to auto commit.
> > > >  * @return Future representing pending completion of the
> operation.
> > > >  */
> > > > @NotNull CompletableFuture upsertAsync(@

Re: IEP-61 Transaction API desing for Ignite 3

2021-11-28 Thread Alexei Scherbakov
Folks,

Recently I've pushed transactions support phase 1 for Ignite 3, see [1].
Feel free to give feedback.
The current implementation attempts to automatically enlist a table into a
transaction if it's started using [2] or [3], by using a thread-local context,
similar to the Ignite 2 approach, to reduce the amount of boilerplate code.
But it turns out that such an approach still has unacceptable drawbacks from a
user experience point of view.

Consider the example [4]:

public void testMixedPutGet() throws TransactionException {
accounts.recordView().upsert(makeValue(1, BALANCE_1));

igniteTransactions.runInTransaction(tx -> {
var txAcc = accounts.recordView().withTransaction(tx);

txAcc.getAsync(makeKey(1)).thenCompose(r ->
txAcc.upsertAsync(makeValue(1, r.doubleValue("balance") + DELTA))).join();
});

assertEquals(BALANCE_1 + DELTA,
accounts.recordView().get(makeKey(1)).doubleValue("balance"));
}

Here we *have to* manually enlist a table if it's used in an async chain
call, because the chained operation may run on a thread other than the caller
thread, where the thread-local context is not available, and would otherwise
be executed in a separate tx.
This works similarly in Ignite 2 and is very confusing.
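
To illustrate the problem, here is a minimal sketch in plain Java, with no
Ignite classes involved. CURRENT_TX is a hypothetical stand-in for the
thread-local transaction context, and the class and method names are made up
for this illustration. A stage that runs on a pool thread does not see the
value set by the caller thread, while a value passed explicitly is visible
on any thread:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalTxDemo {
    // Hypothetical stand-in for a thread-local "current transaction" context.
    static final ThreadLocal<String> CURRENT_TX = new ThreadLocal<>();

    /** Returns the context seen by a chained stage that runs on a pool thread. */
    static String txSeenByAsyncStage() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CURRENT_TX.set("tx-1"); // set on the caller thread only
        try {
            return CompletableFuture
                    .supplyAsync(() -> "row", pool)
                    // Forced onto the pool thread, where CURRENT_TX was never set.
                    .thenApplyAsync(row -> CURRENT_TX.get(), pool)
                    .join();
        } finally {
            CURRENT_TX.remove();
            pool.shutdown();
        }
    }

    /** Same chain, but the transaction is captured explicitly by the lambda. */
    static String txSeenWhenPassedExplicitly() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        String tx = "tx-1";
        try {
            return CompletableFuture
                    .supplyAsync(() -> "row", pool)
                    .thenApplyAsync(row -> tx, pool)
                    .join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(txSeenByAsyncStage());         // null
        System.out.println(txSeenWhenPassedExplicitly()); // tx-1
    }
}
```

The sketch uses thenApplyAsync with an explicit executor so that the thread
switch is deterministic; with plain thenApply the stage may or may not run on
the caller thread, which is part of what makes the behavior confusing.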

To avoid this, I propose to add an explicit Transaction argument to each
table API method. Null value means to start the implicit transaction
(autocommit mode). For example:

/**
 * Asynchronously inserts a record into the table if it doesn't exist
 * or replaces the existing one.
 *
 * @param rec A record to insert into the table. The record cannot be {@code null}.
 * @param tx The transaction or {@code null} to auto commit.
 * @return Future representing pending completion of the operation.
 */
@NotNull CompletableFuture upsertAsync(@NotNull R rec, @Nullable Transaction tx);

The example [4] turns into:

public void testMixedPutGet() throws TransactionException {
RecordView view = accounts.recordView();

view.upsert(makeValue(1, BALANCE_1));

igniteTransactions.runInTransaction(tx -> {
view.getAsync(makeKey(1), tx).thenCompose(r ->
view.upsertAsync(makeValue(1, r.doubleValue("balance") + DELTA),
tx)).join();
});

assertEquals(BALANCE_1 + DELTA,
view.get(makeKey(1)).doubleValue("balance"));
}
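
For readers who want to play with the "explicit tx argument, null means
autocommit" contract without an Ignite cluster, here is a toy in-memory
sketch. The Tx and View classes below are hypothetical and are not the Ignite
API; they only model the rule that a null transaction applies the write
immediately, while a non-null one buffers it until commit:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class ExplicitTxSketch {
    /** Hypothetical transaction: buffers writes until commit. */
    static final class Tx {
        final Map<Integer, Double> pending = new HashMap<>();
    }

    /** Hypothetical view over an in-memory "table" of balances. */
    static final class View {
        private final Map<Integer, Double> committed = new ConcurrentHashMap<>();

        CompletableFuture<Double> getAsync(int key, Tx tx) {
            // A transaction sees its own pending writes first.
            Double v = (tx != null && tx.pending.containsKey(key))
                    ? tx.pending.get(key)
                    : committed.get(key);
            return CompletableFuture.completedFuture(v);
        }

        CompletableFuture<Void> upsertAsync(int key, double value, Tx tx) {
            if (tx == null) {
                committed.put(key, value);  // null tx: autocommit
            } else {
                tx.pending.put(key, value); // explicit tx: buffered until commit
            }
            return CompletableFuture.completedFuture(null);
        }

        void commit(Tx tx) {
            committed.putAll(tx.pending);
            tx.pending.clear();
        }
    }

    /** Returns the balance visible outside the tx before and after commit. */
    static double[] demo() {
        View view = new View();
        view.upsertAsync(1, 100.0, null).join();

        Tx tx = new Tx();
        view.getAsync(1, tx)
                .thenCompose(bal -> view.upsertAsync(1, bal + 50.0, tx))
                .join();

        double before = view.getAsync(1, null).join();
        view.commit(tx);
        double after = view.getAsync(1, null).join();
        return new double[] {before, after};
    }
}
```

Because the transaction travels as an ordinary argument, the chained stage
uses the same tx regardless of which thread executes it.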

Share your thoughts.

[1] https://issues.apache.org/jira/browse/IGNITE-15085
[2] 
org.apache.ignite.tx.IgniteTransactions#runInTransaction(java.util.function.Consumer)
[3] 
org.apache.ignite.tx.IgniteTransactions#runInTransaction(java.util.function.Function)
[4] org.apache.ignite.internal.table.TxAbstractTest#testMixedPutGet

Wed, Jul 14, 2021 at 14:12, Alexei Scherbakov :

> Andrey,
>
> 1) "As a user, I'd expect runInTransaction(closure) will create Tx for me,
> commit Tx after a successful closure call, and rollback Tx in case of
> error."
> - I'm ok with this behavior, and will alter javadoc.
>
> 2) "Transaction tx = beginTx()" - there is no such method "beginTx" in the
> proposed API, and I'm not intending to add it.
> For the synchronous case I suggest using "runInTransaction", which
> eliminates the need for AutoCloseable.
>
>
>
> Wed, Jul 14, 2021 at 13:21, Ivan Daschinsky :
>
>> > yes, it is stated in the javadoc in the PR.
>> Ah, I see.
>>
>> Wed, Jul 14, 2021 at 12:16, Alexei Scherbakov <
>> alexey.scherbak...@gmail.com
>> >:
>>
>> > Ivan,
>> >
>> > And what if I have already committed the transaction? Is it safe to roll
>> > back an already committed transaction? Will rollback silently return and
>> > do nothing? - yes, it is stated in the javadoc in the PR.
>> >
>> > Andrey,
>> >
>> > When using "runInTransaction", a lack of commit will cause the transaction
>> > to roll back automatically.
>> >
>> > There is no need for a "close" method, it just adds confusion.
>> >
>> >
>> > Wed, Jul 14, 2021 at 11:37, Andrey Mashenkov <
>> andrey.mashen...@gmail.com
>> > >:
>> >
>> > > Agree with Ivan.
>> > >
>> > > Method runInTransaction() should try to finish the transaction if the
>> > user
>> > > forgot to commit one.
>> > > I guess it might be a common mistake among new users.
>> > >
>> > > Also, I suggest extending all table projections for better UX.
>> > > Let's allow
>> > > table.kvView().withTx(tx)
>> > > so the user may cache the kvView instance and do
>> > > kvView.withTx(tx)
>> > > rather than
>> > > table.withTx(tx).kvView()
>> > >
>> > >
>> > >
>> > > On Wed, J

Re: [VOTE] Allow or prohibit usages of the Guava library methods

2021-09-09 Thread Alexei Scherbakov
I've checked Guava's feature list and came to the conclusion that its
usefulness has been diminished by switching to base Java 11.

-1 for general use, but we can use some code parts when needed.





Wed, Sep 8, 2021 at 14:09, Вячеслав Коптилин :

> -1
> I am leaning toward -1 because of vulnerability issues (that is a possible
> case in general).
>
> Thanks,
> S.
>
> Wed, Sep 8, 2021 at 12:13, Andrey Mashenkov  >:
>
> > -1
> > Supporting a few copy-pasted methods is much easier than supporting
> > dependency compatibility.
> >
> > On Tue, Sep 7, 2021 at 7:42 PM Zhenya Stanilovsky
> >  wrote:
> >
> > >
> > > Aleksandr, thanks for this activity.
> > > -1 from my side, all my decisions are in linked discussion.
> > >
> > > >Dear Igniters,
> > > >
> > > >In this thread
> > > ><
> > >
> >
> https://lists.apache.org/thread.html/r4120a03a2bf32098e54e21ae02e509b0d68f413bc7cc1f8f6d85c93d%40%3Cdev.ignite.apache.org%3E
> > > >
> > > >we've been discussing the problems and opportunities of using Guava
> > > >< https://github.com/google/guava > in Ignite 3. We have agreed that
> it
> > > >should be added as a shaded dependency, but we haven't decided whether
> > to
> > > >allow using Guava methods in the Ignite codebase or not. Therefore I
> > would
> > > >like to propose a vote:
> > > >
> > > >[+1 Allow]: allow using Guava methods, if justified.
> > > >[-1 Prohibit]: prohibit using all Guava methods.
> > > >
> > > >The voting will commence on Monday, September 13th at 9:00 UTC. Also
> > feel
> > > >free to express your opinion in the original discussion thread.
> > > >
> > > >--
> > > >With regards,
> > > >Aleksandr Polovtcev
> > >
> > >
> > >
> > >
> >
> >
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: Replace Map with List and Iterable in KeyValueView Ignite 3 APIs

2021-09-09 Thread Alexei Scherbakov
Pavel,

I think the current API looks more natural compared to your proposal.

-1  from my side, see comments below.

Thu, Sep 9, 2021 at 15:38, Pavel Tupitsyn :

> Igniters,
>
> I propose to replace Map with List in getAll and invokeAll, and
> Iterable in putAll APIs of Ignite 3.x KeyValueView.
>
> 1. Performance
> putAll simply iterates over the map, we can easily accept Iterable instead.
> Iterable can be implemented over anything, it can lazily read data from a
> file or some other place, instead of allocating a huge collection and
> performing unnecessary hashing.
>
> getAll returns a Map, but we don't know if the user code needs a map or
> just wants to iterate over the results, in which case Map is just overhead.
>

"allocating a huge collection" -
This method is not intended for loading a huge number of entries.
The allowed map size for putAll should be limited to some reasonable
value.

Streaming loading should be addressed in a separate API similar to
DataStream from Ignite 2.


>
> 2. Equality
> getAll returns Map, but in many cases, the map will be useless
> because K does not have proper equals()/hashCode() implementation, so
> map.get(key) does not work.
>

We shouldn't rely on the user's equals/hashCode while working with the
key-value API.
Internally it uses a binary representation of a user object for comparison
purposes.
The returned map implementation should work in the same way.
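
A rough sketch of what "compare by binary representation" could look like in
plain Java follows. The marshal() method and the BinaryKeyMap class are
hypothetical and do not reflect Ignite's actual marshalling; the point is
only that lookups work even though the user key class overrides neither
equals() nor hashCode(), because ByteBuffer compares by content:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class BinaryKeyMapSketch {
    /** User key type that deliberately does not override equals()/hashCode(). */
    static final class AccountKey {
        final int id;
        final String region;

        AccountKey(int id, String region) {
            this.id = id;
            this.region = region;
        }
    }

    /** Hypothetical marshaller: turns a key into its binary form. */
    static ByteBuffer marshal(AccountKey key) {
        byte[] region = key.region.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + region.length);
        buf.putInt(key.id);
        buf.put(region);
        buf.flip(); // make the written bytes the "remaining" content
        return buf;
    }

    /** Map facade that compares keys by their binary representation. */
    static final class BinaryKeyMap<V> {
        private final Map<ByteBuffer, V> data = new HashMap<>();

        void put(AccountKey key, V value) {
            data.put(marshal(key), value);
        }

        V get(AccountKey key) {
            return data.get(marshal(key));
        }
    }
}
```

With a plain HashMap keyed by AccountKey, a lookup with a second, equal-looking
instance returns null; with the binary-keyed facade it finds the entry.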


>
> Notes:
> - It is not clear which Pair class to use yet - IgniteBiTuple or something
> else.
> - Ignite 3 won't deadlock due to putAll entry order, so we don't have to
> worry about sorting.
>
> Thoughts, objections?
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSSION] Code style for Ignite 3

2021-08-20 Thread Alexei Scherbakov
+1

Fri, Aug 20, 2021 at 10:54, Alexander Polovtcev :

> Hi, Val. This is an extremely welcome change, thank you!
>
> On Fri, Aug 20, 2021 at 12:17 AM Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
> > Igniters,
> >
> > I would like to discuss a potential change to the coding guidelines for
> > Ignite 3. Currently, we're using the existing guidelines inherited from
> > Ignite 2, which are described here:
> > https://cwiki.apache.org/confluence/display/IGNITE/Coding+Guidelines
> >
> > The current guidelines, however, have existed for many years and have
> > several issues.
> > They are cumbersome, carry a lot of legacy stuff, and can't be automated.
> > Every now and then, they seem to cause questions and confusion.
> >
> > While it's hard to make drastic changes in Ignite 2, we still have a
> great
> > opportunity to update the guidelines in Ignite 3. I would identify two
> > major goals here:
> >
> >1. Simplification. Having too many rules and restrictions tends to
> >complicate development rather than provide any value. We should come up
> >with a minimum set of rules and then make amendments one by one if needed.
> >2. The ability to automate. I hold a strong belief that code style
> >checking has to become a part of the build. Therefore, we need to make sure
> >that any rule that ends up in the guideline can be automatically verified.
> >
> > I propose the following process to define the new guideline:
> >
> >1. Use Google code style as the starting point:
> >https://google.github.io/styleguide/javaguide.html
> >2. Replace the 100 column limit with 140. The latter is the value we
> >already use in Ignite 2, and it seems to be more reasonable, in my
> > opinion.
> >3. Use 4 spaces block indentation and 8 spaces for continuation (as
> >opposed to 2 and 4). Nothing wrong with 2 spaces, in my view, but 4
> > spaces
> >should provide a smoother transition, as we're really used to this
> > style.
> >4. For any other changes, initiate separate discussions going forward.
> >
> > Several reasons why I specifically propose Google style:
> >
> >1. This is essentially the standard for many projects. I don't think
> >there is a need for us to reinvent the wheel.
> >2. It's minimalistic and developer-friendly. No overcomplication.
> >3. (probably the most important) It comes with all the required tools
> >and configurations for automation (e.g., here is the config for IDEA:
> >
> https://github.com/google/error-prone/blob/master/.idea/GoogleStyle.xml
> > )
> >
> > Please let me know what you think. If there are no objections, I will
> start
> > the process.
> >
> > -Val
> >
>
>
> --
> With regards,
> Aleksandr Polovtcev
>


-- 

Best regards,
Alexei Scherbakov


Re: Static hierarchy in jmx tree

2021-08-19 Thread Alexei Scherbakov
ere are from different hosts. So right now there are few
> scenarios:
> > > - right now JMX is turned off by default, so there is no problem
> > > - if you want to use jmx you have to turn it on manually and it would
> work. After my patch consistent id in case of persistent or node id in
> other cases would be used.
> > > - if you want to specify an instance name, you can choose different names
> for different instances or separate them to different jvm.
> > > - if you want to use the previous logic with class loader id, option
> MBEAN_APPEND_CLASS_LOADER_ID is still available.
> > >
> > >
> > >> 3. Your patch introduces a breaking change. This can be done only in two
> > >> steps: release N deprecates the behavior, release N + 1 changes the
> > >> behavior, according to the new rules.
> > >
> > > Ok. If we take a decision, I will do that.
> >
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: Google Guava in Ignite 3

2021-08-05 Thread Alexei Scherbakov
+1

Thu, Aug 5, 2021 at 16:12, Alexander Polovtcev :

> Hello, dear Igniters!
>
> I would like to discuss the possibility of using Guava
> <https://github.com/google/guava> in Ignite 3. I know about the
> restrictive
> policy of using it in Ignite 2, but I have the following reasons:
>
> 1. We are de-facto using it already as an implicit dependency, since the
> Calcite module depends on it, and Calcite is going to stay for a while =)
> 2. AFAIK, the "bytecode" module is copied into the codebase only to strip
> Guava away from it. We can remove this module, which will improve the
> maintainability of the project.
> 3. We have some copy-paste of Guava code in the project. For example, see
> this
> <
> https://github.com/apache/ignite-3/blob/main/modules/core/src/main/java/org/apache/ignite/internal/util/IgniteUtils.java#L136
> >
> and this
> <
> https://github.com/apache/ignite-3/blob/main/modules/core/src/main/java/org/apache/ignite/internal/util/IgniteUtils.java#L428
> >
> .
> 4. Regarding security concerns, this report
> <https://www.cvedetails.com/product/52274/Google-Guava.html?vendor_id=1224
> >
> shows no major vulnerability issues for the last three years.
>
> Taking these points into account, I propose to allow using Guava both in
> production and test code and to add it as an explicit dependency.
>
> What do you think?
>
> --
> With regards,
> Aleksandr Polovtcev
>


-- 

Best regards,
Alexei Scherbakov


Re: Re[2]: [ANNOUNCE] Welcome Zhenya Stanilovsky as a new committer

2021-08-03 Thread Alexei Scherbakov
My regards!

Tue, Aug 3, 2021 at 19:02, Dmitry Pavlov :

> Zhenya, congrats on the new role. Well deserved!
>
> On 2021/07/30 16:10:41, Shishkov Ilya  wrote:
> > Zhenya,
> >
> > Congrats!
> >
> > Fri, Jul 30, 2021 at 14:01, Zhenya Stanilovsky
>  > >:
> >
> > >
> > >
> > > Guys, thank you very much !!
> > >
> > > >Zhenya,
> > > >
> > > >Congrats!
> > > >
> > > >--
> > > >Regards,
> > > >Konstantin Orlov
> > > >
> > > >
> > > >
> > > >
> > > >> On 30 Jul 2021, at 12:20, Вячеслав Коптилин <
> slava.kopti...@gmail.com
> > > > wrote:
> > > >>
> > > >> Hooray!
> > > >>
> > > >> Congrats! May the Force be with you!
> > > >>
> > > >> Thanks,
> > > >> S.
> > > >>
> > > > >> Fri, Jul 30, 2021 at 11:17, Anton Vinogradov < a...@apache.org >:
> > > >>
> > > >>> Congrats!
> > > >>>
> > > >>> On Fri, Jul 30, 2021 at 10:19 AM ткаленко кирилл <
> > > tkalkir...@yandex.ru >
> > > >>> wrote:
> > > >>>
> > > >>>> Zhenya, congratulations!
> > > >>>>
> > > >>>>  Пересылаемое сообщение 
> > > >>>> 30.07.2021, 09:50, "Ivan Daschinsky" < ivanda...@apache.org >:
> > > >>>>
> > > >>>>
> > > >>>> Zhenya, congrats, well deserved!
> > > >>>>
> > > > >>>> Fri, Jul 30, 2021 at 00:44, Andrey Mashenkov <
> > > >>>  andrey.mashen...@gmail.com
> > > >>>>> :
> > > >>>>
> > > >>>>> Congratulations Zhenya!
> > > >>>>>
> > > >>>>> On Fri, Jul 30, 2021 at 12:06 AM Maxim Muzafarov <
> mmu...@apache.org
> > > >
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> The Project Management Committee (PMC) for Apache Ignite has
> invited
> > > >>>>>> Zhenya Stanilovsky to become a committer and we are pleased to
> > > >>>> announce
> > > >>>>>> that
> > > >>>>>> he has accepted.
> > > >>>>>>
> > > >>>>>> For the last few years, Zhenya made a lot of performance fixes
> > > >>>>>> especially for the core modules and important contributions to
> the
> > > >>>>>> Apache Ignite codebase. He is actively involved in integrating
> the
> > > >>>>>> Calcite framework with Apache Ignite. Moreover, he has been a
> great
> > > >>>>>> help in the preparation of several Apache Ignite major releases
> by
> > > >>>>>> carrying out stress-load tests. Besides the code contributions,
> > > >>> Zhenya
> > > > >>>>>> is also an active community member and helps users on the dev and
> users
> > > >>>>>> lists.
> > > >>>>>>
> > > >>>>>> Being a committer enables easier contribution to the project
> since
> > > >>>> there
> > > >>>>> is
> > > >>>>>> no need to go via the patch submission process. This should
> enable
> > > >>>> better
> > > >>>>>> productivity.
> > > >>>>>>
> > > > >>>>>> Please join me in welcoming Zhenya, and congratulating him on the
> new
> > > >>>> role
> > > >>>>> in
> > > >>>>>> the Apache Ignite Community.
> > > >>>>>>
> > > >>>>>> Best Regards,
> > > >>>>>> Maxim Muzafarov
> > > >>>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Best regards,
> > > >>>>> Andrey V. Mashenkov
> > > >>>>
> > > >>>>  Конец пересылаемого сообщения 
> > > >>>>
> > > >>>
> > >
> > >
> > >
> > >
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: IEP-61 Transaction API desing for Ignite 3

2021-07-14 Thread Alexei Scherbakov
Andrey,

1) "As a user, I'd expect runInTransaction(closure) will create Tx for me,
commit Tx after a successful closure call, and rollback Tx in case of
error."
- I'm ok with this behavior, and will alter javadoc.

2) "Transaction tx = beginTx()" - there is no such method "beginTx" in the
proposed API, and I'm not intending to add it.
For the synchronous case I suggest using "runInTransaction", which
eliminates the need for AutoCloseable.
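
A small sketch of the behavior agreed on in point 1 (commit after a successful
closure call, rollback in case of error) is shown below. The Tx class here is
a hypothetical stand-in that only records how it was finished, and the helper
takes the transaction as a parameter purely to make the outcome observable;
neither is the actual Ignite API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class RunInTransactionSketch {
    /** Hypothetical transaction handle that only records how it finished. */
    static final class Tx {
        final List<String> log = new ArrayList<>();

        void commit() {
            log.add("commit");
        }

        void rollback() {
            log.add("rollback");
        }
    }

    /** Runs the closure, commits on success, rolls back and rethrows on error. */
    static void runInTransaction(Tx tx, Consumer<Tx> closure) {
        try {
            closure.accept(tx);
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
        tx.commit();
    }
}
```

With this shape the user never calls close(): the closure boundary itself
decides how the transaction ends.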



Wed, Jul 14, 2021 at 13:21, Ivan Daschinsky :

> > yes, it is stated in the javadoc in the PR.
> Ah, I see.
>
> Wed, Jul 14, 2021 at 12:16, Alexei Scherbakov <
> alexey.scherbak...@gmail.com
> >:
>
> > Ivan,
> >
> > And what if I have already committed the transaction? Is it safe to roll
> > back an already committed transaction? Will rollback silently return and
> > do nothing? - yes, it is stated in the javadoc in the PR.
> >
> > Andrey,
> >
> > When using "runInTransaction", a lack of commit will cause the transaction
> > to roll back automatically.
> >
> > There is no need for a "close" method, it just adds confusion.
> >
> >
> > Wed, Jul 14, 2021 at 11:37, Andrey Mashenkov <
> andrey.mashen...@gmail.com
> > >:
> >
> > > Agree with Ivan.
> > >
> > > Method runInTransaction() should try to finish the transaction if the
> > user
> > > forgot to commit one.
> > > I guess it might be a common mistake among new users.
> > >
> > > Also, I suggest extending all table projections for better UX.
> > > Let's allow
> > > table.kvView().withTx(tx)
> > > so the user may cache the kvView instance and do
> > > kvView.withTx(tx)
> > > rather than
> > > table.withTx(tx).kvView()
> > >
> > >
> > >
> > > On Wed, Jul 14, 2021 at 10:13 AM Ivan Daschinsky 
> > > wrote:
> > >
> > > > > Alexey, is there any analogue of close() for a transaction? When you
> > > > > start a transaction, you should be able to close it somehow in case
> > > > > you don't catch an exception or forget to commit.
> > > > >
> > > > > I suggest adding a closeAsync() method to Transaction, so the user can
> > > > > call it in handle or whenComplete.
> > > > >
> > > > > So the code will look like this:
> > > >
> > > > CacheApi cache = CacheApi.getCache("testCache");
> > > >
> > > > Transactions
> > > > .beginTransaction()
> > > > .thenCompose(tx -> {
> > > > CacheApi txCache = cache.withTx(tx);
> > > > CompletableFuture result = txCache.getAsync("key")
> > > > .thenCompose(val -> {
> > > > > if ("test".equals(val)) {
> > > > return txCache.putAsync("key", "test1");
> > > > }
> > > > else
> > > > return CompletableFuture.completedFuture(null);
> > > > })
> > > > .thenCompose(v -> tx.commitAsync())
> > > > .handle((v, ex) -> null);
> > > > return result.thenCompose(v -> tx.closeAsync());
> > > > });
> > > >
> > > > > I also suggest adding a method like this:
> > > >
> > > > static CompletableFuture inTxAsync(Function > > > CompletableFuture> action) {
> > > > return Transactions
> > > > .beginTransaction()
> > > > .thenCompose(tx -> {
> > > > CompletableFuture result = action.apply(tx)
> > > > .handle((v, ex) -> null);
> > > > return result.thenCompose(v -> tx.closeAsync());
> > > > });
> > > > }
> > > >
> > > > > The async API is not very readable, but this method can help the user
> > > > > write code. Here is the first example rewritten:
> > > >
> > > > Transactions.inTxAsync(tx -> {
> > > > CacheApi txCache = cache.withTx(tx);
> > > > return txCache.getAsync("key")
> > > > .thenCompose(val -> {
> > > > > if ("test".equals(val)) {
> > > > return txCache.putAsync("key", "test1");
> > > > }
> > > > else
> > > >  

Re: IEP-61 Transaction API desing for Ignite 3

2021-07-14 Thread Alexei Scherbakov
Adding table.kvView().withTx(tx) seems fine to me.

Wed, Jul 14, 2021 at 12:15, Alexei Scherbakov :

> Ivan,
>
> And what if I have already committed the transaction? Is it safe to roll
> back an already committed transaction? Will rollback silently return and
> do nothing? - yes, it is stated in the javadoc in the PR.
>
> Andrey,
>
> When using "runInTransaction", a lack of commit will cause the transaction
> to roll back automatically.
>
> There is no need for a "close" method, it just adds confusion.
>
>
> Wed, Jul 14, 2021 at 11:37, Andrey Mashenkov  >:
>
>> Agree with Ivan.
>>
>> Method runInTransaction() should try to finish the transaction if the user
>> forgot to commit one.
>> I guess it might be a common mistake among new users.
>>
>> Also, I suggest extending all table projections for better UX.
>> Let's allow
>> table.kvView().withTx(tx)
>> so the user may cache the kvView instance and do
>> kvView.withTx(tx)
>> rather than
>> table.withTx(tx).kvView()
>>
>>
>>
>> On Wed, Jul 14, 2021 at 10:13 AM Ivan Daschinsky 
>> wrote:
>>
>> > Alexey, is there any analogue of close() for a transaction? When you
>> > start a transaction, you should be able to close it somehow in case you
>> > don't catch an exception or forget to commit.
>> >
>> > I suggest adding a closeAsync() method to Transaction, so the user can
>> > call it in handle or whenComplete.
>> >
>> > So the code will look like this:
>> >
>> > CacheApi cache = CacheApi.getCache("testCache");
>> >
>> > Transactions
>> > .beginTransaction()
>> > .thenCompose(tx -> {
>> > CacheApi txCache = cache.withTx(tx);
>> > CompletableFuture result = txCache.getAsync("key")
>> > .thenCompose(val -> {
>> > if ("test".equals(val)) {
>> > return txCache.putAsync("key", "test1");
>> > }
>> > else
>> > return CompletableFuture.completedFuture(null);
>> > })
>> > .thenCompose(v -> tx.commitAsync())
>> > .handle((v, ex) -> null);
>> > return result.thenCompose(v -> tx.closeAsync());
>> > });
>> >
>> > I also suggest adding a method like this:
>> >
>> > static CompletableFuture inTxAsync(Function> > CompletableFuture> action) {
>> > return Transactions
>> > .beginTransaction()
>> > .thenCompose(tx -> {
>> > CompletableFuture result = action.apply(tx)
>> > .handle((v, ex) -> null);
>> > return result.thenCompose(v -> tx.closeAsync());
>> >     });
>> > }
>> >
>> > The async API is not very readable, but this method can help the user
>> > write code. Here is the first example rewritten:
>> >
>> > Transactions.inTxAsync(tx -> {
>> > CacheApi txCache = cache.withTx(tx);
>> > return txCache.getAsync("key")
>> > .thenCompose(val -> {
>> > if ("test".equals(val)) {
>> > return txCache.putAsync("key", "test1");
>> > }
>> > else
>> > return CompletableFuture.completedFuture(null);
>> > })
>> > .thenCompose(v -> tx.commitAsync());
>> > });
>> >
>> > Wed, Jul 14, 2021 at 10:03, Alexei Scherbakov <
>> > alexey.scherbak...@gmail.com
>> > >:
>> >
>> > > Andrey,
>> > >
>> > > I suggest you look at the PR [1], if you haven't.
>> > >
>> > > A transaction [2]
>> > > Transactions facade [3]
>> > > Examples [4]
>> > >
>> > > [1] https://github.com/apache/ignite-3/pull/214/files
>> > > [2]
>> > >
>> > >
>> >
>> https://github.com/apache/ignite-3/blob/d2122ce8c15de020e121f53509bd5a097aac9cf2/modules/api/src/main/java/org/apache/ignite/tx/Transaction.java
>> > > [3]
>> > >
>> > >
>> >
>> https://github.com/apache/ignite-3/blob/d2122ce8c15de020e121f53509bd5a097aac9cf2/modules/api/src/main/java/org/apache/ignite/tx/IgniteTransactions.java
>> > > [4]
>> > >
>> > >
>>

Re: IEP-61 Transaction API desing for Ignite 3

2021-07-14 Thread Alexei Scherbakov
Ivan,

And what if I have already committed the transaction? Is it safe to roll back
an already committed transaction? Will rollback silently return and do
nothing? - yes, it is stated in the javadoc in the PR.

Andrey,

When using "runInTransaction", a lack of commit will cause the transaction to
roll back automatically.

There is no need for a "close" method, it just adds confusion.
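
As a sketch of the "rollback on an already committed transaction silently does
nothing" rule, the finish calls can be modeled as a one-way state change. The
Tx class and State enum below are hypothetical; treating commit on a finished
transaction as a no-op as well is an assumption of this sketch, since the
thread only specifies the rollback case:

```java
import java.util.concurrent.atomic.AtomicReference;

class IdempotentFinishSketch {
    enum State { ACTIVE, COMMITTED, ROLLED_BACK }

    /** Hypothetical transaction whose state can leave ACTIVE only once. */
    static final class Tx {
        private final AtomicReference<State> state = new AtomicReference<>(State.ACTIVE);

        void commit() {
            // Assumption: a no-op if the transaction is already finished.
            state.compareAndSet(State.ACTIVE, State.COMMITTED);
        }

        void rollback() {
            // Silently returns if the transaction is already committed or rolled back.
            state.compareAndSet(State.ACTIVE, State.ROLLED_BACK);
        }

        State state() {
            return state.get();
        }
    }
}
```

With such semantics a trailing rollback at the end of a closure is always safe
to call, so a separate "close" adds nothing.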


Wed, Jul 14, 2021 at 11:37, Andrey Mashenkov :

> Agree with Ivan.
>
> Method runInTransaction() should try to finish the transaction if the user
> forgot to commit one.
> I guess it might be a common mistake among new users.
>
> Also, I suggest extending all table projections for better UX.
> Let's allow
> table.kvView().withTx(tx)
> so the user may cache the kvView instance and do
> kvView.withTx(tx)
> rather than
> table.withTx(tx).kvView()
>
>
>
> On Wed, Jul 14, 2021 at 10:13 AM Ivan Daschinsky 
> wrote:
>
> > Alexey, is there any analogue of close() for a transaction? When you
> > start a transaction, you should be able to close it somehow in case you
> > don't catch an exception or forget to commit.
> >
> > I suggest adding a closeAsync() method to Transaction, so the user can
> > call it in handle or whenComplete.
> >
> > So the code will look like this:
> >
> > CacheApi cache = CacheApi.getCache("testCache");
> >
> > Transactions
> > .beginTransaction()
> > .thenCompose(tx -> {
> > CacheApi txCache = cache.withTx(tx);
> > CompletableFuture result = txCache.getAsync("key")
> > .thenCompose(val -> {
> > if ("test".equals(val)) {
> > return txCache.putAsync("key", "test1");
> > }
> > else
> > return CompletableFuture.completedFuture(null);
> > })
> > .thenCompose(v -> tx.commitAsync())
> > .handle((v, ex) -> null);
> > return result.thenCompose(v -> tx.closeAsync());
> > });
> >
> > I also suggest adding a method like this:
> >
> > static CompletableFuture inTxAsync(Function > CompletableFuture> action) {
> > return Transactions
> > .beginTransaction()
> > .thenCompose(tx -> {
> > CompletableFuture result = action.apply(tx)
> > .handle((v, ex) -> null);
> > return result.thenCompose(v -> tx.closeAsync());
> > });
> > }
> >
> > The async API is not very readable, but this method can help the user
> > write code. Here is the first example rewritten:
> >
> > Transactions.inTxAsync(tx -> {
> > CacheApi txCache = cache.withTx(tx);
> > return txCache.getAsync("key")
> > .thenCompose(val -> {
> > if ("test".equals(val)) {
> > return txCache.putAsync("key", "test1");
> > }
> > else
> > return CompletableFuture.completedFuture(null);
> > })
> > .thenCompose(v -> tx.commitAsync());
> > });
> >
> > Wed, Jul 14, 2021 at 10:03, Alexei Scherbakov <
> > alexey.scherbak...@gmail.com
> > >:
> >
> > > Andrey,
> > >
> > > I suggest you look at the PR [1], if you haven't.
> > >
> > > A transaction [2]
> > > Transactions facade [3]
> > > Examples [4]
> > >
> > > [1] https://github.com/apache/ignite-3/pull/214/files
> > > [2]
> > >
> > >
> >
> https://github.com/apache/ignite-3/blob/d2122ce8c15de020e121f53509bd5a097aac9cf2/modules/api/src/main/java/org/apache/ignite/tx/Transaction.java
> > > [3]
> > >
> > >
> >
> https://github.com/apache/ignite-3/blob/d2122ce8c15de020e121f53509bd5a097aac9cf2/modules/api/src/main/java/org/apache/ignite/tx/IgniteTransactions.java
> > > [4]
> > >
> > >
> >
> https://github.com/apache/ignite-3/blob/d2122ce8c15de020e121f53509bd5a097aac9cf2/modules/table/src/test/java/org/apache/ignite/internal/table/TxTest.java
> > >
> > >
> > > Tue, Jul 13, 2021 at 19:41, Andrey Gura :
> > >
> > > > Alexey,
> > > >
> > > > could you please describe Transaction interface?
> > > >
> > > > Also it would be great to have a couple examples of using the
> proposed
> > > API.
> > > >
> > > > On Tue, Jul 13, 2021 at 4:43 PM Alexei Scherbakov
> > > >  wrote:
> > > > >
> > > > > Folks,
> > > > >
> > > > > I've prepared a PR implementing my vision of public transactions
> API.
> > > > >
> > > > > API is very simple and similar to Ignite 2, but has some
> differences.
> > > > >
> > > > > More details can be found here [1]
> > > > >
> > > > > Share your thoughts.
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/IGNITE-15086
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Alexei Scherbakov
> > > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
> >
> > --
> > Sincerely yours, Ivan Daschinskiy
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


-- 

Best regards,
Alexei Scherbakov


Re: IEP-61 Transaction API desing for Ignite 3

2021-07-14 Thread Alexei Scherbakov
Ivan,

We don't need the "close" method in the proposed approach, because
"commit(Async)" and "rollback(Async)" are enough to finish a
transaction. Semantically, "close" is the same as "rollback".

If you are using the "runInTransaction" API, you can't forget to call "close":
it will be done automatically (rollback will be called at the end of the
transaction closure).

For the async API, the caller is semantically required to call commit or
rollback.

As for utility methods, I would keep the tx API as small as possible for now.
Anyway, these methods can be added later, if the need arises.
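
For the async case, one possible way to make sure every chain ends in either
commit or rollback is sketched below. It assumes Java 9+ for
CompletableFuture.failedFuture; the Tx class is a hypothetical stand-in that
only records which finish call was made, and finish() is an illustrative
helper name, not part of the proposed API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class AsyncTxFinishSketch {
    /** Hypothetical transaction that records how it was finished. */
    static final class Tx {
        final List<String> log = new ArrayList<>();

        CompletableFuture<Void> commitAsync() {
            log.add("commit");
            return CompletableFuture.completedFuture(null);
        }

        CompletableFuture<Void> rollbackAsync() {
            log.add("rollback");
            return CompletableFuture.completedFuture(null);
        }
    }

    /** Commits when the work succeeds, otherwise rolls back and propagates the error. */
    static <T> CompletableFuture<T> finish(Tx tx, CompletableFuture<T> work) {
        CompletableFuture<CompletableFuture<T>> staged = work.handle((res, err) -> {
            if (err == null) {
                return tx.commitAsync().thenApply(v -> res);
            }
            return tx.rollbackAsync().thenCompose(v -> CompletableFuture.<T>failedFuture(err));
        });
        // Flatten the nested future produced by handle().
        return staged.thenCompose(f -> f);
    }
}
```

This keeps the public tx surface at commit and rollback only, which is in line
with keeping the API small.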





Wed, Jul 14, 2021 at 10:13, Ivan Daschinsky :

> Alexey, is there any analogue of close() for a transaction? When you start
> a transaction, you should be able to close it somehow in case you don't catch
> an exception or forget to commit.
>
> I suggest adding a closeAsync() method to Transaction, so the user can call it
> in handle or whenComplete.
>
> So the code will look like this:
>
> CacheApi cache = CacheApi.getCache("testCache");
>
> Transactions
> .beginTransaction()
> .thenCompose(tx -> {
> CacheApi txCache = cache.withTx(tx);
> CompletableFuture result = txCache.getAsync("key")
> .thenCompose(val -> {
> if ("test".equals(val)) {
> return txCache.putAsync("key", "test1");
> }
> else
> return CompletableFuture.completedFuture(null);
> })
> .thenCompose(v -> tx.commitAsync())
> .handle((v, ex) -> null);
> return result.thenCompose(v -> tx.closeAsync());
> });
>
> I also suggest adding a method like this:
>
> static CompletableFuture inTxAsync(Function CompletableFuture> action) {
> return Transactions
> .beginTransaction()
> .thenCompose(tx -> {
> CompletableFuture result = action.apply(tx)
> .handle((v, ex) -> null);
> return result.thenCompose(v -> tx.closeAsync());
> });
> }
>
> The async API is not very readable, but this method can help the user write
> code. Here is the first example rewritten:
>
> Transactions.inTxAsync(tx -> {
> CacheApi txCache = cache.withTx(tx);
> return txCache.getAsync("key")
> .thenCompose(val -> {
> if ("test".equals(val)) {
> return txCache.putAsync("key", "test1");
> }
> else
> return CompletableFuture.completedFuture(null);
> })
> .thenCompose(v -> tx.commitAsync());
> });
>
> Wed, Jul 14, 2021 at 10:03, Alexei Scherbakov <
> alexey.scherbak...@gmail.com
> >:
>
> > Andrey,
> >
> > I suggest you look at the PR [1], if you haven't.
> >
> > A transaction [2]
> > Transactions facade [3]
> > Examples [4]
> >
> > [1] https://github.com/apache/ignite-3/pull/214/files
> > [2]
> >
> >
> https://github.com/apache/ignite-3/blob/d2122ce8c15de020e121f53509bd5a097aac9cf2/modules/api/src/main/java/org/apache/ignite/tx/Transaction.java
> > [3]
> >
> >
> https://github.com/apache/ignite-3/blob/d2122ce8c15de020e121f53509bd5a097aac9cf2/modules/api/src/main/java/org/apache/ignite/tx/IgniteTransactions.java
> > [4]
> >
> >
> https://github.com/apache/ignite-3/blob/d2122ce8c15de020e121f53509bd5a097aac9cf2/modules/table/src/test/java/org/apache/ignite/internal/table/TxTest.java
> >
> >
> > Tue, Jul 13, 2021 at 19:41, Andrey Gura :
> >
> > > Alexey,
> > >
> > > could you please describe Transaction interface?
> > >
> > > Also it would be great to have a couple examples of using the proposed
> > API.
> > >
> > > On Tue, Jul 13, 2021 at 4:43 PM Alexei Scherbakov
> > >  wrote:
> > > >
> > > > Folks,
> > > >
> > > > I've prepared a PR implementing my vision of public transactions API.
> > > >
> > > > API is very simple and similar to Ignite 2, but has some differences.
> > > >
> > > > More details can be found here [1]
> > > >
> > > > Share your thoughts.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-15086
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>


-- 

Best regards,
Alexei Scherbakov


Re: IEP-61 Transaction API desing for Ignite 3

2021-07-14 Thread Alexei Scherbakov
Andrey,

I suggest you look at the PR [1], if you haven't.

A transaction [2]
Transactions facade [3]
Examples [4]

[1] https://github.com/apache/ignite-3/pull/214/files
[2]
https://github.com/apache/ignite-3/blob/d2122ce8c15de020e121f53509bd5a097aac9cf2/modules/api/src/main/java/org/apache/ignite/tx/Transaction.java
[3]
https://github.com/apache/ignite-3/blob/d2122ce8c15de020e121f53509bd5a097aac9cf2/modules/api/src/main/java/org/apache/ignite/tx/IgniteTransactions.java
[4]
https://github.com/apache/ignite-3/blob/d2122ce8c15de020e121f53509bd5a097aac9cf2/modules/table/src/test/java/org/apache/ignite/internal/table/TxTest.java


Tue, 13 Jul 2021 at 19:41, Andrey Gura :

> Alexey,
>
> could you please describe Transaction interface?
>
> Also it would be great to have a couple examples of using the proposed API.
>
> On Tue, Jul 13, 2021 at 4:43 PM Alexei Scherbakov
>  wrote:
> >
> > Folks,
> >
> > I've prepared a PR implementing my vision of public transactions API.
> >
> > API is very simple and similar to Ignite 2, but has some differences.
> >
> > More details can be found here [1]
> >
> > Share your thoughts.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-15086
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
>


-- 

Best regards,
Alexei Scherbakov


Re: IEP-61 Transaction API desing for Ignite 3

2021-07-13 Thread Alexei Scherbakov
Pavel,

"runInTransaction" is supposed to provide an "old-fashioned" way to write a
transaction for easier migration.

Manual enlisting of tables is required, because I strive to avoid any
thread based control of transactions in Ignite 3.

Actually, a single thread will be able to work with any number of
transactions at the same time.

I would keep it for convenience, but let's see other opinions.






Tue, 13 Jul 2021 at 18:22, Pavel Tupitsyn :

> Alexei,
>
> The API looks good to me, except "runInTransaction", which I find
> confusing.
>
> It looks like every operation performed by the passed Consumer will be
> automatically enlisted in a transaction,
> but, looking at tests, "withTx" call is still required inside the Consumer.
>
> I don't think we need this method at all, it barely provides any
> convenience but may confuse some users.
>
> On Tue, Jul 13, 2021 at 4:43 PM Alexei Scherbakov <
> alexey.scherbak...@gmail.com> wrote:
>
> > Folks,
> >
> > I've prepared a PR implementing my vision of public transactions API.
> >
> > API is very simple and similar to Ignite 2, but has some differences.
> >
> > More details can be found here [1]
> >
> > Share your thoughts.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-15086
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>


-- 

Best regards,
Alexei Scherbakov


IEP-61 Transaction API desing for Ignite 3

2021-07-13 Thread Alexei Scherbakov
Folks,

I've prepared a PR implementing my vision of public transactions API.

API is very simple and similar to Ignite 2, but has some differences.

More details can be found here [1]

Share your thoughts.

[1] https://issues.apache.org/jira/browse/IGNITE-15086

-- 

Best regards,
Alexei Scherbakov


Re: Ignite 3.0 Tuple API: how to check if value is null?

2021-07-12 Thread Alexei Scherbakov
tuple.isNull(colName) to test for emptiness also seems useful.
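
A minimal sketch of the two accessors discussed in this thread: isNull(colName) and a primitive getter that takes a default value. SketchTuple is a hypothetical class written only for illustration; it is not the Ignite 3 Tuple interface, and it stores values in a plain map (so it boxes internally). It only shows the shape of the API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified tuple; NOT the actual Ignite 3 Tuple interface. */
final class SketchTuple {
    /** Column values by name; a null value means the column is null. */
    private final Map<String, Object> vals = new HashMap<>();

    /** Sets a column value and returns this tuple for chaining. */
    SketchTuple set(String colName, Object val) {
        vals.put(colName, val);
        return this;
    }

    /** Returns true if the column is absent or holds null. */
    boolean isNull(String colName) {
        return vals.get(colName) == null;
    }

    /** Returns the column value, or dfltVal if the column is null. */
    double doubleValue(String colName, double dfltVal) {
        Object val = vals.get(colName);

        return val == null ? dfltVal : ((Number) val).doubleValue();
    }
}
```

With this shape the caller does a single lookup by name and picks a default that cannot be confused with real data, for example a negative weight.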

Mon, 12 Jul 2021 at 18:20, Alexei Scherbakov :

> +1 to make some improvements here.
>
> Using Optional doesn't make sense to me because it always involves boxing
> (and we already have tuple.value(colName)).
>
> I suggest adding methods similar to:
>
> tuple.doubleValue("field", double dfltValue)
>
> which returns default value if the field is null.
>
>
> Wed, 7 Jul 2021 at 08:34, Ivan Daschinsky :
>
>> The function basically returns two values. If the value is null, it
>> returns something like (false, NaN); otherwise, something like (true, 4.5).
>> The syntax is a bit weird to me, but it is better than nothing.
>>
>>
>> In golang it looks like this:
>>
>> if isValid, val := getVal(); isValid {
>> 
>> }
>>
>>
>>
>> Wed, 7 Jul 2021, 00:28 Valentin Kulichenko <
>> valentin.kuliche...@gmail.com
>> >:
>>
>> > So what happens if the value is NULL? Exception?
>> >
>> > -Val
>> >
>> > On Tue, Jul 6, 2021 at 1:52 PM Ivan Daschinsky 
>> > wrote:
>> >
>> > > > Out of curiosity, what would this code do if the value is NULL?
>> What is
>> > > the
>> > > type of the 'weight' variable?
>> > >
>> > > float of course.
>> > > https://www.c-sharpcorner.com/article/out-parameter-in-c-sharp-7/
>> > >
>> > >
>> > > Tue, 6 Jul 2021, 22:30 Valentin Kulichenko <
>> > > valentin.kuliche...@gmail.com
>> > > >:
>> > >
>> > > > Pavel,
>> > > >
>> > > > Optionals are available in Java and we can use them. This is still
>> > boxing
>> > > > though, and I don't know what the performance impact would be. In
>> > > addition,
>> > > > optional API is redundant for non-nullable fields. Perhaps, we can
>> > > provide
>> > > > both options (e.g. having intValue() and intValueOptional()
>> methods).
>> > > >
>> > > > Out of curiosity, what would this code do if the value is NULL?
>> What is
>> > > the
>> > > > type of the 'weight' variable?
>> > > >
>> > > > if (tuple.TryGetFloatValue("weight", out var weight))
>> > > > doSomething(weight)
>> > > >
>> > > > -Val
>> > > >
>> > > > On Tue, Jul 6, 2021 at 2:13 AM Ivan Daschinsky > >
>> > > > wrote:
>> > > >
>> > > > > Ah, I see, you meant Optionals family. Yep, it is worth to think
>> > about.
>> > > > >
>> > > > > Tue, 6 Jul 2021, 10:06 Pavel Tupitsyn :
>> > > > >
>> > > > > > Ivan,
>> > > > > >
>> > > > > > Nothing wrong except for performance concerns.
>> > > > > > The following code looks up the column by name twice:
>> > > > > >
>> > > > > > if (!tuple.isNull("weight"))
>> > > > > >doSomething(tuple.floatValue("weight"))
>> > > > > >
>> > > > > > Whereas in other languages you could do it in one shot:
>> > > > > >
>> > > > > > if (tuple.TryGetFloatValue("weight", out var weight))
>> > > > > > doSomething(weight)
>> > > > > >
>> > > > > > or Option weight = tuple.floatValue("weight") and so on.
>> > > > > >
>> > > > > > On Tue, Jul 6, 2021 at 9:58 AM Ivan Daschinsky <
>> > ivanda...@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Sorry, but what is wrong with simple method isNull()
>> > > > > > >
>> > > > > > > Tue, 6 Jul 2021, 09:55 Pavel Tupitsyn <
>> ptupit...@apache.org>:
>> > > > > > >
>> > > > > > > > Val,
>> > > > > > > >
>> > > > > > > > > I don't think there is a significantly better way
>> > > > > > > > > of doing this in Java.
>> > > > > > > >
>> > > > > > > > Yep looks like there is no way to return two values without
>> > > boxing.
>> > > > > > > > No ref, no ou

Re: Ignite 3.0 Tuple API: how to check if value is null?

2021-07-12 Thread Alexei Scherbakov
> > > > > > > > On Tue, Jul 6, 2021 at 12:44 AM Valentin Kulichenko <
> > > > > > > > valentin.kuliche...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Pavel,
> > > > > > > > >
> > > > > > > > > That's a good point, but I don't think there is a
> > significantly
> > > > > > better
> > > > > > > > way
> > > > > > > > > of doing this in Java.
> > > > > > > > >
> > > > > > > > > There should be a way to check if a field is nullable or
> not
> > > > > though.
> > > > > > > > Schema
> > > > > > > > > already provides this information, doesn't it?
> > > > > > > > >
> > > > > > > > > -Val
> > > > > > > > >
> > > > > > > > > On Mon, Jul 5, 2021 at 11:03 AM Pavel Tupitsyn <
> > > > > ptupit...@apache.org
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Igniters,
> > > > > > > > > >
> > > > > > > > > > Looks like Tuple API has no efficient way to tell if a
> > value
> > > > for
> > > > > a
> > > > > > > > > nullable
> > > > > > > > > > column of primitive type is null.
> > > > > > > > > >
> > > > > > > > > > - Tuple#intValue() will return 0 when the actual value is
> > > null
> > > > =>
> > > > > > not
> > > > > > > > > clear
> > > > > > > > > > if 0 is 0 or null.
> > > > > > > > > > - Tuple#value() works, but is more expensive due to
> boxing
> > > and
> > > > > type
> > > > > > > > > lookup.
> > > > > > > > > >
> > > > > > > > > > Any ideas on how to improve this?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: Problem with dropping the index

2021-07-08 Thread Alexei Scherbakov
I assume DurableBackgroundTask will no longer use references to root ids.

The proposed solution looks good to me.
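
A minimal in-memory sketch of the rename-on-drop idea from the quoted proposal below. MetaTreeSketch and all of its methods are hypothetical names; the checkpoint read lock and the logical WAL record are deliberately left out. It only shows why a cleanup task that deletes by a unique name cannot touch a newly created index that reuses the old name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model of a meta tree that maps index names to root page ids (hypothetical). */
final class MetaTreeSketch {
    /** Index name to root page id. */
    private final Map<String, Long> roots = new HashMap<>();

    /** Registers a new index root under the given name. */
    void createIndex(String name, long rootPageId) {
        if (roots.putIfAbsent(name, rootPageId) != null)
            throw new IllegalStateException("Index already exists: " + name);
    }

    /**
     * Drops an index by re-linking its root under a unique name.
     * Returns the unique name that the cleanup task must use.
     */
    String dropIndex(String name) {
        Long rootPageId = roots.remove(name);

        if (rootPageId == null)
            throw new IllegalStateException("No such index: " + name);

        String uniqueName = name + "-dropped-" + UUID.randomUUID();

        roots.put(uniqueName, rootPageId);

        return uniqueName;
    }

    /** Cleanup task body: a missing name is a normal, successful no-op. */
    boolean runCleanupTask(String uniqueName) {
        roots.remove(uniqueName);

        return true;
    }

    /** Returns the root page id for the name, or null if there is none. */
    Long rootOf(String name) {
        return roots.get(name);
    }
}
```

Running the cleanup task twice, as could happen after recovery creates a second task, is harmless in this model because the second run finds nothing to delete.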



Thu, 8 Jul 2021 at 10:28, Ivan Bessonov :

> Hi guys,
>
> I see that the original message and its clarification are hard to follow.
> The problem is that indexes are identified by their names. So, the
> situation
> that Kirill described is a valid one:
> - you have index X for columns (a, b);
> - you drop it;
> - you create index X for columns (a, c).
>
> This is a realistic scenario. It becomes a problem if all of that happens
> during a single checkpoint interval and the node then fails.
>
> Real issue here is the fact that we restore metastorage separately from
> everything else.
> This means that DurableBackgroundTask will start deleting X (a, c) if we're
> not careful.
>
> It all depends on the implementation. I'd suggest Alexei's solution with a
> few tweaks:
> - I'm not convinced that we need to log index creation, this is
> excessive;
> - how to drop an index from the meta tree:
> -- pick a unique name;
> -- acquire checkpoint read lock;
> -- create DurableBackgroundTask that will delete index with that unique
> name;
> -- create logical WAL record "drop index X";
> -- "rename" link to the index in meta tree;
> -- release checkpoint read lock;
> -- start DurableBackgroundTask.
>
> Recovery for "drop index X" should also create another
> DurableBackgroundTask,
> this way I think we'll be able to avoid any possible issues. If
> DurableBackgroundTask
> has the name of a nonexistent index then that's fine, the task will be completed
> as
> successful, that's a normal flow.
>
> I hope everything's clear here, I'm ready to clarify more details about my
> idea. Thank you!
>
>
> Wed, 7 Jul 2021 at 17:20, ткаленко кирилл :
>
> > I'll try to explain:
> >
> > We create indexes through InlineIndexFactory#createIndex, where at the
> > beginning we create the root page using QueryIndexDefinition#treeName, so
> > if we create the same index 2 times, then we will refer to the same root.
> >
> > In order to make the deletion of the index independent, I propose to
> > replace the name of the root pages with an arbitrary name (other than
> > QueryIndexDefinition#treeName, for example UUID.toString()), in order to
> > avoid the problems of a node crash, I propose to fix this in a logical
> > record.
> >
> > After we change the name of the root page to delete the index, we can
> > easily delete it and rebuild it (if the user does it) in parallel.
> >
> > If we consider your proposal, then the creation and deletion of the index
> > will go in parallel and they will look at the same roots.
> >
> > You can see how the renaming will take place here:
> > https://github.com/apache/ignite/pull/9223
> >
> > 07.07.2021, 10:28, "Alexei Scherbakov" :
> > > Tue, 6 Jul 2021 at 15:57, ткаленко кирилл :
> > >
> > >>  >> Can you clarify what it means to rename root index trees ?
> > >>  Replacing
> > >>
> >
> org.apache.ignite.internal.processors.cache.persistence.IndexStorageImpl.IndexItem,
> > >>  changing IndexItem#idxName, but keeping IndexItem#pageId.
> > >
> > > Changing to what ? Some temporary name ? Can you give a detailed step
> by
> > > step description of the algorithm ?
> > >
> > >>  Suggested solution is not suitable for the situation: add index ->
> drop
> > >>  index -> add an index. We can start deleting the last added index.
> > >
> > > How can we do that, give me an example ?
> > >
> > > From my understanding, the suggested solution should work ok for any
> > number
> > > of create/drop sequences.
> > >
> > >>  06.07.2021, 14:00, "Alexei Scherbakov"  >:
> > >>  > Can you clarify what it means to rename root index trees ?
> > >>  >
> > >>  > The simple solution which immediately comes to me is
> > >>  >
> > >>  > 1) write logical record on index creation - on reading it create an
> > index
> > >>  > during logical recovery
> > >>  > 2) write logical record on index deletion - on reading it delete an
> > index
> > >>  > during logical recovery and start background clearing task with
> real
> > root
> > >>  > pages.
> > >>  >
> > >>  > Will it work for you ?
> > >>  >
> > >>  > Tue, 6 Jul 2021 at 12:27, ткаленко кирилл  >:
> > >>  >
> > >>

Re: MOVING Partitions and Rebalancing

2021-07-07 Thread Alexei Scherbakov
Hi.

It's ok to have MOVING partitions in the described scenario, because the
second node is started after the first and starts a rebalancing on the next
topology version.

But it shouldn't break query consistency, because reading from the MOVING
partition is not allowed.

Why does it happen for text queries? Probably we are mapping to MOVING
partitions, which is incorrect.
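
A minimal sketch of the rule stated above: a query is mapped only to partitions that are safe to read, so MOVING partitions are skipped. PartitionMappingSketch and its State enum are hypothetical and reduced to two states for illustration; they are not the actual Ignite classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that picks the local partitions a query may read. */
final class PartitionMappingSketch {
    /** Simplified partition states; an assumption made for this sketch. */
    enum State { OWNING, MOVING }

    /** Returns ids of OWNING partitions in ascending order; MOVING ones are skipped. */
    static List<Integer> readablePartitions(Map<Integer, State> localParts) {
        List<Integer> res = new ArrayList<>();

        // TreeMap gives a deterministic ascending order of partition ids.
        for (Map.Entry<Integer, State> e : new TreeMap<>(localParts).entrySet()) {
            if (e.getValue() == State.OWNING)
                res.add(e.getKey());
        }

        return res;
    }
}
```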


Tue, 6 Jul 2021 at 22:07, Atri Sharma :

> Gentle ping. Please help with context here.
>
> On Mon, 5 Jul 2021, 13:03 Atri Sharma,  wrote:
>
> > Hi All,
> >
> > As part of the text queries overhaul effort, I am looking into the
> > following ticket:
> >
> > https://issues.apache.org/jira/browse/IGNITE-12401
> >
> > As I understand it, the problem lies in the fact that a partition can
> > move, thus causing duplicate data between two Lucene indices (on
> > different nodes).
> >
> > I wish to discuss the solution to this problem:
> >
> > 1. Fix the issue with MOVING partitions (need help here, since I am
> > not aware of the internals of rebalancing).
> >
> > 2. Circumvent the problem for text queries (maybe assign/use a
> > globally unique ID per entry, and use that to remove duplicates during
> > the merge phase?)
> >
> > Please share thoughts and inputs.
> >
> > Atri
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: Problem with dropping the index

2021-07-07 Thread Alexei Scherbakov
Tue, 6 Jul 2021 at 15:57, ткаленко кирилл :

> >> Can you clarify what it means to rename root index trees ?
> Replacing
> org.apache.ignite.internal.processors.cache.persistence.IndexStorageImpl.IndexItem,
> changing IndexItem#idxName, but keeping IndexItem#pageId.
>

Changing to what ? Some temporary name ? Can you give a detailed step by
step description of the algorithm ?


>
> Suggested solution is not suitable for the situation: add index -> drop
> index -> add an index. We can start deleting the last added index.
>

How can we do that, give me an example ?

From my understanding, the suggested solution should work ok for any number
of create/drop sequences.


>
> 06.07.2021, 14:00, "Alexei Scherbakov" :
> > Can you clarify what it means to rename root index trees ?
> >
> > The simple solution which immediately comes to me is
> >
> > 1) write logical record on index creation - on reading it create an index
> > during logical recovery
> > 2) write logical record on index deletion - on reading it delete an index
> > during logical recovery and start background clearing task with real root
> > pages.
> >
> > Will it work for you ?
> >
> > Tue, 6 Jul 2021 at 12:27, ткаленко кирилл :
> >
> >>  Hello everyone!
> >>
> >>  Currently, dropping indexes consists of the following steps (based on
> >>  SchemaAbstractDiscoveryMessage's):
> >>
> >>  Step 1: Removing the index from the SQL engine and starting
> >>  DurableBackgroundCleanupIndexTreeTask, which removes the index trees
> in the
> >>  background;
> >>  Step 1.1: DurableBackgroundCleanupIndexTreeTask is added to the
> >>  metaStorage and removed after successful completion at the next
> checkpoint.
> >>
> >>  Step 2: Removing the index from the cache configuration and persist it.
> >>
> >>  Problems:
> >>
> >>  1)We add and immediately delete the index, a checkpoint does not happen
> >>  and the node crashes, after restarting
> >>  DurableBackgroundCleanupIndexTreeTask will not be able to complete and
> will
> >>  periodically restart due to the fact that it saves
> >>  DurableBackgroundCleanupIndexTreeTask#rootPages (root pages of index
> trees)
> >>  that have not appeared;
> >>
> >>  2)After adding a DurableBackgroundCleanupIndexTreeTask node crashes,
> after
> >>  restarting the node, the task will clean the index trees and there
> will be
> >>  errors when using the index;
> >>
> >>  3)etc.
> >>
> >>  Suggested solution:
> >>
> >>  Rename the root index trees and write about this with a logical entry
> in
> >>  the WAL and do this at the first start of
> >>  DurableBackgroundCleanupIndexTreeTask.
> >>  Thus, if we find the renamed root pages in task 1, we can clear the
> index
> >>  trees to the end, otherwise the task can be completed.
> >>  Also, if we find that rename pages are present, and the step 2 has not
> >>  been completed, then we can start rebuilding the indexes.
> >>
> >>  WDYT?
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
>


-- 

Best regards,
Alexei Scherbakov


Re: Problem with dropping the index

2021-07-06 Thread Alexei Scherbakov
Can you clarify what it means to rename root index trees ?

The simple solution which immediately comes to me is

1) write logical record on index creation - on reading it create an index
during logical recovery
2) write logical record on index deletion - on reading it delete an index
during logical recovery and start background clearing task with real root
pages.

Will it work for you ?


Tue, 6 Jul 2021 at 12:27, ткаленко кирилл :

> Hello everyone!
>
> Currently, dropping indexes consists of the following steps (based on
> SchemaAbstractDiscoveryMessage's):
>
> Step 1: Removing the index from the SQL engine and starting
> DurableBackgroundCleanupIndexTreeTask, which removes the index trees in the
> background;
> Step 1.1: DurableBackgroundCleanupIndexTreeTask is added to the
> metaStorage and removed after successful completion at the next checkpoint.
>
> Step 2: Removing the index from the cache configuration and persist it.
>
> Problems:
>
> 1)We add and immediately delete the index, a checkpoint does not happen
> and the node crashes, after restarting
> DurableBackgroundCleanupIndexTreeTask will not be able to complete and will
> periodically restart due to the fact that it saves
> DurableBackgroundCleanupIndexTreeTask#rootPages (root pages of index trees)
> that have not appeared;
>
> 2)After adding a DurableBackgroundCleanupIndexTreeTask node crashes, after
> restarting the node, the task will clean the index trees and there will be
> errors when using the index;
>
> 3)etc.
>
> Suggested solution:
>
> Rename the root index trees and write about this with a logical entry in
> the WAL and do this at the first start of
> DurableBackgroundCleanupIndexTreeTask.
> Thus, if we find the renamed root pages in task 1, we can clear the index
> trees to the end, otherwise the task can be completed.
> Also, if we find that rename pages are present, and the step 2 has not
> been completed, then we can start rebuilding the indexes.
>
> WDYT?
>


-- 

Best regards,
Alexei Scherbakov


Re: IEP-61 Technical discussion

2021-07-01 Thread Alexei Scherbakov
Hi.

We have made some progress on the topic.

The JRaft fork is merged to Ignite 3 master, now it's integrated with other
ready components.

The design of transactional protocol in the first iteration is published on
the master [1]

[1] https://github.com/apache/ignite-3/tree/main/modules/transactions


Sat, 20 Mar 2021 at 21:00, Alexei Scherbakov :

> Folks,
>
> I want to share some information about progress in implementing the raft
> protocol in ignite 3, which is a prerequisite for metastorage.
>
> The implementation will consist of client and server modules. The client
> is responsible for interoperability between raft server node and any other
> remote/local java process
>
> I have recently finished a raft client API. The public API part is
> available here [1] for review. The entry point is RaftGroupService
> interface. The service implementation has not been finished yet and can be
> skipped for now.
>
> As for the server part, currently we are investigating two options. First
> is etcd [2] implementation ported to Java. The drawback here is the amount
> of work required to make it working. Second option is the adoption of
> jraft [3] implementation. It is a full featured implementation already
> written in Java, but the code is not quite clean in my opinion and will
> require some refactoring.
>
> The next step is to make a raft client working with server
> implementations. At least one is required for the next alpha. It is planned
> to have the same client for both server implementations. As soon as both
> will be ready, we will compare them by running consistency tests and
> benchmarks and drop the worst. I will give the next update when we will
> have a working client and at least one server implementation ready.
>
> [1] https://github.com/apache/ignite-3/pull/59/files
> [2] https://github.com/etcd-io/etcd/tree/master/raft
> [3] https://github.com/sofastack/sofa-jraft
>
> Fri, 27 Nov 2020 at 20:26, Alexey Goncharuk  >:
>
>> Folks, thanks to everyone who joined the call. Summary:
>>
>>- We agree that it may be beneficial to separate metastorage and group
>>membership services, however, the abstractions should be clean enough
>> so
>>that we could implement group membership via metastorage
>>- Production cluster setup will involve an administrator 'init' command
>>that will initialize the metastorage raft group. Once the metastorage
>> is
>>initialized, all nodes may be restarted arbitrarily
>>- HA cluster must contain at least 3 nodes. 2-node cluster will stop
>>progress when one of the nodes fails (due to metastorage requirements)
>>- We will provide a 'developer' cluster mode which will allow a 1-node
>>setup and auto-initialization without the 'init' command
>>- We are targeting centralized affinity calculation that will be stored
>>to the metastorage. Metastorage downtime does not necessarily mean
>> cluster
>>unavailability (subject to the partition replication protocol choice). It
>>would be good to maximally hide the partition object so that we could
>>support range partitioning in the future
>>
>> To discuss at the next meeting (do not hesitate to send questions here
>> before the meeting):
>>
>>- Raft implementation details (API model, porting, etc)
>>- Transactions interaction with replication protocol
>>- Weaker consistency options
>>
>> Please add more if I forgot something and let's choose a time for the next
>> meeting.
>>
>> --AG
>>
>> Thu, 26 Nov 2020 at 16:12, Kseniya Romanova > >:
>>
>> > Done
>> >
>> > Thu, 26 Nov 2020 at 13:18, Ivan Daschinsky :
>> >
>> > > Alexey, is it possible to manage call at 16:00 MSK?
>> > >
>> > > Thu, 26 Nov 2020 at 12:30, Alexey Goncharuk <
>> > alexey.goncha...@gmail.com
>> > > >:
>> > >
>> > > > Hi Ivan,
>> > > >
>> > > > Unfortunately, the earliest window available for us is 12:00 MSK (1
>> > hour
>> > > > slot), or after 14:30 MSK. Let me know what time works best for you.
>> > > >
>> > > > Wed, 25 Nov 2020 at 21:38, Ivan Daschinsky > >:
>> > > >
>> > > > > Alexey, I kindly ask you to move the meeting a little bit earlier,
>> > > ideal
>> > > > > variant -- in the morning.
>> > > > >
>> > > > > Wed, 25 Nov 2020 at 20:10, Alexey Goncharuk <
>> > > > alexey.goncha...@gmail.com
>> > > > > >:

Re: Text Queries Support

2021-06-21 Thread Alexei Scherbakov
Hi.

One of the biggest issues with text queries is the lack of persistence
support for Lucene indices, which makes this functionality useless when
persistence is enabled.

I would first take care of it.
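
For the sorted reduce mentioned in the quoted messages of this thread, here is a minimal sketch of merging per-node results that are each already ordered by descending score. ScoreMergeSketch, Hit and Cursor are hypothetical names; this is a generic k-way merge on a priority queue, not the reducer from the patch under review.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical k-way merge of per-node hit lists sorted by descending score. */
final class ScoreMergeSketch {
    /** A single search hit: entry key plus its score. */
    static final class Hit {
        final String key;
        final float score;

        Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Current head of one node's result stream plus the rest of that stream. */
    private static final class Cursor {
        Hit head;
        final Iterator<Hit> rest;

        Cursor(Hit head, Iterator<Hit> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    /** Merges the lists into one list ordered by descending score. */
    static List<Hit> merge(List<List<Hit>> perNode) {
        // The cursor with the highest head score is polled first.
        PriorityQueue<Cursor> pq =
            new PriorityQueue<>((a, b) -> Float.compare(b.head.score, a.head.score));

        for (List<Hit> hits : perNode) {
            Iterator<Hit> it = hits.iterator();

            if (it.hasNext())
                pq.add(new Cursor(it.next(), it));
        }

        List<Hit> res = new ArrayList<>();

        while (!pq.isEmpty()) {
            Cursor c = pq.poll();

            res.add(c.head);

            // Advance this node's stream and put it back if it has more hits.
            if (c.rest.hasNext()) {
                c.head = c.rest.next();
                pq.add(c);
            }
        }

        return res;
    }
}
```

The merge relies on every node using the same scoring function, which is the assumption stated in the original proposal.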

Mon, 21 Jun 2021 at 12:16, Maksim Timonin :

> Hi, Atri!
>
> You're right, Actually there is a lack of support for TextQueries. For the
> last ticket I'm doing I see some obvious issues with them (no page size
> support, for example). I'm glad that somebody wants to maintain this
> functionality. Thanks a lot!
>
> For the MergeSort algorithm there is already a patch for that [1]. It's
> currently on review. This patch introduces an abstract reducer for
> CacheQueries with 2 implementations (unordered, merge-sort). Then TextQuery
> leverages on MergeSort to order results from multiple nodes by score. This
> patch also fixes the pageSize issue, I've mentioned before. Could you
> please check if it fully matches your idea? Any issues or comments are
> welcome.
>
> I've prepared this ticket, because I need the MergeSort algorithm for the
> new type of queries I'm implementing (IndexQuery, it should also provide
> ordered results over multiple nodes). Currently I'm not planning to go
> further with TextQuery, so if you're going to support this it'll be a great
> contribution, I think.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-14703
> [2] https://github.com/apache/ignite/pull/9081
>
>
> On Mon, Jun 21, 2021 at 11:11 AM Atri Sharma  wrote:
>
> > Hi All,
> >
> > I have been looking into our text queries support and see that it has
> > limited community support.
> >
> > Therefore, I volunteer to be the maintainer of the module and work on
> > enhancing it further.
> >
> > First goal would be to move to Lucene 8.x, then work on sorted reduce
> > - merge across nodes. Fundamentally, this is doable since Lucene ranks
> > documents according to their score, and documents are returned in the
> > order of their score. Since the scoring function is homogeneous, this
> > means that across nodes, we can compare scores and merge sort.
> >
> > Please let me know if I can take this up.
> >
> > Atri
> >
> > --
> > Regards,
> >
> > Atri
> > Apache Concerted
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSSION] Code style. Variable abbrevations

2021-06-07 Thread Alexei Scherbakov
ager [IgniteAbbrevationsRule]
> > > >>> [ERROR]
> > > >>>
> > >
> >
> /Users/sbt-izhikov-nv/work/ignite/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/db/IgnitePdsPartitionPreloadTest.java:63:
> > > >>> Abbrevation should be used for DEFAULT_REGION! Please, use dflt,
> > > instead of
> > > >>> DEFAULT [IgniteAbbrevationsRule]
> > > >>> [ERROR]
> > > >>>
> > >
> >
> /Users/sbt-izhikov-nv/work/ignite/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/db/IgnitePdsWholeClusterRestartTest.java:47:
> > > >>> Abbrevation should be used for ENTRIES_COUNT! Please, use cnt,
> > instead
> > > of
> > > >>> COUNT [IgniteAbbrevationsRule]
> > > >>> [ERROR]
> > > >>>
> > >
> >
> /Users/sbt-izhikov-nv/work/ignite/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/db/IgnitePdsRebalancingOnNotStableTopologyTest.java:49:
> > > >>> Abbrevation should be used for CHECKPOINT_FREQUENCY! Please, use
> > freq,
> > > >>> instead of FREQUENCY [IgniteAbbrevationsRule]
> > > >>> [ERROR]
> > > >>>
> > >
> >
> /Users/sbt-izhikov-nv/work/ignite/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/db/IgnitePdsTransactionsHangTest.java:75:
> > > >>> Abbrevation should be used for MAX_KEY_COUNT! Please, use cnt,
> > instead
> > > of
> > > >>> COUNT [IgniteAbbrevationsRule]
> > > >>> [ERROR]
> > > >>>
> > >
> >
> /Users/sbt-izhikov-nv/work/ignite/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/db/IgnitePdsTransactionsHangTest.java:78:
> > > >>> Abbrevation should be used for CHECKPOINT_FREQUENCY! Please, use
> > freq,
> > > >>> instead of FREQUENCY [IgniteAbbrevationsRule]
> > > >>> [ERROR]
> > > >>>
> > >
> >
> /Users/sbt-izhikov-nv/work/ignite/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/db/SlowHistoricalRebalanceSmallHistoryTest.java:57:
> > > >>> Abbrevation should be used for SUPPLY_MESSAGE_LATCH! Please, use
> msg,
> > > >>> instead of MESSAGE [IgniteAbbrevationsRule]
> > > >>> [ERROR]
> > > >>>
> > >
> >
> /Users/sbt-izhikov-nv/work/ignite/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/db/IgniteLogicalRecoveryTest.java:74:
> > > >>> Abbrevation should be used for SHARED_GROUP_NAME! Please, use grp,
> > > instead
> > > >>> of GROUP [IgniteAbbrevationsRule]
> > > >>> [ERROR]
> > > >>>
> > >
> >
> /Users/sbt-izhikov-nv/work/ignite/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/db/IgniteLogicalRecoveryTest.java:200:
> > > >>> Abbrevation should be used for cacheLoader! Please, use ldr,
> instead
> > of
> > > >>> Loader [IgniteAbbrevationsRule]
> > > >>> ```
> > > >>>
> > > >>> [1]
> > > >>>
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/Abbreviation+Rules#AbbreviationRules-VariableAbbreviation
> > > >>> [2] https://github.com/apache/ignite/pull/9153
> > > >>>
> > > >>>
> > > >>
> > >
> > >
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: Node and cluster life-cycle in ignite-3

2021-06-03 Thread Alexei Scherbakov
I've made a small mistake above, the correct sentence is

void stop(); // Stop a component. Invoked in "depends on" relation order.
In the example above,  RaftGroup  is stopped before the Network.
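
A minimal sketch of the ordering described here, assuming components are identified by name and declare what they depend on. LifecycleSketch and its methods are hypothetical, not an Ignite 3 API. Components start with their dependencies first and stop in the opposite order, so RaftGroup is stopped before the Network; cyclic dependencies are rejected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical registry that derives start and stop order from dependencies. */
final class LifecycleSketch {
    /** Component name to the names of components it depends on. */
    private final Map<String, List<String>> deps = new LinkedHashMap<>();

    /** Registers a component together with its direct dependencies. */
    void register(String name, String... dependsOn) {
        deps.put(name, List.of(dependsOn));
    }

    /** Start order: every component comes after all of its dependencies. */
    List<String> startOrder() {
        Set<String> done = new LinkedHashSet<>();
        Set<String> visiting = new HashSet<>();

        for (String name : deps.keySet())
            visit(name, visiting, done);

        return new ArrayList<>(done);
    }

    /** Stop order: dependents first, i.e. the reverse of the start order. */
    List<String> stopOrder() {
        List<String> order = startOrder();

        Collections.reverse(order);

        return order;
    }

    /** Depth-first traversal that fails on a cyclic dependency. */
    private void visit(String name, Set<String> visiting, Set<String> done) {
        if (done.contains(name))
            return;

        if (!visiting.add(name))
            throw new IllegalStateException("Cyclic dependency at " + name);

        for (String dep : deps.getOrDefault(name, List.of()))
            visit(dep, visiting, done);

        visiting.remove(name);
        done.add(name);
    }
}
```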

Thu, 3 Jun 2021 at 17:36, Alexei Scherbakov :

> Sergey, I'm ok with the runlevel approach.
>
> I've thought about the node/components lifecycle, here my ideas:
>
> 1. The Component interface.
> Each manageable component must implement it.
>
> 2. Define components hierarchy.
> Each component can depend on others - this produces component hierarchy
> defined by "depends on" relation.
> For example, RaftGroup depends on a ClusterService to send messages to
> other nodes.
>
> 3. Cyclic dependencies in the component hierarchy are forbidden.
>
> 4. Some form of dependency injection for easier component construction.
>
> 5. Transparent component lifecycle, defined by following methods:
>
> void start(); // Start a component
> void afterStart(); // Called when all component dependencies are
> initialized. Invoked in reverse to "depends on" relation order.
> void beforeStop();  // Called before the component is going to stop (for
> example, to cancel a pending operation) Invoked in reverse to "depends on"
> relation order.
> void stop(); // Stop a component. Invoked in "depends on" relation order.
> In the example above, Ignite is stopped before the Network.
> boolean isStopping(); // Flag to check if a node is stopping right now.
> boolean runnable(int runLevel) // Defines if a component has to be started
> on a specific run level.
>
> 6. Dynamic components (can be started/stopped at any time)
>
> 7. enterBusy/leaveBusy/block logic (similar to Ignite2) to avoid races on
> node stopping.
>
> Thu, 3 Jun 2021 at 13:08, Valentin Kulichenko <
> valentin.kuliche...@gmail.com>:
>
>> Hi Sergey,
>>
>> Sounds interesting, I do agree that it might be beneficial to improve the
>> lifecycle management in 3.0 - 2.x version is far from perfect.
>>
>> Regarding your questions:
>>
>> 1. Can this be done via the metastore?
>> 2. I think we should list the run levels that we think should be there,
>> and
>> then it will be easier to identify dependencies between them. Can you give
>> an example of independent run levels?
>>
>> -Val
>>
>> On Tue, Jun 1, 2021 at 7:57 AM Sergey Chugunov > >
>> wrote:
>>
>> >  Hello Igniters,
>> >
>> > I would like to start a discussion on evolving IEP-73 [1]. Now it
>> covers a
>> > narrow topic about components dependencies but it makes sense to cover
>> in
>> > the IEP a broader question: how different components should be
>> initialized
>> > to support different modes of an individual node or a whole cluster.
>> >
>> > There is an idea to borrow the notion of run-levels from Unix-like
>> systems,
>> > and I suggest the following design to implement it.
>> >
>> >1. To start and function at a specific run-level node needs to start
>> and
>> >initialize components in a proper order. During initialization
>> > components
>> >may need to notify each other about reaching a particular run-level
>> so
>> >other components are able to execute their actions. Orchestrating of
>> > this
>> >process should be a responsibility of a new component.
>> >
>> >2. Orchestration component doesn't manage the initialization process
>> >directly but uses another abstraction called scenario. Examples of
>> >run-levels in the context of Ignite 2.x may be Maintenance Mode,
>> >INACTIVE-READONLY-ACTIVE states of a cluster, and each level is
>> reached
>> >when a corresponding scenario has executed.
>> >
>> >So the responsibility of the orchestrator will be managing scenarios
>> and
>> >providing them with infrastructure of spreading notification events
>> > between
>> >components. All low-level details and knowledge of existing
>> components
>> > and
>> >their dependencies are encapsulated inside scenarios.
>> >
>> >3. Scenarios allow nesting, e.g. a scenario for INACTIVE cluster
>> state
>> >can be "upgraded" to READONLY state by executing diff between
>> INACTIVE
>> > and
>> >READONLY scenarios.
>> >
>> >
>> > I see several advantages of this design compared to existing model in
>> > Ignite 2.x (mostly implemented in IgniteKernal and based on two main

Re: Node and cluster life-cycle in ignite-3

2021-06-03 Thread Alexei Scherbakov
Sergey, I'm ok with the runlevel approach.

I've thought about the node/components lifecycle, here are my ideas:

1. The Component interface.
Each manageable component must implement it.

2. Define components hierarchy.
Each component can depend on others - this produces component hierarchy
defined by "depends on" relation.
For example, RaftGroup depends on a ClusterService to send messages to
other nodes.

3. Cyclic dependencies in the component hierarchy are forbidden.

4. Some form of dependency injection for easier component construction.

5. Transparent component lifecycle, defined by the following methods:

void start(); // Start a component.
void afterStart(); // Called when all component dependencies are
initialized. Invoked in reverse to "depends on" relation order.
void beforeStop(); // Called before the component is going to stop (for
example, to cancel a pending operation). Invoked in reverse to "depends on"
relation order.
void stop(); // Stop a component. Invoked in "depends on" relation order.
In the example above, Ignite is stopped before the Network.
boolean isStopping(); // Flag to check if a node is stopping right now.
boolean runnable(int runLevel); // Defines if a component has to be started
on a specific run level.

6. Dynamic components (can be started/stopped at any time)

7. enterBusy/leaveBusy/block logic (similar to Ignite2) to avoid races on
node stopping.
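
To illustrate items 2, 3 and 5, here is a rough sketch of how a lifecycle
order could be derived from the "depends on" relation. It is not the proposed
API: the reduced Component interface, SimpleComponent and LifecycleOrder are
hypothetical names, and I assume dependencies start before their dependents,
with stop running in the opposite order (so a dependent such as RaftGroup stops
before the ClusterService it uses). A cycle is rejected as soon as it is seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, reduced component contract (only what the ordering needs). */
interface Component {
    String name();
    List<Component> dependencies();
    void start();
    void stop();
}

/** Trivial component used only for illustration. */
final class SimpleComponent implements Component {
    private final String name;
    private final List<Component> deps;

    SimpleComponent(String name, Component... deps) {
        this.name = name;
        this.deps = new ArrayList<>(Arrays.asList(deps));
    }

    @Override public String name() { return name; }
    @Override public List<Component> dependencies() { return deps; }
    @Override public void start() { System.out.println("start " + name); }
    @Override public void stop() { System.out.println("stop " + name); }
}

/** Derives start/stop order from the "depends on" relation, rejecting cycles. */
final class LifecycleOrder {
    /** Every dependency appears before the components that depend on it. */
    static List<Component> startOrder(List<? extends Component> roots) {
        List<Component> order = new ArrayList<>();
        Set<Component> done = new HashSet<>();
        Set<Component> inProgress = new HashSet<>();
        for (Component c : roots)
            visit(c, done, inProgress, order);
        return order;
    }

    /** Reverse of the start order: dependents stop before their dependencies. */
    static List<Component> stopOrder(List<? extends Component> roots) {
        List<Component> order = startOrder(roots);
        Collections.reverse(order);
        return order;
    }

    /** Depth-first traversal; a component seen twice on the current path is a cycle. */
    private static void visit(Component c, Set<Component> done,
                              Set<Component> inProgress, List<Component> order) {
        if (done.contains(c))
            return;
        if (!inProgress.add(c))
            throw new IllegalStateException("Cyclic dependency at " + c.name());
        for (Component dep : c.dependencies())
            visit(dep, done, inProgress, order);
        inProgress.remove(c);
        done.add(c);
        order.add(c);
    }
}
```

With a RaftGroup component depending on a ClusterService component, startOrder
returns ClusterService first and stopOrder returns RaftGroup first.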

On Thu, Jun 3, 2021 at 13:08, Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Hi Sergey,
>
> Sounds interesting, I do agree that it might be beneficial to improve the
> lifecycle management in 3.0 - 2.x version is far from perfect.
>
> Regarding your questions:
>
> 1. Can this be done via the metastore?
> 2. I think we should list the run levels that we think should be there, and
> then it will be easier to identify dependencies between them. Can you give
> an example of independent run levels?
>
> -Val
>
> On Tue, Jun 1, 2021 at 7:57 AM Sergey Chugunov 
> wrote:
>
> >  Hello Igniters,
> >
> > I would like to start a discussion on evolving IEP-73 [1]. Now it covers
> a
> > narrow topic about components dependencies but it makes sense to cover in
> > the IEP a broader question: how different components should be
> initialized
> > to support different modes of an individual node or a whole cluster.
> >
> > There is an idea to borrow the notion of run-levels from Unix-like
> systems,
> > and I suggest the following design to implement it.
> >
> >1. To start and function at a specific run-level node needs to start
> and
> >initialize components in a proper order. During initialization
> > components
> >may need to notify each other about reaching a particular run-level so
> >other components are able to execute their actions. Orchestrating of
> > this
> >process should be a responsibility of a new component.
> >
> >2. Orchestration component doesn't manage the initialization process
> >directly but uses another abstraction called scenario. Examples of
> >run-levels in the context of Ignite 2.x may be Maintenance Mode,
> >INACTIVE-READONLY-ACTIVE states of a cluster, and each level is
> reached
> >when a corresponding scenario has executed.
> >
> >So the responsibility of the orchestrator will be managing scenarios
> and
> >providing them with infrastructure of spreading notification events
> > between
> >components. All low-level details and knowledge of existing components
> > and
> >their dependencies are encapsulated inside scenarios.
> >
> >3. Scenarios allow nesting, e.g. a scenario for INACTIVE cluster state
> >can be "upgraded" to READONLY state by executing diff between INACTIVE
> > and
> >READONLY scenarios.
> >
> >
> > I see several advantages of this design compared to existing model in
> > Ignite 2.x (mostly implemented in IgniteKernal and based on two main
> > methods: start and onKernalStart):
> >
> >1. More flexible model allows implementing more diverse run-levels for
> >different needs (already mentioned Maintenance Mode, cluster state
> modes
> >like ACTIVE-INACTIVE and smart strategies for cache warmup on node
> > start).
> >
> >2. Knowledge of components and their dependencies is encapsulated
> inside
> >scenarios which makes it easier to create new scenarios.
> >
> >
> > Open questions:
> >
> >1. As I see right now it is hard to standardize initialization events
> >components notify each other with.
> >
> >2. It is not clear if run-levels should be organized into one rigid
> >hierarchy (when the first run-level should always precede the second
> > and so
> >on) or they should be more independent.
> >
> >
> > What do you think?
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-73%3A+Node+startup
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: Static hierarchy in jmx tree

2021-04-28 Thread Alexei Scherbakov
Igor Akkuratov,

I have several concerns about your patch.

1. I don't understand why setting
IGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false + igniteInstanceName is not a
solution to your issue.
Just put it in the documentation and this should do fine.

2. Removing the classloader id can break the template in a container
environment, where instances with the same name are instantiated using
different classloaders.
How is this scenario supposed to work with a single template?

3. Your patch introduces a breaking change. This can be done only in two
steps: release N deprecates the behavior, release N + 1 changes the
behavior, according to the new rules.
But first let's decide if we really need to make any change at all.





On Fri, Apr 16, 2021 at 14:16, Igor Akkuratov wrote:

> Is there anybody alive?
>


-- 

Best regards,
Alexei Scherbakov



Re: [DISCUSSION] Error handling in Ignite 3

2021-04-21 Thread Alexei Scherbakov
Alexei,

I think it's ok to do error conversion if it makes sense, but better to
preserve the root cause whenever possible.
Another way to solve the described scenario is to introduce something like
checked IgniteRetryAgainException, which forces the user to retry or ignore
it.
It's difficult to foresee exactly at this point what is the best solution
without knowing the exact scenario.
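
To make the checked-exception idea a bit more concrete, a rough sketch of what
the caller side could look like. All names here (RetryAgainException,
RetryableOp, RetrySketch) are hypothetical and only illustrate the pattern; the
point is that the compiler forces the caller to either retry or handle the
failure explicitly, while the original cause stays attached.

```java
/** Hypothetical checked exception signalling that the operation is safe to retry. */
class RetryAgainException extends Exception {
    RetryAgainException(String message, Throwable cause) {
        super(message, cause); // the root cause is preserved
    }
}

/** Hypothetical operation that may ask to be retried. */
interface RetryableOp<T> {
    T run() throws RetryAgainException;
}

final class RetrySketch {
    /** Runs the operation up to maxAttempts times, rethrowing the last failure. */
    static <T> T withRetries(int maxAttempts, RetryableOp<T> op) throws RetryAgainException {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");

        RetryAgainException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.run();
            }
            catch (RetryAgainException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

A non-retriable failure would simply be a different (unchecked) exception, so
it passes through withRetries untouched.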

Andrey,

I've just understood your proposal to add IGN to each string code. This is ok
with me, for example IGN-TBL-0001.

On Wed, Apr 21, 2021 at 17:49, Alexey Goncharuk wrote:

> Aleksei,
>
> > The method should always report root cause, in your example it will be
> > B-, no matter which module API is called
>
> I may be wrong, but I doubt this will be usable for an end-user. Let's
> imagine that the same root exception was raised in different contexts
> resulting in two outcomes. The first one is safe to retry (say, the root
> cause led to a transaction prepare failure), but the second outcome may be
> a state in which no matter how many retries will be attempted, the
> operation will never succeed. Only the upper-level layer can tell the
> difference and return a proper message to the user, so I would say that
> some error conversion/wrapping will be required no matter what.
>
> --AG
>
> On Fri, Apr 16, 2021 at 16:31, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>
> > On Thu, Apr 15, 2021 at 18:21, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:
> >
> > > Hi Alexey,
> > > I like the idea.
> > >
> > > 1.
> > >
> > > >   TBL-0001 is a *string representation* of the error. It is built
> from
> > 2
> > > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > > > internally packed in int. No any kind of parsing will be necessary to
> > > read
> > > > scope/category.
> > >
> > > I think Alexey mean if it will be possible to make smth like that
> > >
> > > catch (IgniteException e) {
> > > if (e.getScope() == "TBL" && e.getCode() == 1234)
> > > continue; // E.g. retry my TX
> > > }
> > >
> > > It looks useful to me.
> > >
> >
> > I have in mind something like this:
> >
> > public class IgniteException extends RuntimeException {
> > private int errorCode;
> >
> > public IgniteException(ErrorScope scope, int code, String message,
> > Throwable cause) {
> > super(message, cause);
> > this.errorCode = make(scope, code);
> > }
> >
> > public boolean matches(ErrorScope scope, int code) {
> > return errorCode == make(scope, code);
> > }
> >
> > private int make(ErrorScope scope, int code) {
> > return ((scope.ordinal() << 16) | code);
> > }
> >
> > public ErrorScope scope() {
> > return ErrorScope.values()[errorCode >> 16];
> > }
> >
> > public int code() {
> > return 0xFFFF & errorCode;
> > }
> >
> > public static void main(String[] args) {
> > IgniteException e = new IgniteException(ErrorScope.RAFT, 1,
> "test",
> > null);
> >
> > System.out.println(e.matches(ErrorScope.RAFT, 2));
> > System.out.println(e.scope());
> > System.out.println(e.code());
> >
> > try {
> > throw e;
> > }
> > catch (IgniteException ee) {
> > System.out.println(ee.matches(ErrorScope.RAFT, 1));
> > }
> > }
> > }
> >
> >
> > >
> > > 2. How you see a cross-module exception throwing?
> > > Assume, user call -> module A, which recursively call -> module B,
> which
> > > fails.
> > > So, module A component calls a module B component and got an Exception
> > with
> > > "B-1234" exception.
> > > Module A do not expect any exception here and doesn't take care of
> > "B-xxx"
> > > error codes, but only "A-.
> > > Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
> > > or reuse "B-" code with the own message, pointing original
> exception
> > as
> > > a cause for both cases?
> > >
> > > The first approach may looks confusing, while the second one produces
> too
> > > many "UNK-" codes.
> > > What code should get the user in that case?
> > >
> > &

Re: [DISCUSSION] Error handling in Ignite 3

2021-04-21 Thread Alexei Scherbakov
Andrey,

I've already proposed a similar string representation in the very first
message of the topic.

On Wed, Apr 21, 2021 at 12:31, Andrey Mashenkov wrote:

> Alexey,
>
> As I understand, you suggest ErrorScope enum for easier analysis and it
> will be a part of a error code.
> But what about String representation?
>
> I think we should use a common prefix for error codes in error messages.
> Such codes will be more searchable and as a bonus, vendor-specific.
> String representation might look like IGN-001042 or IGN-TBL042.
>
>
>
>
> On Wed, Apr 21, 2021 at 11:43 AM Alexei Scherbakov <
> alexey.scherbak...@gmail.com> wrote:
>
> > I've create the ticket for implementing this [1]
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-14611
> >
> > On Fri, Apr 16, 2021 at 16:30, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> >
> > >
> > >
> > > On Thu, Apr 15, 2021 at 18:21, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:
> > >
> > >> Hi Alexey,
> > >> I like the idea.
> > >>
> > >> 1.
> > >>
> > >> >   TBL-0001 is a *string representation* of the error. It is built
> > from 2
> > >> > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > >> > internally packed in int. No any kind of parsing will be necessary
> to
> > >> read
> > >> > scope/category.
> > >>
> > >> I think Alexey mean if it will be possible to make smth like that
> > >>
> > >> catch (IgniteException e) {
> > >> if (e.getScope() == "TBL" && e.getCode() == 1234)
> > >> continue; // E.g. retry my TX
> > >> }
> > >>
> > >> It looks useful to me.
> > >>
> > >
> > > I have in mind something like this:
> > >
> > > public class IgniteException extends RuntimeException {
> > > private int errorCode;
> > >
> > > public IgniteException(ErrorScope scope, int code, String message,
> > > Throwable cause) {
> > > super(message, cause);
> > > this.errorCode = make(scope, code);
> > > }
> > >
> > > public boolean matches(ErrorScope scope, int code) {
> > > return errorCode == make(scope, code);
> > > }
> > >
> > > private int make(ErrorScope scope, int code) {
> > > return ((scope.ordinal() << 16) | code);
> > > }
> > >
> > > public ErrorScope scope() {
> > > return ErrorScope.values()[errorCode >> 16];
> > > }
> > >
> > > public int code() {
> > > return 0xFFFF & errorCode;
> > > }
> > >
> > > public static void main(String[] args) {
> > > IgniteException e = new IgniteException(ErrorScope.RAFT, 1,
> > > "test", null);
> > >
> > > System.out.println(e.matches(ErrorScope.RAFT, 2));
> > > System.out.println(e.scope());
> > > System.out.println(e.code());
> > >
> > > try {
> > > throw e;
> > > }
> > > catch (IgniteException ee) {
> > > System.out.println(ee.matches(ErrorScope.RAFT, 1));
> > > }
> > > }
> > > }
> > >
> > >
> > >>
> > >> 2. How you see a cross-module exception throwing?
> > >> Assume, user call -> module A, which recursively call -> module B,
> which
> > >> fails.
> > >> So, module A component calls a module B component and got an Exception
> > >> with
> > >> "B-1234" exception.
> > >> Module A do not expect any exception here and doesn't take care of
> > "B-xxx"
> > >> error codes, but only "A-.
> > >> Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
> > >> or reuse "B-" code with the own message, pointing original
> exception
> > >> as
> > >> a cause for both cases?
> > >>
> > >> The first approach may looks confusing, while the second one produces
> > too
> > >> many "UNK-" codes.
> > >> What code should get the user in that case?
> > >>
> > >>
> > >>
> > > The method should alway

Re: [DISCUSSION] Error handling in Ignite 3

2021-04-21 Thread Alexei Scherbakov
I've created a ticket for implementing this [1]

[1] https://issues.apache.org/jira/browse/IGNITE-14611

On Fri, Apr 16, 2021 at 16:30, Alexei Scherbakov wrote:

>
>
> On Thu, Apr 15, 2021 at 18:21, Andrey Mashenkov wrote:
>
>> Hi Alexey,
>> I like the idea.
>>
>> 1.
>>
>> >   TBL-0001 is a *string representation* of the error. It is built from 2
>> > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
>> > internally packed in int. No any kind of parsing will be necessary to
>> read
>> > scope/category.
>>
>> I think Alexey mean if it will be possible to make smth like that
>>
>> catch (IgniteException e) {
>> if (e.getScope() == "TBL" && e.getCode() == 1234)
>> continue; // E.g. retry my TX
>> }
>>
>> It looks useful to me.
>>
>
> I have in mind something like this:
>
> public class IgniteException extends RuntimeException {
> private int errorCode;
>
> public IgniteException(ErrorScope scope, int code, String message,
> Throwable cause) {
> super(message, cause);
> this.errorCode = make(scope, code);
> }
>
> public boolean matches(ErrorScope scope, int code) {
> return errorCode == make(scope, code);
> }
>
> private int make(ErrorScope scope, int code) {
> return ((scope.ordinal() << 16) | code);
> }
>
> public ErrorScope scope() {
> return ErrorScope.values()[errorCode >> 16];
> }
>
> public int code() {
> return 0xFFFF & errorCode;
> }
>
> public static void main(String[] args) {
> IgniteException e = new IgniteException(ErrorScope.RAFT, 1,
> "test", null);
>
> System.out.println(e.matches(ErrorScope.RAFT, 2));
> System.out.println(e.scope());
> System.out.println(e.code());
>
> try {
> throw e;
> }
> catch (IgniteException ee) {
> System.out.println(ee.matches(ErrorScope.RAFT, 1));
> }
> }
> }
>
>
>>
>> 2. How you see a cross-module exception throwing?
>> Assume, user call -> module A, which recursively call -> module B, which
>> fails.
>> So, module A component calls a module B component and got an Exception
>> with
>> "B-1234" exception.
>> Module A do not expect any exception here and doesn't take care of "B-xxx"
>> error codes, but only "A-.
>> Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
>> or reuse "B-" code with the own message, pointing original exception
>> as
>> a cause for both cases?
>>
>> The first approach may looks confusing, while the second one produces too
>> many "UNK-" codes.
>> What code should get the user in that case?
>>
>>
>>
> The method should always report root cause, in your example it will be
> B-, no matter which module API is called.
>
>
>>
>>
>>
>> On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
>> alexey.scherbak...@gmail.com> wrote:
>>
>> > On Thu, Apr 15, 2021 at 14:26, Ilya Kasnacheev wrote:
>> >
>> > > Hello!
>> > >
>> > > > All public methods throw only unchecked
>> > > org.apache.ignite.lang.IgniteException containing aforementioned
>> fields.
>> > > > Each public method must have a section in the javadoc with a list of
>> > all
>> > > possible error codes for this method.
>> > >
>> > > I don't think this is feasible at all.
>> > > Imagine javadoc for cache.get() method featuring three pages of
>> possible
>> > > error codes thrown by this method.
>> > >
>> >
>> > Of course there is no need to write 3 pages of error codes, this makes
>> no
>> > sense.
>> > I think we can use error ranges here, or, probably, document most
>> important
>> > error scenarios.
>> > The point here is to try to document errors as much as possible.
>> >
>> >
>> > > Also, updated every two weeks to account for changes in
>> > > underlying mechanisms.
>> > >
>> > > There is still a confusion between "error code for any error in logs"
>> and
>> > > "error code for any pair of method & exception". Which one are we
>> > > discussing really?
>> > >
>> > > This is impossible to track or test, but

Re: [DISCUSSION] Error handling in Ignite 3

2021-04-16 Thread Alexei Scherbakov
On Thu, Apr 15, 2021 at 18:21, Andrey Mashenkov wrote:

> Hi Alexey,
> I like the idea.
>
> 1.
>
> >   TBL-0001 is a *string representation* of the error. It is built from 2
> > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > internally packed in int. No any kind of parsing will be necessary to
> read
> > scope/category.
>
> I think Alexey means whether it will be possible to do something like this:
>
> catch (IgniteException e) {
>     if ("TBL".equals(e.getScope()) && e.getCode() == 1234)
>         continue; // E.g. retry my TX
> }
>
> It looks useful to me.
>

I have in mind something like this:

// ErrorScope is assumed to be an enum of scopes, e.g. enum ErrorScope { RAFT, TBL }.
public class IgniteException extends RuntimeException {
    /** Scope ordinal in the upper 16 bits, error number in the lower 16 bits. */
    private final int errorCode;

    public IgniteException(ErrorScope scope, int code, String message, Throwable cause) {
        super(message, cause);
        this.errorCode = make(scope, code);
    }

    /** Checks scope and number with a single int comparison, no parsing. */
    public boolean matches(ErrorScope scope, int code) {
        return errorCode == make(scope, code);
    }

    private static int make(ErrorScope scope, int code) {
        return (scope.ordinal() << 16) | code;
    }

    public ErrorScope scope() {
        return ErrorScope.values()[errorCode >>> 16];
    }

    public int code() {
        return 0xFFFF & errorCode;
    }

    public static void main(String[] args) {
        IgniteException e = new IgniteException(ErrorScope.RAFT, 1, "test", null);

        System.out.println(e.matches(ErrorScope.RAFT, 2)); // false
        System.out.println(e.scope()); // RAFT
        System.out.println(e.code()); // 1

        try {
            throw e;
        }
        catch (IgniteException ee) {
            System.out.println(ee.matches(ErrorScope.RAFT, 1)); // true
        }
    }
}


>
> 2. How you see a cross-module exception throwing?
> Assume, user call -> module A, which recursively call -> module B, which
> fails.
> So, module A component calls a module B component and got an Exception with
> "B-1234" exception.
> Module A do not expect any exception here and doesn't take care of "B-xxx"
> error codes, but only "A-.
> Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
> or reuse "B-" code with the own message, pointing original exception as
> a cause for both cases?
>
> The first approach may looks confusing, while the second one produces too
> many "UNK-" codes.
> What code should get the user in that case?
>
>
>
The method should always report the root cause; in your example it will be
B-, no matter which module API is called.


>
>
>
> On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
> alexey.scherbak...@gmail.com> wrote:
>
> > On Thu, Apr 15, 2021 at 14:26, Ilya Kasnacheev wrote:
> >
> > > Hello!
> > >
> > > > All public methods throw only unchecked
> > > org.apache.ignite.lang.IgniteException containing aforementioned
> fields.
> > > > Each public method must have a section in the javadoc with a list of
> > all
> > > possible error codes for this method.
> > >
> > > I don't think this is feasible at all.
> > > Imagine javadoc for cache.get() method featuring three pages of
> possible
> > > error codes thrown by this method.
> > >
> >
> > Of course there is no need to write 3 pages of error codes, this makes no
> > sense.
> > I think we can use error ranges here, or, probably, document most
> important
> > error scenarios.
> > The point here is to try to document errors as much as possible.
> >
> >
> > > Also, updated every two weeks to account for changes in
> > > underlying mechanisms.
> > >
> > > There is still a confusion between "error code for any error in logs"
> and
> > > "error code for any pair of method & exception". Which one are we
> > > discussing really?
> > >
> > > This is impossible to track or test, but is susceptible for common
> > > error-hiding antipattern where all exceptions are caught in cache.get()
> > and
> > > discarded, and instead a brand new CH-0001 "error in cache.get()" is
> > thrown
> > > to the user.
> > >
> > > Which is certainly not something that anybody wants.
> > >
> >
> > Certainly not. We are talking here about root cause - what is exactly the
> > reason for method call failure.
> >
> >
> > >
> > > Regards,
> > > --
> > > Ilya Kasnacheev
> > >
> > >
> > > On Thu, Apr 15, 2021 at 13:03, Vladislav Pyatkov wrote:
> > >
> > > > Hi Alexei,
> > &

Re: [DISCUSSION] Error handling in Ignite 3

2021-04-15 Thread Alexei Scherbakov
On Thu, Apr 15, 2021 at 14:26, Ilya Kasnacheev wrote:

> Hello!
>
> > All public methods throw only unchecked
> org.apache.ignite.lang.IgniteException containing aforementioned fields.
> > Each public method must have a section in the javadoc with a list of all
> possible error codes for this method.
>
> I don't think this is feasible at all.
> Imagine javadoc for cache.get() method featuring three pages of possible
> error codes thrown by this method.
>

Of course there is no need to write 3 pages of error codes; that would make
no sense.
I think we can use error ranges here or, perhaps, document only the most
important error scenarios.
The point here is to try to document errors as much as possible.


> Also, updated every two weeks to account for changes in
> underlying mechanisms.
>
> There is still a confusion between "error code for any error in logs" and
> "error code for any pair of method & exception". Which one are we
> discussing really?
>
> This is impossible to track or test, but is susceptible for common
> error-hiding antipattern where all exceptions are caught in cache.get() and
> discarded, and instead a brand new CH-0001 "error in cache.get()" is thrown
> to the user.
>
> Which is certainly not something that anybody wants.
>

Certainly not. We are talking here about the root cause: the exact reason
why the method call failed.


>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Thu, Apr 15, 2021 at 13:03, Vladislav Pyatkov wrote:
>
> > Hi Alexei,
> >
> > > Each public method *must *have a section in the javadoc with a list of
> > all possible error codes for this method.
> >
> > I consider it is redundant, because any public exception can be thrown
> from
> > public API.
> > If it not happens today, it may change tomorrow: one will be removed,
> > another one will be added.
> >
> > >Nested exceptions are not forbidden to use. They can provide additional
> > details on the error for debug purposes
> >
> > I see another issue, which is in the Ignite 2.x, but not attend here. We
> > can have a deep nested exception and use it for handling.
> > In the result, all time when we are handling an exception we use
> > pattern like this:
> > try{
> > ...
> > }
> > catch (Exception e) {
> > if (X.hasCause(e, TimeoutException.class))
> > throw e;
> >
> > if (X.hasCause(e, ConnectException.class, EOFException.class))
> > continue;
> >
> > if (X.hasCause(e, InterruptedException.class))
> > return false;
> > }
> >
> > If we have a plan to expose only one exception to the client side, we can
> > also do it for an internal exception.
> >
> > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
> > alexey.scherbak...@gmail.com> wrote:
> >
> > > Alexey,
> > >
> > > On Wed, Apr 14, 2021 at 01:52, Alexey Kukushkin <kukushkinale...@gmail.com> wrote:
> > >
> > > > Just some points looking questionable to me, although I realize the
> > error
> > > > handling style may be very opinionated:
> > > >
> > > >- Would it make sense splitting the proposed composite error code
> > > >(TBL-0001) into separate numeric code (0001) and scope/category
> > > ("TBL")
> > > > to
> > > >avoid parsing the code when an error handler needs to analyze only
> > the
> > > >category or the code?
> > > >
> > >
> > >   TBL-0001 is a *string representation* of the error. It is built from
> 2
> > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > > internally packed in int. No any kind of parsing will be necessary to
> > read
> > > scope/category.
> > >
> > >
> > > >- "*The cause - short string description of an issue, readable by
> > > > user.*".
> > > >This terminology sounds confusing to me since that "cause" sounds
> > like
> > > > Java
> > > >Throwable's Message to me and the "Cause" is a lower level
> > exception.
> > > >
> > >
> > > The string describes the cause of error, so the name. I'm ok to rename
> it
> > > to a message. It will be stored in IgniteException.message field
> anyway.
> > >
> > >
> > > >- "*The action - steps for a user to resolve error...*". The
> action
> > is
> > > >very important but do we want to make

Re: [DISCUSSION] Error handling in Ignite 3

2021-04-14 Thread Alexei Scherbakov
Alexey,

On Wed, Apr 14, 2021 at 01:52, Alexey Kukushkin wrote:

> Just some points looking questionable to me, although I realize the error
> handling style may be very opinionated:
>
>- Would it make sense splitting the proposed composite error code
>(TBL-0001) into separate numeric code (0001) and scope/category ("TBL")
> to
>avoid parsing the code when an error handler needs to analyze only the
>category or the code?
>

  TBL-0001 is a *string representation* of the error. It is built from a
2-byte scope id (mapped to the name TBL) and a 2-byte number (0001). Both are
internally packed into an int. No parsing of any kind will be necessary to
read the scope/category.
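
A minimal sketch of that packing, to show that the scope and the number are
read with a shift and a mask while the string form is only produced for
display. The ErrorScope values and the ErrorCodes helper are hypothetical, and
using the enum ordinal as the 2-byte scope id is an assumption made for this
example.

```java
import java.util.Locale;

/** Hypothetical scopes; the ordinal is used as the 2-byte scope id. */
enum ErrorScope { IGN, TBL, RFT }

final class ErrorCodes {
    /** Packs the scope id into the upper 16 bits and the number into the lower 16 bits. */
    static int pack(ErrorScope scope, int number) {
        if (number < 0 || number > 0xFFFF)
            throw new IllegalArgumentException("Error number out of range: " + number);
        return (scope.ordinal() << 16) | number;
    }

    /** Reads the scope back with a shift; no string parsing involved. */
    static ErrorScope scope(int code) {
        return ErrorScope.values()[code >>> 16];
    }

    /** Reads the error number back with a mask. */
    static int number(int code) {
        return code & 0xFFFF;
    }

    /** Builds the string representation, zero-padded to at least 4 digits. */
    static String format(int code) {
        return String.format(Locale.ROOT, "%s-%04d", scope(code), number(code));
    }

    public static void main(String[] args) {
        int code = pack(ErrorScope.TBL, 1);
        System.out.println(format(code)); // TBL-0001
    }
}
```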


>- "*The cause - short string description of an issue, readable by
> user.*".
>This terminology sounds confusing to me since that "cause" sounds like
> Java
>Throwable's Message to me and the "Cause" is a lower level exception.
>

The string describes the cause of the error, hence the name. I'm ok with
renaming it to a message. It will be stored in the IgniteException.message
field anyway.


>- "*The action - steps for a user to resolve error...*". The action is
>very important but do we want to make it part of the IgniteException? I
> do
>not think the recovery action text should be part of the exception.
>IgniteException may include a URL pointing to the corresponding
>documentation - this is discussable.
>

This will not be part of the exception. A user should visit the
documentation page and read the action section for the corresponding error
code.


>- "*All public methods throw only unchecked IgniteException*" - I assume
>we still respect JCache specification and prefer using standard Java
>exception to signal about invalid parameters.
>

Using standard java exceptions whenever possible makes sense.


>- Why we do not follow the "classic" structured exception handling
>practices in Ignite:
>

Ignite 3 will be multi-language, and other languages use other error
processing models. SQL, for example, uses error codes.
In my view, the single exception approach simplifies and unifies error
handling across platforms.


>   - Why do we not allow using checked exceptions? It seems to me
>   sometimes forcing the user to handle an error serves as a hint and
> thus
>   improves usability. For example, handling an optimistic/pessimistic
>   transaction conflict/deadlock. Or handling a timeout for operations
> with
>   timeouts.
>

A valid point. Checked exceptions must be used for those methods where
error handling is enforced, for example a tx optimistic failure.
Such errors will also have corresponding error codes.


>   - Why single public IgniteException and no exception hierarchy? Java
>   is optimized for structured exception handling instead of using
> IF-ELSE to
>   analyze the codes.
>

An exception hierarchy is not required when using error codes and is
applicable only to the Java API, so I would avoid spending effort on it.


>   - Why no nested exceptions? Sometimes an error handler is interested
>   only in high level exceptions (like Invalid Configuration) and
> sometimes
>   details are needed (like specific configuration parser exceptions).
>

Nested exceptions are not forbidden. They can provide additional
details on the error for debug purposes, but are not strictly required,
because error code + message should provide enough information to the user.


>- For async methods returning a Future we may have a universal rule on
>how to handle exceptions. For example, we may specify that any async
> method
>can throw only invalid argument exceptions. All other errors are
> reported
>via the exceptionally(IgniteException -> {}) callback even if the async
>method was executed synchronously.
>

This is ok to me.
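
For reference, a small sketch of how such a rule could look with a plain
CompletableFuture. The getAsync method and its lookup helper are made up for
the example, and IllegalStateException only stands in for the public
exception: invalid arguments are thrown directly, everything else is
delivered through the future even though the work here runs synchronously.

```java
import java.util.concurrent.CompletableFuture;

final class AsyncErrorSketch {
    /** Hypothetical async read following the rule quoted above. */
    static CompletableFuture<String> getAsync(String key) {
        // Invalid arguments are the only errors thrown from the call itself.
        if (key == null)
            throw new IllegalArgumentException("key must not be null");

        CompletableFuture<String> fut = new CompletableFuture<>();
        try {
            fut.complete(lookup(key));
        }
        catch (RuntimeException e) {
            // Executed synchronously, but the failure still goes through the future.
            fut.completeExceptionally(e);
        }
        return fut;
    }

    /** Stand-in for the real operation; fails for an empty key. */
    private static String lookup(String key) {
        if (key.isEmpty())
            throw new IllegalStateException("TBL-0001: table is not available");
        return "value-of-" + key;
    }

    public static void main(String[] args) {
        System.out.println(getAsync("k").join());
        System.out.println(getAsync("")
            .exceptionally(t -> "recovered from " + t.getMessage())
            .join());
    }
}
```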


>
>
> On Tue, Apr 13, 2021 at 12:08, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>
> > Igniters,
> >
> > I would like to start the discussion about error handling in Ignite 3 and
> > how we can improve it compared to Ignite 2.
> >
> > The error handling in Ignite 2 was not very good because of generic
> > CacheException thrown on almost any occasion, having deeply nested root
> > cause and often containing no useful information on further steps to fix
> > the issue.
> >
> > I aim to fix it by introducing some rules on error handling.
> >
> > *Public exception structure.*
> >
> > A public exception must have an error code, a cause, and an action.
> >
> > * The code - the combination of 2 byte scope id and 2 byte error number
> > within the module. This allows up to 2^16 error

[DISCUSSION] Error handling in Ignite 3

2021-04-13 Thread Alexei Scherbakov
Igniters,

I would like to start the discussion about error handling in Ignite 3 and
how we can improve it compared to Ignite 2.

The error handling in Ignite 2 was not very good because of generic
CacheException thrown on almost any occasion, having deeply nested root
cause and often containing no useful information on further steps to fix
the issue.

I aim to fix it by introducing some rules on error handling.

*Public exception structure.*

A public exception must have an error code, a cause, and an action.

* The code - the combination of 2 byte scope id and 2 byte error number
within the module. This allows up to 2^16 errors for each scope, which
should be enough. The error code string representation can look like
RFT-0001 or TBL-0001
* The cause - short string description of an issue, readable by user. This
can have dynamic parameters depending on the error type for better user
experience, like "Can't write a snapshot, no space left on device {0}"
* The action - steps for a user to resolve error situation described in the
documentation in the corresponding error section, for example "Clean up
disk space and retry the operation".

Common errors should have their own scope, for example IGN-0001
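
A minimal sketch of the code layout described above, under assumptions: the
ErrorCodes class, its method names and the way the scope name is passed in
are hypothetical. It packs a 2-byte scope id and a 2-byte error number into
one int and renders the string form.

```java
import java.util.Locale;

final class ErrorCodes {
    private ErrorCodes() {}

    /** Packs a 2-byte scope id and a 2-byte error number into a single int. */
    static int pack(int scopeId, int errorNumber) {
        if (scopeId < 0 || scopeId > 0xFFFF || errorNumber < 0 || errorNumber > 0xFFFF)
            throw new IllegalArgumentException("scope id and error number must each fit in 2 bytes");
        return (scopeId << 16) | errorNumber;
    }

    /** Extracts the scope id (upper 2 bytes). */
    static int scopeId(int code) {
        return code >>> 16;
    }

    /** Extracts the error number (lower 2 bytes). */
    static int errorNumber(int code) {
        return code & 0xFFFF;
    }

    /** Renders a code such as "TBL-0001" for a given (assumed) scope name. */
    static String format(String scopeName, int code) {
        return String.format(Locale.ROOT, "%s-%04d", scopeName, errorNumber(code));
    }
}
```

Note that 2^16 error numbers go up to 65535, so the four-digit form is only a
minimum width; larger numbers would print with five digits.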

All public methods throw only unchecked
org.apache.ignite.lang.IgniteException containing aforementioned fields.
Each public method must have a section in the javadoc with a list of all
possible error codes for this method.

A good example with similar structure can be found here [1]

*Async timeouts.*

Because almost all API methods in Ignite 3 are async, they all will have a
configurable default timeout and can complete with timeout error if a
computation is not finished in time, for example if a response has not been
yet received.
I suggest completing the async op future with TimeoutException in this
case to make it on par with synchronous execution using future.get with a
timeout, which throws java.util.concurrent.TimeoutException on timeout.
For reference, see java.util.concurrent.CompletableFuture#orTimeout.
No special error code should be used for this scenario.
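
A minimal sketch of the timeout behaviour, using only the JDK (the class and
method names are illustrative). A future that never receives a response is
completed exceptionally with TimeoutException by orTimeout; join() then
reports it wrapped in CompletionException.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class AsyncTimeoutSketch {
    private AsyncTimeoutSketch() {}

    /** Returns true if a never-completed future fails with TimeoutException. */
    static boolean timesOut(long timeoutMillis) {
        // Simulates an operation whose response never arrives.
        CompletableFuture<String> response = new CompletableFuture<String>()
            .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);

        try {
            response.join();
            return false;
        } catch (CompletionException e) {
            // join() wraps the TimeoutException set by orTimeout.
            return e.getCause() instanceof TimeoutException;
        }
    }
}
```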

*Internal exceptions hierarchy.*

All internal exceptions should extend
org.apache.ignite.internal.lang.IgniteInternalException for unchecked
exceptions and
org.apache.ignite.internal.lang.IgniteInternalCheckedException for
checked exceptions.

Thoughts ?

[1] https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm

-- 

Best regards,
Alexei Scherbakov


Re: [ANNOUNCE] Welcome Ivan Daschinsky as a new committer

2021-04-13 Thread Alexei Scherbakov
Ivan, great work.

Tue, Apr 13, 2021 at 10:53, Ivan Pavlukhin:

> Ivan, congrats!
>
> 2021-04-13 9:41 GMT+03:00, Nikolay Izhikov :
> > Congrats! Well deserved.
> >
> >> On Apr 13, 2021, at 09:34, Zhenya Stanilovsky wrote:
> >>
> >>
> >> Big deal ! Ivan, ignite it !)
> >>
> >>
> >>
> >>> The Project Management Committee (PMC) for Apache Ignite has invited
> >>> Ivan Daschinsky to become a committer and we are pleased to announce
> >>> that he has accepted.
> >>>
> >>> Ivan made a lot of contributions to Apache Ignite.
> >>> He helped a lot to improve our Python Thin Client fixing a lot of
> >>> different bugs and contributing major feature such as asyncio support
> >>> and C-extension which improved performance significantly for many
> >>> cases. Thanks to Ivan Python Thin client has become much more stable
> >>> and production-ready. He also introduced the CMake building system for
> >>> Ignite C++, and has made a number of other important improvements.
> >>> Besides the code contributions, Ivan is also an active community
> >>> member.
> >>>
> >>> Being a committer enables easier contribution to the project since
> >>> there is no need to go via the patch submission process. This should
> >>> enable better productivity.
> >>>
> >>> Please join me in welcoming Ivan, and congratulating him on the new
> >>> role in the Apache Ignite Community.
> >>>
> >>> Best Regards,
> >>> Igor
> >>
> >>
> >>
> >>
> >
> >
>
>
> --
>
> Best regards,
> Ivan Pavlukhin
>


-- 

Best regards,
Alexei Scherbakov


Re: Model of permissions for Ignite 3

2021-04-12 Thread Alexei Scherbakov
 and
> > 'ServicePermission'
> > > > > represent cache, compute,
> > > > > and service permissions accordingly,  allow wildcards, for example,
> > > > > "org.apache.ignite.internal.*".
> > > > > > Class 'IgniteClusterPermission' represents permission without
> > > > > > actions.
> > > > > > Interface 'GridSecurityProcessor' has a default implementation of
> > > > > > the 'authorize' method.
> > > > > > 'SecurityTestSuite' is green.
> > > > > >
> > > > > >
> > > > > > This representation of permission, IMHO, has the following
> > > > > > advantages:
> > > > > > - A developer can easily add new permission without needing to
> > > > > > touch the core module.
> > > > > > - There is no need to implement complicated logic to authorize an
> > > > > > operation inside a security plugin.
> > > > > > But a developer has the opportunity to add custom logic.
> > > > > > - Wildcards for permission names out of the box, for example, 'new
> > > > > > CachePermission("x.y.z.*", "get,put")'.
> > > > > > - There is no need to implement 'SecurityPermissionSet' and a set
> > > > > > of methods from 'SecurityContex' ('xxxAllowed(String,
> > > > > > SecurityPermission))'.
> > > > > > - We can define a security policy in a file as java does. It could
> > > > > > simplify work for administrators.
> > > > >
> > > > > WDYT?
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > >   Andrey Kuznetsov.
> > >
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: Terms clarification and modules splitting logic

2021-04-08 Thread Alexei Scherbakov
Alexey,

Thanks for the detailed explanation.

Ok, let's agree on having the internal package. I've created the ticket [1]
to unify its usage within the project.

[1] https://issues.apache.org/jira/browse/IGNITE-14506
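
For reference, the two Jigsaw mechanisms described in the quoted message
below (plain exports and qualified exports) could look roughly like this in
module descriptors. This is only a sketch: the module names (ignite.network,
ignite.raft.client, ignite.metastorage, ignite.partition) are assumptions for
illustration, and each declaration would live in its own module-info.java.

```java
// module-info.java of a hypothetical networking module.
module ignite.network {
    // Visible to every module that reads this one.
    exports org.apache.ignite.internal.network;

    // org.apache.ignite.internal.network.scalecube is deliberately not exported:
    // its public classes cannot be compiled against from other modules.
}

// module-info.java of a hypothetical raft client module (a separate file).
module ignite.raft.client {
    // Qualified export: only the listed modules may use this package.
    exports org.apache.ignite.raft.client.service
        to ignite.metastorage, ignite.partition;
}
```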

Thu, Apr 8, 2021 at 15:34, Alexey Goncharuk:

> Alexei,
>
> The main benefit from Jigsaw that I see for the project structure is
> controllable module interaction.
>
> Let's take our networking module as an example first. We may want to make
> sure that module implementation specifics do not leak to outside modules,
> so we define in the module definition that the module exports package
> org.apache.ignite.internal.network. Now, even if we have a public class in
> package org.apache.ignite.internal.network.scalecube (the class may be
> public for many reasons, including a need for access in other
> implementation packages), other modules will not be able to directly work
> with this public class - the code will not compile.
>
> Another important feature of Jigsaw is the qualified export statement that
> limits the exported API to specific modules. Let's say for some reason we
> want to limit Raft client usage only to metastorage and partition
> components. Then we can specify in the raft module descriptor that raft API
> is only exported to metastorage and partition modules. Other modules will
> not compile if they will try to work with raft API.
>
> To me, this looks like a very powerful mechanism allowing to strictly
> define modules structure and hierarchy.
>
> As for the utility classes, @Internal looks less obvious for me because a
> user cannot directly see it without looking at the class itself. When
> 'internal' is imprinted in the package, you can see the violation directly
> at the usage site because there will be an import statement with an
> 'internal' package. You can check this as simple as an obvious grep
> command, which will not work with the annotation.
>
> --AG
>
> Wed, Mar 31, 2021 at 21:04, Alexei Scherbakov <
> alexey.scherbak...@gmail.com
> >:
>
> > Alexey,
> >
> > Can you provide us some details on Jigsaw adoption to better understand
> > the benefits?
> >
> > "We should be free to change them without any compatibility contract" -
> > let's mark such classes with a special annotation like @Internal, will it
> > work for you?
> >
> >
> >
> > Wed, Mar 31, 2021 at 15:10, Alexey Goncharuk <
> alexey.goncha...@gmail.com
> > >:
> >
> > > This won't work with the Java Jigsaw module system because it prohibits
> > > having two identical packages in different modules. I really hope that
> > > we will adopt Jigsaw in the near future. Unless you are suggesting
> > > moving all utility classes under org.apache.ignite.api.util package,
> > > but this looks really odd to me - why would IgniteUuid be in api.util
> > > package?
> > >
> > > As for the public and private utilities, I think there may be some
> > > classes that may be common for all modules, but should not be treated
> > > as public API because we should be free to change them without any
> > > compatibility contract. An example of such a class is GridFunc.
> > > Arguably, many of its methods should be removed for good, but I am sure
> > > there will be a few really useful ones. Nevertheless, we should not
> > > encourage or allow users to use GridFunc.
> > >
> > > --AG
> > >
> > > Wed, Mar 31, 2021 at 14:27, Alexei Scherbakov <
> > > alexey.scherbak...@gmail.com
> > > >:
> > >
> > > > Alexey,
> > > >
> > > > I would instead suggest moving the public utility classes to
> > > > org.apache.ignite.api. package in the util module to separate them
> > > > from internal classes, if we really need this.
> > > >
> > > > Actually, I don't think there is a point in separating
> > > > public/internal classes in the util module. What are the benefits of
> > > > this?
> > > >
> > > > Wed, Mar 31, 2021 at 12:16, Alexey Goncharuk <
> > > alexey.goncha...@gmail.com
> > > > >:
> > > >
> > > > > Alexei,
> > > > >
> > > > > I had the same opinion regarding the internal package, but we
> > > > > still need to somehow distinguish between public and internal
> > > > > classes in the ignite-util module. If we introduce the internal
> > > > > package in the util, we should follow
> >

Re: [DISSCUSSION] Common logger interface.

2021-04-08 Thread Alexei Scherbakov
Andrey,

The PR looks good to me.

Maybe we can wrap all internal threads into IgniteThread - I'm almost sure
it will work that way.

Do we really need thread-locals for user threads? Probably not.
I'm not sure there is a problem at all: since we want an async API
everywhere, our code should not be executed in user threads.

Thu, Apr 8, 2021 at 13:37, Andrey Mashenkov:

> Also, with the suggested approach,
> we should avoid indirect usage of ForkJoinPool internally or set our own
> pool instance explicitly when using reactive things.
>
> On Thu, Apr 8, 2021 at 1:33 PM Andrey Mashenkov <
> andrey.mashen...@gmail.com>
> wrote:
>
> > Alexey,
> >
> > I've made a PR for logger [1].
> > Seems, we will need 2 logger classes.
> > 1. Node-aware logger adapter, that will add node prefix to messages and
> > delegate calls to System.Logger or whatever.
> > 2. Logger wrapper that will get logger from a thread-local.
> >
> > I'd prefer not to use ThreadLocal directly where possible.
> > Maybe we can wrap all internal threads into IgniteThread and keep the
> > logger in an IgniteThread field to avoid lookups into thread-local-map.
> >
> > For user threads, only ThreadLocals can be used.
> > Do we really need to use thread-locals for user threads?
> > Will it be always obvious which node exception was thrown on? Any kind of
> > embedded mode?
> >
> > [1] https://github.com/apache/ignite-3/pull/87
> >
> > On Thu, Apr 8, 2021 at 12:32 PM Alexei Scherbakov <
> > alexey.scherbak...@gmail.com> wrote:
> >
> >> Andrey,
> >>
> >> *final* word in the example is missing, my bad.
> >>
> >> I like the static logger approach.
> >>
> >> Regarding your comments:
> >> * The static logger can easily be used by multiple nodes in a single
> >> JVM, it's a matter of implementation. It can be achieved by setting
> >> thread local logger context for the node.
> >> For user threads the context can be set while entering ignite context
> >> (for example, by calling public API method)
> >> * Factory method is not necessary, because we already have a proxy
> >> object - LogWrapper, hiding internal implementation.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> Wed, Apr 7, 2021 at 18:39, Andrey Mashenkov <
> andrey.mashen...@gmail.com
> >> >:
> >>
> >> > Alexei,
> >> >
> >> > I see you've merged LoggerWrapper into main and use it in Raft module.
> >> > I can't figure out what manner you suggest to use LoggerWrapper in.
> >> > In the example above 'LOG' field is static, non-final and you create a
> >> > wrapper explicitly.
> >> >
> >> > I see 2 ways:
> >> > * Use a factory method to create a logger and set it into 'static
> >> > final' field.
> >> > In this case, a user will not be able to split logs from different
> >> > nodes running in the same JVM.
> >> > * Set logger into non-static field (with dependency injection future).
> >> > In this case, we need to pass the logger to every class instance where
> >> > it can be used.
> >> >
> >> >
> >> > On Fri, Mar 26, 2021 at 6:48 PM Вячеслав Коптилин <
> >> > slava.kopti...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hello Alexei,
> >> > >
> >> > > It would be nice to add something like as follows:
> >> > > boolean isInfoEnabled();
> >> > > boolean isDebugEnabled();
> >> > > or
> >> > > boolean isLoggable(Level) - the same way which System.Logger suggests
> >> > >
> >> > > Thanks,
> >> > > S.
> >> > >
> >> > > Fri, Mar 26, 2021 at 17:41, Alexei Scherbakov <
> >> > > alexey.scherbak...@gmail.com
> >> > > >:
> >> > >
> >> > > > Andrey,
> >> > > >
> >> > > > I've introduced a new class LogWrapper to fix usability issues [1]
> >> > > >
> >> > > > The suggested usage is something like:
> >> > > >
> >> > > > private static LogWrapper LOG = new LogWrapper(MyClass.class);
> >> > > >
> >> > > > [1]
> >> > > >
> >> > > >

Re: [DISSCUSSION] Common logger interface.

2021-04-08 Thread Alexei Scherbakov
Andrey,

The *final* keyword is missing from the example, my bad.

I like the static logger approach.

Regarding your comments:
* The static logger can easily be used by multiple nodes in a single JVM,
it's a matter of implementation. It can be achieved by setting thread local
logger context for the node.
For user threads the context can be set while entering ignite context (for
example, by calling public API method)
* Factory method is not necessary, because we already have a proxy object -
LogWrapper, hiding internal implementation.
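
A minimal sketch of the thread-local logger context idea, under assumptions:
NodeLogContext and its methods are hypothetical names and not part of the
actual LogWrapper. A node (or a public API entry point on a user thread) sets
the context, and a static logger can use it to attribute messages to a node.

```java
final class NodeLogContext {
    /** Name of the node the current thread is working for, if any. */
    private static final ThreadLocal<String> NODE = new ThreadLocal<>();

    private NodeLogContext() {}

    /** Called when a thread enters the context of a node, e.g. on a public API call. */
    static void enter(String nodeName) {
        NODE.set(nodeName);
    }

    /** Called when the thread leaves the node context. */
    static void leave() {
        NODE.remove();
    }

    /** Prefixes a message with the current node name, if a context is set. */
    static String prefix(String msg) {
        String node = NODE.get();
        return node == null ? msg : "[" + node + "] " + msg;
    }
}
```

With this in place a single static final logger per class can still produce
per-node output, because the node name is resolved at logging time.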







Wed, Apr 7, 2021 at 18:39, Andrey Mashenkov:

> Alexei,
>
> I see you've merged LoggerWrapper into main and use it in Raft module.
> I can't figure out what manner you suggest to use LoggerWrapper in.
> In the example above 'LOG' field is static, non-final and you create a
> wrapper explicitly.
>
> I see 2 ways:
> * Use a factory method to create a logger and set it into 'static final'
> field.
> In this case, a user will not be able to split logs from different nodes
> running in the same JVM.
> * Set logger into non-static field (with dependency injection future).
> In this case, we need to pass the logger to every class instance where it
> can be used.
>
>
> On Fri, Mar 26, 2021 at 6:48 PM Вячеслав Коптилин <
> slava.kopti...@gmail.com>
> wrote:
>
> > Hello Alexei,
> >
> > It would be nice to add something like as follows:
> > boolean isInfoEnabled();
> > boolean isDebugEnabled();
> > or
> > boolean isLoggable(Level) - the same way which System.Logger suggests
> >
> > Thanks,
> > S.
> >
> > Fri, Mar 26, 2021 at 17:41, Alexei Scherbakov <
> > alexey.scherbak...@gmail.com
> > >:
> >
> > > Andrey,
> > >
> > > I've introduced a new class LogWrapper to fix usability issues [1]
> > >
> > > The suggested usage is something like:
> > >
> > > private static LogWrapper LOG = new LogWrapper(MyClass.class);
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/gridgain/apache-ignite-3/blob/9acb050a6a6a601ead849797293a1d0ad48ab9e0/modules/core/src/main/java/org/apache/ignite/lang/LogWrapper.java
> > >
> > > Fri, Mar 26, 2021 at 16:05, Andrey Mashenkov <
> > andrey.mashen...@gmail.com
> > > >:
> > >
> > > > Forgot to attach a link to the PR with an example [1].
> > > >
> > > > [1] https://github.com/apache/ignite-3/pull/59
> > > >
> > > > On Fri, Mar 26, 2021 at 4:03 PM Andrey Mashenkov <
> > > > andrey.mashen...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Igniters,
> > > > >
> > > > > In almost every new task we faced the problem of what logger has to
> > > > > be used: JUL, log4j, or something else.
> > > > >
> > > > > Since JDK 9 there is a System.Logger whose interface looks acceptable
> > > > > for use, except maybe for some usability issues like method
> > > > > signatures.
> > > > > LogLevel is passed as a mandatory argument, and no shortcut methods
> > > > > are provided (like 'warn', 'error' or 'info').
> > > > >
> > > > > I like Alex Scherbakov's idea [1] to use the brand new JDK system
> > > > > logger by default and extend it with shortcut methods.
> > > > >
> > > > > I've created a ticket to unify logger usage in the Ignite-3.0 project
> > > > > and fix the existing code.
> > > > > Any thoughts or objections?
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrey V. Mashenkov
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrey V. Mashenkov
> > > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


-- 

Best regards,
Alexei Scherbakov


Re: Terms clarification and modules splitting logic

2021-03-31 Thread Alexei Scherbakov
Alexey,

Can you provide us some details on Jigsaw adoption to better understand
the benefits?

"We should be free to change them without any compatibility contract" -
let's mark such classes with a special annotation like @Internal. Would
that work for you?



Wed, Mar 31, 2021 at 15:10, Alexey Goncharuk:

> This won't work with the Java Jigsaw module system because it prohibits
> having two identical packages in different modules. I really hope that we
> will adopt Jigsaw in the near future. Unless you are suggesting moving all
> utility classes under org.apache.ignite.api.util package, but this looks
> really odd to me - why would IgniteUuid be in api.util package?
>
> As for the public and private utilities, I think there may be some classes
> that may be common for all modules, but should not be treated as public API
> because we should be free to change them without any compatibility
> contract. An example of such a class is GridFunc. Arguably, many of its
> methods should be removed for good, but I am sure there will be a few
> really useful ones. Nevertheless, we should not encourage or allow users to
> use GridFunc.
>
> --AG
>
> Wed, Mar 31, 2021 at 14:27, Alexei Scherbakov <
> alexey.scherbak...@gmail.com
> >:
>
> > Alexey,
> >
> > I would instead suggest moving the public utility classes to
> > org.apache.ignite.api. package in the util module to separate them from
> > internal classes, if we really need this.
> >
> > Actually, I don't think there is a point in separating public/internal
> > classes in the util module. What are the benefits of this?
> >
> > Wed, Mar 31, 2021 at 12:16, Alexey Goncharuk <
> alexey.goncha...@gmail.com
> > >:
> >
> > > Alexei,
> > >
> > > I had the same opinion regarding the internal package, but we still
> > > need to somehow distinguish between public and internal classes in the
> > > ignite-util module. If we introduce the internal package in the util,
> > > we should follow the same structure in other modules.
> > >
> > > Thoughts?
> > >
> > > Tue, Mar 30, 2021 at 18:37, Alexei Scherbakov <
> > > alexey.scherbak...@gmail.com
> > > >:
> > >
> > > > +1 to package and module naming.
> > > > +1 to service definition as "component providing a high-level API to
> > > > user/other components/services"
> > > >
> > > > I would avoid defining strict rules for Manager and Processor.
> > > > For me it just adds confusion without real value.
> > > > A component can be a Manager if it manages something, a Processor if
> > > > it processes something, and so on.
> > > > I think having Component and Service (which is also a Component) is
> > > > enough.
> > > > Any component can be singleton or not - it's defined by its lifecycle.
> > > >
> > > > +1 to renaming core to something more meaningful, but the name lang
> > > > doesn't fit for a collection of utility classes for me, I would prefer
> > > > ignite-util. Apache Tomcat has the same jar, for reference. I'm also
> > > > fine to leave it as is.
> > > > -1 to have an "internal" package. All modules are known to be internal
> > > > except api and (partially) util, so why bother at all?
> > > >
> > > >
> > > > Tue, Mar 30, 2021 at 12:05, Andrey Mashenkov <
> > > andrey.mashen...@gmail.com
> > > > >:
> > > >
> > > > > Agree with package and module naming.
> > > > >
> > > > > I just thought that
> > > > > Service is a self-sufficient component and provides high-level API
> > > > > to user/other components/services (e.g. RaftService to TableService).
> > > > > Manager is internal component - a logical brick of the Service (e.g.
> > > > > RaftGroupManager or TableSchemaManager, TableAffinityManager), it is
> > > > > not self-sufficient as affinity or schema make no sense without the
> > > > > table.
> > > > > Processor is just helper-component of the Service that routes
> > > > > messages, executes async tasks, manages subscriptions and implements
> > > > > some secondary functions.
> > > > >
> > > > > On Tue, Mar 30, 2021 at 11:24 AM Alexey Goncharuk <
> > > >

Re: Terms clarification and modules splitting logic

2021-03-31 Thread Alexei Scherbakov
Alexey,

I would instead suggest moving the public utility classes to
org.apache.ignite.api. package in the util module to separate them from
internal classes, if we really need this.

Actually, I don't think there is a point in separating public/internal
classes in the util module. What are the benefits of this?

Wed, Mar 31, 2021 at 12:16, Alexey Goncharuk:

> Alexei,
>
> I had the same opinion regarding the internal package, but we still need to
> somehow distinguish between public and internal classes in the ignite-util
> module. If we introduce the internal package in the util, we should follow
> the same structure in other modules.
>
> Thoughts?
>
> Tue, Mar 30, 2021 at 18:37, Alexei Scherbakov <
> alexey.scherbak...@gmail.com
> >:
>
> > +1 to package and module naming.
> > +1 to service definition as "component providing a high-level API to
> > user/other components/services"
> >
> > I would avoid defining strict rules for Manager and Processor.
> > For me it just adds confusion without real value.
> > A component can be a Manager if it manages something, a Processor if it
> > processes something, and so on.
> > I think having Component and Service (which is also a Component) is
> > enough.
> > Any component can be singleton or not - it's defined by its lifecycle.
> >
> > +1 to renaming core to something more meaningful, but the name lang
> > doesn't fit for a collection of utility classes for me, I would prefer
> > ignite-util. Apache Tomcat has the same jar, for reference. I'm also fine
> > to leave it as is.
> > -1 to have an "internal" package. All modules are known to be internal
> > except api and (partially) util, so why bother at all?
> >
> >
> > Tue, Mar 30, 2021 at 12:05, Andrey Mashenkov <
> andrey.mashen...@gmail.com
> > >:
> >
> > > Agree with package and module naming.
> > >
> > > I just thought that
> > > Service is a self-sufficient component and provides high-level API to
> > > user/other components/services (e.g. RaftService to TableService).
> > > Manager is internal component - a logical brick of the Service (e.g.
> > > RaftGroupManager or TableSchemaManager, TableAffinityManager), it is
> > > not self-sufficient as affinity or schema make no sense without the
> > > table.
> > > Processor is just helper-component of the Service that routes messages,
> > > executes async tasks, manages subscriptions and implements some
> > > secondary functions.
> > >
> > > On Tue, Mar 30, 2021 at 11:24 AM Alexey Goncharuk <
> > > alexey.goncha...@gmail.com> wrote:
> > >
> > > > Hello Alexander, Igniters,
> > > >
> > > > I support the suggestion, we need to work out some ground rules to
> > > > have a consistent naming convention. Agree with having at most one
> > > > component per project module - this requirement may turn out to be
> > > > too strict in the future, but now it seems reasonable and may help us
> > > > to better structure the code. Additionally, I would encourage us to
> > > > make package names consistent with the module's structure to make
> > > > modules Jigsaw-compliant. We do not have module definitions now, but
> > > > I think it would be great to have them, it should help us to enforce
> > > > component boundaries and proper responsibility encapsulation.
> > > >
> > > > As for the naming, it's not entirely clear for me when to use the
> > > > term Service vs Manager. Service is an entry point to a
> > > > component/server, but so is Manager - a Manager defines an API that is
> > > > exposed by a module to other modules. Subjectively, I see the
> > > > following difference between a Manager and a Service in the examples
> > > > of entities you provided:
> > > >  * A Manager is a node singleton. Its whole purpose is to provide an
> > > > API gateway for other components into a particular subsystem of a node
> > > >  * A Service is an object that is bound to a particular runtime entity
> > > > (raft group service is bound to a raft group, and we can have multiple
> > > > Raft groups; partition service is bound to a particular partition). We
> > > > can re-create services based on changing runtime state and/or
> > > > configuration.

Re: Terms clarification and modules splitting logic

2021-03-30 Thread Alexei Scherbakov
> > cases
> > >will have component-public but ignite-internal API and a lifecycle,
> > > somehow
> > >related to the lifecycle of a node or cluster. So, *structurally*
> > >TableManager, SchemaManager, AffinityManager, etc are all
> components.
> > > For
> > >example, TableManager will have methods like createTable(),
> > > alterTable(),
> > >dropTable(), etc and a lifecycle that will create listeners (aka
> > >DistributedMetastorage watches) on schema and affinity updates in
> > order
> > > to
> > >create/drop raft servers for particular partitions that should be
> > > hosted on
> > >local node). Components are lined up in a graph without cycles, for
> > more
> > >details please see mentioned above Ignite cluster & node lifecycle.
> > ><
> > >
> >
> https://github.com/apache/ignite-3/blob/ignite-14393/modules/runner/README.md
> > > >
> > >- Manager is a driving point of a component with high level
> lifecycle
> > >logic and API methods. My intention here is to agree about naming:
> > > should
> > >we use the term Manager, Processor or anything else?
> > >- Service is an entry point to some component/server or a group of
> > >components/servers. See RaftGroupService.java
> > ><
> > >
> >
> https://github.com/apache/ignite-3/blob/main/modules/raft-client/src/main/java/org/apache/ignite/raft/client/service/RaftGroupService.java
> > > >
> > >as an example.
> > >- Server, for example RaftServer, seems to be self-explanatory
> itself.
> > >
> > >
> > > *Dividing code into modules.*
> > > It seems useful to introduce a restriction that a module should contain
> > > at most one component. So that, combining component-specific modules
> > > and ones of api, lang, etc we will end up with something like following:
> > >
> > >    - affinity // To be created.
> > >    - api [public]
> > >    - baseline // To be created.
> > >    - bytecode
> > >    - cli
> > >    - cli-common
> > >    - configuration
> > >    - configuration-annotation-processor
> > >    - core // Module with classes like IgniteUuid. Should we rename it
> > >    to lang/utils/commons?
> > >    - metastorage-client // To be created.
> > >    - metastorage-common // To be created.
> > >    - metastorage-server // To be created.
> > >    - network
> > >    - raft // raft-server?
> > >    - raft-client
> > >    - rest
> > >    - runner
> > >    - schema
> > >    - table // Seems that there might be a conflict between the meaning
> > >    of table module that we already have and table module with
> > >    create/dropTable()
> > >    - vault
> > >
> > > Also it's not quite clear to me how we should split lang and util
> > > classes some of which belong to the public api, and some to the private.
> > > Please share your thoughts about topics mentioned above.
> > >
> > > Best regards,
> > > Alexander
> > >
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISSCUSSION] Common logger interface.

2021-03-26 Thread Alexei Scherbakov
Andrey,

I've introduced a new class LogWrapper to fix usability issues [1]

The suggested usage is something like:

private static LogWrapper LOG = new LogWrapper(MyClass.class);

[1]
https://github.com/gridgain/apache-ignite-3/blob/9acb050a6a6a601ead849797293a1d0ad48ab9e0/modules/core/src/main/java/org/apache/ignite/lang/LogWrapper.java
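
For illustration only, a minimal sketch of what such a wrapper over the JDK
System.Logger could look like. LogWrapperSketch and its methods are
assumptions made for this example and are not the contents of the class
linked in [1]; only the System.Logger calls are real JDK API.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;

final class LogWrapperSketch {
    /** The JDK platform logger that does the actual work. */
    private final Logger delegate;

    LogWrapperSketch(Class<?> cls) {
        this.delegate = System.getLogger(cls.getName());
    }

    /** Shortcut methods so callers do not pass a Level on every call. */
    void info(String msg, Object... params) {
        delegate.log(Level.INFO, msg, params);
    }

    void warn(String msg, Object... params) {
        delegate.log(Level.WARNING, msg, params);
    }

    void error(String msg, Throwable th) {
        delegate.log(Level.ERROR, msg, th);
    }

    /** Lets callers skip building expensive messages. */
    boolean isLoggable(Level level) {
        return delegate.isLoggable(level);
    }

    String name() {
        return delegate.getName();
    }
}
```

Usage would then be a one-liner per class, for example
LOG.info("Started node {0}", name), with parameters in the MessageFormat
style that System.Logger expects.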

Fri, Mar 26, 2021 at 16:05, Andrey Mashenkov:

> Forgot to attach a link to the PR with an example [1].
>
> [1] https://github.com/apache/ignite-3/pull/59
>
> On Fri, Mar 26, 2021 at 4:03 PM Andrey Mashenkov <
> andrey.mashen...@gmail.com>
> wrote:
>
> > Hi Igniters,
> >
> > In almost every new task we faced the problem of what logger has to be
> > used: JUL, log4j, or something else.
> >
> > Since JDK 9 there is a System.Logger whose interface looks acceptable for
> > use, except maybe for some usability issues like method signatures.
> > LogLevel is passed as a mandatory argument, and no shortcut methods are
> > provided (like 'warn', 'error' or 'info').
> >
> > I like Alex Scherbakov's idea [1] to use the brand new JDK system logger
> > by default and extend it with shortcut methods.
> >
> > I've created a ticket to unify logger usage in the Ignite-3.0 project and
> > fix the existing code.
> >
> > Any thoughts or objections?
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


-- 

Best regards,
Alexei Scherbakov


Re: IEP-70: Async Continuation Executor

2021-03-26 Thread Alexei Scherbakov
Pavel,

A dedicated pool looks safer and more manageable to me. Make sure the threads
in the pool are lazily started and stopped if not used for some time.

Because I have no more real arguments against the change, I suggest
proceeding with this approach.
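
A minimal sketch of such a pool using only JDK classes. The class and method
names are assumptions for illustration and this is not the IEP-70
implementation. A ThreadPoolExecutor creates its threads on demand, and
allowCoreThreadTimeOut(true) lets idle threads stop after the keep-alive
time (which must be greater than zero).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class ContinuationPoolSketch {
    private ContinuationPoolSketch() {}

    /** Creates a pool whose threads start on first use and stop after being idle. */
    static ThreadPoolExecutor newPool(int threads, long idleSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            threads, threads, idleSeconds, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // Without this, core threads would stay alive forever once started.
        pool.allowCoreThreadTimeOut(true);

        return pool;
    }

    /** Runs a continuation on the given pool instead of the completing thread. */
    static <T, R> CompletableFuture<R> continueOn(
        CompletableFuture<T> fut, Function<T, R> fn, Executor pool) {
        return fut.thenApplyAsync(fn, pool);
    }
}
```

No thread exists until the first continuation is submitted, so an
application that never uses async listeners pays nothing for the pool.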

Thu, Mar 25, 2021 at 22:16, Pavel Tupitsyn:

> Alexei,
>
> > we already have ways to control a listener's behavior
> No, we don't have a way to fix current broken and dangerous behavior
> globally.
> You should not expect the user to fix every async call manually.
>
> > commonPool can alter existing deployments in unpredictable ways,
> > if commonPool is heavily used for other purposes
> Common pool resizes dynamically to accommodate the load [1]
> What do you think about Stan's suggestion to use our public pool instead?
>
> [1]
>
> https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html
>
> On Thu, Mar 25, 2021 at 10:10 PM Pavel Tupitsyn 
> wrote:
>
> > > I don't agree that the code isn't related to Ignite - it is something
> > that the user does via Ignite API
> >
> > This is a misconception. When you write general-purpose async code, it
> > looks like this:
> >
> > myClass.fooAsync()
> > .chain(igniteCache.putAsync)
> > .chain(myClass.barAsync)
> > .chain(...)
> >
> > And so on, you jump from one continuation to another.
> > You don't think about this as "I use Ignite API to run my continuation",
> > this is just another async call among hundreds of others.
> >
> > And you don't want 1 of 20 libraries that you use to have "special needs"
> > like Ignite does right now.
> >
> > I know Java is late to the async party and not everyone is used to this
> > mindset,
> > but the situation changes, more and more code bases go async all the way,
> > use CompletionStage everywhere, etc.
> >
> >
> > > If we go with the public pool - no additional options needed.
> >
> > I guess public pool should work.
> > However, I would prefer to keep using commonPool, which is recommended
> for
> > a general purpose like this.
> >
> > On Thu, Mar 25, 2021 at 3:56 PM Alexei Scherbakov <
> > alexey.scherbak...@gmail.com> wrote:
> >
> >> Pavel,
> >>
> >> The change still looks a bit risky to me, because the default executor
> is
> >> set to commonPool and can alter existing deployments in unpredictable
> >> ways,
> >> if commonPool is heavily used for other purposes.
> >>
> >> Runnable::run usage is not obvious as well and should be properly
> >> documented as a way to return to old behavior.
> >>
> >> I'm not sure we need it in 2.X for the reasons above - we already have
> >> ways
> >> to control a listener's behavior - it's a matter of good documentation
> to
> >> me.
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu, Mar 25, 2021 at 15:33, Pavel Tupitsyn wrote:
> >>
> >> > Alexei,
> >> >
> >> > > Sometimes it's more desirable to execute the listener in the same
> >> thread
> >> > > It's up to the user to decide.
> >> >
> >> > Yes, we give users a choice to configure the executor as Runnable::run
> >> and
> >> > use the same thread if needed.
> >> > However, it should not be the default behavior as explained above (bad
> >> > usability, unexpected major issues).
> >> >
> >> > On Thu, Mar 25, 2021 at 3:06 PM Alexei Scherbakov <
> >> > alexey.scherbak...@gmail.com> wrote:
> >> >
> >> > > Pavel,
> >> > >
> >> > > While I understand the issue and overall agree with you, I'm against
> >> the
> >> > > execution of listeners in separate thread pool by default.
> >> > >
> >> > > Sometimes it's more desirable to execute the listener in the same
> >> thread,
> >> > > for example if it's some lightweight closure.
> >> > >
> >> > > It's up to the user to decide.
> >> > >
> >> > > I think the IgniteFuture.listen method should be properly documented
> >> to
> >> > > avoid execution of cluster operations or any other potentially
> >> blocking
> >> > > operations inside the listener.
> >> > >
> >> > > Otherwise listenAsync should be used.
> >> > >
> >> > >
> >> > >
> >> > > чт, 

Re: [DISCUSSION] IgniteFuture class future in Ignite-3.0.

2021-03-25 Thread Alexei Scherbakov
I think both options are fine, but personally lean toward CompletableFuture.

On Thu, Mar 25, 2021 at 17:56, Atri Sharma wrote:

> I would suggest using CompletableFuture -- I don't see a need for a custom
> interface that is unique to us.
>
> It also allows a lower barrier for new contributors for understanding
> existing code
>
> On Thu, 25 Mar 2021, 20:18 Andrey Mashenkov, 
> wrote:
>
> > Hi Igniters,
> >
> > I'd like to start a discussion about either replacing our custom
> > IgniteFuture class with CompletableFuture, an existing JDK class,
> > or reworking its implementation (as some other products have done) into a
> > composition of the CompletionStage and Future interfaces.
> > Or maybe there is another option, if you have any ideas. Do you?
> >
> > 1. The first approach pros and cons are
> > + Well-known JDK class
> > + Already implemented
> > - It is a class, not an interface.
> > - It exposes some potentially harmful methods like "complete()".
> >
> > On the other hand, it has a copy() method to create a defensive copy and
> > minimalCompletionStage() to restrict harmful method usage.
> > Thus, this looks like an applicable solution, but we should be careful
> > about exposing an internal future to the outside.
> >
> > 2. The second approach is to implement our own interface like the
> > following one:
> >
> > interface IgniteFuture<T> extends CompletionStage<T>, Future<T> {
> > }
> >
> > Pros and cons are
> > + Our interfaces/classes contracts will expose an interface rather than
> > concrete implementation.
> > + All methods are safe.
> > - Some implementation is required.
> > - CompletionStage has a toCompletableFuture() method and can be
> > converted to CompletableFuture. This should be supported.
> >
> > However, we could still wrap CompletableFuture and not bother with
> > creating a defensive copy.
> >
> >
> > Other project experience:
> > * Spotify uses CompletableFuture directly [1].
> > * Redis takes the second approach [2].
> > * Vertx explicitly extends CompletableFuture [3]. However, they have
> > custom future classes and a number of helpers that could be replaced with
> > CompletionStage. Maybe it is just legacy.
> >
> > Any thoughts?
> >
> > [1]
> >
> >
> https://spotify.github.io/completable-futures/apidocs/com/spotify/futures/ConcurrencyReducer.html
> > [2]
> >
> >
> https://lettuce.io/lettuce-4/release/api/com/lambdaworks/redis/RedisFuture.html
> > [3]
> >
> >
> https://javadoc.io/static/org.jspare.vertx/vertx-jspare/1.1.0-M03/org/jspare/vertx/concurrent/VertxCompletableFuture.html
> > --
> > Best regards,
> > Andrey V. Mashenkov
> >
>


-- 

Best regards,
Alexei Scherbakov
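
A small sketch of the defensive measures mentioned in the first approach,
assuming Java 9+ where CompletableFuture.copy() and minimalCompletionStage()
are available. The FutureExposure class is hypothetical and only illustrates
that completing what was handed out does not complete the internal future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

/** Hypothetical holder of an internal future that must only be completed by its owner. */
final class FutureExposure {
    private final CompletableFuture<String> internal = new CompletableFuture<>();

    /** Defensive copy: callers may call complete() on it without touching 'internal'. */
    CompletableFuture<String> exposeCopy() {
        return internal.copy();
    }

    /** Restricted view: only the CompletionStage methods are meant to be used. */
    CompletionStage<String> exposeStage() {
        return internal.minimalCompletionStage();
    }

    void completeInternally(String val) {
        internal.complete(val);
    }

    boolean isDone() {
        return internal.isDone();
    }
}
```

Note that toCompletableFuture() on the minimal stage returns a new future, so
even that conversion path does not leak the internal instance.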


Re: IEP-70: Async Continuation Executor

2021-03-25 Thread Alexei Scherbakov
Pavel,

The change still looks a bit risky to me, because the default executor is
set to commonPool and can alter existing deployments in unpredictable ways,
if commonPool is heavily used for other purposes.

Runnable::run usage is not obvious as well and should be properly
documented as a way to return to old behavior.

I'm not sure we need it in 2.X for the reasons above - we already have ways
to control a listener's behavior - it's a matter of good documentation to
me.
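
To make the Runnable::run point concrete, here is a minimal sketch using only
java.util.concurrent. It does not use the Ignite API; ListenerThreads and
threadNameFor are hypothetical names. It shows that an Executor defined as
Runnable::run executes the callback in the calling thread (the old behavior),
while any pool-backed Executor moves it to a pool thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class ListenerThreads {
    private ListenerThreads() {
    }

    /** Returns the name of the thread in which the given executor runs a task. */
    static String threadNameFor(Executor exec) {
        CompletableFuture<String> name = new CompletableFuture<>();

        // With Runnable::run this lambda runs synchronously in the caller's thread.
        exec.execute(() -> name.complete(Thread.currentThread().getName()));

        return name.join();
    }
}
```

For example, threadNameFor(Runnable::run) returns the caller's thread name,
whereas threadNameFor(ForkJoinPool.commonPool()) would normally return the name
of a common pool worker.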






On Thu, Mar 25, 2021 at 15:33, Pavel Tupitsyn wrote:

> Alexei,
>
> > Sometimes it's more desirable to execute the listener in the same thread
> > It's up to the user to decide.
>
> Yes, we give users a choice to configure the executor as Runnable::run and
> use the same thread if needed.
> However, it should not be the default behavior as explained above (bad
> usability, unexpected major issues).
>
> On Thu, Mar 25, 2021 at 3:06 PM Alexei Scherbakov <
> alexey.scherbak...@gmail.com> wrote:
>
> > Pavel,
> >
> > While I understand the issue and overall agree with you, I'm against the
> > execution of listeners in separate thread pool by default.
> >
> > Sometimes it's more desirable to execute the listener in the same thread,
> > for example if it's some lightweight closure.
> >
> > It's up to the user to decide.
> >
> > I think the IgniteFuture.listen method should be properly documented to
> > avoid execution of cluster operations or any other potentially blocking
> > operations inside the listener.
> >
> > Otherwise listenAsync should be used.
> >
> >
> >
> > On Thu, Mar 25, 2021 at 14:04, Pavel Tupitsyn wrote:
> >
> > > Stan,
> > >
> > > We have thread pools dedicated for specific purposes, like cache
> > (striped),
> > > compute (pub), query, etc
> > > As I understand it, the reason here is to limit the number of threads
> > > dedicated to a given subsystem.
> > > For example, Compute may be overloaded with work, but Cache and
> Discovery
> > > will keep going.
> > >
> > > This is different from async continuations, which are arbitrary user
> > code.
> > > So what is the benefit of having a new user pool for arbitrary code
> that
> > is
> > > probably not related to Ignite at all?
> > >
> > > On Thu, Mar 25, 2021 at 1:31 PM  wrote:
> > >
> > > > Pavel,
> > > >
> > > > This is a great work, fully agree with the overall idea and approach.
> > > >
> > > > However, I have some reservations about the API. We sure do have a
> lot
> > of
> > > > async stuff in the system, and I would suggest to stick to the usual
> > > design
> > > > - create a separate thread pool, add a single property to control the
> > > size
> > > > of the pool.
> > > > Alternatively, we may consider using public pool for that. May I ask
> if
> > > > there is an example use case which doesn’t work with public pool?
> > > >
> > > > For .NET, agree that we should follow the rules and APIs of the
> > platform,
> > > > so the behavior might slightly differ.
> > > >
> > > > Thanks,
> > > > Stan
> > > >
> > > > > On 24 Mar 2021, at 09:52, Pavel Tupitsyn 
> > wrote:
> > > > >
> > > > > Igniters, since there are no more comments and/or review feedback,
> > > > > I'm going to merge the changes by the end of the week.
> > > > >
> > > > >> On Mon, Mar 22, 2021 at 10:37 PM Pavel Tupitsyn <
> > ptupit...@apache.org
> > > >
> > > > >> wrote:
> > > > >>
> > > > >> Ready for review:
> > > > >> https://github.com/apache/ignite/pull/8870
> > > > >>
> > > > >> On Sun, Mar 21, 2021 at 8:09 PM Pavel Tupitsyn <
> > ptupit...@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >>> Simple benchmark added - see JmhCacheAsyncListenBenchmark in the
> > PR.
> > > > >>> There is a 6-8% drop (1 client, 2 servers, 1 machine, int
> key/val).
> > > > >>> I expect this difference to become barely observable on
> real-world
> > > > >>> workloads.
> > > > >>>
> > > > >>> On Thu, Mar 18, 2021 at 12:35 PM Pavel Tupitsyn <
> > > ptupit...@apache.org>
> > > > >>> wrote:
> > > > >>>
> > >

Re: [DISCUSSION] Patch completely breaks MVCC.

2021-03-25 Thread Alexei Scherbakov
Maksim,

Judging by the description "Patch completely breaks MVCC", it seems to me
that the proposed patch should be postponed at least until the public API
for MVCC is removed.

Or can you clarify the impact of the patch? Will the existing MVCC
functionality remain unbroken?


On Thu, Mar 25, 2021 at 14:52, Andrey Mashenkov wrote:

> Hi Maksim,
>
> Do you mean MVCC will not work at all or MVCC will not support indices
> after your changes?
> Anyway, this looks like a major change and may be too harmful for the minor
> version (10.1).
>
> Before breaking the MVCC index (or MVCC mode), we should first force the
> user to drop all MVCC indices (or even MVCC caches) before switching to the
> version with the fix.
> The migration process should be well documented as well.
>
> I believe a user should be able to migrate to the new Ignite version with
> existing persistence with no issues. E.g.
> * Ignite shouldn't start if the existing persistence has an MVCC index
> (cache) and maybe other internal persistent MVCC structures.
> * Even if the user dropped all MVCC indices/caches before the upgrade,
> probably there can be an incomplete checkpoint and there are WAL records
> related to MVCC in WAL that should be correctly processed.
>
>
>
>
> On Thu, Mar 25, 2021 at 1:27 PM Maksim Timonin 
> wrote:
>
> > Hi, Igniters!
> >
> > The MVCC feature is marked as IgniteExperimental, and this annotation is
> > weaker than deprecated, so we can remove this functionality at any
> > moment.
> > So I propose:
> > 1. For now, I leave all affected tests marked as ignored.
> > 2. Create a ticket for removing TRANSACTIONAL_SNAPSHOT from
> > CacheAtomicityMode for a future minor release 10.1.
> > 3. There is a ticket for removing all MVCC code [1], so we can finish it
> > in any future release.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-13871
> >
> > WDYT?
> >
> >
> > On Mon, Mar 15, 2021 at 9:58 PM Maksim Timonin 
> > wrote:
> >
> > > Hi, Igniters!
> > >
> > > I'm working on a feature (moving indexes to the core module) and skip
> > > specific implementation for MVCC as it is considered deprecated (the
> vote
> > > result [1]). Am I right that now there is no need to support MVCC? Then
> > > there are a lot of tests (both Java, C++) that fail because they run
> with
> > > TRANSACTIONAL_SNAPSHOT atomicity mode.
> > >
> > > There are 2 cases:
> > > 1. MVCC mode is just a parameter of a test. I just removed it from a
> > > parameters list;
> > > 2. There are tests that run only for MVCC. I marked them with the
> @Ignore
> > > annotation.
> > >
> > > But would it be better to just completely remove all such tests that
> > > are broken by the patch?
> > >
> > > [1]
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/RESULT-VOTE-Removing-MVCC-public-API-td50705.html#a50706
> > >
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


-- 

Best regards,
Alexei Scherbakov


Re: IEP-70: Async Continuation Executor

2021-03-25 Thread Alexei Scherbakov
> > >>>>>>>>> slava.kopti...@gmail.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi Pavel,
> > >>>>>>>>>>
> > >>>>>>>>>> Well, I think that the user should use the right API instead of
> > >>>>>>>>>> introducing uncontested overhead for everyone.
> > >>>>>>>>>> For instance, the code that is provided by the IEP can be changed
> > >>>>>>>>>> as follows:
> > >>>>>>>>>>
> > >>>>>>>>>> IgniteFuture fut = cache.putAsync(1, 1);
> > >>>>>>>>>> fut.listenAsync(f -> {
> > >>>>>>>>>>     // Runs on the common pool rather than the striped pool, so it does not deadlock.
> > >>>>>>>>>>     cache.replace(1, 2);
> > >>>>>>>>>> }, ForkJoinPool.commonPool());
> > >>>>>>>>>>
> > >>>>>>>>>> Of course, it does not mean that this fact should not be
> > >>>>> properly
> > >>>>>>>>>> documented.
> > >>>>>>>>>> Perhaps, I am missing something.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> S.
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Mar 17, 2021 at 16:01, Pavel Tupitsyn <ptupit...@apache.org> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Slava,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Your suggestion is to keep things as is and discard the IEP,
> > >>>>>> right?
> > >>>>>>>>>>>
> > >>>>>>>>>>>> this can lead to significant overhead
> > >>>>>>>>>>> Yes, there is some overhead, but the cost of accidentally
> > >>>>>> starving
> > >>>>>>>> the
> > >>>>>>>>>>> striped pool is worse,
> > >>>>>>>>>>> not to mention the deadlocks.
> > >>>>>>>>>>>
> > >>>>>>>>>>> I believe that we should favor correctness over performance
> > >>>>> in
> > >>>>>> any
> > >>>>>>>>> case.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Mar 17, 2021 at 3:34 PM Вячеслав Коптилин <
> > >>>>>>>>>>> slava.kopti...@gmail.com>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Well, the specified method already exists :)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>    /**
> > >>>>>>>>>>>>     * Registers listener closure to be asynchronously notified whenever
> > >>>>>>>>>>>>     * future completes.
> > >>>>>>>>>>>>     * Closure will be processed in specified executor.
> > >>>>>>>>>>>>     *
> > >>>>>>>>>>>>     * @param lsnr Listener closure to register. Cannot be {@code null}.
> > >>>>>>>>>>>>     * @param exec Executor to run listener. Cannot be {@code null}.
> > >>>>>>>>>>>>     */
> > >>>>>>>>>>>>    public void listenAsync(IgniteInClosure<? super IgniteFuture<V>> lsnr,
> > >>>>>>>>>>>>        Executor exec);
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>> S.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Mar 17, 2021 at 15:25, Вячеслав Коптилин <slava.kopti...@gmail.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hello Pavel,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I took a look at your IEP and pull request. I have the following
> > >>>>>>>>>>>>> concerns.
> > >>>>>>>>>>>>> First of all, this change breaks the contract of
> > >>>>>>>>>>>>> IgniteFuture#listen(lsnr)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>    /**
> > >>>>>>>>>>>>>     * Registers listener closure to be asynchronously notified whenever
> > >>>>>>>>>>>>>     * future completes.
> > >>>>>>>>>>>>>     * Closure will be processed in thread that completes this future or
> > >>>>>>>>>>>>>     * (if future already completed) immediately in current thread.
> > >>>>>>>>>>>>>     *
> > >>>>>>>>>>>>>     * @param lsnr Listener closure to register. Cannot be {@code null}.
> > >>>>>>>>>>>>>     */
> > >>>>>>>>>>>>>    public void listen(IgniteInClosure<? super IgniteFuture<V>> lsnr);
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> In your pull request, the listener is always called from a
> > >>>>>>>>>>>>> specified thread pool (which is fork-join by default)
> > >>>>>>>>>>>>> even though the future is already completed at the moment the
> > >>>>>>>>>>>>> listen method is called.
> > >>>>>>>>>>>>> In my opinion, this can lead to significant overhead -
> > >>>>>>>>>>>>> submission requires acquiring a lock and notifying a pool thread.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> It seems to me that we should not change the current behavior.
> > >>>>>>>>>>>>> However, a thread pool executor can be added as an optional
> > >>>>>>>>>>>>> parameter of the listen() method as follows:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>    public void listen(IgniteInClosure<? super IgniteFuture<V>> lsnr,
> > >>>>>>>>>>>>>        Executor exec);
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>> S.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Mon, Mar 15, 2021 at 19:24, Pavel Tupitsyn <ptupit...@apache.org> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Igniters,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Please review the IEP [1] and let me know your
> > >>>>> thoughts.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> [1]
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-70%3A+Async+Continuation+Executor
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> <http://www.trimble.com/>
> > >>>>> Raymond Wilson
> > >>>>> Solution Architect, Civil Construction Software Systems (CCSS)
> > >>>>> 11 Birmingham Drive | Christchurch, New Zealand
> > >>>>> raymond_wil...@trimble.com
> > >>>>>
> > >>>>> <
> > >>>>>
> >
> https://worksos.trimble.com/?utm_source=Trimble_medium=emailsign_campaign=Launch
> > >>>>>>
> > >>>>>
> > >>>>
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: IEP-61 Technical discussion

2021-03-20 Thread Alexei Scherbakov
> > > > > > > >>> >> > >> find each other (discovery) and how they detect
> > > > > > > >>> >> > >> failures.
> > > > > > >>> >> > >>
> > > > > > >>> >> > >> I suppose, that gossip protocol is an ideal
> candidate.
> > > For
> > > > > > >>> example,
> > > > > > >>> >> > >> consul [1] uses this approach, using serf [2] library
> > to
> > > > > > discover
> > > > > > >>> >> > members
> > > > > > >>> >> > >> of cluster.
> > > > > > >>> >> > >> Then consul forms raft ensemble (server nodes) and
> > client
> > > > use
> > > > > > >>> raft
> > > > > > >>> >> > >> ensemble only as lock service.
> > > > > > >>> >> > >>
> > > > > > >>> >> > >> PacificA suggests internal heartbeats mechanism for
> > > failure
> > > > > > >>> >> detection of
> > > > > > >>> >> > >> replicated group, but it says nothing about initial
> > > > discovery
> > > > > > of
> > > > > > >>> >> nodes.
> > > > > > >>> >> > >>
> > > > > > >>> >> > >> WDYT?
> > > > > > >>> >> > >>
> > > > > > >>> >> > >> [1] --
> https://www.consul.io/docs/architecture/gossip
> > > > > > >>> >> > >> [2] -- https://www.serf.io/
> > > > > > >>> >> > >>
> > > > > > > >>> >> > >> On Thu, Nov 19, 2020 at 12:46, Alexey Goncharuk <
> > > > > > > >>> >> > >> alexey.goncha...@gmail.com> wrote:
> > > > > > >>> >> > >>
> > > > > > >>> >> > >>> Following up the Ignite 3.0 scope/development
> approach
> > > > > > threads,
> > > > > > >>> >> this is
> > > > > > >>> >> > >>> a separate thread to discuss technical aspects of
> the
> > > IEP.
> > > > > > >>> >> > >>>
> > > > > > >>> >> > >>> Let's reiterate one more time on the questions
> raised
> > by
> > > > > Ivan
> > > > > > >>> and
> > > > > > >>> >> also
> > > > > > >>> >> > >>> see if there are any other thoughts on the IEP:
> > > > > > >>> >> > >>>
> > > > > > >>> >> > >>>- *Whether to deploy metastorage on a separate
> > subset
> > > > of
> > > > > > the
> > > > > > >>> >> nodes
> > > > > > >>> >> > >>>or allow Ignite to choose these nodes
> > > automatically.* I
> > > > > > >>> think it
> > > > > > >>> >> is
> > > > > > >>> >> > >>>feasible to maintain both modes: by default,
> Ignite
> > > > will
> > > > > > >>> choose
> > > > > > >>> >> > >>>metastorage nodes automatically which essentially
> > > will
> > > > > > >>> provide
> > > > > > >>> >> the
> > > > > > >>> >> > same
> > > > > > >>> >> > >>>seamless user experience as TCP discovery SPI -
> no
> > > > > separate
> > > > > > >>> >> roles,
> > > > > > >>> >> > >>>simplistic deployment. For deployments where
> people
> > > > want
> > > > > to
> > > > > > >>> have
> > > > > > >>> >> > more
> > > > > > >>> >> > >>>fine-grained control over the nodes' assignments,
> > we
> > > > will
> > > > > > >>> >> provide a
> > > > > > >>> >> > runtime
> > > > > > >>> >> > >>>configuration which will allow pinning
> metastorage
> > > > group
> > > > > to
> > > > > > >>> >> certain
> > > > > > >>> >> > nodes,
> > > > > > >>> >> > >>>thus eliminating the latency concerns.
> > > > > > >>> >> > >>>- *Whether there are any TLA+ specs for the
> > PacificA
> > > > > > >>> protocol.*
> > > > > > >>> >> Not
> > > > > > >>> >> > >>>to my knowledge, but it is known to be used in
> > > > production
> > > > > > by
> > > > > > >>> >> > Microsoft and
> > > > > > >>> >> > >>>other projects, e.g. [1]
> > > > > > >>> >> > >>>
> > > > > > >>> >> > >>> I would like to collect general feedback on the IEP,
> > as
> > > > well
> > > > > > as
> > > > > > >>> >> > feedback
> > > > > > >>> >> > >>> on specific parts of it, such as:
> > > > > > >>> >> > >>>
> > > > > > >>> >> > >>>- Metastorage API
> > > > > > >>> >> > >>>- Any existing library that can be used to avoid
> > > > > > >>> re-implementing
> > > > > > >>> >> the
> > > > > > >>> >> > >>>protocol ourselves? Perhaps, porting the existing
> > > > > > >>> implementation
> > > > > > >>> >> to
> > > > > > >>> >> > Java
> > > > > > >>> >> > >>>(the way TiKV did with etcd-raft [2] [3]? This
> is a
> > > > very
> > > > > > >>> neat way
> > > > > > >>> >> > btw in my
> > > > > > >>> >> > >>>opinion because I like the finite automata-like
> > > > approach
> > > > > of
> > > > > > >>> the
> > > > > > >>> >> > replication
> > > > > > >>> >> > >>>module, and, additionally, we could sync bug
> fixes
> > > and
> > > > > > >>> >> improvements
> > > > > > >>> >> > from
> > > > > > >>> >> > >>>the upstream project)
> > > > > > >>> >> > >>>
> > > > > > >>> >> > >>>
> > > > > > >>> >> > >>> Thanks,
> > > > > > >>> >> > >>> --AG
> > > > > > >>> >> > >>>
> > > > > > >>> >> > >>> [1]
> > > > > > >>> >> > >>>
> > > > > > >>> >>
> > > > > >
> > > https://cwiki.apache.org/confluence/display/INCUBATOR/PegasusProposal
> > > > > > >>> >> > >>> [2]
> https://github.com/etcd-io/etcd/tree/master/raft
> > > > > > >>> >> > >>> [3] https://github.com/tikv/raft-rs
> > > > > > >>> >> > >>>
> > > > > > >>> >> > >>
> > > > > > >>> >> > >>
> > > > > > >>> >> > >> --
> > > > > > >>> >> > >> Sincerely yours, Ivan Daschinskiy
> > > > > > >>> >> > >>
> > > > > > >>> >> > >>
> > > > > > >>> >> > >> --
> > > > > > >>> >> > >> Sincerely yours, Ivan Daschinskiy
> > > > > > >>> >> > >>
> > > > > > >>> >> > >
> > > > > > >>> >> > >
> > > > > > >>> >> > > --
> > > > > > >>> >> > > Sincerely yours, Ivan Daschinskiy
> > > > > > >>> >> > >
> > > > > > >>> >> >
> > > > > > >>> >> >
> > > > > > >>> >> > --
> > > > > > >>> >> > Sincerely yours, Ivan Daschinskiy
> > > > > > >>> >> >
> > > > > > >>> >>
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> > --
> > > > > > >>> > Sincerely yours, Ivan Daschinskiy
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> --
> > > > > > >>> Sincerely yours, Ivan Daschinskiy
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sincerely yours, Ivan Daschinskiy
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sincerely yours, Ivan Daschinskiy
> > >
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSSION] IEP-69 The evolutionary release process

2021-03-09 Thread Alexei Scherbakov
> > > > > > 3. I don't see why there would be implicit PDS compatibility between
> > > > > > any X.0.0 and Y.0.0, X != Y.
> > > > > > 4. I think this is a sensible approach.
> > > > > > 5. Since ignite-3 seems to be a separate repo ATM, I don't see why it
> > > > > > is applicable.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > --
> > > > > > Ilya Kasnacheev
> > > > > >
> > > > > >
> > > > > > On Fri, Mar 5, 2021 at 22:09, Maxim Muzafarov wrote:
> > > > > >
> > > > > > > Igniters,
> > > > > > >
> > > > > > >
> > > > > > > I've created the IEP-69 [1] which describes the evolutionary
> > > release
> > > > > > > process for the Apache Ignite 2.x version. You can find all the
> > > > > > > details of my suggestion there, but here you can find the
> crucial
> > > > > > > points:
> > > > > > >
> > > > > > > 0. Versioning - grand.major.bug-fix[-rc_number]
> > > > > > >
> > > > > > > 1. Prepare the next 3.0 release based on 2.x with some breaking
> > > > > > > compatibility changes. The same things happen from time to time
> > > with
> > > > > > > other Apache projects like Hadoop, Spark.
> > > > > > >
> > > > > > > 2. Discuss with the whole Community and assign the right
> release
> > > > > > > version to the activities related to the development of the new
> > > Ignite
> > > > > > > architecture (currently all the changes you can find in the
> > > ignite-3
> > > > > > > branch).
> > > > > > > I see no 3.0 discussions on the dev list and no activity on the
> > > > > > > 3.0 version currently. So, it's better to remove the `lock` from
> > > > > > > the 3.0 version and allow the removal of obsolete features.
> > > > > > >
> > > > > > > 3. Guarantee the PDS compatibility between the `grand`
> versions of
> > > the
> > > > > > > Apache Ignite for the next year.
> > > > > > >
> > > > > > > 4. Guarantee the bug-fix release for the last 2.x Apache Ignite
> > > > > > > version for the next year.
> > > > > > >
> > > > > > > 5. Perform some improvements which break the backward
> > > compatibility,
> > > > > > > for instance: removing @deprecated API (except metrics),
> removing
> > > > > > > obsolete modules, changing the cluster defaults. You can find
> > > > > > > additional details on the IEP-69 page [1].
> > > > > > >
> > > > > > >
> > > > > > > Please, share your thoughts.
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-69%3A+The+evolutionary+release+process
> > > > > > >
> > > > > >
> > >
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSSION] Java 11 for Ignite 3.0 development

2020-12-08 Thread Alexei Scherbakov
I think we should move forward, so java11 seems like a proper choice for 3.

On Wed, Dec 9, 2020 at 10:17, Ivan Bessonov wrote:

> This is an awesome idea.
>
> Honestly, I can't come up with strong technical arguments for Java 11 as a
> source level, since I had no chance to work with it long enough, but it feels
> like a proper time to move to a "modern" technology. Subjectively, I can say
> that Java 11 has a lot of good optimizations and Ignite should run better on
> it. So it makes no sense to compile for 8 but recommend 11, you know.
>
> On Tue, Dec 8, 2020 at 21:16, Данилов Семён wrote:
>
> > +1 for sure. AFAIK, the only thing holding us back from using Java 11 is
> > the dominance of Java 8, but I'm sure that by the time Ignite 3 is GA,
> > there will be far fewer Java 8 users, if any significant number at all.
> > By the way, Ignite's sources need minimal effort to be compiled with
> > Java 11 as a target.
> >
> > 08.12.2020, 15:00, "Nikolay Izhikov" :
> > > +1 for using java 11.
> > >
> > >>  On Dec 8, 2020, at 13:18, ткаленко кирилл wrote:
> > >>
> > >>  +1
> > >>
> > >>  08.12.2020, 12:48, "Philipp Masharov" :
> > >>>  Hello!
> > >>>
> > >>>  Andrey's arguments are solid.
> > >>>
> > >>>  On Tue, Dec 8, 2020 at 12:23 PM Pavel Tupitsyn <
> ptupit...@apache.org>
> > wrote:
> > >>>
> > >>>>   +1, Java 11 seems to be the only right choice at the moment.
> > >>>>
> > >>>>   On Tue, Dec 8, 2020 at 12:08 PM Alexey Zinoviev <
> > zaleslaw@gmail.com>
> > >>>>   wrote:
> > >>>>
> > >>>>   > I totally support Java 11 for development. It's time to go
> forward
> > >>>>   >
> > >>>>   > On Tue, Dec 8, 2020 at 11:40, Andrey Gura wrote:
> > >>>>   >
> > >>>>   > > Igniters,
> > >>>>   > >
> > >>>>   > > We already had some discussion about using modern Java
> versions
> > for
> > >>>>   > > Ignite 3.0 development [1] but we still don't have consensus.
> > >>>>   > > As I see from this discussion the strongest argument for Java
> > 11 is
> > >>>>   > > the fact that Java 11 is the latest LTS release which will
> have
> > >>>>   > > premier support until September 2023. So I don't see any
> reason
> > for
> > >>>>   > > preferring any other version of Java at this moment.
> > >>>>   > >
> > >>>>   > > The purpose of this thread is to gather opinions about using
> > Java 11
> > >>>>   > > in the Ignite 3.0 project and, eventually, reach a consensus
> on
> > this.
> > >>>>   > >
> > >>>>   > > I want to share my several arguments in favor of abandoning
> > Java 8 and
> > >>>>   > > preferring Java 11:
> > >>>>   > >
> > >>>>   > > * Java 8 has gone through the End of Public Updates process
> for
> > legacy
> > >>>>   > > releases. So it doesn't make sense to start new development on
> > Java 8.
> > >>>>   > >
> > >>>>   > > * Java 9+ brings Jigsaw modularization which allows us to have
> > more
> > >>>>   > > fine-grained structure of Ignite modules and APIs in the
> future.
> > >>>>   > >
> > >>>>   > > * Ignite actively uses Unsafe functionality which, firstly,
> > isn't
> > >>>>   > > public, and secondly, leads to problems with running Ignite
> > under Java
> > >>>>   > > 9+ (modularization which requires dozens of command-line
> > options in
> > >>>>   > > order to forcibly export corresponding packages) and GraalVM.
> > Such a
> > >>>>   > > situation could be described as bad user experience and we
> > should fix
> > >>>>   > > it. Var handles [2] could be used for solving described
> > problems.
> > >>>>   > >
> > >>>>   > > * Java 9+ introduces Flight Recorder API [3] which could be
> > used in
> > >>>>   > > the Ignite project for lightweight profiling of internal
> > processes.
> > >>>>   > >
> > >>>>   > > Please, share your opinions, objections and ideas about this
> > topic. I
> > >>>>   > > hope we will not have serious disagreements and the consensus
> > will be
> > >>>>   > > reached quickly.
> > >>>>   > >
> > >>>>   > >
> > >>>>   > > 1.
> > >>>>   > >
> > >>>>   >
> > >>>>
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSS-Ignite-3-0-development-approach-tp49922p50295.html
> > >>>>   > > 2.
> > >>>>   > >
> > >>>>   >
> > >>>>
> >
> https://docs.oracle.com/javase/9/docs/api/java/lang/invoke/VarHandle.html
> > >>>>   > > 3.
> > >>>>   > >
> > >>>>   >
> > >>>>
> >
> https://docs.oracle.com/en/java/javase/11/docs/api/jdk.jfr/jdk/jfr/FlightRecorder.html
> > >>>>   > >
> > >>>>   >
> >
>
>
> --
> Sincerely yours,
> Ivan Bessonov
>


-- 

Best regards,
Alexei Scherbakov


Re: Removing MVCC public API

2020-12-08 Thread Alexei Scherbakov
+1

On Wed, Dec 9, 2020 at 10:03, Petr Ivanov wrote:

> +1
>
>
> > On 9 Dec 2020, at 09:39, Nikita Amelchev  wrote:
> >
> > +1
> >
> > Wed, Dec 9, 2020 at 08:29, ткаленко кирилл :
> >>
> >> +1
> >>
> >>
> >> 08.12.2020, 23:47, "Andrey Mashenkov" :
> >>> +1
> >>>
> >>> On Tue, Dec 8, 2020 at 11:22 PM Igor Seliverstov  >
> >>> wrote:
> >>>
> >>>> +1
> >>>>
> >>>> 08.12.2020 22:38, Andrey Gura wrote:
> >>>>> +1
> >>>>>
> >>>>> On Tue, Dec 8, 2020 at 10:02 PM Nikolay Izhikov  >
> >>>> wrote:
> >>>>>> +1
> >>>>>>
> >>>>>>> Dec 8, 2020, at 21:54, Valentin Kulichenko <
> >>>> valentin.kuliche...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> +1
> >>>>>>>
> >>>>>>> On Tue, Dec 8, 2020 at 8:31 AM Вячеслав Коптилин <
> >>>> slava.kopti...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hello Igniters,
> >>>>>>>>
> >>>>>>>> I want to start voting on removing the public API (and eventually
> all
> >>>>>>>> unused parts) related to the MVCC feature.
> >>>>>>>>
> >>>>>>>> This topic has already been discussed many times (at least, [1],
> [2])
> >>>> and
> >>>>>>>> the community has agreed the feature implementation must be
> >>>> reapproached,
> >>>>>>>> because using coordinator node for transactions ordering and 2pc
> >>>> protocol
> >>>>>>>> is slow by design and will not scale well. [3]
> >>>>>>>>
> >>>>>>>> Moreover, the current implementation has critical issues [4], is not
> >>>> supported
> >>>>>>>> by the community, and is not well tested at all.
> >>>>>>>>
> >>>>>>>> Removing the public API first will allow us to clean up the code
> >>>> later step
> >>>>>>>> by step without rushing and keep intact useful improvements that
> are
> >>>>>>>> already in use or can be reused for other parts in the future.
> >>>>>>>> For instance, partition counters implementation is already
> adapted to
> >>>> fix
> >>>>>>>> tx caches protocol issues [5].
> >>>>>>>>
> >>>>>>>> The future of MVCC is unclear for now, but, definitely, this
> feature
> >>>> is
> >>>>>>>> useful for a lot of user scenarios and can be scheduled for later
> >>>> Ignite
> >>>>>>>> versions.
> >>>>>>>> Also, the MVCC feature is in an experimental state, so it can be
> >>>> modified
> >>>>>>>> in any way, I think.
> >>>>>>>>
> >>>>>>>> +1 - to accept removing MVCC feature from public API
> >>>>>>>> 0 - don't care either way
> >>>>>>>> -1 - do not accept removing API (explain why)
> >>>>>>>>
> >>>>>>>> The vote will be open for 7 days and will end on Wednesday, December
> >>>> 16th at
> >>>>>>>> 19:00 UTC:
> >>>>>>>>
> >>>>>>>>
> >>>>
> https://www.timeanddate.com/countdown/generic?iso=20201216T19=1440=cursive
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>>
> >>>>>>>>
> >>>>
> http://apache-ignite-developers.2346864.n4.nabble.com/Mark-MVCC-with-IgniteExperimental-td45669.html
> >>>>>>>> [2]
> >>>>>>>>
> >>>>>>>>
> >>>>
> http://apache-ignite-developers.2346864.n4.nabble.com/Disable-MVCC-test-suites-td50416.html
> >>>>>>>> [3]
> >>>>>>>>
> >>>>>>>>
> >>>>
> http://apache-ignite-developers.2346864.n4.nabble.com/Mark-MVCC-with-IgniteExperimental-tp45669p45727.html
> >>>>>>>> [4]
> >>>>>>>>
> >>>>>>>>
> >>>>
> http://apache-ignite-developers.2346864.n4.nabble.com/Mark-MVCC-with-IgniteExperimental-tp45669p45716.html
> >>>>>>>> [5]
> >>>>>>>>
> >>>>>>>>
> >>>>
> http://apache-ignite-developers.2346864.n4.nabble.com/Mark-MVCC-with-IgniteExperimental-tp45669p45714.html
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Slava.
> >>>>>>>>
> >>>
> >>> --
> >>> Best regards,
> >>> Andrey V. Mashenkov
> >
> >
> >
> > --
> > Best wishes,
> > Amelchev Nikita
>
>

-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSS] Ignite 3.0 development approach

2020-11-10 Thread Alexei Scherbakov
> > > > > ptupit...@apache.org
> > > > > > > >:
> > > > > > > > >>>>>
> > > > > > > > >>>>>> 1. Rewriting from scratch is never a good idea.
> > > > > > > > >>>>>> We don't want to follow the path of Netscape and lose
> > all
> > > > our
> > > > > > > users
> > > > > > > > >>>>>> by the time we have a working 3.0 [1]
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> 2. Not sure about new repo - seems like some pain and
> no
> > > > gain,
> > > > > > > > what's
> > > > > > > > >>>> the
> > > > > > > > >>>>>> problem with a branch?
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> 3. We should keep existing integration tests when
> > > possible.
> > > > > > > > >>>>>> We have accumulated a lot of edge case knowledge over
> > the
> > > > > years,
> > > > > > > > >>>>>> it is not a good idea to send all of that down the
> > drain.
> > > > > > > > >>>>>> Yes, integration tests are slow, but they are the most
> > > > > valuable.
> > > > > > > > >>>>>> I think we can move more stuff into nightly runs and
> > have
> > > a
> > > > > fast
> > > > > > > and
> > > > > > > > >>>>> modern
> > > > > > > > >>>>>> basic suite.
> > > > > > > > >>>>>>
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> Alexey, you are much more familiar with the Ignite
> core
> > > > > codebase
> > > > > > > > than
> > > > > > > > >>>>> most
> > > > > > > > >>>>>> of us,
> > > > > > > > >>>>>> can you please explain in more detail which particular
> > > > > feature,
> > > > > > in
> > > > > > > > >> your
> > > > > > > > >>>>>> opinion,
> > > > > > > > >>>>>> mandates this "start from scratch" approach?
> > > > > > > > >>>>>> Is it really not possible at all to follow a less
> > radical
> > > > way?
> > > > > > > > >>>>>>
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> [1]
> > > > > > > > >>>>>>
> > > > > > > > >>>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> On Mon, Nov 2, 2020 at 2:25 PM Nikolay Izhikov <
> > > > > > > nizhi...@apache.org
> > > > > > > > >
> > > > > > > > >>>>>> wrote:
> > > > > > > > > >

Re: Custom Affinity Functions proposed for removal?

2020-11-02 Thread Alexei Scherbakov
Thanks for the clarification.

There was no intention to remove the customizable key to partition mapping.

Difficulties arise when mapping partitions to nodes, so it's desirable to
have an internally tested implementation with a way to customize its behavior
without additional coding on the user side.

Mon, Nov 2, 2020 at 23:01, Raymond Wilson :

> Just to be clear, the affinity functions we are using convert keys to
> partitions, we do not map partitions to nodes and leave that to Ignite.
>
> On Tue, Nov 3, 2020 at 8:48 AM Alexei Scherbakov <
> alexey.scherbak...@gmail.com> wrote:
>
> > Hello.
> >
> > Custom affinity functions can cause weird bugs and data loss if
> implemented
> > incorrectly.
> > There is an intention to keep a backup filter based on user attributes
> > (with additional validation logic to ensure correctness) for controllable
> > data placement.
> >
> > Can you describe more precisely why you had to implement custom affinity
> > functions and not resort to default rendezvous affinity + backup filter ?
> >
> >
> > Mon, Nov 2, 2020 at 21:45, Raymond Wilson  >:
> >
> > > We also use custom affinity functions (via the C# client).
> > >
> > > The wish list mentions use of a particular annotation
> > > (@CentralizedAffinityFunction):
> > > Is the wish to remove just this annotation, or the ability to define
> > custom
> > > affinity functions at all?
> > >
> > > In our case we use affinity functions to ensure particular distribution
> > of
> > > spatial data across a processing cluster to ensure even load
> management.
> > >
> > > On Tue, Nov 3, 2020 at 5:31 AM Moti Nisenson 
> > > wrote:
> > >
> > > > I saw at
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0+Wishlist
> > > > that custom affinity functions are on the potential wishlist for
> > removal.
> > > > The way we're using it, it's very critical that we be able to control the
> > > > placement of data quite precisely - as part of that we specify
> > explicitly
> > > > the partition we want in the key, and then our affinity function uses
> > > that
> > > > (else delegating to default rendezvous). We don't need all the
> > > > abilities there, although I think that often others do.
> > > >
> > > > This seems to me to be a case that the benefit of removing this is
> > > minimal
> > > > and could cause quite a lot of disruption to users.
> > > >
> > > > Thanks!
> > > >
> > >
> > >
> > > --
> > > <http://www.trimble.com/>
> > > Raymond Wilson
> > > Solution Architect, Civil Construction Software Systems (CCSS)
> > > 11 Birmingham Drive | Christchurch, New Zealand
> > > +64-21-2013317 Mobile
> > > raymond_wil...@trimble.com
> > >
> > > <
> > >
> >
> https://worksos.trimble.com/?utm_source=Trimble_medium=emailsign_campaign=Launch
> > > >
> > >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>
>
> --
> <http://www.trimble.com/>
> Raymond Wilson
> Solution Architect, Civil Construction Software Systems (CCSS)
> 11 Birmingham Drive | Christchurch, New Zealand
> +64-21-2013317 Mobile
> raymond_wil...@trimble.com
>
> <
> https://worksos.trimble.com/?utm_source=Trimble_medium=emailsign_campaign=Launch
> >
>


-- 

Best regards,
Alexei Scherbakov
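
The key-to-partition mapping described in this thread (use the partition
named in the key, otherwise fall back to a default) can be sketched in plain
Java. Everything below is hypothetical: the class and key type are not Ignite
APIs, and the hash-modulo fallback merely stands in for the default
rendezvous mapping rather than reproducing it.

```java
import java.util.OptionalInt;

/** Hypothetical key-to-partition mapper; not an Ignite AffinityFunction. */
final class ExplicitPartitionMapper {
    /** Hypothetical key that may carry an explicitly requested partition. */
    static final class Key {
        final String id;
        final OptionalInt explicitPartition;

        Key(String id, OptionalInt explicitPartition) {
            this.id = id;
            this.explicitPartition = explicitPartition;
        }
    }

    /** Total number of partitions. */
    private final int partitions;

    ExplicitPartitionMapper(int partitions) {
        if (partitions <= 0)
            throw new IllegalArgumentException("partitions must be positive");

        this.partitions = partitions;
    }

    /** Maps a key to a partition only; partition-to-node mapping is left to the grid. */
    int partition(Key key) {
        if (key.explicitPartition.isPresent()) {
            int p = key.explicitPartition.getAsInt();

            if (p < 0 || p >= partitions)
                throw new IllegalArgumentException("Partition out of range: " + p);

            return p;
        }

        // Assumption: a simple hash stands in for the default mapping.
        return Math.floorMod(key.id.hashCode(), partitions);
    }
}
```

Only the key-to-partition step is customized here, which matches the usage
Raymond and Moti describe; assigning partitions to nodes is not touched.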


Re: Custom Affinity Functions proposed for removal?

2020-11-02 Thread Alexei Scherbakov
Hello.

Custom affinity functions can cause weird bugs and data loss if implemented
incorrectly.
There is an intention to keep a backup filter based on user attributes
(with additional validation logic to ensure correctness) for controllable
data placement.

Can you describe more precisely why you had to implement custom affinity
functions and not resort to default rendezvous affinity + backup filter ?


Mon, Nov 2, 2020 at 21:45, Raymond Wilson :

> We also use custom affinity functions (via the C# client).
>
> The wish list mentions use of a particular annotation
> (@CentralizedAffinityFunction):
> Is the wish to remove just this annotation, or the ability to define custom
> affinity functions at all?
>
> In our case we use affinity functions to ensure particular distribution of
> spatial data across a processing cluster to ensure even load management.
>
> On Tue, Nov 3, 2020 at 5:31 AM Moti Nisenson 
> wrote:
>
> > I saw at
> >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0+Wishlist
> > that custom affinity functions are on the potential wishlist for removal.
> > The way we're using it, it's very critical that we be able to control the
> > placement of data quite precisely - as part of that we specify explicitly
> > the partition we want in the key, and then our affinity function uses
> that
> > (else delegating to default rendezvous). We don't need all the
> > abilities there, although I think that often others do.
> >
> > This seems to me to be a case that the benefit of removing this is
> minimal
> > and could cause quite a lot of disruption to users.
> >
> > Thanks!
> >
>
>
> --
> <http://www.trimble.com/>
> Raymond Wilson
> Solution Architect, Civil Construction Software Systems (CCSS)
> 11 Birmingham Drive | Christchurch, New Zealand
> +64-21-2013317 Mobile
> raymond_wil...@trimble.com
>
> <
> https://worksos.trimble.com/?utm_source=Trimble_medium=emailsign_campaign=Launch
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: IEP-52: Binary Delivery & Upgradability Enhancements

2020-09-16 Thread Alexei Scherbakov
Can you, please, describe the advantages of the proposed way from the
> > user
> > >>> perspective?
> > >>>
> > >>> What should the typical DevOps pipeline look like with this
> enhancement?
> > >>>
> > >>> How can I, as a user, create a fully functional installation package of
> > >>> Ignite?
> > >>> AFAIK downloading some artifacts from the internet straight to the
> > >>> production server is a security anti-pattern.
> > >>>
> > >>>
> > >>>> Aug 28, 2020, at 01:59, Denis Magda
> wrote:
> > >>>>
> > >>>> Petr, thanks,
> > >>>>
> > >>>> There is also a collection of modules located in our extensions
> > >>> repository:
> > >>>> https://github.com/apache/ignite-extensions
> > >>>>
> > >>>> @Saikat Maitra  is migrating all our
> > existing
> > >>>> integrations to that repository and, once this is done, the
> extensions
> > >>> will
> > >>>> be released to Maven separately. Is it reasonable to expand the
> scope
> > of
> > >>>> the IEP-52 and contemplate on how to install those extensions?
> > >>>>
> > >>>> -
> > >>>> Denis
> > >>>>
> > >>>>
> > >>>> On Thu, Aug 27, 2020 at 3:40 PM Valentin Kulichenko <
> > >>>> valentin.kuliche...@gmail.com> wrote:
> > >>>>
> > >>>>> Hi Petr,
> > >>>>>
> > >>>>> The proposal makes sense to me - thanks for starting the
> discussion.
> > >>>>> Current installation and upgrade procedures involve a lot of manual
> > >>> steps
> > >>>>> and are quite error-prone; we need to automate and simplify them as
> much
> > as
> > >>>>> possible.
> > >>>>>
> > >>>>> I believe we eventually should end up with a unified command-line
> > tool
> > >>>>> (ignitectl?) that will incorporate all the operations
> (enable/disable
> > >>>>> modules, start/stop, configuration updates, activation, baseline,
> > etc.).
> > >>>>> This IEP is a step in this direction.
> > >>>>>
> > >>>>> Looking forward to testing a prototype :)
> > >>>>>
> > >>>>> -Val
> > >>>>>
> > >>>>> On Thu, Aug 27, 2020 at 2:17 AM Petr Ivanov 
> > >>> wrote:
> > >>>>>
> > >>>>>> Hi, Igniters!
> > >>>>>>
> > >>>>>>
> > >>>>>> For Apache Ignite 3.0 I would like to propose a vision of enhanced
> > >>> delivery
> > >>>>>> and upgrade processes [1].
> > >>>>>> The key idea is to make the main binary as slim as possible
> (delivering
> > >>> every
> > >>>>>> optional component on demand only) while providing a way to run each
> > new
> > >>>>>> version seamlessly, without much effort spent migrating data and
> > >>>>>> configuration between upgrades.
> > >>>>>>
> > >>>>>> I plan to create a prototype based on the current release of Apache
> Ignite
> > >>>>>> (2.8.1, or 2.9.0 if it is released soon) later in September.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> [1]
> > >>>>>>
> > >>>>>
> > >>>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158873958
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>
> > >>>
> > >
> >
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: Getting rid of NONE cache rebalance mode

2020-07-22 Thread Alexei Scherbakov
As a reminder - we already have a ticket for a deprecation of
rebalanceDelay as well [1]

[1] https://issues.apache.org/jira/browse/IGNITE-12662

Wed, Jul 22, 2020 at 09:39, Alexei Scherbakov :

> Ivan,
> In my opinion, ASYNC rebalancing is the best approach for offloading a
> third-party store, and it provides consistency.
>
> +1 for deprecating NONE in the next release: ignore NONE and use ASYNC
> instead.
> For those who require the absence of rebalancing for some reason, it will
> still be possible to use rebalanceDelay=infinity.
>
> +1 for removal of rebalanceMode in 3.0.
> Note that we still require SYNC logic internally for the system cache in some
> places.
>
>
>
> Tue, Jul 21, 2020 at 15:59, Ivan Pavlukhin :
>
>> Alexey,
>>
>> Thank you for the explanation. I feel that I am missing a couple of bits to
>> understand the picture fully. I am thinking about a case which I tend
>> to call a Memcached use case. There is a cache over underlying storage
>> with read-through and expiration and without any rebalancing at all.
>> When new nodes join, they take ownership of some partitions from
>> already running nodes and serve client requests. Entries for partitions
>> that are no longer owned expire according to configuration.
>>
>> Actually, I have an idea. My guess is that "rebalancing" is a smarter
>> and better approach than waiting for expiration. Am I right?
>>
>> 2020-07-21 15:31 GMT+03:00, Alexey Goncharuk > >:
>> > Ivan,
>> >
>> > In my understanding this mode does not work at all even in the presence
>> of
>> > ForceKeysRequest which is now supposed to fetch values from peers in
>> case
>> > of a miss. In this mode we 1) move partitions to OWNING state
>> > unconditionally, and 2) choose an arbitrary OWNING node for force keys
>> > request. Therefore, after a user started two additional nodes in a
>> cluster,
>> > the request may be mapped to a node which does not hold any data. We
>> will
>> > do a read-through in this case, but it will result in significant load
>> > increase on a third-party storage right after a node started, which
>> means
>> > that adding a node will increase, not decrease, the load on the database
>> > being cached.
>> > All these issues go away when (A)SYNC mode is used.
>> >
>> > Val,
>> > The idea makes sense to me - a user can use rebalance future to wait for
>> > rebalance to finish. This will simplify the configuration even further.
>> >
>> > Mon, Jul 20, 2020 at 21:27, Valentin Kulichenko <
>> > valentin.kuliche...@gmail.com>:
>> >
>> >> +1 for deprecating/removing NONE mode.
>> >>
>> >> Alexey, what do you think about the SYNC mode? In my experience, it
>> does
>> >> not add much value as well. I would go as far as removing the
>> >> rebalancingMode parameter altogether (probably in 3.0).
>> >>
>> >> -Val
>> >>
>> >> On Mon, Jul 20, 2020 at 11:09 AM Ivan Pavlukhin 
>> >> wrote:
>> >>
>> >> > Alexey, Igniters,
>> >> >
>> >> > Could you please outline motivation answering following questions?
>> >> > 1. Does this mode generally work correctly today?
>> >> > 2. Can this mode be useful at all?
>> >> >
>> >> > I can imagine that it might be useful in a transparent caching use
>> >> > case (if I did not misunderstand).
>> >> >
>> >> > 2020-07-20 20:39 GMT+03:00, Pavel Tupitsyn :
>> >> > > +1
>> >> > >
>> >> > > More evidence:
>> >> > >
>> >> >
>> >>
>> https://stackoverflow.com/questions/62902640/apache-ignite-cacherebalancemode-is-not-respected-by-nodes
>> >> > >
>> >> > > On Mon, Jul 20, 2020 at 8:26 PM Alexey Goncharuk
>> >> > > 
>> >> > > wrote:
>> >> > >
>> >> > >> Igniters,
>> >> > >>
>> >> > >> I would like to run the idea of deprecating and probably ignoring
>> >> > >> the
>> >> > >> NONE
>> >> > >> rebalance mode by the community. It's in the removal list for
>> Ignite
>> >> 3.0
>> >> > >> [1], but it looks like it still confuses and creates issues for
>> >> > >> users
>> >> > >> [2].
>> >> > >>
>> >> > >> What about deprecating it in one of the next releases and even
>> >> ignoring
>> >> > >> this constant in further releases, interpreting it as ASYNC,
>> before
>> >> > >> Ignite
>> >> > >> 3.0? I find it hard to believe that any Ignite user actually has
>> >> > >> RebalanceMode.NONE set in their configuration due to its
>> absolutely
>> >> > >> unpredictable behavior.
>> >> > >>
>> >> > >> Thanks for your thoughts,
>> >> > >> --AG
>> >> > >>
>> >> > >> [1]
>> >> > >>
>> >> > >>
>> >> >
>> >>
>> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0+Wishlist
>> >> > >> [2]
>> >> > >>
>> >> > >>
>> >> >
>> >>
>> http://apache-ignite-developers.2346864.n4.nabble.com/About-Rebalance-Mode-SYNC-amp-NONE-td47279.html
>> >> > >>
>> >> > >
>> >> >
>> >> >
>> >> > --
>> >> >
>> >> > Best regards,
>> >> > Ivan Pavlukhin
>> >> >
>> >>
>> >
>>
>>
>> --
>>
>> Best regards,
>> Ivan Pavlukhin
>>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 

Best regards,
Alexei Scherbakov


Re: [MTCGA]: new failures in builds [5479110] needs to be handled

2020-07-22 Thread Alexei Scherbakov
I'll take a look.

Wed, Jul 22, 2020 at 02:14, :

> Hi Igniters,
>
>  I've detected some new issue on TeamCity to be handled. You are more than
> welcomed to help.
>
>  If your changes can lead to this failure(s): We're grateful that you were
> a volunteer to make the contribution to this project, but things change and
> you may no longer be able to finalize your contribution.
>  Could you respond to this email and indicate if you wish to continue and
> fix test failures or step down and some committer may revert your commit.
>
>  *New test failure in master
> ContinuousQueryMarshallerTest.testRemoteFilterFactoryServer
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=4649996366513987762=%3Cdefault%3E=testDetails
>  Changes may lead to failure were done by
>  - alexey scherbakov 
> https://ci.ignite.apache.org/viewModification.html?modId=904804
>
>  - Here's a reminder of what contributors were agreed to do
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>  - Should you have any questions please contact
> dev@ignite.apache.org
>
> Best Regards,
> Apache Ignite TeamCity Bot
> https://github.com/apache/ignite-teamcity-bot
> Notification generated at 02:14:19 22-07-2020
>


-- 

Best regards,
Alexei Scherbakov


Re: Getting rid of NONE cache rebalance mode

2020-07-22 Thread Alexei Scherbakov
Ivan,
In my opinion, ASYNC rebalancing is the best approach for offloading a
third-party store, and it provides consistency.

+1 for deprecating NONE in the next release: ignore NONE and use ASYNC
instead.
For those who require the absence of rebalancing for some reason, it will
still be possible to use rebalanceDelay=infinity.

+1 for removal of rebalanceMode in 3.0.
Note that we still require SYNC logic internally for the system cache in some
places.



Tue, Jul 21, 2020 at 15:59, Ivan Pavlukhin :

> Alexey,
>
> Thank you for the explanation. I feel that I am missing a couple of bits to
> understand the picture fully. I am thinking about a case which I tend
> to call a Memcached use case. There is a cache over underlying storage
> with read-through and expiration and without any rebalancing at all.
> When new nodes join, they take ownership of some partitions from
> already running nodes and serve client requests. Entries for partitions
> that are no longer owned expire according to configuration.
>
> Actually, I have an idea. My guess is that "rebalancing" is a smarter
> and better approach than waiting for expiration. Am I right?
>
> 2020-07-21 15:31 GMT+03:00, Alexey Goncharuk :
> > Ivan,
> >
> > In my understanding this mode does not work at all even in the presence
> of
> > ForceKeysRequest which is now supposed to fetch values from peers in case
> > of a miss. In this mode we 1) move partitions to OWNING state
> > unconditionally, and 2) choose an arbitrary OWNING node for force keys
> > request. Therefore, after a user started two additional nodes in a
> cluster,
> > the request may be mapped to a node which does not hold any data. We will
> > do a read-through in this case, but it will result in significant load
> > increase on a third-party storage right after a node started, which means
> > that adding a node will increase, not decrease, the load on the database
> > being cached.
> > All these issues go away when (A)SYNC mode is used.
> >
> > Val,
> > The idea makes sense to me - a user can use rebalance future to wait for
> > rebalance to finish. This will simplify the configuration even further.
> >
> > Mon, Jul 20, 2020 at 21:27, Valentin Kulichenko <
> > valentin.kuliche...@gmail.com>:
> >
> >> +1 for deprecating/removing NONE mode.
> >>
> >> Alexey, what do you think about the SYNC mode? In my experience, it does
> >> not add much value as well. I would go as far as removing the
> >> rebalancingMode parameter altogether (probably in 3.0).
> >>
> >> -Val
> >>
> >> On Mon, Jul 20, 2020 at 11:09 AM Ivan Pavlukhin 
> >> wrote:
> >>
> >> > Alexey, Igniters,
> >> >
> >> > Could you please outline motivation answering following questions?
> >> > 1. Does this mode generally work correctly today?
> >> > 2. Can this mode be useful at all?
> >> >
> >> > I can imagine that it might be useful in a transparent caching use
> >> > case (if I did not misunderstand).
> >> >
> >> > 2020-07-20 20:39 GMT+03:00, Pavel Tupitsyn :
> >> > > +1
> >> > >
> >> > > More evidence:
> >> > >
> >> >
> >>
> https://stackoverflow.com/questions/62902640/apache-ignite-cacherebalancemode-is-not-respected-by-nodes
> >> > >
> >> > > On Mon, Jul 20, 2020 at 8:26 PM Alexey Goncharuk
> >> > > 
> >> > > wrote:
> >> > >
> >> > >> Igniters,
> >> > >>
> >> > >> I would like to run the idea of deprecating and probably ignoring
> >> > >> the
> >> > >> NONE
> >> > >> rebalance mode by the community. It's in the removal list for
> Ignite
> >> 3.0
> >> > >> [1], but it looks like it still confuses and creates issues for
> >> > >> users
> >> > >> [2].
> >> > >>
> >> > >> What about deprecating it in one of the next releases and even
> >> ignoring
> >> > >> this constant in further releases, interpreting it as ASYNC, before
> >> > >> Ignite
> >> > >> 3.0? I find it hard to believe that any Ignite user actually has
> >> > >> RebalanceMode.NONE set in their configuration due to its absolutely
> >> > >> unpredictable behavior.
> >> > >>
> >> > >> Thanks for your thoughts,
> >> > >> --AG
> >> > >>
> >> > >> [1]
> >> > >>
> >> > >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0+Wishlist
> >> > >> [2]
> >> > >>
> >> > >>
> >> >
> >>
> http://apache-ignite-developers.2346864.n4.nabble.com/About-Rebalance-Mode-SYNC-amp-NONE-td47279.html
> >> > >>
> >> > >
> >> >
> >> >
> >> > --
> >> >
> >> > Best regards,
> >> > Ivan Pavlukhin
> >> >
> >>
> >
>
>
> --
>
> Best regards,
> Ivan Pavlukhin
>


-- 

Best regards,
Alexei Scherbakov
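
The deprecation path proposed in this thread (ignore NONE and treat it as
ASYNC, leave the other modes alone) can be sketched in plain Java. The enum
and method below are hypothetical stand-ins and are not Ignite's
CacheRebalanceMode or any existing Ignite API.

```java
import java.util.Objects;

/** Hypothetical compatibility shim illustrating "interpret NONE as ASYNC". */
final class RebalanceModeCompat {
    /** Mirrors the three mode names discussed in the thread. */
    enum Mode { SYNC, ASYNC, NONE }

    private RebalanceModeCompat() {
        // Static helper only.
    }

    /**
     * Returns the mode that would actually be applied if NONE were
     * deprecated and ignored, as proposed in the thread.
     */
    static Mode effectiveMode(Mode configured) {
        Objects.requireNonNull(configured, "configured");

        // NONE is no longer honoured; it falls back to ASYNC.
        return configured == Mode.NONE ? Mode.ASYNC : configured;
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values())
            System.out.println(m + " -> " + effectiveMode(m));
    }
}
```

Keeping the constant but remapping it lets existing configurations load
unchanged, which is the point of deprecating before removing.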


Re: [MTCGA]: new failures in builds [5479103] needs to be handled

2020-07-21 Thread Alexei Scherbakov
I'll take a look.

Wed, Jul 22, 2020 at 06:29, :

> Hi Igniters,
>
>  I've detected some new issue on TeamCity to be handled. You are more than
> welcomed to help.
>
>  If your changes can lead to this failure(s): We're grateful that you were
> a volunteer to make the contribution to this project, but things change and
> you may no longer be able to finalize your contribution.
>  Could you respond to this email and indicate if you wish to continue and
> fix test failures or step down and some committer may revert your commit.
>
>  *New test failure in master
> CacheSerializableTransactionsTest.testConflictResolution
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=6728566098547435193=%3Cdefault%3E=testDetails
>  Changes may lead to failure were done by
>  - alexey scherbakov 
> https://ci.ignite.apache.org/viewModification.html?modId=904804
>
>  - Here's a reminder of what contributors were agreed to do
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>  - Should you have any questions please contact
> dev@ignite.apache.org
>
> Best Regards,
> Apache Ignite TeamCity Bot
> https://github.com/apache/ignite-teamcity-bot
> Notification generated at 06:29:19 22-07-2020
>


-- 

Best regards,
Alexei Scherbakov


Re: Choosing historical rebalance heuristics

2020-07-17 Thread Alexei Scherbakov
> > >>
> > >> Though, there are some other corner cases, e.g. this one:
> > >> - Configured size of WAL archive is big (>100 GB)
> > >> - Cache has small partitions (e.g. 1000 entries)
> > >> - Infrequent updates (e.g. ~100 in the whole WAL history of any node)
> > >> - There is another cache with very frequent updates which allocate
> >99%
> > of
> > >> WAL
> > >> In such scenario we may need to iterate over >100 GB of WAL in order
> to
> > >> fetch <1% of needed updates. Even though the amount of network traffic
> > is
> > >> still optimized, it would be more effective to transfer partitions
> with
> > >> ~1000 entries fully instead of reading >100 GB of WAL.
> > >>
> > >> I want to highlight that your heuristic definitely makes the situation
> > >> better, but due to possible corner cases we should keep the fallback
> > lever
> > >> to restrict or limit historical rebalance as before. Probably, it
> would
> > be
> > >> handy to keep IGNITE_PDS_WAL_REBALANCE_THRESHOLD property with a low
> > >> default value (1000, 500 or even 0) and apply your heuristic only for
> > >> partitions with bigger size.
> > >>
> > >> Regarding case [2]: it looks like an improvement that can mitigate
> some
> > >> corner cases (including the one that I have described). I'm ok with it
> > as
> > >> long as it takes data updates reordering on backup nodes into account.
> > We
> > >> don't track skipped updates for atomic caches. As a result, detection
> of
> > >> the absence of updates between two checkpoint markers with the same
> > >> partition counter can be false positive.
> > >>
> > >> --
> > >> Best Regards,
> > >> Ivan Rakov
> > >>
> > >> On Tue, Jul 14, 2020 at 3:03 PM Vladislav Pyatkov <
> vldpyat...@gmail.com
> > >
> > >> wrote:
> > >>
> > >> > Hi guys,
> > >> >
> > >> > I want to implement a fairer heuristic for historical rebalance.
> > >> > Currently, the cluster chooses between historical and full rebalance
> > >> > based only on partition size. This threshold is better known by the
> > >> > name of the property IGNITE_PDS_WAL_REBALANCE_THRESHOLD.
> > >> > It can prevent historical rebalance when a partition is too small,
> > >> > but historical rebalance can still be chosen when the WAL contains
> > >> > more updates than the partition has entries.
> > >> > There is a ticket to implement a fairer heuristic [1].
> > >> >
> > >> > My idea is to estimate the amount of data that will be transferred
> > >> > over the network. In other words, if recovering a partition on
> > >> > another node requires replaying a part of the WAL that contains N
> > >> > updates, and the partition contains M rows in total, historical
> > >> > rebalance should be chosen when N < M (the WAL history must be
> > >> > present as well).
> > >> >
> > >> > This approach is easy to implement, because the coordinator node has
> > >> > the partition sizes and the counter intervals. But in this case the
> > >> > cluster may still have to look for a few updates in a very long WAL
> > >> > history. I think this can be worked around if the historical
> > >> > rebalance iterator skips checkpoints that contain no updates for
> > >> > the particular cache. A checkpoint can be skipped if the counters for
> > >> > the cache (maybe even for specific partitions) did not change
> > >> > between it and the next one.
> > >> >
> > >> > Ticket for improving the historical rebalance iterator: [2]
> > >> >
> > >> > I would like to hear the community's view on the thoughts above.
> > >> > Maybe someone has another opinion?
> > >> >
> > >> > [1]: https://issues.apache.org/jira/browse/IGNITE-13253
> > >> > [2]: https://issues.apache.org/jira/browse/IGNITE-13254
> > >> >
> > >> > --
> > >> > Vladislav Pyatkov
> > >> >
> > >>
> > >
> > >
> > > --
> > > Vladislav Pyatkov
> > >
> >
> >
> > --
> > Vladislav Pyatkov
> >
>


-- 

Best regards,
Alexei Scherbakov
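
The heuristic discussed in this thread can be sketched as a small decision
function: prefer historical rebalance when the WAL holds fewer updates (N)
than the partition has rows (M), and keep a low size threshold as the
fallback lever Ivan Rakov asks for. The class, method and parameter names
are hypothetical; this is a sketch of the rule as stated in the emails, not
the code from IGNITE-13253.

```java
/** Hypothetical decision helper for choosing historical vs. full rebalance. */
final class HistoricalRebalanceHeuristic {
    private HistoricalRebalanceHeuristic() {
        // Static helper only.
    }

    /**
     * @param walUpdates N: number of updates to replay from the WAL.
     * @param partitionRows M: number of rows in the partition.
     * @param minPartitionRows Threshold below which the partition is always
     *      transferred fully (plays the role of the existing threshold property).
     * @param walHistoryAvailable Whether the needed WAL history is present.
     * @return True if historical rebalance should be preferred.
     */
    static boolean preferHistorical(
        long walUpdates,
        long partitionRows,
        long minPartitionRows,
        boolean walHistoryAvailable
    ) {
        if (!walHistoryAvailable)
            return false;

        // Small partitions are cheaper to send in full than to find in a long WAL.
        if (partitionRows < minPartitionRows)
            return false;

        // Core rule from the proposal: N < M.
        return walUpdates < partitionRows;
    }

    public static void main(String[] args) {
        System.out.println(preferHistorical(100, 1_000, 500, true));
    }
}
```

The rule compares update and row counts only, so it does not account for
the cost of scanning a large WAL to find those updates, which is the corner
case raised in the reply.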


Re: Extended logging for rebalance performance analysis

2020-06-23 Thread Alexei Scherbakov
Hi, Kirill.

Looks good to me.

Tue, Jun 23, 2020 at 18:05, ткаленко кирилл :

> Hello, Alexey!
>
> I suggest that we decide what we can do within ticket [1].
>
> Add "rebalanceId" and "topVer" related to rebalance to all messages.
>
> Add statistical information to a log message:
> [2020-05-06 20:56:37,044][INFO ][...] Completed rebalancing
> [rebalanceId=1, grp=cache1,
> supplier=94a3fcbc-18d5-4c64-b0ab-4313aba1, partitions=5, entries=100,
> duration=12ms,
> bytesRcvd=5M, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> progress=1/2]
>
> Add a message to log after rebalancing for all cache groups is complete:
> [2020-05-06 20:56:36,999][INFO ][...] Completed rebalance chain:
> [rebalanceId=2, partitions=10, entries=200, duration=50ms, bytesRcvd=10M]
>
> Any comments or suggestions?
>
> [1] - https://issues.apache.org/jira/browse/IGNITE-12080
>
> 20.05.2020, 23:08, "ткаленко кирилл" :
> > Hello, Alexey! Unfortunately, my response was delayed.
> >
> > Point 2: You can do as you suggested, I think it is still worth
> specifying how many partitions were obtained.
> >
> > [2020-05-06 20:56:37,044][INFO ][...] Completed rebalancing [grp=cache1,
> > supplier=94a3fcbc-18d5-4c64-b0ab-4313aba1, partitions=5,
> entries=100, duration=12ms,
> > bytesRcvd=5M, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> > progress=1/2]
> >
> > Point 3: Is it "rebalanceId"?
> >
> > Point 5: I think we can output a summary for each supplier, so as not to
> keep it in mind.
> >
> > [2020-05-06 20:56:36,999][INFO ][...] Completed rebalance chain:
> > [rebalanceId=2, [supplier=94a3fcbc-18d5-4c64-b0ab-4313aba1,
> partitions=5, entries=100, duration=12ms, bytesRcvd=5M],
> > [supplier=94a3fcbc-18d5-4c64-b0ab-4313aba2, partitions=5,
> entries=100, duration=12ms, bytesRcvd=5M]]
> >
> > I can add "rebalanceId" to each message that you gave at above.
> >
> > A detailed message will help us understand how correctly the suppliers
> were selected.
> >
> > 06.05.2020, 22:08, "Alexei Scherbakov" :
> >>  Hello.
> >>
> >>  Let's look at existing rebalancing log for a single group:
> >>
> >>  [2020-05-06 20:56:36,999][INFO ][...] Rebalancing scheduled
> >>  [order=[ignite-sys-cache, cache1, cache2, default],
> >>  top=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> >>  evt=DISCOVERY_CUSTOM_EVT, node=9d9edb7b-eb01-47a1-8ff9-fef715d2]
> >>  ...
> >>  [2020-05-06 20:56:37,034][INFO ][...] Prepared rebalancing [grp=cache1,
> >>  mode=ASYNC, supplier=94a3fcbc-18d5-4c64-b0ab-4313aba1,
> >>  partitionsCount=11, topVer=AffinityTopologyVersion [topVer=3,
> >>  minorTopVer=1]]
> >>  [2020-05-06 20:56:37,036][INFO ][...] Prepared rebalancing [grp=cache1,
> >>  mode=ASYNC, supplier=b3f3aeeb-5fa0-42f7-a74e-cf39fa50,
> >>  partitionsCount=10, topVer=AffinityTopologyVersion [topVer=3,
> >>  minorTopVer=1]]
> >>  [2020-05-06 20:56:37,036][INFO ][...] Starting rebalance routine
> [cache1,
> >>  topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> >>  supplier=94a3fcbc-18d5-4c64-b0ab-4313aba1, fullPartitions=[1, 5,
> 7, 9,
> >>  11, 13, 15, 23, 27, 29, 31], histPartitions=[]]
> >>  [2020-05-06 20:56:37,037][INFO ][...] Starting rebalance routine
> [cache1,
> >>  topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> >>  supplier=b3f3aeeb-5fa0-42f7-a74e-cf39fa50, fullPartitions=[6, 8,
> 10,
> >>  16, 18, 20, 22, 24, 26, 28], histPartitions=[]]
> >>  [2020-05-06 20:56:37,044][INFO ][...] Completed rebalancing
> [grp=cache1,
> >>  supplier=94a3fcbc-18d5-4c64-b0ab-4313aba1,
> >>  topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], progress=1/2]
> >>  [2020-05-06 20:56:37,046][INFO ][...] Completed (final) rebalancing
> >>  [grp=cache1, supplier=b3f3aeeb-5fa0-42f7-a74e-cf39fa50,
> >>  topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], progress=2/2]
> >>  [2020-05-06 20:56:37,048][INFO ][...] Completed rebalance future:
> >>  RebalanceFuture [grp=CacheGroupContext [grp=cache1],
> >>  topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> rebalanceId=2,
> >>  routines=2]
> >>
> >>  From these logs I'm already can get answers to 1 and 4.
> >>  The logs look concise and easy to read and understand, and should
> >>  remain what way.
> >>
> >>  But I think some proposed improvements can be done here without harm.
> >

Re: [MTCGA]: new failures in builds [5395772] needs to be handled

2020-06-22 Thread Alexei Scherbakov
It's not clear why the bot reports my change: the first failure was 3 days
after the commit [1].

[1]
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-2134815735276887535=%3Cdefault%3E=testDetails

Mon, 22 Jun 2020 at 05:20, :

> Hi Igniters,
>
>  I've detected some new issue on TeamCity to be handled. You are more than
> welcomed to help.
>
>  If your changes can lead to this failure(s): We're grateful that you were
> a volunteer to make the contribution to this project, but things change and
> you may no longer be able to finalize your contribution.
>  Could you respond to this email and indicate if you wish to continue and
> fix test failures or step down and some committer may revert you commit.
>
>  *Test with high flaky rate in master
> BinaryConfigurationTest.TestXmlConfiguration
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=4488201393739220401=%3Cdefault%3E=testDetails
>
>  *Test with high flaky rate in master
> BinaryConfigurationTest.TestCodeConfiguration
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-2134815735276887535=%3Cdefault%3E=testDetails
>  Changes may lead to failure were done by
>  - alexey scherbakov 
> https://ci.ignite.apache.org/viewModification.html?modId=903321
>
>  - Here's a reminder of what contributors were agreed to do
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>  - Should you have any questions please contact
> dev@ignite.apache.org
>
> Best Regards,
> Apache Ignite TeamCity Bot
> https://github.com/apache/ignite-teamcity-bot
> Notification generated at 05:20:15 22-06-2020
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSS] Add flag methods to ClusterState enum

2020-06-10 Thread Alexei Scherbakov
But it looks like we do not need the methods *readOnly* and *inactive*.
What is the point of adding them?


Wed, 10 Jun 2020 at 21:05, Alexei Scherbakov :

> Sergey Antonov,
>
> The proposal looks good to me.
> Use of org.apache.ignite.cluster.ClusterState#active adds a
> boilerplate code (a lot of static imports) and does an unnecessary state
> check.
>
>
>
>
> Wed, 10 Jun 2020 at 19:02, Pavel Tupitsyn :
>
>> Sergey,
>>
>> I disagree - looks weird.
>> We have lots of enums, is this one special in some way?
>>
>> Thanks,
>> Pavel
>>
>> On Wed, Jun 10, 2020 at 6:58 PM Sergey Antonov
>> wrote:
>>
>> > Igniters, I'd like to propose a small improvement in ClusterState
>> class. I
>> > want to remove the static method boolean ClusterState#active and add
>> > methods to the enum:
>> >
>> >- boolean active()
>> >- boolean readOnly()
>> >- boolean inactive()
>> >
>> > From my point of view these methods more useful than comparing with
>> > specific enum's value.
>> >
>> > I'm going to do that on the ticket [1].
>> >
>> > Any objections?
>> >
>> > [1] https://issues.apache.org/jira/browse/IGNITE-13144
>> > --
>> > BR, Sergey Antonov
>> >
>>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSS] Add flag methods to ClusterState enum

2020-06-10 Thread Alexei Scherbakov
Sergey Antonov,

The proposal looks good to me.
Use of org.apache.ignite.cluster.ClusterState#active adds
boilerplate code (a lot of static imports) and does an unnecessary state
check.
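
To illustrate the difference, here is a minimal sketch of an instance flag method on the enum. ClusterStateSketch is a hypothetical stand-in, not the actual org.apache.ignite.cluster.ClusterState source, and its constant names are assumptions made for this example.

```java
/** Hypothetical stand-in for the ClusterState enum; not Ignite's actual source. */
enum ClusterStateSketch {
    INACTIVE,
    ACTIVE,
    ACTIVE_READ_ONLY;

    /** Instance flag method: true for every state except INACTIVE. */
    boolean active() {
        return this != INACTIVE;
    }
}

/** Small demo that prints the flag for each state. */
class ClusterStateSketchDemo {
    public static void main(String[] args) {
        for (ClusterStateSketch state : ClusterStateSketch.values())
            System.out.println(state + " active=" + state.active());
    }
}
```

With an instance method, call sites read as state.active() and need no static import of a helper.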




Wed, 10 Jun 2020 at 19:02, Pavel Tupitsyn :

> Sergey,
>
> I disagree - looks weird.
> We have lots of enums, is this one special in some way?
>
> Thanks,
> Pavel
>
> On Wed, Jun 10, 2020 at 6:58 PM Sergey Antonov 
> wrote:
>
> > Igniters, I'd like to propose a small improvement in ClusterState class.
> I
> > want to remove the static method boolean ClusterState#active and add
> > methods to the enum:
> >
> >- boolean active()
> >- boolean readOnly()
> >- boolean inactive()
> >
> > From my point of view these methods more useful than comparing with
> > specific enum's value.
> >
> > I'm going to do that on the ticket [1].
> >
> > Any objections?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-13144
> > --
> > BR, Sergey Antonov
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: Various shutdown guaranties

2020-06-09 Thread Alexei Scherbakov
+1, this is exactly what I want.

I'm fine with either IMMEDIATE or DEFAULT.
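
As a rough sketch of the rule agreed on in this thread (a dynamic value, once set, wins; otherwise the statically configured value is returned), something like the following could work. ShutdownPolicySketch and ShutdownPolicyResolver are hypothetical names for illustration only and are not part of the Ignite API.

```java
import java.util.Objects;

/** Hypothetical policy enum using names discussed in the thread. */
enum ShutdownPolicySketch { IMMEDIATE, GRACEFUL }

/** Hypothetical helper that resolves the effective policy; not an Ignite class. */
final class ShutdownPolicyResolver {
    /** Value taken from static node configuration. */
    private final ShutdownPolicySketch staticPolicy;

    /** Value set at runtime (assumed to live in the distributed metastorage); null if never set. */
    private ShutdownPolicySketch dynamicPolicy;

    ShutdownPolicyResolver(ShutdownPolicySketch staticPolicy) {
        this.staticPolicy = Objects.requireNonNull(staticPolicy, "staticPolicy");
    }

    /** Sets the dynamic value; null is rejected, so an override cannot be "cleared". */
    void setDynamic(ShutdownPolicySketch policy) {
        dynamicPolicy = Objects.requireNonNull(policy, "policy");
    }

    /** Dynamic value if present, otherwise the static one. */
    ShutdownPolicySketch effective() {
        return dynamicPolicy != null ? dynamicPolicy : staticPolicy;
    }

    /** True when a dynamic value exists and differs from the static one (a case worth a warning). */
    boolean conflictsWithStatic() {
        return dynamicPolicy != null && dynamicPolicy != staticPolicy;
    }
}
```

A node could call conflictsWithStatic() on start and log a warning, which matches the suggestion to warn when the static option does not match the dynamic one.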

Tue, 9 Jun 2020 at 19:41, Ivan Rakov :

> Vlad,
>
> +1, that's what I mean.
> We don't need either  or dedicated USE_STATIC_CONFIGURATION in case
> the user will be able to retrieve current shutdown policy and apply the one
> he needs.
> My only requirement is that ignite.cluster().getShutdownPolicy() should
> return a statically configured value {@link
> IgniteConfiguration#shutdownPolicy} in case no override has been specified.
> So, static configuration will be applied only on cluster start, like it
> currently works for SQL schemas.
>
> On Tue, Jun 9, 2020 at 7:09 PM V.Pyatkov  wrote:
>
> > Hi,
> >
> > ignite.cluster().setShutdownPolicy(null); // Clear dynamic value and
> switch
> > to statically configured.
> >
> > I do not understand why we need it. if user want to change configuration
> to
> > any other value he set it explicitly.
> > We can to add warning on start when static option does not math to
> dynamic
> > (dynamic always prefer if it initiated).
> >
> > shutdownPolicy=IMMEDIATE|GRACEFUL
> >
> > Looks better that DEFAULT and WAIT_FOR_BACKUP.
> >
> > I general I consider job cancellation need to added in these policies'
> > enumeration.
> > But we can do it in the future.
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: Various shutdown guaranties

2020-06-09 Thread Alexei Scherbakov
be passed as dynamic property value via
> > ignite.cluster().setShutdownPolicy");
> > ...
> >   }
> > ...
> > }
> >
>
> What do you think?
>
> On Tue, Jun 9, 2020 at 3:09 PM Ivan Rakov  wrote:
>
> > Alex,
> >
> > Also shutdown policy must be always consistent on the grid or
> unintentional
> >> data loss is possible if two nodes are stopping simultaneously with
> >> different policies.
> >
> >  Totally agree.
> >
> > Let's use shutdownPolicy=DEFAULT|GRACEFUL, as was proposed by me earlier.
> >
> >  I'm ok with GRACEFUL instead of WAIT_FOR_BACKUPS.
> >
> > 5. Let's keep a static property for simplifying setting of initial
> >> behavior.
> >> In most cases the policy will never be changed during grid's lifetime.
> >> No need for an explicit call to API on grid start.
> >> A joining node should check a local configuration value to match the
> grid.
> >> If a dynamic value is already present in a metastore, it should override
> >> static value with a warning.
> >
> > To sum it up:
> > - ShutdownPolicy can be set with static configuration
> > (IgniteConfiguration#setShutdownPolicy), on join we validate that
> > statically configured policies on different server nodes are the same
> > - It's possible to override statically configured value by adding
> > distributed metastorage value, which can be done by
> > calling ignite.cluster().setShutdownPolicy(plc) or control.sh method
> > - Dynamic property is persisted
> >
> > Generally, I don't mind if we have both dynamic and static configuration
> > properties. Necessity to call ignite.cluster().setShutdownPolicy(plc); on
> > every new cluster creation is a usability issue itself.
> > What bothers me here are the possible conflicts between static and
> dynamic
> > configuration. User may be surprised if he has shutdown policy X in
> > IgniteConfiguration, but the cluster behaves according to policy Y
> (because
> > several months ago another admin had called
> > IgniteCluster#setShutdownPolicy).
> > We can handle it by adding a separate enum field to the shutdown policy:
> >
> >> public enum ShutdownPolicy {
> >>   /* Default value of dynamic shutdown policy property. If it's set, the
> >> shutdown policy is resolved according to value of static {@link
> >> IgniteConfiguration#shutdownPolicy} configuration parameter. */
> >>   USE_STATIC_CONFIGURATION,
> >>
> >>   /* Node leaves the cluster even if it's the last owner of some
> >> partitions. Only partitions of caches with backups > 0 are taken into
> >> account. */
> >>   IMMEDIATE,
> >>
> >>   /* Shutdown is blocked until node is safe to leave without the data
> >> loss. */
> >>   GRACEFUL
> >> }
> >>
> > This way:
> > 1) User may easily understand whether the static parameter is overridden
> > by dynamic. If ignite.cluster().getShutdownPolicy() return anything
> except
> > USE_STATIC_CONFIGURATION, behavior is overridden.
> > 2) User may clear previous overriding by calling
> > ignite.cluster().setShutdownPolicy(USE_STATIC_CONFIGURATION). After that,
> > behavior will be resolved based in IgniteConfiguration#shutdownPolicy
> again.
> > If we agree on this mechanism, I propose to use IMMEDIATE name instead of
> > DEFAULT for non-safe policy in order to don't confuse user.
> > Meanwhile, static configuration will accept the same enum, but
> > USE_STATIC_CONFIGURATION will be restricted:
> >
> >> public class IgniteConfiguration {
> >>   public static final ShutdownPolicy DFLT_STATIC_SHUTDOWN_POLICY =
> >> IMMEDIATE;
> >>   private ShutdownPolicy shutdownPolicy = DFLT_STATIC_SHUTDOWN_POLICY;
> >>   ...
> >>   public void setShutdownPolicy(ShutdownPolicy shutdownPlc) {
> >> if (shutdownPlc ==  USE_STATIC_CONFIGURATION)
> >>   throw new IllegalArgumentException("USE_STATIC_CONFIGURATION can
> >> only be passed as dynamic property value via
> >> ignite.cluster().setShutdownPolicy");
> >> ...
> >>   }
> >> ...
> >> }
> >>
> >
> > What do you think?
> >
> > On Tue, Jun 9, 2020 at 11:46 AM Alexei Scherbakov <
> > alexey.scherbak...@gmail.com> wrote:
> >
> >> Ivan Rakov,
> >>
> >> Your proposal overall looks good to me. My comments:
> >>
> >> 1. I would avoid adding such a method, because it will be impossible to
> >> change it

Re: Various shutdown guaranties

2020-06-09 Thread Alexei Scherbakov
 "defaultShutdownPolicy" as a dynamic cluster configuration,
> two values are available so far: DEFAULT and WAIT_FOR_BACKUPS
> 3. This property is stored in the distributed metastorage (thus persisted),
> can be changed via Java API and ./control.sh
> 4. Behavior configured with this property will be applied only on common
> ways of stopping the node - Ignite.close() and kill .
> 5. *Don't* add new options to the static IgniteConfiguration to avoid
> conflicts between dynamic and static configuration
>
> --
> Best Regards,
> Ivan Rakov
>
> On Mon, Jun 8, 2020 at 6:44 PM V.Pyatkov  wrote:
>
> > Hi
> >
> > We need to have ability to calling shutdown with various guaranties.
> > For example:
> > Need to reboot a node, but after that node should be available for
> > historical rebalance (all partitions in MOVING state should have gone to
> > OWNING).
> >
> > Implemented a circled reboot of cluster, but all data should be available
> > on
> > that time (at least one copy of partition should be available in
> cluster).
> >
> > Need to wait not only data available, but all jobs (before this behavior
> > available through a stop(false) method invocation).
> >
> > All these reason required various behavior before shutting down node.
> > I propose slightly modify public API and add here method which shown on
> > shutdown behavior directly:
> > Ignite.close(Shutdown)
> >
> > /public enum Shutdownn {
> > /**
> >  * Stop immediately as soon components are ready.
> >  */
> > IMMEDIATE,
> > /**
> >  * Stop node when all partitions completed moving from/to this node
> to
> > another.
> >  */
> > NORMAL,
> > /**
> >  * Node will stop if and only if it does not store any unique
> > partitions, that does not have copies on cluster.
> >  */
> > GRACEFUL,
> > /**
> >  * Node stops graceful and wait all jobs before shutdown.
> >  */
> > ALL
> > }/
> >
> > Method close without parameter Ignite.close() will get shutdown behavior
> > configured for cluster wide. It will be implemented through distributed
> > meta
> > storage and additional utilities for configuration.
> > Also, will be added a method to configure shutdown on start, this is look
> > as
> > IgniteConfiguration.setShutdown(Shutdown).
> > If shutting down did not configure all be worked as before according to
> > IMMEDIATE behavior.
> > All other close method will be marked as deprecated.
> >
> > I will be waiting for your opinions.
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: Various shutdown guaranties

2020-06-08 Thread Alexei Scherbakov
Graceful policy should only be applicable to caches having a number of
backups > 0.
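
A minimal sketch of that rule, assuming a simplified model in which each cache exposes its backup count and the set of owning node IDs per partition. GracefulLeaveCheck and CacheView are hypothetical and do not correspond to Ignite classes.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these types exist in Ignite. */
final class GracefulLeaveCheck {
    /** Assumed view of one cache: configured backups and current owners per partition. */
    static final class CacheView {
        final int backups;
        final Map<Integer, Set<String>> ownersByPartition;

        CacheView(int backups, Map<Integer, Set<String>> ownersByPartition) {
            this.backups = backups;
            this.ownersByPartition = ownersByPartition;
        }
    }

    /** Returns false if the node is the only owner of some partition in a cache with backups > 0. */
    static boolean safeToLeave(String nodeId, List<CacheView> caches) {
        for (CacheView cache : caches) {
            if (cache.backups == 0)
                continue; // Caches without backups are ignored by the graceful policy.

            for (Set<String> owners : cache.ownersByPartition.values()) {
                if (owners.size() == 1 && owners.contains(nodeId))
                    return false; // This node holds the last copy of the partition.
            }
        }

        return true;
    }
}
```

Skipping caches with zero backups keeps a node from being blocked forever by data that has no second copy by design.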

Mon, 8 Jun 2020 at 14:54, Alexei Scherbakov :

> V.Pyatkov
>
>
> While I agree we need a way to prevent unintentional data loss on
> shutdown, I do not like the proposed shutdown flags enum.
> I see no relation between possible data loss on shutdown and waiting for
> some jobs to complete.
>
> All we need is a new method (duplicated by system property), like
>
> IgniteConfiguration.setShutdownPolicy(GRACEFUL|DEFAULT);
> and an optional
> IgniteConfiguration.setGracefulShutdownTimeout(long);  // Force a shutdown
> if the timeout is expired.
>
> For enabled graceful policy a node shouldn't normally stop if it is the
> last owner for any partition.
> This will prevent unintentional data loss on stop when it is possible, for
> example if a grid is deployed over kubernetes.
>
> The properties also should be changeable at runtime using JMX or
> control.sh interface.
>
>
>
>
> Mon, 8 Jun 2020 at 13:46, V.Pyatkov :
>
>> Hi
>>
>> We need to have ability to calling shutdown with various guaranties.
>> For example:
>> Need to reboot a node, but after that node should be available for
>> historical rebalance (all partitions in MOVING state should have gone to
>> OWNING).
>>
>> Implemented a circled reboot of cluster, but all data should be available
>> on
>> that time (at least one copy of partition should be available in cluster).
>>
>> Need to wait not only data available, but all jobs (before this behavior
>> available through a stop(false) method invocation).
>>
>> All these reason required various behavior before shutting down node.
>> I propose slightly modify public API and add here method which shown on
>> shutdown behavior directly:
>> Ignite.close(Shutdown)
>>
>> /public enum Shutdownn {
>> /**
>>  * Stop immediately as soon components are ready.
>>  */
>> IMMEDIATE,
>> /**
>>  * Stop node when all partitions completed moving from/to this node to
>> another.
>>  */
>> NORMAL,
>> /**
>>  * Node will stop if and only if it does not store any unique
>> partitions, that does not have copies on cluster.
>>  */
>> GRACEFUL,
>> /**
>>  * Node stops graceful and wait all jobs before shutdown.
>>  */
>> ALL
>> }/
>>
>> Method close without parameter Ignite.close() will get shutdown behavior
>> configured for cluster wide. It will be implemented through distributed
>> meta
>> storage and additional utilities for configuration.
>> Also, will be added a method to configure shutdown on start, this is look
>> as
>> IgniteConfiguration.setShutdown(Shutdown).
>> If shutting down did not configure all be worked as before according to
>> IMMEDIATE behavior.
>> All other close method will be marked as deprecated.
>>
>> I will be waiting for your opinions.
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 

Best regards,
Alexei Scherbakov


Re: Various shutdown guaranties

2020-06-08 Thread Alexei Scherbakov
V.Pyatkov


While I agree we need a way to prevent unintentional data loss on shutdown,
I do not like the proposed shutdown flags enum.
I see no relation between possible data loss on shutdown and waiting for
some jobs to complete.

All we need is a new method (duplicated by system property), like

IgniteConfiguration.setShutdownPolicy(GRACEFUL|DEFAULT);
and an optional
IgniteConfiguration.setGracefulShutdownTimeout(long);  // Force a shutdown
if the timeout is expired.

With the graceful policy enabled, a node shouldn't normally stop if it is the
last owner of any partition.
This will prevent unintentional data loss on stop where possible, for
example if a grid is deployed on Kubernetes.

The properties should also be changeable at runtime via JMX or the control.sh
interface.
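
The interplay of the graceful policy and the optional timeout could look roughly like the sketch below: wait until the node is safe to leave, and force the stop once the timeout expires. GracefulStopSketch, its Outcome enum, and the polling approach are assumptions made for illustration, not a description of how Ignite implements this.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of "graceful stop with a forced fallback"; not Ignite code. */
final class GracefulStopSketch {
    enum Outcome { GRACEFUL, FORCED }

    /**
     * Polls the safety condition until it holds or the timeout expires.
     *
     * @param safeToLeave assumed check that no partition would lose its last owner
     * @param timeoutMs   graceful shutdown timeout in milliseconds
     * @param pollMs      delay between checks in milliseconds
     */
    static Outcome awaitSafeStop(BooleanSupplier safeToLeave, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;

        while (!safeToLeave.getAsBoolean()) {
            // Overflow-safe deadline comparison.
            if (System.nanoTime() - deadline >= 0)
                return Outcome.FORCED;

            try {
                Thread.sleep(pollMs);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // Preserve the interrupt flag.

                return Outcome.FORCED;
            }
        }

        return Outcome.GRACEFUL;
    }
}
```

For example, awaitSafeStop(() -> true, 1000, 10) returns immediately with GRACEFUL, while a condition that never holds ends in FORCED after the timeout.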




Mon, 8 Jun 2020 at 13:46, V.Pyatkov :

> Hi
>
> We need to have ability to calling shutdown with various guaranties.
> For example:
> Need to reboot a node, but after that node should be available for
> historical rebalance (all partitions in MOVING state should have gone to
> OWNING).
>
> Implemented a circled reboot of cluster, but all data should be available
> on
> that time (at least one copy of partition should be available in cluster).
>
> Need to wait not only data available, but all jobs (before this behavior
> available through a stop(false) method invocation).
>
> All these reason required various behavior before shutting down node.
> I propose slightly modify public API and add here method which shown on
> shutdown behavior directly:
> Ignite.close(Shutdown)
>
> /public enum Shutdownn {
> /**
>  * Stop immediately as soon components are ready.
>  */
> IMMEDIATE,
> /**
>  * Stop node when all partitions completed moving from/to this node to
> another.
>  */
> NORMAL,
> /**
>  * Node will stop if and only if it does not store any unique
> partitions, that does not have copies on cluster.
>  */
> GRACEFUL,
> /**
>  * Node stops graceful and wait all jobs before shutdown.
>  */
> ALL
> }/
>
> Method close without parameter Ignite.close() will get shutdown behavior
> configured for cluster wide. It will be implemented through distributed
> meta
> storage and additional utilities for configuration.
> Also, will be added a method to configure shutdown on start, this is look
> as
> IgniteConfiguration.setShutdown(Shutdown).
> If shutting down did not configure all be worked as before according to
> IMMEDIATE behavior.
> All other close method will be marked as deprecated.
>
> I will be waiting for your opinions.
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>


-- 

Best regards,
Alexei Scherbakov


Re: Extended logging for rebalance performance analysis

2020-05-26 Thread Alexei Scherbakov
Hi, Kirill.

2. Ok for me.

3. We already have a log message about rebalance cancellation for a specific
rebalanceId, and a log message saying that a new rebalancing has started with
the next rebalanceId.

5. We already have summary information per group for each supplier, added in
point 2.
I would avoid duplication here or put it under debug logging.

Information about selected suppliers can already be obtained from logs:
[2020-05-06 20:56:37,034][INFO ][...] Prepared rebalancing [grp=cache1,
mode=ASYNC, supplier=94a3fcbc-18d5-4c64-b0ab-4313aba1,
partitionsCount=11, topVer=AffinityTopologyVersion [topVer=3,
minorTopVer=1]]
[2020-05-06 20:56:37,036][INFO ][...] Prepared rebalancing [grp=cache1,
mode=ASYNC, supplier=b3f3aeeb-5fa0-42f7-a74e-cf39fa50,
partitionsCount=10, topVer=AffinityTopologyVersion [topVer=3,
minorTopVer=1]]

so I still do not see any value in a "detailed" message.

I think this is enough to understand the rebalancing flow by grepping the logs
by topVer and rebalanceId.
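
As a small illustration of such grepping, the sketch below filters log lines by rebalanceId and by topology version. It assumes each log record is a single line (the wrapping seen in this thread comes from the mail client) and that the fields are formatted as in the examples quoted here. RebalanceLogGrep is a hypothetical helper, not an Ignite tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log filter; field formats are taken from the examples in this thread. */
final class RebalanceLogGrep {
    private static final Pattern REBALANCE_ID = Pattern.compile("\\brebalanceId=(\\d+)");

    private static final Pattern TOP_VER =
        Pattern.compile("AffinityTopologyVersion \\[topVer=(\\d+), minorTopVer=(\\d+)\\]");

    /** Returns the lines that mention the given rebalanceId. */
    static List<String> byRebalanceId(List<String> lines, long id) {
        List<String> res = new ArrayList<>();

        for (String line : lines) {
            Matcher m = REBALANCE_ID.matcher(line);

            if (m.find() && Long.parseLong(m.group(1)) == id)
                res.add(line);
        }

        return res;
    }

    /** Returns the lines that mention the given major and minor topology version. */
    static List<String> byTopVer(List<String> lines, long topVer, int minorTopVer) {
        List<String> res = new ArrayList<>();

        for (String line : lines) {
            Matcher m = TOP_VER.matcher(line);

            if (m.find()
                && Long.parseLong(m.group(1)) == topVer
                && Integer.parseInt(m.group(2)) == minorTopVer)
                res.add(line);
        }

        return res;
    }
}
```

This only covers ad hoc filtering; any aggregation across nodes is better left to external tools working on the rebalancing metrics.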

All additional aggregation should be done by external tools using
rebalancing metrics.
I'm on the same page with Maxim Muzafarov here.



Wed, 20 May 2020 at 23:08, ткаленко кирилл :

> Hello, Alexey! Unfortunately, my response was delayed.
>
> Point 2: You can do as you suggested, I think it is still worth specifying
> how many partitions were obtained.
>
> [2020-05-06 20:56:37,044][INFO ][...] Completed rebalancing [grp=cache1,
> supplier=94a3fcbc-18d5-4c64-b0ab-4313aba1, partitions=5, entries=100,
> duration=12ms,
> bytesRcvd=5M, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> progress=1/2]
>
> Point 3: is It "rebalanceId"?
>
> Point 5: I think we can output a summary for each supplier, so as not to
> keep it in mind.
>
> [2020-05-06 20:56:36,999][INFO ][...] Completed rebalance chain:
> [rebalanceId=2, [supplier=94a3fcbc-18d5-4c64-b0ab-4313aba1,
> partitions=5, entries=100, duration=12ms, bytesRcvd=5M],
> [supplier=94a3fcbc-18d5-4c64-b0ab-4313aba2, partitions=5, entries=100,
> duration=12ms, bytesRcvd=5M]]
>
> I can add "rebalanceId" to each message that you gave at above.
>
> A detailed message will help us understand how correctly the suppliers
> were selected.
>
> 06.05.2020, 22:08, "Alexei Scherbakov" :
> > Hello.
> >
> > Let's look at existing rebalancing log for a single group:
> >
> > [2020-05-06 20:56:36,999][INFO ][...] Rebalancing scheduled
> > [order=[ignite-sys-cache, cache1, cache2, default],
> > top=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> > evt=DISCOVERY_CUSTOM_EVT, node=9d9edb7b-eb01-47a1-8ff9-fef715d2]
> > ...
> > [2020-05-06 20:56:37,034][INFO ][...] Prepared rebalancing [grp=cache1,
> > mode=ASYNC, supplier=94a3fcbc-18d5-4c64-b0ab-4313aba1,
> > partitionsCount=11, topVer=AffinityTopologyVersion [topVer=3,
> > minorTopVer=1]]
> > [2020-05-06 20:56:37,036][INFO ][...] Prepared rebalancing [grp=cache1,
> > mode=ASYNC, supplier=b3f3aeeb-5fa0-42f7-a74e-cf39fa50,
> > partitionsCount=10, topVer=AffinityTopologyVersion [topVer=3,
> > minorTopVer=1]]
> > [2020-05-06 20:56:37,036][INFO ][...] Starting rebalance routine [cache1,
> > topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> > supplier=94a3fcbc-18d5-4c64-b0ab-4313aba1, fullPartitions=[1, 5, 7,
> 9,
> > 11, 13, 15, 23, 27, 29, 31], histPartitions=[]]
> > [2020-05-06 20:56:37,037][INFO ][...] Starting rebalance routine [cache1,
> > topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> > supplier=b3f3aeeb-5fa0-42f7-a74e-cf39fa50, fullPartitions=[6, 8, 10,
> > 16, 18, 20, 22, 24, 26, 28], histPartitions=[]]
> > [2020-05-06 20:56:37,044][INFO ][...] Completed rebalancing [grp=cache1,
> > supplier=94a3fcbc-18d5-4c64-b0ab-4313aba1,
> > topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], progress=1/2]
> > [2020-05-06 20:56:37,046][INFO ][...] Completed (final) rebalancing
> > [grp=cache1, supplier=b3f3aeeb-5fa0-42f7-a74e-cf39fa50,
> > topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], progress=2/2]
> > [2020-05-06 20:56:37,048][INFO ][...] Completed rebalance future:
> > RebalanceFuture [grp=CacheGroupContext [grp=cache1],
> > topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], rebalanceId=2,
> > routines=2]
> >
> > From these logs I'm already can get answers to 1 and 4.
> > The logs look concise and easy to read and understand, and should
> > remain what way.
> >
> > But I think some proposed improvements can be done here without harm.
> >
> > 2. OK, let's add it to supplier info per cache with additional info:
> >
> > [2020-05-06 20:56:37,044][INFO ][...] Completed rebalancing [grp=cache1,

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Alexei Scherbakov
Mon, 25 May 2020 at 12:00, Nikolay Izhikov :

> > This willl takes us to the re-encryption using full rebalancing
>
> Rebalance will require 2x efforts for reencryption
>
> 1. Read and send data from supplier node.
> 2. Reencrypt and write data on demander node.
>
> Instead of
>
> 1. Read, reencrypt and write data on «demander» node.
>

Usually, reading and sending is not a bottleneck. And don't forget we can
run out of WAL history and fall back to full rebalancing with partition
eviction eliminating all efforts from offline re-encryption.
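
The fallback can be pictured with a deliberately simplified model: a returning node can be caught up from WAL history only while the supplier still retains history back to the node's last applied update counter; otherwise a full rebalance is needed. RebalanceModeSketch and both counter parameters are assumptions for illustration and ignore the details of the real rebalancing code.

```java
/** Hypothetical, simplified decision between historical and full rebalancing. */
final class RebalanceModeSketch {
    enum Mode { HISTORICAL, FULL }

    /**
     * @param demanderCounter  last update counter applied on the returning node (assumed model)
     * @param oldestWalCounter oldest update counter still covered by the supplier's WAL history
     */
    static Mode choose(long demanderCounter, long oldestWalCounter) {
        // If the history no longer reaches back to the demander's counter, the delta cannot be replayed.
        return demanderCounter >= oldestWalCounter ? Mode.HISTORICAL : Mode.FULL;
    }
}
```

The longer a node stays offline for re-encryption, the further the oldest retained counter moves ahead, which is why a long offline run risks ending in the FULL branch.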

On the other hand, for a grid with many nodes, one-by-one re-encryption
can take a long time.
It should also be possible to re-encrypt all data as fast as possible if,
for example, the load can be switched to another grid; this is where offline
re-encryption will come in handy.

So, I suggest implementing offline re-encryption and online re-encryption
using rebalancing as a first step.

The next step can be online in-place re-encryption. It's important to measure
its business impact on an online grid.


>
>
> > On 25 May 2020, at 11:46, Alexei Scherbakov
> wrote:
> >
> > For me, the one big disadvantage for offline re-encryption is the
> > possibility to run out of WAL history.
> > If an re-encryption takes a long time we will get full rebalancing with
> > partition eviction.
> > This willl takes us to the re-encryption using full rebalancing, proposed
> > by me earlier.
> >
> >
> >
> > Mon, 25 May 2020 at 11:27, Nikolay Izhikov :
> >
> >>> And definitely this approach is much simplier to implement
> >>
> >> I agree.
> >>
> >> If we allow to made nodes offline for reencryption then we can
> implement a
> >> fully offline procedure:
> >>
> >> 1. Stop node.
> >> 2. Execute some control.sh command that will reencrypt all data without
> >> starting node
> >> 3. Start node.
> >>
> >> Pavel, can you, please, write it one more time - what disadvantages in
> >> offline procedure?
> >>
> >>> On 25 May 2020, at 11:20, Alexei Scherbakov <
> alexey.scherbak...@gmail.com>
> >> wrote:
> >>>
> >>> And definitely this approach is much simplier to implement because all
> >>> corner cases are handled by rebalancing code.
> >>>
> >>> Mon, 25 May 2020 at 11:16, Alexei Scherbakov <
> >> alexey.scherbak...@gmail.com
> >>>> :
> >>>
> >>>> I mean: serving supply requests.
> >>>>
> >>>> Mon, 25 May 2020 at 11:15, Alexei Scherbakov <
> >>>> alexey.scherbak...@gmail.com>:
> >>>>
> >>>>> Nikolay,
> >>>>>
> >>>>> Can you explain why such restriction is necessary ?
> >>>>> Most likely having a currently re-encrypting node serving only demand
> >>>>> requests will have least preformance impact on a grid.
> >>>>>
> >>>>> Mon, 25 May 2020 at 11:08, Nikolay Izhikov :
> >>>>>
> >>>>>> Hello, Alexei.
> >>>>>>
> >>>>>> I think we want to implement this feature without nodes restart.
> >>>>>> In the ideal scenario all nodes will stay alive and respond to the
> >> user
> >>>>>> requests.
> >>>>>>
> >>>>>>> On 24 May 2020, at 15:24, Alexei Scherbakov <
> >>>>>> alexey.scherbak...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Pavel Pereslegin,
> >>>>>>>
> >>>>>>> I see another opportunity.
> >>>>>>> We can use rebalancing to re-encrypt node data with a new key.
> >>>>>>> It's a trivial procedure for me: stop a node, clear database,
> change
> >> a
> >>>>>> key,
> >>>>>>> start node and wait for rebalancing to complete.
> >>>>>>> Data will be re-encrypted during rebalancing.
> >>>>>>>
> >>>>>>> Did I miss something ?
> >>>>>>>
> >>>>>>> Fri, 22 May 2020 at 16:14, Ivan Rakov :
> >>>>>>>
> >>>>>>>> Folks,
> >>>>>>>>
> >>>>>>>> Just keeping you informed: I and my colleagues are highly
> interested
> >>>>>> in TDE
> >>>>>>>> in general and keys rotations spec

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Alexei Scherbakov
For me, the one big disadvantage of offline re-encryption is the
possibility of running out of WAL history.
If re-encryption takes a long time, we will get full rebalancing with
partition eviction.
This will take us to re-encryption using full rebalancing, which I proposed
earlier.



Mon, 25 May 2020 at 11:27, Nikolay Izhikov :

> > And definitely this approach is much simplier to implement
>
> I agree.
>
> If we allow to made nodes offline for reencryption then we can implement a
> fully offline procedure:
>
> 1. Stop node.
> 2. Execute some control.sh command that will reencrypt all data without
> starting node
> 3. Start node.
>
> Pavel, can you, please, write it one more time - what disadvantages in
> offline procedure?
>
> > On 25 May 2020, at 11:20, Alexei Scherbakov
> wrote:
> >
> > And definitely this approach is much simplier to implement because all
> > corner cases are handled by rebalancing code.
> >
> > Mon, 25 May 2020 at 11:16, Alexei Scherbakov <
> alexey.scherbak...@gmail.com
> >> :
> >
> >> I mean: serving supply requests.
> >>
> >> Mon, 25 May 2020 at 11:15, Alexei Scherbakov <
> >> alexey.scherbak...@gmail.com>:
> >>
> >>> Nikolay,
> >>>
> >>> Can you explain why such restriction is necessary ?
> >>> Most likely having a currently re-encrypting node serving only demand
> >>> requests will have least preformance impact on a grid.
> >>>
> >>> Mon, 25 May 2020 at 11:08, Nikolay Izhikov :
> >>>
> >>>> Hello, Alexei.
> >>>>
> >>>> I think we want to implement this feature without nodes restart.
> >>>> In the ideal scenario all nodes will stay alive and respond to the
> user
> >>>> requests.
> >>>>
> >>>>> On 24 May 2020, at 15:24, Alexei Scherbakov <
> >>>> alexey.scherbak...@gmail.com> wrote:
> >>>>>
> >>>>> Pavel Pereslegin,
> >>>>>
> >>>>> I see another opportunity.
> >>>>> We can use rebalancing to re-encrypt node data with a new key.
> >>>>> It's a trivial procedure for me: stop a node, clear database, change
> a
> >>>> key,
> >>>>> start node and wait for rebalancing to complete.
> >>>>> Data will be re-encrypted during rebalancing.
> >>>>>
> >>>>> Did I miss something ?
> >>>>>
> >>>>> Fri, 22 May 2020 at 16:14, Ivan Rakov :
> >>>>>
> >>>>>> Folks,
> >>>>>>
> >>>>>> Just keeping you informed: I and my colleagues are highly interested
> >>>> in TDE
> >>>>>> in general and keys rotations specifically, but we don't have enough
> >>>> time
> >>>>>> so far.
> >>>>>> We'll dive into this feature and participate in reviews next month.
> >>>>>>
> >>>>>> --
> >>>>>> Best Regards,
> >>>>>> Ivan Rakov
> >>>>>>
> >>>>>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin  >
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello, Alexey.
> >>>>>>>
> >>>>>>>> is the encryption key for the data the same on all nodes in the
> >>>>>> cluster?
> >>>>>>> Yes, each encrypted cache group has its own encryption key, the key
> >>>> is
> >>>>>>> the same on all nodes.
> >>>>>>>
> >>>>>>>> Clearly, during the re-encryption there will exist pages
> >>>>>>>> encrypted with both new and old keys at the same time.
> >>>>>>> Yes, there will be pages encrypted with different keys at the same
> >>>> time.
> >>>>>>> Currently, we only store one key for one cache group. To rotate a
> >>>> key,
> >>>>>>> at a certain point in time it is necessary to support several keys
> >>>> (at
> >>>>>>> least for reading the WAL).
> >>>>>>> For the "in place" strategy, we'll store the encryption key
> >>>> identifier
> >>>>>>> on each encrypted page (we currently have some unused space on
> >>>>>>> encrypted page,

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Alexei Scherbakov
And definitely this approach is much simpler to implement because all
corner cases are handled by rebalancing code.

On Mon, May 25, 2020 at 11:16, Alexei Scherbakov wrote:

> I mean: serving supply requests.
>
> On Mon, May 25, 2020 at 11:15, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>
>> Nikolay,
>>
>> Can you explain why such a restriction is necessary?
>> Most likely having a currently re-encrypting node serving only demand
>> requests will have the least performance impact on the grid.
>>
>> On Mon, May 25, 2020 at 11:08, Nikolay Izhikov wrote:
>>
>>> Hello, Alexei.
>>>
>>> I think we want to implement this feature without nodes restart.
>>> In the ideal scenario all nodes will stay alive and respond to the user
>>> requests.
>>>
>>> > On May 24, 2020, at 15:24, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>>> >
>>> > Pavel Pereslegin,
>>> >
>>> > I see another opportunity.
>>> > We can use rebalancing to re-encrypt node data with a new key.
>>> > It's a trivial procedure for me: stop a node, clear database, change a
>>> key,
>>> > start node and wait for rebalancing to complete.
>>> > Data will be re-encrypted during rebalancing.
>>> >
>>> > Did I miss something ?
>>> >
>>> > On Fri, May 22, 2020 at 16:14, Ivan Rakov wrote:
>>> >
>>> >> Folks,
>>> >>
>>> >> Just keeping you informed: I and my colleagues are highly interested
>>> in TDE
>>> >> in general and keys rotations specifically, but we don't have enough
>>> time
>>> >> so far.
>>> >> We'll dive into this feature and participate in reviews next month.
>>> >>
>>> >> --
>>> >> Best Regards,
>>> >> Ivan Rakov
>>> >>
>>> >> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin 
>>> >> wrote:
>>> >>
>>> >>> Hello, Alexey.
>>> >>>
>>> >>>> is the encryption key for the data the same on all nodes in the
>>> >> cluster?
>>> >>> Yes, each encrypted cache group has its own encryption key, the key
>>> is
>>> >>> the same on all nodes.
>>> >>>
>>> >>>> Clearly, during the re-encryption there will exist pages
>>> >>>> encrypted with both new and old keys at the same time.
>>> >>> Yes, there will be pages encrypted with different keys at the same
>>> time.
>>> >>> Currently, we only store one key for one cache group. To rotate a
>>> key,
>>> >>> at a certain point in time it is necessary to support several keys
>>> (at
>>> >>> least for reading the WAL).
>>> >>> For the "in place" strategy, we'll store the encryption key
>>> identifier
>>> >>> on each encrypted page (we currently have some unused space on
>>> >>> encrypted page, so I don't expect any memory overhead here). Thus, we
>>> >>> will have several keys for reading and one key for writing. I assume
>>> >>> that the old key will be automatically deleted when a specific WAL
>>> >>> segment is deleted (and re-encryption is finished).
>>> >>>
>>> >>>> Will a node continue to re-encrypt the data after it restarts?
>>> >>> Yes.
>>> >>>
>>> >>>> If a node goes down during the re-encryption, but the rest of the
>>> >>>> cluster finishes re-encryption, will we consider the procedure
>>> >> complete?
>>> >>> I'm not sure, but it looks like the key rotation is complete when we
>>> >>> set the new key on all nodes so that the updates will be encrypted
>>> >>> with the new key (as required by PCI DSS).
>>> >>> Status of re-encryption can be obtained separately (locally or
>>> cluster
>>> >>> wide).
>>> >>>
>>> >>> I forgot to mention that with “in place” re-encryption it will be
>>> >>> impossible to quickly cancel re-encryption, because by canceling we
>>> >>> mean re-encryption with the old key.
>>> >>>
>>> >>>> How do you see the whole key rotation procedure will work?
>>> >>> Initial design for re-encrypt

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Alexei Scherbakov
I mean: serving supply requests.

On Mon, May 25, 2020 at 11:15, Alexei Scherbakov wrote:

> Nikolay,
>
> Can you explain why such a restriction is necessary?
> Most likely having a currently re-encrypting node serving only demand
> requests will have the least performance impact on the grid.
>
> On Mon, May 25, 2020 at 11:08, Nikolay Izhikov wrote:
>
>> Hello, Alexei.
>>
>> I think we want to implement this feature without nodes restart.
>> In the ideal scenario all nodes will stay alive and respond to the user
>> requests.
>>
>> > On May 24, 2020, at 15:24, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>> >
>> > Pavel Pereslegin,
>> >
>> > I see another opportunity.
>> > We can use rebalancing to re-encrypt node data with a new key.
>> > It's a trivial procedure for me: stop a node, clear database, change a
>> key,
>> > start node and wait for rebalancing to complete.
>> > Data will be re-encrypted during rebalancing.
>> >
>> > Did I miss something ?
>> >
>> > On Fri, May 22, 2020 at 16:14, Ivan Rakov wrote:
>> >
>> >> Folks,
>> >>
>> >> Just keeping you informed: I and my colleagues are highly interested
>> in TDE
>> >> in general and keys rotations specifically, but we don't have enough
>> time
>> >> so far.
>> >> We'll dive into this feature and participate in reviews next month.
>> >>
>> >> --
>> >> Best Regards,
>> >> Ivan Rakov
>> >>
>> >> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin 
>> >> wrote:
>> >>
>> >>> Hello, Alexey.
>> >>>
>> >>>> is the encryption key for the data the same on all nodes in the
>> >> cluster?
>> >>> Yes, each encrypted cache group has its own encryption key, the key is
>> >>> the same on all nodes.
>> >>>
>> >>>> Clearly, during the re-encryption there will exist pages
>> >>>> encrypted with both new and old keys at the same time.
>> >>> Yes, there will be pages encrypted with different keys at the same
>> time.
>> >>> Currently, we only store one key for one cache group. To rotate a key,
>> >>> at a certain point in time it is necessary to support several keys (at
>> >>> least for reading the WAL).
>> >>> For the "in place" strategy, we'll store the encryption key identifier
>> >>> on each encrypted page (we currently have some unused space on
>> >>> encrypted page, so I don't expect any memory overhead here). Thus, we
>> >>> will have several keys for reading and one key for writing. I assume
>> >>> that the old key will be automatically deleted when a specific WAL
>> >>> segment is deleted (and re-encryption is finished).
>> >>>
>> >>>> Will a node continue to re-encrypt the data after it restarts?
>> >>> Yes.
>> >>>
>> >>>> If a node goes down during the re-encryption, but the rest of the
>> >>>> cluster finishes re-encryption, will we consider the procedure
>> >> complete?
>> >>> I'm not sure, but it looks like the key rotation is complete when we
>> >>> set the new key on all nodes so that the updates will be encrypted
>> >>> with the new key (as required by PCI DSS).
>> >>> Status of re-encryption can be obtained separately (locally or cluster
>> >>> wide).
>> >>>
>> >>> I forgot to mention that with “in place” re-encryption it will be
>> >>> impossible to quickly cancel re-encryption, because by canceling we
>> >>> mean re-encryption with the old key.
>> >>>
>> >>>> How do you see the whole key rotation procedure will work?
>> >>> Initial design for re-encryption with "partition copying" is described
>> >>> here [1]. I'll prepare detailed design for "in place" re-encryption if
>> >>> we'll go this way. In short, send the new encryption key cluster-wide,
>> >>> each node adds a new key and starts background re-encryption.
>> >>>
>> >>> [1]
>> >>>
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>> >>> .
>> >>>
>> >>> On Sun, May 17, 2020 at 18:35, Alex

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Alexei Scherbakov
Nikolay,

Can you explain why such a restriction is necessary?
Most likely having a currently re-encrypting node serving only demand
requests will have the least performance impact on the grid.

On Mon, May 25, 2020 at 11:08, Nikolay Izhikov wrote:

> Hello, Alexei.
>
> I think we want to implement this feature without nodes restart.
> In the ideal scenario all nodes will stay alive and respond to the user
> requests.
>
> > On May 24, 2020, at 15:24, Alexei Scherbakov wrote:
> >
> > Pavel Pereslegin,
> >
> > I see another opportunity.
> > We can use rebalancing to re-encrypt node data with a new key.
> > It's a trivial procedure for me: stop a node, clear database, change a
> key,
> > start node and wait for rebalancing to complete.
> > Data will be re-encrypted during rebalancing.
> >
> > Did I miss something ?
> >
> > On Fri, May 22, 2020 at 16:14, Ivan Rakov wrote:
> >
> >> Folks,
> >>
> >> Just keeping you informed: I and my colleagues are highly interested in
> TDE
> >> in general and keys rotations specifically, but we don't have enough
> time
> >> so far.
> >> We'll dive into this feature and participate in reviews next month.
> >>
> >> --
> >> Best Regards,
> >> Ivan Rakov
> >>
> >> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin 
> >> wrote:
> >>
> >>> Hello, Alexey.
> >>>
> >>>> is the encryption key for the data the same on all nodes in the
> >> cluster?
> >>> Yes, each encrypted cache group has its own encryption key, the key is
> >>> the same on all nodes.
> >>>
> >>>> Clearly, during the re-encryption there will exist pages
> >>>> encrypted with both new and old keys at the same time.
> >>> Yes, there will be pages encrypted with different keys at the same
> time.
> >>> Currently, we only store one key for one cache group. To rotate a key,
> >>> at a certain point in time it is necessary to support several keys (at
> >>> least for reading the WAL).
> >>> For the "in place" strategy, we'll store the encryption key identifier
> >>> on each encrypted page (we currently have some unused space on
> >>> encrypted page, so I don't expect any memory overhead here). Thus, we
> >>> will have several keys for reading and one key for writing. I assume
> >>> that the old key will be automatically deleted when a specific WAL
> >>> segment is deleted (and re-encryption is finished).
> >>>
> >>>> Will a node continue to re-encrypt the data after it restarts?
> >>> Yes.
> >>>
> >>>> If a node goes down during the re-encryption, but the rest of the
> >>>> cluster finishes re-encryption, will we consider the procedure
> >> complete?
> >>> I'm not sure, but it looks like the key rotation is complete when we
> >>> set the new key on all nodes so that the updates will be encrypted
> >>> with the new key (as required by PCI DSS).
> >>> Status of re-encryption can be obtained separately (locally or cluster
> >>> wide).
> >>>
> >>> I forgot to mention that with “in place” re-encryption it will be
> >>> impossible to quickly cancel re-encryption, because by canceling we
> >>> mean re-encryption with the old key.
> >>>
> >>>> How do you see the whole key rotation procedure will work?
> >>> Initial design for re-encryption with "partition copying" is described
> >>> here [1]. I'll prepare detailed design for "in place" re-encryption if
> >>> we'll go this way. In short, send the new encryption key cluster-wide,
> >>> each node adds a new key and starts background re-encryption.
> >>>
> >>> [1]
> >>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> >>> .
> >>>
> >>> On Sun, May 17, 2020 at 18:35, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> >>>>
> >>>> Pavel, Anton,
> >>>>
> >>>> How do you see the whole key rotation procedure will work? Clearly,
> >>> during
> >>>> the re-encryption there will exist pages encrypted with both new and
> >> old
> >>>> keys at the same time. Will a node continue to re-encrypt the data
> >> after
> >>> it

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-24 Thread Alexei Scherbakov
> > > > > > Encryption was implemented [1], but some security standards (PCI
> DSS
> > > > > at least) require rotation of all encryption keys [2]. Currently,
> > > > > encryption occurs when reading/writing pages to disk, cache
> > encryption
> > > > > keys are stored in metastore.
> > > > >
> > > > > I'm going to contribute cache encryption key rotation and want to
> > > > > consult what is the best way to re-encrypting existing data, I see
> > two
> > > > > different strategies.
> > > > >
> > > > > 1. In place re-encryption:
> > > > > Using the old key, sequentially read all the pages from the
> > datastore,
> > > > > mark as dirty and log them into the WAL. After checkpoint pages
> will
> > > > > be stored to disk encrypted with the new key (as usual, along with
> > > > > updates). This strategy requires store the identifier (number) of
> the
> > > > > encryption key into the encrypted page.
> > > > > pros:
> > > > >   - can work in the background with minimal performance impact
> (this
> > > > > impact can be managed).
> > > > > cons:
> > > > >   - page duplication in the WAL may affect performance and
> historical
> > > > > rebalance.
> > > > >
> > > > > 2. Copy partition with re-encryption.
> > > > > This strategy is similar to partition snapshotting [3] - create
> > > > > partition copy encrypted with the new key and then replace the
> > > > > original partition file with the new one (see details [4]).
> > > > > pros:
> > > > >   - should work faster than "in place" re-encryption.
> > > > > cons:
> > > > >   - re-encryption in active cluster (and on unstable topology) can
> be
> > > > > difficult to implement.
> > > > >
> > > > > (See more detailed comparison [5])
> > > > >
> > > > > Re-encryption of existing data is a long and rare procedure (It is
> > > > > recommended to change the key every 6 months, but at least once
> every
> > > > > 2 years). Thus, re-encryption can be implemented for maintenance
> mode
> > > > > (for example, on a stable topology in a read-only cluster) and in
> > such
> > > > > case the approach with partition copying seems simpler and faster.
> > > > >
> > > > > So, what do you think - do we need "online" re-encryption and which
> > of
> > > > > the proposed options is best suited for this?
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12186
> > > > > [2]
> > https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
> > > > > [3]
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
> > > > > [4]
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> > > > > .
> > > > > [5]
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
> > > > >
> > > >
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: [MTCGA]: new failures in builds [5329193] needs to be handled

2020-05-24 Thread Alexei Scherbakov
I'll take a look at this.

On Sat, May 23, 2020 at 06:46, :

> Hi Igniters,
>
>  I've detected some new issue on TeamCity to be handled. You are more than
> welcomed to help.
>
>  *New test failure in master
> IgnitePdsContinuousRestartTest.testRebalancingDuringLoad_10_500_8_16
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-4575866686365489404=%3Cdefault%3E=testDetails
>  No changes in the build
>
>  - Here's a reminder of what contributors were agreed to do
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>  - Should you have any questions please contact
> dev@ignite.apache.org
>
> Best Regards,
> Apache Ignite TeamCity Bot
> https://github.com/apache/ignite-teamcity-bot
> Notification generated at 06:46:34 23-05-2020
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSS] Data loss handling improvements

2020-05-07 Thread Alexei Scherbakov
Yes, it will work this way.

On Thu, May 7, 2020 at 10:43, Anton Vinogradov wrote:

> Seems I got the vision, thanks.
> There should be only 2 ways to reset lost partition: to gain an owner from
> resurrected first or to remove ex-owner from baseline (partition will be
> rearranged).
> And we should make a decision for every lost partition before calling the
> reset.
>
> On Wed, May 6, 2020 at 8:02 PM Alexei Scherbakov <
> alexey.scherbak...@gmail.com> wrote:
>
> > On Wed, May 6, 2020 at 12:54, Anton Vinogradov wrote:
> >
> > > Alexei,
> > >
> > > 1,2,4,5 - looks good to me, no objections here.
> > >
> > > >> 3. Lost state is impossible to reset if a topology doesn't have at
> > least
> > > >> one owner for each lost partition.
> > >
> > > Do you mean that, according to your example, where
> > > >> a node2 has left, soon a node3 has left. If the node2 is returned to
> > > >> the topology first, it would have stale data for some keys.
> > > we have to have node2 at cluster to be able to reset "lost" to node2's
> > > data?
> > >
> >
> > Not sure if I understand a question, but try to answer using an example:
> > Assume 3 nodes n1, n2, n3, 1 backup, persistence enabled, partition p is
> > owned by n2 and n3.
> > 1. Topology is activated.
> > 2. cache.put(p, 0) // n2 and n3 have p->0, updateCounter=1
> > 3. n2 has failed.
> > 4. cache.put(p, 1) // n3 has p->1, updateCounter=2
> > 5. n3 has failed, partition loss is happened.
> > 6. n2 joins a topology, it has stale data (p->0)
> >
> > We actually have 2 issues:
> > 7. cache.put(p, 2) will succeed: n2 has p->2, n3 has p->1, the data has
> > diverged and will not be fixed by counter-based rebalancing if n3 later
> > joins the topology.
> > or
> > 8. n3 joins a topology, it has actual data (p->1) but rebalancing will
> not
> > work because joining node has highest counter (it can only be a demander
> in
> > this scenario).
> >
> > In both cases rebalancing by counters will not work causing data
> divergence
> > in copies.
> >
> >
> > >
> > > >> at least one owner for each lost partition.
> > > What the reason to have owners for all lost partitions when we want to
> > > reset only some (available)?
> > >
> >
> > It's never were possible to reset only subset of lost partitions. The
> > reason is to make guarantee of resetLostPartitions method - all cache
> > operations are resumed, data is correct.
> >
> >
> > > Will it be possible to perform operations on non-lost partitions when
> the
> > > cluster has at least one lost partition?
> > >
> >
> > Yes it will be.
> >
> >
> > >
> > > On Wed, May 6, 2020 at 11:45 AM Alexei Scherbakov <
> > > alexey.scherbak...@gmail.com> wrote:
> > >
> > > > Folks,
> > > >
> > > > I've almost finished a patch bringing some improvements to the data
> > loss
> > > > handling code, and I wish to discuss proposed changes with the
> > community
> > > > before submitting.
> > > >
> > > > *The issue*
> > > >
> > > > During the grid's lifetime, it's possible to get into a situation
> when
> > > some
> > > > data nodes have failed or mistakenly stopped. If a number of stopped
> > > nodes
> > > > exceeds a certain threshold depending on configured backups, count a
> > data
> > > > loss will occur. For example, a grid having one backup (meaning at
> > least
> > > > two copies of each data partition exist at the same time) can
> tolerate
> > > only
> > > > one node loss at the time. Generally, data loss is guaranteed to
> occur
> > if
> > > > backups + 1 or more nodes have failed simultaneously using default
> > > affinity
> > > > function.
> > > >
> > > > For in-memory caches, data is lost forever. For persistent caches,
> data
> > > is
> > > > not physically lost and accessible again after failed nodes are
> > returned
> > > to
> > > > the topology.
> > > >
> > > > Possible data loss should be taken into consideration while designing
> > an
> > > > application.
> > > >
> > > >
> > > >
> > > > *Consider an example: money is transferred from one deposit to
> another,
> >

Re: Extended logging for rebalance performance analysis

2020-05-06 Thread Alexei Scherbakov
ses: pr -
> > > primary, bu - backup, su - supplier node, h - historical, nodeId
> mapping
> > > (nodeId=id,consistentId) [0=rebalancing.RebalanceStatisticsTest0]
> > > [1=rebalancing.RebalanceStatisticsTest2]
> > > [2=rebalancing.RebalanceStatisticsTest1]
> > >
> > > Interrupted rebalance of group cache.
> > > Rebalance information per cache group (interrupted rebalance):
> > > [id=644280849, name=default2, startTime=2020-04-13 14:55:24,969,
> > > finishTime=2020-04-13 14:55:24,969, d=0 ms, restarted=0]
> > >
> > > Total full and historical rebalance for all cache groups.
> > > Rebalance total information (including successful and not rebalances):
> > > [startTime=2020-04-13 10:55:18,726, finishTime=2020-04-13 10:55:18,780,
> > > d=54 ms] Supplier statistics: [nodeId=0, p=60, e=250, b=25000, d=54 ms]
> > > [nodeId=1, p=60, e=250, b=24945, d=54 ms] Aliases: p - partitions, e -
> > > entries, b - bytes, d - duration, h - historical, nodeId mapping
> > > (nodeId=id,consistentId) [0=rebalancing.RebalanceStatisticsTest1]
> > > [1=rebalancing.RebalanceStatisticsTest0]
> > > Rebalance total information (including successful and not rebalances):
> > > [startTime=2020-04-13 15:01:43,822, finishTime=2020-04-13 15:01:44,116,
> > > d=294 ms] Supplier statistics: [nodeId=0, hp=20, he=500, hb=50445,
> d=294
> > > ms] Aliases: p - partitions, e - entries, b - bytes, d - duration, h -
> > > historical, nodeId mapping (nodeId=id,consistentId)
> > > [0=rebalancing.RebalanceStatisticsTest0]
> > >
> > > [1] - https://issues.apache.org/jira/browse/IGNITE-12080
>


-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSS] Data loss handling improvements

2020-05-06 Thread Alexei Scherbakov
On Wed, May 6, 2020 at 12:54, Anton Vinogradov wrote:

> Alexei,
>
> 1,2,4,5 - looks good to me, no objections here.
>
> >> 3. Lost state is impossible to reset if a topology doesn't have at least
> >> one owner for each lost partition.
>
> Do you mean that, according to your example, where
> >> a node2 has left, soon a node3 has left. If the node2 is returned to
> >> the topology first, it would have stale data for some keys.
> we have to have node2 at cluster to be able to reset "lost" to node2's
> data?
>

I'm not sure I understand the question, but I'll try to answer with an example:
Assume 3 nodes n1, n2, n3, 1 backup, persistence enabled, and partition p
owned by n2 and n3.
1. Topology is activated.
2. cache.put(p, 0) // n2 and n3 have p->0, updateCounter=1
3. n2 has failed.
4. cache.put(p, 1) // n3 has p->1, updateCounter=2
5. n3 has failed, partition loss has happened.
6. n2 joins the topology, it has stale data (p->0)

We actually have 2 issues:
7. cache.put(p, 2) will succeed: n2 has p->2, n3 has p->1, the data has
diverged and will not be fixed by counter-based rebalancing if n3 later
joins the topology.
or
8. n3 joins the topology, it has the actual data (p->1), but rebalancing will
not work because the joining node has the highest counter (it can only be a
demander in this scenario).

In both cases rebalancing by counters will not work, causing data divergence
between copies.
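
The first issue can be replayed with a toy model in plain Java. This is only a
sketch under assumptions: PartitionCopy and LostPartitionScenario are
hypothetical names, not Ignite classes; the update counter is assumed to grow
by one per applied update; and equal counters are assumed to be read as
"copies are in sync".

```java
/** Toy model of one copy of partition p: a single value plus an update counter. */
final class PartitionCopy {
    final String node;
    int value;
    long updateCounter;

    PartitionCopy(String node) {
        this.node = node;
    }

    /** Applies an update locally and bumps the counter by one (assumption). */
    void apply(int newValue) {
        value = newValue;
        updateCounter++;
    }
}

final class LostPartitionScenario {
    /** Counter-only comparison: says nothing about the stored values. */
    static boolean countersMatch(PartitionCopy a, PartitionCopy b) {
        return a.updateCounter == b.updateCounter;
    }

    /** Replays steps 2-7 and returns the copies as {n2, n3}. */
    static PartitionCopy[] replay() {
        PartitionCopy n2 = new PartitionCopy("n2");
        PartitionCopy n3 = new PartitionCopy("n3");

        n2.apply(0);
        n3.apply(0); // step 2: both owners apply p->0, counter = 1
        // step 3: n2 fails
        n3.apply(1); // step 4: only n3 applies p->1, counter = 2
        // step 5: n3 fails; step 6: n2 returns with stale p->0
        n2.apply(2); // step 7: the put lands on n2 alone, counter = 2

        return new PartitionCopy[] {n2, n3};
    }

    public static void main(String[] args) {
        PartitionCopy[] copies = replay();
        System.out.println("counters match: " + countersMatch(copies[0], copies[1]));
        System.out.println("values match:   " + (copies[0].value == copies[1].value));
    }
}
```

In this model both copies end with the same counter while holding different
values, which is why a comparison by counters alone has nothing to repair.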


>
> >> at least one owner for each lost partition.
> What the reason to have owners for all lost partitions when we want to
> reset only some (available)?
>

It has never been possible to reset only a subset of lost partitions. The
reason is to uphold the guarantee of the resetLostPartitions method: all
cache operations are resumed and data is correct.


> Will it be possible to perform operations on non-lost partitions when the
> cluster has at least one lost partition?
>

Yes, it will be.


>
> On Wed, May 6, 2020 at 11:45 AM Alexei Scherbakov <
> alexey.scherbak...@gmail.com> wrote:
>
> > Folks,
> >
> > I've almost finished a patch bringing some improvements to the data loss
> > handling code, and I wish to discuss proposed changes with the community
> > before submitting.
> >
> > *The issue*
> >
> > During the grid's lifetime, it's possible to get into a situation when
> some
> > data nodes have failed or mistakenly stopped. If a number of stopped
> nodes
> > exceeds a certain threshold depending on configured backups, count a data
> > loss will occur. For example, a grid having one backup (meaning at least
> > two copies of each data partition exist at the same time) can tolerate
> only
> > one node loss at the time. Generally, data loss is guaranteed to occur if
> > backups + 1 or more nodes have failed simultaneously using default
> affinity
> > function.
> >
> > For in-memory caches, data is lost forever. For persistent caches, data
> is
> > not physically lost and accessible again after failed nodes are returned
> to
> > the topology.
> >
> > Possible data loss should be taken into consideration while designing an
> > application.
> >
> >
> >
> > *Consider an example: money is transferred from one deposit to another,
> and
> > all nodes holding data for one of the deposits are gone.In such a case, a
> > transaction temporary cannot be completed until a cluster is recovered
> from
> > the data loss state. Ignoring this can cause data inconsistency.*
> > It is necessary to have an API telling us if an operation is safe to
> > complete from the perspective of data loss.
> >
> > Such an API exists for some time [1] [2] [3]. In short, a grid can be
> > configured to switch caches to the partial availability mode if data loss
> > is detected.
> >
> > Let's give two definitions according to the Javadoc for
> > *PartitionLossPolicy*:
> >
> > ·   *Safe* (data loss handling) *policy* - cache operations are only
> > available for non-lost partitions (PartitionLossPolicy != IGNORE).
> >
> > ·   *Unsafe policy* - cache operations are always possible
> > (PartitionLossPolicy = IGNORE). If the unsafe policy is configured, lost
> > partitions automatically re-created on the remaining nodes if needed or
> > immediately owned if a last supplier has left during rebalancing.
> >
> > *That needs to be fixed*
> >
> > 1. The default loss policy is unsafe, even for persistent caches in the
> > current implementation. It can result in unintentional data loss and
> > business invariants' failure.
> >
> > 2. Node restarts in the persistent grid with detected data loss will
> cause
> > automatic resetting

[DISCUSS] Data loss handling improvements

2020-05-06 Thread Alexei Scherbakov
Folks,

I've almost finished a patch bringing some improvements to the data loss
handling code, and I wish to discuss proposed changes with the community
before submitting.

*The issue*

During the grid's lifetime, it's possible to get into a situation where some
data nodes have failed or were mistakenly stopped. If the number of stopped
nodes exceeds a certain threshold, which depends on the configured backups
count, data loss will occur. For example, a grid with one backup (meaning at
least two copies of each data partition exist at the same time) can tolerate
only one node loss at a time. Generally, data loss is guaranteed to occur if
backups + 1 or more nodes have failed simultaneously when the default
affinity function is used.
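
The per-partition rule behind this threshold can be sketched in plain Java.
PartitionLossCheck and isLost are hypothetical names used for illustration
only, not part of the Ignite API, and nodes are identified by array index for
brevity.

```java
/** Hypothetical helper, not part of the Ignite API. */
final class PartitionLossCheck {
    /**
     * @param owners indexes of the nodes holding a copy of the partition (primary + backups)
     * @param failed failed[i] is true when node i is down
     * @return true when no live node holds a copy, i.e. the partition is lost
     */
    static boolean isLost(int[] owners, boolean[] failed) {
        for (int node : owners) {
            if (!failed[node])
                return false; // at least one copy is still reachable
        }
        return true;
    }

    public static void main(String[] args) {
        int[] owners = {1, 2}; // one backup: two copies, on node 1 and node 2

        // One owner down: a copy is still available.
        System.out.println(isLost(owners, new boolean[] {false, true, false}));
        // backups + 1 = 2 owners down: the partition is lost.
        System.out.println(isLost(owners, new boolean[] {false, true, true}));
    }
}
```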

For in-memory caches, data is lost forever. For persistent caches, data is
not physically lost and becomes accessible again after the failed nodes are
returned to the topology.

Possible data loss should be taken into consideration while designing an
application.



*Consider an example: money is transferred from one deposit to another, and
all nodes holding data for one of the deposits are gone. In such a case, the
transaction temporarily cannot be completed until the cluster is recovered
from the data loss state. Ignoring this can cause data inconsistency.*
It is necessary to have an API telling us if an operation is safe to
complete from the perspective of data loss.

Such an API has existed for some time [1] [2] [3]. In short, a grid can be
configured to switch caches to the partial availability mode if data loss
is detected.

Let's give two definitions according to the Javadoc for
*PartitionLossPolicy*:

·   *Safe* (data loss handling) *policy* - cache operations are only
available for non-lost partitions (PartitionLossPolicy != IGNORE).

·   *Unsafe policy* - cache operations are always possible
(PartitionLossPolicy = IGNORE). If the unsafe policy is configured, lost
partitions are automatically re-created on the remaining nodes if needed, or
immediately owned if the last supplier has left during rebalancing.

*What needs to be fixed*

1. The default loss policy is unsafe, even for persistent caches in the
current implementation. It can result in unintentional data loss and
violated business invariants.

2. Node restarts in a persistent grid with detected data loss cause
automatic resetting of the LOST state after the restart, even if the safe
policy is configured. It can result in data loss or partition desync if not
all nodes are returned to the topology or they are returned in the wrong order.


*An example: a grid has three nodes and one backup. The grid is under load.
First node2 leaves, and soon after node3 leaves. If node2 is returned to
the topology first, it will have stale data for some keys. The most recent
data is on node3, which is not in the topology yet. Because the lost state
was reset, all caches are fully available and most probably will become
inconsistent even in safe mode.*

3. The configured loss policy doesn't provide the guarantees described in the
Javadoc, depending on the cluster configuration [4]. In particular, the unsafe
policy (IGNORE) cannot be guaranteed if the baseline is fixed (not
automatically readjusted when a node leaves), because partitions are not
automatically reassigned on topology change, and no nodes exist to fulfill a
read/write request. The same applies to READ_ONLY_ALL and READ_WRITE_ALL.

4. Calling resetLostPartitions doesn't provide a guarantee for full cache
operations availability if a topology doesn't have at least one owner for
each lost partition.

The ultimate goal of the patch is to fix API inconsistencies and fix the
most crucial bugs related to data loss handling.

*The planned changes are:*

1. The safe policy is used by default, except for in-memory grids with
enabled baseline auto-adjust [5] with zero timeout [6]. In the latter case,
the unsafe policy is used by default. It protects from unintentional data
loss.

2. Lost state is never reset in the case of grid nodes restart (despite
full restart). It makes real data loss impossible in persistent grids if
following the recovery instruction.

3. Lost state is impossible to reset if a topology doesn't have at least
one owner for each lost partition. If nodes are physically dead, they
should be removed from a baseline first before calling resetLostPartitions.

4. READ_WRITE_ALL, READ_ONLY_ALL is a subject for deprecation because their
guarantees are impossible to fulfill, not on the full baseline.

5. Any operation failed due to data loss contains
CacheInvalidStateException as a root cause.
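
Planned changes 1 and 3 above can be read as two small decision rules. The following is only a minimal plain-Java sketch of that reading, not Ignite code: the class LostPartitionRules, the enum DefaultPolicy, and both method names are hypothetical and exist only for illustration.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical model of planned changes 1 and 3; this is not Ignite code. */
final class LostPartitionRules {
    /** Illustrative stand-ins for the "safe" and "unsafe" policy families. */
    enum DefaultPolicy { SAFE, UNSAFE }

    /**
     * Planned change 1: the safe policy is the default, except for in-memory
     * grids with baseline auto-adjust enabled and a zero timeout.
     */
    static DefaultPolicy defaultPolicy(boolean persistent, boolean autoAdjustEnabled, long autoAdjustTimeoutMs) {
        boolean immediateAutoAdjust = autoAdjustEnabled && autoAdjustTimeoutMs == 0;

        return (!persistent && immediateAutoAdjust) ? DefaultPolicy.UNSAFE : DefaultPolicy.SAFE;
    }

    /**
     * Planned change 3: the lost state may be reset only if every lost
     * partition has at least one owner in the current topology.
     */
    static boolean canResetLostState(Set<Integer> lostParts, Map<Integer, Set<String>> ownersByPart) {
        for (Integer part : lostParts) {
            Set<String> owners = ownersByPart.get(part);

            if (owners == null || owners.isEmpty())
                return false;
        }

        return true;
    }

    public static void main(String[] args) {
        Set<Integer> lost = Set.of(3, 7);

        // Partition 7 has no owner in the topology yet, so the reset is refused.
        System.out.println(canResetLostState(lost, Map.of(3, Set.of("node2"))));

        // Both lost partitions have an owner, so the reset is allowed.
        System.out.println(canResetLostState(lost, Map.of(3, Set.of("node2"), 7, Set.of("node3"))));

        // In-memory grid with immediate baseline auto-adjust.
        System.out.println(defaultPolicy(false, true, 0));
    }
}
```

In this model a persistent grid always gets the safe default, regardless of the auto-adjust settings.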

In addition to code fixes, I plan to write a tutorial for safe data loss
recovery in the persistent mode in the Ignite wiki.

Any comments on the proposed changes are welcome.

[1]
org.apache.ignite.configuration.CacheConfiguration#setPartitionLossPolicy(PartitionLossPolicy
partLossPlc)
[2] org.apache.ignite.Ignite#resetLostPartitions(caches)
[3] org.apache.ignite.IgniteCache#lostPartitions
[4]  https://issues.apache.org/jira/browse/IGNITE-10041
[5] 

Re: About Rebalance Mode (SYNC & NONE)

2020-05-03 Thread Alexei Scherbakov
Hi.

You are correct, NONE is a legacy mode and should be removed [1].

Note that the baseline can be used with in-memory caches as well. To make it
work, you have to disable baseline auto-adjust or set a non-zero
auto-adjust timeout.
This will prevent unnecessary rebalancing.

[1] https://issues.apache.org/jira/browse/IGNITE-11417

Sun, May 3, 2020 at 05:35, 18624049226 <18624049...@163.com>:

> Hi Community,
>
> For a partition mode cache (without persistence enabled), if rebalanceMode
> is configured as NONE, rebalancing will actually occur when a new node
> is added. This does not match the function definition; I think this is a bug.
>
> But for a persistent cache, rebalancing is controlled by the baseline
> topology. For disabling persistent cache, it seems that configuring
> rebalanceMode as (SYNC & NONE) has no practical significance, so what is the
> value of this parameter design? Is this a historical legacy that is
> about to be abandoned?
>
>

-- 

Best regards,
Alexei Scherbakov


Re: [DISCUSSION] Major changes in Ignite in 2020

2020-04-11 Thread Alexei Scherbakov
Folks,

I keep working on tasks related to data consistency.

This includes:

1. Lost partitions handling overhaul (almost done) and tombstones support,
mentioned earlier by Ivan Rakov
2. Atomic protocol overhaul (see [1])

The ultimate goal of the year is to prepare Ignite for passing Jepsen tests.

[1]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-12+Make+ATOMIC+Caches+Consistent+Again


Fri, Apr 10, 2020 at 18:49, Denis Magda:

> Steven,
>
> Please start a dedicated discussion for the Golang support. At the moment,
> I'm not aware of anybody from the community planning to provide support
> out-of-the-box. However, that's not a tricky task thanks to Ignite's binary
> protocol, which makes it easy to add support for any programming language.
>
> -
> Denis
>
>
> On Fri, Apr 10, 2020 at 8:43 AM smeadows-abb 
> wrote:
>
> > First, thanks for your quick response.
> >
> > I looked at https://github.com/amsokol/ignite-go-client and it's NOT
> > completed and nothing has been done for the last 16 months. An initial test
> > with the package failed, so I'm trying to determine your project roadmap
> > with regard to Golang and maybe Rust support.
> >
> > I'm NOT sure of 'AFAIK' ?
> >
> > We may need to implement your Restful API to provide support for Golang
> > and Rust, provided it's complete?
> >
> > Thanks,
> >   Steve
> >
> >
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >
>


-- 

Best regards,
Alexei Scherbakov


Re: Security Subject of thin client on remote nodes

2020-03-24 Thread Alexei Scherbakov
l should perform actual authentication, next calls
> > should retrieve context of already authenticated client). Presence of the
> > separate #securityContext(UUID) highlights that user indeed should care
> > about propagation of thin clients' contexts between the cluster nodes.
> >
> > --
> > Ivan
> >
> > > On Fri, Mar 20, 2020 at 12:22 PM Veena Mithare wrote:
> >
> > > Hi Alexei, Denis,
> > >
> > > > One of the main use cases of thin client authentication is to be able to
> > > > audit the changes done using the thin client user.
> > > > To enable that, we really need to resolve this concern as well:
> > > https://issues.apache.org/jira/browse/IGNITE-12781
> > >
> > > ( Incorrect security subject id is  associated with a cache_put event
> > > when the originator of the event is a thin client. )
> > >
> > > Regards,
> > > Veena
> > >
> > >
> > > -Original Message-
> > > From: Alexei Scherbakov 
> > > Sent: 18 March 2020 08:11
> > > To: dev 
> > > Subject: Re: Security Subject of thin client on remote nodes
> > >
> > > Denis Garus,
> > >
> > > Both variants are capable of solving the thin client security context
> > > problem.
> > >
> > > > My approach doesn't require any IEPs, just a minor change in code and to the
> > > > org.apache.ignite.internal.processors.security.IgniteSecurity#authenticate(AuthenticationContext)
> > > > contract.
> > > We can add appropriate documentation to emphasize this.
> > > The argument "fragile" is not very convincing for me.
> > >
> > > I think we should collect more opinions before proceeding with IEP.
> > >
> > > > Considering the fact that we actually *may not care* about compatibility (I've
> > > > already explained why), I'm thinking of another approach.
> > > > Let's get rid of SecurityContext and use SecuritySubject instead.
> > > > SecurityContext is just a POJO wrapper over SecuritySubject's
> > > > org.apache.ignite.plugin.security.SecuritySubject#permissions.
> > > > Its functionality can be easily moved to SecuritySubject.
> > >
> > > What do you think?
> > >
> > >
> > >
> > > > Mon, Mar 16, 2020 at 15:47, Denis Garus:
> > >
> > > >  Hello, Alexei!
> > > >
> > > > > I agree with you: if we may not care about compatibility at all, then
> > > > > we can solve the problem in a much more straightforward way.
> > > >
> > > > In your case, the method GridSecurityProcessor#authenticate will have
> > > > an implicit contract:
> > > > [ if actx.subject() != null then
> > > >   returns SecurityContext
> > > > else
> > > >   do authenticate ]
> > > >
> > > > It looks fragile.
> > > >
> > > > > When we extend the GridSecurityProcessor, there isn't this problem:
> > > > > we have the explicit contract and can make a default implementation that
> > > > > throws an unsupported operation exception to enforce the compatibility
> > > > > check.
> > > >
> > > > In any case, we need to change GridSecurityProcessor implementation.
> > > >
> > > > But I think your proposal to try to find a security context in the
> > > > node's attributes first is right for backward compatibility when
> > > > Ignite users don't use thin clients.
> > > >
> > > > Summary:
> > > > > I suggest adding a new method to GridSecurityProcessor because it has
> > > > > a clear contract and enforces the compatibility check in a natural way.
> > > >
> > > > > Sun, Mar 15, 2020 at 17:13, Alexei Scherbakov <
> > > > alexey.scherbak...@gmail.com
> > > > >:
> > > >
> > > > > Denis Garus,
> > > > >
> > > > > > I've looked at the IEP proposed by you and currently I'm thinking
> > > > > > it's not immediately required.
> > > > > >
> > > > > > The problem of missing SecurityContexts of thin clients can be
> > > > > > solved much more easily.
> > > > >
> > > > > Below is the stub of a fix, it requires correct implementation of
> > > > > method
> > > > >
> > > >
> org.apache.ignite.internal.

Re: Data vanished from cluster after INACTIVE/ACTIVE switch

2020-03-24 Thread Alexei Scherbakov
y requires --yes) and JMX.
> >
> > Thoughts?
> >
> > [1]: https://issues.apache.org/jira/browse/IGNITE-12614
> > [2]: https://issues.apache.org/jira/browse/IGNITE-12701
> >
> > --
> > Ivan
> >
> >
> > On Tue, Mar 17, 2020 at 2:26 PM Vladimir Steshin 
> wrote:
> >
> >> Nikolay, I think we should reconsider clearing at least system caches
> >> when deactivating.
> >>
> >> 17.03.2020 14:18, Nikolay Izhikov wrote:
> >>> Hello, Vladimir.
> >>>
> >>> I don’t get it.
> >>>
> >>> What is your proposal?
> >>> What we should do?
> >>>
> >>>> On March 17, 2020, at 14:11, Vladimir Steshin wrote:
> >>>>
> >>>> Nikolay, hi.
> >>>>
> >>>>>>> And should be covered with the --force parameter we added.
> >>>> As a fix for user cases - yes. My idea is to emphasize the overall ability to
> >>>> lose various objects, not only data. Probably this might be reconsidered in
> >>>> the future.
> >>>>
> >>>>
> >>>> 17.03.2020 13:49, Nikolay Izhikov wrote:
> >>>>> Hello, Vladimir.
> >>>>>
> >>>>> If there is at least one persistent data region then the system data
> >>>>> region also becomes persistent.
> >>>>> Your example applies only to pure in-memory clusters.
> >>>>>
> >>>>> And should be covered with the --force parameter we added.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>>> On March 17, 2020, at 13:45, Vladimir Steshin wrote:
> >>>>>>
> >>>>>> Hi, all.
> >>>>>>
> >>>>>> Fixes for control.sh and the REST have been merged. Could anyone take
> >>>>>> a look at the previous email with an issue? Isn't this behavior very
> >>>>>> weird?
> >>>>>>
> >>
>
>

-- 

Best regards,
Alexei Scherbakov


Re: Security Subject of thin client on remote nodes

2020-03-18 Thread Alexei Scherbakov
Denis Garus,

Both variants are capable of solving the thin client security context
problem.

My approach doesn't require any IEPs, just a minor change in code and to the
org.apache.ignite.internal.processors.security.IgniteSecurity#authenticate(AuthenticationContext)
contract.
We can add appropriate documentation to emphasize this.
The argument "fragile" is not very convincing for me.
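
To make the discussed contract concrete, here is a minimal self-contained sketch of how an authenticate call could reuse an already authenticated subject. All types below (Subject, AuthContext, SecContext) are simplified hypothetical stand-ins, not the real Ignite classes, and the password check is only a placeholder.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-ins for the security types; not the real Ignite API. */
final class AuthContractSketch {
    /** Stand-in for a security subject. */
    static final class Subject {
        final String login;

        Subject(String login) {
            this.login = login;
        }
    }

    /** Stand-in for an authentication context. */
    static final class AuthContext {
        /** Non-null when the client was already authenticated on another node. */
        Subject subject;

        String login;

        String password;
    }

    /** Stand-in for a security context. */
    static final class SecContext {
        final Subject subject;

        /** Marker used only in this sketch to show which branch was taken. */
        final boolean freshlyAuthenticated;

        SecContext(Subject subject, boolean freshlyAuthenticated) {
            this.subject = subject;
            this.freshlyAuthenticated = freshlyAuthenticated;
        }
    }

    /** If a subject is already present, rebuild the context; otherwise authenticate. */
    static SecContext authenticate(AuthContext actx) {
        Objects.requireNonNull(actx, "actx");

        // Already authenticated elsewhere: just recreate the context from the subject.
        if (actx.subject != null)
            return new SecContext(actx.subject, false);

        // Placeholder credential check, an assumption made for this sketch only.
        if (!"secret".equals(actx.password))
            throw new SecurityException("Authentication failed for " + actx.login);

        return new SecContext(new Subject(actx.login), true);
    }

    public static void main(String[] args) {
        AuthContext remote = new AuthContext();
        remote.subject = new Subject("thin-client");
        System.out.println(authenticate(remote).freshlyAuthenticated);

        AuthContext local = new AuthContext();
        local.login = "thin-client";
        local.password = "secret";
        System.out.println(authenticate(local).freshlyAuthenticated);
    }
}
```

The branch on a non-null subject is the part that would need to be documented in the contract.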

I think we should collect more opinions before proceeding with IEP.

Considering the fact that we actually *may not care* about compatibility (I've
already explained why), I'm thinking of another approach.
Let's get rid of SecurityContext and use SecuritySubject instead.
SecurityContext is just a POJO wrapper over
SecuritySubject's org.apache.ignite.plugin.security.SecuritySubject#permissions.
Its functionality can be easily moved to SecuritySubject.

What do you think?



Mon, Mar 16, 2020 at 15:47, Denis Garus:

>  Hello, Alexei!
>
> I agree with you: if we may not care about compatibility at all,
> then we can solve the problem in a much more straightforward way.
>
> In your case, the method GridSecurityProcessor#authenticate will have an
> implicit contract:
> [ if actx.subject() != null then
>   returns SecurityContext
> else
>   do authenticate ]
>
> It looks fragile.
>
> When we extend the GridSecurityProcessor, there isn't this problem:
> we have the explicit contract and can make a default implementation
> that throws an unsupported operation exception to enforce the compatibility
> check.
>
> In any case, we need to change GridSecurityProcessor implementation.
>
> But I think your proposal to try to find a security context in the node's
> attributes first is right
> for backward compatibility when Ignite users don't use thin clients.
>
> Summary:
> I suggest adding a new method to GridSecurityProcessor because
> it has a clear contract and enforces the compatibility check in a natural way.
>
> Sun, Mar 15, 2020 at 17:13, Alexei Scherbakov <
> alexey.scherbak...@gmail.com
> >:
>
> > Denis Garus,
> >
> > I've looked at the IEP proposed by you and currently I'm thinking it's
> > not immediately required.
> >
> > The problem of missing SecurityContexts of thin clients can be solved
> > much more easily.
> >
> > Below is the stub of a fix; it requires a correct implementation of the
> > method
> > org.apache.ignite.internal.processors.security.IgniteSecurityProcessor#authenticatedSubject
> > by GridSecurityProcessor:
> >
> > /** {@inheritDoc} */
> > @Override public OperationSecurityContext withContext(UUID nodeId) {
> >     try {
> >         SecurityContext ctx0 = secCtxs.get(nodeId);
> >
> >         if (ctx0 == null) {
> >             ClusterNode node = Optional.ofNullable(ctx.discovery().node(nodeId))
> >                 .orElseGet(() -> ctx.discovery().historicalNode(nodeId));
> >
> >             // This is a cluster node.
> >             if (node != null)
> >                 ctx0 = nodeSecurityContext(marsh, U.resolveClassLoader(ctx.config()), findNode(nodeId));
> >             else {
> >                 // This is an already authenticated thin client.
> >                 SecuritySubject subj = authenticatedSubject(nodeId);
> >
> >                 assert subj != null : "Subject is null " + nodeId;
> >
> >                 // Pass the known subject so that authenticate can rebuild the context.
> >                 AuthenticationContext actx = new AuthenticationContext();
> >                 actx.subject(subj);
> >
> >                 ctx0 = secPrc.authenticate(actx);
> >             }
> >         }
> >
> >         secCtxs.putIfAbsent(nodeId, ctx0);
> >
> >         return withContext(ctx0);
> >     } catch (IgniteCheckedException e) {
> >         throw new IgniteException(e);
> >     }
> > }
> >
> > The idea is to create a thin client SecurityContext on a node not having
> > a local context using existing SecuritySubject data.
> >
> > Method
> > org.apache.ignite.internal.processors.security.GridSecurityProcessor#authenticate
> > should check for a non-null SecuritySubject field and just recreate the
> > SecurityContext using the passed info (because it's already authenticated).
> >
> > We have all necessary information in the SecuritySubject returned by
> > org.apache.ignite.internal.processors.security.IgniteSecurityProcessor#authenticatedSubject
> > by GridSecurityProcessor method.
> >
> > Because it is internal API,  we may not care about compatibility at all,
> > but nevertheless it is possible
