[MTCGA]: new failures in builds [5716164] need to be handled

2020-11-06 Thread dpavlov . tasks
Hi Igniters,

 I've detected some new issues on TeamCity that need to be handled. You are more 
than welcome to help.

 If your changes could have led to these failure(s): we're grateful that you 
volunteered to contribute to this project, but things change and you 
may no longer be able to finalize your contribution.
 Could you respond to this email and indicate whether you wish to continue and fix 
the test failures, or step down so that a committer may revert your commit. 

 *New test failure in master-nightly 
IgnitePersistentStoreDataStructuresTest.testLatchVolatility 
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-2498689540135370176&branch=%3Cdefault%3E&tab=testDetails

 *New test failure in master-nightly 
IgnitePersistentStoreDataStructuresTest.testLockVolatility 
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=8536744125057342252&branch=%3Cdefault%3E&tab=testDetails

 *New test failure in master-nightly 
IgnitePersistentStoreDataStructuresTest.testSemaphoreVolatility 
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-8607160794826656046&branch=%3Cdefault%3E&tab=testDetails
 Changes that may have led to the failure were made by 
 - zstan  
https://ci.ignite.apache.org/viewModification.html?modId=909509

 - Here's a reminder of what contributors agreed to do 
https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute 
 - Should you have any questions please contact dev@ignite.apache.org 

Best Regards,
Apache Ignite TeamCity Bot 
https://github.com/apache/ignite-teamcity-bot
Notification generated at 06:52:42 07-11-2020 


[MTCGA]: new failures in builds [5709314] need to be handled

2020-11-06 Thread dpavlov . tasks
Hi Igniters,

 I've detected some new issues on TeamCity that need to be handled. You are more 
than welcome to help.

 *Test with high flaky rate in master WebSessionSelfTest.testRestarts 
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=6720374228021379378&branch=%3Cdefault%3E&tab=testDetails
 No changes in the build

 - Here's a reminder of what contributors agreed to do 
https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute 
 - Should you have any questions please contact dev@ignite.apache.org 

Best Regards,
Apache Ignite TeamCity Bot 
https://github.com/apache/ignite-teamcity-bot
Notification generated at 05:22:46 07-11-2020 


[jira] [Created] (IGNITE-13685) Flaky failure of FunctionalTest.testOptimitsticRepeatableReadUpdatesValue

2020-11-06 Thread Aleksey Plekhanov (Jira)
Aleksey Plekhanov created IGNITE-13685:
--

 Summary: Flaky failure of 
FunctionalTest.testOptimitsticRepeatableReadUpdatesValue
 Key: IGNITE-13685
 URL: https://issues.apache.org/jira/browse/IGNITE-13685
 Project: Ignite
  Issue Type: Bug
  Components: thin client
Reporter: Aleksey Plekhanov
Assignee: Aleksey Plekhanov


Test FunctionalTest.testOptimitsticRepeatableReadUpdatesValue is flaky.
Root cause: the {{get()}} method on {{ForkJoinTask}} can sometimes help with the 
execution of the task in the common {{ForkJoinPool}}, so the task is executed in the 
current thread, which already holds a transaction, and the {{cache.get()}} method 
returns a value relative to this transaction.
See the stack trace:

{noformat}
java.util.concurrent.ExecutionException: org.junit.ComparisonFailure: 
expected: but was:
  at java.base/java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1006)
  at 
org.apache.ignite.client.FunctionalTest.testOptimitsticRepeatableReadUpdatesValue(FunctionalTest.java:719)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
  at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
  at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
  at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
  at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
  at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.junit.ComparisonFailure: expected: but was:
  at org.junit.Assert.assertEquals(Assert.java:115)
  at org.junit.Assert.assertEquals(Assert.java:144)
  at 
org.apache.ignite.client.FunctionalTest.lambda$testOptimitsticRepeatableReadUpdatesValue$10(FunctionalTest.java:712)
  at 
java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1407)
  at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
  at 
java.base/java.util.concurrent.ForkJoinTask.tryExternalHelp(ForkJoinTask.java:381)
  at 
java.base/java.util.concurrent.ForkJoinTask.externalInterruptibleAwaitDone(ForkJoinTask.java:351)
  at java.base/java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1004)
  ... 13 more
{noformat}
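
For illustration, a minimal self-contained sketch of the pitfall and one possible 
fix (the class below is hypothetical, not the actual test code):

{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

public class ForkJoinHelpPitfall {
    public static void main(String[] args) throws Exception {
        // Problematic pattern: get() may execute the lambda in the calling
        // thread (see ForkJoinTask.tryExternalHelp in the trace above), so a
        // check that must run concurrently can observe the caller's state.
        ForkJoinTask<?> task = ForkJoinPool.commonPool().submit(() ->
            System.out.println("may run in: " + Thread.currentThread().getName()));
        task.get();

        // Safer pattern: a dedicated executor guarantees the check runs in
        // its own thread, outside the transaction-holding caller thread.
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            exec.submit(() ->
                System.out.println("always runs in: " + Thread.currentThread().getName())).get();
        } finally {
            exec.shutdown();
        }
    }
}
{noformat}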




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Ignite 3.0 development approach

2020-11-06 Thread Kseniya Romanova
Here are the slides from Alexey Goncharuk. Let's think this over and
continue on Monday:
https://go.gridgain.com/rs/491-TWR-806/images/Ignite_3_Plans_and_development_process.pdf

Thu, Nov 5, 2020 at 11:13, Anton Vinogradov :

> Folks,
>
> Should we perform cleanup work before the (r)evolutionary changes?
> My big proposal is to get rid of things which we don't need anyway
> - local caches,
> - strange tx modes,
> - code overcomplexity caused by the RollingUpgrade feature that never
> landed in AI,
> - etc.,
> before choosing the way.
>
> On Tue, Nov 3, 2020 at 3:31 PM Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
> > Ksenia, thanks for scheduling this on such short notice!
> >
> > As for the original topic, I do support Alexey's idea. We're not going to
> > rewrite anything from scratch, as most of the components are going to be
> > moved as-is or with minimal modifications. However, the changes that are
> > proposed imply serious rework of the core parts of the code, which are
> not
> > properly decoupled from each other and from other parts. This makes the
> > incremental approach borderline impossible. Developing in a new repo,
> > however, addresses this concern. As a bonus, we can also refactor the
> code,
> > introduce better decoupling, get rid of kernel context, and develop unit
> > tests (finally!).
> >
> > Basically, this proposal only affects the *process*, not the set of
> changes
> > we had discussed before. Ignite 3.0 is our unique chance to make things
> > right.
> >
> > -Val
> >
> > On Tue, Nov 3, 2020 at 3:06 AM Kseniya Romanova <
> romanova.ks@gmail.com
> > >
> > wrote:
> >
> > > Pavel, all the interesting points will be published here in English
> > > anyway (as the principle "if it's not on the dev list, it didn't
> > > happen" is still relevant). This is just a quick call for a group of
> > > developers. Later we can do a separate presentation of the idea and a
> > > discussion in English, as we did for the Ignite 3.0 draft of changes.
> > >
> > > Tue, Nov 3, 2020 at 13:52, Pavel Tupitsyn :
> > >
> > > > Kseniya,
> > > >
> > > > Thanks for scheduling this call.
> > > > Do you think we can switch to English if non-Russian speaking
> community
> > > > members decide to join?
> > > >
> > > > On Tue, Nov 3, 2020 at 1:32 PM Kseniya Romanova <
> > > romanova.ks@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Let's make this community discussion open. Here's the link to the
> > > > > Zoom call in Russian for Friday, 6 PM:
> > > > >
> https://www.meetup.com/Moscow-Apache-Ignite-Meetup/events/274360378/
> > > > >
> > > > > Tue, Nov 3, 2020 at 12:49, Nikolay Izhikov  >:
> > > > >
> > > > > > Time works for me.
> > > > > >
> > > > > > > On Nov 3, 2020, at 12:40, Alexey Goncharuk <
> > > > alexey.goncha...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > Nikolay,
> > > > > > >
> > > > > > > I am up for the call. I will try to explain my reasoning in
> > greater
> > > > > > detail
> > > > > > > and will be glad to hear the concerns. Will this Friday, Nov
> 6th,
> > > > work?
> > > > > > >
> > > > > > > Tue, Nov 3, 2020 at 10:09, Nikolay Izhikov <
> > nizhi...@apache.org
> > > >:
> > > > > > >
> > > > > > >> Igniters, should we have a call for this topic?
> > > > > > >>
> > > > > > >>> On Nov 2, 2020, at 18:53, Pavel Tupitsyn <
> ptupit...@apache.org
> > >
> > > > > > >> wrote:
> > > > > > >>>
> > > > > >  not intend to rewrite everything from scratch
> > > > > > >>>
> > > > > >  Every single test from Ignite 2.x should be moved to Ignite
> 3
> > > > > >  regardless of how we choose to proceed.
> > > > > > >>>
> > > > > > >>> Alexey, thank you for the explanation, this addresses all of
> my
> > > > > > concerns.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> On Mon, Nov 2, 2020 at 6:43 PM Andrey Mashenkov <
> > > > > > >> andrey.mashen...@gmail.com>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > >  Hi, Igniters.
> > > > > > 
> > > > > >  * AFAIU, we need a new repo if we want to apply different
> > > > > restrictions
> > > > > > >> to
> > > > > >  pull requests,
> > > > > >  otherwise I see no difference for myself.
> > > > > >  E.g. make static analysis (do we have?), compile, styles,
> and
> > > > > javadoc
> > > > > >  checks mandatory.
> > > > > > 
> > > > > >  I think that relaxed requirements here will lead to bad
> > product
> > > > > > quality.
> > > > > > 
> > > > > >  * Agree with Pavel, we should 'keep' the integration tests somehow.
> > > > > >  During active development tests will be broken most of the time, so
> > > > > >  I'd port them, e.g. suite by suite, once we have a stable and
> > > > > >  featured environment to run them, and of course make the tests' code
> > > > > >  clear and avoid bad/non-relevant ones.
> > > > > > 
> > > > > >  * I like bottom-up 

[DISCUSSION] Apache Ignite Release 2.10 (time, scope, manager)

2020-11-06 Thread Maxim Muzafarov
Igniters,


Let's finalize the discussion [1] about the upcoming major Apache
Ignite 2.10 release. The major improvements related to the proposed
release:
- Improvements for partition clearing related parts
- Add tracing of SQL queries.
- CPP: Implement Cluster API
- .NET: Thin client: Transactions
- .NET: Thin Client: Continuous Query
- Java Thin client Kubernetes discovery

etc.
Total: 166 RESOLVED issues [2].


Let's start the discussion about the time and scope; I also propose
myself as the release manager of Apache Ignite 2.10. If you'd like
to lead this release, please let us know; I see no problem in choosing
a better candidate.


Proposed release timeline:

Scope Freeze: December 10, 2020
Code Freeze: December 24, 2020
Voting Date: January 18, 2021
Release Date: January 25, 2021


Proposed release scope:
[2]


WDYT?


[1] 
http://apache-ignite-developers.2346864.n4.nabble.com/2-9-1-release-proposal-tp49769p49867.html
[2] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.10%20and%20status%20in%20(Resolved%2C%20Closed)%20and%20resolution%20%3D%20Fixed%20order%20by%20priority%20


[jira] [Created] (IGNITE-13684) Rewrite PageIo resolver from static to explicit dependency

2020-11-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13684:
--

 Summary: Rewrite PageIo resolver from static to explicit dependency
 Key: IGNITE-13684
 URL: https://issues.apache.org/jira/browse/IGNITE-13684
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Ivan Bessonov


Right now, Ignite has a static PageIo resolver, which does not allow substituting a 
different implementation when needed. The current implementation should be 
rewritten to make this dependency explicit.
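
A hedged sketch of the intended shape of the change; the type names below are 
illustrative stand-ins, not the real ignite-core classes:

{noformat}
// Hypothetical stand-in for the internal page IO type.
interface PageIo {
    int version();
}

// Explicit resolver abstraction instead of a static lookup.
interface PageIoResolver {
    PageIo resolve(long pageAddr);
}

// Before: a static call that cannot be substituted in tests:
//   PageIo io = PageIO.getPageIO(pageAddr);
//
// After: the resolver is an explicit constructor dependency.
class PageProcessor {
    private final PageIoResolver resolver;

    PageProcessor(PageIoResolver resolver) {
        this.resolver = resolver;
    }

    void process(long pageAddr) {
        PageIo io = resolver.resolve(pageAddr);
        System.out.println("Resolved PageIo version " + io.version());
    }
}
{noformat}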



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13683) Added MVCC validation to ValidateIndexesClosure

2020-11-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13683:
--

 Summary: Added MVCC validation to ValidateIndexesClosure
 Key: IGNITE-13683
 URL: https://issues.apache.org/jira/browse/IGNITE-13683
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Semyon Danilov


MVCC index validation should be added to ValidateIndexesClosure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13682) Added generic to maintenance mode feature

2020-11-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13682:
--

 Summary: Added generic to maintenance mode feature
 Key: IGNITE-13682
 URL: https://issues.apache.org/jira/browse/IGNITE-13682
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


MaintenanceAction has no generic type parameter right now, which leads to raw-type 
(unparameterized) usage problems.
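
A rough sketch of the proposed change (simplified; the actual interface may differ):

{noformat}
// Before: no type parameter, so call sites receive a raw Object result
// and need unchecked casts:
//   public interface MaintenanceAction { Object execute(); }

// After: the result type is carried by a generic parameter.
public interface MaintenanceAction<T> {
    /** Executes the maintenance action and returns a typed result. */
    T execute();

    /** Human-readable action name. */
    String name();
}
{noformat}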



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13681) Non markers checkpoint implementation

2020-11-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13681:
--

 Summary: Non markers checkpoint implementation
 Key: IGNITE-13681
 URL: https://issues.apache.org/jira/browse/IGNITE-13681
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


A new version of the checkpoint needs to be implemented that will be simpler than 
the current one. The main differences compared to the current checkpoint:
* It doesn't perform any write operations to the WAL.
* It doesn't create checkpoint markers.
* It should be possible to configure a checkpoint listener on an exact data 
region only.
This checkpoint will be helpful for defragmentation and for recovery (it is not 
possible to use the current checkpoint during recovery right now).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Why WAL archives enabled by default?

2020-11-06 Thread Ivan Daschinsky
Alex, thanks for pointing that out. Shame that I missed it.

Fri, Nov 6, 2020 at 13:45, Alex Plehanov :

> Guys,
>
> We already have FileWriteAheadLogManager#maxSegCountWithoutCheckpoint.
> A checkpoint is triggered if there are too many WAL segments without a checkpoint.
> Looks like you are talking about this feature.
>
> Fri, Nov 6, 2020 at 13:21, Ivan Daschinsky :
>
> > Kirill and I privately discussed the proposed approach. As far as I
> > understand, Kirill suggests implementing a heuristic
> > that forces a checkpoint in some cases when a user has misconfigured the
> > cluster by mistake, in order to preserve
> > the requested size of the WAL archive.
> > Currently, as for me, this approach is questionable, because it can cause
> > some performance problems. But as an option,
> > it can be used and should be switchable.
> >
> > Fri, Nov 6, 2020 at 12:36, Ivan Daschinsky :
> >
> > > Kirill, how will your approach help if a user has tuned the cluster to do
> > > checkpoints rarely under load?
> > > No way.
> > >
> > > Fri, Nov 6, 2020 at 12:19, ткаленко кирилл :
> > >
> > >> Ivan, I agree with you that the archive is primarily about
> optimization.
> > >>
> > >> If the size of the archive is critical for the user, we have no
> > >> protection against this, we can always go beyond this limit.
> > >> Thus, the user needs to remember this and configure it in some way.
> > >>
> > >> I suggest not to exceed this limit and give the expected behavior for
> > the
> > >> user. At the same time, the segments needed for recovery will remain
> and
> > >> there will be no data loss.
> > >>
> > >> 06.11.2020, 11:29, "Ivan Daschinsky" :
> > >> > Guys, first of all, archiving is not for PITR at all; this is an
> > >> > optimization.
> > >> > If we disable archiving, on every rollover we need to create a new
> > >> > file. If we enable archiving, we reserve 10 (by default) segments
> > >> > filled with zeroes.
> > >> > We use mmap by default, so with the no-archiver approach:
> > >> > 1. We first create a new empty file.
> > >> > 2. We call sun.nio.ch.FileChannelImpl#map on it; under the hood:
> > >> > a. If the file is shorter than the WAL segment size, it calls
> > >> > sun.nio.ch.FileDispatcherImpl#truncate0, which is under the hood just
> > >> > the truncate system call [1]
> > >> > b. Then it calls the mmap system call on this file via
> > >> > sun.nio.ch.FileChannelImpl#map0; under the hood, see [2]
> > >> > These manipulations are not free. So rollover will be much, much
> > >> > slower.
> > >> > If archiving is enabled, 10 segments are already preallocated at the
> > >> moment
> > >> > of node's start.
> > >> >
> > >> > When archiving is enabled, the archiver just copies the previous
> > >> > preallocated segment and moves it to the archive directory.
> > >> > This archived segment is crucial for recovery. When a new checkpoint
> > >> > finishes, all segments eligible for truncation are just removed.
> > >> >
> > >> > If archiving is disabled, we also write WAL segments to the wal
> > >> > directory, and disabling archiving doesn't prevent you from storing
> > >> > segments if they are required for recovery.
> > >> >
> > >> >>> Before increasing the size of WAL archive (transferring to archive
> > >> >
> > >> > /rollOver, compression, decompression), we can make sure that there
> > >> will be
> > >> > enough space in the archive and if there is no such, then we will
> try
> > to
> > >> >>> clean it. We cannot delete those segments that are required for
> > >> recovery
> > >> >
> > >> > (between the last two checkpoints) and reserved for example for
> > >> historical
> > >> > rebalancing.
> > >> > First of all, compression/decompression is offtopic here.
> > >> > Secondly, WAL segments are required only with an idx higher than the
> > >> > LAST checkpoint marker.
> > >> > Thirdly, archiving and rolling over can happen during a checkpoint,
> > >> > and we can break everything accidentally.
> > >> > Fourthly, I see no benefit in overcomplicating the already
> > >> > complicated logic.
> > >> > This is basically a problem of misunderstanding and tuning.
> > >> > There are a lot of similar topics for almost every DB. [3]
> > >> >
> > >> > [1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html
> > >> > [2] -- https://man7.org/linux/man-pages/man2/mmap.2.html
> > >> > [3] --
> > >> >
> > >>
> >
> https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device=pg+wal+no
> > >> >
> > >> > Fri, Nov 6, 2020 at 10:42, ткаленко кирилл  >:
> > >> >
> > >> >>  Hi, Ivan!
> > >> >>
> > >> >>  I have only described ideas. But here are a few more details.
> > >> >>
> > >> >>  We can take care not to go beyond
> > >> >>  DataStorageConfiguration#maxWalArchiveSize.
> > >> >>
> > >> >>  Before increasing the size of WAL archive (transferring to archive
> > >> >>  /rollOver, compression, decompression), we can make sure that
> there
> > >> will be
> > >> >>  enough space in the archive and if there is no such, then we will
> > try
> > >> to
> > >> >>  

Re: Why WAL archives enabled by default?

2020-11-06 Thread Alex Plehanov
Guys,

We already have FileWriteAheadLogManager#maxSegCountWithoutCheckpoint.
A checkpoint is triggered if there are too many WAL segments without a checkpoint.
Looks like you are talking about this feature.

Fri, Nov 6, 2020 at 13:21, Ivan Daschinsky :

> Kirill and I privately discussed the proposed approach. As far as I understand,
> Kirill suggests implementing a heuristic
> that forces a checkpoint in some cases when a user has misconfigured the
> cluster by mistake, in order to preserve
> the requested size of the WAL archive.
> Currently, as for me, this approach is questionable, because it can cause
> some performance problems. But as an option,
> it can be used and should be switchable.
>
> Fri, Nov 6, 2020 at 12:36, Ivan Daschinsky :
>
> > Kirill, how will your approach help if a user has tuned the cluster to do
> > checkpoints rarely under load?
> > No way.
> >
> > Fri, Nov 6, 2020 at 12:19, ткаленко кирилл :
> >
> >> Ivan, I agree with you that the archive is primarily about optimization.
> >>
> >> If the size of the archive is critical for the user, we have no
> >> protection against this, we can always go beyond this limit.
> >> Thus, the user needs to remember this and configure it in some way.
> >>
> >> I suggest not to exceed this limit and give the expected behavior for
> the
> >> user. At the same time, the segments needed for recovery will remain and
> >> there will be no data loss.
> >>
> >> 06.11.2020, 11:29, "Ivan Daschinsky" :
> >> > Guys, first of all, archiving is not for PITR at all; this is an
> >> > optimization.
> >> > If we disable archiving, on every rollover we need to create a new
> >> > file. If we enable archiving, we reserve 10 (by default) segments
> >> > filled with zeroes.
> >> > We use mmap by default, so with the no-archiver approach:
> >> > 1. We first create a new empty file.
> >> > 2. We call sun.nio.ch.FileChannelImpl#map on it; under the hood:
> >> > a. If the file is shorter than the WAL segment size, it calls
> >> > sun.nio.ch.FileDispatcherImpl#truncate0, which is under the hood just
> >> > the truncate system call [1]
> >> > b. Then it calls the mmap system call on this file via
> >> > sun.nio.ch.FileChannelImpl#map0; under the hood, see [2]
> >> > These manipulations are not free. So rollover will be much, much
> >> > slower.
> >> > If archiving is enabled, 10 segments are already preallocated at the
> >> moment
> >> > of node's start.
> >> >
> >> > When archiving is enabled, the archiver just copies the previous
> >> > preallocated segment and moves it to the archive directory.
> >> > This archived segment is crucial for recovery. When a new checkpoint
> >> > finishes, all segments eligible for truncation are just removed.
> >> >
> >> > If archiving is disabled, we also write WAL segments to the wal
> >> > directory, and disabling archiving doesn't prevent you from storing
> >> > segments if they are required for recovery.
> >> >
> >> >>> Before increasing the size of WAL archive (transferring to archive
> >> >
> >> > /rollOver, compression, decompression), we can make sure that there
> >> will be
> >> > enough space in the archive and if there is no such, then we will try
> to
> >> >>> clean it. We cannot delete those segments that are required for
> >> recovery
> >> >
> >> > (between the last two checkpoints) and reserved for example for
> >> historical
> >> > rebalancing.
> >> > First of all, compression/decompression is offtopic here.
> >> > Secondly, WAL segments are required only with an idx higher than the
> >> > LAST checkpoint marker.
> >> > Thirdly, archiving and rolling over can happen during a checkpoint,
> >> > and we can break everything accidentally.
> >> > Fourthly, I see no benefit in overcomplicating the already
> >> > complicated logic.
> >> > This is basically a problem of misunderstanding and tuning.
> >> > There are a lot of similar topics for almost every DB. [3]
> >> >
> >> > [1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html
> >> > [2] -- https://man7.org/linux/man-pages/man2/mmap.2.html
> >> > [3] --
> >> >
> >>
> https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device=pg+wal+no
> >> >
> >> > Fri, Nov 6, 2020 at 10:42, ткаленко кирилл :
> >> >
> >> >>  Hi, Ivan!
> >> >>
> >> >>  I have only described ideas. But here are a few more details.
> >> >>
> >> >>  We can take care not to go beyond
> >> >>  DataStorageConfiguration#maxWalArchiveSize.
> >> >>
> >> >>  Before increasing the size of WAL archive (transferring to archive
> >> >>  /rollOver, compression, decompression), we can make sure that there
> >> will be
> >> >>  enough space in the archive and if there is no such, then we will
> try
> >> to
> >> >>  clean it. We cannot delete those segments that are required for
> >> recovery
> >> >>  (between the last two checkpoints) and reserved for example for
> >> historical
> >> >>  rebalancing.
> >> >>
> >> >>  We can receive a notification about the change of checkpoints and
> the
> >> >>  reservation / release of segments, thus we can know how many
> segments
> 

Re: Why WAL archives enabled by default?

2020-11-06 Thread Ivan Daschinsky
Kirill and I privately discussed the proposed approach. As far as I understand,
Kirill suggests implementing a heuristic
that forces a checkpoint in some cases when a user has misconfigured the
cluster by mistake, in order to preserve
the requested size of the WAL archive.
Currently, as for me, this approach is questionable, because it can cause
some performance problems. But as an option,
it can be used and should be switchable.

Fri, Nov 6, 2020 at 12:36, Ivan Daschinsky :

> Kirill, how will your approach help if a user has tuned the cluster to do
> checkpoints rarely under load?
> No way.
>
> Fri, Nov 6, 2020 at 12:19, ткаленко кирилл :
>
>> Ivan, I agree with you that the archive is primarily about optimization.
>>
>> If the size of the archive is critical for the user, we have no
>> protection against this, we can always go beyond this limit.
>> Thus, the user needs to remember this and configure it in some way.
>>
>> I suggest not to exceed this limit and give the expected behavior for the
>> user. At the same time, the segments needed for recovery will remain and
>> there will be no data loss.
>>
>> 06.11.2020, 11:29, "Ivan Daschinsky" :
> >> > Guys, first of all, archiving is not for PITR at all; this is an
> >> > optimization.
> >> > If we disable archiving, on every rollover we need to create a new
> >> > file. If we enable archiving, we reserve 10 (by default) segments
> >> > filled with zeroes.
> >> > We use mmap by default, so with the no-archiver approach:
> >> > 1. We first create a new empty file.
> >> > 2. We call sun.nio.ch.FileChannelImpl#map on it; under the hood:
> >> > a. If the file is shorter than the WAL segment size, it calls
> >> > sun.nio.ch.FileDispatcherImpl#truncate0, which is under the hood just
> >> > the truncate system call [1]
> >> > b. Then it calls the mmap system call on this file via
> >> > sun.nio.ch.FileChannelImpl#map0; under the hood, see [2]
> >> > These manipulations are not free. So rollover will be much, much
> >> > slower.
>> > If archiving is enabled, 10 segments are already preallocated at the
>> moment
>> > of node's start.
>> >
> >> > When archiving is enabled, the archiver just copies the previous
> >> > preallocated segment and moves it to the archive directory.
> >> > This archived segment is crucial for recovery. When a new checkpoint
> >> > finishes, all segments eligible for truncation are just removed.
>> >
> >> > If archiving is disabled, we also write WAL segments to the wal
> >> > directory, and disabling archiving doesn't prevent you from storing
> >> > segments if they are required for recovery.
>> >
>> >>> Before increasing the size of WAL archive (transferring to archive
>> >
>> > /rollOver, compression, decompression), we can make sure that there
>> will be
>> > enough space in the archive and if there is no such, then we will try to
>> >>> clean it. We cannot delete those segments that are required for
>> recovery
>> >
>> > (between the last two checkpoints) and reserved for example for
>> historical
>> > rebalancing.
>> > First of all, compression/decompression is offtopic here.
> >> > Secondly, WAL segments are required only with an idx higher than the
> >> > LAST checkpoint marker.
> >> > Thirdly, archiving and rolling over can happen during a checkpoint,
> >> > and we can break everything accidentally.
> >> > Fourthly, I see no benefit in overcomplicating the already
> >> > complicated logic.
> >> > This is basically a problem of misunderstanding and tuning.
>> > There are a lot of similar topics for almost every DB. [3]
>> >
>> > [1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html
>> > [2] -- https://man7.org/linux/man-pages/man2/mmap.2.html
>> > [3] --
>> >
>> https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device=pg+wal+no
>> >
> >> > Fri, Nov 6, 2020 at 10:42, ткаленко кирилл :
>> >
>> >>  Hi, Ivan!
>> >>
>> >>  I have only described ideas. But here are a few more details.
>> >>
>> >>  We can take care not to go beyond
>> >>  DataStorageConfiguration#maxWalArchiveSize.
>> >>
>> >>  Before increasing the size of WAL archive (transferring to archive
>> >>  /rollOver, compression, decompression), we can make sure that there
>> will be
>> >>  enough space in the archive and if there is no such, then we will try
>> to
>> >>  clean it. We cannot delete those segments that are required for
>> recovery
>> >>  (between the last two checkpoints) and reserved for example for
>> historical
>> >>  rebalancing.
>> >>
>> >>  We can receive a notification about the change of checkpoints and the
>> >>  reservation / release of segments, thus we can know how many segments
>> we
>> >>  can delete right now.
>> >>
>> >>  06.11.2020, 09:53, "Ivan Daschinsky" :
>> >>  >>> For example, when trying to move a segment to the archive.
>> >>  >
> >>  > We cannot do this, we will lose data. We can truncate archived
>> segment if
>> >>  > and only if it is not required for recovery. If last checkpoint
>> marker
>> >>  > points to segment
>> >>  > with lower index, we cannot delete any segment with higher index.
>> So the
>> >>  > only moment where we can remove 

Re: [DISCUSS] Disable socket linger by default in TCP discovery SPI.

2020-11-06 Thread Steshin Vladimir

    The tickets are: [1] disables linger by default and [2] is the doc.


[1] https://issues.apache.org/jira/browse/IGNITE-13643

[2] https://issues.apache.org/jira/browse/IGNITE-13662

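
For reference, a minimal sketch of setting the linger value explicitly (zero is 
the default proposed in [1]; a positive number of seconds restores the old 
blocking close):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

public class LingerConfigExample {
    public static void main(String[] args) {
        // Zero disables SO_LINGER waiting on socket close.
        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        discoSpi.setSoLinger(0);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discoSpi);

        Ignition.start(cfg);
    }
}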
On 05.11.2020 11:00, Anton Vinogradov wrote:

Folks,
It seems we've got an agreement that the fix is necessary.
Do we need to do anything except the following?

zero linger as default + a warning when SSL is enabled on a JVM from before the fix +

a warning in the documentation + migration notes

On Tue, Nov 3, 2020 at 2:38 PM Steshin Vladimir  wrote:


  Ilya, hi.


  Of course: the /TcpDiscoverySpi.setSoLinger(int)/ property. It has always been there.


On 02.11.2020 20:14, Ilya Kasnacheev wrote:

Hello!

Is there any option to re-enable linger on SSL sockets?

Telling people to re-configure does not help if they can't.

Regards,


Re: delete is too slow, sometimes even causes OOM

2020-11-06 Thread Юрий
Hi Frank!

There is an old ticket [1] - we will try to prioritize it to be finished before
the end of the year; it should prevent OOM in most cases.

[1] https://issues.apache.org/jira/browse/IGNITE-9182

Tue, Nov 3, 2020 at 18:53, frank li :

> Current code logic for DELETE is as follows:
> if the WHERE clause contains a condition like "key=xxx", it uses fastUpdate,
> which removes the related item directly.
>
> else
> do select for update;
> for each row, call closure code "RMV" to remove it.
>
> 1. As "executeSelectForDml" gets the _KEY and _VAL columns for all candidate
> rows, it often causes OOM when there is a lot of data to delete. Why do
> we verify "val" during the remove operation?
>
> 2. After selection, why don't we just remove it with cache.remove as
> fastUpdate does?
>
>
>
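
For illustration, a hedged sketch of the two paths described above via the 
public SQL API; the table, cache name, and values are made up:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class DeletePathsExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Assumes a cache with a SQL table Person already exists.
            IgniteCache<Integer, String> cache = ignite.cache("PersonCache");

            // Fast path: key-equality predicate, handled by fastUpdate
            // (a direct remove, no preliminary SELECT).
            cache.query(new SqlFieldsQuery(
                "DELETE FROM Person WHERE _key = ?").setArgs(1)).getAll();

            // Slow path: arbitrary predicate; candidate rows (_KEY and _VAL)
            // are selected first and removed one by one -- the part that can
            // exhaust memory on large result sets.
            cache.query(new SqlFieldsQuery(
                "DELETE FROM Person WHERE age > ?").setArgs(60)).getAll();
        }
    }
}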

-- 
Live with a smile! :D


Re: Why WAL archives enabled by default?

2020-11-06 Thread Ivan Daschinsky
Kirill, how will your approach help if a user has tuned the cluster to do
checkpoints rarely under load?
No way.

Fri, Nov 6, 2020 at 12:19, ткаленко кирилл :

> Ivan, I agree with you that the archive is primarily about optimization.
>
> If the size of the archive is critical for the user, we have no protection
> against this, we can always go beyond this limit.
> Thus, the user needs to remember this and configure it in some way.
>
> I suggest not to exceed this limit and give the expected behavior for the
> user. At the same time, the segments needed for recovery will remain and
> there will be no data loss.
>
> 06.11.2020, 11:29, "Ivan Daschinsky" :
> > Guys, first of all, archiving is not for PITR at all; this is an
> > optimization.
> > If we disable archiving, on every rollover we need to create a new file.
> > If we enable archiving, we reserve 10 (by default) segments filled with zeroes.
> > We use mmap by default, so with the no-archiver approach:
> > 1. We first create a new empty file.
> > 2. We call sun.nio.ch.FileChannelImpl#map on it; under the hood:
> > a. If the file is shorter than the WAL segment size, it calls
> > sun.nio.ch.FileDispatcherImpl#truncate0, which is under the hood just
> > the truncate system call [1]
> > b. Then it calls the mmap system call on this file via
> > sun.nio.ch.FileChannelImpl#map0; under the hood, see [2]
> > These manipulations are not free. So rollover will be much, much
> > slower.
> > If archiving is enabled, 10 segments are already preallocated at the
> moment
> > of node's start.
> >
> > When archiving is enabled, the archiver just copies the previous
> > preallocated segment and moves it to the archive directory.
> > This archived segment is crucial for recovery. When a new checkpoint
> > finishes, all segments eligible for truncation are just removed.
> >
> > If archiving is disabled, we also write WAL segments to the wal directory,
> > and disabling archiving doesn't prevent you from storing segments if they
> > are required for recovery.
> >
> >>> Before increasing the size of WAL archive (transferring to archive
> >
> > /rollOver, compression, decompression), we can make sure that there will
> be
> > enough space in the archive and if there is no such, then we will try to
> >>> clean it. We cannot delete those segments that are required for
> recovery
> >
> > (between the last two checkpoints) and reserved for example for
> historical
> > rebalancing.
> > First of all, compression/decompression is offtopic here.
> > Secondly, WAL segments are required only with an idx higher than the LAST
> > checkpoint marker.
> > Thirdly, archiving and rolling over can happen during a checkpoint, and we
> > can break everything accidentally.
> > Fourthly, I see no benefit in overcomplicating the already complicated logic.
> > This is basically a problem of misunderstanding and tuning.
> > There are a lot of similar topics for almost every DB. [3]
> >
> > [1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html
> > [2] -- https://man7.org/linux/man-pages/man2/mmap.2.html
> > [3] --
> >
> https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device=pg+wal+no
> >
> > Fri, Nov 6, 2020 at 10:42, ткаленко кирилл :
> >
> >>  Hi, Ivan!
> >>
> >>  I have only described ideas. But here are a few more details.
> >>
> >>  We can take care not to go beyond
> >>  DataStorageConfiguration#maxWalArchiveSize.
> >>
> >>  Before increasing the size of WAL archive (transferring to archive
> >>  /rollOver, compression, decompression), we can make sure that there
> will be
> >>  enough space in the archive and if there is no such, then we will try
> to
> >>  clean it. We cannot delete those segments that are required for
> recovery
> >>  (between the last two checkpoints) and reserved for example for
> historical
> >>  rebalancing.
> >>
> >>  We can receive a notification about the change of checkpoints and the
> >>  reservation / release of segments, thus we can know how many segments
> we
> >>  can delete right now.
> >>
> >>  06.11.2020, 09:53, "Ivan Daschinsky" :
> >>  >>> For example, when trying to move a segment to the archive.
> >>  >
> >>  > We cannot do this, we will lose data. We can truncate archived
> segment if
> >>  > and only if it is not required for recovery. If last checkpoint
> marker
> >>  > points to segment
> >>  > with lower index, we cannot delete any segment with higher index. So
> the
> >>  > only moment when we can remove (truncate) segments is at the finish of a
> >>  checkpoint.
> >>  >
> >>  > Fri, Nov 6, 2020 at 09:46, ткаленко кирилл :
> >>  >
> >>  >> Hello, everybody!
> >>  >>
> >>  >> As far as I know, the WAL archive is used for PITR (a GridGain feature) and
> >>  >> historical rebalancing.
> >>  >>
> >>  >> Facundo seems to have a problem with running out of directory
> >>  >> (/opt/work/walarchive) space.
> >>  >> Currently, WAL archive is cleared at the end of checkpoint.
> Potentially
> >>  >> long transaction may prevent checkpoint starting, thereby not
> cleaning
> >>  WAL
> >>  >> archive, 

Re: Why WAL archives enabled by default?

2020-11-06 Thread ткаленко кирилл
Ivan, I agree with you that the archive is primarily about optimization.

If the size of the archive is critical for the user, we currently have no 
protection against exceeding it; we can always go beyond this limit.
Thus, the user needs to remember this and configure it in some way. 

I suggest not exceeding this limit and giving the user the expected behavior. 
At the same time, the segments needed for recovery will remain, and there will 
be no data loss.
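
For reference, a minimal sketch of the knobs under discussion (values are 
illustrative only):

import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WalArchiveConfigExample {
    public static void main(String[] args) {
        DataStorageConfiguration dsCfg = new DataStorageConfiguration();

        // The limit discussed above; today it can be exceeded if checkpoints
        // are infrequent.
        dsCfg.setMaxWalArchiveSize(4L * 1024 * 1024 * 1024); // 4 GB

        dsCfg.setWalSegmentSize(64 * 1024 * 1024); // 64 MB per segment
        dsCfg.setWalSegments(10);                  // preallocated work segments

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(dsCfg);
    }
}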

06.11.2020, 11:29, "Ivan Daschinsky" :
> Guys, first of all, archiving is not for PITR at all; this is an optimization.
> If we disable archiving, on every rollover we need to create a new file. If we
> enable archiving, we reserve 10 (by default) segments filled with zeroes.
> We use mmap by default, so with the no-archiver approach:
> 1. We first create a new empty file.
> 2. We call sun.nio.ch.FileChannelImpl#map on it; under the hood:
> a. If the file is shorter than the WAL segment size, it calls
> sun.nio.ch.FileDispatcherImpl#truncate0, which is under the hood just
> the truncate system call [1]
> b. Then it calls the mmap system call on this file via
> sun.nio.ch.FileChannelImpl#map0; under the hood, see [2]
> These manipulations are not free. So rollover will be much, much
> slower.
> If archiving is enabled, 10 segments are already preallocated at the moment
> of node's start.
>
> When archiving is enabled, the archiver just copies the previous preallocated
> segment and moves it to the archive directory.
> This archived segment is crucial for recovery. When a new checkpoint
> finishes, all segments eligible for truncation are just removed.
>
> If archiving is disabled, we also write WAL segments to the wal directory, and
> disabling archiving doesn't prevent you from storing segments if they are
> required for recovery.
>
>>> Before increasing the size of WAL archive (transferring to archive
>
> /rollOver, compression, decompression), we can make sure that there will be
> enough space in the archive and if there is no such, then we will try to
>>> clean it. We cannot delete those segments that are required for recovery
>
> (between the last two checkpoints) and reserved for example for historical
> rebalancing.
> First of all, compression/decompression is offtopic here.
> Secondly, WAL segments are required only with an idx higher than the LAST
> checkpoint marker.
> Thirdly, archiving and rolling over can happen during a checkpoint, and we can
> break everything accidentally.
> Fourthly, I see no benefit in overcomplicating the already complicated logic.
> This is basically a problem of misunderstanding and tuning.
> There are a lot of similar topics for almost every DB. [3]
>
> [1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html
> [2] -- https://man7.org/linux/man-pages/man2/mmap.2.html
> [3] --
> https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device=pg+wal+no
>
> Fri, Nov 6, 2020 at 10:42, ткаленко кирилл :
>
>>  Hi, Ivan!
>>
>>  I have only described ideas. But here are a few more details.
>>
>>  We can take care not to go beyond
>>  DataStorageConfiguration#maxWalArchiveSize.
>>
>>  Before increasing the size of WAL archive (transferring to archive
>>  /rollOver, compression, decompression), we can make sure that there will be
>>  enough space in the archive and if there is no such, then we will try to
>>  clean it. We cannot delete those segments that are required for recovery
>>  (between the last two checkpoints) and reserved for example for historical
>>  rebalancing.
>>
>>  We can receive a notification about the change of checkpoints and the
>>  reservation / release of segments, thus we can know how many segments we
>>  can delete right now.
>>
>>  06.11.2020, 09:53, "Ivan Daschinsky" :
>>  >>> For example, when trying to move a segment to the archive.
>>  >
>>  > We cannot do this, we will lose data. We can truncate archived segment if
>>  > and only if it is not required for recovery. If last checkpoint marker
>>  > points to segment
>>  > with lower index, we cannot delete any segment with higher index. So the
>>  > only moment when we can remove (truncate) segments is at the finish of a
>>  checkpoint.
>>  >
>>  > Fri, Nov 6, 2020 at 09:46, ткаленко кирилл :
>>  >
>>  >> Hello, everybody!
>>  >>
>>  >> As far as I know, the WAL archive is used for PITR (a GridGain feature) and
>>  >> historical rebalancing.
>>  >>
>>  >> Facundo seems to have a problem with running out of directory
>>  >> (/opt/work/walarchive) space.
>>  >> Currently, WAL archive is cleared at the end of checkpoint. Potentially
>>  >> long transaction may prevent checkpoint starting, thereby not cleaning
>>  WAL
>>  >> archive, which will lead to such an error.
>>  >> At the moment, I see such a WA to increase size of directory
>>  >> (/opt/work/walarchive) in k8s and avoid long transactions or something
>>  like
>>  >> that that modifies data and runs for a long time.
>>  >>
>>  >> And it is best to fix the logic of working with WAL archive. I think we
>>  >> should remove WAL archive cleanup from the end of 

Re: Why WAL archives enabled by default?

2020-11-06 Thread Ivan Daschinsky
Guys, first of all, archiving is not for PITR at all; this is an optimization.
If we disable archiving, on every rollover we need to create a new file. If we
enable archiving, we reserve 10 (by default) segments filled with zeroes.
We use mmap by default, so with the no-archiver approach:
1. We first create a new empty file.
2. We call sun.nio.ch.FileChannelImpl#map on it; under the hood:
a. If the file is shorter than the WAL segment size, it calls
sun.nio.ch.FileDispatcherImpl#truncate0, which is under the hood just
the truncate system call [1]
b. Then it calls the mmap system call on this file via
sun.nio.ch.FileChannelImpl#map0; under the hood, see [2]
These manipulations are not free. So rollover will be much, much
slower.
If archiving is enabled, 10 segments are already preallocated at the moment
of the node's start.

When archiving is enabled, the archiver just copies the previous preallocated
segment and moves it to the archive directory.
This archived segment is crucial for recovery. When a new checkpoint
finishes, all segments eligible for truncation are just removed.

If archiving is disabled, we also write WAL segments to the wal directory, and
disabling archiving doesn't prevent you from storing segments if they are
required for recovery.

>> Before increasing the size of WAL archive (transferring to archive
>> /rollOver, compression, decompression), we can make sure that there will be
>> enough space in the archive and if there is no such, then we will try to
>> clean it. We cannot delete those segments that are required for recovery
>> (between the last two checkpoints) and reserved for example for historical
>> rebalancing.
First of all, compression/decompression is off-topic here.
Secondly, WAL segments are required only with an idx higher than the LAST
checkpoint marker.
Thirdly, archiving and rolling over can happen during a checkpoint, and we can
break everything accidentally.
Fourthly, I see no benefit in overcomplicating the already complicated logic.
This is basically a problem of misunderstanding and tuning.
There are a lot of similar topics for almost every DB. [3]



[1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html
[2] -- https://man7.org/linux/man-pages/man2/mmap.2.html
[3] --
https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device=pg+wal+no
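
For illustration, a hedged sketch (hypothetical helper, not Ignite code) of the
JDK mechanics described above: mapping a READ_WRITE region larger than the file
first extends the file to the requested size (the truncate) and then mmaps it.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SegmentMapExample {
    static final int SEGMENT_SIZE = 64 * 1024 * 1024; // 64 MB, illustrative

    static MappedByteBuffer mapSegment(Path segment) throws IOException {
        try (FileChannel ch = FileChannel.open(segment,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // For a fresh (shorter) file, map() grows the file to SEGMENT_SIZE
            // (the truncate call) before performing the mmap itself; with
            // preallocated segments both steps are already paid for.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, SEGMENT_SIZE);
        }
    }
}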

Fri, Nov 6, 2020 at 10:42, ткаленко кирилл :

> Hi, Ivan!
>
> I have only described ideas. But here are a few more details.
>
> We can take care not to go beyond
> DataStorageConfiguration#maxWalArchiveSize.
>
> Before increasing the size of WAL archive (transferring to archive
> /rollOver, compression, decompression), we can make sure that there will be
> enough space in the archive and if there is no such, then we will try to
> clean it. We cannot delete those segments that are required for recovery
> (between the last two checkpoints) and reserved for example for historical
> rebalancing.
>
> We can receive a notification about the change of checkpoints and the
> reservation / release of segments, thus we can know how many segments we
> can delete right now.
>
> 06.11.2020, 09:53, "Ivan Daschinsky" :
> >>>  For example, when trying to move a segment to the archive.
> >
> > We cannot do this, we will lose data. We can truncate archived segment if
> > and only if it is not required for recovery. If last checkpoint marker
> > points to segment
> > with lower index, we cannot delete any segment with higher index. So the
> > only moment when we can remove (truncate) segments is at the finish of a
> checkpoint.
> >
> > Fri, Nov 6, 2020 at 09:46, ткаленко кирилл :
> >
> >>  Hello, everybody!
> >>
> >>  As far as I know, the WAL archive is used for PITR (a GridGain feature) and
> >>  historical rebalancing.
> >>
> >>  Facundo seems to have a problem with running out of directory
> >>  (/opt/work/walarchive) space.
> >>  Currently, WAL archive is cleared at the end of checkpoint. Potentially
> >>  long transaction may prevent checkpoint starting, thereby not cleaning
> WAL
> >>  archive, which will lead to such an error.
> >>  At the moment, I see such a WA to increase size of directory
> >>  (/opt/work/walarchive) in k8s and avoid long transactions or something
> like
> >>  that that modifies data and runs for a long time.
> >>
> >>  And it is best to fix the logic of working with WAL archive. I think we
> >>  should remove WAL archive cleanup from the end of the checkpoint and
> do it
> >>  on demand. For example, when trying to move a segment to the archive.
> >>
> >>  06.11.2020, 01:58, "Denis Magda" :
> >>  > Folks,
> >>  >
> >>  > In my understanding, you need the archives only for features such as
> >>  PITR.
> >>  > Considering that the PITR functionality is not provided in Ignite,
> why do
> >>  > we have the archives enabled by default?
> >>  >
> >>  > How about having this feature disabled by default to prevent the
> >>  following
> >>  > issues experienced by our users:
> >>  >
> >>
> http://apache-ignite-users.70518.x6.nabble.com/WAL-and-WAL-Archive-volume-size-recommendation-td34458.html