Re: Why WAL archives enabled by default?

2020-11-09 Thread ткаленко кирилл
Hello guys again!

Does anyone know why we do any calculation in
IgniteUtils#adjustedWalHistorySize at all?
Wouldn't it be easier to always take
DataStorageConfiguration#maxWalArchiveSize? It seems that the user can easily
get the same result himself by changing the value by 1 byte.

06.11.2020, 13:56, "Ivan Daschinsky" :
> Alex, thanks for pointing that out. Shame that I missed it.
>
> Fri, Nov 6, 2020 at 13:45, Alex Plehanov :
>
>>  Guys,
>>
>>  We already have FileWriteAheadLogManager#maxSegCountWithoutCheckpoint.
>>  A checkpoint is triggered if there are too many WAL segments without a
>>  checkpoint. It looks like you are talking about this feature.
>>
>>  Fri, Nov 6, 2020 at 13:21, Ivan Daschinsky :
>>
>>  > Kirill and I privately discussed the proposed approach. As far as I
>>  > understand, Kirill suggests implementing a heuristic that forces a
>>  > checkpoint in some cases, if the user has misconfigured the cluster by
>>  > mistake, in order to preserve the requested size of the WAL archive.
>>  > Currently, as for me, this approach is questionable, because it can cause
>>  > some performance problems. But as an option it can be used, and it
>>  > should be switchable.
>>  >
>>  > Fri, Nov 6, 2020 at 12:36, Ivan Daschinsky :
>>  >
>>  > > Kirill, how will your approach help if the user tuned the cluster to do
>>  > > checkpoints rarely under load?
>>  > > No way.
>>  > >
>>  > > Fri, Nov 6, 2020 at 12:19, ткаленко кирилл :
>>  > >
>>  > >> Ivan, I agree with you that the archive is primarily about
>>  > >> optimization.
>>  > >>
>>  > >> If the size of the archive is critical for the user, we have no
>>  > >> protection against exceeding it: we can always go beyond this limit.
>>  > >> Thus, the user needs to remember this and configure around it in some way.
>>  > >>
>>  > >> I suggest that we never exceed this limit, which gives the user the
>>  > >> expected behavior. At the same time, the segments needed for recovery
>>  > >> will remain and there will be no data loss.
>>  > >>
>>  > >> 06.11.2020, 11:29, "Ivan Daschinsky" :
>>  > >> > Guys, first of all, archiving is not for PITR at all; it is an
>>  > >> > optimization. If we disable archiving, we need to create a new file on
>>  > >> > every rollover. If we enable archiving, we reserve 10 (by default)
>>  > >> > segments filled with zeroes.
>>  > >> > We use mmap by default, so with the no-archiver approach:
>>  > >> > 1. We first create a new empty file.
>>  > >> > 2. We call sun.nio.ch.FileChannelImpl#map on it, which under the hood:
>>  > >> > a. If the file is shorter than the WAL segment size, calls
>>  > >> > sun.nio.ch.FileDispatcherImpl#truncate0, which under the hood is just
>>  > >> > the truncate system call [1].
>>  > >> > b. Then calls the mmap system call on this file via
>>  > >> > sun.nio.ch.FileChannelImpl#map0; under the hood see [2].
>>  > >> > These manipulations are neither free nor cheap, so rollover will be
>>  > >> > much slower.
>>  > >> > If archiving is enabled, 10 segments are already preallocated at the
>>  > >> > moment the node starts.
>>  > >> >
>>  > >> > When archiving is enabled, the archiver just copies the previous
>>  > >> > preallocated segment and moves it to the archive directory.
>>  > >> > This archived segment is crucial for recovery. When new checkpoints
>>  > >> > finish, all segments eligible for truncation are simply removed.
>>  > >> >
>>  > >> > If archiving is disabled, we also write WAL segments to the wal
>>  > >> > directory, and disabling archiving doesn't prevent segments from being
>>  > >> > stored if they are required for recovery.
>>  > >> >
>>  > >> > >>> Before increasing the size of WAL archive (transferring to
>>  > >> > >>> archive/rollOver, compression, decompression), we can make sure that
>>  > >> > >>> there will be enough space in the archive and if there is no such,
>>  > >> > >>> then we will try to clean it. We cannot delete those segments that
>>  > >> > >>> are required for recovery (between the last two checkpoints) and
>>  > >> > >>> reserved for example for historical rebalancing.
>>  > >> >
>>  > >> > First of all, compression/decompression is off-topic here.
>>  > >> > Secondly, only WAL segments with an idx higher than the LAST
>>  > >> > checkpoint marker are required.
>>  > >> > Thirdly, archiving and rollover can happen during a checkpoint, and we
>>  > >> > can accidentally break everything.
>>  > >> > Fourthly, I see no benefit in overcomplicating already complicated
>>  > >> > logic.
>>  > >> > This is basically a problem of misunderstanding and tuning.
>>  > >> > There are a lot of similar topics for almost every DB. [3]
>>  > >> > There are a lot of similar topics for almost every DB. [3]
>>  > >> >
>>  > >> > [1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html
>>  > >> > [2] -- https://man7.org/linux/man-pages/man2/mmap.2.html
>>  > >> > [3] --
>>  > >> > https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device=pg+wal+no
>>  > >> >
>>  > >> > Fri, Nov 6
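
The no-archiver rollover steps that Ivan lists in the quoted message above (create an empty file, extend it to the segment size, mmap it) can be sketched with the public NIO API. This is only an illustration, not Ignite's code: the class name, the method name, the file name and the 64 KiB size are hypothetical, and the file is extended explicitly with RandomAccessFile#setLength instead of relying on what FileChannel#map does internally.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

class SegmentMmapSketch {
    /** Creates a new zero-filled file of segmentSize bytes and maps it read-write. */
    static MappedByteBuffer createAndMap(Path file, int segmentSize) throws IOException {
        // Step 1: create a new empty file (fails if it already exists).
        Files.createFile(file);

        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // Step 2a: extend the file to the segment size.
            raf.setLength(segmentSize);

            // Step 2b: map the whole file; the mapping stays valid after the channel is closed.
            return raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, segmentSize);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wal-sketch");
        Path segment = dir.resolve("0000000000000000.wal"); // hypothetical segment name

        MappedByteBuffer buf = createAndMap(segment, 64 * 1024);
        buf.put(0, (byte) 1); // write through the mapping

        System.out.println("file size = " + Files.size(segment) + ", mapped = " + buf.capacity());
    }
}
```

Every call to createAndMap pays for a file creation, a resize and a mapping, which is the per-rollover cost the message argues preallocated segments avoid.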

[jira] [Created] (IGNITE-13689) Extend test coverage [IGNITE-11512] Add counter left partition for index rebuild in CacheGroupMetricsMXBean

2020-11-09 Thread Alexand Polyakov (Jira)
Alexand Polyakov created IGNITE-13689:
-

             Summary: Extend test coverage [IGNITE-11512] Add counter left partition for index rebuild in CacheGroupMetricsMXBean
                 Key: IGNITE-13689
                 URL: https://issues.apache.org/jira/browse/IGNITE-13689
             Project: Ignite
          Issue Type: Improvement
          Components: general
    Affects Versions: 2.9
            Reporter: Alexand Polyakov
            Assignee: Alexand Polyakov


New test:
Partial rebuild
# Start the cluster, load data with indexes.
# Kill a single node, create a new index, start the node.
# Make sure that the index rebuild count is between 0 and the total new index
size, and is decreasing.
# Wait until the rebuild has finished, assert that there are no index errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: IGNITE-12951 Update documents for migrated extensions

2020-11-09 Thread Saikat Maitra
Hi Nikolay,

Thank you for reviewing the changes. I have made changes only in the
README.txt file and updated the artifactId so that it matches with pom.xml.

I have also updated the version details in the PR so that it matches with
our release version.

https://github.com/apache/ignite-extensions/pull/30

With respect to naming convention I was thinking we would not be required
to change the groupId and we will only change the artifactId similar to
spring-boot-autoconfigure-ext module.

https://github.com/apache/ignite-extensions/blob/master/modules/spring-boot-autoconfigure-ext/pom.xml#L33

Please review and let me know your thoughts.

Regards,
Saikat




On Mon, Nov 9, 2020 at 12:32 AM Nikolay Izhikov  wrote:

> Hello, Saikat.
>
> As far as I can see, you changed the artifactId of the extensions, not only
> the docs.
>
> Do we have an agreement to name each extension as «{extension-name}-ext»
> like «ignite-camel-ext» or similar?
> We haven't had many extension releases, so maybe it would be better to have
> naming like
>
> groupId=org.apache.ignite.extensions
> artifactId=ignite-camel
>
> What do you think?
>
>
> > On Nov 9, 2020, at 05:36, Saikat Maitra wrote:
> >
> > Hi,
> >
> > I have raised a PR for the following issue.
> >
> > Jira : https://issues.apache.org/jira/browse/IGNITE-12951
> > PR : https://github.com/apache/ignite-extensions/pull/30
> >
> > This is an initial PR in Ignite Extensions repo. I will work on another PR
> > in ignite repo for the remaining changes.
> >
> > Regards,
> > Saikat
>
>


Re: delete is too slow, sometimes even causes OOM

2020-11-09 Thread Denis Magda
Frank,

The ticket doesn't suggest the lazy flag as a workaround. The flag is
supposed to be used to address the performance issue.

How about a workaround on your application side while you're waiting for
this improvement?

   - Query all the records for a deletion - "SELECT record_primary_key
   WHERE delete_condition"
   - Delete the records using the key-value API -
   cache.removeAll(all_primary_keys).
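
A minimal sketch of the workaround above. To keep it runnable without a cluster, a plain java.util.Map stands in for the Ignite cache, and the list of even keys stands in for the result of the SELECT; in a real application each batch would be passed to cache.removeAll. Splitting the keys into batches is an addition that is not part of the suggestion itself, and the class name, method name and batch size are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Consumer;

class BatchedDeleteSketch {
    /** Hands keys to removeAll in batches of at most batchSize; returns how many keys were passed on. */
    static <K> int deleteInBatches(Iterator<K> keys, int batchSize, Consumer<Set<K>> removeAll) {
        if (batchSize <= 0)
            throw new IllegalArgumentException("batchSize must be positive");

        Set<K> batch = new HashSet<>();
        int total = 0;

        while (keys.hasNext()) {
            batch.add(keys.next());

            if (batch.size() == batchSize) {
                removeAll.accept(batch);
                total += batch.size();
                batch = new HashSet<>();
            }
        }

        // Flush the last, possibly smaller, batch.
        if (!batch.isEmpty()) {
            removeAll.accept(batch);
            total += batch.size();
        }

        return total;
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new TreeMap<>(); // stand-in for the cache
        for (int i = 0; i < 10; i++)
            cache.put(i, "v" + i);

        // Stand-in for "SELECT record_primary_key WHERE delete_condition": even keys only.
        List<Integer> keys = new ArrayList<>();
        for (Integer k : cache.keySet())
            if (k % 2 == 0)
                keys.add(k);

        int removed = deleteInBatches(keys.iterator(), 2, batch -> cache.keySet().removeAll(batch));
        System.out.println("removed=" + removed + ", left=" + cache.size());
    }
}
```

Only keys are held in memory here, never values, which is the point of selecting the primary key alone.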

-
Denis


On Mon, Nov 9, 2020 at 8:20 AM frank li  wrote:

> I forced the lazy flag in the DELETE code for testing, but it is still
> running very slowly. I mean that the "Lazy" flag cannot solve the problem of
> running too slowly.
>
> On 2020/11/06 09:50:15, Юрий  wrote:
> > Hi Frank!
> >
> > There is an old ticket [1]. We will try to prioritize it and finish it
> > before the end of the year; it should prevent OOM in most cases.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-9182
> >
> > Tue, Nov 3, 2020 at 18:53, frank li :
> >
> > > The current code logic for DELETE is as follows:
> > > if the WHERE clause contains a condition such as "key=xxx", it uses
> > > fastUpdate, which removes the related item directly.
> > >
> > > else
> > > do select for update;
> > > for each row, call closure code "RMV" to remove it.
> > >
> > > 1. As "executeSelectForDml" gets the _KEY and _VAL columns for all
> > > candidate rows, it often causes OOM when there is a lot of data to
> > > delete. Why do we verify "val" during the remove operation?
> > >
> > > 2. After selection, why don't we just remove it with cache.remove as
> > > fastUpdate does?
> > >
> > >
> > >
> >
> > --
> > Live with a smile! :D
> >
>


Re: delete is too slow, sometimes even causes OOM

2020-11-09 Thread frank li
I forced the lazy flag in the DELETE code for testing, but it is still running
very slowly. I mean that the "Lazy" flag cannot solve the problem of running
too slowly.

On 2020/11/06 09:50:15, Юрий  wrote: 
> Hi Frank!
> 
> There is an old ticket [1]. We will try to prioritize it and finish it before
> the end of the year; it should prevent OOM in most cases.
> 
> [1] https://issues.apache.org/jira/browse/IGNITE-9182
> 
> Tue, Nov 3, 2020 at 18:53, frank li :
> 
> > The current code logic for DELETE is as follows:
> > if the WHERE clause contains a condition such as "key=xxx", it uses
> > fastUpdate, which removes the related item directly.
> >
> > else
> > do select for update;
> > for each row, call closure code "RMV" to remove it.
> >
> > 1. As "executeSelectForDml" gets the _KEY and _VAL columns for all
> > candidate rows, it often causes OOM when there is a lot of data to delete.
> > Why do we verify "val" during the remove operation?
> >
> > 2. After selection, why don't we just remove it with cache.remove as
> > fastUpdate does?
> >
> >
> >
> 
> -- 
> Live with a smile! :D
> 


[jira] [Created] (IGNITE-13688) Ignite Docs: Port Checkpointing Mapping from readme.io

2020-11-09 Thread YuJue Li (Jira)
YuJue Li created IGNITE-13688:
-

             Summary: Ignite Docs: Port Checkpointing Mapping from readme.io
                 Key: IGNITE-13688
                 URL: https://issues.apache.org/jira/browse/IGNITE-13688
             Project: Ignite
          Issue Type: Task
          Components: documentation
    Affects Versions: 2.9
            Reporter: YuJue Li
             Fix For: 2.9.1


The content at the links below is missing from the new version of the documentation:

[https://apacheignite.readme.io/docs/continuous-mapping]

[https://apacheignite.readme.io/docs/checkpointing]

 





Re: [DISCUSS] Use GridNioServer in Java thin client

2020-11-09 Thread Ivan Daschinsky
I suppose the best option is the ability to switch to Netty if that library
is on the classpath.
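
A minimal sketch of what switching on classpath presence could look like, using only the JDK. Nothing here is existing Ignite code: the ClientTransport interface, the helper names and the choice of io.netty.channel.Channel as the probe class are assumptions made for illustration.

```java
import java.util.function.Supplier;

class TransportSelectorSketch {
    /** Hypothetical facade over the communication layer. */
    interface ClientTransport {
        String name();
    }

    /** Returns true if the named class can be loaded, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, TransportSelectorSketch.class.getClassLoader());
            return true;
        }
        catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Picks the optional transport when its probe class is present, the fallback otherwise. */
    static ClientTransport select(String probeClass,
        Supplier<ClientTransport> optional,
        Supplier<ClientTransport> fallback) {
        return isOnClasspath(probeClass) ? optional.get() : fallback.get();
    }

    public static void main(String[] args) {
        ClientTransport t = select("io.netty.channel.Channel", () -> () -> "netty", () -> () -> "nio");

        System.out.println("selected transport: " + t.name());
    }
}
```

The suppliers keep the optional implementation from being instantiated unless the probe succeeds, so the fallback path never touches classes from the optional library.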

Mon, Nov 9, 2020 at 15:58, Igor Sapego :

> Sounds like a good idea to me.
>
> Best Regards,
> Igor
>
>
> On Mon, Nov 9, 2020 at 3:32 PM Alex Plehanov 
> wrote:
>
> > +1 for using GridNioServer as java thin client communication layer.
> >
> > Sun, Nov 8, 2020 at 19:12, Pavel Tupitsyn :
> >
> > > Igniters,
> > >
> > > This is a continuation of "Use Netty for Java thin client" [1],
> > > I'm starting a new thread for better visibility.
> > >
> > > The problems with current Java thin client are:
> > > * Socket writes block user threads
> > > * Every connection uses a separate listener thread (with partition
> > > awareness there is a thread for every server node within a single
> > > IgniteClient)
> > >
> > > GridNioServer can work in client mode and solves both of these problems.
> > > It is the most practical choice as well at the moment - no extra
> > > dependencies required.
> > >
> > > A potential drawback is increased coupling between thin client and core
> > > code, which I'm going to mitigate by abstracting GridNioServer behind a
> > > simpler facade, so we can replace it with Netty or something else more
> > > easily if we decide to split the code.
> > >
> > > Thoughts, objections?
> > >
> > > [1]
> > > http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSS-Use-Netty-for-Java-thin-client-td49732.html
> > >
> >
>


-- 
Sincerely yours, Ivan Daschinskiy


Re: [DISCUSS] Disable socket linger by default in TCP discovery SPI.

2020-11-09 Thread Anton Vinogradov
The PRs are merged.
Please make sure the 2.10 migration doc tells users who use SSL to set
linger.
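
For readers unfamiliar with the option under discussion, here is a minimal sketch of socket linger at the java.net.Socket level. It does not use TcpDiscoverySpi or any Ignite class; the helper name and the "negative value means disabled" convention are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.Socket;

class SoLingerSketch {
    /** Applies a linger value (negative = disabled, an assumed convention) and returns what the socket reports. */
    static int applyLinger(Socket sock, int lingerSeconds) throws IOException {
        sock.setSoLinger(lingerSeconds >= 0, Math.max(lingerSeconds, 0));

        return sock.getSoLinger(); // -1 when the option is disabled
    }

    public static void main(String[] args) throws IOException {
        try (Socket sock = new Socket()) { // unconnected socket, no network traffic
            System.out.println("linger 0  -> " + applyLinger(sock, 0));
            System.out.println("linger 5  -> " + applyLinger(sock, 5));
            System.out.println("disabled  -> " + applyLinger(sock, -1));
        }
    }
}
```

With linger enabled and a timeout of 0, close() typically discards unsent data and resets the connection instead of waiting. A positive timeout lets close() block for up to that many seconds while pending data is sent.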

On Fri, Nov 6, 2020 at 1:01 PM Steshin Vladimir  wrote:

>  The tickets are: [1] disables linger by default and [2] is the doc.
>
>
> [1] https://issues.apache.org/jira/browse/IGNITE-13643
>
> [2] https://issues.apache.org/jira/browse/IGNITE-13662
>
> 05.11.2020 11:00, Anton Vinogradov wrote:
> > Folks,
> > It seems we have agreed that the fix is necessary.
> > Do we need to do anything except the following?
> >>> zero linger as default + warning on SSL enabled on JVM before the fix +
> >>> warning at documentation + migration notes
> >
> > On Tue, Nov 3, 2020 at 2:38 PM Steshin Vladimir wrote:
> >
> >>   Ilya, hi.
> >>
> >>
> >>   Of course: the /TcpDiscoverySpi.setSoLinger(int)/ property. It has
> >>   always been there.
> >>
> >>
> >> 02.11.2020 20:14, Ilya Kasnacheev wrote:
> >>> Hello!
> >>>
> >>> Is there any option to re-enable linger on SSL sockets?
> >>>
> >>> Telling people to re-configure does not help if they can't.
> >>>
> >>> Regards,
>


Re: [DISCUSS] Use GridNioServer in Java thin client

2020-11-09 Thread Igor Sapego
Sounds like a good idea to me.

Best Regards,
Igor


On Mon, Nov 9, 2020 at 3:32 PM Alex Plehanov 
wrote:

> +1 for using GridNioServer as java thin client communication layer.
>
> Sun, Nov 8, 2020 at 19:12, Pavel Tupitsyn :
>
> > Igniters,
> >
> > This is a continuation of "Use Netty for Java thin client" [1],
> > I'm starting a new thread for better visibility.
> >
> > The problems with current Java thin client are:
> > * Socket writes block user threads
> > * Every connection uses a separate listener thread (with partition
> > awareness there is a thread for every server node within a single
> > IgniteClient)
> >
> > GridNioServer can work in client mode and solves both of these problems.
> > It is the most practical choice as well at the moment - no extra
> > dependencies required.
> >
> > A potential drawback is increased coupling between thin client and core
> > code, which I'm going to mitigate by abstracting GridNioServer behind a
> > simpler facade, so we can replace it with Netty or something else more
> > easily if we decide to split the code.
> >
> > Thoughts, objections?
> >
> > [1]
> > http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSS-Use-Netty-for-Java-thin-client-td49732.html
> >
>


Re: [DISCUSS] Use GridNioServer in Java thin client

2020-11-09 Thread Alex Plehanov
+1 for using GridNioServer as java thin client communication layer.

Sun, Nov 8, 2020 at 19:12, Pavel Tupitsyn :

> Igniters,
>
> This is a continuation of "Use Netty for Java thin client" [1],
> I'm starting a new thread for better visibility.
>
> The problems with current Java thin client are:
> * Socket writes block user threads
> * Every connection uses a separate listener thread (with partition
> awareness there is a thread for every server node within a single
> IgniteClient)
>
> GridNioServer can work in client mode and solves both of these problems.
> It is the most practical choice as well at the moment - no extra
> dependencies required.
>
> A potential drawback is increased coupling between thin client and core
> code, which I'm going to mitigate by abstracting GridNioServer behind a
> simpler facade, so we can replace it with Netty or something else more
> easily if we decide to split the code.
>
> Thoughts, objections?
>
> [1]
>
> http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSS-Use-Netty-for-Java-thin-client-td49732.html
>


[jira] [Created] (IGNITE-13687) Improvement of human-readable format of WAL records (StandaloneWalRecordsIterator)

2020-11-09 Thread Alexand Polyakov (Jira)
Alexand Polyakov created IGNITE-13687:
-

             Summary: Improvement of human-readable format of WAL records (StandaloneWalRecordsIterator)
                 Key: IGNITE-13687
                 URL: https://issues.apache.org/jira/browse/IGNITE-13687
             Project: Ignite
          Issue Type: Improvement
          Components: persistence
    Affects Versions: 2.9
            Reporter: Alexand Polyakov
            Assignee: Alexand Polyakov


StandaloneWalRecordsIterator is used by PageHistoryDiagnoster and the wal-reader
utility to print WAL records in human-readable format. We should add the
following abilities to this iterator, with options in IgniteWalIteratorFactory:

to print DataRecord entry keys in hex/base64 (see UnwrapDataEntry)
to print all records existing in WAL





Re: [MTCGA]: new failures in builds [5716164] needs to be handled

2020-11-09 Thread Ilya Kasnacheev
Hello!

I believe that we have broken this test suite by merging the volatile data
region ticket. I hope that we will fix it promptly.

https://issues.apache.org/jira/browse/IGNITE-13658

Currently, Disk Page Compression only runs in Nightly. I think we should
add it to Run All since the suite may easily be broken by any PDS changes.
What do you think?

Regards,
-- 
Ilya Kasnacheev


Sat, Nov 7, 2020 at 06:53, :

> Hi Igniters,
>
>  I've detected some new issues on TeamCity that need to be handled. You are
> more than welcome to help.
>
>  If your changes can lead to this failure(s): We're grateful that you were
> a volunteer to make the contribution to this project, but things change and
> you may no longer be able to finalize your contribution.
>  Could you respond to this email and indicate if you wish to continue and
> fix test failures or step down and some committer may revert your commit.
>
>  *New test failure in master-nightly
> IgnitePersistentStoreDataStructuresTest.testLatchVolatility
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-2498689540135370176=%3Cdefault%3E=testDetails
>
>  *New test failure in master-nightly
> IgnitePersistentStoreDataStructuresTest.testLockVolatility
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=8536744125057342252=%3Cdefault%3E=testDetails
>
>  *New test failure in master-nightly
> IgnitePersistentStoreDataStructuresTest.testSemaphoreVolatility
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-8607160794826656046=%3Cdefault%3E=testDetails
>  Changes that may have led to the failure were made by
>  - zstan 
> https://ci.ignite.apache.org/viewModification.html?modId=909509
>
>  - Here's a reminder of what contributors agreed to do
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>  - Should you have any questions please contact
> dev@ignite.apache.org
>
> Best Regards,
> Apache Ignite TeamCity Bot
> https://github.com/apache/ignite-teamcity-bot
> Notification generated at 06:52:42 07-11-2020
>


Re: 2.9.1 release scope and dates

2020-11-09 Thread Yaroslav Molochkov
Ivan, thanks! 

Added it to the list.

> On 8 Nov 2020, at 14:13, Ivan Daschinsky  wrote:
> 
> Yaroslav, there is another bug for 2.9.1 release
> https://issues.apache.org/jira/browse/IGNITE-13572
> 
> Thu, Nov 5, 2020, 19:23 Yaroslav Molochkov :
> 
>> Ivan, hi!
>> Sure.
>> 
>> UPD: I am the release manager and will be doing this with Maxim's help
>> (since I lack some user permissions)
>> 
>>> On Thu, Nov 5, 2020 at 6:24 PM Ivan Daschinsky 
>>> wrote:
>>> 
>>> Hi. I'd suggest adding this issue. It is a usability improvement for zk
>>> discovery, and the patch also incorporates fixes for JMX metrics
>>> concurrency issues
>>> 
>>> [1] -- https://issues.apache.org/jira/browse/IGNITE-13577
>>> 
>>> Thu, Nov 5, 2020, 16:20 Yaroslav Molochkov :
>>> 
>>>> Igniters!
>>>>
>>>> I'd like to help with the 2.9.1 release. The scope of this release
>>>> includes following issues:
>>>>
>>>> https://issues.apache.org/jira/browse/IGNITE-13676?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1
>>>>
>>>> Maxim Muzafarov agreed to help me with the process and he will be the
>>>> release manager.
>>>>
>>>> Scope freeze: Nov. 12th
>>>> Code freeze: Nov. 19th
>>>> Voting date: Nov. 26th
>>>> Release date: Nov. 31st
>>>>
>>>> Tickets that were added (or to be added) to the scope don't bring new
>>>> features but various bug fixes.
>>>>
>>> 
>>