Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-13 Thread Ivan Rakov
Agree with Alex. Now we perform an extra WAL fsync() at the beginning of a checkpoint. We *have* to wait for the call to complete before starting to write checkpoint pages - otherwise both the physical records in the WAL and the partition files in storage may be left in a mess in case of power loss. User threads
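The ordering constraint described above (the WAL record must be durable before any checkpoint page is written) can be sketched in plain Java. This is only an illustration under stated assumptions, not Ignite's checkpointer: the class name, the `CHECKPOINT_BEGIN` marker, and the file names are hypothetical, and the only real API used is `FileChannel.force`, which blocks until the operating system reports the data as written to the device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of "fsync the log first, then write pages". */
final class CheckpointOrderingSketch {
    /** Appends a record to the log file and blocks until it is forced to the device. */
    static void appendAndSync(Path wal, String record) throws IOException {
        try (FileChannel ch = FileChannel.open(wal,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining())
                ch.write(buf);
            ch.force(true); // synchronous: returns only after the sync completes
        }
    }

    /** Stand-in for writing one checkpoint page into a partition file. */
    static void writePage(Path pageFile, byte[] page) throws IOException {
        Files.write(pageFile, page);
    }

    /** The page write starts only after the log sync has returned. */
    static void checkpoint(Path wal, Path pageFile, byte[] page) throws IOException {
        appendAndSync(wal, "CHECKPOINT_BEGIN\n"); // hypothetical marker record
        writePage(pageFile, page);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wal-sketch");
        checkpoint(dir.resolve("wal.log"), dir.resolve("part-0.bin"), new byte[] {1, 2, 3});
        System.out.println(Files.readAllLines(dir.resolve("wal.log")));
    }
}
```

If the `force` call were made asynchronous or skipped, the page write could reach the disk before the log record, which is the failure mode the message warns about.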

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-13 Thread Alexey Goncharuk
Dmitriy, The point of this fsync is to order FS disk writes to prevent data corruption, so this fsync has to be synchronous and cannot be asynchronous or delayed. Given that we fix correctness, I believe that current results are acceptable. 2018-04-13 2:48 GMT+03:00 Dmitriy Setrakyan

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-12 Thread Dmitriy Setrakyan
On Thu, Apr 12, 2018 at 9:45 AM, Ivan Rakov wrote: > Dmitriy, > > fsync() is really slow operation - it's the main reason why FSYNC mode is > way slower than LOG_ONLY. > Fix includes extra fsyncs in necessary parts of code and nothing more. > Every part is important - at

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-12 Thread Denis Magda
Ivan, Could we run Yardstick or YCSB benchmarks to see how the fixed LOG_ONLY affected the performance under the operational load (after the preloading part you're referring to is over)? -- Denis On Thu, Apr 12, 2018 at 9:45 AM, Ivan Rakov wrote: > Dmitriy, > > fsync()

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-12 Thread Ivan Rakov
Dmitriy, fsync() is a really slow operation - it's the main reason why FSYNC mode is way slower than LOG_ONLY. The fix includes extra fsyncs in the necessary parts of the code and nothing more. Every part is important - at the beginning of the thread I described why. 20% slow in benchmark doesn't mean

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-11 Thread Dmitriy Setrakyan
On Tue, Apr 10, 2018 at 11:57 PM, Ilya Suntsov wrote: > Dmitriy, > > I've measured performance on the current master and haven't found any > problems with in-memory mode. > Got it. I would still say that the performance drop is too big with persistence turned on. It seems

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-11 Thread Ilya Suntsov
Dmitriy, I've measured performance on the current master and haven't found any problems with in-memory mode. On Tue, Apr 10, 2018, 20:33 Dmitriy Setrakyan wrote: > I am not convinced that the performance degradation is only due to the new > change that fixes the

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-10 Thread Dmitriy Setrakyan
I am not convinced that the performance degradation is only due to the new change that fixes the incorrect behavior. To my knowledge, there is also a drop in memory-only mode. Can someone explain why we have such a drop? D. On Tue, Apr 10, 2018 at 9:08 AM, Vladimir Ozerov

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-10 Thread Dmitry Pavlov
++1 from me for Vladimir's point. On Tue, Apr 10, 2018 at 19:08, Vladimir Ozerov wrote: > 16% looks perfectly ok to me provided that we compare correct > implementation with incorrect one. > > On Tue, Apr 10, 2018 at 18:24, Dmitriy Setrakyan wrote: > > > Ilya, can we

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-10 Thread Vladimir Ozerov
16% looks perfectly ok to me provided that we compare correct implementation with incorrect one. On Tue, Apr 10, 2018 at 18:24, Dmitriy Setrakyan wrote: > Ilya, can we find out why pure in-memory scenario also had a performance > drop and which commit caused it? It should not be

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-10 Thread Dmitriy Setrakyan
Ilya, can we find out why pure in-memory scenario also had a performance drop and which commit caused it? It should not be affected by changes in persistence at all. D. On Tue, Apr 10, 2018 at 7:56 AM, Ilya Suntsov wrote: > Igniters, > > Looks like commit: > >

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-10 Thread Dmitry Pavlov
Hi Ilya, Thank you for checking Ignite performance. Is it slower than our old default WAL mode 'Default'? Sincerely, Dmitriy Pavlov On Tue, Apr 10, 2018 at 17:57, Ilya Suntsov wrote: > Igniters, > > Looks like commit: > > d0adb61ecd9af0d9907e480ec747ea1465f97cd7 is the first

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-04-10 Thread Ilya Suntsov
Igniters, Looks like commit: d0adb61ecd9af0d9907e480ec747ea1465f97cd7 is the first bad commit > commit d0adb61ecd9af0d9907e480ec747ea1465f97cd7 > Author: Ivan Rakov > Date: Tue Mar 27 20:11:52 2018 +0300 > IGNITE-7754 WAL in LOG_ONLY mode doesn't execute fsync on

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-27 Thread Dmitry Pavlov
Ivan, sure :) Thank you for this contribution, merged to master. On Tue, Mar 27, 2018 at 20:08, Ivan Rakov wrote: > Dmitry, > > At first the PR contained a dirty fix for performance measurement, but now it > contains a good fix. :) Sorry for the inconvenience. > I've renamed the PR. > > Best

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-27 Thread Ivan Rakov
Dmitry, At first the PR contained a dirty fix for performance measurement, but now it contains a good fix. :) Sorry for the inconvenience. I've renamed the PR. Best Regards, Ivan Rakov On 27.03.2018 19:40, Dmitry Pavlov wrote: Hi Eduard, thank you for review. Hi Ivan, I'm confused on PR naming

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-27 Thread Dmitry Pavlov
Hi Eduard, thank you for review. Hi Ivan, I'm confused on PR naming https://github.com/apache/ignite/pull/3656 Could you rename? Sincerely, Dmitriy Pavlov On Tue, Mar 27, 2018 at 19:38, Eduard Shangareev wrote: > Ivan, I have reviewed your changes, looks good. > > On

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-27 Thread Eduard Shangareev
Ivan, I have reviewed your changes, looks good. On Tue, Mar 27, 2018 at 2:56 PM, Ivan Rakov wrote: > Igniters, > > I've completed development of https://issues.apache.org/jira > /browse/IGNITE-7754. TeamCity state is ok. Please, review my changes. > Please note that it

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-27 Thread Ivan Rakov
Igniters, I've completed development of https://issues.apache.org/jira/browse/IGNITE-7754. TeamCity state is ok. Please review my changes. Please note that it will be possible to track the duration of the WAL fsync at checkpoint begin via the *walCpRecordFsyncDuration* metric in "Checkpoint started"
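As a rough illustration of what a duration metric like the one mentioned above measures, the sketch below times a single `FileChannel.force` call with `System.nanoTime`. Only the metric name `walCpRecordFsyncDuration` comes from the message; the class, the method, the temporary file, and the printed format are assumptions made for this example and are not taken from Ignite's code or log output.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: measure how long one fsync of a log file takes. */
final class FsyncTimingSketch {
    /** Forces the channel to the device and returns the elapsed time in nanoseconds. */
    static long timedForceNanos(FileChannel ch) throws IOException {
        long start = System.nanoTime();
        ch.force(true);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("wal-segment", ".bin");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(4096); // zero-filled stand-in for a record
            while (buf.hasRemaining())
                ch.write(buf);
            long nanos = timedForceNanos(ch);
            // Illustrative output format only; the real log line is not shown in the thread.
            System.out.println("walCpRecordFsyncDuration=" + nanos / 1_000_000 + "ms");
        }
    }
}
```

The measured value depends heavily on the storage device and on how much unsynced data is pending, so a number from one machine says little about another.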

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-26 Thread Valentin Kulichenko
Ivan, It's all good then :) Thanks! -Val On Mon, Mar 26, 2018 at 1:50 AM, Ivan Rakov wrote: > Val, > > There's no any sense to use WalMode.NONE in production environment, it's > kept for testing and debugging purposes (including possible user activities > like capacity

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-26 Thread Ivan Rakov
Val, There's no sense in using WalMode.NONE in a production environment; it's kept for testing and debugging purposes (including possible user activities like capacity planning). We already print a warning at node start in case WalMode.NONE is set: U.quietAndWarn(log,"Started write-ahead log

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-23 Thread Valentin Kulichenko
Dmitry, Thanks for the clarification. So it sounds like if we fix all other modes as we discuss here, NONE would be the only one allowing corruption. I also don't see much sense in this and I think we should clearly state this in the doc, as well as print out a warning if NONE mode is used. Eventually,

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-23 Thread Dmitry Pavlov
Hi Val, NONE means that the WAL is disabled and not written at all. Use of this mode is at your own risk. It is possible that restoring state after a crash in the middle of a checkpoint will not succeed. I do not see much sense in it, especially in production. BACKGROUND is a fully functional WAL

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-23 Thread Valentin Kulichenko
I agree. In my view, any possibility to get a corrupted storage is a bug which needs to be fixed. BTW, can someone explain semantics of NONE mode? What is the difference from BACKGROUND from user's perspective? Is there any particular use case where it can be used? -Val On Fri, Mar 23, 2018 at

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-23 Thread Dmitry Pavlov
Hi Ivan, IMO we have to add extra FSYNCS for BACKGROUND WAL. Agree? Sincerely, Dmitriy Pavlov On Fri, Mar 23, 2018 at 12:23, Ivan Rakov wrote: > Igniters, there's another important question about this matter. > Do we want to add extra FSYNCS for BACKGROUND WAL mode? I think

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-23 Thread Ivan Rakov
Igniters, there's another important question about this matter. Do we want to add extra FSYNCS for BACKGROUND WAL mode? I think that we have to do it: it will cause similar performance drop, but if we consider LOG_ONLY broken without these fixes, BACKGROUND is broken as well. Best Regards,

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-23 Thread Ivan Rakov
Fixes are quite simple. I expect them to be merged into master within a week in the worst case. Best Regards, Ivan Rakov On 22.03.2018 17:49, Denis Magda wrote: Ivan, How quick are you going to merge the fix into the master? Many persistence related optimizations have already stacked up. Probably, we

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-22 Thread Denis Magda
Ivan, How quick are you going to merge the fix into the master? Many persistence related optimizations have already stacked up. Probably, we can release them sooner if the community agrees. -- Denis On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov wrote: > Thanks all! > We

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-22 Thread Ivan Rakov
Thanks all! We seem to have reached a consensus on this issue. I'll just add necessary fsyncs under IGNITE-7754. Best Regards, Ivan Rakov On 22.03.2018 15:13, Ilya Lantukh wrote: +1 for fixing LOG_ONLY. If current implementation doesn't protect from data corruption, it doesn't make sense.

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-22 Thread Ilya Lantukh
+1 for fixing LOG_ONLY. If current implementation doesn't protect from data corruption, it doesn't make sense. On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda wrote: > +1 for the fix of LOG_ONLY > > On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk < >

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-21 Thread Denis Magda
+1 for the fix of LOG_ONLY On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk < alexey.goncha...@gmail.com> wrote: > +1 for fixing LOG_ONLY to enforce corruption safety given the provided > performance results. > > 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov : > > > +1 for

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-21 Thread Alexey Goncharuk
+1 for fixing LOG_ONLY to enforce corruption safety given the provided performance results. 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov : > +1 for accepting drop in LOG_ONLY. 7% is not that much and not a drop at > all, provided that we are fixing a bug. I.e. should we implement

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-21 Thread Vladimir Ozerov
+1 for accepting the drop in LOG_ONLY. 7% is not that much and not a drop at all, provided that we are fixing a bug. I.e., had we implemented it correctly in the first place, we would never have noticed any "drop". I do not understand why someone would like to use the current broken mode. On Wed, Mar 21, 2018 at

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-21 Thread Dmitry Pavlov
Hi, I think option 1 is better. As Val said, any mode that allows corruption does not make much sense. What Ivan mentioned here as a drop is still a significant performance boost relative to the old mode DEFAULT (FSYNC now). Sincerely, Dmitriy Pavlov On Wed, Mar 21, 2018 at 17:56, Ivan Rakov

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-21 Thread Ivan Rakov
I've attached benchmark results to the JIRA ticket. We observe a ~7% drop in "fair" LOG_ONLY_SAFE mode, regardless of whether WAL compaction is enabled. It's a pretty significant drop: WAL compaction itself gives only a ~3% drop. I see two options here: 1) Change LOG_ONLY behavior. That implies that

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-20 Thread Ivan Rakov
Val, If a storage is in corrupted state, does it mean that it needs to be completely removed and cluster needs to be restarted without data? Yes, there's a chance that in LOG_ONLY all local data will be lost, but only in *power loss / OS crash* case. kill -9, JVM crash, death of critical

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-19 Thread Valentin Kulichenko
Guys, What do we mean by "data corruption" here? If a storage is in corrupted state, does it mean that it needs to be completely removed and cluster needs to be restarted without data? If so, I'm not sure any mode that allows corruption makes much sense to me. How am I supposed to use a

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-16 Thread Ivan Rakov
Ticket to track changes: https://issues.apache.org/jira/browse/IGNITE-7754 Best Regards, Ivan Rakov On 16.03.2018 10:58, Dmitriy Setrakyan wrote: On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov wrote: Vladimir, Unlike BACKGROUND, LOG_ONLY provides strict write

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-16 Thread Dmitriy Setrakyan
On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov wrote: > Vladimir, > > Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless power > loss has happened. > Seems like we need to measure performance difference to decide whether do > we need separate WAL mode. If it

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-16 Thread Ivan Rakov
Vladimir, Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless power loss has happened. Seems like we need to measure the performance difference to decide whether we need a separate WAL mode. If it is negligible, we'll just fix these bugs without introducing a new mode; if it

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-16 Thread Dmitry Pavlov
Folks, I do not expect any performance degradation here under high load because we already do fsync on rollover. So extra fsyncs will be almost free. We should do this fsync without holding the CP lock, of course. (see also point 3: 3) We do perform fsync on rollover (switch of current WAL segment) in

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-16 Thread Vladimir Ozerov
Same question. It would be very difficult to explain these two modes to users. We should do our best to fix LOG_ONLY first. Without these guarantees there is no reason to keep LOG_ONLY at all; a user could simply use BACKGROUND with high flush frequency. This is precisely how Cassandra works. p.1 -
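The "background mode with high flush frequency" idea mentioned above can be pictured with a generic periodic-flush log writer. This is a sketch under stated assumptions, not Ignite's BACKGROUND mode and not Cassandra's commit log: the class and its methods are hypothetical, and the choice to call `force` on every flush is an assumption of this example, since the thread does not say whether the real flush also syncs to the device. The point it shows is that records appended since the last flush exist only in memory, so the flush interval bounds how much can be lost on any crash.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical periodic-flush log: appends are buffered, a timer writes them out. */
final class BackgroundFlushSketch implements AutoCloseable {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final FileChannel ch;
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "wal-flusher");
            t.setDaemon(true); // do not keep the JVM alive just for flushing
            return t;
        });

    BackgroundFlushSketch(Path file, long flushEveryMs) throws IOException {
        ch = FileChannel.open(file,
            StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        timer.scheduleWithFixedDelay(this::flushQuietly, flushEveryMs, flushEveryMs, TimeUnit.MILLISECONDS);
    }

    /** Returns immediately; the record is not on disk until the next flush. */
    void append(String record) {
        pending.add(record);
    }

    /** Writes all buffered records and forces them to the device. */
    synchronized void flush() throws IOException {
        String rec;
        while ((rec = pending.poll()) != null) {
            ByteBuffer buf = ByteBuffer.wrap((rec + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining())
                ch.write(buf);
        }
        ch.force(true);
    }

    private void flushQuietly() {
        try {
            flush();
        } catch (IOException e) {
            // A real writer would report this; the sketch ignores it.
        }
    }

    @Override public void close() throws IOException {
        timer.shutdown(); // stop scheduling further flushes
        flush();          // final flush; waits for a running one via the monitor
        ch.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bg-wal", ".log");
        try (BackgroundFlushSketch wal = new BackgroundFlushSketch(file, 10)) {
            wal.append("put k1");
            wal.append("put k2");
        }
        System.out.println(Files.readAllLines(file));
    }
}
```

A shorter interval narrows the loss window at the cost of more frequent syncs, which is the trade-off the message alludes to when comparing such a mode with a LOG_ONLY that lacks stronger guarantees.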

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-16 Thread Ivan Rakov
It really depends on hardware and workload pattern. I expect that LOG_ONLY_SAFE will be either equal to LOG_ONLY or a few percent slower. We'll answer this question for sure after implementation of three fixes and benchmarking. Let's first of all get understanding whether extra durability

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-15 Thread Dmitriy Setrakyan
Ivan, Is there a performance difference between LOG_ONLY and LOG_ONLY_SAFE? D. On Thu, Mar 15, 2018 at 4:23 PM, Ivan Rakov wrote: > Igniters and especially Native Persistence experts, > > We decided to change default WAL mode from DEFAULT(FSYNC) to LOG_ONLY in > 2.4