Agree with Alex.
Now we perform an extra WAL fsync() at the beginning of a checkpoint. We
*have* to wait for the call to complete before starting to write checkpoint
pages - otherwise both the physical records in the WAL and the partition files in
storage will be left in a mess in case of power loss. User threads
Dmitriy,
The point of this fsync is to order FS disk writes to prevent data
corruption, so this fsync has to be synchronous and cannot be asynchronous
or delayed.
Given that we fix correctness, I believe that current results are
acceptable.
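For readers following the thread, the ordering constraint described above can be sketched in plain JDK code. This is a minimal illustration, not Ignite's implementation: the class name, method names, file names and the "CHECKPOINT_BEGIN" marker are all hypothetical, and `FileChannel.force(false)` stands in for the fsync() call (it blocks until the channel's content updates are reported as written to the storage device).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "fsync the WAL before writing checkpoint pages"; not Ignite code. */
final class CheckpointOrderingSketch {
    /** Writes the whole buffer; a single FileChannel.write call may write only part of it. */
    private static void writeFully(FileChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining())
            ch.write(buf);
    }

    /** Runs one simplified checkpoint and returns the steps in the order they completed. */
    static List<String> checkpoint(Path walFile, Path partFile, byte[] page) throws IOException {
        List<String> steps = new ArrayList<>();

        try (FileChannel wal = FileChannel.open(walFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
             FileChannel part = FileChannel.open(partFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {

            // 1. Append the (hypothetical) checkpoint-begin marker to the WAL.
            writeFully(wal, ByteBuffer.wrap("CHECKPOINT_BEGIN\n".getBytes(StandardCharsets.US_ASCII)));
            steps.add("wal-write");

            // 2. Synchronous flush: do not proceed until the WAL content is forced to the device.
            wal.force(false);
            steps.add("wal-fsync");

            // 3. Only now is it safe to start overwriting pages in the partition file.
            writeFully(part, ByteBuffer.wrap(page));
            steps.add("page-write");

            // 4. Flush the partition file before the checkpoint could be considered finished.
            part.force(false);
            steps.add("page-fsync");
        }

        return steps;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wal-sketch");

        System.out.println(checkpoint(dir.resolve("wal.bin"), dir.resolve("part-0.bin"), new byte[4096]));
    }
}
```

If step 2 were made asynchronous or delayed, the page write in step 3 could reach the disk before the WAL records it depends on, which is the corruption scenario the fix is meant to rule out.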
2018-04-13 2:48 GMT+03:00 Dmitriy Setrakyan
On Thu, Apr 12, 2018 at 9:45 AM, Ivan Rakov wrote:
> Dmitriy,
>
> fsync() is a really slow operation - it's the main reason why FSYNC mode is
> way slower than LOG_ONLY.
> The fix includes extra fsyncs in the necessary parts of the code and nothing more.
> Every part is important - at
Ivan,
Could we run Yardstick or YCSB benchmarks to see how the fixed LOG_ONLY
affected the performance under the operational load (after the preloading
part you're referring to is over)?
--
Denis
On Thu, Apr 12, 2018 at 9:45 AM, Ivan Rakov wrote:
> Dmitriy,
>
> fsync()
Dmitriy,
fsync() is a really slow operation - it's the main reason why FSYNC mode
is way slower than LOG_ONLY.
The fix includes extra fsyncs in the necessary parts of the code and nothing more.
Every part is important - at the beginning of the thread I described why.
A 20% slowdown in a benchmark doesn't mean
On Tue, Apr 10, 2018 at 11:57 PM, Ilya Suntsov
wrote:
> Dmitriy,
>
> I've measured performance on the current master and haven't found any
> problems with in-memory mode.
>
Got it. I would still say that the performance drop is too big with
persistence turned on. It seems
Dmitriy,
I've measured performance on the current master and haven't found any
problems with in-memory mode.
On Tue, Apr 10, 2018, 20:33 Dmitriy Setrakyan wrote:
> I am not convinced that the performance degradation is only due to the new
> change that fixes the
I am not convinced that the performance degradation is only due to the new
change that fixes the incorrect behavior. To my knowledge, there is also a
drop in memory-only mode. Can someone explain why we have such a drop?
D.
On Tue, Apr 10, 2018 at 9:08 AM, Vladimir Ozerov
++1 from me for Vladimir's point
Tue, Apr 10, 2018 at 19:08, Vladimir Ozerov:
> 16% looks perfectly ok to me provided that we compare correct
> implementation with incorrect one.
>
> Tue, Apr 10, 2018 at 18:24, Dmitriy Setrakyan:
>
> > Ilya, can we
16% looks perfectly ok to me provided that we compare correct
implementation with incorrect one.
Tue, Apr 10, 2018 at 18:24, Dmitriy Setrakyan:
> Ilya, can we find out why the pure in-memory scenario also had a performance
> drop and which commit caused it? It should not be
Ilya, can we find out why the pure in-memory scenario also had a performance
drop and which commit caused it? It should not be affected by changes in
persistence at all.
D.
On Tue, Apr 10, 2018 at 7:56 AM, Ilya Suntsov wrote:
> Igniters,
>
> Looks like commit:
>
>
Hi Ilya,
Thank you for checking Ignite performance.
Is it slower than our old default WAL mode 'Default'?
Sincerely,
Dmitriy Pavlov
Tue, Apr 10, 2018 at 17:57, Ilya Suntsov:
> Igniters,
>
> Looks like commit:
>
> d0adb61ecd9af0d9907e480ec747ea1465f97cd7 is the first
Igniters,
Looks like commit:
d0adb61ecd9af0d9907e480ec747ea1465f97cd7 is the first bad commit
> commit d0adb61ecd9af0d9907e480ec747ea1465f97cd7
> Author: Ivan Rakov
> Date: Tue Mar 27 20:11:52 2018 +0300
> IGNITE-7754 WAL in LOG_ONLY mode doesn't execute fsync on
Ivan, sure :)
Thank you for this contribution, merged to master.
Tue, Mar 27, 2018 at 20:08, Ivan Rakov:
> Dmitry,
>
> At first the PR contained a dirty fix for performance measurement, but now it
> contains a proper fix. :) Sorry for the inconvenience.
> I've renamed the PR.
>
> Best
Dmitry,
At first the PR contained a dirty fix for performance measurement, but now it
contains a proper fix. :) Sorry for the inconvenience.
I've renamed the PR.
Best Regards,
Ivan Rakov
On 27.03.2018 19:40, Dmitry Pavlov wrote:
Hi Eduard, thank you for the review.
Hi Ivan,
I'm confused by the PR naming
https://github.com/apache/ignite/pull/3656
Could you rename?
Sincerely,
Dmitriy Pavlov
Tue, Mar 27, 2018 at 19:38, Eduard Shangareev:
> Ivan, I have reviewed your changes, looks good.
>
> On
Ivan, I have reviewed your changes, looks good.
On Tue, Mar 27, 2018 at 2:56 PM, Ivan Rakov wrote:
> Igniters,
>
> I've completed development of
> https://issues.apache.org/jira/browse/IGNITE-7754. TeamCity state is ok. Please, review my changes.
> Please note that it
Igniters,
I've completed development of
https://issues.apache.org/jira/browse/IGNITE-7754. TeamCity state is ok.
Please, review my changes.
Please note that it will be possible to track the time of the WAL fsync at
checkpoint begin via the *walCpRecordFsyncDuration* metric in "Checkpoint
started"
Ivan,
It's all good then :) Thanks!
-Val
On Mon, Mar 26, 2018 at 1:50 AM, Ivan Rakov wrote:
> Val,
>
> There's no sense in using WalMode.NONE in a production environment; it's
> kept for testing and debugging purposes (including possible user activities
> like capacity
Val,
There's no sense in using WalMode.NONE in a production environment; it's
kept for testing and debugging purposes (including possible user
activities like capacity planning).
We already print a warning at node start in case WalMode.NONE is set:
U.quietAndWarn(log,"Started write-ahead log
Dmitry,
Thanks for the clarification. So it sounds like if we fix all other modes as we
discuss here, NONE would be the only one allowing corruption. I also don't
see much sense in this and I think we should clearly state this in the doc,
as well as print out a warning if NONE mode is used. Eventually,
Hi Val,
NONE means that the WAL is disabled and not written at all. Use of this
mode is at your own risk. It is possible that restoring state after a crash
in the middle of a checkpoint will not succeed. I do not see much sense in
it, especially in production.
BACKGROUND is fully functional WAL
I agree. In my view, any possibility to get a corrupted storage is a bug
which needs to be fixed.
BTW, can someone explain the semantics of NONE mode? What is the difference
from BACKGROUND from the user's perspective? Is there any particular use case
where it can be used?
-Val
On Fri, Mar 23, 2018 at
Hi Ivan,
IMO we have to add extra FSYNCS for BACKGROUND WAL. Agree?
Sincerely,
Dmitriy Pavlov
Fri, Mar 23, 2018 at 12:23, Ivan Rakov:
> Igniters, there's another important question about this matter.
> Do we want to add extra FSYNCS for BACKGROUND WAL mode? I think
Igniters, there's another important question about this matter.
Do we want to add extra FSYNCS for BACKGROUND WAL mode? I think that we
have to do it: it will cause a similar performance drop, but if we
consider LOG_ONLY broken without these fixes, BACKGROUND is broken as well.
Best Regards,
Fixes are quite simple.
I expect them to be merged into master within a week in the worst case.
Best Regards,
Ivan Rakov
On 22.03.2018 17:49, Denis Magda wrote:
Ivan,
How quickly are you going to merge the fix into the master? Many persistence
related optimizations have already stacked up. Probably, we
Ivan,
How quickly are you going to merge the fix into the master? Many persistence
related optimizations have already stacked up. Probably, we can release
them sooner if the community agrees.
--
Denis
On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov wrote:
> Thanks all!
> We
Thanks all!
We seem to have reached a consensus on this issue. I'll just add
necessary fsyncs under IGNITE-7754.
Best Regards,
Ivan Rakov
On 22.03.2018 15:13, Ilya Lantukh wrote:
+1 for fixing LOG_ONLY. If the current implementation doesn't protect from data
corruption, it doesn't make sense.
On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda wrote:
> +1 for the fix of LOG_ONLY
>
> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <
>
+1 for the fix of LOG_ONLY
On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <
alexey.goncha...@gmail.com> wrote:
> +1 for fixing LOG_ONLY to enforce corruption safety given the provided
> performance results.
>
> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov :
>
> > +1 for
+1 for fixing LOG_ONLY to enforce corruption safety given the provided
performance results.
2018-03-21 18:20 GMT+03:00 Vladimir Ozerov :
> +1 for accepting the drop in LOG_ONLY. 7% is not that much and not a drop at
> all, provided that we are fixing a bug. I.e. should we implement
+1 for accepting the drop in LOG_ONLY. 7% is not that much and not a drop at
all, provided that we are fixing a bug. I.e., had we implemented it correctly
in the first place, we would never have noticed any "drop".
I do not understand why someone would like to use the current broken mode.
On Wed, Mar 21, 2018 at
Hi, I think option 1 is better. As Val said, any mode that allows corruption
does not make much sense.
What Ivan mentioned here as a drop is still a significant performance boost
relative to the old DEFAULT mode (now FSYNC).
Sincerely,
Dmitriy Pavlov
Wed, Mar 21, 2018 at 17:56, Ivan Rakov
I've attached benchmark results to the JIRA ticket.
We observe a ~7% drop in "fair" LOG_ONLY_SAFE mode, regardless of whether WAL
compaction is enabled. It's a pretty significant drop: WAL compaction
itself gives only a ~3% drop.
I see two options here:
1) Change LOG_ONLY behavior. That implies that
Val,
If a storage is in
corrupted state, does it mean that it needs to be completely removed and
cluster needs to be restarted without data?
Yes, there's a chance that in LOG_ONLY all local data will be lost, but
only in the *power loss / OS crash* case.
kill -9, JVM crash, death of critical
Guys,
What do we mean by "data corruption" here? If a storage is in
corrupted state, does it mean that it needs to be completely removed and
cluster needs to be restarted without data? If so, I'm not sure any mode
that allows corruption makes much sense to me. How am I supposed to use a
Ticket to track changes: https://issues.apache.org/jira/browse/IGNITE-7754
Best Regards,
Ivan Rakov
On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov wrote:
Vladimir,
Unlike BACKGROUND, LOG_ONLY provides strict write
On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov wrote:
> Vladimir,
>
> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless power
> loss has happened.
> Seems like we need to measure the performance difference to decide whether
> we need a separate WAL mode. If it
Vladimir,
Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless
power loss has happened.
Seems like we need to measure the performance difference to decide whether
we need a separate WAL mode. If the difference is negligible, we'll just fix
these bugs without introducing a new mode; if it
Folks, I do not expect any performance degradation here under high load
because we already do fsync on rollover. So extra fsyncs will be almost
free. We should do this fsync without holding the CP lock, of course.
(see also point 3:
3) We do perform fsync on rollover (switch of current WAL segment) in
Same question. It would be very difficult to explain these two modes to
users. We should do our best to fix LOG_ONLY first. Without these
guarantees there is no reason to keep LOG_ONLY at all, user could simply
use BACKGROUND with high flush frequency. This is precisely how Cassandra
works.
p.1 -
It really depends on hardware and workload pattern. I expect that
LOG_ONLY_SAFE will be either equal to LOG_ONLY or a few percent slower.
We'll answer this question for sure after implementation of three fixes
and benchmarking.
Let's first of all understand whether extra durability
Ivan,
Is there a performance difference between LOG_ONLY and LOG_ONLY_SAFE?
D.
On Thu, Mar 15, 2018 at 4:23 PM, Ivan Rakov wrote:
> Igniters and especially Native Persistence experts,
>
> We decided to change default WAL mode from DEFAULT(FSYNC) to LOG_ONLY in
> 2.4