Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

KONDO Mitsumasa Thu, 25 Jul 2013 03:07:59 -0700

Hi,

By running Heikki's patch, I now understand why my patch is faster than the original. His patch executes write() and fsync() on each relation file during the write phase of a checkpoint. I therefore expected the write phase to be slow and the fsync phase to be fast, because the disk writes would already have been done in the write phase. But the fsync time in PostgreSQL with his patch is almost the same as in the original. It's very mysterious!
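
To make the two orderings concrete, here is a toy sketch in Python. It is not PostgreSQL code and not taken from either patch; the function names, file names, and payload are hypothetical. It only shows the difference between "write everything, then fsync everything" and "write and fsync one file at a time".

```python
import os
import tempfile


def checkpoint_batched(paths, payload):
    """Write every file first, then fsync every file (two separate phases)."""
    fds = [os.open(p, os.O_WRONLY | os.O_CREAT, 0o600) for p in paths]
    try:
        for fd in fds:  # write phase: data goes to the OS page cache
            os.write(fd, payload)
        for fd in fds:  # fsync phase: force the cached data to disk
            os.fsync(fd)
    finally:
        for fd in fds:
            os.close(fd)
    return len(fds)


def checkpoint_per_file(paths, payload):
    """write() and then fsync() each file before moving on to the next one."""
    synced = 0
    for p in paths:
        fd = os.open(p, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            os.write(fd, payload)
            os.fsync(fd)  # disk write is forced inside the write phase
            synced += 1
        finally:
            os.close(fd)
    return synced
```

For example, calling checkpoint_per_file() on a few files in a tempfile.mkdtemp() directory leaves nothing for a later fsync phase to flush, which is why I expected the fsync phase to become fast.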

I checked /proc/meminfo and other resources while running the benchmark. It turns out this is caused by the separation of the checkpointer process and the writer process. In 9.1 and older, checkpoints and background writes are executed serially in the writer process. In 9.2 and later, they are executed in parallel, so background writing continues regardless of whether a checkpoint is running. That is why methods with fewer fsyncs and a longer-term fsync schedule, like my patch, are much faster: they reduce wasted disk writes. In the worst case with his patch, the same pages are written to disk twice in one checkpoint, and those writes might be random.

By the way, when the amount of dirty buffers always stays under dirty_background_ratio * physical memory / 100, the write phase does not write to disk at all. All of the dirty buffers are therefore written to disk in the fsync phase. In this case, the write schedule makes no sense. It is very heavy and wasteful, but it might not be changeable through OS and postgres parameters. I set a small dirty_background_ratio, but the result was very miserable...
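
The threshold above can be checked with a small script. This is a rough sketch in Python under stated assumptions: it follows the formula as written here and uses MemTotal as "physical memory", while the kernel bases the ratio on available (free plus reclaimable) memory, so the result is only an approximation. The function names and the sample numbers are hypothetical, not measurements from my benchmark.

```python
def parse_meminfo(text):
    """Return the numeric fields of /proc/meminfo-style text as a dict (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def background_threshold_kb(mem_total_kb, dirty_background_ratio):
    """dirty_background_ratio * physical memory / 100, in kB."""
    return mem_total_kb * dirty_background_ratio // 100


def writeback_would_start(meminfo_text, dirty_background_ratio):
    """True if the Dirty amount is above the approximate background threshold."""
    info = parse_meminfo(meminfo_text)
    limit = background_threshold_kb(info["MemTotal"], dirty_background_ratio)
    return info["Dirty"] > limit


# Made-up sample values, for illustration only.
SAMPLE = """MemTotal:       16384000 kB
Dirty:            512000 kB
Writeback:             0 kB
"""
```

On a Linux machine, the same functions can be fed the contents of /proc/meminfo and the integer read from /proc/sys/vm/dirty_background_ratio. With the sample above and a ratio of 10, the dirty amount stays under the threshold, which is the case where the write phase causes no disk write.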

I am now confirming my theory with the dbt-2 benchmark with lru_max_pages = 0. Next week, my colleague who is a kernel hacker will explain the OS background-writing mechanism to me.


What do you think?

Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

