On Wed, Jun 28, 2017 at 5:19 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> On Wed, Aug 24, 2016 at 2:58 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> Now, for bigger segment sizes, I think there actually could be a
>> little bit of a noticeable performance hit here, because it's not just
>> about total elapsed time. Even if the code eventually touches all of
>> the memory, it might not touch it all before starting to fire up
>> workers or whatever else it wants to do with the DSM segment. But I'm
>> thinking we still need to bite the bullet and pay the expense, because
>> crash-and-restart cycles are *really* bad.
>
> Here is a new rebased version of this patch, primarily to poke this
> thread as an unresolved question. This patch is not committable as is
> though: I discovered that parallel query can cause fallocate to return
> with errno == EINTR. I haven't yet investigated whether fallocate is
> supposed to be restartable, or signals should be blocked, or something
> else is wrong. Another question is whether the call to ftruncate() is
> actually necessary before the call to fallocate().
I think this line is saying that it won't restart automatically:

https://github.com/torvalds/linux/blob/590dce2d4934fb909b112cd80c80486362337744/mm/shmem.c#L2884

Compare this patch (not in the kernel tree), which suggests that line should be changed to cause a restart:

https://lkml.org/lkml/2016/3/3/987

    -               error = -EINTR;
    +               error = -ERESTARTSYS;

So I think we either need to mask signals or put in an explicit retry loop, as shown in the attached version of the patch. With the v3 patch I posted earlier, I see interrupted system call failures in the select_parallel regression test, but with v4 it passes.

Thoughts?

> Unfounded speculation: fallocate() might actually *improve*
> performance of DSM segments if your access pattern involves random
> access (just to pick an example out of the air, something like...
> building a hash table), since it's surely easier to allocate a big
> contiguous chunk than a squillion random pages most of which divide an
> existing hole into two smaller holes...

Bleugh... I retract this; of course we initialise the hash table in order anyway, so this doesn't make any sense.

--
Thomas Munro
http://www.enterprisedb.com
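The explicit-retry option discussed in this message can be sketched in C roughly as follows. This is not the attached v4 patch: `resize_and_reserve()` is a hypothetical helper, and the sketch assumes the segment is backed by a file descriptor (for example one obtained from shm_open() on Linux, where /dev/shm is tmpfs) and uses posix_fallocate(), which reports failure through its return value rather than through errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the size of a segment's backing file and ask
 * the kernel to reserve the pages up front, so that running out of shared
 * memory is reported here as an error instead of showing up later as a
 * SIGBUS when the memory is first touched.
 *
 * Returns 0 on success, otherwise an errno value.
 */
static int
resize_and_reserve(int fd, off_t size)
{
	int			rc;

	assert(fd >= 0 && size >= 0);

	/*
	 * Set the logical size first.  Whether this is strictly necessary
	 * before posix_fallocate() is one of the open questions above.
	 */
	if (ftruncate(fd, size) != 0)
		return errno;

	/* posix_fallocate() rejects a zero length with EINVAL on Linux. */
	if (size == 0)
		return 0;

	/*
	 * posix_fallocate() returns the error number directly.  On Linux
	 * tmpfs the underlying fallocate() can give up with EINTR when a
	 * signal is pending, so retry instead of failing the allocation.
	 */
	do
	{
		rc = posix_fallocate(fd, 0, size);
	} while (rc == EINTR);

	return rc;
}
```

Masking signals around the call would be the other option; a retry loop avoids delaying signal delivery, though a real implementation might also want a way to stop retrying if a pending cancel or shutdown request should take priority.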
-- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers