Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-15 Thread Robert Haas
On Tue, Dec 15, 2015 at 11:08 AM, Andres Freund wrote: >> However, I don't think this is exactly what you are proposing. I'm >> skeptical of the idea that _mdfd_getseg() should probe ahead to see >> whether we're dealing with a malformed relation where the intermediate >>

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-15 Thread Andres Freund
On 2015-12-15 11:28:10 -0500, Robert Haas wrote: > Hmm, yes it does. But now that I think about it, we're not otherwise > doing _mdnblocks() in that loop. So that would add a system call per > loop iteration. That doesn't seem like such a swell idea. We'd do that only when intially open the

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-15 Thread Andres Freund
On 2015-12-15 10:53:58 -0500, Robert Haas wrote: > On Tue, Dec 15, 2015 at 9:51 AM, Andres Freund wrote: > > Unless in recovery in the startup process, or when EXTENSION_CREATE is > > passed to it. Depending on whether it or mdnblocks were called first, > > and depending on

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-15 Thread Robert Haas
On Tue, Dec 15, 2015 at 9:51 AM, Andres Freund wrote: > On 2015-12-09 16:50:06 -0500, Robert Haas wrote: >> Now, if subsequent to this an index scan happens to sweep through and >> try to fetch a block in 123456.2, it will work! This happens because >> _mdfd_getseg() doesn't

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-15 Thread Andres Freund
On 2015-12-09 16:50:06 -0500, Robert Haas wrote: > Now, if subsequent to this an index scan happens to sweep through and > try to fetch a block in 123456.2, it will work! This happens because > _mdfd_getseg() doesn't care about the length of the segments; it only > cares whether or not they

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-10 Thread Robert Haas
On Thu, Dec 10, 2015 at 1:48 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Dec 10, 2015 at 1:22 PM, Simon Riggs wrote: >>> We can seq scan the array at relcache build time and invalidate relcache >>> when we extend. WAL

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-10 Thread Robert Haas
On Thu, Dec 10, 2015 at 11:36 AM, Andres Freund wrote: >> In fact, having no way to get the relation length other than scanning >> 1000 files doesn't seem like an especially good choice even if we used >> a better data structure. Putting a header page in the heap would make

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-10 Thread Simon Riggs
On 10 December 2015 at 16:47, Robert Haas wrote: > On Thu, Dec 10, 2015 at 11:36 AM, Andres Freund > wrote: > >> In fact, having no way to get the relation length other than scanning > >> 1000 files doesn't seem like an especially good choice even if

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-10 Thread Greg Stark
On Thu, Dec 10, 2015 at 7:09 PM, Robert Haas wrote: >> I really don't like Robert's proposal of a metapage though. We've got too >> darn many forks per relation already. > > Oh, I wasn't thinking of adding a fork, just repurposing block 0 of > the main fork, as we do for

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-10 Thread Robert Haas
On Thu, Dec 10, 2015 at 1:22 PM, Simon Riggs wrote: > On 10 December 2015 at 16:47, Robert Haas wrote: >> >> On Thu, Dec 10, 2015 at 11:36 AM, Andres Freund >> wrote: >> >> In fact, having no way to get the relation length other

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-10 Thread Robert Haas
On Thu, Dec 10, 2015 at 12:55 PM, Greg Stark wrote: > On Thu, Dec 10, 2015 at 4:47 PM, Robert Haas wrote: >> It's not straightforward, but I don't think that's the reason. What >> we could do is look at the call sites that use >> RelationGetNumberOfBlocks()

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-10 Thread Greg Stark
On Thu, Dec 10, 2015 at 4:47 PM, Robert Haas wrote: > > It's not straightforward, but I don't think that's the reason. What > we could do is look at the call sites that use > RelationGetNumberOfBlocks() and change some of them to get the > information some other way

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-10 Thread Tom Lane
Robert Haas writes: > On Thu, Dec 10, 2015 at 1:22 PM, Simon Riggs wrote: >> We can seq scan the array at relcache build time and invalidate relcache >> when we extend. WAL log any extension to a new segment and write the table >> to disk at

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-10 Thread Andres Freund
On 2015-12-10 17:55:37 +, Greg Stark wrote: > It seems to me that if you want to fix the linked lists of files > that's orthogonal to whether the file lengths on disk are > authoritative. You can always keep the lengths or at least the number > of files cached and updated in shared memory in a

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-10 Thread Robert Haas
On Wed, Dec 9, 2015 at 4:54 PM, Tom Lane wrote: > Robert Haas writes: >> The comment in mdnblocks.c says this: > >> * Because we pass O_CREAT, we will create the >> next segment (with >> * zero length)

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-10 Thread Andres Freund
On 2015-12-10 11:10:10 -0500, Robert Haas wrote: > (More broadly, as Kevin was pointing out to me yesterday, md.c looks > like it could do with a face lift. Keeping a linked list of 1GB > segments and chasing down the list to find the length of the file may > have been fine when relations over

[HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-09 Thread Robert Haas
This is per a report by an EnterpriseDB customer and a bunch of off-list analysis by Kevin Grittner and Rahila Syed. Suppose you have a large relation with OID 123456. There are segment files 123456, 123456.1, and 123456.2. Due to some kind of operating system malfeasance, 123456.1 disappears;

Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-09 Thread Tom Lane
Robert Haas writes: > The comment in mdnblocks.c says this: > * Because we pass O_CREAT, we will create the > next segment (with > * zero length) immediately, if the last > segment is of length > *