* Robert Haas (robertmh...@gmail.com) wrote:
> I think it's pretty unrealistic to suppose that this can be made to
> work.  The most obvious problem is that a sequential scan is coded to
> assume that every block between 0 and the last block in the relation
> is worth reading, 

You don't change that.  However, when a seq scan asks the storage layer
for blocks that it knows don't actually exist, it can simply skip over
them or return "empty" records or something equivalent...  Yes, that's
hand-wavy, but I also think it's doable.
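
To make the hand-waving slightly more concrete, here is a toy sketch in Python (not PostgreSQL code). The scan loop still visits every block number from 0 to the last block, and the storage layer answers "nothing here" for blocks it knows have no backing data. SparseStorage, read_block and seq_scan are hypothetical names invented for this illustration; none of them correspond to real backend functions.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class SparseStorage:
    """Hypothetical storage layer in which some block numbers have no data."""

    def __init__(self, nblocks: int, blocks: Dict[int, List[str]]) -> None:
        self.nblocks = nblocks        # logical relation length in blocks
        self._blocks = dict(blocks)   # only the blocks that really exist

    def read_block(self, blkno: int) -> Optional[List[str]]:
        """Return the rows in a block, or None if the block does not exist."""
        if not 0 <= blkno < self.nblocks:
            raise IndexError(f"block {blkno} is out of range")
        return self._blocks.get(blkno)


def seq_scan(storage: SparseStorage) -> Iterator[Tuple[int, str]]:
    """Visit every block number in order, skipping the holes."""
    for blkno in range(storage.nblocks):
        block = storage.read_block(blkno)
        if block is None:
            # The storage layer knows this block was never allocated.
            continue
        for row in block:
            yield blkno, row


storage = SparseStorage(6, {0: ["a", "b"], 3: ["c"], 5: []})
rows = list(seq_scan(storage))
```

The scan itself is unchanged in shape; only the storage layer's answer for a missing block is new. Whether the rest of the system tolerates that is exactly the open question.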

> I suspect there are
> slightly less obvious problems that would turn out to be highly
> intractable.

Entirely possible. :)

> The assumption that block numbers are dense is probably
> embedded in the system in a lot of subtle ways; if we start trying to
> change it, I think we're dooming ourselves to an unending series of crocks
> trying to undo the mess we've created.

Perhaps.

> Also, I think that's really a red herring anyway.  Relation extension
> per se is not slow - we can grow a file by adding zero bytes at a
> pretty good clip, and don't really gain anything at the database level
> by spreading the growth across multiple files.

That's true when the file is on a single filesystem and a single set of
drives.  Make them be split across multiple filesystems/volumes where
you get more drives involved...

> The problem is the
> relation extension LOCK, and I think that's where we should be
> focusing our attention.  I'm pretty confident we can find a way to
> take the pressure off the lock without actually changing anything at
> all at the storage layer.

That would certainly be very neat and if possible might render my idea
moot, which I would be more than happy with.

> As a thought experiment, suppose for example
> that we have a background process that knows, by magic, how many new
> blocks will be needed in each relation.  And it knows this just enough
> in advance to have time to extend each such relation by the requisite
> number of blocks and add those blocks to the free space map.  Since
> only that process ever needs a relation extension lock, there is no
> longer any contention for any such lock.  Problem solved!

Sounds cute, but perhaps a bit too cute to be realistic (that's
certainly been my opinion when suggested by others, which it has been,
in the past).

> Actually, I'm not convinced that a background process is the right
> approach at all, and of course there's no actual magic that lets us
> foresee exact extension needs.  But I still feel like that thought
> experiment indicates that there must be a solution here just by
> rejiggering the locking, and maybe with a bit of modest pre-extension.
>  The mediocre results of my last couple tries must indicate that I
> wasn't entirely successful in getting the backends out of each others'
> way, but I tend to think that's just an indication that I don't
> understand exactly what's happening in the contention scenarios yet,
> rather than a fundamental difficulty with the approach.

Perhaps.

> > How many concurrent writers did you have and what kind of filesystem was
> > backing this?  Was it a temp filesystem where writes are essentially to
> > memory, causing this relation extension lock to be much more
> > contentious?
> 
> 10.  ext4.  No.

Ok.

> If I took 30 seconds to pre-extend the relation before writing any
> data into it, then writing the data went pretty much exactly 10 times
> faster with 10 writers than with 1.

That's rather fantastic.

> But small on-the-fly
> pre-extensions during the write didn't work as well.  I don't remember
> exactly what formulas I tried, but I do remember that the few I tried
> were not really any better than "always pre-extend by 1 extra block";
> and that alone eliminated about half the contention, but then I
> couldn't do better.  

That seems quite odd to me; I would have thought extending by more than
2 blocks would have helped with the contention.  Still, it sounds like
extending requires a fair bit of writing, and that sucks in its own
right because we're just going to rewrite those blocks.  Is that correct?
If so, I like my proposal even more...
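
The intuition can be put as a toy model in Python. It only counts how many times the extension lock would be taken if every acquisition added one block plus a fixed number of extra blocks. It assumes contention scales with the number of acquisitions and ignores how long the lock is held while the new blocks are written, which may be why the numbers reported above did not improve beyond one extra block. The function name and its parameters are made up for this illustration.

```python
def extension_lock_acquisitions(blocks_needed: int, extra_blocks: int) -> int:
    """Count lock acquisitions when each one adds 1 + extra_blocks blocks."""
    if blocks_needed < 0 or extra_blocks < 0:
        raise ValueError("arguments must be non-negative")
    per_acquisition = 1 + extra_blocks
    # Ceiling division: a final partial batch still costs one acquisition.
    return -(-blocks_needed // per_acquisition)


# Acquisitions needed to add 1000 blocks under different policies.
no_preextend = extension_lock_acquisitions(1000, 0)
one_extra = extension_lock_acquisitions(1000, 1)
seven_extra = extension_lock_acquisitions(1000, 7)
```

In this model one extra block halves the acquisitions, which matches the "about half the contention" result, and larger batches should keep helping. That they apparently did not suggests the cost is somewhere the model does not look.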

> I wonder if I need to use LWLockAcquireOrWait().

I'm not seeing how/why that might help?

        Thanks,

                Stephen
