Re: Unhappy about API changes in the no-fsm-for-small-rels patch

2019-04-16 Thread John Naylor
On Wed, Apr 17, 2019 at 2:04 AM Andres Freund  wrote:
>
> Hi,
>
> I'm somewhat unhappy in how much the no-fsm-for-small-rels exposed
> complexity that looks like it should be purely in freespacemap.c to
> callers.
>
>
>  extern Size GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk);
> -extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded);
> +extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded,
> +bool check_fsm_only);
>
> So now freespace.c has an argument that says we should only check the
> fsm. That's confusing. And it's not explained to callers what that
> argument means, and when it should be set.

When first looking for free space, it's "false": Within
GetPageWithFreeSpace(), we call RelationGetNumberOfBlocks() if the FSM
returns invalid.

If we have to extend, after acquiring the lock to extend the relation,
we call GetPageWithFreeSpace() again to see if another backend already
extended while waiting on the lock. If there's no FSM, the thinking
is, it's not worth it to get the number of blocks again.
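
To spell that out, the intended calling pattern is roughly the following (a
toy standalone model, not the real hio.c/freespace.c code -- the stub
functions and constants are only stand-ins):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef unsigned int BlockNumber;
    #define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

    /* Stub stand-ins for the real FSM search, local map, and smgr calls. */
    static BlockNumber fsm_search(size_t spaceNeeded) { (void) spaceNeeded; return InvalidBlockNumber; }
    static BlockNumber RelationGetNumberOfBlocks(void) { return 3; }
    static BlockNumber try_local_map(BlockNumber nblocks) { return nblocks - 1; }

    /* Toy model of GetPageWithFreeSpace() with the check_fsm_only flag. */
    static BlockNumber
    GetPageWithFreeSpace(size_t spaceNeeded, bool check_fsm_only)
    {
        BlockNumber target = fsm_search(spaceNeeded);

        if (target == InvalidBlockNumber && !check_fsm_only)
        {
            /*
             * No FSM yet (small relation): build the local map from the
             * current relation size and pick a block from it.
             */
            target = try_local_map(RelationGetNumberOfBlocks());
        }
        return target;
    }

    int
    main(void)
    {
        /* First attempt: consult the FSM, falling back to the local map. */
        BlockNumber blk = GetPageWithFreeSpace(100, false);

        if (blk == InvalidBlockNumber)
        {
            /* ...acquire the relation extension lock here... */

            /*
             * Recheck in case another backend extended the relation while we
             * waited, but skip recomputing the number of blocks.
             */
            blk = GetPageWithFreeSpace(100, true);
        }
        printf("target block: %u\n", blk);
        return 0;
    }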

> @@ -176,20 +269,44 @@ RecordAndGetPageWithFreeSpace(Relation rel, BlockNumber 
> oldPage,
>   * Note that if the new spaceAvail value is higher than the old value stored
>   * in the FSM, the space might not become visible to searchers until the next
>   * FreeSpaceMapVacuum call, which updates the upper level pages.
> + *
> + * Callers have no need for a local map.
>   */
>  void
> -RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk, Size spaceAvail)
> +RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
> +   Size spaceAvail, BlockNumber nblocks)
>
> There's no explanation as to what that "nblocks" argument is. One
> basically has to search other callers to figure it out. It's not even
> clear to which fork it relates to. Nor that one can set it to
> InvalidBlockNumber if one doesn't have the relation size conveniently
> reachable.  But it's not exposed to RecordAndGetPageWithFreeSpace(), for
> a basically unexplained reason.  There's a comment above
> fsm_allow_writes() - but that's a file-local function that external
> callers basically have no need to know about.

Okay.

> I can't figure out what "Callers have no need for a local map." is
> supposed to mean.

It was meant to contrast with [RecordAnd]GetPageWithFreeSpace(), but I
see how it's confusing.

> +/*
> + * Clear the local map.  We must call this when we have found a block with
> + * enough free space, when we extend the relation, or on transaction abort.
> + */
> +void
> +FSMClearLocalMap(void)
> +{
> +   if (FSM_LOCAL_MAP_EXISTS)
> +   {
> +   fsm_local_map.nblocks = 0;
> +   memset(&fsm_local_map.map, FSM_LOCAL_NOT_AVAIL,
> +  sizeof(fsm_local_map.map));
> +   }
> +}
> +
>
> So now there's a new function one needs to call after successfully using
> the block returned by [RecordAnd]GetPageWithFreeSpace().  But it's not
> referenced from those functions, so basically one has to just know that.

Right.

> +/* Only create the FSM if the heap has greater than this many blocks */
> +#define HEAP_FSM_CREATION_THRESHOLD 4
>
> Hm, this seems to be tying freespace.c closer to heap than I think is
> great - think of new AMs like zheap, that also want to use it.

Amit and I kept zheap in mind when working on the patch. You'd have to
work around the metapage, but everything else should work the same.

> I think this is mostly fallout about the prime issue I'm unhappy
> about. There's now some global variable in freespacemap.c that code
> using freespace.c has to know about and maintain.
>
>
> +static void
> +fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> +{
> +   BlockNumber blkno,
> +   cached_target_block;
> +
> +   /* The local map must not be set already. */
> +   Assert(!FSM_LOCAL_MAP_EXISTS);
> +
> +   /*
> +* Starting at the current last block in the relation and working
> +* backwards, mark alternating blocks as available.
> +*/
> +   blkno = cur_nblocks - 1;
>
> That comment explains very little about why this is done, and why it's a
> good idea.

Short answer: performance -- it's too expensive to try every block.
The explanation is in storage/freespace/README -- maybe that should be
referenced here?
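
For the archives, the scheme described there boils down to something like
this simplified standalone sketch (the constants mirror the patch, the rest
is illustrative; the real fsm_local_set() additionally marks the relation's
cached target block as not-available, since it has presumably just been
tried):

    #include <stdio.h>
    #include <string.h>

    typedef unsigned int BlockNumber;

    #define HEAP_FSM_CREATION_THRESHOLD 4
    #define FSM_LOCAL_NOT_AVAIL 0x00
    #define FSM_LOCAL_AVAIL     0x01

    static unsigned char local_map[HEAP_FSM_CREATION_THRESHOLD];

    /*
     * Starting at the last block of the relation and working backwards, mark
     * every other block as available to try, so that we don't end up probing
     * every single page of the relation.
     */
    static void
    local_map_set(BlockNumber cur_nblocks)
    {
        BlockNumber blkno = cur_nblocks - 1;

        memset(local_map, FSM_LOCAL_NOT_AVAIL, sizeof(local_map));
        for (;;)
        {
            local_map[blkno] = FSM_LOCAL_AVAIL;
            if (blkno < 2)
                break;
            blkno -= 2;
        }
    }

    int
    main(void)
    {
        local_map_set(HEAP_FSM_CREATION_THRESHOLD);
        for (int i = 0; i < HEAP_FSM_CREATION_THRESHOLD; i++)
            printf("block %d: %s\n", i,
                   local_map[i] == FSM_LOCAL_AVAIL ? "try" : "skip");
        return 0;
    }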

> +/* Status codes for the local map. */
> +
> +/* Either already tried, or beyond the end of the relation */
> +#define FSM_LOCAL_NOT_AVAIL 0x00
> +
> +/* Available to try */
> +#define FSM_LOCAL_AVAIL 0x01
>
> +/* Local map of block numbers for small heaps with no FSM. */
> +typedef struct
> +{
> +   BlockNumber nblocks;
> +   uint8   map[HEAP_FSM_CREATION_THRESHOLD];
> +}  FSMLocalMap;
> +
>
> Hm, given realistic HEAP_FSM_CREATION_THRESHOLD, and the fact that we
> really only need one bit per block, it seems like map should really
> be just a uint32 with one bit per page.

I fail to see the advantage of that.
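
(For concreteness, I read the suggestion as something like the sketch below
-- one bit per block in a single word instead of a byte array. The space
saving is only a few bytes per backend.)

    #include <stdint.h>
    #include <stdio.h>

    typedef unsigned int BlockNumber;

    /* One bit per block; 32 bits easily covers any realistic threshold. */
    static uint32_t local_map;

    static void map_clear(void)               { local_map = 0; }
    static void map_mark(BlockNumber blk)     { local_map |= (uint32_t) 1 << blk; }
    static void map_unmark(BlockNumber blk)   { local_map &= ~((uint32_t) 1 << blk); }
    static int  map_is_avail(BlockNumber blk) { return (local_map >> blk) & 1; }

    int
    main(void)
    {
        map_clear();
        map_mark(3);
        map_mark(1);
        map_unmark(3);              /* block 3 turned out to have no room */

        for (BlockNumber blk = 0; blk < 4; blk++)
            printf("block %u: %s\n", blk, map_is_avail(blk) ? "try" : "skip");
        return 0;
    }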

> +static bool
> +fsm_allow_writes(Relation rel, BlockNumber 

RE: Copy data to DSA area

2019-04-16 Thread Ideriha, Takeshi
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>Sent: Wednesday, December 5, 2018 2:42 PM
>Subject: RE: Copy data to DSA area

Hi
It's been a long while since we discussed this topic. 
Let me recap first and I'll give some thoughts.

It seems the things we have reached consensus on are:

- Want to palloc/pfree transparently in DSA
- Use Postgres-initialized shared memory as DSA 
- Don’t leak memory in shared memory 

Things under discussion:
- How to prevent memory leaks
- How to prevent dangling pointers after cleaning up about-to-leak objects

Regarding the memory leak, I think Robert's idea of allocating objects under
a temporary context while building them, and re-parenting it to a permanent
one at some point, is promising.
While objects are being built they live under a temporary DSA memory context,
which is a child of TopTransactionContext (if we are inside a transaction),
and they are all freed at once if an error happens.
To delete all the chunks allocated under the temporary DSA context, we need
to search for, or remember, the location of every chunk under the context.
Unlike AllocSet we don't have block information that would let us delete them
all together.

So I'm thinking of managing dsa_allocated chunks with a singly linked list to
keep track of them and delete them: the context holds the head of the list,
and every chunk has a pointer to the next allocated chunk.
But this adds space overhead to every dsa_allocated chunk, and we may want to
avoid that because shared memory is limited. In that case, we could free
these pointer areas at some point, once we are sure the allocation has
succeeded.
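
To illustrate the bookkeeping I have in mind, here is a minimal standalone
sketch of the linked-list idea (plain malloc/free stand in for
dsa_allocate/dsa_free, and all the names here are made up for illustration):

    #include <stdio.h>
    #include <stdlib.h>

    /* Per-chunk header linking all chunks of a temporary context together. */
    typedef struct ChunkHeader
    {
        struct ChunkHeader *next;
    } ChunkHeader;

    /* A temporary "context": just the head of the chunk list. */
    typedef struct TempContext
    {
        ChunkHeader *head;
    } TempContext;

    /* Allocate a chunk under the context, remembering it in the list. */
    static void *
    context_alloc(TempContext *cxt, size_t size)
    {
        ChunkHeader *chunk = malloc(sizeof(ChunkHeader) + size);

        if (chunk == NULL)
            return NULL;
        chunk->next = cxt->head;
        cxt->head = chunk;
        return (void *) (chunk + 1);    /* usable memory follows the header */
    }

    /* Free everything allocated under the context, e.g. on transaction abort. */
    static void
    context_reset(TempContext *cxt)
    {
        while (cxt->head != NULL)
        {
            ChunkHeader *next = cxt->head->next;

            free(cxt->head);
            cxt->head = next;
        }
    }

    int
    main(void)
    {
        TempContext cxt = {NULL};
        char       *a = context_alloc(&cxt, 16);
        char       *b = context_alloc(&cxt, 32);

        printf("allocated %p and %p\n", (void *) a, (void *) b);
        context_reset(&cxt);        /* nothing leaks even if we "abort" here */
        return 0;
    }

In the real thing the header would live in DSA and context_reset() would be
driven from transaction abort; re-parenting to a permanent context would
simply hand over the list head.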

Another question is when we should consider an allocation successful (that
is, when we are sure the object won't leak).
If the allocation is done inside a transaction, we can consider it safe once
the transaction commits.
Alternatively, I assume this DSA memory context would be used for caches such
as the relcache, catcache, plancache and so on. In that case a cache entry
won't leak once it has been added to a hash table or list, because I assume
some eviction mechanism like LRU will be implemented and it will erase
useless cache entries some time later.

What do you think about these ideas?

Regarding dangling pointers, I think they are also a problem.
After cleaning up objects to prevent a memory leak, we have no mechanism to
reset the dangling pointers.
I gave this some thought a while ago, though begin_allocate/end_allocate
don't seem like good names. More self-explanatory names would be something
like start_pointing_to_dsa_object_under_construction() and
end_pointing_to_dsa_object_under_construction().
https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A6F1F259F%40G01JPEXMBKW04

If we can make sure such dangling pointers never happen, we don't need this
interface at all.
As Thomas mentioned before, where these interfaces should live needs review,
but I couldn't come up with another solution right now.

Do you have some thoughts?

best regards,
Ideriha, Takeshi





Re: Runtime pruning problem

2019-04-16 Thread Tom Lane
David Rowley  writes:
> On Wed, 17 Apr 2019 at 15:54, Tom Lane  wrote:
>> What I'm more worried about is whether this breaks any internal behavior
>> of explain.c, as the comment David quoted upthread seems to think.
>> If we need to have a tlist to reference, can we make that code look
>> to the pre-pruning plan tree, rather than the planstate tree?

> I think most of the complexity is in what to do in
> set_deparse_planstate() given that there might be no outer plan to
> choose from for Append and MergeAppend. This controls what's done in
> resolve_special_varno() as this descends the plan tree down the outer
> side until it gets to the node that the outer var came from.

> We wouldn't need to do this if we just didn't show the targetlist in
> EXPLAIN VERBOSE, but there's also MergeAppend sort keys to worry about
> too.  Should we just skip on those as well?

No, the larger issue is that *any* plan node above the Append might
be recursing down to/through the Append to find out what to print for
a Var reference.  We have to be able to support that.

regards, tom lane




Re: Runtime pruning problem

2019-04-16 Thread Amit Langote
On 2019/04/17 12:58, David Rowley wrote:
> On Wed, 17 Apr 2019 at 15:54, Tom Lane  wrote:
>>
>> Amit Langote  writes:
>>> On 2019/04/17 11:29, David Rowley wrote:
 Where do you think the output list for EXPLAIN VERBOSE should put the
 output column list in this case? On the Append node, or just not show
 them?
>>
>>> Maybe, not show them?
>>
>> Yeah, I think that seems like a reasonable idea.  If we show the tlist
>> for Append in this case, when we never do otherwise, that will be
>> confusing, and it could easily break plan-reading apps like depesz.com.
>>
>> What I'm more worried about is whether this breaks any internal behavior
>> of explain.c, as the comment David quoted upthread seems to think.
>> If we need to have a tlist to reference, can we make that code look
>> to the pre-pruning plan tree, rather than the planstate tree?
> 
> I think most of the complexity is in what to do in
> set_deparse_planstate() given that there might be no outer plan to
> choose from for Append and MergeAppend. This controls what's done in
> resolve_special_varno() as this descends the plan tree down the outer
> side until it gets to the node that the outer var came from.
> 
> We wouldn't need to do this if we just didn't show the targetlist in
> EXPLAIN VERBOSE, but there's also MergeAppend sort keys to worry about
> too.  Should we just skip on those as well?

I guess so, if only to be consistent with Append.

Thanks,
Amit





Re: log_planner_stats and prepared statements

2019-04-16 Thread Tom Lane
Bruce Momjian  writes:
> I have found that log_planner_stats only outputs stats until the generic
> plan is chosen.  For example, if you run the following commands:

Uh, well, the planner doesn't get run after that point ...

regards, tom lane




Re: Runtime pruning problem

2019-04-16 Thread David Rowley
On Wed, 17 Apr 2019 at 15:54, Tom Lane  wrote:
>
> Amit Langote  writes:
> > On 2019/04/17 11:29, David Rowley wrote:
> >> Where do you think the output list for EXPLAIN VERBOSE should put the
> >> output column list in this case? On the Append node, or just not show
> >> them?
>
> > Maybe, not show them?
>
> Yeah, I think that seems like a reasonable idea.  If we show the tlist
> for Append in this case, when we never do otherwise, that will be
> confusing, and it could easily break plan-reading apps like depesz.com.
>
> What I'm more worried about is whether this breaks any internal behavior
> of explain.c, as the comment David quoted upthread seems to think.
> If we need to have a tlist to reference, can we make that code look
> to the pre-pruning plan tree, rather than the planstate tree?

I think most of the complexity is in what to do in
set_deparse_planstate() given that there might be no outer plan to
choose from for Append and MergeAppend. This controls what's done in
resolve_special_varno() as this descends the plan tree down the outer
side until it gets to the node that the outer var came from.

We wouldn't need to do this if we just didn't show the targetlist in
EXPLAIN VERBOSE, but there's also MergeAppend sort keys to worry about
too.  Should we just skip on those as well?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Runtime pruning problem

2019-04-16 Thread Tom Lane
Amit Langote  writes:
> On 2019/04/17 11:29, David Rowley wrote:
>> Where do you think the output list for EXPLAIN VERBOSE should put the
>> output column list in this case? On the Append node, or just not show
>> them?

> Maybe, not show them?

Yeah, I think that seems like a reasonable idea.  If we show the tlist
for Append in this case, when we never do otherwise, that will be
confusing, and it could easily break plan-reading apps like depesz.com.

What I'm more worried about is whether this breaks any internal behavior
of explain.c, as the comment David quoted upthread seems to think.
If we need to have a tlist to reference, can we make that code look
to the pre-pruning plan tree, rather than the planstate tree?

regards, tom lane




log_planner_stats and prepared statements

2019-04-16 Thread Bruce Momjian
I have found that log_planner_stats only outputs stats until the generic
plan is chosen.  For example, if you run the following commands:

SET client_min_messages = 'log';
SET log_planner_stats = TRUE;

PREPARE e AS SELECT relkind FROM pg_class WHERE relname = $1 ORDER BY 1;

EXPLAIN ANALYZE VERBOSE EXECUTE e ('pg_class');
EXPLAIN ANALYZE VERBOSE EXECUTE e ('pg_class');
EXPLAIN ANALYZE VERBOSE EXECUTE e ('pg_class');
EXPLAIN ANALYZE VERBOSE EXECUTE e ('pg_class');
EXPLAIN ANALYZE VERBOSE EXECUTE e ('pg_class');
EXPLAIN ANALYZE VERBOSE EXECUTE e ('pg_class');
--> EXPLAIN ANALYZE VERBOSE EXECUTE e ('pg_class');

The last explain will _not_ show any log_planner_stats duration, though
it does show an EXPLAIN planning time:

 Planning Time: 0.012 ms

Is this expected behavior?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Runtime pruning problem

2019-04-16 Thread Amit Langote
On 2019/04/17 11:29, David Rowley wrote:
> On Wed, 17 Apr 2019 at 13:13, Amit Langote
>  wrote:
>> When you see this:
>>
>> explain select * from t1 where dt = current_date + 400;
>>  QUERY PLAN
>> 
>>  Append  (cost=0.00..198.42 rows=44 width=8)
>>Subplans Removed: 3
>>->  Seq Scan on t1_1  (cost=0.00..49.55 rows=11 width=8)
>>  Filter: (dt = (CURRENT_DATE + 400))
>> (4 rows)
>>
>> Doesn't this give an impression that t1_1 *matches* the WHERE condition
>> where it clearly doesn't?  IMO, contorting explain.c to show an empty
>> Append like what Hosoya-san suggests doesn't sound too bad given that the
>> first reaction to seeing the above result is to think it's a bug of
>> partition pruning.
> 
> Where do you think the output list for EXPLAIN VERBOSE should put the
> output column list in this case? On the Append node, or just not show
> them?

Maybe, not show them?  That may be a bit inconsistent, because the point
of VERBOSE is to show the targetlist among other things, but maybe users
wouldn't mind not seeing it on such empty Append nodes.  OTOH, they are
more likely to regard seeing a subplan that's clearly prunable as a bug in
the pruning logic.

Thanks,
Amit





Re: Runtime pruning problem

2019-04-16 Thread David Rowley
On Wed, 17 Apr 2019 at 13:13, Amit Langote
 wrote:
> When you see this:
>
> explain select * from t1 where dt = current_date + 400;
>  QUERY PLAN
> 
>  Append  (cost=0.00..198.42 rows=44 width=8)
>Subplans Removed: 3
>->  Seq Scan on t1_1  (cost=0.00..49.55 rows=11 width=8)
>  Filter: (dt = (CURRENT_DATE + 400))
> (4 rows)
>
> Doesn't this give an impression that t1_1 *matches* the WHERE condition
> where it clearly doesn't?  IMO, contorting explain.c to show an empty
> Append like what Hosoya-san suggests doesn't sound too bad given that the
> first reaction to seeing the above result is to think it's a bug of
> partition pruning.

Where do you think the output list for EXPLAIN VERBOSE should put the
output column list in this case? On the Append node, or just not show
them?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Runtime pruning problem

2019-04-16 Thread Amit Langote
Hi,

On 2019/04/16 21:09, David Rowley wrote:
> On Tue, 16 Apr 2019 at 23:55, Yuzuko Hosoya  
> wrote:
>> postgres=# explain analyze select * from t1 where dt = current_date + 400;
>>   QUERY PLAN
>> ---
>>  Append  (cost=0.00..198.42 rows=44 width=8) (actual time=0.000..0.001 
>> rows=0 loops=1)
>>Subplans Removed: 3
>>->  Seq Scan on t1_1  (cost=0.00..49.55 rows=11 width=8) (never executed)
>>  Filter: (dt = (CURRENT_DATE + 400))
>>  Planning Time: 0.400 ms
>>  Execution Time: 0.070 ms
>> (6 rows)
>> 
>>
>> I realized t1_1 was not scanned actually since "never executed"
>> was displayed in the plan using EXPLAIN ANALYZE.  But I think
>> "One-Time Filter: false" and "Subplans Removed: ALL" or something
>> like that should be displayed instead.
>>
>> What do you think?
> 
> This is intended behaviour explained by the following comment in nodeAppend.c
> 
> /*
> * The case where no subplans survive pruning must be handled
> * specially.  The problem here is that code in explain.c requires
> * an Append to have at least one subplan in order for it to
> * properly determine the Vars in that subplan's targetlist.  We
> * sidestep this issue by just initializing the first subplan and
> * setting as_whichplan to NO_MATCHING_SUBPLANS to indicate that
> * we don't really need to scan any subnodes.
> */
> 
> It's true that there is a small overhead in this case of having to
> initialise a useless subplan, but the code never tries to pull any
> tuples from it, so it should be fairly minimal.  I expected that using
> a value that matches no partitions would be unusual enough not to go
> contorting explain.c into working for this case.

When I saw this, I didn't think as much of the overhead of initializing a
subplan as I was surprised to see that result at all.

When you see this:

explain select * from t1 where dt = current_date + 400;
 QUERY PLAN

 Append  (cost=0.00..198.42 rows=44 width=8)
   Subplans Removed: 3
   ->  Seq Scan on t1_1  (cost=0.00..49.55 rows=11 width=8)
 Filter: (dt = (CURRENT_DATE + 400))
(4 rows)

Doesn't this give an impression that t1_1 *matches* the WHERE condition
where it clearly doesn't?  IMO, contorting explain.c to show an empty
Append like what Hosoya-san suggests doesn't sound too bad given that the
first reaction to seeing the above result is to think it's a bug of
partition pruning.

Thanks,
Amit





Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos

2019-04-16 Thread Thomas Munro
On Mon, Apr 15, 2019 at 7:57 PM  wrote:
> I forgot to mention that this is happening in a docker container.

Huh, so there may be some configuration of Linux container that can
fail here with EPERM, even though that error does not appear in
the man page, and doesn't make much intuitive sense.  Would be good to
figure out how that happens.

If we could somehow confirm* that sync_file_range() with the
non-waiting flags we are using is non-destructive of error state, as
Andres speculated (that is, it cannot eat the only error report we're
ever going to get to tell us that buffered dirty data may have been
dropped), then I suppose we could just remove the data_sync_elevel()
promotion here.  As with the WSL case (before the PANIC commit and the
subsequent don't-repeat-the-warning-forever patch), a user of this
posited EPERM-generating container configuration would then get
repeated warnings in the log forever (as they presumably did before).
Repeated WARNING messages are probably OK here, I think... I mean, if,
say, someone complains that FlubOS's Linux emulation fails here with
EIEIO, I'd say they should put up with the warnings and complain over
on the flub-hackers list, or whatever, and I'd say the same for
containers that generate EPERM: either the man page or the container
technology needs work.

But... I still think we should try to avoid making decisions based on
knowledge of kernel implementation details, if it can be avoided.  I'd
probably rather treat EPERM explicitly differently (and eventually
EIEIO too, if a report comes in) than drop the current paranoid coding
completely.

*I'm not looking at it myself.  A sync_file_range() implementation is
on my list of potential FreeBSD projects for a rainy day, so I don't
want to study anything but the man page, even if it's wrong.

-- 
Thomas Munro
https://enterprisedb.com




Re: Calling pgstat_report_wait_end() before ereport(ERROR)

2019-04-16 Thread Michael Paquier
On Tue, Apr 16, 2019 at 08:03:22PM +0900, Masahiko Sawada wrote:
> Agreed. There is also some code which raises an ERROR after closing a
> transient file, but I think it's a good idea not to include those places,
> for safety. It looks to me like the patch you proposed cleans up as many
> places as we can.

Thanks for the lookup, committed.
--
Michael




Re: REINDEX CONCURRENTLY 2.0

2019-04-16 Thread Michael Paquier
On Tue, Apr 16, 2019 at 08:50:31AM +0200, Peter Eisentraut wrote:
> Looks good to me.

Thanks, committed.  If there are additional discussions on various
points of the feature, let's move to a new thread please.  This one
has been already extensively used ;)
--
Michael




Re: Race conditions with checkpointer and shutdown

2019-04-16 Thread Andres Freund
Hi,

On 2019-04-16 17:05:36 -0700, Andres Freund wrote:
> On 2019-04-16 18:59:37 -0400, Robert Haas wrote:
> > On Tue, Apr 16, 2019 at 6:45 PM Tom Lane  wrote:
> > > Do we need to think harder about establishing rules for multiplexed
> > > use of the process latch?  I'm imagining some rule like "if you are
> > > not the outermost event loop of a process, you do not get to
> > > summarily clear MyLatch.  Make sure to leave it set after waiting,
> > > if there was any possibility that it was set by something other than
> > > the specific event you're concerned with".
> >
> > Hmm, yeah.  If the latch is left set, then the outer loop will just go
> > through an extra and unnecessary iteration, which seems fine.  If the
> > latch is left clear, then the outer loop might miss a wakeup intended
> > for it and hang forever.
> 
> Arguably that's a sign that the latch using code in the outer loop(s) isn't
> written correctly? If you do it as:
> 
> while (true)
> {
> CHECK_FOR_INTERRUPTS();
> 
> ResetLatch(MyLatch);
> 
> if (work_needed)
> {
> Plenty();
> Code();
> Using(MyLatch);
> }
> else
> {
> WaitLatch(MyLatch);
> }
> }
> 
> I think that's not a danger? I think the problem really is that we
> suggest doing that WaitLatch() unconditionally:
> 
>  * The correct pattern to wait for event(s) is:
>  *
>  * for (;;)
>  * {
>  *   ResetLatch();
>  *   if (work to do)
>  *   Do Stuff();
>  *   WaitLatch();
>  * }
>  *
>  * It's important to reset the latch *before* checking if there's work to
>  * do. Otherwise, if someone sets the latch between the check and the
>  * ResetLatch call, you will miss it and Wait will incorrectly block.
>  *
>  * Another valid coding pattern looks like:
>  *
>  * for (;;)
>  * {
>  *   if (work to do)
>  *   Do Stuff(); // in particular, exit loop if some condition 
> satisfied
>  *   WaitLatch();
>  *   ResetLatch();
>  * }
> 
> Obviously there's the issue that a lot of latch-using code isn't written
> that way - but I also don't think there's that much latch-using code
> that then also uses the latch. Seems like we could fix that.  While it
> obviously has the danger of not being followed, so does the
> 'always-set-latch-unless-outermost-loop' approach.
> 
> I'm not sure I like the idea of incurring another unnecessary SetLatch()
> call for most latch using places.
> 
> I guess there's a bit bigger danger of taking longer to notice
> postmaster-death. But I'm not sure I can quite see that being
> problematic - seems like all we should incur is another cycle through
> the loop, as the latch shouldn't be set anymore.

I think we should thus change our latch documentation to something like:

diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index fc995819d35..dc46dd94c5b 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -44,22 +44,31 @@
  * {
  *ResetLatch();
  *if (work to do)
- *Do Stuff();
- *WaitLatch();
+ *DoStuff();
+ *else
+ *WaitLatch();
  * }
  *
  * It's important to reset the latch *before* checking if there's work to
  * do. Otherwise, if someone sets the latch between the check and the
  * ResetLatch call, you will miss it and Wait will incorrectly block.
  *
+ * The reason to only wait on the latch in case there is nothing to do is that
+ * code inside DoStuff() might use the same latch, and leave it reset, even
+ * though a SetLatch() aimed for the outer loop arrived. Which again could
+ * lead to incorrectly blocking in Wait.
+ *
  * Another valid coding pattern looks like:
  *
  * for (;;)
  * {
  *if (work to do)
- *Do Stuff(); // in particular, exit loop if some condition satisfied
- *WaitLatch();
- *ResetLatch();
+ *DoStuff(); // in particular, exit loop if some condition satisfied
+ * else
+ * {
+ *WaitLatch();
+ *ResetLatch();
+ * }
  * }
  *
  * This is useful to reduce latch traffic if it's expected that the loop's

and adapt code to match (at least in the outer loops).

Greetings,

Andres Freund




Re: Race conditions with checkpointer and shutdown

2019-04-16 Thread Andres Freund
Hi,

On 2019-04-16 18:59:37 -0400, Robert Haas wrote:
> On Tue, Apr 16, 2019 at 6:45 PM Tom Lane  wrote:
> > Do we need to think harder about establishing rules for multiplexed
> > use of the process latch?  I'm imagining some rule like "if you are
> > not the outermost event loop of a process, you do not get to
> > summarily clear MyLatch.  Make sure to leave it set after waiting,
> > if there was any possibility that it was set by something other than
> > the specific event you're concerned with".
>
> Hmm, yeah.  If the latch is left set, then the outer loop will just go
> through an extra and unnecessary iteration, which seems fine.  If the
> latch is left clear, then the outer loop might miss a wakeup intended
> for it and hang forever.

Arguably that's a sign that the latch using code in the outer loop(s) isn't
written correctly? If you do it as:

while (true)
{
CHECK_FOR_INTERRUPTS();

ResetLatch(MyLatch);

if (work_needed)
{
Plenty();
Code();
Using(MyLatch);
}
else
{
WaitLatch(MyLatch);
}
}

I think that's not a danger? I think the problem really is that we
suggest doing that WaitLatch() unconditionally:

 * The correct pattern to wait for event(s) is:
 *
 * for (;;)
 * {
 * ResetLatch();
 * if (work to do)
 * Do Stuff();
 * WaitLatch();
 * }
 *
 * It's important to reset the latch *before* checking if there's work to
 * do. Otherwise, if someone sets the latch between the check and the
 * ResetLatch call, you will miss it and Wait will incorrectly block.
 *
 * Another valid coding pattern looks like:
 *
 * for (;;)
 * {
 * if (work to do)
 * Do Stuff(); // in particular, exit loop if some condition 
satisfied
 * WaitLatch();
 * ResetLatch();
 * }

Obviously there's the issue that a lot of latch-using code isn't written
that way - but I also don't think there's that much latch-using code
that then also uses the latch. Seems like we could fix that.  While it
obviously has the danger of not being followed, so does the
'always-set-latch-unless-outermost-loop' approach.

I'm not sure I like the idea of incurring another unnecessary SetLatch()
call for most latch using places.

I guess there's a bit bigger danger of taking longer to notice
postmaster-death. But I'm not sure I can quite see that being
problematic - seems like all we should incur is another cycle through
the loop, as the latch shouldn't be set anymore.

Greetings,

Andres Freund




Re: Race conditions with checkpointer and shutdown

2019-04-16 Thread Robert Haas
On Tue, Apr 16, 2019 at 6:45 PM Tom Lane  wrote:
> Do we need to think harder about establishing rules for multiplexed
> use of the process latch?  I'm imagining some rule like "if you are
> not the outermost event loop of a process, you do not get to
> summarily clear MyLatch.  Make sure to leave it set after waiting,
> if there was any possibility that it was set by something other than
> the specific event you're concerned with".

Hmm, yeah.  If the latch is left set, then the outer loop will just go
through an extra and unnecessary iteration, which seems fine.  If the
latch is left clear, then the outer loop might miss a wakeup intended
for it and hang forever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: hyrax vs. RelationBuildPartitionDesc

2019-04-16 Thread Robert Haas
On Sun, Apr 14, 2019 at 3:29 PM Tom Lane  wrote:
> What I get for test cases like [1] is
>
> single-partition SELECT, hash partitioning:
>
> N   tps, HEAD   tps, patch
> 2      11426.243754    11448.615193
> 8      11254.833267    11374.278861
> 32     11288.329114    11371.942425
> 128    11222.329256    11185.845258
> 512    11001.177137    10572.917288
> 1024   10612.456470     9834.172965
> 4096    8819.110195     7021.864625
> 8192    7372.611355     5276.130161
>
> single-partition SELECT, range partitioning:
>
> N   tps, HEAD   tps, patch
> 2      11037.855338    11153.595860
> 8      11085.218022    11019.132341
> 32     10994.348207    10935.719951
> 128    10884.417324    10532.685237
> 512    10635.583411     9578.108915
> 1024   10407.286414     8689.585136
> 4096    8361.463829     5139.084405
> 8192    7075.880701     3442.542768

I have difficulty interpreting these results in any way other than as
an endorsement of the approach that I took.  It seems like you're
proposing to throw away what is really a pretty substantial amount of
performance basically so that the code will look more like you think
it should look.  But I dispute the idea that the current code is so
bad that we need to do this.  I don't think that's the case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Race conditions with checkpointer and shutdown

2019-04-16 Thread Tom Lane
Michael Paquier  writes:
> The buildfarm has reported two similar failures when shutting down a
> node:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2019-03-23%2022%3A28%3A59
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2019-04-16%2006%3A14%3A01

> In both cases, the instance cannot shut down because it times out,
> waiting for the shutdown checkpoint to finish but I suspect that this
> checkpoint actually never happens.

Hmm, I don't think that that is actually where the problem is.  In
piculet's failure, the test script times out waiting for a "fast"
shutdown of the standby server, and what we see in the standby's log is

2019-03-23 22:44:12.181 UTC [9731] LOG:  received fast shutdown request
2019-03-23 22:44:12.181 UTC [9731] LOG:  aborting any active transactions
2019-03-23 22:44:12.181 UTC [9960] FATAL:  terminating walreceiver process due 
to administrator command
2019-03-23 22:50:13.088 UTC [9731] LOG:  received immediate shutdown request

where the last line indicates that the test script lost patience and
issued an immediate shutdown.  However, in a successful run of the
test, the log looks like

2019-03-24 03:33:25.592 UTC [23816] LOG:  received fast shutdown request
2019-03-24 03:33:25.592 UTC [23816] LOG:  aborting any active transactions
2019-03-24 03:33:25.592 UTC [23895] FATAL:  terminating walreceiver process due 
to administrator command
2019-03-24 03:33:25.595 UTC [23819] LOG:  shutting down
2019-03-24 03:33:25.600 UTC [23816] LOG:  database system is shut down
2019-03-24 03:33:25.696 UTC [23903] LOG:  starting PostgreSQL 12devel on 
x86_64-pc-linux-gnu, compiled by gcc (Debian 8.2.0-12) 8.2.0, 64-bit

where the last line reflects restarting the server for the next test step.
So in the failure case we don't see the "shutting down" message, which
means we never got to ShutdownXLOG, so no checkpoint request was made.
Even if we had got to ShutdownXLOG, the process is just executing the
operation directly, it's not sending a signal asking some other process
to do the checkpoint; so it's hard to see how either of the commits
you mention could be involved.

I think what we need to look for is reasons why (1) the postmaster
never sends SIGUSR2 to the checkpointer, or (2) the checkpointer's
main loop doesn't get to noticing shutdown_requested.

A rather scary point for (2) is that said main loop seems to be
assuming that MyLatch a/k/a MyProc->procLatch is not used for any
other purposes in the checkpointer process.  If there were something,
like say a condition variable wait, that would reset MyLatch at any
time during a checkpoint, then we could very easily go to sleep at the
bottom of the loop and not notice that there's a pending shutdown request.

Now, c6c9474aa did not break this, because the latch resets that
it added happen in other processes not the checkpointer.  But I'm
feeling suspicious that some other change we made recently might've
borked it.  And in general, it seems like we've managed to load a
lot of potentially conflicting roles onto process latches.

Do we need to think harder about establishing rules for multiplexed
use of the process latch?  I'm imagining some rule like "if you are
not the outermost event loop of a process, you do not get to
summarily clear MyLatch.  Make sure to leave it set after waiting,
if there was any possibility that it was set by something other than
the specific event you're concerned with".

regards, tom lane




Re: block-level incremental backup

2019-04-16 Thread Robert Haas
On Tue, Apr 16, 2019 at 5:44 PM Stephen Frost  wrote:
> > > I love the general idea of having additional facilities in core to
> > > support block-level incremental backups.  I've long been unhappy that
> > > any such approach ends up being limited to a subset of the files which
> > > need to be included in the backup, meaning the rest of the files have to
> > > be backed up in their entirety.  I don't think we have to solve for that
> > > as part of this, but I'd like to see a discussion for how to deal with
> > > the other files which are being backed up to avoid needing to just
> > > wholesale copy them.
> >
> > I assume you are talking about non-heap/index files.  Which of those are
> > large enough to benefit from incremental backup?
>
> Based on discussions I had with Andrey, specifically the visibility map
> is an issue for them with WAL-G.  I haven't spent a lot of time thinking
> about it, but I can understand how that could be an issue.

If I understand correctly, the VM contains 1 byte per 4 heap pages and
the FSM contains 1 byte per heap page (plus some overhead for higher
levels of the tree).  Since the FSM is not WAL-logged, I'm not sure
there's a whole lot we can do to avoid having to back it up, although
maybe there's some clever idea I'm not quite seeing.  The VM is
WAL-logged, albeit with some strange warts that I have the honor of
inventing, so there's more possibilities there.

Before worrying about it too much, it would be useful to hear more
about the concerns related to these forks, so that we make sure we're
solving the right problem.  It seems difficult for a single relation
to be big enough for these to be much of an issue.  For example, on a
1TB relation, we have 2^40 bytes = 2^27 pages = ~2^25 bytes of VM fork
= 32MB.  Not nothing, but 32MB of useless overhead every time you back
up a 1TB database probably isn't going to break the bank.  It might be
more of a concern for users with many small tables.  For example, if
somebody has got a million tables with 1 page in each one, they'll
have a million data pages, a million VM pages, and 3 million FSM pages
(unless the new don't-create-the-FSM-for-small-tables stuff in v12
kicks in).  I don't know if it's worth going to a lot of trouble to
optimize that case.  Creating a million tables with 100 tuples (or
whatever) in each one sounds like terrible database design to me.
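
For reference, that arithmetic works out roughly like this, assuming 8kB
pages, 1 VM byte per 4 heap pages, and 1 FSM byte per heap page (leaf level
only; the little program below is just a back-of-the-envelope check, not
anything from the tree):

    #include <stdio.h>

    int
    main(void)
    {
        const double page_size = 8192.0;    /* 8kB heap pages */
        const double heap_bytes = 1024.0 * 1024 * 1024 * 1024; /* 1TB relation */
        const double heap_pages = heap_bytes / page_size;      /* 2^27 pages */

        /* VM: 1 byte per 4 heap pages; FSM leaf level: 1 byte per heap page. */
        const double vm_bytes = heap_pages / 4.0;
        const double fsm_bytes = heap_pages;

        printf("heap pages: %.0f\n", heap_pages);
        printf("VM size:    %.0f MB\n", vm_bytes / (1024.0 * 1024.0));
        printf("FSM size:   %.0f MB\n", fsm_bytes / (1024.0 * 1024.0));
        return 0;
    }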

> > > I'm quite concerned that trying to graft this on to pg_basebackup
> > > (which, as you note later, is missing an awful lot of what users expect
> > > from a real backup solution already- retention handling, parallel
> > > capabilities, WAL archive management, and many more... but also is just
> > > not nearly as developed a tool as the external solutions) is going to
> > make things unnecessarily difficult when what we really want here is
> > > better support from core for block-level incremental backup for the
> > > existing external tools to leverage.
> >
> > I think there is some interesting complexity brought up in this thread.
> > Which options are going to minimize storage I/O, network I/O, have only
> > background overhead, allow parallel operation, integrate with
> > pg_basebackup.  Eventually we will need to evaluate the incremental
> > backup options against these criteria.
>
> This presumes that we're going to have multiple competing incremental
> backup options presented, doesn't it?  Are you aware of another effort
> going on which aims for inclusion in core?  There's been past attempts
> made, but I don't believe there's anyone else currently planning to or
> working on something for inclusion in core.

Yeah, I really hope we don't end up with dueling patches.  I want to
come up with an approach that can be widely-endorsed and then have
everybody rowing in the same direction.  On the other hand, I do think
that we may support multiple options in certain places which may have
the kinds of trade-offs that Bruce mentions.  For instance,
identifying changed blocks by scanning the whole cluster and checking
the LSN of each block has an advantage in that it requires no prior
setup or extra configuration.  Like a sequential scan, it always
works, and that is an advantage.  Of course, for many people, the
competing advantage of a WAL-scanning approach that can save a lot of
I/O will appear compelling, but maybe not for everyone.  I think
there's room for two or three approaches there -- not in the sense of
competing patches, but in the sense of giving users a choice based on
their needs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: block-level incremental backup

2019-04-16 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Apr 15, 2019 at 09:01:11AM -0400, Stephen Frost wrote:
> > * Robert Haas (robertmh...@gmail.com) wrote:
> > > Several companies, including EnterpriseDB, NTT, and Postgres Pro, have
> > > developed technology that permits a block-level incremental backup to
> > > be taken from a PostgreSQL server.  I believe the idea in all of those
> > > cases is that non-relation files should be backed up in their
> > > entirety, but for relation files, only those blocks that have been
> > > changed need to be backed up.
> > 
> > I love the general idea of having additional facilities in core to
> > support block-level incremental backups.  I've long been unhappy that
> > any such approach ends up being limited to a subset of the files which
> > need to be included in the backup, meaning the rest of the files have to
> > be backed up in their entirety.  I don't think we have to solve for that
> > as part of this, but I'd like to see a discussion for how to deal with
> > the other files which are being backed up to avoid needing to just
> > wholesale copy them.
> 
> I assume you are talking about non-heap/index files.  Which of those are
> large enough to benefit from incremental backup?

Based on discussions I had with Andrey, specifically the visibility map
is an issue for them with WAL-G.  I haven't spent a lot of time thinking
about it, but I can understand how that could be an issue.

> > I'm quite concerned that trying to graft this on to pg_basebackup
> > (which, as you note later, is missing an awful lot of what users expect
> > from a real backup solution already- retention handling, parallel
> > capabilities, WAL archive management, and many more... but also is just
> > not nearly as developed a tool as the external solutions) is going to
> > make things unnecessarily difficult when what we really want here is
> > better support from core for block-level incremental backup for the
> > existing external tools to leverage.
> 
> I think there is some interesting complexity brought up in this thread. 
> Which options are going to minimize storage I/O, network I/O, have only
> background overhead, allow parallel operation, integrate with
> pg_basebackup.  Eventually we will need to evaluate the incremental
> backup options against these criteria.

This presumes that we're going to have multiple competing incremental
backup options presented, doesn't it?  Are you aware of another effort
going on which aims for inclusion in core?  There's been past attempts
made, but I don't believe there's anyone else currently planning to or
working on something for inclusion in core.

Just to be clear- we're not currently working on one, but I'd really
like to see core provide good support for incremental block-level backup
so that we can leverage when it is there.

Thanks!

Stephen




Re: New vacuum option to do only freezing

2019-04-16 Thread Tom Lane
I wrote:
> I'm thinking that we really need to upgrade vacuum's reporting totals
> so that it accounts in some more-honest way for pre-existing dead
> line pointers.  The patch as it stands has made the reporting even more
> confusing, rather than less so.

Here's a couple of ideas about that:

1. Ignore heap_page_prune's activity altogether, on the grounds that
it's just random chance that any cleanup done there was done during
VACUUM and not some preceding DML operation.  Make tups_vacuumed
be the count of dead line pointers removed.  The advantage of this
way is that tups_vacuumed would become independent of previous
non-VACUUM pruning activity, as well as preceding index-cleanup-disabled
VACUUMs.  But maybe it's hiding too much information.

2. Have heap_page_prune count, and add to tups_vacuumed, only HOT
tuples that it deleted entirely.  The action of replacing a DEAD
root tuple with a dead line pointer doesn't count for anything.
Then also add the count of dead line pointers removed to tups_vacuumed.

Approach #2 is closer to the way we've defined tups_vacuumed up to
now, but I think it'd be more realistic in cases where previous
pruning or index-cleanup-disabled vacuums have left lots of dead
line pointers.

I'm not especially wedded to either of these --- any other ideas?

regards, tom lane




Re: Unhappy about API changes in the no-fsm-for-small-rels patch

2019-04-16 Thread Tom Lane
Andres Freund  writes:
> On 2019-04-16 14:31:25 -0400, Tom Lane wrote:
>> This can only work at all if an inaccurate map is very fail-soft,
>> which I'm not convinced it is

> I think it better needs to be fail-soft independent of the no-fsm
> patch. Because the fsm is not WAL logged etc, it's pretty easy to get a
> pretty corrupted version. And we better deal with that.

Yes, FSM has to be fail-soft from a *correctness* viewpoint; but it's
not fail-soft from a *performance* viewpoint.  It can take awhile for
us to self-heal a busted map.  And this fake map spends almost all its
time busted and in need of (expensive) corrections.  I think this may
actually be the same performance complaint you're making, in different
words.

regards, tom lane




Re: Unhappy about API changes in the no-fsm-for-small-rels patch

2019-04-16 Thread Andres Freund
Hi,

On 2019-04-16 14:31:25 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I'm kinda thinking that this is the wrong architecture.
> 
> The bits of that patch that I've looked at seemed like a mess
> to me too.  AFAICT, it's trying to use a single global "map"
> for all relations (strike 1) without any clear tracking of
> which relation the map currently describes (strike 2).

Well, strike 2 basically is not a problem right now, because the map is
cleared whenever a search for a target buffer succeeded. But that has
pretty obvious efficiency issues...


> This can only work at all if an inaccurate map is very fail-soft,
> which I'm not convinced it is

I think it better needs to be fail-soft independent of the no-fsm
patch. Because the fsm is not WAL logged etc, it's pretty easy to get a
pretty corrupted version. And we better deal with that.


> and in any case it seems pretty inefficient for workloads that insert
> into multiple tables.

As is, it's inefficient for insertions into a *single* relation. The
RelationGetTargetBlock() makes it not crazily expensive, but it's still
plenty expensive.


> I'd have expected any such map to be per-table and be stored in
> the relcache.

Same.

Greetings,

Andres Freund




Re: Improve search for missing parent downlinks in amcheck

2019-04-16 Thread Peter Geoghegan
On Tue, Apr 16, 2019 at 12:00 PM Peter Geoghegan  wrote:
> Can you be more specific? What was the cause of the corruption? I'm
> always very interested in hearing about cases that amcheck could have
> detected, but didn't.

FWIW, v4 indexes in Postgres 12 will support the new "rootdescend"
verification option, which isn't lossy, and would certainly have
detected your customer issue in practice. Admittedly the new check is
quite expensive, even compared to the other bt_index_parent_check()
checks, but it is nice that we now have a verification option that is
*extremely* thorough, and uses _bt_search() directly.

-- 
Peter Geoghegan




Re: Improve search for missing parent downlinks in amcheck

2019-04-16 Thread Peter Geoghegan
On Mon, Apr 15, 2019 at 7:30 PM Alexander Korotkov
 wrote:
> Currently amcheck supports lossy checking for missing parent
> downlinks.  It collects a bitmap of downlink hashes and uses it to check
> the subsequent tree level.  We've experienced some large corrupted indexes
> which pass this check due to its looseness.

Can you be more specific? What was the cause of the corruption? I'm
always very interested in hearing about cases that amcheck could have
detected, but didn't.

Was the issue that the Bloom filter was simply undersized/ineffective?

> However, it seems to me we can implement this check in non-lossy
> manner without making it significantly slower.  We anyway traverse
> downlinks from parent to children in order to verify that hikeys are
> corresponding to downlink keys.

Actually, we don't check the high keys in children against the parent
(all other items are checked, though). We probably *should* do
something with the high key when verifying consistency across levels,
but currently we don't. (We only use the high key for the same-page
high key check -- more on this below.)

> We can also traverse from one
> downlink to subsequent using rightlinks.  So, if there are some
> intermediate pages between them, they are candidates to have missing
> parent downlinks.  The patch is attached.
>
> With this patch amcheck could successfully detect corruption for our
> customer, which unpatched amcheck couldn't find.

Maybe we can be a lot less conservative about sizing the Bloom filter
instead? That would be an easier fix IMV -- we can probably change
that logic to be a lot more aggressive without anybody noticing, since
the Bloom filter is already usually tiny. We are already not very
careful about saving work within bt_index_parent_check(), but with
this patch we follow each downlink twice instead of once. That seems
wasteful.
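
To put numbers on "a lot less conservative": a Bloom filter for n elements at
false-positive rate p needs about m = -n * ln(p) / (ln 2)^2 bits, with
k = (m/n) * ln 2 hash functions. A quick back-of-the-envelope sketch (not the
actual amcheck sizing code, just the standard formula):

    #include <math.h>
    #include <stdio.h>

    /* Bits required to hold n elements at false-positive rate p. */
    static double
    bloom_bits(double n, double p)
    {
        return -n * log(p) / (log(2) * log(2));
    }

    int
    main(void)
    {
        const double n = 10e6;      /* say, ten million downlinks */
        const double rates[] = {0.01, 0.001, 0.0001};

        for (int i = 0; i < 3; i++)
        {
            double m = bloom_bits(n, rates[i]);
            double k = (m / n) * log(2);

            printf("p = %.4f: %.1f MB, %.1f hash functions\n",
                   rates[i], m / 8.0 / (1024.0 * 1024.0), k);
        }
        return 0;
    }

Even at fairly low false-positive rates the filter stays small next to the
index itself, which is the argument for sizing it more aggressively.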

The reason I used a Bloom filter here is because I would eventually
like the missing downlink check to fingerprint entire tuples, not just
block numbers. In L&Y terms, the check could in the future fingerprint
the separator key and the downlink at the same time, not just the
downlink. That way, we could probe the Bloom filter on the next level
down for its high key (with the right sibling pointer set to be
consistent with the parent) iff we don't see that the page split was
interrupted (i.e. iff P_INCOMPLETE_SPLIT() bit is not set). Obviously
this would be a more effective form of verification, since we would
notice high key values that don't agree with the parent's values for
the same sibling/cousin/child block.

I didn't do it that way for v11 because of "minus infinity" items on
internal pages, which don't store the original key (the key remains
the high key of the left sibling page, but we truncate the original to
0 attributes in _bt_pgaddtup()). I think that we should eventually
stop truncating minus infinity items, and actually store a "low key"
at P_FIRSTDATAKEY() within internal pages instead. That would be
useful for other things anyway (e.g. prefix compression).

--
Peter Geoghegan




Re: Unhappy about API changes in the no-fsm-for-small-rels patch

2019-04-16 Thread Tom Lane
Andres Freund  writes:
> I'm kinda thinking that this is the wrong architecture.

The bits of that patch that I've looked at seemed like a mess
to me too.  AFAICT, it's trying to use a single global "map"
for all relations (strike 1) without any clear tracking of
which relation the map currently describes (strike 2).
This can only work at all if an inaccurate map is very fail-soft,
which I'm not convinced it is, and in any case it seems pretty
inefficient for workloads that insert into multiple tables.

I'd have expected any such map to be per-table and be stored in
the relcache.

regards, tom lane




Re: hyrax vs. RelationBuildPartitionDesc

2019-04-16 Thread Tom Lane
Amit Langote  writes:
>> I get that we want to get rid of the keep_* kludge in the long term, but
>> is it wrong to think, for example, that having keep_partdesc today allows
>> us today to keep the pointer to rd_partdesc as long as we're holding the
>> relation open or refcnt on the whole relation such as with
>> PartitionDirectory mechanism?

Well, it's safe from the caller's standpoint as long as a suitable lock is
being held, which is neither well-defined nor enforced in any way :-(

> Ah, we're also trying to fix the memory leak caused by the current
> design of PartitionDirectory.  AIUI, the design assumes that the leak
> would occur in fairly rare cases, but maybe not so?  If partitions are
> frequently attached/detached concurrently (maybe won't be too uncommon
> if reduced lock levels encourages users) causing the PartitionDesc of
> a given relation changing all the time, then a planning session that's
> holding the PartitionDirectory containing that relation would leak as
> many PartitionDescs as there were concurrent changes, I guess.

We should get a relcache inval after a partdesc change, but the problem
with the current code is that that will only result in freeing the old
partdesc if the inval event is processed while the relcache entry has
refcount zero.  Otherwise the old rd_pdcxt is just shoved onto the
context chain, where it could survive indefinitely.

I'm not sure that this is really a huge problem in practice.  The example
I gave upthread shows that a partdesc-changing transaction's own internal
invals do arrive during CommandCounterIncrement calls that occur while the
relcache pin is held; but it seems a bit artificial to assume that one
transaction would do a huge number of such changes.  (Although, hm, maybe
a single-transaction pg_restore run could have an issue.)  Once out of
the transaction, it's okay because we'll again invalidate the entry
at the start of the next transaction, and then the refcount will be zero
and we'll clean up.  For other sessions it'd only happen if they saw the
inval while already holding a pin on the partitioned table, which probably
requires some unlucky timing; and that'd have to happen repeatedly to have
a leak that amounts to anything.

Still, though, I'm unhappy with the code as it stands.  It's risky to
assume that it has no unpleasant behaviors that we haven't spotted yet
but will manifest after v12 is in the field.  And I do not think that
it represents a solid base to build on.  (As an example, if we made
any effort to get rid of the redundant extra inval events that occur
post-transaction, we'd suddenly have a much worse problem here.)
I'd rather go over to the copy-based solution for now, which *is*
semantically sound, and accept that we still have more performance
work to do.  It's not like v12 isn't going to be light-years ahead of
v11 in this area anyway.

regards, tom lane




Unhappy about API changes in the no-fsm-for-small-rels patch

2019-04-16 Thread Andres Freund
Hi,

I'm somewhat unhappy in how much the no-fsm-for-small-rels exposed
complexity that looks like it should be purely in freespacemap.c to
callers.


 extern Size GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk);
-extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded);
+extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded,
+bool check_fsm_only);

So now freespace.c has an argument that says we should only check the
fsm. That's confusing. And it's not explained to callers what that
argument means, and when it should be set.


@@ -176,20 +269,44 @@ RecordAndGetPageWithFreeSpace(Relation rel, BlockNumber 
oldPage,
  * Note that if the new spaceAvail value is higher than the old value stored
  * in the FSM, the space might not become visible to searchers until the next
  * FreeSpaceMapVacuum call, which updates the upper level pages.
+ *
+ * Callers have no need for a local map.
  */
 void
-RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk, Size spaceAvail)
+RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
+   Size spaceAvail, BlockNumber nblocks)

There's no explanation as to what that "nblocks" argument is. One
basically has to search other callers to figure it out. It's not even
clear to which fork it relates to. Nor that one can set it to
InvalidBlockNumber if one doesn't have the relation size conveniently
reachable.  But it's not exposed to RecordAndGetPageWithFreeSpace(), for
a basically unexplained reason.  There's a comment above
fsm_allow_writes() - but that's a file-local function that external
callers basically have no need to know about.

I can't figure out what "Callers have no need for a local map." is
supposed to mean.


+/*
+ * Clear the local map.  We must call this when we have found a block with
+ * enough free space, when we extend the relation, or on transaction abort.
+ */
+void
+FSMClearLocalMap(void)
+{
+   if (FSM_LOCAL_MAP_EXISTS)
+   {
+   fsm_local_map.nblocks = 0;
+   memset(&fsm_local_map.map, FSM_LOCAL_NOT_AVAIL,
+  sizeof(fsm_local_map.map));
+   }
+}
+

So now there's a new function one needs to call after successfully using
the block returned by [RecordAnd]GetPageWithFreeSpace().  But it's not
referenced from those functions, so basically one has to just know that.


+/* Only create the FSM if the heap has greater than this many blocks */
+#define HEAP_FSM_CREATION_THRESHOLD 4

Hm, this seems to be tying freespace.c closer to heap than I think is
great - think of new AMs like zheap, that also want to use it.


I think this is mostly fallout about the prime issue I'm unhappy
about. There's now some global variable in freespacemap.c that code
using freespace.c has to know about and maintain.


+static void
+fsm_local_set(Relation rel, BlockNumber cur_nblocks)
+{
+   BlockNumber blkno,
+   cached_target_block;
+
+   /* The local map must not be set already. */
+   Assert(!FSM_LOCAL_MAP_EXISTS);
+
+   /*
+* Starting at the current last block in the relation and working
+* backwards, mark alternating blocks as available.
+*/
+   blkno = cur_nblocks - 1;

That comment explains very little about why this is done, and why it's a
good idea.

+/* Status codes for the local map. */
+
+/* Either already tried, or beyond the end of the relation */
+#define FSM_LOCAL_NOT_AVAIL 0x00
+
+/* Available to try */
+#define FSM_LOCAL_AVAIL 0x01

+/* Local map of block numbers for small heaps with no FSM. */
+typedef struct
+{
+   BlockNumber nblocks;
+   uint8   map[HEAP_FSM_CREATION_THRESHOLD];
+}  FSMLocalMap;
+

Hm, given realistic HEAP_FSM_CREATION_THRESHOLD, and the fact that we
really only need one bit per block, it seems like map should really
be just a uint32 with one bit per page.


+static bool
+fsm_allow_writes(Relation rel, BlockNumber heapblk,
+BlockNumber nblocks, BlockNumber *get_nblocks)

+   if (rel->rd_rel->relpages != InvalidBlockNumber &&
+   rel->rd_rel->relpages > HEAP_FSM_CREATION_THRESHOLD)
+   return true;
+   else
+   skip_get_nblocks = false;
+   }

This badly needs a comment explaining that these values can be basically
arbitrarily out of date, and explaining why it's correct to rely on them
anyway (presumably because creating an fsm unnecessarily is ok, it just
avoids using this optimization).


+static bool
+fsm_allow_writes(Relation rel, BlockNumber heapblk,
+BlockNumber nblocks, BlockNumber *get_nblocks)

+   RelationOpenSmgr(rel);
+   if (smgrexists(rel->rd_smgr, FSM_FORKNUM))
+   return true;

Isn't this like really expensive? mdexists() closes the relation and
reopens it from scratch. Shouldn't we at the very least check
smgr_fsm_nblocks beforehand, so this is only done once?
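A rough sketch of the short-circuit being suggested; this assumes the
cached smgr_fsm_nblocks field can be taken to mean "the FSM fork has
already been opened and measured", and it is not code from the patch:

    /* Sketch: consult the smgr cache before the expensive smgrexists() probe. */
    RelationOpenSmgr(rel);
    if (rel->rd_smgr->smgr_fsm_nblocks != InvalidBlockNumber)
        return true;            /* FSM fork size already cached, so it exists */
    if (smgrexists(rel->rd_smgr, FSM_FORKNUM))
        return true;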


I'm kinda thinking that this is the wrong architecture.

1) Unless I miss something, this will trigger a
   RelationGetNumberOfBlocks(), which in turn ends up doing an 

Re: hyrax vs. RelationBuildPartitionDesc

2019-04-16 Thread Tom Lane
Amit Langote  writes:
> On 2019/04/15 2:38, Tom Lane wrote:
>> To my mind there are only two trustworthy solutions to the problem of
>> wanting time-extended usage of a relcache subsidiary data structure: one
>> is to copy it, and the other is to reference-count it.  I think that going
>> over to a reference-count-based approach for many of these structures
>> might well be something we should do in future, maybe even the very near
>> future.  In the mean time, though, I'm not really satisfied with inserting
>> half-baked kluges, especially not ones that are different from our other
>> half-baked kluges for similar purposes.  I think that's a path to creating
>> hard-to-reproduce bugs.

> +1 to reference-count-based approach.

> I've occasionally wondered why there is both keep_tupdesc kludge and
> refcounting for table TupleDescs.  I guess it's because *only* the
> TupleTableSlot mechanism in the executor uses TupleDesc pinning (that is,
> refcounting) and the rest of the sites depend on the shaky guarantee
> provided by keep_tupdesc.

The reason for that is simply that at the time we added TupleDesc
refcounts, we didn't want to do the extra work of making all uses
of relcache entries' tupdescs deal with refcounting; keep_tupdesc
is certainly a kluge, but it works for an awful lot of callers.
We'd have to go back and deal with that more honestly if we go down
this path.

I'm inclined to think we could still allow many call sites to not
do an incr/decr-refcount dance as long as they're just fetching
scalar values out of the relcache's tupdesc, and not keeping any
pointer into it.  However, it's hard to see how to enforce such
a rule mechanically, so it might be impractically error-prone
to allow that.
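For anyone not familiar with the existing machinery, the incr/decr-refcount
dance in question looks roughly like the fragment below, using the existing
PinTupleDesc/ReleaseTupleDesc macros; the surrounding code is purely
illustrative (rel is assumed to be an already-open Relation):

    /* Illustrative only: pinning a relcache tupdesc across arbitrary work. */
    TupleDesc   tupdesc = RelationGetDescr(rel);
    int         natts;

    PinTupleDesc(tupdesc);      /* bump refcount so a relcache inval can't free it */

    /* ... arbitrary work that might accept invalidations ... */
    natts = tupdesc->natts;

    ReleaseTupleDesc(tupdesc);  /* drop the reference when done */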

regards, tom lane




Re: Ltree syntax improvement

2019-04-16 Thread Dmitry Belyavsky
Dear Nikolay,

Many thanks for your efforts!

On Sat, Apr 6, 2019 at 2:29 PM Nikolay Shaplov  wrote:

> In a message of Sunday, 24 February 2019 14:31:55 MSK, Dmitry
> Belyavsky wrote:
>
> Hi! I'm back here again.
>
> I've been thinking about this patch for a while... I've come to some
> conclusions and written some examples...
>
> First, I came to the idea that the best way to simplify the code is to
> change the state machine from if to switch/case, because in your code a
> lot of repetition is done just because you can't say "That's all, let's go
> to the next symbol" at any place in the code. Now it has to follow all the
> branches of the if-else tree that is inside the state-machine "node" to
> get to the next symbol.
>
> To show how much simpler things would be, I changed the state machine
> processing in lquery_in from if to switch/case, and changed the code for
> LQPRS_WAITLEVEL state processing, removing duplicate code and using
> "break" wherever possible.
>
> (The indentation in this example is left non-standard to make the diff
> clearer.)
>
> So, from this much code:
> =
> if (state == LQPRS_WAITLEVEL)
> {
> if (t_isspace(ptr))
> {
> ptr += charlen;
> pos++;
> continue;
> }
>
> escaped_count = 0;
> real_levels++;
> if (charlen == 1)
> {
> if (t_iseq(ptr, '!'))
> {
> GETVAR(curqlevel) = lptr =
> (nodeitem *)
> palloc0(sizeof(nodeitem) * numOR);
> lptr->start = ptr + 1;
> state = LQPRS_WAITDELIM;
> curqlevel->numvar = 1;
> curqlevel->flag |= LQL_NOT;
> hasnot = true;
> }
> else if (t_iseq(ptr, '*'))
> state = LQPRS_WAITOPEN;
> else if (t_iseq(ptr, '\\'))
> {
> GETVAR(curqlevel) = lptr =
> (nodeitem *)
> palloc0(sizeof(nodeitem) * numOR);
> lptr->start = ptr;
> curqlevel->numvar = 1;
> state = LQPRS_WAITESCAPED;
> }
> else if (strchr(".|@%{}", *ptr))
> {
> UNCHAR;
> }
> else
> {
> GETVAR(curqlevel) = lptr =
> (nodeitem *)
> palloc0(sizeof(nodeitem) * numOR);
> lptr->start = ptr;
> state = LQPRS_WAITDELIM;
> curqlevel->numvar = 1;
> if (t_iseq(ptr, '"'))
> {
> lptr->flag |=
> LVAR_QUOTEDPART;
> }
> }
> }
> else
> {
> GETVAR(curqlevel) = lptr = (nodeitem *)
> palloc0(sizeof(nodeitem) *
> numOR);
> lptr->start = ptr;
> state = LQPRS_WAITDELIM;
> curqlevel->numvar = 1;
> }
> }
> =
> I came to this
> =
>  case LQPRS_WAITLEVEL:
> if (t_isspace(ptr))
> break; /* Just go to next symbol */
> escaped_count = 0;
> real_levels++;
>
> if (charlen == 1)
> {
> if (strchr(".|@%{}", *ptr))
> UNCHAR;
> if (t_iseq(ptr, '*'))
> {
> state = LQPRS_WAITOPEN;
> break;
> }
> }
> GETVAR(curqlevel) = lptr = (nodeitem *)
> palloc0(sizeof(nodeitem) *
> numOR);
> lptr->start = ptr;
> curqlevel->numvar = 1;
> state = LQPRS_WAITDELIM;

Re: New vacuum option to do only freezing

2019-04-16 Thread Tom Lane
So after thinking about this a bit more ...

ISTM that what we have here is a race condition (ie, tuple changed state
since heap_page_prune), and that ideally we want the code to resolve it
as if no race had happened.  That is, either of these behaviors would
be acceptable:

1. Delete the tuple, just as heap_page_prune would've done if it had seen
it DEAD.  (Possibly we could implement that by jumping back and doing
heap_page_prune again, but that seems pretty messy and bug-prone.
In any case, if we're not doing index cleanup then this must reduce to
"replace tuple by a dead line pointer", not remove it altogether.)

2. Act as if the tuple were still live, just as would've happened if the
state didn't change till just after we looked instead of just before.

Option #2 is a lot simpler and safer, and can be implemented as I
suggested earlier, assuming we're all good with the assumption that
heap_prepare_freeze_tuple isn't going to do anything bad.

However ... it strikes me that there's yet another assumption in here
that this patch has broken.  Namely, notice that the reason we normally
don't get here is that what heap_page_prune does with an already-DEAD
tuple is reduce it to a dead line pointer and then count it in its
return value, which gets added to tups_vacuumed.  But then what
lazy_scan_heap's per-tuple loop does is

/*
 * DEAD item pointers are to be vacuumed normally; but we don't
 * count them in tups_vacuumed, else we'd be double-counting (at
 * least in the common case where heap_page_prune() just freed up
 * a non-HOT tuple).
 */
if (ItemIdIsDead(itemid))
{
lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
all_visible = false;
continue;
}

When this patch is active, it will *greatly* increase the odds that
we report a misleading tups_vacuumed total, for two different reasons:

* DEAD tuples reduced to dead line pointers during heap_page_prune will be
counted as tups_vacuumed, even though we don't take the further step of
removing the dead line pointer, as always happened before.

* When, after some vacuum cycles with index_cleanup disabled, we finally
do one with index_cleanup enabled, there are going to be a heck of a lot
of old dead line pointers to clean out, which the existing logic won't
count at all.  That was only barely tolerable before, and it seems like
this has pushed it over the bounds of silliness.  People are going to
be wondering why vacuum reports that it removed zillions of index
entries and no tuples.

I'm thinking that we really need to upgrade vacuum's reporting totals
so that it accounts in some more-honest way for pre-existing dead
line pointers.  The patch as it stands has made the reporting even more
confusing, rather than less so.

BTW, the fact that dead line pointers will accumulate without limit
makes me even more dubious of the proposition that this "feature"
will be safe to enable as a reloption in production.  I really think
that we ought to restrict it to be a manual VACUUM option, to be
used only when you're desperate to freeze old tuples.

regards, tom lane




Re: Zedstore - compressed in-core columnar storage

2019-04-16 Thread Tomas Vondra

On Mon, Apr 15, 2019 at 10:45:51PM -0700, Ashwin Agrawal wrote:

On Mon, Apr 15, 2019 at 12:50 PM Peter Geoghegan  wrote:


On Mon, Apr 15, 2019 at 9:16 AM Ashwin Agrawal 
wrote:
> Would like to know more specifics on this Peter. We may be having
different context on hybrid row/column design.

I'm confused about how close your idea of a TID is to the traditional
definition from heapam (and even zheap). If it's a purely logical
identifier, then why would it have two components like a TID? Is that
just a short-term convenience or something?



TID is a purely logical identifier. Hence, as stated in the initial email,
for a Zedstore TID the block number and offset split carries no meaning at
all. It's purely a 48-bit integer entity assigned to the datum of the first
column during insertion, based on where in the BTree it gets inserted. The
rest of the column datums are inserted using this assigned TID value. It's
only the rest-of-system restrictions discussed by Heikki and Andres on the
table AM thread that limit the values it can carry currently; otherwise,
from the zedstore design perspective, it is just an integer.



I'm not sure it's that clear cut, actually. Sure, it's not the usual
(block,item) pair so it's not possible to jump to the exact location, so
it's not the raw physical identifier as regular TID. But the data are
organized in a btree, with the TID as a key, so it does actually provide
some information about the location.

I've asked about BRIN indexes elsewhere in this thread, which I think is
related to this question, because that index type relies on TID providing
sufficient information about location. And I think BRIN indexes are going
to be rather important for colstores (and formats like ORC have something
very similar built-in).

But maybe all we'll have to do is define the ranges differently - instead
of "number of pages" we may define them as "number of rows" and it might
be working.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Speedup of relation deletes during recovery

2019-04-16 Thread Fujii Masao
On Tue, Apr 16, 2019 at 10:48 AM Jamison, Kirk  wrote:
>
> Hello Fujii-san,
>
> On April 18, 2018, Fujii Masao wrote:
>
> > On Fri, Mar 30, 2018 at 12:18 PM, Tsunakawa, Takayuki 
> >  wrote:
> >> Furthermore, TRUNCATE has a similar and worse issue.  While DROP TABLE
> >> scans the shared buffers once for each table, TRUNCATE does that for
> >> each fork, resulting in three scans per table.  How about changing the
> >> following functions
> >>
> >> smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber
> >> nblocks) DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber 
> >> forkNum,
> >>BlockNumber firstDelBlock)
> >>
> >> to
> >>
> >> smgrtruncate(SMgrRelation reln, ForkNumber *forknum, BlockNumber
> >> *nblocks, int nforks) DropRelFileNodeBuffers(RelFileNodeBackend rnode, 
> >> ForkNumber *forkNum,
> >>BlockNumber *firstDelBlock,
> >> int nforks)
> >>
> >> to perform the scan only once per table?  If there doesn't seem to be a 
> >> problem,
> >> I think I'll submit the patch next month.  I'd welcome if anyone could do 
> >> that.
> >
> > Yeah, it's worth working on this problem. To decrease the number of scans of
> > shared_buffers, you would need to change the order of truncations of files 
> > and WAL
> > logging. In RelationTruncate(), currently WAL is logged after FSM and VM 
> > are truncated.
> > IOW, with the patch, FSM and VM would need to be truncated after WAL 
> > logging. You would
> > need to check whether this reordering is valid.
>
> I know it's been almost a year since the previous message, but if you
> could still recall your suggestion above, I'd like to ask a question.
> Could you explain your idea a bit further on how the reordering of WAL
> writing and file truncation would possibly reduce the number of
> shared_buffers scans?

Sorry, I forgot the details of that comment of mine. But anyway, you want to
modify smgr_redo(info = XLOG_SMGR_TRUNCATE) so that the number of
scans of shared_buffers is decreased to one. Right?

IMO it's worth thinking about changing smgrtruncate(MAIN_FORK),
FreeSpaceMapTruncateRel() and visibilitymap_truncate() so that
they just mark the relation and blocks as to-be-deleted, and then
scan shared_buffers once at the end of smgr_redo() to invalidate
the blocks.
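To make the batching idea concrete, a caller of the proposed array-taking
smgrtruncate() might look roughly like the sketch below; the signature is
only the one proposed upthread (it does not exist yet), and reln and the
new_*_nblocks variables are made up and assumed to be computed elsewhere:

    /* Hypothetical usage of the proposed batched truncation API. */
    ForkNumber  forks[MAX_FORKNUM + 1];
    BlockNumber blocks[MAX_FORKNUM + 1];
    int         nforks = 0;

    forks[nforks] = MAIN_FORKNUM;
    blocks[nforks++] = new_main_nblocks;
    forks[nforks] = FSM_FORKNUM;
    blocks[nforks++] = new_fsm_nblocks;
    forks[nforks] = VISIBILITYMAP_FORKNUM;
    blocks[nforks++] = new_vm_nblocks;

    /* One pass over shared_buffers for all forks, instead of one per fork. */
    smgrtruncate(reln, forks, blocks, nforks);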

Regards,

-- 
Fujii Masao




Re: New vacuum option to do only freezing

2019-04-16 Thread Andres Freund
Hi,

On 2019-04-16 12:01:36 -0400, Tom Lane wrote:
> (BTW, I don't understand why that code will throw "found xmin %u from
> before relfrozenxid %u" if HeapTupleHeaderXminFrozen is true?  Shouldn't
> the whole if-branch at lines 6113ff be skipped if xmin_frozen?)

I *think* that just looks odd, but isn't actively wrong. That's because
TransactionIdIsNormal() won't trigger, as:

#define HeapTupleHeaderGetXmin(tup) \
( \
HeapTupleHeaderXminFrozen(tup) ? \
FrozenTransactionId : HeapTupleHeaderGetRawXmin(tup) \
)

which afaict makes
xmin_frozen = ((xid == FrozenTransactionId) ||
   HeapTupleHeaderXminFrozen(tuple));
redundant.

Looks like that was introduced relatively recently, in

commit d2599ecfcc74fea9fad1720a70210a740c716730
Author: Alvaro Herrera 
Date:   2018-05-04 15:24:44 -0300

Don't mark pages all-visible spuriously


@@ -6814,6 +6815,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 
/* Process xmin */
xid = HeapTupleHeaderGetXmin(tuple);
+   xmin_frozen = ((xid == FrozenTransactionId) ||
+  HeapTupleHeaderXminFrozen(tuple));
if (TransactionIdIsNormal(xid))
{
if (TransactionIdPrecedes(xid, relfrozenxid))
@@ -6832,9 +6835,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 
frz->t_infomask |= HEAP_XMIN_FROZEN;
changed = true;
+   xmin_frozen = true;
}
-   else
-   totally_frozen = false;
}

Greetings,

Andres Freund




Re: New vacuum option to do only freezing

2019-04-16 Thread Tom Lane
Robert Haas  writes:
> On Mon, Apr 15, 2019 at 9:07 PM Tom Lane  wrote:
>> If we're failing to remove it, and it's below the desired freeze
>> horizon, then we'd darn well better freeze it instead, no?

> I don't know that that's safe.  IIRC, the freeze code doesn't cope
> nicely with being given a tuple that actually ought to have been
> deleted.  It'll just freeze it anyway, which is obviously bad.

Looking at heap_prepare_freeze_tuple, it looks to me like it'd notice
the problem and throw an error.  The two possible reasons for a tuple
to be dead are xmin aborted and xmax committed, right?  There are
tests in there that will complain if either of those is true and
the xid is below the freeze horizon.

Given that we don't get here except when the tuple has just become dead,
it probably is all right to assume that it can't possibly get selected
for freezing, and let those tests backstop the assumption.

(BTW, I don't understand why that code will throw "found xmin %u from
before relfrozenxid %u" if HeapTupleHeaderXminFrozen is true?  Shouldn't
the whole if-branch at lines 6113ff be skipped if xmin_frozen?)

regards, tom lane

PS: I see that mandrill just replicated the topminnow failure that
started this discussion.




Re: New vacuum option to do only freezing

2019-04-16 Thread Andres Freund
Hi,

On 2019-04-16 11:38:01 -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
> > On 2019-Apr-16, Robert Haas wrote:
> >> On Mon, Apr 15, 2019 at 9:07 PM Tom Lane  wrote:
> >>> If we're failing to remove it, and it's below the desired freeze
> >>> horizon, then we'd darn well better freeze it instead, no?
> 
> >> I don't know that that's safe.  IIRC, the freeze code doesn't cope
> >> nicely with being given a tuple that actually ought to have been
> >> deleted.  It'll just freeze it anyway, which is obviously bad.
> 
> > Umm, but if we fail to freeze it, we'll leave a tuple around that's
> > below the relfrozenxid for the table, causing later pg_commit to be
> > truncated and error messages saying that the tuple cannot be read, no?
> 
> Yeah.  If you think that it's unsafe to freeze the tuple, then this
> entire patch is ill-conceived and needs to be reverted.  I don't
> know how much more plainly I can put it: index_cleanup cannot be a
> license to ignore the freeze horizon.  (Indeed, I do not quite see
> what the point of the feature is otherwise.  Why would you run a
> vacuum with this option at all, if not to increase the table's
> relfrozenxid?  But you can *not* advance relfrozenxid if you left
> old XIDs behind.)

As I just wrote - I don't think this codepath can ever deal with tuples
that old.

Greetings,

Andres Freund




Re: New vacuum option to do only freezing

2019-04-16 Thread Andres Freund
Hi,

On 2019-04-16 10:54:34 -0400, Alvaro Herrera wrote:
> On 2019-Apr-16, Robert Haas wrote:
> > On Mon, Apr 15, 2019 at 9:07 PM Tom Lane  wrote:
> > > > I'm not sure that's correct.  If you do that, it'll end up in the
> > > > non-tupgone case, which might try to freeze a tuple that should've
> > > > been removed.  Or am I confused?
> > >
> > > If we're failing to remove it, and it's below the desired freeze
> > > horizon, then we'd darn well better freeze it instead, no?
> > 
> > I don't know that that's safe.  IIRC, the freeze code doesn't cope
> > nicely with being given a tuple that actually ought to have been
> > deleted.  It'll just freeze it anyway, which is obviously bad.
> 
> Umm, but if we fail to freeze it, we'll leave a tuple around that's
> below the relfrozenxid for the table, causing later pg_commit to be
> truncated and error messages saying that the tuple cannot be read, no?

Is the below-relfrozenxid case actually reachable? Isn't the theory of
that whole codeblock that we ought to only get there if a transaction
concurrently commits?

 * Ordinarily, DEAD tuples would have been removed by
 * heap_page_prune(), but it's possible that the tuple
 * state changed since heap_page_prune() looked.  In
 * particular an INSERT_IN_PROGRESS tuple could have
 * changed to DEAD if the inserter aborted.  So this
 * cannot be considered an error condition.

And in case there was a concurrent transaction at the time of the
heap_page_prune(), it has got to be above the OldestXmin passed to
HeapTupleSatisfiesVacuum() too - as it's the same OldestXmin value.  And
as FreezeLimit should always be older than OldestXmin, we should
never get into a situation where heap_page_prune() couldn't prune
something that we would have been forced to remove?


> > I don't know that that's safe.  IIRC, the freeze code doesn't cope
> > nicely with being given a tuple that actually ought to have been
> > deleted.  It'll just freeze it anyway, which is obviously bad.
> >
> > Unless this has been changed since I last looked at it.
> 
> I don't think so.

I think it has changed a bit - these days heap_prepare_freeze_tuple()
will detect that case, and error out:

		/*
		 * If we freeze xmax, make absolutely sure that it's not an XID
		 * that is important.  (Note, a lock-only xmax can be removed
		 * independent of committedness, since a committed lock holder has
		 * released the lock).
		 */
		if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
			TransactionIdDidCommit(xid))
			ereport(ERROR,
					(errcode(ERRCODE_DATA_CORRUPTED),
					 errmsg_internal("cannot freeze committed xmax %u",
									 xid)));
and the equivalent multixact case:

		if (TransactionIdDidCommit(xid))
			ereport(ERROR,
					(errcode(ERRCODE_DATA_CORRUPTED),
					 errmsg_internal("cannot freeze committed update xid %u", xid)));

We even complain if xmin is uncommitted and would need to be frozen:

		if (TransactionIdPrecedes(xid, cutoff_xid))
		{
			if (!TransactionIdDidCommit(xid))
				ereport(ERROR,
						(errcode(ERRCODE_DATA_CORRUPTED),
						 errmsg_internal("uncommitted xmin %u from before xid cutoff %u needs to be frozen",
										 xid, cutoff_xid)));


IIRC I added that after one of the multixact issues led to precisely
that: heap_prepare_freeze_tuple() leading to a valid xmax just being
emptied out, resurfacing dead tuples (and HOT corruption and such).

These messages are obviously intended to be a backstop against
continuing to corrupt further, rather than something a user should
ever see in a working system.

Greetings,

Andres Freund




Re: Checksum errors in pg_stat_database

2019-04-16 Thread Robert Treat
On Mon, Apr 15, 2019 at 3:32 PM Julien Rouhaud  wrote:
>
> Sorry for late reply,
>
> On Sun, Apr 14, 2019 at 7:12 PM Magnus Hagander  wrote:
> >
> > On Sat, Apr 13, 2019 at 8:46 PM Robert Treat  wrote:
> >>
> >> On Fri, Apr 12, 2019 at 8:18 AM Magnus Hagander  
> >> wrote:
> >> ISTM the argument here is go with zero since you have zero connections
> >> vs go with null since you can't actually connect, so it doesn't make
> >> sense. (There is a third argument about making it -1 since you can't
> >> connect, but that breaks sum(numbackends) so it's easily dismissed.) I
> >> think I would have gone for 0 personally, but what ended up surprising
> >> me was that a bunch of other stuff like xact_commit show zero when
> >> AFAICT the above reasoning would apply the same to those columns.
> >> (unless there is a way to commit a transaction in the global objects
> >> that I don't know about).
> >
> >
> > That's a good point. I mean, you can commit a transaction that involves 
> > changes of global objects, but it counts in the database that you were 
> > conneced to.
> >
> > We should probably at least make it consistent and make it NULL in all or 0 
> > in all.
> >
> > I'm -1 for using -1 (!), for the very reason that you mention. But either 
> > changing the numbackends to 0, or the others to NULL would work for 
> > consistency. I'm leaning towards the 0 as well.
>
> +1 for 0 :)  Especially since it's less code in the view.
>

+1 for 0

> >> What originally got me looking at this was the idea of returning -1
> >> (or maybe null) for checksum failures for cases when checksums are not
> >> enabled. This seems a little more complicated to set up, but seems
> >> like it might ward off people thinking they are safe due to no
> >> checksum error reports when they actually aren't.
> >
> >
> > NULL seems like the reasonable thing to return there. I'm not sure what 
> > you're referring to with a little more complicated to set up, though? Do 
> > you mean somehow for the end user?
> >
> > Code-wise it seems it should be simple -- just do an "if checksums disabled 
> > then return null"  in the two functions.
>
> That's indeed a good point!  Lack of checksum error is distinct from
> checksums not activated and we should make it obvious.
>
> I don't know if that counts as an open item, but I attach a patch for
> all points discussed here.

ISTM we should mention shared objects in both places in the docs, and
want "NULL if data checksums" rather than "NULL is data checksums".
Attaching slightly modified patch with those changes, but otherwise
LGTM.
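(For concreteness, the "return NULL when checksums are disabled" behavior
being discussed could look roughly like the following in the per-database
stats function; this is only a sketch from memory, not the attached patch.)

Datum
pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
{
	Oid			dbid = PG_GETARG_OID(0);
	PgStat_StatDBEntry *dbentry;

	if (!DataChecksumsEnabled())
		PG_RETURN_NULL();		/* distinguish "disabled" from "zero failures" */

	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
		PG_RETURN_INT64(0);

	PG_RETURN_INT64(dbentry->n_checksum_failures);
}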

Robert Treat
https://xzilla.net


checksums_reporting_fix_v2.diff
Description: Binary data


Re: New vacuum option to do only freezing

2019-04-16 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Apr-16, Robert Haas wrote:
>> On Mon, Apr 15, 2019 at 9:07 PM Tom Lane  wrote:
>>> If we're failing to remove it, and it's below the desired freeze
>>> horizon, then we'd darn well better freeze it instead, no?

>> I don't know that that's safe.  IIRC, the freeze code doesn't cope
>> nicely with being given a tuple that actually ought to have been
>> deleted.  It'll just freeze it anyway, which is obviously bad.

> Umm, but if we fail to freeze it, we'll leave a tuple around that's
> below the relfrozenxid for the table, causing later pg_commit to be
> truncated and error messages saying that the tuple cannot be read, no?

Yeah.  If you think that it's unsafe to freeze the tuple, then this
entire patch is ill-conceived and needs to be reverted.  I don't
know how much more plainly I can put it: index_cleanup cannot be a
license to ignore the freeze horizon.  (Indeed, I do not quite see
what the point of the feature is otherwise.  Why would you run a
vacuum with this option at all, if not to increase the table's
relfrozenxid?  But you can *not* advance relfrozenxid if you left
old XIDs behind.)

regards, tom lane




Re: New vacuum option to do only freezing

2019-04-16 Thread Masahiko Sawada
On Tue, Apr 16, 2019 at 11:26 PM Robert Haas  wrote:
>
> On Mon, Apr 15, 2019 at 9:07 PM Tom Lane  wrote:
> > > I'm not sure that's correct.  If you do that, it'll end up in the
> > > non-tupgone case, which might try to freeze a tuple that should've
> > > been removed.  Or am I confused?
> >
> > If we're failing to remove it, and it's below the desired freeze
> > horizon, then we'd darn well better freeze it instead, no?
>
> I don't know that that's safe.  IIRC, the freeze code doesn't cope
> nicely with being given a tuple that actually ought to have been
> deleted.  It'll just freeze it anyway, which is obviously bad.

Hmm, I think that we already choose to leave HEAPTUPLE_DEAD tuples, and
might freeze them, if HeapTupleIsHotUpdated() || HeapTupleIsHeapOnly()
is true (L1083 at vacuumlazy.c), even though they actually have to be
deleted. What is the difference between these tuples and the tuples
that we intentionally leave when index cleanup is disabled?
Maybe I'm missing something and am confused.




Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Autovacuum-induced regression test instability

2019-04-16 Thread Tom Lane
Michael Paquier  writes:
> Aren't extra ORDER BY clauses the usual response to tuple ordering?  I
> really think that we should be more aggressive with that.

I'm not excited about that.  The traditional argument against it
is that if we start testing ORDER BY queries exclusively (and it
would have to be pretty nearly exclusively, if we were to take
this seriously) then we'll lack test coverage for queries without
ORDER BY.  Also, regardless of whether you think that regression
test results can be kicked around at will, we are certainly going
to hear complaints from users if traditional behaviors like
"inserting N rows into a new table, then selecting them, gives
those rows back in the same order" go away.  Recall that we had
to provide a way to disable the syncscan optimization because
some users complained about the loss of row-ordering consistency.

regards, tom lane




Re: New vacuum option to do only freezing

2019-04-16 Thread Alvaro Herrera
On 2019-Apr-16, Robert Haas wrote:

> On Mon, Apr 15, 2019 at 9:07 PM Tom Lane  wrote:
> > > I'm not sure that's correct.  If you do that, it'll end up in the
> > > non-tupgone case, which might try to freeze a tuple that should've
> > > been removed.  Or am I confused?
> >
> > If we're failing to remove it, and it's below the desired freeze
> > horizon, then we'd darn well better freeze it instead, no?
> 
> I don't know that that's safe.  IIRC, the freeze code doesn't cope
> nicely with being given a tuple that actually ought to have been
> deleted.  It'll just freeze it anyway, which is obviously bad.

Umm, but if we fail to freeze it, we'll leave a tuple around that's
below the relfrozenxid for the table, causing later pg_commit to be
truncated and error messages saying that the tuple cannot be read, no?

I remember that for a similar case in multixact-land, what we do is
generate a fresh multixact that carries the members that are still alive
(ie. those that cause the multixact to be kept rather than remove it),
and relabel the tuple with that one.  So the old multixact can be
removed safely.  Obviously we cannot do that for XIDs, but I do wonder
what can possibly cause a tuple to be unfreezable yet the XID to be
below the freeze horizon.  Surely if the transaction is that old, we
should have complained about it, and generated a freeze horizon that was
even older?

> Unless this has been changed since I last looked at it.

I don't think so.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Calling pgstat_report_wait_end() before ereport(ERROR)

2019-04-16 Thread Tom Lane
Michael Paquier  writes:
> In short, I tend to think that the attached is an acceptable cleanup.
> Thoughts?

WFM.

regards, tom lane




Re: New vacuum option to do only freezing

2019-04-16 Thread Robert Haas
On Mon, Apr 15, 2019 at 9:07 PM Tom Lane  wrote:
> > I'm not sure that's correct.  If you do that, it'll end up in the
> > non-tupgone case, which might try to freeze a tuple that should've
> > been removed.  Or am I confused?
>
> If we're failing to remove it, and it's below the desired freeze
> horizon, then we'd darn well better freeze it instead, no?

I don't know that that's safe.  IIRC, the freeze code doesn't cope
nicely with being given a tuple that actually ought to have been
deleted.  It'll just freeze it anyway, which is obviously bad.

Unless this has been changed since I last looked at it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Commit message / hash in commitfest page.

2019-04-16 Thread Tom Lane
Magnus Hagander  writes:
> On Tue, Apr 16, 2019 at 8:55 AM Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
>> On 2019-04-16 08:47, Magnus Hagander wrote:
>>> Unless we want to go all the way and have said bot actualy close the CF
>>> entry. But the question is, do we?

>> I don't think so.  There are too many special cases that would make this
>> unreliable, like one commit fest thread consisting of multiple patches.

> I definitely don't think we should close just because they show up.

Agreed.

> ... Which means we'd have the async/out-of-order issue.

I don't see that as much of a problem.  The use-case for these links,
as I understand it, is for retrospective examination of CF data anyway.
The mere fact of closing the CF entry is enough for real-time status.

regards, tom lane




Re: Compressed TOAST Slicing

2019-04-16 Thread Andrey Borodin



> On 9 Apr 2019, at 22:30, Tom Lane wrote:
> 
> The proposal is kind of cute, but I'll bet it's a net loss for
> small copy lengths --- likely we'd want some cutoff below which
> we do it with the dumb byte-at-a-time loop.

True.
I've made a simple extension to compare decompression times on pgbench-generated
WAL [0].

Using the smart memcpy unless the match length is smaller than 16 (a sane,
arbitrarily chosen value) gives about a 20% speedup in decompression time.
Using memcpy alone gives a smaller effect.

We will dig into this further.
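For context, the kind of cutoff we are experimenting with looks roughly like
the sketch below; the threshold name and helper function are illustrative
only, not the benchmarked code. memcpy() is used only when the source and
destination regions cannot overlap:

#include <string.h>

#define PGLZ_MEMCPY_THRESHOLD	16	/* cutoff under evaluation */

/* Illustrative sketch of copying one back-reference match during decompression. */
static inline unsigned char *
copy_match(unsigned char *dp, int off, int len)
{
	if (len < PGLZ_MEMCPY_THRESHOLD || off < len)
	{
		/* short or overlapping match: the classic byte-at-a-time loop */
		while (len-- > 0)
		{
			*dp = dp[-off];
			dp++;
		}
	}
	else
	{
		/* non-overlapping and long enough: copy in one go */
		memcpy(dp, dp - off, len);
		dp += len;
	}
	return dp;
}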

Best regards, Andrey Borodin.


[0] https://github.com/x4m/test_pglz



Re: pg_dump is broken for partition tablespaces

2019-04-16 Thread David Rowley
On Mon, 15 Apr 2019 at 15:26, Alvaro Herrera  wrote:
>
> On 2019-Apr-15, David Rowley wrote:
>
> > To be honest, if I'd done a better job of thinking through the
> > implications of this tablespace inheritance in ca4103025d, then I'd
> > probably have not bothered submitting a patch for it.  We could easily
> > revert that, but we'd still be left with the same behaviour in
> > partitioned indexes, which is in PG11.
>
> Well, I suppose if we do decide to revert it for tables, we should do it
> for both tables and indexes.  But as I said, I'm not yet convinced that
> this is the best way forward.

Ok.  Any ideas or suggestions on how we move on from here?  It seems
like a bit of a stalemate.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Runtime pruning problem

2019-04-16 Thread David Rowley
On Tue, 16 Apr 2019 at 23:55, Yuzuko Hosoya  wrote:
> postgres=# explain analyze select * from t1 where dt = current_date + 400;
>   QUERY PLAN
> ---
>  Append  (cost=0.00..198.42 rows=44 width=8) (actual time=0.000..0.001 rows=0 
> loops=1)
>Subplans Removed: 3
>->  Seq Scan on t1_1  (cost=0.00..49.55 rows=11 width=8) (never executed)
>  Filter: (dt = (CURRENT_DATE + 400))
>  Planning Time: 0.400 ms
>  Execution Time: 0.070 ms
> (6 rows)
> 
>
> I realized t1_1 was not scanned actually since "never executed"
> was displayed in the plan using EXPLAIN ANALYZE.  But I think
> "One-Time Filter: false" and "Subplans Removed: ALL" or something
> like that should be displayed instead.
>
> What do you think?

This is intended behaviour explained by the following comment in nodeAppend.c

/*
* The case where no subplans survive pruning must be handled
* specially.  The problem here is that code in explain.c requires
* an Append to have at least one subplan in order for it to
* properly determine the Vars in that subplan's targetlist.  We
* sidestep this issue by just initializing the first subplan and
* setting as_whichplan to NO_MATCHING_SUBPLANS to indicate that
* we don't really need to scan any subnodes.
*/

It's true that there is a small overhead in this case of having to
initialise a useless subplan, but the code never tries to pull any
tuples from it, so it should be fairly minimal.  I expected that using
a value that matches no partitions would be unusual enough not to go
contorting explain.c into working for this case.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Runtime pruning problem

2019-04-16 Thread Yuzuko Hosoya
Hi all,

I found a runtime pruning test case which may be a problem as follows:


create table t1 (id int, dt date) partition by range(dt);
create table t1_1 partition of t1 for values from ('2019-01-01') to 
('2019-04-01');
create table t1_2 partition of t1 for values from ('2019-04-01') to 
('2019-07-01');
create table t1_3 partition of t1 for values from ('2019-07-01') to 
('2019-10-01');
create table t1_4 partition of t1 for values from ('2019-10-01') to 
('2020-01-01');

In this example, current_date is 2019-04-16.

postgres=# explain select * from t1 where dt = current_date + 400;
 QUERY PLAN 

 Append  (cost=0.00..198.42 rows=44 width=8)
   Subplans Removed: 3
   ->  Seq Scan on t1_1  (cost=0.00..49.55 rows=11 width=8)
 Filter: (dt = (CURRENT_DATE + 400))
(4 rows)

postgres=# explain analyze select * from t1 where dt = current_date + 400;
  QUERY PLAN
   
---
 Append  (cost=0.00..198.42 rows=44 width=8) (actual time=0.000..0.001 rows=0 
loops=1)
   Subplans Removed: 3
   ->  Seq Scan on t1_1  (cost=0.00..49.55 rows=11 width=8) (never executed)
 Filter: (dt = (CURRENT_DATE + 400))
 Planning Time: 0.400 ms
 Execution Time: 0.070 ms
(6 rows)


I realized t1_1 was not scanned actually since "never executed" 
was displayed in the plan using EXPLAIN ANALYZE.  But I think 
"One-Time Filter: false" and "Subplans Removed: ALL" or something
like that should be displayed instead.

What do you think?


Best regards,
Yuzuko Hosoya
NTT Open Source Software Center






Re: Caveats from reloption toast_tuple_target

2019-04-16 Thread David Rowley
On Fri, 5 Apr 2019 at 17:31, Pavan Deolasee  wrote:
> IMV it makes sense to simply cap the lower limit of toast_tuple_target to the 
> compile time default and update docs to reflect that. Otherwise, we need to 
> deal with the possibility of dynamically creating the toast table if the 
> relation option is lowered after creating the table. Your proposed patch 
> handles the case when the toast_tuple_target is specified at CREATE TABLE, 
> but we would still have problem with ALTER TABLE, no? But there might be side 
> effects of changing the lower limit for pg_dump/pg_restore. So we would need 
> to think about that too.

I've attached a patch which increases the lower limit up to
TOAST_TUPLE_TARGET.  Unfortunately, reloptions don't have an
assign_hook like GUCs do. Unless we add those we've no way to still
accept lower values without an error.  Does anyone think that's worth
adding for this?  Without it, it's possible that pg_restore /
pg_upgrade could fail if someone happened to have toast_tuple_target
set lower than 2032 bytes.

I didn't bother capping RelationGetToastTupleTarget() to ensure it
never returns less than TOAST_TUPLE_TARGET since the code that
currently uses it can't trigger if it's lower than that value.
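For anyone not familiar with reloptions.c, the change amounts to raising the
minimum in the toast_tuple_target intRelOpts entry, roughly like the sketch
below (paraphrased from memory, not the attached patch):

	/* Sketch: raise the reloption's minimum from 128 to TOAST_TUPLE_TARGET. */
	{
		{
			"toast_tuple_target",
			"Sets the target tuple length at which external columns will be toasted",
			RELOPT_KIND_HEAP,
			ShareUpdateExclusiveLock
		},
		TOAST_TUPLE_TARGET,		/* default */
		TOAST_TUPLE_TARGET,		/* minimum, previously 128 */
		TOAST_TUPLE_MAX			/* maximum */
	},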

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


toast_tuple_target_fix_david.patch
Description: Binary data


Re: Calling pgstat_report_wait_end() before ereport(ERROR)

2019-04-16 Thread Masahiko Sawada
On Tue, Apr 16, 2019 at 2:45 PM Michael Paquier  wrote:
>
> On Fri, Apr 12, 2019 at 10:06:41PM +0900, Masahiko Sawada wrote:
> > But I think that's not right, I've checked the code. If the startup
> > process failed in that function it raises a FATAL and recovery fails,
> > and if checkpointer process does then it calls
> > pgstat_report_wait_end() in CheckpointerMain().
>
> Well, the point is that the code raises an ERROR, then a FATAL because
> it gets upgraded by recovery.  The take, at least it seems to me, is
> that if any new caller of the function misses to clean up the event
> then the routine gets cleared.  So it seems to me that the current
> coding is aimed to be more defensive than anything.  I agree that
> there is perhaps little point in doing so.  In my experience a backend
> switches very quickly back to ClientRead, cleaning up the previous
> event.  Looking around, we have also some code paths in slot.c and
> origin.c which close a transient file, clear the event flag...  And
> then PANIC, which makes even less sense.
>
> In short, I tend to think that the attached is an acceptable cleanup.
> Thoughts?

Agreed. There is also some code which raises an ERROR after closing a
transient file, but I think it's a good idea not to include it, for
safety. It looks to me that the patch you proposed cleans up as many
places as we can.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Calling pgstat_report_wait_end() before ereport(ERROR)

2019-04-16 Thread Masahiko Sawada
On Fri, Apr 12, 2019 at 11:05 PM Tom Lane  wrote:
>
> Masahiko Sawada  writes:
> > There are something like the following code in many places in PostgreSQL 
> > code.
> > ...
> > Since we eventually call
> > pgstat_report_wait_end() in AbortTransaction(). I think that we don't
> > need to call pgstat_report_wait_end() if we're going to raise an error
> > just after that. Is that right?
>
> Yes ... and those CloseTransientFile calls are unnecessary as well.
>
> To a first approximation, *any* cleanup-type call occurring just before
> an ereport(ERROR) is probably unnecessary, or if it is necessary then
> the code is broken in other ways.  One should not assume that there is
> no other way for an error to be thrown while the resource is held, and
> therefore it's generally better design to have enough infrastructure
> so that the error cleanup mechanisms can handle whatever cleanup is
> needed.  We certainly have such infrastructure for OpenTransientFile/
> CloseTransientFile, and according to what you say above (I didn't
> check it) pgstat wait reporting is handled similarly.  So these
> call sites could all be simplified substantially.
>
> There are exceptions to this rule of thumb.  In some places, for
> instance, it's worth releasing a lock before ereport simply to shorten
> the length of time that the lock might stay held.  And there are places
> where a very low-level resource (such as a spinlock) is only held in
> straight-line code so there's not really need for error cleanup
> infrastructure for it.  Perhaps there's an argument to be made that
> pgstat wait reporting could be put in this second category, but
> I doubt it.
>

Thank you for the explanation! That's really helpful for me.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: New vacuum option to do only freezing

2019-04-16 Thread Masahiko Sawada
On Tue, Apr 16, 2019 at 4:47 AM Tom Lane  wrote:
>
> Robert Haas  writes:
> > On Mon, Apr 15, 2019 at 1:13 PM Tom Lane  wrote:
> >> I have a very strong feeling that this patch was not fully baked.
>
> > I think you're right, but I don't understand the comment in the
> > preceding paragraph.  How does this problem prevent tupgone from
> > getting set?
>
> My point is that I suspect that tupgone *shouldn't* get set.
> It's not (going to be) gone.
>
> > It looks to me like nleft_dead_tuples should be ripped out.
>
> That was pretty much what I was thinking too.

tups_vacuumed counts not only (1) dead-but-not-yet-removable tuples but
also HOT-pruned tuples. These HOT-pruned tuples include both (2) tuples
for which we removed both the itemid and the tuple storage, and (3)
tuples for which we removed only the tuple storage and marked the itemid
as dead. So we cannot add tups_vacuumed to nkeeps, as it includes
completely removed tuples like tuples-(2). I added nleft_dead_itemids to
count only tuples-(3) and nleft_dead_tuples to count only tuples-(1) for
reporting. Tuples-(2) are removed even when index cleanup is disabled.

> It makes more sense
> just to treat this case identically to dead-but-not-yet-removable.
> I have substantial doubts about nleft_dead_itemids being worth
> anything, as well.

I think that adding tuples-(3) to nkeeps would be a good idea. If
we do that, nleft_dead_tuples is no longer necessary. On the other
hand, I think we need nleft_dead_itemids to report how many itemids we
left when index cleanup is disabled.
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: extensions are hitting the ceiling

2019-04-16 Thread Eric Hanson
On Tue, Apr 16, 2019 at 4:47 AM Eric Hanson  wrote:

> We would probably be wise to learn from what has gone (so I hear) terribly
> wrong with the Node / NPM packaging system (and I'm sure many before it),
> namely versioning.  What happens when two extensions require different
> versions of the same extension?  At a glance it almost seems unsolvable,
> given the constraint that an extension can only be installed once, and only
> at a single version.  I don't understand why that constraint exists though.
>

How about this:

1. Extension can be installed once *per-version*.
2. Each version of an extension that is installed is assigned by the system
a unique, hidden schema (similar to temp table schemas) whose name doesn't
matter because the extension user will never need to know it.
3. There exists a dynamic variable, like you proposed above, but it
includes version number as well.  @DEPNAME_VERSION_schema@ perhaps.  This
variable would resolve to the system-assigned schema name of the extension
specified, at the version specified.
4. Since sprinkling one's code with version numbers is awful, there exists a
way (which I haven't thought of) to set a kind of search_path-type setting
which sets in the current scope the version number of the extension that
should be dereferenced, so developers can still use @DEPNAME_schema@.

This would allow multiple versions of extensions to coexist, and it would
solve the problem with two extensions wanting the same dependency in
different places.

It's radical, but extensions are radically broken.  A critique of the above
would be that extensions still have a single global namespace, so
personally I don't think it even goes far enough.

Cheers,
Eric


Re: extensions are hitting the ceiling

2019-04-16 Thread Eric Hanson
On Tue, Apr 16, 2019 at 4:24 AM Eric Hanson  wrote:

>
>
> On Tue, Apr 16, 2019 at 12:47 AM Noah Misch  wrote:
>
>> On Mon, Mar 18, 2019 at 09:38:19PM -0500, Eric Hanson wrote:
>> > I have heard talk of a way to write extensions so that they dynamically
>> > reference the schema of their dependencies, but sure don't know how that
>> > would work if it's possible.  The @extschema@ variable references the
>> > *current* extension's schema, but not there is no dynamic variable to
>> > reference the schema of a dependency.
>>
>> If desperate, you can do it like this:
>>
>>   DO $$ BEGIN EXECUTE format('SELECT %I.earth()',
>> (SELECT nspname FROM pg_namespace n
>>  JOIN pg_extension ON n.oid = extnamespace
>>  WHERE extname = 'earthdistance' )); END $$;
>>
>> Needless to say, that's too ugly.  Though probably unimportant in
>> practice, it
>> also has a race condition vs. ALTER EXTENSION SET SCHEMA.
>>
>> > Also it is possible in theory to dynamically set search_path to contain
>> > every schema of every dependency in play and then just not specify a
>> schema
>> > when you use something in a dependency.  But this ANDs together all the
>> > scopes of all the dependencies of an extension, introducing potential
>> for
>> > collisions, and is generally kind of clunky.
>>
>> That's how it works today, and it has the problems you describe.  I
>> discussed
>> some solution candidates here:
>>
>> https://www.postgresql.org/message-id/20180710014308.ga805...@rfd.leadboat.com
>>
>> The @DEPNAME_schema@ thing was trivial to implement, but I shelved it.
>> I'm
>> attaching the proof of concept, for your information.
>>
>
> Interesting.
>
> Why shelved?  I like it.  You said you lean toward 2b in the link above,
> but there is no 2b :-) but 1b was this option, which maybe you meant?
>
> The other approach would be to have each extension be in its own schema,
> whose name is fixed for life.  Then there are no collisions and no
> ambiguity about their location.   I don't use NPM but was just reading
> about how they converted their package namespace from a single global
> namespace with I think it was 30k packages in it,
> to @organization/packagename.  I don't know how folks would feel about a
> central namespace registry, I don't love the idea if we can find a way
> around it, but would settle for it if there's no better solution.  Either
> that or use a UUID as the schema name.  Truly hideous.  But it seems like
> your approach above with just dynamically looking up the extension's schema
> as a variable would solve everything.
>
> There is the problem of sequencing, where extension A installs dependency
> extension B in its own schema.  Then extension C also wants to use
> dependency B, but extension A is uninstalled and extension B is now still
> hanging around in A's old schema.  Not ideal but at least everything would
> still function.
>
> I'll keep thinking about it...
>

We would probably be wise to learn from what has gone (so I hear) terribly
wrong with the Node / NPM packaging system (and I'm sure many before it),
namely versioning.  What happens when two extensions require different
versions of the same extension?  At a glance it almost seems unsolvable,
given the constraint that an extension can only be installed once, and only
at a single version.  I don't understand why that constraint exists though.

Eric


Re: extensions are hitting the ceiling

2019-04-16 Thread Eric Hanson
On Tue, Apr 16, 2019 at 12:47 AM Noah Misch  wrote:

> On Mon, Mar 18, 2019 at 09:38:19PM -0500, Eric Hanson wrote:
> > I have heard talk of a way to write extensions so that they dynamically
> > reference the schema of their dependencies, but I sure don't know how that
> > would work if it's possible.  The @extschema@ variable references the
> > *current* extension's schema, but there is no dynamic variable to
> > reference the schema of a dependency.
>
> If desperate, you can do it like this:
>
>   DO $$ BEGIN EXECUTE format('SELECT %I.earth()',
> (SELECT nspname FROM pg_namespace n
>  JOIN pg_extension ON n.oid = extnamespace
>  WHERE extname = 'earthdistance' )); END $$;
>
> Needless to say, that's too ugly.  Though probably unimportant in
> practice, it
> also has a race condition vs. ALTER EXTENSION SET SCHEMA.
>
> > Also it is possible in theory to dynamically set search_path to contain
> > every schema of every dependency in play and then just not specify a
> schema
> > when you use something in a dependency.  But this ANDs together all the
> > scopes of all the dependencies of an extension, introducing potential for
> > collisions, and is generally kind of clunky.
>
> That's how it works today, and it has the problems you describe.  I
> discussed
> some solution candidates here:
>
> https://www.postgresql.org/message-id/20180710014308.ga805...@rfd.leadboat.com
>
> The @DEPNAME_schema@ thing was trivial to implement, but I shelved it.
> I'm
> attaching the proof of concept, for your information.
>

Interesting.

Why shelved?  I like it.  You said you lean toward 2b in the link above,
but there is no 2b :-) but 1b was this option, which maybe you meant?

The other approach would be to have each extension be in its own schema,
whose name is fixed for life.  Then there are no collisions and no
ambiguity about their location.   I don't use NPM but was just reading
about how they converted their package namespace from a single global
namespace with I think it was 30k packages in it,
to @organization/packagename.  I don't know how folks would feel about a
central namespace registry, I don't love the idea if we can find a way
around it, but would settle for it if there's no better solution.  Either
that or use a UUID as the schema name.  Truly hideous.  But it seems like
your approach above with just dynamically looking up the extension's schema
as a variable would solve everything.

There is the problem of sequencing, where extension A installs dependency
extension B in its own schema.  Then extension C also wants to use
dependency B, but extension A is uninstalled and extension B is now still
hanging around in A's old schema.  Not ideal but at least everything would
still function.

I'll keep thinking about it...


> > #2:  Data in Extensions
> >
> > Extensions that are just a collection of functions and types seem to be
> the
> > norm.  Extensions can contain what the docs call "configuration" data,
> but
> > rows are really second class citizens:  They aren't tracked with
> > pg_catalog.pg_depend, they aren't deleted when the extension is dropped,
> > etc.
> >
> > Sometimes it would make sense for an extension to contain *only* data, or
> > insert some rows in a table that the extension doesn't "own", but has as
> a
> > dependency.  For example, a "webserver" extension might contain a
> > "resource" table that serves up the content of resources in the table at
> a
> > specified path. But then, another extension, say an app, might want to
> just
> > list the webserver extension as a dependency, and insert a few resource
> > rows into it.  This is really from what I can tell beyond the scope of
> what
> > extensions are capable of.
>
> I never thought of this use case.  Interesting.
>

It's a *really* powerful pattern.  I am sure of this because I've been
exploring it while developing a row packaging system modeled after git [1],
and using it in conjunction with EXTENSIONs with extreme joy.  But one does
rows, and the other does DDL, and this is not ideal.

Cheers,
Eric

[1]
https://github.com/aquametalabs/aquameta/tree/master/src/pg-extension/bundle


Compile with 64-bit kerberos on Windows

2019-04-16 Thread Peifeng Qiu
Hi, hackers.

I'm trying to build 64-bit Windows binaries with Kerberos support.
I downloaded the latest Kerberos source package from here:
https://kerberos.org/dist/index.html
I followed the instructions in src\windows\README, and executed the
following script in a 64-bit Visual Studio Command Prompt to build and
install it.

set NO_LEASH=1
set PATH=%PATH%;"%WindowsSdkVerBinPath%"\x86
set KRB_INSTALL_DIR=C:\krb5
cd src
nmake -f Makefile.in prep-windows
nmake NODEBUG=1
nmake install NODEBUG=1

To compile postgres with kerberos support, we need to configure the install
location in src/tools/msvc/config.pl
our $config = {gss => 'C:/krb5'};

If I run build.pl, the compiler will complain that gssapi.h is not found.
At src/tools/msvc/Solution.pm line 633, we can see the include directory is
set to '\inc\krb5'. This is no longer the case for the 64-bit Kerberos package.

The correct include directory is '\include'. The library paths also need to
be fixed to use the 64-bit versions.

Here's a patch that fixes these paths; with it we can build 64-bit
binaries with Kerberos support successfully.

Best regards,
Peifeng Qiu


compile-with-64-bit-kerberos-on-windows-v1.patch
Description: Binary data


RE: libpq debug log

2019-04-16 Thread Iwata, Aya
Hi Horiguchi-san,

Thank you for your review.
I updated the patch. Please see the attached patch.

> +/* protocol message name */
> +static char *command_text_b[] = {
> 
> Couldn't the name be more descriptive? The comment just above doesn't seem
> consistent with the variable.  The tables are very sparse. I think the
> definition could be in more compact form.
Thank you. I made the description clearer.

> 
> + /* y */ 0,
> + /* z */ 0
> +};
> +#define COMMAND_BF_MAX (sizeof(command_text_bf) /
> +sizeof(*command_text_bf))
> 
> It seems that at least the trailing 0-elements are not required.
Sure. I removed them.

> + * message_get_command_text:
> + *   Get Protocol message text from byte1 identifier
> + */
> +static char *
> +message_get_command_text(unsigned char c, CommunicationDirection
> +direction)
> ..
> +message_nchar(PGconn *conn, const char *v, int length,
> +CommunicationDirection direction)
> 
> Also the function names are not very descriptive.
Thank you. I fixed function names and added descriptions.

> 
> +message_Int(PGconn *conn, int v, int length, CommunicationDirection
> +direct)
> 
> We are not using names mixing CamelCase and underscored there.
> 
> 
> + if (c >= 0 && c < COMMAND_BF_MAX)
> + {
> + text = command_text_bf[c];
> + if (text)
> + return text;
> + }
> +
> + if (direction == FROM_BACKEND && c >= 0 && c < COMMAND_B_MAX)
> + {
> + text = command_text_b[c];
> + if (text)
> ..
> + if (direction == FROM_FRONTEND && c >= 0 && c < COMMAND_F_MAX)
> 
> 
> This code is assuming that elements of command_text_bf is mutually exclusive
> with command_text_b or _bf. That is, the first has an element for 'C', others
> don't have an element in the same position. But _bf[C] = "CommandComplete"
> and _f[C] = "Close". Is it working correctly?
The elements sent from both the backend and the frontend are 'c' and 'd'.
There are no elements common to protocol_message_type_b and _bf.
The same applies to protocol_message_type_f and _bf, too. So I think it is
working correctly.


> +typedef enum CommunicationDirection
> 
> The type CommunicationDirection is two-member enum which is equivalent to
> just a boolean. Is there a reason you define that?
> 
> +typedef enum State
> +typedef enum Type
> 
> The name is too generic.
> +typedef struct _LoggingMsg
> ...
> +} LoggingMsg;
> 
> Why the tag name is prefixed with an underscore?
> 
> +typedef struct _Frontend_Entry
> 
> The name doesn't give an idea of its characteristics.
Thank you. I fixed.

Regards,
Aya Iwata


v3-libpq-PQtrace-output-one-line.patch
Description: v3-libpq-PQtrace-output-one-line.patch


Re: Commit message / hash in commitfest page.

2019-04-16 Thread Magnus Hagander
On Tue, Apr 16, 2019 at 8:55 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-04-16 08:47, Magnus Hagander wrote:
> > Unless we want to go all the way and have said bot actualy close the CF
> > entry. But the question is, do we?
>
> I don't think so.  There are too many special cases that would make this
> unreliable, like one commit fest thread consisting of multiple patches.
>

I definitely don't think we should close just because they show up. It
would also require a keyword somewhere to indicate that it should be
closed. Of course, it can still lead to weird results when the same thread
is attached to multiple CF entries etc. So I agree, I don't think we'd want
that. Which means we'd have the async/out-of-order issue.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Race conditions with checkpointer and shutdown

2019-04-16 Thread Michael Paquier
Hi all,

This is a continuation of the following thread, but I prefer spawning
a new thread for clarity:
https://www.postgresql.org/message-id/20190416064512.gj2...@paquier.xyz

The buildfarm has reported two similar failures when shutting down a
node:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2019-03-23%2022%3A28%3A59
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2019-04-16%2006%3A14%3A01

In both cases, the instance cannot shut down because it times out,
waiting for the shutdown checkpoint to finish but I suspect that this
checkpoint actually never happens.

The first case involves piculet, which builds with --disable-atomics and
gcc 6, in the recovery test 016_min_consistency, where we trigger a
checkpoint and then issue a fast shutdown on a standby.  At this point the
test waits forever.

The second case involves dragonet, which builds with clang and has JIT
enabled.  The failure is in test 009_twophase.pl, after the test prepares
transaction xact_009_11 and a *standby* gets restarted.  Again, the test
waits forever for the instance to shut down.

The most recent commits touching checkpoints are 0dfe3d0e and c6c9474a,
which map roughly to the point where the failures began to happen, so
something related to clean standby shutdowns may have broken since then.

Thanks,
--
Michael




Re: [PATCH v20] GSSAPI encryption support

2019-04-16 Thread Peter Eisentraut
On 2019-04-16 06:36, Michael Paquier wrote:
> +$node->append_conf('pg_hba.conf',
> +   qq{hostgssenc all all $hostaddr/32 gss map=mymap});
> +$node->restart;
> A reload should be enough, but it is not race-condition free, which is why
> a set of restarts is done in this test, right?  (I have noticed that it has
> been done this way since the beginning.)

We got rid of all the reloads in these kinds of tests because, if the
configuration contains an error, a reload just ignores it, whereas a restart
makes the error visible.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Minimal logical decoding on standbys

2019-04-16 Thread Amit Khandekar
On Sat, 13 Apr 2019 at 00:57, Andres Freund  wrote:
>
> Hi,
>
> On 2019-04-12 23:34:02 +0530, Amit Khandekar wrote:
> > I tried to see if I can quickly understand what's going on.
> >
> > Here, master wal_level is hot_standby, not logical, though slave
> > wal_level is logical.
>
> Oh, that's well diagnosed.  Cool.  Also nicely tested - this'd be ugly
> in production.

Tushar had made me aware of the fact that this reproduces only when
master wal_level is hot_standby.

>
> I assume the problem isn't present if you set the primary to wal_level =
> logical?

Right.

>
>
> > Not sure why this is happening. On the slave, wal_level is logical, so
> > logical records should have tuple data. Not sure what that has to do
> > with the wal_level of the master. Everything should be there on the slave
> > after it replays the inserts; and the slave's wal_level is also logical.
>
> The standby doesn't write its own WAL, only primaries do. I thought we
> forbade running with wal_level=logical on a standby, when the primary is
> only set to replica.  But that's not what we do, see
> CheckRequiredParameterValues().
>
> I've not yet thought this through, but I think we'll have to somehow
> error out in this case.  I guess we could just check at the start of
> decoding what ControlFile->wal_level is set to,

By "start of decoding", I didn't get where exactly. Do you mean
CheckLogicalDecodingRequirements() ?

> and then raise an error
> in decode.c when we pass an XLOG_PARAMETER_CHANGE record that sets
> wal_level to something lower?

I didn't get exactly where we should error out. We don't handle
XLOG_PARAMETER_CHANGE in decode.c, so you obviously meant something else,
which I didn't understand.

What I am thinking is: in CheckLogicalDecodingRequirements(), besides
checking wal_level, also check ControlFile->wal_level when InHotStandby.
That is, when we are InHotStandby, both wal_level and ControlFile->wal_level
should be >= WAL_LEVEL_LOGICAL. This would allow us to error out when a
logical slot is used while the master has an incompatible wal_level.

ControlFile is not accessible outside xlog.c, so we would need an API to
extract this field.
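
For illustration, a rough sketch of that check (not committed code):
GetControlFileWalLevel() is a hypothetical accessor that xlog.c would need to
export, RecoveryInProgress() stands in for the InHotStandby test mentioned
above, and the existing checks in the function are elided.

void
CheckLogicalDecodingRequirements(void)
{
    /* ... existing checks on wal_level, replication slots, database ... */

    /*
     * On a standby, the WAL we decode was generated by the primary, so the
     * wal_level recorded in the control file must also allow logical
     * decoding, independently of our own wal_level setting.
     */
    if (RecoveryInProgress() &&
        GetControlFileWalLevel() < WAL_LEVEL_LOGICAL)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("logical decoding on a standby requires wal_level >= logical on the primary")));
}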


>
> Could you try to implement that?
>
> Greetings,
>
> Andres Freund


--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: Commit message / hash in commitfest page.

2019-04-16 Thread Peter Eisentraut
On 2019-04-16 08:47, Magnus Hagander wrote:
> Unless we want to go all the way and have said bot actually close the CF
> entry. But the question is, do we?

I don't think so.  There are too many special cases that would make this
unreliable, like one commit fest thread consisting of multiple patches.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: REINDEX CONCURRENTLY 2.0

2019-04-16 Thread Peter Eisentraut
On 2019-04-16 08:19, Michael Paquier wrote:
> On Fri, Apr 12, 2019 at 12:11:12PM +0100, Dagfinn Ilmari Mannsåker wrote:
>> I don't have any comments on the code (but the test looks sensible, it's
>> the same trick I used to discover the issue in the first place).
> 
> After thinking some more on it, this behavior looks rather sensible to
> me.  Are there any objections?

Looks good to me.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Commit message / hash in commitfest page.

2019-04-16 Thread Magnus Hagander
On Sat, Apr 13, 2019 at 10:28 PM Tom Lane  wrote:

> Tomas Vondra  writes:
> > On Thu, Apr 11, 2019 at 02:55:10PM +0500, Ibrar Ahmed wrote:
> >> On Thu, Apr 11, 2019 at 2:44 PM Erikjan Rijkers  wrote:
>  Is it possible to have commit-message or at least git hash in
>  commitfest. It will be very easy to track commit against commitfest
>  item.
>
> >>> Commitfest items always point to discussion threads. These threads often
> >>> end with a message that says that the patch is pushed.  IMHO, that
> >>> message would be the place to include the commithash.  It would also be
> >>> easily findable via the commitfest application.
>
> > I think it might be useful to actually have that directly in the CF app,
> > not just in the thread. There would need to be a way to enter multiple
> > hashes, because patches often have multiple pieces.
>
> > But maybe that'd be too much unnecessary burden. I don't remember when I
> > last needed this information. And I'd probably try searching in git log
> > first anyway.
>
> Yeah, I can't see committers bothering to do this.  Including the
> discussion thread link in the commit message is already pretty
> significant hassle, and something not everybody remembers/bothers with.
>
> But ... maybe it could be automated?  A bot looking at the commit log
> could probably suck out the thread links and try to match them up
> to CF entries.  Likely you could get about 90% right even without that,
> just by matching the committer's name and the time of commit vs time
> of CF entry closure.
>

Would you even need to match that? It would be easy enough to scan all git
commit messages for links to the archives and populate any CF entry that is
attached to that same thread.

Of course, that would be async, so you'd end up closing the CF entry and
then have it populated with the git information a bit later (in the simple
case where there is just one commit and then it's done).

Unless we want to go all the way and have said bot actually close the CF
entry. But the question is, do we?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Adding a TAP test checking data consistency on standby with minRecoveryPoint

2019-04-16 Thread Michael Paquier
On Sun, Mar 24, 2019 at 09:47:58PM +0900, Michael Paquier wrote:
> The failure is a bit weird, as I would expect all those three actions
> to be sequential.  piculet is the only failure happening on the
> buildfarm and it uses --disable-atomics, so I am wondering if that is
> related and if 0dfe3d0 is part of that.  With a primary/standby set,
> it could be possible to test that scenario pretty easily.  I'll give
> it a shot.

The buildfarm has just reported a similar failure, this time in another
test, 009_twophase.pl:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2019-04-16%2006%3A14%3A01

I think that we have race conditions with checkpointing and shutdown
on HEAD.
--
Michael




Re: REINDEX CONCURRENTLY 2.0

2019-04-16 Thread Michael Paquier
On Fri, Apr 12, 2019 at 12:11:12PM +0100, Dagfinn Ilmari Mannsåker wrote:
> I don't have any comments on the code (but the test looks sensible, it's
> the same trick I used to discover the issue in the first place).

After thinking some more on it, this behavior looks rather sensible to
me.  Are there any objections?

> However, the doc patch lost the trailing paren:

Fixed on my branch, thanks.
--
Michael




Re: Accidental setting of XLogReaderState.private_data ?

2019-04-16 Thread Michael Paquier
On Mon, Apr 15, 2019 at 11:06:18AM -0400, Tom Lane wrote:
> Hmm.  The second, duplicate assignment is surely pointless, but I think
> that setting the ctx as the private_data is a good idea.  It hardly seems
> out of the question that it might be needed in future.

Agreed that we should keep the assignment done with
XLogReaderAllocate().  I have committed the patch which removes the
useless assignment though.
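
For context, the spot in question in logical.c's StartupDecodingContext()
looks roughly like this (paraphrased, not an exact copy of the source): ctx
is already handed to XLogReaderAllocate() as private_data, which is what made
the later explicit assignment redundant.

    ctx->reader = XLogReaderAllocate(wal_segment_size, read_page, ctx);
    if (!ctx->reader)
        ereport(ERROR,
                (errcode(ERRCODE_OUT_OF_MEMORY),
                 errmsg("out of memory")));

    /*
     * The removed line: ctx->reader->private_data = ctx;
     * XLogReaderAllocate() above has already stored ctx there.
     */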
--
Michael

