> On Jun 15, 2022, at 8:18 PM, Andres Freund <and...@anarazel.de> wrote:
> 
> Hi,
> 
> On 2022-06-15 20:10:30 -0700, Mark Dilger wrote:
>>> On Jun 15, 2022, at 7:30 PM, Andres Freund <and...@anarazel.de> wrote:
>>>> But it's up to the TAM what it wants to do with that boolean, if in fact it
>>>> does anything at all based on that.  A TAM could decide to do the exact
>>>> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
>>>> on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
>>>> behavior here>.
>>> 
>>> That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
>>> that's possible with all extension APIs.
>> 
>> I don't think it's "stupid stuff".  A major motivation, perhaps the only
>> useful motivation, for implementing a TAM is to get a non-trivial
>> performance boost (relative to heap) for some target workload, almost
>> certainly at the expense of worse performance for another workload.  To
>> optimize any particular workload sufficiently to make it worth the bother,
>> you've pretty much got to do something meaningfully different than what heap
>> does.
> 
> Sure. I just don't see what that has to do with doing something widely
> differing in VACUUM FULL. Which AM can't support that? I guess there's some
> where implementing the full visibility semantics isn't feasible, but that's
> afaics OK.

The problem isn't that the TAM can't do it.  Any TAM can probably copy its 
contents verbatim from the OldHeap to the NewHeap.  But it's really stupid to 
have to do that if you're not going to change anything along the way, and I 
think if you divorce your thinking from how heap-AM works sufficiently, you 
might start to see how other TAMs might have nothing useful to do at this step. 
 So there's a strong motivation to not be forced into a "move all the data 
uselessly" step.

I don't really want to discuss the proprietary details of any TAMs I'm 
developing, so I'll use some made up examples.  Imagine you have decided to 
trade off the need to vacuum against the cost of storing 64bit xids.  That 
doesn't mean that compaction isn't maybe still a good thing, but you don't need 
to vacuum for anti-wraparound purposes anymore.  Now imagine you've also 
decided to trade off insert speed for select speed, and you sort the entire 
table on every insert, and to keep indexes happy, you keep a "externally 
visible TID" to "internal actual location" mapping in a separate fork.   Let's 
say you also don't support UPDATE or DELETE at all.  Those immediately draw an 
error, "not implemented by this tam".

At this point, you have a fully sorted and completely compacted table at all 
times.  It's simply an invariant of the TAM.  So, now what exactly is VACUUM 
FULL or CLUSTER supposed to do?  Seems like the answer is "diddly squat", and 
yet you seem to propose requiring the TAM to do it.  I don't like that.

>>>> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
>>>> synonyms.  Or am I missing something?
>>> 
>>> The callback gets passed use_sort. So just implement it use_sort = false and
>>> error out if use_sort = true?
>> 
>> I'm not going to say that your idea is unreasonable for a TAM that you might
>> choose to implement, but I don't see why that should be required of all TAMs
>> anybody might ever implement.
> 
>> The callback that gets use_sort is called from copy_table_data().  By that 
>> time, it's too late to avoid the
>> 
>>    /*
>>     * Open the relations we need.
>>     */
>>    NewHeap = table_open(OIDNewHeap, AccessExclusiveLock);
>>    OldHeap = table_open(OIDOldHeap, AccessExclusiveLock);
>> 
>> code that has already happened in cluster.c's copy_table_data() function,
>> and unless I raise an error, after returning from my TAM's callback, the
>> cluster code will replace the old table with the new one.  I'm left no
>> choices but to copy my data over, loose my data, or abort the command.  None
>> of those are OK options for me.
> 
> I think you need to do a bit more explaining of what you're actually trying to
> achieve here. You're just saying "I don't want to", which doesn't really help
> me to understand the set of useful options.

I'm trying to opt out of cluster/vacfull.

>> I'm open to different solutions.  If a simple callback like
>> relation_supports_cluster(Relation rel) is too simplistic, and more
>> parameters with more context information is wanted, then fine, let's do
>> that.
> 
> FWIW, I want to go *simpler* if anything, not more complicated. I.e. make the
> relation_copy_for_cluster optional.
> 
> I still think it's a terrible idea to silently ignore tables in CLUSTER or
> VACUUM FULL.

I'm not entirely against you on that, but it makes me cringe that we impose 
design decisions like that on any and all future TAMs.  It seems better to me 
to let the TAM author decide to emit an error, warning, notice, or whatever, as 
they see fit.

>> But I can't really complete my work with the interface as it stands
>> now.
> 
> Since you've not described that work to a meaningful degree...

I don't think I should have to do so.  It's like saying, "I think I should have 
freedom of speech", and you say, "well, I'm not sure about that; tell me what 
you want to say, and I'll decide if I'm going to let you say it".  That's not 
freedom.  I think TAM authors should have broad discretion over anything that 
the core system doesn't have a compelling interest in controlling.  You've not 
yet said why a TAM should be prohibited from opting out of cluster/vacfull.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Reply via email to