Re: pg_dump versus hash partitioning

2023-03-17 Thread Julien Rouhaud
On Fri, Mar 17, 2023 at 01:44:12PM -0400, Tom Lane wrote: > Julien Rouhaud writes: > > On Thu, Mar 16, 2023 at 08:43:56AM -0400, Tom Lane wrote: > >> I think the odds of that yielding a usable dump are nil, so I don't > >> see why we should bother. > > > No objection from me. > > OK, pushed with t

Re: pg_dump versus hash partitioning

2023-03-17 Thread Tom Lane
Julien Rouhaud writes: > On Thu, Mar 16, 2023 at 08:43:56AM -0400, Tom Lane wrote: >> I think the odds of that yielding a usable dump are nil, so I don't >> see why we should bother. > No objection from me. OK, pushed with the discussed changes. regards, tom lane

Re: pg_dump versus hash partitioning

2023-03-16 Thread Julien Rouhaud
On Thu, Mar 16, 2023 at 08:43:56AM -0400, Tom Lane wrote: > Julien Rouhaud writes: > > On Mon, Mar 13, 2023 at 07:39:12PM -0400, Tom Lane wrote: > >> Yeah, we need to do both. Attached find an updated patch series: > > > I didn't find a CF entry, is it intended? > > Yeah, it's there: > > https://

Re: pg_dump versus hash partitioning

2023-03-16 Thread Tom Lane
Julien Rouhaud writes: > On Mon, Mar 13, 2023 at 07:39:12PM -0400, Tom Lane wrote: >> Yeah, we need to do both. Attached find an updated patch series: > I didn't find a CF entry, is it intended? Yeah, it's there: https://commitfest.postgresql.org/42/4226/ > I'm not sure if you intend to keep

Re: pg_dump versus hash partitioning

2023-03-16 Thread Julien Rouhaud
On Mon, Mar 13, 2023 at 07:39:12PM -0400, Tom Lane wrote: > Julien Rouhaud writes: > > On Sun, Mar 12, 2023 at 03:46:52PM -0400, Tom Lane wrote: > >> The trick is to detect in pg_restore whether pg_dump chose to do > >> load-via-partition-root. > > > Given that this approach wouldn't help with exi

Re: pg_dump versus hash partitioning

2023-03-13 Thread Tom Lane
Julien Rouhaud writes: > On Sun, Mar 12, 2023 at 03:46:52PM -0400, Tom Lane wrote: >> The trick is to detect in pg_restore whether pg_dump chose to do >> load-via-partition-root. > Given that this approach wouldn't help with existing dump files (at least if > using COPY, in any case the one using

Re: pg_dump versus hash partitioning

2023-03-13 Thread Julien Rouhaud
On Sun, Mar 12, 2023 at 03:46:52PM -0400, Tom Lane wrote: > Julien Rouhaud writes: > > The BEGIN + TRUNCATE is only there to avoid generating WAL records just in > > case > > the wal_level is minimal. I don't remember if that optimization still > > exists, > > but if yes we could avoid doing th

Re: pg_dump versus hash partitioning

2023-03-12 Thread Tom Lane
Justin Pryzby writes: > On Sun, Mar 12, 2023 at 03:46:52PM -0400, Tom Lane wrote: >> What I propose we do about that is further tweak things so that >> load-via-partition-root forces dumping via COPY. AFAIK the only >> compelling use-case for dump-as-INSERTs is in transferring data >> to a non-Po

Re: pg_dump versus hash partitioning

2023-03-12 Thread Justin Pryzby
On Sun, Mar 12, 2023 at 03:46:52PM -0400, Tom Lane wrote: > What I propose we do about that is further tweak things so that > load-via-partition-root forces dumping via COPY. AFAIK the only > compelling use-case for dump-as-INSERTs is in transferring data > to a non-Postgres database, which is a c

Re: pg_dump versus hash partitioning

2023-03-12 Thread Tom Lane
Julien Rouhaud writes: > The BEGIN + TRUNCATE is only there to avoid generating WAL records just in > case > the wal_level is minimal. I don't remember if that optimization still exists, > but if yes we could avoid doing that if the server's wal_level is replica or > higher? That's not perfect

Re: pg_dump versus hash partitioning

2023-03-10 Thread Julien Rouhaud
On Fri, Mar 10, 2023 at 10:10:14PM -0500, Tom Lane wrote: > Julien Rouhaud writes: > > Working on some side project that can cause dump of hash partitions to be > > routed to a different partition, I realized that --load-via-partition-root > > can > > indeed cause deadlock in such case without FK

Re: pg_dump versus hash partitioning

2023-03-10 Thread Tom Lane
Julien Rouhaud writes: > Working on some side project that can cause dump of hash partitions to be > routed to a different partition, I realized that --load-via-partition-root can > indeed cause deadlock in such case without FK dependency or anything else. > The problem is that each worker will p

Re: pg_dump versus hash partitioning

2023-03-10 Thread Julien Rouhaud
On Tue, Feb 14, 2023 at 02:21:33PM -0500, Tom Lane wrote: > Here's a set of draft patches around this issue. > > 0001 does what I last suggested, ie force load-via-partition-root for > leaf tables underneath a partitioned table with a partitioned-by-hash > enum column. It wasn't quite as messy as

Re: pg_dump versus hash partitioning

2023-02-27 Thread Tom Lane
Robert Haas writes: > Sure, but I was responding to your assertion that there's no case in > which --load-via-partition-root could cause a restore failure. I'm not > sure that's accurate. Perhaps it's not, but it's certainly far less likely to cause a restore failure than the behavior I want to r

Re: pg_dump versus hash partitioning

2023-02-27 Thread Robert Haas
On Mon, Feb 27, 2023 at 12:50 PM Tom Lane wrote: > Robert Haas writes: > > On Mon, Feb 27, 2023 at 11:20 AM Tom Lane wrote: > >> Well, that's a user error not pg_dump's fault. Particularly so for hash > >> partitioning, where there is no defensible reason to make the partitions > >> semanticall

Re: pg_dump versus hash partitioning

2023-02-27 Thread Tom Lane
Robert Haas writes: > On Mon, Feb 27, 2023 at 11:20 AM Tom Lane wrote: >> Well, that's a user error not pg_dump's fault. Particularly so for hash >> partitioning, where there is no defensible reason to make the partitions >> semantically different. > I am still of the opinion that you're going

Re: pg_dump versus hash partitioning

2023-02-27 Thread Robert Haas
On Mon, Feb 27, 2023 at 11:20 AM Tom Lane wrote: > Well, that's a user error not pg_dump's fault. Particularly so for hash > partitioning, where there is no defensible reason to make the partitions > semantically different. I am still of the opinion that you're going down a dangerous path of red

Re: pg_dump versus hash partitioning

2023-02-27 Thread Tom Lane
Robert Haas writes: > On Tue, Feb 14, 2023 at 2:21 PM Tom Lane wrote: >> This made me wonder if this could be a usable solution at all, but >> after thinking for awhile, I don't see how the claim about foreign key >> constraints is anything but FUD. pg_dump/pg_restore have sufficient >> dependen

Re: pg_dump versus hash partitioning

2023-02-27 Thread Robert Haas
On Tue, Feb 14, 2023 at 2:21 PM Tom Lane wrote: > This made me wonder if this could be a usable solution at all, but > after thinking for awhile, I don't see how the claim about foreign key > constraints is anything but FUD. pg_dump/pg_restore have sufficient > dependency logic to prevent that fr

Re: pg_dump versus hash partitioning

2023-02-14 Thread Tom Lane
Here's a set of draft patches around this issue. 0001 does what I last suggested, ie force load-via-partition-root for leaf tables underneath a partitioned table with a partitioned-by-hash enum column. It wasn't quite as messy as I first feared, although we do need a new query (and pg_dump now kn

Re: pg_dump versus hash partitioning

2023-02-02 Thread Tom Lane
Alvaro Herrera writes: > ... so for --load-via-partition-root=auto (or > whatever), we need to ensure that we detect hash partitioning all the > way down from the topmost to the leaves. Yeah, that had already occurred to me, which is why I was not feeling confident about it being an easy hack in

Re: pg_dump versus hash partitioning

2023-02-02 Thread Tom Lane
Andrew Dunstan writes: > On 2023-02-01 We 20:03, Tom Lane wrote: >> Anyway, after re-reading the old thread I wonder if my first instinct >> (force --load-via-partition-root for enum hash cases only) was the >> best compromise after all. I'm not sure how painful it is to get >> pg_dump to detect

Re: pg_dump versus hash partitioning

2023-02-02 Thread Robert Haas
On Wed, Feb 1, 2023 at 6:14 PM Tom Lane wrote: > You waved your arms about inventing some new hashing infrastructure, > but it was phrased in such a way that it wasn't clear to me if that > was actually a serious proposal or not. But if it is: how will you > get around the fact that any change to

Re: pg_dump versus hash partitioning

2023-02-02 Thread Andrew Dunstan
On 2023-02-01 We 20:03, Tom Lane wrote: > > Anyway, after re-reading the old thread I wonder if my first instinct > (force --load-via-partition-root for enum hash cases only) was the > best compromise after all. I'm not sure how painful it is to get > pg_dump to detect such cases, but it's proba

Re: pg_dump versus hash partitioning

2023-02-02 Thread Alvaro Herrera
On 2023-Feb-01, Robert Haas wrote: > I think you can construct plausible cases where it's not just > academic. For instance, suppose I intend to use some kind of logical > replication system, not necessarily the one built into PostgreSQL, to > replicate data between two systems. Before engaging th

Re: pg_dump versus hash partitioning

2023-02-02 Thread Laurenz Albe
On Wed, 2023-02-01 at 17:49 -0500, Tom Lane wrote: > Robert Haas writes: > > On Wed, Feb 1, 2023 at 5:08 PM Tom Lane wrote: > > > I can agree with that argument for range or list partitioning, where > > > the partitions have some semantic meaning to the user.  I don't buy it > > > for hash partit

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
David Rowley writes: > Digging into the history a bit, I found [2] and particularly [3] that > seem to indicate this option was thought about due to concerns about > hash functions not returning consistent results on different > architectures. I suspect it might have been defaulted to load into th

Re: pg_dump versus hash partitioning

2023-02-01 Thread David Rowley
On Thu, 2 Feb 2023 at 11:38, Tom Lane wrote: > > Peter Geoghegan writes: > > You mentioned "minor releases" here. Who said anything about that? > > I did: I'd like to back-patch the fix if possible. I think changing > the default --load-via-partition-root choice could be back-patchable. > > If R

Re: pg_dump versus hash partitioning

2023-02-01 Thread Peter Geoghegan
On Wed, Feb 1, 2023 at 2:49 PM Tom Lane wrote: > It's precisely because you want to analyze it in the same terms > as range/list partitioning that we have these issues. Or we could > have built it on some other infrastructure than hash index opclasses > ... but we didn't do that, and now we have

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > On Wed, Feb 1, 2023 at 4:12 PM Tom Lane wrote: >> That being the case, I don't think moving the goalposts for hash >> function stability is going to lead to a workable solution. > I don't see that there is any easy, clean way to solve this in > released branches. The idea t

Re: pg_dump versus hash partitioning

2023-02-01 Thread David G. Johnston
On Wed, Feb 1, 2023 at 3:38 PM Tom Lane wrote: > Peter Geoghegan writes: > > You mentioned "minor releases" here. Who said anything about that? > > I did: I'd like to back-patch the fix if possible. I think changing > the default --load-via-partition-root choice could be back-patchable. > > If

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > On Wed, Feb 1, 2023 at 5:08 PM Tom Lane wrote: >> I can agree with that argument for range or list partitioning, where >> the partitions have some semantic meaning to the user. I don't buy it >> for hash partitioning. It was an implementation artifact to begin >> with that

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Peter Geoghegan writes: > You mentioned "minor releases" here. Who said anything about that? I did: I'd like to back-patch the fix if possible. I think changing the default --load-via-partition-root choice could be back-patchable. If Robert is resistant to that but would accept it in master, I'

Re: pg_dump versus hash partitioning

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 5:08 PM Tom Lane wrote: > > Here, you'd like to argue that it's perfectly > > fine if we instead insert some of the rows into different tables than > > where they were on the original system. > > I can agree with that argument for range or list partitioning, where > the part

Re: pg_dump versus hash partitioning

2023-02-01 Thread Peter Geoghegan
On Wed, Feb 1, 2023 at 2:12 PM Robert Haas wrote: > On Wed, Feb 1, 2023 at 4:44 PM Peter Geoghegan wrote: > > This is a misrepresentation of Tom's words. It isn't actually > > self-evident what "we end up with all of the same objects, each > > defined in the same way, and that all of the tables e

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > Tom, as I understand it, is arguing that the > --load-via-partition-root behavior has negligible downsides and is > almost categorically better than the current default behavior, and > thus making that the new default in some or all situations in a minor > release is totally

Re: pg_dump versus hash partitioning

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 4:44 PM Peter Geoghegan wrote: > This is a misrepresentation of Tom's words. It isn't actually > self-evident what "we end up with all of the same objects, each > defined in the same way, and that all of the tables end up with all > the same contents that they had before" ac

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > It seems to me that the job of pg_dump is to produce a dump that, when > reloaded on another system, recreates the same database state. That > means that we end up with all of the same objects, each defined in the > same way, and that all of the tables end up with all the sam

Re: pg_dump versus hash partitioning

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 4:12 PM Tom Lane wrote: > > I don't think the fact that our *traditional* standard for how stable > > a hash function needs to be has been XYZ carries any water. > > Well, it wouldn't need to if we had a practical way of changing the > behavior of an existing hash function,

Re: pg_dump versus hash partitioning

2023-02-01 Thread Peter Geoghegan
On Wed, Feb 1, 2023 at 12:39 PM Robert Haas wrote: > I don't think the fact that our *traditional* standard for how stable > a hash function needs to be has been XYZ carries any water. Needs > change over time, and we adapt the code to meet the new needs. Since > we have no system for type propert

Re: pg_dump versus hash partitioning

2023-02-01 Thread Peter Geoghegan
On Wed, Feb 1, 2023 at 1:14 PM Robert Haas wrote: > It seems to me that the job of pg_dump is to produce a dump that, when > reloaded on another system, recreates the same database state. That > means that we end up with all of the same objects, each defined in the > same way, and that all of the

Re: pg_dump versus hash partitioning

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 3:34 PM Tom Lane wrote: > I spent a bit more time thinking about that, and while I agree that > it's an oddity, I don't see that it matters in the case of hash > partitioning. You would notice an issue if you tried to do a selective > restore of just one partition --- but u

Re: pg_dump versus hash partitioning

2023-02-01 Thread Peter Geoghegan
On Wed, Feb 1, 2023 at 12:34 PM Tom Lane wrote: > > Also, and I think pretty > > significantly, using --load-via-partition-root forces you to pay the > > overhead of rerouting every tuple to the target partition whether you > > need it or not, which is potentially a large unnecessary expense. > >

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > On Wed, Feb 1, 2023 at 1:23 PM Tom Lane wrote: >> In the meantime, I think we need to recognize that hash values are >> not very portable. I do not think we do our users a service by >> letting them discover the corner cases the hard way. > I think you're not really engagi

Re: pg_dump versus hash partitioning

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 1:23 PM Tom Lane wrote: > Well, that was what I thought too to start with, but I now think that > it is far too narrow-minded a view of the problem. The real issue > is something I said that you trimmed: > > >> In general, we've never thought that hash values are > >> requi

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > ... I like the > fact that we have --load-via-partition-root, but it is a bit of a > hack. You don't get a single copy into the partition root, you get one > per child table -- and those COPY statements are listed as data for > the partitions where the data lives now, not for

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > On Wed, Feb 1, 2023 at 11:18 AM Tom Lane wrote: >> Over at [1] we have a complaint that dump-and-restore fails for >> hash-partitioned tables if a partitioning column is an enum, >> because the enum values are unlikely to receive the same OIDs >> in the destination database

Re: pg_dump versus hash partitioning

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 11:18 AM Tom Lane wrote: > Over at [1] we have a complaint that dump-and-restore fails for > hash-partitioned tables if a partitioning column is an enum, > because the enum values are unlikely to receive the same OIDs > in the destination database as they had in the source,