Does anyone have a suggestion on how to handle this issue?  The report
is a year old.

---------------------------------------------------------------------------

On Thu, Oct  6, 2022 at 10:28:15AM -0400, Bruce Momjian wrote:
> On Thu, Sep 29, 2022 at 02:30:13PM +0000, PG Doc comments form wrote:
> > The following documentation comment has been logged on the website:
> > 
> > Page: https://www.postgresql.org/docs/14/app-pgrestore.html
> > Description:
> > 
> > pg_restore seems to have two ways to restore data:
> > 
> > --section=data 
> > 
> > or
> > 
> >  --data-only
> > 
> > There is this cryptic warning that --data-only is "similar to but for
> > historical reasons different from" --section=data
> > 
> > But there is no further explanation of what those differences are or what
> > might be missed or different in your restore if you pick one option or the
> > other. Maybe one or the other option is the "preferred current way" and one
> > is the "historical way" or they are aimed at different types of use cases,
> > but that's not clear.
> 
> [Thread moved from docs to hackers because there are behavioral issues.]
> 
> Very good question.  I dug into this and found this commit which says
> --data-only and --section=data were equivalent:
> 
>       commit a4cd6abcc9
>       Author: Andrew Dunstan <and...@dunslane.net>
>       Date:   Fri Dec 16 19:09:38 2011 -0500
>       
>           Add --section option to pg_dump and pg_restore.
>       
>           Valid values are --pre-data, data and post-data. The option can be
>           given more than once. --schema-only is equivalent to
> -->       --section=pre-data --section=post-data. --data-only is equivalent
> -->       to --section=data.
>       
> and then this commit which says they are not:
> 
>       commit 4317e0246c
>       Author: Tom Lane <t...@sss.pgh.pa.us>
>       Date:   Tue May 29 23:22:14 2012 -0400
>       
>           Rewrite --section option to decouple it from 
> --schema-only/--data-only.
>       
> -->       The initial implementation of pg_dump's --section option supposed 
> that the
> -->       existing --schema-only and --data-only options could be made 
> equivalent to
> -->       --section settings.  This is wrong, though, due to dubious but long 
> since
>           set-in-stone decisions about where to dump SEQUENCE SET items, as 
> seen in
>           bug report from Martin Pitt.  (And I'm not totally convinced there 
> weren't
>           other bugs, either.)  Undo that coupling and instead drive --section
>           filtering off current-section state tracked as we scan through the 
> TOC
>           list to call _tocEntryRequired().
>       
>           To make sure those decisions don't shift around and hopefully save 
> a few
>           cycles, run _tocEntryRequired() only once per TOC entry and save 
> the result
>           in a new TOC field.  This required minor rejiggering of ACL 
> handling but
>           also allows a far cleaner implementation of 
> inhibit_data_for_failed_table.
>       
>           Also, to ensure that pg_dump and pg_restore have the same behavior 
> with
>           respect to the --section switches, add _tocEntryRequired() 
> filtering to
>           WriteToc() and WriteDataChunks(), rather than trying to implement 
> section
>           filtering in an entirely orthogonal way in dumpDumpableObject().  
> This
>           required adjusting the handling of the special ENCODING and 
> STDSTRINGS
>           items, but they were pretty weird before anyway.
> 
> and this commit which made them closer:
> 
>       commit 5a39114fe7
>       Author: Tom Lane <t...@sss.pgh.pa.us>
>       Date:   Fri Oct 26 12:12:42 2012 -0400
>       
>           In pg_dump, dump SEQUENCE SET items in the data not pre-data 
> section.
>       
>           Represent a sequence's current value as a separate TableDataInfo 
> dumpable
>           object, so that it can be dumped within the data section of the 
> archive
> -->       rather than in pre-data.  This fixes an undesirable inconsistency 
> between
> -->       the meanings of "--data-only" and "--section=data", and also fixes 
> dumping
>           of sequences that are marked as extension configuration tables, as 
> per a
>           report from Marko Kreen back in July.  The main cost is that we do 
> one more
>           SQL query per sequence, but that's probably not very meaningful in 
> most
>           databases.
> 
> Looking at the restore code, I see --data-only disabling triggers, while
> --section=data doesn't.  I also tested --data-only vs. --section=data in
> pg_dump for the regression database and saw the only differences as the
> creation and comments on large objects, e.g.,
> 
>       -- Name: 2121; Type: BLOB; Schema: -; Owner: postgres
>       --
>       
>       SELECT pg_catalog.lo_create('2121');
> 
> 
>       ALTER LARGE OBJECT 2121 OWNER TO postgres;
> 
>       --
>       -- Name: LARGE OBJECT 2121; Type: COMMENT; Schema: -; Owner: postgres
>       --
> 
>       COMMENT ON LARGE OBJECT 2121 IS 'testing comments';
> 
> but the large object _data_ was dumped in both cases.
> 
> So, where does this leave us?  We know we need --section=data because
> the pre/post-data options are clearly useful, so why would someone use
> --data-only vs. --section=data.  We don't document why to use one rather
> than the other, so the --data-only option looks useless to me.  Do we
> remove it, adjust it, or leave it alone?
> 
> -- 
>   Bruce Momjian  <br...@momjian.us>        https://momjian.us
>   EDB                                      https://enterprisedb.com
> 
>   Indecision is a decision.  Inaction is an action.  Mark Batterson
> 
> 
> 

-- 
  Bruce Momjian  <br...@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.


Reply via email to