On Tue, Jan 23, 2024 at 1:02 AM torikoshia <torikos...@oss.nttdata.com> wrote: > > On 2024-01-17 14:47, Masahiko Sawada wrote: > > On Wed, Jan 17, 2024 at 2:22 PM torikoshia <torikos...@oss.nttdata.com> > > wrote: > >> > >> Hi, > >> > >> 132de9968840c introduced SAVE_ERROR_TO option to COPY and enabled to > >> skip malformed data, but there is no way to watch the number of > >> skipped > >> rows during COPY. > >> > >> Attached patch adds tuples_skipped to pg_stat_progress_copy, which > >> counts the number of skipped tuples because source data is malformed. > >> If SAVE_ERROR_TO is not specified, this column remains zero. > >> > >> The advantage would be that users can quickly notice and stop COPYing > >> when there is a larger amount of skipped data than expected, for > >> example. > >> > >> As described in commit log, it is expected to add more choices for > >> SAVE_ERROR_TO like 'log' and using such options may enable us to know > >> the number of skipped tuples during COPY, but exposed in > >> pg_stat_progress_copy would be easier to monitor. > >> > >> > >> What do you think? > > > > +1 > > > > The patch is pretty simple. Here is a comment: > > > > + (if <literal>SAVE_ERROR_TO</literal> is specified, otherwise > > zero). > > + </para></entry> > > + </row> > > > > To be precise, this counter only advances when a value other than > > 'ERROR' is specified to SAVE_ERROR_TO option. > > Thanks for your comment and review! > > Updated the patch according to your comment and option name change by > b725b7eec.
Thanks! The patch looks good to me. I'm going to push it tomorrow, barring any objections. > > > BTW, based on this patch, I think we can add another option which > specifies the maximum tolerable number of malformed rows. > I remember this was discussed in [1], and feel it would be useful when > loading 'dirty' data but there is a limit to how dirty it can be. > Attached 0002 is WIP patch for this(I haven't added doc yet). Yeah, it could be a good option. > This may be better discussed in another thread, but any comments(e.g. > necessity of this option, option name) are welcome. I'd recommend forking a new thread for this option. As far as I remember, there also was an opinion that "reject limit" stuff is not very useful. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com