2014-12-26 11:41 GMT+01:00 Pavel Stehule <pavel.steh...@gmail.com>:
>
> 2014-12-25 22:23 GMT+01:00 Alex Shulgin <a...@commandprompt.com>:
>
>> Trent Shipley <trent_ship...@qwest.net> writes:
>>
>> > On Friday 2007-12-14 16:22, Tom Lane wrote:
>> >> Neil Conway <ne...@samurai.com> writes:
>> >> > By modifying COPY: COPY IGNORE ERRORS or some such would instruct
>> >> > COPY to drop (and log) rows that contain malformed data. That is,
>> >> > rows with too many or too few columns, rows that result in
>> >> > constraint violations, and rows containing columns where the data
>> >> > type's input function raises an error. The last case is the only
>> >> > thing that would be a bit tricky to implement, I think: you could
>> >> > use PG_TRY() around the InputFunctionCall, but I guess you'd need
>> >> > a subtransaction to ensure that you reset your state correctly
>> >> > after catching an error.
>> >>
>> >> Yeah. It's the subtransaction per row that's daunting --- not only
>> >> the cycles spent for that, but the ensuing limitation to 4G rows
>> >> imported per COPY.
>> >
>> > You could extend the COPY FROM syntax with a COMMIT EVERY n clause.
>> > This would help with the 4G subtransaction limit. The cost to the
>> > ETL process is that a simple rollback would not be guaranteed to
>> > send the process back to its initial state. There are easy ways to
>> > deal with the rollback issue, though.
>> >
>> > A {NO} RETRY {USING algorithm} clause might be useful. If the NO
>> > RETRY option is selected then the COPY FROM can run without
>> > subtransactions and in excess of the 4G per transaction limit. NO
>> > RETRY should be the default since it preserves the legacy behavior
>> > of COPY FROM.
>> >
>> > You could have an EXCEPTIONS TO {filename|STDERR} clause. I would
>> > not give the option of sending exceptions to a table, since they
>> > are presumably malformed; otherwise they would not be exceptions.
>> > (Users should re-process exception files if they want an "if good
>> > then table a, else exception to table b ..." workflow.)
>> >
>> > EXCEPTIONS TO and NO RETRY would be mutually exclusive.
>> >
>> >> If we could somehow only do a subtransaction per failure, things
>> >> would be much better, but I don't see how.
>>
>> Hello,
>>
>> Attached is a proof-of-concept patch for this TODO item. There are no
>> docs yet; I just wanted to know if the approach is sane.
>>
>> The added syntax is like the following:
>>
>>   COPY [table] FROM [file/program/stdin] EXCEPTIONS TO [file or stdout]
>>
>> The way it's done is by abusing Copy Both mode, and from my limited
>> testing that seems to just work. The error trapping itself is done
>> using PG_TRY/PG_CATCH and can only catch formatting or before-insert
>> trigger errors; no attempt is made to recover from a failed unique
>> constraint, etc.
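(For illustration: the PG_TRY/PG_CATCH trapping described above might
look roughly like the sketch below. This is not the actual patch code;
the helper name input_or_skip and reporting the error as a NOTICE are
assumptions. As noted further down in this thread, catching arbitrary
errors without a subtransaction risks leaking memory and other
resources, since full cleanup normally happens only at (sub)transaction
abort.)

#include "postgres.h"
#include "fmgr.h"

/* Run a type's input function; on error, report it and return false. */
static bool
input_or_skip(FmgrInfo *flinfo, char *string, Oid typioparam,
              int32 typmod, Datum *result)
{
    MemoryContext oldcontext = CurrentMemoryContext;
    bool          ok = true;

    PG_TRY();
    {
        /* may ereport(ERROR), e.g. for 'abc' in an integer column */
        *result = InputFunctionCall(flinfo, string, typioparam, typmod);
    }
    PG_CATCH();
    {
        ErrorData *edata;

        /* leave ErrorContext, save the error, reset the error state */
        MemoryContextSwitchTo(oldcontext);
        edata = CopyErrorData();
        FlushErrorState();

        ereport(NOTICE, (errmsg("%s", edata->message)));
        FreeErrorData(edata);
        ok = false;
    }
    PG_END_TRY();

    return ok;
}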
>>
>> Example in action:
>>
>> postgres=# \d test_copy2
>>    Table "public.test_copy2"
>>  Column |  Type   | Modifiers
>> --------+---------+-----------
>>  id     | integer |
>>  val    | integer |
>>
>> postgres=# copy test_copy2 from program 'seq 3' exceptions to stdout;
>> 1
>> NOTICE:  missing data for column "val"
>> CONTEXT:  COPY test_copy2, line 1: "1"
>> 2
>> NOTICE:  missing data for column "val"
>> CONTEXT:  COPY test_copy2, line 2: "2"
>> 3
>> NOTICE:  missing data for column "val"
>> CONTEXT:  COPY test_copy2, line 3: "3"
>> NOTICE:  total exceptions ignored: 3
>>
>> postgres=# \d test_copy1
>>    Table "public.test_copy1"
>>  Column |  Type   | Modifiers
>> --------+---------+-----------
>>  id     | integer | not null
>>
>> postgres=# set client_min_messages to warning;
>> SET
>> postgres=# copy test_copy1 from program 'ls /proc' exceptions to stdout;
>> ...
>> vmstat
>> zoneinfo
>> postgres=#
>>
>> Limited performance testing shows no significant difference between the
>> error-catching and the plain code paths. For example, timing
>>
>>   copy test_copy1 from program 'seq 1000000' [exceptions to stdout]
>>
>> shows similar numbers with or without the added "exceptions to" clause.
>>
>> Now that I'm sending this, I wonder if the original comment about the
>> need for a subtransaction around every loaded line still holds. Any
>> example of what would not be properly rolled back by just PG_TRY?
>
> This method is unsafe: exception handlers usually don't free memory, so
> there is a risk of memory leaks and resource leaks.
>
> You can get the same performance using block subtransactions - when you
> use one subtransaction per 1000 rows, the impact of subtransactions is
> minimal.
>
> When a block fails, you can fall back to row-level subtransactions - that
> works well when you expect almost-correct data.
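For concreteness, that batching strategy might look like the sketch
below. The subtransaction and resource-owner calls are the real backend
API, used the same way by PL/pgSQL exception blocks; the Row type,
insert_row(), and BATCH_SIZE are hypothetical placeholders, not actual
backend code.

#include "postgres.h"
#include "access/xact.h"
#include "utils/resowner.h"

#define BATCH_SIZE 1000            /* rows per optimistic subtransaction */

typedef struct Row Row;            /* hypothetical parsed input row */
extern void insert_row(Row *row);  /* hypothetical: may ereport(ERROR) */

/* Insert one row in its own subtransaction; false if it failed. */
static bool
insert_row_safely(Row *row)
{
    MemoryContext oldcontext = CurrentMemoryContext;
    ResourceOwner oldowner = CurrentResourceOwner;
    bool          ok = true;

    BeginInternalSubTransaction(NULL);
    MemoryContextSwitchTo(oldcontext);

    PG_TRY();
    {
        insert_row(row);
        ReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_CATCH();
    {
        MemoryContextSwitchTo(oldcontext);
        FlushErrorState();
        RollbackAndReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
        ok = false;                /* only this row is lost */
    }
    PG_END_TRY();

    return ok;
}

/* Insert a batch (the caller feeds BATCH_SIZE rows at a time) inside one
 * subtransaction; on failure, roll it back and replay it row by row. */
static void
load_batch(Row *rows, int nrows)
{
    MemoryContext oldcontext = CurrentMemoryContext;
    ResourceOwner oldowner = CurrentResourceOwner;
    int           i;

    BeginInternalSubTransaction(NULL);
    MemoryContextSwitchTo(oldcontext);

    PG_TRY();
    {
        for (i = 0; i < nrows; i++)
            insert_row(&rows[i]);
        ReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_CATCH();
    {
        MemoryContextSwitchTo(oldcontext);
        FlushErrorState();
        RollbackAndReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;

        /* replay the failed batch one row at a time */
        for (i = 0; i < nrows; i++)
            (void) insert_row_safely(&rows[i]);
    }
    PG_END_TRY();
}

With almost-correct data, nearly every batch commits in a single
subtransaction, so the per-row subtransaction cost (and the 4G
subtransaction-ID concern raised above) applies only to failing batches.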
Two years ago I wrote an extension that did this, but I have not had time
to finish it and push it upstream.

Regards

Pavel

>
> Regards
>
> Pavel
>
>> Happy hacking!
>> --
>> Alex
ftcopy-04.tgz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers