Re: [GENERAL] Why copy ... from stdio does not return immediately when reading invalid data?
I have analyzed the PostgreSQL protocol using Wireshark (an open source packet analyzer), and I observed that the PostgreSQL backend, while doing a COPY ... FROM STDIN, reports errors as soon as possible (especially errors related to invalid data). Therefore, the late reporting of errors while doing a COPY ... FROM STDIN is not a limitation of the underlying protocol; it is a limitation (or a design choice) of the libpq library. It looks like this is a well known issue because it is listed on the todo list: http://wiki.postgresql.org/wiki/Todo#COPY And was discussed before: http://archives.postgresql.org/pgsql-hackers/2008-04/msg01169.php Do you think it is possible to change that behavior, or work around it? While reading libpq source code, I noticed the function pqParseInput3 (file fe-protocol3.c) ignores error responses while the connection is in PGASYNC_COPY_IN state. Maybe we can make a special case for the COPY FROM subprotocol and handle errors early, in order to make them available to PQgetResult? Is is feasible in a simple way or is it a bad idea? Regards, Nicolas Grilly On Wed, Feb 2, 2011 at 20:06, John R Pierce pie...@hogranch.com wrote: On 02/02/11 10:20 AM, Nicolas Grilly wrote: Is the copy protocol (aka PQputCopyData and PQputCopyEnd) designed to send gigabytes of data with just one copy ... from stdio query, and is there a way to be notified of a potential error before calling PQputCopyEnd? Or do I have to send my data in small chunks (for example batch of 1 rows), issue a PQputCopyEnd, check for errors, and continue with the next chunk? I would batch the data, maybe 1000 lines or even 100 lines at a time if these errors are at all frequent. put the errored batches in an exception list or something so you can sort them out later.
[GENERAL] Why copy ... from stdio does not return immediately when reading invalid data?
Hello, I am importing gigabytes of data into PostgreSQL, and I don't want to wait 10 minutes just to discover an error in the 10th line of my input file. I tried the command \copy ... from stdio in psql and it looks like psql has to read the entire input before returning a potential error, even if the invalid value is in one of the first rows. Is it a limitation of PostgreSQL protocol, of the library lipq, or of the tool psql? Is the copy protocol (aka PQputCopyData and PQputCopyEnd) designed to send gigabytes of data with just one copy ... from stdio query, and is there a way to be notified of a potential error before calling PQputCopyEnd? Or do I have to send my data in small chunks (for example batch of 1 rows), issue a PQputCopyEnd, check for errors, and continue with the next chunk? Thanks for your help and advice. Regards, Nicolas Grilly
Re: [GENERAL] Why copy ... from stdio does not return immediately when reading invalid data?
On 02/02/11 10:20 AM, Nicolas Grilly wrote: Is the copy protocol (aka PQputCopyData and PQputCopyEnd) designed to send gigabytes of data with just one copy ... from stdio query, and is there a way to be notified of a potential error before calling PQputCopyEnd? Or do I have to send my data in small chunks (for example batch of 1 rows), issue a PQputCopyEnd, check for errors, and continue with the next chunk? I would batch the data, maybe 1000 lines or even 100 lines at a time if these errors are at all frequent. put the errored batches in an exception list or something so you can sort them out later. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general