I'd been looking over the COPY FROM code tonight when I saw something pretty scary looking:
/* on input just throw the header line away */ if (cstate->cur_lineno == 0 && cstate->header_line) { cstate->cur_lineno++; if (CopyReadLine(cstate)) return false; /* done */ } while it might not look too scary by itself, it gets a bit more so when you learn that the cur_lineno is only 32 bits wide. This will result in skipping a tuple every time the 32-bit variable wraps back around to 0 again. Maybe when this code was written copying > 4 billion rows was just a far-off dream, but with today's hardware, it really didn't take that long to see this actually happen for real. postgres=# create unlogged table t(a int); CREATE TABLE Time: 1.339 ms postgres=# insert into t select 0 from generate_series(1, 4300000000); INSERT 0 4300000000 Time: 2128367.019 ms (35:28.367) postgres=# copy t to '/home/ubuntu/t.csv' with (format csv, header); COPY 4300000000 Time: 2294331.494 ms (38:14.331) postgres=# truncate t; TRUNCATE TABLE Time: 30.367 ms postgres=# copy t from '/home/ubuntu/t.csv' with (format csv, header); COPY 4299999999 Time: 2693186.475 ms (44:53.186) A patch to fix is attached. (I just made the variable 64-bit) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
0001-Fix-COPY-FROM-not-to-skip-a-tuple-every-2-32-tuples.patch
Description: Binary data