On 12/11/2011 09:27 AM, Jon Nelson wrote:
> The first method involved writing a C program to parse a file, parse
> the lines and output newly-formatted lines in a format that
> postgresql's COPY function can use.
> End-to-end, this takes 15 seconds for about 250MB (read 250MB, parse,
> output new data to new file -- 4 seconds, COPY new file -- 10
> seconds).
Why not `COPY tablename FROM '/path/to/myfifo'`?

Just connect your import program to a named pipe (fifo) created with `mknod myfifo p`, either by redirecting its stdout to the fifo or by open()ing the fifo for writing. Then have Pg read from the fifo. You'll save a round of disk writes and reads.
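
Roughly, the writing side could look like the sketch below. It is in Python rather than C only to keep it short, and every name in it (make_fifo, write_rows, read_all, demo) is made up for illustration. The reader thread is just a stand-in for the server running COPY, and the rows are written as plain tab-separated text with no escaping, so it assumes the data contains no tabs, newlines or backslashes.

```python
import os
import tempfile
import threading


def make_fifo(path):
    """Create a named pipe at path (same effect as `mknod path p`)."""
    os.mkfifo(path)


def write_rows(fifo_path, rows):
    """Write rows as tab-separated lines, one row per line.

    Assumption: no field contains a tab, newline or backslash, so no
    escaping is done here. open() blocks until a reader opens the fifo.
    """
    with open(fifo_path, "w") as out:
        for row in rows:
            out.write("\t".join(row) + "\n")


def read_all(fifo_path, sink):
    """Stand-in for the reading side; in real use this is COPY ... FROM."""
    with open(fifo_path) as src:
        sink.append(src.read())


def demo():
    """Push two sample rows through a fifo and return what the reader saw."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "myfifo")
    make_fifo(fifo_path)

    received = []
    reader = threading.Thread(target=read_all, args=(fifo_path, received))
    reader.start()

    write_rows(fifo_path, [("1", "alpha"), ("2", "beta")])
    reader.join()

    os.remove(fifo_path)
    os.rmdir(workdir)
    return received[0]
```

In real use you would drop the reader thread and issue the COPY from a second session instead. Keep in mind that COPY with a file path is read by the server process, so the fifo has to live on the database host and be readable by the user the server runs as.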
> The next approach I took was to write a C function in postgresql to
> parse a single TEXT datum into an array of C strings, and then use
> BuildTupleFromCStrings. There are 8 columns involved.
> Eliding the time it takes to COPY the (raw) file into a temporary
> table, this method took 120 seconds, give or take.
>
> The difference was /quite/ a surprise to me. What is the probability
> that I am doing something very, very wrong?
Have a look at how COPY does it within the Pg sources and see if that's any help. I don't know enough about Pg's innards to answer this one beyond that suggestion, sorry.

--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
