Re: [PATCHES] COPY-able csv log outputs

Greg Smith Sun, 20 May 2007 18:26:00 -0700

I got a chance to review this patch over the weekend. The basic API seems good, it met all my requirements, and there were no surprises with how the GUC variable controlled the feature.

The most fundamental issue I have with the interface is that using COPY makes it difficult to put any unique index on the resulting table. I like to have a unique index on my imported log table because it rejects the dupe records if you accidentally import the same section of log file twice. COPY tosses the whole thing if there's an index violation. That is a problem during a regular import, because you will occasionally come across lines with the same timestamp that are similar in every way except for their statement, and putting an index on timestamp+statement seems impractical.

I've had a preference for INSERT from the beginning here, and this reinforces it. I'm planning to just work around this issue by doing the COPY into a temporary table and then INSERTing from there. I didn't want to just let the concern pass by without mentioning it, though. It crosses my mind that inserting some sort of unique log file line ID number would prevent the dupe issue and make for better ordering (it's possible to have two lines with the same timestamp show up in the wrong order now), but I'm not sure that's a practical idea to consider.
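A minimal sketch of the staging-table workaround combined with the per-line ID idea, under several assumptions: Python's built-in sqlite3 stands in for PostgreSQL so it runs without a server, the staging table is filled with executemany because SQLite has no COPY, and the staging table pglog_staging and the columns log_time, line_id and message are hypothetical names (the patch's real csvlog column list isn't given here). The INSERT ... SELECT ... WHERE NOT EXISTS statement itself is ordinary SQL that PostgreSQL also accepts.

```python
import sqlite3

# Hypothetical schema: log_time, line_id and message are illustrative names,
# not the patch's actual csvlog columns. line_id stands in for the proposed
# unique per-line ID, so two lines sharing a timestamp still get distinct keys.
SCHEMA = """
CREATE TABLE pglog (
    log_time TEXT    NOT NULL,
    line_id  INTEGER NOT NULL,
    message  TEXT,
    PRIMARY KEY (log_time, line_id)
)
"""


def import_batch(conn, rows):
    """Stage rows in a temp table, then INSERT only keys not already in pglog.

    Returns the number of rows actually added, so importing the same
    section of log a second time adds nothing.
    """
    cur = conn.cursor()
    # Stand-in for COPY: SQLite has no COPY, so load the staging table directly.
    cur.execute(
        "CREATE TEMP TABLE pglog_staging "
        "(log_time TEXT, line_id INTEGER, message TEXT)"
    )
    cur.executemany("INSERT INTO pglog_staging VALUES (?, ?, ?)", rows)
    before = conn.total_changes
    # Copy across only rows whose key is not already present in pglog.
    cur.execute(
        """
        INSERT INTO pglog (log_time, line_id, message)
        SELECT DISTINCT s.log_time, s.line_id, s.message
        FROM pglog_staging AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM pglog AS p
            WHERE p.log_time = s.log_time AND p.line_id = s.line_id
        )
        """
    )
    added = conn.total_changes - before
    cur.execute("DROP TABLE pglog_staging")
    conn.commit()
    return added


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    batch = [
        ("2007-05-20 18:26:00", 1, "SELECT 1"),
        ("2007-05-20 18:26:00", 2, "SELECT 2"),  # same timestamp, new line_id
    ]
    print(import_batch(conn, batch))  # 2: both lines are new
    print(import_batch(conn, batch))  # 0: the repeated import is skipped
```

Keying on a per-line ID rather than timestamp+statement is what lets two otherwise similar lines with the same timestamp coexist while a repeated import of the same section is still rejected, and without aborting the whole load the way a unique violation during COPY would.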

The basic coding of the patch seemed OK to me, but someone who is much more familiar than myself with the mechanics of pipes should take a look at that part of the patch before committing; it's complicated code and I can't comment on it. There are some small formatting issues that need to be fixed, particularly in the host+port mapping. I can fix those myself and submit a slightly updated patch. There are some documentation improvements I want to make before this goes in as well.

The patch is actually broken fairly hard right now because of the last-minute switch from INSERT to COPY FROM CSV as the output format. It outputs missing fields as NULL (fine for INSERT), which chokes the CSV import when the session_start timestamp is missing. All of those NULL values need to be replaced with nothing for proper CSV syntax; there should just be the comma for the next field. I worked around this with


copy pglog from '/opt/pgsql/testlog.csv' with CSV null as 'NULL';
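To illustrate the output format being asked for, here is a minimal Python sketch; the real csvlog writer is C code inside the server, and format_csvlog_line is a hypothetical helper. Python's csv module writes None as an empty field, which gives exactly the "just the comma" form. PostgreSQL's COPY ... WITH CSV reads an unquoted empty field as NULL by default, so output in this form should load without the null as 'NULL' override.

```python
import csv
import io


def format_csvlog_line(fields):
    """Render one log record as a single CSV line (hypothetical helper).

    Missing values are passed as None; csv.writer writes None as an empty
    field, so a missing session_start becomes just the delimiter rather
    than the literal text NULL. Fields containing commas or quotes are
    quoted, with embedded quotes doubled.
    """
    buf = io.StringIO()
    # lineterminator="\n" keeps one record per line; the module default is "\r\n".
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


if __name__ == "__main__":
    print(format_csvlog_line(["2007-05-20 18:26:00 PDT", None, "select 1"]), end="")
    print(format_csvlog_line(["2007-05-20 18:26:00 PDT", None, 'select "a", 1']), end="")
```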

I can fix that too when I'm revising. I plan to have a version free of obvious bugs ready to re-submit by next weekend.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
