Re: [PATCHES] COPY-able csv log outputs

Andrew Dunstan Sun, 20 May 2007 20:24:55 -0700


Greg Smith wrote:

I got a chance to review this patch over the weekend. Basic API seems good, met all my requirements, no surprises with how the GUC variable controlled the feature.
The most fundamental issue I have with the interface is that using COPY makes it difficult to put any unique index on the resulting table. I like to have a unique index on my imported log table because it rejects the dupe records if you accidentally import the same section of log file twice. COPY tosses the whole thing if there's an index violation, which is a problem during a regular import because you will occasionally come across lines with the same timestamp that are similar in every way except for their statement; putting an index on the timestamp+statement seems impractical.

Does the format not include the per-process line number? (I know I briefly looked at this patch previously, but I forget the details.) One reason I originally included line numbers in log_line_prefix was to handle this sort of problem.

I've had a preference for INSERT from the beginning here that this reinforces.

COPY is our standard bulk insert mechanism. I think arguing against it would be a very hard sell.

I'm planning to just work around this issue by doing the COPY into a temporary table and then INSERTing from there. I didn't want to just let the concern pass by without mentioning it though. It crosses my mind that inserting some sort of unique log file line ID number would prevent the dupe issue and make for better ordering (it's possible to have two lines with the same timestamp show up in the wrong order now), not sure that's a practical idea to consider.

I guess that answers my question. We should definitely provide a unique line key.
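
To make the staging-table workaround and the unique line key concrete, here is a minimal sketch. It is not code from the patch: SQLite (through Python's sqlite3 module) stands in for PostgreSQL so the example runs on its own, and the table names pglog_staging and the columns pid and line_num are assumptions made for illustration. In PostgreSQL the load into the staging table would be the COPY step.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pglog (
        pid       INTEGER,
        line_num  INTEGER,
        log_time  TEXT,
        message   TEXT,
        UNIQUE (pid, line_num)   -- the assumed unique line key
    );
    CREATE TEMP TABLE pglog_staging (
        pid INTEGER, line_num INTEGER, log_time TEXT, message TEXT
    );
""")


def import_chunk(rows):
    """Load rows into the staging table, then copy over only the new ones."""
    conn.execute("DELETE FROM pglog_staging")
    # In PostgreSQL this step would be a COPY into the staging table.
    conn.executemany("INSERT INTO pglog_staging VALUES (?, ?, ?, ?)", rows)
    # Insert only rows whose key is not already present in the real table.
    conn.execute("""
        INSERT INTO pglog (pid, line_num, log_time, message)
        SELECT DISTINCT s.pid, s.line_num, s.log_time, s.message
        FROM pglog_staging s
        WHERE NOT EXISTS (
            SELECT 1 FROM pglog p
            WHERE p.pid = s.pid AND p.line_num = s.line_num
        )
    """)
    conn.commit()


chunk = [
    (4242, 1, "2007-05-20 20:24:55", "statement: SELECT 1"),
    (4242, 2, "2007-05-20 20:24:55", "statement: SELECT 2"),  # same timestamp
]
import_chunk(chunk)
import_chunk(chunk)  # accidental re-import of the same section of log

row_count = conn.execute("SELECT count(*) FROM pglog").fetchone()[0]
```

With a key like this, two lines that share a timestamp remain distinct rows, and importing the same section twice leaves the table unchanged instead of aborting the whole load.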

The basic coding of the patch seemed OK to me, but someone who is much more familiar than myself with the mechanics of pipes should take a look at that part of the patch before committing; it's complicated code and I can't comment on it. There are some small formatting issues that need to be fixed, particularly in the host+port mapping. I can fix those myself and submit a slightly updated patch. There are some documentation improvements I want to make before this goes in as well.
The patch is actually broken fairly hard right now because of the switch from INSERT to COPY FROM CSV as the output format at the last minute. It outputs missing fields as NULL (fine for INSERT), which chokes the CSV import when the session_start timestamp is missing. All of those NULL values need to be just replaced with nothing for proper CSV syntax; there should just be the comma for the next field. I worked around this with
copy pglog from '/opt/pgsql/testlog.csv' with CSV null as 'NULL';


I missed that before - yes it should be fixed.
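
As an illustration of the fix described above (a missing value written as nothing between two commas rather than the literal text NULL), here is a small sketch using Python's csv module. It is not the patch's C code; the field list and the helper name to_csv_line are made up for the example.

```python
import csv
import io


def to_csv_line(fields):
    """Format one log record as a CSV line; None becomes an empty field."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


# Hypothetical record whose session_start timestamp is missing (None).
record = ["2007-05-20 20:24:55", "postgres", None, "statement: SELECT 1, 2"]
line = to_csv_line(record)
# line == '2007-05-20 20:24:55,postgres,,"statement: SELECT 1, 2"\n'
```

A line in this form should load with a plain COPY ... WITH CSV, since an unquoted empty field is the default null representation in CSV mode, so the "null as 'NULL'" workaround would no longer be needed.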

cheers

andrew

