Re: [HACKERS] CSV Logging questions

David Fetter Mon, 04 Sep 2017 09:32:44 -0700

On Mon, Sep 04, 2017 at 05:27:40PM +0100, Greg Stark wrote:
> I was just looking over the CSV logging code and have a few questions
> about why things were done the way they were done.
> 
> 1) Why do we gather a per-session log line number? Is it just to aid
> people importing to avoid duplicate entries from partial files? Is
> there some other purpose given that entries will already be sequential
> in the csv file?
> 
> 2) Why is the file error location conditional on log_error_verbosity?
> Surely the whole point of a structured log is that you can log
> everything and choose what to display later -- i.e. that's why csv
> logging doesn't look at log_line_prefix to determine which other bits
> to display. There's no added cost to including these fields
> unconditionally, and they're far from the largest piece of data being
> logged either.
> 
> 3) Similarly, I wonder if the statement should always be included even
> when hide_stmt is set, so that users can write sensible queries against
> the data even if it means duplicating data.
> 
> 4) Why the session start time? Is this just so that <process_id,
> session_start_time> uniquely identifies a session? Should we perhaps
> generate a unique session identifier instead?
> 
> The real reason I'm looking at this is because I'm looking at the
> json_log plugin from Michael Paquier. It doesn't have the log line
> numbers and I can't figure whether this is something it should have
> because I can't quite figure out why they exist in CSV files. I think
> there are a few other fields that have been added in Postgres but are
> missing from the JSON log because of version skew.
> 
> I'm wondering if we should abstract out the CSV format. Instead of
> using emit_log_hook, you would add a new format that specifies an
> "add_log_attribute(key,val)" hook, which would get called once per log
> format. You could then have as many log formats as you want and be
> sure they would all have the same data. That would also mean that the
> timestamps would be in sync and we could probably eliminate the
> occurrences of the wrong format appearing in the wrong logs.
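
On (1), if the guess about imports is right, the use would look roughly
like the sketch below: (session_id, session_line_num) acts as a key, so
re-importing a file that was first loaded while still partial does not
duplicate rows. This is only an illustration in Python. import_rows and
the in-memory "table" are made up, and the column positions are an
assumption about the csvlog layout (session_id as the 6th field,
session_line_num as the 7th). It also assumes every line is complete.

```python
import csv
import io

# Assumed 0-based positions of the two key columns in a csvlog line.
SESSION_ID, SESSION_LINE_NUM = 5, 6


def import_rows(csv_text, seen, table):
    """Append rows whose (session_id, session_line_num) is new.

    Hypothetical helper: 'seen' is a set of keys already imported and
    'table' is a plain list standing in for the target table.
    Returns the number of rows actually added.
    """
    added = 0
    for row in csv.reader(io.StringIO(csv_text)):
        key = (row[SESSION_ID], int(row[SESSION_LINE_NUM]))
        if key in seen:
            continue  # already loaded from an earlier, partial copy
        seen.add(key)
        table.append(row)
        added += 1
    return added
```

Without the per-session counter, two identical messages from the same
session within the same timestamp resolution would be indistinguishable
on re-import.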


+1 for making the emitters all work off the same source.
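
To check that I'm reading the proposal right, here is a rough sketch of
the shape I have in mind. It is in Python only to keep it short; nothing
here is existing backend code. CsvFormat, JsonFormat, emit() and finish()
are invented names, and only add_log_attribute(key,val) comes from the
proposal above. The fields are gathered once and each format's hook sees
the same values.

```python
import csv
import io
import json


class CsvFormat:
    """Hypothetical format: collects values positionally into one CSV line."""

    def __init__(self):
        self.values = []

    def add_log_attribute(self, key, val):
        # CSV has fixed columns, so a missing value becomes an empty field.
        self.values.append("" if val is None else str(val))

    def finish(self):
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(self.values)
        return buf.getvalue()


class JsonFormat:
    """Hypothetical format: collects key/value pairs into one JSON object."""

    def __init__(self):
        self.obj = {}

    def add_log_attribute(self, key, val):
        # JSON can simply omit attributes that have no value.
        if val is not None:
            self.obj[key] = val

    def finish(self):
        return json.dumps(self.obj) + "\n"


def emit(fields, formats):
    """Feed one shared list of (key, val) pairs to every format's hook."""
    for key, val in fields:
        for fmt in formats:
            fmt.add_log_attribute(key, val)
    return [fmt.finish() for fmt in formats]
```

Since the timestamp is just another (key, val) pair computed once, every
format would get the same one, which is the "in sync" property mentioned
above.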

Any idea how much work we're talking about to do these things?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


