Re: [HACKERS] CSV Logging questions

2017-09-04 Thread David Fetter
On Mon, Sep 04, 2017 at 05:27:40PM +0100, Greg Stark wrote:
> I was just looking over the CSV logging code and have a few questions
> about why things were done the way they were done.
> 
> 1) Why do we gather a per-session log line number? Is it just to aid
> people importing to avoid duplicate entries from partial files? Is
> there some other purpose given that entries will already be sequential
> in the csv file?
> 
> 2) Why is the file error conditional on log_error_verbosity? Surely
> the whole point of a structured log is that you can log everything and
> choose what to display later -- i.e. why csv logging doesn't look at
> log_line_prefix to determine which other bits to display. There's no
> added cost to include this information unconditionally and they're far
> from the largest piece of data being logged either.
> 
> 3) Similarly I wonder if the statement should always be included even
> with hide_stmt is set so that users can write sensible queries against
> the data even if it means duplicating data.
> 
> 4) Why the session start time? Is this just so that  session_start_time> uniquely identiifes a session? Should we perhaps
> generate a unique session identifier instead?
> 
> The real reason I'm looking at this is because I'm looking at the
> json_log plugin from Michael Paquier. It doesn't have the log line
> numbers and I can't figure whether this is something it should have
> because I can't quite figure out why they exist in CSV files. I think
> there are a few other fields that have been added in Postgres but are
> missing from the JSON log because of version skew.
> 
> I'm wondering if we should abstract out the CSV format so instead of
> using emit_log_hook you would add a new format and it would specify a
> "add_log_attribute(key,val)" hook which would get called once per log
> format so you could have as many log formats as you want and be sure
> they would all have the same data. That would also mean that the
> timestamps would be in sync and we could probably eliminate the
> occurrences of the wrong format appearing in the wrong logs.

+1 for making the emitters all work off the same source.

Any idea how much work we're talking about to do these things?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] CSV Logging questions

2017-09-04 Thread Greg Stark
I was just looking over the CSV logging code and have a few questions
about why things were done the way they were done.

1) Why do we gather a per-session log line number? Is it just to aid
people importing to avoid duplicate entries from partial files? Is
there some other purpose given that entries will already be sequential
in the csv file?

2) Why is the file error conditional on log_error_verbosity? Surely
the whole point of a structured log is that you can log everything and
choose what to display later -- i.e. why csv logging doesn't look at
log_line_prefix to determine which other bits to display. There's no
added cost to include this information unconditionally and they're far
from the largest piece of data being logged either.

3) Similarly I wonder if the statement should always be included even
with hide_stmt is set so that users can write sensible queries against
the data even if it means duplicating data.

4) Why the session start time? Is this just so that  uniquely identiifes a session? Should we perhaps
generate a unique session identifier instead?

The real reason I'm looking at this is because I'm looking at the
json_log plugin from Michael Paquier. It doesn't have the log line
numbers and I can't figure whether this is something it should have
because I can't quite figure out why they exist in CSV files. I think
there are a few other fields that have been added in Postgres but are
missing from the JSON log because of version skew.

I'm wondering if we should abstract out the CSV format so instead of
using emit_log_hook you would add a new format and it would specify a
"add_log_attribute(key,val)" hook which would get called once per log
format so you could have as many log formats as you want and be sure
they would all have the same data. That would also mean that the
timestamps would be in sync and we could probably eliminate the
occurrences of the wrong format appearing in the wrong logs.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers