On Mon, Sep 4, 2017 at 12:27 PM, Greg Stark <st...@mit.edu> wrote:
> 1) Why do we gather a per-session log line number? Is it just to aid
> people importing to avoid duplicate entries from partial files? Is
> there some other purpose given that entries will already be sequential
> in the csv file?

I think the idea is that if you see a line for a session you can tell
whether there are any earlier lines for the same session. It might
not be obvious, because they could be much earlier in the log if a
session was idle for a while. I've certainly run into this problem in
real-world troubleshooting.

> 2) Why is the file error conditional on log_error_verbosity? Surely
> the whole point of a structured log is that you can log everything and
> choose what to display later -- i.e. why csv logging doesn't look at
> log_line_prefix to determine which other bits to display. There's no
> added cost to include this information unconditionally and they're far
> from the largest piece of data being logged either.
>
> 3) Similarly I wonder if the statement should always be included even
> with hide_stmt is set so that users can write sensible queries against
> the data even if it means duplicating data.

I think the principle that the CSV log should contain all of the
output fields can be taken too far, and I'd put both of these ideas in
that category. I don't see any reason to believe there couldn't be a
user who wants CSV logging but not at maximum verbosity -- and
hide_stmt is used for cases like this:

    ereport(LOG,
            (errmsg("statement: %s", query_string),
             errhidestmt(true),
             errdetail_execute(parsetree_list)));

Actually, I think it's poor design to force the CSV log to contain all
of the output fields. For some users, that might make it unusable by
making the output too big. I think it would be better if the data
were self-identifying - e.g. by sticking a header line on each log
file - and perhaps complete by default, but still configurable. We've
had the idea of adding new %-escapes shot down on the grounds that
that would force us to include them all the time in CSV output and
they are too specialized to justify this, but that seems to me to be a
case of the tail wagging the dog.

> 4) Why the session start time?
> Is this just so that <process_id, session_start_time> uniquely
> identifies a session? Should we perhaps generate a unique session
> identifier instead?

We could do that, but it doesn't seem better...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
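
On point 4, a minimal sketch of why <process_id, session_start_time>
can already serve as a session key. It assumes the "%lx.%x" layout
that the %c escape in log_line_prefix uses (start time, then PID, both
in hex). The function names format_session_key and same_session are
hypothetical and are not taken from the PostgreSQL source.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Build a printable session key from the backend start time and PID.
 * Assumption: the "%lx.%x" layout mirrors the %c escape of
 * log_line_prefix; this helper itself is hypothetical.
 * Returns the number of characters snprintf would have written.
 */
static int
format_session_key(char *buf, size_t buflen, time_t start_time, int pid)
{
    return snprintf(buf, buflen, "%lx.%x",
                    (unsigned long) start_time, (unsigned int) pid);
}

/*
 * Two log lines are treated as belonging to the same session only if
 * both the start time and the PID match. The PID alone is not enough,
 * because the operating system can reuse it for a later backend.
 */
static int
same_session(time_t start_a, int pid_a, time_t start_b, int pid_b)
{
    return start_a == start_b && pid_a == pid_b;
}
```

A log consumer could group rows by this pair (or by the formatted key)
and then use the per-session line number to check whether earlier lines
of the same session are missing from what it has loaded.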