On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao <masao.fu...@gmail.com> wrote: > On Thu, Oct 10, 2013 at 7:20 PM, Sameer Thakur <samthaku...@gmail.com> wrote: >> Please find patch attached which adds documentation for session_start >> and introduced fields and corrects documentation for queryid to be >> query_id. session_start remains in the view as agreed. > > Thanks for updating the document! > > I'm not clear about the use case of 'session_start' and 'introduced' yet. > Could you elaborate it? Maybe we should add that explanation into > the document?
Probably. The idea is that without those fields it's, to wit, impossible to explain non-monotonic movement in metrics of those queries for precise use in tools that insist on monotonicity of the fields, which are all cumulative to date I think. pg_stat_statements_reset() or crashing loses the session, so one expects "ncalls" may decrease. In addition, a query that is bouncing in and out of the hash table will have its statistics be lost, so its statistics will also decrease. This can cause un-resolvable artifact in analysis tools. The two fields allow for precise understanding of when the statistics for a given query_id are continuous since the last time a program inspected it. > In my test, I found that pg_stat_statements.total_time always indicates a > zero. > I guess that the patch might handle pg_stat_statements.total_time wrongly. > > + values[i++] = DatumGetTimestamp( > + instr_get_timestamptz(pgss->session_start)); > + values[i++] = DatumGetTimestamp( > + instr_get_timestamptz(entry->introduced)); > > These should be executed only when detected_version >= PGSS_TUP_LATEST? Yes. Oversight. > + <entry><structfield>session_start</structfield></entry> > + <entry><type>text</type></entry> > + <entry></entry> > + <entry>Start time of a statistics session.</entry> > + </row> > + > + <row> > + <entry><structfield>introduced</structfield></entry> > + <entry><type>text</type></entry> > > The data type of these columns is not text. Oops > + instr_time session_start; > + uint64 private_stat_session_key; > > it's better to add the comments explaining these fields. Yeah. > + microsec = INSTR_TIME_GET_MICROSEC(t) - > + ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY); > > HAVE_INT64_TIMESTAMP should be considered here. That's not a bad idea. > + if (log_cannot_read) > + ereport(LOG, > + (errcode_for_file_access(), > + errmsg("could not read pg_stat_statement file \"%s\": %m", > + PGSS_DUMP_FILE))); > > Is it better to use WARNING instead of LOG as the log level here? Is this new code? ....Why did I add it again? Seems reasonable though to call it a WARNING. > + /* > + * The role calling this function is unable to see > + * sensitive aspects of this tuple. > + * > + * Nullify everything except the "insufficient privilege" > + * message for this entry > + */ > + memset(nulls, 1, sizeof nulls); > + > + nulls[i] = 0; > + values[i] = CStringGetTextDatum("<insufficient privilege>"); > > Why do we need to hide *all* the fields in pg_stat_statements, when > it's accessed by improper user? This is a big change of pg_stat_statements > behavior, and it might break the compatibility. It seems like an information leak that grows more major if query_id is exposed and, at any point, one can determine the query_id for a query text. >>> >This is not directly related to the patch itself, but why does the >>> > queryid >>> >need to be calculated based on also the "statistics session"? >> >> If we expose hash value of query tree, without using statistics session, >> it is possible that users might make wrong assumption that this value >> remains stable across version upgrades, when in reality it depends on >> whether the version has make changes to query tree internals. So to >> explicitly ensure that users do not make this wrong assumption, hash value >> generation use statistics session id, which is newly created under >> situations described above. > > So, ISTM that we can use, for example, the version number to calculate > the query_id. Right? That does seem like it may be a more reasonable stability vs. once per statistics session, because then use case with standbys work somewhat better. That said, the general concern was accidental contracts being assumed by consuming code, so this approach is tuned for making the query_id as unstable as possible while still being useful: stable, but only in one statistics gathering section. I did not raise the objection about over-aggressive contracts being exposed although I think the concern has merit...but given the use case involving standbys, I am for now charitable to increasing the stability to the level you indicate: a guaranteed break every point release. -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers