Re: [HACKERS] [rfc] overhauling pgstat.stat

2013-09-05 Thread Atri Sharma
On Thu, Sep 5, 2013 at 10:59 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Satoshi Nagayasu wrote:

 But, for now, I think we should have a real index for the
 statistics data because we already have several index storages,
 and it will allow us to minimize read/write operations.

 BTW, what kind of index would be preferred for this purpose?
 btree or hash?

 I find it hard to get excited about using the AM interface for this
 purpose.  To me it makes a lot more sense to have separate, much
 simpler code.  We don't need any transactionality, user defined types,
 user defined operators, or anything like that.

+1.

But wouldn't rewriting a lot of existing functionality potentially
lead to points of contention and/or much more effort?

Regards,

Atri
-- 
Regards,

Atri
l'apprenant




Re: [HACKERS] 9.4 regression

2013-09-05 Thread KONDO Mitsumasa

(2013/09/05 0:04), Andres Freund wrote:
 I'd vote for adding zeroing *after* the fallocate() first.
+1, with FALLOC_FL_KEEP_SIZE flag.

At least, fallocate() with the FALLOC_FL_KEEP_SIZE flag is faster than not using
it at all in my in-development sorted-checkpoint work. I applied it to relation
files, so I don't know about WAL files.


With the FALLOC_FL_KEEP_SIZE flag, fallocate() does not change the file size and
does not zero-fill; it seems to only reserve contiguous physical disk space in
the file system.
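For illustration only, a minimal sketch of that kind of call (this is not the
attached patch; reserve_wal_segment is a hypothetical helper, and the draft
patch below simply ignores the return value):

#define _GNU_SOURCE
#include <fcntl.h>				/* fallocate(), FALLOC_FL_KEEP_SIZE (Linux) */
#include <errno.h>

/*
 * Sketch: reserve segsize bytes of contiguous space for fd without changing
 * the apparent file size, then let the existing zero-fill loop run.  Falling
 * back silently when the filesystem lacks support (EOPNOTSUPP) keeps the old
 * behavior on such filesystems.
 */
static int
reserve_wal_segment(int fd, off_t segsize)
{
	if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, segsize) != 0 &&
		errno != EOPNOTSUPP)
		return -1;				/* real error; caller reports it with %m */
	return 0;
}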


Here is draft patch.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dc47c47..85b582c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3409,6 +3409,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 (errcode_for_file_access(),
 			 errmsg("could not create file \"%s\": %m", tmppath)));
 
+#if defined(HAVE_FALLOCATE)
+	fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, XLogSegSize);
+#endif
+
 	/*
 	 * Zero-fill the file.	We have to do this the hard way to ensure that all
 	 * the file space has really been allocated --- on platforms that allow



[HACKERS] Extending pg_stat_statements to expose queryid

2013-09-05 Thread Sameer Thakur
Hello All,

I am trying to revive the discussion about exposing queryid in
pg_stat_statements.

I did find the same request posted on hackers @
http://www.postgresql.org/message-id/CABUevExLnb6xJwS=8rTaLOfDOS-tFm09==z2m_vz5hhfkgw...@mail.gmail.com

and

http://www.postgresql.org/message-id/cacn56+nlmtwhg8eqqqnyzqe2q0negjokmgfiusk_aohw627...@mail.gmail.com

From the discussions, I concluded:

1. The main use case for exposing queryid is as a better substitute for
hashing the query text of a pg_stat_statements snapshot to form a candidate
key. Problems occur when the hash value should differ even though the query
text is the same, for example when a table referred to in a query is dropped
and recreated, or when the same query text is run against different schemas
and the schema name is not included in the query text.

2. Exposing queryid was proposed earlier but was not accepted. The main
reason was that queryid could be unstable as well: since queryid is derived
from hashing the query tree, and the query tree can change between minor
PostgreSQL releases, the queryid for the same query could differ between
releases, resulting in incorrect statement statistics collection.

3. Another problem is distinguishing between queries whose statistics are
continuously maintained and intermittent queries, whose statistics might be
silently reset without the reporting tool being any the wiser.

4. A solution to avoid misrepresenting intermittent queries as consistent
queries would be to assign a unique number to each new row; once that row is
discarded, the unique number cannot be reused. The drawback here is possible
collision of the generated unique values.

5. A patch implementing a solution for identifying intermittent queries is at
https://github.com/fdr/postgres/branches/error-prop-pg_stat_statements-v2.

The solution avoids using a counter, and achieves the same result through the
property that intermittent queries accumulate errors due to eviction from the
hashtable while consistent queries do not. Error accumulation would be the
parameter by which a reporting tool can figure out whether queries were
evicted between snapshots.

6. To address the problem of the unstable queryid generated from the query
tree, it was proposed to eliminate any possible misunderstanding that queryid
will remain the same between releases by XORing the hash from the query tree
with a statistics session id. This also helps in all cases where the
statistics file is reset (crash recovery, entering recovery mode, and so on),
ensuring a new hash value for reset statistics.

To avoid increasing the chance of collision, a longer session key can be used
and the queryid padded to complete the XOR (a rough illustrative sketch
follows after point 7 below). An implementation of this is at
https://github.com/fdr/postgres/branches/pg_stat_statements-identification-v3

7. The patch pg_stat_statements-identification-v3 was returned with
feedback for more documentation in commitfest 2013-01.
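For illustration of point 6 only (a rough sketch, not the code in the linked
branches; query_tree_hash and stats_session_key are hypothetical names):

#include <stdint.h>

/*
 * Sketch: pad the 32-bit query-tree hash to 64 bits and XOR it with a
 * per-statistics-session key, so that anything that resets the statistics
 * (crash recovery, a manual reset) yields fresh queryids and they cannot be
 * mistaken for stable cross-release identifiers.
 */
static uint64_t
compute_exposed_queryid(uint32_t query_tree_hash, uint64_t stats_session_key)
{
	return stats_session_key ^ (uint64_t) query_tree_hash;
}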

Questions:

1. Is there a plan to re-introduce this patch? The code seems to be
documented.

2. There was mention of further testing of error propagation using hooks.
Could this be elaborated?

3. There was a use case that exposing queryid could be used to aggregate
statistics across WAL based replication clusters. But now that queryid is
derived from statistics session id, which is randomly generated, this use
case is still not addressed. Is this correct?

 Regards

Sameer


Re: [HACKERS] Analysis on backend-private memory usage (and a patch)

2013-09-05 Thread Heikki Linnakangas

On 04.09.2013 23:56, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

One fairly simple thing we could do is to teach catcache.c to resize the
caches. Then we could make the initial size of all the syscaches much
smaller.


I think this is attractive for the *other* reason you mention, namely
preserving reasonable performance when a catcache grows larger than
expected; but I'm pretty skeptical of nickel-and-diming caches that are
already really small.  Is it really worth cutting the TSPARSER caches
from 4 pointers to 2 for instance?


Yeah, that may be overdoing it. Then again, enlarging a hash table from 
2 to 4 entries when needed is also very cheap.



What concerns me about initially-undersized caches is that we'll waste
space and time in the enlargement process.


Enlarging a small hash tables is very cheap because, well, it's small. 
Enlarging a large hash table is more expensive, but if you have a lot of 
entries in the hash, then you also get the benefit of a larger hash when 
doing lookups. It does require some memory to hold the old and the new 
hash table while rehashing, but again, with a small hash table that's 
not significant, and with a large one the actual cached tuples take a 
lot more space than the buckets array anyway.


Per Wikipedia [1]:


If the table size increases or decreases by a fixed percentage at each 
expansion, the total cost of these resizings, amortized over all insert and 
delete operations, is still a constant, independent of the number of entries n 
and of the number m of operations performed.

For example, consider a table that was created with the minimum possible size 
and is doubled each time the load ratio exceeds some threshold. If m elements 
are inserted into that table, the total number of extra re-insertions that 
occur in all dynamic resizings of the table is at most m − 1. In other words, 
dynamic resizing roughly doubles the cost of each insert or delete operation.


Considering the amount of work involved in adding an entry to a catalog 
cache, I wouldn't worry about adding a tiny constant to the insertion time.
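As a toy illustration of the doubling strategy (plain C, not the catcache
code; a fill factor of 2 and power-of-two bucket counts are assumed):

#include <stdlib.h>

/* Toy chained hash table; nbuckets is always a power of two. */
typedef struct Entry { unsigned key; struct Entry *next; } Entry;
typedef struct { Entry **bucket; int nbuckets; int nentries; } Table;

static void
rehash(Table *t)
{
	int		newn = t->nbuckets * 2;
	Entry **newb = calloc(newn, sizeof(Entry *));
	int		i;

	/* re-insert every entry under the wider mask, then swap the arrays */
	for (i = 0; i < t->nbuckets; i++)
		while (t->bucket[i])
		{
			Entry  *e = t->bucket[i];

			t->bucket[i] = e->next;
			e->next = newb[e->key & (newn - 1)];
			newb[e->key & (newn - 1)] = e;
		}
	free(t->bucket);
	t->bucket = newb;
	t->nbuckets = newn;
}

static void
insert(Table *t, unsigned key)
{
	Entry  *e;

	if (t->nentries >= t->nbuckets * 2)	/* fill factor of 2 exceeded */
		rehash(t);						/* amortized O(1) per insertion */
	e = malloc(sizeof(Entry));
	e->key = key;
	e->next = t->bucket[key & (t->nbuckets - 1)];
	t->bucket[key & (t->nbuckets - 1)] = e;
	t->nentries++;
}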


I did some quick testing by creating 10 tables, and running a 
pgbench script that selects randomly from them:


\setrandom tableno 1 10
select * from foo:tableno where i = 1;

I timed the rehash operations with gettimeofday calls before and after 
the rehash:



LOG:  rehashed catalog cache id 44 for pg_class from 256 to 512 buckets: 27 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 128 to 256 buckets: 14 us
LOG:  rehashed catalog cache id 44 for pg_class from 512 to 1024 buckets: 54 us
LOG:  rehashed catalog cache id 45 for pg_class from 256 to 512 buckets: 29 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 256 to 512 buckets: 30 us
LOG:  rehashed catalog cache id 44 for pg_class from 1024 to 2048 buckets: 147 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 512 to 1024 buckets: 87 us
LOG:  rehashed catalog cache id 45 for pg_class from 512 to 1024 buckets: 80 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 512 to 1024 buckets: 88 us
LOG:  rehashed catalog cache id 44 for pg_class from 2048 to 4096 buckets: 342 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 1024 to 2048 buckets: 197 us
LOG:  rehashed catalog cache id 45 for pg_class from 1024 to 2048 buckets: 183 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 1024 to 2048 buckets: 194 us
LOG:  rehashed catalog cache id 44 for pg_class from 4096 to 8192 buckets: 764 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 2048 to 4096 buckets: 401 us
LOG:  rehashed catalog cache id 45 for pg_class from 2048 to 4096 buckets: 383 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 2048 to 4096 buckets: 406 us
LOG:  rehashed catalog cache id 44 for pg_class from 8192 to 16384 buckets: 1758 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 4096 to 8192 buckets: 833 us
LOG:  rehashed catalog cache id 45 for pg_class from 4096 to 8192 buckets: 842 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 4096 to 8192 buckets: 859 us
LOG:  rehashed catalog cache id 44 for pg_class from 16384 to 32768 buckets: 3564 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 8192 to 16384 buckets: 1769 us
LOG:  rehashed catalog cache id 45 for pg_class from 8192 to 16384 buckets: 1752 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 8192 to 16384 buckets: 1719 us
LOG:  rehashed catalog cache id 44 for pg_class from 32768 to 65536 buckets: 7538 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 16384 to 32768 buckets: 3644 us
LOG:  rehashed catalog cache id 45 for pg_class from 16384 to 32768 buckets: 3609 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 16384 to 32768 buckets: 3508 us
LOG:  rehashed catalog cache id 44 for pg_class from 65536 to 131072 buckets: 16457 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 

[HACKERS] Re: Is it necessary to rewrite table while increasing the scale of datatype numeric???

2013-09-05 Thread Noah Misch
On Thu, Sep 05, 2013 at 11:07:43AM +0800, wangs...@highgo.com.cn wrote:
 On 2013-09-04 19:30, Noah Misch wrote:
 On Wed, Sep 04, 2013 at 12:08:48PM +0800, wangs...@highgo.com.cn wrote:
 I find that it takes a long time when I increase the scale of a numeric
 datatype.
 By checking the code, I found that's because it needs to rewrite that
 table's file.
 After checking that table's data file, I found that only the parameter
 n_header changed.
 And I found the data in that numeric field never changed.
 So I think it's not necessary to rewrite the table's file in this case.


 Noah Misch n...@leadboat.com wrote:
 n_header is part of the numeric field's data.  That's not just pedantry:
 the display scale stored in n_header affects how numeric_out() formats the
 value.

 Thanks for your reply.

 Just because of what you said, I think increasing the scale only leads to a
 different display. There's no difference between 5.25 and 5.2500 in use.
 So there's no need to rewrite the table.

Right or wrong, our numeric type caters to applications that do care about the
difference between those outputs.  I grant that others do not care.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




[HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Peter Eisentraut
Can we consider getting rid of the SQL_ASCII server-side encoding?  I
don't see any good use for it, and it's often a support annoyance, and
it leaves warts all over the code.  This would presumably be a
multi-release effort.

As a first step in accommodating users who have existing SQL_ASCII
databases, we could change SQL_ASCII into a real encoding with
conversion routines to all other encodings that only convert 7-bit ASCII
characters.  That way, users who use SQL_ASCII as real ASCII or don't
care could continue to use it.  Others would be forced to either set
SQL_ASCII as the client encoding or adjust the encoding on the server.
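As a sketch of what the core of such an ASCII-only conversion routine might
look like (hypothetical code, assuming the usual backend headers; the real
conversion procs live under src/backend/utils/mb/conversion_procs/):

#include "postgres.h"

/*
 * Sketch only: a "conversion" from strict ASCII to any other encoding would
 * simply verify that every byte is 7-bit and pass the data through
 * unchanged, erroring out otherwise.
 */
static void
ascii_passthrough(const unsigned char *src, unsigned char *dest, int len)
{
	int			i;

	for (i = 0; i < len; i++)
	{
		if (src[i] > 127)
			elog(ERROR, "invalid byte for 7-bit ASCII: 0x%02x", src[i]);
		dest[i] = src[i];
	}
}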

On the client side, the default libpq client encoding SQL_ASCII would
be renamed to something like SAME or whatever, so the behavior would
stay the same.

Other ideas?  Are there legitimate uses for SQL_ASCII?




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Merlin Moncure
On Thu, Sep 5, 2013 at 7:47 AM, Peter Eisentraut pete...@gmx.net wrote:
 Can we consider getting rid of the SQL_ASCII server-side encoding?  I
 don't see any good use for it, and it's often a support annoyance, and
 it leaves warts all over the code.  This would presumably be a
 multi-release effort.

 As a first step in accommodating users who have existing SQL_ASCII
 databases, we could change SQL_ASCII into a real encoding with
 conversion routines to all other encodings that only convert 7-bit ASCII
 characters.  That way, users who use SQL_ASCII as real ASCII or don't
 care could continue to use it.  Others would be forced to either set
 SQL_ASCII as the client encoding or adjust the encoding on the server.

 On the client side, the default libpq client encoding SQL_ASCII would
 be renamed to something like SAME or whatever, so the behavior would
 stay the same.

 Other ideas?  Are there legitimate uses for SQL_ASCII?

performance?

merlin




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Heikki Linnakangas

On 05.09.2013 15:47, Peter Eisentraut wrote:

Can we consider getting rid of the SQL_ASCII server-side encoding?  I
don't see any good use for it, and it's often a support annoyance, and
it leaves warts all over the code.  This would presumably be a
multi-release effort.


I think "warts all over the code" is an overstatement. There aren't that
many places in the code that care about SQL_ASCII, and they're all
related to encoding conversions.



As a first step in accommodating users who have existing SQL_ASCII
databases, we could change SQL_ASCII into a real encoding with
conversion routines to all other encodings that only convert 7-bit ASCII
characters.  That way, users who use SQL_ASCII as real ASCII or don't
care could continue to use it.  Others would be forced to either set
SQL_ASCII as the client encoding or adjust the encoding on the server.

On the client side, the default libpq client encoding SQL_ASCII would
be renamed to something like SAME or whatever, so the behavior would
stay the same.

Other ideas?  Are there legitimate uses for SQL_ASCII?


One use is if you want to use some special encoding that's not supported 
by PostgreSQL, and you want PostgreSQL to just regurgitate any strings 
as is. It's not common, but would be strange to remove that capability 
altogether, IMHO.


I agree it would be nice to have a real ASCII encoding, which only 
accepts 7-bit ASCII characters. And it would be nice if SQL_ASCII was 
called something else, like UNDEFINED or BYTE_PER_CHAR, to make the 
meaning more clear. But I'm not in favor of deprecating it altogether.


Also, during backend initialization there is a phase where 
client_encoding has not been set yet, and we don't do any conversions 
yet. That's exactly what SQL_ASCII means, so even if we get rid of 
SQL_ASCII, we'd still need to have some encoding value in the backend to 
mean that intermediate state.


- Heikki




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread k...@rice.edu
On Thu, Sep 05, 2013 at 08:47:32AM -0400, Peter Eisentraut wrote:
 Can we consider getting rid of the SQL_ASCII server-side encoding?  I
 don't see any good use for it, and it's often a support annoyance, and
 it leaves warts all over the code.  This would presumably be a
 multi-release effort.
 
 As a first step in accommodating users who have existing SQL_ASCII
 databases, we could change SQL_ASCII into a real encoding with
 conversion routines to all other encodings that only convert 7-bit ASCII
 characters.  That way, users who use SQL_ASCII as real ASCII or don't
 care could continue to use it.  Others would be forced to either set
 SQL_ASCII as the client encoding or adjust the encoding on the server.
 
 On the client side, the default libpq client encoding SQL_ASCII would
 be renamed to something like SAME or whatever, so the behavior would
 stay the same.
 
 Other ideas?  Are there legitimate uses for SQL_ASCII?
 
Hi Peter,

Yes, we have processes that insert data from a large number of locales
into the same database, and we need to process the information in a
locale-agnostic way, just as a range of bytes. Not to mention how much
faster it can be.

Regards,
Ken




Re: [HACKERS] Proposal for XML Schema Validation

2013-09-05 Thread Kodamasimham Pridhvi (MT2012066)
On  2013-08-09 14:14:34, Craig Ringer wrote:
Well, if you're planning on relying on libxml in core (and it'll have to
be in core if you're adding new syntax) then you'll need a solid, well
researched answer to that one or an alternative XML library that's
portable and doesn't have those issues.

After doing some research on this issue and going through the mailing list,
we were unable to find a fix for the memory management issue with libxml when
used with PostgreSQL. Can you please suggest an alternative to libxml, or
point us to some relevant resources?



Thanks and Regards,
Pridhvi and Vikranthsingh,
IIITB



Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs

2013-09-05 Thread Noah Misch
On Wed, Aug 28, 2013 at 12:09:05AM -0400, Peter Eisentraut wrote:
 On Tue, 2013-08-27 at 09:44 -0400, Tom Lane wrote:
  Boszormenyi Zoltan z...@cybertec.at writes:
   When adding regression tests, can you please add intentional
   syntax error cases to exercise all the new ereport()s?
  
  Please do not add test cases merely to prove that.  Yeah, you should
  probably have exercised each error case in devel testing, but that does
  not mean that every future run of the regression tests needs to do it too.
 
 I disagree.  The next person who wants to hack on this feature should be
 given the confidence that he's not breaking behavior that the last guy
 put in.

+1.  I wouldn't make full error-outcome test coverage a condition of patch
acceptance.  However, when an author chooses to submit high-quality tests with
that level of detail, our source tree is the place to archive them.  I share
Tom's desire for a Makefile target that completes quickly and checks only
those behaviors most likely to break, but not at the cost of letting deep test
coverage dissipate in a mailing list attachment or in the feature author's
home directory.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Analysis on backend-private memory usage (and a patch)

2013-09-05 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 I ran pgbench for ten seconds, and printed the number of tuples in each 
 catcache after that:
 [ very tiny numbers ]

I find these numbers a bit suspicious.  For example, we must have hit at
least 13 different system catalogs, and more than that many indexes, in
the course of populating the syscaches you show as initialized.  How is
it there are only 4 entries in the RELOID cache?  I wonder if there were
cache resets going on.

A larger issue is that pgbench might not be too representative.  In
a quick check, I find that cache 37 (OPERNAMENSP) starts out empty,
and contains 1 entry after select 2=2, which is expected since 
the operator-lookup code will start by looking for int4 = int4 and
will get an exact match.  But after select 2=2::numeric there are
61 entries, as a byproduct of having thumbed through every binary
operator named = to resolve the ambiguous match.  We went so far
as to install another level of caching in front of OPERNAMENSP because
it was getting too expensive to deal with heavily-overloaded operators
like that one.  In general, we've had to spend enough sweat on optimizing
catcache searches to make me highly dubious of any claim that the caches
are usually almost empty.

I understand your argument that resizing is so cheap that it might not
matter, but nonetheless reducing these caches as far as you're suggesting
sounds to me to be penny-wise and pound-foolish.  I'm okay with setting
them on the small side rather than on the large side as they are now, but
not with choosing sizes that are guaranteed to result in resizing cycles
during startup of any real app.

 PS. Once the hashes are resized on demand, perhaps we should get rid of 
 the move-to-the-head behavior in SearchCatCache. If all the buckets 

-1.  If the bucket in fact has just one member, dlist_move_head reduces to
just one comparison.  And again I argue that you're optimizing for the
wrong case.  Pure luck will result in some hash chains being (much) longer
than the average, and if we don't do move-to-front we'll get hurt there.
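For reference, a simplified sketch of the step in question (not the full
SearchCatCache logic; the real lookup details, negative entries and
refcounting are omitted):

#include "postgres.h"
#include "lib/ilist.h"
#include "utils/catcache.h"

static CatCTup *
bucket_lookup_with_move_to_front(dlist_head *bucket, uint32 hashValue)
{
	dlist_iter	iter;

	dlist_foreach(iter, bucket)
	{
		CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);

		if (ct->hash_value != hashValue)
			continue;			/* cheap filter; full key test omitted here */

		/* keep hot entries near the head of unlucky, longer-than-average chains */
		dlist_move_head(bucket, &ct->cache_elem);
		return ct;
	}
	return NULL;
}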

regards, tom lane




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Alvaro Herrera
Joshua D. Drake wrote:
 
 On 09/05/2013 09:42 AM, Josh Berkus wrote:

 Other ideas?  Are there legitimate uses for SQL_ASCII?
 
 Migrating from MySQL.  We've had some projects where we couldn't fix
 MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
 receiving side.  If it hadn't been available, the user would have given
 up on Postgres.
 
 iconv?

Command Prompt helped a customer normalize encodings in their data,
which was a mixture of Latin1 and UTF8.  PGLoader was used for this, in
two stages; the first run in UTF8 saved the rejected data to a file
which was loaded in the second run as Latin1.  This worked like a charm.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?

2013-09-05 Thread Alvaro Herrera
arthernan wrote:
 OidFunctionCall7 is called from costsize.c, in function void cost_index. From
 the comments, I understand that there is an indirection mechanism here to
 allow for different index access methods. Looking at fmgr.h where it is
 declared is not clear to me where I would ultimately find the code.
 
 Any pointers will be greatly appreciated

The code you're really looking for is the amcostestimate function of the
AM that the index uses.  So if it's a btree you want btcostestimate, for
instance.  You can see the AM definitions in the pg_am system catalog
and src/include/catalog/pg_am.h.

	/*
	 * Call index-access-method-specific code to estimate the processing cost
	 * for scanning the index, as well as the selectivity of the index (ie,
	 * the fraction of main-table tuples we will have to retrieve) and its
	 * correlation to the main-table tuple order.
	 */
	OidFunctionCall7(index->amcostestimate,
					 PointerGetDatum(root),
					 PointerGetDatum(path),
					 Float8GetDatum(loop_count),
					 PointerGetDatum(&indexStartupCost),
					 PointerGetDatum(&indexTotalCost),
					 PointerGetDatum(&indexSelectivity),
					 PointerGetDatum(&indexCorrelation));


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, Sep 5, 2013 at 3:01 AM, Bruce Momjian br...@momjian.us wrote:
 I have developed the attached patch which implements an auto-tuned
 effective_cache_size which is 4x the size of shared buffers.  I had to
 set effective_cache_size to its old 128MB default so the EXPLAIN
 regression tests would pass unchanged.

 That's not really autotuning though. ISTM that making the *default* 4
 x shared_buffers might make perfect sense, but do we really need to
 hijack the value of -1 for that? That might be useful for some time
 when we have actual autotuning, that somehow inspects the system and
 tunes it from there.

Well, the real problem with this patch is that it documents what the
auto-tuning algorithm is; without that commitment, just saying -1 means
autotune might be fine.

Did you consider the alternative of just tweaking initdb to insert a
default for effective_cache_size that's 4x whatever it picks for
shared_buffers?  That would probably be about 3 lines of code, and it
wouldn't nail down any particular server-side behavior.
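For concreteness, the roughly-three-lines version might look something like
this (a hypothetical sketch only, loosely modeled on how setup_config() in
initdb.c already substitutes the shared_buffers line; n_buffers is assumed to
be expressed in BLCKSZ-sized buffers):

	/* hypothetical addition to setup_config() in initdb.c */
	snprintf(repltok, sizeof(repltok), "effective_cache_size = %dMB",
			 (n_buffers * (BLCKSZ / 1024) / 1024) * 4);
	conflines = replace_token(conflines, "#effective_cache_size = 128MB",
							  repltok);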

regards, tom lane




Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-05 Thread Robert Haas
On Wed, Sep 4, 2013 at 8:08 PM, Andres Freund and...@2ndquadrant.com wrote:
 You've proposed a framework and algorithm for something I'd really, really
 like to see. I don't think it can work exactly as you proposed, so I roughly
 sketched out a solution I can see.
 I don't want my idea to win, I want an idea to win. I haven't fleshed out my
 idea to the point where I would consider it ready to implement or anything
 like that.
 You're the patch author here whose plans are laid open to be scrutinized ;).
 If you think my idea has merit, use it and adapt it to reality. If not, find
 another, better solution.

 Even if our path to that goal is confrontational at times, the goal is to 
 find a good solution, not the argument itself.

Totally agreed.

 I haven't argued about INSERT ... DUPLICATE LOCK because the page locking 
 scheme doesn't seem to work out for plain DUPLICATE. No need to think/argue 
 about the fancier version in that case.

Check.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] dynamic shared memory

2013-09-05 Thread Robert Haas
On Wed, Sep 4, 2013 at 6:38 PM, Jim Nasby j...@nasby.net wrote:
 No, I meant something that would live as long as the postmaster and
 die when it dies.

 ISTM that at some point we'll want to look at putting top-level shared
 memory into this system (ie: allowing dynamic resizing of GUCs that affect
 shared memory size).

A lot of people want that, but being able to resize the shared memory
chunk itself is only the beginning of the problem.  So I wouldn't hold
my breath.

 Wouldn't it protect against a crash while writing the file? I realize the
 odds of that are pretty remote, but AFAIK it wouldn't cost that much to
 write a new file and do an atomic mv...

If there's an OS-level crash, we don't need the state file; the shared
memory will be gone anyway.  And if it's a PostgreSQL-level failure,
this game neither helps nor hurts.

 Sure.  A messed-up backend can clobber the control segment just as it
 can clobber anything else in shared memory.  There's really no way
 around that problem.  If the control segment has been overwritten by a
 memory stomp, we can't use it to clean up.  There's no way around that
 problem except to not have a control segment, which wouldn't be better.

 Are we trying to protect against memory stomps when we restart after a
 backend dies? I thought we were just trying to ensure that all shared data
 structures were correct and consistent. If that's the case, then I was
 thinking that by using a pointer that can be updated in a CPU-atomic fashion
 we know we'd never end up with a corrupted entry that was in use; the
 partial write would be to a slot with nothing pointing at it so it could be
 safely reused.

When we restart after a backend dies, shared memory contents are
completely reset, from scratch.  This is true of both the fixed size
shared memory segment and of the dynamic shared memory control
segment.  The only difference is that, with the dynamic shared memory
control segment, we need to use the segment for cleanup before
throwing it out and starting over.  Extra caution is required because
we're examining memory that could hypothetically have been stomped on;
we must not let the postmaster do anything suicidal.

 Should dsm_impl_op sanity check the arguments after op? I didn't notice
 checks in the type-specific code but I also didn't read all of it... are
 we
 just depending on the OS to sanity-check?

 Sanity-check for what?

 Presumably there's limits to what the arguments can be rationally set to.
 IIRC there's nothing down-stream that's checking them in our code, so I'm
 guessing we're just depending on the kernel to sanity-check.

Pretty much.  It's possible more thought is needed there, but the
shape of those additional thoughts is not clear to me at this time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] strange IS NULL behaviour

2013-09-05 Thread Robert Haas
On Wed, Sep 4, 2013 at 9:26 PM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Sep  3, 2013 at 09:32:44PM -0400, Bruce Momjian wrote:
 In this test, SELECT NULL (which internally would produce SELECT
 ROW(NULL)), returns TRUE, while SELECT ROW(NULL) and further nesting
 returns false.

 This has made me adjust my goal and change it so SELECT ROW(NULL) IS
 NULL returns true, and any further nesting returns false.

 Attached is a patch which accomplishes this, and a documentation update.

 I have not heard any feedback on this patch, so I would like to apply it
 to give us a nested ROW/IS NULL API we can document.  It would have to
 be marked in the release notes as a backward incompatibility.

I don't have time to look at this in detail right now, but I think
that's considerably premature.  I'm not convinced that we understand
all of the problems in this area yet, let alone the solutions.
And I notice that you haven't substantively responded to some of Tom's
concerns.

So -1 from me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] improve Chinese locale performance

2013-09-05 Thread Robert Haas
On Wed, Sep 4, 2013 at 11:02 PM, Quan Zongliang quanzongli...@gmail.com wrote:
 I have thought of a new idea:
 add a compare-method column to pg_collation.
 Every collation has its own compare function, or null.
 When varstr_cmp() is called, if the specified collation
 has a compare function, call it instead of strcoll().

I think we're going to need to have two kinds of collations:
OS-derived collations (which get all of their smarts from the OS), and
PG-internal collations (which use PG-aware code for everything).
Which I suspect is a bit more involved than what you're imagining, but
mixing and matching doesn't seem likely to end well.

However, what you're proposing might serve as a useful demonstration
of how much performance there is to be gained here.
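For illustration, a minimal sketch of the dispatch being proposed (entirely
hypothetical names; this is not an existing PostgreSQL API and not Quan's
patch):

#include <string.h>				/* strcoll() */

typedef int (*collate_cmp_fn) (const char *a, const char *b);

/*
 * Sketch: if the collation carries its own PG-internal comparator, use it;
 * otherwise fall back to the OS locale path, as varstr_cmp() does today.
 * Inputs are assumed to be NUL-terminated copies.
 */
static int
collate_aware_cmp(const char *a, const char *b, collate_cmp_fn collation_cmp)
{
	if (collation_cmp != NULL)
		return collation_cmp(a, b);		/* PG-internal comparator */
	return strcoll(a, b);				/* OS collation, as today */
}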

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Magnus Hagander
On Thu, Sep 5, 2013 at 3:01 AM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jan  8, 2013 at 08:40:44PM -0500, Andrew Dunstan wrote:

 On 01/08/2013 08:08 PM, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 ...  And I don't especially like the idea of trying to
 make it depend directly on the box's physical RAM, for the same
 practical reasons Robert mentioned.
 For the record, I don't believe those problems would be particularly
 hard to solve.
 Well, the problem of finding out the box's physical RAM is doubtless
 solvable if we're willing to put enough sweat and tears into it, but
 I'm dubious that it's worth the trouble.  The harder part is how to know
 if the box is supposed to be dedicated to the database.  Bear in mind
 that the starting point of this debate was the idea that we're talking
 about an inexperienced DBA who doesn't know about any configuration knob
 we might provide for the purpose.
 
 I'd prefer to go with a default that's predictable and not totally
 foolish --- and some multiple of shared_buffers seems like it'd fit the
 bill.

 +1. That seems to be by far the biggest bang for the buck. Anything
 else will surely involve a lot more code for not much more benefit.

 I have developed the attached patch which implements an auto-tuned
 effective_cache_size which is 4x the size of shared buffers.  I had to
 set effective_cache_size to its old 128MB default so the EXPLAIN
 regression tests would pass unchanged.

That's not really autotuning though. ISTM that making the *default* 4
x shared_buffers might make perfect sense, but do we really need to
hijack the value of -1 for that? That might be useful for some time
when we have actual autotuning, that somehow inspects the system and
tunes it from there.

I also don't think it should be called autotuning, when it's just a
smarter default value.

I like the feature, though, just not the packaging.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Andres Freund
On 2013-09-05 12:44:18 -0400, Robert Haas wrote:
 On Wed, Sep 4, 2013 at 12:07 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-09-03 11:40:57 -0400, Robert Haas wrote:
   0002 wal_decoding: Introduce InvalidCommandId and declare that to be the 
   new maximum for CommandCounterIncrement
 
  I'm still unconvinced we want this.
 
  Ok, so the reason for the existence of this patch is that currently
  there is no way to represent an unset CommandId. This is a problem for
  the following patches because we need to log the cmin and cmax of catalog
  rows, and obviously there can be rows where cmax is unset.
 
 For heap tuples, we solve this problem by using flag bits.  Why not
 adopt the same approach?

We can, although it makes the amount of data stored/logged slightly larger
and seems to lead to less idiomatic code to me; so if there's another
-1 I'll go that way.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Robert Haas
On Wed, Sep 4, 2013 at 12:07 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-03 11:40:57 -0400, Robert Haas wrote:
  0002 wal_decoding: Introduce InvalidCommandId and declare that to be the 
  new maximum for CommandCounterIncrement

 I'm still unconvinced we want this.

 Ok, so the reason for the existence of this patch is that currently
 there is no way to represent an unset CommandId. This is a problem for
 the following patches because we need to log the cmin and cmax of catalog
 rows, and obviously there can be rows where cmax is unset.

For heap tuples, we solve this problem by using flag bits.  Why not
adopt the same approach?
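For context, the existing heap-tuple convention being referred to (shown only
as an illustration; HEAP_XMAX_INVALID is the flag bit from htup_details.h):

#include "postgres.h"
#include "access/htup_details.h"

/*
 * Whether xmax is meaningful is tracked by an infomask flag bit rather than
 * by reserving a special "invalid" value of the field itself.
 */
static bool
tuple_has_valid_xmax(HeapTupleHeader tup)
{
	return (tup->t_infomask & HEAP_XMAX_INVALID) == 0;
}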

 The reason I chose to change the definition of CommandIds is that the
 other ondisk types we use like TransactionIds, XLogRecPtrs and such have
 an invalid type, CommandIds don't. Changing their definition to have 0
 - analogous to the previous examples - as their invalid value is not a
 problem because CommandIds from pg_upgraded clusters may never be used
 for anything. Going from 2^32 to 2^32-1 possible CommandIds doesn't seem
 like a problem to me. Imo the CommandIds should have been defined that
 way from the start.

 Makes some sense?

I don't have a problem with this if other people think it's a good
idea.  But I think it needs a few +1s and not too many -1s first, and
so far (AFAIK) no one else has weighed in with an opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Josh Berkus
Peter,

 Other ideas?  Are there legitimate uses for SQL_ASCII?

Migrating from MySQL.  We've had some projects where we couldn't fix
MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
receiving side.  If it hadn't been available, the user would have given
up on Postgres.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] logical changeset generation v5

2013-09-05 Thread Andres Freund
Hi,

On 2013-09-03 11:40:57 -0400, Robert Haas wrote:
 On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  0005 wal_decoding: Log xl_running_xact's at a higher frequency than 
  checkpoints are done
  * benefits hot standby startup

I tried to update the patch to address the comments you made.

  0003 wal_decoding: Allow walsender's to connect to a specific database
  * biggest problem is how to specify the connection we connect
to. Currently with the patch walsender connects to a database if it's
not named replication (via dbname). Perhaps it's better to invent a
replication_dbname parameter?

I've updated the patch so that it extends the replication startup parameter
to accept not only a boolean but also the value 'database'. In the latter
case it will connect to the database specified in dbname.
As discussed downthread, this patch doesn't have an immediate advantage
for users until the changeset extraction patch itself is
applied. Whether or not it should be applied separately is unclear.
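As a usage illustration (hypothetical client code; "replication=database" in
the connection string is exactly what the attached patch adds, and the dbname
shown is arbitrary):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* a walsender connection bound to one specific database */
	PGconn *conn = PQconnectdb("replication=database dbname=postgres");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
	PQfinish(conn);
	return 0;
}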

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 2aa39548f5990e9663e95f011f25a89a0dc8d8a1 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 2/9] wal_decoding: Allow walsender's to connect to a specific
 database

Extend the existing 'replication' parameter to not only allow a boolean value
but also database. If the latter is specified we connect to the database
specified in 'dbname'.

This is useful for future walsender commands which need database interaction,
e.g. changeset extraction.
---
 doc/src/sgml/protocol.sgml | 24 +---
 src/backend/postmaster/postmaster.c| 23 ++--
 .../libpqwalreceiver/libpqwalreceiver.c|  4 +-
 src/backend/replication/walsender.c| 43 +++---
 src/backend/utils/init/postinit.c  |  5 +++
 src/bin/pg_basebackup/pg_basebackup.c  |  4 +-
 src/bin/pg_basebackup/pg_receivexlog.c |  4 +-
 src/bin/pg_basebackup/receivelog.c |  4 +-
 src/include/replication/walsender.h|  1 +
 9 files changed, 89 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0b2e60e..2ea14e5 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1301,10 +1301,13 @@
 
 <para>
 To initiate streaming replication, the frontend sends the
-<literal>replication</> parameter in the startup message. This tells the
-backend to go into walsender mode, wherein a small set of replication commands
-can be issued instead of SQL statements. Only the simple query protocol can be
-used in walsender mode.
+<literal>replication</> parameter in the startup message. A boolean value
+of <literal>true</> tells the backend to go into walsender mode, wherein a
+small set of replication commands can be issued instead of SQL statements. Only
+the simple query protocol can be used in walsender mode.
+Passing a <literal>database</> as the value instructs walsender to connect to
+the database specified in the <literal>dbname</> parameter which will in future
+allow some additional commands to the ones specified below to be run.
 
 The commands accepted in walsender mode are:
 
@@ -1314,7 +1317,7 @@ The commands accepted in walsender mode are:
 <listitem>
  <para>
   Requests the server to identify itself. Server replies with a result
-  set of a single row, containing three fields:
+  set of a single row, containing four fields:
  </para>

  <para>
@@ -1356,6 +1359,17 @@ The commands accepted in walsender mode are:
   </listitem>
   </varlistentry>
 
+  <varlistentry>
+  <term>
+   dbname
+  </term>
+  <listitem>
+  <para>
+   Database connected to or NULL.
+  </para>
+  </listitem>
+  </varlistentry>
+
   </variablelist>
  </para>
 </listitem>
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 01d2618..a31b01d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1894,10 +1894,21 @@ retry1:
 				port->cmdline_options = pstrdup(valptr);
 			else if (strcmp(nameptr, "replication") == 0)
 			{
-				if (!parse_bool(valptr, &am_walsender))
+				/*
+				 * Due to backward compatibility concerns replication is a
+				 * hybrid beast which allows the value to be either a boolean
+				 * or the string 'database'. The latter connects to a specific
+				 * database which is e.g. required for changeset extraction.
+				 */
+				if (strcmp(valptr, "database") == 0)
+				{
+					am_walsender = true;
+					am_db_walsender = true;
+				}
+				else if (!parse_bool(valptr, &am_walsender))
 					ereport(FATAL,
 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-							 errmsg("invalid value for boolean option \"replication\"")));
+		

Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread k...@rice.edu
On Thu, Sep 05, 2013 at 09:42:17AM -0700, Josh Berkus wrote:
 Peter,
 
  Other ideas?  Are there legitimate uses for SQL_ASCII?
 
 Migrating from MySQL.  We've had some projects where we couldn't fix
 MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
 receiving side.  If it hadn't been available, the user would have given
 up on Postgres.
 
+++1  :)

Ken




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Josh Berkus
On 09/05/2013 10:02 AM, Alvaro Herrera wrote:
 Joshua D. Drake wrote:
 iconv?
 
 Command Prompt helped a customer normalize encodings in their data,
 which was a mixture of Latin1 and UTF8.  PGLoader was used for this, in
 two stages; the first run in UTF8 saved the rejected data to a file
 which was loaded in the second run as Latin1.  This worked like a charm.

There are certainly alternatives.  But all of the alternatives increase
the cost of the migration (either in staff time or in downtime), which
increases the likelihood that the organization will abandon the migration.

Anyway, I think we've established that there are enough legitimate
uses for SQL_ASCII that we can't casually discard it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread k...@rice.edu
On Thu, Sep 05, 2013 at 09:53:18AM -0700, Joshua D. Drake wrote:
 
 On 09/05/2013 09:42 AM, Josh Berkus wrote:
 
 Peter,
 
 Other ideas?  Are there legitimate uses for SQL_ASCII?
 
 Migrating from MySQL.  We've had some projects where we couldn't fix
 MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
 receiving side.  If it hadn't been available, the user would have given
 up on Postgres.
 
 iconv?
 

Yes, you can use iconv but then you have to check that it generated
values that do not break your system including the application logic.
That can prove a major stumbling block to changing DBs.

Regards,
Ken




[HACKERS] Where can I find the code for extern OidFunctionCall7?

2013-09-05 Thread arthernan
OidFunctionCall7 is called from costsize.c, in the function cost_index. From
the comments, I understand that there is an indirection mechanism here to
allow for different index access methods. Looking at fmgr.h, where it is
declared, it is not clear to me where I would ultimately find the code.

Any pointers will be greatly appreciated

	/*
	 * Call index-access-method-specific code to estimate the processing cost
	 * for scanning the index, as well as the selectivity of the index (ie,
	 * the fraction of main-table tuples we will have to retrieve) and its
	 * correlation to the main-table tuple order.
	 */
	OidFunctionCall7(index->amcostestimate,
					 PointerGetDatum(root),
					 PointerGetDatum(path),
					 Float8GetDatum(loop_count),
					 PointerGetDatum(&indexStartupCost),
					 PointerGetDatum(&indexTotalCost),
					 PointerGetDatum(&indexSelectivity),
					 PointerGetDatum(&indexCorrelation));



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Where-can-I-find-the-code-for-extern-OidFunctionCall7-tp5769737.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?

2013-09-05 Thread arthernan
Thank you guys!



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Where-can-I-find-the-code-for-extern-OidFunctionCall7-tp5769737p5769753.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] Analysis on backend-private memory usage (and a patch)

2013-09-05 Thread Heikki Linnakangas

On 05.09.2013 17:22, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com  writes:

I ran pgbench for ten seconds, and printed the number of tuples in each
catcache after that:
[ very tiny numbers ]


I find these numbers a bit suspicious.  For example, we must have hit at
least 13 different system catalogs, and more than that many indexes, in
the course of populating the syscaches you show as initialized.  How is
it there are only 4 entries in the RELOID cache?  I wonder if there were
cache resets going on.


Relcache is loaded from the init file. The lookups of those system 
catalogs and indexes never hit the syscache, because the entries are 
found in relcache. When I delete the init file and launch psql, without 
running any queries, I get this (caches with 0 tups left out):


LOG:  cache id 45 on pg_class: 7 tups
LOG:  cache id 32 on pg_index: 63 tups
LOG:  cache id 21 on pg_database: 1 tups
LOG:  cache id 11 on pg_authid: 1 tups
LOG:  cache id 10 on pg_authid: 1 tups
LOG:  cache id 2 on pg_am: 1 tups


A larger issue is that pgbench might not be too representative.  In
a quick check, I find that cache 37 (OPERNAMENSP) starts out empty,
and contains 1 entry after select 2=2, which is expected since
the operator-lookup code will start by looking for int4 = int4 and
will get an exact match.  But after select 2=2::numeric there are
61 entries, as a byproduct of having thumbed through every binary
operator named = to resolve the ambiguous match.  We went so far
as to install another level of caching in front of OPERNAMENSP because
it was getting too expensive to deal with heavily-overloaded operators
like that one.  In general, we've had to spend enough sweat on optimizing
catcache searches to make me highly dubious of any claim that the caches
are usually almost empty.

I understand your argument that resizing is so cheap that it might not
matter, but nonetheless reducing these caches as far as you're suggesting
sounds to me to be penny-wise and pound-foolish.  I'm okay with setting
them on the small side rather than on the large side as they are now, but
not with choosing sizes that are guaranteed to result in resizing cycles
during startup of any real app.


Ok, committed the attached.

To choose the initial sizes, I put a WARNING into the rehash function, 
ran the regression suite, and adjusted the sizes so that most regression 
tests run without rehashing. With the attached patch, 18 regression 
tests cause rehashing (see regression.diffs). The ones that do are 
because they are exercising some parts of the system more than a typical 
application would do: the enum regression test for example causes 
rehashes of the pg_enum catalog cache, and the aggregate regression test 
causes rehashing of pg_aggregate, and so on. A few regression tests do a 
database-wide VACUUM or ANALYZE; those touch all relations, and cause a 
rehash of pg_class and pg_index.


- Heikki
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index cca0572..c467f11 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -728,21 +728,20 @@ InitCatCache(int id,
 			 int nkeys,
 			 const int *key,
 			 int nbuckets)
 {
 	CatCache   *cp;
 	MemoryContext oldcxt;
 	int			i;
 
 	/*
-	 * nbuckets is the number of hash buckets to use in this catcache.
-	 * Currently we just use a hard-wired estimate of an appropriate size for
-	 * each cache; maybe later make them dynamically resizable?
+	 * nbuckets is the initial number of hash buckets to use in this catcache.
+	 * It will be enlarged later if it becomes too full.
 	 *
 	 * nbuckets must be a power of two.  We check this via Assert rather than
 	 * a full runtime check because the values will be coming from constant
 	 * tables.
 	 *
 	 * If you're confused by the power-of-two check, see comments in
 	 * bitmapset.c for an explanation.
 	 */
 	Assert(nbuckets > 0 && (nbuckets & -nbuckets) == nbuckets);
@@ -769,19 +768,20 @@ InitCatCache(int id,
 		on_proc_exit(CatCachePrintStats, 0);
 #endif
 	}
 
 	/*
 	 * allocate a new cache structure
 	 *
 	 * Note: we rely on zeroing to initialize all the dlist headers correctly
 	 */
-	cp = (CatCache *) palloc0(sizeof(CatCache) + nbuckets * sizeof(dlist_head));
+	cp = (CatCache *) palloc0(sizeof(CatCache));
+	cp->cc_bucket = palloc0(nbuckets * sizeof(dlist_head));
 
 	/*
 	 * initialize the cache's relation information for the relation
 	 * corresponding to this cache, and initialize some of the new cache's
 	 * other internal fields.  But don't open the relation yet.
 	 */
 	cp->id = id;
 	cp->cc_relname = "(not known yet)";
 	cp->cc_reloid = reloid;
@@ -808,18 +808,55 @@ InitCatCache(int id,
 	/*
 	 * back to the old context before we return...
 	 */
 	MemoryContextSwitchTo(oldcxt);
 
 	return cp;
 }
 
 /*
+ * Enlarge a catcache, doubling the number of buckets.
+ */
+static void
+RehashCatCache(CatCache *cp)
+{
+	dlist_head *newbucket;
+	int			newnbuckets;
+	int			i;
+
+	

Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Joshua D. Drake


On 09/05/2013 09:42 AM, Josh Berkus wrote:


Peter,


Other ideas?  Are there legitimate uses for SQL_ASCII?


Migrating from MySQL.  We've had some projects where we couldn't fix
MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
receiving side.  If it hadn't been available, the user would have given
up on Postgres.


iconv?






--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats




[HACKERS] missing SSI check in heapgettup_pagemode() ?

2013-09-05 Thread Alvaro Herrera
I was scrolling heapgettup() and heapgettup_pagemode() in parallel, and
noticed that the former calls CheckForSerializableConflictOut() for
tuples read, and the latter doesn't.  Is there a reason for this
discrepancy?  If so, I think it warrants a comment about the missing
check.

Thanks,

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] lcr v5 - primary/candidate key in relcache

2013-09-05 Thread Andres Freund
Hi Kevin,

On 2013-09-03 11:40:57 -0400, Robert Haas wrote:
 On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  0007 wal_decoding: Add information about a tables primary key to struct 
  RelationData
  * Could be used in the matview refresh code

 I think you and Kevin should discuss whether this is actually the
 right way to do this.  ISTM that if logical replication and
 materialized views end up selecting different approaches to this
 problem, everybody loses.

The patch we're discussing here adds a new struct RelationData field
called 'rd_primary' (should possibly be renamed) which contains
information about the best candidate key available for a table.

From the header comments:
/*
 * The 'best' primary or candidate key that has been found, only set
 * correctly if RelationGetIndexList has been called/rd_indexvalid > 0.
 *
 * Indexes are chosen in the following order:
 * * Primary Key
 * * oid index
 * * the first (OID order) unique, immediate, non-partial and
 *   non-expression index over one or more NOT NULL'ed columns
 */
Oid rd_primary;

I thought we could use that in matview.c:refresh_by_match_merge() to
select a more efficient diff if rd_primary points to a valid index. In that
case you'd only need to compare that index's columns, which should result
in a more efficient plan.

Maybe it's also useful in other cases for you?

If it's relevant at all, would you like to have a different priority
list than the one above?

Regards,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From ee85b3bd8d8cc25fa547c004f6f6ea6bccef7c66 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 4/9] wal_decoding: Add information about a tables primary key
 to struct RelationData

'rd_primary' now contains the Oid of an index over uniquely identifying
columns. Several types of indexes are interesting and are collected in that
order:
* Primary Key
* oid index
* the first (OID order) unique, immediate, non-partial and
  non-expression index over one or more NOT NULL'ed columns

To gather the rd_primary value, RelationGetIndexList() needs to have been called.

This is helpful because for logical replication we frequently - on the sending
and receiving side - need to look up that index, and RelationGetIndexList already
gathers all the necessary information.

This could be used to replace tablecmd.c's transformFkeyGetPrimaryKey, but
would change the meaning of that, so it seems to require additional discussion.
---
 src/backend/utils/cache/relcache.c | 52 +++---
 src/include/utils/rel.h| 12 +
 2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 66fb63b..c588c29 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3465,7 +3465,9 @@ RelationGetIndexList(Relation relation)
 	ScanKeyData skey;
 	HeapTuple	htup;
 	List	   *result;
-	Oid			oidIndex;
+	Oid			oidIndex = InvalidOid;
+	Oid			pkeyIndex = InvalidOid;
+	Oid			candidateIndex = InvalidOid;
 	MemoryContext oldcxt;
 
 	/* Quick exit if we already computed the list. */
@@ -3522,17 +3524,61 @@ RelationGetIndexList(Relation relation)
 		Assert(!isnull);
 		indclass = (oidvector *) DatumGetPointer(indclassDatum);
 
+		if (!IndexIsValid(index))
+			continue;
+
 		/* Check to see if it is a unique, non-partial btree index on OID */
-		if (IndexIsValid(index) &&
-			index->indnatts == 1 &&
+		if (index->indnatts == 1 &&
 			index->indisunique && index->indimmediate &&
 			index->indkey.values[0] == ObjectIdAttributeNumber &&
 			indclass->values[0] == OID_BTREE_OPS_OID &&
 			heap_attisnull(htup, Anum_pg_index_indpred))
 			oidIndex = index->indexrelid;
+
+		if (index->indisunique &&
+			index->indimmediate &&
+			heap_attisnull(htup, Anum_pg_index_indpred))
+		{
+			/* always prefer primary keys */
+			if (index->indisprimary)
+				pkeyIndex = index->indexrelid;
+			else if (!OidIsValid(pkeyIndex) &&
+					 !OidIsValid(oidIndex) &&
+					 !OidIsValid(candidateIndex))
+			{
+				int			key;
+				bool		found = true;
+				for (key = 0; key < index->indnatts; key++)
+				{
+					int16		attno = index->indkey.values[key];
+					Form_pg_attribute attr;
+					/* internal column, like oid */
+					if (attno <= 0)
+						continue;
+
+					attr = relation->rd_att->attrs[attno - 1];
+					if (!attr->attnotnull)
+					{
+						found = false;
+						break;
+					}
+				}
+				if (found)
+					candidateIndex = index->indexrelid;
+			}
+		}
 	}
 
 	systable_endscan(indscan);
+
+	if (OidIsValid(pkeyIndex))
+		relation->rd_primary = pkeyIndex;
+	/* prefer oid indexes over normal candidate ones */
+	else if (OidIsValid(oidIndex))
+		relation->rd_primary = oidIndex;
+	else if (OidIsValid(candidateIndex))
+		

Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Robert Haas
On Thu, Sep 5, 2013 at 12:59 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-05 12:44:18 -0400, Robert Haas wrote:
 On Wed, Sep 4, 2013 at 12:07 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-09-03 11:40:57 -0400, Robert Haas wrote:
   0002 wal_decoding: Introduce InvalidCommandId and declare that to be 
   the new maximum for CommandCounterIncrement
 
  I'm still unconvinced we want this.
 
  Ok, so the reason for the existence of this patch is that currently
  there is no way to represent an unset CommandId. This is a problem for
  the following patches because we need to log the cmin, cmax of catalog
  rows and obviously there can be rows where cmax is unset.

 For heap tuples, we solve this problem by using flag bits.  Why not
 adopt the same approach?

 We can, while it makes the amount of data stored/logged slightly larger
 and it seems to lead to less idiomatic code to me, so if there's another
 -1 I'll go that way.

OK.  Consider me more of a -0 than a -1.  Like I say, I don't really
want to block it; I just don't feel comfortable committing it unless a
few other people say something like "I don't see a problem with that."
 Or maybe point me to relevant changeset extraction code that's going
to get messier.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?

2013-09-05 Thread Jeff Janes
On Wed, Sep 4, 2013 at 10:06 PM,  wangs...@highgo.com.cn wrote:
 于 2013-09-04 23:41, Jeff Janes 回复:

 On Tue, Sep 3, 2013 at 9:08 PM,  wangs...@highgo.com.cn wrote:

 Hi, Hackers!

 I find that it takes a long time when I increase the scale of a numeric
 datatype.
 By checking the code, I found that's because it needs to rewrite that
 table's file.
 After checking that table's data file, I found only parameter n_header
 changed.
 And, I found the data in that numeric field never changed.
 So I think it's not necessary to rewrite the table's file in this case.

 Anyone has more idea about this, please come to talk about this!



 Jeff Janes jeff.ja...@gmail.com wrote:

 This was fixed in version 9.2.  You must be using an older version.

 Cheers,

 Jeff


 Thanks for your reply.

 To declare a column of type numeric use the syntax:
 NUMERIC(precision, scale).
 What I said is this scale,not yours.

You're right, I had tested a change in precision, not in scale.  Sorry.

In order to avoid the rewrite, the code would have to be changed to
look up the column definition and if it specifies the scale, then
ignore the per-row n_header, and look at the n_header only if the
column is NUMERIC with no precision or scale.  That should
conceptually be possible, but I don't know how hard it would be to
implement--it sounds pretty invasive to me.  Then if the column was
altered from NUMERIC with scale to be a plain NUMERIC, it would have
to rewrite the table to enforce the row-wise scale to match the old
column-wise scale.  Whereas now that alter doesn't need a rewrite.
I don't know if this would be an overall gain or not.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?

2013-09-05 Thread Robert Haas
On Thu, Sep 5, 2013 at 12:38 PM, arthernan arther...@hotmail.com wrote:
 OidFunctionCall7 is called from costsize.c, in function void cost_index. From
 the comments, I understand that there is an indirection mechanism here to
 allow for different index access methods. Looking at fmgr.h, where it is
 declared, it is not clear to me where I would ultimately find the code.

 Any pointers will be greatly appreciated

Use 'git grep'.  It's a macro defined in fmgr.h.
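
To spell that out a little: the macro in fmgr.h just supplies a default
collation and forwards to the Coll variant in fmgr.c, roughly like this
(paraphrased, not quoted verbatim):

	#define OidFunctionCall7(functionId, arg1, arg2, arg3, arg4, arg5, arg6, arg7) \
		OidFunctionCall7Coll(functionId, InvalidOid, \
							 arg1, arg2, arg3, arg4, arg5, arg6, arg7)

	Datum
	OidFunctionCall7Coll(Oid functionId, Oid collation, Datum arg1, Datum arg2,
						 Datum arg3, Datum arg4, Datum arg5, Datum arg6,
						 Datum arg7)
	{
		FmgrInfo	flinfo;

		fmgr_info(functionId, &flinfo);		/* look up the function by OID */
		return FunctionCall7Coll(&flinfo, collation, arg1, arg2, arg3, arg4,
								 arg5, arg6, arg7);
	}

So the code that ultimately runs is whatever function the OID passed by
cost_index points at (pg_am.amcostestimate); for a btree index that is
btcostestimate() in src/backend/utils/adt/selfuncs.c.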

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] missing SSI check in heapgettup_pagemode() ?

2013-09-05 Thread Heikki Linnakangas

On 05.09.2013 20:30, Alvaro Herrera wrote:

I was scrolling heapgettup() and heapgettup_pagemode() in parallel, and
noticed that the former calls CheckForSerializableConflictOut() for
tuples read, and the latter doesn't.  Is there a reason for this
discrepancy?  If so, I think it warrants a comment about the missing
check.


In page-at-a-time mode, the CheckForSerializableConflictOut() is done in 
heapgetpage(). Same with the visibility checks.
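
For reference, the per-tuple check in heapgetpage() looks roughly like this
(abridged from heapam.c, not quoted verbatim), which is why
heapgettup_pagemode() can skip it:

	loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
	loctup.t_len = ItemIdGetLength(lpp);
	ItemPointerSet(&(loctup.t_self), page, lineoff);

	if (all_visible)
		valid = true;
	else
		valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);

	CheckForSerializableConflictOut(valid, scan->rs_rd, &loctup,
									buffer, snapshot);

	if (valid)
		scan->rs_vistuples[ntup++] = lineoff;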


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange IS NULL behaviour

2013-09-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Sep 4, 2013 at 9:26 PM, Bruce Momjian br...@momjian.us wrote:
 I have not heard any feedback on this patch, so I would like to apply it
 to give us a nested ROW/IS NULL API we can document.  It would have to
 be marked in the release notes as a backward incompatibility.

 I don't have time to look at this in detail right now, but I think
 that's considerably premature.  I'm not convinced that we understand
 all of the problems in this area are yet, let alone the solutions.
 And I notice that you haven't substantively responded to some of Tom's
 concerns.

In particular, I don't think it's a good idea to change
eval_const_expression's behavior in an incompatible way that simply
makes it differently inconsistent with other IS NULL code paths.
We should leave things alone until we have a full fix, otherwise we'll
just be breaking people's apps repeatedly.

I would also say that allowing eval_const_expression to drive what we
think the right behavior is is completely backwards, because it's
about the least performance-critical case.  You should be looking at
execQual.c first.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 OK.  Consider me more of a -0 than a -1.  Like I say, I don't really
 want to block it; I just don't feel comfortable committing it unless a
 few other people say something like I don't see a problem with that.

FWIW, I've always thought it was a wart that there wasn't a recognized
InvalidCommandId value.  It was never pressing to fix it before, but
if LCR needs it, let's do so.  I definitely *don't* find it cleaner to
eat up another flag bit to avoid that.  We don't have many to spare.

Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
but I suppose we can't have that without an on-disk compatibility break.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] missing SSI check in heapgettup_pagemode() ?

2013-09-05 Thread Kevin Grittner
Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 I was scrolling heapgettup() and heapgettup_pagemode() in parallel, and
 noticed that the former calls CheckForSerializableConflictOut() for
 tuples read, and the latter doesn't.  Is there a reason for this
 discrepancy?  If so, I think it warrants a comment about the missing
 check.

That is covered in heapgetpage().

Do you want to offer a patch to add a comment to that effect, or
would you prefer that I do?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Andres Freund
Hi,

Thanks for weighin in.

On 2013-09-05 14:21:33 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  OK.  Consider me more of a -0 than a -1.  Like I say, I don't really
  want to block it; I just don't feel comfortable committing it unless a
  few other people say something like I don't see a problem with that.
 
 FWIW, I've always thought it was a wart that there wasn't a recognized
 InvalidCommandId value.  It was never pressing to fix it before, but
 if LCR needs it, let's do so.

Yes, it's a bit anomalous compared to the other types.

  I definitely *don't* find it cleaner to
 eat up another flag bit to avoid that.  We don't have many to spare.

It wouldn't need to be a flag bit in any existing struct, so that's not
a problem.

 Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
 but I suppose we can't have that without an on-disk compatibility break.

The patch actually does change it exactly that way. My argument for that
being valid is that CommandIds don't play any role outside of their own
transaction. Now, somebody could argue that SELECT cmin, cmax can be
done outside the transaction, but: Those values are already pretty much
meaningless today since cmin/cmax have been merged. They also don't
check whether the field is initialized at all.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-09-05 14:21:33 -0400, Tom Lane wrote:
 Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
 but I suppose we can't have that without an on-disk compatibility break.

 The patch actually does change it exactly that way.

Oh.  I hadn't looked at the patch, but I had (mis)read what Robert said
to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF
while leaving FirstCommandId alone.  That would make more sense to me as
(1) it doesn't change the interpretation of anything that's (likely to be)
on disk; (2) it allows the check for overflow in CommandCounterIncrement
to not involve recovering from an *actual* overflow.  With the horsing
around we've been seeing from the gcc boys lately, I don't have a warm
feeling about whether they won't break that test someday on the grounds
that overflow is undefined behavior.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Small catcache optimization

2013-09-05 Thread Andres Freund
Hi,

Heikki's catcache rehashing stuff reminded me that I'd posted an
optimization to catcache (20121220153555.gh4...@awork2.anarazel.de) some
time back which I didn't have energy to pursue at that point.

I've brushed the patch up a bit and verified it still gives me a
performance improvement. It's still about 2% in a readonly pgbench on my
elderly laptop.

There are basically two tricks in the patch:
1) Don't always copy the cache's ScanKey to the stack. Instead pass
   an array of arguments around. That gets rid of a good amount of
   memcpy()ing in the common, cached case.
2) If we have to memcpy() because we need to pass a ScanKey to
   systable_*, copy only cache->cc_nkey * sizeof(ScanKeyData) instead of
   always copying the maximum size.

It'd be nicer to get rid of the mostly copied HeapKeyTestArg, but I don't
immediately see how.
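
A minimal sketch of trick 2, with made-up local names rather than the
patch's own, in case it helps review:

	/*
	 * Only materialize a ScanKey when we actually have to scan the
	 * underlying relation; copy just the slots this cache uses, then
	 * drop the caller-supplied Datums into place.
	 */
	ScanKeyData cur_skey[CATCACHE_MAXKEYS];
	int			i;

	memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
	for (i = 0; i < cache->cc_nkeys; i++)
		cur_skey[i].sk_argument = arguments[i];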

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From df40dab17e7582ab597214d3c7fcefd953dcc9f7 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Thu, 5 Sep 2013 20:30:44 +0200
Subject: [PATCH] Improve catalog cache lookup performance.

Do so by not copying the entire ScanKey upon entering SearchCatCache,
SearchCatCacheList. Instead perform operations like hashing on a smaller array
containing only the passed in Datums.

When a lookup into the underlying relations needs to happen, copy the Datums
into a local copy of the ScanKey as before, but use a smaller memcpy() if
possible.
---
 src/backend/utils/cache/catcache.c | 120 -
 src/include/access/valid.h |  56 +
 2 files changed, 122 insertions(+), 54 deletions(-)

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index c467f11..fb7eaa2 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -74,7 +74,7 @@ static CatCacheHeader *CacheHdr = NULL;
 
 
 static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
-			 ScanKey cur_skey);
+		   Datum *argument);
 static uint32 CatalogCacheComputeTupleHashValue(CatCache *cache,
   HeapTuple tuple);
 
@@ -174,7 +174,7 @@ GetCCHashEqFuncs(Oid keytype, PGFunction *hashfunc, RegProcedure *eqfunc)
  * Compute the hash value associated with a given set of lookup keys
  */
 static uint32
-CatalogCacheComputeHashValue(CatCache *cache, int nkeys, ScanKey cur_skey)
+CatalogCacheComputeHashValue(CatCache *cache, int nkeys, Datum *argument)
 {
 	uint32		hashValue = 0;
 	uint32		oneHash;
@@ -189,28 +189,28 @@ CatalogCacheComputeHashValue(CatCache *cache, int nkeys, ScanKey cur_skey)
 		case 4:
 			oneHash =
 				DatumGetUInt32(DirectFunctionCall1(cache->cc_hashfunc[3],
-												   cur_skey[3].sk_argument));
+												   argument[3]));
 			hashValue ^= oneHash << 24;
 			hashValue ^= oneHash >> 8;
 			/* FALLTHROUGH */
 		case 3:
 			oneHash =
 				DatumGetUInt32(DirectFunctionCall1(cache->cc_hashfunc[2],
-												   cur_skey[2].sk_argument));
+												   argument[2]));
 			hashValue ^= oneHash << 16;
 			hashValue ^= oneHash >> 16;
 			/* FALLTHROUGH */
 		case 2:
 			oneHash =
 				DatumGetUInt32(DirectFunctionCall1(cache->cc_hashfunc[1],
-												   cur_skey[1].sk_argument));
+												   argument[1]));
 			hashValue ^= oneHash << 8;
 			hashValue ^= oneHash >> 24;
 			/* FALLTHROUGH */
 		case 1:
 			oneHash =
 				DatumGetUInt32(DirectFunctionCall1(cache->cc_hashfunc[0],
-												   cur_skey[0].sk_argument));
+												   argument[0]));
 			hashValue ^= oneHash;
 			break;
 		default:
@@ -229,17 +229,14 @@ CatalogCacheComputeHashValue(CatCache *cache, int nkeys, ScanKey cur_skey)
 static uint32
 CatalogCacheComputeTupleHashValue(CatCache *cache, HeapTuple tuple)
 {
-	ScanKeyData cur_skey[CATCACHE_MAXKEYS];
+	Datum		arguments[CATCACHE_MAXKEYS];
 	bool		isNull = false;
 
-	/* Copy pre-initialized overhead data for scankey */
-	memcpy(cur_skey, cache->cc_skey, sizeof(cur_skey));
-
 	/* Now extract key fields from tuple, insert into scankey */
 	switch (cache->cc_nkeys)
 	{
 		case 4:
-			cur_skey[3].sk_argument =
+			arguments[3] =
 (cache->cc_key[3] == ObjectIdAttributeNumber)
 ? ObjectIdGetDatum(HeapTupleGetOid(tuple))
 : fastgetattr(tuple,
@@ -249,7 +246,7 @@ CatalogCacheComputeTupleHashValue(CatCache *cache, HeapTuple tuple)
 			Assert(!isNull);
 			/* FALLTHROUGH */
 		case 3:
-			cur_skey[2].sk_argument =
+			arguments[2] =
 (cache->cc_key[2] == ObjectIdAttributeNumber)
 ? ObjectIdGetDatum(HeapTupleGetOid(tuple))
 : fastgetattr(tuple,
@@ -259,7 +256,7 @@ CatalogCacheComputeTupleHashValue(CatCache *cache, HeapTuple tuple)
 			Assert(!isNull);
 			/* FALLTHROUGH */
 		case 2:
-			cur_skey[1].sk_argument =
+			arguments[1] =
 (cache->cc_key[1] == ObjectIdAttributeNumber)
 ? ObjectIdGetDatum(HeapTupleGetOid(tuple))
 : fastgetattr(tuple,
@@ -269,7 +266,7 @@ 

Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Peter Geoghegan
On Thu, Sep 5, 2013 at 11:30 AM, Andres Freund and...@2ndquadrant.com wrote:
 Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
 but I suppose we can't have that without an on-disk compatibility break.

 The patch actually does change it exactly that way. My argument for that
 being valid is that CommandIds don't play any role outside of their own
 transaction.

Right. It seems like this should probably be noted in the
documentation under 5.4. System Columns -- I just realized that it
isn't.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hstore: Query speedups with Gin index

2013-09-05 Thread Blake Smith
Hi Peter,

Thanks for checking the tests. I wasn't able to duplicate your test
results. Did you run the hstore regression tests with the revised patch I
attached in the thread?  Attached is the output I got with the latest patch
applied.

Thanks!

Blake



On Tue, Sep 3, 2013 at 8:04 PM, Peter Eisentraut pete...@gmx.net wrote:

 On Tue, 2013-09-03 at 09:24 -0500, Blake Smith wrote:
  Thanks for the feedback everyone. I've attached the patch that we are
  now running in production to service our hstore include queries. We
  rebuilt the index to account for the on-disk incompatibility. I've
  submitted the patch to commitfest
  here: https://commitfest.postgresql.org/action/patch_view?id=1203

 I'm getting the attached failure from the hstore regression test.






-- 
Blake Smith
http://blakesmith.me
@blakesmith


regressions.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Andres Freund
On 2013-09-05 21:02:44 +0200, Andres Freund wrote:
 On 2013-09-05 14:37:01 -0400, Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
   On 2013-09-05 14:21:33 -0400, Tom Lane wrote:
   Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
   but I suppose we can't have that without an on-disk compatibility break.
  
   The patch actually does change it exactly that way.
  
  Oh.  I hadn't looked at the patch, but I had (mis)read what Robert said
  to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF
  while leaving FirstCommandId alone.  That would make more sense to me as
  (1) it doesn't change the interpretation of anything that's (likely to be)
  on disk; (2) it allows the check for overflow in CommandCounterIncrement
  to not involve recovering from an *actual* overflow.  With the horsing
  around we've been seeing from the gcc boys lately
 
 Ok, I can do it that way. LCR obviously shouldn't care.

It doesn't care to the point that the patch already does exactly what
you propose. It's just my memory that remembered things differently.

So, a very slightly updated patch attached.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 0592af4ae2e5a2bb1e4919560ab2768a18d88dd4 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH] Introduce InvalidCommandId

Do so by reducing the limit of allowed CommandIds by one and declare ~0 as
InvalidCommandId.

This decreases the possible number of commands in a transaction by one, which seems
unproblematic. It's also not a problem for pg_upgrade because cmin/cmax are
never looked at outside the context of their own transaction.
---
 src/backend/access/transam/xact.c | 4 ++--
 src/include/c.h   | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 31e868d..0591f3f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -766,12 +766,12 @@ CommandCounterIncrement(void)
 	if (currentCommandIdUsed)
 	{
 		currentCommandId += 1;
-		if (currentCommandId == FirstCommandId) /* check for overflow */
+		if (currentCommandId == InvalidCommandId)
 		{
 			currentCommandId -= 1;
 			ereport(ERROR,
 	(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-	 errmsg(cannot have more than 2^32-1 commands in a transaction)));
+	 errmsg(cannot have more than 2^32-2 commands in a transaction)));
 		}
 		currentCommandIdUsed = false;
 
diff --git a/src/include/c.h b/src/include/c.h
index 5961183..14bfdcd 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -368,6 +368,7 @@ typedef uint32 MultiXactOffset;
 typedef uint32 CommandId;
 
 #define FirstCommandId	((CommandId) 0)
+#define InvalidCommandId	(~(CommandId)0)
 
 /*
  * Array indexing support
-- 
1.8.3.251.g1462b67


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Andres Freund
On 2013-09-05 14:37:01 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-09-05 14:21:33 -0400, Tom Lane wrote:
  Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
  but I suppose we can't have that without an on-disk compatibility break.
 
  The patch actually does change it exactly that way.
 
 Oh.  I hadn't looked at the patch, but I had (mis)read what Robert said
 to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF
 while leaving FirstCommandId alone.  That would make more sense to me as
 (1) it doesn't change the interpretation of anything that's (likely to be)
 on disk; (2) it allows the check for overflow in CommandCounterIncrement
 to not involve recovering from an *actual* overflow.  With the horsing
 around we've been seeing from the gcc boys lately

Ok, I can do it that way. LCR obviously shouldn't care.

 I don't have a warm
 feeling about whether they won't break that test someday on the grounds
 that overflow is undefined behavior.

Unsigned overflow is pretty strictly defined, so I don't see much danger
there. Also, we'd feel the pain pretty definitely with xids...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?

2013-09-05 Thread Greg Stark
On Thu, Sep 5, 2013 at 6:41 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 Then if the column was
 altered from NUMERIC with scale to be a plain NUMERIC, it would have
 to rewrite the table to enforce the row-wise scale to match the old
 column-wise scale.  Where as now that alter doesn't need a re-write.
 I don't know if this would be an overall gain or not.


We've talked about cases like this in the past. It's mostly a SOP and I
think it may already be on the TODO.

The main difficulty is that Postgres is very extensible. So to implement
this you need to think bigger than NUMERIC. It should also be possible to
alter a column from varchar(5) to varchar(10) for example (but not the
other way around).

One way to do it would be to extend pg_type to have another column that
specifies a function. That function would take the old and new typmod
(which is what stores the 5 in varchar(5)) and tell the server whether it's
a safe change to make without rechecking.
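
Something along these lines, purely as a sketch -- the callback name, the
return values, and the idea of a new pg_type column are all hypothetical,
not existing API:

	typedef enum TypmodChangeClass
	{
		TYPMOD_CHANGE_NOOP,		/* binary compatible, no recheck needed */
		TYPMOD_CHANGE_RECHECK,	/* re-validate rows, but no rewrite */
		TYPMOD_CHANGE_REWRITE	/* fall back to a full table rewrite */
	} TypmodChangeClass;

	/*
	 * Hypothetical per-type callback: given the old and new typmod, say
	 * how expensive the ALTER has to be.  For varchar, growing the limit
	 * is free and shrinking it only needs a recheck.  (This ignores the
	 * VARHDRSZ offset that real varchar typmods carry, for clarity.)
	 */
	static TypmodChangeClass
	varchar_typmod_change(int32 oldtypmod, int32 newtypmod)
	{
		if (newtypmod < 0 || (oldtypmod >= 0 && newtypmod >= oldtypmod))
			return TYPMOD_CHANGE_NOOP;
		return TYPMOD_CHANGE_RECHECK;
	}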

Another way might be to overload the cast functions, though they currently
receive no information about the typmod. They might have the benefit of
being able to handle things like varchar(5) - text though.

But it has to be that general. Any data type should be able to specify
whether an old and new typmod are compatible.

-- 
greg


Re: [HACKERS] Eliminating pg_catalog.pg_rewrite.ev_attr

2013-09-05 Thread Kevin Grittner
Kevin Grittner kgri...@ymail.com wrote:

 New patch attached.



Pushed (to master only).


277607d600fb71e25082b94302ca1716403cd0bc


--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?

2013-09-05 Thread Alvaro Herrera
Greg Stark escribió:

 The main difficulty is that Postgres is very extensible. So to implement
 this you need to think bigger than NUMERIC. It should also be possible to
 alter a column from varchar(5) to varchar(10) for example (but not the
 other way around).

We already allow that.  See commits
8f9fe6edce358f7904e0db119416b4d1080a83aa and
3cc0800829a6dda5347497337b0cf43848da4acf

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Bruce Momjian
On Thu, Sep  5, 2013 at 06:14:33PM +0200, Magnus Hagander wrote:
  I have developed the attached patch which implements an auto-tuned
  effective_cache_size which is 4x the size of shared buffers.  I had to
  set effective_cache_size to its old 128MB default so the EXPLAIN
  regression tests would pass unchanged.
 
 That's not really autotuning though. ISTM that making the *default* 4
 x shared_buffers might make perfect sense, but do we really need to
 hijack the value of -1 for that? That might be useful for some time
 when we have actual autotuning, that somehow inspects the system and
 tunes it from there.

 I also don't think it should be called autotuning, when it's just a
 smarter default value.
 
 I like the feature, though, just not the packaging.

That auto-tuning text came from the wal_buffer documentation, which
does exactly this based on shared_buffers:

The contents of the WAL buffers are written out to disk at every
transaction commit, so extremely large values are unlikely to
provide a significant benefit.  However, setting this value to at
least a few megabytes can improve write performance on a busy
-- server where many clients are committing at once.  The auto-tuning
   ---
selected by the default setting of -1 should give reasonable
results in most cases.

I am fine with rewording and not using -1, but we should change the
wal_buffer default and documentation too then.  I am not sure what other
value than -1 to use?  0?  I figure if we ever get better auto-tuning,
we would just remove this functionality and make it better.
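
For concreteness, the behavior being discussed boils down to something like
this at the point where the GUC is resolved (a sketch, not the attached
patch itself; NBuffers and effective_cache_size are both measured in 8kB
pages):

	/* -1 means "auto": derive effective_cache_size from shared_buffers */
	if (effective_cache_size == -1)
		effective_cache_size = NBuffers * 4;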

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange IS NULL behaviour

2013-09-05 Thread Bruce Momjian
On Thu, Sep  5, 2013 at 02:14:39PM -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Wed, Sep 4, 2013 at 9:26 PM, Bruce Momjian br...@momjian.us wrote:
  I have not heard any feedback on this patch, so I would like to apply it
  to give us a nested ROW/IS NULL API we can document.  It would have to
  be marked in the release notes as a backward incompatibility.
 
  I don't have time to look at this in detail right now, but I think
  that's considerably premature.  I'm not convinced that we understand
  all of the problems in this area are yet, let alone the solutions.
  And I notice that you haven't substantively responded to some of Tom's
  concerns.
 
 In particular, I don't think it's a good idea to change
 eval_const_expression's behavior in an incompatible way that simply
 makes it differently inconsistent with other IS NULL code paths.

Let me summarize what eval_const_expressions_mutator() does:  it takes a
ROW() IS NULL construct, and converts it to individual _value_ IS NULL
clauses so they can be better optimized.  It changes:

ROW(x, ROW(y)) IS NULL

to

x IS NULL AND ROW(y) IS NULL

By removing this one level of ROW construct, it causes ROW(ROW(NULL)) IS
NULL to return true, while more nested cases do not.  It also tries to
short-circuit the IS NULL clause if it finds a literal NULL constant
inside the row.

My patch uses this short-circuit logic to return false for IS NULL if it
finds an embedded ROW construct.  This makes the code match the code in
ExecEvalNullTest(), which already treats embedded ROW values as
non-null, e.g. it already treats ROW(ROW(x)) IS NULL as false. 
Specifically, any code path that doesn't pass through
eval_const_expressions_mutator() and gets to execQual.c will return
false for this.

 We should leave things alone until we have a full fix, otherwise we'll
 just be breaking people's apps repeatedly.

Yes, we don't want that.

 I would also say that allowing eval_const_expression to drive what we
 think the right behavior is is completely backwards, because it's
 about the least performance-critical case.  You should be looking at
 execQual.c first.

I am making eval_const_expression match execQual.c, specifically
ExecEvalNullTest(), which calls heap_attisnull(), and looks at the NULL
indicator, which can't be true for an attribute containing a ROW value.

I am confused why others don't see what I am seeing.

Another possible fix would be to avoid the IS NULL value optimizer
expansion if a ROW construct is inside a ROW().  I have attached a patch
that does this for review.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
new file mode 100644
index 76c032c..54e59dc
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*** eval_const_expressions_mutator(Node *nod
*** 3100,3105 
--- 3100,3122 
  return makeBoolConst(false, false);
  			continue;
  		}
+ 
+ 		/* Do we have a ROW() expression in the row? */
+ 		if (type_is_rowtype(exprType(relem)))
+ 		{
+ 			/*
+ 			 * If this ROW() has an embedded ROW() value,
+ 			 * our optimization  of removing a level of ROW()
+ 			 * nesting causes ROW(ROW(NULL)) IS NULL to return
+ 			 * true, so just return the original clause.
+ 			 */
+ 			newntest = makeNode(NullTest);
+ 			newntest->arg = (Expr *) arg;
+ 			newntest->nulltesttype = ntest->nulltesttype;
+ 			newntest->argisrow = ntest->argisrow;
+ 			return (Node *) newntest;
+ 		}
+ 
  		newntest = makeNode(NullTest);
  		newntest->arg = (Expr *) relem;
  		newntest->nulltesttype = ntest->nulltesttype;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Extend namespace of valid guc names

2013-09-05 Thread Andres Freund
On 2013-02-26 13:08:54 +0100, Andres Freund wrote:
 Just for reference, that's the grammar for SET/SHOW:

 var_name:	ColId								{ $$ = $1; }
 			| var_name '.' ColId

 Any arguments whether we should try to validate postgres -c,
 set_config and SET/SHOW the same?


Any input here?

Currently set_config() and postgres -c basically don't have any
restrictions on the passed guc name whereas SHOW, SET and
postgresql.conf do.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Bruce Momjian
On Thu, Sep  5, 2013 at 12:48:54PM -0400, Tom Lane wrote:
 Magnus Hagander mag...@hagander.net writes:
  On Thu, Sep 5, 2013 at 3:01 AM, Bruce Momjian br...@momjian.us wrote:
  I have developed the attached patch which implements an auto-tuned
  effective_cache_size which is 4x the size of shared buffers.  I had to
  set effective_cache_size to its old 128MB default so the EXPLAIN
  regression tests would pass unchanged.
 
  That's not really autotuning though. ISTM that making the *default* 4
  x shared_buffers might make perfect sense, but do we really need to
  hijack the value of -1 for that? That might be useful for some time
  when we have actual autotuning, that somehow inspects the system and
  tunes it from there.
 
 Well, the real problem with this patch is that it documents what the
 auto-tuning algorithm is; without that commitment, just saying -1 means
 autotune might be fine.

OK, but I did this based on wal_buffers, which has a -1 default, calls
it auto-tuning, and explains how the default is computed.

 Did you consider the alternative of just tweaking initdb to insert a
 default for effective_cache_size that's 4x whatever it picks for
 shared_buffers?  That would probably be about 3 lines of code, and it
 wouldn't nail down any particular server-side behavior.

The problem there is that many users are told to tune shared_buffers,
but don't touch effective cache size.  Having initdb set the
effective_cache_size value would not help there.  Again, this is all
based on the auto-tuning of wal_buffers.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 regression

2013-09-05 Thread Bruce Momjian
On Mon, Aug 19, 2013 at 09:27:57PM -0400, Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
  I vote for adapting the patch to additionally zero out the file via
  write(). In your tests that seemed to perform at least as good as the
  old method... It also has the advantage that we can use it a littlebit
  more as a testbed for possibly using it for heap extensions one day.
  We're pretty early in the cycle, so I am not worried about this too much...
 
 I dunno, I'm pretty disappointed that this doesn't actually improve
 things.  Just following this casually, it looks like it might be some
 kind of locking issue in the kernel that's causing it to be slower; or
 at least some code path that isn't exercise terribly much and therefore
 hasn't been given the love that it should.
 
 Definitely interested in what Ts'o says, but if we can't figure out why
 it's slower *without* writing out the zeros, I'd say we punt on this
 until Linux and the other OS folks improve the situation.

FYI, the patch has been reverted.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Josh Berkus
On 09/05/2013 02:16 PM, Bruce Momjian wrote:
 Well, the real problem with this patch is that it documents what the
 auto-tuning algorithm is; without that commitment, just saying -1 means
 autotune might be fine.
 
 OK, but I did this based on wal_buffers, which has a -1 default, calls
 it auto-tuning, and explains how the default is computed.

I don't see a real problem with this.  For users who have set their
shared_buffers correctly, effective_cache_size should also be correct.

 The problem there is that many users are told to tune shared_buffers,
 but don't touch effective cache size.  Having initdb set the
 effective_cache_size value would not help there.  Again, this is all
 based on the auto-tuning of wal_buffers.

Standard advice we've given in the past is 25% shared buffers, 75%
effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
 Maybe we're changing the conventional calculation, but I thought I'd
point that out.
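
To put numbers on it: on a 16GB machine the old advice gives shared_buffers
= 4GB and effective_cache_size = 12GB; 3X shared_buffers reproduces that
12GB exactly, while 4X would claim the entire 16GB.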

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?

2013-09-05 Thread arthernan
The comments on this sourcebase are really good, plus it is very well
structured!

I read somewhere that you can actually store an execution plan by using
stored procedures. If that is the case. I would imagine that variables would
prevent the histograms from being used. I don't mind finding the code
myself, but... Where is the code that does that switch? Clues are welcome
too!!



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Where-can-I-find-the-code-for-extern-OidFunctionCall7-tp5769737p5769800.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?

2013-09-05 Thread arthernan
Never mind I think I found it.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Where-can-I-find-the-code-for-extern-OidFunctionCall7-tp5769737p5769801.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Merlin Moncure
On Thu, Sep 5, 2013 at 5:11 PM, Josh Berkus j...@agliodbs.com wrote:
 On 09/05/2013 02:16 PM, Bruce Momjian wrote:
 Well, the real problem with this patch is that it documents what the
 auto-tuning algorithm is; without that commitment, just saying -1 means
 autotune might be fine.

 OK, but I did this based on wal_buffers, which has a -1 default, calls
 it auto-tuning, and explains how the default is computed.

 I don't see a real problem with this.  For users who have set their
 shared_buffers correctly, effective_cache_size should also be correct.

Agreed.  I think -1 is the right setting for autotune as things stand
today. If we want something else, then we should change other settings
as well (like wal_buffers) and that is not in the scope of this patch.

 The problem there is that many users are told to tune shared_buffers,
 but don't touch effective cache size.  Having initdb set the
 effective_cache_size value would not help there.  Again, this is all
 based on the auto-tuning of wal_buffers.

 Standard advice we've given in the past is 25% shared buffers, 75%
 effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
  Maybe we're changing the conventional calculation, but I thought I'd
 point that out.

This was debated upthread.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Bruce Momjian
On Thu, Sep  5, 2013 at 03:11:53PM -0700, Josh Berkus wrote:
 On 09/05/2013 02:16 PM, Bruce Momjian wrote:
  Well, the real problem with this patch is that it documents what the
  auto-tuning algorithm is; without that commitment, just saying -1 means
  autotune might be fine.
  
  OK, but I did this based on wal_buffers, which has a -1 default, calls
  it auto-tuning, and explains how the default is computed.
 
 I don't see a real problem with this.  For users who have set their
 shared_buffers correctly, effective_cache_size should also be correct.
 
  The problem there is that many users are told to tune shared_buffers,
  but don't touch effective cache size.  Having initdb set the
  effective_cache_size value would not help there.  Again, this is all
  based on the auto-tuning of wal_buffers.
 
 Standard advice we've given in the past is 25% shared buffers, 75%
 effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
  Maybe we're changing the conventional calculation, but I thought I'd
 point that out.

Yes, I had wondered that myself, and 3x and 4x were thrown out as
options.  There were more people who liked 4x, but one of the reasons
was that 3x sounded odd --- not sure what to make of that, but I went
with the most popular.  I am fine with 3x, and I do think it logically
makes more sense, and is less likely to over-estimate than 4x.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Bruce Momjian
On Wed, Sep  4, 2013 at 04:56:55PM -0400, Bruce Momjian wrote:
  Add a column pg_class.relallvisible to remember the number of pages
  that were all-visible according to the visibility map as of the last
  VACUUM
  (or ANALYZE, or some other operations that update pg_class.relpages).
  Use relallvisible/relpages, instead of an arbitrary constant, to
  estimate how many heap page fetches can be avoided during an
  index-only scan.
  
  Have I missed some nuance?
 
 I am looking back at this issue now and I think you are correct.  The
 commit you mention (Oct 7 2011) says ANALYZE updates the visibility map,
 and the code matches that:
 
   if (!inh)
   vac_update_relstats(onerel,
   RelationGetNumberOfBlocks(onerel),
   totalrows,
 --   visibilitymap_count(onerel),
   hasindex,
   InvalidTransactionId);
 
 so if an index scan was not being used after an ANALYZE, it isn't a bad
 allvisible estimate but something else.  This code was in PG 9.2.

Actually, I now realize it is more complex than that, and worse.  There
are several questions to study to understand when pg_class.relallvisible
is updated (which is used to determine if index-only scans are a good
optimization choice), and when VM all-visible bits are set so heap pages
can be skipped during index-only scans:

1)  When are VM bits set:
vacuum (non-full)
analyze (only some random pages)
2)  When are massive rows added but VM bits not set:
copy
3) When are VM bits cleared:
insert/update/delete
vacuum (non-full)
4)  When are VM map files cleared:
vacuum full
cluster
5) When is pg_class.relallvisible updated via a VM map file scan:
vacuum (non-full)
analyze

Vacuums run by autovacuum are driven by n_dead_tuples, which is only
incremented by updates and deletes.  Therefore, any scenario where vacuum (non-full) is
never run will not have significant VM bits set.  The only bits that
will be set will be by pages visited randomly by analyze.

The following table activities will not set proper VM bits:

vacuum full
cluster
copy
insert-only

If updates and deletes happen, there will eventually be sufficient
reason for autovacuum to vacuum the table and set proper VM bits, and
pg_class.relallvisible.

The calculus we should use to determine when we need to run vacuum has
changed with index-only scans, and I am not sure we ever fully addressed
this.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Robert Haas
On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote:
 Actually, I now realize it is more complex than that, and worse.  There
 are several questions to study to understand when pg_class.relallvisible
 is updated (which is used to determine if index-only scans are a good
 optimization choice), and when VM all-visible bits are set so heap pages
 can be skipped during index-only scans:

 1)  When are VM bits set:
 vacuum (non-full)
 analyze (only some random pages)

Analyze doesn't set visibility-map bits.  It only updates statistics
about how many are set.

 The calculus we should use to determine when we need to run vacuum has
 changed with index-only scans, and I am not sure we ever fully addressed
 this.

Yeah, we didn't.  I think the hard part is figuring out what behavior
would be best.  Counting inserts as well as updates and deletes would
be a simple approach, but I don't have much confidence in it.  My
experience is that having vacuum or analyze kick in during a bulk-load
operation is a disaster.  We'd kinda like to come up with a way to
make vacuum run after the bulk load is complete, maybe, but how would
we identify that time, and there are probably cases where that's not
right either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore multiple --function options

2013-09-05 Thread Josh Kupershmidt
On Tue, Aug 27, 2013 at 1:14 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Assuming no objections, I'll apply the attached patch to 9.3 and master
 later tonight.

Just a little stylistic nitpick: could we pluralize the --help outputs
for the modified options so that they make clear that multiple
specifications are supported. E.g. "restore named index(es)" instead
of just "restore named index". The last round of such changes tried to
make such --help tweaks, it would be nice to keep 'em consistent.

Josh


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Gavin Flower

On 06/09/13 13:10, Robert Haas wrote:

On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote:

Actually, I now realize it is more complex than that, and worse.  There
are several questions to study to understand when pg_class.relallvisible
is updated (which is used to determine if index-only scans are a good
optimization choice), and when VM all-visible bits are set so heap pages
can be skipped during index-only scans:

 1)  When are VM bits set:
 vacuum (non-full)
 analyze (only some random pages)

Analyze doesn't set visibility-map bits.  It only updates statistics
about how many are set.


The calculus we should use to determine when we need to run vacuum has
changed with index-only scans, and I am not sure we ever fully addressed
this.

Yeah, we didn't.  I think the hard part is figuring out what behavior
would be best.  Counting inserts as well as updates and deletes would
be a simple approach, but I don't have much confidence in it.  My
experience is that having vacuum or analyze kick in during a bulk-load
operation is a disaster.  We'd kinda like to come up with a way to
make vacuum run after the bulk load is complete, maybe, but how would
we identify that time, and there are probably cases where that's not
right either.

How about a 'VACUUM AFTER' command (part of the 'BEGIN' transaction 
syntax?) that would:


1. only be valid in a transaction
2. initiate a vacuum after the current transaction completed
3. defer any vacuum triggered due to other criteria

If the transaction was rolled back: then any vacuum pending due to other
reasons would be actioned.


On normal transaction completion, then if there was a pending vacuum it 
would be combined with the one in the transaction.


Still would need some method of ensuring any pending vacuum was done if 
the transaction hung, or took too long.



Cheers,
Gavin


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Bruce Momjian
On Thu, Sep  5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
 On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote:
  Actually, I now realize it is more complex than that, and worse.  There
  are several questions to study to understand when pg_class.relallvisible
  is updated (which is used to determine if index-only scans are a good
  optimization choice), and when VM all-visible bits are set so heap pages
  can be skipped during index-only scans:
 
  1)  When are VM bits set:
  vacuum (non-full)
  analyze (only some random pages)
 
 Analyze doesn't set visibility-map bits.  It only updates statistics
 about how many are set.

Sorry, yes you are correct.

  The calculus we should use to determine when we need to run vacuum has
  changed with index-only scans, and I am not sure we ever fully addressed
  this.
 
 Yeah, we didn't.  I think the hard part is figuring out what behavior
 would be best.  Counting inserts as well as updates and deletes would
 be a simple approach, but I don't have much confidence in it.  My
 experience is that having vacuum or analyze kick in during a bulk-load
 operation is a disaster.  We'd kinda like to come up with a way to
 make vacuum run after the bulk load is complete, maybe, but how would
 we identify that time, and there are probably cases where that's not
 right either.

I am unsure how we have gone a year with index-only scans and I am just
now learning that it only works well with update/delete workloads or by
running vacuum manually.  I only found this out going back over January
emails.  Did other people know this?  Was it not considered a serious
problem?

Well, our logic has been that vacuum is only for removing expired rows. 
I think we either need to improve that, or somehow make sequential scans
update the VM map, and then find a way to trigger update of
relallvisible even without inserts.

Ideas
-

I think we need to detect tables that do not have VM bits set and try to
determine if they should be vacuumed.  If a table has most of its VM
bits set, there is no need to vacuum it for VM bit setting.

Autovacuum knows how many pages are in the table via its file size, and
it can scan the VM map to see how many pages are _not_ marked
all-visible.  If the VM map has many pages that are _not_ marked as
all-visible, and the change count since the last vacuum is low, those pages
might now be all-visible and vacuum might find them.  One problem is
that a long-running transaction is not going to update relallvisible
until commit, so you might be vacuuming a table that is being modified,
e.g. bulk loads.  Do we have any way of detecting if a backend is
modifying a table?
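
Roughly the kind of test described above, as a sketch only -- the thresholds
and where it would hook into autovacuum are invented for illustration
(tabentry would be the table's PgStat_StatTabEntry):

	BlockNumber relpages = RelationGetNumberOfBlocks(onerel);
	BlockNumber allvisible = visibilitymap_count(onerel);
	bool		needs_vm_vacuum;

	/* Many pages not yet all-visible, but the table has been quiet since
	 * the last vacuum, so a VM-setting vacuum is likely to pay off. */
	needs_vm_vacuum = relpages > 0 &&
		allvisible < relpages * 0.9 &&				/* invented threshold */
		tabentry->changes_since_analyze < relpages;	/* invented "quiet" test */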

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?

2013-09-05 Thread Noah Misch
On Thu, Sep 05, 2013 at 10:41:25AM -0700, Jeff Janes wrote:
 In order to avoid the rewrite, the code would have to be changed to
 look up the column definition and if it specifies the scale, then
 ignore the per-row n_header, and look at the n_header only if the
 column is NUMERIC with no precision or scale.  That should
 conceptually be possible, but I don't know how hard it would be to
 implement--it sounds pretty invasive to me.  Then if the column was
 altered from NUMERIC with scale to be a plain NUMERIC, it would have
 to rewrite the table to enforce the row-wise scale to match the old
 column-wise scale.  Where as now that alter doesn't need a re-write.
 I don't know if this would be an overall gain or not.

Invasive indeed.  The type-supplementary data would need to reach essentially
everywhere we now convey a type OID.  Compare the invasiveness of adding
collation support.  However, this is not the first time it would have been
useful.  We currently store a type OID in every array and composite datum.
That's wasteful and would be unnecessary if we reliably marshalled similar
information to all the code needing it.  Given a few more use cases, the
effort would perhaps start to look credible relative to the benefits.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Josh Berkus
On 09/05/2013 03:30 PM, Merlin Moncure wrote:

 Standard advice we've given in the past is 25% shared buffers, 75%
 effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
  Maybe we're changing the conventional calculation, but I thought I'd
 point that out.
 
 This was debated upthread.

Actually, no, it wasn't.  Tom threw out a suggestion that we use 4X for
historical reasons.  That's all, there was no discussion.

So, my point stands: our historical advice has been to set EFS to 75% of
RAM.  Maybe we're changing that advice, but if so, let's change it.
Otherwise 3X makes more sense.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Alvaro Herrera
Bruce Momjian escribió:

 Ideas
 -
 
 I think we need to detect tables that do not have VM bits set and try to
 determine if they should be vacuumed.  If a table has most of its VM
 bits set, there in need to vacuum it for VM bit setting.

I think it's shortsighted to keep thinking of autovacuum as just a way
to run VACUUM and ANALYZE.  We have already discussed work items that
need to be done separately, such as truncating the last few empty pages
on a relation that was vacuumed recently.  We also need to process a GIN
index' pending insertion list; and with minmax indexes I will want to
run summarization of heap page ranges.

So maybe instead of trying to think of VM bit setting as part of vacuum,
we could just keep stats about how many pages we might need to scan
because of possibly needing to set the bit, and then doing that in
autovacuum, independently from actually vacuuming the relation.

I'm not sure if we need to expose all these new maintenance actions as
SQL commands.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers