Re: [HACKERS] TABLESAMPLE patch

2015-04-12 Thread Amit Kapila
On Sat, Apr 11, 2015 at 12:56 AM, Peter Eisentraut pete...@gmx.net wrote:

 On 4/9/15 8:58 PM, Petr Jelinek wrote:
  Well, you can have two approaches to this: either allow some specific
  set of keywords that can be used to specify the limit, or let the
  sampling methods interpret their parameters; I believe the latter is
  more flexible. There is nothing stopping somebody from writing a
  sampling method which takes the limit as a number of rows, or
  anything else.
 
  Also, for example, for BERNOULLI to work correctly you'd need to
  convert the number of rows to a fraction of the table anyway (that's
  exactly what the one database which has this feature does internally),
  and then it's no different from passing (SELECT
  100/reltuples*number_of_rows FROM tablename) as a parameter.

 What is your intended use case for this feature?  I know that "give me
 100 random rows from this table quickly" is a common use case, but
 that's kind of cumbersome if you need to apply formulas like that.  I'm
 not sure what the use of a percentage is.  Presumably, the main use of
 this feature is on large tables.  But then you might not even know how
 large it really is, and even saying 0.1% might be more than you wanted
 to handle.


The use case of specifying a number of rows for a sample scan is valid,
and it can be achieved by other means if required, as suggested by Petr
Jelinek. However, the currently proposed syntax (w.r.t. the sample
percentage [1]) complies with the SQL standard, so why not go with that
and then extend it based on further use cases?

[1]
SQL Standard (2003) w.r.t. sample percentage:

<sample clause> ::=
    TABLESAMPLE <sample method> <left paren> <sample percentage> <right paren>
    [ <repeatable clause> ]

<sample percentage> ::= <numeric value expression>
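
For illustration (hypothetical table and numbers, using the BERNOULLI
method from the patch), a standard-conforming query would look like:

    SELECT * FROM orders TABLESAMPLE BERNOULLI (0.1) REPEATABLE (42);

and Petr's rows-to-percentage conversion could be supplied as the
parameter, along the lines of:

    SELECT * FROM orders
      TABLESAMPLE BERNOULLI ((SELECT 100 / reltuples * 500
                              FROM pg_class WHERE relname = 'orders'));

assuming the sample percentage accepts an arbitrary numeric value
expression, as the grammar above suggests.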


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] FPW compression leaks information

2015-04-12 Thread Amit Kapila
On Thu, Apr 9, 2015 at 8:51 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 All in all, this is a bit clumsy and very time-consuming to pull off in
 practice, but it's possible at least if the conditions are just right.

 What should we do about this? Make it configurable on a per-table basis?

I think making it configurable on a per-table basis has another
advantage: it lets us control the impact of FPW compression for tables
whose data is less compressible (compression has a CPU overhead even
when it doesn't actually compress much).  In fact, I had added such an
option during development of another related patch (WAL compression for
Update); if we think it is useful, I can extract that part of the patch
and rebase it.
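
For the sake of discussion, the per-table interface could look
something like this (the reloption name is hypothetical, purely to
illustrate the idea -- it is not part of any patch yet):

    ALTER TABLE user_secrets SET (fpw_compression = off);

i.e. a boolean reloption that overrides the global setting whenever a
full-page image is written for that relation.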


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] moving from contrib to bin

2015-04-12 Thread Alvaro Herrera
Peter Eisentraut wrote:
 On 3/11/15 8:21 PM, Michael Paquier wrote:
  Attached is a series of patches rebased on current HEAD; there were
  some conflicts after perl-tidying the refactoring patch for MSVC. Note
  that this series still uses PGXS in the Makefiles; I am fine with
  updating them if necessary once this matter is settled (I already did
  this stuff upthread with a previous version).
 
 OK, here we go.  I have committed the pg_archivecleanup move, with a
 complete makefile and your Windows help.  Let's see how it builds.

Hmm, should it appear automatically in babel.pg.org?  I don't see it
there yet.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] FPW compression leaks information

2015-04-12 Thread Heikki Linnakangas

On 04/10/2015 05:17 AM, Robert Haas wrote:

On Apr 9, 2015, at 8:51 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

What should we do about this?


I bet that there are at least 1000 covert channel attacks that are more 
practically exploitable than this.


Care to name some? This is certainly quite cumbersome to exploit, but 
it's doable.


We've talked a lot about covert channels and timing attacks on RLS, but 
this makes me more worried because you can attack passwords stored in 
pg_authid.
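
To make the mechanism concrete, here is a toy sketch of the principle
(my illustration, using zlib rather than pglz, and a flat buffer rather
than a real page image): when an attacker-controlled value is
compressed together with a secret, the compressed size reveals how well
the guess matches.

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    /*
     * Compress a buffer holding a secret next to an attacker-supplied
     * guess and return the compressed size.  The better the guess
     * matches, the more backreferences the compressor finds and the
     * smaller the output -- that is the covert channel.
     */
    static uLong
    compressed_size(const char *secret, const char *guess)
    {
        char    page[256];
        Bytef   out[512];
        uLongf  outlen = sizeof(out);

        snprintf(page, sizeof(page), "%s|%s", secret, guess);
        compress(out, &outlen, (const Bytef *) page, strlen(page));
        return outlen;
    }

    int
    main(void)
    {
        /* stand-in for a password hash sharing a page with attacker rows */
        const char *secret = "md5c53255be7b29cd4f101c27c8b7f8cb6c";

        printf("wrong guess: %lu bytes\n",
               compressed_size(secret, "md5f00d1234deadbeef5678cafe90abcde12"));
        printf("right guess: %lu bytes\n",
               compressed_size(secret, "md5c53255be7b29cd4f101c27c8b7f8cb6c"));
        return 0;
    }

Repeat that while varying one guessed character at a time and you can
recover the secret incrementally, which is essentially the attack
described above.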


- Heikki





Re: [HACKERS] FPW compression leaks information

2015-04-12 Thread Michael Paquier
On Mon, Apr 13, 2015 at 9:38 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 04/10/2015 05:17 AM, Robert Haas wrote:

 On Apr 9, 2015, at 8:51 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 What should we do about this?


 I bet that there are at least 1000 covert channel attacks that are more
 practically exploitable than this.


 Care to name some? This is certainly quite cumbersome to exploit, but it's
 doable.

 We've talked a lot about covert channels and timing attacks on RLS, but this
 makes me more worried because you can attack passwords stored in pg_authid.

Isn't the attack mentioned on this thread true as long as a user knows
that a given table stores a password? I don't see why this would be
limited to pg_authid.
-- 
Michael




Re: [HACKERS] FDW oddity

2015-04-12 Thread Andrew Dunstan


On 04/11/2015 07:30 PM, Andrew Dunstan wrote:


On 04/11/2015 05:10 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

I have just noticed something slightly odd. The traces (obtained by
setting client_min_messages to debug1) from the blackhole FDW show that
the handler function is called each time an INSERT is performed. This is
not the case for SELECT, UPDATE or DELETE. It looks at first glance like
a buglet. Can anyone suggest why this should be so?

What do you mean by "the handler function", and for that matter what do
you define as "each time"?  The ExecForeignInsert method certainly ought
to be called once per row inserted, but the same is true of
ExecForeignUpdate and ExecForeignDelete.  The setup methods such as
BeginForeignModify should only be called once per query though.



I mean this function:

   Datum
   blackhole_fdw_handler(PG_FUNCTION_ARGS)
   {
       FdwRoutine *fdwroutine = makeNode(FdwRoutine);

       elog(DEBUG1, "entering function %s", __func__);

       /* assign the handlers for the FDW */

       ...
   }

And it is called at the beginning of every INSERT statement, before
blackholePlanForeignModify() is called.





Seems to be explained by this comment in createplan.c:

    /*
     * If possible, we want to get the FdwRoutine from our RelOptInfo for
     * the table.  But sometimes we don't have a RelOptInfo and must get
     * it the hard way.  (In INSERT, the target relation is not scanned,
     * so it's not a baserel; and there are also corner cases for
     * updatable views where the target rel isn't a baserel.)
     */
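
The INSERT path therefore has to fetch the FdwRoutine by relation OID,
which goes back through the handler function. Roughly (my paraphrase of
make_modifytable, not a verbatim quote):

    if (root->simple_rel_array[rt_index] != NULL)
        fdwroutine = root->simple_rel_array[rt_index]->fdwroutine;  /* cached */
    else
        fdwroutine = GetFdwRoutineByRelId(rte->relid);  /* catalog lookup;
                                                         * invokes the handler */

So the extra handler call per INSERT statement is expected, not a
buglet.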

cheers

andrew







Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-04-12 Thread Michael Paquier
On Sun, Apr 12, 2015 at 2:26 PM, 彭瑞华 ruihua.p...@163.com wrote:
 I am using PostgreSQL 9.4.0. Thanks for your great work on the grouping
 sets patch effort.
 I compiled PostgreSQL 9.4.0 from source on a Linux platform with
 [gsp-all.patch] applied successfully, and grouping works well, but it
 fails on a Windows platform (Windows 2003 or Windows 7).
 It shows unrecognized keywords: grouping, cube, etc.
 Any suggestions or tips? Thanks a lot!

When reviewing or testing a patch targeted for integration in 9.5, you
should not apply it to a previous major version; instead, apply the
patch to a build based on one of the latest commits of the master
branch in the Postgres git repository. I am guessing that you cannot
get the patch to compile because of code conflicts.
-- 
Michael




Re: [HACKERS] Remove some duplicated words in comments

2015-04-12 Thread Heikki Linnakangas

On 04/11/2015 06:03 AM, David Rowley wrote:

Attached is a small patch which removes some duplicated words that have
crept into some comments.


Thanks, applied.

- Heikki




Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-04-12 Thread 彭瑞华
OK, I will try it using the git master branch source code, thanks! And
of course I will use gsp-all-latest.patch this time.




At 2015-04-12 14:23:46, Michael Paquier michael.paqu...@gmail.com wrote:
On Sun, Apr 12, 2015 at 2:26 PM, 彭瑞华 ruihua.p...@163.com wrote:
 I am using PostgreSQL 9.4.0. Thanks for your great work on the grouping
 sets patch effort.
 I compiled PostgreSQL 9.4.0 from source on a Linux platform with
 [gsp-all.patch] applied successfully, and grouping works well, but it
 fails on a Windows platform (Windows 2003 or Windows 7).
 It shows unrecognized keywords: grouping, cube, etc.
 Any suggestions or tips? Thanks a lot!

When reviewing or testing a patch targeted for integration in 9.5, you
should not apply it to a previous major version; instead, apply the
patch to a build based on one of the latest commits of the master
branch in the Postgres git repository. I am guessing that you cannot
get the patch to compile because of code conflicts.
-- 
Michael




Re: [HACKERS] TABLESAMPLE patch

2015-04-12 Thread Simon Riggs
On 10 April 2015 at 15:26, Peter Eisentraut pete...@gmx.net wrote:

 What is your intended use case for this feature?

Likely use cases are:
* Limits on the number of rows in the sample. Some research colleagues
have published a new mathematical analysis that will allow a lower
limit than previously considered.
* Time limits on sampling. This allows data visualisation approaches
to gain approximate answers in real time.
* Stratified sampling. Anything with some kind of filtering, lifting
or bias. Allows filtering out known incomplete data.
* Limits on sample error.

Later use cases would allow custom aggregates to work together with
custom sampling methods, so we might work our way towards i) a SUM()
function that provides the right answer even when used with a sample
scan, and ii) custom aggregates that report the sample error, allowing
you to get both AVG() and AVG_STDERR(). That would be technically
possible with what we have here, but I think a lot more thought is
required yet.
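
(To make that concrete -- this is my sketch of the statistics involved,
not part of the patch: with a Bernoulli sample of fraction p containing
n rows,

    sum_estimate  = sample_sum / p
    avg_estimate  = sample_avg
    stderr(avg)  ~= sample_stddev / sqrt(n)

so a sample-aware SUM() needs to know p, while an AVG_STDERR() only
needs statistics computable from the sample itself.)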

These have all come out of detailed discussions with two different
groups of data mining researchers.

-- 
 Simon Riggs                http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SSL information view

2015-04-12 Thread Magnus Hagander
On Fri, Apr 10, 2015 at 2:14 PM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Fri, Apr 10, 2015 at 6:39 PM, Magnus Hagander wrote:
  +typedef struct PgBackendSSLStatus
  +{
  +	/* Information about SSL connection */
  +	int		ssl_bits;
  +	bool		ssl_compression;
  +	char		ssl_version[NAMEDATALEN];	/* MUST be null-terminated */
  +	char		ssl_cipher[NAMEDATALEN];	/* MUST be null-terminated */
  +	char		ssl_clientdn[NAMEDATALEN];	/* MUST be null-terminated */
  +} PgBackendSSLStatus;
  git diff is showing in red here, spaces should be replaced with a tab.
 
 
  Ugh. Fixed. One too many copy/pastes I think.
 

 You forgot one here:
 +/* Information about SSL connection */


In other news, I have now fixed my git to show these things to me
again. It used to do that, but I broke it :)

Thanks!


 Except for those style comments (feel free to ignore them), I tested
 the patch and it is doing what it claims. As I don't have more
 comments, let's switch that to Ready for Committer then...



Ok. Thanks - and patch applied!
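
For anyone wanting to try it: the view is pg_stat_ssl, so something
like the following should show the SSL state of each connection
(column names per the struct discussed above):

    SELECT pid, ssl, version, cipher, bits, compression, clientdn
    FROM pg_stat_ssl;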

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Replication identifiers, take 4

2015-04-12 Thread Heikki Linnakangas

On 04/12/2015 02:56 AM, Petr Jelinek wrote:

On 10/04/15 18:03, Andres Freund wrote:

On 2015-04-07 17:08:16 +0200, Andres Freund wrote:

I'm starting benchmarks now.


What I'm benchmarking here is the WAL overhead, since that's what we're
debating.

The test setup I used was a pgbench scale 10 instance. I've run with
full_page_writes=off to have more reproducible results. This of course
over-emphasizes the overhead a bit. But for a long checkpoint interval
and a memory-resident working set it's not that unrealistic.

I ran 50k transactions in a single backend.
baseline:
- 20445024
- 20437128
- 20436864
- avg: 20439672
extern 2byte identifiers:
- 23318368
- 23148648
- 23128016
- avg: 23198344
- avg overhead: 13.5%
padding 2byte identifiers:
- 21160408
- 21319720
- 21164280
- avg: 21214802
- avg overhead: 3.8%
extern 4byte identifiers:
- 23514216
- 23540128
- 23523080
- avg: 23525808
- avg overhead: 15.1%

To me that shows pretty clearly that a) reusing the padding is
worthwhile, and b) even without that, using 2-byte instead of 4-byte
identifiers is beneficial.



My opinion is that a 10% difference in WAL size is quite a high price
to pay just so that we can keep the padding free for some other, yet
unknown feature that hasn't come up in several years and that would
need those 2 bytes.

But if we are willing to pay it then we can really go all the way and
just use Oids...


This needs to be weighed against removing the padding bytes altogether.
See attached. That would reduce the WAL size further when you don't need
replication IDs. It's very straightforward, but we need to do some
performance/scalability testing to make sure that using memcpy instead
of a straight 32-bit assignment doesn't hurt performance, since it
happens in very performance-critical paths.


I'm surprised there's such a big difference between the extern and
padding versions above. At a quick approximation, storing the ID as a
separate fragment, along with XLogRecordDataHeaderShort and
XLogRecordDataHeaderLong, should add one byte of overhead plus the ID
itself. So that would be 3 extra bytes for 2-byte identifiers, or 5
bytes for 4-byte identifiers. Does that mean that the average record
length is only about 30 bytes? That's what it seems like, if adding the
extern 2-byte identifiers added about 10% of overhead compared to the
padding 2-byte identifiers version. That doesn't sound right, 30 bytes
is very little. Perhaps the records created by pgbench happen to cross
an 8-byte alignment boundary at that point, making a big difference. In
another workload, there might be no difference at all, due to alignment.
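
(Spelling out the arithmetic from Andres's numbers: extern minus
padding is 23198344 - 21214802 = 1983542 bytes, about 9.7% of the
20439672-byte baseline. If every record really grew by only the 3
extra bytes, the average record would be roughly 3 / 0.097 ~= 31
bytes, which is what makes me suspect alignment rather than the ID
bytes themselves.)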


Also, you don't need to tag every record type with the replication ID. 
All indexam records can skip it, for starters, since logical decoding 
doesn't care about them. That should remove a fair amount of bloat.


- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 24cf520..09934f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -963,10 +963,10 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
 		 * Now that xl_prev has been filled in, calculate CRC of the record
 		 * header.
 		 */
-		rdata_crc = rechdr->xl_crc;
+		memcpy(&rdata_crc, &rechdr->xl_crc, sizeof(pg_crc32));
 		COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_crc));
 		FIN_CRC32C(rdata_crc);
-		rechdr->xl_crc = rdata_crc;
+		memcpy(&rechdr->xl_crc, &rdata_crc, sizeof(pg_crc32));
 
 		/*
 		 * All the record data, including the header, is now ready to be
@@ -4685,7 +4685,7 @@ BootStrapXLOG(void)
 	COMP_CRC32C(crc, ((char *) record) + SizeOfXLogRecord, record->xl_tot_len - SizeOfXLogRecord);
 	COMP_CRC32C(crc, (char *) record, offsetof(XLogRecord, xl_crc));
 	FIN_CRC32C(crc);
-	record->xl_crc = crc;
+	memcpy(&record->xl_crc, &crc, sizeof(pg_crc32));
 
 	/* Create first XLOG segment file */
 	use_existent = false;
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 88209c3..fbe97b1 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -724,7 +724,7 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 	rechdr->xl_info = info;
 	rechdr->xl_rmid = rmid;
 	rechdr->xl_prev = InvalidXLogRecPtr;
-	rechdr->xl_crc = rdata_crc;
+	memcpy(&rechdr->xl_crc, &rdata_crc, sizeof(pg_crc32));
 
 	return hdr_rdt;
 }
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index a4124d9..80a48ba 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -666,6 +666,9 @@ static bool
 ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr)
 {
 	pg_crc32	crc;
+	pg_crc32	rec_crc;
+
+	memcpy(&rec_crc, &record->xl_crc, sizeof(pg_crc32));
 
 	/* Calculate the CRC */
 	INIT_CRC32C(crc);
@@ -674,7 +677,7 @@ ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr)
 	COMP_CRC32C(crc, (char *) record,