Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-07 Thread Fabien COELHO


Hello Andres,


Now I cannot see how having one context per table space would have a
significant negative performance impact.


The 'dirty data' etc. limits are global, not per block device. By having
several contexts with unflushed dirty data the total amount of dirty
data in the kernel increases.


Possibly, but how much?  Do you have experimental data to back up that 
this is really an issue?


We are talking about 32 (context size) * #table spaces * 8KB buffers = 4MB
of dirty buffers to manage for 16 table spaces; I do not see that as a
major issue for the kernel.


Thus you're more likely to see stalls by the kernel moving pages into 
writeback.


I do not see the above data having a 30% negative impact on tps, given the
quite small amount of data under discussion, and switching to random IOs
costs so much that it must really be avoided.


Without further experimental data, I still think that one context per
table space is the reasonable choice.


--
Fabien.




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-07 Thread Andres Freund
Hi,

On 2016-03-08 16:21:45 +0900, Michael Paquier wrote:
> +   durable_link_or_rename(tmppath, path, ERROR);
> +   durable_rename(path, xlogfpath, ERROR);

> You may want to add a (void) cast in front of those calls for correctness.

"correctness"?  This is neatnikism, not correctness. I've actually added
(void)'s to the sites that return on error (i.e. pass LOG or something),
but not the ones where we pass ERROR.

> -   ereport(LOG,
> -   (errcode_for_file_access(),
> -errmsg("could not link file \"%s\" to \"%s\"
> (initialization of log file): %m",
> -   tmppath, path)));
> We lose a portion of the error message here, but with the file name
> that's easy to guess where that is happening. I am not complaining
> (that's fine to me as-is), just mentioning for the archive's sake.

Yea, I think that's fine too.


- Andres




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-07 Thread Michael Paquier
On Tue, Mar 8, 2016 at 2:55 PM, Andres Freund  wrote:
> On 2016-03-08 12:26:34 +0900, Michael Paquier wrote:
>> On Tue, Mar 8, 2016 at 12:18 PM, Andres Freund  wrote:
>> > On 2016-03-08 12:01:18 +0900, Michael Paquier wrote:
>> >> I have spent a couple of hours looking at that in detail, and the
>> >> patch is neat.
>> >
>> > Cool. Doing some more polishing right now. Will be back with an updated
>> > version soonish.
>> >
>> > Did you do some testing?
>>
>> Not much in detail yet, I just ran a check-world with fsync enabled
>> for the recovery tests, plus quick manual tests with a cluster
>> manually set up. I'll do more with your new version now that I know
>> there will be one.
>
> Here's my updated version.
>
> Note that I've split the patch into two. One for the infrastructure, and
> one for the callsites.

Thanks for the updated patches and the split, this makes things easier
to look at. I have been doing some testing as well mainly manually
using with pgbench and nothing looks broken.

+   durable_link_or_rename(tmppath, path, ERROR);
+   durable_rename(path, xlogfpath, ERROR);
You may want to add a (void) cast in front of those calls for correctness.

-   ereport(LOG,
-   (errcode_for_file_access(),
-errmsg("could not link file \"%s\" to \"%s\"
(initialization of log file): %m",
-   tmppath, path)));
We lose a portion of the error message here, but with the file name
that's easy to guess where that is happening. I am not complaining
(that's fine to me as-is), just mentioning for the archive's sake.

>> >> +   /* XXX: Add racy file existence check? */
>> >> +   if (rename(oldfile, newfile) < 0)
>> >
>> >> I am not sure we should worry about that; what do you think could
>> >> cause the old file to go missing all of a sudden? Other backend
>> >> processes are not playing with it in the code paths where this routine
>> >> is called. Perhaps adding a comment in the header to let users know
>> >> that would help?
>> >
>> > What I'm thinking of is adding a check whether the *target* file already
>> > exists, and error out in that case. Just like the link() based path
>> > normally does.
>>
>> Ah, OK. Well, why not. I'd rather have an assertion instead of an error 
>> though.
>
> I think it should definitely be an error if anything. But I'd rather
> only add it in master...

I guess I know why :) That's also why I was thinking about an assertion.
-- 
Michael




Re: [HACKERS] Using quicksort for every external sort run

2016-03-07 Thread Peter Geoghegan
On Mon, Feb 15, 2016 at 3:45 PM, Greg Stark  wrote:
> I was thinking about this over the past couple weeks. I'm starting to
> think the quicksort runs gives at least the beginnings of a way
> forward on this front.

As I've already pointed out several times, I wrote a tool that makes
it easy to load sortbenchmark.org data into a PostgreSQL table:

https://github.com/petergeoghegan/gensort

(You should use the Python script that invokes the "gensort" utility
-- see its "--help" display for details).

This seems useful as a standard benchmark, since it's perfectly
deterministic, allowing the user to create arbitrarily large tables to
use for sort benchmarks. Still, it doesn't produce data that is in any
way organic; the sort data is uniformly distributed. Also, it produces a
table that really only has one attribute to sort on, a text attribute.

I suggest looking at real world data, too. I have downloaded UK land
registry data, which is a freely available dataset about property
sales in the UK since the 1990s, of which there have apparently been
about 20 million (I started with a 20 million line CSV file). I've
used COPY to load the data into one PostgreSQL table.

I attach instructions on how to recreate this, and some suggested
CREATE INDEX statements that seemed representative to me. There are a
variety of Postgres data types in use, including UUID, numeric, and
text. The final Postgres table is just under 3GB. I will privately
make available a URL that those CC'd here can use to download a custom
format dump of the table, which comes in at 1.1GB (ask me off-list if
you'd like to get that URL, but weren't CC'd here). This URL is
provided as a convenience for reviewers, who can skip my detailed
instructions.

An expensive rollup() query on the "land_registry_price_paid_uk" table
is interesting. Example:

select date_trunc('year', transfer_date), county, district, city,
sum(price) from land_registry_price_paid_uk group by rollup (1,
county, district, city);

Performance is within ~5% of an *internal* sort with the patch series
applied, even though ~80% of time is spent copying and sorting
SortTuples overall in the internal sort case (the internal case cannot
overlap sorting and aggregate processing, since it has no final merge
step). This is a nice demonstration of how this work has significantly
blurred the line between internal and external sorts.

-- 
Peter Geoghegan
Instructions
============

CSV File
--------

The land registry file from http://data.gov.uk is a 3.2GB CSV file containing
organic data that can be loaded into PostgreSQL. No registration is required.
See https://theodi.org/blog/the-status-of-csvs-on-datagovuk for details on
downloading the file pp-complete.csv.

SQL
---

begin;
create table land_registry_price_paid_uk(
  transaction uuid,
  price numeric,
  transfer_date date,
  postcode text,
  property_type char(1),
  newly_built boolean,
  duration char(1),
  paon text,
  saon text,
  street text,
  locality text,
  city text,
  district text,
  county text,
  ppd_category_type char(1));

copy land_registry_price_paid_uk FROM '/home/pg/Downloads/pp-complete.csv' with 
(format csv, freeze true, encoding 'win1252', header false, null '', quote '"', 
force_null (postcode, saon, paon, street, locality, city, district));
commit;

Resulting table
---

postgres=# \dt+
  List of relations
 Schema │            Name             │ Type  │ Owner │  Size   │ Description
────────┼─────────────────────────────┼───────┼───────┼─────────┼─────────────
 public │ land_registry_price_paid_uk │ table │ pg    │ 2779 MB │
(1 row)

Interesting Indexes
===

Multi-attribute index (low-cardinality leading attributes):

postgres=# create index on land_registry_price_paid_uk (county, district,
city, locality, street);

UUID pk index (UUID type, high cardinality):

postgres=# create index on land_registry_price_paid_uk (transaction);

Price index (numeric, moderate cardinality):

postgres=# create index on land_registry_price_paid_uk (price);

Preview
===

pg@hamster:~$ head ~/Downloads/pp-complete.csv
"{0C7ADEF5-878D-4066-B785-003ED74A}","163000","2003-02-21 00:00","UB5 
4PJ","T","N","F","106","","READING 
ROAD","NORTHOLT","NORTHOLT","EALING","GREATER LONDON","A"
"{35F67271-ABD4-40DA-AB09-0085B9D3}","247500","2005-07-15 00:00","TA19 
9DD","D","N","F","58","","ADAMS MEADOW","ILMINSTER","ILMINSTER","SOUTH 
SOMERSET","SOMERSET","A"
"{B20B1C74-E8E1-4137-AB3E-011DF342}","32","2010-09-10 00:00","W4 
1DZ","F","N","L","58","","WHELLOCK ROAD","","LONDON","EALING","GREATER 
LONDON","A"
"{7D6B0915-C56B-4275-AF9B-0156BCE7}","104000","1997-08-27 00:00","NE61 
2BH","D","N","F","17","","WESTGATE","MORPETH","MORPETH","CASTLE 
MORPETH","NORTHUMBERLAND","A"

[HACKERS] empty array case in plperl_ref_from_pg_array not handled correctly

2016-03-07 Thread Andres Freund
Hi,

Per the new valgrind animal we get:

http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=skink&dt=2016-03-08%2004%3A22%3A00
2016-03-08 05:56:05.566 UTC [56de6971.723:5] LOG:  statement: select 
plperl_sum_array('{}');
==1827== Invalid write of size 4
==1827==at 0x14E35DD1: plperl_ref_from_pg_array (plperl.c:1459)
==1827==by 0x14E3608C: plperl_call_perl_func (plperl.c:2135)
==1827==by 0x14E3C24F: plperl_func_handler (plperl.c:2357)
==1827==by 0x14E3C24F: plperl_call_handler (plperl.c:1778)
==1827==by 0x5E0531: ExecMakeFunctionResultNoSets (execQual.c:2041)
==1827==by 0x5E641C: ExecTargetList (execQual.c:5367)
==1827==by 0x5E641C: ExecProject (execQual.c:5582)
==1827==by 0x5FC1C1: ExecResult (nodeResult.c:155)
==1827==by 0x5DF577: ExecProcNode (execProcnode.c:392)
==1827==by 0x5DB675: ExecutePlan (execMain.c:1566)
==1827==by 0x5DB675: standard_ExecutorRun (execMain.c:338)
==1827==by 0x6F4DD7: PortalRunSelect (pquery.c:942)
==1827==by 0x6F631D: PortalRun (pquery.c:786)
==1827==by 0x6F2FFA: exec_simple_query (postgres.c:1094)
==1827==by 0x6F2FFA: PostgresMain (postgres.c:4021)
==1827==by 0x46D33F: BackendRun (postmaster.c:4258)
==1827==by 0x46D33F: BackendStartup (postmaster.c:3932)
==1827==by 0x46D33F: ServerLoop (postmaster.c:1690)
==1827==  Address 0x15803220 is 304 bytes inside a block of size 8,192 alloc'd
==1827==at 0x4C28BB5: malloc (vg_replace_malloc.c:299)
==1827==by 0x808A37: AllocSetAlloc (aset.c:864)
==1827==by 0x80A3B3: palloc (mcxt.c:907)
==1827==by 0x7E0802: get_func_signature (lsyscache.c:1483)
==1827==by 0x14E367D6: plperl_call_perl_func (plperl.c:2116)
==1827==by 0x14E3C24F: plperl_func_handler (plperl.c:2357)
==1827==by 0x14E3C24F: plperl_call_handler (plperl.c:1778)
==1827==by 0x5E0531: ExecMakeFunctionResultNoSets (execQual.c:2041)
==1827==by 0x5E641C: ExecTargetList (execQual.c:5367)
==1827==by 0x5E641C: ExecProject (execQual.c:5582)
==1827==by 0x5FC1C1: ExecResult (nodeResult.c:155)
==1827==by 0x5DF577: ExecProcNode (execProcnode.c:392)
==1827==by 0x5DB675: ExecutePlan (execMain.c:1566)
==1827==by 0x5DB675: standard_ExecutorRun (execMain.c:338)
==1827==by 0x6F4DD7: PortalRunSelect (pquery.c:942)
==1827== 

looking at the code

static SV  *
plperl_ref_from_pg_array(Datum arg, Oid typid)
{
...
/* Get total number of elements in each dimension */
info->nelems = palloc(sizeof(int) * info->ndims);
info->nelems[0] = nitems;

that's not surprising. If ndims is zero, nelems will be a zero-length
array. Adding
Assert(info->ndims > 0);
makes it die reliably, without gdb.

ISTM the assumption that an array always has at least one dimension is a bit
more widespread... E.g. split_array() looks like it'd not work nicely with a
zero-dimensional array...
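
To make the failure mode concrete, here is a minimal sketch of the kind of
guard that would avoid writing past the zero-length nelems array for an
empty ('{}') input.  The variable name "array" and the placement are
assumptions for illustration; this is not necessarily the actual fix.

/*
 * Hypothetical guard near the top of plperl_ref_from_pg_array(): if the
 * detoasted array has no dimensions (e.g. '{}'), hand back an empty Perl
 * array reference instead of dereferencing info->nelems[0].
 */
if (ARR_NDIM(array) == 0)
{
    AV *av = newAV();              /* fresh, empty Perl array */

    return newRV_noinc((SV *) av);
}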

Greetings,

Andres Freund




Re: [HACKERS] Allowing to run a buildfarm animal under valgrind

2016-03-07 Thread Andres Freund
Hi,

On 2016-03-07 17:39:30 -0800, Andres Freund wrote:
> I'm setting up a buildfarm animal that runs under valgrind.

Which is now running as 'skink'. The first run failed due to a missing trick
in the wrapper script, but the second one looks like it had a legit
issue:
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=skink&dt=2016-03-08%2004%3A22%3A00

==1827== Invalid write of size 4
==1827==at 0x14E35DD1: plperl_ref_from_pg_array (plperl.c:1459)
==1827==by 0x14E3608C: plperl_call_perl_func (plperl.c:2135)
==1827==by 0x14E3C24F: plperl_func_handler (plperl.c:2357)
==1827==by 0x14E3C24F: plperl_call_handler (plperl.c:1778)
==1827==by 0x5E0531: ExecMakeFunctionResultNoSets (execQual.c:2041)
==1827==by 0x5E641C: ExecTargetList (execQual.c:5367)
==1827==by 0x5E641C: ExecProject (execQual.c:5582)
==1827==by 0x5FC1C1: ExecResult (nodeResult.c:155)
==1827==by 0x5DF577: ExecProcNode (execProcnode.c:392)
==1827==by 0x5DB675: ExecutePlan (execMain.c:1566)
==1827==by 0x5DB675: standard_ExecutorRun (execMain.c:338)
==1827==by 0x6F4DD7: PortalRunSelect (pquery.c:942)
==1827==by 0x6F631D: PortalRun (pquery.c:786)
==1827==by 0x6F2FFA: exec_simple_query (postgres.c:1094)
==1827==by 0x6F2FFA: PostgresMain (postgres.c:4021)
==1827==by 0x46D33F: BackendRun (postmaster.c:4258)
==1827==by 0x46D33F: BackendStartup (postmaster.c:3932)
==1827==by 0x46D33F: ServerLoop (postmaster.c:1690)
==1827==  Address 0x15803220 is 304 bytes inside a block of size 8,192 alloc'd
==1827==at 0x4C28BB5: malloc (vg_replace_malloc.c:299)
==1827==by 0x808A37: AllocSetAlloc (aset.c:864)
==1827==by 0x80A3B3: palloc (mcxt.c:907)
==1827==by 0x7E0802: get_func_signature (lsyscache.c:1483)
==1827==by 0x14E367D6: plperl_call_perl_func (plperl.c:2116)
==1827==by 0x14E3C24F: plperl_func_handler (plperl.c:2357)
==1827==by 0x14E3C24F: plperl_call_handler (plperl.c:1778)
==1827==by 0x5E0531: ExecMakeFunctionResultNoSets (execQual.c:2041)
==1827==by 0x5E641C: ExecTargetList (execQual.c:5367)
==1827==by 0x5E641C: ExecProject (execQual.c:5582)
==1827==by 0x5FC1C1: ExecResult (nodeResult.c:155)
==1827==by 0x5DF577: ExecProcNode (execProcnode.c:392)
==1827==by 0x5DB675: ExecutePlan (execMain.c:1566)
==1827==by 0x5DB675: standard_ExecutorRun (execMain.c:338)
==1827==by 0x6F4DD7: PortalRunSelect (pquery.c:942)
==1827==

I've now added configuration to valgrind so it wraps VALGRINDERROR-BEGIN
/ VALGRINDERROR-END around errors, to make the logs easier to search.

Greetings,

Andres Freund




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-07 Thread Andres Freund
On 2016-03-08 12:26:34 +0900, Michael Paquier wrote:
> On Tue, Mar 8, 2016 at 12:18 PM, Andres Freund  wrote:
> > On 2016-03-08 12:01:18 +0900, Michael Paquier wrote:
> >> I have spent a couple of hours looking at that in detail, and the
> >> patch is neat.
> >
> > Cool. Doing some more polishing right now. Will be back with an updated
> > version soonish.
> >
> > Did you do some testing?
> 
> Not much in detail yet, I just ran a check-world with fsync enabled
> for the recovery tests, plus quick manual tests with a cluster
> manually set up. I'll do more with your new version now that I know
> there will be one.

Here's my updated version.

Note that I've split the patch into two. One for the infrastructure, and
one for the callsites.

> >> +   /* XXX: Add racy file existence check? */
> >> +   if (rename(oldfile, newfile) < 0)
> >
> >> I am not sure we should worry about that; what do you think could
> >> cause the old file to go missing all of a sudden? Other backend
> >> processes are not playing with it in the code paths where this routine
> >> is called. Perhaps adding a comment in the header to let users know
> >> that would help?
> >
> > What I'm thinking of is adding a check whether the *target* file already
> > exists, and error out in that case. Just like the link() based path
> > normally does.
> 
> Ah, OK. Well, why not. I'd rather have an assertion instead of an error 
> though.

I think it should definitely be an error if anything. But I'd rather
only add it in master...

Andres
>From 9dc71e059cc50d57e7f4f42c68b1c4afa07279a3 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 7 Mar 2016 15:04:17 -0800
Subject: [PATCH 1/2] Introduce durable_rename() and durable_link_or_rename().

Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes. To be certain that a rename() atomically replaces the
previous file contents in the face of crashes and different filesystems,
one has to fsync the old filename, rename the file, fsync the new
filename, fsync the containing directory.  This sequence is not
correctly adhered to currently; which exposes us to data loss risks. To
avoid having to repeat this arduous sequence, introduce
durable_rename(), which wraps all that.

Also add durable_link_or_rename(). Several places use link() (with a
fallback to rename()) to rename a file, trying to avoid replacing the
target file out of paranoia. Some of those rename sequences need to be
durable as well.

This commit does not yet make use of the new functions; they're used in
a followup commit.

Author: Michael Paquier, Andres Freund
Discussion: 56583bdd.9060...@2ndquadrant.com
Backpatch: All supported branches
---
 src/backend/replication/slot.c |   2 +-
 src/backend/storage/file/fd.c  | 287 -
 src/include/storage/fd.h   |   4 +-
 3 files changed, 228 insertions(+), 65 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index affa9b9..ead221d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1095,7 +1095,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	START_CRIT_SECTION();
 
 	fsync_fname(path, false);
-	fsync_fname((char *) dir, true);
+	fsync_fname(dir, true);
 	fsync_fname("pg_replslot", true);
 
 	END_CRIT_SECTION();
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 1b30100..c9f9b7d 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -306,7 +306,10 @@ static void walkdir(const char *path,
 #ifdef PG_FLUSH_DATA_WORKS
 static void pre_sync_fname(const char *fname, bool isdir, int elevel);
 #endif
-static void fsync_fname_ext(const char *fname, bool isdir, int elevel);
+static void datadir_fsync_fname(const char *fname, bool isdir, int elevel);
+
+static int	fsync_fname_ext(const char *fname, bool isdir, bool ignore_perm, int elevel);
+static int	fsync_parent_path(const char *fname, int elevel);
 
 
 /*
@@ -413,54 +416,158 @@ pg_flush_data(int fd, off_t offset, off_t amount)
  * indicate the OS just doesn't allow/require fsyncing directories.
  */
 void
-fsync_fname(char *fname, bool isdir)
+fsync_fname(const char *fname, bool isdir)
 {
-	int			fd;
-	int			returncode;
-
-	/*
-	 * Some OSs require directories to be opened read-only whereas other
-	 * systems don't allow us to fsync files opened read-only; so we need both
-	 * cases here
-	 */
-	if (!isdir)
-		fd = OpenTransientFile(fname,
-			   O_RDWR | PG_BINARY,
-			   S_IRUSR | S_IWUSR);
-	else
-		fd = OpenTransientFile(fname,
-			   O_RDONLY | PG_BINARY,
-			   S_IRUSR | S_IWUSR);
-
-	/*
-	 * Some OSs don't allow us to open directories at all (Windows returns
-	 * EACCES)
-	 */
-	if (fd < 0 && isdir && (errno == EISDIR || errno == EACCES))
-		return;
-
-	else if (fd < 0)
-		ereport(ERROR,
-(errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m", 

Re: [HACKERS] Recovery test failure for recovery_min_apply_delay on hamster

2016-03-07 Thread Michael Paquier
On Wed, Mar 2, 2016 at 2:04 PM, Michael Paquier
 wrote:
> Here are a couple of ways to address this problem:
> 1) Remove the check before applying the delay
> 2) Increase recovery_min_apply_delay to a time that will allow even
> slow machines to see a difference. By experience with the other tests
> 30s would be enough. The sleep time needs to be increased as well,
> making the time taken for the test to run longer
> 3) Remove 005 altogether, because doing either 1) or 2) reduces the
> value of the test.
> I'd like 1) personally, I still see value in this test.

So, as doing 1) would actually be equivalent to simply having a master
and checking that its standby replicates correctly, I have been
looking at 2) to see how long the delay has to be set to make the
test failure-proof. After doing some measurements with hamster, 10s
and 15s have unfortunately proved not to be enough, though 20s has not
failed in 10 attempts. Attached is a patch to bump it to 20s, though I
would not complain if the test is actually removed to accelerate the
runs of this test suite.
-- 
Michael
diff --git a/src/test/recovery/t/005_replay_delay.pl b/src/test/recovery/t/005_replay_delay.pl
index 986851b..9b25f9e 100644
--- a/src/test/recovery/t/005_replay_delay.pl
+++ b/src/test/recovery/t/005_replay_delay.pl
@@ -24,19 +24,18 @@ $node_standby->init_from_backup($node_master, $backup_name,
 	has_streaming => 1);
 $node_standby->append_conf(
 	'recovery.conf', qq(
-recovery_min_apply_delay = '2s'
+recovery_min_apply_delay = '20s'
 ));
 $node_standby->start;
 
 # Make new content on master and check its presence in standby
-# depending on the delay of 2s applied above.
+# depending on the delay of 20s applied above.
 $node_master->safe_psql('postgres',
 	"INSERT INTO tab_int VALUES (generate_series(11,20))");
-sleep 1;
 
 # Here we should have only 10 rows
 my $result = $node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
-is($result, qq(10), 'check content with delay of 1s');
+is($result, qq(10), 'check content with no delay');
 
 # Now wait for replay to complete on standby
 my $until_lsn =
@@ -46,4 +45,4 @@ my $caughtup_query =
 $node_standby->poll_query_until('postgres', $caughtup_query)
   or die "Timed out while waiting for standby to catch up";
 $result = $node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
-is($result, qq(20), 'check content with delay of 2s');
+is($result, qq(20), 'check content with delay of 20s');



Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread Tsunakawa, Takayuki
Hello, Josh,

> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Josh berkus
> Crossing this over to pgsql-advocacy list where it really belongs.
> That's what that list is *for*.
> 
> Especially since the discussion on -hackers has focused on new PostgreSQL
> Features, which while also good don't address the general question.
> 

Thank you for pointing me to the correct place.  I wondered which list is 
better, because I thought this topic would be better discussed among hackers.
I'll post subsequent mails only to pgsql-advocacy.

Regards
Takayuki Tsunakawa





Re: [HACKERS] Badly designed error reporting code in controldata_utils.c

2016-03-07 Thread Michael Paquier
On Tue, Mar 8, 2016 at 1:51 PM, Andres Freund  wrote:
> On 2016-03-08 13:45:25 +0900, Michael Paquier wrote:
>> On Mon, Mar 7, 2016 at 10:26 AM, Andres Freund  wrote:
>> > FWIW I'm considering implementing elog/ereport properly for the
>> > frontend.  We've grown several hacks around that not being present, and
>> > I think those by now have a higher aggregate complexity than a proper
>> > implementation would have.
>>
>> That would be handy. And this is going to need something like
>> callbacks to let frontend applications decide how to apply elevel. Take
>> for example pg_rewind: it has an interface with DEBUG and PROGRESS levels
>> directly embedded with FATAL controlled by user-defined options.
>
> What does "directly embedded with FATAL" mean?

Incorrect words. I just mean that there is a central code path used by
DEBUG and FATAL in this case, and DEBUG is controlled by a user-side
switch, meaning that if we want to get into something aimed at being
used by any in-core or community frontend clients, we are going to
need something carefully designed.
-- 
Michael




Re: [HACKERS] Badly designed error reporting code in controldata_utils.c

2016-03-07 Thread Andres Freund
On 2016-03-08 13:45:25 +0900, Michael Paquier wrote:
> On Mon, Mar 7, 2016 at 10:26 AM, Andres Freund  wrote:
> > FWIW I'm considering implementing elog/ereport properly for the
> > frontend.  We've grown several hacks around that not being present, and
> > I think those by now have a higher aggregate complexity than a proper
> > implementation would have.
> 
> That would be handy. And this is going to need something like
> callbacks to let frontend applications decide how to apply elevel. Take
> for example pg_rewind: it has an interface with DEBUG and PROGRESS levels
> directly embedded with FATAL controlled by user-defined options.

What does "directly embedded with FATAL" mean?

I don't really want to go further than what our system already is
capable of; once we have that we can look for the next steps.




Re: [HACKERS] Badly designed error reporting code in controldata_utils.c

2016-03-07 Thread Michael Paquier
On Mon, Mar 7, 2016 at 10:26 AM, Andres Freund  wrote:
> FWIW I'm considering implementing elog/ereport properly for the
> frontend.  We've grown several hacks around that not being present, and
> I think those by now have a higher aggregate complexity than a proper
> implementation would have.

That would be handy. And this is going to need something like
callbacks to let frontend applications decide how to apply elevel. Take
for example pg_rewind: it has an interface with DEBUG and PROGRESS levels
directly embedded with FATAL controlled by user-defined options.
-- 
Michael




Re: [HACKERS] pam auth - add rhost item

2016-03-07 Thread Haribabu Kommi
On Tue, Dec 29, 2015 at 10:46 AM, Grzegorz Sampolski  wrote:
> Hi.
> I thought link on commitfest to github url was sufficient.
> Sorry. Attached new patch.

I reviewed and tested the patch. The addition of the
new RHOST member to the items passed to PAM
authentication doesn't have any impact on existing
behavior.

As Tomas said upthread, RHOST is an item that I
also think can be added to PAM authentication.

I am not able to test PAM authentication using
RHOST; could you please let me know how to do that?

Also, the patch lacks documentation changes.
Since it adds the new pamusedns option and also
sends RHOST, the documentation needs to be
updated.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] Freeze avoidance of very large table.

2016-03-07 Thread Kyotaro HORIGUCHI
Hello, thank you for updating this tool.

At Mon, 7 Mar 2016 14:03:08 -0500, Robert Haas  wrote in 

> On Mon, Mar 7, 2016 at 12:41 PM, Masahiko Sawada  
> wrote:
> > Attached latest version optimisation patch.
> > I'm still consider regarding pg_upgrade regression test code, so I
> > will submit that patch later.
> 
> I was thinking more about this today and I think that we don't
> actually need the PD_ALL_FROZEN page-level bit for anything.  It's
> enough that the bit is present in the visibility map.  The only point
> of PD_ALL_VISIBLE is that it tells us that we need to clear the
> visibility map bit, but that bit is enough to tell us to clear both
> visibility map bits.  So I propose the attached cleanup patch.

It seems reasonable to me.  Although I haven't played with it (it
didn't even apply for me just now), at a glance,
PD_VALID_FLAG_BITS seems like it should be changed to 0x0007 since
PD_ALL_FROZEN has been removed.
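
For reference, a sketch of what the page-level flag definitions would look
like with PD_ALL_FROZEN gone; the values match stock bufpage.h, but treat
this as an illustration rather than the patch's exact diff.

/* hypothetical post-patch state of the pd_flags bits (illustration only) */
#define PD_HAS_FREE_LINES   0x0001  /* are there any unused line pointers? */
#define PD_PAGE_FULL        0x0002  /* not enough free space for a new tuple */
#define PD_ALL_VISIBLE      0x0004  /* all tuples on page visible to everyone */

#define PD_VALID_FLAG_BITS  0x0007  /* OR of all valid pd_flags bits */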

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-07 Thread Tom Lane
Craig Ringer  writes:
> On 8 March 2016 at 08:56, Igal @ Lucee.org  wrote:
>> I'm not sure why it was not accepted in the end?

> The biggest issue, though it might not be clear from that thread, is that
> what exactly it means to "return generated keys" is poorly defined by JDBC,
> and not necessarily the same thing as "return the PRIMARY KEY".
>
> Should we return the DEFAULT on a UNIQUE column, for example?
>
> IMO other vendors' drivers should be tested for behaviour in a variety of
> cases.

Yeah.  It was asserted in the earlier thread that other vendors implement
this feature as "return the pkey", but that seems to conflict with the
plain language of the JDBC spec: generated columns are an entirely
different thing than primary key columns.  So really what I'd like to see
is some work on surveying other implementations to confirm exactly what
behavior they implement.  If we're to go against what the spec seems to
say, I want to see a whole lot of evidence that other people do it
consistently in a different way.

regards, tom lane




Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread Craig Ringer
On 7 March 2016 at 20:55, MauMau  wrote:

> From: Craig Ringer
> --
> We could help ORMs solve the N+1 SELECTs problem and help them avoid
> transferring vast join projections unnecessarily. That'd make PostgreSQL
> pretty compelling for exactly the users we're mostly too busy dismissing to
> consider.
>
> I'd be interested in reaching out to some Hibernate/JPA and ActiveRecord
> folks about how the DB could help the ORM and have been meaning to explore
> this area for a while, but -ENOTIME. If anyone pursues it I'll be really
> interested in hearing how things go.
> --
>
> You have various ideas and experience, don't you?
> Are those ideas on ORMs beneficial exclusively to PostgreSQL or to all
> DBMSs?  I don't know whether the structure of ORMs allows for improvements
> that are advantageous to a specific DBMS.
>

Most ORMs have dialect layers for query generation, DB-specific
customisations, etc.

Whether they're flexible enough to handle this sort of change - I don't
know. That's part of why I'd like to explore the ideas with ActiveRecord,
Hibernate, JPA WG, etc folks.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-07 Thread Craig Ringer
On 8 March 2016 at 08:56, Igal @ Lucee.org  wrote:


>
> I'm not sure why it was not accepted in the end?


The biggest issue, though it might not be clear from that thread, is that
what exactly it means to "return generated keys" is poorly defined by JDBC,
and not necessarily the same thing as "return the PRIMARY KEY".

Should we return the DEFAULT on a UNIQUE column, for example?

IMO other vendors' drivers should be tested for behaviour in a variety of
cases. Ideally the JDBC test suite too. Then specify the exact behaviour of
what we need to satisfy the JDBC driver's requirements and anything else
that might be related.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Optimizer questions

2016-03-07 Thread Tom Lane
Konstantin Knizhnik  writes:
> Attached please find improved version of the optimizer patch for LIMIT clause.

This patch isn't anywhere close to working after 3fc6e2d7f5b652b4.
(TBH, the reason I was negative about this upthread is that I had that
one in the oven and knew it would conflict spectacularly.)  I encourage
you to think about how an optimization of this sort could be made to
work in a non-kluge fashion in the new code structure.

I've not spent a lot of time on this, but I think maybe what would make
sense is to consider both the case where function calculations are
postponed to after ORDER BY and the case where they aren't, and generate
Paths for both.  Neither approach is a slam-dunk win.  For example,
suppose that one of the tlist columns is md5(wide_column) --- it will
likely not be preferable to pass the wide column data through the sort
step rather than reducing it to a hash first.  This would require some
work in grouping_planner to track two possible pathtargets, and work in
create_ordered_paths to generate paths for both possibilities.  A possible
objection is that this would add planning work even when no real benefit
is possible; so maybe we should only consider the new way if the tlist has
significant eval cost?  Not sure about that.  There is also something
to be said for the idea that we should try to guarantee consistent
semantics when the tlist contains volatile functions.

For now, I've set this commitfest entry to Waiting on Author.  There's
still time to consider a rewrite in this 'fest, if you can get it done
in a week or two.

regards, tom lane




Re: [HACKERS] The plan for FDW-based sharding

2016-03-07 Thread Craig Ringer
On 7 March 2016 at 23:02, Robert Haas  wrote:

> On Fri, Mar 4, 2016 at 11:17 PM, Craig Ringer 
> wrote:
> > If FDW-based sharding works, I'm happy enough, I have no horse in this
> race.
> > If it doesn't work I don't much care either. What I'm worried about is
> it if
> > works like partitioning using inheritance works - horribly badly, but
> just
> > well enough that it's served as an effective barrier to doing anything
> > better.
> >
> > That's what I want to prevent. Sharding that only-just-works and then
> stops
> > us getting anything better into core.
>
> That's a reasonable worry.  Thanks for articulating it so clearly.
> I've thought about that issue and I admit it's both real and serious,
> but I've sort of taken the attitude of saying, well, I don't know how
> to solve that problem, but there's so much other important work that
> needs to be done before we get to the point where that's the blocker
> that solving that problem doesn't seem like the most important thing
> right now.


[snip explanation]


> I think your concern is
> valid, and I share it.  But I just fundamentally believe that it's
> better to enhance what we have than to start inventing totally new
> abstractions.  The FDW API is *really* powerful, and getting more
> powerful, and I just have a very hard time believing that starting
> over will be better.  Somebody can do that if they like and I'm not
> gonna get in the way, but if it's got problems that could have been
> avoided by basing that same work on the FDW stuff we've already got, I
> do plan to point that out.


Yep. As has been noted, each of these improvements is useful in its own
right, and I'm not sure anyone's against them; people are just
concerned about whether the overall vision for sharding will work out.

Personally I think that once the FDW infrastructure is closer to being
usable for sharding, when we're at the point where new patches are proposed
that're really specifically for sharding and not so general-use FDW
improvements, that's when it'd be well worth building a proof of concept
sharding implementation. Find unexpected wrinkles and issues before
starting to stream stuff into core that can't be easily removed again. That
was certainly useful when building BDR, and even then we still found lots
of things that required revision, often repeatedly.

Either that, or bless experimental features/API as an official concept. I'd
quite like that myself - stuff that's in Pg, but documented as "might
change or go away in the next release, experimental feature". As we're
doing more stuff that spans multiple release cycles, where patches in a
prior cycle might need revision based on what we learn in a later one, we
might need more freedom to change things that're committed and user visible.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-07 Thread Michael Paquier
On Tue, Mar 8, 2016 at 12:18 PM, Andres Freund  wrote:
> On 2016-03-08 12:01:18 +0900, Michael Paquier wrote:
>> I have spent a couple of hours looking at that in detail, and the
>> patch is neat.
>
> Cool. Doing some more polishing right now. Will be back with an updated
> version soonish.
>
> Did you do some testing?

Not much in detail yet, I just ran a check-world with fsync enabled
for the recovery tests, plus quick manual tests with a cluster
manually set up. I'll do more with your new version now that I know
there will be one.

>> +   /* XXX: Add racy file existence check? */
>> +   if (rename(oldfile, newfile) < 0)
>
>> I am not sure we should worry about that; what do you think could
>> cause the old file to go missing all of a sudden? Other backend
>> processes are not playing with it in the code paths where this routine
>> is called. Perhaps adding a comment in the header to let users know
>> that would help?
>
> What I'm thinking of is adding a check whether the *target* file already
> exists, and error out in that case. Just like the link() based path
> normally does.

Ah, OK. Well, why not. I'd rather have an assertion instead of an error though.
-- 
Michael




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-07 Thread Andres Freund
Hi,

On 2016-03-08 12:01:18 +0900, Michael Paquier wrote:
> I have spent a couple of hours looking at that in detail, and the
> patch is neat.

Cool. Doing some more polishing right now. Will be back with an updated
version soonish.

Did you do some testing?

> + * This routine ensures that, after returning, the effect of renaming file
> + * persists in case of a crash. A crash while this routine is running will
> + * leave you with either the old, or the new file.

> "you" is not really Postgres-like, "the server" or "the backend" perhaps?

Hm. I think your alternative proposals are more awkward.

> +   /* XXX: perform close() before? might be outside a
> transaction. Consider errno! */
> ereport(elevel,
> (errcode_for_file_access(),
>  errmsg("could not fsync file \"%s\": %m", fname)));
> +   (void) CloseTransientFile(fd);
> +   return -1;
> close() should be called before. slot.c for example does so and we
> don't want to leak an fd here in case of elevel >= ERROR.

Note that the transient file machinery will normally prevent fd leakage
- but it can only do so if called in a transaction context. I've added
int save_errno;

/* close file upon error, might not be in transaction context */
save_errno = errno;
CloseTransientFile(fd);
errno = save_errno;
stanzas.

> + * It does so by using fsync on the sourcefile before the rename, and the
> + * target file and directory after.

> fsync is issued as well on the target file if it exists. I think
> that's worth mentioning in the header.

Ok.


> +   /* XXX: Add racy file existence check? */
> +   if (rename(oldfile, newfile) < 0)

> I am not sure we should worry about that; what do you think could
> cause the old file to go missing all of a sudden? Other backend
> processes are not playing with it in the code paths where this routine
> is called. Perhaps adding a comment in the header to let users know
> that would help?

What I'm thinking of is adding a check whether the *target* file already
exists, and error out in that case. Just like the link() based path
normally does.
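
Concretely, the check being discussed might look roughly like the
following, placed just before the rename() call; the placement and the
error wording are assumptions, not the patch's code.

/* hypothetical racy existence check before falling back to rename() */
struct stat st;

if (stat(newfile, &st) == 0)
    elog(ERROR, "cannot rename \"%s\" to \"%s\": target file already exists",
         oldfile, newfile);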


> Instead of "durable" I think that "persistent" makes more sense.

I find "durable" a lot more descriptive. "persistent" could refer to
retrying the rename or something.

> We
> want to make those renames persistent on disk in case of a crash. So I
> would suggest the following routine names:
> - rename_persistent
> - rename_or_link_persistent
> Having the verb first also helps identifying that this is a
> system-level, rename()-like, routine.

I prefer the current names.

> > I sure wish we had the recovery testing et al. in all the back
> > branches...
> 
> Sure, what we have now is something that should really be backpatched,
> I was just waiting to have all the existing stability issues
> addressed, the last one on my agenda being the failure of hamster for
> test 005 I mentioned in another thread before sending patches for
> other branches. I proposed a couple of potential regarding that
> actually, see here:
> http://www.postgresql.org/message-id/cab7npqsaz9hnucmoua30jo2wj8mnrem18p2a7mcra-zrjxj...@mail.gmail.com

Yea. Will be an interesting discussion... Anyway, I did run the patch
through the existing checks, after enabling fsync in PostgresNode.pm.

Greetings,

Andres Freund




Re: [HACKERS] Badly designed error reporting code in controldata_utils.c

2016-03-07 Thread Tom Lane
Joe Conway  writes:
> Committed/pushed with exit(EXIT_FAILURE)

Thanks!  I lit off a new run on gaur/pademelon to confirm.  Should
have results in six hours or so...

regards, tom lane




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-07 Thread Michael Paquier
On Mon, Mar 7, 2016 at 3:38 PM, Andres Freund  wrote:
> On 2016-03-05 19:54:05 -0800, Andres Freund wrote:
>> I started working on this; delayed by taking longer than planned on the
>> logical decoding stuff (quite a bit complicated by
>> e1a11d93111ff3fba7a91f3f2ac0b0aca16909a8).  I'm not very happy with the
>> error handling as it is right now.  For one, you have rename_safe return
>> error codes, and act on them in the callers, but on the other hand you
>> call fsync_fname which always errors out in case of failure.  I also
>> don't like the new messages much.
>>
>> Will continue working on this tomorrow.
>
> So, here's my current version of this. I've not performed any testing
> yet, and it's hot off the press. There's some comment smithing
> needed. But otherwise I'm starting to like this.
>
> Changes:
> * renamed rename_safe to durable_rename
> * renamed replace_safe to durable_link_or_rename (there was never any
>   replacing going on)
> * pass through elevel to the underlying routines, otherwise we could
>   error out with ERROR when we don't want to. That's particularly
>   important in case of things like InstallXLogFileSegment().
> * made fsync_fname use fsync_fname_ext, add 'accept permission errors'
>   param
> * have walkdir call a wrapper, to add ignore_perms param
>
> What do you think?

I have spent a couple of hours looking at that in detail, and the
patch is neat.

+ * This routine ensures that, after returning, the effect of renaming file
+ * persists in case of a crash. A crash while this routine is running will
+ * leave you with either the old, or the new file.
"you" is not really Postgres-like, "the server" or "the backend" perhaps?

+   /* XXX: perform close() before? might be outside a
transaction. Consider errno! */
ereport(elevel,
(errcode_for_file_access(),
 errmsg("could not fsync file \"%s\": %m", fname)));
+   (void) CloseTransientFile(fd);
+   return -1;
close() should be called before. slot.c for example does so and we
don't want to leak an fd here in case of elevel >= ERROR.

+ * It does so by using fsync on the sourcefile before the rename, and the
+ * target file and directory after.
fsync is issued as well on the target file if it exists. I think
that's worth mentioning in the header.
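
For readers following along, here is a rough sketch of the fsync sequence
that header comment describes, written against the helper signatures
visible in the patch's diff (fsync_fname_ext and fsync_parent_path).  The
pre-existing-target fsync mentioned above, elevel propagation on ERROR,
and the transient-file details are omitted, so this is illustrative only,
not the patch's actual code.

static int
durable_rename_sketch(const char *oldfile, const char *newfile, int elevel)
{
    /* flush the source file's contents so they survive a crash */
    if (fsync_fname_ext(oldfile, false, false, elevel) != 0)
        return -1;

    /* the rename itself */
    if (rename(oldfile, newfile) < 0)
        return -1;

    /* flush the file under its new name ... */
    if (fsync_fname_ext(newfile, false, false, elevel) != 0)
        return -1;

    /* ... and the directory entry that now points at it */
    return fsync_parent_path(newfile, elevel);
}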

+   /* XXX: Add racy file existence check? */
+   if (rename(oldfile, newfile) < 0)
I am not sure we should worry about that; what do you think could
cause the old file to go missing all of a sudden? Other backend
processes are not playing with it in the code paths where this routine
is called. Perhaps adding a comment in the header to let users know
that would help?

Instead of "durable" I think that "persistent" makes more sense. We
want to make those renames persistent on disk in case of a crash. So I
would suggest the following routine names:
- rename_persistent
- rename_or_link_persistent
Having the verb first also helps identify that this is a
system-level, rename()-like routine.

> I sure wish we had the recovery testing et al. in all the back
> branches...

Sure, what we have now is something that should really be backpatched,
I was just waiting to have all the existing stability issues
addressed, the last one on my agenda being the failure of hamster for
test 005 I mentioned in another thread before sending patches for
other branches. I proposed a couple of potential solutions regarding that
actually, see here:
http://www.postgresql.org/message-id/cab7npqsaz9hnucmoua30jo2wj8mnrem18p2a7mcra-zrjxj...@mail.gmail.com
-- 
Michael




Re: [HACKERS] Typo in logicaldecoding docs

2016-03-07 Thread Tom Lane
David Rowley  writes:
> The attached fixes a small error in the logicaldecoding docs.

Pushed, thanks.

regards, tom lane




[HACKERS] Allowing to run a buildfarm animal under valgrind

2016-03-07 Thread Andres Freund
Hi,

I'm setting up a buildfarm animal that runs under
valgrind. Unfortunately there's not really any good solution to force
make check et al. to start postgres wrapped in valgrind.  For now I've
resorted to adding something like

sub replace_postgres
{
    # Resolve source and build directories (vpath builds keep sources elsewhere).
    # Assumes Fcntl constants and abs_path are already imported by the script.
    my $srcdir=$use_vpath ? "../pgsql/" : ".";
    my $builddir=abs_path("$pgsql");
    $srcdir=abs_path("$pgsql/$srcdir");

    # Move the real binary aside and install an executable wrapper in its place.
    chdir "$pgsql/src/backend/";
    rename "postgres", "postgres.orig";
    sysopen my $fh, "postgres", O_CREAT|O_TRUNC|O_RDWR, 0700
        or die "Could not create postgres wrapper";

    # The wrapper runs the original binary under valgrind with the project's
    # suppression file.
    print $fh <<"END";
#!/bin/bash
~/src/valgrind/vg-in-place \\
--quiet \\
--error-exitcode=128 \\
--suppressions=$srcdir/src/tools/valgrind.supp \\
--trace-children=yes --track-origins=yes --read-var-info=yes \\
--leak-check=no \\
$builddir/src/backend/postgres.orig \\
"\$@"
END
    close $fh;
    chdir $branch_root;
}
to the buildfarm client.

i.e. a script that replaces the postgres binary with a wrapper that
invokes postgres via valgrind.

That's obviously not a very good approach though. It's buildfarm
specific and thus can't be invoked by developers and it doesn't really
support being installed somewhere.

Does anybody have a better idea about how to do this?

Regards,

Andres




Re: [HACKERS] Fix misspelling of "parallel"

2016-03-07 Thread Tom Lane
David Rowley  writes:
> The attached fixes a small spelling error in a comment.

"grep" found another one.  Pushed, thanks!

regards, tom lane




Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread Michael Paquier
On Mon, Mar 7, 2016 at 9:25 PM, Craig Ringer wrote:
> pg_rewind will help a lot there if it proves robust enough -

FWIW, some of my colleagues are doing a lot of QE/QA on a HA solution
based on pg_rewind, and it is proving to be quite stable for the
moment, they are having a hard time breaking it (that's the 9.4
version, though it is intentionally close enough to what is in core,
I'm careful about that).
-- 
Michael




Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-03-07 Thread Vitaly Burovoy
On 2016-02-29 17:19+00, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> On 2016-02-24 19:37+00, Petr Jelinek  wrote:
>> Also this patch needs documentation.
> I've added new version in attachments, thanks.

Hello! The first pass of a review is below.

1. Rename the "flag" variable to something more meaningful. (e.g.
"op_type" - "operation_type")


2. There are two extra spaces after the "_DELETE" word:
+#define JB_PATH_DELETE 0x0002


3.
res = setPath(, path_elems, path_nulls, path_len, ,
- 0, newval, create);
+ 0, newval, create ? JB_PATH_CREATE : 0x0);

What is the magic constant "0x0"? If not "create", then what?
Introduce something like JB_PATH_NOOP = 0x0...


4. In the "setPathArray" the block:
if (newval != NULL)
"newval == NULL" is considered as "to be deleted" from the previous
convention and from the comments for the "setPath" function.
I think since you've introduced special constants one of which is
JB_PATH_DELETE (which is not used now) you should replace convention
from checking for a value to checking for a constant:
if (flag != JB_PATH_DELETE)
or even better:
if (!flag & JB_PATH_DELETE)


5. Change checks for equality (to bit constants) into bit operations:
(flag == JB_PATH_INSERT_BEFORE || flag == JB_PATH_INSERT_AFTER)
which can be replaced with:
(flag & (JB_PATH_INSERT_BEFORE | JB_PATH_INSERT_AFTER))

also here:
(flag == JB_PATH_CREATE || flag == JB_PATH_INSERT_BEFORE || flag == JB_PATH_INSERT_AFTER)
can be:
(flag & (JB_PATH_CREATE | JB_PATH_INSERT_BEFORE | JB_PATH_INSERT_AFTER))
(a combined sketch of points 3-5 follows this list)


6. Pay attention to parentheses to make the code look like the similar
code around it:
+   if ((npairs == 0) && flag == JB_PATH_CREATE && (level == path_len - 1))
should be:
+   if ((npairs == 0) && (flag == JB_PATH_CREATE) && (level == path_len - 1))


7. Why did you remove "skip"? It is a comment explaining what "true" means...
-   r = JsonbIteratorNext(it, , true);  /* skip */
+   r = JsonbIteratorNext(it, , true);


8. After your changes some statements exceed the 80-column threshold...
The same rule applies to the documentation.


9. And finally... it does not work as expected in case of:
postgres=# select jsonb_insert('{"a":[0,1,2,3]}', '{"a", 10}', '"4"');
jsonb_insert
-
 {"a": [0, 1, 2, 3]}
(1 row)

whereas jsonb_set works:
postgres=# select jsonb_set('{"a":[0,1,2,3]}', '{"a", 10}', '"4"');
jsonb_set
--
 {"a": [0, 1, 2, 3, "4"]}
(1 row)
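
To tie points 3-5 together, here is a hypothetical standalone sketch of
named flag bits tested with bitwise operations.  The constant names follow
the patch, but all values other than JB_PATH_DELETE, as well as the two
helper functions, are assumptions made purely for illustration.

#include <stdbool.h>

#define JB_PATH_NOOP            0x0000
#define JB_PATH_CREATE          0x0001
#define JB_PATH_DELETE          0x0002
#define JB_PATH_INSERT_BEFORE   0x0004
#define JB_PATH_INSERT_AFTER    0x0008

/* point 4: test the DELETE bit instead of relying on newval == NULL */
static bool
keeps_existing_element(int op_type)
{
    return !(op_type & JB_PATH_DELETE);
}

/* point 5: one bitwise test instead of chained equality comparisons */
static bool
may_append_element(int op_type)
{
    return (op_type & (JB_PATH_CREATE |
                       JB_PATH_INSERT_BEFORE |
                       JB_PATH_INSERT_AFTER)) != 0;
}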

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Declarative partitioning

2016-03-07 Thread Amit Langote

Hi Corey,

Sorry for replying so late.

On 2016/02/25 3:31, Corey Huinker wrote:
> 

[ ... ]

> So I would assume that we'd use a syntax that presumed the columns were in
> a composite range type.
> 
> Which means your creates would look like (following Robert Haas's implied
> suggestion that we leave off the string literal quotes):
> 
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES (  , (b,2) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,2), (b,3) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,3), (b,4) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,4), (c,2) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (c,2), (c,3) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (c,3), (c,4) );
> 
> That's not terrible looking.

So I tried a grammar that looked like the following:

range_spec: lb_inc bound ',' bound ub_inc{}

lb_inc: '[' { $$ = true; } | '('  { $$ = false; }
ub_inc: ']' { $$ = true; } | ')' { $$ = false; }

bound: a_expr
   {
  if (IsA($1, RowExpr))/* (key1, key2, ...) */
   $$ = ((RowExpr *) $1)->args;
  else /* key */
  $$ = list_make1($1);
   }
   | /* EMPTY */ { $$ = NIL; }

Everything seemed to go dandy until I tried FOR VALUES (blah , blah],
where psql wouldn't send the command string until it saw the matching
closing parenthesis, :(.  So maybe I should try to put the whole thing in
'', that is, accept the full range_spec in a string, but then we are back
to requiring a full-blown range parse function, which I was trying to
avoid by using the aforementioned grammar.  So, I decided to move ahead
with the following grammar for the time being:

START (lower-bound) [ EXCLUSIVE ]
| END (upper-bound) [ INCLUSIVE ]
| START (lower-bound) [ EXCLUSIVE ] END (upper-bound) [ INCLUSIVE ]

Where,

*-bound: a_expr
 | *-bound ',' a_expr

Note that in the absence of explicit specification, lower-bound is
inclusive and upper-bound is exclusive.

So the verbosity can't be helped, unless accepting a range literal in
string form and exporting rangetypes.c:range_parse() with
range-partitioning-specific hacks (to handle composite bounds) to parse
it are considered acceptable.

>> IOW, one shouldn't create an unbounded partition if more partitions in the
>> unbounded direction are expected to be created.  It would be OK for
>> unbounded partitions to be on the lower end most of the times.
>>
> 
> On this I'll have to disagree. My own use case where I use my
> range_partitioning extension starts off with a single partition () and all
> new partitions are splits of that. The ranges evolve over time as
> partitions grow and slow down. It's nice because we're not trying to
> predict where growth will be, we split where growth is.

Okay, perhaps I should not presume a certain usage.  However, as you know,
a usage like yours requires some mechanism of data redistribution (also
not without some syntax), which I am not targeting with the initial patch.
 If that was the only way of creating partitions, matters would be a
little easier - you only specify a split point and have some rule about
inclusivity around the split point.  But we have to start with the usage
where each new partition is separately created with explicit partition
bound specification that is complete in itself and that's where the logic
to check partition invariants may get a bit complicated.

> Ok, I'll wait a bit. In the mean time I can tell you a bit about the
> existing production system I'm hoping to replicate in true partitioning
> looks like this:
> 
> Big Master Table:
>  Range partition by C collated text
>Date Range
>Date Range
>...
>  Range partition by C collated text
>Date Range
>Date Range
>...
> ...
> 
> Currently this is accomplished through my range_partitioning module, and
> then using pg_partman on those partitions. It works, but it's a lot of
> moving parts.
> 
> The machine will be a 32 core AWS box. As per usual with AWS, it will be
> have ample memory and CPU, and be somewhat starved for I/O.
> 
> Question: I haven't dived into the code, but I was curious about your tuple
> routing algorithm. Is there any way for the algorithm to begin its scan of
> candidate partitions based on the destination of the last row inserted in this
> statement? I ask because most use cases (that I am aware of) have data that
> would naturally cluster in the same partition.

No.  Actually the tuple-routing function starts afresh for each row.  For
range partitions, it's binary search over an array of upper bounds.  There
is no row-to-row state caching in the partition module itself.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-07 Thread Peter Geoghegan
On Mon, Mar 7, 2016 at 4:50 PM, Robert Haas  wrote:
> Here's an updated patch with an API that I think is much more
> reasonable to expose to users, and documentation!  It assumes that the
> patch I posted a few hours ago to remove PD_ALL_FROZEN will be
> accepted; if that falls apart for some reason, I'll update this.  I
> plan to push this RSN if nobody objects.

Thanks for making the effort to make the tool generally available.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-07 Thread Igal @ Lucee.org

Ian,

On 3/7/2016 4:17 PM, Ian Barwick wrote:


FYI something similar has been proposed before:

   http://www.postgresql.org/message-id/53953efb.8070...@2ndquadrant.com

The linked thread might provide more insights into the issues surrounding
this proposal.


It's funny how I've encountered the same issue and reached the same 
conclusion as you did.  The main difference is that I suggested 
returning NULL values instead of throwing an error.


I read through the whole thread and it seems to me like there was quite 
a bit of support for the feature: Tom remained unconvinced that it is 
useful, but quite a few others saw the benefit in it, especially Java 
users who experience that problem first hand -- and Rushabh complained 
about whitespace in the patch?


I'm not sure why it was not accepted in the end.


Igal




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-07 Thread Robert Haas
On Sat, Mar 5, 2016 at 9:25 AM, Masahiko Sawada  wrote:
>> I actually think end-users might well want to use it.  Also, I created
>> it by hacking up pg_freespacemap, so it may make sense to have it in
>> the same place.
>> I would also be tempted to add an additional C functions that scan the
>> entire visibility map and return counts of the total number of bits of
>> each type that are set, and similarly for the page level bits.
>> Presumably that would be much faster than
>
> +1.
>
>> I am also tempted to change the API to be a bit more friendly,
>> although I am not sure exactly how.  This was a quick and dirty hack
>> so that I could test, but the hardest thing about making it not a
>> quick and dirty hack is probably deciding on a good UI.
>
> Does it mean visibility map API in visibilitymap.c?

Here's an updated patch with an API that I think is much more
reasonable to expose to users, and documentation!  It assumes that the
patch I posted a few hours ago to remove PD_ALL_FROZEN will be
accepted; if that falls apart for some reason, I'll update this.  I
plan to push this RSN if nobody objects.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/Makefile b/contrib/Makefile
index bd251f6..d12dd63 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -37,6 +37,7 @@ SUBDIRS = \
 		pgcrypto	\
 		pgrowlocks	\
 		pgstattuple	\
+		pg_visibility	\
 		postgres_fdw	\
 		seg		\
 		spi		\
diff --git a/contrib/pg_visibility/Makefile b/contrib/pg_visibility/Makefile
new file mode 100644
index 000..fbbaa2e
--- /dev/null
+++ b/contrib/pg_visibility/Makefile
@@ -0,0 +1,19 @@
+# contrib/pg_visibility/Makefile
+
+MODULE_big = pg_visibility
+OBJS = pg_visibility.o $(WIN32RES)
+
+EXTENSION = pg_visibility
+DATA = pg_visibility--1.0.sql
+PGFILEDESC = "pg_visibility - page visibility information"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_visibility
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_visibility/pg_visibility--1.0.sql b/contrib/pg_visibility/pg_visibility--1.0.sql
new file mode 100644
index 000..9616e1f
--- /dev/null
+++ b/contrib/pg_visibility/pg_visibility--1.0.sql
@@ -0,0 +1,52 @@
+/* contrib/pg_visibility/pg_visibility--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_visibility" to load this file. \quit
+
+-- Show visibility map information.
+CREATE FUNCTION pg_visibility_map(regclass, blkno bigint,
+  all_visible OUT boolean,
+  all_frozen OUT boolean)
+RETURNS record
+AS 'MODULE_PATHNAME', 'pg_visibility_map'
+LANGUAGE C STRICT;
+
+-- Show visibility map and page-level visibility information.
+CREATE FUNCTION pg_visibility(regclass, blkno bigint,
+			  all_visible OUT boolean,
+			  all_frozen OUT boolean,
+			  pd_all_visible OUT boolean)
+RETURNS record
+AS 'MODULE_PATHNAME', 'pg_visibility'
+LANGUAGE C STRICT;
+
+-- Show visibility map information for each block in a relation.
+CREATE FUNCTION pg_visibility_map(regclass, blkno OUT bigint,
+  all_visible OUT boolean,
+  all_frozen OUT boolean)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_visibility_map_rel'
+LANGUAGE C STRICT;
+
+-- Show visibility map and page-level visibility information for each block.
+CREATE FUNCTION pg_visibility(regclass, blkno OUT bigint,
+			  all_visible OUT boolean,
+			  all_frozen OUT boolean,
+			  pd_all_visible OUT boolean)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_visibility_rel'
+LANGUAGE C STRICT;
+
+-- Show summary of visibility map bits for a relation.
+CREATE FUNCTION pg_visibility_map_summary(regclass,
+OUT all_visible bigint, OUT all_frozen bigint)
+RETURNS record
+AS 'MODULE_PATHNAME', 'pg_visibility_map_summary'
+LANGUAGE C STRICT;
+
+-- Don't want these to be available to public.
+REVOKE ALL ON FUNCTION pg_visibility_map(regclass, bigint) FROM PUBLIC;
+REVOKE ALL ON FUNCTION pg_visibility(regclass, bigint) FROM PUBLIC;
+REVOKE ALL ON FUNCTION pg_visibility_map(regclass) FROM PUBLIC;
+REVOKE ALL ON FUNCTION pg_visibility(regclass) FROM PUBLIC;
+REVOKE ALL ON FUNCTION pg_visibility_map_summary(regclass) FROM PUBLIC;
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
new file mode 100644
index 000..d4336ce
--- /dev/null
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -0,0 +1,346 @@
+/*-
+ *
+ * pg_visibility.c
+ *	  display visibility map information and page-level visibility bits
+ *
+ *	  contrib/pg_visibility/pg_visibility.c
+ *-
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include 
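
For anyone wanting to try this out, a minimal usage sketch against an
arbitrary user table (the table name is just an example; output values will
of course vary, and the functions are superuser-only per the REVOKEs above):

  CREATE EXTENSION pg_visibility;

  -- visibility map bits for one block, then for every block in the relation
  SELECT * FROM pg_visibility_map('pgbench_accounts', 0);
  SELECT * FROM pg_visibility_map('pgbench_accounts') LIMIT 5;

  -- map bits plus the page-level all-visible flag
  SELECT * FROM pg_visibility('pgbench_accounts', 0);

  -- counts of all-visible and all-frozen bits across the whole relation
  SELECT * FROM pg_visibility_map_summary('pgbench_accounts');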

Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?

2016-03-07 Thread Regina Obe
>> On Fri, Mar 4, 2016 at 9:29 PM, Regina Obe > wrote:
>> I think the answer to this question is NO, but thought I'd ask.
>>
>> A lot of folks in PostGIS land are suffering from restore issues, 
>> materialized view issues etc. because we have functions such as
>>
>> ST_Intersects
>>
>> Which does _ST_Intersects  AND &&
>>
>> Since _ST_Intersects is not schema qualified, during database restore 
>> (which sets the schema to the table or view schema), materialized 
>> views that depend on this do not come back.

> Could you provide a self-contained, reproducible test case that illustrates 
> this problem?  Ideally, one that doesn't involve installing PostGIS?

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

Here is a script, involving just the built-in geometric types, that has the
same issue:


-- script starts here --
CREATE schema funcs;

set search_path=public,funcs;
CREATE OR REPLACE FUNCTION funcs._helper(box, box) RETURNS float8 AS
$$
  SELECT box_distance($1,$2);
$$
language 'sql' IMMUTABLE STRICT;


CREATE OR REPLACE FUNCTION funcs.no_inline(box,box) RETURNS boolean AS
$$
 SELECT $1 && $2 AND _helper($1,$2) = 0;
$$
language 'sql' IMMUTABLE;

--doing this kills inlining
ALTER FUNCTION funcs.no_inline(box, box) SET search_path=funcs;


--this one doesn't have search_path set so inlining works
CREATE OR REPLACE FUNCTION funcs.inline(box,box) RETURNS boolean AS
$$
 SELECT $1 && $2 AND _helper($1,$2) = 0;
$$
language 'sql' IMMUTABLE;


CREATE TABLE bag_boxes(id serial primary key, geom box);
CREATE INDEX idx_bag_boxes_geom ON bag_boxes USING gist(geom);

INSERT INTO bag_boxes(geom)
SELECT ('((' || i::text || ',' || j::text || '), (' || k::text || ', ' || 
l::text || '))')::box
FROM generate_series(1,10) i , generate_series(11,20) j, generate_series(5,10) 
k, generate_series(10, 15) l ;


SELECT b1.id, b2.id As id2
FROM bag_boxes AS b1 INNER JOIN bag_boxes As b2 ON no_inline(b1.geom, b2.geom);

-- plan looks like this -- PostgreSQL 9.5.1, compiled by Visual C++ build 1800, 32-bit
/** Nested Loop  (cost=0.00..3402141.00 rows=432 width=8)
  Join Filter: no_inline(b1.geom, b2.geom)
  ->  Seq Scan on bag_boxes b1  (cost=0.00..66.00 rows=3600 width=36)
  ->  Materialize  (cost=0.00..84.00 rows=3600 width=36)
->  Seq Scan on bag_boxes b2  (cost=0.00..66.00 rows=3600 width=36) **/



SELECT b1.id, b2.id As id2
FROM bag_boxes AS b1 INNER JOIN bag_boxes As b2 ON inline(b1.geom, b2.geom);


-- plan looks like this PostgreSQL 9.5.1, compiled by Visual C++ build 1800, 32-bit
/** Nested Loop  (cost=0.15..2359.00 rows=324 width=8)
  ->  Seq Scan on bag_boxes b1  (cost=0.00..66.00 rows=3600 width=36)
  ->  Index Scan using idx_bag_boxes_geom on bag_boxes b2  (cost=0.15..0.63 
rows=1 width=36)
Index Cond: (b1.geom && geom)
Filter: (box_distance(b1.geom, geom) = '0'::double precision) **/

-- end script --
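
For reference, inlining survives if the helper is schema-qualified inside
the function body instead of attaching SET search_path to the function
(a sketch along the lines of the script above, not what PostGIS ships today):

CREATE OR REPLACE FUNCTION funcs.qualified_inline(box,box) RETURNS boolean AS
$$
 SELECT $1 && $2 AND funcs._helper($1,$2) = 0;
$$
language 'sql' IMMUTABLE;
-- no SET search_path, so the planner can still inline it, and a restore is
-- safe because the reference to the helper is fully qualified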

Thanks,
Regina







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-07 Thread Ian Barwick
Hi

On 08/03/16 05:32, Igal @ Lucee.org wrote:
> THE ISSUE:
> 
> In JDBC there is a flag called RETURN_GENERATED_KEYS -- 
> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#RETURN_GENERATED_KEYS
> 
(...)
> THE PROPOSAL:
> 
> The proposal is to allow something like RETURNING primary_key() (it can be a 
> keyword, not
> necessarily a function), e.g.
> 
> INSERT INTO test VALUES ('PostgresQL') RETURNING primary_key();

FYI something similar has been proposed before:

  http://www.postgresql.org/message-id/53953efb.8070...@2ndquadrant.com

The linked thread might provide more insights into the issues surrounding
this proposal.


Regards

Ian Barwick

-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Badly designed error reporting code in controldata_utils.c

2016-03-07 Thread Joe Conway
On 03/07/2016 08:02 AM, Joe Conway wrote:
> On 03/06/2016 07:34 PM, Tom Lane wrote:
>> Joe Conway  writes:
>>> On 03/06/2016 05:47 PM, Tom Lane wrote:
 That's much better, but is there a reason you're using exit(2)
 and not exit(EXIT_FAILURE) ?
>>
>>> Only because I was trying to stick with what was originally in
>>> src/bin/pg_controldata/pg_controldata.c
>>
>> Meh.  It looks to me like the standard way to handle this
>> for code in src/common/ is exit(EXIT_FAILURE).
> 
> I have no issue with using EXIT_FAILURE, but note:

Committed/pushed with exit(EXIT_FAILURE)

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


[HACKERS] Typo in logicaldecoding docs

2016-03-07 Thread David Rowley
The attached fixes a small error in the logicaldecoding docs.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


logicaldecoding_docs_typo_fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fix misspelling of "parallel"

2016-03-07 Thread David Rowley
The attached fixes a small spelling error in a comment.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_spelling_fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New competition from Microsoft?

2016-03-07 Thread Josh berkus

On 03/07/2016 01:43 PM, Josh berkus wrote:

All,

http://blogs.microsoft.com/?p=67248

Once SQL Server is available on Linux, we're going to see more people
using it as an alternative to PostgreSQL.  Especially since they're
picking up a lot of our better features, like R support.



Sorry, that was meant to be posted to pgsql-advocacy.

--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Splitting lengthy sgml files

2016-03-07 Thread Tatsuo Ishii
> Surely that's a github bug that you should be complaining to them about?

No, it's a known limitation:
https://help.github.com/articles/what-are-the-limits-for-viewing-content-and-diffs-in-my-repository/

> I'm disinclined to split existing files because (a) it would complicate
> back-patching and (b) it would be completely destructive to git history.
> git claims to understand about file moves but it doesn't do a terribly
> good job with that history-wise (try git log or git blame on
> recently-moved files such as pgbench).  And I've never heard even
> a claim that it understands splits.
> 
> There might be reasons to override those disadvantages and do it
> anyway ... but this doesn't sound like a very good reason.

Ok, I will try to find workarounds for this, including forking.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New competition from Microsoft?

2016-03-07 Thread Álvaro Hernández Tortosa



On 07/03/16 23:32, Joshua D. Drake wrote:

On 03/07/2016 01:43 PM, Josh berkus wrote:

All,

http://blogs.microsoft.com/?p=67248

Once SQL Server is available on Linux, we're going to see more people
using it as an alternative to PostgreSQL.  Especially since they're
picking up a lot of our better features, like R support.



Yes but:

1. MSSQL is not going to magically run well on Linux. It was never 
designed to run on a *NIX platform and it long left the parts that 
would (Sybase) in the dust.


I'd be surprised if it would *not* run well on Linux. Linux is an 
easy platform to develop for, it's fast and reliable, and they have plenty of 
time (until release).


2. This is actually good for us. It validates our primary deployment 
platform. Why run a closed source MSSQL when you can get Open Source 
PostgreSQL on the same platform major platform?


SQL Server seems to have some nice features that PostgreSQL doesn't 
have. Encryption support and Hekaton (columnar-store "shared buffers") 
come to mind. It's definitely not a toy database, even less so on Linux.




3. Competition is good when we have it, our community steps up.



Absolutely, competition is very welcome!

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New competition from Microsoft?

2016-03-07 Thread Yorick de Wid
Microsoft is only trying to raise some awareness for their new multi-platform 
ASP.NET 5, which works closely with MSSQL. I don't believe they are 
honestly interested in the *NIX user. They are extending their user base in the 
hope of getting them onto Azure.

-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Joshua D. Drake
Sent: maandag 7 maart 2016 23:32
To: Josh berkus ; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] New competition from Microsoft?

On 03/07/2016 01:43 PM, Josh berkus wrote:
> All,
>
> http://blogs.microsoft.com/?p=67248
>
> Once SQL Server is available on Linux, we're going to see more people 
> using it as an alternative to PostgreSQL.  Especially since they're 
> picking up a lot of our better features, like R support.
>

Yes but:

1. MSSQL is not going to magically run well on Linux. It was never designed to 
run on a *NIX platform and it long left the parts that would
(Sybase) in the dust.

2. This is actually good for us. It validates our primary deployment platform. 
Why run a closed source MSSQL when you can get Open Source PostgreSQL on the 
same platform major platform?

3. Competition is good when we have it, our community steps up.

-- 
Command Prompt, Inc.  http://the.postgres.company/
 +1-503-667-4564 PostgreSQL Centered full stack 
support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make 
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New competition from Microsoft?

2016-03-07 Thread Joe Conway
On 03/07/2016 02:26 PM, Stephen Frost wrote:
> * Joe Conway (m...@joeconway.com) wrote:
>> On 03/07/2016 01:43 PM, Josh berkus wrote:
>>> http://blogs.microsoft.com/?p=67248
>>>
>>> Once SQL Server is available on Linux, we're going to see more people
>>> using it as an alternative to PostgreSQL.  Especially since they're
>>> picking up a lot of our better features, like R support.
>>
>> IANAL, but I wonder how they can have R support given that libR.so is
>> GPL licensed, not LPGL? Have they open sourced SQL Server?
> 
> I thought they had purchased/partnered with an R implementor that didn't
> use the GPL one...

Microsoft bought Revolution R, but that implementation is still GPL --
it has to be because it is derived from the open source R project which
is GPL.

Anyway, shouldn't we move this to advocacy?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] New competition from Microsoft?

2016-03-07 Thread Joshua D. Drake

On 03/07/2016 01:43 PM, Josh berkus wrote:

All,

http://blogs.microsoft.com/?p=67248

Once SQL Server is available on Linux, we're going to see more people
using it as an alternative to PostgreSQL.  Especially since they're
picking up a lot of our better features, like R support.



Yes but:

1. MSSQL is not going to magically run well on Linux. It was never 
designed to run on a *NIX platform and it long left the parts that would 
(Sybase) in the dust.


2. This is actually good for us. It validates our primary deployment 
platform. Why run a closed source MSSQL when you can get Open Source 
PostgreSQL on the same platform major platform?


3. Competition is good when we have it, our community steps up.

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New competition from Microsoft?

2016-03-07 Thread Stephen Frost
* Joe Conway (m...@joeconway.com) wrote:
> On 03/07/2016 01:43 PM, Josh berkus wrote:
> > http://blogs.microsoft.com/?p=67248
> > 
> > Once SQL Server is available on Linux, we're going to see more people
> > using it as an alternative to PostgreSQL.  Especially since they're
> > picking up a lot of our better features, like R support.
> 
> IANAL, but I wonder how they can have R support given that libR.so is
> GPL licensed, not LPGL? Have they open sourced SQL Server?

I thought they had purchased/partnered with an R implementor that didn't
use the GPL one...

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [pgsql-advocacy] [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread Adrian Klaver
On 03/07/2016 01:37 PM, Josh berkus wrote:
> MauMau,
> 
> Crossing this over to pgsql-advocacy list where it really belongs.
> That's what that list is *for*.
> 
> Especially since the discussion on -hackers has focused on new
> PostgreSQL Features, which while also good don't address the general
> question.
> 
> On 03/05/2016 09:29 PM, MauMau wrote:
>> Hello,
>> As I said in the previous greeting mail, I'd like to discuss how to
>> expand PostgreSQL ecosystem.  Here, ecosystem means "interoperability"
>> -- the software products and cloud services which use/support
>> PostgreSQL.  If pgsql-advocacy or somewhere else is better for this
>> topic, just tell me so.
>> THE BACKGROUND
>> ==
>> Thanks to the long and hard efforts by the community, PostgreSQL has
>> been evolving to be a really great software comparable to existing
>> strong commercial products.  Open source databases are gaining more
>> popularity to influence the database market.
>> Open source threatens to eat the database market
>> http://www.infoworld.com/article/2916057/open-source-software/open-source-threatens-to-eat-the-database-market.html
>> "Though the proprietary RDBMS market grew at a sluggish 5.4 percent in
>> 2014, the open source database market grew 31 percent to hit $562 million."
>> "As Gartner highlights in a recent research report, open source
>> databases now consume 25 percent of relational database usage."
>> Perhaps related to this is that the revenues of Oracle, IBM and
>> Microsoft have been declining (but I read in an article that SQL Server
>> is gaining more revenue).
>> On the other hand, there is a gulf between the two top popular databases
>> -- Oracle and MySQL -- and PostgreSQL.  They are nearly five times more
>> popular than PostgreSQL.
>> DB-Engines Ranking
>> http://db-engines.com/en/ranking
>> Yes, I understand this ranking doesn't necessarily reflect the actual
>> use, but I also don't think the ranking is far from the real
>> popularity.  In fact, some surveys show that MySQL has been in more
>> widespread use even here in Japan than PostgreSQL since around 2010 (IIRC).
>> What should we do to boost the popularity of PostgreSQL?  One challenge
>> is to increase the number of software which supports PostgreSQL. To take
>> advantage of the trend of shift from commercial products to open source,
>> PostgreSQL needs to interoperate with many software that are used
>> together with the commercial databases.
>> The easily understandable target is Oracle, because it is anticipated
>> that more users of Oracle will seek another database to avoid the
>> expensive Oracle Standard Edition 2 and increasing maintenance costs.
>> In addition, PostgreSQL has affinity for Oracle.
>> However, there is a problem.  The number of software is very small that
>> the users can know to interoperate with PostgreSQL.  That is, when the
>> users want to migrate from commercial databases to PostgreSQL, they
>> can't get information on whether they can continue to use their assets
>> with PostgreSQL.  Many applications might be interoperable through
>> standard interfaces like JDBC/ODBC, but the case is unknown.  For example:
>> * Only 24 open source projects are listed as interoperable.
>> Open Source Projects Using PostgreSQL
>> https://wiki.postgresql.org/wiki/OpenSource_Projects_Using_PostgreSQL
>> * Even EnterpriseDB has only 12 certified application vendors.
>> http://www.enterprisedb.com/partner-programs/enterprisedb-certified-application-vendors
>> * PostgreSQL Enterprise Consortium lists only about30 related products
>> (Japanese only).
>> https://www.pgecons.org/postgresql-info/business_sw/
>> * MySQL touts more than 2,000 ISV/OEM/VARs.
>> http://www.mysql.com/oem/

Though if you go to the actual list:
http://www.mysql.com/customers/embedded/

There are, at a rough count, 300.

This points to the central dilemma, some of which is addressed below:
separating fact from fiction.


>> Besides, in practice, we probably should increase the number of software
>> interoperable with PostgreSQL.  e.g. one customer asked us whether
>> Arcserve can be used to back up PostgreSQL databases, but unfortunately
>> we had to answer no.  They are using Arcserve to back up Oracle
>> databases and other resources.  "Then, you can use NetVault instead" is
>> not the best answer; they just want to replace the database.
>> PROPOSAL
>> ==
>> Last month, I attended the steering committee of PostgreSQL Enterprise
>> Consortium (PGECons) for the first time and proposed starting the
>> following activity.  PGECons is a Japanese non-profit organization to
>> promote PostgreSQL for enterprise use.  The members include NTT, SRA OSS
>> (Tatsuo Ishii runs), NEC, Hitachi, HP, Fujitsu, etc.  We concluded that
>> we need to consult the PostgreSQL community on how to proceed the
>> activity and work in cooperation with the community.
>> * Attract and ask 

Re: [HACKERS] Parallel Aggregate

2016-03-07 Thread David Rowley
On 5 March 2016 at 07:25, Robert Haas  wrote:
> On Thu, Mar 3, 2016 at 11:00 PM, David Rowley
>> 3. The code never attempts to mix and match Grouping Agg and Hash Agg
>> plans. e.g it could be an idea to perform Partial Hash Aggregate ->
>> Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize
>> stage. I just thought doing this is more complex than what's really
>> needed, but if someone can think of a case where this would be a great
>> win then I'll listen, but you have to remember we don't have any
>> pre-sorted partial paths at this stage, so an explicit sort is
>> required *always*. This might change if someone invented partial btree
>> index scans... but until then...
>
> Actually, Rahila Syed is working on that.  But it's not done yet, so
> presumably will not go into 9.6.
>
> I don't really see the logic of this, though.  Currently, Gather
> destroys the input ordering, so it seems preferable for the
> finalize-aggregates stage to use a hash aggregate whenever possible,
> whatever the partial-aggregate stage did.  Otherwise, we need an
> explicit sort.  Anyway, it seems like the two stages should be costed
> and decided on their own merits - there's no reason to chain the two
> decisions together.

Thanks for looking at this.
I've attached an updated patch which rebases the whole patch on top
of the upper planner changes which have just been committed.
In this version create_grouping_paths() now considers mixed
strategies of hashed and sorted, although I have a few concerns with
the code that I've written. I'm posting this early solely to minimise
any duplicate work.

My concerns are:
1. Since there's no cheapest_partial_path in RelOptInfo, the code is
currently considering every partial_path for parallel hash aggregate.
With normal aggregation we only ever use the cheapest path, so this
may not be future proof. As of today we only have at most one
partial path in the list, but there's no reason to code this with that
assumption. I didn't put much effort into improving this, as I see code
in generate_gather_paths() which also assumes there is just 1 partial
path. Perhaps we should expand RelOptInfo to track the cheapest partial
path? Or maybe allpaths.c should have a function to fetch the cheapest
out of the list?

2. In mixed parallel aggregate mode, when the query has no aggregate
functions, the code currently will use a nodeAgg for AGG_SORTED
strategy rather than a nodeGroup, as it would in serial agg mode. This
probably needs to be changed.

3. Nothing in create_grouping_paths() looks at the force_parallel_mode
GUC. I had a quick look at this GUC and was a bit surprised to see 3
possible states, but no explanation of what they do, so I've not added
code which pays attention to this setting yet. I'd imagine this is
just a matter of skipping serial path generation when a parallel plan
is possible and force_parallel_mode is FORCE_PARALLEL_ON. I've no idea
what FORCE_PARALLEL_REGRESS is for yet.

The setrefs.c parts of the patch remain completely broken. I've not
had time to look at this again yet, sorry.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_aggregation_cc75f61_2016-03-08.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New competition from Microsoft?

2016-03-07 Thread Joe Conway
On 03/07/2016 01:43 PM, Josh berkus wrote:
> http://blogs.microsoft.com/?p=67248
> 
> Once SQL Server is available on Linux, we're going to see more people
> using it as an alternative to PostgreSQL.  Especially since they're
> picking up a lot of our better features, like R support.

IANAL, but I wonder how they can have R support given that libR.so is
GPL licensed, not LPGL? Have they open sourced SQL Server?

;-P

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] unexpected result from to_tsvector

2016-03-07 Thread Artur Zakirov

Hello,

On 07.03.2016 23:55, Dmitrii Golub wrote:



Hello,

Should we add tests for this case?


I think we should. I have added tests for teo...@123-stack.net and 
1...@stack.net emails.




123_reg.ro is not a valid domain name, because of the
symbol "_"

https://tools.ietf.org/html/rfc1035 page 8.

Dmitrii Golub


Thank you for the information. Fixed.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
*** a/src/backend/tsearch/wparser_def.c
--- b/src/backend/tsearch/wparser_def.c
***
*** 1121,1126  static const TParserStateActionItem actionTPS_InUnsignedInt[] = {
--- 1121,1128 
  	{p_iseqC, '.', A_PUSH, TPS_InUDecimalFirst, 0, NULL},
  	{p_iseqC, 'e', A_PUSH, TPS_InMantissaFirst, 0, NULL},
  	{p_iseqC, 'E', A_PUSH, TPS_InMantissaFirst, 0, NULL},
+ 	{p_iseqC, '-', A_PUSH, TPS_InHostFirstAN, 0, NULL},
+ 	{p_iseqC, '@', A_PUSH, TPS_InEmail, 0, NULL},
  	{p_isasclet, 0, A_PUSH, TPS_InHost, 0, NULL},
  	{p_isalpha, 0, A_NEXT, TPS_InNumWord, 0, NULL},
  	{p_isspecial, 0, A_NEXT, TPS_InNumWord, 0, NULL},
*** a/src/test/regress/expected/tsearch.out
--- b/src/test/regress/expected/tsearch.out
***
*** 264,270  SELECT * FROM ts_token_type('default');
  23 | entity  | XML entity
  (23 rows)
  
! SELECT * FROM ts_parse('default', '345 qwe@efd.r '' http://www.com/ http://aew.werc.ewr/?ad=qwe 1aew.werc.ewr/?ad=qwe 2aew.werc.ewr http://3aew.werc.ewr/?ad=qwe http://4aew.werc.ewr http://5aew.werc.ewr:8100/?  ad=qwe 6aew.werc.ewr:8100/?ad=qwe 7aew.werc.ewr:8100/?ad=qwe=%20%32 +4.0e-10 qwe qwe qwqwe 234.435 455 5.005 teo...@stack.net qwe-wer asdf qwer jf sdjk ewr1> ewri2 
  /usr/local/fff /awdf/dwqe/4325 rewt/ewr wefjn /wqe-324/ewr gist.h gist.h.c gist.c. readline 4.2 4.2. 4.2, readline-4.2 readline-4.2. 234
   wow  < jqw <> qwerty');
   tokid |token 
--- 264,270 
  23 | entity  | XML entity
  (23 rows)
  
! SELECT * FROM ts_parse('default', '345 qwe@efd.r '' http://www.com/ http://aew.werc.ewr/?ad=qwe 1aew.werc.ewr/?ad=qwe 2aew.werc.ewr http://3aew.werc.ewr/?ad=qwe http://4aew.werc.ewr http://5aew.werc.ewr:8100/?  ad=qwe 6aew.werc.ewr:8100/?ad=qwe 7aew.werc.ewr:8100/?ad=qwe=%20%32 +4.0e-10 qwe qwe qwqwe 234.435 455 5.005 teo...@stack.net teo...@123-stack.net 1...@stack.net qwe-wer asdf qwer jf sdjk ewr1> ewri2 
  /usr/local/fff /awdf/dwqe/4325 rewt/ewr wefjn /wqe-324/ewr gist.h gist.h.c gist.c. readline 4.2 4.2. 4.2, readline-4.2 readline-4.2. 234
   wow  < jqw <> qwerty');
   tokid |token 
***
*** 332,337  SELECT * FROM ts_parse('default', '345 qwe@efd.r '' http://www.com/ http://aew.w
--- 332,341 
  12 |  
   4 | teo...@stack.net
  12 |  
+  4 | teo...@123-stack.net
+ 12 |  
+  4 | 1...@stack.net
+ 12 |  
  16 | qwe-wer
  11 | qwe
  12 | -
***
*** 404,425  SELECT * FROM ts_parse('default', '345 qwe@efd.r '' http://www.com/ http://aew.w
  12 |  
  12 | <> 
   1 | qwerty
! (133 rows)
  
! SELECT to_tsvector('english', '345 qwe@efd.r '' http://www.com/ http://aew.werc.ewr/?ad=qwe 1aew.werc.ewr/?ad=qwe 2aew.werc.ewr http://3aew.werc.ewr/?ad=qwe http://4aew.werc.ewr http://5aew.werc.ewr:8100/?  ad=qwe 6aew.werc.ewr:8100/?ad=qwe 7aew.werc.ewr:8100/?ad=qwe=%20%32 +4.0e-10 qwe qwe qwqwe 234.435 455 5.005 teo...@stack.net qwe-wer asdf qwer jf sdjk ewr1> ewri2 
  /usr/local/fff /awdf/dwqe/4325 rewt/ewr wefjn /wqe-324/ewr gist.h gist.h.c gist.c. readline 4.2 4.2. 4.2, readline-4.2 readline-4.2. 234
   wow  < jqw <> qwerty');
!to_tsvector
! 

[HACKERS] Minor bug affecting ON CONFLICT lock wait log messages

2016-03-07 Thread Peter Geoghegan
Attached patch fixes a bug reported privately by Stephen this morning.
He complained about deadlocking ON CONFLICT DO NOTHING statements.
There were no exclusion constraints involved, and yet they were
incorrectly indicated as being involved in log messages that related
to these deadlocks.

-- 
Peter Geoghegan
From bc481af77994057cb1ffe4a0e471b38bb00dc228 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Mon, 7 Mar 2016 13:16:24 -0800
Subject: [PATCH] Avoid incorrectly indicating exclusion constraint wait

INSERT ... ON CONFLICT's precheck may have to wait on the outcome of
another insertion, which may or may not itself be a speculative
insertion.  This wait is not necessarily associated with an exclusion
constraint, but was always reported that way in log messages if the wait
happened to involve a tuple that had no speculative token.

Bug reported privately by Stephen Frost.  His case involved ON CONFLICT
DO NOTHING, where spurious references to exclusion constraints in log
messages were more likely.
---
 src/backend/executor/execIndexing.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 838cee7..5d553d5 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -725,6 +725,7 @@ retry:
 	{
 		TransactionId xwait;
 		ItemPointerData ctid_wait;
+		XLTW_Oper		reason_wait;
 		Datum		existing_values[INDEX_MAX_KEYS];
 		bool		existing_isnull[INDEX_MAX_KEYS];
 		char	   *error_new;
@@ -783,13 +784,14 @@ retry:
 			  TransactionIdPrecedes(GetCurrentTransactionId(), xwait
 		{
 			ctid_wait = tup->t_data->t_ctid;
+			reason_wait = indexInfo->ii_ExclusionOps ?
+XLTW_RecheckExclusionConstr : XLTW_InsertIndex;
 			index_endscan(index_scan);
 			if (DirtySnapshot.speculativeToken)
 SpeculativeInsertionWait(DirtySnapshot.xmin,
 		 DirtySnapshot.speculativeToken);
 			else
-XactLockTableWait(xwait, heap, &ctid_wait,
-  XLTW_RecheckExclusionConstr);
+XactLockTableWait(xwait, heap, &ctid_wait, reason_wait);
 			goto retry;
 		}
 
-- 
1.9.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] New competition from Microsoft?

2016-03-07 Thread Josh berkus

All,

http://blogs.microsoft.com/?p=67248

Once SQL Server is available on Linux, we're going to see more people 
using it as an alternative to PostgreSQL.  Especially since they're 
picking up a lot of our better features, like R support.


--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread Josh berkus
MauMau,

Crossing this over to pgsql-advocacy list where it really belongs.
That's what that list is *for*.

Especially since the discussion on -hackers has focused on new
PostgreSQL Features, which while also good don't address the general
question.

On 03/05/2016 09:29 PM, MauMau wrote:
> Hello,
> As I said in the previous greeting mail, I'd like to discuss how to 
> expand PostgreSQL ecosystem.  Here, ecosystem means "interoperability" 
> -- the software products and cloud services which use/support 
> PostgreSQL.  If pgsql-advocacy or somewhere else is better for this 
> topic, just tell me so.
> THE BACKGROUND
> ==
> Thanks to the long and hard efforts by the community, PostgreSQL has 
> been evolving to be a really great software comparable to existing 
> strong commercial products.  Open source databases are gaining more 
> popularity to influence the database market.
> Open source threatens to eat the database market
> http://www.infoworld.com/article/2916057/open-source-software/open-source-threatens-to-eat-the-database-market.html
> "Though the proprietary RDBMS market grew at a sluggish 5.4 percent in 
> 2014, the open source database market grew 31 percent to hit $562 million."
> "As Gartner highlights in a recent research report, open source 
> databases now consume 25 percent of relational database usage."
> Perhaps related to this is that the revenues of Oracle, IBM and 
> Microsoft have been declining (but I read in an article that SQL Server 
> is gaining more revenue).
> On the other hand, there is a gulf between the two top popular databases 
> -- Oracle and MySQL -- and PostgreSQL.  They are nearly five times more 
> popular than PostgreSQL.
> DB-Engines Ranking
> http://db-engines.com/en/ranking
> Yes, I understand this ranking doesn't necessarily reflect the actual 
> use, but I also don't think the ranking is far from the real 
> popularity.  In fact, some surveys show that MySQL has been in more 
> widespread use even here in Japan than PostgreSQL since around 2010 (IIRC).
> What should we do to boost the popularity of PostgreSQL?  One challenge 
> is to increase the number of software which supports PostgreSQL. To take 
> advantage of the trend of shift from commercial products to open source, 
> PostgreSQL needs to interoperate with many software that are used 
> together with the commercial databases.
> The easily understandable target is Oracle, because it is anticipated 
> that more users of Oracle will seek another database to avoid the 
> expensive Oracle Standard Edition 2 and increasing maintenance costs.  
> In addition, PostgreSQL has affinity for Oracle.
> However, there is a problem.  The number of software is very small that 
> the users can know to interoperate with PostgreSQL.  That is, when the 
> users want to migrate from commercial databases to PostgreSQL, they 
> can't get information on whether they can continue to use their assets 
> with PostgreSQL.  Many applications might be interoperable through 
> standard interfaces like JDBC/ODBC, but the case is unknown.  For example:
> * Only 24 open source projects are listed as interoperable.
> Open Source Projects Using PostgreSQL
> https://wiki.postgresql.org/wiki/OpenSource_Projects_Using_PostgreSQL
> * Even EnterpriseDB has only 12 certified application vendors.
> http://www.enterprisedb.com/partner-programs/enterprisedb-certified-application-vendors
> * PostgreSQL Enterprise Consortium lists only about30 related products 
> (Japanese only).
> https://www.pgecons.org/postgresql-info/business_sw/
> * MySQL touts more than 2,000 ISV/OEM/VARs.
> http://www.mysql.com/oem/
> Besides, in practice, we probably should increase the number of software 
> interoperable with PostgreSQL.  e.g. one customer asked us whether 
> Arcserve can be used to back up PostgreSQL databases, but unfortunately 
> we had to answer no.  They are using Arcserve to back up Oracle 
> databases and other resources.  "Then, you can use NetVault instead" is 
> not the best answer; they just want to replace the database.
> PROPOSAL
> ==
> Last month, I attended the steering committee of PostgreSQL Enterprise 
> Consortium (PGECons) for the first time and proposed starting the 
> following activity.  PGECons is a Japanese non-profit organization to 
> promote PostgreSQL for enterprise use.  The members include NTT, SRA OSS 
> (Tatsuo Ishii runs), NEC, Hitachi, HP, Fujitsu, etc.  We concluded that 
> we need to consult the PostgreSQL community on how to proceed the 
> activity and work in cooperation with the community.
> * Attract and ask product/service vendors to support/use PostgreSQL.
> Provide technical assistance to those vendors as an organization so that 
> they can support PostgreSQL smoothly.
> If the vendors aren't proactive, we verify the interoperability with 
> their software by executing it.
> * Make a directory of software/services that 

Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-07 Thread Igal @ Lucee.org

On 3/7/2016 1:20 PM, Tom Lane wrote:


Yeah.  I'm rather suspicious of this proposal; I do not think it's
actually very useful to return a primary-key value without any indication
of what the primary key is.  There are also corner cases where it seems
pretty ill-defined.  For example, suppose you do this on an inheritance
parent table that has a pkey defined, but not all its child tables do
(or maybe they do but their pkeys aren't identical to the parent's).
What should happen then?
First, thank you for your reply.  I appreciate it.  I do not know the 
answer to that question.


That's an exceptionally weak use-case to argue for this with.  Unless
you can get *all* those DBMS suppliers to invent equivalent features,
you're going to have to have pkey-querying logic anyway.  The argument
for bespoke syntax for it in just one DBMS seems pretty weak.
Fair enough, but my idea was that this would be used by the JDBC driver 
in this case.  The other DBMS suppliers have their JDBC drivers return a 
value, usually a SERIAL-type value.  But there is no standard for the 
column name.  In SQL Server, for example, it is IDENTITYCOL while in 
MySQL it is GENERATED_KEY.


The thing is that in SQL Server I can do, for example, "SELECT 
@@identity" and get the last value that was inserted.  In SQL Server, 
however, Microsoft took the easy way and enforced only a single 
auto-generated identity column per table.  The closest thing I can do in 
PostgreSQL is "SELECT lastval()" but what if there are multiple 
sequences in that table?
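
For a single known serial column there is at least a sequence-aware
workaround today, e.g. (using the "test" table from my original mail; it
still requires knowing the column name, which is exactly the problem):

  INSERT INTO test (name) VALUES ('PostgreSQL');
  SELECT currval(pg_get_serial_sequence('test', 'id'));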

I am fairly sure, also, that all of those systems have support for the
SQL-standard information_schema views.  So if you write a pkey-identifying
query against those views, you'd have some chance of a solution that
actually did work everywhere.
The other JDBC drivers return the last SERIAL value from the table that 
had the insert, so there's no issue there.  Querying the 
information_schema views with each INSERT will probably cause a major 
performance hit.


Anyway, I trust that you know much more about databases than I do, so if 
you don't think that it's a good idea, I accept that.


Best,


Igal


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread Álvaro Hernández Tortosa



On 07/03/16 11:54, José Luis Tallón wrote:

On 03/07/2016 07:30 AM, Tsunakawa, Takayuki wrote:

From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Mark Kirkwood
For cloud - in particular Openstack (which I am working with ATM), the
biggest thing would be:

- multi-master replication

or failing that:

- self managing single master failover (voting/quorum etc)


Hmm consul (optionally in a dockerized setup) ?
https://www.consul.io/



There are already HA solutions based on consensus, like Patroni: 
https://github.com/zalando/patroni





so that operators can essentially 'set and forget'. We currently use
Mysql+ Galera (multi master) and Mongodb (self managing single master)
and the convenience and simplicity is just so important (Openstack is a
huge complex collection of services - hand holding of any one 
service is

pretty much a non starter).
Yes, I was also asked whether PostgreSQL has any optional 
functionality like Galera Cluster for MySQL.  He was planning a 
scalable PaaS service which performs heavy reads and writes. Demand 
exists.


AFAIK, Galera has its own set of drawbacks


Right, some of them are explained here: 
https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster


Regards,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-07 Thread Tom Lane
"Igal @ Lucee.org"  writes:
> On 3/7/2016 12:45 PM, Joshua D. Drake wrote:
>> I agree that the problem is that you don't always know what the 
>> primary key is.
>> I would argue the solution is to check before you write the query.

Yeah.  I'm rather suspicious of this proposal; I do not think it's
actually very useful to return a primary-key value without any indication
of what the primary key is.  There are also corner cases where it seems
pretty ill-defined.  For example, suppose you do this on an inheritance
parent table that has a pkey defined, but not all its child tables do
(or maybe they do but their pkeys aren't identical to the parent's).
What should happen then?

> Sure, that would be great, but perhaps I should have given some more context:
> We have an application server which allows our developers to query 
> databases with simplified syntax.  Our code is written in a generic way 
> to allow the developers that use our application server to pass whatever 
> query they want into the database server, whether it's SQL Server, 
> MySQL, Oracle, etc.

That's an exceptionally weak use-case to argue for this with.  Unless
you can get *all* those DBMS suppliers to invent equivalent features,
you're going to have to have pkey-querying logic anyway.  The argument
for bespoke syntax for it in just one DBMS seems pretty weak.

I am fairly sure, also, that all of those systems have support for the
SQL-standard information_schema views.  So if you write a pkey-identifying
query against those views, you'd have some chance of a solution that
actually did work everywhere.
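
For instance, a pkey-identifying query against those views might look
something like this (just a sketch, with 'test' standing in for the target
table):

  SELECT kcu.column_name
    FROM information_schema.table_constraints tc
    JOIN information_schema.key_column_usage kcu
      ON kcu.constraint_name = tc.constraint_name
     AND kcu.table_schema = tc.table_schema
   WHERE tc.constraint_type = 'PRIMARY KEY'
     AND tc.table_name = 'test'
   ORDER BY kcu.ordinal_position;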

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-07 Thread Andres Freund
On 2016-03-07 21:10:19 +0100, Fabien COELHO wrote:
> Now I cannot see how having one context per table space would have a
> significant negative performance impact.

The 'dirty data' etc. limits are global, not per block device. By having
several contexts with unflushed dirty data the total amount of dirty
data in the kernel increases. Thus you're more likely to see stalls by
the kernel moving pages into writeback.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-07 Thread David Rowley
On 8 March 2016 at 10:01, Tom Lane  wrote:
> I've pushed it now; we'll soon see if the buildfarm finds any problems.

Fantastic! I'm looking forward to all the future optimisation
opportunities that this opens up.
Thanks for making this happen.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-07 Thread Tom Lane
I wrote:
>> Attached is a version that addresses today's concerns, and also finishes
>> filling in the loose ends I'd left before, such as documentation and
>> outfuncs.c support.  I think this is in a committable state now, though
>> I plan to read through the whole thing again.

The extra read-through located some minor bugs, mainly places where I'd
forgotten to ensure that Path cost info was transposed into the generated
Plan.  That would only have the cosmetic effect that EXPLAIN would print
zeroes for estimated costs, and since we only use EXPLAIN COSTS OFF in
the regression tests, no test failures ensued :-(.

I've pushed it now; we'll soon see if the buildfarm finds any problems.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unexpected result from to_tsvector

2016-03-07 Thread Dmitrii Golub
2016-02-23 20:53 GMT+03:00 Artur Zakirov :

> Hello,
>
> Here is a little patch. It fixes this issue
> http://www.postgresql.org/message-id/20160217080048.26357.49...@wrigleys.postgresql.org
>
> Without patch we get wrong result for the second email 't...@123-reg.ro':
>
> => SELECT * FROM ts_debug('simple', 't...@vauban-reg.ro');
>  alias |  description  |   token| dictionaries | dictionary |
>  lexemes
>
> ---+---++--++--
>  email | Email address | t...@vauban-reg.ro | {simple} | simple  | {
> t...@vauban-reg.ro}
> (1 row)
>
> => SELECT * FROM ts_debug('simple', 't...@123-reg.ro');
>alias   |   description| token  | dictionaries | dictionary |
> lexemes
>
> ---+--++--++--
>  asciiword | Word, all ASCII  | test   | {simple} | simple | {test}
>  blank | Space symbols| @  | {}   ||
>  uint  | Unsigned integer | 123| {simple} | simple | {123}
>  blank | Space symbols| -  | {}   ||
>  host  | Host | reg.ro | {simple} | simple | {
> reg.ro}
> (5 rows)
>
> After patch we get correct result for the second email:
>
> => SELECT * FROM ts_debug('simple', 't...@123-reg.ro');
>  alias |  description  |  token  | dictionaries | dictionary |
>lexemes
>
> ---+---+-+--++--
>  email | Email address | t...@123-reg.ro | {simple} | simple  | {
> t...@123-reg.ro}
> (1 row)
>
> This patch allows to parser work with emails 't...@123-reg.ro', '
> 1...@123-reg.ro' and 'test@123_reg.ro' correctly.
>
> --
> Artur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
Hello,

Should we add tests for this case?

123_reg.ro is not a valid domain name, because of the symbol "_"

https://tools.ietf.org/html/rfc1035 page 8.

Dmitrii Golub


Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-07 Thread Igal @ Lucee.org

On 3/7/2016 12:45 PM, Joshua D. Drake wrote:


I agree that the problem is that you don't always know what the 
primary key is.


I would argue the solution is to check before you write the query.


Sure, that would be great, but perhaps I should have given some more context:

We have an application server which allows our developers to query 
databases with simplified syntax.  Our code is written in a generic way 
to allow the developers that use our application server to pass whatever 
query they want into the database server, whether it's SQL Server, 
MySQL, Oracle, etc.


The code that we use to wrap the SQL statement, as well as the JDBC 
code, has no idea about the table or its constraints, so it's not like 
I'm writing my own queries and just being lazy about checking what the 
primary key is.  I just can't know what the developer has in his database.


Sure, I can probably query it via metadata tables, etc., but that would 
be a much slower process.



Igal


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-07 Thread Joshua D. Drake

On 03/07/2016 12:32 PM, Igal @ Lucee.org wrote:


The problem is that we do not always know in advance what the Primary
Key is, and therefore a solution that was implemented in the pgjdbc


I agree that the problem is that you don't always know what the primary 
key is.


I would argue the solution is to check before you write the query.

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal: RETURNING primary_key()

2016-03-07 Thread Igal @ Lucee.org

THE ISSUE:

In JDBC there is a flag called RETURN_GENERATED_KEYS -- 
https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#RETURN_GENERATED_KEYS


Which is left quite ambiguous, but in general it is used to return the 
"generated" Primary Key on INSERT/UPDATE/DELETE -- which is mostly 
useful in the case of INSERT, of course, as the other commands do not 
generate a key.


We can already add RETURNING after an INSERT, e.g.

  CREATE TABLE test (name TEXT, id SERIAL PRIMARY KEY);
  INSERT INTO test VALUES ('PostgresQL') RETURNING id;

But the problem is that we need to know in advance the name of the "id" 
column, because if we had created the table like so:


  CREATE TABLE test (name TEXT, test_id SERIAL PRIMARY KEY);

Then we would need to use RETURNING "test_id" instead of "id".

The problem is that we do not always know in advance what the Primary 
Key is, and therefore a solution that was implemented in the pgjdbc 
driver was to append " RETURNING * " to the query, but that has its own 
problems, like returning a lot of data that is not needed, etc.  (you 
can see a longer discussion at https://github.com/pgjdbc/pgjdbc/issues/488 )
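
For reference, the key columns can be discovered up front from the system 
catalogs -- roughly the extra lookup and round trip that the driver is 
trying to avoid -- with something like the following, where "test" stands 
in for the target table:

  SELECT a.attname
    FROM pg_index i
    JOIN pg_attribute a ON a.attrelid = i.indrelid
                       AND a.attnum = ANY (i.indkey)
   WHERE i.indrelid = 'test'::regclass
     AND i.indisprimary;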



THE PROPOSAL:

The proposal is to allow something like RETURNING primary_key() (it can 
be a keyword, not necessarily a function), e.g.


  INSERT INTO test VALUES ('PostgresQL') RETURNING primary_key();

Which will return a record set according to the PRIMARY KEY that is set 
on the table.  So if the primary key is "id", then you would get a column 
named "id", and if it is "test_id" you would get a column named 
"test_id" with the correct values.


If the PRIMARY KEY is made of multiple columns, then all of those columns 
will be returned.


If the table does not have a PRIMARY KEY constraint then NULL will be 
returned with some arbitrary column name.


I would go further and suggest to add a function that will return the 
last primary key from a table, e.g.:


  SELECT last_primary_key() FROM test;

This of course can be beneficial for many users, and not only the JDBC 
community.


Thank you for your time and consideration,


Igal


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-07 Thread Fabien COELHO


Hello Andres,


(1) with 16 tablespaces (1 per table) on 1 disk : 680.0 tps
   per second avg, stddev [ min q1 median q3 max ] <=300tps
   679.6 ± 750.4 [0.0, 317.0, 371.0, 438.5, 2724.0] 19.5%

(2) with 1 tablespace on 1 disk : 956.0 tps
   per second avg, stddev [ min q1 median q3 max ] <=300tps
   956.2 ± 796.5 [3.0, 488.0, 583.0, 742.0, 2774.0] 2.1%


Well, that's not a particularly meaningful workload. You considerably 
increased the number of flushes going to the same number of disks.


It is just a simple workload designed to emphasize the effect of having 
one context shared for all table spaces instead of one per tablespace, 
without rewriting the patch and without a large host with multiple disks.


For a meaningful comparison you'd have to compare using one writeback 
context for N tablespaces on N separate disks/raids, and using N 
writeback contexts for the same.


Sure, it would be better to do that, but that would require (1) rewriting 
the patch, which is a small amount of work, and also (2) having access to a 
machine with a number of disks/raids, which I do NOT have available.



What happens in the 16 tb workload is that much smaller flushes are 
performed on the 16 files written in parallel, so the tps performance is 
significantly degraded, despite the writes being sorted in each file. On 
one tb, all buffers flushed are in the same file, so flushes are much more 
effective.


When the context is shared and checkpointer buffer writes are balanced 
across table spaces, then when the limit is reached the flushing gets only a 
few buffers per tablespace, so sequential writes are limited to a few buffers, 
hence the performance degradation.


So I can explain the performance degradation *because* the flush context 
is shared between the table spaces, which is a logical argument backed 
with experimental data, so it is better than handwaving. Given the 
available hardware, this is the best proof I can have that context should 
be per table space.


Now I cannot see how having one context per table space would have a 
significant negative performance impact.


So the logical conclusion for me is that without further experimental data 
it is better to have one context per table space.


If you have a hardware with plenty disks available for testing, that would 
provide better data, obviously.


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ExecGather() + nworkers

2016-03-07 Thread Robert Haas
On Mon, Mar 7, 2016 at 2:13 PM, Peter Geoghegan  wrote:
> On Mon, Mar 7, 2016 at 4:04 AM, Amit Kapila  wrote:
>> Your point is genuine, but OTOH let us say if max_parallel_degree = 1 means
>> parallelism is disabled then when somebody sets max_parallel_degree = 2,
>> then it looks somewhat odd to me that, it will mean that 1 worker process
>> can be used for parallel query.
>
> I'm not sure that that has to be true.
>
> What is the argument for only using one worker process, say in the
> case of parallel seq scan? I understand that parallel seq scan can
> consume tuples itself, which seems like a good principle, but how far
> does it go, and how useful is it in the general case? I'm not
> suggesting that it isn't, but I'm not sure.
>
> How common is it for the leader process to do anything other than
> coordinate and consume from worker processes?

1 worker is often a very big speedup vs. 0 workers, and the work can
easily be evenly distributed between the worker and the leader.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread Joshua D. Drake

On 03/07/2016 11:31 AM, MauMau wrote:


Why don't we enrich the catalog?  I'd like to hear ideas on how to
enrich the catalog efficiently.  It's ideal for software vendors and
users to voluntarily add to the catalog.


I think the product/software directory has vastly outlived its purpose. 
We are no longer in an "I wonder if Pg works with X" world. It is more "Why 
doesn't X work with Pg" nowadays.


I could see a page that discusses various tools in general that are 
supported but by no means do we need a "list" anymore.


Sincerely,

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread MauMau
From: Craig Ringer 
--
* Make a directory of software/services that can be used with PostgreSQL on the 
community web site (wiki.postgresql.org or www.postgresql.org).
Software/services vendors and PostgreSQL developers/users can edit this 
directory.

I thought we had that? Yep.

http://www.postgresql.org/download/product-categories/
--

There are 172 software products listed here in total.  Excluding the following 
categories, which can be considered part of a DBMS, the number is 121.  
Certainly not many.

•Drivers and interfaces
•PostgreSQL-derived servers
•PostgreSQL extensions
•Procedural languages

Speaking without fear of misunderstanding, there are few software products 
listed that migrators from commercial databases would want to know about...  Even 
famous products like Pentaho, Microstrategy, Tableau, Eclipse and Visual Studio, 
which are known to work or should work with PostgreSQL, are not listed.  This 
is a shame.

Why don't we enrich the catalog?  I'd like to hear ideas on how to enrich the 
catalog efficiently.  It's ideal for software vendors and users to voluntarily 
add to the catalog.


Regards
MauMau

Re: [HACKERS] ExecGather() + nworkers

2016-03-07 Thread Peter Geoghegan
On Mon, Mar 7, 2016 at 4:04 AM, Amit Kapila  wrote:
> Your point is genuine, but OTOH let us say if max_parallel_degree = 1 means
> parallelism is disabled then when somebody sets max_parallel_degree = 2,
> then it looks somewhat odd to me that, it will mean that 1 worker process
> can be used for parallel query.

I'm not sure that that has to be true.

What is the argument for only using one worker process, say in the
case of parallel seq scan? I understand that parallel seq scan can
consume tuples itself, which seems like a good principle, but how far
does it go, and how useful is it in the general case? I'm not
suggesting that it isn't, but I'm not sure.

How common is it for the leader process to do anything other than
coordinate and consume from worker processes?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-07 Thread Robert Haas
On Mon, Mar 7, 2016 at 12:41 PM, Masahiko Sawada  wrote:
> Attached latest version optimisation patch.
> I'm still consider regarding pg_upgrade regression test code, so I
> will submit that patch later.

I was thinking more about this today and I think that we don't
actually need the PD_ALL_FROZEN page-level bit for anything.  It's
enough that the bit is present in the visibility map.  The only point
of PD_ALL_VISIBLE is that it tells us that we need to clear the
visibility map bit, but that bit is enough to tell us to clear both
visibility map bits.  So I propose the attached cleanup patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8a64321..34ba385 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7855,10 +7855,7 @@ heap_xlog_visible(XLogReaderState *record)
 		 */
 		page = BufferGetPage(buffer);
 
-		if (xlrec->flags & VISIBILITYMAP_ALL_VISIBLE)
-			PageSetAllVisible(page);
-		if (xlrec->flags & VISIBILITYMAP_ALL_FROZEN)
-			PageSetAllFrozen(page);
+		PageSetAllVisible(page);
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 2e64fc3..eaab4be 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -39,15 +39,15 @@
  *
  * When we *set* a visibility map during VACUUM, we must write WAL.  This may
  * seem counterintuitive, since the bit is basically a hint: if it is clear,
- * it may still be the case that every tuple on the page is all-visible or
- * all-frozen we just don't know that for certain.  The difficulty is that
- * there are two bits which are typically set together: the PD_ALL_VISIBLE
- * or PD_ALL_FROZEN bit on the page itself, and the corresponding visibility
- * map bit.  If a crash occurs after the visibility map page makes it to disk
- * and before the updated heap page makes it to disk, redo must set the bit on
- * the heap page.  Otherwise, the next insert, update, or delete on the heap
- * page will fail to realize that the visibility map bit must be cleared,
- * possibly causing index-only scans to return wrong answers.
+ * it may still be the case that every tuple on the page is visible to all
+ * transactions; we just don't know that for certain.  The difficulty is that
+ * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
+ * on the page itself, and the visibility map bit.  If a crash occurs after the
+ * visibility map page makes it to disk and before the updated heap page makes
+ * it to disk, redo must set the bit on the heap page.  Otherwise, the next
+ * insert, update, or delete on the heap page will fail to realize that the
+ * visibility map bit must be cleared, possibly causing index-only scans to
+ * return wrong answers.
  *
  * VACUUM will normally skip pages for which the visibility map bit is set;
  * such pages can't contain any dead tuples and therefore don't need vacuuming.
@@ -251,11 +251,10 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer buf)
  * to InvalidTransactionId when a page that is already all-visible is being
  * marked all-frozen.
  *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE or PD_ALL_FROZEN
- * bit before calling this function. Except in recovery, caller should also
- * pass the heap buffer and flags which indicates what flag we want to set.
- * When checksums are enabled and we're not in recovery, we must add the heap
- * buffer to the WAL chain to protect it from being torn.
+ * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
+ * this function. Except in recovery, caller should also pass the heap
+ * buffer. When checksums are enabled and we're not in recovery, we must add
+ * the heap buffer to the WAL chain to protect it from being torn.
  *
  * You must pass a buffer containing the correct map page to this function.
  * Call visibilitymap_pin first to pin the right one. This function doesn't do
@@ -315,10 +314,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 {
 	Page		heapPage = BufferGetPage(heapBuf);
 
-	/* Caller is expected to set page-level bits first. */
-	Assert((flags & VISIBILITYMAP_ALL_VISIBLE) == 0 || PageIsAllVisible(heapPage));
-	Assert((flags & VISIBILITYMAP_ALL_FROZEN) == 0 || PageIsAllFrozen(heapPage));
-
+	/* caller is expected to set PD_ALL_VISIBLE first */
+	Assert(PageIsAllVisible(heapPage));
 	PageSetLSN(heapPage, recptr);
 }
 			}
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 8f7b248..363b2d0 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -766,7 +766,6 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 	log_newpage_buffer(buf, true);
 
 

Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-07 Thread Andres Freund
On 2016-02-22 20:44:35 +0100, Fabien COELHO wrote:
> 
> >>Random updates on 16 tables which total to 1.1GB of data, so this is in
> >>buffer, no significant "read" traffic.
> >>
> >>(1) with 16 tablespaces (1 per table) on 1 disk : 680.0 tps
> >>per second avg, stddev [ min q1 median d3 max ] <=300tps
> >>679.6 ± 750.4 [0.0, 317.0, 371.0, 438.5, 2724.0] 19.5%
> >>
> >>(2) with 1 tablespace on 1 disk : 956.0 tps
> >>per second avg, stddev [ min q1 median d3 max ] <=300tps
> >>956.2 ± 796.5 [3.0, 488.0, 583.0, 742.0, 2774.0] 2.1%
> >
> >Interesting. That doesn't reflect my own tests, even on rotating media,
> >at all. I wonder if it's related to:
> >https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=23d0127096cb91cb6d354bdc71bd88a7bae3a1d5
> >
> >If you use your 12.04 kernel, that'd not be fixed. Which might be a
> >reason to do it as you suggest.
> >
> >Could you share the exact details of that workload?
> 
> See attached scripts (sh to create the 16 tables in the default or 16 table
> spaces, small sql bench script, stat computation script).
> 
> The per-second stats were computed with:
> 
>   grep progress: pgbench.out | cut -d' ' -f4 | avg.py --length=1000 
> --limit=300
> 
> Host is 8 cpu 16 GB, 2 HDD in RAID 1.

Well, that's not a particularly meaningful workload. You increased the
number of flushes to the same number of disks considerably. For a
meaningful comparison you'd have to compare using one writeback context
for N tablespaces on N separate disks/raids, and using N writeback
contexts for the same.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing - V16

2016-03-07 Thread Andres Freund
On 2016-03-07 09:41:51 -0800, Andres Freund wrote:
> > Due to the difference in amount of RAM, each machine used different scales -
> > the goal is to have small, ~50% RAM, >200% RAM sizes:
> > 
> > 1) Xeon: 100, 400, 6000
> > 2) i5: 50, 200, 3000
> > 
> > The commits actually tested are
> > 
> >cfafd8be  (right before the first patch)
> >7975c5e0  Allow the WAL writer to flush WAL at a reduced rate.
> >db76b1ef  Allow SetHintBits() to succeed if the buffer's LSN ...
> 
> Huh, now I'm a bit confused. These are the commits you tested? Those
> aren't the ones doing sorting and flushing?

To clarify: The reason we'd not expect to see much difference here is
that the above commits really only have any effect above noise if you
use synchronous_commit=off. Without async commit it's just one
additional gettimeofday() call and a few additional branches in the wal
writer every wal_writer_delay.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing - V16

2016-03-07 Thread Andres Freund
On 2016-03-01 16:06:47 +0100, Tomas Vondra wrote:
> 1) HP DL380 G5 (old rack server)
> - 2x Xeon E5450, 16GB RAM (8 cores)
> - 4x 10k SAS drives in RAID-10 on H400 controller (with BBWC)
> - RedHat 6
> - shared_buffers = 4GB
> - min_wal_size = 2GB
> - max_wal_size = 6GB
> 
> 2) workstation with i5 CPU
> - 1x i5-2500k, 8GB RAM
> - 6x Intel S3700 100GB (in RAID0 for this benchmark)
> - Gentoo
> - shared_buffers = 2GB
> - min_wal_size = 1GB
> - max_wal_size = 8GB


Thinking about it, with that hardware I'm not surprised if you're only seeing
small benefits. The amount of RAM limits the amount of dirty data, and
you have plenty of on-storage buffering in comparison to that.


> Both machines were using the same kernel version 4.4.2 and default io
> scheduler (cfq). The
> 
> The test procedure was quite simple - pgbench with three different scales,
> for each scale three runs, 1h per run (and 30 minutes of warmup before each
> run).
> 
> Due to the difference in amount of RAM, each machine used different scales -
> the goal is to have small, ~50% RAM, >200% RAM sizes:
> 
> 1) Xeon: 100, 400, 6000
> 2) i5: 50, 200, 3000
> 
> The commits actually tested are
> 
>cfafd8be  (right before the first patch)
>7975c5e0  Allow the WAL writer to flush WAL at a reduced rate.
>db76b1ef  Allow SetHintBits() to succeed if the buffer's LSN ...

Huh, now I'm a bit confused. These are the commits you tested? Those
aren't the ones doing sorting and flushing?


> Also, I really wonder what will happen with non-default io schedulers. I
> believe all the testing so far was done with cfq, so what happens on
> machines that use e.g. "deadline" (as many DB machines actually do)?

deadline and noop showed slightly bigger benefits in my testing.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-07 Thread Masahiko Sawada
On Sat, Mar 5, 2016 at 11:25 PM, Masahiko Sawada  wrote:
> On Sat, Mar 5, 2016 at 1:25 AM, Robert Haas  wrote:
>> On Wed, Mar 2, 2016 at 6:41 PM, Tom Lane  wrote:
>>> Jim Nasby  writes:
 On 3/2/16 4:21 PM, Peter Geoghegan wrote:
> I think you should commit this. The chances of anyone other than you
> and Masahiko recalling that you developed this tool in 3 years is
> essentially nil. I think that the cost of committing a developer-level
> debugging tool like this is very low. Modules like pg_freespacemap
> currently already have no chance of being of use to ordinary users.
> All you need to do is restrict the functions to throw an error when
> called by non-superusers, out of caution.
>
> It's a problem that modules like pg_stat_statements and
> pg_freespacemap are currently lumped together in the documentation,
> but we all know that.
>>>
 +1.
>>>
>>> Would it make any sense to stick it under src/test/modules/ instead of
>>> contrib/ ?  That would help make it clear that it's a debugging tool
>>> and not something we expect end users to use.
>>
>> I actually think end-users might well want to use it.  Also, I created
>> it by hacking up pg_freespacemap, so it may make sense to have it in
>> the same place.
>> I would also be tempted to add an additional C functions that scan the
>> entire visibility map and return counts of the total number of bits of
>> each type that are set, and similarly for the page level bits.
>> Presumably that would be much faster than
>
> +1.
>
>>
>> I am also tempted to change the API to be a bit more friendly,
>> although I am not sure exactly how.  This was a quick and dirty hack
>> so that I could test, but the hardest thing about making it not a
>> quick and dirty hack is probably deciding on a good UI.
>>
>
> Does it mean visibility map API in visibilitymap.c?
>

Attached latest version optimisation patch.
I'm still consider regarding pg_upgrade regression test code, so I
will submit that patch later.

Regards,

--
Masahiko Sawada


000_optimize_vacuum_using_freezemap_v37.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-03-07 Thread Robert Haas
On Mon, Mar 7, 2016 at 7:53 AM, Etsuro Fujita
 wrote:
> Another option to avoid such a hazard would be to remove the two changes
> from ExecInitModifyTable and create ExecAuxRowMarks and junk filters even in
> the pushdown case.  I made the changes because we won't use ExecAuxRowMarks
> in that case since we don't need to do EvalPlanQual rechecks and because we
> won't use junk filters in that case since we do UPDATE/DELETE in the
> subplan.  But the creating cost is enough small, so simply removing the
> changes seems like a good idea.

Sure, that works.

>> This issue crops up elsewhere as well.  The changes to
>> ExecModifyTable() have the same problem -- in that case, it might be
>> wise to move the code that's going to have to be indented yet another
>> level into a separate function.   That code also says this:
>>
>> +   /* No need to provide scan tuple to
>> ExecProcessReturning. */
>> +   slot = ExecProcessReturning(resultRelInfo,
>> NULL, planSlot);
>>
>> ...but, uh, why not?  The comment says what the code does, but what it
>> should do is explain why it does it.
>
> As documented in IterateDMLPushdown in fdwhandler.sgml, the reason for that
> is that in the pushdown case it's the IterateDMLPushdown's responsibility to
> get actually inserted/updated/deleted tuples and make those tuples available
> to the ExecProcessReturning.  I'll add comments.

Comments are good things to have.  :-)

>> On a broader level, I'm not very happy with the naming this patch
>> uses.  Here's an example:
>>
>> +
>> + If an FDW supports optimizing foreign table updates, it still needs
>> to
>> + provide PlanDMLPushdown, BeginDMLPushdown,
>> + IterateDMLPushdown and EndDMLPushdown
>> + described below.
>> +
>>
>> "Optimizing foreign table updates" is both inaccurate (since it
>> doesn't only optimize updates) and so vague as to be meaningless
>> unless you already know what it means.  The actual patch uses
>> terminology like "fdwPushdowns" which is just as bad.  We might push a
>> lot of things to the foreign side -- sorts, joins, aggregates, limits
>> -- and this is just one of them.  Worse, "pushdown" is itself
>> something of a term of art - will people who haven't been following
>> all of the mammoth, multi-hundred-email threads on this topic know
>> what that means?  I think we need some better terminology here.
>>
>> The best thing that I can come up with offhand is "bulk modify".  So
>> we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify,
>> EndBulkModify, ExplainBulkModify.  Other suggestions welcome.   The
>> ResultRelInfo flag could be ri_usesFDWBulkModify.
>
> I'm not sure that "bulk modify" is best.  Yeah, this would improve the
> performance especially in the bulk-modification case, but would improve the
> performance even in the case where an UPDATE/DELETE modifies just a single
> row.  Let me explain using an example.  Without the patch, we have the
> following plan for an UPDATE on a foreign table that updates a single row:
>
> postgres=# explain verbose update foo set a = a + 1 where a = 1;
> QUERY PLAN
> --
>  Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1
>->  Foreign Scan on public.foo  (cost=100.00..101.05 rows=1 width=14)
>  Output: (a + 1), b, ctid
>  Remote SQL: SELECT a, b, ctid FROM public.foo WHERE ((a = 1)) FOR
> UPDATE
> (5 rows)
>
> The plan requires two queries, SELECT and UPDATE, to do the update.
> (Actually, the plan have additional overheads in creating a cursor for the
> SELECT and establishing a prepared statement for the UPDATE.)  But with the
> patch, we have:
>
> postgres=# explain verbose update foo set a = a + 1 where a = 1;
> QUERY PLAN
> ---
>  Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>->  Foreign Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>  Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a = 1))
> (3 rows)
>
> The optimized plan requires just a single UPDATE query to do that!  So, even
> in the single-row-modification case the patch could improve the performance.
>
> How about "Direct Modify"; PlanDirectModify, BeginDirectModify,
> IterateDirectModify, EndDirectModify, ExplainDirectModify, and
> ri_usesFDWDirectModify.

Works for me!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql completion for ids in multibyte string

2016-03-07 Thread Robert Haas
On Sun, Mar 6, 2016 at 11:24 PM, Kyotaro HORIGUCHI
 wrote:
> At Fri, 4 Mar 2016 12:30:17 -0500, Robert Haas  wrote 
> in 
>> >>> I committed this and back-patched this but (1) I avoided changing the
>> >>> other functions for now and (2) I gave both the byte length and the
>> >>> character length new names to avoid confusion.
>> >>
>> >> These tweaks appear to have been universally disliked by buildfarm
>> >> members.
>> >
>> > Crap.  Wasn't careful enough, sorry.  Will fix shortly.
>>
>> Fix pushed.
>
> Thank you for committing this. I can see only one commit for this
> in the repository but I believe it is the fixed one.

It was OK in master, but the back-branches had problems.  See
369c0b09080812943a2efcebe91cf4b271bc4f86.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-07 Thread Robert Haas
On Mon, Mar 7, 2016 at 11:09 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> The currently-committed code generates paths where nested loops and
>> hash joins get pushed beneath the Gather node, but does not generate
>> paths where merge joins have been pushed beneath the Gather node.  And
>> the reason I didn't try to generate those paths is because I believe
>> they will almost always suck.
>
> That's a perfectly reasonable engineering judgment (and especially so
> for a first release).  What I'd really like to see documented is how
> that conclusion is related, or not, to the rules about how path nodes
> should be decorated with parallel_safe, parallel_degree, etc annotations.
> The existing documentation is barely adequate to explain what those fields
> mean for primitive scan nodes; it's impossible for anyone but you to
> know what they are supposed to mean for joins and higher-level nodes.

It is unrelated, I think.

If a path is parallel_safe, that is supposed to mean that, in theory,
the plan generated from that path could be executed within a worker
without crashing the server, giving wrong answers, or otherwise
destroying the world.  However, as an optimization, if we've already
decided that the query can't ever be parallelized at all, for example
because it contains write operations, we don't bother trying to set
the parallel_safe flags correctly; they're just all false.  Generally,
a path is definitely not parallel_safe if it contains a path that is
not parallel_safe; if all of the paths under it are parallel_safe,
then it is also parallel_safe except when there's some unsafe
computation added at the new level -- like an unsafe join qual between
two safe relations.

If a path is parallel_aware, that means that the plan generated by
that path wants to do something different when run in parallel mode.
Presumably, the difference will be that the plan will establish some
shared state in the dynamic shared memory segment created to service
that parallel query.  For example, a sequential scan can be
parallel_aware, which will allow that sequential scan to be
simultaneously executed in multiple processes and return only a subset
of the rows in each.  A non-parallel_aware sequential scan can still
be used in parallel mode; for example, consider this:

Gather
-> Hash Join
  -> Parallel Seq Scan
  -> Hash
-> Seq Scan

The outer seq scan needs to return each row only once across all
workers, but the inner seq scan needs to return every row in every
worker.  Therefore, the outer seq scan is flagged parallel_aware and
displays in the EXPLAIN output as "Parallel Seq Scan", while the inner
one is not and does not.

parallel_degree is a horrible kludge whose function is to communicate
to the Gather node the number of workers for which it should budget.
Currently, every parallel plan's leftmost descendent will be a
Parallel Seq Scan, and that Parallel Seq Scan will estimate the degree
of parallelism that makes sense using a simplistic, bone-headed
algorithm based on the size of the table.  That then bubbles up the
plan tree to the Gather node, which adopts the Parallel Seq Scan's
suggestion.  I really hope this is going to go away eventually and be
replaced by something better.  Really, I think we should try to figure
out the amount of parallelizable work (CPU, and effective I/O
parallelism) that is going to be required per leftmost tuple and
compare that to the amount of non-parallelizable work (presumably, the
rest of the I/O cost) and use that to judge the optimal parallel
degree.  But I think that's going to take a lot of work to get right,
and it ties into some other issues, like the fact that we estimate a
scan of a 1MB table to have the same cost per page as a scan of a 10TB
table even though the former should probably be assumed to be fully
cached and the latter should probably be assumed not to be cached at
all.  I think a lot more thought is needed here than I've given it
thus far, and one of the things that I'm hoping is that people will
test parallel query and actually report the results so that we can
accumulate some data on which problems are most important to go fix
and, also, what the shape of those fixes might look like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-07 Thread Jeff Janes
On Mon, Mar 7, 2016 at 3:17 AM, Shulgin, Oleksandr
 wrote:
> On Fri, Mar 4, 2016 at 7:27 PM, Robert Haas  wrote:
>>
>> On Thu, Mar 3, 2016 at 2:48 AM, Shulgin, Oleksandr
>>  wrote:
>> > On Wed, Mar 2, 2016 at 7:33 PM, Alvaro Herrera
>> > 
>> > wrote:
>> >> Shulgin, Oleksandr wrote:
>> >>
>> >> > Alright.  I'm attaching the latest version of this patch split in two
>> >> > parts: the first one is NULLs-related bugfix and the second is the
>> >> > "improvement" part, which applies on top of the first one.
>> >>
>> >> So is this null-related bugfix supposed to be backpatched?  (I assume
>> >> it's not because it's very likely to change existing plans).
>> >
>> > For the good, because cardinality estimations will be more accurate in
>> > these
>> > cases, so yes I would expect it to be back-patchable.
>>
>> -1.  I think the cost of changing existing query plans in back
>> branches is too high.  The people who get a better plan never thank
>> us, but the people who (by bad luck) get a worse plan always complain.
>
>
> They might get that different plan when they upgrade to the latest major
> version anyway.  Is it set somewhere that minor version upgrades should
> never affect the planner?  I doubt so.

People with meticulous standards are expected to re-validate their
application, including plans and performance, before doing major
version updates into production. They can continue to use a *fully
patched* server from a previous major release while they do that.

This is not the case for minor version updates.  We do not want to put
people in the position where getting a security or corruption-risk
update forces them to also accept changes which may destroy the
performance of their system.

I don't know if it is set out somewhere else, but there are many
examples in this list of us declining to back-patch performance bug
fixes which might negatively impact some users.  The only times we
have done it that I can think of are when there is almost no
conceivable way it could have a meaningful negative effect, or if the
bug was tied in with security or stability bugs that needed to be
fixed anyway and couldn't be separated.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-07 Thread Greg Stark
On Mon, Mar 7, 2016 at 3:37 PM, Robert Haas  wrote:
>
> The currently-committed code generates paths where nested loops and
> hash joins get pushed beneath the Gather node, but does not generate
> paths where merge joins have been pushed beneath the Gather node.  And
> the reason I didn't try to generate those paths is because I believe
> they will almost always suck.  As of now, what we know how to do is
> build a partial path for a join by joining a partial path for the
> outer input rel against an ordinary path for the inner rel.  That
> means that the work of generating the inner rel has to be redone in
> each worker.  That's not a problem if we've got something like a
> nested loop with a parameterized inner index scan, because that sort
> of plan redoes all the work for every row anyway.  It is a problem for
> a hash join, but it's not too hard for it to be worthwhile anyway if
> the build table is small.  For a merge join, though, it seems rather
> unpromising.  It's really doubtful that we want each worker to
> independently sort the inner rel and then have them join their own
> subset of the outer rel against their own copy of the sort.  *Maybe*
> it could win if the inner path is an index scan, but I wasn't really
> sure that would come up and be a win often enough to be worth the cost
> of generating the path.  We tend to only use merge joins when both of
> the relations involved are large, and index-scanning a large relation
> tends to lose to sorting it.  So it just seemed like a dead end.

This is the first message on this subthread that actually gave me the
feeling that I understood the issue under discussion. It explains the
distinction between plans that are parallel-safe and plans that would
actually do something different under a parallel worker.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-07 Thread Tom Lane
Robert Haas  writes:
> The currently-committed code generates paths where nested loops and
> hash joins get pushed beneath the Gather node, but does not generate
> paths where merge joins have been pushed beneath the Gather node.  And
> the reason I didn't try to generate those paths is because I believe
> they will almost always suck.

That's a perfectly reasonable engineering judgment (and especially so
for a first release).  What I'd really like to see documented is how
that conclusion is related, or not, to the rules about how path nodes
should be decorated with parallel_safe, parallel_degree, etc annotations.
The existing documentation is barely adequate to explain what those fields
mean for primitive scan nodes; it's impossible for anyone but you to
know what they are supposed to mean for joins and higher-level nodes.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Badly designed error reporting code in controldata_utils.c

2016-03-07 Thread Joe Conway
On 03/06/2016 07:34 PM, Tom Lane wrote:
> Joe Conway  writes:
>> On 03/06/2016 05:47 PM, Tom Lane wrote:
>>> That's much better, but is there a reason you're using exit(2)
>>> and not exit(EXIT_FAILURE) ?
> 
>> Only because I was trying to stick with what was originally in
>> src/bin/pg_controldata/pg_controldata.c
> 
> Meh.  It looks to me like the standard way to handle this
> for code in src/common/ is exit(EXIT_FAILURE).

I have no issue with using EXIT_FAILURE, but note:

1) That will change the exit error from the previous value of 2 to 1 for
   pg_controldata

2) There are many examples in other parts of the source that do not use
   EXIT_FAILURE, and even in src/common:

8<-
grep -rnE "exit\(EXIT_FAILURE\)" src/common/* --include=*.c
src/common/fe_memutils.c:36:exit(EXIT_FAILURE);
src/common/fe_memutils.c:76:exit(EXIT_FAILURE);
src/common/fe_memutils.c:93:exit(EXIT_FAILURE);
src/common/fe_memutils.c:99:exit(EXIT_FAILURE);
src/common/psprintf.c:135:  exit(EXIT_FAILURE);
src/common/psprintf.c:182:  exit(EXIT_FAILURE);

grep -rnE "exit\((1|2)\)" src/common/* --include=*.c
src/common/restricted_token.c:187:  exit(1);
src/common/username.c:86:   exit(1);
8<-

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] WIP: Upper planner pathification

2016-03-07 Thread Robert Haas
On Sun, Mar 6, 2016 at 10:32 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Sat, Mar 5, 2016 at 10:11 PM, Tom Lane  wrote:
>>> Is there some reason why hash and nestloop are safe but merge isn't?
>
>> I think it is because we consider to pushdown hash and nestloop to workers,
>> but not merge join and the reason for not pushing mergejoin is that
>> currently we don't have executor support for same, more work is needed
>> there.
>
> If that's true, then mergejoin paths ought to be marked parallel-unsafe
> explicitly (with a comment as to why), not just silently reduced to degree
> zero in a manner that looks more like an oversight than anything
> intentional.
>
> I also note that the regression tests pass with this patch and parallel
> mode forced, which seems unlikely if allowing a parallel worker to execute
> a join works for only two out of the three join types.  And checking the
> git history for nodeHashjoin.c, nodeHash.c, and nodeNestloop.c shows no
> evidence that any of those files have been touched for parallel query,
> so it's pretty hard to see a reason why those would work in parallel
> queries but nodeMergejoin.c not.
>
> I still say the code as it stands is merely a copy-and-pasteo.

I might call it a thinko rather than a copy-and-pasteo, but basically,
you are right and Amit is wrong.  I feel confident making that
statement because I wrote the code, so I think I'm well-positioned to
judge whether I did a particular thing on purpose or not.

The currently-committed code generates paths where nested loops and
hash joins get pushed beneath the Gather node, but does not generate
paths where merge joins have been pushed beneath the Gather node.  And
the reason I didn't try to generate those paths is because I believe
they will almost always suck.  As of now, what we know how to do is
build a partial path for a join by joining a partial path for the
outer input rel against an ordinary path for the inner rel.  That
means that the work of generating the inner rel has to be redone in
each worker.  That's not a problem if we've got something like a
nested loop with a parameterized inner index scan, because that sort
of plan redoes all the work for every row anyway.  It is a problem for
a hash join, but it's not too hard for it to be worthwhile anyway if
the build table is small.  For a merge join, though, it seems rather
unpromising.  It's really doubtful that we want each worker to
independently sort the inner rel and then have them join their own
subset of the outer rel against their own copy of the sort.  *Maybe*
it could win if the inner path is an index scan, but I wasn't really
sure that would come up and be a win often enough to be worth the cost
of generating the path.  We tend to only use merge joins when both of
the relations involved are large, and index-scanning a large relation
tends to lose to sorting it.  So it just seemed like a dead end.

Now, if somebody comes along with a patch to create partial merge join
paths and shows that it improves performance on some class of queries
I haven't thought about, I am not going to complain.  And in the long
run, I would like to have a facility to partition both relations on
the fly and then have the workers do a merge join per partition.
That's one of the two standard algorithms in the literature for
parallel join - hash join is the other.  I'm quite certain that's an
important piece of technology to develop, but it's a huge project unto
itself.  My priority for 9.6 is to have a user-visible feature that
takes the infrastructure that we already have as far as it reasonably
can go.  Building new infrastructure will have to wait for a future
release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-07 Thread Robert Haas
On Sat, Mar 5, 2016 at 7:22 AM, Dilip Kumar  wrote:
> On Wed, Mar 2, 2016 at 11:05 AM, Dilip Kumar  wrote:
>> And this latest result (no regression) is on X86 but on my local machine.
>>
>> I did not exactly saw what this new version of patch is doing different,
>> so I will test this version in other machines also and see the results.
>
>
> I tested this on PPC again, This time in various order (sometime patch first
> and then base first).
>  I tested with latest patch pinunpin-cas-2.patch on Power8.
>
> Shared Buffer = 8GB
> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
>
> BASE
> -
> Clients    run1     run2     run3
> 1          21200    18754    20537
> 2          40331    39520    38746
>
>
> Patch
> -
> Clients    run1     run2     run3
> 1          20225    19806    19778
> 2          39830    41898    36620
>
> I think, here we can not see any regression, (If I take median then it may
> looks low with patch so posting all 3 reading).

If the median looks low, how is that not a regression?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-03-07 Thread Kevin Grittner
On Mon, Mar 7, 2016 at 6:13 AM, Craig Ringer  wrote:
> On 5 March 2016 at 23:41, Kevin Grittner  wrote:

>> The only place you *need* to vary from commit order for correctness
>> is when there are overlapping SERIALIZABLE transactions, one
>> modifies data and commits, and another reads the old version of the
>> data but commits later.
>
> Ah, right. So here, even though X1 commits before X2 running concurrently
> under SSI, the logical order in which the xacts could've occurred serially
> is that where xact 2 runs and commits before X1, since xact 2 doesn't depend
> on xact 1. X2 read the old row version before xact 1 modified it, and
> logically occurs before xact1 in the serial rearrangement.

Right, because X2 is *seeing* data in a state that existed before X1 ran.

> I don't fully grasp how that can lead to a situation where xacts can commit
> in an order that's valid upstream but not valid as a downstream apply order.

With SSI, it can matter whether an intermediate state is *read*.

> I presume we're looking at read-only logical replicas here (rather than
> multimaster),

I have not worked out how this works with MMR.  I'm not sure that
there is one clear answer to that.

> and it's only a concern for SERIALIZABLE xacts since a READ
> COMMITTED xact on the master and replica would both be able to see the state
> where X1 is commited but X2 isn't yet.

REPEATABLE READ would allow the anomaly to be seen, too, if a
transaction acquired its snapshot between the two commits.

> But I don't see how a read-only xact
> in SERIALIZABLE on the replica can get different results to what it'd get
> with SSI on the master. It's entirely possible for a read xact on the master
> to get a snapshot after X1 commits and after X2 commits, same as READ
> COMMITTED. SSI shouldn't AFAIK come into play with no writes to create a
> pivot. Is that wrong?

As mentioned earlier in this thread, look at the examples in this
section of the Wiki page, and imagine that the READ ONLY
transaction involved did *not* run on the primary, but *did* run on
the replica:

https://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions
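
For readers who do not want to chase the link, here is a compressed 
sketch of the kind of schedule those examples describe.  Every table 
name and value below is invented purely for illustration; the 
interleaving is shown as comments:

  -- CREATE TABLE control  (current_batch int);   -- one row, batch = 1
  -- CREATE TABLE receipts (batch int, amount numeric);
  -- All three transactions run at SERIALIZABLE.
  --
  -- T1 (deposit):     INSERT INTO receipts
  --                     SELECT current_batch, 100 FROM control;  -- reads batch 1
  -- T2 (close batch): UPDATE control SET current_batch = current_batch + 1;
  --                   COMMIT;                                    -- commits first
  -- T3 (report, READ ONLY):
  --                   SELECT current_batch FROM control;         -- sees 2, so batch 1 is closed
  --                   SELECT sum(amount) FROM receipts WHERE batch = 1;
  --                   COMMIT;
  -- T1:               COMMIT;   -- adds a receipt to the "closed" batch 1

Because T1 read the old batch number it must come before T2 in any 
serial order, but T3 saw T2's work and not T1's, so T3 would have to 
run after T2 and before T1, which is impossible.  On the primary SSI 
only has to act because T3 actually *reads* that state, and it resolves 
the problem by rolling a transaction back.  A replica that simply 
applies commits in commit order (T2, then T1) exposes exactly that 
intermediate state to any local read-only query, with no such 
protection.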

> If we applied this sequence to the downstream in commit order we'd still get
> correct results on the heap after applying both.

... eventually.

> We'd have an intermediate
> state where X1 is commited but X2 isn't, but we can have the same on the
> master. SSI doesn't AFAIK mask X1 from becoming visible in a snapshot until
> X2 commits or anything, right?

If that intermediate state is *seen* on the master, a transaction
is rolled back.

>> The key is that
>> there is a read-write dependency (a/k/a rw-conflict) between the
>> two transactions which tells you that the second to commit has to
>> come before the first in any graph of apparent order of execution.
>
> Yeah, I get that part. How does that stop a 3rd SERIALIZABLE xact from
> getting a snapshot between the two commits and reading from there?

Serializable Snapshot Isolation doesn't generally block anything
that REPEATABLE READ (which is straight Snapshot Isolation) doesn't
block -- unless you explicitly request READ ONLY DEFERRABLE.  What
it does is monitor for situations that can present anomalies and
rolls back transactions as necessary to prevent anomalies in
successfully committed transactions.  We tried very hard to avoid
rolling back a transaction that could fail a second time on
a conflict with the same set of transactions, although there were some
corner cases where it could not be avoided when a transaction was
PREPARED and not yet committed.  Another possibly useful fact is
that we were able to guarantee that whenever there was a rollback,
some SERIALIZABLE transaction which overlaps the one being rolled
back has modified data and successfully committed -- ensuring that
there is some forward progress even in worst case situations.

>> The tricky part is that when there are two overlapping SERIALIZABLE
>> transactions and one of them has modified data and committed, and
>> there is an overlapping SERIALIZABLE transaction which is not READ
>> ONLY which has not yet reached completion (COMMIT or ROLLBACK) the
>> correct ordering remains in doubt -- there is no way to know which
>> might need to commit first, or whether it even matters.  I am
>> skeptical about whether in logical replication (including MMR), it
>> is going to be possible to manage this by finding "safe snapshots".
>> The only alternative I can see, though, is to suspend replication
>> while correct transaction ordering remains in doubt.  A big READ
>> ONLY transaction would not cause a replication stall, but a big
>> READ WRITE transaction could cause an indefinite stall.  Simon
>> seemed to be saying that this is unacceptable, but I tend to think
>> it is a viable approach for some workloads, especially if the READ
>> ONLY transaction property is used when possible.
>
> We already have huge replication stalls when big write xacts occur. We 

Re: [HACKERS] Splitting lengthy sgml files

2016-03-07 Thread Tom Lane
Tatsuo Ishii  writes:
> There are very lengthy (over 10k lines, for example) SGML files in
> docs. While working on translating docs using GitHub, I noticed that
> sometimes diffs are not shown in pull requests due to the limitation
> of GitHub, which makes it pretty difficult for me to review PRs. Any chance to
> split those lengthy SGML files into smaller SGML files?

Surely that's a github bug that you should be complaining to them about?

I'm disinclined to split existing files because (a) it would complicate
back-patching and (b) it would be completely destructive to git history.
git claims to understand about file moves but it doesn't do a terribly
good job with that history-wise (try git log or git blame on
recently-moved files such as pgbench).  And I've never heard even
a claim that it understands splits.

There might be reasons to override those disadvantages and do it
anyway ... but this doesn't sound like a very good reason.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-03-07 Thread Robert Haas
On Fri, Mar 4, 2016 at 11:17 PM, Craig Ringer  wrote:
> If FDW-based sharding works, I'm happy enough, I have no horse in this race.
> If it doesn't work I don't much care either. What I'm worried about is it if
> works like partitioning using inheritance works - horribly badly, but just
> well enough that it's served as an effective barrier to doing anything
> better.
>
> That's what I want to prevent. Sharding that only-just-works and then stops
> us getting anything better into core.

That's a reasonable worry.  Thanks for articulating it so clearly.
I've thought about that issue and I admit it's both real and serious,
but I've sort of taken the attitude of saying, well, I don't know how
to solve that problem, but there's so much other important work that
needs to be done before we get to the point where that's the blocker
that solving that problem doesn't seem like the most important thing
right now.

The sharding discussion we had in Vienna convinced me that, in the
long run, having PostgreSQL servers talk to other PostgreSQL servers
only using SQL is not going to be a winner.  I believe Postgres-XL has
already done something about that; I think it is passing plans around
directly.  So you could look at that and say - ha, the FDW approach is
a dead end!  But from my point of view, the important thing about the
FDW interface is that it provides a pluggable interface to the
planner.  We can now push down joins and sorts; hopefully soon we will
be able to push down aggregates and limits and so on.  That's the hard
part.  The deparsing code that turns the plan we want to execute in to
an SQL query that can be shipped over the wire is a detail.
Serializing some other on-the-wire representation of what we want the
remote side to do is small potatoes compared to having all of the
logic that lets you decide, in the first instance, what you want the
remote side to do.  I can imagine, in the long term, adding a new
sub-protocol (probably mediated via COPY BOTH) that uses a different
and more expressive on-the-wire representation.

Another foreseeable problem with the FDW approach is that you might
want to have a hash-partitioned table where there are multiple copies
of each piece data and they are spread out across the shards and you
can add and remove shards and the data automatically rebalances.
Table inheritance (or table partitioning) + postgres_fdw doesn't sound
so great in this situation because when you rebalance you need to
change the partitioning constraints and that requires a full table
lock on every node and the whole thing seems likely to end up being
somewhat annoyingly manual and overly constrained by locking.  But I'd
say two things about that.  The first is that I honestly think that
this would be a pretty nice problem to have.  If we had things working
well enough that this was the kind of problem we were trying to
tackle, we'd be light-years ahead of where we are today.  Sure,
everybody hates table inheritance, but I don't think it's right to say
that partitioning work is blocked because table inheritance exists: I
think the problem is that getting true table partitioning correct is
*hard*.  And Amit Langote is working on that and hopefully we will get
there, but it's not an easy problem.  I don't think sharding is an
easy problem either, and I think getting to a point where ease-of-use
is our big limiting factor would actually be better than the current
scenario where "it doesn't work at all" is the limiting factor.  I
don't want that to *block* other approaches, BUT I also think that
anybody who tries to start over from scratch and ignore all the good
work that has been done in FDW-land is not going to have a very fun
time.

The second thing I want to say about this problem is that I don't want
to presume that it's not a *solvable* problem.  Just because we use
the FDW technology as a base doesn't mean we can't invent new and
quite different stuff along the way.  One idea I've been toying with
is trying to create some notion of a "distributed" table.  This would
be a new relkind.  You'd have a single relation at the SQL level, not
an inheritance hierarchy, but under the hood the data would be spread
across a bunch of remote servers using the FDW interface.  So then you
reuse all of the query planner work and other enhancements that have
been put into the FDW stuff, but you'd present a much cleaner user
interface.  Or, maybe better, you could create a new FDW,
sharding_fdw, that works like postgres_fdw except that instead of
putting the data on one particular foreign server, it spreads the data
out across multiple servers and manages the sharding process under the
hood.  That would, again, let you reuse a lot of the work that's been
done to improve the FDW infrastructure while creating something
significantly more powerful than what postgres_fdw is today.  I don't
know, I don't have any ideas about this.  I think your concern is
valid, and I share it.  But I just 

Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?

2016-03-07 Thread Robert Haas
On Fri, Mar 4, 2016 at 9:29 PM, Regina Obe  wrote:
> I think the answer to this question is NO, but thought I'd ask.
>
> A lot of folks in PostGIS land are suffering from restore issues,
> materialized view issues etc. because we have functions such as
>
> ST_Intersects
>
> Which does _ST_Intersects  AND &&
>
> Since _ST_Intersects is not schema qualified, during database restore (which
> sets the schema to the table or view schema), materialized views that depend
> on this do not come back.

Could you provide a self-contained, reproducible test case that
illustrates this problem?  Ideally, one that doesn't involve
installing PostGIS?
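
Guessing at the shape of the failure without PostGIS, something along 
these lines looks like it should reproduce it -- every schema, function 
and table name here is invented, so a real test case from your side 
would still be better:

  CREATE SCHEMA funcs;
  CREATE FUNCTION funcs._inner(int, int) RETURNS boolean
    LANGUAGE sql IMMUTABLE AS 'SELECT $1 = $2';

  SET search_path = funcs, public;
  -- wrapper whose body does NOT schema-qualify the inner call
  CREATE FUNCTION funcs.outer_eq(int, int) RETURNS boolean
    LANGUAGE sql IMMUTABLE AS 'SELECT _inner($1, $2)';

  CREATE SCHEMA data;
  CREATE TABLE data.t (a int, b int);
  INSERT INTO data.t VALUES (1, 1);
  CREATE MATERIALIZED VIEW data.mv AS
    SELECT * FROM data.t WHERE funcs.outer_eq(a, b);

  -- A restore that pins search_path to the view's schema can no longer
  -- resolve the unqualified inner call when the matview is refreshed:
  SET search_path = data;
  REFRESH MATERIALIZED VIEW data.mv;
  -- expected to fail with something like:
  --   ERROR:  function _inner(integer, integer) does not exist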

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2016-03-07 Thread Robert Haas
On Fri, Mar 4, 2016 at 2:37 PM, Andres Freund  wrote:
> On 2016-02-12 15:56:45 +0100, Andres Freund wrote:
>> Hi,
>>
>>
>> On 2016-02-10 23:26:20 -0500, Robert Haas wrote:
>> > I think the part about whacking around the FDW API is a little more
>> > potentially objectionable to others, so I want to hold off doing that
>> > unless a few more people chime in with +1.  Perhaps we could start a
>> > new thread to talk about that specific idea.  This is useful even
>> > without that, though.
>>
>> FWIW, I can delete a couple hundred lines of code from citusdb thanks to
>> this...
>
> And I'm now working on doing that.
>
>
>> why exactly did you expose read/writeBitmapset(), and nothing else?
>> Afaics a lot of the other read routines are also pretty necessary from
>> the outside?
>
> I'd like to also expose at least outDatum()/readDatum() - they're not
> entirely trivial, so it'd be sad to copy them.
>
>
> What I'm wondering about right now is how an extensible node should
> implement the equivalent of
> #define WRITE_NODE_FIELD(fldname) \
> (appendStringInfo(str, " :" CppAsString(fldname) " "), \
>  _outNode(str, node->fldname))
>
> given that _outNode isn't public, that seems to imply having to do
> something like
>
> #define WRITE_NODE_FIELD(fldname) \
> (appendStringInfo(str, " :" CppAsString(fldname) " "), \
>  appendStringInfo(str, nodeToString(node->fldname)))
>
> i.e. essentially doubling memory overhead. Istm we should make
> outNode() externally visible?

I suggest that you write a patch to do whatever you think best and
commit it, with the understanding that future API stability isn't
guaranteed if we decide what you picked is not actually for the best.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-07 Thread Robert Haas
On Fri, Mar 4, 2016 at 11:49 PM, Amit Kapila  wrote:
> I think one thing which needs more thoughts about this approach is that we
> need to maintain some number of slots so that group extend for different
> relations can happen in parallel.  Do we want to provide simultaneous
> extension for 1, 2, 3, 4, 5 or more number of relations?  I think providing
> it for three or four relations should be okay as higher the number we want
> to provide, bigger the size of PGPROC structure will be.

Hmm.  Can we drive this off of the heavyweight lock manager's idea of
how big the relation extension lock wait queue is, instead of adding
more stuff to PGPROC?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-07 Thread Robert Haas
On Sun, Mar 6, 2016 at 11:02 PM, Kyotaro HORIGUCHI
 wrote:
> At Sat, 5 Mar 2016 16:41:29 +0900, Amit Langote  
> wrote in 
>> On Sat, Mar 5, 2016 at 4:24 PM, Amit Langote  wrote:
>> > So, I took the Vinayak's latest patch and rewrote it a little
>> ...
>> > I broke it into two:
>> >
>> > 0001-Provide-a-way-for-utility-commands-to-report-progres.patch
>> > 0002-Implement-progress-reporting-for-VACUUM-command.patch
>>
>> Oops, unamended commit messages in those patches are misleading.  So,
>> please find attached corrected versions.
>
> The 0001-P.. adds the following interface functions.
>
> +extern void pgstat_progress_set_command(BackendCommandType cmdtype);
> +extern void pgstat_progress_set_command_target(Oid objid);
> +extern void pgstat_progress_update_param(int index, uint32 val);
> +extern void pgstat_reset_local_progress(void);
> +extern int pgstat_progress_get_num_param(BackendCommandType cmdtype);
>
> I don't like to treat the target object id differently from other
> parameters. It could not be needed at all, or could be needed two
> or more in contrast. Although oids are not guaranteed to fit
> uint32, we have already stored BlockNumber there.

Well...

There's not much point in deciding that the parameters are uint32,
because we don't have that type at the SQL level.
pgstat_progress_update_param() really ought to take either int32 or
int64 as an argument, because that's what we can actually handle from
SQL, and it seems pretty clear that int64 is better since otherwise we
can't fit, among other things, a block number.

Given that, I tend to think that treating the command target specially
and passing that as an OID is reasonable.  We're not going to be able
to pass variable-sized arrays through this mechanism, ever, because
our shared memory segment doesn't work like that.  And it seems to me
that nearly every command somebody might want to report progress on
will touch, basically, one relation at a time.  So I don't see the harm
in hardcoding that idea into the facility.
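(To make that concrete, the prototypes quoted above would become something
like the following -- just a sketch, with int64 counters and the one target
relation passed as an OID:)

    extern void pgstat_progress_set_command(BackendCommandType cmdtype);
    extern void pgstat_progress_set_command_target(Oid objid);      /* one relation */
    extern void pgstat_progress_update_param(int index, int64 val); /* was uint32 */
    extern void pgstat_reset_local_progress(void);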

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Splitting lengthy sgml files

2016-03-07 Thread Tatsuo Ishii
There are very lengthy (over 10k lines, for example) SGML files in
docs. While working on translating docs using GitHub, I noticed that
sometimes diffs are not shown in pull requests due to a limitation
of GitHub, which makes it pretty difficult for me to review PRs. Any chance
of splitting those lengthy SGML files into smaller ones?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread Tatsuo Ishii
> The killer there was that the plugin could only alter queries used by
> Wordpress core. Nobody just uses Wordpress core.  The whole reason
> Wordpress became popular is the vast collection of plugins, themes, etc.
> 90% of which are written by three stoned monkeys who once saw a PHP 4
> manual in a mirror, which is part of why it has such an appalling security
> history. Plugins can just talk straight to the DB, and are written by
> people who have never heard of parametrized queries or, half the time,
> transactions.

Well, I think one of the reasons is that they only know MySQL. MySQL long
lacked transactions and parametrized queries.

> What makes it popular and successful is also what makes supporting Pg in a
> way that'll actually see useful adoption hard.

Let's enlighten WordPress users by supporting PostgreSQL.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-07 Thread Michael Paquier
On Sun, Mar 6, 2016 at 5:55 AM, Andrew Dunstan  wrote:
> On 03/05/2016 01:31 PM, Michael Paquier wrote:
>> On Sat, Mar 5, 2016 at 11:34 PM, Andrew Dunstan 
>> wrote:
>>>
>>> Here is a translation into perl of the sed script, courtesy of the s2p
>>> incarnation of psed:
>>> 
>>> The sed script appears to have been stable for a long time, so I don't
>>> think
>>> we need to be too concerned about possibly maintaining two versions.
>>
>> That's 95% of the work already done, nice! If I finish wrapping up a
>> patch for this issue at least would you backpatch? It would be saner
>> to get rid of this dependency everywhere I think regarding compilation
>> with perl 5.22.
>
> Sure.

OK, so after re-reading the script and running it through perltidy, I
finished with the attached. How does that look?
-- 
Michael
diff --git a/src/backend/utils/Gen_dummy_probes.pl b/src/backend/utils/Gen_dummy_probes.pl
new file mode 100644
index 000..30c6d65
--- /dev/null
+++ b/src/backend/utils/Gen_dummy_probes.pl
@@ -0,0 +1,247 @@
+#! /usr/bin/perl -w
+#-
+#
+# Gen_dummy_probes.pl
+#Perl script that generates probes.h file when dtrace is not available
+#
+# Portions Copyright (c) 2008-2016, PostgreSQL Global Development Group
+#
+#
+# IDENTIFICATION
+#src/backend/utils/Gen_dummy_probes.pl
+#
+#-
+
+$0 =~ s/^.*?(\w+)[\.\w+]*$/$1/;
+
+use strict;
+use Symbol;
+use vars qw{ $isEOF $Hold %wFiles @Q $CondReg
+  $doAutoPrint $doOpenWrite $doPrint };
+$doAutoPrint = 1;
+$doOpenWrite = 1;
+
+# prototypes
+sub openARGV();
+sub getsARGV(;\$);
+sub eofARGV();
+sub printQ();
+
+# Run: the sed loop reading input and applying the script
+#
+sub Run()
+{
+	my ($h, $icnt, $s, $n);
+
+	# hack (not unbreakable :-/) to avoid // matching an empty string
+	my $z = "\000";
+	$z =~ /$z/;
+
+	# Initialize.
+	openARGV();
+	$Hold= '';
+	$CondReg = 0;
+	$doPrint = $doAutoPrint;
+  CYCLE:
+	while (getsARGV())
+	{
+		chomp();
+		$CondReg = 0;# cleared on t
+	  BOS:;
+
+		# /^[ 	]*probe /!d
+		unless (m /^[ \t]*probe /s)
+		{
+			$doPrint = 0;
+			goto EOS;
+		}
+
+		# s/^[ 	]*probe \([^(]*\)\(.*\);/\1\2/
+		{
+			$s = s /^[ \t]*probe ([^(]*)(.*);/${1}${2}/s;
+			$CondReg ||= $s;
+		}
+
+		# s/__/_/g
+		{
+			$s = s /__/_/sg;
+			$CondReg ||= $s;
+		}
+
+		# y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
+		{ y{abcdefghijklmnopqrstuvwxyz}{ABCDEFGHIJKLMNOPQRSTUVWXYZ}; }
+
+		# s/^/#define TRACE_POSTGRESQL_/
+		{
+			$s = s /^/#define TRACE_POSTGRESQL_/s;
+			$CondReg ||= $s;
+		}
+
+		# s/([^,)]\{1,\})/(INT1)/
+		{
+			$s = s /\([^,)]+\)/(INT1)/s;
+			$CondReg ||= $s;
+		}
+
+		# s/([^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2)/
+		{
+			$s = s /\([^,)]+, [^,)]+\)/(INT1, INT2)/s;
+			$CondReg ||= $s;
+		}
+
+		# s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3)/
+		{
+			$s = s /\([^,)]+, [^,)]+, [^,)]+\)/(INT1, INT2, INT3)/s;
+			$CondReg ||= $s;
+		}
+
+# s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4)/
+		{
+			$s =
+s /\([^,)]+, [^,)]+, [^,)]+, [^,)]+\)/(INT1, INT2, INT3, INT4)/s;
+			$CondReg ||= $s;
+		}
+
+# s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4, INT5)/
+		{
+			$s =
+s /\([^,)]+, [^,)]+, [^,)]+, [^,)]+, [^,)]+\)/(INT1, INT2, INT3, INT4, INT5)/s;
+			$CondReg ||= $s;
+		}
+
+# s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4, INT5, INT6)/
+		{
+			$s =
+s /\([^,)]+, [^,)]+, [^,)]+, [^,)]+, [^,)]+, [^,)]+\)/(INT1, INT2, INT3, INT4, INT5, INT6)/s;
+			$CondReg ||= $s;
+		}
+
+# s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4, INT5, INT6, INT7)/
+		{
+			$s =
+s /\([^,)]+, [^,)]+, [^,)]+, [^,)]+, [^,)]+, [^,)]+, [^,)]+\)/(INT1, INT2, INT3, INT4, INT5, INT6, INT7)/s;
+			$CondReg ||= $s;
+		}
+
+# s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4, INT5, INT6, INT7, INT8)/
+		{
+			$s =
+s /\([^,)]+, [^,)]+, [^,)]+, [^,)]+, [^,)]+, [^,)]+, [^,)]+, [^,)]+\)/(INT1, INT2, INT3, INT4, INT5, INT6, INT7, INT8)/s;
+			$CondReg ||= $s;
+		}
+
+		# P
+		{
+			if (/^(.*)/) { print $1, "\n"; }
+		}
+
+		# s/(.*$/_ENABLED() (0)/
+		{
+			$s = s /\(.*$/_ENABLED() (0)/s;
+			$CondReg ||= $s;
+		}
+	  EOS: if ($doPrint)
+		{
+			print $_, "\n";
+		}
+		else
+		{
+			$doPrint = $doAutoPrint;
+		}
+		printQ() if @Q;
+	}
+
+	exit(0);
+}
+Run();
+
+# openARGV: open 1st input file
+#
+sub openARGV()
+{
+	unshift(@ARGV, '-') unless @ARGV;
+	my $file = shift(@ARGV);
+	open(ARG, "<$file")
+	  || die("$0: can't open $file for reading ($!)\n");
+	$isEOF = 0;
+}
+
+# getsARGV: Read another input line into argument (default: $_).
+#   

Re: [HACKERS] (pgaudit) Audit log is not output after the SET ROLE.

2016-03-07 Thread David Steele
On 3/7/16 4:39 AM, Toshi Harada wrote:
> 
> I am testing the pgaudit(https://commitfest.postgresql.org/9/463/).
> (use "http://www.postgresql.org/message-id/56b0101b.6070...@pgmasters.net; 
> attached patch on 9.6-devel)
> 
> I found strange thing.
> 
> - After SET ROLE, part of the SQL is not the audit log output.
> - SQL comprising a relation is not output to the audit log.
> 
> * Reproduce:
> ** prepare
> 
> createuser test_user -U postgres
> createdb test -U postgres -O test_user
> psql test -U test_user -c "CREATE TABLE team(id int, name text)"
> 
> ** pgaudit settings
> 
> shared_preload_libraries = 'pgaudit'
> pgaudit.log = 'all'

Both of these are in postgresql.conf?

> ** test sql script (test.sql)
> 
> SELECT 1;
> SELECT * FROM team; -- output audit log
> SET ROLE test_user;
> SELECT 2;
> SELECT * FROM team; -- no output audit log
> SELECT 3;
> RESET ROLE;
> SELECT * FROM team; -- output audit log
> 
> ** run script
> 
> psql test -U postgres -f test.sql
> 
> ** audit log
> 
> LOG:  AUDIT: SESSION,1,1,READ,SELECT,,,SELECT 1;,
> LOG:  AUDIT: SESSION,2,1,READ,SELECT,,,SELECT * FROM team;,
> LOG:  AUDIT: SESSION,3,1,MISC,SET,,,SET ROLE test_user;,
> LOG:  AUDIT: SESSION,4,1,READ,SELECT,,,SELECT 2;,
> LOG:  AUDIT: SESSION,5,1,READ,SELECT,,,SELECT 3;,
> LOG:  AUDIT: SESSION,6,1,MISC,RESET,,,RESET ROLE;,
> LOG:  AUDIT: SESSION,7,1,READ,SELECT,,,SELECT * FROM team;,

Well, that definitely doesn't look right.

You may have noticed that the pgaudit patch is marked as "returned with
feedback" so it is closed for the current commitfest and will not be
included in 9.6.

I'll definitely look at this bug but I would ask that you resubmit it at
https://github.com/pgaudit/pgaudit/issues so we can continue the
conversation there.

Thanks!
-- 
-David
da...@pgmasters.net





Re: [HACKERS] Fix handling of invalid sockets returned by PQsocket()

2016-03-07 Thread Michael Paquier
On Sun, Mar 6, 2016 at 12:52 PM, Alvaro Herrera
 wrote:
> Peter Eisentraut wrote:
>> On 2/17/16 10:52 PM, Michael Paquier wrote:
>> > On Thu, Feb 18, 2016 at 1:58 AM, Alvaro Herrera
>> >  wrote:
>> >> Michael Paquier wrote:
>> >>> Hi all,
>> >>>
>> >>> After looking at Alvaro's message mentioning the handling of
>> >>> PQsocket() for invalid sockets, I just had a look by curiosity at
>> >>> other calls of this routine, and found a couple of issues:
>> >>> 1) In vacuumdb.c, init_slot() does not check for the return value of 
>> >>> PQsocket():
>> >>> slot->sock = PQsocket(conn);
>> >>> 2) In isolationtester.c, try_complete_step() should do the same.
>> >>> 3) In pg_recvlogical.c for StreamLogicalLog() I am spotting the same 
>> >>> problem.
>> >>> I guess those ones should be fixed as well, no?
>> >>
>> >> I patched pgbench to use PQerrorMessage rather than strerror(errno).  I
>> >> think your patch should do the same.
>> >
>> > OK, this looks like a good idea. I would suggest doing the same in
>> > receivelog.c then.
>>
>> Let's make the error messages consistent as "invalid socket".  "bad
>> socket" isn't really our style, and pg_basebackup saying "socket not
>> open" is just plain incorrect.
>
> You're completely right.  Let's patch pgbench that way too.

Here is v3 then, switching to "invalid socket" for those error
messages. There are two extra messages in fe-misc.c and
libpqwalreceiver.c that need a rewording that I have detected as well.
-- 
Michael
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index f670957..4ee4d71 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -331,7 +331,7 @@ libpq_select(int timeout_ms)
 	if (PQsocket(streamConn) < 0)
 		ereport(ERROR,
 (errcode_for_socket_access(),
- errmsg("socket not open")));
+ errmsg("invalid socket: %s", PQerrorMessage(streamConn;
 
 	/* We use poll(2) if available, otherwise select(2) */
 	{
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 832f9f9..6d12705 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -360,6 +360,14 @@ StreamLogicalLog(void)
 			struct timeval timeout;
 			struct timeval *timeoutptr = NULL;
 
+			if (PQsocket(conn) < 0)
+			{
+fprintf(stderr,
+		_("%s: invalid socket: %s"),
+		progname, PQerrorMessage(conn));
+goto error;
+			}
+
 			FD_ZERO(&input_mask);
 			FD_SET(PQsocket(conn), &input_mask);
 
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index 6d7e635..01c42fc 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -956,7 +956,8 @@ CopyStreamPoll(PGconn *conn, long timeout_ms)
 
 	if (PQsocket(conn) < 0)
 	{
-		fprintf(stderr, _("%s: socket not open"), progname);
+		fprintf(stderr, _("%s: invalid socket: %s"), progname,
+PQerrorMessage(conn));
 		return -1;
 	}
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8b0b17a..92df750 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3797,7 +3797,7 @@ threadRun(void *arg)
 			sock = PQsocket(st->con);
 			if (sock < 0)
 			{
-fprintf(stderr, "bad socket: %s", PQerrorMessage(st->con));
+fprintf(stderr, "invalid socket: %s", PQerrorMessage(st->con));
 goto done;
 			}
 
@@ -3867,7 +3867,8 @@ threadRun(void *arg)
 
 if (sock < 0)
 {
-	fprintf(stderr, "bad socket: %s", PQerrorMessage(st->con));
+	fprintf(stderr, "invalid socket: %s",
+			PQerrorMessage(st->con));
 	goto done;
 }
 if (FD_ISSET(sock, &input_mask) ||
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index c6afcd5..b673be8 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -70,7 +70,7 @@ static void DisconnectDatabase(ParallelSlot *slot);
 
 static int	select_loop(int maxFd, fd_set *workerset, bool *aborting);
 
-static void init_slot(ParallelSlot *slot, PGconn *conn);
+static void init_slot(ParallelSlot *slot, PGconn *conn, const char *progname);
 
 static void help(const char *progname);
 
@@ -421,14 +421,14 @@ vacuum_one_database(const char *dbname, vacuumingOptions *vacopts,
 	 * array contains the connection.
 	 */
 	slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) * concurrentCons);
-	init_slot(slots, conn);
+	init_slot(slots, conn, progname);
 	if (parallel)
 	{
 		for (i = 1; i < concurrentCons; i++)
 		{
 			conn = connectDatabase(dbname, host, port, username, prompt_password,
    progname, false, true);
-			init_slot(slots + i, conn);
+			init_slot(slots + i, conn, progname);
 		}
 	}
 
@@ -917,11 +917,18 @@ select_loop(int maxFd, fd_set *workerset, bool *aborting)
 }
 
 static void
-init_slot(ParallelSlot *slot, PGconn *conn)

Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-03-07 Thread Etsuro Fujita

On 2016/03/05 5:45, Robert Haas wrote:

Some comments on the latest version.  I haven't reviewed the
postgres_fdw changes in detail here, so this is just about the core
changes.


Thank you for taking the time to review the patch!


I see that show_plan_tlist checks whether the operation is any of
CMD_INSERT, CMD_UPDATE, or CMD_DELETE.  But practically every place
else where a similar test is needed instead tests whether the
operation is *not* CMD_SELECT.  I think this place should do it that
way, too.

+   resultRelInfo = mtstate->resultRelInfo;
 for (i = 0; i < nplans; i++)
 {
 ExecAuxRowMark *aerm;

+   /*
+* ignore subplan if the FDW pushes down the
command to the remote
+* server; the ModifyTable won't have anything
to do except for
+* evaluation of RETURNING expressions
+*/
+   if (resultRelInfo->ri_FdwPushdown)
+   {
+   resultRelInfo++;
+   continue;
+   }
+
 subplan = mtstate->mt_plans[i]->plan;
 aerm = ExecBuildAuxRowMark(erm, subplan->targetlist);
 mtstate->mt_arowmarks[i] =
lappend(mtstate->mt_arowmarks[i], aerm);
+   resultRelInfo++;
 }


This kind of thing creates a hazard for future people maintaining this
code.  If somebody adds some code to this loop that needs to execute
even when resultRelInfo->ri_FdwPushdown is true, they have to add two
copies of it.  It's much better to move the three lines of logic that
execute only in the non-pushdown case inside of if
(!resultRelInfo->ri_FdwPushdown).


Another option to avoid such a hazard would be to remove the two changes 
from ExecInitModifyTable and create ExecAuxRowMarks and junk filters 
even in the pushdown case.  I made the changes because we won't use 
ExecAuxRowMarks in that case since we don't need to do EvalPlanQual 
rechecks and because we won't use junk filters in that case since we do 
UPDATE/DELETE in the subplan.  But the creation cost is small enough, so 
simply removing the changes seems like a good idea.



This issue crops up elsewhere as well.  The changes to
ExecModifyTable() have the same problem -- in that case, it might be
wise to move the code that's going to have to be indented yet another
level into a separate function.   That code also says this:

+   /* No need to provide scan tuple to
ExecProcessReturning. */
+   slot = ExecProcessReturning(resultRelInfo,
NULL, planSlot);

...but, uh, why not?  The comment says what the code does, but what it
should do is explain why it does it.


As documented in IterateDMLPushdown in fdwhandler.sgml, the reason for 
that is that in the pushdown case it's the IterateDMLPushdown's 
responsibility to get the actually inserted/updated/deleted tuples and make 
those tuples available to the ExecProcessReturning.  I'll add comments.



On a broader level, I'm not very happy with the naming this patch
uses.  Here's an example:

+
+ If an FDW supports optimizing foreign table updates, it still needs to
+ provide PlanDMLPushdown, BeginDMLPushdown,
+ IterateDMLPushdown and EndDMLPushdown
+ described below.
+

"Optimizing foreign table updates" is both inaccurate (since it
doesn't only optimize updates) and so vague as to be meaningless
unless you already know what it means.  The actual patch uses
terminology like "fdwPushdowns" which is just as bad.  We might push a
lot of things to the foreign side -- sorts, joins, aggregates, limits
-- and this is just one of them.  Worse, "pushdown" is itself
something of a term of art - will people who haven't been following
all of the mammoth, multi-hundred-email threads on this topic know
what that means?  I think we need some better terminology here.

The best thing that I can come up with offhand is "bulk modify".  So
we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify,
EndBulkModify, ExplainBulkModify.  Other suggestions welcome.   The
ResultRelInfo flag could be ri_usesFDWBulkModify.


I'm not sure that "bulk modify" is best.  Yeah, this would improve the 
performance especially in the bulk-modification case, but would improve 
the performance even in the case where an UPDATE/DELETE modifies just a 
single row.  Let me explain using an example.  Without the patch, we 
have the following plan for an UPDATE on a foreign table that updates a 
single row:


postgres=# explain verbose update foo set a = a + 1 where a = 1;
QUERY PLAN
--
 Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
   Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1
   ->  Foreign 

Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread MauMau
From: Craig Ringer 
--
We could help ORMs solve the N+1 SELECTs problem and help them avoid 
transferring vast join projections unnecessarily. That'd make PostgreSQL pretty 
compelling for exactly the users we're mostly too busy dismissing to consider.

I'd be interested in reaching out to some Hibernate/JPA and ActiveRecord folks 
about how the DB could help the ORM and have been meaning to explore this area 
for a while, but -ENOTIME. If anyone pursues it I'll be really interested in 
hearing how things go.
--

You have various ideas and experience, don't you?
Are those ideas on ORMs beneficial exclusively to PostgreSQL or to all DBMSs?  
I don't know whether the structure of ORMs allows for improvements that are 
advantageous to a specific DBMS.



From: Craig Ringer 
--
When you consider the adoption of PostgreSQL in the Rails/ActiveRecord crowd I 
think there's some interesting potential there. What there isn't is funding 
AFAIK.
--

Yes, this needs investigation.  But I'm not sure the Ruby ecosystem is the world 
where PostgreSQL should go.  Do many users of commercial databases have assets 
based on Ruby which they want to continue to use?

Recently, my colleague, a Ruby committer, told me that Ruby may be losing 
momentum because developers are moving toward JavaScript+Node.js.  Addressing 
the JavaScript+Node.js ecosystem may be more beneficial to PostgreSQL's popularity.



From: Craig Ringer 
--
Besides, in practice, we probably should increase the number of software 
interoperable with PostgreSQL.  e.g. one customer asked us whether Arcserve can 
be used to back up PostgreSQL databases, but unfortunately we had to answer no. 
 They are using Arcserve to back up Oracle databases and other resources.  
"Then, you can use NetVault instead" is not the best answer; they just want to 
replace the database.

The "we" here is the problem. It's not likely to be folks focused on PostgreSQL 
core dev, but ... who, exactly?
--

I simply meant PostgreSQL developers and users (of course, including me) who 
wish PostgreSQL to become more popular and thrive for a long time.  Most users 
use databases combined with applications (not just the DBMS alone), so 
increasing the number of applications supporting PostgreSQL will attract more 
users.  PostgreSQL developers will gain wider market and business/job 
opportunities.



From: Craig Ringer 
--
Provide technical assistance to those vendors as an organization so that they 
can support PostgreSQL smoothly.

This one is a help. That said, pgsql-general is pretty helpful already...
--

Certainly.



From: Craig Ringer 
--
* Make a directory of software/services that can be used with PostgreSQL on the 
community web site (wiki.postgresql.org or www.postgresql.org).
Software/services vendors and PostgreSQL developers/users can edit this 
directory.

I thought we had that? Yep.

http://www.postgresql.org/download/product-categories/

It's pretty invisible though, partly due to the postgresql.org landing page's 
need for a trim-down and tidy. (Don't even get me started on 
http://www.postgresql.org/about/ )
--

Thanks, I know the page.  It totally slipped off my head.  Yes, I'm afraid the 
path to the page (Home -> Download -> Software Catalog) is not good.



From: Craig Ringer 
--
* How/Where can we get the knowledge of expanding the software ecosystem?  Is 
there any OSS project that we can learn from?

Mongo.

http://www.postgresql.org/
https://www.mongodb.org/

... and tell me which you'd look over first if you were evaluating things.
--

Yes, I want something like a site map.  But I couldn't find information about 
software supporting MongoDB within a few minutes, and I gave up.  The MongoDB 
website is not necessarily that nice, either.

In that respect, the MySQL website is good, because a "Partners" link is on the 
home page.  The PostgreSQL website could also have "Related software" or something 
like that on its home page.




From: Craig Ringer 
--
How can we attract software vendors to support PostgreSQL?  What words are 
convincing to appeal the increasing potential of PostgreSQL as a good 
replacement for commercial databases?

Change the name :p
--

Sorry, I couldn't catch the implication.  Do you mean changing the name 
PostgreSQL to something else, or just a joke?

Regards
MauMau

Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread Craig Ringer
On 6 March 2016 at 13:46, Mark Kirkwood 
wrote:


>
> For cloud - in particular Openstack (which I am working with ATM), the
> biggest thing would be:
>
> - multi-master replication
>

Working on it ;)


> or failing that:
>
> - self managing single master failover (voting/quorum etc)
>

I don't think that's either/or, frankly, in that you can have multimaster
that needs a lot of babying or multimaster that has magic autoscaling and
autosharding and autofailover (or some approximation of).

That said, I firmly believe that autofailover is a giant foot-cannon unless
*regularly tested in the live environment*. The same is true of most of the
other autoFOOs. Products that claim to offer them often do some creative
bending of the guarantees they claim to offer elsewhere, so I also tend to
take such claims with a grain or ten of salt. Witness the recent
discussions of Galleria.

Pg currently makes testing autofailover pretty painful, since failover is a
major and disruptive event that reduces your redundancy until the old
master has resynced. pg_rewind will help a lot there if it proves robust
enough - hopefully enough that we can start adopting a
ChaosMonkey-flavoured approach to HA.

Eventually, a long way down the track, I'd like to see logical replication
based failover be part of that picture too.

so that operators can essentially 'set and forget'. We currently use
> Mysql+ Galera (multi master) and Mongodb (self managing single master)
> and the convenience and simplicity is just so important (Openstack is a
> huge complex collection of services - hand holding of any one service is
> pretty much a non starter).
>

I rarely see a service where "self managing" doesn't turn into "needs lots
of tuning, tweaking and fixing" at significant scales, but it's nice to
have it be automagic and correct enough of the time for small to medium
deployments.

That's one area where Pg core's focus on getting it right can hold us back.
Not necessarily for the worse. We're not usually big on saying "oh, it
probably won't corrupt your data if you use it at reasonable transaction
rates with reasonable inter-node latencies". Having had a few interesting
surprises with
set-and-forget-until-it-brings-the-whole-network-down-and-you-have-no-idea-how-to-fix-it
systems I tend to favour doing it right myself. Which is why some of the
compromises currently necessary in the logical replication space on Pg irk
me so

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: Upper planner pathification

2016-03-07 Thread Amit Kapila
On Mon, Mar 7, 2016 at 11:52 AM, Peter Geoghegan  wrote:
>
> On Sun, Mar 6, 2016 at 9:59 PM, Tom Lane  wrote:
> > Perhaps it was intentional when written, but if Robert's advice is correct
> > that the new upper-planner path nodes should copy up parallel_degree from
> > their children, then it cannot be the case that parallel_degree>0 in a
> > node above the scan level implies that that node type has any special
> > behavior for parallelism.
> >

Right.

>
> > I continue to bemoan the lack of documentation about what these fields
> > mean.
> >

Point taken and if Robert doesn't feel otherwise, I can try to write a
patch to explain the newly added fields.

> > As far as I can find, the sum total of the documentation about
> > this field is
> >
> > int parallel_degree; /* desired parallel degree; 0 = not parallel */
>
> While it doesn't particularly relate to parallel joins, I've expressed
> a general concern about the max_parallel_degree GUC that I think is
> worth considering again:
>
> http://www.postgresql.org/message-id/cam3swzrs1mtvrkkasy1xbshgzxkd6-hnxx3gq7x-p-dz0zt...@mail.gmail.com
>
> In summary, I think it's surprising that a max_parallel_degree of 1
> doesn't disable parallel workers entirely.
>

I have responded with my thoughts on the thread where you raised that point;
discussing it here separately would dilute the purpose of this thread.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread Craig Ringer
On 7 March 2016 at 17:06, Tatsuo Ishii  wrote:

> > * Only 24 open source projects are listed as interoperable.
> > Open Source Projects Using PostgreSQL
> > https://wiki.postgresql.org/wiki/OpenSource_Projects_Using_PostgreSQL
>
> It's pity that some very popular OSS projects do not care about
> PostgreSQL. WordPress is one of such OSS projects. Long time ago
> someone came up with "PostgreSQL plugin" but it didn't catch up
> WordPress releases and now cannot be used with the latest version of
> WordPress. In my understanding WordPress fixes security issues only
> with the latest version; as a result, we cannot use safe version of
> WordPress if we want to use PostgreSQL for its database engine.
> 
>

The killer there was that the plugin could only alter queries used by
Wordpress core. Nobody just uses Wordpress core.  The whole reason
Wordpress became popular is the vast collection of plugins, themes, etc.
90% of which are written by three stoned monkeys who once saw a PHP 4
manual in a mirror, which is part of why it has such an appalling security
history. Plugins can just talk straight to the DB, and are written by
people who have never heard of parametrized queries or, half the time,
transactions.

What makes it popular and successful is also what makes supporting Pg in a
way that'll actually see useful adoption hard.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-03-07 Thread Craig Ringer
On 5 March 2016 at 23:41, Kevin Grittner  wrote:

>
> > I'd be really interested in some ideas on how that information might be
> > usefully accessed. If we could write info on when to apply commits to the
> > xlog in serializable mode that'd be very handy, especially when looking
> to
> > the future with logical decoding of in-progress transactions, parallel
> > apply, etc.
>
> Are you suggesting the possibility of holding off on writing the
> commit record for a SERIALIZABLE transaction to WAL until it is
> known that no other SERIALIZABLE transaction comes ahead of it in
> the apparent order of execution?  If so, that's an interesting idea
> that I hadn't given much thought to yet -- I had been assuming
> current WAL writes, with adjustments to the timing of application
> of the records.
>

I wasn't, I simply wrote less than clearly. I intended to say "from the
xlog" where I wrote "to the xlog". Nonetheless, that'd be a completely
unrelated but interesting thing to explore...


> > For parallel apply I anticipated that we'd probably have workers applying
> > xacts in parallel and committing them in upstream commit order. They'd
> > sometimes deadlock with each other; when this happened all workers whose
> > xacts committed after the first aborted xact would have to abort and
> start
> > again. Not ideal, but safe.
> >
> > Being able to avoid that by using SSI information was in the back of my
> > mind, but with no idea how to even begin to tackle it. What you've
> mentioned
> > here is helpful and I'd be interested if you could share a bit more of
> your
> > experience in the area.
>
> My thinking so far has been that reordering the application of
> transaction commits on a replica would best be done as the minimal
> rearrangement possible from commit order which allows the work of
> transactions to become visible in an order consistent with some
> one-at-a-time run of those transactions.  Partly that is because
> the commit order is something that is fairly obvious to see and is
> what most people intuitively look at, even when it is wrong.
> Deviating from this intuitive order seems likely to introduce
> confusion, even when the results are 100% correct.
>

> The only place you *need* to vary from commit order for correctness
> is when there are overlapping SERIALIZABLE transactions, one
> modifies data and commits, and another reads the old version of the
> data but commits later.


Ah, right. So here, even though X1 commits before X2 when running concurrently
under SSI, the logical order in which the xacts could've occurred serially
is one where X2 runs and commits before X1, since X2 doesn't depend on X1.
X2 read the old row version before X1 modified it, and logically occurs
before X1 in the serial rearrangement.

I don't fully grasp how that can lead to a situation where xacts can commit
in an order that's valid upstream but not valid as a downstream apply
order. I presume we're looking at read-only logical replicas here (rather
than multimaster), and it's only a concern for SERIALIZABLE xacts since a
READ COMMITTED xact on the master and replica would both be able to see the
state where X1 is commited but X2 isn't yet. But I don't see how a
read-only xact in SERIALIZABLE on the replica can get different results to
what it'd get with SSI on the master. It's entirely possible for a read
xact on the master to get a snapshot after X1 commits and after X2 commits,
same as READ COMMITTED. SSI shouldn't AFAIK come into play with no writes
to create a pivot. Is that wrong?

If we applied this sequence to the downstream in commit order we'd still
get correct results on the heap after applying both. We'd have an
intermediate state where X1 is commited but X2 isn't, but we can have the
same on the master. SSI doesn't AFAIK mask X1 from becoming visible in a
snapshot until X2 commits or anything, right?


>   Due to the action of SSI on the source
> machine, you know that there could not be any SERIALIZABLE
> transaction which saw the inconsistent state between the two
> commits, but on replicas we don't yet manage that.


OK, maybe that's what I'm missing. How exactly does SSI ensure that? (A
RTFM link / hint is fine, but I didn't find it in the SSI section of TFM at
least in a way I recognised).

> The key is that
> there is a read-write dependency (a/k/a rw-conflict) between the
> two transactions which tells you that the second to commit has to
> come before the first in any graph of apparent order of execution.
>

Yeah, I get that part. How does that stop a 3rd SERIALIZABLE xact from
getting a snapshot between the two commits and reading from there?


> The tricky part is that when there are two overlapping SERIALIZABLE
> transactions and one of them has modified data and committed, and
> there is an overlapping SERIALIZABLE transaction which is not READ
> ONLY which has not yet reached completion (COMMIT or ROLLBACK) the
> correct ordering remains in doubt -- there is no 

Re: [HACKERS] ExecGather() + nworkers

2016-03-07 Thread Amit Kapila
On Mon, Jan 11, 2016 at 3:14 AM, Peter Geoghegan  wrote:
>
> On Sun, Jan 10, 2016 at 9:13 AM, Robert Haas wrote:
>
> Now, you might wonder why it is that the leader cannot also sort runs,
> just as a worker would. It's possible, but it isn't exactly
> straightforward. You have to have special cases in several places,
> even though it probably is going to be uncommon to only have one
> BackgroundWorkerSlot available in practice. It's simpler to just
> opt-out, and seems better given that max_parallel_degree is a way of
> resource limiting based on available cores (it's certainly not about
> the availability of shared memory for the BackgroundWorkerSlot array).
>
> More importantly, I have other, entirely general concerns. Other major
> RDBMSs have settings that are very similar to max_parallel_degree,
> with a setting of 1 effectively disabling all parallelism. Both Oracle
> and SQL Server have settings that they both call the "maximum degree
> or parallelism". I think it's a bit odd that with Postgres,
> max_parallel_degree = 1 can still use parallelism at all. I have to
> wonder: are we conflating controlling the resources used by parallel
> operations with how shared memory is doled out?
>

Your point is genuine, but OTOH, if max_parallel_degree = 1 meant that
parallelism is disabled, then setting max_parallel_degree = 2 would mean
that only one worker process can be used for a parallel query, which looks
somewhat odd to me.  Also, I think a parallel query would then be able to
get parallel workers up to max_worker_processes + 1, which seems less
intuitive than the current behaviour.

On your point about other databases, I have also checked, and it seems that
some other databases like Sybase [1] provide a similar parameter where a
value of 1 means serial execution, so we might want to treat it similarly.
To me the current setting sounds quite intuitive, but if more people feel
the same as you, then we should change it.

[1] -
http://infocenter.sybase.com/archive/index.jsp?topic=/com.sybase.help.ase_15.0.sag1/html/sag1/sag1234.htm


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-07 Thread Tomas Vondra
Hi,


On Mon, 2016-03-07 at 12:17 +0100, Shulgin, Oleksandr wrote:
> On Fri, Mar 4, 2016 at 7:27 PM, Robert Haas 
> wrote:
> On Thu, Mar 3, 2016 at 2:48 AM, Shulgin, Oleksandr
>  wrote:
> > On Wed, Mar 2, 2016 at 7:33 PM, Alvaro Herrera
> 
> > wrote:
> >> Shulgin, Oleksandr wrote:
> >>
> >> > Alright.  I'm attaching the latest version of this patch
> split in two
> >> > parts: the first one is NULLs-related bugfix and the
> second is the
> >> > "improvement" part, which applies on top of the first
> one.
> >>
> >> So is this null-related bugfix supposed to be backpatched?
> (I assume
> >> it's not because it's very likely to change existing
> plans).
> >
> > For the good, because cardinality estimations will be more
> accurate in these
> > cases, so yes I would expect it to be back-patchable.
> 
> -1.  I think the cost of changing existing query plans in back
> branches is too high.  The people who get a better plan never
> thank
> us, but the people who (by bad luck) get a worse plan always
> complain.
> 
> 
> They might get that different plan when they upgrade to the latest
> major version anyway.  Is it set somewhere that minor version upgrades
> should never affect the planner?  I doubt so.

Major versions are supposed to add features, which may easily result in
plan changes. Moreover people are expected to do more thorough testing
on major version upgrade, so they're more likely to spot them.

OTOH minor versions are bugfix-only releases, and sometimes the fixes are
security related and people are supposed to install them ASAP. So many
people simply upgrade them without much additional testing and while we
can't promise any of the fixes won't change the plans, we kinda try to
minimize such cases.

That being said, I don't have a clear opinion whether to backpatch this.
I think that it's clearly a bug (especially the first part dealing with
NULL values), and it'd be good to backpatch that. OTOH I can't really
quantify the risks of changing some plans to worse ones.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

