Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Wed, Sep 24, 2014 at 3:18 PM, Dilip kumar wrote:
> On 24 August 2014 11:33, Amit Kapila Wrote
> > > 7. I think in the new mechanism the cancel handler will not work.
> > > In single connection vacuum it was always set/reset
> > > in function executeMaintenanceCommand(). You might need
> > > to set/reset it in function run_parallel_vacuum().
> >
> > Good catch. Now I have called SetCancelConn(pSlot[0].connection) on the first connection. This will enable the cancel handler to cancel the query on the first connection so that the select loop will come out.

I don't think this can handle cancel requests properly, because you are just setting it in GetIdleSlot(). What if the cancel request comes during GetQueryResult(), after the SQL has been sent on all connections? (That is probably the reason why Jeff is not able to cancel vacuumdb when using the parallel option.)

> > > 1. I could see one shortcoming in the way the patch has currently parallelized the
> > > work for --analyze-in-stages. Basically the patch is performing the work for each stage
> > > for multiple tables in concurrent connections. That seems okay for the cases when the
> > > number of parallel connections is less than or equal to the number of tables, but for
> > > the case when the user has asked for more connections than there are tables,
> > > I think this strategy will not be able to use the extra connections.
> >
> > I think --analyze-in-stages should maintain the order.

Yes, you are right. So let's keep the code as it is for this case.

> > > 2. Similarly, for the case of multiple databases, currently it will not be able
> > > to use more connections than the number of tables in each database, because the
> > > parallelizing strategy is to just use the concurrent connections for
> > > tables inside a single database.
> >
> > I think for multiple databases, to do the parallel execution we need to maintain multiple connections to multiple databases, and we need to maintain a table list for all the databases together to run them concurrently. I think this may impact the startup cost, as we need to create multiple connections and disconnect while preparing the list,

Yeah, probably the startup cost will be a bit higher, but that cost we will anyway incur during the overall operation.

> > and i think it will increase the complexity also.

I understand that the complexity of the code might increase, and the strategy to parallelize can also be different in case we want to parallelize for the all-databases case, so let's leave it as it is unless someone else raises a voice for the same.

Today, while again thinking about the strategy used in the patch to parallelize the operation (vacuum database), I think we can improve it for cases when the number of connections is smaller than the number of tables in the database (which I presume will normally be the case). Currently we are sending a command to vacuum one table per connection; how about sending multiple commands (for example, VACUUM t1; VACUUM t2) on one connection? It seems to me there is an extra round trip for cases when there are many small tables in the database and a few large tables. Do you think we should optimize for any such cases?
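To make the round-trip point concrete, here is a minimal libpq sketch of the batching idea (send_vacuum_batch and the table names t1/t2 are illustrative only, not code from the patch); the simple query protocol executes a multi-statement string in one round trip and hands back one PGresult per statement:

#include <stdbool.h>
#include <stdio.h>
#include "libpq-fe.h"

/* Illustrative only: t1/t2 stand in for tables pulled from the work queue. */
static bool
send_vacuum_batch(PGconn *conn)
{
	PGresult   *res;

	/* One network round trip for both commands. */
	if (!PQsendQuery(conn, "VACUUM t1; VACUUM t2;"))
		return false;

	/* Later, once the socket is readable: one result per statement. */
	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) != PGRES_COMMAND_OK)
			fprintf(stderr, "%s", PQerrorMessage(conn));
		PQclear(res);
	}
	return true;
}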
A few other points:

1.
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
{
..
+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;

a. Does the above memory get freed anywhere? If not, isn't it a good idea to do so?
b. For slot 0, you are not setting it with PQsetnonblocking, whereas I think it can be used to run commands like any other connection.

2.
+ /*
+  * If user has given the vacuum of complete db, then if
+  * any of the object vacuum failed it can be ignored and vacuuming
+  * of other object can be continued, this is the same behavior as
+  * vacuuming of complete db is handled without --jobs option
+  */

s/object/object's

3.
+ if(!completedb ||
+	(sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0))
+ {
+
+	fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+		progname, dbname, PQerrorMessage (conn));

The indentation in both places is wrong. Check other places for similar issues.

4.
+ bool analyze_only, bool freeze, int numAsyncCons,

In the code there are still references to AsyncCons; as decided, let's change it to concurrent_connections | conc_cons.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] KNN-GiST with recheck
On Fri, Sep 26, 2014 at 5:18 AM, Bruce Momjian wrote:
> On Sun, Sep 14, 2014 at 11:34:26PM +0400, Alexander Korotkov wrote:
> > > Cost estimation of GiST is a big problem anyway. It doesn't care (and
> > > can't) about the amount of recheck for regular operators. In this patch, the same
> > > would be for knn recheck. The problem is that touching heap from access
>
> This is very important work. While our existing KNN-GiST index code
> works fine for scalar values and point-to-point distance ordering, it
> doesn't work well for 2-dimensional objects because they are only
> indexed by their bounding boxes (a rectangle around the object). The
> indexed bounding box can't produce accurate distances to other objects.
>
> As an example, see this PostGIS blog post showing how to use LIMIT in a
> CTE to filter results and then compute the closest object (search for
> "LIMIT 50"):
>
> http://shisaa.jp/postset/postgis-postgresqls-spatial-partner-part-3.html
>
> This patch fixes our code for distances from a point to indexed 2-D
> objects.
>
> Does this also fix the identical PostGIS problem or is there something
> PostGIS needs to do?

This patch provides general infrastructure for recheck in KNN-GiST. PostGIS needs a corresponding change in its GiST opclass. Since PostGIS already defines the <-> and <#> operators as distance to the bounding box border and the bounding box center, it can't change their behaviour; it has to support a new "exact distance" operator in the opclass.

--
With best regards,
Alexander Korotkov.
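For what it's worth, under this infrastructure the opclass-side work would roughly amount to a distance support function that returns a bounding-box lower bound and sets the new recheck flag. A hedged sketch follows (the argument layout is assumed from the patch, and box_lower_bound_distance is a made-up placeholder for the real bounding-box math):

Datum
gist_geometry_distance(PG_FUNCTION_ARGS)
{
	GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
	Datum		query = PG_GETARG_DATUM(1);
	bool	   *recheck = (bool *) PG_GETARG_POINTER(4);

	/*
	 * Index pages only store bounding boxes, so the best we can return
	 * is a lower bound on the true distance.  Setting *recheck asks the
	 * executor to recompute the exact distance from the heap tuple and
	 * reorder the results accordingly.
	 */
	*recheck = true;

	PG_RETURN_FLOAT8(box_lower_bound_distance(entry, query));
}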
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/25/2014 08:10 PM, Tom Lane wrote:
> I wrote:
>> The "offsets-and-lengths" patch seems like the approach we ought to
>> compare to my patch, but it looks pretty unfinished to me: AFAICS it
>> includes logic to understand offsets sprinkled into a mostly-lengths
>> array, but no logic that would actually *store* any such offsets,
>> which means it's going to act just like my patch for performance
>> purposes.
>
>> In the interests of pushing this forward, I will work today on
>> trying to finish and review Heikki's offsets-and-lengths patch
>> so that we have something we can do performance testing on.
>> I doubt that the performance testing will tell us anything we
>> don't expect, but we should do it anyway.
>
> I've now done that, and attached is what I think would be a committable
> version. Having done this work, I no longer think that this approach
> is significantly messier code-wise than the all-lengths version, and
> it does have the merit of not degrading on very large objects/arrays.
> So at the moment I'm leaning to this solution, not the all-lengths one.
>
> To get a sense of the compression effects of varying the stride distance,
> I repeated the compression measurements I'd done on 14 August with Pavel's
> geometry data (<24077.1408052...@sss.pgh.pa.us>). The upshot of that was
>
>                                          min     max     avg
>
> external text representation             220  172685   880.3
> JSON representation (compressed text)    224   78565   541.3
> pg_column_size, JSONB HEAD repr.         225   82540   639.0
> pg_column_size, all-lengths repr.        225   66794   531.1
>
> Here's what I get with this patch and different stride distances:
>
> JB_OFFSET_STRIDE = 8                     225   68551   559.7
> JB_OFFSET_STRIDE = 16                    225   67601   552.3
> JB_OFFSET_STRIDE = 32                    225   67120   547.4
> JB_OFFSET_STRIDE = 64                    225   66886   546.9
> JB_OFFSET_STRIDE = 128                   225   66879   546.9
> JB_OFFSET_STRIDE = 256                   225   66846   546.8
>
> So at least for that test data, 32 seems like the sweet spot.
> We are giving up a couple percent of space in comparison to the
> all-lengths version, but this is probably an acceptable tradeoff
> for not degrading on very large arrays.
>
> I've not done any speed testing.

I'll do some tomorrow. I should have some different DBs to test on, too.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] jsonb format is pessimal for toast compression
I wrote:
> The "offsets-and-lengths" patch seems like the approach we ought to
> compare to my patch, but it looks pretty unfinished to me: AFAICS it
> includes logic to understand offsets sprinkled into a mostly-lengths
> array, but no logic that would actually *store* any such offsets,
> which means it's going to act just like my patch for performance
> purposes.

> In the interests of pushing this forward, I will work today on
> trying to finish and review Heikki's offsets-and-lengths patch
> so that we have something we can do performance testing on.
> I doubt that the performance testing will tell us anything we
> don't expect, but we should do it anyway.

I've now done that, and attached is what I think would be a committable version. Having done this work, I no longer think that this approach is significantly messier code-wise than the all-lengths version, and it does have the merit of not degrading on very large objects/arrays. So at the moment I'm leaning to this solution, not the all-lengths one.

To get a sense of the compression effects of varying the stride distance, I repeated the compression measurements I'd done on 14 August with Pavel's geometry data (<24077.1408052...@sss.pgh.pa.us>). The upshot of that was

                                         min     max     avg

external text representation             220  172685   880.3
JSON representation (compressed text)    224   78565   541.3
pg_column_size, JSONB HEAD repr.         225   82540   639.0
pg_column_size, all-lengths repr.        225   66794   531.1

Here's what I get with this patch and different stride distances:

JB_OFFSET_STRIDE = 8                     225   68551   559.7
JB_OFFSET_STRIDE = 16                    225   67601   552.3
JB_OFFSET_STRIDE = 32                    225   67120   547.4
JB_OFFSET_STRIDE = 64                    225   66886   546.9
JB_OFFSET_STRIDE = 128                   225   66879   546.9
JB_OFFSET_STRIDE = 256                   225   66846   546.8

So at least for that test data, 32 seems like the sweet spot. We are giving up a couple percent of space in comparison to the all-lengths version, but this is probably an acceptable tradeoff for not degrading on very large arrays.

I've not done any speed testing.

			regards, tom lane

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 2fd87fc..9beebb3 100644
*** a/src/backend/utils/adt/jsonb.c
--- b/src/backend/utils/adt/jsonb.c
*** jsonb_from_cstring(char *json, int len)
*** 196,207 ****
  static size_t
  checkStringLen(size_t len)
  {
! 	if (len > JENTRY_POSMASK)
  		ereport(ERROR,
  				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
  				 errmsg("string too long to represent as jsonb string"),
  				 errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
! 						   JENTRY_POSMASK)));
  	return len;
  }
--- 196,207 ----
  static size_t
  checkStringLen(size_t len)
  {
! 	if (len > JENTRY_OFFLENMASK)
  		ereport(ERROR,
  				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
  				 errmsg("string too long to represent as jsonb string"),
  				 errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
! 						   JENTRY_OFFLENMASK)));
  	return len;
  }
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..f157df3 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
***
*** 26,40 ****
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_POSMASK,
!  * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
! static void fillJsonbValue(JEntry *array, int index,
! 			   char *base_addr, JsonbValue *result);
  static bool	equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int	compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header,
  				  JsonbValue *val, int level);
--- 26,41 ----
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (The total size of an array's or object's elements is also limited by
!  * JENTRY_OFFLENMASK, but we're not concerned about that here.)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
!
! static void fillJsonbValue(JsonbContainer *container, int index,
! 			   char *base_addr, uint32 offset,
! 			   JsonbValue *result);
  static bool	equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
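To spell out the scheme being measured above: every JB_OFFSET_STRIDE'th JEntry stores an offset from the start of the container's data area, while all other entries store lengths, which keeps the entry array compressible; random access then walks back at most a stride's worth of entries. A sketch of the lookup, modeled on the patch (macro names as in the patched jsonb.h; illustrative, not the committed code):

static uint32
get_element_offset(const JEntry *children, int index)
{
	uint32		offset = 0;
	int			i;

	/*
	 * Sum lengths backwards until we hit an entry that stores an offset
	 * rather than a length -- at most JB_OFFSET_STRIDE - 1 iterations.
	 */
	for (i = index - 1; i >= 0; i--)
	{
		offset += JBE_OFFLENFLD(children[i]);
		if (JBE_HAS_OFF(children[i]))
			break;
	}
	return offset;
}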
Re: [HACKERS] Replication identifiers, take 3
Thanks for this write-up.

On Tue, Sep 23, 2014 at 2:24 PM, Andres Freund wrote:
> 1) The ability to monitor how far replication has progressed in a
>    crash-safe manner, to allow it to restart at the right point after
>    errors/crashes.
> 2) Efficiently identify the origin of individual changes and
>    transactions. In multimaster and some cascading scenarios it is
>    necessary to do so to avoid sending out replayed changes again.
> 3) Allow to efficiently filter out replayed changes from logical
>    decoding. It's currently possible to filter changes from inside the
>    output plugin, but it's much more efficient to filter them out
>    before decoding.

I agree with these goals. Let me try to summarize the information requirements for each of these things. For #1, you need to know, after crash recovery, for each standby, the last commit LSN which the client has confirmed via a feedback message. For #2, you need to know, when decoding each change, what the origin node was. And for #3, you need to know, when decoding each change, whether it is of local origin. The information requirements for #3 are a subset of those for #2.

> A rather efficient solution for 1) is to attach the 'origin node' and
> the remote commit LSN to every local commit record produced by
> replay. That allows to have a shared memory "table" (remote_node,
> local_lsn, remote_lsn).

This seems OK to me, modulo some arguing about what the origin node information ought to look like. People who are not using logical replication can use the compact form of the commit record in most cases, and people who are using logical replication can pay for it.
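For concreteness, the shared memory "table" described above could be little more than an array of entries like the following, rebuilt during crash recovery from the (origin, remote commit LSN) payload attached to replayed commit records. This is a hedged sketch; the names are illustrative, not from the patch:

typedef struct ReplicationProgress
{
	uint16		origin_id;		/* short, node-local origin identifier */
	XLogRecPtr	remote_lsn;		/* commit LSN on the origin node */
	XLogRecPtr	local_lsn;		/* matching local commit LSN, so the
								 * entry can be recovered crash-safely */
} ReplicationProgress;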
> Similarly, to solve the problem of identifying the origin of changes
> during decoding, the problem can be solved nicely by adding the origin
> node of every change to changes/transactions. At first it might seem
> to be sufficient to do so on transaction level, but for cascading
> scenarios it's very useful to be able to have changes from different
> source transactions combined into a larger one.

I think this is a lot more problematic. I agree that having the data in the commit record isn't sufficient here, because for filtering purposes (for example) you really want to identify the problematic transactions at the beginning, so you can chuck their WAL, rather than reassembling the transaction first and then throwing it out. But putting the origin ID in every insert/update/delete is pretty unappealing from my point of view - not just because it adds bytes to WAL, though that's a non-trivial concern, but also because it adds complexity - IMHO, a non-trivial amount of complexity. I'd be a lot happier with a solution where, say, we have a separate WAL record that says "XID 1234 will be decoding for origin 567 until further notice".

> == Replication Identifiers ==
>
> The above explains the need to have as small as possible identifiers
> for nodes. Two years back I'd suggested that we rely on the user to
> manually assign 16bit ids to individual nodes. Not very surprisingly
> that was shot down because a) 16bit numbers are not descriptive b) a
> per node identifier is problematic because it prohibits replication
> inside the same cluster.
>
> So, what I've proposed since is to have two different forms of
> identifiers. A long one, that's as descriptive as
> $replication_solution wants. And a small one (16bit in my case) that's
> *only meaningful within one node*. The long, user facing, identifier
> is locally mapped to the short one.
>
> In the first version I proposed these long identifiers had a fixed
> form, including the system identifier, timeline id, database id, and a
> freeform name. That wasn't well received and I agree that that's too
> narrow. I think it should be a freeform text of arbitrary length.
>
> Note that it depends on the replication solution whether these
> external identifiers need to be coordinated across systems or not. I
> think it's *good* if we don't propose a solution for that - different
> replication solutions will have different requirements.

I'm pretty fuzzy on how this actually works. Like, the short form here is just getting injected into WAL by the apply process. How does it figure out what value to inject? What if it injects a value that doesn't have a short-to-long mapping? What's the point of the short-to-long mappings in the first place? Is that only required because of the possibility that there might be multiple replication solutions in play on the same node?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Add CREATE support to event triggers
On Fri, Sep 26, 2014 at 6:59 AM, Alvaro Herrera wrote:
> Actually here's a different split of these patches, which I hope makes
> more sense. My intention here is that patches 0001 to 0004 are simple
> changes that can be pushed right away; they are not directly related to
> the return-creation-command feature. Patches 0005 to 0027 implement
> that feature incrementally. You can see in patch 0005 the DDL commands
> that are still not implemented in deparse (they are the ones that have
> an elog(ERROR) rather than a "command = NULL"). Patch 0006 adds calls
> in ProcessUtilitySlow() to each command, so that the object(s) being
> touched are added to the event trigger command stash.
>
> Patches from 0007 to 0027 (excepting patch 0017) implement one or a
> small number of commands in deparse. Patch 0017 is necessary
> infrastructure in ALTER TABLE to support deparsing that one.
>
> My intention with the later patches is that they would all be pushed as
> a single commit, i.e. the deparse support would be implemented for all
> commands in one fell swoop rather than piecemeal -- except possibly patch
> 0017 (the ALTER TABLE infrastructure). I split them up only for ease of
> review. Of course, before pushing we (I) need to implement deparsing
> for all the remaining commands.

Thanks for the update. This stuff is still on my TODO list, and I was planning to look at it to some extent today, btw :) Andres has already sent some comments (20140925235724.gh16...@awork2.anarazel.de), so I'll try not to overlap with what has already been mentioned.

Patch 1:
I still like this patch, as it gives a clear separation of the built-in functions and the sub-functions of ruleutils.c that are completely independent. Have you considered adding the external declaration of quote_all_identifiers as well? It is true that this impacts extensions (some of my stuff as well), but my point is to bite the bullet and make the separation cleaner between builtins.h and ruleutils.h. Attached is a patch that can be applied on top of patch 1 doing so... Feel free to discard it for the potential breakage it would create, though.

Patch 2:
1) The declarations of renameatt are inconsistent:
@tablecmds.c:
-renameatt(RenameStmt *stmt)
+renameatt(RenameStmt *stmt, int32 *objsubid)
@tablecmds.h:
-extern Oid renameatt(RenameStmt *stmt);
+extern Oid renameatt(RenameStmt *stmt, int *attnum);
Looking at this code, for correctness renameatt should use "int *attnum" everywhere, and ExecRenameStmt should use "int32 *subobjid".
2) Just a smallish, picky formatting remark: I would have done this on a single line:
+ attnum =
+ renameatt_internal(relid,
3) In ExecRenameStmt, I think that you should set subobjid for aggregates, collations, fdws, etc. There is an access to ObjectAddress, so that's easy to set using address.objectSubId.
4) This comment is misleading, as for an attribute what is returned is actually an attribute number:
+ * Return value is the OID of the renamed object. The objectSubId, if any,
+ * is returned in objsubid.
   */
5) Perhaps it is worth mentioning at the top of renameatt that the attribute number is returned as well?
Patch 3:
Looks fine; a bit of testing shows that this works as expected:

=# CREATE ROLE foo;
CREATE ROLE
=# CREATE TABLE aa (a int);
CREATE TABLE
=# CREATE OR REPLACE FUNCTION abort_grant() RETURNS event_trigger AS $$
BEGIN
  RAISE NOTICE 'Event: %', tg_event;
  RAISE EXCEPTION 'Execution of command % forbidden', tg_tag;
END; $$ LANGUAGE plpgsql;
CREATE FUNCTION
=# CREATE EVENT TRIGGER abort_grant_trigger ON ddl_command_start WHEN TAG IN ('GRANT', 'REVOKE') EXECUTE PROCEDURE abort_grant();
CREATE EVENT TRIGGER
=# GRANT SELECT ON aa TO foo;
NOTICE: 0: Event: ddl_command_start
LOCATION: exec_stmt_raise, pl_exec.c:3068
ERROR: P0001: Execution of command GRANT forbidden
LOCATION: exec_stmt_raise, pl_exec.c:3068
=# DROP EVENT TRIGGER abort_grant_trigger;
DROP EVENT TRIGGER
=# CREATE EVENT TRIGGER abort_grant_trigger ON ddl_command_end WHEN TAG IN ('GRANT', 'REVOKE') EXECUTE PROCEDURE abort_grant();
CREATE EVENT TRIGGER
=# REVOKE SELECT ON aa FROM foo;
NOTICE: 0: Event: ddl_command_end
LOCATION: exec_stmt_raise, pl_exec.c:3068
ERROR: P0001: Execution of command REVOKE forbidden
LOCATION: exec_stmt_raise, pl_exec.c:3068

Patch 4:
Shouldn't int32 be used instead of uint32 in the declaration of CommentObject? And yes, adding support for only ddl_command_start and ddl_command_end is enough. Let's not play with dropped objects in this area... Similarly to the test above:

=# comment on table aa is 'test';
NOTICE: 0: Event: ddl_command_start
LOCATION: exec_stmt_raise, pl_exec.c:3068
ERROR: P0001: Execution of command COMMENT forbidden
LOCATION: exec_stmt_raise, pl_exec.c:3068

Regards,
--
Michael

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index d77b3ee..fc8dc80 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -57,6 +57,7 @@
 #include "utils/l
Re: [HACKERS] KNN-GiST with recheck
On Sun, Sep 14, 2014 at 11:34:26PM +0400, Alexander Korotkov wrote:
> > Cost estimation of GiST is a big problem anyway. It doesn't care (and
> > can't) about the amount of recheck for regular operators. In this patch, the same
> > would be for knn recheck. The problem is that touching heap from access

This is very important work. While our existing KNN-GiST index code works fine for scalar values and point-to-point distance ordering, it doesn't work well for 2-dimensional objects because they are only indexed by their bounding boxes (a rectangle around the object). The indexed bounding box can't produce accurate distances to other objects.

As an example, see this PostGIS blog post showing how to use LIMIT in a CTE to filter results and then compute the closest object (search for "LIMIT 50"):

http://shisaa.jp/postset/postgis-postgresqls-spatial-partner-part-3.html

This patch fixes our code for distances from a point to indexed 2-D objects.

Does this also fix the identical PostGIS problem or is there something PostGIS needs to do?

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] better atomics - v0.6
On 2014-09-25 21:02:28 -0400, Robert Haas wrote:
> I feel like this could really use a developer README: what primitives
> do we have, which ones are likely to be efficient or inefficient on
> which platforms, how do atomics interact with barriers, etc.

atomics.h has most of that, including documentation of which barrier semantics the individual operations have. One change I can see as being a good idea is to move/transform the definition of acquire/release barriers from s_lock.h to README.barrier.

It doesn't document which operations are efficient on which platform, but I wouldn't know what to say about that topic - the current set of operations should be fast on pretty much all platforms.

What information would you like to see? An example of how to use atomics?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
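Since "an example of how to use atomics" seems to be exactly the open question, here is a minimal usage sketch against the API in the patch (names as in atomics.h; the surrounding function is invented purely for illustration):

#include "port/atomics.h"

static pg_atomic_uint32 counter;

void
atomics_usage_example(void)
{
	uint32		expected;

	pg_atomic_init_u32(&counter, 0);

	/* Atomic read-modify-write; acts as a full memory barrier. */
	pg_atomic_fetch_add_u32(&counter, 1);

	/*
	 * Compare-and-swap loop: on failure, 'expected' is overwritten with
	 * the value actually observed, so we simply retry with it.
	 */
	expected = pg_atomic_read_u32(&counter);
	while (!pg_atomic_compare_exchange_u32(&counter, &expected, expected * 2))
		;
}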
Re: [HACKERS] better atomics - v0.6
On Thu, Sep 25, 2014 at 6:03 PM, Andres Freund wrote:
> On 2014-09-24 20:27:39 +0200, Andres Freund wrote:
>> On 2014-09-24 21:19:06 +0300, Heikki Linnakangas wrote:
>> I won't repost a version with it removed, as removing a function as the
>> only change doesn't seem to warrant reposting it.
>
> I've fixed (thanks Alvaro!) some minor additional issues besides the
> removal and addressing earlier comments from Heikki:
> * removal of two remaining non-ascii copyright signs. The only non-ascii
>   name character that remains is Alvaro's name in the commit message.
> * additional comments for STATIC_IF_INLINE/STATIC_IF_INLINE_DECLARE
> * there was a leftover HAVE_GCC_INT_ATOMICS reference - it's now split
>   into different configure tests and thus has a different name. A later
>   patch entirely removes that reference, which is why I'd missed that...
>
> Heikki has marked the patch as 'ready for committer' in the commitfest
> (conditional on his remarks being addressed) and I agree. There seems to
> be little benefit in waiting further. There *definitely* will be some
> platform-dependent issues, but that won't change by waiting longer.
>
> I plan to commit this quite soon unless somebody protests really
> quickly.
>
> Sorting out the issues on platforms I don't have access to based on
> buildfarm feedback will take a while given how infrequently some of the
> respective animals run... But that's just life.

I feel like this could really use a developer README: what primitives do we have, which ones are likely to be efficient or inefficient on which platforms, how do atomics interact with barriers, etc.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] delta relations in AFTER triggers
On 09/25/2014 11:54 PM, Kevin Grittner wrote:
> I have fixed the bug reported by Heikki; be sure to grab that.

Will do.

> I have been merging in changes to master as I go, so that bit rot
> doesn't accumulate, but I don't squash or rebase; hopefully that
> style works for you.

IMO it only really matters before the final push to master; before then it's all just a matter of how you prefer to work.

I'm a big fan of rebasing my feature branches as I go:

    git tag before-rebase
    git pull --rebase
    ... do any merges during rebase ...
    git tag -d before-rebase

For bug fixes I tend to commit them separately, then when I rebase I squash them into the relevant patch. Git's "fixup!" commits are really handy for this; if I have a commit:

    Add widget support

    Everyone wants more widgets.

and want to fix an issue in that commit, I can just commit

    fixup! Add widget support

    It's spelled widget not wodget

and when I "git rebase --autosquash master" they get automatically squashed into the relevant changeset. (I usually run with the config rebase.autosquash enabled, so this happens during my rebase pulls on top of master.)

I got into the habit while working on RLS, to keep me sane with that patchset, and find it works well for me. However, everyone has a different work style. Colour me happy if it's in git at all.

--
Craig Ringer                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Add CREATE support to event triggers
On 2014-09-25 18:59:31 -0300, Alvaro Herrera wrote:
> /* tid.c */
> diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h
> new file mode 100644
> index 000..520b066
> --- /dev/null
> +++ b/src/include/utils/ruleutils.h
> @@ -0,0 +1,34 @@
> +/*-
> + *
> + * ruleutils.h
> + *	  Declarations for ruleutils.c
> + *
> + * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
> + *
> + * src/include/ruleutils.h
> + *
> + *-
> + */
> +#ifndef RULEUTILS_H
> +#define RULEUTILS_H
> +
> +#include "nodes/nodes.h"
> +#include "nodes/parsenodes.h"
> +#include "nodes/pg_list.h"
> +
> +
> +extern char *pg_get_indexdef_string(Oid indexrelid);
> +extern char *pg_get_indexdef_columns(Oid indexrelid, bool pretty);
> +
> +extern char *pg_get_constraintdef_string(Oid constraintId);
> +extern char *deparse_expression(Node *expr, List *dpcontext,
> +				bool forceprefix, bool showimplicit);
> +extern List *deparse_context_for(const char *aliasname, Oid relid);
> +extern List *deparse_context_for_planstate(Node *planstate, List *ancestors,
> +				List *rtable, List *rtable_names);
> +extern List *select_rtable_names_for_explain(List *rtable,
> +				Bitmapset *rels_used);
> +extern char *generate_collation_name(Oid collid);
> +
> +#endif	/* RULEUTILS_H */
> --
> 1.9.1

I wondered for a minute whether any of these are likely to cause problems for code just including builtins.h - but I think there will be sufficiently few callers for them to make that not much of a concern.

> From 5e3555058a1bbb93e25a3a104c26e6b96dce994d Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera
> Date: Thu, 25 Sep 2014 16:34:50 -0300
> Subject: [PATCH 02/27] deparse/core: have RENAME return attribute number

Looks "harmless". I.e. +1.

> From 72c4d0ef875f9e9b0f3b424499738fd1bd946c32 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera
> Date: Fri, 9 May 2014 18:32:23 -0400
> Subject: [PATCH 03/27] deparse/core: event triggers support GRANT/REVOKE

Hm. The docs don't mention whether these only work for database-local objects or also shared objects. Generally event triggers are only triggered for the former... But the code doesn't seem to make a difference?

> From 5bb35d2e51fe6c6e219fe5cf7ddbab5423e6be1b Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera
> Date: Thu, 25 Sep 2014 15:44:44 -0300
> Subject: [PATCH 04/27] deparse/core: event triggers support COMMENT
>
> ---
>  doc/src/sgml/event-trigger.sgml      |  8 +++-
>  src/backend/commands/comment.c       |  5 -
>  src/backend/commands/event_trigger.c |  1 +
>  src/backend/tcop/utility.c           | 21 +
>  src/include/commands/comment.h       |  2 +-
>  5 files changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml
> index 49b8384..39ecd94 100644
> --- a/doc/src/sgml/event-trigger.sgml
> +++ b/doc/src/sgml/event-trigger.sgml
> @@ -37,7 +37,7 @@
>
>     The ddl_command_start event occurs just before the
>     execution of a CREATE, ALTER, DROP,
> -   GRANT or REVOKE
> +   COMMENT, GRANT or REVOKE
>     command. No check whether the affected object exists or doesn't exist is
>     performed before the event trigger fires.
>     As an exception, however, this event does not occur for
> @@ -281,6 +281,12 @@
>     -
>
>
> +    COMMENT
> +    X
> +    X
> +    -
> +
> +

Hm. No drop support, because COMMENTs are odd and use odd NULL semantics. Fair enough.
> From 098f5acabd774004dc5d9c750d55e7c9afa60238 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera
> Date: Wed, 24 Sep 2014 15:53:04 -0300
> Subject: [PATCH 05/27] deparse: infrastructure needed for command deparsing
> ---
>  src/backend/catalog/objectaddress.c  | 115 +
>  src/backend/commands/event_trigger.c | 941 ++-
>  src/backend/tcop/Makefile            |   2 +-
>  src/backend/tcop/deparse_utility.c   | 877 
>  src/backend/tcop/utility.c           |   2 +
>  src/backend/utils/adt/format_type.c  | 113 -
>  src/include/catalog/objectaddress.h  |   2 +
>  src/include/catalog/pg_proc.h        |   4 +
>  src/include/commands/event_trigger.h |   3 +
>  src/include/commands/extension.h     |   2 +-
>  src/include/nodes/parsenodes.h       |   2 +
>  src/include/tcop/deparse_utility.h   |  60 +++
>  src/include/utils/builtins.h         |   5 +
>  13 files changed, 2117 insertions(+), 11 deletions(-)
>  create mode 100644 src/backend/tcop/deparse_utility.c
>  create mode 100644 src/include/tcop/deparse_utility.h
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 26/09/14 08:21, Simon Riggs wrote:
> On 25 September 2014 20:11, Robert Haas wrote:
> > > My approach would be to insert an index tuple for that value into the index, but with the leaf ituple marked with an xid rather than a ctid. If someone tries to insert into the index they would see this and wait for the inserting transaction to end. The inserting transaction would then resolve what happens in the heap (insert/update) and later repoint the index tuple to the inserted/updated row version. I don't see the need for page level locking since it would definitely result in deadlocks (e.g. SQLServer).
> >
> > I think that something like this might work, but the devil is in the details. Suppose two people try to upsert into the same table at the same time. There's one index. If the transactions search that index for conflicts first, neither sees any conflicting tuples, and both proceed. That's no good. OK, so suppose each transaction inserts the special index tuple which you mention, to lock out concurrent inserts of that value, and then searches for already-existing conflicts. Each sees the other's tuple, and they deadlock. That's no good, either.
>
> The test index is unique, so our to-be-inserted value exists on only one page, hence page locking applies while we insert it. The next person to insert waits for the page lock and then sees the test tuple. The page lock lasts only for the duration of the insertion of the ituple, not for the whole operation.
>
> > Also, I think there are other cases where we think we're going to insert, so we put the special index tuple in there, but then we decide to update, so we don't need the promise tuple any more, but other sessions are potentially still waiting for our XID to terminate even though there's no conflict any more. I'm having a hard time bringing the details of those cases to mind ATM, though.
>
> We make the decision to INSERT or UPDATE based upon what we find in the test index. If a value is there already, we assume it's an UPDATE and go to update the row this points to. If it has been deleted we loop back and try again/error. If the value is not present, we insert the test tuple and progress as an INSERT, then loop back later to set the ctid. There is no case of "don't need promise id anymore". We would use the PK, identity or first unique index as the test index. There is a case where an UPSERT conflicts with an INSERT, causing the latter to abort.
>
> Anyway, this is why we need the design more clearly exposed, so you can tell me I'm wrong by showing me the URL of it done right.

What happens if the new value(s) of the INSERT/UPDATE require the page to be split? I assume the mechanics of this are catered for, but how does it affect locking & potential deadlocks?

Cheers,
Gavin
Re: [HACKERS] better atomics - v0.6
On Thu, Sep 25, 2014 at 3:23 PM, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
>> Heikki has marked the patch as 'ready for committer' in the commitfest
>> (conditional on his remarks being addressed) and I agree. There seems to
>> be little benefit in waiting further. There *definitely* will be some
>> platform-dependent issues, but that won't change by waiting longer.
>
> Agreed. I won't claim to have done as good a review as others, but I've
> been following along as best I've been able to.

FWIW, I agree that you've already done plenty to ensure that this works well on common platforms. We cannot reasonably expect you to be just as sure that there are no problems on obscure platforms ahead of commit. It isn't unheard of for certain committers (that don't like Windows, say) to commit things despite having a suspicion that their patch will turn some fraction of the buildfarm red. To a certain extent, that's what we have it for.

--
Peter Geoghegan
Re: [HACKERS] better atomics - v0.6
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> Heikki has marked the patch as 'ready for committer' in the commitfest
> (conditional on his remarks being addressed) and I agree. There seems to
> be little benefit in waiting further. There *definitely* will be some
> platform-dependent issues, but that won't change by waiting longer.

Agreed. I won't claim to have done as good a review as others, but I've been following along as best I've been able to.

> I plan to commit this quite soon unless somebody protests really
> quickly.

+1 from me for at least moving forward to see what happens.

> Sorting out the issues on platforms I don't have access to based on
> buildfarm feedback will take a while given how infrequently some of the
> respective animals run... But that's just life.

The buildfarm is another tool (as well as Coverity) that provides a lot of insight into issues we may not realize ahead of time, but using them (currently) means we have to eventually go beyond looking at the code individually and actually commit it.

I'm very excited to see this progress, and trust you, Heikki, Robert, and the others involved in this work to have done an excellent job and to be ready and prepared to address any issues which come from it, or to pull it out if it turns out to be necessary (which I seriously doubt, but still).

Thanks!

Stephen
Re: [HACKERS] RLS feature has been committed
On Thu, Sep 25, 2014 at 3:07 PM, Stephen Frost wrote:
> A revert wasn't requested by the individual who raised the concern (or,
> indeed, explicitly by *anyone*.. it was hinted at, but I felt the
> individuals who were hinting at it were leaving it up to that individual
> who had a concern), and I discussed it with him extensively and we came
> to what I believe is an understanding. I'm not sure that I understand
> the need to bring it up again, unless you have a concern regarding what
> was committed.

FWIW, I think Heikki's proposal to revisit this in the next commitfest is fine. I don't think that this is something that needs to be pored over.

--
Peter Geoghegan
Re: [HACKERS] RLS feature has been committed
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 23 September 2014 07:45, Heikki Linnakangas wrote:
> > OTOH, if the patch is actually OK as it was committed, there's no point
> > reverting it only to commit it again later. At the end of the day, the
> > important thing is that the patch gets sufficient review. Clearly Stephen
> > thinks that it did, while you and Andres do not.
>
> I would observe that not requesting a revert would be inconsistent
> with all other situations I have seen or been involved with.

A revert wasn't requested by the individual who raised the concern (or, indeed, explicitly by *anyone*.. it was hinted at, but I felt the individuals who were hinting at it were leaving it up to that individual who had a concern), and I discussed it with him extensively and we came to what I believe is an understanding. I'm not sure that I understand the need to bring it up again, unless you have a concern regarding what was committed. I agree that I jumped the gun on it and said as much. On the flip side, it's now had quite a bit of review and there is a promise of more, which I feel is great for the project as a whole.

> My major reason to revert is the following: the documentation contains
> no examples of real world usage, making the feature essentially
> unusable out of the box.

I find this to be an interesting argument, considering most of our documentation doesn't include real-world examples. I'm happy to add more examples than those listed thus far, certainly, though I've understood documentation to be of less general importance than code quality and maintainability - and someone willing and committed to maintain it.

> I attended Stephen's talk at PostgresOpen and
> even that didn't contain real cases either, leaving me to ask
> questions about stuff I thought I knew. This would be sufficient for
> me to reject commit of any other patch, so it is sufficient reason
> here also. Yeb can supply a useful real world case if there is some
> restriction on explaining what this might be used for.

This wouldn't be the only case of documentation (indeed, *any* documentation) being added after a commit, and so I'm mystified by this requirement for *real-world* examples in documentation to be provided prior to commit.

> Stephen, I want this patch in 9.5 and I would very much like to see it
> go in before Oct 31. But please follow consensus, revert the patch
> now, address the minor issues as requested and then ask for re-commit
> later, as you should have done in the first place.

Simon, if you want me to revert it because of an objection over the design, code quality, maintainability, or utter lack of documentation, then I absolutely respect that request and would be happy to do so. This request I am completely lost on.

Thanks,

Stephen
Re: [HACKERS] better atomics - v0.6
On 2014-09-24 20:27:39 +0200, Andres Freund wrote:
> On 2014-09-24 21:19:06 +0300, Heikki Linnakangas wrote:
> I won't repost a version with it removed, as removing a function as the
> only change doesn't seem to warrant reposting it.

I've fixed (thanks Alvaro!) some minor additional issues besides the removal and addressing earlier comments from Heikki:
* removal of two remaining non-ascii copyright signs. The only non-ascii
  name character that remains is Alvaro's name in the commit message.
* additional comments for STATIC_IF_INLINE/STATIC_IF_INLINE_DECLARE
* there was a leftover HAVE_GCC_INT_ATOMICS reference - it's now split
  into different configure tests and thus has a different name. A later
  patch entirely removes that reference, which is why I'd missed that...

Heikki has marked the patch as 'ready for committer' in the commitfest (conditional on his remarks being addressed) and I agree. There seems to be little benefit in waiting further. There *definitely* will be some platform-dependent issues, but that won't change by waiting longer.

I plan to commit this quite soon unless somebody protests really quickly.

Sorting out the issues on platforms I don't have access to based on buildfarm feedback will take a while given how infrequently some of the respective animals run... But that's just life.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

>From b64d92f1a5602c55ee8b27a7ac474f03b7aee340 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Thu, 25 Sep 2014 23:49:05 +0200
Subject: [PATCH] Add a basic atomic ops API abstracting away
 platform/architecture details.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Several upcoming performance/scalability improvements require atomic operations. This new API avoids the need to splatter compiler- and architecture-dependent code over all the locations employing atomic ops.

For several of the potential usages it'd be problematic to maintain both an atomics-using implementation and one using spinlocks or similar. In all likelihood one of the implementations would not get tested regularly under concurrency. To avoid that scenario the new API provides an automatic fallback of atomic operations to spinlocks. All properties of atomic operations are maintained. This fallback - obviously - isn't as fast as just using atomic ops, but it's not bad either. For one of the future users the atomics-on-top-of-spinlocks implementation was actually slightly faster than the old purely spinlock-using implementation. That's important because it reduces the fear of regressing older platforms when improving the scalability for new ones.

The API, loosely modeled after the C11 atomics support, currently provides 'atomic flags' and 32 bit unsigned integers. If the platform efficiently supports atomic 64 bit unsigned integers those are also provided.

To implement atomics support for a platform/architecture/compiler for a type of atomics, 32bit compare and exchange needs to be implemented. If available and more efficient, native support for flags, 32 bit atomic addition, and corresponding 64 bit operations may also be provided. Additional useful atomic operations are implemented generically on top of these.

The implementations for various versions of gcc, msvc and sun studio have been tested. Additional existing stub implementations for
* Intel icc
* HUPX acc
* IBM xlc
are included but have never been tested.
These will likely require fixes based on buildfarm and user feedback.

As atomic operations also require barriers for some operations, the existing barrier support has been moved into the atomics code.

Author: Andres Freund with contributions from Oskari Saarenmaa
Reviewed-By: Amit Kapila, Robert Haas, Heikki Linnakangas and Álvaro Herrera
Discussion: CA+TgmoYBW+ux5-8Ja=Mcyuy8=vxanvrhp3kess6pn3dmxap...@mail.gmail.com,
    20131015123303.gh5...@awork2.anarazel.de,
    20131028205522.gi20...@awork2.anarazel.de
---
 config/c-compiler.m4                 |  98 +
 configure                            | 274 ++--
 configure.in                         |  34 +-
 src/backend/main/main.c              |   1 +
 src/backend/port/Makefile            |   2 +-
 src/backend/port/atomics.c           | 127 ++
 src/backend/storage/lmgr/spin.c      |   7 +-
 src/include/c.h                      |  21 +-
 src/include/pg_config.h.in           |  24 +-
 src/include/pg_config.h.win32        |   3 +
 src/include/pg_config_manual.h       |   8 +
 src/include/port/atomics.h           | 531 +++
 src/include/port/atomics/arch-arm.h  |  25 ++
 src/include/port/atomics/arch-hppa.h |  17 +
 src/include/port/atomics/arch-ia64.h |  26 ++
 src/include/port/atomics/ar
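To picture the spinlock fallback the commit message describes: each atomic variable carries its own spinlock, and every operation (compare-exchange shown here) takes it briefly, preserving both atomicity and the barrier semantics. This is a sketch modeled on the description above; the real code lives in src/backend/port/atomics.c and differs in detail:

/* Fallback representation: the atomic variable carries its own spinlock. */
typedef struct pg_atomic_uint32
{
	slock_t		sema;
	volatile uint32 value;
} pg_atomic_uint32;

bool
pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
									uint32 *expected, uint32 newval)
{
	bool		ret;

	/* The spinlock provides both the atomicity and the barrier. */
	SpinLockAcquire((slock_t *) &ptr->sema);
	if (ptr->value == *expected)
	{
		ptr->value = newval;
		ret = true;
	}
	else
	{
		*expected = ptr->value;
		ret = false;
	}
	SpinLockRelease((slock_t *) &ptr->sema);
	return ret;
}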
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Sep 25, 2014 at 2:37 PM, Simon Riggs wrote:
>> I'd hoped that the commit messages, and my discussion of the feature
>> were adequate.
>
> I'd hoped that my discussion was sufficient to persuade you too, but it
> wasn't.

I'll write user-visible docs soon, then.

--
Peter Geoghegan
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 25 September 2014 22:13, Peter Geoghegan wrote:
>> All of the above, as a separate committable patch. I hate the fact
>> that you have written no user facing documentation for this feature.
>> How can anyone tell whether the tests you've written are correct or
>> even consistent with a particular definition of correctness?
>
> I'd hoped that the commit messages, and my discussion of the feature
> were adequate.

I'd hoped that my discussion was sufficient to persuade you too, but it wasn't.

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] RLS feature has been committed
On 23 September 2014 07:45, Heikki Linnakangas wrote:
> On 09/23/2014 09:15 AM, Peter Geoghegan wrote:
>> On Mon, Sep 22, 2014 at 8:25 PM, Robert Haas wrote:
>>>> Should RLS be reverted, and revisited in a future CF?
>>>
>>> IMHO, that would be entirely appropriate.
>>
>> That seems pretty straightforward, then. I think that it will have to
>> be reverted.
>
> OTOH, if the patch is actually OK as it was committed, there's no point
> reverting it only to commit it again later. At the end of the day, the
> important thing is that the patch gets sufficient review. Clearly Stephen
> thinks that it did, while you and Andres do not.

I would observe that not requesting a revert would be inconsistent with all other situations I have seen or been involved with. Stephen has acted against explicit requests not to commit. I think Robert should apply more care about issuing lock instructions, since the concurrency of our community is important, but nonetheless, an explicit request was made and that should be honored, even if it does very quickly get re-committed.

My major reason to revert is the following: the documentation contains no examples of real-world usage, making the feature essentially unusable out of the box. I attended Stephen's talk at PostgresOpen and even that didn't contain real cases either, leaving me to ask questions about stuff I thought I knew. This would be sufficient for me to reject commit of any other patch, so it is sufficient reason here also. Yeb can supply a useful real-world case if there is some restriction on explaining what this might be used for.

Stephen, I want this patch in 9.5 and I would very much like to see it go in before Oct 31. But please follow consensus, revert the patch now, address the minor issues as requested and then ask for re-commit later, as you should have done in the first place.

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Sep 25, 2014 at 1:48 PM, Simon Riggs wrote:
> A. UPDATE/INSERT privilege infrastructure.
> Add tests to it, make it separately committable, so we can get that done.
> Submit to Oct CF; get that done early.

Makes sense. As long as we assume that we want a unified syntax like this - that is, that we need something vaguely insert-update or update-insertish - then we need this. Unfortunately, we cannot add regression tests for this without almost the full patch set.

> B. Agree command semantics by producing these things
> * Explanatory documentation (Ch 6.4 Data Manipulation - Upsert)
> * SQL Reference Documentation (INSERT)
> * Test cases for feature
> * Test cases for concurrency
> * Test cases for pgbench

Okay. I do have stress-tests, which are separately maintained, in case you missed that:

https://github.com/petergeoghegan/upsert

> All of the above, as a separate committable patch. I hate the fact
> that you have written no user facing documentation for this feature.
> How can anyone tell whether the tests you've written are correct or
> even consistent with a particular definition of correctness?

I'd hoped that the commit messages and my discussion of the feature were adequate.

> Submit as patch for review only to Oct 15 CF
> We then agree what is required for further work
> At this stage, poll the Django and Rails communities for acceptance
> and early warning of these features. Listen.

I know an original founder of the Django project quite well - Jacob Kaplan-Moss (a co-worker - the guy that keynoted pgOpen in its second year). He is very interested in this effort.

> C. Internal weirdness
> Submit C based upon earlier agreed B, submit to Dec 15 CF, major patch
> deadline, so we can fine tune for last CF.
> Then Heikki rewrites half your patch in a better way, you thank him
> and then we commit. All done.

I don't have a problem with Heikki or anyone else rewriting the value locking part of the patch, provided it meets my requirements for such a mechanism. Since Heikki already agreed that that standard should be imposed, he'd hardly take issue with it now.

However, the fact is that once you actually make something like promise tuples meet that standard, at the very least it becomes a lot messier than you'd think. Heikki's final prototype "super deleted" tuples by setting their xmin to InvalidTransactionId. We weren't sure that that doesn't break some random other heapam code. Consider this, for example:

https://github.com/postgres/postgres/blob/REL9_4_STABLE/src/backend/executor/execMain.c#L1961

So that looks safe in the face of setting xmin to InvalidTransactionId in the way the later prototype patch did, if you think about it for a while, but there are other places where that is less clear. In short, it becomes something that we have to worry about forever, because "xmin cannot change without the tuple in the slot changing" is clearly an invariant for certain purposes. It might accidentally fail to fail right now, but I'm not comfortable with it.

Now, I might be convinced that that's actually the way to go. I have an open mind. But that will take discussion. I like that page hwlocking is something that many systems do (even including Oracle, I believe). Making big changes to nbtree is always something that deserves to be met with skepticism, but it is nice to have an implementation that lives in the head of the AM. Sorry, I forgot to not talk about locking.

> But we're still discussing SQL semantics.
> So first things first, then loop back around, hoping our design has
> not been concurrently deleted...

I hope the discussion can avoid "unprincipled deadlocks".

--
Peter Geoghegan
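For concreteness, the "super deletion" in Heikki's later prototype boiled down to something like the following sketch (not committed code), which is exactly where the invariant worry above comes from - code holding a tuple in a slot may assume its xmin cannot change underneath it:

/*
 * Retract a promise tuple in place by clearing its xmin, so that
 * concurrent snapshots treat it as never having been inserted.
 * (Sketch of the prototype's approach only.)
 */
HeapTupleHeaderSetXmin(htup->t_data, InvalidTransactionId);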
Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()
On 2014-09-26 02:34:06 +0530, Abhijit Menon-Sen wrote:
> At 2014-09-25 22:41:18 +0200, and...@2ndquadrant.com wrote:
> > Unless I miss something this isn't sufficient. We need to fsync the
> > files in the data directory, not just the toplevel directory?
>
> No, of course you're right. So a separate function that does the moral
> equivalent of "find $PGDATA -exec fsync_fname …"?

Probably will require some care to deal correctly with tablespaces.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()
At 2014-09-25 22:41:18 +0200, and...@2ndquadrant.com wrote:
> * Also recovery shouldn't be regarded successful if the reset fails -
> * e.g. because of ENOSPC.

OK.

> * Doing this in a separate pass is advantageous for performance reasons
> * because it allows the kernel to perform all the flushes at once.

OK.

> Unless I miss something this isn't sufficient. We need to fsync the
> files in the data directory, not just the toplevel directory?

No, of course you're right. So a separate function that does the moral equivalent of "find $PGDATA -exec fsync_fname …"?

Will resubmit with the additional comments.

-- Abhijit
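Something like the following would be the C equivalent of that find invocation - a minimal sketch assuming POSIX dirent, PostgreSQL's MAXPGPATH, and the existing fsync_fname(name, isdir); as Andres notes, real code must also chase the tablespace symlinks under pg_tblspc and decide on error handling:

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

static void
fsync_recursively(const char *path)
{
	DIR		   *dir = opendir(path);
	struct dirent *de;

	if (dir == NULL)
		return;					/* sketch only; real code must report this */

	while ((de = readdir(dir)) != NULL)
	{
		char		subpath[MAXPGPATH];
		struct stat st;

		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		snprintf(subpath, sizeof(subpath), "%s/%s", path, de->d_name);
		if (lstat(subpath, &st) < 0)
			continue;
		if (S_ISDIR(st.st_mode))
			fsync_recursively(subpath);		/* recurse into subdirectories */
		else if (S_ISREG(st.st_mode))
			fsync_fname(subpath, false);	/* fsync each regular file */
	}
	closedir(dir);
	fsync_fname(path, true);	/* finally, fsync the directory itself */
}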
Re: [HACKERS] Immediate standby promotion
On 25 September 2014 18:30, Andres Freund wrote:
> On 2014-09-25 18:18:09 +0100, Simon Riggs wrote:
>> On 25 September 2014 16:29, Andres Freund wrote:
>> > I think that's not really related. Such a promotion doesn't cause data
>> > loss in the sense of losing data a *clueful* operator wanted to
>> > keep. Yes, it can be used wrongly, but it's far from alone in that.
>>
>> Yes it does cause data loss. The clueful operator has no idea where
>> they are, so there is no "used rightly" in that case.
>
> What? There definitely are cases where you don't need to know that to
> the T. Just think of the - quite frequently happening - need to promote
> a standby to run tests or reporting queries that can't be run on a
> standby.

What do they do with the standby afterwards?

Perhaps for testing, but I'd hope that Business Intelligence is done by freezing databases at known target times. So at least you can say, "using a database snapshot of 9am, we had the following results".

We seem to be trying to justify something that is dangerous and will destroy data for incautious users. Of course it has uses, but that's not the point; it's the danger that is the problem, not the lack of use. We go to a lot of trouble to avoid footguns elsewhere across many years, so I can't see why you'd want to have the --footgun option added here.

recovery-target = 'vague'

> Sure, you shouldn't use it if you expect a very specific set of the data
> being there, but that's not always necessary. And that's why it should
> never, ever be the default.

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Sep 25, 2014 at 1:21 PM, Simon Riggs wrote: > The test index is unique, so our to-be-inserted value exists on only > one page, hence page locking applies while we insert it. The next > person to insert waits for the page lock and then sees the test tuple. > > The page lock lasts only for the duration of the insertion of the > ituple, not for the whole operation. (by page lock, I take it you mean buffer lock - converting that into a page hwlock is what I do). This is where it gets quite complicated. What happens if row locking on upsert finds a conflicting update changing uniquely-constrained attributes? Sure, a vanilla non-HOT update will fail on inserting a unique index tuple, but *it* can still cause us a row-level conflict, and *it* can only fail (with a dup violation) when we commit/abort. But now we're obligated to wait on it to get the row lock, and it's obligated to wait on us to get the promise tuple lock, or any other sort of "value lock" that hasn't already been released when we go to row lock. Deadlock. You cannot get away with failing to release the promise tuple/value lock if you want to maintain useful guarantees. It doesn't need to be a vanilla non-HOT update. That's just the simplest example I can think of. >> Also, I think there are other cases where we think we're going to >> insert, so we put the special index tuple in there, but then we decide >> to update, so we don't need the promise tuple any more, but other >> sessions are potentially still waiting for our XID to terminate even >> though there's no conflict any more. I'm having a hard time bringing >> the details of those cases to mind ATM, though. > > We make the decision to INSERT or UPDATE based upon what we find in > the test index. If a value is there already, we assume it's an UPDATE > and go to update the row this points to. If it has been deleted we > loop back and try again/error. Sure, you can throw an error, and that makes things a lot easier. It also implies that the implementation is inferior to the subxact looping pattern, which you've already implied is a thing we must beat in every way. Frankly, I think it's a cop-out to just throw an error, and I don't think it'll end up being some theoretical risk. It'll happen often if it is allowed to happen at all. Allowing it to happen almost defeats the purpose of the feature - the big appeal of the feature is that it makes guarantees about the outcome. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 25 September 2014 20:38, Peter Geoghegan wrote: > On Thu, Sep 25, 2014 at 7:12 AM, Simon Riggs wrote: >> The way forwards, in my view, is to define precisely the behaviour we >> wish to have. That definition will include the best current mechanism >> for running an UPSERT using INSERT/UPDATE/loops and comparing that >> against what is being provided here. We will then have a functional >> test of equivalence of the approaches, and a basis for making a >> performance test that shows that performance is increased without any >> loss of concurrency. > > That sounds very reasonable. So I promise not to discuss locking until we get the first things done. My suggested approach to get this committed is...

A. UPDATE/INSERT privilege infrastructure. Add tests to it, make it separately committable, so we can get that done. Submit to the Oct CF; get that done early.

B. Agree command semantics by producing these things:
* Explanatory documentation (Ch6.4 Data Manipulation - Upsert)
* SQL Reference Documentation (INSERT)
* Test cases for feature
* Test cases for concurrency
* Test cases for pgbench
All of the above, as a separate committable patch. I hate the fact that you have written no user-facing documentation for this feature. How can anyone tell whether the tests you've written are correct, or even consistent with a particular definition of correctness? Submit as a patch for review only to the Oct 15 CF. We then agree what is required for further work. At this stage, poll the Django and Rails communities for acceptance and early warning of these features. Listen.

C. Internal weirdness. Submit C based upon the earlier agreed B, submit to the Dec 15 CF, the major patch deadline, so we can fine-tune for the last CF. Then Heikki rewrites half your patch in a better way, you thank him and then we commit. All done.

> While I'm sure that what I have here can > decisively beat the xact looping pattern in terms of performance as > measured by pgbench, the real performance advantage is that this > approach doesn't burn through XIDs. That was a concern that Andres > highlighted in relation to using the subxact looping pattern with > BDR's multi-master replication conflict resolution. But we're still discussing SQL semantics. So first things first, then loop back around, hoping our design has not been concurrently deleted... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
* Gregory Smith (gregsmithpg...@gmail.com) wrote: > On 9/25/14, 2:02 PM, Peter Eisentraut wrote: > >But having the same parameter setting mean different things in > >different versions is the path to complete madness. > > Could we go so far as to remove support for unitless time settings > eventually? The fact that people are setting raw numbers in the > configuration file and have to know the unit to understand what they > just did has never been something I like. I could certainly get behind that idea... Though I do understand that people will complain about backwards compatibility, etc, etc. Thanks, Stephen
Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()
Hi, On 2014-09-24 17:06:05 +0530, Abhijit Menon-Sen wrote: > Hi Andres, Robert. > > I've attached four patches here. Cool. Will review. > 1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to >earlier in StartupXLOG. > > 2. Inside that function, issue fsync()s for the main forks we create by >copying the _init fork. These two imo should definitely be backpatched. > 3. A small fixup to add a const to "typedef char *FileName", because the >earlier patch gave me warnings about discarding const-ness. This is >consistent with many other functions in fd.c that take const char *. I'm happy to consider that one for master (although I seem to recall having had a patch for it rejected?), but I don't think we want to do that in the backbranches. > 4. Issue an fsync() on the data directory at startup if we need to >perform crash recovery. I personally think this one should be backpatched too. Does anyone believe differently? > From d8726c06cdf11674661eac1d091cf7edd05c2a0c Mon Sep 17 00:00:00 2001 > From: Abhijit Menon-Sen > Date: Wed, 24 Sep 2014 14:43:18 +0530 > Subject: Call ResetUnloggedRelations(UNLOGGED_RELATION_INIT) earlier > > We need to call this after recovery, but not after the END_OF_RECOVERY > checkpoint is written. If we crash after that checkpoint, for example > because of ENOSPC in PreallocXlogFiles, the checkpoint has already set > the ControlFile->state to DB_SHUTDOWNED, so we don't enter recovery > again at startup. > > Because we did call ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP) > in the first cleanup, this leaves the database with a bunch of _init > forks for unlogged relations, but no main forks, leading to scary > errors. > > See thread from 20140912112246.ga4...@alap3.anarazel.de for details. With an explanation. Shiny! > --- > src/backend/access/transam/xlog.c | 16 > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/src/backend/access/transam/xlog.c > b/src/backend/access/transam/xlog.c > index 46eef5f..218f7fb 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -6863,6 +6863,14 @@ StartupXLOG(void) > ShutdownWalRcv(); > > /* > + * Reset initial contents of unlogged relations. This has to be done > + * AFTER recovery is complete so that any unlogged relations created > + * during recovery also get picked up. > + */ > + if (InRecovery) > + ResetUnloggedRelations(UNLOGGED_RELATION_INIT); > + + * Also recovery shouldn't be regarded successful if the reset fails - * e.g. because of ENOSPC. > + > + /* > + * copy_file() above has already called pg_flush_data() on the > + * files it created. Now we need to fsync those files, because > + * a checkpoint won't do it for us while we're in recovery. > + */ + * Doing this in a separate pass is advantageous for performance reasons * because it allows the kernel to perform all the flushes at once. > + /* > + * If we need to perform crash recovery, we issue an fsync on the > + * data directory to try to ensure that any data written before the > + * crash are flushed to disk. Otherwise a power failure in the near > + * future might mean that earlier unflushed writes are lost, but the > + * more recent data written to disk from here on are persisted. 
> + */ > + > + if (ControlFile->state != DB_SHUTDOWNED && > + ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY) > + fsync_fname(data_directory, true); > + > if (ControlFile->state == DB_SHUTDOWNED) > { > /* This is the expected case, so don't be chatty in standalone > mode */ > -- Unless I miss something this isn't sufficient. We need to fsync the files in the data directory, not just the toplevel directory? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Sep 25, 2014 at 3:17 PM, Peter Geoghegan wrote: >> To find out how much that optimization buys, you >> should use tuples with many variable-length columns (say, 50) >> preceding the text column you're sorting on. I won't be surprised if >> that turns out to be expensive enough to be worth worrying about, but >> I have not benchmarked it. > > Sorry, but I don't follow. I don't think the pertinent question is if > it's a noticeable cost. I think the pertinent question is if it's > worth it. Doing something about it necessitates a lot of extra memory > access. Not doing something about it hardly affects the amount of > memory access required, perhaps not at all. I think you're mincing words. If you go back and fix datum1, you'll spend a bunch of effort doing that. If you don't, you'll pay a cost on every comparison to re-find the relevant column inside each tuple. You can compare those costs in a variety of cases, including the one I mentioned, where the latter cost will be relatively high. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of GetUserId() Usage
* Peter Eisentraut (pete...@gmx.net) wrote: > On 9/24/14 4:58 PM, Stephen Frost wrote: > > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > >> I think the case for pgstat_get_backend_current_activity() and > >> pg_stat_get_activity and the other pgstatfuncs.c callers is easy to make > >> and seems acceptable to me; but I would leave pg_signal_backend out of > >> that discussion, because it has a potentially harmful side effect. By > >> requiring SET ROLE you add an extra layer of protection against > >> mistakes. (Hopefully, pg_signal_backend() is not a routine thing for > >> well-run systems, which means human intervention, and therefore the room > >> for error isn't insignificant.) > > > > While I certainly understand where you're coming from, I don't really > > buy into it. Yes, cancelling a query (the only thing normal users can > > do anyway- they can't terminate backends) could mean the loss of any > > in-progress work, but it's not like 'rm' and I don't see that it needs > > to require extra hoops for individuals to go through. > > It would be weird if it were inconsistent: some things require role > membership, some things require SET ROLE. Try explaining that. I agree. We already have that distinction, through inherit vs. noinherit. I don't think it makes sense to have it also for individual commands which we feel "might not be as safe". You could still go delete all their data without a SET ROLE if you wanted... Thanks, Stephen
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 25 September 2014 19:59, Peter Geoghegan wrote: > On Thu, Sep 25, 2014 at 9:20 AM, Robert Haas wrote: >> I've never been a fan of putting the index name in there. > > Me neither. Although I do understand Kevin's concern about the user's > intent surrounding which unique index to merge on. The use case cited is real. My solution of using an after trigger works, yet without adding specific functionality to this command. So we can achieve what users want without complicating things here. If we do decide we really want it, let's add it as a later patch. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 25 September 2014 20:11, Robert Haas wrote: >> My approach would be to insert an index tuple for that value into the >> index, but with the leaf ituple marked with an xid rather than a ctid. >> If someone tries to insert into the index they would see this and wait >> for the inserting transaction to end. The inserting transaction would >> then resolve what happens in the heap (insert/update) and later >> repoint the index tuple to the inserted/updated row version. I don't >> see the need for page level locking since it would definitely result >> in deadlocks (e.g. SQLServer). > > I think that something like this might work, but the devil is in the > details. Suppose two people try to upsert into the same table at the > same time. There's one index. If the transactions search that index > for conflicts first, neither sees any conflicting tuples, and both > proceed. That's no good. OK, so suppose each transaction inserts the > special index tuple which you mention, to lock out concurrent inserts > of that value, and then searches for already-existing conflicts. Each > sees the other's tuple, and they deadlock. That's no good, either. The test index is unique, so our to-be-inserted value exists on only one page, hence page locking applies while we insert it. The next person to insert waits for the page lock and then sees the test tuple. The page lock lasts only for the duration of the insertion of the ituple, not for the whole operation. > Also, I think there are other cases where we think we're going to > insert, so we put the special index tuple in there, but then we decide > to update, so we don't need the promise tuple any more, but other > sessions are potentially still waiting for our XID to terminate even > though there's no conflict any more. I'm having a hard time bringing > the details of those cases to mind ATM, though. We make the decision to INSERT or UPDATE based upon what we find in the test index. If a value is there already, we assume it's an UPDATE and go to update the row this points to. If it has been deleted we loop back and try again/error. If the value is not present, we insert the test tuple and progress as an INSERT, then loop back later to set the ctid. There is no case of "don't need promise id anymore". We would use the PK, identity or first unique index as the test index. There is a case where an UPSERT conflicts with an INSERT causing the latter to abort. Anyway, this is why we need the design more clearly exposed, so you can tell me I'm wrong by showing me the URL of it done right. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/25/14, 2:02 PM, Peter Eisentraut wrote: But having the same parameter setting mean different things in different versions is the path to complete madness. Could we go so far as to remove support for unitless time settings eventually? The fact that people are setting raw numbers in the configuration file and have to know the unit to understand what they just did has never been something I like. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Thu, Sep 25, 2014 at 10:00 AM, Jeff Janes wrote: > On Wed, Sep 24, 2014 at 2:48 AM, Dilip kumar > wrote: > >> On 24 August 2014 11:33, Amit Kapila Wrote >> >> >> >> Thanks for you comments, i have worked on both the review comment lists, >> sent on 19 August, and 24 August. >> >> >> >> Latest patch is attached with the mail.. >> > > Hi Dilip, > > I think you have an off-by-one error in the index into the array of file > handles. > Actually the problem is that the socket for the master connection was not getting initialized, see my one line addition here. connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot)); connSlot[0].connection = conn; + connSlot[0].sock = PQsocket(conn); However, I don't think it is good to just ignore errors from the select call (like the EBADF) and go into a busy loop instead, so there are more changes needed than this. Also, cancelling the run (by hitting ctrl-C in the shell that invoked it) does not seem to work on linux. I get a message that says "Cancel request sent", but then it continues to finish the job anyway. Cheers, Jeff
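To make the two fixes concrete, here is a sketch (hypothetical code following the patch's naming conventions, not the actual submission) of the slot setup plus a select() call that fails hard instead of busy-looping on errors such as EBADF:

    ParallelSlot   *connSlot;
    fd_set          slotset;
    int             maxsock = -1;
    int             i;

    /* slot 0 reuses the already-open master connection */
    connSlot = (ParallelSlot *) pg_malloc(concurrentCons * sizeof(ParallelSlot));
    connSlot[0].connection = conn;
    connSlot[0].sock = PQsocket(conn);      /* the missing initialization */

    /* ... remaining slots connected and made nonblocking as in the patch ... */

    /* wait for any connection to become readable */
    FD_ZERO(&slotset);
    for (i = 0; i < concurrentCons; i++)
    {
        FD_SET(connSlot[i].sock, &slotset);
        if (connSlot[i].sock > maxsock)
            maxsock = connSlot[i].sock;
    }

    if (select(maxsock + 1, &slotset, NULL, NULL, NULL) < 0)
    {
        if (errno != EINTR)
        {
            fprintf(stderr, "select() failed: %s\n", strerror(errno));
            exit(1);            /* a bad fd must be fatal, not a busy loop */
        }
    }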
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Sep 25, 2014 at 7:12 AM, Simon Riggs wrote: > The way forwards, in my view, is to define precisely the behaviour we > wish to have. That definition will include the best current mechanism > for running an UPSERT using INSERT/UPDATE/loops and comparing that > against what is being provided here. We will then have a functional > test of equivalence of the approaches, and a basis for making a > performance test that shows that performance is increased without any > loss of concurrency. That sounds very reasonable. While I'm sure that what I have here can decisively beat the xact looping pattern in terms of performance as measured by pgbench, the real performance advantage is that this approach doesn't burn through XIDs. That was a concern that Andres highlighted in relation to using the subxact looping pattern with BDR's multi-master replication conflict resolution. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Sep 25, 2014 at 12:11 PM, Robert Haas wrote: > I think that something like this might work, but the devil is in the > details. Suppose two people try to upsert into the same table at the > same time. There's one index. If the transactions search that index > for conflicts first, neither sees any conflicting tuples, and both > proceed. That's no good. OK, so suppose each transaction inserts the > special index tuple which you mention, to lock out concurrent inserts > of that value, and then searches for already-existing conflicts. Each > sees the other's tuple, and they deadlock. That's no good, either. I'm very glad that you share my concern about deadlocks like this. > Also, I think there are other cases where we think we're going to > insert, so we put the special index tuple in there, but then we decide > to update, so we don't need the promise tuple any more, but other > sessions are potentially still waiting for our XID to terminate even > though there's no conflict any more. I'm having a hard time bringing > the details of those cases to mind ATM, though. Well, you might have a promise tuple in a unique index on attributes not appearing in the UPDATE's targetlist, for one. You have the other session waiting (doesn't have to be an upserter) just because we *thought about* inserting a value as part of an upsert. That's pretty bad. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
Bruce Momjian wrote: > 3. 9.3 multi-xact bugs spooked us into being more careful Uh. Multixact changes in 9.3 were infinitely more invasive than the jsonb changes will ever be.
a) they touched basic visibility design and routines, which are complex, understood by very few people, and have remained mostly unchanged for ages;
b) they changed on-disk format for an underlying support structure, requiring pg_upgrade to handle the conversion;
c) they added new catalog infrastructure to keep track of required freezing;
d) they introduced new uint32 counters subject to wraparound;
e) they introduced a novel user of slru.c with 5-char long filenames;
f) they messed with tuple locking protocol and EvalPlanQual logic for traversing update chains.
Maybe I'm forgetting others. JSONB has none of these properties. As far as I can see, the only hairy issue here (other than getting Josh Berkus to actually test the proposed patches) is that JSONB is changing on-disk format; but we're avoiding most issues there by dictating that people with existing JSONB databases need to pg_dump them, i.e. there is no conversion step being written for pg_upgrade. It's good to be careful; it's even better to be more careful. I too have learned a lesson there. Anyway I have no opinion on the JSONB stuff, other than considering that ignoring performance for large arrays and large objects seems to run counter to the whole point of JSONB in the first place (and of course failing to compress is part of that, too.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On Thu, Sep 25, 2014 at 09:00:07PM +0200, Andres Freund wrote: > On 2014-09-25 14:46:18 -0400, Bruce Momjian wrote: > > On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote: > > > BTW, it seems like there is consensus that we ought to reorder the items > > > in a jsonb object to have keys first and then values, independently of the > > > other issues under discussion. This means we *will* be breaking on-disk > > > compatibility with 9.4beta2, which means pg_upgrade will need to be taught > > > to refuse an upgrade if the database contains any jsonb columns. Bruce, > > > do you have time to crank out a patch for that? > > > > Yes, I can do that easily. Tell me when you want it --- I just need a > > catalog version number to trigger on. > > Do you plan to make it conditional on jsonb being used in the database? > That'd not be bad to reduce the pain for testers that haven't used jsonb. Yes, I already have code that scans pg_attribute looking for columns with problematic data types, outputs them to a file, and then throws an error. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
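Presumably such a check runs something along these lines in each database (a sketch of the idea, not Bruce's actual code):

    SELECT n.nspname, c.relname, a.attname
      FROM pg_catalog.pg_attribute a
      JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
      JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
     WHERE a.atttypid = 'pg_catalog.jsonb'::pg_catalog.regtype
       AND NOT a.attisdropped
       AND c.relkind IN ('r', 'm')    -- tables and matviews
     ORDER BY 1, 2, 3;
    -- a complete check would also have to chase domains, arrays and
    -- composite types built on top of jsonb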
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Sep 25, 2014 at 11:53 AM, Robert Haas wrote: > I haven't looked at that part of the patch in detail yet, so... not > really. But I don't see why you'd ever need to restart heap tuple > copying. At most you'd need to re-extract datum1 from the tuples you > have already copied. Well, okay, technically it's not restarting heap tuple copying, but it's about the same thing. The point is that you have to stream a significant chunk of the memtuples array through memory again. Not to mention, having infrastructure to do that, and pick up where we left off, which is significantly more code, all to make comparetup_heap() a bit clearer (i.e. making it so that it won't have to think about an extra sortsupport state). > To find out how much that optimization buys, you > should use tuples with many variable-length columns (say, 50) > preceding the text column you're sorting on. I won't be surprised if > that turns out to be expensive enough to be worth worrying about, but > I have not benchmarked it. Sorry, but I don't follow. I don't think the pertinent question is if it's a noticeable cost. I think the pertinent question is if it's worth it. Doing something about it necessitates a lot of extra memory access. Not doing something about it hardly affects the amount of memory access required, perhaps not at all. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Sep 25, 2014 at 2:17 PM, Simon Riggs wrote: >> 1. You don't accept that value locks must be easily released in the >> event of a conflict. Is anyone in this camp? It's far from obvious to >> me what side of this question Andres is on at this stage, for example. >> Robert might have something to say here too. >> >> 2. Having taken into account the experience of myself and Heikki, and >> all that is implied by taking that approach ***while avoiding >> unprincipled deadlocks***, you continue to believe that an approach >> based on speculative heap insertion, or some alternative scheme is >> better than what I have done to the nbtree code here, or you otherwise >> dislike something about the proposed value locking scheme. You accept >> that value locks must be released and released easily in the event of >> a conflict, but like Heikki you just don't like what I've done to get >> there. >> >> Since we can (I believe) talk about the value locking aspect and the >> rest of the patch independently, we should do so...unless you're in >> camp 1, in which case I guess that we'll have to thrash it out. > > I'm trying to understand and help out with pushing this patch forwards > to completion. > > Basically, I have absolutely no idea whether I object to or agree with > 1) and don't know where to look to find out. We need a clear > exposition of design and the alternatives. I laughed when I read this, because I think a lot of the discussion on this topic has been unnecessarily muddled by jargon. > My approach would be to insert an index tuple for that value into the > index, but with the leaf ituple marked with an xid rather than a ctid. > If someone tries to insert into the index they would see this and wait > for the inserting transaction to end. The inserting transaction would > then resolve what happens in the heap (insert/update) and later > repoint the index tuple to the inserted/updated row version. I don't > see the need for page level locking since it would definitely result > in deadlocks (e.g. SQLServer). I think that something like this might work, but the devil is in the details. Suppose two people try to upsert into the same table at the same time. There's one index. If the transactions search that index for conflicts first, neither sees any conflicting tuples, and both proceed. That's no good. OK, so suppose each transaction inserts the special index tuple which you mention, to lock out concurrent inserts of that value, and then searches for already-existing conflicts. Each sees the other's tuple, and they deadlock. That's no good, either. Also, I think there are other cases where we think we're going to insert, so we put the special index tuple in there, but then we decide to update, so we don't need the promise tuple any more, but other sessions are potentially still waiting for our XID to terminate even though there's no conflict any more. I'm having a hard time bringing the details of those cases to mind ATM, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On 2014-09-25 14:46:18 -0400, Bruce Momjian wrote: > On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote: > > BTW, it seems like there is consensus that we ought to reorder the items > > in a jsonb object to have keys first and then values, independently of the > > other issues under discussion. This means we *will* be breaking on-disk > > compatibility with 9.4beta2, which means pg_upgrade will need to be taught > > to refuse an upgrade if the database contains any jsonb columns. Bruce, > > do you have time to crank out a patch for that? > > Yes, I can do that easily. Tell me when you want it --- I just need a > catalog version number to trigger on. Do you plan to make it conditional on jsonb being used in the database? That'd not be bad to reduce the pain for testers that haven't used jsonb. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Sep 25, 2014 at 9:20 AM, Robert Haas wrote: > I've never been a fan of putting the index name in there. Me neither. Although I do understand Kevin's concern about the user's intent surrounding which unique index to merge on. > I agree > that's stuff that a DML statement shouldn't need to know about. What > I've advocated for in the past is specifying the list of columns that > should be used to determine whether to insert or update. If you have > a match on those columns, update the row; else insert. Any other > unique indexes stand or fall as may be. > > I still think that idea has merit. As I've said, my problem with that idea is the corner cases. Consider the possible ambiguity. Could DML queries in production start failing ("ambiguous unique index specification") because the DBA created a new unique index on attributes that somewhat overlap the attributes of the unique index that the DML author actually meant? What about the effects of BEFORE triggers, and their interaction with partial unique indexes? If you can describe an exact behavior that overcomes these issues, then I'll give serious consideration to implementing it. As things stand, to be perfectly honest it sounds like a footgun to me. There are interactions that make getting it right very ticklish. I don't want to make a unique index specification mandatory because that's ugly - that's the only reason, TBH. However, while what you describe here accomplishes the same thing without being ugly, it is potentially very surprising. Naming the unique index directly has the great advantage of very clearly demonstrating user intent. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Sep 25, 2014 at 2:05 PM, Peter Geoghegan wrote: > On Thu, Sep 25, 2014 at 9:21 AM, Robert Haas wrote: >> The top issue on my agenda is figuring out a way to get rid of the >> extra SortSupport object. > > Really? I'm surprised. Clearly the need to restart heap tuple copying > from scratch, in order to make the datum1 representation consistent, > rather than abandoning datum1 for storing abbreviated keys or pointers > entirely is a very important aspect of whether or not we should change > that. In turn, that's something that's going to (probably > significantly) affect the worst case. > > Do you have an opinion on that? I haven't looked at that part of the patch in detail yet, so... not really. But I don't see why you'd ever need to restart heap tuple copying. At most you'd need to re-extract datum1 from the tuples you have already copied. To find out how much that optimization buys, you should use tuples with many variable-length columns (say, 50) preceding the text column you're sorting on. I won't be surprised if that turns out to be expensive enough to be worth worrying about, but I have not benchmarked it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
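The structure at issue, roughly (a simplified sketch of tuplesort's in-memory entry, not the exact definition):

    typedef struct
    {
        void   *tuple;      /* the full tuple, out of line */
        Datum   datum1;     /* cached value of the leading sort key */
        bool    isnull1;    /* is datum1 NULL? */
        int     tupindex;   /* run/merge bookkeeping */
    } SortTuple;

    /*
     * With datum1 cached, most comparisons never touch *tuple.  Without
     * it, each comparison must deform the tuple far enough to re-find
     * the sort column, which is expensive when many variable-length
     * columns precede it, as in the 50-column example above.
     */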
Re: [HACKERS] jsonb format is pessimal for toast compression
On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote: > BTW, it seems like there is consensus that we ought to reorder the items > in a jsonb object to have keys first and then values, independently of the > other issues under discussion. This means we *will* be breaking on-disk > compatibility with 9.4beta2, which means pg_upgrade will need to be taught > to refuse an upgrade if the database contains any jsonb columns. Bruce, > do you have time to crank out a patch for that? Yes, I can do that easily. Tell me when you want it --- I just need a catalog version number to trigger on. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Sep 25, 2014 at 7:35 AM, Robert Haas wrote: > On Thu, Sep 25, 2014 at 10:12 AM, Simon Riggs wrote: >> IMHO it is impossible to know if any of the other code is correct >> until we have a clear and stable vision of what the command is >> supposed to perform. > > +1. > >> The inner workings are less important than what the feature does. > > +1. > >> FWIW, the row available at the end of all BEFORE triggers is clearly >> the object we should be manipulating, not the original VALUES() >> clause. Otherwise this type of INSERT would behave differently from >> normal INSERTs. Which would likely violate RLS, if nothing else. > > +1. I agree with all of this. I'm glad that my opinion on how a CONFLICTING() expression interacts with BEFORE triggers is accepted, too. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
BTW, it seems like there is consensus that we ought to reorder the items in a jsonb object to have keys first and then values, independently of the other issues under discussion. This means we *will* be breaking on-disk compatibility with 9.4beta2, which means pg_upgrade will need to be taught to refuse an upgrade if the database contains any jsonb columns. Bruce, do you have time to crank out a patch for that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Sep 25, 2014 at 11:17 AM, Simon Riggs wrote: > Basically, I have absolutely no idea whether I object to or agree with > 1) and don't know where to look to find out. We need a clear > exposition of design and the alternatives. > > My approach would be to insert an index tuple for that value into the > index, but with the leaf ituple marked with an xid rather than a ctid. > If someone tries to insert into the index they would see this and wait > for the inserting transaction to end. The inserting transaction would > then resolve what happens in the heap (insert/update) and later > repoint the index tuple to the inserted/updated row version. I don't > see the need for page level locking since it would definitely result > in deadlocks (e.g. SQLServer). The page level locks are only used to prevent concurrent insertion for as long as it takes to get consensus to proceed among unique indexes, and to actually insert a heap tuple. They're all released before we lock the tuple for update, should we take that path (yes, really). This is consistent with the behavior of other systems, I think. That's my whole reason for preferring to do things that way. If you have a "promise tuples" approach - be it what you outline here, or what Heikki prototyped with heap tuple insertion, or any other - then you need a way to *release* those "value locks" in the event of a conflict/needing to update, before locking/updating. Otherwise, you get deadlocks. This is an issue I highlighted when it came up with Heikki's prototype. AFAICT, any scheme for "value locking" needs to strongly consider the need to *release* value locks inexpensively. Whatever else they do, they cannot persist for the duration of the transaction IMV. Does that make sense? If not, my next suggestion is applying an earlier revision of Heikki's prototype, and seeing for yourself how it can be made to deadlock in an unprincipled/impossible to prevent way [1]. You've quite rightly highlighted the existing subxact looping pattern as something that this needs to be better than in every way. This is one important way in which we might fail to live up to that standard. [1] http://www.postgresql.org/message-id/52b4aaf0.5090...@vmware.com -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
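For reference, the "subxact looping pattern" that any of these schemes has to beat is essentially the example from the PostgreSQL documentation; every pass through the EXCEPTION block opens a subtransaction, which is why the pattern burns through XIDs under contention (sketch, assuming a table "db" with a primary key):

    CREATE TABLE db (a INT PRIMARY KEY, b TEXT);

    CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
    $$
    BEGIN
        LOOP
            -- first try to update an existing row
            UPDATE db SET b = data WHERE a = key;
            IF found THEN
                RETURN;
            END IF;
            -- not there: try to insert; a concurrent insert of the same
            -- key raises unique_violation, and we loop to retry the UPDATE
            BEGIN
                INSERT INTO db (a, b) VALUES (key, data);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- do nothing, loop back and try the UPDATE again
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;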
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/25/2014 11:22 AM, Tom Lane wrote: > In the interests of pushing this forward, I will work today on > trying to finish and review Heikki's offsets-and-lengths patch > so that we have something we can do performance testing on. > I doubt that the performance testing will tell us anything we > don't expect, but we should do it anyway. OK. I'll spend some time trying to get Socorro with JSONB working so that I'll have a second test case. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
Josh Berkus writes: > On 09/25/2014 10:26 AM, Andres Freund wrote: >> On 2014-09-25 10:25:24 -0700, Josh Berkus wrote: >>> If Heikki says it's ready, I'll test. So far he's said that it wasn't >>> done yet. >> http://www.postgresql.org/message-id/541c242e.3030...@vmware.com > Yeah, and that didn't include some of Tom's bug fixes apparently, per > the succeeding message. Which is why I asked Heikki if he was done, to > which he has not replied. I took a quick look at the two patches Heikki posted. I find the "separate offsets array" approach unappealing. It takes more space than the other approaches, and that space will be filled with data that we already know will not be at all compressible. Moreover, AFAICS we'd have to engrave the stride on stone tablets, which as I already mentioned I'd really like to not do. The "offsets-and-lengths" patch seems like the approach we ought to compare to my patch, but it looks pretty unfinished to me: AFAICS it includes logic to understand offsets sprinkled into a mostly-lengths array, but no logic that would actually *store* any such offsets, which means it's going to act just like my patch for performance purposes. In the interests of pushing this forward, I will work today on trying to finish and review Heikki's offsets-and-lengths patch so that we have something we can do performance testing on. I doubt that the performance testing will tell us anything we don't expect, but we should do it anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
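To sketch what "offsets sprinkled into a mostly-lengths array" means in practice (an illustration with made-up macro names, not Heikki's actual patch): each child entry normally stores its length, every Nth entry instead stores the absolute end offset of its child, and random access walks backwards summing lengths until it hits a stored offset:

    typedef unsigned int JEntry;    /* one word of metadata per child */

    #define STRIDE      32          /* writer sets HAS_OFF on every STRIDE-th entry */
    #define HAS_OFF(je) ((je) & 0x80000000u)    /* payload is an end offset */
    #define PAYLOAD(je) ((je) & 0x7fffffffu)    /* length, or end offset */

    /*
     * Start offset of child "index": sum lengths backwards until an
     * entry whose payload is an absolute end offset is found.
     */
    static unsigned int
    child_offset(const JEntry *entries, int index)
    {
        unsigned int off = 0;
        int          i;

        for (i = index - 1; i >= 0; i--)
        {
            off += PAYLOAD(entries[i]);
            if (HAS_OFF(entries[i]))
                break;
        }
        return off;
    }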
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 28 August 2014 03:43, Peter Geoghegan wrote: > "Value locking" > === > > To date, on-list discussion around UPSERT has almost exclusively > concerned what I've called "value locking"; the idea of locking values > in unique indexes in the abstract (to establish the right to insert > ahead of time). There was some useful discussion on this question > between myself and Heikki back around December/January. Ultimately, we > were unable to reach agreement on an approach and discussion tapered > off. However, Heikki did understand the concerns that informed my > design. He recognized the need to be able to easily *release* value > locks, so as to avoid "unprincipled deadlocks", where under high > concurrency there are deadlocks between sessions that only UPSERT a > single row at a time. I'm not sure how widely appreciated this point > is, but I believe that Heikki appreciates it. It is a very important > point in my opinion. I don't want an implementation that is in any way > inferior to the "UPSERT looping subxact" pattern (i.e. the plpgsql > thing that the docs suggest). > > When we left off, Heikki continued to favor an approach that involved > speculatively inserting heap tuples, and then deleting them in the > event of a conflict. This design was made more complicated when the > need to *release* value locks became apparent (Heikki ended up making > some changes to HeapTupleSatisfiesDirty(), as well as sketching a > design for what you might call a "super delete", where xmin can be set > to InvalidTransactionId for speculatively-inserted heap tuples). After > all, it wasn't as if we could abort a subxact to release locks, which > is what the "UPSERT looping subxact" pattern does. I think it's fair > to say that that design became more complicated than initially > anticipated [4] [5]. > > Anyway, the greater point here is that fundamentally, AFAICT Heikki > and I were in agreement. Once you buy into the idea that we must avoid > holding on to "value locks" of whatever form - as Heikki evidently did > - then exactly what form they take is ultimately only a detail. > Granted, it's a very important detail, but a detail nonetheless. It > can be discussed entirely independently of all of this new stuff, and > thank goodness for that. > > If anyone finds my (virtually unchanged) page heavyweight lock based > value locking approach objectionable, I ask that the criticism be > framed in a way that makes a sharp distinction between each of the > following: > > 1. You don't accept that value locks must be easily released in the > event of a conflict. Is anyone in this camp? It's far from obvious to > me what side of this question Andres is on at this stage, for example. > Robert might have something to say here too. > > 2. Having taken into account the experience of myself and Heikki, and > all that is implied by taking that approach ***while avoiding > unprincipled deadlocks***, you continue to believe that an approach > based on speculative heap insertion, or some alternative scheme is > better than what I have done to the nbtree code here, or you otherwise > dislike something about the proposed value locking scheme. You accept > that value locks must be released and released easily in the event of > a conflict, but like Heikki you just don't like what I've done to get > there. > > Since we can (I believe) talk about the value locking aspect and the > rest of the patch independently, we should do so...unless you're in > camp 1, in which case I guess that we'll have to thrash it out. 
I'm trying to understand and help out with pushing this patch forwards to completion. Basically, I have absolutely no idea whether I object to or agree with 1) and don't know where to look to find out. We need a clear exposition of design and the alternatives. My approach would be to insert an index tuple for that value into the index, but with the leaf ituple marked with an xid rather than a ctid. If someone tries to insert into the index they would see this and wait for the inserting transaction to end. The inserting transaction would then resolve what happens in the heap (insert/update) and later repoint the index tuple to the inserted/updated row version. I don't see the need for page level locking since it would definitely result in deadlocks (e.g. SQLServer). -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of GetUserId() Usage
On 9/24/14 4:58 PM, Stephen Frost wrote: > Alvaro, > > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: >> I think the case for pgstat_get_backend_current_activity() and >> pg_stat_get_activity and the other pgstatfuncs.c callers is easy to make >> and seems acceptable to me; but I would leave pg_signal_backend out of >> that discussion, because it has a potentially harmful side effect. By >> requiring SET ROLE you add an extra layer of protection against >> mistakes. (Hopefully, pg_signal_backend() is not a routine thing for >> well-run systems, which means human intervention, and therefore the room >> for error isn't insignificant.) > > While I certainly understand where you're coming from, I don't really > buy into it. Yes, cancelling a query (the only thing normal users can > do anyway- they can't terminate backends) could mean the loss of any > in-progress work, but it's not like 'rm' and I don't see that it needs > to require extra hoops for individuals to go through. It would be weird if it were inconsistent: some things require role membership, some things require SET ROLE. Try explaining that. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On 12 March 2014 16:29, Heikki Linnakangas wrote: > Good point. We have done two major changes to GIN in this release cycle: > changed the data page format and made it possible to skip items without > fetching all the keys ("fast scan"). gincostestimate doesn't know about > either change. Did this cost estimation issue get fixed? If not, and if this is due for 9.5, can this be removed from the 9.4 open items list? Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Sep 25, 2014 at 9:21 AM, Robert Haas wrote: > The top issue on my agenda is figuring out a way to get rid of the > extra SortSupport object. Really? I'm surprised. Clearly the need to restart heap tuple copying from scratch, in order to make the datum1 representation consistent, rather than abandoning datum1 for storing abbreviated keys or pointers entirely, is a very important aspect of whether or not we should change that. In turn, that's something that's going to (probably significantly) affect the worst case. Do you have an opinion on that? If you want me to start from scratch, and then have a consistent datum1 representation, and then be able to change the structure of comparetup_heap() as you outline (so as to get rid of the extra SortSupport object), I can do that. My concern is the regression. The datum1 pointer optimization appears to matter very little for pass-by-value types (traditionally, before abbreviated keys), and so I have a hard time imagining this working out. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/25/14 11:03 AM, Robert Haas wrote: > I couldn't agree more. There's something to be said for just leaving > this alone. I agree. > potentially draw complaints. But I also agree with his last one - of > those three possible complaints, I certainly prefer "I had to fix my > configuration file for the new, stricter validation" over any variant > of "my configuration file still worked but it did something > surprisingly different from what it used to do.". Yes. I don't mind that we rename parameters from time to time when semantics change or are refined. But having the same parameter setting mean different things in different versions is the path to complete madness. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
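To illustrate the hazard with an arbitrary example (not a proposal): a unitless value is silently interpreted in the parameter's implicit unit, so the reader has to know that statement_timeout counts milliseconds:

    -- interpreted in the implicit unit (milliseconds for this parameter)
    SET statement_timeout = 60000;

    -- equivalent, but self-documenting
    SET statement_timeout = '60s';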
Re: [HACKERS] Immediate standby promotion
On Thu, Sep 25, 2014 at 1:30 PM, Andres Freund wrote: >> Yes it does cause data loss. The clueful operator has no idea where >> they are so there is no "used rightly" in that case. > > What? There definitely are cases where you don't need to know that to > the T. Just think of the - quite frequently happening - need to promote > a standby to run tests or reporting queries that can't be run on a > standby. > > Sure, you shouldn't use it if you expect a very specific set of the data > being there, but that's not always necessary. Very well put. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On 2014-09-25 10:25:24 -0700, Josh Berkus wrote: > If Heikki says it's ready, I'll test. So far he's said that it wasn't > done yet. http://www.postgresql.org/message-id/541c242e.3030...@vmware.com Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On 2014-09-25 10:29:51 -0700, Josh Berkus wrote: > On 09/25/2014 10:26 AM, Andres Freund wrote: > > On 2014-09-25 10:25:24 -0700, Josh Berkus wrote: > >> If Heikki says it's ready, I'll test. So far he's said that it wasn't > >> done yet. > > > > http://www.postgresql.org/message-id/541c242e.3030...@vmware.com > > Yeah, and that didn't include some of Tom's bug fixes apparently, per > the succeeding message. Which is why I asked Heikki if he was done, to > which he has not replied. Well, Heikki said he doesn't see any fixes in Tom's patch. But either way, this isn't anything that should prevent you from testing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Immediate standby promotion
On 2014-09-25 18:18:09 +0100, Simon Riggs wrote: > On 25 September 2014 16:29, Andres Freund wrote: > > I think that's not really related. Such a promotion doesn't cause data > > loss in the sense of losing data a *clueful* operator wanted to > > keep. Yes, it can be used wrongly, but it's far from alone in that. > > Yes, it does cause data loss. The clueful operator has no idea where > they are, so there is no "used rightly" in that case. What? There definitely are cases where you don't need to know that to the T. Just think of the - quite frequently happening - need to promote a standby to run tests or reporting queries that can't be run on a standby. Sure, you shouldn't use it if you expect a very specific set of the data being there, but that's not always necessary. And that's why it should never, ever be the default. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/25/2014 10:26 AM, Andres Freund wrote: > On 2014-09-25 10:25:24 -0700, Josh Berkus wrote: >> If Heikki says it's ready, I'll test. So far he's said that it wasn't >> done yet. > > http://www.postgresql.org/message-id/541c242e.3030...@vmware.com Yeah, and that didn't include some of Tom's bug fixes apparently, per the succeeding message. Which is why I asked Heikki if he was done, to which he has not replied. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/25/2014 10:20 AM, Andres Freund wrote: > On 2014-09-25 10:18:24 -0700, Josh Berkus wrote: >> On 09/25/2014 10:14 AM, Bruce Momjian wrote: >>> On Thu, Sep 25, 2014 at 06:01:08PM +0200, Andres Freund wrote: But independent of which version is chosen, we *REALLY* need to make the decision soon. This issue has held up the next beta (like jsonb has blocked previous beta) for *weeks*. Personally it doesn't make me very happy that Heikki and Tom had to be the people stepping up to fix this. >>> >>> I think there are a few reasons this has been delayed, aside from the >>> scheduling ones: >>> >>> 1. compression issues were a surprise, and we are wondering if >>> there are any other surprises >>> 2. pg_upgrade makes future data format changes problematic >>> 3. 9.3 multi-xact bugs spooked us into being more careful >>> >>> I am not sure what we can do to increase our speed based on these items. >> >> Alternately, this is delayed because: >> >> 1. We have one tested patch to fix the issue. >> >> 2. However, people are convinced that there's a better patch possible. >> >> 3. But nobody is working on this better patch except "in their spare time". >> >> Given this, I once again vote for releasing based on Tom's lengths-only >> patch, which is done, tested, and ready to go. > > Heikki's patch is there and polished. If Heikki says it's ready, I'll test. So far he's said that it wasn't done yet. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] KNN-GiST with recheck
On Thu, Sep 25, 2014 at 9:00 PM, Emre Hasegeli wrote: > > Fixed, thanks. > > Here are my questions and comments about the code. > > doc/src/sgml/gist.sgml:812: > >be rechecked from heap tuple before tuple is returned. If > >recheck flag isn't set then it's true by default for > >compatibility reasons. The recheck flag can be used > only > > Recheck flag is set to false in gistget.c so I think it should say > "false by default". On the other hand, it is true by default on > the consistent function. It is written as "the safest assumption" > on the code comments. I don't know why the safest is chosen over > the backwards compatible for the consistent function. > Agreed. It should be clarified in the docs. src/backend/access/gist/gistget.c:505: > > /* Recheck distance from heap tuple if needed */ > > if (GISTSearchItemIsHeap(*item) && > > searchTreeItemNeedDistanceRecheck(scan, > so->curTreeItem)) > > { > > searchTreeItemDistanceRecheck(scan, > so->curTreeItem, item); > > continue; > > } > > Why is so->curTreeItem passed to these functions? They can use > scan->opaque->curTreeItem. > I didn't get the difference. A few lines before: GISTScanOpaque so = (GISTScanOpaque) scan->opaque; > src/backend/access/gist/gistscan.c:49: > > /* > >* When all distance values are the same, items without > recheck > >* can be immediately returned. So they are placed first. > >*/ > > if (recheckCmp == 0 && distance_a.recheck != > distance_b.recheck) > > recheckCmp = distance_a.recheck ? 1 : -1; > > I don't understand why items without recheck can be immediately > returned. Do you think it will work correctly when there is > an operator class which will return recheck true and false for > the items under the same page? > Yes, I believe so. An item with recheck can't decrease its distance; it can only increase it. In the corner case, an item can have the same distance after recheck as it had before. Then items whose distances are the same can anyway be returned in any order. > src/backend/access/index/indexam.c:258: > > /* Prepare data structures for getting original indexed values > from heap */ > > scan->indexInfo = BuildIndexInfo(scan->indexRelation); > > scan->estate = CreateExecutorState(); > > scan->slot = > MakeSingleTupleTableSlot(RelationGetDescr(heapRelation)); > > With the changes in indexam.c, heap access becomes legal for all index > access methods. I think it is better than the previous version but > I am leaving the judgement to someone experienced. I will try to > summarize the pros and cons of sorting the rows in the GiST access > method, as far as I understand. > > Pros: > > * It does not require another queue. It should be effective to sort > the rows inside the queue the GiST access method already has. > * It does not complicate index access method infrastructure. > > Cons: > > * It could be done without additional heap access. > * Other access methods could make use of the sorting infrastructure > one day. > * It could be more transparent to the users. Sorting information > could be shown on the explain output. > It would be also nice to show some information about KNN itself. > * A more suitable data structure like binary heap could be used > for the queue to sort the rows. > A binary heap seems to be a better data structure for the whole of KNN-GiST. But that's a subject for a separate patch: replace the RB-tree with a heap in KNN-GiST. It's not related to the recheck stuff. -- With best regards, Alexander Korotkov.
Re: [HACKERS] jsonb format is pessimal for toast compression
On 2014-09-25 10:18:24 -0700, Josh Berkus wrote: > On 09/25/2014 10:14 AM, Bruce Momjian wrote: > > On Thu, Sep 25, 2014 at 06:01:08PM +0200, Andres Freund wrote: > >> But independent of which version is chosen, we *REALLY* need to make the > >> decision soon. This issue has held up the next beta (like jsonb has > >> blocked previous beta) for *weeks*. > >> > >> Personally it doesn't make me very happy that Heikki and Tom had to be > >> the people stepping up to fix this. > > > > I think there are a few reasons this has been delayed, aside from the > > scheduling ones: > > > > 1. compression issues were a surprise, and we are wondering if > > there are any other surprises > > 2. pg_upgrade makes future data format changes problematic > > 3. 9.3 multi-xact bugs spooked us into being more careful > > > > I am not sure what we can do to increase our speed based on these items. > > Alternately, this is delayed because: > > 1. We have one tested patch to fix the issue. > > 2. However, people are convinced that there's a better patch possible. > > 3. But nobody is working on this better patch except "in their spare time". > > Given this, I once again vote for releasing based on Tom's lengths-only > patch, which is done, tested, and ready to go. Heikki's patch is there and polished. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/25/2014 10:14 AM, Bruce Momjian wrote: > On Thu, Sep 25, 2014 at 06:01:08PM +0200, Andres Freund wrote: >> But independent of which version is chosen, we *REALLY* need to make the >> decision soon. This issue has held up the next beta (like jsonb has >> blocked previous beta) for *weeks*. >> >> Personally it doesn't make me very happy that Heikki and Tom had to be >> the people stepping up to fix this. > > I think there are a few reasons this has been delayed, aside from the > scheduling ones: > > 1. compression issues were a surprise, and we are wondering if > there are any other surprises > 2. pg_upgrade makes future data format changes problematic > 3. 9.3 multi-xact bugs spooked us into being more careful > > I am not sure what we can do to increase our speed based on these items. Alternately, this is delayed because: 1. We have one tested patch to fix the issue. 2. However, people are convinced that there's a better patch possible. 3. But nobody is working on this better patch except "in their spare time". Given this, I once again vote for releasing based on Tom's lengths-only patch, which is done, tested, and ready to go. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Immediate standby promotion
On 25 September 2014 16:29, Andres Freund wrote: >> > To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it" >> > seems like a friendlier interface than making somebody shut down the >> > server, run pg_resetxlog, and start it up again. >> >> It makes sense to go from paused --> promoted. >> >> It doesn't make sense to go from normal running --> promoted, since >> that is just random data loss. > > Why? I don't see what's random in promoting a node in the current state > *iff* it's currently consistent. > > Just imagine something like promoting a current standby to a full node > because you want to run some tests on it that require writes. There's > absolutely no need to investigate the current state for that. > >> I very much understand the case where >> somebody is shouting "get the web site up, we are losing business". >> Implementing a feature that allows people to do exactly what they >> asked (go live now), but loses business transactions that we thought >> had been safely recorded is not good. It implements only the exact >> request, not its actual intention. > > That seems to be a problem of massive misunderstanding on the part of the > user. And I don't see how this is going to be safer by requiring the > user to first issue a pause request. > > I think we should attempt to solve this by naming the command > appropriately. Something like 'abort_replay_and_promote'. Long, > nontrivial to type, and descriptive. > >> Any feature that lumps both cases together is wrongly designed and >> will cause data loss. >> >> We go to a lot of trouble to ensure data is successfully on disk and >> in WAL. I won't give that up, nor do I want to make it easier to lose >> data than it already is. > > I think that's not really related. Such a promotion doesn't cause data > loss in the sense of losing data a *clueful* operator wanted to > keep. Yes, it can be used wrongly, but it's far from alone in that. Yes, it does cause data loss. The clueful operator has no idea where they are, so there is no "used rightly" in that case. If I were to give this feature a name it would be --discard, --random-data-loss, or --reset-hard. The point of pausing is being misunderstood; it is close but not quite relevant. If you are at a known location and request promotion, we can presume you know what you are doing, so it is simply Promote. If you are at an unknown location and therefore have clearly not verified any state before promotion, you are clearly making an uninformed decision that will likely result in data loss, for which there is no way of knowing the impact and no mechanism for recovering from it. Trying to promote something while it is still recovering proves we don't know the state; we're just picking a random LSN. So if you have a time-delayed standby and the master breaks, or there is a bad transaction, then the correct action would be to
* pause the delayed standby
* discover where the master broke, or the xid of the bad transaction
* restart recovery to go up to the correct time/xid/lsn
* promote the standby
That is already possible in 9.4. The original patch for pausing contained code to reset the PITR target with functions, which would make the above even easier. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
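For readers following along, the delayed-standby procedure Simon outlines maps onto the existing 9.4 recovery-target machinery roughly as follows (a sketch only; the timestamp and data directory are hypothetical placeholders):

    # recovery.conf on the delayed standby
    standby_mode = 'on'
    recovery_target_time = '2014-09-25 10:00:00'  # just before the bad transaction
    pause_at_recovery_target = true               # stop and wait at the target

    # once replay has paused at the target and the state has been verified:
    pg_ctl promote -D /path/to/standby/data

recovery_target_xid works the same way when the xid of the bad transaction is known rather than its timestamp.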
Re: [HACKERS] jsonb format is pessimal for toast compression
On Thu, Sep 25, 2014 at 06:01:08PM +0200, Andres Freund wrote: > But independent of which version is chosen, we *REALLY* need to make the > decision soon. This issue has held up the next beta (like jsonb has > blocked previous beta) for *weeks*. > > Personally it doesn't make me very happy that Heikki and Tom had to be > the people stepping up to fix this. I think there are a few reasons this has been delayed, aside from the scheduling ones: 1. compression issues were a surprise, and we are wondering if there are any other surprises 2. pg_upgrade makes future data format changes problematic 3. 9.3 multi-xact bugs spooked us into being more careful I am not sure what we can do to increase our speed based on these items. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Wed, Sep 24, 2014 at 2:48 AM, Dilip kumar wrote: > On 24 August 2014 11:33, Amit Kapila Wrote > > > > Thanks for you comments, i have worked on both the review comment lists, > sent on 19 August, and 24 August. > > > > Latest patch is attached with the mail.. > Hi Dilip, I think you have an off-by-one error in the index into the array of file handles. vacuumdb runs at full CPU, and if I run: strace -ttt -T -f ../parallel_vac/bin/vacuumdb -z -a -j 8 I get the select returning immediately with a bad file descriptor error: 1411663937.641177 select(55, [4 5 6 7 8 9 10 54], NULL, NULL, NULL) = -1 EBADF (Bad file descriptor) <0.12> 1411663937.641232 recvfrom(3, 0x104e3f0, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.12> 1411663937.641279 recvfrom(4, 0x1034bc0, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.11> 1411663937.641326 recvfrom(5, 0x10017c0, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.14> 1411663937.641415 recvfrom(6, 0x10097e0, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.20> 1411663937.641487 recvfrom(7, 0x1012330, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.13> 1411663937.641538 recvfrom(8, 0x101af00, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.11> 1411663937.641584 recvfrom(9, 0x1023af0, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.12> 1411663937.641631 recvfrom(10, 0x1054600, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.12> Cheers, Jeff
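For anyone digging into Jeff's report, the shape a correct select() setup usually takes is roughly the following (a sketch only; pSlot and concurrentCons follow the patch's naming from upthread, and error handling is elided). The first argument to select() must be one greater than the highest descriptor actually in the set, and the loop bound must not run one slot past the end of the array:

    fd_set  rmask;
    int     maxfd = -1;
    int     i;

    FD_ZERO(&rmask);
    for (i = 0; i < concurrentCons; i++)   /* "<=" here would index past the array */
    {
        int     sock = PQsocket(pSlot[i].connection);

        FD_SET(sock, &rmask);
        if (sock > maxfd)
            maxfd = sock;
    }
    if (select(maxfd + 1, &rmask, NULL, NULL, NULL) < 0)
        ;   /* handle EINTR, or report the error */

A stale or out-of-range descriptor sneaking into the set, as an off-by-one slot index would produce, is exactly what yields the EBADF in the strace output above.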
Re: [HACKERS] KNN-GiST with recheck
> Fixed, thanks. Here are my questions and comments about the code. doc/src/sgml/gist.sgml:812: >be rechecked from heap tuple before tuple is returned. If >recheck flag isn't set then it's true by default for >compatibility reasons. The recheck flag can be used only The recheck flag is set to false in gistget.c, so I think it should say "false by default". On the other hand, it is true by default on the consistent function. It is written as "the safest assumption" in the code comments. I don't know why the safest was chosen over the backwards-compatible for the consistent function. src/backend/access/gist/gistget.c:505: > /* Recheck distance from heap tuple if needed */ > if (GISTSearchItemIsHeap(*item) && > searchTreeItemNeedDistanceRecheck(scan, > so->curTreeItem)) > { > searchTreeItemDistanceRecheck(scan, > so->curTreeItem, item); > continue; > } Why is so->curTreeItem passed to these functions? They can use scan->opaque->curTreeItem. src/backend/access/gist/gistscan.c:49: > /* >* When all distance values are the same, items without recheck >* can be immediately returned. So they are placed first. >*/ > if (recheckCmp == 0 && distance_a.recheck != distance_b.recheck) > recheckCmp = distance_a.recheck ? 1 : -1; I don't understand why items without recheck can be immediately returned. Do you think it will work correctly when there is an operator class which will return recheck true and false for the items under the same page? src/backend/access/index/indexam.c:258: > /* Prepare data structures for getting original indexed values from > heap */ > scan->indexInfo = BuildIndexInfo(scan->indexRelation); > scan->estate = CreateExecutorState(); > scan->slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation)); With the changes in indexam.c, heap access becomes legal for all index access methods. I think it is better than the previous version but I am leaving the judgement to someone experienced. I will try to summarize the pros and cons of sorting the rows in the GiST access method, as far as I understand them.

Pros:
* It does not require another queue. It should be effective to sort the rows inside the queue the GiST access method already has.
* It does not complicate the index access method infrastructure.

Cons:
* It could be done without additional heap access.
* Other access methods could make use of the sorting infrastructure one day.
* It could be more transparent to the users. Sorting information could be shown in the explain output.
* A more suitable data structure like a binary heap could be used for the queue to sort the rows.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/25/2014 09:01 AM, Andres Freund wrote: > But independent of which version is chosen, we *REALLY* need to make the > decision soon. This issue has held up the next beta (like jsonb has > blocked previous beta) for *weeks*. Yes, please! -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Wed, Sep 24, 2014 at 7:04 PM, Peter Geoghegan wrote: > On Fri, Sep 19, 2014 at 2:54 PM, Peter Geoghegan wrote: >> Probably not - it appears to make very little difference to >> unoptimized pass-by-reference types whether or not datum1 can be used >> (see my simulation of Kevin's worst case, for example [1]). Streaming >> through a not inconsiderable proportion of memtuples again is probably >> a lot worse. The datum1 optimization (which is not all that old) made >> a lot of sense when initially introduced, because it avoided chasing >> through a pointer for pass-by-value types. I think that's its sole >> justification, though. > > Just to be clear -- I am blocked on this. Do you really prefer to > restart copying heap tuples from scratch in the event of aborting, > just to make sure that the datum1 representation is consistently > either a pointer to text, or an abbreviated key? I don't think that > the costs involved make that worth it, as I've said, but I'm not sure > how to resolve that controversy. > > I suggest that we focus on making sure the abort logic itself is > sound. There probably hasn't been enough discussion of that. Once that > is resolved, we can revisit the question of whether or not copying > should restart to keep the datum1 representation consistent. I suspect > that leaving that until later will be easier all around. The top issue on my agenda is figuring out a way to get rid of the extra SortSupport object. I'm not going to commit any version of this patch that uses a second SortSupport for the tiebreak. I doubt anyone else will like that either, but you can try. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
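For readers not steeped in the patch, the general abbreviated-key pattern being argued over looks roughly like this (a sketch of the technique only, not the patch's actual API; compare_full_values is a hypothetical helper standing in for whatever the tiebreak mechanism ends up being, which is precisely the point of contention):

    /*
     * datum1 normally caches the leading sort key; with abbreviation it
     * instead holds a compact encoding chosen so that plain unsigned
     * Datum comparison orders values correctly.
     */
    static int
    compare_leading_key(const SortTuple *a, const SortTuple *b, SortSupport ssup)
    {
        if (a->datum1 < b->datum1)
            return -1;
        if (a->datum1 > b->datum1)
            return 1;

        /*
         * Equal abbreviated keys prove nothing: resolve the tie with an
         * authoritative comparison of the full values stored in the
         * tuples themselves.  Whether that comparator lives in a second
         * SortSupport object or somewhere else is the open question.
         */
        return compare_full_values(a->tuple, b->tuple, ssup);
    }

The "consistent datum1 representation" debate is about whether memtuples may temporarily hold a mix of abbreviated keys and plain pointers after an abort of abbreviation, or whether copying must restart so the array is uniform.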
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Sep 25, 2014 at 11:21 AM, Simon Riggs wrote: > On 25 September 2014 15:35, Robert Haas wrote: >>> The only problem I see is if the newly inserted row matches one row on >>> one unique value and a different row on a different unique index. >>> Turning the INSERT into an UPDATE will still fail on one or the other, no >>> matter which index you pick. If there is one row for ALL unique >>> indexes then it is irrelevant which index you pick. So either way, I >>> cannot see a reason to specify an index. >> >> Failure could be the right thing in some cases. For example, imagine >> that a user has a table containing names, email addresses, and (with >> apologies for the American-ism, but I don't know what would be >> comparable elsewhere) social security numbers. The user has unique >> indexes on both email addresses and SSNs. If a new record arrives for >> the same email address, they want to replace the existing record; but >> if a new record arrives with the same SSN, they want the transaction to >> fail. Otherwise, a newly-arrived record might overwrite the email >> address of an existing record, which they never want to do, because >> they view email address as the primary key. > > I agree with your example, but not your conclusion. > > If a new record arrives with a new email address that matches an > existing record, it will fail. There is a case that would be allowed, > which would be a record that creates an entirely new email address. So > you do have a point to argue from. > > However, IMV enforcing such a restriction should be done with an After > trigger, which is already possible, not by complicating a DML > statement with information it shouldn't need to know, or that might > change in the future. I've never been a fan of putting the index name in there. I agree that's stuff that a DML statement shouldn't need to know about. What I've advocated for in the past is specifying the list of columns that should be used to determine whether to insert or update. If you have a match on those columns, update the row; else insert. Any other unique indexes stand or fall as the case may be. I still think that idea has merit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
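A sketch of the column-list form Robert is advocating, applied to his own example (the syntax is hypothetical at this point; nothing like it is committed):

    -- unique indexes exist on both email and ssn
    INSERT INTO person (email, ssn, name)
    VALUES ('a@example.com', '123-45-6789', 'Alice')
    ON CONFLICT (email) UPDATE SET name = 'Alice';

A match on email updates the existing row; a clash on ssn alone would still raise a unique violation, which is exactly the failure behavior the example calls for.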
Re: [HACKERS] Immediate standby promotion
On Thu, Sep 25, 2014 at 11:34 AM, Andres Freund wrote: > On 2014-09-25 11:13:50 -0400, Robert Haas wrote: >> On Wed, Sep 24, 2014 at 4:36 PM, Simon Riggs wrote: >> >> To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it" >> >> seems like a friendlier interface than making somebody shut down the >> >> server, run pg_resetxlog, and start it up again. >> > >> > It makes sense to go from paused --> promoted. >> >> Well, we could certainly require that people call >> pg_xlog_replay_pause() before they call pg_promote_now(). I don't >> think I like that better, but it beats not having pg_promote_now(). > > FWIW I think it's a mistake to only allow this on a hot standby. This > should be doable using pg_ctl alone. a) works with wal_level=archive, b) > sometimes hot_standby=off is a good bit more efficient c) sometimes you > don't want to allow connections. Good point. Also, a pg_ctl command is more friendly to cluster-ware, of which there is a lot these days. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS Design
On 25 September 2014 15:26, Stephen Frost wrote: >> I expected this to still trigger an error due to the first policy. Am >> I to infer from this that the policy model is permissive rather than >> restrictive? > > That's correct and I believe pretty clear in the documentation: policies > are OR'd together, just the same as how roles are handled. As a > logged-in user, you have the rights of all of the roles you are a member > of (subject to inheritance rules, of course), and similarly, you are > able to view and add all rows which match any policy which applies to > you (either through role membership or through different policies). Okay, I see now. This is a mindset issue for me, as I'm looking at them like constraints rather than permissions. Thanks for the explanation. Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
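A compact illustration of the OR'd (permissive) behavior, using the colours table and the two policies quoted earlier in the thread (a sketch, restating the example rather than adding anything new):

    -- policy 1 (for joe):  FOR ALL USING (visible = true)
    --                      WITH CHECK (visible = true)
    -- policy 2 (for all):  FOR INSERT
    --                      WITH CHECK (name NOT IN ('grey','gray'))

    INSERT INTO colours (name, visible) VALUES ('pink', false);
    -- allowed: the row fails policy 1's check but passes policy 2's,
    -- and the checks are OR'd together, not AND'd

So a row is insertable if it satisfies at least one applicable policy, just as a user holds the union of the privileges of all their roles.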
Re: [HACKERS] jsonb format is pessimal for toast compression
On 2014-09-19 15:40:14 +0300, Heikki Linnakangas wrote: > On 09/18/2014 09:27 PM, Heikki Linnakangas wrote: > >I'll try to write a more polished patch tomorrow. We'll then see what it > >looks like, and can decide if we want it. > > Ok, here are two patches. One is a refined version of my earlier patch, and > the other implements the separate offsets array approach. They are both > based on Tom's jsonb-lengths-merged.patch, so they include all the > whitespace fixes etc. he mentioned. > > There is no big difference in terms of code complexity between the patches. > IMHO the separate offsets array is easier to understand, but it makes for > more complicated accessor macros to find the beginning of the > variable-length data. I personally am pretty clearly in favor of Heikki's version. I think it could stand to slightly expand the reasoning behind the mixed length/offset format; it's not immediately obvious why the offsets are problematic for compression. Otherwise, based on a cursory look, it looks good. But independent of which version is chosen, we *REALLY* need to make the decision soon. This issue has held up the next beta (like jsonb has blocked previous beta) for *weeks*. Personally it doesn't make me very happy that Heikki and Tom had to be the people stepping up to fix this. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
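To expand slightly on why an all-offsets format is problematic for compression, consider an array of equal-length elements (made-up numbers for illustration):

    element sizes:  5  5  5  5  5 ...
    lengths array:  5  5  5  5  5 ...   <- highly repetitive, pglz finds long matches
    offsets array:  5 10 15 20 25 ...   <- strictly increasing, almost nothing repeats

pglz can only exploit repeated byte runs, so monotonically growing offsets defeat it even when the underlying data is extremely regular; trading some of the offsets for lengths is the balance the mixed format is trying to strike.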
Re: [HACKERS] delta relations in AFTER triggers
Heikki Linnakangas wrote: > You cast the TuplestoreRelation to Plan, and pass it to CopyPlanFields. > That will crash, because TuplestoreRelation is nothing like a Plan: Oops. That's a copy/paste error I should have noticed. Fixed, even though the node type might be going away. Since all of this seems to be working very well from a user point of view, I'm going to try to generate a lot more regression tests against the existing code before taking another run at the API, to make sure that things don't break in the refactoring. I didn't hit the copy/out bugs in testing so far -- any suggestions on a test that would exercise this code? (I'm probably missing something obvious.) Craig Ringer wrote: > On 09/15/2014 10:25 PM, Kevin Grittner wrote: > >> I broke out the changes from the previous patch in multiple commits >> in my repository on github: > > *Thankyou* > A nice patch series published in a git repo is so much easier to work > with than a giant squashed patch as an attachment. I have fixed the bug reported by Heikki; be sure to grab that. I have been merging in changes to master as I go, so that bit rot doesn't accumulate, but I don't squash or rebase; hopefully that style works for you. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Immediate standby promotion
On 2014-09-25 11:13:50 -0400, Robert Haas wrote: > On Wed, Sep 24, 2014 at 4:36 PM, Simon Riggs wrote: > >> To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it" > >> seems like a friendlier interface than making somebody shut down the > >> server, run pg_resetxlog, and start it up again. > > > > It makes sense to go from paused --> promoted. > > Well, we could certainly require that people call > pg_xlog_replay_pause() before they call pg_promote_now(). I don't > think I like that better, but it beats not having pg_promote_now(). FWIW I think it's a mistake to only allow this on a hot standby. This should be doable using pg_ctl alone. a) works with wal_level=archive, b) sometimes hot_standby=off is a good bit more efficient c) sometimes you don't want to allow connections. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Immediate standby promotion
On 2014-09-24 21:36:50 +0100, Simon Riggs wrote: > On 18 September 2014 01:22, Robert Haas wrote: > > >> "fast" promotion was actually a supported option in r8 of Postgres but > >> this option was removed when we implemented streaming replication in > >> r9.0 > >> > >> The *rough* requirement is sane, but that's not the same thing as > >> saying this exact patch makes sense. > > > > Granted. Fair point. > > > >> If you are paused and you can see that WAL up ahead is damaged, then > >> YES, you do want to avoid applying it. That is possible by setting a > >> PITR target so that recovery stops at a precise location specified by > >> you. As an existing option is it better than the blunt force trauma > >> suggested here. > > > > You can pause at a recovery target, but then what if you want to go > > read/write at that point? Or what if you've got a time-delayed > > standby and you want to break replication so that it doesn't replay > > the DROP TABLE students that somebody ran on the master? It doesn't > > have to be that WAL is unreadable or corrupt; it's enough for it to > > contain changes you wish to avoid replaying. > > > >> If you really don't care, just shutdown server, resetxlog and start > >> her up - again, no need for new option. I think that should pretty much never be something an admin has to run. It's just about impossible to get this right. In all likelihood just running pg_resetxlog on a database in recovery will have corrupted your database. Which is why pg_resetxlog won't even let you proceed without using -f because it checks for DB_SHUTDOWNED. Rightly so. pg_resetxlog *removes* *all* existing WAL and sets the current control file state to DB_SHUTDOWNED. Thus there will be no recovery when starting afterwards. > > To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it" > > seems like a friendlier interface than making somebody shut down the > > server, run pg_resetxlog, and start it up again. > > It makes sense to go from paused --> promoted. > > It doesn't make sense to go from normal running --> promoted, since > that is just random data loss. Why? I don't see what's random in promoting a node in the current state *iff* it's currently consistent. Just imagine something like promoting a current standby to a full node because you want to run some tests on it that require writes. There's absolutely no need to investigate the current state for that. > I very much understand the case where > somebody is shouting "get the web site up, we are losing business". > Implementing a feature that allows people to do exactly what they > asked (go live now), but loses business transactions that we thought > had been safely recorded is not good. It implements only the exact > request, not its actual intention. That seems to be a problem of massive misunderstanding on the part of the user. And I don't see how this is going to be safer by requiring the user to first issue a pause request. I think we should attempt to solve this by naming the command appropriately. Something like 'abort_replay_and_promote'. Long, nontrivial to type, and descriptive. > Any feature that lumps both cases together is wrongly designed and > will cause data loss. > > We go to a lot of trouble to ensure data is successfully on disk and > in WAL. I won't give that up, nor do I want to make it easier to lose > data than it already is. I think that's not really related. Such a promotion doesn't cause data loss in the sense of losing data a *clueful* operator wanted to keep. 
Yes, it can be used wrongly, but it's far from alone in that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
* Robert Haas (robertmh...@gmail.com) wrote: > On Thu, Sep 25, 2014 at 12:46 AM, Tom Lane wrote: > > TBH I've also been wondering whether any of these proposed cures are > > better than the disease. > > I couldn't agree more. There's something to be said for just leaving > this alone. I've been coming around to this also. I had thought earlier that there was consensus happening, but clearly that's not the case. > > The changes that can be argued to make the > > behavior more sane are also ones that introduce backwards compatibility > > issues of one magnitude or another. > > But on this point I think David Johnston said it best: > > # Any change has the potential to draw complaints. For you it seems that "hey, > # I upgraded to 9.5 and my logs are being rotated out every minute now. I > # thought I had that turned off" is the desired complaint. Greg wants: "hey, my > # 1 hour log rotation is now happening every minute". If the error message is > # written correctly most people upon seeing the error will simply fix their > # configuration and move on - regardless of whether they were proactive in > # doing so having read the release notes. > > I particularly agree with his first sentence - any change can > potentially draw complaints. But I also agree with his last one - of > those three possible complaints, I certainly prefer "I had to fix my > configuration file for the new, stricter validation" over any variant > of "my configuration file still worked but it did something > surprisingly different from what it used to do." I'll agree with this also (which is why I had suggested moving forward with the idea that I thought had consensus: keep things the way they are, but toss an error if we round down a non-zero value to zero). As with Tom, I'm not against being argued to a different position, such as rounding up instead of down, but I still don't like the "near-zero goes to zero" situation we currently have. I'd be much happier if we'd pick one or the other and move forward with it, or agree that we can't reach a consensus and leave well enough alone. Not entirely sure what the best way to get to one of the above is, but I don't feel like we're really making much more progress at this point. Thanks, Stephen
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 25 September 2014 15:35, Robert Haas wrote: >> The only problem I see is if the newly inserted row matches one row on >> one unique value and a different row on a different unique index. >> Turning the INSERT into an UPDATE will still fail on one or the other, no >> matter which index you pick. If there is one row for ALL unique >> indexes then it is irrelevant which index you pick. So either way, I >> cannot see a reason to specify an index. > > Failure could be the right thing in some cases. For example, imagine > that a user has a table containing names, email addresses, and (with > apologies for the American-ism, but I don't know what would be > comparable elsewhere) social security numbers. The user has unique > indexes on both email addresses and SSNs. If a new record arrives for > the same email address, they want to replace the existing record; but > if a new record arrives with the same SSN, they want the transaction to > fail. Otherwise, a newly-arrived record might overwrite the email > address of an existing record, which they never want to do, because > they view email address as the primary key. I agree with your example, but not your conclusion. If a new record arrives with a new email address that matches an existing record, it will fail. There is a case that would be allowed, which would be a record that creates an entirely new email address. So you do have a point to argue from. However, IMV enforcing such a restriction should be done with an After trigger, which is already possible, not by complicating a DML statement with information it shouldn't need to know, or that might change in the future. Let's keep this new feature as simple as possible. ORMs everywhere need to be encouraged to implement this and they won't do it unless it is bone simple to use. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Immediate standby promotion
On Wed, Sep 24, 2014 at 4:36 PM, Simon Riggs wrote: >> To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it" >> seems like a friendlier interface than making somebody shut down the >> server, run pg_resetxlog, and start it up again. > > It makes sense to go from paused --> promoted. Well, we could certainly require that people call pg_xlog_replay_pause() before they call pg_promote_now(). I don't think I like that better, but it beats not having pg_promote_now(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
On Thu, Sep 25, 2014 at 12:46 AM, Tom Lane wrote: > TBH I've also been wondering whether any of these proposed cures are > better than the disease. I couldn't agree more. There's something to be said for just leaving this alone. > The changes that can be argued to make the > behavior more sane are also ones that introduce backwards compatibility > issues of one magnitude or another. But on this point I think David Johnston said it best: # Any change has the potential to draw complaints. For you it seems that "hey, # I upgraded to 9.5 and my logs are being rotated out every minute now. I # thought I had that turned off" is the desired complaint. Greg wants: "hey, my # 1 hour log rotation is now happening every minute". If the error message is # written correctly most people upon seeing the error will simply fix their # configuration and move on - regardless of whether they were proactive in # doing so having read the release notes. I particularly agree with his first sentence - any change can potentially draw complaints. But I also agree with his last one - of those three possible complaints, I certainly prefer "I had to fix my configuration file for the new, stricter validation" over any variant of "my configuration file still worked but it did something surprisingly different from what it used to do." YMMV. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
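To ground the log-rotation example from the quoted text, the surprising case looks like this (a sketch; log_rotation_age is, if I have the units right, measured in minutes):

    # postgresql.conf
    log_rotation_age = 30s   # less than one unit (1min)
    # today this truncates to 0 -- and 0 means rotation is disabled,
    # which is the surprise under debate; the competing proposals are
    # to round such values up to 1min instead, or to reject the setting
    # outright with an error
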
Re: [HACKERS] Scaling shared buffer eviction
On 2014-09-25 09:34:57 -0500, Merlin Moncure wrote: > On Thu, Sep 25, 2014 at 9:14 AM, Andres Freund wrote: > >> Why stop at 128 mapping locks? Theoretical downsides to having more > >> mapping locks have been mentioned a few times but has this ever been > >> measured? I'm starting to wonder if the # mapping locks should be > >> dependent on some other value, perhaps the # of shared buffers... > > > > Wrong way round. You need to prove the upside of increasing it further, > > not the contrary. The primary downside is cache hit ratio and displacing > > other cache entries... > > I can't do that because I don't have the hardware. One interesting part of this is making sure it doesn't regress older/smaller machines. So at least that side you could check... > what's wrong with trying it out? If somebody is willing to do it: nothing. I'd just much rather do the, by now proven, simple change before starting with more complex solutions. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Thu, Sep 25, 2014 at 10:24 AM, Andres Freund wrote: > On 2014-09-25 10:22:47 -0400, Robert Haas wrote: >> On Thu, Sep 25, 2014 at 10:14 AM, Andres Freund >> wrote: >> > That leads me to wonder: Have you measured different, lower, numbers of >> > buffer mapping locks? 128 locks is, if we align them properly as we >> > should, 8KB of memory. Common L1 cache sizes are around 32k... >> >> Amit has some results upthread showing 64 being good, but not as good >> as 128. I haven't verified that myself, but have no reason to doubt >> it. > > How about you push the spinlock change and I crosscheck the partition > number on a multi-socket x86 machine? Seems worthwhile to make sure that > it doesn't cause problems on x86. I seriously doubt it'll, but ... OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Thu, Sep 25, 2014 at 9:14 AM, Andres Freund wrote: >> Why stop at 128 mapping locks? Theoretical downsides to having more >> mapping locks have been mentioned a few times but has this ever been >> measured? I'm starting to wonder if the # mapping locks should be >> dependent on some other value, perhaps the # of shared buffers... > > Wrong way round. You need to prove the upside of increasing it further, > not the contrary. The primary downside is cache hit ratio and displacing > other cache entries... I can't do that because I don't have the hardware. I wasn't suggesting to just set it but to measure the effects of setting it. But the benefits from going from 16 to 128 are pretty significant at least on this hardware; I'm curious how much further it can be pushed...what's wrong with trying it out? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 2014-09-25 10:09:30 -0400, Robert Haas wrote: > I think the long-term solution here is that we need a lock-free hash > table implementation for our buffer mapping tables, because I'm pretty > sure that just cranking the number of locks up and up is going to > start to have unpleasant side effects at some point. We may be able > to buy a few more years by just cranking it up, though. I think mid to long term we actually need something other than a hashtable: something capable of efficiently checking for the existence of 'neighboring' buffers, so we can intelligently prefetch far enough ahead that the read actually completes when we get there. Also, I'm pretty sure that we'll need a way to efficiently remove all buffers for a relfilenode from shared buffers - linearly scanning for that isn't a good solution. So I think we need a different data structure. I've played around a bit with replacing buf_table.c with a custom hand-rolled hashtable, because I've seen more than one production workload where hash_search_with_hash_value() is the #1 entry in profiles, in both CPU time and cache misses, with most calls coming from the buffer mapping and then from the lock manager. There are two reasons for that: a) dynahash just isn't very good, and it does a lot of things that will never be necessary for these hashes; b) the key into the hash table is *far* too wide. A significant portion of the time is spent comparing buffer/lock tags. The aforementioned replacement hash table was a good bit faster for fully cached workloads - but at the time I wrote it, I could still make it crash in very high cache pressure workloads, so that should be taken with a fair bit of salt. I think we can comparatively easily get rid of the tablespace in buffer tags. Getting rid of the database would already be a fair bit harder. I haven't really come up with an idea for how to remove the fork number without making the catalog much more complicated. I don't think we can go too long without at least some of these steps :(. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
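For reference, the "far too wide" key in question is the buffer tag: five 4-byte fields, all of which must be hashed on every lookup and compared on every probe (lightly abridged from storage/relfilenode.h and storage/buf_internals.h):

    typedef struct RelFileNode
    {
        Oid     spcNode;    /* tablespace */
        Oid     dbNode;     /* database */
        Oid     relNode;    /* relation */
    } RelFileNode;

    typedef struct buftag
    {
        RelFileNode rnode;      /* physical relation identifier */
        ForkNumber  forkNum;    /* main/fsm/vm/init fork */
        BlockNumber blockNum;   /* block number within the fork */
    } BufferTag;

Dropping the tablespace (and, harder, the database and fork number) from the tag is what would shrink both the hashing and the comparison cost Andres describes.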
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Sep 25, 2014 at 10:12 AM, Simon Riggs wrote: > IMHO it is impossible to know if any of the other code is correct > until we have a clear and stable vision of what the command is > supposed to perform. +1. > The inner workings are less important than what the feature does. +1. > FWIW, the row available at the end of all BEFORE triggers is clearly > the object we should be manipulating, not the original VALUES() > clause. Otherwise this type of INSERT would behave differently from > normal INSERTs. Which would likely violate RLS, if nothing else. +1. > Surely if there are multiple unique indexes then the result row must > be validated against all unique indexes before it is allowed at all? > > The only problem I see is if the newly inserted row matches one row on > one unique value and a different row on a different unique index. > Turning the INSERT into an UPDATE will still fail on one or the other, no > matter which index you pick. If there is one row for ALL unique > indexes then it is irrelevant which index you pick. So either way, I > cannot see a reason to specify an index. Failure could be the right thing in some cases. For example, imagine that a user has a table containing names, email addresses, and (with apologies for the American-ism, but I don't know what would be comparable elsewhere) social security numbers. The user has unique indexes on both email addresses and SSNs. If a new record arrives for the same email address, they want to replace the existing record; but if a new record arrives with the same SSN, they want the transaction to fail. Otherwise, a newly-arrived record might overwrite the email address of an existing record, which they never want to do, because they view email address as the primary key. I think this kind of scenario will actually be pretty common. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
On 22 September 2014 14:46, Heikki Linnakangas wrote: > On 09/22/2014 04:45 PM, Tom Lane wrote: >> >> Heikki Linnakangas writes: >>> >>> On 09/22/2014 07:47 AM, Rajeev rastogi wrote: So my proposal is to skip the condition check on the first scan key condition for every tuple. >> >> >>> The same happens in a single-column case. If you have a query like >>> "SELECT * FROM tbl2 where id2 > 'a'", once you've found the start >>> position of the scan, you know that all the rows that follow match too. >> >> >> ... unless you're doing a backwards scan. > > > Sure. And you have to still check for NULLs. Have to get the details right.. And also that the tests don't use volatile functions. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS Design
Thom, * Thom Brown (t...@linux.com) wrote: > I find it a bit of a limitation that I can't specify both INSERT and > UPDATE for a policy. I'd want to be able to specify something like > this: > > CREATE POLICY no_greys_allowed > ON colours > FOR INSERT, UPDATE > WITH CHECK (name NOT IN ('grey','gray')); > > I would expect this to be rather common to prevent certain values > making their way into a table. Instead I'd have to create 2 policies > as it stands. That's not actually the case... CREATE POLICY no_greys_allowed ON colours FOR ALL USING (true) -- assuming this is what you intended WITH CHECK (name NOT IN ('grey','gray')); Right? That said, I'm not against the idea of supporting multiple commands with one policy (similar to how ALL is done). It wouldn't be difficult or much of a change: make the 'cmd' a bitfield instead. If others feel the same then I'll look at doing that. > In order to debug issues with accessing table data, perhaps it would > be useful to output the name of the policy that was violated. If a > table had 20 policies on, it could become time-consuming to debug. Good point. That'll involve a bit more as I'll need to look at the existing with check options structure, but I believe it's just adding the field to the structure, populating it when adding the WCO entries, and then checking for it in the ereport() call. The policy name is already stashed in the relcache entry, so it's already pretty easily available. > I keep getting tripped up by overlapping policies. On the one hand, I > created a policy to ensure rows being added or selected have a > "visible" column set to true. On the other hand, I have a policy that > ensures that the name of a colour doesn't appear in a list. Policy 1 > is violated until policy 2 is added: > > (using the table I created in a previous post on this thread...) > > # create policy must_be_visible ON colours for all to joe using > (visible = true) with check (visible = true); > CREATE POLICY > > \c - joe > > > insert into colours (name, visible) values ('pink',false); > ERROR: new row violates WITH CHECK OPTION for "colours" > DETAIL: Failing row contains (28, pink, f). > > \c - thom > > # create policy no_greys_allowed on colours for insert with check > (name not in ('grey','gray')); > CREATE POLICY > > \c - joe > > # insert into colours (name, visible) values ('pink',false); > INSERT 0 1 > > I expected this to still trigger an error due to the first policy. Am > I to infer from this that the policy model is permissive rather than > restrictive? That's correct and I believe pretty clear in the documentation: policies are OR'd together, just the same as how roles are handled. As a logged-in user, you have the rights of all of the roles you are a member of (subject to inheritance rules, of course), and similarly, you are able to view and add all rows which match any policy which applies to you (either through role membership or through different policies). > I've also attached a few corrections for the docs. Thanks! I'll plan to include these with a few other typos and the fix for the bug that Andres pointed out, once I finish testing (and doing another CLOBBER_CACHE_ALWAYS run..). Thanks again, Stephen
Re: [HACKERS] Scaling shared buffer eviction
On 2014-09-25 10:22:47 -0400, Robert Haas wrote: > On Thu, Sep 25, 2014 at 10:14 AM, Andres Freund > wrote: > > That leads me to wonder: Have you measured different, lower, numbers of > > buffer mapping locks? 128 locks is, if we align them properly as we > > should, 8KB of memory. Common L1 cache sizes are around 32k... > > Amit has some results upthread showing 64 being good, but not as good > as 128. I haven't verified that myself, but have no reason to doubt > it. How about you push the spinlock change and I crosscheck the partition number on a multi-socket x86 machine? Seems worthwhile to make sure that it doesn't cause problems on x86. I seriously doubt it'll, but ... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Thu, Sep 25, 2014 at 10:14 AM, Andres Freund wrote: > That leads me to wonder: Have you measured different, lower, numbers of > buffer mapping locks? 128 locks is, if we align them properly as we > should, 8KB of memory. Common L1 cache sizes are around 32k... Amit has some results upthread showing 64 being good, but not as good as 128. I haven't verified that myself, but have no reason to doubt it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 2014-09-25 09:02:25 -0500, Merlin Moncure wrote: > On Thu, Sep 25, 2014 at 8:51 AM, Robert Haas wrote: > > 1. To see the effect of reduce-replacement-locking.patch, compare the > > first TPS number in each line to the third, or the second to the > > fourth. At scale factor 1000, the patch wins in all of the cases with > > 32 or more clients and exactly half of the cases with 1, 8, or 16 > > clients. The variations at low client counts are quite small, and the > > patch isn't expected to do much at low concurrency levels, so that's > > probably just random variation. At scale factor 3000, the situation > > is more complicated. With only 16 bufmappinglocks, the patch gets its > > biggest win at 48 clients, and by 96 clients it's actually losing to > > unpatched master. But with 128 bufmappinglocks, it wins - often > > massively - on everything but the single-client test, which is a small > > loss, hopefully within experimental variation. > > > > Comments? > > Why stop at 128 mapping locks? Theoretical downsides to having more > mapping locks have been mentioned a few times but has this ever been > measured? I'm starting to wonder if the # mapping locks should be > dependent on some other value, perhaps the # of shared buffers... Wrong way round. You need to prove the upside of increasing it further, not the contrary. The primary downside is cache hit ratio and displacing other cache entries... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 2014-09-25 09:51:17 -0400, Robert Haas wrote: > On Tue, Sep 23, 2014 at 5:50 PM, Robert Haas wrote: > > The patch I attached the first time was just the last commit in the > > git repository where I wrote the patch, rather than the changes that I > > made on top of that commit. So, yes, the results from the previous > > message are with the patch attached to the follow-up. I just typed > > the wrong git command when attempting to extract that patch to attach > > it to the email. > > Here are some more results. TL;DR: The patch still looks good, but we > should raise the number of buffer mapping partitions as well. > So I'm inclined to (a) push > reduce-replacement-locking.patch and then also (b) bump up the number > of buffer mapping locks to 128 (and increase MAX_SIMUL_LWLOCKS > accordingly so that pg_buffercache doesn't get unhappy). I'm happy with that. I don't think it's likely that a moderate increase in the number of mapping LWLocks will be noticeably bad for any workload. One difference is that the total number of LWLock acquisitions will be a bit higher because currently it's more likely for the old and new to fall into different partitions. But that's not really significant. The other difference is the number of cachelines touched. Currently, in concurrent workloads, there are already lots of L1 cache misses around the buffer mapping locks because they're exclusively owned by a different core/socket. So, to be effectively worse, the increase would need to lead to lower overall cache hit rates by them not being in *any* cache or displacing other content. That leads me to wonder: Have you measured different, lower, numbers of buffer mapping locks? 128 locks is, if we align them properly as we should, 8KB of memory. Common L1 cache sizes are around 32k... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inefficient barriers on solaris with sun cc
On Thu, Sep 25, 2014 at 9:34 AM, Andres Freund wrote: > Binaries compiled on solaris using sun studio cc currently don't have > compiler and memory barriers implemented. That means we fall back to > relatively slow generic implementations for those. Especially compiler, > read, write barriers will be much slower than necessary (since they all > just need to prevent compiler reordering as both sparc and x86 are run > in TSO mode under solaris). > > Since my estimate is that we'll use more and more barriers, that's going > to hurt more and more. > > I do *not* plan to do anything about it atm, I just thought it might be > helpful to have this stated somewhere searchable. To put that another way: If there are any Sun Studio users out there who care about performance on big iron, please send a patch to fix this... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
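A sketch of what such a patch might contain, for anyone motivated (the mbarrier.h intrinsics are, as far as I know, available from Sun Studio 12 onwards, but treat the header and the exact names as assumptions to verify):

    #include <mbarrier.h>

    /* full fence */
    #define pg_memory_barrier()    __machine_rw_barrier()

    /* under TSO (sparc and x86 on Solaris), read and write barriers only
     * need to stop compiler reordering, per the observation above */
    #define pg_read_barrier()      __compiler_barrier()
    #define pg_write_barrier()     __compiler_barrier()
    #define pg_compiler_barrier()  __compiler_barrier()

That would replace the slow generic fallbacks with what are, for the read/write cases, zero-instruction barriers.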
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 28 August 2014 03:43, Peter Geoghegan wrote: > The patch currently lacks a way of referencing datums rejected for > insertion when updating. The way MySQL handles the issue seems > questionable. They allow you to do something like this: > > INSERT INTO upsert (key, val) VALUES (1, 'val') ON DUPLICATE KEY UPDATE > val = VALUES(val); > > The implication is that the updated value comes from the INSERT's > VALUES() list, but emulating that seems like a bad idea. In general, > at least with Postgres it's entirely possible that values rejected > differ from the values appearing in the VALUES() list, due to the > effects of before triggers. I'm not sure whether or not we should > assume equivalent transformations during any UPDATE before triggers. > > This is an open item. I think it makes sense to deal with it a bit later. IMHO it is impossible to know if any of the other code is correct until we have a clear and stable vision of what the command is supposed to perform. The inner workings are less important than what the feature does. FWIW, the row available at the end of all BEFORE triggers is clearly the object we should be manipulating, not the original VALUES() clause. Otherwise this type of INSERT would behave differently from normal INSERTs. Which would likely violate RLS, if nothing else. > As I mentioned, I have incorporated feedback from Kevin Grittner. You > may specify a unique index to merge on from within the INSERT > statement, thus avoiding the risk of inadvertently having the update > affect the wrong tuple due to the user failing to consider that there > was a would-be unique violation within some other unique index > constraining some other attribute. You may write the DML statement > like this: > > INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT WITHIN > upsert_pkey UPDATE SET val = 'update'; > > I think that there is a good chance that at least some people will > want to make this mandatory. I guess that's fair enough, but I > *really* don't want to *mandate* that users specify the name of their > unique index in DML for obvious reasons. Perhaps we can come up with a > more tasteful syntax that covers all interesting cases (consider the > issues with partial unique indexes and before triggers for example, > where a conclusion reached about which index to use during parse > analysis may subsequently be invalidated by user-defined code, or > ambiguous specifications in the face of overlapping attributes between > two unique composite indexes, etc). The Right Thing is far from > obvious, and there is very little to garner from other systems, since > SQL MERGE promises essentially nothing about concurrency, both as > specified by the standard and in practice. You don't need a unique > index at all, and as I showed in my pgCon talk, there are race > conditions even for trivial UPSERT operations in all major SQL MERGE > implementations. Surely if there are multiple unique indexes then the result row must be validated against all unique indexes before it is allowed at all? The only problem I see is if the newly inserted row matches one row on one unique value and a different row on a different unique index. Turning the INSERT into an UPDATE will still fail on one or the other, no matter which index you pick. If there is one row for ALL unique indexes then it is irrelevant which index you pick. So either way, I cannot see a reason to specify an index. 
If we do need such a construct, we already have the concept of an IDENTITY for a table, added in 9.4, currently targeted at replication. Listing indexes or columns in the DML statement is more pushups for developers and ORMs, so let's KISS. The way forward, in my view, is to define precisely the behaviour we wish to have. That definition will include the best current mechanism for running an UPSERT using INSERT/UPDATE/loops, comparing that against what is being provided here. We will then have a functional test of equivalence of the approaches, and a basis for making a performance test that shows that performance is increased without any loss of concurrency. Once we have that, we can then be certain our time spent on internals is not wasted by overlooking a simple userland gotcha. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgcrypto: PGP armor headers
On 9/25/14 4:08 PM, Heikki Linnakangas wrote:
> On 09/25/2014 04:56 PM, Marko Tiikkaja wrote:
>> On 9/25/14 3:50 PM, Heikki Linnakangas wrote:
>>> On 09/10/2014 04:35 PM, Marko Tiikkaja wrote:
>>> It might've been a tad more efficient to return the StringInfo buffer directly from pgp_armor/dearmor, and avoid the extra palloc and memcpy, but this isn't performance critical enough for it to really matter.
>>
>> I couldn't see any way of doing that without breaking the VARDATA abstraction. I even went looking for similar cases in the source code, but couldn't find any. If you can come up with a way, feel free to change that -- I'd like to learn myself.
>
> You could first append VARHDRSZ zeros to the StringInfo, then append the base64-encoded data, and last replace the zeros with the real length, using SET_VARSIZE.

That assumes that VARDATA() starts exactly VARHDRSZ bytes into the datum. I couldn't find any callers making that assumption.

.marko
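To make the suggestion concrete, here is a minimal sketch of the approach Heikki describes, assuming the usual varlena layout where VARDATA() starts exactly VARHDRSZ bytes into the datum (the function name and its caller are illustrative, not from the patch):

#include "postgres.h"
#include "lib/stringinfo.h"

/*
 * Sketch only: build a bytea directly in a StringInfo, avoiding the
 * extra palloc and memcpy.  Relies on VARDATA() being exactly VARHDRSZ
 * bytes past the start of the datum.
 */
static bytea *
build_bytea_result(const char *payload, int payload_len)
{
	StringInfoData buf;

	initStringInfo(&buf);
	/* Reserve placeholder bytes where the varlena header will go. */
	appendStringInfoSpaces(&buf, VARHDRSZ);
	/* Append the payload (the base64-encoded armor, in this case). */
	appendBinaryStringInfo(&buf, payload, payload_len);
	/* Overwrite the placeholder with the real total length. */
	SET_VARSIZE(buf.data, buf.len);
	return (bytea *) buf.data;
}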
Re: [HACKERS] Scaling shared buffer eviction
On Thu, Sep 25, 2014 at 10:02 AM, Merlin Moncure wrote:
> On Thu, Sep 25, 2014 at 8:51 AM, Robert Haas wrote:
>> 1. To see the effect of reduce-replacement-locking.patch, compare the first TPS number in each line to the third, or the second to the fourth. At scale factor 1000, the patch wins in all of the cases with 32 or more clients and exactly half of the cases with 1, 8, or 16 clients. The variations at low client counts are quite small, and the patch isn't expected to do much at low concurrency levels, so that's probably just random variation. At scale factor 3000, the situation is more complicated. With only 16 bufmappinglocks, the patch gets its biggest win at 48 clients, and by 96 clients it's actually losing to unpatched master. But with 128 bufmappinglocks, it wins - often massively - on everything but the single-client test, which is a small loss, hopefully within experimental variation.
>>
>> Comments?
>
> Why stop at 128 mapping locks? Theoretical downsides to having more mapping locks have been mentioned a few times, but has this ever been measured? I'm starting to wonder if the # of mapping locks should be dependent on some other value, perhaps the # of shared buffers...

Good question. My belief is that the number of buffer mapping locks required to avoid serious contention will be roughly proportional to the number of hardware threads. At the time the value 16 was chosen, there were probably no more than 8-core CPUs in common use; but now we've got a machine with 64 hardware threads and, what do you know, it wants 128 locks.

I think the long-term solution here is that we need a lock-free hash table implementation for our buffer mapping tables, because I'm pretty sure that just cranking the number of locks up and up is going to start to have unpleasant side effects at some point. We may be able to buy a few more years by just cranking it up, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
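For readers following along, a simplified sketch of the partitioned locking scheme being tuned here -- each buffer-tag hash picks one of N locks, so raising N from 16 to 128 spreads lookups across more locks. Plain pthread rwlocks stand in for PostgreSQL's LWLocks, and all names are illustrative, not the actual source:

#include <stdint.h>
#include <pthread.h>

#define NUM_PARTITIONS 128			/* the knob being benchmarked: 16 vs. 128 */

static pthread_rwlock_t partition_locks[NUM_PARTITIONS];

/*
 * Map a buffer-tag hash to its partition lock.  A power-of-two count
 * lets the modulo operation reduce to a mask.
 */
static pthread_rwlock_t *
partition_lock_for(uint32_t hashcode)
{
	return &partition_locks[hashcode & (NUM_PARTITIONS - 1)];
}

Lookups take the partition lock shared and mapping changes take it exclusive, so with more partitions, concurrent lookups of different buffers collide on the same lock less often -- which is why the 64-thread machine above wants 128 of them.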
Re: [HACKERS] pgcrypto: PGP armor headers
On 09/25/2014 04:56 PM, Marko Tiikkaja wrote:
> On 9/25/14 3:50 PM, Heikki Linnakangas wrote:
>> On 09/10/2014 04:35 PM, Marko Tiikkaja wrote:
>> It might've been a tad more efficient to return the StringInfo buffer directly from pgp_armor/dearmor, and avoid the extra palloc and memcpy, but this isn't performance critical enough for it to really matter.
>
> I couldn't see any way of doing that without breaking the VARDATA abstraction. I even went looking for similar cases in the source code, but couldn't find any. If you can come up with a way, feel free to change that -- I'd like to learn myself.

You could first append VARHDRSZ zeros to the StringInfo, then append the base64-encoded data, and last replace the zeros with the real length, using SET_VARSIZE.

>> Are you planning to post the main patch rebased on top of this soon? As in the next day or two? Otherwise I'll mark this as "Returned with feedback" for this commitfest.
>
> Yes. With good luck I'll get you a rebased one today; otherwise it'll have to wait until tomorrow.

Ok, I'll leave this in Waiting on Author state then.

- Heikki
Re: [HACKERS] Scaling shared buffer eviction
On Thu, Sep 25, 2014 at 8:51 AM, Robert Haas wrote:
> 1. To see the effect of reduce-replacement-locking.patch, compare the first TPS number in each line to the third, or the second to the fourth. At scale factor 1000, the patch wins in all of the cases with 32 or more clients and exactly half of the cases with 1, 8, or 16 clients. The variations at low client counts are quite small, and the patch isn't expected to do much at low concurrency levels, so that's probably just random variation. At scale factor 3000, the situation is more complicated. With only 16 bufmappinglocks, the patch gets its biggest win at 48 clients, and by 96 clients it's actually losing to unpatched master. But with 128 bufmappinglocks, it wins - often massively - on everything but the single-client test, which is a small loss, hopefully within experimental variation.
>
> Comments?

Why stop at 128 mapping locks? Theoretical downsides to having more mapping locks have been mentioned a few times, but has this ever been measured? I'm starting to wonder if the # of mapping locks should be dependent on some other value, perhaps the # of shared buffers...

merlin
Re: [HACKERS] missing isinf declaration on solaris
On 2014-09-25 10:56:56 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>> From VS 2013 onwards they're trying hard to be C99 and C11 compatible.
>
> Sounds great. Is VS2013 released already?

Yes.

> If so, maybe we can think about moving to C99 in 2016 or so; at least assuming you can build for older Windows versions with the new compiler.

I think we should simply pick and choose the features we want, and go for it now. Leaving aside the fact that one of them wasn't applied, I'd sent patches a couple of months back that would allow setting up a buildfarm animal that fails for any patch going beyond C89 plus the chosen features.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
On 2014-09-25 14:43:14 +0100, Simon Riggs wrote:
> On 25 September 2014 10:41, Andres Freund wrote:
>> On 2014-09-25 10:24:39 +0530, Abhijit Menon-Sen wrote:
>>> At 2014-09-24 11:09:24 +0200, and...@2ndquadrant.com wrote:
>>>> I think it's completely unacceptable to copy a visibility routine.
>>>
>>> OK. Which visibility routine should I use, and should I try to create a variant that doesn't set hint bits?
>>
>> I've not yet followed your premise that you actually need one that doesn't set hint bits...
>
> Not least because I'm trying to solve a similar problem on another thread, so no need to make a special case here.

That's mostly unrelated, though. Abhijit tried to avoid setting hint bits because he wanted to avoid taking *any* form of lock on the buffer. Since I don't believe that's safe (at least there's by far not enough evidence that it is), there's simply no reason to avoid setting them.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
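For context, a minimal sketch of the pattern Andres is defending -- calling a visibility routine only while holding at least a shared content lock on the buffer, since the check may write hint bits into the tuple header. Function and variable names are illustrative, not the patch's actual code:

#include "postgres.h"
#include "access/heapam.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/tqual.h"

/* Sketch only: count dead tuples in one heap block. */
static int
count_dead_in_block(Relation rel, BlockNumber blkno, TransactionId OldestXmin)
{
	Buffer		buf = ReadBuffer(rel, blkno);
	Page		page;
	OffsetNumber offnum,
				maxoff;
	int			ndead = 0;

	/* Shared content lock: enough to read tuples and to set hint bits. */
	LockBuffer(buf, BUFFER_LOCK_SHARE);
	page = BufferGetPage(buf);
	maxoff = PageGetMaxOffsetNumber(page);

	for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum++)
	{
		ItemId		itemid = PageGetItemId(page, offnum);
		HeapTupleData tuple;

		if (!ItemIdIsNormal(itemid))
			continue;

		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
		tuple.t_len = ItemIdGetLength(itemid);
		tuple.t_tableOid = RelationGetRelid(rel);
		ItemPointerSet(&tuple.t_self, blkno, offnum);

		/* May write hint bits on the tuple -- hence the buffer lock. */
		if (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf) == HEAPTUPLE_DEAD)
			ndead++;
	}

	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
	ReleaseBuffer(buf);

	return ndead;
}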
Re: [HACKERS] missing isinf declaration on solaris
Andres Freund wrote:
> On 2014-09-24 17:39:19 -0300, Alvaro Herrera wrote:
>> AFAIK we cannot move all the way to C99, because MSVC doesn't support it.
>
> FWIW, msvc has supported a good part of C99 for a long while. There are bits and pieces it doesn't, but not things I think we're likely to adopt. The most commonly complained-about one is C99 variable declarations. I can't see PG adopting that tomorrow.

I guess I got unlucky, then, to have been bitten by precisely that missing feature twice.

> From VS 2013 onwards they're trying hard to be C99 and C11 compatible.

Sounds great. Is VS2013 released already? If so, maybe we can think about moving to C99 in 2016 or so; at least assuming you can build for older Windows versions with the new compiler.

--
Álvaro Herrera                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
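To illustrate the specific feature mentioned above, a trivial before/after sketch of C89-style declarations versus the C99 mixed declarations the tree does not currently allow:

/* C89: every declaration at the start of the block. */
static int
sum_c89(const int *vals, int n)
{
	int		i;
	int		total = 0;

	for (i = 0; i < n; i++)
		total += vals[i];
	return total;
}

/* C99: declarations may appear where first used, e.g. inside for(). */
static int
sum_c99(const int *vals, int n)
{
	int		total = 0;

	for (int i = 0; i < n; i++)
		total += vals[i];
	return total;
}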