Re: [HACKERS] Adding new joining algorithm to postgresql

2013-07-19 Thread Amit kapila

On Friday, July 19, 2013 7:17 PM tubadzin wrote:

> Hi. I'm a little confused.   1. I have the 9.2.4 version source code from 
> http://www.postgresql.org/ftp/source/   
> 2. I want to add a new algorithm to index nested loop join, merge join and hash 
> join. 
> I have an executor directory in src containing nodeHash.c, nodeHashjoin.c, 
> nodeMergejoin.c and nodeNestloop.c   
> 3. After the changes, I want to compile PostgreSQL and use it.   
> 4. The problem is: a) I do not know which library is responsible for this 
> functionality. 
> I understand that I have to compile the source and replace the library (I don't know 
> which one) in the path where PostgreSQL is installed: C:\Program Files 
> (x86)\PostgreSQL\9.2   

  I think you would need to copy postgres.exe.
  Ideally you should copy all the libraries that changed as a result of your 
source code change.
  At the link below, you can even find how to create an installation from source.
 
> b) I don't know how to use the files/library (which library?) with Visual Studio 2010 
> and how to compile it.   
You can find instructions for building on Windows at the link below:
http://www.postgresql.org/docs/devel/static/install-windows.html

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Allow background workers to be started dynamically.

2013-07-19 Thread Alvaro Herrera
Andres Freund wrote:

> On 2013-07-19 08:23:25 -0400, Robert Haas wrote:

> > And I'd also propose getting rid
> > of bgw_sighup and bgw_sigterm in both branches, while we're at it.
> > AFAICT, they don't add any functionality, and they're basically
> > unusable for dynamically started background workers.  Probably better
> > not to get people used to using them.
> 
> I don't have a problem with getting rid of those, it's easy enough to
> register them inside the worker and it's safe since we start with
> blocked signals. I seem to remember some discussion about why they were
> added but I can't find a reference anymore. Alvaro, do you remember?

I left them there because it was easy; but they were absolutely
necessary only until we decided that we would start the worker's main
function with signals blocked.  I don't think there's any serious reason
not to remove them now.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE EVENT TRIGGER syntax

2013-07-19 Thread Dimitri Fontaine
Joe Abbate  writes:
> What is the purpose of the [ AND ... ] at the end of the WHEN clause?
> Is that for later releases, when presumably additional filter_variables
> will be introduced?  Right now, if I add "AND tag IN ..." I get an

Yes. I had other filter variables in some versions of the patch, but
we have yet to agree on a design for the things I wanted to solve with
them.

See http://www.postgresql.org/message-id/m2txrsdzxa@2ndquadrant.fr
for a worked-out example of the CONTEXT part of the Event Trigger
proposal.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [ODBC] [HACKERS] getting rid of SnapshotNow

2013-07-19 Thread Hiroshi Inoue
(2013/07/20 1:11), Tom Lane wrote:
> Andres Freund  writes:
>> On 2013-07-20 00:49:11 +0900, Hiroshi Inoue wrote:
>>> Using SnapshotSelf instead of SnapshotNow for currtid_() wouldn't
>>> matter.
> 
>> I think it actually might. You could get into dicey situations if you
>> use currtid_ in a query performing updates or inserts because it would
>> see the to-be-inserted tuple...
> 
> I'm pretty sure Hiroshi-san was only opining about whether it would
> matter for ODBC's usage.  IIUC, ODBC is using this function to re-fetch
> rows that it inserted, updated, or at least selected-for-update in a
> previous command of the current transaction, so actually any snapshot
> would do fine.
> 
> In any case, since I moved the goalposts by suggesting that SnapshotSelf
> is just as dangerous as SnapshotNow, what we need to know is whether
> it'd be all right to change this code to use a fresh MVCC snapshot;
> and if not, why not.  It's pretty hard to see a reason why client-side
> code would want to make use of the results of a non-MVCC snapshot.

OK, I agree with replacing SnapshotNow with an MVCC snapshot for currtid_xx().
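For context, ODBC's re-fetch pattern looks roughly like the sketch below. currtid2() takes a table name and a previously obtained ctid and returns the row's current ctid; the table name and tid value here are illustrative, not from the thread:

```sql
-- Re-fetch a row previously inserted/updated in this transaction, the way
-- the ODBC driver does.  'mytable' and '(0,1)' are illustrative only.
SELECT currtid2('mytable', '(0,1)'::tid);   -- the row's current ctid
SELECT * FROM mytable
 WHERE ctid = currtid2('mytable', '(0,1)'::tid);
```

Under the change discussed here, that lookup would simply run under a fresh MVCC snapshot instead of SnapshotNow.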

regards,
Hiroshi Inoue





Re: [HACKERS] Preventing tuple-table leakage in plpgsql

2013-07-19 Thread Tom Lane
Noah Misch  writes:
> On Fri, Jul 05, 2013 at 02:47:06PM -0400, Tom Lane wrote:
>> So I'm inclined to propose that SPI itself should offer some mechanism
>> for cleaning up tuple tables at subtransaction abort.  We could just
>> have it automatically throw away tuple tables made in the current
>> subtransaction, or we could allow callers to exercise some control,
>> perhaps by calling a function that says "don't reclaim this tuple table
>> automatically".  I'm not sure if there's any real use-case for such a
>> call though.

> I suppose that would be as simple as making spi_dest_startup() put the
> tuptabcxt under CurTransactionContext?  The function to prevent reclamation
> would then just do a MemoryContextSetParent().

I experimented with this, and found out that it's not quite that simple.
In a SPI procedure that hasn't created a subtransaction (eg, a plpgsql
function without an exception block), if we attach tuple tables to the
outer transaction's CurTransactionContext then they fail to go away
during SPI_finish(), creating leaks in code that was previously correct.

However, we can use your idea when running inside a subtransaction,
while still attaching the tuple table to the procedure's own procCxt
when no subtransaction is involved.  The attached draft patch does it
that way, and successfully resolves the complained-of leakage case.

I like this better than what I originally had in mind, because it
produces no change in semantics in the case where a SPI procedure
doesn't create any subtransactions, which I imagine is at least 99.44%
of third-party SPI-using code.

Also, because the tuple tables don't go away until
CleanupSubTransaction(), code that explicitly frees them will still work
as long as it does that before rolling back the subxact.  Thus, the
patch's changes to remove SPI_freetuptable() calls in
plpy_cursorobject.c are not actually necessary for correctness; it's
just that we no longer need them.  Ditto for the change in
AtEOSubXact_SPI.  Unfortunately, the change in pl_exec.c *is* necessary
to avoid a coredump, because that call was being made after rolling back
the subxact.

All in all, the risk of breaking anything outside core code seems very
small, and I'm inclined to think that we don't need to provide an API
function to reparent a tuple table.  But having said that, I'm also
inclined to not back-patch this further than 9.3, just in case.

The patch still requires documentation changes, and there's also one
more error-cleanup SPI_freetuptable() call that ought to go away, in
PLy_spi_execute_fetch_result.  But that one needs the fix posited in
http://www.postgresql.org/message-id/21017.1374273...@sss.pgh.pa.us
first.

Comments?

regards, tom lane

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index f30b2cd93330d72e0a412d6517aec08c96cc753e..3f4d1e625d738ad0c08d0245a0169451d29a631d 100644
*** a/src/backend/executor/spi.c
--- b/src/backend/executor/spi.c
*** AtEOSubXact_SPI(bool isCommit, SubTransa
*** 284,291 
  	{
  		/* free Executor memory the same as _SPI_end_call would do */
  		MemoryContextResetAndDeleteChildren(_SPI_current->execCxt);
! 		/* throw away any partially created tuple-table */
! 		SPI_freetuptable(_SPI_current->tuptable);
  		_SPI_current->tuptable = NULL;
  	}
  }
--- 284,291 
  	{
  		/* free Executor memory the same as _SPI_end_call would do */
  		MemoryContextResetAndDeleteChildren(_SPI_current->execCxt);
! 		/* forget any partially created tuple-table */
! 		/* (we don't need to free it, subxact cleanup will do so) */
  		_SPI_current->tuptable = NULL;
  	}
  }
*** void
*** 1641,1648 
  spi_dest_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
  {
  	SPITupleTable *tuptable;
! 	MemoryContext oldcxt;
  	MemoryContext tuptabcxt;
  
  	/*
  	 * When called by Executor _SPI_curid expected to be equal to
--- 1641,1649 
  spi_dest_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
  {
  	SPITupleTable *tuptable;
! 	MemoryContext parentcxt;
  	MemoryContext tuptabcxt;
+ 	MemoryContext oldcxt;
  
  	/*
  	 * When called by Executor _SPI_curid expected to be equal to
*** spi_dest_startup(DestReceiver *self, int
*** 1656,1669 
  	if (_SPI_current->tuptable != NULL)
  		elog(ERROR, "improper call to spi_dest_startup");
  
! 	oldcxt = _SPI_procmem();	/* switch to procedure memory context */
  
! 	tuptabcxt = AllocSetContextCreate(CurrentMemoryContext,
  	  "SPI TupTable",
  	  ALLOCSET_DEFAULT_MINSIZE,
  	  ALLOCSET_DEFAULT_INITSIZE,
  	  ALLOCSET_DEFAULT_MAXSIZE);
! 	MemoryContextSwitchTo(tuptabcxt);
  
  	_SPI_current->tuptable = tuptable = (SPITupleTable *)
  		palloc(sizeof(SPITupleTable));
--- 1657,1681 
  	if (_SPI_current->tuptable != NULL)
  		elog(ERROR, "improper call to spi_dest_startup");
  
! 	/*
! 	 * If we're in the same subtransaction our SPI procedure was called in,
! 	 * attach the tuple tab

Re: [HACKERS] Proposal: template-ify (binary) extensions

2013-07-19 Thread Dimitri Fontaine
Markus Wanner  writes:
>>   - per-installation (not even per-cluster) DSO availability
>> 
>> If you install PostGIS 1.5 on a system, then it's just impossible to
>> bring another cluster (of the same PostgreSQL major version), let
> On Debian, that should be well possible. Certainly installing Postgres
> 9.1 w/ postgis-1.5 in parallel to Postgres 9.2 w/ postgis-2.0 is. I
> designed it to be.
>
> I think I'm misunderstanding the problem statement, here.

(of the same PostgreSQL major version)

> Can CREATE EXTENSION check if the standbys have the extension installed?
> And refuse creation, if they don't?

No, because we don't register standbys, so we don't know who they are,
and also because some of the things we see connected and using the
replication protocol could well be pg_basebackup or pg_receivexlog.

Also, it's possible that the standby is only there for high-availability
purposes and runs no user queries.

> I'm sure you are aware that even without this clear separation of roles,
> the guarantee means we provide an additional level of security against
> attackers.

Given lo_import() to upload a file from the client to the server, and then
LOAD with the absolute path where the file ended up imported (or any
untrusted PL, really), this argument carries no sensible weight in my
opinion.
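A minimal sketch of the bypass being described, as a psql session (superuser assumed; the file names are made up for illustration):

```sql
-- Client-side upload of an arbitrary shared object into pg_largeobject,
-- then write it out server-side and load it.  Requires superuser.
\lo_import '/home/client/evil.so'
SELECT lo_export(:LASTOID, '/tmp/evil.so');
LOAD '/tmp/evil.so';
```

`\lo_import` runs on the client and `lo_export()` on the server, so the "only root-writable locations" guarantee is already bypassable by a superuser today, which is the point being made.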

> None the less, the "safe by default" has served us well, I think.

That's true. We need to consider carefully the proposal at hand though.

It's all about allowing the backend to automatically load a file that it
finds within its own $PGDATA, so that we can have per-cluster and
per-database modules (DSO files).

The only difference from before is the location the file is read
from; the main security danger comes from the fact that we used to
consider only root-writable places and now propose to consider places
writable by the postgres bootstrap user.

Having the modules in different places on the system when serving as a
template and when instantiated allows us to solve a problem I
forgot to list:

  - upgrading an extension at the OS level

Once you've done that, any new backend will load the newer module
(DSO file), so you have to be really quick when installing a hot fix in
production, and the SQL definitions must be changed to match the new
module version…

With the ability to "instantiate" the module in a per-cluster,
per-database directory within $PGDATA, the new version of the DSO module
would only be put in place and loaded at ALTER EXTENSION UPDATE time.

I'm still OK with allowing those problems to be fixed only when a security
option that defaults to 'false' has been switched to 'true', by the way,
so that it's opt-in, but I will admit having a hard time swallowing
the threat model we're talking about…

>> I don't think the relaxed behaviour we're talking about is currently
>> possible to develop as an extension, by the way.
>
> It's extensions that undermine the guarantee, at the moment. But yeah,
> it depends a lot on what kind of "relaxed behavior" you have in mind.

The ability to load a module from a place other than the current
dynamic_library_path is what we're talking about here, and IIUC Robert
mentioned that maybe it could be left to an extension to allow for
changing the behavior.  I didn't look at that in detail, but I would be
surprised to be able to tweak that without having to patch the backend.

> If the sysadmin wants to disallow arbitrary execution of native code to
> postgres (the process), any kind of embedded compiler likely is equally
> unwelcome.

You already mentioned untrusted PL languages, and I don't see any
difference between offering PL/PythonU and PL/C on security grounds,
really.  I don't think we would consider changing the current model of
the "LANGUAGE C" PL; we would add a new "LANGUAGE PL/C" option, either
untrusted or with a nice sandbox.

The problem with sandboxing PL/C is that it makes it impossible to address
use cases such as the uuid or PostGIS extensions, even if it still allows
for hstore and other datatype-style extensions.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




[HACKERS] Isn't PLy_spi_execute_fetch_result completely broken?

2013-07-19 Thread Tom Lane
This function appears to believe that it can PG_CATCH any random error
and then just carry on, without doing a subtransaction cleanup.  This
is as wrong as can be.  It might be all right if we had sufficiently
narrow constraints on what could happen inside the PG_TRY block, but in
point of fact it's calling datatype input functions, which could do
anything whatsoever.  I think it needs to do PG_RE_THROW() not just
"return NULL" at the end of the CATCH.  It looks like both of the
existing callers have cleanup logic, so this should be sufficient.

regards, tom lane




Re: [HACKERS] GSOC13 proposal - extend RETURNING syntax

2013-07-19 Thread Karol Trzcionka
New version:
- fix returning "after" values when there are no "before" values
- add more regression tests
I'd like to hear any feedback ;)
Regards,
Karol
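For readers skimming the patch below, the syntax being proposed would allow queries like the following sketch (hypothetical: this is not valid in any released PostgreSQL, only under this patch):

```sql
-- Proposed RETURNING BEFORE./AFTER. syntax from this patch.
UPDATE weather
   SET temp_lo = temp_lo + 1
 WHERE city = 'San Francisco'
RETURNING BEFORE.temp_lo AS old_low,
          AFTER.temp_lo  AS new_low;
```

A plain column reference in RETURNING keeps its existing meaning (post-update, post-trigger value).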
diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 90b9208..eba35f0 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -194,12 +194,27 @@ UPDATE [ ONLY ] table_name [ * ] [
 output_expression
 
  
-  An expression to be computed and returned by the UPDATE
-  command after each row is updated.  The expression can use any
-  column names of the table named by table_name
-  or table(s) listed in FROM.
-  Write * to return all columns.
+  An expression to be computed and returned by the
+  UPDATE command either before or after (prefixed with
+  BEFORE. and AFTER.,
+  respectively) each row is updated.  The expression can use any
+  column names of the table named by table_name or table(s) listed in
+  FROM.  Write AFTER.* to return all 
+  columns after the update. Write BEFORE.* for all
+  columns before the update. Write * to return all
+  columns after update and all triggers fired (these values are in table
+  after command). You may combine BEFORE, AFTER and raw columns in the
+  expression.
  
+ 
+ Mixing table names or aliases named before or after with the
+ above will result in confusion and suffering.  If you happen to
+ have a table called before or
+ after, alias it to something else when using
+ RETURNING.
+ 
+
 

 
@@ -287,15 +302,16 @@ UPDATE weather SET temp_lo = temp_lo+1, temp_hi = temp_lo+15, prcp = DEFAULT
   
 
   
-   Perform the same operation and return the updated entries:
+   Perform the same operation and return information on the changed entries:
 
 
 UPDATE weather SET temp_lo = temp_lo+1, temp_hi = temp_lo+15, prcp = DEFAULT
   WHERE city = 'San Francisco' AND date = '2003-07-03'
-  RETURNING temp_lo, temp_hi, prcp;
+  RETURNING temp_lo AS new_low, temp_hi AS new_high, BEFORE.temp_hi/BEFORE.temp_low AS old_ratio, AFTER.temp_hi/AFTER.temp_low AS new_ratio prcp;
 
   
 
+
   
Use the alternative column-list syntax to do the same update:
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index d86e9ad..fafd311 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2335,7 +2335,7 @@ ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
 TupleTableSlot *
 ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 	 ResultRelInfo *relinfo,
-	 ItemPointer tupleid, TupleTableSlot *slot)
+	 ItemPointer tupleid, TupleTableSlot *slot, TupleTableSlot **planSlot)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 	HeapTuple	slottuple = ExecMaterializeSlot(slot);
@@ -2381,6 +2381,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 	if (newSlot != NULL)
 	{
 		slot = ExecFilterJunk(relinfo->ri_junkFilter, newSlot);
+		*planSlot = newSlot;
 		slottuple = ExecMaterializeSlot(slot);
 		newtuple = slottuple;
 	}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 15f5dcc..06ebaf3 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -603,7 +603,7 @@ ExecUpdate(ItemPointer tupleid,
 		resultRelInfo->ri_TrigDesc->trig_update_before_row)
 	{
 		slot = ExecBRUpdateTriggers(estate, epqstate, resultRelInfo,
-	tupleid, slot);
+	tupleid, slot, &planSlot);
 
 		if (slot == NULL)		/* "do nothing" */
 			return NULL;
@@ -737,6 +737,7 @@ lreplace:;
 		   hufd.xmax);
 	if (!TupIsNull(epqslot))
 	{
+		planSlot = epqslot;
 		*tupleid = hufd.ctid;
 		slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
 		tuple = ExecMaterializeSlot(slot);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index a896d76..409b4d1 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -1928,6 +1928,7 @@ range_table_walker(List *rtable,
 		{
 			case RTE_RELATION:
 			case RTE_CTE:
+			case RTE_BEFORE:
 /* nothing to do */
 break;
 			case RTE_SUBQUERY:
@@ -2642,6 +2643,7 @@ range_table_mutator(List *rtable,
 		{
 			case RTE_RELATION:
 			case RTE_CTE:
+			case RTE_BEFORE:
 /* we don't bother to copy eref, aliases, etc; OK? */
 break;
 			case RTE_SUBQUERY:
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 48cd9dc..79b03af 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2366,6 +2366,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
 	switch (node->rtekind)
 	{
 		case RTE_RELATION:
+		case RTE_BEFORE:
 			WRITE_OID_FIELD(relid);
 			WRITE_CHAR_FIELD(relkind);
 			break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index dc9cb3e..2ca29da 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/read

[HACKERS] CREATE EVENT TRIGGER syntax

2013-07-19 Thread Joe Abbate
Hello,

What is the purpose of the [ AND ... ] at the end of the WHEN clause?
Is that for later releases, when presumably additional filter_variables
will be introduced?  Right now, if I add "AND tag IN ..." I get an

ERROR:  filter variable "tag" specified more than once
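For reference, TAG is the only filter variable currently accepted, so the supported form is a single WHEN clause such as the sketch below; appending "AND tag IN ..." to it produces the error above. The trigger function name is illustrative:

```sql
-- One WHEN filter on TAG works today; abort_any_command() is an
-- illustrative function name, not one shipped with PostgreSQL.
CREATE EVENT TRIGGER block_drops
    ON ddl_command_start
    WHEN TAG IN ('DROP TABLE', 'DROP FUNCTION')
    EXECUTE PROCEDURE abort_any_command();
```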

Joe




Re: [HACKERS] FKey not enforced resulting in broken Dump/Reload

2013-07-19 Thread Robert Haas
On Fri, Jul 19, 2013 at 2:15 PM, Robert Haas  wrote:
> On Fri, Jul 19, 2013 at 1:45 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Fri, Jul 19, 2013 at 12:58 PM, Rod Taylor  wrote:
 A poorly coded trigger on the referencing table has the ability to break
 foreign keys, and as a result create a database which cannot be dumped and
 reloaded.
>>
>>> This is a known limitation of our foreign key machinery.  It might
>>> well be susceptible to improvement, but I wouldn't count on anyone
>>> rewriting it in the near future.
>>
>> If we failed to fire triggers on foreign-key actions, that would not be
>> an improvement.  And trying to circumscribe the trigger's behavior so
>> that it couldn't break the FK would be (a) quite expensive, and
>> (b) subject to the halting problem, unless perhaps you circumscribed
>> it so narrowly as to break a lot of useful trigger behaviors.  Thus,
>> there's basically no alternative that's better than "so don't do that".
>
> I think a lot of people would be happier if foreign keys were always
> checked after all regular triggers and couldn't be disabled.   But,
> eh, that's not how it works.

Please ignore this fuzzy thinking.  You're right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

2013-07-19 Thread John Galloway
"Some Salesforce folks": that would be me!  It looks like I didn't quite
communicate to Tom what I was looking for, as I do indeed want to have
a variable number of "any" types, as:

CREATE AGGREGATE FOO ( ANYELEMENT, , VARIADIC "any") (
...
STYPE = ANYARRAY
...)
so the corresponding transition function would be
CREATE FUNCTION FOO_sfunc( ANYARRAY, ANYELEMENT, , VARIADIC
"any") RETURNS ANYARRAY
and the final func is
CREATE FUNCTION FOO_ffunc( ANYARRAY ) RETURNS ANYELEMENT

The functions are in C, and I cheat and actually use the ANYARRAY
transition variable as a struct, just keeping the varlena length correct
(thanks to Tom for that idea).  Currently I support only a fixed number of
"any" args but really need that to be variable.

So supporting VARIADIC "any" for user defined aggregates would be most
useful.
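A rough sketch of the shape this would take once CREATE AGGREGATE grows VARIADIC support. The syntax below is speculative at the time of this thread (Tom describes faking it via manual pg_proc edits), and all names are illustrative:

```sql
-- Speculative: CREATE AGGREGATE did not yet accept VARIADIC when this
-- was written.  The transition function sees the variadic args as an array.
CREATE FUNCTION concat_sfunc(state text, VARIADIC args text[])
RETURNS text LANGUAGE sql
AS $$ SELECT state || array_to_string(args, ',') || ';' $$;

CREATE AGGREGATE concat_all(VARIADIC text[]) (
    SFUNC = concat_sfunc,
    STYPE = text,
    INITCOND = ''
);
```

VARIADIC "any", as John needs, is harder because the actual parameters cannot be collapsed into a single array type, which is the distinction Tom draws above.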


On Thu, Jul 18, 2013 at 7:09 PM, Tom Lane  wrote:

> Noah Misch  writes:
> > (I don't know whether VARIADIC transition functions work today, but that
> would
> > become an orthogonal project.)
>
> Coincidentally enough, some Salesforce folk were asking me about allowing
> VARIADIC aggregates just a few days ago.  I experimented enough to find
> out that if you make an array-accepting transition function, and then
> force the aggregate's pg_proc entry to look like it's variadic (by
> manually setting provariadic and some other fields), then everything
> seems to Just Work: the parser and executor are both fine with it.
> So I think all that's needed here is to add some syntax support to
> CREATE AGGREGATE, and probably make some tweaks in pg_dump.  I was
> planning to go work on that sometime soon.
>
> Having said that, though, what Andrew seemed to want was VARIADIC ANY,
> which is a *completely* different kettle of fish, since the actual
> parameters can't be converted to an array.  I'm not sure if that's
> as easy to support.
>
> regards, tom lane
>
>


Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY

2013-07-19 Thread Greg Stark
On Wed, Jun 26, 2013 at 2:47 AM, Tom Lane  wrote:
> Some of the rest of us would like to hear those reasons, because my
> immediate reaction is that the patch must be broken by design.  WITH
> ORDINALITY should not be needing to mess with the fundamental evaluation
> semantics of SRFs, but it sure sounds like it is doing so if that case
> doesn't work as expected.

As Dan pointed out later, the patch does not affect whether this case
works as expected.  It's only that, since you can use WITH ORDINALITY on
SRFs only in the FROM list and not in the target list, if you want to use
it you have to rewrite the query to put the SRFs in the FROM list.

I think we're OK with that, since SRFs in the target list already work
kind of strangely and are probably something we would rather get rid of
than work to extend.  It would be hard to work ordinality into the
grammar in the middle of the target list, let alone figure out how to
implement it.


My only reservation about this patch is whether the WITH_ORDINALITY
parser hack is the way we want to go.  The precedent was already set
with WITH TIME ZONE, though, and I think this was the best option.  The
worst failure I can come up with is the following, which doesn't seem
like a huge problem:

postgres=# with o as (select 1) select * from o;
 ?column?
----------
        1
(1 row)

postgres=# with ordinality as (select 1) select * from ordinality;
ERROR:  syntax error at or near "ordinality"
LINE 1: with ordinality as (select 1) select * from ordinality;
             ^
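For completeness, the case the patch does support (an SRF in the FROM list) looks like this:

```sql
-- WITH ORDINALITY on a FROM-list SRF, per the patch under review:
SELECT * FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(elem, ord);
-- returns the rows ('a', 1), ('b', 2), ('c', 3)
```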



-- 
greg




Re: [HACKERS] Fatal error after starting postgres : sys identifiers must be different

2013-07-19 Thread Indrajit Roychoudhury
> I am not the best specialist about logical replication, but as it
> looks to be a requirement to have 2 nodes with different system
> identifiers, you shouldn't link 1 node to another node whose data
> folder has been created from the base backup of the former (for
> example create the 2nd node based on a base backup of the 1st node).
> The error returned here would mean so.

- I didn't create the data folders from any backup for either node;
they were created fresh by initdb individually.  Is there anything specific
that I need to do apart from initdb?

> > pg_controldata outputs different database system ids for the two nodes.
> > So I don't understand why it says the identifiers are the same.
> Are you sure? Please re-check.

- I re-verified, and they are different.  Pasting the snapshots below:

irc1@ubuntuirc:~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin$
./pg_controldata ./data2/
pg_control version number:            937
Catalog version number:               201305061
Database system identifier:           5856368744762683487
Database cluster state:               in production


irc2@ubuntuirc2:~/bdr/postgres-bin/bin$ ./pg_controldata ./data/
pg_control version number:            937
Catalog version number:               201305061
Database system identifier:           5901663435516996514
Database cluster state:               in production


Also, I am pasting the snapshot below to let you know that postgres doesn't
seem to be the superuser, in case this problem is due to that.

irc1@ubuntuirc:~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin$ ./psql testdb2
psql (9.3beta1)
Type "help" for help.
testdb2=# \du
                             List of roles
 Role name |                   Attributes                   | Member of
-----------+------------------------------------------------+-----------
 irc1      | Superuser, Create role, Create DB, Replication | {}

irc2@ubuntuirc2:~/bdr/postgres-bin/bin$ ./psql testdb2
psql (9.3beta1)
Type "help" for help.
testdb2=# \du
                             List of roles
 Role name |                   Attributes                   | Member of
-----------+------------------------------------------------+-----------
 irc2      | Superuser, Create role, Create DB, Replication | {}


Please let me know which mailing list would be more appropriate for raising my
concern about this postgres dev flavor.
I have yet to receive any response from Andres.

Thanks.


On Thu, Jul 18, 2013 at 3:31 PM, Michael Paquier
wrote:

> On Fri, Jul 19, 2013 at 5:02 AM, Indrajit Roychoudhury
>  wrote:
> > Could you please let me know what does the error "system identifiers are
> > same" mean? Below is the snapshot from one of the masters.
> > I am setting up BDR as per the wiki
> >   http://wiki.postgresql.org/wiki/BDR_User_Guide#Initial_setup
> > and source @
> >  git://git.postgresql.org/git/users/andresfreund/postgres.git
> >
> > irc1@ubuntuirc:~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin$ ./postgres
> -D
> > ~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin/data2/
> > LOG:  bgworkers, connection: dbname=testdb2
> > LOG:  option: dbname, val: testdb2
> > LOG:  registering background worker: bdr apply: ubuntuirc2
> > LOG:  loaded library "bdr"
> > LOG:  database system was shut down at 2013-03-17 10:56:52 PDT
> > LOG:  doing logical startup from 0/17B6410
> > LOG:  starting up replication identifier with ckpt at 0/17B6410
> > LOG:  autovacuum launcher started
> > LOG:  starting background worker process "bdr apply: ubuntuirc2"
> > LOG:  database system is ready to accept connections
> > LOG:  bdr apply: ubuntuirc2 initialized on testdb2, remote dbname=testdb2
> > replication=true fallback_application_name=bdr
> > FATAL:  system identifiers must differ between the nodes
> > DETAIL:  Both system identifiers are 5856368744762683487.
> I am not the best specialist about logical replication, but as it
> looks to be a requirement to have 2 nodes with different system
> identifiers, you shouldn't link 1 node to another node whose data
> folder has been created from the base backup of the former (for
> example create the 2nd node based on a base backup of the 1st node).
> The error returned here would mean so.
>
> > LOG:  worker process: bdr apply: ubuntuirc2 (PID 16712) exited with exit
> > code 1
> > ^CLOG:  received fast shutdown request
> > LOG:  aborting any active transactions
> > LOG:  autovacuum launcher shutting down
> > LOG:  shutting down
> >
> > pgcontrol_data outputs different database system ids for the two nodes.
> So
> > don't understand why it says identifiers are same.
> Are you sure? Please re-check.
>
> > postgresql.conf content in one of the masters is like this-
> >
> > /
> > shared_preload_libraries = 'bdr'
> > bdr.connections = 'ubuntuirc2'
> > bdr.ubuntuirc2.dsn = 'dbname=testdb2'
> > /
> >
> > Two nodes are ubuntuirc and ubuntuirc2. Above is the output of the
> > postgresql.conf from ubuntuirc.
> If this behavior is confirmed, I think that this bug should be
> reported di

Re: [HACKERS] FKey not enforced resulting in broken Dump/Reload

2013-07-19 Thread Robert Haas
On Fri, Jul 19, 2013 at 1:45 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Jul 19, 2013 at 12:58 PM, Rod Taylor  wrote:
>>> A poorly coded trigger on the referencing table has the ability to break
>>> foreign keys, and as a result create a database which cannot be dumped and
>>> reloaded.
>
>> This is a known limitation of our foreign key machinery.  It might
>> well be susceptible to improvement, but I wouldn't count on anyone
>> rewriting it in the near future.
>
> If we failed to fire triggers on foreign-key actions, that would not be
> an improvement.  And trying to circumscribe the trigger's behavior so
> that it couldn't break the FK would be (a) quite expensive, and
> (b) subject to the halting problem, unless perhaps you circumscribed
> it so narrowly as to break a lot of useful trigger behaviors.  Thus,
> there's basically no alternative that's better than "so don't do that".

I think a lot of people would be happier if foreign keys were always
checked after all regular triggers and couldn't be disabled.   But,
eh, that's not how it works.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fatal error after starting postgres : sys identifiers must be different

2013-07-19 Thread Indrajit Roychoudhury
One more change was required: adding both users in each node's db as
superusers. After that, replication started!

Thanks.


On Thu, Jul 18, 2013 at 5:35 PM, Andres Freund  wrote:

> Hi!
>
> On 2013-07-19 07:31:07 +0900, Michael Paquier wrote:
> > If this behavior is confirmed, I think that this bug should be
> > reported directly to Andres and not this mailing list, just because
> > logical replication is not integrated into Postgres core as of now.
>
> I think I agree, although I don't know where to report it, but to me
> personally, so far. Hackers seems to be the wrong crowd for it, given
> most of the people on it haven't even heard of bdr, much less read its
> code.
> We're definitely planning to propose it for community inclusion in some
> form, but there are several rather large preliminary steps (like getting
> changeset extraction in) that need to be done first.
>
> Not sure what's best.
>
> On 2013-07-19 07:31:07 +0900, Michael Paquier wrote:
> > On Fri, Jul 19, 2013 at 5:02 AM, Indrajit Roychoudhury
> >  wrote:
> > > Could you please let me know what does the error "system identifiers
> are
> > > same" mean? Below is the snapshot from one of the masters.
> > > I am setting up BDR as per the wiki
> > >   http://wiki.postgresql.org/wiki/BDR_User_Guide#Initial_setup
> > > and source @
> > >  git://git.postgresql.org/git/users/andresfreund/postgres.git
> > >
> > > irc1@ubuntuirc:~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin$
> ./postgres -D
> > > ~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin/data2/
> > > LOG:  bgworkers, connection: dbname=testdb2
> > > LOG:  option: dbname, val: testdb2
> > > LOG:  registering background worker: bdr apply: ubuntuirc2
> > > LOG:  loaded library "bdr"
> > > LOG:  database system was shut down at 2013-03-17 10:56:52 PDT
> > > LOG:  doing logical startup from 0/17B6410
> > > LOG:  starting up replication identifier with ckpt at 0/17B6410
> > > LOG:  autovacuum launcher started
> > > LOG:  starting background worker process "bdr apply: ubuntuirc2"
> > > LOG:  database system is ready to accept connections
> > > LOG:  bdr apply: ubuntuirc2 initialized on testdb2, remote
> dbname=testdb2
> > > replication=true fallback_application_name=bdr
> > > FATAL:  system identifiers must differ between the nodes
> > > DETAIL:  Both system identifiers are 5856368744762683487.
>
> > I am not the best specialist about logical replication, but as it
> > looks to be a requirement to have 2 nodes with different system
> > identifiers, you shouldn't link 1 node to another node whose data
> > folder has been created from the base backup of the former (for
> > example create the 2nd node based on a base backup of the 1st node).
> > The error returned here would mean so.
>
> Yes, that's correct.
>
> > > LOG:  worker process: bdr apply: ubuntuirc2 (PID 16712) exited with
> exit
> > > code 1
> > > ^CLOG:  received fast shutdown request
> > > LOG:  aborting any active transactions
> > > LOG:  autovacuum launcher shutting down
> > > LOG:  shutting down
> > >
> > > pg_controldata outputs different database system ids for the two
> nodes. So
> > > don't understand why it says identifiers are same.
> > Are you sure? Please re-check.
> >
> > > postgresql.conf content in one of the masters is like this-
> > >
> > > /
> > > shared_preload_libraries = 'bdr'
> > > bdr.connections = 'ubuntuirc2'
> > > bdr.ubuntuirc2.dsn = 'dbname=testdb2'
> > > /
> > >
> > > Two nodes are ubuntuirc and ubuntuirc2. Above is the output of the
> > > postgresql.conf from ubuntuirc.
>
> The problem seems to be that your dsn doesn't include a hostname but
> only a database name. So, what probably happens is both your hosts try
> to connect to themselves since connecting to the local host is the
> default when no hostname is specified. Which is one of the primary
> reasons the requirement for differing system identifiers exist...
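A config sketch of the fix Andres suggests: give each node's dsn an explicit host pointing at the peer, rather than relying on the local-connection default. (The hostname below is a placeholder for wherever the other node actually runs.)

```
# postgresql.conf on ubuntuirc, dsn pointing at the *other* node
shared_preload_libraries = 'bdr'
bdr.connections = 'ubuntuirc2'
bdr.ubuntuirc2.dsn = 'host=ubuntuirc2 dbname=testdb2'
```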
>
> Greetings,
>
> Andres Freund
>


Re: [HACKERS] FKey not enforced resulting in broken Dump/Reload

2013-07-19 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jul 19, 2013 at 12:58 PM, Rod Taylor  wrote:
>> A poorly coded trigger on the referencing table has the ability to break
>> foreign keys, and as a result create a database which cannot be dumped and
>> reloaded.

> This is a known limitation of our foreign key machinery.  It might
> well be susceptible to improvement, but I wouldn't count on anyone
> rewriting it in the near future.

If we failed to fire triggers on foreign-key actions, that would not be
an improvement.  And trying to circumscribe the trigger's behavior so
that it couldn't break the FK would be (a) quite expensive, and
(b) subject to the halting problem, unless perhaps you circumscribed
it so narrowly as to break a lot of useful trigger behaviors.  Thus,
there's basically no alternative that's better than "so don't do that".

regards, tom lane




Re: [HACKERS] Foreign Tables as Partitions

2013-07-19 Thread David Fetter
On Fri, Jul 19, 2013 at 11:41:14AM -0400, Robert Haas wrote:
> On Thu, Jul 18, 2013 at 8:56 PM, David Fetter  wrote:
> > Please find attached a PoC patch to implement $subject.
> >
> > So far, with a lot of help from Andrew Gierth, I've roughed out an
> > implementation which allows you to ALTER FOREIGN TABLE so it inherits
> > a local table.
> >
> > TBD: CREATE FOREIGN TABLE ... INHERITS ..., docs, regression testing,
> > etc., etc.
> 
> I think this could be useful, but it's going to take more than a
> three-line patch.

Of course!

> Removing code that prohibits things is easy; what's hard is
> verifying that nothing else breaks as a result.  And I bet it does.

It may well.  I see our relations eventually being as equivalent as
their definitions permit (views, materialized or not, foreign tables,
etc.), and this as work toward that.  Yes, it's normative, and we may
never fully get there, but I'm pretty sure it's worth working towards.

> This functionality was actually present in the original submission
> for foreign tables.  I ripped it out before commit because I didn't
> think all of the interactions with other commands had been
> adequately considered.  But I think they did consider it to some
> degree, which this patch does not.

A ref to the patch as submitted & patch as applied would help a lot :)

Were there any particular things you managed to break with the patch
as submitted?  Places you thought would be soft but didn't poke at?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] AGG_PLAIN thinks sorts are free

2013-07-19 Thread Tom Lane
Jeff Janes  writes:
> On Thu, Jul 18, 2013 at 8:04 PM, Tom Lane  wrote:
>> DISTINCT (and also ORDER BY) properties of aggregates are implemented
>> at runtime; the planner doesn't really do anything about them, except
>> suppress the choice it might otherwise make of using hashed aggregation.

> Couldn't a hash aggregate be superior to a sort one (for the distinct,
> not the order by)?

If it worked at all, it might be superior, but since it doesn't, it ain't.

This isn't really a matter of lack of round tuits, but a deliberate
judgment that it's probably not worth the trouble.  Per the comment in
choose_hashed_grouping:

/*
 * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
 * aggregates.  (Doing so would imply storing *all* the input values in
 * the hash table, and/or running many sorts in parallel, either of which
 * seems like a certain loser.)
 */
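To see why, here is a toy Python model (illustrative only, not PostgreSQL code) of what hashed aggregation with DISTINCT would have to do: every hash-table entry must retain the full set of distinct input values for its group until end of input, which is exactly the memory behavior the comment calls a certain loser.

```python
from collections import defaultdict

def hashed_count_distinct(rows):
    """Toy model of count(DISTINCT val) GROUP BY key via hashing."""
    table = defaultdict(set)          # group key -> set of seen values
    for key, val in rows:
        table[key].add(val)           # must keep the value, not just a count
    return {key: len(vals) for key, vals in table.items()}

rows = [("a", 1), ("a", 1), ("a", 2), ("b", 3)]
print(hashed_count_distinct(rows))    # {'a': 2, 'b': 1}
```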

regards, tom lane




Re: [HACKERS] AGG_PLAIN thinks sorts are free

2013-07-19 Thread Jeff Janes
On Thu, Jul 18, 2013 at 8:04 PM, Tom Lane  wrote:
> Jeff Janes  writes:
>> AGG_PLAIN sometimes does sorts, but it thinks they are free.  Also, under
>> explain analyze it does not explicitly report whether the sort was external
>> or not, nor report the disk or memory usage, the way other sorts do.  I
>> don't know if those two things are related or not.
>
> DISTINCT (and also ORDER BY) properties of aggregates are implemented
> at runtime; the planner doesn't really do anything about them, except
> suppress the choice it might otherwise make of using hashed aggregation.
> Since the behavior is entirely local to the Agg plan node, it's also
> not visible to the EXPLAIN ANALYZE machinery.

Couldn't a hash aggregate be superior to a sort one (for the distinct,
not the order by)?

> Arguably we should have the planner add on some cost factor for such
> aggregates, but that would have no effect whatever on the current level
> of plan, and could only be useful if this was a subquery whose cost
> would affect choices in an outer query level.  Which is a case that's
> pretty few and far between AFAIK (do you have a real-world example where
> it matters?).

Not that I know of.  It is mainly an analytical headache.  I'm trying
to figure out why the planner makes the choices it does on more
complex queries, but one of the component queries I'm trying to build
it up from suddenly falls into this plan, where I can't see the
estimated costs and can't use "set enable_*"  to shift it away from
that into a more transparent one.


Thanks for the explanation.

Jeff




Re: [HACKERS] getting rid of SnapshotNow

2013-07-19 Thread Alvaro Herrera
Robert Haas escribió:

> 4. If we use GetActiveSnapshot, all the comments about a fresh
> MVCC snapshot still apply.  However, the snapshot in question could be
> even more stale, especially in repeatable read or serializable mode.
> However, this might be thought a more consistent behavior than what we
> have now.  And I'm guessing that this function is typically run as its
> own transaction, so in practice this doesn't seem much different from
> an MVCC snapshot, only cheaper.
> 
> At the moment, I dislike #2 and slightly prefer #4 to #3.

+1 for #4, and if we ever need more then we can provide a non-default
way to get at #2.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] LOCK TABLE Permissions

2013-07-19 Thread Robert Haas
On Fri, Jul 19, 2013 at 12:33 PM, Tom Lane  wrote:
> Stephen Frost  writes:
>> if (lockmode == AccessShareLock)
>> aclresult = pg_class_aclcheck(reloid, GetUserId(),
>>   ACL_SELECT);
>> +   else if (lockmode == RowExclusiveLock)
>> +   aclresult = pg_class_aclcheck(reloid, GetUserId(),
>> +ACL_INSERT | ACL_UPDATE | ACL_DELETE | 
>> ACL_TRUNCATE);
>> else
>> aclresult = pg_class_aclcheck(reloid, GetUserId(),
>>   ACL_UPDATE | ACL_DELETE | 
>> ACL_TRUNCATE);
>
> Perhaps it would be better to refactor with a local variable for the
> aclmask and just one instance of the pg_class_aclcheck call.  Also, I'm
> pretty sure that the documentation work needed is more extensive
> than the actual patch ;-).  Otherwise, I don't see a problem with this.

I don't really care one way or the other whether we change this in
master, but I think back-patching changes that loosen security
restrictions is a poor idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] FKey not enforced resulting in broken Dump/Reload

2013-07-19 Thread Rod Taylor
A poorly coded trigger on the referencing table has the ability to break
foreign keys, and as a result create a database which cannot be dumped and
reloaded.

The BEFORE DELETE trigger accidentally does RETURN NEW, which suppresses
the DELETE action by the foreign key trigger. This allows the record from
the referenced table to be deleted and the record in the referencing table
to remain in place.

While I don't expect Pg to do what the coder meant, it should throw an
error rather than leave foreign-keyed data in an invalid state.

This applies to both 9.1 and 9.2.
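The failure mode can be modeled outside the database. In the toy Python sketch below (illustrative only, not PostgreSQL's implementation), the cascaded delete the FK machinery issues against the referencing table is silently skipped: the table's BEFORE DELETE trigger does RETURN NEW, NEW is NULL for a DELETE, and a NULL return suppresses the operation, leaving an orphaned referencing row.

```python
parent = {1}                      # referenced table (pk values)
child = {(10, 1)}                 # referencing rows: (pk, fk -> parent)

def before_delete_trigger(old_row):
    # The buggy trigger: "RETURN NEW".  In a DELETE, NEW is NULL (None),
    # and returning NULL tells the executor to skip the operation.
    NEW = None
    return NEW

def delete_child(row):
    if before_delete_trigger(row) is None:
        return                    # delete suppressed by the trigger
    child.discard(row)

def delete_parent(pk):
    # FK machinery: ON DELETE CASCADE deletes referencing rows first...
    for row in [r for r in child if r[1] == pk]:
        delete_child(row)
    parent.discard(pk)            # ...then the referenced row goes away

delete_parent(1)
print(parent, child)              # parent is empty, child row is orphaned
```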


Please see attached bug.sql.


bug.sql
Description: Binary data



Re: [HACKERS] getting rid of SnapshotNow

2013-07-19 Thread Robert Haas
On Fri, Jul 19, 2013 at 12:51 PM, Alvaro Herrera
 wrote:
> Robert Haas escribió:
>> On Fri, Jul 19, 2013 at 12:29 PM, Alvaro Herrera
>>  wrote:
>
>> > I think seeing otherwise invisible rows is useful in pgrowlocks.  It
>> > helps observe the effects on tuples written by concurrent transactions
>> > during experimentation.  But then, maybe this functionality belongs in
>> > pageinspect instead.
>>
>> It does seem like it should be useful, at least as an option.  But I
>> feel like changing that ought to be separate from getting rid of
>> SnapshotNow.  It seems like too big of a behavior change to pass off
>> as a harmless tweak.
>
> Agreed.

So any change we make to pgrowlocks is going to have some behavior consequences.

1. If we use SnapshotSelf, then nobody will notice the difference
unless this is used as part of a query that locks or modifies tuples
in the table being examined.  But in that case you might see the
results of the current query.  Thus, I think this is the smallest
possible behavior change, but Tom doesn't like SnapshotSelf any more
than he likes SnapshotNow.

2. If we use SnapshotDirty, then the difference is probably
noticeable, because you'll see the results of concurrent, uncommitted
transactions.  Maybe useful, but probably shouldn't be the new
default.

3. If we use a fresh MVCC snapshot, then when you scan a table you'll
see the state of play as of the beginning of your scan rather than the
state of play as of when your scan reaches the target page.  This
might be noticeable on a large table.  However, it might also be
thought an improvement.

4. If we use GetActiveSnapshot, all the comments about a fresh
MVCC snapshot still apply.  However, the snapshot in question could be
even more stale, especially in repeatable read or serializable mode.
However, this might be thought a more consistent behavior than what we
have now.  And I'm guessing that this function is typically run as its
own transaction, so in practice this doesn't seem much different from
an MVCC snapshot, only cheaper.

At the moment, I dislike #2 and slightly prefer #4 to #3.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] getting rid of SnapshotNow

2013-07-19 Thread Alvaro Herrera
Robert Haas escribió:
> On Fri, Jul 19, 2013 at 12:29 PM, Alvaro Herrera
>  wrote:

> > I think seeing otherwise invisible rows is useful in pgrowlocks.  It
> > helps observe the effects on tuples written by concurrent transactions
> > during experimentation.  But then, maybe this functionality belongs in
> > pageinspect instead.
> 
> It does seem like it should be useful, at least as an option.  But I
> feel like changing that ought to be separate from getting rid of
> SnapshotNow.  It seems like too big of a behavior change to pass off
> as a harmless tweak.

Agreed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] FKey not enforced resulting in broken Dump/Reload

2013-07-19 Thread Robert Haas
On Fri, Jul 19, 2013 at 12:58 PM, Rod Taylor  wrote:
> A poorly coded trigger on the referencing table has the ability to break
> foreign keys, and as a result create a database which cannot be dumped and
> reloaded.
>
> The BEFORE DELETE trigger accidentally does RETURN NEW, which suppresses the
> DELETE action by the foreign key trigger. This allows the record from the
> referenced table to be deleted and the record in the referencing table to
> remain in place.
>
> While I don't expect Pg to do what the coder meant, it should throw an
> error rather than leave foreign-keyed data in an invalid state.

This is a known limitation of our foreign key machinery.  It might
well be susceptible to improvement, but I wouldn't count on anyone
rewriting it in the near future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [RFC] Minmax indexes

2013-07-19 Thread Josh Berkus
On 07/18/2013 10:39 PM, Alvaro Herrera wrote:
> To scan the index, we first obtain the TID of the index tuple for page 0.  If
> this returns a valid TID, we read that tuple to determine the min/max bounds
> for this page range.  If it returns invalid, then the range is unsummarized,
> and the scan must return the whole range as needing scan.  After this
> index entry has been processed, we obtain the TID of the index tuple for
> page 0+pagesPerRange (currently this is a compile-time constant, but
> there's no reason this cannot be a per-index property).  Continue adding
> pagesPerRange until we reach the end of the heap.
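The scan loop in the quoted description can be sketched as follows (a Python model under simplifying assumptions: summaries live in a dict keyed by range-start block, and an absent key means the range is unsummarized and must be returned whole):

```python
def ranges_to_scan(nblocks, pages_per_range, summaries, lo, hi):
    """Return the starting blocks of the page ranges a scan for values
    in [lo, hi] must visit.  `summaries` maps range-start block ->
    (min, max); a missing entry means the range is unsummarized."""
    needed = []
    start = 0
    while start < nblocks:
        entry = summaries.get(start)
        if entry is None:
            needed.append(start)           # unsummarized: must scan it all
        else:
            rmin, rmax = entry
            if rmin <= hi and rmax >= lo:  # range may contain matches
                needed.append(start)
        start += pages_per_range
    return needed

summaries = {0: (1, 10), 8: (50, 90)}      # blocks 16.. unsummarized
print(ranges_to_scan(24, 8, summaries, 60, 70))   # [8, 16]
```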

Conceptually, this sounds like a good initial solution to the update
problem.

I still think we could do incremental updates to the minmax indexes per
the idea I discussed, but that could be a later version.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] getting rid of SnapshotNow

2013-07-19 Thread Robert Haas
On Fri, Jul 19, 2013 at 12:29 PM, Alvaro Herrera
 wrote:
> Robert Haas escribió:
>> On Fri, Jul 19, 2013 at 1:27 AM, Tom Lane  wrote:
>
>> > Why not tell people to use SnapshotDirty if they need a
>> > not-guaranteed-consistent result?  At least then it's pretty obvious
>> > that you're getting some randomness in with your news.
>
>> On further reflection, I think perhaps pgrowlocks should just register
>> a fresh MVCC snapshot and use that.  Using SnapshotDirty would return
>> TIDs of unseen tuples, which does not seem to be what is wanted there.
>
> I think seeing otherwise invisible rows is useful in pgrowlocks.  It
> helps observe the effects on tuples written by concurrent transactions
> during experimentation.  But then, maybe this functionality belongs in
> pageinspect instead.

It does seem like it should be useful, at least as an option.  But I
feel like changing that ought to be separate from getting rid of
SnapshotNow.  It seems like too big of a behavior change to pass off
as a harmless tweak.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

2013-07-19 Thread Josh Berkus
On 07/15/2013 10:19 AM, Jeff Davis wrote:
> On Thu, 2013-07-11 at 10:51 -0400, Nicholas White wrote:
>> I've attached a revised version that fixes the issues above:
> 
> I'll get to this soon, sorry for the delay.
> 
> Regards,
>   Jeff Davis

So ... are you doing a final review of this for the CF, Jeff?  We need
to either commit it or bounce it to the next CF.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] LOCK TABLE Permissions

2013-07-19 Thread Tom Lane
Stephen Frost  writes:

> if (lockmode == AccessShareLock)
> aclresult = pg_class_aclcheck(reloid, GetUserId(),
>   ACL_SELECT);
> +   else if (lockmode == RowExclusiveLock)
> +   aclresult = pg_class_aclcheck(reloid, GetUserId(),
> +ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE);
> else
> aclresult = pg_class_aclcheck(reloid, GetUserId(),
>   ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE);

Perhaps it would be better to refactor with a local variable for the
aclmask and just one instance of the pg_class_aclcheck call.  Also, I'm
pretty sure that the documentation work needed is more extensive
than the actual patch ;-).  Otherwise, I don't see a problem with this.

regards, tom lane




Re: [HACKERS] getting rid of SnapshotNow

2013-07-19 Thread Alvaro Herrera
Robert Haas escribió:
> On Fri, Jul 19, 2013 at 1:27 AM, Tom Lane  wrote:

> > Why not tell people to use SnapshotDirty if they need a
> > not-guaranteed-consistent result?  At least then it's pretty obvious
> > that you're getting some randomness in with your news.

> On further reflection, I think perhaps pgrowlocks should just register
> a fresh MVCC snapshot and use that.  Using SnapshotDirty would return
> TIDs of unseen tuples, which does not seem to be what is wanted there.

I think seeing otherwise invisible rows is useful in pgrowlocks.  It
helps observe the effects on tuples written by concurrent transactions
during experimentation.  But then, maybe this functionality belongs in
pageinspect instead.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] getting rid of SnapshotNow

2013-07-19 Thread Andres Freund
On 2013-07-20 00:49:11 +0900, Hiroshi Inoue wrote:
> (2013/07/18 21:46), Robert Haas wrote:
> >There seems to be a consensus that we should try to get rid of
> >SnapshotNow entirely now that we have MVCC catalog scans, so I'm
> >attaching two patches that together come close to achieving that goal:
> 
> ...
> 
> >With that done, the only remaining uses of SnapshotNow in our code
> >base will be in currtid_byreloid() and currtid_byrelname().  So far no
> >one on this list has been able to understand clearly what the purpose
> >of those functions is, so I'm copying this email to pgsql-odbc in case
> >someone there can provide more insight.  If I were a betting man, I'd
> >bet that they are used in contexts where the difference between
> >SnapshotNow and SnapshotSelf wouldn't matter there, either.
> 
> Using SnapshotSelf instead of SnapshotNow for currtid_ () wouldn't
>  matter.

I think it actually might. You could get into dicey situations if you
use currtid_ in a query performing updates or inserts because it would
see the to-be-inserted tuple...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] getting rid of SnapshotNow

2013-07-19 Thread Hiroshi Inoue

(2013/07/18 21:46), Robert Haas wrote:

There seems to be a consensus that we should try to get rid of
SnapshotNow entirely now that we have MVCC catalog scans, so I'm
attaching two patches that together come close to achieving that goal:


...


With that done, the only remaining uses of SnapshotNow in our code
base will be in currtid_byreloid() and currtid_byrelname().  So far no
one on this list has been able to understand clearly what the purpose
of those functions is, so I'm copying this email to pgsql-odbc in case
someone there can provide more insight.  If I were a betting man, I'd
bet that they are used in contexts where the difference between
SnapshotNow and SnapshotSelf wouldn't matter there, either.


Using SnapshotSelf instead of SnapshotNow for currtid_ () wouldn't
 matter.

regards,
Hiroshi Inoue






Re: [HACKERS] Using ini file to setup replication

2013-07-19 Thread Josh Berkus

> I've even proposed that in the past in
> 20130225211533.gd3...@awork2.anarazel.de . I plan to propose an updated
> version of that patch (not allowing numeric 2nd level ids) for the next
> CF.
> 
> So you can just do stuff like:
> 
> server.foo.grand_unified_config = value.

Sounds good to me.  I can see lots of uses for that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [ODBC] [HACKERS] getting rid of SnapshotNow

2013-07-19 Thread Tom Lane
Andres Freund  writes:
> On 2013-07-20 00:49:11 +0900, Hiroshi Inoue wrote:
>> Using SnapshotSelf instead of SnapshotNow for currtid_ () wouldn't
>> matter.

> I think it actually might. You could get into dicey situations if you
> use currtid_ in a query performing updates or inserts because it would
> see the to-be-inserted tuple...

I'm pretty sure Hiroshi-san was only opining about whether it would
matter for ODBC's usage.  IIUC, ODBC is using this function to re-fetch
rows that it inserted, updated, or at least selected-for-update in a
previous command of the current transaction, so actually any snapshot
would do fine.

In any case, since I moved the goalposts by suggesting that SnapshotSelf
is just as dangerous as SnapshotNow, what we need to know is whether
it'd be all right to change this code to use a fresh MVCC snapshot;
and if not, why not.  It's pretty hard to see a reason why client-side
code would want to make use of the results of a non-MVCC snapshot.

regards, tom lane




Re: [HACKERS] Foreign Tables as Partitions

2013-07-19 Thread Robert Haas
On Thu, Jul 18, 2013 at 8:56 PM, David Fetter  wrote:
> Please find attached a PoC patch to implement $subject.
>
> So far, with a lot of help from Andrew Gierth, I've roughed out an
> implementation which allows you to ALTER FOREIGN TABLE so it inherits
> a local table.
>
> TBD: CREATE FOREIGN TABLE ... INHERITS ..., docs, regression testing,
> etc., etc.

I think this could be useful, but it's going to take more than a
three-line patch.  Removing code that prohibits things is easy; what's
hard is verifying that nothing else breaks as a result.  And I bet it
does.

This functionality was actually present in the original submission for
foreign tables.  I ripped it out before commit because I didn't think
all of the interactions with other commands had been adequately
considered.  But I think they did consider it to some degree, which
this patch does not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] LOCK TABLE Permissions

2013-07-19 Thread Stephen Frost
Greetings,

  We've run into a curious case and I'd like to solicit feedback
  regarding a possible change to the access rights required to acquire
  locks on a relation.  Specifically, we have a process which normally
  INSERTs into a table and another process which Exclusive locks that
  same table in order to syncronize other processing.  We then ran into
  a case where we didn't actually want to INSERT but still wanted to
  have the syncronization happen.  Unfortunately, we don't allow
  LOCK TABLE to acquire RowExclusive unless you have UPDATE, DELETE, or
  TRUNCATE privileges.

  My first impression is that the current code was just overly
  simplistic regarding what level of permissions are required for a
  given lock type and that it wasn't intentional to deny processes which
  have INSERT privileges from acquiring RowExclusive (as they can do so
  anyway using an actual INSERT).  Therefore, I'd like to propose the
  below simple 3-line patch to correct this.

  Thoughts?  Objections to back-patching?

Thanks,

Stephen

diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
new file mode 100644
index 49950d7..60f54c5 100644
*** a/src/backend/commands/lockcmds.c
--- b/src/backend/commands/lockcmds.c
*** LockTableAclCheck(Oid reloid, LOCKMODE l
*** 174,179 
--- 174,182 
if (lockmode == AccessShareLock)
aclresult = pg_class_aclcheck(reloid, GetUserId(),
  ACL_SELECT);
+   else if (lockmode == RowExclusiveLock)
+   aclresult = pg_class_aclcheck(reloid, GetUserId(),
+ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE);
else
aclresult = pg_class_aclcheck(reloid, GetUserId(),
  ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE);
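The intended mapping is easy to model. The Python sketch below (illustrative only, not the C code; the bit values are arbitrary stand-ins) shows the check with the patch applied, written the way Tom suggests elsewhere in the thread: compute the required mask in a local variable and do a single privilege test.

```python
# Bit-flag privileges (values are stand-ins, not PostgreSQL's)
ACL_SELECT, ACL_INSERT, ACL_UPDATE, ACL_DELETE, ACL_TRUNCATE = 1, 2, 4, 8, 16

def required_mask(lockmode):
    """Privileges that should allow taking `lockmode` on a table, per
    the proposed patch: INSERT joins the list for RowExclusiveLock."""
    if lockmode == "AccessShareLock":
        mask = ACL_SELECT
    elif lockmode == "RowExclusiveLock":
        mask = ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE
    else:
        mask = ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE
    return mask

def may_lock(user_privs, lockmode):
    return bool(user_privs & required_mask(lockmode))

# An INSERT-only user can now take ROW EXCLUSIVE, but not stronger locks:
print(may_lock(ACL_INSERT, "RowExclusiveLock"))     # True
print(may_lock(ACL_INSERT, "AccessExclusiveLock"))  # False
```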



signature.asc
Description: Digital signature


Re: [HACKERS] pgindent behavior we could do without

2013-07-19 Thread Andrew Dunstan


On 07/18/2013 11:30 PM, Tom Lane wrote:

Noah Misch  writes:

On Thu, Jul 18, 2013 at 12:27:21AM -0400, Tom Lane wrote:

It's always annoyed me that pgindent insists on adjusting vertical
whitespace around #else and related commands.  This has, for example,
rendered src/include/storage/barrier.h nigh illegible: you get things
like

Agreed.  I've similarly disliked how pgindent adds a blank line between an #if
and a multi-line comment, like at the top of get_restricted_token().

AFAICT it forces a blank line before a multiline comment in most
contexts; #if isn't special (though it does seem to avoid doing that
just after a left brace).  I too don't much care for that behavior,
although it's not as detrimental to readability as removing blank lines
can be.

In general, pgindent isn't nearly as good about managing vertical
whitespace as it is about horizontal spacing ...



I agree.

I should perhaps note that when I rewrote pgindent, I deliberately 
didn't editorialize about its behaviour - but I did intend to make it 
more maintainable and simpler to change stuff like this, and so it is :-)


cheers

andrew





Re: [HACKERS] LATERAL quals revisited

2013-07-19 Thread Tom Lane
Ashutosh Bapat  writes:
> On Wed, Jun 26, 2013 at 1:30 AM, Tom Lane  wrote:
>> For there to *be* a unique "appropriate outer join", we need to require
>> that a LATERAL-using qual clause that's under an outer join contain
>> lateral references only to the outer side of the nearest enclosing outer
>> join.  There's no such restriction in the spec of course, but we can
>> make it so by refusing to flatten a sub-select if pulling it up would
>> result in having a clause in the outer query that violates this rule.
>> There's already some code in prepjointree.c (around line 1300) that
>> attempts to enforce this, though now that I look at it again I'm not
>> sure it's covering all the bases.  We may need to extend that check.

> Why do we need this restriction? Wouldn't a place (specifically join qual
> at such a place) in join tree where all the participating relations are
> present, serve as a place where the clause can be applied.

No.  If you hoist a qual that appears below an outer join to above the
outer join, you get wrong results in general: you might eliminate rows
from the outer side of the join, which a qual from within the inner side
should never be able to do.

> select * from tab1 left join tab2 t2 using (val) left join lateral (select
> val from tab2 where val2 = tab1.val * t2.val) t3 using (val);
> Can't we apply (as a join qual) the qual val2 = tab1.val * t2.val at a
> place where we are computing join between tab1, t2 and t3?

This particular example doesn't violate the rule I gave above, since
both tab1 and t2 are on the left side of the join to the lateral
subquery, and the qual doesn't have to get hoisted *past* an outer join,
only to the outer join of {tab1,t2} with {t3}.

>> I'm inclined to process all LATERAL-using qual clauses this way, ie
>> postpone them till we recurse back up to a place where they can
>> logically be evaluated.  That won't make any real difference when no
>> outer joins are present, but it will eliminate the ugliness that right
>> now distribute_qual_to_rels is prevented from sanity-checking the scope
>> of the references in a qual when LATERAL is present.  If we do it like
>> this, we can resurrect full enforcement of that sanity check, and then
>> throw an error if any "postponed" quals are left over when we're done
>> recursing.

> Parameterized nested loop join would always be able to evaluate a LATERAL
> query. Instead of throwing error, why can't we choose that as the default
> strategy whenever we fail to flatten subquery?

I think you misunderstood.  That error would only be a sanity check that
we'd accounted for all qual clauses, it's not something a user should
ever see.

regards, tom lane




Re: [HACKERS] Simple documentation typo patch

2013-07-19 Thread Robert Haas
On Thu, Jul 18, 2013 at 2:53 PM, David Christensen  wrote:
> Hey folks, this corrects typos going back to 
> 75c6519ff68dbb97f73b13e9976fb8075bbde7b8 where EUC_JIS_2004 and 
> SHIFT_JIS_2004 were first added.
>
> These typos are present in all supported major versions of PostgreSQL, back 
> through 8.3; I didn't look any further than that. :-)

Committed and back-patched to all currently-supported versions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] confusing typedefs in jsonfuncs.c

2013-07-19 Thread Andrew Dunstan


On 07/18/2013 09:20 PM, Peter Eisentraut wrote:

The new jsonfuncs.c has some confusing typedef scheme.  For example, it
has a bunch of definitions like this:

typedef struct getState
{
 ...
} getState, *GetState;

So GetState is a pointer to getState.  I have never seen that kind of
convention before.

This then leads to code like

GetState    state;

state = palloc0(sizeof(getState));

which has useless mental overhead.

But the fact that GetState is really a pointer isn't hidden at all,
because state is then dereferenced with -> or cast from or to void*.  So
whatever abstraction might have been intended isn't there at all.  And
all of this is an intra-file interface anyway.

And to make this even more confusing, other types such as ColumnIOData
and JsonLexContext are not pointers but structs directly.

I think a more typical PostgreSQL code convention is to use capitalized
camelcase for structs, and use explicit pointers for pointers.  I have
attached a patch that cleans this up, in my opinion.



I don't have a problem with this. Sometimes when you've stared at 
something for long enough you fail to notice such things, so I welcome 
more eyeballs on the code.


Note that this is an externally visible API, so anyone who has written 
an extension against it will possibly find it broken. But that's all the 
more reason to clean it now, before it makes it to a released version.


cheers

andrew




Re: [HACKERS] [v9.4] row level security

2013-07-19 Thread Tom Lane
Dean Rasheed  writes:
> I took a look at this patch too. I didn't read all the code in detail,
> but there was one area I wondered about --- is it still necessary to
> add the new field rowsec_relid to RangeTblEntry?

> As far as I can see, it is only used in the new function getrelid()
> which replaces the old macro of the same name, so that it can work if
> it is passed the index of the sourceRelation subquery RTE instead of
> the resultRelation. This seems to be encoding a specific assumption
> that a subquery RTE is always going to come from row-level security
> expansion.

I haven't read the patch at all, but I would opine that anything that's
changing the behavior of getrelid() is broken by definition.  Something
is being done at the wrong level of abstraction if you think that you
need to do that.

regards, tom lane




Re: [HACKERS] [ODBC] getting rid of SnapshotNow

2013-07-19 Thread Hiroshi Inoue

(2013/07/19 22:03), Andres Freund wrote:

On 2013-07-19 08:57:01 +0900, Inoue, Hiroshi wrote:

I had the idea they were used for a client-side implementation of WHERE
CURRENT OF.  Perhaps that's dead code and could be removed entirely?


It's been reported that ODBC still uses them.


Though PostgreSQL's TID is similar to Oracle's ROWID, it is transient
and changed after update operations unfortunately. I implemented
the currtid_xx functions to supplement the difference. For example

currtid(relname, original tid)

(hopefully) returns the current tid of the original row when it is
updated.


That is only guaranteed to work though when you're in a transaction old
enough to prevent removal of the old or intermediate row versions. E.g.


Yes it's what I meant by (hopefully).
At the time when I implemented currtid(), I was able to use TIDs in
combination with OIDs.

regards,
Hiroshi Inoue




Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-07-19 Thread Greg Smith

On 7/19/13 3:53 AM, KONDO Mitsumasa wrote:

Recently, users who think system availability is important use
synchronous replication clusters.


If your argument for why it's OK to ignore bounding crash recovery on 
the master is that it's possible to failover to a standby, I don't think 
that is acceptable.  PostgreSQL users certainly won't like it.



I particularly want you to read the points at lines 631, 651, and 656.
MAX_WRITEBACK_PAGES is 1024 (1024 * 4096 bytes).


You should read http://www.westnet.com/~gsmith/content/linux-pdflush.htm 
to realize everything you're telling me about the writeback code and its 
congestion logic I knew back in 2007.  The situation is even worse than 
you describe, because this section of Linux has gone through multiple, 
major revisions since then.  You can't just say "here is the writeback 
source code"; you have to reference each of the commonly deployed 
versions of the writeback feature to tell how this is going to play out 
if released.  There are four major ones I pay attention to.  The old 
kernel style as seen in RHEL5/2.6.18--that's what my 2007 paper
discussed--the similar code but with very different defaults in 2.6.22, 
the writeback method/tuning in RHEL6/Debian Squeeze/2.6.32, and then 
there are newer kernels.  (The newer ones separate out into a few 
branches too, I haven't mapped those as carefully yet)


If you tried to model your feature on Linux's approach here, what that 
means is that the odds of an ugly feedback loop here are even higher. 
You're increasing the feedback on what's already a bad situation that 
triggers trouble for people in the field.  When Linux's congestion logic 
causes checkpoint I/O spikes to get worse than they otherwise might be, 
people panic because it seems like they stopped altogether.  There are 
some examples of what really bad checkpoints look like in 
http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/WriteStuff-PGCon2011.pdf 
if you want to see some of them.  That's the talk I did around the same 
time I was trying out spreading the database fsync calls out over a 
longer period.


When I did that, checkpoints became even less predictable, and that was 
a major reason behind why I rejected the approach.  I think your 
suggestion will have the same problem.  You just aren't generating test 
cases with really large write workloads yet to see it.  You also don't
yet seem to appreciate that exceeding the checkpoint timeout is a very
bad thing.



In addition, if you set a large value of a checkpoint_timeout or
checkpoint_complete_taget, you have said that performance is improved,
but is it true in all the cases?


The timeout, yes.  Throughput is always improved by increasing 
checkpoint_timeout.  Less checkpoints per unit of time increases 
efficiency.  Less writes of the most heavy accessed buffers happen per 
transaction.  It is faster because you are doing less work, which on
average is always faster than doing more work.  And doing less work
usually beats doing more work more cleverly, too.


If you want to see how much work per transaction a test is doing, track 
the numbers of buffers written at the beginning/end of your test via 
pg_stat_bgwriter.  Tests that delay checkpoints will show a lower total 
number of writes per transaction.  That seems more efficient, but it's 
efficiency mainly gained by ignoring checkpoint_timeout.



When the checkpoint completion target is actually enlarged, performance
may fall in some cases.  I think this is because the last fsync becomes
heavy, owing to the writes having been issued so slowly.


I think you're confusing throughput and latency here.  Increasing the 
checkpoint timeout, or to a lesser extent the completion target, on 
average that increases throughput.  It results in less work, and the 
more/less work amount is much more important than worrying about 
scheduler details.  No matter how efficient a given write is, whether
you've sorted it across elevator horizon boundary A or boundary B, it's
better not to do it at all.


But having less checkpoints makes latency worse sometimes too.  Whether 
latency or throughput is considered the more important thing is very 
complicated.  Having checkpoint_completion_target as the knob to control 
the latency/throughput trade-off hasn't worked out very well.  No one 
has done a really comprehensive look at this trade-off since the 8.3 
development.  I got halfway through it for 9.1, we figured out that the 
fsync queue filling was actually responsible for most of my result 
variation, and then Robert fixed that.  It was a big enough change that 
all my data from before that I had to throw out as no longer relevant.


By the way:  if you have a theory like "the last fsync having become 
heavy" for why something is happening, measure it.  Set log_min_messages 
to debug2 and you'll get details about every single fsync in your logs. 
 I did that for all my tests that led me to conclude fsync delaying on 
its own didn't help that problem.  I was m

[HACKERS] Adding new joining alghoritm to postgresql

2013-07-19 Thread tubadzin
Hi. I'm a little confused.

1. I have the source code of version 9.2.4 from http://www.postgresql.org/ftp/source/
2. I want to add a new algorithm to index nested loop join, merge join, and hash join.
I have an executor directory under src containing nodeHash.c, nodeHashjoin.c,
nodeMergejoin.c, and nodeNestloop.c.
3. After the changes, I want to compile PostgreSQL and use it.
4. The problem is:
a) I do not know which library is responsible for this functionality. I understand
that I have to compile the source and replace the library (I don't know which
library) in the path where PostgreSQL is installed: C:\Program Files (x86)\PostgreSQL\9.2
b) I don't know how to use the files/library (which library?) with Visual Studio 2010,
or how to compile it.

Thanks for all your answers. Tom.

Re: [HACKERS] getting rid of SnapshotNow

2013-07-19 Thread Andres Freund
On 2013-07-19 01:27:41 -0400, Tom Lane wrote:
> Noah Misch  writes:
> > To me, the major advantage of removing SnapshotNow is to force all
> > third-party code to reevaluate.  But that could be just as well
> > achieved by renaming it to, say, SnapshotImmediate.  If there are
> > borderline-legitimate SnapshotNow uses in our code base, I'd lean
> > toward a rename instead.  Even if we decide to remove every core use,
> > third-party code might legitimately reach a different conclusion on
> > similar borderline cases.

I don't think there are many people that aren't active on -hackers that
can actually understand the implications of using SnapshotNow. Given
-hackers hasn't fully grasped them in several cases... And even if those
borderline cases are safe, that's really only valid for a specific
postgres version. Catering to that doesn't seem like a good idea to me.

> Indeed, I'm thinking I don't believe in SnapshotSelf anymore either.
> It's got all the same consistency issues as SnapshotNow.  In fact, it
> has *more* issues, because it's also vulnerable to weirdnesses caused by
> inconsistent ordering of tuple updates among multiple tuples updated by
> the same command.

Hm. I kind of can see the point of it in constraint code where it
probably would be rather hard to remove usage of it, but e.g. the
sepgsql usage looks pretty dubious to me.
At least in the cases where the constraint code uses them I don't think
the SnapshotNow dangers apply since those specific rows should be locked
et al.

The selinux usage looks like a design flaw to me, but I don't really
understand that code, so I very well may be wrong.

> Why not tell people to use SnapshotDirty if they need a
> not-guaranteed-consistent result?  At least then it's pretty obvious
> that you're getting some randomness in with your news.

Especially if we're going to lower the lock level of some commands, but
even now, that opens us to more issues due to nonmatching table
definitions et al. That doesn't sound like a good idea to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Using ini file to setup replication

2013-07-19 Thread Samrat Revagade
>> *for example: for failback safe standby.*

>I think that introducing another configuration format is a pretty bad
>idea. While something new might not turn out to be as bad, we've seen
>how annoying a separate configuration format turned out for
>recovery.conf.

It's not a totally different way of configuration:
the ini file will be parsed in the same way as postgresql.conf.
I just want to separate out the replication parameters, to make
configuration simpler for future developments in the field of
replication, such as failback-safe standby.

> So you can just do stuff like:
>
> server.foo.grand_unified_config = value.

But according to your approach, and considering the use case of
failback-safe standby, the parameters in postgresql.conf will vary
dynamically, and I don't think doing this in postgresql.conf is a good
idea because it already contains a whole bunch of parameters:

for example:
if I want to configure 2 servers it will add 6 lines; for 3, 9; for 4, 12.
The settings for a particular server would be (written the usual way a
GUC is set, guc = value):

standby_name = 'AAA'
synchronous_transfer = commit
wal_sender_timeout = 60


Regards,
Samrat
Samrat Revgade




Re: [HACKERS] [ODBC] getting rid of SnapshotNow

2013-07-19 Thread Andres Freund
On 2013-07-19 08:57:01 +0900, Inoue, Hiroshi wrote:
> >>I had the idea they were used for a client-side implementation of WHERE
> >>CURRENT OF.  Perhaps that's dead code and could be removed entirely?
> >
> >It's been reported that ODBC still uses them.
> 
> Though PostgreSQL's TID is similar to Oracle's ROWID, it is transient
> and changed after update operations unfortunately. I implemented
> the currtid_xx functions to supplement the difference. For example
> 
>   currtid(relname, original tid)
> 
> (hopefully) returns the current tid of the original row when it is
> updated.

That is only guaranteed to work though when you're in a transaction old
enough to prevent removal of the old or intermediate row versions. E.g.
BEGIN;
INSERT INTO foo...; -- last tid (0, 1)
COMMIT;
BEGIN;
SELECT currtid(foo, '(0,1)');
COMMIT;

can basically return no or even an arbitrarily different row. Same with
an update...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] getting rid of SnapshotNow

2013-07-19 Thread Robert Haas
On Fri, Jul 19, 2013 at 1:27 AM, Tom Lane  wrote:
> Noah Misch  writes:
>> To me, the major advantage of removing SnapshotNow is to force all
>> third-party code to reevaluate.  But that could be just as well
>> achieved by renaming it to, say, SnapshotImmediate.  If there are
>> borderline-legitimate SnapshotNow uses in our code base, I'd lean
>> toward a rename instead.  Even if we decide to remove every core use,
>> third-party code might legitimately reach a different conclusion on
>> similar borderline cases.
>
> Meh.  If there is third-party code with a legitimate need for
> SnapshotNow, all we'll have done is to create an annoying version
> dependency for them.  So if we think that's actually a likely scenario,
> we shouldn't rename it.  But the entire point of this change IMO is that
> we *don't* think there is a legitimate use-case for SnapshotNow.
>
> Indeed, I'm thinking I don't believe in SnapshotSelf anymore either.
> It's got all the same consistency issues as SnapshotNow.  In fact, it
> has *more* issues, because it's also vulnerable to weirdnesses caused by
> inconsistent ordering of tuple updates among multiple tuples updated by
> the same command.
>
> Why not tell people to use SnapshotDirty if they need a
> not-guaranteed-consistent result?  At least then it's pretty obvious
> that you're getting some randomness in with your news.

You know, I didn't really consider that before, but I kind of like it.
 I think that would be entirely suitable (and perhaps better) for
pgstattuple and get_actual_variable_range().

On further reflection, I think perhaps pgrowlocks should just register
a fresh MVCC snapshot and use that.  Using SnapshotDirty would return
TIDs of unseen tuples, which does not seem to be what is wanted there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: [COMMITTERS] pgsql: Allow background workers to be started dynamically.

2013-07-19 Thread Andres Freund
Hi,

On 2013-07-19 08:23:25 -0400, Robert Haas wrote:
> On Fri, Jul 19, 2013 at 7:24 AM, Andres Freund  wrote:
> > The changes here make it impossible to write a bgworker which properly
> > works in 9.3 and 9.4. Was that intended? If so, the commit message
> > should mention the compatibility break...
>
> Yeah, sorry, I probably should have mentioned that.  The structure
> needs to be fixed size for us to store it in shared memory.

> > If it was intended I propose changing the signature for 9.3 as
> > well. There's just no point in releasing 9.3 when we already know which
> > trivial but breaking change will be required for 9.4
>
> I think that would be a good idea.

> And I'd also propose getting rid
> of bgw_sighup and bgw_sigterm in both branches, while we're at it.
> AFAICT, they don't add any functionality, and they're basically
> unusable for dynamically started background workers.  Probably better
> not to get people used to using them.

I don't have a problem with getting rid of those, it's easy enough to
register them inside the worker and it's safe since we start with
blocked signals. I seem to remember some discussion about why they were
added but I can't find a reference anymore. Alvaro, do you remember?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Allow background workers to be started dynamically.

2013-07-19 Thread Magnus Hagander
On Fri, Jul 19, 2013 at 1:23 PM, Robert Haas  wrote:
> On Fri, Jul 19, 2013 at 7:24 AM, Andres Freund  wrote:
>> The changes here make it impossible to write a bgworker which properly
>> works in 9.3 and 9.4. Was that intended? If so, the commit message
>> should mention the compatibility break...
>
> Yeah, sorry, I probably should have mentioned that.  The structure
> needs to be fixed size for us to store it in shared memory.
>
>> If it was intended I propose changing the signature for 9.3 as
>> well. There's just no point in releasing 9.3 when we already know which
>> trivial but breaking change will be required for 9.4
>
> I think that would be a good idea.  And I'd also propose getting rid
> of bgw_sighup and bgw_sigterm in both branches, while we're at it.
> AFAICT, they don't add any functionality, and they're basically
> unusable for dynamically started background workers.  Probably better
> not to get people used to using them.

+1. Much better to take that pain now, before we have made a production release.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2

2013-07-19 Thread Robert Haas
On Mon, Jul 15, 2013 at 1:41 PM, Jeff Davis  wrote:
>> I'm of the opinion that we ought to extract the parts of the patch
>> that hold the VM pin for longer, review those separately, and if
>> they're good and desirable, apply them.
>
> I'm confused. My patch holds a VM page pinned for those cases where
> PD_ALL_VISIBLE is currently used -- scans or insert/update/delete. If we
> have PD_ALL_VISIBLE, there's no point in the cache, right?

Hmm.  You might be right.  I thought there might be some benefit
there, but I guess we're not going to clear the bit more than once, so
maybe not.

> To check visibility from an index scan, you either need to pin a heap
> page or a VM page. Why would checking the heap page be cheaper? Is it
> just other code in the VM-testing path that's slower? Or is there
> concurrency involved in your measurements which may indicate contention
> over VM pages?

Well, I have seen some data that hints at contention problems with VM
pages, but it's not conclusive.  But the real issue is that, if the
index-only scan finds the VM page not set, it still has to check the
heap page.  Thus, it ends up accessing two pages rather than one, and
it turns out that's more expensive.  It's unfortunately hard to
predict whether the cost of checking VM first will pay off.  It's a
big win if we learn that we don't need to look at the heap page
(because we don't need to read, lock, pin, or examine it, in that
case) but it's a loss if we do (because checking the VM isn't free).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Allow background workers to be started dynamically.

2013-07-19 Thread Robert Haas
On Fri, Jul 19, 2013 at 7:24 AM, Andres Freund  wrote:
> The changes here make it impossible to write a bgworker which properly
> works in 9.3 and 9.4. Was that intended? If so, the commit message
> should mention the compatibility break...

Yeah, sorry, I probably should have mentioned that.  The structure
needs to be fixed size for us to store it in shared memory.

> If it was intended I propose changing the signature for 9.3 as
> well. There's just no point in releasing 9.3 when we already know which
> trivial but breaking change will be required for 9.4

I think that would be a good idea.  And I'd also propose getting rid
of bgw_sighup and bgw_sigterm in both branches, while we're at it.
AFAICT, they don't add any functionality, and they're basically
unusable for dynamically started background workers.  Probably better
not to get people used to using them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [v9.4] row level security

2013-07-19 Thread Dean Rasheed
On 19 July 2013 04:09, Greg Smith  wrote:
> On 7/18/13 11:03 PM, Stephen Frost wrote:
>>
>> Wasn't there a wiki page about this
>> feature which might also help?  Bigger question- is it correct for the
>> latest version of the patch?
>
>
> https://wiki.postgresql.org/wiki/RLS has accumulated a lot of older debris
> that may or may not be useful here.
>
> This useful piece was just presented at PGCon:
> http://www.pgcon.org/2013/schedule/attachments/273_PGcon2013-kaigai-row-level-security.pdf
That is a very up-to-date intro to the big-picture issues.
>

Hi,

I took a look at this patch too. I didn't read all the code in detail,
but there was one area I wondered about --- is it still necessary to
add the new field rowsec_relid to RangeTblEntry?

As far as I can see, it is only used in the new function getrelid()
which replaces the old macro of the same name, so that it can work if
it is passed the index of the sourceRelation subquery RTE instead of
the resultRelation. This seems to be encoding a specific assumption
that a subquery RTE is always going to come from row-level security
expansion.

Is it the case that this can only happen from expand_targetlist(), in
which case why not pass both source_relation and result_relation to
expand_targetlist(), which I think will make that code neater. As it
stands, what expand_targetlist() sees as result_relation is actually
source_relation, which getrelid() then handles specially. Logically I
think expand_targetlist() should pass the actual result_relation to
getrelid(), opening the target table, and then pass source_relation to
makeVar() when building new TLEs.

With that change to expand_targetlist(), the change to getrelid() may
be unnecessary, and then also the new rowsec_relid field in
RangeTblEntry may not be needed.

Regards,
Dean




Re: [HACKERS] LATERAL quals revisited

2013-07-19 Thread Ashutosh Bapat
I have couple of questions.

On Wed, Jun 26, 2013 at 1:30 AM, Tom Lane  wrote:

> I've been studying the bug reported at
>
> http://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local
> that the planner can do the wrong thing with queries like
>
> SELECT * FROM
>   i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true;
>
> I think the fundamental problem is that, because the "i.n = j.n" clause
> appears syntactically in WHERE, the planner is treating it as if it were
> an inner-join clause; but really it ought to be considered a clause of
> the upper LEFT JOIN.  That is, semantically this query ought to be
> equivalent to
>
> SELECT * FROM
>   i LEFT JOIN LATERAL (SELECT * FROM j) j ON i.n = j.n;
>
> However, because distribute_qual_to_rels doesn't see the clause as being
> attached to the outer join, it's not marked with the correct properties
> and ends up getting evaluated in the wrong place (as a "filter" clause
> not a "join filter" clause).  The bug is masked in the test cases we've
> used so far because those cases are designed to let the clause get
> pushed down into the scan of the inner relation --- but if it doesn't
> get pushed down, it's evaluated the wrong way.
>
> After some contemplation, I think that the most practical way to fix
> this is for deconstruct_recurse and distribute_qual_to_rels to
> effectively move such a qual to the place where it logically belongs;
> that is, rather than processing it when we look at the lower WHERE
> clause, set it aside for a moment and then add it back when looking at
> the ON clause of the appropriate outer join.  This should be reasonably
> easy to do by keeping a list of "postponed lateral clauses" while we're
> scanning the join tree.
>
> For there to *be* a unique "appropriate outer join", we need to require
> that a LATERAL-using qual clause that's under an outer join contain
> lateral references only to the outer side of the nearest enclosing outer
> join.  There's no such restriction in the spec of course, but we can
> make it so by refusing to flatten a sub-select if pulling it up would
> result in having a clause in the outer query that violates this rule.
> There's already some code in prepjointree.c (around line 1300) that
> attempts to enforce this, though now that I look at it again I'm not
> sure it's covering all the bases.  We may need to extend that check.
>
>
Why do we need this restriction? Wouldn't a place (specifically, a join
qual at such a place) in the join tree where all the participating
relations are present serve as a place where the clause can be applied?
E.g., in the query

select * from tab1 left join tab2 t2 using (val) left join lateral (select
val from tab2 where val2 = tab1.val * t2.val) t3 using (val);

Can't we apply (as a join qual) the qual val2 = tab1.val * t2.val at a
place where we are computing join between tab1, t2 and t3?


> I'm inclined to process all LATERAL-using qual clauses this way, ie
> postpone them till we recurse back up to a place where they can
> logically be evaluated.  That won't make any real difference when no
> outer joins are present, but it will eliminate the ugliness that right
> now distribute_qual_to_rels is prevented from sanity-checking the scope
> of the references in a qual when LATERAL is present.  If we do it like
> this, we can resurrect full enforcement of that sanity check, and then
> throw an error if any "postponed" quals are left over when we're done
> recursing.
>
>
A parameterized nested loop join would always be able to evaluate a LATERAL
query. Instead of throwing an error, why can't we choose that as the default
strategy whenever we fail to flatten a subquery?

Can we put the clause with lateral references at its appropriate place while
flattening the subquery? IMO, that would be cleaner and less work than first
pulling the clause out and then putting it back again. Right now, we do not
have that capability in pull_up_subqueries(), but given its recursive
structure, it might be easier to do it there.


> Thoughts, better ideas?
>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] WITH CHECK OPTION for auto-updatable views

2013-07-19 Thread Dean Rasheed
On 18 July 2013 22:27, Stephen Frost  wrote:
> Dean,
>
>
> * Stephen Frost (sfr...@snowman.net) wrote:
>> Thanks!  This is really looking quite good, but it's a bit late and I'm
>> going on vacation tomorrow, so I didn't quite want to commit it yet. :)
>
> Apologies on this taking a bit longer than I expected, but it's been
> committed and pushed now.  Please take a look and let me know of any
> issues you see with the changes that I made.
>

Excellent. Thank you! The changes look good to me.

Cheers,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using ini file to setup replication

2013-07-19 Thread Andres Freund
On 2013-07-19 14:54:16 +0530, Samrat Revagade wrote:
> I was going through the archives and there was a discussion about
> using  ini file to setup
> replication.(http://www.postgresql.org/message-id/4c9876b4.9020...@enterprisedb.com).
> I think if we work on this proposal and separate out the replication
> setup from postgresql.conf file then we can provide more granularity
> while setting up the replication parameters.
> for example, we can set different values of wal_sender_timeout for
> each standby server.
> 
> So I think it is a good idea to separate out the replication settings
> from the postgresql.conf file and put them into an ini file.
> Once it is confirmed then we can extend the ini file to support future
> developments into replication.
> *for example: for failback safe standby.*

I think that introducing another configuration format is a pretty bad
idea. While something new might not turn out to be as bad, we've seen
how annoying a separate configuration format turned out to be for
recovery.conf.

I'd much rather go ahead and remove the nesting limit of GUCs. That
should give us just about everything that can be achieved with an ini
file, with a one-line change. At some point we might want to extend our
format to add ini-like sections, but I think that *definitely* should be
a separate proposal.

I've even proposed that in the past in
20130225211533.gd3...@awork2.anarazel.de . I plan to propose an updated
version of that patch (not allowing numeric 2nd level ids) for the next
CF.

So you can just do stuff like:

server.foo.grand_unified_config = value
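
To make the comparison with the proposed ini sections concrete, the
per-standby settings discussed elsewhere in this thread might then be
spelled with nested GUCs roughly like this (the parameter combinations are
taken from the thread; the second-level names are hypothetical):

```
server.standby_aaa.synchronous_transfer = commit
server.standby_aaa.wal_sender_timeout = 60
server.standby_bbb.synchronous_transfer = none
server.standby_bbb.wal_sender_timeout = 40
```

Everything would stay in postgresql.conf, with no second configuration
format to parse and reload.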

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using ini file to setup replication

2013-07-19 Thread Samrat Revagade
Please find updated hyperlinks:

> Below I have explained how to use an ini file for the failback-safe standby setup:
> While discussing feature of fail-back safe standby
> (CAF8Q-Gy7xa60HwXc0MKajjkWFEbFDWTG=ggyu1kmt+s2xcq...@mail.gmail.com)

http://www.postgresql.org/message-id/CAF8Q-Gy7xa60HwXc0MKajjkWFEbFDWTG=ggyu1kmt+s2xcq...@mail.gmail.com

> We have decided to use the *ini* file to configure the fail-back safe standby;
> here is the link for that:
> cad21aocy2_bqvpzjey7s77amnccbxfj+1gphggdbulklav0...@mail.gmail.com

http://www.postgresql.org/message-id/cad21aocy2_bqvpzjey7s77amnccbxfj+1gphggdbulklav0...@mail.gmail.com

Regards,
Samrat Revagade


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Using ini file to setup replication

2013-07-19 Thread Samrat Revagade
Hi,

I was going through the archives and there was a discussion about using
an ini file to set up replication
(http://www.postgresql.org/message-id/4c9876b4.9020...@enterprisedb.com).
I think if we work on this proposal and separate the replication setup
out of the postgresql.conf file, then we can provide more granularity
when setting the replication parameters; for example, we can set
different values of wal_sender_timeout for each standby server.

So I think it is a good idea to separate out the replication settings
from the postgresql.conf file and put them into an ini file.
Once it is confirmed, we can extend the ini file to support future
replication developments,
*for example: for the failback-safe standby.*

Below I have explained how to use an ini file for the failback-safe
standby setup. While discussing the feature of fail-back safe standby
(CAF8Q-Gy7xa60HwXc0MKajjkWFEbFDWTG=ggyu1kmt+s2xcq...@mail.gmail.com)
we decided to use the *ini* file to configure the fail-back safe standby;
here is the link for that:
cad21aocy2_bqvpzjey7s77amnccbxfj+1gphggdbulklav0...@mail.gmail.com
But there has been no strong positive/negative feedback on the concept of
introducing the ini file. Please have a look at it and give feedback.


In today's scenario, replication offers only 2 ways of configuring a
standby server:
1. Asynchronous standby
2. Synchronous standby

With the failback-safe standby patch we have more granularity in setting
up the standby servers:
1. Default synchronous standby (AAA)
2. Default asynchronous standby (BBB)
3. Synchronous standby that is also a failback-safe standby (CCC)
4. Asynchronous standby that is also a failback-safe standby (DDD)

For the failback-safe standby we are thinking of adding more granular
settings of the replication parameters, for example:
1. The user can set a separate value of wal_sender_timeout for each server.
2. The user can set a separate value of synchronous_transfer for each server.

Consider the scenario where a user wants to set up the 4 standby servers
explained above; the settings for them would be:
 - ini file -
[Server]
standby_name = 'AAA'
synchronous_transfer = commit
wal_sender_timeout = 60

[Server]
standby_name = 'BBB'
synchronous_transfer = none
wal_sender_timeout = 40

[Server]
standby_name = 'CCC'
synchronous_transfer = all
wal_sender_timeout = 50

[Server]
standby_name = 'DDD'
synchronous_transfer = data_flush
wal_sender_timeout = 50

Setting up such a scenario through the postgresql.conf file is
impossible, and if we tried to do so it would add a lot of complexity to
the code. So an ini file would be a very good choice in this case.

Thank you ,
Samrat


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-07-19 Thread KONDO Mitsumasa

(2013/07/19 0:41), Greg Smith wrote:
> On 7/18/13 11:04 AM, Robert Haas wrote:
>> On a system where fsync is sometimes very very slow, that
>> might result in the checkpoint overrunning its time budget - but SO
>> WHAT?
>
> Checkpoints provide a boundary on recovery time.  That is their only
> purpose.  You can always do better by postponing them, but you've now
> changed the agreement with the user about how long recovery might take.
Recently, users who think system availability is important use a
synchronous replication cluster. And, as Robert says, users who cannot
build a cluster system will not enable this function via the GUC.


When I/O becomes busy while calling fsync(), my patch does not add more
I/O load in fsync(). Actually, it is the same as the OS writeback
structure. I read the kernel source code in fs/fs-writeback.c from
linux-2.6.32-358.0.1.el6, which is the latest RHEL 6.4 kernel. It seems
that wb_writeback() controls disk I/O in the OS writeback function.
Please see the source code below. If the OS thinks I/O is busy, it does
not issue more writes and bails out.


fs/fs-writeback.c @wb_writeback()
 623 /*
 624  * For background writeout, stop when we are below the
 625  * background dirty threshold
 626  */
 627 if (work->for_background && !over_bground_thresh())
 628 break;
 629
 630 wbc.more_io = 0;
 631 wbc.nr_to_write = MAX_WRITEBACK_PAGES;
 632 wbc.pages_skipped = 0;
 633
 634 trace_wbc_writeback_start(&wbc, wb->bdi);
 635 if (work->sb)
 636 __writeback_inodes_sb(work->sb, wb, &wbc);
 637 else
 638 writeback_inodes_wb(wb, &wbc);
 639 trace_wbc_writeback_written(&wbc, wb->bdi);
 640 work->nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
 641 wrote += MAX_WRITEBACK_PAGES - wbc.nr_to_write;
 642
 643 /*
 644  * If we consumed everything, see if we have more
 645  */
 646 if (wbc.nr_to_write <= 0)
 647 continue;
 648 /*
 649  * Didn't write everything and we don't have more IO, bail
 650  */
 651 if (!wbc.more_io)
 652 break;
 653 /*
 654  * Did we write something? Try for more
 655  */
 656 if (wbc.nr_to_write < MAX_WRITEBACK_PAGES)
 657 continue;
 658 /*
 659  * Nothing written. Wait for some inode to
 660  * become available for writeback. Otherwise
 661  * we'll just busyloop.
 662  */
 663 spin_lock(&inode_lock);
 664 if (!list_empty(&wb->b_more_io))  {
 665 inode = list_entry(wb->b_more_io.prev,
 666 struct inode, i_list);
 667 trace_wbc_writeback_wait(&wbc, wb->bdi);
 668 inode_wait_for_writeback(inode);
 669 }
 670 spin_unlock(&inode_lock);
 671 }
 672
 673 return wrote;

I would especially like you to read lines 631, 651, and 656.
MAX_WRITEBACK_PAGES is 1024 (1024 * 4096 bytes). The OS writeback
scheduler does not write more than MAX_WRITEBACK_PAGES at once, because
writing more data than that would make the device I/O-busy. And if it
cannot write anything at all, the OS decides it needs to let I/O
performance recover. This is the same as my patch's logic.


In addition, you have said that performance improves if you set a large
value of checkpoint_timeout or checkpoint_completion_target, but is that
true in all cases? Since writes of dirty buffers that have waited 30
seconds or more are carried out at intervals of 5 seconds, the many
pauses between writes may turn them into inefficient random writes. For
example, in a bad case, when a 200 ms sleep is inserted each time, only
25 pages (200 KB) can be written in 5 seconds; I think that is very
inefficient writing. So when the checkpoint completion target is
enlarged, performance may actually fall in some cases. I think this is
because the last fsync becomes heavy as a result of writing too slowly.


I would like you to give me an itemized list of tests that could serve as
proof of my patch, because a DBT-2 benchmark run takes a lot of time,
about 3-4 hours per test setting. Of course, I think it is important to
obtain your consent.


Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers