[HACKERS] Server crash while trying to fetch EXPLAIN query results with a cursor

2012-06-26 Thread Rushabh Lathia
Hi All,

Consider the following testcase:

postgres=# select version();

                               version
--------------------------------------------------------------------------
 PostgreSQL 9.3devel on i686-pc-linux-gnu, compiled by gcc (Ubuntu
4.4.3-4ubuntu5) 4.4.3, 32-bit
(1 row)

postgres=# do $$
declare
 t cursor for explain SELECT n into tbl from generate_series(1,10) n;
begin
 open t;
end
$$;
The connection to the server was lost. Attempting reset: Failed.

The above test case ends with a server crash. The crash is due to the
following assertion in ScanQueryForLocks():

Assert(parsetree->commandType != CMD_UTILITY);

The test case works fine with earlier versions.

Description:
============

Here is the stack for the crash:

#0  ScanQueryForLocks (parsetree=0xa304ea4, acquire=1 '\001') at
plancache.c:1324
#1  0x08447e81 in AcquirePlannerLocks (stmt_list=0xa304dcc, acquire=1
'\001') at plancache.c:1306
#2  0x08446c34 in RevalidateCachedQuery (plansource=0xa3041bc) at
plancache.c:460
#3  0x0844743a in GetCachedPlan (plansource=0xa3041bc, boundParams=0x0,
useResOwner=0 '\000') at plancache.c:936
#4  0x0827047a in SPI_cursor_open_internal (name=0xa2fc124 "t",
plan=0xa3059dc, paramLI=0x0, read_only=0 '\000') at spi.c:1187
#5  0x0827024e in SPI_cursor_open_with_paramlist (name=0xa2fc124 "t",
plan=0xa3059dc, params=0x0, read_only=0 '\000') at spi.c:1112
#6  0x002c59ec in exec_stmt_open (estate=0xbfc3c4d0, stmt=0xa2fd4c0) at
pl_exec.c:3534


Here query->commandType is CMD_UTILITY, and its utilityStmt contains a
CreateTableAsStmt. Looking further, I found that any SELECT ... INTO ...
clause gets transformed into a CreateTableAsStmt, which removes the INTO
clause from the SELECT statement.

In the above stack, before calling ScanQueryForLocks(),
AcquirePlannerLocks() checks for CMD_UTILITY and fetches the Query from the
utility statement. To get the contained Query from a utility command it
calls UtilityContainsQuery().

In UtilityContainsQuery() we need to handle ExplainStmt properly for the
SELECT INTO case, so that it returns the proper contained Query.

PFA patch for the same.

Thanks,


-- 
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 8b73858..9286ce5 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1362,9 +1362,20 @@ UtilityContainsQuery(Node *parsetree)
 	switch (nodeTag(parsetree))
 	{
 		case T_ExplainStmt:
-			Assert(IsA(((ExplainStmt *) parsetree)->query, Query));
-			return (Query *) ((ExplainStmt *) parsetree)->query;
-
+			{
+				Query *qry;
+				Assert(IsA(((ExplainStmt *) parsetree)->query, Query));
+				qry = (Query *) ((ExplainStmt *) parsetree)->query;
+				/*
+				 * Any SELECT ... INTO clause gets transformed into a
+				 * CreateTableAsStmt, which is a CMD_UTILITY statement.  So if
+				 * the ExplainStmt's Query is CMD_UTILITY, call
+				 * UtilityContainsQuery() again to fetch the contained query.
+				 */
+				if (qry->commandType == CMD_UTILITY)
+					qry = UtilityContainsQuery(qry->utilityStmt);
+				return qry;
+			}
 		case T_CreateTableAsStmt:
 			/* might or might not contain a Query ... */
 			if (IsA(((CreateTableAsStmt *) parsetree)->query, Query))

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proof concept - access to session variables on client side

2012-06-26 Thread Tom Lane
Pavel Stehule  writes:
>> Another thing I don't care for is the unannounced protocol extension.

> yes, it needs a protocol extension and a version increase too. But I'm
> not afraid of desynchronisation - the server doesn't send the 'v'
> message when the client doesn't support it.

And you would know that how, exactly?

regards, tom lane



Re: [HACKERS] [v9.3] Row-Level Security

2012-06-26 Thread Kohei KaiGai
2012/6/26 Tom Lane :
> Kohei KaiGai  writes:
>> 2012/6/26 Robert Haas :
>>> I think you're missing the point.  Everyone who has commented on this
>>> issue is in favor of having some check that causes the RLS predicate
>>> *not to get added in the first place*.
>
>> Here is a simple idea to avoid the second problematic scenario; that
>> assign 0 as cost of has_superuser_privilege().
>
> I am not sure which part of "this isn't safe" isn't getting through to
> you.  Aside from the scenarios Robert mentioned, consider the
> possibility that f_malicious() is marked immutable, so that the planner
> is likely to call it (to replace the call with its value) before it will
> ever think about whether has_superuser_privilege should be called first.
>
> Please just do what everybody is asking for, and create a bypass that
> does not require fragile, easily-broken-by-future-changes assumptions
> about what the planner will do with a WHERE clause.
>
The problem is how to implement it.
If we do permission checks at the planner stage, we cannot handle the case
where the user-id is switched before the executor stage, so we need some
remedy to handle that scenario correctly.
Instead of a single plan per query, one solution might be to generate
multiple plans depending on the user-id, and choose the proper one at
executor stage.

Which type of implementation is what everybody is asking for?

Thanks,
-- 
KaiGai Kohei 



Re: [HACKERS] proof concept - access to session variables on client side

2012-06-26 Thread Pavel Stehule
2012/6/26 Tom Lane :
> Pavel Stehule  writes:
>> it is not security issue - just I dislike sending complete stack, when
>> just only one variable should be used.
>
> That's a pretty darn weak argument.  If I read the patch correctly, what
> you're proposing involves a dynamic fetch from the client at runtime,
> which is going to be disastrous for performance.  Quite aside from the
> network round trip involved, the fetch function would have to be marked
> volatile (since it has client-visible side-effects, not to mention that
> we don't know when the client might change the variable value); which
> would really hurt any query involving it, and probably lead to yet more
> round trips.

I didn't implement any optimization because this is just a concept, but
server-side caching is possible. Then only the first read, and any
write, would involve network communication.

>
> Pushing over the known values once at session start (and individual
> values after updates) is likely to be vastly better-performant than
> this.  Matters could be improved further by requiring variables to be
> sent to the server to be explicitly marked, which seems like a good
> idea anyway in case anybody has security concerns that they're not
> going to let you airily dismiss.
>

This is a decision between a push and a pull model. Both variants have
their own issues and benefits. The pull model probably requires more
complex changes to the protocol implementation; the push model needs more
code on the client side. Propagation of psql variables could be enabled
by a command-line option and disabled by default.

> Another thing I don't care for is the unannounced protocol extension.
> This feature is just not interesting enough to justify breaking
> client compatibility, but that's what it would do as proposed.
> Clients that haven't heard of this 'v' message would probably
> think they'd lost sync and drop the connection.
>

yes, it needs a protocol extension and a version increase too. But I'm
not afraid of desynchronisation - the server doesn't send the 'v'
message when the client doesn't support it.

> (BTW, the patch doesn't seem to include the added backend source file?)

The goal of this patch is to show the requested functionality and to
check how hard the implementation is.

Regards

Pavel

>
>                        regards, tom lane



Re: [HACKERS] Optimizer Path Candidates difference in 9.1.3 and 9.2 beta1

2012-06-26 Thread Qi Huang


> [ shrug... ] When you're not showing us exactly what you did, it's hard
> to answer that for sure. But there is some prefiltering logic in
> joinpath.c that you might have to lobotomize too if you want to keep
> known-inferior join paths.
> 
> regards, tom lane

Thanks, Tom.
Below is what I did to the code in add_path(). I commented out the
remove_old section and made sure accept_new is always true. Then in
make_one_rel(), after the call to "rel = make_rel_from_joinlist(root,
joinlist);", I added "pprint(rel)" on the next line. Checking the
RelOptInfo printed to the logfile, I can see some bushy trees in 9.1.3.

	/*
	 * Loop to check proposed new path against old paths.  Note it is possible
	 * for more than one old path to be tossed out because new_path dominates
	 * it.
	 *
	 * We can't use foreach here because the loop body may delete the current
	 * list cell.
	 */
	p1_prev = NULL;
	for (p1 = list_head(parent_rel->pathlist); p1 != NULL; p1 = p1_next)
	{
		...
		/*
		 * Remove current element from pathlist if dominated by new.
		 */
//		if (remove_old)
//		{
//			parent_rel->pathlist = list_delete_cell(parent_rel->pathlist,
//													p1, p1_prev);
			/*
			 * Delete the data pointed-to by the deleted cell, if possible
			 */
//			if (!IsA(old_path, IndexPath))
//				pfree(old_path);
			/* p1_prev does not advance */
//		}
//		else
//		{
			/* new belongs after this old path if it has cost >= old's */
			if (costcmp >= 0)
				insert_after = p1;
			/* p1_prev advances */
			p1_prev = p1;
//		}

		/*
		 * If we found an old path that dominates new_path, we can quit
		 * scanning the pathlist; we will not add new_path, and we assume
		 * new_path cannot dominate any other elements of the pathlist.
		 */
		if (!accept_new)
			break;
	}

//	if (accept_new)
//	{
		/* Accept the new path: insert it at proper place in pathlist */
		if (insert_after)
			lappend_cell(parent_rel->pathlist, insert_after, new_path);
		else
			parent_rel->pathlist = lcons(new_path, parent_rel->pathlist);
//	}
//	else
//	{
		/* Reject and recycle the new path */
//		if (!IsA(new_path, IndexPath))
//			pfree(new_path);
//	}


Best Regards
Huang Qi Victor
Computer Science of National University of Singapore

Re: [HACKERS] Optimizer Path Candidates difference in 9.1.3 and 9.2 beta1

2012-06-26 Thread Tom Lane
Qi Huang  writes:
> Hi, hackers
> I modified the code in add_path() a bit so that all the query path
> candidates inside pathlist will not be removed and all new paths will be
> added into the pathlist; thus all path candidates are kept in pathlist. I
> then tested a four-relation query. In 9.1.3, I can see thousands of
> candidates in the final RelOptInfo, and some of them are even bushy trees.
> But in 9.2 beta1, which I forked from github, there are no such bushy
> trees and only about 50 join paths in total, which should match the
> requirement of the System R algo. Is there any modification regarding the
> System R algo in the new release? And is something wrong in the algo in
> 9.1.3?
> Thanks

[ shrug... ]  When you're not showing us exactly what you did, it's hard
to answer that for sure.  But there is some prefiltering logic in
joinpath.c that you might have to lobotomize too if you want to keep
known-inferior join paths.

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Tom Lane
Robert Haas  writes:
> So, here's a patch.  Instead of using POSIX shmem, I just took the
> expedient of using mmap() to map a block of MAP_SHARED|MAP_ANONYMOUS
> memory.  The sysv shm is still allocated, but it's just a copy of
> PGShmemHeader; the "real" shared memory is the anonymous block.  This
> won't work if EXEC_BACKEND is defined so it just falls back on
> straight sysv shm in that case.

Um.  I hadn't thought about the EXEC_BACKEND interaction, but that seems
like a bit of a showstopper.  I would not like to give up the ability
to debug EXEC_BACKEND mode on Unixen.

Would Posix shmem help with that at all?  Why did you choose not to
use the Posix API, anyway?

regards, tom lane



Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-26 Thread Etsuro Fujita
Hi,

> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Wednesday, June 27, 2012 5:09 AM
> To: Etsuro Fujita
> Cc: Ants Aasma; Jay Levitt; Tom Lane; PostgreSQL-development; Francois Deliege
> Subject: Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is
needed
> 
> On Fri, Jun 22, 2012 at 10:12 AM, Robert Haas  wrote:
> > On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita
> >  wrote:
> >>> I'm confused by this remark, because surely the query planner does it this
> >>> way only if there's no LIMIT.  When there is a LIMIT, we choose based on
> >>> the startup cost plus the estimated fraction of the total cost we expect
> >>> to pay based on dividing the LIMIT by the overall row count estimate.  Or
> >>> is this not what you're talking about?
> >>
> >> I think that Ants is pointing the way of estimating costs in
> >> choose_hashed_grouping()/choose_hashed_distinct(), ie cost_agg() for
> >> cheapest_path + hashagg, where the costs are calculated based on the total
> >> cost only of cheapest_path.  I think that it might be good to do cost_agg()
> >> for the discussed case with the AGG_SORTED strategy, not the AGG_HASHED
> >> strategy.
> >
> > Well, Ants already made some adjustments to those functions; not sure
> > if this means they need some more adjustment, but I don't see that
> > there's a general problem with the costing algorithm around LIMIT.
> 
> Ants, do you intend to update this patch for this CommitFest?  Or at
> all?  It seems nobody's too excited about this, so I'm not sure
> whether it makes sense for you to put more work on it.  But please
> advise as to your plans.

Please excuse my slow response. I, too, would like to know your plans.

Best regards,
Etsuro Fujita

> Thanks,
> 
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 





[HACKERS] Optimizer Path Candidates difference in 9.1.3 and 9.2 beta1

2012-06-26 Thread Qi Huang

Hi, hackers

I modified the code in add_path() a bit so that all the query path
candidates inside pathlist will not be removed and all new paths will be
added into the pathlist; thus all path candidates are kept in pathlist. I
then tested a four-relation query. In 9.1.3, I can see thousands of
candidates in the final RelOptInfo, and some of them are even bushy trees.
But in 9.2 beta1, which I forked from github, there are no such bushy trees
and only about 50 join paths in total, which should match the requirement
of the System R algo. Is there any modification regarding the System R algo
in the new release? And is something wrong in the algo in 9.1.3?

Thanks

Best Regards
Huang Qi Victor
Computer Science of National University of Singapore

Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Robert Haas
On Tue, Jun 26, 2012 at 6:25 PM, Tom Lane  wrote:
> Josh Berkus  writes:
>> So let's fix the 80% case with something we feel confident in, and then
>> revisit the no-sysv interlock as a separate patch.  That way if we can't
>> fix the interlock issues, we still have a reduced-shmem version of Postgres.
>
> Yes.  Insisting that we have the whole change in one patch is a good way
> to prevent any forward progress from happening.  As Alvaro noted, there
> are plenty of issues to resolve without trying to change the interlock
> mechanism at the same time.

So, here's a patch.  Instead of using POSIX shmem, I just took the
expedient of using mmap() to map a block of MAP_SHARED|MAP_ANONYMOUS
memory.  The sysv shm is still allocated, but it's just a copy of
PGShmemHeader; the "real" shared memory is the anonymous block.  This
won't work if EXEC_BACKEND is defined so it just falls back on
straight sysv shm in that case.

There are obviously some portability issues here - this is documented
not to work on Linux <= 2.4, but it's not clear whether it fails with
some suitable error code or just pretends to work and does the wrong
thing.  I tested that it does compile and work on both Linux 3.2.6 and
MacOS X 10.6.8.  And the comments probably need work and... who knows
what else is wrong.  But, thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


anonymous-shmem.patch
Description: Binary data



Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file foreign tables

2012-06-26 Thread Etsuro Fujita
Hi Kaigai-san,

> -Original Message-
> From: Kohei KaiGai [mailto:kai...@kaigai.gr.jp]
> Sent: Tuesday, June 26, 2012 11:05 PM
> To: Etsuro Fujita
> Cc: Robert Haas; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
> foreign tables
> 
> 2012/6/26 Etsuro Fujita :
> > Hi Kaigai-san,
> >
> >> -Original Message-
> >> From: Kohei KaiGai [mailto:kai...@kaigai.gr.jp]
> >> Sent: Monday, June 25, 2012 9:49 PM
> >> To: Etsuro Fujita
> >> Cc: Robert Haas; pgsql-hackers@postgresql.org
> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
> >> foreign tables
> >>
> >> Fujita-san,
> >>
> >> The revised patch is almost good for me.
> >> Only point I noticed is the check for redundant or duplicated options I
> > pointed
> >> out on the previous post.
> >> So, if you agree with the attached patch, I'd like to hand over this patch
> for
> >> committers.
> >
> > OK I agree with you.  Thanks for the revision!
> >
> > Besides the revision, I modified check_selective_binary_conversion() to run
> > heap_close() in the whole-row-reference case.  Attached is an updated
version
> of
> > the patch.
> >
> Sorry, I overlooked this code path.

No, it's my fault.

> So, I'd like to hand this patch over to the committers.

Thanks,

Best regards,
Etsuro Fujita

> 
> Thanks,
> 
> > Thanks.
> >
> > Best regards,
> > Etsuro Fujita
> >
> >> All I changed is:
> >>  index 98bcb2f..0143d81 100644
> >>  --- a/src/backend/commands/copy.c
> >>  +++ b/src/backend/commands/copy.c
> >>  @@ -122,6 +122,11 @@ typedef struct CopyStateData
> >>  @@ -217,7 +221,7 @@
> >>         }
> >>  +      else if (strcmp(defel->defname, "convert_binary") == 0)
> >>  +      {
> >> -+          if (cstate->convert_binary)
> >> ++          if (cstate->convert_selectively)
> >>  +              ereport(ERROR,
> >>  +                      (errcode(ERRCODE_SYNTAX_ERROR),
> >>  +                       errmsg("conflicting or redundant options")));
> >>
> >> Thanks,
> >>
> >> 2012/6/20 Etsuro Fujita :
> >> > Hi KaiGai-san,
> >> >
> >> > Thank you for the review.
> >> >
> >> >> -Original Message-
> >> >> From: pgsql-hackers-ow...@postgresql.org
> >> >> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kohei KaiGai
> >> >> Sent: Wednesday, June 20, 2012 1:26 AM
> >> >> To: Etsuro Fujita
> >> >> Cc: Robert Haas; pgsql-hackers@postgresql.org
> >> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
> >> >> file foreign tables
> >> >>
> >> >> Hi Fujita-san,
> >> >>
> >> >> Could you rebase this patch towards the latest tree?
> >> >> It was unavailable to apply the patch cleanly.
> >> >
> >> > Sorry, I updated the patch.  Please find attached an updated version
> >> > of the patch.
> >> >
> >> >> I looked over the patch, then noticed a few points.
> >> >>
> >> >> At ProcessCopyOptions(), defel->arg can be NIL, isn't it?
> >> >> If so, cstate->convert_binary is not a suitable flag to check
> >> >> redundant option. It seems to me cstate->convert_selectively is more
> >> >> suitable flag to check it.
> >> >>
> >> >> +       else if (strcmp(defel->defname, "convert_binary") == 0)
> >> >> +       {
> >> >> +           if (cstate->convert_binary)
> >> >> +               ereport(ERROR,
> >> >> +                       (errcode(ERRCODE_SYNTAX_ERROR),
> >> >> +                        errmsg("conflicting or redundant
> >> >> + options")));
> >> >> +           cstate->convert_selectively = true;
> >> >> +           if (defel->arg == NULL || IsA(defel->arg, List))
> >> >> +               cstate->convert_binary = (List *) defel->arg;
> >> >> +           else
> >> >> +               ereport(ERROR,
> >> >> +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >> >> +                        errmsg("argument to option \"%s\" must be a
> >> >> list of column names",
> >> >> +                               defel->defname)));
> >> >> +       }
> >> >
> >> > Yes, defel->arg can be NIL.  defel->arg is a List structure listing
> >> > all the columns needed to be converted to binary representation, which
> >> > is NIL for the case where no columns are needed to be converted.  For
> >> > example,
> >> > defel->arg is NIL for SELECT COUNT(*).  In this case, while
> >> > cstate->convert_selectively is set to true, no columns are converted at
> >> > NextCopyFrom().  Most efficient case!  In short,
> >> > cstate->convert_selectively represents whether to do selective binary
> >> > conversion at NextCopyFrom(), and
> >> > cstate->convert_binary represents all the columns to be converted at
> >> > NextCopyFrom(), that can be NIL.
> >> >
> >> >> At NextCopyFrom(), this routine computes default values if configured.
> >> >> In case when these values are not referenced, it might be possible to
> >> >> skip unnecessary calculations.
> >> >> Is it unavailable to add logic to avoid to construct cstate->defmap
> >> >> on unreferenced columns at  BeginCopyFrom()?
> >> >
> >>

Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Tom Lane
I wrote:
> Reflecting on this further, it seems to me that the main remaining
> failure modes are (1) file locking doesn't work, or (2) idiot DBA
> manually removes the lock file.

Oh, wait, I just remembered the really fatal problem here: to quote from
the SUS fcntl spec,

All locks associated with a file for a given process are removed
when a file descriptor for that file is closed by that process
or the process holding that file descriptor terminates.

That carefully says "a file descriptor", not "the file descriptor
through which the lock was acquired".  Any close() referencing the lock
file will do.  That means that it is possible for perfectly innocent
code --- for example, something that scans all files in the data
directory, as say pg_basebackup might do --- to cause a backend process
to lose its lock.  When we looked at this before, it seemed like a
showstopper.  Even if we carefully taught every directory-scanning loop
in postgres not to touch the lock file, we cannot expect that for
instance a pl/perl function wouldn't accidentally break things.  And
99.999% of the time nobody would notice ... it would just be that last
0.001% of people that would be screwed.

Still, this discussion has yielded a useful advance, which is that we
now see how we might safely make use of lock mechanisms that don't
inherit across fork().  We just need something less broken than fcntl().

regards, tom lane



Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-26 Thread David Fetter
On Tue, Jun 26, 2012 at 05:05:27PM -0500, Kevin Grittner wrote:
> David Fetter  wrote:
> > On Mon, Jun 25, 2012 at 01:50:54PM -0500, Kevin Grittner wrote:
>  
> >> One fine point regarding before and after images -- if a value
> >> doesn't change in an UPDATE, there's no reason to include it in
> >> both the BEFORE and AFTER tuple images, as long as we have the
> >> null column bitmaps -- or some other way of distinguishing
> >> unchanged from NULL.  (This could be especially important when
> >> the unchanged column was a 50 MB bytea.)
> > 
> > How about two bitmaps: one telling which columns are actually
> > there, the other with NULLs?
>  
> There are quite a few ways that could be done, but I suspect
> Álvaro's idea is best:
>  
> http://archives.postgresql.org/message-id/1340654533-sup-5...@alvh.no-ip.org

Looks great (or at least way better than mine) to me :)

> In any event, it sounds like Andres wants to keep it as simple as
> possible for the moment, and just include both tuples in their
> entirety.  Hopefully that is something which can be revisited before
> the last CF.

I hope so, too...

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Tom Lane
"A.M."  writes:
> On 06/26/2012 07:30 PM, Tom Lane wrote:
>>> I solved this via fcntl locking.

>> No, you didn't, because fcntl locks aren't inherited by child processes.
>> Too bad, because they'd be a great solution otherwise.

> You claimed this last time and I replied:
> http://archives.postgresql.org/pgsql-hackers/2011-04/msg00656.php

> "I address this race condition by ensuring that a lock-holding violator 
> is the postmaster or a postmaster child. If such as condition is 
> detected, the child exits immediately without touching the shared 
> memory. POSIX shmem is inherited via file descriptors."

> This is possible because the locking API allows one to request which PID 
> violates the lock. The child expects the lock to be held and checks that 
> the PID is the parent. If the lock is not held, that means that the 
> postmaster is dead, so the child exits immediately.

OK, I went back and re-read the original patch, and I now agree that
something like this is possible --- but I don't like the way you did
it. The dependence on particular PIDs seems both unnecessary and risky.

The key concept here seems to be that the postmaster first stakes a
claim on the data directory by exclusive-locking a lock file.  If
successful, it reduces that lock to shared mode (which can be done
atomically, according to the SUS fcntl specification), and then holds
the shared lock until it exits.  Spawned children will not initially
have a lock, but what they can do is attempt to acquire shared lock on
the lock file.  If fail, exit.  If successful, *check to see that the
parent postmaster is still alive* (ie, getppid() != 1).  If so, the
parent must have been continuously holding the lock, and the child has
successfully joined the pool of shared lock holders.  Otherwise bail
out without having changed anything.  It is the "parent is still alive"
check, not any test on individual PIDs, that makes this work.

There are two concrete reasons why I don't care for the
GetPIDHoldingLock() way.  Firstly, the fact that you can get a blocking
PID from F_GETLK isn't an essential part of the concept of file locking
IMO --- it's just an incidental part of this particular API.  May I
remind you that the reason we're stuck on SysV shmem in the first place
is that we decided to depend on an incidental part of that API, namely
nattch?  I would like to not require file locking to have any semantics
more specific than "a process can hold an exclusive or a shared lock on
a file, which is auto-released at process exit".  Secondly, in an NFS
world I don't believe that the returned l_pid value can be trusted for
anything.  If it's a PID from a different machine then it might
accidentally conflict with one on our machine, or not.

Reflecting on this further, it seems to me that the main remaining
failure modes are (1) file locking doesn't work, or (2) idiot DBA
manually removes the lock file.  Both of these could be ameliorated
with some refinements to the basic idea.  For (1), I suggest that
we tweak the startup process (only) to attempt to acquire exclusive lock
on the lock file.  If it succeeds, we know that file locking is broken,
and we can complain.  (This wouldn't help for cases where cross-machine
locking is broken, but I see no practical way to detect that.)
For (2), the problem really is that the proposed patch conflates the PID
file with the lock file, but people are conditioned to think that PID
files are removable.  I suggest that we create a separate, permanently
present file that serves only as the lock file and doesn't ever get
modified (it need have no content other than the string "Don't remove
this!").  It'd be created by initdb, not by individual postmaster runs;
indeed the postmaster should fail if it doesn't find the lock file
already present.  The postmaster PID file should still exist with its
current contents, but it would serve mostly as documentation and as
server-contact information for pg_ctl; it would not be part of the data
directory locking mechanism.

I wonder whether this design can be adapted to Windows?  IIRC we do
not have a bulletproof data directory lock scheme for Windows.
It seems like this makes few enough demands on the lock mechanism
that there ought to be suitable primitives available there too.

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Robert Haas
On Tue, Jun 26, 2012 at 6:20 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> So, what about keeping a FIFO in the data directory?
>
> Hm, does that work if the data directory is on NFS?  Or some other weird
> not-really-Unix file system?

I would expect NFS to work in general.  We could test that.  Of
course, it's more than possible that there's some bizarre device out
there that purports to be NFS but doesn't actually support mkfifo.
It's difficult to prove a negative.

>> When the
>> postmaster starts up, it tries to open the file with O_NONBLOCK |
>> O_WRONLY (or O_NDELAY | O_WRONLY, if the platform has O_NDELAY rather
>> than O_NONBLOCK).  If that succeeds, it bails out.  If it fails with
>> anything other than ENXIO, it bails out.  If it fails with exactly
>> ENXIO, then it opens the pipe with O_RDONLY
>
> ... race condition here ...

Oh, if someone tries to start two postmasters at the same time?  Hmm.

>> and arranges to pass the
>> file descriptor down to all of its children, so that a subsequent open
>> will fail if it or any of its children are still alive.
>
> This might be made to work, but that doesn't sound quite right in
> detail.
>
> I remember we speculated about using an fcntl lock on some file in the
> data directory, but that fails because child processes don't inherit
> fcntl locks.
>
> In the modern world, it'd be really a step forward if the lock mechanism
> worked on shared storage, ie a data directory on NFS or similar could be
> locked against all comers not just those on the same node as the
> original postmaster.  I don't know how to do that though.

Well, I think that in theory that DOES work.  But I also think it's
often misconfigured.  Which could also be said of NFS in general.

> In the meantime, insisting that we solve this problem before we do
> anything is a good recipe for ensuring that nothing happens, just
> like it hasn't happened for the last half dozen years.  (I see Alvaro
> just made the same point.)

Agreed all around.
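
For reference, the FIFO scheme quoted above might be sketched as follows (illustrative only, and note it still exhibits the race Tom points out between the write-side probe and the read-side open):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Try to take the data-directory interlock via a FIFO.  Returns the
 * read-side fd, which the postmaster keeps open and lets children
 * inherit; a later write-side probe then succeeds while any of them
 * is alive.  Two processes probing concurrently can both see ENXIO,
 * which is the race condition noted in the thread.
 */
static int
acquire_fifo_lock(const char *path)
{
	int			fd;

	(void) mkfifo(path, 0600);	/* EEXIST is fine; the FIFO persists */

	fd = open(path, O_WRONLY | O_NONBLOCK);
	if (fd >= 0)
	{
		/* a reader exists: some postmaster or child still holds it */
		close(fd);
		fprintf(stderr, "data directory appears to be in use\n");
		exit(1);
	}
	if (errno != ENXIO)
	{
		perror("open(O_WRONLY)");
		exit(1);
	}

	/* ENXIO: no readers exist; become the reader ourselves */
	fd = open(path, O_RDONLY | O_NONBLOCK);
	if (fd < 0)
	{
		perror("open(O_RDONLY)");
		exit(1);
	}
	return fd;
}
```

Holding the read side open is what makes a later O_WRONLY | O_NONBLOCK probe succeed, which is how a second postmaster detects that the first (or one of its children) is still alive.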

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread A.M.

On 06/26/2012 07:15 PM, Alvaro Herrera wrote:


Excerpts from Tom Lane's message of mar jun 26 18:58:45 -0400 2012:


Even if you actively try to configure the shmem settings to exactly
fill shmmax (which I concede some installation scripts might do),
it's going to be hard to do because of the 8K granularity of the main
knob, shared_buffers.


Actually it's very easy -- just try to start postmaster on a system with
not enough shmmax and it will tell you how much shmem it wants.  Then
copy that number verbatim in the config file.  This might fail on picky
systems such as MacOSX that require some exact multiple or power of some
other parameter, but it works fine on Linux.



Except that we have to account for other installers. A user can install 
an application in the future which clobbers the value and then the 
original application will fail to run. The options to get the first app 
working are:


a) to re-install the first app (potentially preventing the second app 
from running)
b) to have the first app detect the failure and readjust the value 
(guessing what it should be) and potentially forcing a reboot
c) to have the user manually adjust the value and potentially force 
a reboot


The failure usually gets blamed on the first application.

That's why we had to nuke SysV shmem.

Cheers,
M





Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread A.M.

On 06/26/2012 07:30 PM, Tom Lane wrote:

"A.M."  writes:

On Jun 26, 2012, at 6:12 PM, Daniel Farina wrote:

I'm simply suggesting that for additional benefits it may be worth
thinking about getting around nattach and thus SysV shmem, especially
with regard to safety, in an open-ended way.



I solved this via fcntl locking.


No, you didn't, because fcntl locks aren't inherited by child processes.
Too bad, because they'd be a great solution otherwise.



You claimed this last time and I replied:
http://archives.postgresql.org/pgsql-hackers/2011-04/msg00656.php

"I address this race condition by ensuring that a lock-holding violator 
is the postmaster or a postmaster child. If such a condition is 
detected, the child exits immediately without touching the shared 
memory. POSIX shmem is inherited via file descriptors."


This is possible because the locking API allows one to request which PID 
violates the lock. The child expects the lock to be held and checks that 
the PID is the parent. If the lock is not held, that means that the 
postmaster is dead, so the child exits immediately.


Cheers,
M



Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes

2012-06-26 Thread Andres Freund
Hi Steve,

On Tuesday, June 26, 2012 02:14:22 AM Steve Singer wrote:
> > I planned to have some cutoff 'max_changes_in_memory_per_txn' value.
> > If it has
> > been reached for one transaction all existing changes are spilled to
> > disk. New changes again can be kept in memory till its reached again.
> Do you want max_changes_per_in_memory_txn or do you want to put a limit
> on the total amount of memory that the cache is able to use? How are you
> going to tell a DBA to tune max_changes_in_memory_per_txn? They know how
> much memory their system has and that they can devote to the apply cache
> versus other things. Giving them guidance based on estimating how many
> open transactions they might have at a point in time and how many
> WAL change records each transaction generates seems like a step
> backwards from the progress we've been making in getting Postgresql to
> be easier to tune.  The maximum number of transactions that could be
> opened at a time is governed by max_connections on the master at the
> time the WAL was generated , so I don't even see how the machine
> processing the WAL records could autotune/guess that.
It even can be significantly higher than max_connections because 
subtransactions are only recognizable as part of their parent transaction 
upon commit.

I think max_changes_in_memory_per_txn will be the number of changes for now. 
Making memory based accounting across multiple concurrent transactions work 
efficiently and correctly isn't easy.


> > We need to support serializing the cache for crash recovery + shutdown of
> > the receiving side as well. Depending on how we do the wal decoding we
> > will need it more frequently...
> Have you described your thoughts on crash recovery on another thread?
I think I have somewhere, but given how much in flux our thoughts on decoding 
are I think it's not that important yet.

> I am thinking that this module would have to serialize some state
> everytime it calls cache->commit() to ensure that consumers don't get
> invoked twice on the same transaction.
In one of the other patches I implemented it by adding the (origin_id, 
origin_lsn) pair to replicated commits. During recovery the startup process 
sets up the shared memory status up to which point we applied.
If you then perform a 'logical checkpoint' every now and then, writing down 
what the beginning LSN of the longest in-progress transaction is, you can 
fully recover from that point on.

> If the apply module is making changes to the same backend that the apply
> cache serializes to then both the state for the apply cache and the
> changes that committed changes/transactions make will be persisted (or
> not persisted) together.   What if I am replicating from x86 to x86_64
> via a apply module that does textout conversions?
> 
> x86 Proxy x86_64
> WAL--> apply
>   cache
> 
>|   (proxy catalog)
> 
>   apply module
>textout  ->
>SQL statements
> 
> 
> How do we ensure that the commits are all visible(or not visible)  on
> the catalog on the proxy instance used for decoding WAL, the destination
> database, and the state + spill files of the apply cache stay consistent
> in the event of a crash of either the proxy or the target?
> I don't think you can (unless we consider two-phase commit, and I'd
> rather we didn't).  Can we come up with a way of avoiding the need for
> them to be consistent with each other?
That's discussed in the "Catalog/Metadata consistency during changeset 
extraction from wal" thread and we haven't yet determined which solution is 
the best ;)

> Code Review
> =
> 
> applycache.h
> ---
> +typedef struct ApplyCacheTupleBuf
> +{
> +/* position in preallocated list */
> +ilist_s_node node;
> +
> +HeapTupleData tuple;
> +HeapTupleHeaderData header;
> +char data[MaxHeapTupleSize];
> +} ApplyCacheTupleBuf;
> 
> Each ApplyCacheTupleBuf will be about 8k (BLCKSZ) big no matter how big
> the data in the transaction is? Wouldn't workloads with inserts of lots
> of small rows in a transaction eat up lots of memory that is allocated
> but storing nothing?  The only alternative I can think of is dynamically
> allocating these and I don't know what the cost/benefit of that overhead
> will be versus spilling to disk sooner.
Dynamically allocating them totally destroys performance, I tried that. I 
think at some point we should have 4 or so lists of preallocated tuple bufs of 
different sizes and then use the smallest possible one. But I think this 
solution is ok in the very first version.

If you allocate dynamically you also get a noticeable performance drop when 
you let the decoding run for a while because of fragmentation inside the 
memory allocator.
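
The size-class idea might look roughly like this (a sketch only; the class boundaries are made up and not from any patch):

```c
#include <stddef.h>

/*
 * Pick the smallest preallocated buffer class that fits 'len' bytes.
 * Each class would have its own freelist of preallocated buffers, so
 * small rows no longer pin a full MaxHeapTupleSize-sized buffer.
 */
#define N_CLASSES 4

static const size_t class_sizes[N_CLASSES] = {256, 1024, 4096, 8192};

static int
size_class_for(size_t len)
{
	int			i;

	for (i = 0; i < N_CLASSES; i++)
		if (len <= class_sizes[i])
			return i;
	/* oversized request: use the largest class */
	return N_CLASSES - 1;
}
```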

> +* FIXME: better name
> + */
> +ApplyCacheChange*
> +ApplyCacheGetChange(ApplyCache*

Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Tom Lane
"A.M."  writes:
> On Jun 26, 2012, at 6:12 PM, Daniel Farina wrote:
>> I'm simply suggesting that for additional benefits it may be worth
>> thinking about getting around nattach and thus SysV shmem, especially
>> with regard to safety, in an open-ended way.

> I solved this via fcntl locking.

No, you didn't, because fcntl locks aren't inherited by child processes.
Too bad, because they'd be a great solution otherwise.

regards, tom lane



[HACKERS] embedded list v2

2012-06-26 Thread Andres Freund
Hi,

To recapitulate why I think this sort of embedded list is worthwhile:
* minimal memory overhead (16 bytes for double linked list heads/nodes on 
64bit systems)
* no additional memory allocation overhead
* no additional dereference to access the contents of a list element
* most modifications are completely branchless
* the current dllist.h interface has double the memory overhead and much more 
complex manipulation operators
* Multiple places in postgres have grown local single or double linked list 
implementations
* I need it ;)
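
For readers unfamiliar with the idiom, an embedded (intrusive) doubly linked list looks roughly like this — a simplified sketch with indicative names, not the attached ilist.h:

```c
#include <stddef.h>

/*
 * The node lives directly inside the containing struct: no separate
 * allocation, and the element is recovered with offsetof arithmetic.
 * An empty list is circular, pointing at its own head.
 */
typedef struct ilist_d_node
{
	struct ilist_d_node *prev;
	struct ilist_d_node *next;
} ilist_d_node;

typedef struct
{
	ilist_d_node head;
} ilist_d_head;

static void
ilist_d_init(ilist_d_head *h)
{
	h->head.next = h->head.prev = &h->head;
}

/* branchless insertion at the front */
static void
ilist_d_push_head(ilist_d_head *h, ilist_d_node *n)
{
	n->next = h->head.next;
	n->prev = &h->head;
	n->next->prev = n;
	h->head.next = n;
}

/* recover the containing struct from the embedded node */
#define ilist_container(type, field, ptr) \
	((type *) ((char *) (ptr) - offsetof(type, field)))

typedef struct
{
	int			value;
	ilist_d_node node;			/* embedded: zero extra allocations */
} Item;
```

On a 64-bit system the node is exactly 16 bytes, matching the overhead figure above, and no extra pointer dereference is needed to reach the element.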

Attached are three patches:
1. embedded list implementation
2. make the list implementation usable without USE_INLINE
3. convert all callers to ilist.h away from dllist.h

For 1 I:
a. added more comments and some introduction, some more functions
b. moved the file from utils/ilist.h to lib/ilist.h
c. actually included the c file with the check functions
d. did *not* split it up into single/double linked list files, doesn't seem to 
be worth the trouble given how small ilist.(c|h) are
e. did *not* try to get an interface similar to dllist.h. I don't think the 
old one is better, and it makes the breakage more obvious should somebody else 
use the old implementation, although I doubt anybody does.

I can be convinced to do d. and e. but I don't think they are an improvement.

For 2 I used ugly macro hackery to avoid declaring every function twice, in a 
c file and in a header.

Opinions on the state of the above patches?

I did not expect any performance difference in the current usage, but just to 
be sure I ran the following tests:

connection heavy:
pgbench -n -S  -p 5501 -h /tmp -U andres postgres -c 16 -j 16 -T 10 -C
master: 3109 3024 3012
ilist:  3097 3033 3024

somewhat SearchCatCache heavy:
pgbench -n -S  -p 5501 -h /tmp -U andres postgres -T 100 -c 16 -j 1
master: 98979.453879 99554.485631 99393.587880
ilist:  98960.545559 99583.319870 99498.923273

As expected the differences are on the level of noise...

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 2e9d955fbb625004061509a62ecca83fde68d027 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 6 May 2012 00:26:35 +0200
Subject: [PATCH 1/3] Add embedded list interface (header only)

Adds a single and a double linked list which can easily be embedded into other
datastructures and can be used without any additional allocations.

Problematic: It requires USE_INLINE to be used. It could be remade to fall back
to externally defined functions if that is not available, but that hardly
seems sensible at this day and age. Besides, the speed hit would be noticeable
and it's only used in new code, which could be disabled on machines - given
they still exist - without proper support for inline functions.

 3 files changed, 509 insertions(+), 1 deletion(-)

diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 2e1061e..c94297a 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/lib
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = dllist.o stringinfo.o
+OBJS = dllist.o stringinfo.o ilist.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/ilist.c b/src/backend/lib/ilist.c
new file mode 100644
index 000..72de7a3
--- /dev/null
+++ b/src/backend/lib/ilist.c
@@ -0,0 +1,79 @@
+/*-
+ *
+ * ilist.c
+ *	  support for integrated/inline double and single linked lists
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/lib/ilist.c
+ *
+ * NOTES
+ *
+ *	  This file only contains testing code for inline single/double linked
+ *	  lists.
+ *
+ *-
+ */
+
+#include "postgres.h"
+
+#include "lib/ilist.h"
+
+#ifdef ILIST_DEBUG
+void ilist_d_check(ilist_d_head* head)
+{
+ilist_d_node *cur;
+
+if(!head ||
+   !head->head.next || !head->head.prev)
+	elog(ERROR, "double linked list head is not properly initialized");
+
+for(cur = head->head.next;
+cur != &head->head;
+cur = cur->next){
+	if(!(cur) ||
+	   !(cur->next) ||
+	   !(cur->prev) ||
+	   !(cur->prev->next == cur) ||
+	   !(cur->next->prev == cur))
+	{
+		elog(ERROR, "double linked list is corrupted");
+	}
+}
+
+for(cur = head->head.prev;
+cur != &head->head;
+cur = cur->prev){
+	if(!(cur) ||
+	   !(cur->next) ||
+	   !(cur->prev) ||
+	   !(cur->prev->next == cur) ||
+	   !(cur->next->prev == cur))
+	{
+		elog(ERROR, "double linked list is corrupted");
+	}
+}
+}
+
+void ilist_s_check(ilist_s_head* head)
+{
+ilist_s_node *cur;
+
+if(!head)
+	elog(E

Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Alvaro Herrera

Excerpts from Tom Lane's message of mar jun 26 18:58:45 -0400 2012:

> Even if you actively try to configure the shmem settings to exactly
> fill shmmax (which I concede some installation scripts might do),
> it's going to be hard to do because of the 8K granularity of the main
> knob, shared_buffers.

Actually it's very easy -- just try to start postmaster on a system with
not enough shmmax and it will tell you how much shmem it wants.  Then
copy that number verbatim in the config file.  This might fail on picky
systems such as MacOSX that require some exact multiple or power of some
other parameter, but it works fine on Linux.

I think the minimum you can request, at least on Linux, is 1 byte.

> Moreover, an installation script that did that
> would soon learn not to, because of the fact that we don't worry too
> much about changing small details of shared memory consumption in minor
> releases.

+1

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-06-26 Thread Nils Goroll
> It's
> still unproven whether it'd be an improvement, but you could expect to
> prove it one way or the other with a well-defined amount of testing.

I've hacked the code to use adaptive pthread mutexes instead of spinlocks. see
attached patch. The patch is for the git head, but it can easily be applied for
9.1.3, which is what I did for my tests.

This had disastrous effects on Solaris because it does not use anything similar
to futexes for PTHREAD_PROCESS_SHARED mutexes (only the _PRIVATE mutexes do
without syscalls for the simple case).

But I was surprised to see that it works relatively well on linux. Here's a
glimpse of my results:

hacked code 9.1.3:

-bash-4.1$ rsync -av --delete /tmp/test_template_data/ ../data/ ; /usr/bin/time
./postgres -D ../data -p 55502 & ppid=$! ; pid=$(pgrep -P $ppid ) ; sleep 15 ;
./pgbench -c 768 -t 20 -j 128 -p 55502 postgres ; kill $pid
sending incremental file list
...
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 768
number of threads: 128
number of transactions per client: 20
number of transactions actually processed: 15360/15360
tps = 476.873261 (including connections establishing)
tps = 485.964355 (excluding connections establishing)
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
-bash-4.1$ LOG:  shutting down
LOG:  database system is shut down
210.58user 78.88system 0:50.64elapsed 571%CPU (0avgtext+0avgdata
1995968maxresident)k
0inputs+1153872outputs (0major+2464649minor)pagefaults 0swaps

original code (vanilla build on amd64) 9.1.3:

-bash-4.1$ rsync -av --delete /tmp/test_template_data/ ../data/ ; /usr/bin/time
./postgres -D ../data -p 55502 & ppid=$! ; pid=$(pgrep -P $ppid ) ; sleep 15 ;
./pgbench -c 768 -t 20 -j 128 -p 55502 postgres ; kill $pid
sending incremental file list
...
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 768
number of threads: 128
number of transactions per client: 20
number of transactions actually processed: 15360/15360
tps = 499.993685 (including connections establishing)
tps = 510.410883 (excluding connections establishing)
LOG:  received smart shutdown request
-bash-4.1$ LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down
196.21user 71.38system 0:47.99elapsed 557%CPU (0avgtext+0avgdata
1360800maxresident)k
0inputs+1147904outputs (0major+2375965minor)pagefaults 0swaps


config:

-bash-4.1$ egrep '^[a-z]' /tmp/test_template_data/postgresql.conf
max_connections = 1800  # (change requires restart)
shared_buffers = 10GB   # min 128kB
temp_buffers = 64MB # min 800kB
work_mem = 256MB# min 64kB, default 1MB
maintenance_work_mem = 2GB  # min 1MB, default 16MB
bgwriter_delay = 10ms   # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000# 0-1000 max buffers written/round
bgwriter_lru_multiplier = 10.0  # 0-10.0 multiplier on buffers 
scanned/round
wal_level = hot_standby # minimal, archive, or hot_standby
wal_buffers = 64MB  # min 32kB, -1 sets based on 
shared_buffers
commit_delay = 1# range 0-100000, in microseconds
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8' # locale for system error 
message
lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'  # locale for number formatting
lc_time = 'en_US.UTF-8' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 1.5  # same scale as above (default: 4.0)
cpu_tuple_cost = 0.005
cpu_index_tuple_cost = 0.0025
cpu_operator_cost = 0.0001
effective_cache_size = 192GB



So it looks like using pthread_mutexes could at least be an option on Linux.

Using futexes directly could be even cheaper.
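
The kind of initialization involved might look like the following sketch (not the attached patch; PTHREAD_MUTEX_ADAPTIVE_NP is a glibc extension, hence the guard):

```c
#include <pthread.h>
#include <stdlib.h>

/*
 * Initialize a mutex placed in shared memory so it can stand in for a
 * spinlock shared across postmaster children.  PTHREAD_PROCESS_SHARED
 * is required for cross-process use; the adaptive type makes glibc
 * spin briefly in userspace before sleeping in the kernel, which is
 * what makes it competitive with a hand-rolled spinlock.
 */
static void
shared_mutex_init(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;

	if (pthread_mutexattr_init(&attr) != 0)
		abort();
	if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0)
		abort();
#ifdef __GLIBC__
	/* glibc-only: spin a bit before the futex wait */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
#endif
	if (pthread_mutex_init(m, &attr) != 0)
		abort();
	pthread_mutexattr_destroy(&attr);
}
```

On contention, unlock wakes exactly one futex waiter, which is the wake-one behavior mentioned at the end of this mail; on platforms without futex-backed process-shared mutexes (as seen on Solaris above), every operation degrades to a syscall.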


As a side note, it looks like I have not expressed myself clearly:

I did not intend to suggest to replace proven, working code (which probably is
the best you can get for some platforms) with posix calls. I apologize for the
provocative question.


Regarding the actual production issue, I did not manage to synthetically provoke
the saturation we are seeing in production using pgbench - I could not even get
anywhere near the production load. So I cannot currently test if reducing the
amount of spinning and waking up exactly one waiter (which is what linux/nptl
pthread_mutex_unlock does) would solve/mitigate the production issue I am
working on, and I'd highly appreciate any pointers in this direction.

Cheers, Nils
diff --git a/src/backend/storage/lmgr/s_lock.c 
b/src/backend/storage/lmgr/s_lock.c
index bc8d89f..a45fdf6 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -20,6 +20,8 @

Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Tom Lane
"A.M."  writes:
> This can be trivially reproduced if one runs an old (SysV shared 
> memory-based) postgresql alongside a potentially newer postgresql with a 
> smaller SysV segment. This can occur with applications that bundle postgresql 
> as part of the app.

I don't believe that that case is a counterexample to what's being
proposed (namely, grabbing a minimum-size shmem segment, perhaps 1K).
It would only fail if the old postmaster ate up *exactly* SHMMAX worth
of shmem, which is not real likely.  As a data point, on my Mac laptop
with SHMMAX set to 32MB, 9.2 will by default eat up 31624KB, leaving
more than a meg available.  Sure, that isn't enough to start another
old-style postmaster, but it would be plenty of room for one that only
wants 1K.

Even if you actively try to configure the shmem settings to exactly
fill shmmax (which I concede some installation scripts might do),
it's going to be hard to do because of the 8K granularity of the main
knob, shared_buffers.  Moreover, an installation script that did that
would soon learn not to, because of the fact that we don't worry too
much about changing small details of shared memory consumption in minor
releases.

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Kevin Grittner
Tom Lane  wrote:
 
> In the meantime, insisting that we solve this problem before we do
> anything is a good recipe for ensuring that nothing happens, just
> like it hasn't happened for the last half dozen years.  (I see
> Alvaro just made the same point.)
 
And now so has Josh.
 
+1 from me, too.
 
-Kevin



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Tom Lane
Josh Berkus  writes:
> So let's fix the 80% case with something we feel confident in, and then
> revisit the no-sysv interlock as a separate patch.  That way if we can't
> fix the interlock issues, we still have a reduced-shmem version of Postgres.

Yes.  Insisting that we have the whole change in one patch is a good way
to prevent any forward progress from happening.  As Alvaro noted, there
are plenty of issues to resolve without trying to change the interlock
mechanism at the same time.

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread A.M.

On Jun 26, 2012, at 6:12 PM, Daniel Farina wrote:
> 
> (Emphasis mine).
> 
> I don't think that -hackers at the time gave the zero-shmem rationale
> much weight (I also was not that happy about the safety mechanism of
> that patch), but upon more reflection (and taking into account *other*
> software that may mangle shmem settings) I think it's something at
> least worth thinking about again one more time.  What killed the patch
> was an attachment to the deemed-less-safe strategy for avoiding bogus
> shmem attachments already in it, but I don't seem to recall anyone
> putting a whole lot of thought at the time into the zero-shmem case
> from what I could read on the list, because a small interlock with
> nattach seemed good-enough.
> 
> I'm simply suggesting that for additional benefits it may be worth
> thinking about getting around nattach and thus SysV shmem, especially
> with regard to safety, in an open-ended way.  Maybe there's a solution
> (like Robert's FIFO suggestion?) that is not too onerous and can
> satisfy everyone.


I solved this via fcntl locking. I also set up gdb to break in critical regions 
to test the interlock and I found no flaw in the design. More eyes would be 
welcome, of course.
https://github.com/agentm/postgres/tree/posix_shmem

Cheers,
M






Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Tom Lane
Robert Haas  writes:
> So, what about keeping a FIFO in the data directory?

Hm, does that work if the data directory is on NFS?  Or some other weird
not-really-Unix file system?

> When the
> postmaster starts up, it tries to open the file with O_NONBLOCK |
> O_WRONLY (or O_NDELAY | O_WRONLY, if the platform has O_NDELAY rather
> than O_NONBLOCK).  If that succeeds, it bails out.  If it fails with
> anything other than ENXIO, it bails out.  If it fails with exactly
> ENXIO, then it opens the pipe with O_RDONLY

... race condition here ...

> and arranges to pass the
> file descriptor down to all of its children, so that a subsequent open
> will fail if it or any of its children are still alive.

This might be made to work, but that doesn't sound quite right in
detail.

I remember we speculated about using an fcntl lock on some file in the
data directory, but that fails because child processes don't inherit
fcntl locks.

In the modern world, it'd be really a step forward if the lock mechanism
worked on shared storage, ie a data directory on NFS or similar could be
locked against all comers not just those on the same node as the
original postmaster.  I don't know how to do that though.

In the meantime, insisting that we solve this problem before we do
anything is a good recipe for ensuring that nothing happens, just
like it hasn't happened for the last half dozen years.  (I see Alvaro
just made the same point.)

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Josh Berkus

> This can be trivially reproduced if one runs an old (SysV shared 
> memory-based) postgresql alongside a potentially newer postgresql with a 
> smaller SysV segment. This can occur with applications that bundle postgresql 
> as part of the app.

I'm not saying it doesn't happen at all.  I'm saying it's not the 80%
case.

So let's fix the 80% case with something we feel confident in, and then
revisit the no-sysv interlock as a separate patch.  That way if we can't
fix the interlock issues, we still have a reduced-shmem version of Postgres.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com





Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread A.M.

On Jun 26, 2012, at 5:44 PM, Josh Berkus wrote:

> 
>> On that, I used to be of the opinion that this is a good compromise (a
>> small amount of interlock space, plus mostly posix shmem), but I've
>> heard since then (I think via AgentM indirectly, but I'm not sure)
>> that there are cases where even the small SysV segment can cause
>> problems -- notably when other software tweaks shared memory settings
>> on behalf of a user, but only leaves just-enough for the software
>> being installed.  This is most likely on platforms that don't have a
>> high SysV shmem limit by default, so installers all feel the
>> prerogative to increase the limit, but there's no great answer for how
>> to compose a series of such installations.  It only takes one
>> installer that says "whatever, I'm just catenating stuff to
>> sysctl.conf that works for me" to sabotage Postgres' ability to start.
> 
> Personally, I see this as rather an extreme case, and aside from AgentM
> himself, have never run into it before.  Certainly it would be useful to
> not need SysV RAM at all, but it's more important to get a working patch
> for 9.3.


This can be trivially reproduced if one runs an old (SysV shared memory-based) 
postgresql alongside a potentially newer postgresql with a smaller SysV 
segment. This can occur with applications that bundle postgresql as part of the 
app.

Cheers,
M






Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Daniel Farina
On Tue, Jun 26, 2012 at 2:53 PM, Alvaro Herrera
 wrote:
>
> Excerpts from Daniel Farina's message of mar jun 26 17:40:16 -0400 2012:
>
>> On that, I used to be of the opinion that this is a good compromise (a
>> small amount of interlock space, plus mostly posix shmem), but I've
>> heard since then (I think via AgentM indirectly, but I'm not sure)
>> that there are cases where even the small SysV segment can cause
>> problems -- notably when other software tweaks shared memory settings
>> on behalf of a user, but only leaves just-enough for the software
>> being installed.
>
> This argument is what killed the original patch.  If you want to get
> anything done *at all* I think it needs to be dropped.  Changing shmem
> implementation is already difficult enough --- you don't need to add the
> requirement that the interlocking mechanism be changed simultaneously.
> You (or whoever else) can always work on that as a followup patch.

True, but then again, I did very intentionally write:

> Excerpts from Daniel Farina's message of mar jun 26 17:40:16 -0400 2012:
>> *I wouldn't let perfect be the enemy of good* to make progress
>> here, but it appears this was a witnessed real problem, so it may
>> be worth reconsidering if there is a way we can safely remove all
>> SysV by finding an alternative to the nattach mechanic.

(Emphasis mine).

I don't think that -hackers at the time gave the zero-shmem rationale
much weight (I also was not that happy about the safety mechanism of
that patch), but upon more reflection (and taking into account *other*
software that may mangle shmem settings) I think it's something at
least worth thinking about again one more time.  What killed the patch
was an attachment to the deemed-less-safe strategy for avoiding bogus
shmem attachments already in it, but I don't seem to recall anyone
putting a whole lot of thought at the time into the zero-shmem case
from what I could read on the list, because a small interlock with
nattach seemed good-enough.

I'm simply suggesting that for additional benefits it may be worth
thinking about getting around nattach and thus SysV shmem, especially
with regard to safety, in an open-ended way.  Maybe there's a solution
(like Robert's FIFO suggestion?) that is not too onerous and can
satisfy everyone.

-- 
fdr



Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Tom Lane
Martijn van Oosterhout  writes:
> And then you have fabulous things like:
> https://git.reviewboard.kde.org/r/102145/
> (OSX defines _POSIX_THREAD_PROCESS_SHARED but does not actually support
> it.)

> Seems not very well tested in any case.

> It might be worthwhile testing futexes on Linux though, they are
> specifically supported on any kind of shared memory (shm/mmap/fork/etc)
> and quite well tested.

Yeah, a Linux-specific replacement of spinlocks with futexes seems like
a lot safer idea than "let's rely on posix mutexes everywhere".  It's
still unproven whether it'd be an improvement, but you could expect to
prove it one way or the other with a well-defined amount of testing.

regards, tom lane



Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-26 Thread Kevin Grittner
David Fetter  wrote:
> On Mon, Jun 25, 2012 at 01:50:54PM -0500, Kevin Grittner wrote:
 
>> One fine point regarding before and after images -- if a value
>> doesn't change in an UPDATE, there's no reason to include it in
>> both the BEFORE and AFTER tuple images, as long as we have the
>> null column bitmaps -- or some other way of distinguishing
>> unchanged from NULL.  (This could be especially important when
>> the unchanged column was a 50 MB bytea.)
> 
> How about two bitmaps: one telling which columns are actually
> there, the other with NULLs?
 
There are quite a few ways that could be done, but I suspect
Álvaro's idea is best:
 
http://archives.postgresql.org/message-id/1340654533-sup-5...@alvh.no-ip.org
 
In any event, it sounds like Andres wants to keep it as simple as
possible for the moment, and just include both tuples in their
entirety.  Hopefully that is something which can be revisited before
the last CF.
 
-Kevin



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Alvaro Herrera

Excerpts from Daniel Farina's message of mar jun 26 17:40:16 -0400 2012:

> On that, I used to be of the opinion that this is a good compromise (a
> small amount of interlock space, plus mostly posix shmem), but I've
> heard since then (I think via AgentM indirectly, but I'm not sure)
> that there are cases where even the small SysV segment can cause
> problems -- notably when other software tweaks shared memory settings
> on behalf of a user, but only leaves just-enough for the software
> being installed.

This argument is what killed the original patch.  If you want to get
anything done *at all* I think it needs to be dropped.  Changing shmem
implementation is already difficult enough --- you don't need to add the
requirement that the interlocking mechanism be changed simultaneously.
You (or whoever else) can always work on that as a followup patch.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Robert Haas
On Tue, Jun 26, 2012 at 5:44 PM, Josh Berkus  wrote:
>
>> On that, I used to be of the opinion that this is a good compromise (a
>> small amount of interlock space, plus mostly posix shmem), but I've
>> heard since then (I think via AgentM indirectly, but I'm not sure)
>> that there are cases where even the small SysV segment can cause
>> problems -- notably when other software tweaks shared memory settings
>> on behalf of a user, but only leaves just-enough for the software
>> being installed.  This is most likely on platforms that don't have a
>> high SysV shmem limit by default, so installers all feel the
>> prerogative to increase the limit, but there's no great answer for how
>> to compose a series of such installations.  It only takes one
>> installer that says "whatever, I'm just catenating stuff to
>> sysctl.conf that works for me" to sabotage Postgres' ability to start.
>
> Personally, I see this as rather an extreme case, and aside from AgentM
> himself, have never run into it before.  Certainly it would be useful to
> not need SysV RAM at all, but it's more important to get a working patch
> for 9.3.

+1.

I'd sort of given up on finding a solution that doesn't involve system
V shmem anyway, but now that I think about it... what about using a
FIFO?  The man page for open on MacOS X says:

[ENXIO]    O_NONBLOCK and O_WRONLY are set, the file is a FIFO,
   and no process has it open for reading.

And Linux says:

  ENXIO  O_NONBLOCK | O_WRONLY is set, the named file is a  FIFO  and  no
 process has the file open for reading.  Or, the file is a device
 special file and no corresponding device exists.

And HP/UX says:

  [ENXIO]    O_NDELAY is set, the named file is a FIFO,
 O_WRONLY is set, and no process has the file open
 for reading.

So, what about keeping a FIFO in the data directory?  When the
postmaster starts up, it tries to open the file with O_NONBLOCK |
O_WRONLY (or O_NDELAY | O_WRONLY, if the platform has O_NDELAY rather
than O_NONBLOCK).  If that succeeds, it bails out.  If it fails with
anything other than ENXIO, it bails out.  If it fails with exactly
ENXIO, then it opens the pipe with O_RDONLY and arranges to pass the
file descriptor down to all of its children, so that a subsequent open
will fail if it or any of its children are still alive.

This might even be more reliable than what we do right now, because
our current system appears not to be robust against the removal of
postmaster.pid.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_terminate_backend for same-role

2012-06-26 Thread Daniel Farina
On Tue, Jun 26, 2012 at 1:58 PM, Robert Haas  wrote:
> On Thu, Mar 29, 2012 at 3:04 AM, Daniel Farina  wrote:
>> On Mon, Mar 26, 2012 at 12:14 AM, Jeff Davis  wrote:
>>> On Tue, 2012-03-20 at 01:38 -0700, Daniel Farina wrote:
 On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao  wrote:
 > On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina  wrote:
 >> Parallel to pg_cancel_backend, it'd be nice to allow the user to just
 >> outright kill a backend that they own (politely, with a SIGTERM),
 >> aborting any transactions in progress, including the idle transaction,
 >> and closing the socket.
 >
 > +1

 Here's a patch implementing the simple version, with no more guards
 against signal racing than have been seen previously.  The more
 elaborate variants to close those races is being discussed in a
 parallel thread, but I thought I'd get this simple version out there.
>>>
>>> Review:
>>>
>>> After reading through the threads, it looks like there was no real
>>> objection to this approach -- pid recycling is not something we're
>>> concerned about.
>>>
>>> I think you're missing a doc update though, in func.sgml:
>>>
>>> "For the less restrictive pg_cancel_backend, the role of an
>>> active backend can be found from
>>> the usename column of the
>>> pg_stat_activity view."
>>>
>>> Also, high-availability.sgml makes reference to pg_cancel_backend(), and
>>> it might be worthwhile to add an "...and pg_terminate_backend()" there.
>>>
>>> Other than that, it looks good to me.
>>
>> Good comments. Patch attached to address the doc issues.  The only
>> iffy thing is that the paragraph "For the less restrictive..." I have
>> opted to remove in its entirety.  I think the documents are already
>> pretty clear about the same-user rule, and I'm not sure if this is the
>> right place for a crash-course on attributes in pg_stat_activity (but
>> maybe it is).
>>
>> "...and pg_terminate_backend" seems exactly right.
>>
>> And I think now that the system post-patch doesn't have such a strange
>> contrast between the ability to send SIGINT vs. SIGTERM, such a
>> contrast may not be necessary.
>>
>> I'm also not sure what the policy is about filling paragraphs in the
>> manual.  I filled one, which increases the sgml churn a bit. git
>> (show|diff) --word-diff helps clean it up.
>
> I went ahead and committed this.
>
> I kinda think we should back-patch this into 9.2.  It doesn't involve
> a catalog change, and would make the behavior consistent between the
> two releases, instead of changing in 9.1 and then changing again in
> 9.2.  Thoughts?

I think that would be good.

It is at the level of pain where I would have backpatched it from day
one, but I think it would be a welcome change to people who couldn't
justify gnashing their teeth and distributing a tweaked Postgres like I would.
 It saves me some effort, too.

-- 
fdr



Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Martijn van Oosterhout
On Tue, Jun 26, 2012 at 01:46:06PM -0500, Merlin Moncure wrote:
> Well, that would introduce a backend dependency on pthreads, which is
> unpleasant.  Also you'd need to feature test via
> _POSIX_THREAD_PROCESS_SHARED to make sure you can mutex between
> processes (and configure your mutexes as such when you do).  There are
probably other reasons why this can't be done, but I personally don't
know of any.

And then you have fabulous things like:

https://git.reviewboard.kde.org/r/102145/
(OSX defines _POSIX_THREAD_PROCESS_SHARED but does not actually support
it.)

Seems not very well tested in any case.

It might be worthwhile testing futexes on Linux though, they are
specifically supported on any kind of shared memory (shm/mmap/fork/etc)
and quite well tested.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Josh Berkus

> On that, I used to be of the opinion that this is a good compromise (a
> small amount of interlock space, plus mostly posix shmem), but I've
> heard since then (I think via AgentM indirectly, but I'm not sure)
> that there are cases where even the small SysV segment can cause
> problems -- notably when other software tweaks shared memory settings
> on behalf of a user, but only leaves just-enough for the software
> being installed.  This is most likely on platforms that don't have a
> high SysV shmem limit by default, so installers all feel the
> prerogative to increase the limit, but there's no great answer for how
> to compose a series of such installations.  It only takes one
> installer that says "whatever, I'm just catenating stuff to
> sysctl.conf that works for me" to sabotage Postgres' ability to start.

Personally, I see this as rather an extreme case, and aside from AgentM
himself, have never run into it before.  Certainly it would be useful to
not need SysV RAM at all, but it's more important to get a working patch
for 9.3.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com





Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Robert Haas
On Tue, Jun 26, 2012 at 5:18 PM, Josh Berkus  wrote:
> On 6/26/12 2:13 PM, Robert Haas wrote:
>> On Tue, Jun 26, 2012 at 4:29 PM, Alvaro Herrera
>>  wrote:
>>> Excerpts from Josh Berkus's message of mar jun 26 15:49:59 -0400 2012:
 Robert, all:

 Last I checked, we had a reasonably acceptable patch to use mostly Posix
 Shared mem with a very small sysv ram partition.  Is there anything
 keeping this from going into 9.3?  It would eliminate a major
 configuration headache for our users.
>>>
>>> I don't think that patch was all that reasonable.  It needed work, and
>>> in any case it needs a rebase because it was pretty old.
>>
>> Yep, agreed.
>>
>> I'd like to get this fixed too, but it hasn't made it up to the top of
>> my list of things to worry about.
>
> Was there a post-AgentM version of the patch, which incorporated the
> small SySV RAM partition?  I'm not finding it.

To my knowledge, no.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Daniel Farina
On Tue, Jun 26, 2012 at 2:18 PM, Josh Berkus  wrote:
> On 6/26/12 2:13 PM, Robert Haas wrote:
>> On Tue, Jun 26, 2012 at 4:29 PM, Alvaro Herrera
>>  wrote:
>>> Excerpts from Josh Berkus's message of mar jun 26 15:49:59 -0400 2012:
 Robert, all:

 Last I checked, we had a reasonably acceptable patch to use mostly Posix
 Shared mem with a very small sysv ram partition.  Is there anything
 keeping this from going into 9.3?  It would eliminate a major
 configuration headache for our users.
>>>
>>> I don't think that patch was all that reasonable.  It needed work, and
>>> in any case it needs a rebase because it was pretty old.
>>
>> Yep, agreed.
>>
>> I'd like to get this fixed too, but it hasn't made it up to the top of
>> my list of things to worry about.
>
> Was there a post-AgentM version of the patch, which incorporated the
> small SySV RAM partition?  I'm not finding it.

On that, I used to be of the opinion that this is a good compromise (a
small amount of interlock space, plus mostly posix shmem), but I've
heard since then (I think via AgentM indirectly, but I'm not sure)
that there are cases where even the small SysV segment can cause
problems -- notably when other software tweaks shared memory settings
on behalf of a user, but only leaves just-enough for the software
being installed.  This is most likely on platforms that don't have a
high SysV shmem limit by default, so installers all feel the
prerogative to increase the limit, but there's no great answer for how
to compose a series of such installations.  It only takes one
installer that says "whatever, I'm just catenating stuff to
sysctl.conf that works for me" to sabotage Postgres' ability to start.

So there may be a benefit in finding a way to have no SysV memory at
all.  I wouldn't let perfect be the enemy of good to make progress
here, but it appears this was a witnessed real problem, so it may be
worth reconsidering if there is a way we can safely remove all SysV by
finding an alternative to the nattach mechanic.

-- 
fdr



Re: [HACKERS] proof concept - access to session variables on client side

2012-06-26 Thread Tom Lane
Merlin Moncure  writes:
> On Tue, Jun 26, 2012 at 3:05 AM, Magnus Hagander  wrote:
>> But with a small change to psql they could, without the need for a
>> whole new type of variable. For example, psql could set all those
>> variable as "psql.", which could then be accessed
>> from the DO PL code just fine.

> That's a really neat idea.

I do see a problem with this client-push idea, which is what happens if
psql sends a SET and later the active transaction gets rolled back.
psql does not have enough knowledge to be sure whether it lost the SET
or not.  It could hack things by always resending all variables after
any rollback, but ugh.

We could address that by inventing a non-transactional variant of SET,
perhaps.  Not sure it's worth the complication though --- I don't think
I want to have to define how that would interact with other variants
of SET in the same transaction ...

Another approach would be to define such variables as being truly
shared, in the spirit of last-update-wins multi master replication.
The backend sends over its values using the existing GUC_REPORT
mechanism.  So a rollback would cause the psql-side variable to revert
as well.  Not actually sure if that behavior would be more or less
useful than a simpler definition, but it's worth thinking about.

In this connection, there was some recent discussion in the jdbc list
of wanting clients to be able to set the GUC_REPORT flag on any GUC
variable, because the jdbc driver would like to track some settings we
have not seen fit to mark that way.  Not sure if anybody mentioned that
on -hackers yet, but it's coming.  If we had that ability then a
shared-variable behavior like this could be built entirely on the psql
side: the push part is just SET, and the pull part is GUC_REPORT.

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Josh Berkus
On 6/26/12 2:13 PM, Robert Haas wrote:
> On Tue, Jun 26, 2012 at 4:29 PM, Alvaro Herrera
>  wrote:
>> Excerpts from Josh Berkus's message of mar jun 26 15:49:59 -0400 2012:
>>> Robert, all:
>>>
>>> Last I checked, we had a reasonably acceptable patch to use mostly Posix
>>> Shared mem with a very small sysv ram partition.  Is there anything
>>> keeping this from going into 9.3?  It would eliminate a major
>>> configuration headache for our users.
>>
>> I don't think that patch was all that reasonable.  It needed work, and
>> in any case it needs a rebase because it was pretty old.
> 
> Yep, agreed.
> 
> I'd like to get this fixed too, but it hasn't made it up to the top of
> my list of things to worry about.

Was there a post-AgentM version of the patch, which incorporated the
small SySV RAM partition?  I'm not finding it.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com





Re: [HACKERS] proof concept - access to session variables on client side

2012-06-26 Thread Tom Lane
Pavel Stehule  writes:
> it is not a security issue -- I just dislike sending the complete
> stack when only one variable is needed.

That's a pretty darn weak argument.  If I read the patch correctly, what
you're proposing involves a dynamic fetch from the client at runtime,
which is going to be disastrous for performance.  Quite aside from the
network round trip involved, the fetch function would have to be marked
volatile (since it has client-visible side-effects, not to mention that
we don't know when the client might change the variable value); which
would really hurt any query involving it, and probably lead to yet more
round trips.

Pushing over the known values once at session start (and individual
values after updates) is likely to be vastly better-performant than
this.  Matters could be improved further by requiring variables to be
sent to the server to be explicitly marked, which seems like a good
idea anyway in case anybody has security concerns that they're not
going to let you airily dismiss.

Another thing I don't care for is the unannounced protocol extension.
This feature is just not interesting enough to justify breaking
client compatibility, but that's what it would do as proposed.
Clients that haven't heard of this 'v' message would probably
think they'd lost sync and drop the connection.

(BTW, the patch doesn't seem to include the added backend source file?)

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Robert Haas
On Tue, Jun 26, 2012 at 4:29 PM, Alvaro Herrera
 wrote:
> Excerpts from Josh Berkus's message of mar jun 26 15:49:59 -0400 2012:
>> Robert, all:
>>
>> Last I checked, we had a reasonably acceptable patch to use mostly Posix
>> Shared mem with a very small sysv ram partition.  Is there anything
>> keeping this from going into 9.3?  It would eliminate a major
>> configuration headache for our users.
>
> I don't think that patch was all that reasonable.  It needed work, and
> in any case it needs a rebase because it was pretty old.

Yep, agreed.

I'd like to get this fixed too, but it hasn't made it up to the top of
my list of things to worry about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] tuplesort memory usage: grow_memtuples

2012-06-26 Thread Robert Haas
On Sat, Mar 3, 2012 at 3:22 PM, Jeff Janes  wrote:
> When sorting small tuples, the memtuples array can use a substantial
> fraction of the total per-tuple memory used. (In the case of pass by
> value, it is all of it)
>
> The way it grows leads to sub-optimal memory usage.

Greg, I see you signed up to review this on the 14th, but I don't see
a review yet.  Are you planning to post one soon?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-06-26 Thread Robert Haas
On Wed, May 2, 2012 at 9:01 PM, Josh Berkus  wrote:
> On 5/2/12 10:20 AM, Jameison Martin wrote:
>> Attached are the following as per various requests:
>>       * test_results.txt: the performance benchmarking results,
>>
>>       * TestTrailingNull.java: the performance benchmarking code, with a few 
>> additional scenarios as per various requests
>>
>>       * hardinfo_report.txt: some information about the hardware and OS of 
>> the system on which the benchmarks were run, and
>>
>>       * postgresql.conf: the postgresql.conf used when running benchmarks. 
>> Note that the changes made to the vanilla postgresql.conf can be identified 
>> by looking for the string 'jamie' in the file I attached (there aren't that 
>> many)
>
> Nice, thanks.  I'll try some of my own tests when I get a chance; I have
> a really good use-case for this optimization.

Josh,

The CommitFest application lists you as the reviewer for this patch.
Are you (I hope) planning to review it?

I see you posted up a follow-up email asking Tom what he had in mind.
Personally, I don't think this needs incredibly complicated testing.
I think you should just test a workload involving inserting and/or
updating rows with lots of trailing NULL columns, and then another
workload with a table of similar width that... doesn't.  If we can't
find a regression - or, better, we find a win in one or both cases -
then I think we're done here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.3] Row-Level Security

2012-06-26 Thread Tom Lane
Kohei KaiGai  writes:
> 2012/6/26 Robert Haas :
>> I think you're missing the point.  Everyone who has commented on this
>> issue is in favor of having some check that causes the RLS predicate
>> *not to get added in the first place*.

> Here is a simple idea to avoid the second problematic scenario:
> assign a cost of 0 to has_superuser_privilege().

I am not sure which part of "this isn't safe" isn't getting through to
you.  Aside from the scenarios Robert mentioned, consider the
possibility that f_malicious() is marked immutable, so that the planner
is likely to call it (to replace the call with its value) before it will
ever think about whether has_superuser_privilege should be called first.

Please just do what everybody is asking for, and create a bypass that
does not require fragile, easily-broken-by-future-changes assumptions
about what the planner will do with a WHERE clause.

regards, tom lane



Re: [HACKERS] pg_terminate_backend for same-role

2012-06-26 Thread Robert Haas
On Thu, Mar 29, 2012 at 3:04 AM, Daniel Farina  wrote:
> On Mon, Mar 26, 2012 at 12:14 AM, Jeff Davis  wrote:
>> On Tue, 2012-03-20 at 01:38 -0700, Daniel Farina wrote:
>>> On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao  wrote:
>>> > On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina  wrote:
>>> >> Parallel to pg_cancel_backend, it'd be nice to allow the user to just
>>> >> outright kill a backend that they own (politely, with a SIGTERM),
>>> >> aborting any transactions in progress, including the idle transaction,
>>> >> and closing the socket.
>>> >
>>> > +1
>>>
>>> Here's a patch implementing the simple version, with no more guards
>>> against signal racing than have been seen previously.  The more
>>> elaborate variants to close those races is being discussed in a
>>> parallel thread, but I thought I'd get this simple version out there.
>>
>> Review:
>>
>> After reading through the threads, it looks like there was no real
>> objection to this approach -- pid recycling is not something we're
>> concerned about.
>>
>> I think you're missing a doc update though, in func.sgml:
>>
>> "For the less restrictive pg_cancel_backend, the role of an
>> active backend can be found from
>> the usename column of the
>> pg_stat_activity view."
>>
>> Also, high-availability.sgml makes reference to pg_cancel_backend(), and
>> it might be worthwhile to add an "...and pg_terminate_backend()" there.
>>
>> Other than that, it looks good to me.
>
> Good comments. Patch attached to address the doc issues.  The only
> iffy thing is that the paragraph "For the less restrictive..." I have
> opted to remove in its entirety.  I think the documents are already
> pretty clear about the same-user rule, and I'm not sure if this is the
> right place for a crash-course on attributes in pg_stat_activity (but
> maybe it is).
>
> "...and pg_terminate_backend" seems exactly right.
>
> And I think now that the system post-patch doesn't have such a strange
> contrast between the ability to send SIGINT vs. SIGTERM, such a
> contrast may not be necessary.
>
> I'm also not sure what the policy is about filling paragraphs in the
> manual.  I filled one, which increases the sgml churn a bit. git
> (show|diff) --word-diff helps clean it up.

I went ahead and committed this.

I kinda think we should back-patch this into 9.2.  It doesn't involve
a catalog change, and would make the behavior consistent between the
two releases, instead of changing in 9.1 and then changing again in
9.2.  Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.3] Row-Level Security

2012-06-26 Thread Kohei KaiGai
2012/6/26 Robert Haas :
>> Of course, there are some limitations, to keep the patch at a size
>> reasonable to review.
>> - The permission to bypass RLS policy was under discussion.
>>  If and when we should provide a special permission to bypass RLS
>>  policy, the "OR has_superuser_privilege()" shall be replaced by
>>  "OR has_table_privilege(tableoid, 'RLSBYPASS')".
>
> I think you're missing the point.  Everyone who has commented on this
> issue is in favor of having some check that causes the RLS predicate
> *not to get added in the first place*.  Adding a modified predicate is
> not the same thing.  First, the query planner might not be smart
> enough to optimize away the clause even when the predicate holds,
> causing an unnecessary performance drain.  Second, there's too much
> danger of being able to set a booby-trap for the superuser this way.
> Suppose that the RLS subsystem replaces f_malicious() by f_malicious
> OR has_superuser_privilege().  Now the superuser comes along with the
> nightly pg_dump run and the query optimizer executes SELECT * FROM
> nuts WHERE f_malicious() OR has_superuser_privilege().  The query
> optimizer notes that the cost of f_malicious() is very low and decides
> to execute that before has_superuser_privilege().  Oops.  I think it's
> just not acceptable to handle this by clause-munging: we need to not
> add the clause in the first place.
>
Here is a simple idea to avoid the second problematic scenario: assign
a cost of 0 to has_superuser_privilege().  That keeps this function
cheaper than any possible malicious function, since CREATE FUNCTION
enforces a positive cost value.

But the first point still remains.

As you pointed out before, one solution might be to handle superusers
and other users as separate cases in the simple query protocol, which
uses the same snapshot for the planner and executor stages.

How should we handle this issue?

During the previous discussion, Tom mentioned an idea of caching
prepared statements hashed by user ID, so that the query plan can be
switched depending on the user's privileges.
Even though I hesitated to buy this idea at the time, it might be
worth investigating as a way to satisfy both security and performance:
it would generate multiple query plans, one of which is chosen later
at executor stage.

What is your opinion?

> Comments on the patch itself:
>
> 1. Please do not abbreviate rowlevel to rowlv or RowLevel to RowLv or
> ROWLEVEL to ROWLV.  That makes it harder to read and harder to grep.
> Spell it out.
>
OK,

> 2. Since the entirety of ATExecSetRowLvSecurity is conditional on
> whether clause != NULL, you might as well split it into two functions,
> one for each case.
>
OK,

> 3. The fact that you've had to hack preprocess_targetlist and
> adjust_appendrel_attrs_mutator suggests to me that the insertion of
> the generate subquery is happening at the wrong phase of the process.
> We don't need those special cases for views, so it seems like we
> shouldn't need them here, either.
>
The main reason I had to patch them is special-case handling for
references to system columns, which are not available in subqueries.
But I'm not 100% sure about this implementation, so it needs more
investigation.

Thanks,
-- 
KaiGai Kohei 



Re: [HACKERS] Posix Shared Mem patch

2012-06-26 Thread Alvaro Herrera

Excerpts from Josh Berkus's message of mar jun 26 15:49:59 -0400 2012:
> Robert, all:
> 
> Last I checked, we had a reasonably acceptable patch to use mostly Posix
> Shared mem with a very small sysv ram partition.  Is there anything
> keeping this from going into 9.3?  It would eliminate a major
> configuration headache for our users.

I don't think that patch was all that reasonable.  It needed work, and
in any case it needs a rebase because it was pretty old.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-26 Thread Robert Haas
On Fri, Jun 22, 2012 at 10:12 AM, Robert Haas  wrote:
> On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita
>  wrote:
>>> I'm confused by this remark, because surely the query planner does it this
>>> way only if there's no LIMIT.  When there is a LIMIT, we choose based on
>>> the startup cost plus the estimated fraction of the total cost we expect
>>> to pay based on dividing the LIMIT by the overall row count estimate.  Or
>>> is this not what you're talking about?
>>
>> I think that Ants is pointing at the way costs are estimated in
>> choose_hashed_grouping()/choose_hashed_distinct(), i.e., cost_agg() for
>> cheapest_path + hashagg, where the costs are calculated based only on the
>> total cost of cheapest_path.  I think it might be good to do cost_agg()
>> for the discussed case with the AGG_SORTED strategy, not the AGG_HASHED
>> strategy.
>
> Well, Ants already made some adjustments to those functions; not sure
> if this means they need some more adjustment, but I don't see that
> there's a general problem with the costing algorithm around LIMIT.
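The LIMIT discounting Robert describes above ("startup cost plus the
estimated fraction of the total cost we expect to pay") can be sketched as
a stand-alone function. This is purely illustrative, not the planner's
actual code; all names here are hypothetical.

```c
#include <assert.h>

/*
 * Illustrative sketch of LIMIT cost discounting: pay the full startup
 * cost, plus only the fraction of the run cost corresponding to the rows
 * the LIMIT will actually fetch.
 */
static double
limit_adjusted_cost(double startup_cost, double total_cost,
                    double limit_rows, double est_rows)
{
    double      fraction;

    /* If the LIMIT doesn't truncate the result, the full cost applies. */
    if (est_rows <= 0.0 || limit_rows >= est_rows)
        return total_cost;

    fraction = limit_rows / est_rows;
    return startup_cost + fraction * (total_cost - startup_cost);
}
```

Under this scheme a path with high total cost but low startup cost can win
once the LIMIT fraction is small, which is the behavior being discussed.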

Ants, do you intend to update this patch for this CommitFest?  Or at
all?  It seems nobody's too excited about this, so I'm not sure
whether it makes sense for you to put more work on it.  But please
advise as to your plans.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] measuring spinning

2012-06-26 Thread Robert Haas
On Mon, Jun 18, 2012 at 9:36 PM, Robert Haas  wrote:
> On Sat, Jun 16, 2012 at 6:25 PM, Jeff Janes  wrote:
>>> Well, this fell through the cracks, because I forgot to add it to the
>>> January CommitFest.  Here it is again, rebased.
>>
>> This applies and builds cleanly and passes make check (under enable-cassert).
>>
>> No tests or docs are needed for a patch of this nature.
>>
>> It does what it says, and we want it.
>>
>> I wondered if the change in the return signature of s_lock would have
>> an effect on performance.  So I've run a series of pgbench -T 30 -P
>> -c8 -j8, at a scale of 30 which fits in shared_buffers, using an
>> Amazon c1.xlarge
>> (8 cores).  I ran both HEAD, and HEAD+patch (without LWLOCK_STATS in
>> both cases), in random ordering.  The patch was 0.37% slower, average
>> 298483 selects per second patched to 299582 HEAD.  The difference is
>> probably real (p value 0.042, one sided.) but is also pretty much
>> negligible and could just be due to where the executable code falls in
>> the cache lines which could move around with other changes to the
>> code.
>
> I found this a bit surprising, so I retested on the IBM POWER7 box at
> concurrencies from 1 to 64 and found some test runs faster and some
> test runs slower - maybe a few more faster than slower.   I could do a
> more exact analysis, but I'm inclined to think that if there is an
> effect here, it's probably just in the noise, and that the real effect
> you measured was, as you say, the result of cache-line shifting or
> some other uninteresting artifact that could just as easily move back
> the other way on some other commit.  Really, s_lock should not be
> getting called often enough to matter much, and ignoring the return
> value of that function shouldn't cost anyone much.
>
>> Two suggestions:
>>
>> In your original email you say "number of pg_usleep() calls that are
>> required to acquire each LWLock", but nothing in the code says this.
>> Just reading lwlock.c I would naively assume it is reporting the
>> number of TAS spins, not the number of spin-delays (and in fact that
>> is what I did assume until I read your email more carefully).  A
>> comment somewhere in lwlock.c would be helpful.
>
> Yeah, or maybe I should just change the printout to say spindelay
> instead of spin, and rename the variables likewise.
>
>> Also in lwlock.c,
>>
>>        if (sh_acquire_counts[i] || ex_acquire_counts[i] ||
>> block_counts[i] || spin_counts[i])
>>
>> I don't think we can have spins (or blocks, for that matter) unless we
>> have some acquires to have caused them, so the last two tests in that
>> line seem to be noise.
>
> Perhaps so, but I think it's probably safe and clear to just follow
> the existing code pattern.
>
>> Since my suggestions are minor, should I go ahead and mark this ready
>> for committer?
>
> If you think it should be committed, yes.

So Jeff did that, and I've now committed the patch.

Thanks for the review.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Posix Shared Mem patch

2012-06-26 Thread Josh Berkus
Robert, all:

Last I checked, we had a reasonably acceptable patch to use mostly Posix
Shared mem with a very small sysv ram partition.  Is there anything
keeping this from going into 9.3?  It would eliminate a major
configuration headache for our users.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] proof concept - access to session variables on client side

2012-06-26 Thread Pavel Stehule
2012/6/26 Merlin Moncure :
> On Tue, Jun 26, 2012 at 3:05 AM, Magnus Hagander  wrote:
>> But with a small change to psql they could, without the need for a
>> whole new type of variable. For example, psql could set all those
>> variable as "psql.", which could then be accessed
>> from the DO PL code just fine.
>
> That's a really neat idea.

yes, it can be a good idea - psql sends some status variables on start,
so it should be a small patch

Pavel

>
> merlin



Re: [HACKERS] proof concept - access to session variables on client side

2012-06-26 Thread Merlin Moncure
On Tue, Jun 26, 2012 at 3:05 AM, Magnus Hagander  wrote:
> But with a small change to psql they could, without the need for a
> whole new type of variable. For example, psql could set all those
> variable as "psql.", which could then be accessed
> from the DO PL code just fine.

That's a really neat idea.

merlin



Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Nils Goroll

> But if you start with "let's not support any platforms that don't have this 
> feature"

This will never be my intention.

Nils



Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Nils Goroll
Hi Merlin,

> _POSIX_THREAD_PROCESS_SHARED

sure.

> Also, it's forbidden to do things like invoke i/o in the backend while
> holding only a spinlock. As to your larger point, it's an interesting
> assertion -- some data to back it up would help.

Let's see if I can get any. ATM I've only got indications, but no proof.

Nils



Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Tom Lane
Nils Goroll  writes:
> Now that the scene is set, here's the simple question: Why all this? Why not
> simply use posix mutexes which, on modern platforms, will map to efficient
> implementations like adaptive mutexes or futexes?

(1) They do not exist everywhere.
(2) There is absolutely no evidence to suggest that they'd make things better.

If someone cared to rectify (2), we could consider how to use them as an
alternative implementation.  But if you start with "let's not support
any platforms that don't have this feature", you're going to get a cold
reception.

regards, tom lane



Re: [HACKERS] proof concept - access to session variables on client side

2012-06-26 Thread Pavel Stehule
2012/6/26 David Fetter :
> On Tue, Jun 26, 2012 at 10:12:52AM +0200, Pavel Stehule wrote:
>> 2012/6/26 Magnus Hagander :
>> > On Tue, Jun 26, 2012 at 9:50 AM, Pavel Stehule  
>> > wrote:
>> >> 2012/6/26 Magnus Hagander :
>> >>> On Tue, Jun 26, 2012 at 7:06 AM, Pavel Stehule  
>> >>> wrote:
>>  Hello
>> 
>>  I worked on a simple patch that enables access from the server side to
>>  client-side data. It adds two new hooks to libpq - one for returning the
>>  local context, the second for setting the local context.
>> 
>>  The motivation is the integration of the possibilities of the psql console
>>  with a stronger language - plpgsql. A second target is enabling the
>>  possibility to save the result of some server-side process in psql. It
>>  improves the vars feature in psql.
>> 
>>  pavel ~/src/postgresql/src $ cat test.sql
>>  \echo value of external paremeter is :"myvar"
>> 
>>  do $$
>>  begin
>>   -- we can take any session variable on client side
>>   -- it is safe against to SQL injection
>>   raise notice 'external parameter accessed from plpgsql is "%"',
>>  hgetvar('myvar');
>> 
>>   -- we can change this session variable and finish transaction
>>   perform hsetvar('myvar', 'Hello, World');
>>  end;
>>  $$ language plpgsql;
>> 
>>  \echo new value of session variable is :"myvar"
>> 
>>  cat test.sql | psql postgres -v myvar=Hello
>>  value of external paremeter is "Hello"
>>  NOTICE:  external parameter accessed from plpgsql is "Hello"
>>  DO
>>  new value of session variable is "Hello, World"
>> 
>>  This is just proof concept - there should be better integration with
>>  pl languages, using cache for read on server side, ...
>> 
>>  Notices?
>> >>>
>> >>> Why not just use a custom GUC variable instead? E.g. you could have
>> >>> psql SET "psql.myvar='Hello, World'", and then you'd need no changes
>> >>> at all in the backend? Maybe have a "shorthand interface" for
>> >>> accessing GUCs in psql would help in making it easier, but do we
>> >>> really need a whole new variable concept?
>> >>
>> >> GUC variables doesn't help with access to psql's command line
>> >> parameters from DO PL code.
>> >
>> > But with a small change to psql they could, without the need for a
>> > whole new type of variable. For example, psql could set all those
>> > variable as "psql.", which could then be accessed
>> > from the DO PL code just fine.
>>
>> yes, it is a possibility too.  It has different issues - it can send
>> unwanted variables -
>
> Could you expand on this just a bit?  Are you picturing something an
> attacker could somehow use, or...?

it is not a security issue - I just dislike sending the complete stack
when only one variable should be used.

Regards

Pavel

>
> Cheers,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter      XMPP: david.fet...@gmail.com
> iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>



Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Merlin Moncure
On Tue, Jun 26, 2012 at 12:02 PM, Nils Goroll  wrote:
> Hi,
>
> I am currently trying to understand what looks like really bad scalability of
> 9.1.3 on a 64core 512GB RAM system: the system runs OK when at 30% usr, but 
> only
> marginal amounts of additional load seem to push it to 70% and the application
> becomes highly unresponsive.
>
> My current understanding basically matches the issues being addressed by 
> various
> 9.2 improvements, well summarized in
> http://wiki.postgresql.org/images/e/e8/FOSDEM2012-Multi-CPU-performance-in-9.2.pdf
>
> An additional aspect is that, in order to address the latent risk of data 
> loss &
> corruption with WBCs and async replication, we have deliberately moved the db
> from a similar system with WB cached storage to ssd based storage without a 
> WBC,
> which, by design, has (in the best WBC case) approx. 100x higher latencies, 
> but
> much higher sustained throughput.
>
>
> On the new system, even with 30% user "acceptable" load, oprofile makes 
> apparent
> significant lock contention:
>
> opreport --symbols --merge tgid -l /mnt/db1/hdd/pgsql-9.1/bin/postgres
>
>
> Profiling through timer interrupt
> samples  %        image name               symbol name
> 30240    27.9720  postgres                 s_lock
> 5069      4.6888  postgres                 GetSnapshotData
> 3743      3.4623  postgres                 AllocSetAlloc
> 3167      2.9295  libc-2.12.so             strcoll_l
> 2662      2.4624  postgres                 SearchCatCache
> 2495      2.3079  postgres                 hash_search_with_hash_value
> 2143      1.9823  postgres                 nocachegetattr
> 1860      1.7205  postgres                 LWLockAcquire
> 1642      1.5189  postgres                 base_yyparse
> 1604      1.4837  libc-2.12.so             __strcmp_sse42
> 1543      1.4273  libc-2.12.so             __strlen_sse42
> 1156      1.0693  libc-2.12.so             memcpy
>
> Unfortunately I don't have profiling data for the high-load / contention
> condition yet, but I fear the picture will be worse and pointing in the same
> direction.
>
> 
> In particular, the _impression_ is that lock contention could also be related 
> to
> I/O latencies making me fear that cases could exist where spin locks are being
> held while blocking on I/O.
> 
>
>
> Looking at the code, it appears to me that the roll-your-own s_lock code 
> cannot
> handle a couple of cases, for instance it will also spin when the lock holder 
> is
> not running at all or blocking on IO (which could even be implicit, e.g. for a
> page flush). These issues have long been addressed by adaptive mutexes and 
> futexes.
>
> Also, the s_lock code tries to be somehow adaptive using spins_per_delay (when
> having spun for long (but not blocked), spin even longer in the future), which
> appears to me to have the potential of becoming highly counter-productive.
>
>
> Now that the scene is set, here's the simple question: Why all this? Why not
> simply use posix mutexes which, on modern platforms, will map to efficient
> implementations like adaptive mutexes or futexes?

Well, that would introduce a backend dependency on pthreads, which is
unpleasant.  Also you'd need to feature test via
_POSIX_THREAD_PROCESS_SHARED to make sure you can mutex between
processes (and configure your mutexes as such when you do).  There are
probably other reasons why this can't be done, but I personally don't
know of any.

Also, it's forbidden to do things like invoke i/o in the backend while
holding only a spinlock. As to your larger point, it's an interesting
assertion -- some data to back it up would help.
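To make the _POSIX_THREAD_PROCESS_SHARED point concrete, here is a minimal
sketch of initializing a cross-process mutex, assuming a platform where the
option is supported. In a real server the mutex would have to live in shared
memory; the static variable below only stands in for that, and nothing here
is PostgreSQL code.

```c
#include <pthread.h>
#include <unistd.h>

/* Stand-in for a mutex placed in shared memory. */
static pthread_mutex_t shared_mtx;

static int
init_shared_mutex(void)
{
#if defined(_POSIX_THREAD_PROCESS_SHARED) && _POSIX_THREAD_PROCESS_SHARED > 0
    pthread_mutexattr_t attr;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    /* Mark the mutex shareable between processes, not just threads. */
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0)
        return -1;
    if (pthread_mutex_init(&shared_mtx, &attr) != 0)
        return -1;
    pthread_mutexattr_destroy(&attr);
    return 0;
#else
    return -1;                  /* platform lacks the option */
#endif
}
```

The compile-time feature test is exactly the dependency Merlin is pointing
at: platforms without the option would need a fallback path.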

merlin



Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-26 Thread David Fetter
On Mon, Jun 25, 2012 at 01:50:54PM -0500, Kevin Grittner wrote:
> Andres Freund  wrote:
>  
> > I am not sure were going to get all that into 9.3.
>  
> Sure, that was more related to why I was questioning how much these
> use cases even *could* integrate -- whether it even paid to
> *consider* these use cases at this point.  Responses from Robert and
> you have pretty much convinced me that I was being overly
> pessimistic on that.
>  
> One fine point regarding before and after images -- if a value
> doesn't change in an UPDATE, there's no reason to include it in both
> the BEFORE and AFTER tuple images, as long as we have the null
> column bitmaps -- or some other way of distinguishing unchanged from
> NULL.  (This could be especially important when the unchanged column
> was a 50 MB bytea.)

How about two bitmaps: one telling which columns are actually there,
the other with NULLs?
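A hedged sketch of how such a two-bitmap header could be encoded for tables
of up to 64 columns: "present" says whether a column appears in the tuple
image at all (unchanged columns are simply omitted), and "isnull" says
whether a present column is NULL. The names and layout are purely
illustrative, not a proposed wire format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is column `col` included in the tuple image at all? */
static bool
col_is_present(uint64_t present, int col)
{
    return (present >> col) & 1;
}

/* NULL-ness is only meaningful for columns that are in the image. */
static bool
col_is_null(uint64_t present, uint64_t isnull, int col)
{
    return col_is_present(present, col) && ((isnull >> col) & 1);
}
```

With this encoding an unchanged 50 MB bytea costs one zero bit in "present"
instead of a full copy in both images.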

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] proof concept - access to session variables on client side

2012-06-26 Thread David Fetter
On Tue, Jun 26, 2012 at 10:12:52AM +0200, Pavel Stehule wrote:
> 2012/6/26 Magnus Hagander :
> > On Tue, Jun 26, 2012 at 9:50 AM, Pavel Stehule  
> > wrote:
> >> 2012/6/26 Magnus Hagander :
> >>> On Tue, Jun 26, 2012 at 7:06 AM, Pavel Stehule  
> >>> wrote:
>  Hello
> 
>  I worked on a simple patch that enables access from the server side to
>  client-side data. It adds two new hooks to libpq - one for returning the
>  local context, the second for setting the local context.
> 
>  The motivation is the integration of the possibilities of the psql console
>  with a stronger language - plpgsql. A second target is enabling the
>  possibility to save the result of some server-side process in psql. It
>  improves the vars feature in psql.
> 
>  pavel ~/src/postgresql/src $ cat test.sql
>  \echo value of external paremeter is :"myvar"
> 
>  do $$
>  begin
>   -- we can take any session variable on client side
>   -- it is safe against to SQL injection
>   raise notice 'external parameter accessed from plpgsql is "%"',
>  hgetvar('myvar');
> 
>   -- we can change this session variable and finish transaction
>   perform hsetvar('myvar', 'Hello, World');
>  end;
>  $$ language plpgsql;
> 
>  \echo new value of session variable is :"myvar"
> 
>  cat test.sql | psql postgres -v myvar=Hello
>  value of external paremeter is "Hello"
>  NOTICE:  external parameter accessed from plpgsql is "Hello"
>  DO
>  new value of session variable is "Hello, World"
> 
>  This is just proof concept - there should be better integration with
>  pl languages, using cache for read on server side, ...
> 
>  Notices?
> >>>
> >>> Why not just use a custom GUC variable instead? E.g. you could have
> >>> psql SET "psql.myvar='Hello, World'", and then you'd need no changes
> >>> at all in the backend? Maybe have a "shorthand interface" for
> >>> accessing GUCs in psql would help in making it easier, but do we
> >>> really need a whole new variable concept?
> >>
> >> GUC variables doesn't help with access to psql's command line
> >> parameters from DO PL code.
> >
> > But with a small change to psql they could, without the need for a
> > whole new type of variable. For example, psql could set all those
> > variable as "psql.", which could then be accessed
> > from the DO PL code just fine.
> 
> yes, it is a possibility too.  It has different issues - it can send
> unwanted variables -

Could you expand on this just a bit?  Are you picturing something an
attacker could somehow use, or...?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] GiST subsplit question

2012-06-26 Thread Jeff Davis
On Tue, 2012-06-26 at 11:28 -0400, Robert Haas wrote:
> On Wed, May 30, 2012 at 4:35 PM, Alexander Korotkov
>  wrote:
> >> So, do we demote that message to a DEBUG1? Or do we make it more clear
> >> what the authors of a specific picksplit are supposed to do to avoid
> >> that problem? Or am I misunderstanding something?
> >
> >
> > +1 for demoting the message to DEBUG1. I think it shouldn't be so noisy,
> > it just indicates something could be improved.
> > Also I think we definitely need to document secondary split. Now there's
> > no chance to understand it without reverse engineering it from the code.
> 
> I'm happy to go demote the message if we have consensus on that, but
> somebody else is going to need to provide the doc patch.  Any takers?

I was planning to do that, but it won't be for a few days at least. If
someone else wants to do it sooner, feel free.

Regards,
Jeff Davis




Re: [HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-26 Thread Josh Berkus

> To implement it, a new array can be added in the local process memory
> to hold lwlock statistics, and update counters both in the shared
> memory and the local process memory at once. Then, the session can
> retrieve 'per-session' statistics from the local process memory
> via some dedicated function.

That would be way cool from a diagnostics perspective.  Not sure what
impact it has, though.
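The quoted proposal can be sketched in a few lines: one counter array in
shared memory (the cluster-wide view) and a second one in backend-local
memory, both bumped on every event. All names below are illustrative; this
is not the pg_stat_lwlocks implementation.

```c
#include <stdint.h>

#define NUM_LWLOCKS 128

static uint64_t shared_wait_counts[NUM_LWLOCKS];    /* would live in shmem */
static uint64_t local_wait_counts[NUM_LWLOCKS];     /* per-process memory  */

/* Record one wait event; returns the session-local count for convenience. */
static uint64_t
count_lwlock_wait(int lockid)
{
    shared_wait_counts[lockid]++;
    return ++local_wait_counts[lockid];
}

/* A dedicated function would expose the local counters to the session. */
static uint64_t
get_local_wait_count(int lockid)
{
    return local_wait_counts[lockid];
}
```

The extra cost per event is one increment to process-local memory, which is
why the impact question comes down to cache effects rather than locking.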

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com





Re: [HACKERS] PATCH: Improve DROP FUNCTION hint

2012-06-26 Thread Robert Haas
On Mon, Jun 11, 2012 at 11:12 AM, Robert Haas  wrote:
> On Sat, Jun 9, 2012 at 11:42 AM, Dean Rasheed  
> wrote:
>> Hi,
>>
>> Attached is a small patch to improve the HINT message produced by
>> CREATE OR REPLACE FUNCTION when the new function definition conflicts
>> with the old definition. With this patch the hint now includes the
>> function's name and signature as a directly pasteable SQL command. So,
>> for example, instead of
>>
>> psql:functions.sql:70: ERROR:  cannot change return type of existing function
>> HINT:  Use DROP FUNCTION first.
>>
>> it now says
>>
>> psql:functions.sql:70: ERROR:  cannot change return type of existing function
>> HINT:  Use DROP FUNCTION foo(integer,integer) first.
>>
>> which saves having to open the file, find the function and then type
>> in the DROP statement manually.
>
> +1.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Nils Goroll
Hi,

I am currently trying to understand what looks like really bad scalability of
9.1.3 on a 64core 512GB RAM system: the system runs OK when at 30% usr, but only
marginal amounts of additional load seem to push it to 70% and the application
becomes highly unresponsive.

My current understanding basically matches the issues being addressed by various
9.2 improvements, well summarized in
http://wiki.postgresql.org/images/e/e8/FOSDEM2012-Multi-CPU-performance-in-9.2.pdf

An additional aspect is that, in order to address the latent risk of data loss &
corruption with WBCs and async replication, we have deliberately moved the db
from a similar system with WB cached storage to ssd based storage without a WBC,
which, by design, has (in the best WBC case) approx. 100x higher latencies, but
much higher sustained throughput.


On the new system, even with 30% user "acceptable" load, oprofile makes apparent
significant lock contention:

opreport --symbols --merge tgid -l /mnt/db1/hdd/pgsql-9.1/bin/postgres


Profiling through timer interrupt
samples  %        image name               symbol name
30240    27.9720  postgres                 s_lock
5069      4.6888  postgres                 GetSnapshotData
3743      3.4623  postgres                 AllocSetAlloc
3167      2.9295  libc-2.12.so             strcoll_l
2662      2.4624  postgres                 SearchCatCache
2495      2.3079  postgres                 hash_search_with_hash_value
2143      1.9823  postgres                 nocachegetattr
1860      1.7205  postgres                 LWLockAcquire
1642      1.5189  postgres                 base_yyparse
1604      1.4837  libc-2.12.so             __strcmp_sse42
1543      1.4273  libc-2.12.so             __strlen_sse42
1156      1.0693  libc-2.12.so             memcpy

Unfortunately I don't have profiling data for the high-load / contention
condition yet, but I fear the picture will be worse and pointing in the same
direction.


In particular, the _impression_ is that lock contention could also be related to
I/O latencies making me fear that cases could exist where spin locks are being
held while blocking on I/O.



Looking at the code, it appears to me that the roll-your-own s_lock code cannot
handle a couple of cases, for instance it will also spin when the lock holder is
not running at all or blocking on IO (which could even be implicit, e.g. for a
page flush). These issues have long been addressed by adaptive mutexes and 
futexes.

Also, the s_lock code tries to be somehow adaptive using spins_per_delay (when
having spun for long (but not blocked), spin even longer in the future), which
appears to me to have the potential of becoming highly counter-productive.
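The adaptation being criticized has roughly the following shape (constants
and names here are made up for illustration; consult
src/backend/storage/lmgr/s_lock.c for the real algorithm): if a lock was
obtained without ever sleeping, spinning looks cheap, so the spin budget
grows; if a sleep was needed, the budget shrinks.

```c
#define MIN_SPINS_PER_DELAY 10
#define MAX_SPINS_PER_DELAY 1000

/* Adjust the spin budget after a lock acquisition. */
static int
update_spins_per_delay(int cur, int had_to_sleep)
{
    if (!had_to_sleep)
        return (cur + 100 < MAX_SPINS_PER_DELAY) ? cur + 100
                                                 : MAX_SPINS_PER_DELAY;
    return (cur - 1 > MIN_SPINS_PER_DELAY) ? cur - 1 : MIN_SPINS_PER_DELAY;
}
```

The concern above is that when the holder is descheduled or blocked on I/O,
a previously inflated budget makes every waiter spin longer before backing
off, which is exactly the wrong direction.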


Now that the scene is set, here's the simple question: Why all this? Why not
simply use posix mutexes which, on modern platforms, will map to efficient
implementations like adaptive mutexes or futexes?

Thanks, Nils



Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-26 Thread Alvaro Herrera

Excerpts from Boszormenyi Zoltan's message of mar jun 26 12:43:34 -0400 2012:

> So, should I keep the enum TimeoutName? Are global variables for
> keeping dynamically assigned values preferred over the enum?
> Currently we have 5 timeout sources in total, 3 of them are used by
> regular backends, the remaining 2 are used by replication standby.
> We can have a fixed size array (say with 8 or 16 elements) for future use
> and this would be plenty.
> 
> Opinions?

My opinion is that the fixed size array is fine.

I'll go set the patch "waiting on author".  Also, remember to review
some other people's patches.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-26 Thread Boszormenyi Zoltan

On 2012-06-26 18:12, Alvaro Herrera wrote:

Excerpts from Boszormenyi Zoltan's message of mar jun 26 03:59:06 -0400 2012:

On 2012-06-26 06:59, Alvaro Herrera wrote:

I cleaned up the framework patch a bit.  My version's attached.  Mainly,
returning false for failure in some code paths that are only going to
have the caller elog(FATAL) is rather pointless -- it seems much better
to just have the code itself do the elog(FATAL).  In any case, I
searched for reports of those error messages being reported in the wild
and saw none.

OK. The cleanups are always good.

One nitpick, though. Your version doesn't contain timeout.h,
and compilation fails because of it. :-) You might have forgotten
to do "git commit -a".

Oops.  Attached. It's pretty much the same you had, except some "bools"
are changed to void.


There is one thing that still bothers me on this whole framework patch,
but I'm not sure it's easily fixable.  Basically the API to timeout.c is
the whole list of timeouts and their whole behaviors.  If you want to
add a new one, you can't just call into the entry points, you have to
modify timeout.c itself (as well as timeout.h as well as whatever code
it is that you want to add timeouts to).  This may be good enough, but I
don't like it.  I think it boils down to proctimeout.h is cheating.

The interface I would actually like to have is something that lets the
modules trying to get a timeout register the timeout-checking function
as a callback.  API-wise, it would be much cleaner.  However, I'm not
really sure that code-wise it would be any cleaner at all.  In fact I
think it'd be rather cumbersome.  However, if you could give that idea
some thought, it'd be swell.

Well, I can make the registration interface similar to how LWLocks
are treated, but that doesn't avoid modification of the base_timeouts
array in case a new internal use case arises. Say:

#define USER_TIMEOUTS 4

int n_timeouts = TIMEOUT_MAX;
static timeout_params base_timeouts[TIMEOUT_MAX + USER_TIMEOUTS];

and register_timeout() adds data after TIMEOUT_MAX.

Well, I don't expect that we're going to have many external uses.  My
point about registration is so that internal callers use it as well.  I
was thinking we could do something like xact.c does, which is to have
somewhere in proc.c a call like this:

TimeoutRegister(DEADLOCK_TIMEOUT, 1, true, CheckDeadLock, GetCurrentTimestamp);

at process startup (the magic value 1 is the priority.  Maybe there's a
better way to handle this).  That way, timeout.c or timeout.h do not
need to know anything about proc.c at all; we don't need proctimeout.h
at all.  The only thing it (the timeout module) needs to know is that
there is a symbolic constant named DEADLOCK_TIMEOUT.  Similarly for
statement timeout, etc.

When you call enable_timeout you first have to ensure that
DEADLOCK_TIMEOUT has been registered; and if it's not, die on an
Assert().  That way you ensure that there are no bugs where you try to
enable a timeout that hasn't been registered.
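The registration scheme described above can be sketched as follows. The
enum, the simplified signature (the actual proposal also carries an extra
flag and a timestamp callback), and the int return values are all
illustrative, not the committed API.

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
    DEADLOCK_TIMEOUT,
    STATEMENT_TIMEOUT,
    MAX_TIMEOUTS
} TimeoutName;

typedef void (*timeout_handler) (void);

typedef struct
{
    int             priority;
    timeout_handler handler;    /* NULL means "not registered" */
} timeout_params;

static timeout_params timeouts[MAX_TIMEOUTS];

static void
CheckDeadLock(void)
{
    /* would examine the wait-for graph here */
}

/* Called once at process startup, e.g. from proc.c. */
static int
TimeoutRegister(TimeoutName name, int priority, timeout_handler handler)
{
    assert(timeouts[name].handler == NULL);     /* no double registration */
    timeouts[name].priority = priority;
    timeouts[name].handler = handler;
    return 0;
}

/* Enabling an unregistered timeout trips the Assert, as suggested above. */
static int
enable_timeout(TimeoutName name)
{
    assert(timeouts[name].handler != NULL);
    /* ... arm the SIGALRM interval timer here ... */
    return 0;
}
```

With this shape the timeout module needs only the symbolic constants, and
proc.c keeps its handler private.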


Currently, TimeoutName (as an index value) and "priority" are the same
semantically.

I would also add an Assert into register_timeout() to avoid double
registration of the same timeout, to prevent overriding the callback
function pointer. If that's in place, the TimeoutName value may still
work as is: an index into base_timeouts[].


(If we later find the need to let external modules add timeouts, which I
find unlikely, we can have TimeoutRegister return TimeoutName and have
this value be dynamically allocated.  But for now I think that would be
a useless complication.)

The difference with lwlocks is that each lwlock is just a number.


Strictly speaking, just as TimeoutName.


   The
lwlock.c module doesn't need to know anything about how each lwlock is
actually used.  It's not the same thing here -- which is why I think it
would be better to have all modules, even the hardcoded ones, use a
registering interface.


OK.


... Now, it could be argued that it would be even better to have
TimeoutRegister not take the TimeoutName argument, and instead generate
it dynamically and pass it back to the caller -- that way you don't need
the enum in timeout.h.


This was what I had in mind at first ...


   The problem I have with that idea is that it
would force the caller modules to have a global variable to keep track
of that, which seems worse to me.


... and realized this as well.

So, should I keep the enum TimeoutName? Are global variables for
keeping dynamically assigned values preferred over the enum?
Currently we have 5 timeout sources in total, 3 of them are used by
regular backends, the remaining 2 are used by replication standby.
We can have a fixed size array (say with 8 or 16 elements) for future use
and this would be plenty.

Opinions?




I haven't looked at the second patch at all yet.

This is the whole point of the first patch.

I know that.  But I want to get the details of the framework right
before we move on to add more stuff.

Re: [HACKERS] [v9.3] Row-Level Security

2012-06-26 Thread Robert Haas
On Thu, Jun 14, 2012 at 11:43 AM, Kohei KaiGai  wrote:
> In the previous discussion, we planned to add a syntax option to
> clarify the command type to fire the RLS policy, such as FOR UPDATE.
> But my current opinion is that it is not necessary. For example, we can
> reference the contents of rows being updated / deleted using the
> RETURNING clause. So, it does not make sense to have a different
> RLS policy for UPDATE / DELETE than for SELECT.

I agree.  That doesn't make sense, and we don't need to support it.

> On the other hand, I'm not 100% sure about my design to restrict
> the rows to be updated and deleted. Similarly, it expands the target
> relation of an UPDATE or DELETE statement into a sub-query with the
> condition. ExecModifyTable() pulls a tuple from the sub-query
> instead of the regular table, so it seems to be working at least, but
> I didn't try all the possible cases of course.

I don't think there's any reason why that shouldn't work.  DELETE ..
USING already allows ModifyTable to be scanning a join product, so
this is not much different.

> Of course, there are some limitations, to keep the patch size at a
> reasonable level for review.
> - The permission to bypass the RLS policy was under discussion.
>  If and when we should provide a special permission to bypass RLS
>  policy, the "OR has_superuser_privilege()" shall be replaced by
>  "OR has_table_privilege(tableoid, 'RLSBYPASS')".

I think you're missing the point.  Everyone who has commented on this
issue is in favor of having some check that causes the RLS predicate
*not to get added in the first place*.  Adding a modified predicate is
not the same thing.  First, the query planner might not be smart
enough to optimize away the clause even when the predicate holds,
causing an unnecessary performance drain.  Second, there's too much
danger of being able to set a booby-trap for the superuser this way.
Suppose that the RLS subsystem replaces f_malicious() by f_malicious()
OR has_superuser_privilege().  Now the superuser comes along with the
nightly pg_dump run and the query optimizer executes SELECT * FROM
nuts WHERE f_malicious() OR has_superuser_privilege().  The query
optimizer notes that the cost of f_malicious() is very low and decides
to execute that before has_superuser_privilege().  Oops.  I think it's
just not acceptable to handle this by clause-munging: we need to not
add the clause in the first place.
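Robert's scenario is easy to demonstrate with a toy cost-ordered evaluator. This is a plain Python model, not planner code; the function names simply mirror the example above, and the cost numbers are invented for illustration:

```python
# Toy model of cost-based qual ordering: an optimizer that evaluates the
# cheapest OR-arm first still calls the attacker's function for every row,
# even though the superuser arm is logically true for the whole scan.

calls = {"f_malicious": 0, "has_superuser_privilege": 0}

def f_malicious(row):
    calls["f_malicious"] += 1        # side effect runs with caller's privileges
    return False

def has_superuser_privilege(row):
    calls["has_superuser_privilege"] += 1
    return True                      # the nightly pg_dump runs as superuser

# (predicate, planner cost estimate): the attacker declares a very low cost
or_arms = [(f_malicious, 1.0), (has_superuser_privilege, 10.0)]

def eval_quals(row, arms):
    # evaluate the cheapest arm first, short-circuiting on the first True
    for fn, _cost in sorted(arms, key=lambda a: a[1]):
        if fn(row):
            return True
    return False

rows = range(100)
visible = [r for r in rows if eval_quals(r, or_arms)]
assert len(visible) == 100            # the superuser sees every row...
assert calls["f_malicious"] == 100    # ...but the booby trap fired per row
```

Which is exactly why the predicate must not be added in the first place, rather than munged after the fact.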

Comments on the patch itself:

1. Please do not abbreviate rowlevel to rowlv or RowLevel to RowLv or
ROWLEVEL to ROWLV.  That makes it harder to read and harder to grep.
Spell it out.

2. Since the entirety of ATExecSetRowLvSecurity is conditional on
whether clause != NULL, you might as well split it into two functions,
one for each case.

3. The fact that you've had to hack preprocess_targetlist and
adjust_appendrel_attrs_mutator suggests to me that the insertion of
the generated subquery is happening at the wrong phase of the process.
We don't need those special cases for views, so it seems like we
shouldn't need them here, either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-26 Thread Alvaro Herrera
Excerpts from Boszormenyi Zoltan's message of mar jun 26 03:59:06 -0400 2012:
> 
> 2012-06-26 06:59 keltezéssel, Alvaro Herrera írta:
> > I cleaned up the framework patch a bit.  My version's attached.  Mainly,
> > returning false for failure in some code paths that are only going to
> > have the caller elog(FATAL) is rather pointless -- it seems much better
> > to just have the code itself do the elog(FATAL).  In any case, I
> > searched for reports of those error messages being reported in the wild
> > and saw none.
> 
> OK. The cleanups are always good.
> 
> One nitpick, though. Your version doesn't contain timeout.h,
> and compilation fails because of that. :-) You might have forgotten
> to do "git commit -a".

Oops.  Attached. It's pretty much the same as what you had, except some
"bools" are changed to void.

> > There is one thing that still bothers me on this whole framework patch,
> > but I'm not sure it's easily fixable.  Basically the API to timeout.c is
> > the whole list of timeouts and their whole behaviors.  If you want to
> > add a new one, you can't just call into the entry points, you have to
> > modify timeout.c itself (as well as timeout.h, and whatever code
> > it is that you want to add timeouts to).  This may be good enough, but I
> > don't like it.  I think it boils down to this: proctimeout.h is cheating.
> >
> > The interface I would actually like to have is something that lets the
> > modules trying to get a timeout register the timeout-checking function
> > as a callback.  API-wise, it would be much cleaner.  However, I'm not
> > really sure that code-wise it would be any cleaner at all.  In fact I
> > think it'd be rather cumbersome.  However, if you could give that idea
> > some thought, it'd be swell.
> 
> Well, I can make the registration interface similar to how LWLocks
> are treated, but that doesn't avoid modification of the base_timeouts
> array in case a new internal use case arises. Say:
> 
> #define USER_TIMEOUTS 4
> 
> int n_timeouts = TIMEOUT_MAX;
> static timeout_params base_timeouts[TIMEOUT_MAX + USER_TIMEOUTS];
> 
> and register_timeout() adds data after TIMEOUT_MAX.

Well, I don't expect that we're going to have many external uses.  My
point about registration is so that internal callers use it as well.  I
was thinking we could do something like xact.c does, which is to have
somewhere in proc.c a call like this:

TimeoutRegister(DEADLOCK_TIMEOUT, 1, true, CheckDeadLock, GetCurrentTimestamp);

at process startup (the magic value 1 is the priority.  Maybe there's a
better way to handle this).  That way, timeout.c or timeout.h do not
need to know anything about proc.c at all; we don't need proctimeout.h
at all.  The only thing it (the timeout module) needs to know is that
there is a symbolic constant named DEADLOCK_TIMEOUT.  Similarly for
statement timeout, etc.

When you call enable_timeout you first have to ensure that
DEADLOCK_TIMEOUT has been registered; and if it's not, die on an
Assert().  That way you ensure that there are no bugs where you try to
enable a timeout that hasn't been registered.
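As a sketch of how that contract might look, here is a Python mock-up of the proposed interface (the real thing would of course be C in timeout.c; DEADLOCK_TIMEOUT and the deadlock callback stand in for the symbolic constant and check function discussed above):

```python
# Model of the proposed API: modules register their timeout (with a
# priority and callback) at process startup; enabling an unregistered
# timeout is a programming error caught by an assertion.

DEADLOCK_TIMEOUT = "DEADLOCK_TIMEOUT"

_registered = {}

def timeout_register(name, priority, is_interval, check_fn):
    # called once at process startup, e.g. from proc.c
    _registered[name] = {"priority": priority, "check": check_fn}

def enable_timeout(name, delay_ms):
    # enabling a timeout that was never registered dies on an assert
    assert name in _registered, "timeout not registered: %s" % name
    return (name, delay_ms, _registered[name]["check"])

def check_deadlock():
    return "deadlock check ran"

timeout_register(DEADLOCK_TIMEOUT, 1, True, check_deadlock)
armed = enable_timeout(DEADLOCK_TIMEOUT, 1000)
assert armed[2]() == "deadlock check ran"
```

With this shape, timeout.c needs no knowledge of proc.c: only the symbolic name crosses the module boundary.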

(If we later find the need to let external modules add timeouts, which I
find unlikely, we can have TimeoutRegister return TimeoutName and have
this value be dynamically allocated.  But for now I think that would be
useless complication).

The difference with lwlocks is that each lwlock is just a number.  The
lwlock.c module doesn't need to know anything about how each lwlock is
actually used.  It's not the same thing here -- which is why I think it
would be better to have all modules, even the hardcoded ones, use a
registering interface.

... Now, it could be argued that it would be even better to have
TimeoutRegister not take the TimeoutName argument, and instead generate
it dynamically and pass it back to the caller -- that way you don't need
the enum in timeout.h.  The problem I have with that idea is that it
would force the caller modules to have a global variable to keep track
of that, which seems worse to me.

> > I haven't looked at the second patch at all yet.
> 
> This is the whole point of the first patch.

I know that.  But I want to get the details of the framework right
before we move on to add more stuff to it.

> But as I said above for
> registering a new timeout source, it's a new internal use case.
> One that touches a lot of places which cannot simply be done
> by registering a new timeout source.

Sure.  That's going to be the case for any other timeout we want to add
(which is why I think we don't really need dynamic timeouts).

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


1-timeout-framework-v9a.patch
Description: Binary data



Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers

2012-06-26 Thread Robert Haas
On Tue, May 22, 2012 at 12:15 PM, chinnaobi  wrote:
> You mean the primary that is going to switch its role to standby might
> not have sent all its WAL records to the standby, and if it is switched to
> standby it has more WAL records than the standby which now serves as
> primary. Is that it?

Yes, that is possible.  Or the standby might have received all the WAL
records but not be caught up in terms of replaying them.

> It is actually the standby server which has to be restored from the archive
> when it is switching to primary, right? Not the primary which is switching
> to standby?

If you want to promote a standby, you can just do it (pg_ctl promote).
 If you have a master that you want to demote to a standby, you've got
to resync it to whatever the current master is.  I understand repmgr
has some tooling to help automate that, although I have not played
with it myself.  In any event rsync can be a big help in reducing the
resync time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] empty backup_label

2012-06-26 Thread Robert Haas
On Tue, Jun 26, 2012 at 11:33 AM, Magnus Hagander  wrote:
> On Tue, Jun 26, 2012 at 5:24 PM, Robert Haas  wrote:
>> On Sun, Jun 24, 2012 at 5:33 PM, David Kerr  wrote:
>>> Howdy,
>>>
>>> We're using NetApp's flexclone's whenever we need to move our DB between 
>>> machines.
>>>
>>> One specific case where we do that is when we're creating a new streaming 
>>> replication target.
>>>
>>> The basic steps we're using are:
>>> pg_start_backup();
>>> 
>>> pg_stop_backup();
>>>
>>> The problem i'm seeing is that periodically the backup_label is empty, 
>>> which means
>>> I can't start the new standby.
>>>
>>> I believe that since the NetApp stuff is all happening within the SAN this 
>>> file hasn't been
>>> fsynced to disk by the time we take the snapshot.
>>>
>>> One option would be to do a "sync" prior to the clone, however that seems 
>>> kind of like a
>>> heavy operation, and it's slightly more complicated to script. (having to 
>>> have a user
>>> account on the system to sudo rather than just connecting to the db to 
>>> issue the
>>> pg_start_backup(...) )
>>>
>>> Another option is to add pg_fsync(fileno(fp)) after the fflush() when 
>>> creating the file (I'm not
>>> sure if fsync implies fflush or not, if it does you could just replace it.)
>>>
>>> I think this type of snapshot is fairly common, I've been doing them since 
>>> 2000 with EMC,
>>> i'm sure that most SAN vendors support it.
>>
>> This seems like a good idea to me.  Actually, I'm wondering if we
>> shouldn't back-patch this.
>>
>> Thoughts?
>
> Certainly can't hurt.
>
> I guess any other files that are lost this way will be recreated by
> WAL recovery - or is there something else that might be at risk of
> similar treatment?

I can't think of anything.  pg_start_backup does a checkpoint, which
in theory oughta be enough to make sure everything that matters hits
the platter.
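The fix under discussion is essentially "flush, then fsync, before the snapshot can see the file". A minimal sketch of that pattern (Python stands in for the C stdio calls; `f.flush()` plus `os.fsync(f.fileno())` plays the role of the proposed `pg_fsync(fileno(fp))` after the `fflush()`):

```python
# Write a label file durably: flushing pushes user-space buffers to the
# kernel, fsync forces the kernel to push them to stable storage, so a
# SAN-level clone taken afterwards cannot see an empty file.
import os
import tempfile

def write_label_durably(path, contents):
    with open(path, "w") as f:
        f.write(contents)
        f.flush()                # flush user-space (stdio-style) buffers
        os.fsync(f.fileno())     # force the kernel write-back to disk

tmpdir = tempfile.mkdtemp()
label = os.path.join(tmpdir, "backup_label")
write_label_durably(label, "START WAL LOCATION: 0/2000028\n")
assert os.path.getsize(label) > 0    # never an empty backup_label
```

Note that fsync alone does not imply a stdio flush: without the flush, buffered data may still be sitting in user space when fsync runs.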

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] empty backup_label

2012-06-26 Thread David Kerr
On Tue, Jun 26, 2012 at 05:33:42PM +0200, Magnus Hagander wrote:
- On Tue, Jun 26, 2012 at 5:24 PM, Robert Haas  wrote:
- > On Sun, Jun 24, 2012 at 5:33 PM, David Kerr  wrote:
- >> Howdy,
- >>
- >> We're using NetApp's flexclone's whenever we need to move our DB between 
machines.
- >>
- >> One specific case where we do that is when we're creating a new streaming 
replication target.
- >>
- >> The basic steps we're using are:
- >> pg_start_backup();
- >> 
- >> pg_stop_backup();
- >>
- >> The problem i'm seeing is that periodically the backup_label is empty, 
which means
- >> I can't start the new standby.
- >>
- >> I believe that since the NetApp stuff is all happening within the SAN this 
file hasn't been
- >> fsynced to disk by the time we take the snapshot.
- >>
- >> One option would be to do a "sync" prior to the clone, however that seems 
kind of like a
- >> heavy operation, and it's slightly more complicated to script. (having to 
have a user
- >> account on the system to sudo rather than just connecting to the db to 
issue the
- >> pg_start_backup(...) )
- >>
- >> Another option is to add pg_fsync(fileno(fp)) after the fflush() when 
creating the file (I'm not
- >> sure if fsync implies fflush or not, if it does you could just replace it.)
- >>
- >> I think this type of snapshot is fairly common, I've been doing them since 
2000 with EMC,
- >> i'm sure that most SAN vendors support it.
- >
- > This seems like a good idea to me.  Actually, I'm wondering if we
- > shouldn't back-patch this.
- >
- > Thoughts?
- 
- Certainly can't hurt.
- 
- I guess any other files that are lost this way will be recreated by
- WAL recovery - or is there something else that might be at risk of
- similar treatment?

The only other file I've run into is pgstats.stat, which I think is ok to get 
blown away.
There certainly could be others though.

Dave



Re: [HACKERS] Schema version management

2012-06-26 Thread Robert Haas
On Tue, May 22, 2012 at 11:31 PM, Joel Jacobson  wrote:
> This is true, which means some users won't be able to use the feature,
> because they are using an ancient OS or have function names with slashes.
> Hm, is it even possible to have function names with slashes?

Sure.  If you quote the function name, you can put anything you want
in there.  Note that Windows disallows a whole bunch of special
characters in filenames, while UNIX-like systems tend to disallow only
slash.

> I suppose you have a lot more experience of what Postgres installations exist
> in the world. Do you think it's common for databases to have problematic
> non-ASCII characters in object names?
>
> Is it a project policy that all features of all standard tools must be
> useful for all users
> on all platforms on all databases? Or is it acceptable if some features are 
> only
> usable for, say, 90% of the users?

There are cases where we permit features that only work on some
platforms, but it's rare.  Usually, we do this only when the platform
lacks some API that exists elsewhere.  For example, collations and
prefetching are not supported on Windows because the UNIX APIs we use
don't exist there.

In this case, it seems like you could work around the problem by, say,
URL-escaping any characters that can't be used in an unquoted
identifier.  Of course that might make the file name long enough to
hit the platform-specific file name limit.  Not sure what to do about
that.  The basic idea you're proposing here has been proposed a number
of times before, but it's always fallen down over questions of (1)
what to do with very long object names or those containing special
characters and (2) objects (like functions) for which schema+name is
not a unique identifier.
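As a rough illustration of the URL-escaping idea, here is a hypothetical helper (not part of any existing tool, and it deliberately ignores the function-overloading problem): plain unquoted identifiers map to themselves, giving the "eponymous" file, while anything else is escaped unambiguously.

```python
# Percent-escape an object name into a filesystem-safe file name.
# A name that needs no quoting (and is under the 64-char limit) is kept
# verbatim; every other character becomes %XX, which avoids collisions
# and the characters Windows disallows in file names.
import re

def identifier_to_filename(name):
    if re.fullmatch(r"[a-z_][a-z0-9_]*", name) and len(name) < 64:
        return name                       # plain identifier: eponymous file
    return "".join(c if re.fullmatch(r"[a-z0-9_]", c)
                   else "%%%02X" % ord(c) for c in name)

assert identifier_to_filename("check_name") == "check_name"
assert identifier_to_filename("weird/name") == "weird%2Fname"
```

The trade-off is as described above: escaped names are not pretty and can grow long, but the common case stays readable and stable under a VCS.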

I don't think either of these problems ought to be a complete
show-stopper.  It seems to me that the trade-off is that when object
names are long, contain special characters, or are overloaded, we'll
have to munge the names in some way to prevent collisions.  That could
mean that the names are not 100% stable, which would possibly produce
some annoyance if you're using a VCS to track changes, but maybe
that's an acceptable trade-off, because it shouldn't happen very
often.  If we could guarantee that identifiers less than 64
characters which are not overloaded and contain no special characters
requiring quoting end up in an eponymous file, I think that would be
good enough to make most of our users pretty happy.  In other cases, I
think the point would just be to make it work (with a funny name)
rather than fail.

> \i /home/postgres/database/gluepay-split/public/TYPE/dblink_pkey_results.sql
> \i /home/postgres/database/gluepay-split/public/TYPE/r_matchedwithdrawal.sql
> \i 
> /home/postgres/database/gluepay-split/public/TYPE/r_unapprovedwithdrawal.sql
> \i 
> /home/postgres/database/gluepay-split/public/TYPE/ukaccountvalidationchecktype.sql
> \i /home/postgres/database/gluepay-split/aml/FUNCTION/check_name.sql
> \i /home/postgres/database/gluepay-split/aml/FUNCTION/describe_entityid.sql
> \i /home/postgres/database/gluepay-split/aml/FUNCTION/get_linkid.sql
> \i /home/postgres/database/gluepay-split/aml/FUNCTION/set_address.sql
> -- ... all the objects ..
> \i 
> /home/postgres/database/gluepay-split/public/FK_CONSTRAINT/workershistory_workerid_fkey.sql
> \i 
> /home/postgres/database/gluepay-split/public/FK_CONSTRAINT/workershistory_workerstatusid_fkey.sql
> \i 
> /home/postgres/database/gluepay-split/public/FK_CONSTRAINT/workershistory_workertypeid_fkey.sql

It would be better to use \ir here rather than hard-code path names, I
think.  Then you'd only need to require that all the files be in the
same directory, rather than requiring them to be at a certain
hard-coded place in the filesystem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] libpq compression

2012-06-26 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jun 25, 2012 at 4:25 PM, k...@rice.edu  wrote:
>> Here is the benchmark list from the Google lz4 page:
>> 
>> Name            Ratio   C.speed D.speed
>> LZ4 (r59)       2.084   330      915
>> LZO 2.05 1x_1   2.038   311      480
>> QuickLZ 1.5 -1  2.233   257      277
>> Snappy 1.0.5    2.024   227      729
>> LZF             2.076   197      465
>> FastLZ          2.030   190      420
>> zlib 1.2.5 -1   2.728    39      195
>> LZ4 HC (r66)    2.712    18     1020
>> zlib 1.2.5 -6   3.095    14      210

>> lz4 absolutely dominates on compression/decompression speed. While fastlz
>> is faster than zlib(-1) on compression, lz4 is almost 2X faster still.

> At the risk of making everyone laugh at me, has anyone tested pglz?

Another point here is that those Google numbers (assuming that they
apply to our use-cases, a point not in evidence) utterly fail to make
the argument that zlib is not the thing to use.  zlib is beating all
the others on compression ratio, often by 50%.  Before making any
comparisons, you have to make some assumptions about what the network
speed is ... and unless it's pretty damn fast relative to your CPU speed
getting the better compression ratio is going to be more attractive than
saving some cycles.

regards, tom lane



Re: [HACKERS] empty backup_label

2012-06-26 Thread Magnus Hagander
On Tue, Jun 26, 2012 at 5:24 PM, Robert Haas  wrote:
> On Sun, Jun 24, 2012 at 5:33 PM, David Kerr  wrote:
>> Howdy,
>>
>> We're using NetApp's flexclone's whenever we need to move our DB between 
>> machines.
>>
>> One specific case where we do that is when we're creating a new streaming 
>> replication target.
>>
>> The basic steps we're using are:
>> pg_start_backup();
>> 
>> pg_stop_backup();
>>
>> The problem i'm seeing is that periodically the backup_label is empty, which 
>> means
>> I can't start the new standby.
>>
>> I believe that since the NetApp stuff is all happening within the SAN this 
>> file hasn't been
>> fsynced to disk by the time we take the snapshot.
>>
>> One option would be to do a "sync" prior to the clone, however that seems 
>> kind of like a
>> heavy operation, and it's slightly more complicated to script. (having to 
>> have a user
>> account on the system to sudo rather than just connecting to the db to issue 
>> the
>> pg_start_backup(...) )
>>
>> Another option is to add pg_fsync(fileno(fp)) after the fflush() when 
>> creating the file (I'm not
>> sure if fsync implies fflush or not, if it does you could just replace it.)
>>
>> I think this type of snapshot is fairly common, I've been doing them since 
>> 2000 with EMC,
>> i'm sure that most SAN vendors support it.
>
> This seems like a good idea to me.  Actually, I'm wondering if we
> shouldn't back-patch this.
>
> Thoughts?

Certainly can't hurt.

I guess any other files that are lost this way will be recreated by
WAL recovery - or is there something else that might be at risk of
similar treatment?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] libpq compression

2012-06-26 Thread Euler Taveira
On 26-06-2012 12:23, Robert Haas wrote:
> At the risk of making everyone laugh at me, has anyone tested pglz?  I
> observe that if the compression ratio and performance are good, we
> might consider using it for this purpose, too, which would avoid
> having to add dependencies.  Conversely, if they are bad, and we
> decide to support another algorithm, we might consider also using that
> other algorithm, at least optionally, for the purposes for which we
> now use pglz.
> 
I'll remember to test it too.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] How to avoid base backup in automated failover

2012-06-26 Thread Robert Haas
On Mon, Jun 4, 2012 at 9:14 AM, chinnaobi  wrote:
> Recently I was writing an application to implement automated failover with
> env: Two 2008 R2 servers, Network area storage, asynchronous replication,
> WAL archive on primary enabled.
>
> Is there any way to avoid always starting the standby server from a base
> backup in automated failover? I see the database is growing huge. I can't
> keep doing a base backup every day.
>
> Please suggest solution

The usual solution is to configure the standby as a warm or hot
standby, so that logs are continuously replayed there.  Then if the
master dies, you only have to wait for replication to catch up before
promoting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] GiST subsplit question

2012-06-26 Thread Robert Haas
On Wed, May 30, 2012 at 4:35 PM, Alexander Korotkov
 wrote:
>> So, do we demote that message to a DEBUG1? Or do we make it more clear
>> what the authors of a specific picksplit are supposed to do to avoid
>> that problem? Or am I misunderstanding something?
>
>
> +1 for demoting the message to DEBUG1. I think it shouldn't be so noisy; it
> just indicates something could be improved.
> Also I think we definitely need to document secondary split. Right now there's
> no chance of understanding it without reverse engineering it from the code.

I'm happy to go demote the message if we have consensus on that, but
somebody else is going to need to provide the doc patch.  Any takers?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] empty backup_label

2012-06-26 Thread Robert Haas
On Sun, Jun 24, 2012 at 5:33 PM, David Kerr  wrote:
> Howdy,
>
> We're using NetApp's flexclone's whenever we need to move our DB between 
> machines.
>
> One specific case where we do that is when we're creating a new streaming 
> replication target.
>
> The basic steps we're using are:
> pg_start_backup();
> 
> pg_stop_backup();
>
> The problem i'm seeing is that periodically the backup_label is empty, which 
> means
> I can't start the new standby.
>
> I believe that since the NetApp stuff is all happening within the SAN this 
> file hasn't been
> fsynced to disk by the time we take the snapshot.
>
> One option would be to do a "sync" prior to the clone, however that seems 
> kind of like a
> heavy operation, and it's slightly more complicated to script. (having to 
> have a user
> account on the system to sudo rather than just connecting to the db to issue 
> the
> pg_start_backup(...) )
>
> Another option is to add pg_fsync(fileno(fp)) after the fflush() when 
> creating the file (I'm not
> sure if fsync implies fflush or not, if it does you could just replace it.)
>
> I think this type of snapshot is fairly common, I've been doing them since 
> 2000 with EMC,
> i'm sure that most SAN vendors support it.

This seems like a good idea to me.  Actually, I'm wondering if we
shouldn't back-patch this.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] libpq compression

2012-06-26 Thread Robert Haas
On Mon, Jun 25, 2012 at 4:25 PM, k...@rice.edu  wrote:
> On Mon, Jun 25, 2012 at 09:45:26PM +0200, Florian Pflug wrote:
>> On Jun25, 2012, at 21:21 , Dimitri Fontaine wrote:
>> > Magnus Hagander  writes:
>> >> Or that it takes less code/generates cleaner code...
>> >
>> > So we're talking about some LZO things such as snappy from google, and
>> > that would be another run time dependency IIUC.
>> >
>> > I think it's time to talk about fastlz:
>> >
>> >  http://fastlz.org/
>> >  http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c
>> >
>> >  551 lines of C code under MIT license, works also under windows
>> >
>> > I guess it would be easy (and safe) enough to embed in our tree should
>> > we decide going this way.
>>
>> Agreed. If we extend the protocol to support compression, and not rely
>> on SSL, then let's pick one of these LZ77-style compressors, and let's
>> integrate it into our tree.
>>
>> We should then also make it possible to enable compression only for
>> the server -> client direction. Since those types of compressions are
>> usually pretty easy to decompress, that reduces the amount to work
>> non-libpq clients have to put in to take advantage of compression.
>
> Here is the benchmark list from the Google lz4 page:
>
> Name            Ratio   C.speed D.speed
> LZ4 (r59)       2.084   330      915
> LZO 2.05 1x_1   2.038   311      480
> QuickLZ 1.5 -1  2.233   257      277
> Snappy 1.0.5    2.024   227      729
> LZF             2.076   197      465
> FastLZ          2.030   190      420
> zlib 1.2.5 -1   2.728    39      195
> LZ4 HC (r66)    2.712    18     1020
> zlib 1.2.5 -6   3.095    14      210
>
> lz4 absolutely dominates on compression/decompression speed. While fastlz
> is faster than zlib(-1) on compression, lz4 is almost 2X faster still.

At the risk of making everyone laugh at me, has anyone tested pglz?  I
observe that if the compression ratio and performance are good, we
might consider using it for this purpose, too, which would avoid
having to add dependencies.  Conversely, if they are bad, and we
decide to support another algorithm, we might consider also using that
other algorithm, at least optionally, for the purposes for which we
now use pglz.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result

2012-06-26 Thread Fujii Masao
On Wed, Mar 28, 2012 at 10:08 AM, Fujii Masao  wrote:
> On Wed, Mar 28, 2012 at 12:30 AM, Alvaro Herrera
>  wrote:
>>
>> Excerpts from Fujii Masao's message of mar mar 27 06:40:34 -0300 2012:
>>
>>> Anyway, should I add this patch into the next CF? Or is anyone planning
>>> to commit the patch for 9.2?
>>
>> I think the correct thing to do here is add to next CF, and if some
>> committer has enough interest in getting it quickly it can be grabbed
>> from there and committed into 9.2.
>
> Yep! I've added it to next CF.

Attached is the revised version of the patch.

Regards,

-- 
Fujii Masao


pg_controldata_walfilename_v4.patch
Description: Binary data



Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling

2012-06-26 Thread Andres Freund
On Tuesday, June 26, 2012 04:01:26 PM Robert Haas wrote:
> On Fri, Jun 22, 2012 at 12:35 PM, Andres Freund  
wrote:
> >> Can you elaborate on that a bit?  What scenarios did you play around
> >> with, and what does "win" mean in this context?
> > 
> > I had two machines connected locally and set up HS and my prototype
> > between them (not both at once, obviously).
> > The patch reduced all the average latency between both nodes (measured by
> > 'ticker' rows arriving in a table on the standby), the jitter in latency
> > and the amount of load I had to put on the master before the standby
> > couldn't keep up anymore.
> > 
> > I played with different loads:
> > * multiple concurrent ~50MB COPY's
> > * multiple concurrent ~50MB COPY's, pgbench
> > * pgbench
> > 
> > All three had a ticker running concurrently with synchronous_commit=off
> > (because it shouldn't cause any difference in the replication pattern
> > itself).
> > 
> > The difference in average lag and cutoff was smallest with just pgbench
> > running alone and biggest with COPY running alone. High jitter was most
> > visible with just pgbench running alone, but that's likely just because
> > the average lag was smaller.
> 
> OK, that sounds pretty promising.  I'd like to run a few performance
> tests on this just to convince myself that it doesn't lead to a
> significant regression in other scenarios.  Assuming that doesn't turn
> up anything major, I'll go ahead and commit this.
Independent testing would be great; it's definitely possible that I overlooked 
something, although I obviously don't think so ;).

> Can you provide a rebased version?  It seems that one of the hunks in
> xlog.c no longer applies.
Will do so. Not sure if I can finish it today though, I am in the midst of 
redoing the ilist and xlogreader patches. I guess tomorrow will suffice 
otherwise...

Thanks!

Andres


-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file foreign tables

2012-06-26 Thread Kohei KaiGai
2012/6/26 Etsuro Fujita :
> Hi Kaigai-san,
>
>> -Original Message-
>> From: Kohei KaiGai [mailto:kai...@kaigai.gr.jp]
>> Sent: Monday, June 25, 2012 9:49 PM
>> To: Etsuro Fujita
>> Cc: Robert Haas; pgsql-hackers@postgresql.org
>> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
>> foreign tables
>>
>> Fujita-san,
>>
>> The revised patch is almost good for me.
>> The only point I noticed is the check for redundant or duplicated options
>> I pointed out in the previous post.
>> So, if you agree with the attached patch, I'd like to hand this patch over
>> to the committers.
>
> OK, I agree with you.  Thanks for the revision!
>
> Besides the revision, I modified check_selective_binary_conversion() to run
> heap_close() in the whole-row-reference case.  Attached is an updated version 
> of
> the patch.
>
Sorry, I overlooked this code path. It seems fair enough to me.

So, I'd like to hand this patch over to the committers.

Thanks,

> Thanks.
>
> Best regards,
> Etsuro Fujita
>
>> All I changed is:
>>  --- a/src/backend/commands/copy.c
>>  +++ b/src/backend/commands/copy.c
>>  @@ -122,6 +122,11 @@ typedef struct CopyStateData @@ -217,7 +221,7 @@ index
>> 98bcb2f..0143d81 100644
>>         }
>>  +      else if (strcmp(defel->defname, "convert_binary") == 0)
>>  +      {
>> -+          if (cstate->convert_binary)
>> ++          if (cstate->convert_selectively)
>>  +              ereport(ERROR,
>>  +                      (errcode(ERRCODE_SYNTAX_ERROR),
>>  +                       errmsg("conflicting or redundant options")));
>>
>> Thanks,
>>
>> 2012/6/20 Etsuro Fujita :
>> > Hi KaiGai-san,
>> >
>> > Thank you for the review.
>> >
>> >> -Original Message-
>> >> From: pgsql-hackers-ow...@postgresql.org
>> >> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kohei KaiGai
>> >> Sent: Wednesday, June 20, 2012 1:26 AM
>> >> To: Etsuro Fujita
>> >> Cc: Robert Haas; pgsql-hackers@postgresql.org
>> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
>> >> file foreign tables
>> >>
>> >> Hi Fujita-san,
>> >>
>> >> Could you rebase this patch towards the latest tree?
>> I was unable to apply the patch cleanly.
>> >
>> > Sorry, I updated the patch.  Please find attached an updated version
>> > of the patch.
>> >
>> >> I looked over the patch, then noticed a few points.
>> >>
>> >> At ProcessCopyOptions(), defel->arg can be NIL, can't it?
>> >> If so, cstate->convert_binary is not a suitable flag for checking a
>> >> redundant option. It seems to me cstate->convert_selectively is a more
>> >> suitable flag for that check.
>> >>
>> >> +       else if (strcmp(defel->defname, "convert_binary") == 0)
>> >> +       {
>> >> +           if (cstate->convert_binary)
>> >> +               ereport(ERROR,
>> >> +                       (errcode(ERRCODE_SYNTAX_ERROR),
>> >> +                        errmsg("conflicting or redundant
>> >> + options")));
>> >> +           cstate->convert_selectively = true;
>> >> +           if (defel->arg == NULL || IsA(defel->arg, List))
>> >> +               cstate->convert_binary = (List *) defel->arg;
>> >> +           else
>> >> +               ereport(ERROR,
>> >> +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> >> +                        errmsg("argument to option \"%s\" must be a
>> >> list of column names",
>> >> +                               defel->defname)));
>> >> +       }
>> >
>> > Yes, defel->arg can be NIL.  defel->arg is a List structure listing
>> > all the columns needed to be converted to binary representation, which
>> > is NIL for the case where no columns are needed to be converted.  For
>> > example,
>> > defel->arg is NIL for SELECT COUNT(*).  In this case, while
>> > cstate->convert_selectively is set to true, no columns are converted at
>> > NextCopyFrom().  Most efficient case!  In short,
>> > cstate->convert_selectively represents whether to do selective binary
>> > conversion at NextCopyFrom(), and cstate->convert_binary represents all
>> > the columns to be converted at NextCopyFrom(), which can be NIL.
>> >
>> >> At NextCopyFrom(), this routine computes default values if configured.
>> >> In case when these values are not referenced, it might be possible to
>> >> skip unnecessary calculations.
>> >> Would it be possible to add logic to avoid constructing cstate->defmap
>> >> for unreferenced columns at BeginCopyFrom()?
>> >
>> > I think that we don't need to add the above logic because file_fdw
>> > does
>> > BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom()
>> > doesn't construct cstate->defmap at all.
>> >
>> > I fixed a bug plus some minor optimization in
>> > check_binary_conversion() that is renamed to
>> > check_selective_binary_conversion() in the updated version, and now
>> > file_fdw gives up selective binary conversion for the following
>> > cases:
>> >
>> >  a) BINARY format
>> >  b) CSV/TEXT format and whole row reference
>> >  c) CSV/TEXT format a
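The redundant-option check discussed above (testing the boolean convert_selectively rather than the possibly-NIL convert_binary list) can be sketched outside the COPY machinery. This is a minimal illustration, not the patch itself: DemoCopyState, set_convert_binary, and demo are made-up names, and a NULL pointer stands in for NIL.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the two CopyStateData fields under discussion. */
typedef struct
{
    int   convert_selectively;  /* has the option been seen yet? */
    void *convert_binary;       /* column list; NULL plays the role of NIL */
} DemoCopyState;

/* Returns 0 on success, -1 for "conflicting or redundant options".
 * Because the argument list may legitimately be NIL (e.g. SELECT COUNT(*)),
 * testing convert_binary cannot detect a repeated option; the boolean
 * convert_selectively can. */
static int
set_convert_binary(DemoCopyState *cstate, void *arg)
{
    if (cstate->convert_selectively)
        return -1;
    cstate->convert_selectively = 1;
    cstate->convert_binary = arg;
    return 0;
}

static int
demo(void)
{
    DemoCopyState cstate = {0, NULL};
    int first = set_convert_binary(&cstate, NULL);   /* NIL list: OK */
    int second = set_convert_binary(&cstate, NULL);  /* duplicate: error */

    return first == 0 && second == -1;
}
```

With the list pointer alone, the second call would have been indistinguishable from the first, which is exactly the bug the review caught.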

Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling

2012-06-26 Thread Robert Haas
On Fri, Jun 22, 2012 at 12:35 PM, Andres Freund  wrote:
>> Can you elaborate on that a bit?  What scenarios did you play around
>> with, and what does "win" mean in this context?
> I had two machines connected locally and setup HS and my prototype between
> them (not at once obviously).
> The patch reduced all the average latency between both nodes (measured by
> 'ticker' rows arriving in a table on the standby), the jitter in latency and
> the amount of load I had to put on the master before the standby couldn't keep
> up anymore.
>
> I played with different loads:
> * multiple concurrent ~50MB COPYs
> * multiple concurrent ~50MB COPYs, pgbench
> * pgbench
>
> All three had a ticker running concurrently with synchronous_commit=off
> (because it shouldn't cause any difference in the replication pattern itself).
>
> The difference in average lag and cutoff were smallest with just pgbench
> running alone and biggest with COPY running alone. High jitter was most
> visible with just pgbench running alone, but that's likely just because the
> average lag was smaller.

OK, that sounds pretty promising.  I'd like to run a few performance
tests on this just to convince myself that it doesn't lead to a
significant regression in other scenarios.  Assuming that doesn't turn
up anything major, I'll go ahead and commit this.

Can you provide a rebased version?  It seems that one of the hunks in
xlog.c no longer applies.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgsql_fdw in contrib

2012-06-26 Thread Kohei KaiGai
Harada-san,

I checked your patch, and my impression is that it includes many
improvements over the previous revision that I reviewed at the last
commit fest.

However, I noticed several points to be revised, or investigated.

* It seems to me the expected results of the regression test are not
  attached, even though test cases were included. Please add them.

* cleanup_connection() does not close the connection when this callback
  is invoked due to an error under a sub-transaction. This probably
  causes problematic behavior.

  See the following steps to reproduce the matter:

postgres=# BEGIN;
BEGIN
postgres=# SELECT * FROM ft3;
 a |  b
---+-
 1 | aaa
 2 | bbb
 3 | ccc
 4 | ddd
 5 | eee
 6 | fff
(6 rows)

postgres=# SAVEPOINT sv_01;
SAVEPOINT

postgres=# SELECT * FROM ft3 WHERE 1 / (a - 4) > 0;  -- should be
error on remote transaction
ERROR:  could not execute remote query
DETAIL:  ERROR:  division by zero

HINT:  SELECT a, b FROM public.t1 WHERE (((1 OPERATOR(pg_catalog./) (a
OPERATOR(pg_catalog.-) 4)) OPERATOR(pg_catalog.>) 0))

postgres=# ROLLBACK TO SAVEPOINT sv_01;
ROLLBACK

postgres=# SELECT * FROM ft3;
ERROR:  could not execute EXPLAIN for cost estimation
DETAIL:  ERROR:  current transaction is aborted, commands ignored
until end of transaction block

HINT:  SELECT a, b FROM public.t1

  Once we get an error in the remote transaction, it eventually raises
  an error on the local side, and cleanup_connection() should be invoked.
  But it ignores the error because of the sub-transaction; thus the
  remote transaction, which is already aborted, is kept open.
  I'd like to suggest two remedies:
  1. Close connections even on errors under a sub-transaction.
  2. Close the connection in execute_query(), prior to raising the local
  error, if PQexecParams() returns an error.

  * Regarding deparseSimpleSql(), it pulls the attributes being referenced
    from baserestrictinfo and reltargetlist using pull_var_clause().
    Would it be possible to use local_conds instead of baserestrictinfo?
    We could then avoid fetching variables that are consumed only on the
    remote side; all we need to transfer are the variables referenced in
    the targetlist and the local clauses.
    In addition, is pull_var_clause() the right way to list all the
    attributes referenced in both expression trees? pull_varattnos()
    seems to be the more useful API in this situation.

  * Regarding deparseRelation(), it scans simple_rte_array to fetch the
    RangeTblEntry with the relation-id of the target foreign table.
    It does not match the correct entry if the same foreign table appears
    in this list twice or more, as in a self-join. I'd recommend using
    simple_rte_array[baserel->relid] instead.
    In addition, it checks whether or not relkind is RELKIND_FOREIGN_TABLE.
    It seems to me this check should be an Assert(), since pgsql_fdw's
    routines should never be called on regular relations.
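The self-join hazard in that bullet can be illustrated with a toy range table. DemoRte and both lookup functions are made-up names for illustration; in pgsql_fdw the direct index would be simple_rte_array[baserel->relid].

```c
#include <assert.h>

/* A made-up, two-field stand-in for RangeTblEntry. */
typedef struct
{
    unsigned relid_oid;  /* OID of the referenced relation */
    int      tag;        /* which reference this entry is */
} DemoRte;

/* Index 0 is unused: range-table indexes are 1-based. */
static const DemoRte rte_array[] = {
    {0, 0},
    {16384, 1},   /* first reference to relation 16384 */
    {16384, 2},   /* self-join: second reference, same OID */
};

/* Buggy pattern: scanning for the OID always finds the first entry,
 * even when the query actually refers to the second one. */
static int
lookup_by_oid(unsigned oid)
{
    for (int i = 1; i < 3; i++)
        if (rte_array[i].relid_oid == oid)
            return rte_array[i].tag;
    return -1;
}

/* Recommended pattern: index directly by the range-table index. */
static int
lookup_by_rtindex(int rtindex)
{
    return rte_array[rtindex].tag;
}
```

The OID scan can never reach the second entry, while direct indexing is exact for every reference.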

  * Regarding deparseVar(), is it unnecessary to check the relkind of
    the relation being referenced by the Var node?

  * Regarding deparseBoolExpr(), the compiler raised a warning because
    op can be used uninitialized.
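The warning pattern behind that bullet, and the usual fix, can be shown in miniature. bool_op_name is a made-up sketch, not the pgsql_fdw code: if some switch branch forgets to assign op, the function would return an indeterminate value; initializing op (and covering unknown inputs explicitly) makes every path defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the deparseBoolExpr() issue: op must be set on every path. */
static const char *
bool_op_name(int boolop)    /* 0 = AND, 1 = OR, 2 = NOT */
{
    const char *op = NULL;  /* initialized: no path reads garbage */

    switch (boolop)
    {
        case 0: op = "AND"; break;
        case 1: op = "OR";  break;
        case 2: op = "NOT"; break;
        default: break;     /* op stays NULL for unexpected input */
    }
    return op;
}
```

With the initializer in place the compiler has nothing to warn about, and an unexpected input yields a detectable NULL instead of stack garbage.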

  * Even though it is harmless, sortConditions() is a misleading function
    name. How about categorizeConditions() or screeningConditions()?

Thanks for your great work.

2012/6/14 Shigeru HANADA :
> I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module
> in core, again.  This patch is basically a rebased version of the patches
> proposed in the 9.2 development cycle, and contains some additional changes,
> such as attention to volatile variables for PG_CATCH blocks.  In
> addition, I combined old three patches (pgsql_fdw core functionality,
> push-down support, and analyze support) into one patch for ease of
> review. (I think these features are necessary for pgsql_fdw use.)
>
> After the development during the 9.2 cycle, pgsql_fdw can gather statistics
> from the remote side by providing a sampling function to the analyze module
> in the backend core, use them to estimate the selectivity of WHERE clauses
> which will be pushed down, and retrieve query results from the remote side
> with a custom row processor which prevents memory exhaustion from huge
> result sets.
>
> It would be possible to add some more features, such as ORDER BY
> push-down with index information support, without changing existing
> APIs, but adding a relatively simple pgsql_fdw first and enhancing it later
> seems better.  In addition, once pgsql_fdw has been merged, it would help
> in implementing proofs-of-concept of SQL/MED-related features.
>
> Regards,
> --
> Shigeru HANADA
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
KaiGai Kohei 



Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-26 Thread Robert Haas
On Mon, Jun 25, 2012 at 3:17 PM, Andres Freund  wrote:
>> I suppose the main reason we haven't done it already is that it
>> increases the period of time during which we're using 2X the disk
>> space.
> I find that an acceptable price if it's optional. Making it such doesn't
> seem to be a problem for me.

+1.

>> I think there is absolutely nothing wrong with doing extra things in
>> ALTER TABLE when logical replication is enabled.  We've got code
>> that's conditional on Hot Standby being enabled in many places in the
>> system; why should logical replication be any different?  If we set
>> the bar for logical replication at "the system can't do anything
>> differently when logical replication is enabled" then I cheerfully
>> submit that we are doomed.  You've already made WAL format changes to
>> support logging the pre-image of the tuple, which is a hundred times
>> more likely to cause a performance problem than any monkeying around
>> we might want to do in ALTER TABLE.
>>
>> I am deeply skeptical that we need to look inside of transactions that
>> do full-table rewrites.  But even if we do, I don't see that what I'm
>> proposing precludes it.  For example, I think we could have ALTER
>> TABLE emit WAL records specifically for logical replication that allow
>> us to disentangle which tuple descriptor to use at which point in the
>> transaction.  I don't see that that would even be very difficult to
>> set up.
> Sorry, I was imprecise above: I have no problem doing some changes during
> ALTER TABLE if logical rep is enabled. I am worried though that to make that
> robust you would need loads of places that emit additional information:
> * ALTER TABLE
> * ALTER FUNCTION
> * ALTER OPERATOR
> * ALTER/CREATE CAST
> * TRUNCATE
> * CLUSTER
> * ...
>
> I have the feeling that, were we to do that, the full amount of required
> information would be rather high and end up being essentially the catalog.
> Just having an intermediate tupledesc doesn't help that much if you have e.g.
> record_out doing type lookups of its own.
>
> There also is the issue you have talked about before, that a user type might
> depend on values in other tables. Unless we're ready to break at least
> transactional behaviour for those (for now...) I don't see how decoding
> outside of the transaction is ever going to be valid. I wouldn't have a big
> problem declaring that as broken for now...

I've been thinking about this a lot.  My thinking's still evolving
somewhat, but consider the following case.  A user defines a type T
with an input function I and an output function O.  They create a
table which uses type T and insert a bunch of data, which is duly
parsed using I; then, they replace I with a new input function I' and
O with a new output function O'.  Now, clearly, if we process the
inserts using the catalogs that were in effect at the time the inserts
were done, we could theoretically get different output than if we use
the catalogs that were in effect after the I/O functions were
replaced.  But is the latter output wrong, or merely different?  My
first thought when we started talking about this was "it's wrong", but
the more I think about it, the less convinced I am...

...because it can't possibly be right to suppose that it's impossible
to decode heap tuples using any catalog contents other than the ones
that were in effect at the time the tuples got inserted.  If that were
true, then we wouldn't be able to read a table after adding or
dropping a column, which of course we can.  It seems to me that it
might be sufficient to guarantee that we'll decode using the same
*types* that were in effect at the time the inserts happened.  If the
user yanks the rug out from under us by changing the type definition,
maybe we simply define that as a situation in which they get to keep
both pieces.  After all, if you replace the type definition in a way
that makes sensible decoding of the table impossible, you've pretty
much shot yourself in the foot whether logical replication enters the
picture or not.

In the enum case, for example, we go to great pains to make sure that
the table contents are always decodable not only under the current
version of SnapshotNow, but also any successor version.  We do that by
prohibiting ALTER TYPE .. ADD VALUE from running inside a transaction
block - because if we inserted a row into pg_enum and then inserted
dependent rows into some user table, a rollback could leave us with
rows that we can't decode.  For the same reason, we don't allow ALTER
TYPE .. DROP VALUE.  I think that we can infer a general principle
from this: while I/O functions may refer to catalog contents, they may
not do so in a way that could be invalidated by subsequent commits or
rollbacks.  If they did, they'd be breaking the ability of subsequent
SELECT statements to read the table.

An interesting case that is arguably an exception to this rule is the
regwhatever types, which will cheerfully output their value as an OID
if it 

Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-26 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 26, 2012 at 8:03 AM, Boszormenyi Zoltan  wrote:
>> I know, but it doesn't feel right to "register" static functionality.

> We do it elsewhere.  The overhead is pretty minimal compared to other
> things we already do during startup, and avoiding the need for the
> array to have a fixed-size seems worth it, IMHO.

It's not even clear that the array needs to be dynamically resizable (yet).
Compare for instance syscache invalidation callbacks, which have so far
gotten along fine with a fixed-size array to hold registrations.  It's
foreseeable that we'll need something better someday, but that day
hasn't arrived, and might never arrive.

I agree with the feeling that this patch isn't done if the "core"
timeout code has to know specifically about each usage.  We have that
situation already.

regards, tom lane



Re: [HACKERS] new --maintenance-db options

2012-06-26 Thread Tom Lane
Amit Kapila  writes:
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
>> The implementation I've wanted to see for some time is that you can
>> start a standalone backend, but it speaks FE/BE protocol to its caller
>> (preferably over pipes, so that there is no issue whatsoever of where
>> you can securely put a socket or anything like that).  

> Can't it be done by following the FE/BE protocol, but calling the server
> APIs directly at the required places?

That wouldn't be easier, nor cleaner, and it would open us up to
client-induced database corruption (from failure to follow APIs, crashes
in the midst of an operation, memory stomps, etc).  We decided long ago
that we would never support truly embedded operation in the sense of PG
executing in the client's process/address space.  I like the design
suggested above because it has many of the good properties of an
embedded database (in particular, no need to manage or contact a server)
but still keeps the client code at arm's length.
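The process-isolation property described above - a "server" reachable only through pipes, so client bugs cannot corrupt its address space - can be sketched in a toy form. This is only an illustration of the shape of the design: a trivial echo stands in for the FE/BE protocol, and roundtrip/demo are made-up names.

```c
#include <assert.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Toy sketch: fork a child "standalone backend" and exchange one
 * request/response pair over a pair of anonymous pipes. */
static int
roundtrip(const char *request, char *response, size_t resplen)
{
    int to_child[2], to_parent[2];
    pid_t pid;
    ssize_t n;

    if (pipe(to_child) != 0 || pipe(to_parent) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        /* child ("backend"): read one request, answer it, exit */
        char buf[128];
        ssize_t got = read(to_child[0], buf, sizeof(buf));

        if (got > 0)
            write(to_parent[1], buf, (size_t) got);  /* trivial echo */
        _exit(0);
    }

    /* parent (client): send the request, wait for the reply */
    write(to_child[1], request, strlen(request));
    close(to_child[1]);
    n = read(to_parent[0], response, resplen - 1);
    response[n > 0 ? n : 0] = '\0';
    waitpid(pid, NULL, 0);
    return 0;
}

static int
demo(void)
{
    char buf[128];

    if (roundtrip("SELECT 1", buf, sizeof(buf)) != 0)
        return 0;
    return strcmp(buf, "SELECT 1") == 0;
}
```

A client crash here kills only the parent; the child's memory is never shared, which is the property the pipe-based design buys over a truly embedded server.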

regards, tom lane



Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-26 Thread Robert Haas
On Tue, Jun 26, 2012 at 8:03 AM, Boszormenyi Zoltan  wrote:
> I know, but it doesn't feel right to "register" static functionality.

We do it elsewhere.  The overhead is pretty minimal compared to other
things we already do during startup, and avoiding the need for the
array to have a fixed-size seems worth it, IMHO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] patch: avoid heavyweight locking on hash metapage

2012-06-26 Thread Robert Haas
On Mon, Jun 25, 2012 at 11:05 PM, Jeff Janes  wrote:
> On Mon, Jun 18, 2012 at 5:42 PM, Robert Haas  wrote:
>>
>> Hmm.  That was actually a gloss I added on existing code to try to
>> convince myself that it was safe; I don't think that the changes I
>> made make that any more or less safe than it was before.
>
> Right, sorry.  I thought there was some strength reduction going on
> there as well.
>
> Thanks for the various explanations, they address my concerns.  I see
> that v2 applies over v1.
>
> I've verified performance improvements using 8 cores with my proposed
> pgbench -P benchmark, with a scale that fits in shared_buffers.
> It brings it most of the way, but not quite, up to the btree performance.
>
>
> I've marked this as ready for committer.

Thanks for the review; committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-26 Thread Simon Riggs
On 25 June 2012 17:42, Kevin Grittner  wrote:

> This is clearly going to depend on the topology.  You would
> definitely want to try to replicate the DDL for the case on which
> Simon is focused (which seems to me to be essentially physical
> replication of catalogs with logical replication of data changes
> from any machine to all others).

Just to remove any doubt: I'm not trying to support a single use case.

The overall proposals include a variety of design patterns. Each of
those covers many reasons for doing it, but they all end up with the
same architecture.

1) Single master replication, with options not possible with physical
2) Multimaster
3) Many to One: data aggregation
4) Online upgrade

I don't think it will be possible to support all of those in one
release. Each has different challenges.

3 and 4 will not be worked on until 9.4, unless someone else is
willing to work on them. That isn't meant to be harsh, just an
explanation of practical reality that I hope people can accept without
needing to argue it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-26 Thread Satoshi Nagayasu
Hi all,

I've modified the pg_stat_lwlocks patch to be able to work with
the latest PostgreSQL Git code.

This patch provides:
  pg_stat_lwlocks   New system view to show lwlock statistics.
  pg_stat_get_lwlocks() New function to retrieve lwlock statistics.
  pg_stat_reset_lwlocks()   New function to reset lwlock statistics.

Please try it out.

Regards,

2012/06/26 5:29, Satoshi Nagayasu wrote:
> Hi all,
> 
> I've been working on a new system view, pg_stat_lwlocks, to observe
> LWLocks, and have just completed my proof-of-concept code that works
> with version 9.1.
> 
> Now, I'd like to know the possibility of this feature for future
> release.
> 
> With this patch, a DBA can easily identify bottlenecks around lwlocks.
> --
> postgres=# SELECT * FROM pg_stat_lwlocks ORDER BY time_ms DESC LIMIT 10;
>  lwlockid | calls  | waits | time_ms
> ----------+--------+-------+---------
>        49 | 193326 | 32096 |   23688
>         8 |   3305 |   133 |    1335
>         2 |     21 |     0 |       0
>         4 | 135188 |     0 |       0
>         5 |  57935 |     0 |       0
>         6 |    141 |     0 |       0
>         7 |  24580 |     1 |       0
>         3 |   3282 |     0 |       0
>         1 |     41 |     0 |       0
>         9 |      3 |     0 |       0
> (10 rows)
> 
> postgres=#
> --
> 
> In this view,
>'lwlockid' column represents LWLockId used in the backends.
>'calls' represents how many times LWLockAcquire() was called.
>'waits' represents how many times LWLockAcquire() needed to wait
>within it before lock acquisition.
>'time_ms' represents how long LWLockAcquire() totally waited on
>a lwlock.
> 
> And lwlocks that use a LWLockId range, such as BufMappingLock or
> LockMgrLock, would be grouped and summed up in a single record.
> For example, lwlockid 49 in the above view represents LockMgrLock
> statistics.
> 
> Now, I know there are some considerations.
> 
> (1) Performance
> 
>I've measured LWLock performance both with and without the patch,
>and confirmed that this patch does not affect the LWLock performance
>at all.
> 
>pgbench scores with the patch:
>  tps = 900.906658 (excluding connections establishing)
>  tps = 908.528422 (excluding connections establishing)
>  tps = 903.900977 (excluding connections establishing)
>  tps = 910.470595 (excluding connections establishing)
>  tps = 909.685396 (excluding connections establishing)
> 
>pgbench scores without the patch:
>  tps = 909.096785 (excluding connections establishing)
>  tps = 894.868712 (excluding connections establishing)
>  tps = 910.074669 (excluding connections establishing)
>  tps = 904.022770 (excluding connections establishing)
>  tps = 895.673830 (excluding connections establishing)
> 
>Of course, this experiment was not I/O bound, and the cache hit ratio
>was >99.9%.
> 
> (2) Memory space
> 
>In this patch, I added three new members to LWLock structure
>as uint64 to collect statistics.
> 
>It means that those members must be held in the shared memory,
>but I'm not sure whether it's appropriate.
> 
>I think another possible option is holding those statistics
>values in local (backend) process memory, and send them through
>the stat collector process (like other statistics values).
> 
> (3) LWLock names (or labels)
> 
>Now, pg_stat_lwlocks view shows LWLockId itself. But LWLockId is
>not easy for DBA to determine actual lock type.
> 
>So, I want to show LWLock names (or labels), like 'WALWriteLock'
>or 'LockMgrLock', but how should I implement it?
> 
> Any comments?
> 
> Regards,
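The calls/waits/time_ms semantics described above reduce to a small counter update on each lock acquisition. The following is a minimal sketch of that bookkeeping only: DemoLWLockStats, record_acquire, and demo are made-up names, and the real patch keeps these counters in shared memory inside the LWLock structure.

```c
#include <assert.h>

/* Sketch of the three per-lock counters the patch collects. */
typedef struct
{
    unsigned long long calls;    /* LWLockAcquire() invocations */
    unsigned long long waits;    /* acquisitions that had to sleep */
    unsigned long long time_ms;  /* total time spent waiting, in ms */
} DemoLWLockStats;

static void
record_acquire(DemoLWLockStats *st, int had_to_wait,
               unsigned long long wait_ms)
{
    st->calls++;
    if (had_to_wait)
    {
        st->waits++;
        st->time_ms += wait_ms;
    }
}

static unsigned long long
demo(void)
{
    DemoLWLockStats st = {0, 0, 0};

    record_acquire(&st, 0, 0);   /* uncontended acquisition */
    record_acquire(&st, 1, 5);   /* had to wait 5 ms */

    /* pack calls/waits/time_ms into one number for checking */
    return st.calls * 100 + st.waits * 10 + st.time_ms;
}
```

This also makes the performance claim plausible: the uncontended path adds only a single increment.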


-- 
Satoshi Nagayasu 
Uptime Technologies, LLC. http://www.uptime.jp
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7cc1d41..f832b45 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -658,6 +658,14 @@ CREATE VIEW pg_stat_bgwriter AS
 pg_stat_get_buf_alloc() AS buffers_alloc,
 pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_lwlocks AS
+SELECT
+S.lwlockid,
+S.calls,
+S.waits,
+S.time_ms
+FROM pg_stat_get_lwlocks() AS S;
+
 CREATE VIEW pg_user_mappings AS
 SELECT
 U.oid   AS umid,
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 95d4b37..2a2c197 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -32,6 +32,7 @@
 #include "storage/proc.h"
 #include "storage/spin.h"
 
+#include 
 
 /* We use the ShmemLock spinlock to protect LWLockAssign */
 extern slock_t *ShmemLock;
@@ -46,6 +47,11 @@ typedef struct LWLock
	PGPROC *head;   /* head of list of waiting PGPROCs */
	PGPROC *tail;  

Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-26 Thread Boszormenyi Zoltan

On 2012-06-26 13:50, Robert Haas wrote:

On Tue, Jun 26, 2012 at 3:59 AM, Boszormenyi Zoltan  wrote:

Well, I can make the registration interface similar to how LWLocks
are treated, but that doesn't avoid modification of the base_timeouts
array in case a new internal use case arises. Say:

#define USER_TIMEOUTS    4

int    n_timeouts = TIMEOUT_MAX;
static timeout_params base_timeouts[TIMEOUT_MAX + USER_TIMEOUTS];

Since timeouts - unlike lwlocks - do not need to touch shared memory,
there's no need for a hard-coded limit here.  You can just allocate
the array using MemoryContextAlloc(TopMemoryContext, ...) and enlarge
it as necessary.  To avoid needing to modify the base_timeouts array,
you can just have internal callers push their entries into the array
during process startup using the same function call that an external
module would use.


I know, but it doesn't feel right to "register" static functionality.

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/




Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-26 Thread Andres Freund
On Monday, June 25, 2012 08:50:54 PM Kevin Grittner wrote:
> Andres Freund  wrote:
> >> We most particularly *don't* want DDL to replicate automatically,
> >> because the schema changes are deployed along with related
> >> software changes, and we like to pilot any changes for at least a
> >> few days.  Depending on the release, the rollout may take a
> >> couple of months, or we may slam it in everywhere a few days after
> >> the first pilot deployment.
> > 
> > That's sensible for your use-case - but I do not think that's the
> > appropriate behaviour for anything which is somewhat out-of-the-box...

> Right.  We currently consider the issues involved in a change and
> script it by hand.  I think we want to continue to do that.  The
> point was that, given the variety of timings and techniques for
> distributing schema changes, maybe it was only worth doing that
> automatically for the most constrained and controlled cases.
Agreed.

> >> So you could certainly punt all of this for any release as far as
> >> Wisconsin Courts are concerned.  We need to know table and column
> >> names, before and after images, and some application-supplied
> >> metadata.
> > 
> > I am not sure were going to get all that into 9.3.
> 
> Sure, that was more related to why I was questioning how much these
> use cases even *could* integrate -- whether it even paid to
> *consider* these use cases at this point.  Responses from Robert and
> you have pretty much convinced me that I was being overly
> pessimistic on that.
I think it's an important question to ask; otherwise we might just end up
with infrastructure unusable for all our goals... or usable but unfinished
infrastructure, because it's too complex to build in a sensible time.

> One fine point regarding before and after images -- if a value
> doesn't change in an UPDATE, there's no reason to include it in both
> the BEFORE and AFTER tuple images, as long as we have the null
> column bitmaps -- or some other way of distinguishing unchanged from
> NULL.  (This could be especially important when the unchanged column
> was a 50 MB bytea.)  It doesn't matter to me which way this is
> optimized -- in our trigger-based system we chose to keep the full
> BEFORE tuple and just show AFTER values which were distinct from the
> corresponding BEFORE values, but it would be trivial to adapt the
> code to the other way.
I don't think doing that is worth the trouble in the first incarnation. There
is enough hidden detail making it non-trivial that I wouldn't worry about it
until the rest of the infrastructure exists. That part of the code will
definitely be version-specific, so I see no problem improving on it
incrementally.
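The optimization Kevin describes above - carrying only changed columns in the AFTER image while still telling "unchanged" apart from "set to NULL" - comes down to a per-column changed bitmap. Below is a toy sketch of that idea only: DemoTuple, changed_bitmap, and the 4-column layout are made up, and this is not the patch's WAL format.

```c
#include <assert.h>
#include <string.h>

#define NCOLS 4

typedef struct
{
    const char *val[NCOLS];  /* NULL models SQL NULL */
} DemoTuple;

/* One bit per column: set iff the column's value differs between the
 * BEFORE and AFTER images. Unset bits mean "unchanged", which is
 * distinct from "changed to NULL". */
static unsigned
changed_bitmap(const DemoTuple *before, const DemoTuple *after)
{
    unsigned bm = 0;

    for (int i = 0; i < NCOLS; i++)
    {
        const char *b = before->val[i];
        const char *a = after->val[i];
        int same = (b == NULL && a == NULL) ||
                   (b != NULL && a != NULL && strcmp(b, a) == 0);

        if (!same)
            bm |= 1u << i;   /* only these columns need an AFTER value */
    }
    return bm;
}

static unsigned
demo(void)
{
    DemoTuple before = {{"1", "x", NULL, "a big unchanged value"}};
    DemoTuple after  = {{"1", "y", NULL, "a big unchanged value"}};

    return changed_bitmap(&before, &after);  /* only column 1 changed */
}
```

With such a bitmap, the hypothetical 50 MB bytea that didn't change would never be duplicated into the AFTER image.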

> Sorry for that bout of pessimism.
Oh, no reason for that. I have some doubts about that myself, so...

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-26 Thread Robert Haas
On Tue, Jun 26, 2012 at 3:59 AM, Boszormenyi Zoltan  wrote:
> Well, I can make the registration interface similar to how LWLocks
> are treated, but that doesn't avoid modification of the base_timeouts
> array in case a new internal use case arises. Say:
>
> #define USER_TIMEOUTS    4
>
> int    n_timeouts = TIMEOUT_MAX;
> static timeout_params base_timeouts[TIMEOUT_MAX + USER_TIMEOUTS];

Since timeouts - unlike lwlocks - do not need to touch shared memory,
there's no need for a hard-coded limit here.  You can just allocate
the array using MemoryContextAlloc(TopMemoryContext, ...) and enlarge
it as necessary.  To avoid needing to modify the base_timeouts array,
you can just have internal callers push their entries into the array
during process startup using the same function call that an external
module would use.
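The registration scheme suggested above can be sketched in a few lines. This is an illustration under stated assumptions, not the patch: realloc stands in for allocation in TopMemoryContext, and timeout_handler, register_timeout, and demo are made-up names; the point is that there is no hard-coded limit and internal callers use the same entry point an external module would.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*timeout_handler) (void);

static timeout_handler *handlers = NULL;
static int n_handlers = 0;
static int max_handlers = 0;

/* Register a handler, growing the array on demand; returns the
 * caller's timeout id. */
static int
register_timeout(timeout_handler fn)
{
    if (n_handlers >= max_handlers)
    {
        max_handlers = (max_handlers > 0) ? max_handlers * 2 : 4;
        handlers = realloc(handlers,
                           max_handlers * sizeof(timeout_handler));
    }
    handlers[n_handlers] = fn;
    return n_handlers++;
}

static void
noop_handler(void)
{
}

static int
demo(void)
{
    /* register more entries than the initial capacity to force growth */
    for (int i = 0; i < 10; i++)
        register_timeout(noop_handler);
    return n_handlers;
}
```

Built-in timeouts would simply call register_timeout() during process startup, so the base_timeouts array never needs editing when a new use case appears.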

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Backport of fsync queue compaction

2012-06-26 Thread Robert Haas
On Tue, Jun 19, 2012 at 5:39 PM, Robert Haas  wrote:
> On Tue, Jun 19, 2012 at 5:33 PM, Greg Smith  wrote:
>> In January of 2011 Robert committed 7f242d880b5b5d9642675517466d31373961cf98
>> to try and compact the fsync queue when clients find it full.  There's no
>> visible behavior change, just a substantial performance boost possible in
>> the rare but extremely bad situations where the background writer stops
>> doing fsync absorption.  I've been running that in production at multiple
>> locations since practically the day it hit this mailing list, with backports
>> all the way to 8.3 being common (and straightforward to construct).  I've
>> never seen a hint of a problem with this new code.
>
> I've been in favor of back-porting this for a while, so you'll get no
> argument from me.
>
> Anyone disagree?

Hearing no disagreement, I went ahead and did this.  I didn't take
Greg Smith's suggestion of adding a log message when/if the fsync
compaction logic fails to make any headway, because (1) the proposed
message didn't follow message style guidelines and I couldn't think of
a better one that did and (2) I'm not sure it's worth creating extra
translation work in the back-branches for such a marginal case.  We
can revisit this if people feel strongly about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file foreign tables

2012-06-26 Thread Etsuro Fujita
Hi Kaigai-san,

> -Original Message-
> From: Kohei KaiGai [mailto:kai...@kaigai.gr.jp]
> Sent: Monday, June 25, 2012 9:49 PM
> To: Etsuro Fujita
> Cc: Robert Haas; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
> foreign tables
> 
> Fujita-san,
> 
> The revised patch looks almost good to me.
> The only point I noticed is the check for redundant or duplicated options
> that I pointed out in my previous post.
> So, if you agree with the attached patch, I'd like to hand it over to the
> committers.

OK, I agree with you.  Thanks for the revision!

Besides the revision, I modified check_selective_binary_conversion() to run
heap_close() in the whole-row-reference case.  Attached is an updated version of
the patch.

Thanks.

Best regards,
Etsuro Fujita

> All I changed is:
>  --- a/src/backend/commands/copy.c
>  +++ b/src/backend/commands/copy.c
>  @@ -122,6 +122,11 @@ typedef struct CopyStateData
>  @@ -217,7 +221,7 @@ index 98bcb2f..0143d81 100644
> }
>  +  else if (strcmp(defel->defname, "convert_binary") == 0)
>  +  {
> -+  if (cstate->convert_binary)
> ++  if (cstate->convert_selectively)
>  +  ereport(ERROR,
>  +  (errcode(ERRCODE_SYNTAX_ERROR),
>  +   errmsg("conflicting or redundant options")));
> 
> Thanks,
> 
> 2012/6/20 Etsuro Fujita :
> > Hi KaiGai-san,
> >
> > Thank you for the review.
> >
> >> -Original Message-
> >> From: pgsql-hackers-ow...@postgresql.org
> >> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kohei KaiGai
> >> Sent: Wednesday, June 20, 2012 1:26 AM
> >> To: Etsuro Fujita
> >> Cc: Robert Haas; pgsql-hackers@postgresql.org
> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
> >> file foreign tables
> >>
> >> Hi Fujita-san,
> >>
> >> Could you rebase this patch towards the latest tree?
> >> I was unable to apply the patch cleanly.
> >
> > Sorry, I updated the patch.  Please find attached an updated version
> > of the patch.
> >
> >> I looked over the patch, then noticed a few points.
> >>
> >> At ProcessCopyOptions(), defel->arg can be NIL, can't it?
> >> If so, cstate->convert_binary is not a suitable flag for checking a
> >> redundant option. It seems to me cstate->convert_selectively is the
> >> more suitable flag for that check.
> >>
> >> +       else if (strcmp(defel->defname, "convert_binary") == 0)
> >> +       {
> >> +           if (cstate->convert_binary)
> >> +               ereport(ERROR,
> >> +                       (errcode(ERRCODE_SYNTAX_ERROR),
> >> +                        errmsg("conflicting or redundant
> >> + options")));
> >> +           cstate->convert_selectively = true;
> >> +           if (defel->arg == NULL || IsA(defel->arg, List))
> >> +               cstate->convert_binary = (List *) defel->arg;
> >> +           else
> >> +               ereport(ERROR,
> >> +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >> +                        errmsg("argument to option \"%s\" must be a
> >> list of column names",
> >> +                               defel->defname)));
> >> +       }
> >
> > Yes, defel->arg can be NIL.  defel->arg is a List structure listing
> > all the columns that need to be converted to binary representation,
> > which is NIL when no columns need to be converted.  For example,
> > defel->arg is NIL for SELECT COUNT(*).  In that case, while
> > cstate->convert_selectively is set to true, no columns are converted at
> > NextCopyFrom().  The most efficient case!  In short,
> > cstate->convert_selectively represents whether to do selective binary
> > conversion at NextCopyFrom(), and cstate->convert_binary represents all
> > the columns to be converted at NextCopyFrom(), which can be NIL.
> >
> >> At NextCopyFrom(), this routine computes default values if configured.
> >> When those values are not referenced, it might be possible to skip the
> >> unnecessary calculations.  Would it be possible to add logic to
> >> BeginCopyFrom() to avoid constructing cstate->defmap for unreferenced
> >> columns?
> >
> > I don't think we need to add the above logic, because file_fdw calls
> > BeginCopyFrom() with attnamelist = NIL, in which case BeginCopyFrom()
> > doesn't construct cstate->defmap at all.
> >
> > I fixed a bug and made a minor optimization in
> > check_binary_conversion(), which is renamed to
> > check_selective_binary_conversion() in the updated version; file_fdw
> > now gives up selective binary conversion in the following
> > cases:
> >
> >  a) BINARY format
> >  b) CSV/TEXT format and whole row reference
> >  c) CSV/TEXT format and all the user attributes needed
> >
> >
> > Best regards,
> > Etsuro Fujita
> >
> >> Thanks,
> >>
> >> 2012/5/11 Etsuro Fujita :
> >> >> -Original Message-
> >> >> From: Robert Haas [mailto:robertmh...@gmail.com]
> >> >> Sent: Friday, May 11, 2012 1:36 AM
> >> >> To: Etsuro Fujita

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-26 Thread Vik Reykja
On Wed, Jun 20, 2012 at 2:19 AM, Tom Lane  wrote:

> I've marked this patch committed, although in the end there was nothing
> left of it ;-)
>

Thank you, Dean and Tom!

I'm sorry for not participating in this thread; I've been away for the past
five weeks and have much catching up to do.


Re: [HACKERS] proof concept - access to session variables on client side

2012-06-26 Thread Pavel Stehule
2012/6/26 Magnus Hagander :
> On Tue, Jun 26, 2012 at 9:50 AM, Pavel Stehule  
> wrote:
>> 2012/6/26 Magnus Hagander :
>>> On Tue, Jun 26, 2012 at 7:06 AM, Pavel Stehule  
>>> wrote:
 Hello

 I have worked on a simple patch that enables access from the server
 side to client-side data. It adds two new hooks to libpq: one for
 returning the local context, and a second for setting it.

 A motivation is integrating the capabilities of the psql console with
 a stronger language, plpgsql. A second goal is making it possible to
 save the result of some server-side processing in psql. It improves
 the vars feature in psql.

 pavel ~/src/postgresql/src $ cat test.sql
 \echo value of external parameter is :"myvar"

 do $$
 begin
  -- we can take any session variable on client side
  -- it is safe against SQL injection
  raise notice 'external parameter accessed from plpgsql is "%"',
 hgetvar('myvar');

  -- we can change this session variable and finish transaction
  perform hsetvar('myvar', 'Hello, World');
 end;
 $$ language plpgsql;

 \echo new value of session variable is :"myvar"

 cat test.sql | psql postgres -v myvar=Hello
 value of external parameter is "Hello"
 NOTICE:  external parameter accessed from plpgsql is "Hello"
 DO
 new value of session variable is "Hello, World"

 This is just a proof of concept - there should be better integration
 with PL languages, a cache for reads on the server side, ...

 Notices?
>>>
>>> Why not just use a custom GUC variable instead? E.g. you could have
>>> psql SET "psql.myvar='Hello, World'", and then you'd need no changes
>>> at all in the backend? Maybe have a "shorthand interface" for
>>> accessing GUCs in psql would help in making it easier, but do we
>>> really need a whole new variable concept?
>>
>> GUC variables don't help with access to psql's command-line
>> parameters from DO PL code.
>
> But with a small change to psql they could, without the need for a
> whole new type of variable. For example, psql could set all those
> variables as "psql.", which could then be accessed
> from the DO PL code just fine.

Yes, that is a possibility too. It has different issues - it can send
unwanted variables - so maybe some compromise is the optimum.



>
> --
>  Magnus Hagander
>  Me: http://www.hagander.net/
>  Work: http://www.redpill-linpro.com/



Re: [HACKERS] proof concept - access to session variables on client side

2012-06-26 Thread Magnus Hagander
On Tue, Jun 26, 2012 at 9:50 AM, Pavel Stehule  wrote:
> 2012/6/26 Magnus Hagander :
>> On Tue, Jun 26, 2012 at 7:06 AM, Pavel Stehule  
>> wrote:
>>> Hello
>>>
>>> I have worked on a simple patch that enables access from the server
>>> side to client-side data. It adds two new hooks to libpq: one for
>>> returning the local context, and a second for setting it.
>>>
>>> A motivation is integrating the capabilities of the psql console with
>>> a stronger language, plpgsql. A second goal is making it possible to
>>> save the result of some server-side processing in psql. It improves
>>> the vars feature in psql.
>>>
>>> pavel ~/src/postgresql/src $ cat test.sql
>>> \echo value of external parameter is :"myvar"
>>>
>>> do $$
>>> begin
>>>  -- we can take any session variable on client side
>>>  -- it is safe against SQL injection
>>>  raise notice 'external parameter accessed from plpgsql is "%"',
>>> hgetvar('myvar');
>>>
>>>  -- we can change this session variable and finish transaction
>>>  perform hsetvar('myvar', 'Hello, World');
>>> end;
>>> $$ language plpgsql;
>>>
>>> \echo new value of session variable is :"myvar"
>>>
>>> cat test.sql | psql postgres -v myvar=Hello
>>> value of external parameter is "Hello"
>>> NOTICE:  external parameter accessed from plpgsql is "Hello"
>>> DO
>>> new value of session variable is "Hello, World"
>>>
>>> This is just a proof of concept - there should be better integration
>>> with PL languages, a cache for reads on the server side, ...
>>>
>>> Notices?
>>
>> Why not just use a custom GUC variable instead? E.g. you could have
>> psql SET "psql.myvar='Hello, World'", and then you'd need no changes
>> at all in the backend? Maybe have a "shorthand interface" for
>> accessing GUCs in psql would help in making it easier, but do we
>> really need a whole new variable concept?
>
> GUC variables don't help with access to psql's command-line
> parameters from DO PL code.

But with a small change to psql they could, without the need for a
whole new type of variable. For example, psql could set all those
variables as "psql.", which could then be accessed
from the DO PL code just fine.
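For illustration, the GUC-based approach could look like this from the server side. This is a hypothetical sketch: "psql.myvar" is a placeholder custom GUC name, and current_setting()/set_config() are the standard GUC accessor functions.

```sql
-- psql would issue something like this for each of its variables:
SET psql.myvar = 'Hello';

DO $$
BEGIN
  -- read the client-supplied value through the GUC machinery
  RAISE NOTICE 'myvar is %', current_setting('psql.myvar');
  -- and write it back the same way
  PERFORM set_config('psql.myvar', 'Hello, World', false);
END;
$$ LANGUAGE plpgsql;

SHOW psql.myvar;   -- the DO block's change is visible here
```

After the DO block, psql could read the setting back and update its own :"myvar" variable, which is the small psql-side change this proposal would need.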

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


