Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On Sat, Feb 1, 2014 at 7:41 PM, Christian Kruse <christ...@2ndquadrant.com> wrote:
> Hi,
>
> On 01/02/14 02:45, Fujii Masao wrote:
>> LOG: process 33662 still waiting for ShareLock on transaction 1011 after 1000.184 ms
>> DETAIL: Process holding the lock: 33660. Request queue: 33662.
>> [... snip ...]
>> LOG: process 33665 still waiting for ExclusiveLock on tuple (0,4) of relation 16384 of database 12310 after 1000.134 ms
>> DETAIL: Process holding the lock: 33662. Request queue: 33665
>>
>> This log message says that the process 33662 is holding the lock, but
>> it's not true.
>
> As the message says: the first lock is waiting for the transaction, the
> second one for the tuple. So those are two different locks, thus the two
> different holders and queues.
>
>> So... Is this the intentional behavior?
>
> Yes, I think so.

Oh, yes. You're right.

I have other minor comments:

Since you added errdetail_log_plural(), ISTM that you need to update
sources.sgml.

>> While I'm griping, this message isn't even trying to follow the project's
>> message style guidelines.  Detail or context messages are supposed to be
>> complete sentence(s), with capitalization and punctuation to match.
>
> Hm, I hope I fixed it in this version of the patch.

Current message doesn't look like complete sentence yet... We would need to
use something like "Processes X, Y are holding while Z is waiting for the
lock.". I could not come up with good message, though...

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
Hi,

On 03/02/14 17:59, Fujii Masao wrote:
> Since you added errdetail_log_plural(), ISTM that you need to update
> sources.sgml.

[x] Fixed.

>>> While I'm griping, this message isn't even trying to follow the project's
>>> message style guidelines.  Detail or context messages are supposed to be
>>> complete sentence(s), with capitalization and punctuation to match.
>>
>> Hm, I hope I fixed it in this version of the patch.
>
> Current message doesn't look like complete sentence yet... We would need
> to use something like "Processes X, Y are holding while Z is waiting for
> the lock.". I could not come up with good message, though...

The problem is that we have two potential plural cases in this message. That
leads to the need to formulate the second part independently from
singular/plural. I tried to improve a little bit and propose this message:

Singular:
  The following process is holding the lock: A. The request queue consists of: B.

Plural:
  Following processes are holding the lock: A, B. The request queue consists of: C.

Attached you will find an updated patch.

Best regards,

-- 
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml
index 881b0c3..aa20807 100644
--- a/doc/src/sgml/sources.sgml
+++ b/doc/src/sgml/sources.sgml
@@ -251,6 +251,15 @@ ereport(ERROR,
     </listitem>
     <listitem>
      <para>
+      <function>errdetail_log_plural(const char *fmt_singular, const char
+      *fmt_plural, unsigned long n, ...)</function> is like
+      <function>errdetail_log</>, but with support for various plural forms of
+      the message.
+      For more information see <xref linkend="nls-guidelines">.
+     </para>
     </listitem>
     <listitem>
      <para>
       <function>errhint(const char *msg, ...)</function> supplies an optional
       <quote>hint</> message; this is to be used when offering suggestions
       about how to fix the problem, as opposed to factual details about

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index fb449a8..da72c82 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1209,13 +1209,23 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 */
 	if (log_lock_waits && deadlock_state != DS_NOT_YET_CHECKED)
 	{
-		StringInfoData buf;
+		StringInfoData buf,
+					lock_waiters_sbuf,
+					lock_holders_sbuf;
 		const char *modename;
 		long		secs;
 		int			usecs;
 		long		msecs;
+		SHM_QUEUE  *procLocks;
+		PROCLOCK   *proclock;
+		bool		first_holder = true,
+					first_waiter = true;
+		int			lockHoldersNum = 0;

 		initStringInfo(&buf);
+		initStringInfo(&lock_waiters_sbuf);
+		initStringInfo(&lock_holders_sbuf);
+
 		DescribeLockTag(&buf, &locallock->tag.lock);
 		modename = GetLockmodeName(locallock->tag.lock.locktag_lockmethodid,
 								   lockmode);
@@ -1225,10 +1235,67 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		msecs = secs * 1000 + usecs / 1000;
 		usecs = usecs % 1000;

+		/*
+		 * we loop over the lock's procLocks to gather a list of all
+		 * holders and waiters. Thus we will be able to provide more
+		 * detailed information for lock debugging purposes.
+		 *
+		 * lock->procLocks contains all processes which hold or wait for
+		 * this lock.
+		 */
+
+		LWLockAcquire(partitionLock, LW_SHARED);
+
+		procLocks = &(lock->procLocks);
+		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
+											 offsetof(PROCLOCK, lockLink));
+
+		while (proclock)
+		{
+			/*
+			 * we are a waiter if myProc->waitProcLock == proclock; we are
+			 * a holder if it is NULL or something different
+			 */
+			if (proclock->tag.myProc->waitProcLock == proclock)
+			{
+				if (first_waiter)
+				{
+					appendStringInfo(&lock_waiters_sbuf, "%d",
+									 proclock->tag.myProc->pid);
+					first_waiter = false;
+				}
+				else
+					appendStringInfo(&lock_waiters_sbuf, ", %d",
+									 proclock->tag.myProc->pid);
+			}
+			else
+			{
+				if (first_holder)
+				{
+					appendStringInfo(&lock_holders_sbuf, "%d",
+									 proclock->tag.myProc->pid);
+					first_holder = false;
+				}
+				else
+					appendStringInfo(&lock_holders_sbuf, ", %d",
+									 proclock->tag.myProc->pid);
+
+				lockHoldersNum++;
+			}
+
+			proclock = (PROCLOCK *) SHMQueueNext(procLocks, &proclock->lockLink,
+												 offsetof(PROCLOCK, lockLink));
+		}
+
+		LWLockRelease(partitionLock);
+
 		if (deadlock_state == DS_SOFT_DEADLOCK)
 			ereport(LOG,
 					(errmsg("process %d avoided deadlock for %s on %s by rearranging queue order after %ld.%03d ms",
-							MyProcPid, modename, buf.data, msecs, usecs)));
+							MyProcPid, modename, buf.data, msecs, usecs),
+					 (errdetail_log_plural("The following process is holding the lock: %s. The request queue consists of: %s.",
+										   "Following processes are holding the lock: %s. The request queue consists of: %s.",
+
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-03 12:00:40 +0800, Craig Ringer wrote:
> I think it's a good thing personally - we shouldn't be exporting every
> little internal var in the symbol table. If we built with
> -fvisibility=hidden on 'nix there'd be no need to complain about commits
> made on 'nix then breaking on Windows, 'cos the 'nix build would break in
> the same place. That's all or nothing though, there's no "vars hidden,
> procs exported" option in gcc.

I think that'd be an exercise in futility. We're not talking about a general
purpose library here, where I agree -fvisibility=hidden is a useful thing,
but about the backend. We'd break countless extensions people have written.
Most of those have been authored on *nix.

To make any form of sense we'd need to have a really separate API layer
between internal/external stuff. That doesn't seem likely to arrive anytime
soon, if ever. I think all that would achieve is that we'd regularly need to
backpatch visibility fixes. And have countless pointless flames about which
variables to expose.

Greetings,

Andres Freund

-- 
 Andres Freund                 http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] UNION ALL on partitioned tables won't use indices.
Hello,

The attached file unionall_inh_idx_typ3_v6_20140203.patch is the new version
of this patch, which fixes the 'whole-row-var' bug.

First of all, on second thought about this:

  create table parent (a int, b int);
  create table child () inherits (parent);
  insert into parent values (1,10);
  insert into child values (2,20);
  select a, b from parent union all select a, b from child;

> Mmm. I had the same result. Please let me have a bit more time.

This turned out to be a correct result. The two tables have the following
records after the two INSERTs.

| =# select * from only parent;
|  1 | 10
| (1 row)
|
| =# select * from child;
|  2 | 20
| (1 row)

Then it is natural that the parent side in the UNION ALL returns the
following results.

| =# select * from parent;
|  a | b
| ---+----
|  1 | 10
|  2 | 20
| (2 rows)

Finally, the result we got has proved not to be a problem.

Second, about the crash in this sql,

  select parent from parent union all select parent from parent;

It is the ignored whole-row reference (= 0) which makes the index of the
child's translated-vars list invalid (-1). I completely ignored it, in spite
that I myself referred to it before. Unfortunately ConvertRowtypeExpr
prevents appendrels from being removed currently, and such a case doesn't
seem to take place so often, so I decided to exclude the case. In addition,
I found that only setting rte->inh = false causes duplicate scanning of the
same relations through abandoned appendrels, so I set parent/child_relid to
InvalidOid so that they are no longer referred to from the ex-parent (and
ex-children). The following queries seem to work correctly (but with no
optimization) after these fixes.
  create table parent (a int, b int);
  create table child () inherits (parent);
  insert into parent values (1,10);
  insert into child values (2,20);

  select parent from parent union all select parent from parent;
   parent
  --------
   (1,10)
   (2,20)
   (1,10)
   (2,20)
  (4 rows)

  select a, parent from parent union all select a, parent from parent;
   a | parent
  ---+--------
   1 | (1,10)
   2 | (2,20)
   1 | (1,10)
   2 | (2,20)
  (4 rows)

  select a, b from parent union all select a, b from parent;
   a | b
  ---+----
   1 | 10
   2 | 20
   1 | 10
   2 | 20
  (4 rows)

  select a, b from parent union all select a, b from child;
   a | b
  ---+----
   2 | 20
   1 | 10
   2 | 20
  (3 rows)

The attached two patches are rebased to current 9.4dev HEAD, and make check
at the topmost directory and in src/test/isolation passes without error.

One bug was found and fixed on the way. It was an assertion failure caused
by a probably unexpected type conversion introduced by collapse_appendrels(),
which leads an implicit whole-row cast from some valid reltype to an invalid
reltype (0) into adjust_appendrel_attrs_mutator().

> What query demonstrated this bug in the previous type 2/3 patches?
> I would still like to know the answer to the above question.

I remembered after some struggles. It failed during 'make check', on the
following query in inherit.sql.

| update bar set f2 = f2 + 100
| from
|   ( select f1 from foo union all select f1+3 from foo ) ss
| where bar.f1 = ss.f1;

The following SQLs cause the same type of crash.

  create temp table foo(f1 int, f2 int);
  create temp table foo2(f3 int) inherits (foo);
  create temp table bar(f1 int, f2 int);
  update bar set f2 = 1 from
    ( select f1 from foo union all select f1+3 from foo ) ss
    where bar.f1 = ss.f1;

The trigger was the "wholerow1" junk var for the subquery created in
preprocess_targetlist.

|    /* Not a table, so we need the whole row as a junk var */
|    var = makeWholeRowVar(rt_fetch(rc->rti, range_table), ..
|    snprintf(resname, sizeof(resname), "wholerow%u", rc->rowmarkId);

Then the finishing blow was an appendRelInfo corresponding to the var above,
whose parent_reltype == 0 and child_reltype != 0, and of course introduced
by my patch. Since such a situation takes place even for this patch, the
modification is left alone.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 52dcc72..ece9318 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,12 @@ typedef struct
 	AppendRelInfo *appinfo;
 } adjust_appendrel_attrs_context;

+typedef struct {
+	AppendRelInfo *child_appinfo;
+	Index		target_rti;
+	bool		failed;
+} transvars_merge_context;
+
 static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,
 					   double tuple_fraction,
 					   List *colTypes, List *colCollations,
@@ -98,6 +104,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,
 					  List *input_plans,
 					  List *refnames_tlist);
 static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
+static Node *transvars_merge_mutator(Node *node,
+					  transvars_merge_context *ctx);
 static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On 3 February 2014 10:06, Christian Kruse <christ...@2ndquadrant.com> wrote:
> Hi,
>
> On 03/02/14 17:59, Fujii Masao wrote:
>> Since you added errdetail_log_plural(), ISTM that you need to update
>> sources.sgml.
>
> [x] Fixed.
>
>>>> While I'm griping, this message isn't even trying to follow the
>>>> project's message style guidelines.  Detail or context messages are
>>>> supposed to be complete sentence(s), with capitalization and
>>>> punctuation to match.
>>>
>>> Hm, I hope I fixed it in this version of the patch.
>>
>> Current message doesn't look like complete sentence yet... We would need
>> to use something like "Processes X, Y are holding while Z is waiting for
>> the lock.". I could not come up with good message, though...
>
> The problem is that we have two potential plural cases in this message.
> That leads to the need to formulate the second part independently from
> singular/plural. I tried to improve a little bit and propose this message:
>
> Singular:
>   The following process is holding the lock: A. The request queue consists of: B.
>
> Plural:
>   Following processes are holding the lock: A, B. The request queue consists of: C.

Seems too complex. How about this...

Lock holder(s): A. Lock waiter(s) B
Lock holder(s): A, B. Lock waiter(s) C

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] narwhal and PGDLLIMPORT
On Sun, Feb 2, 2014 at 3:59 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Dave Page <dp...@pgadmin.org> writes:
>> On Sun, Feb 2, 2014 at 1:03 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> I think we should give serious consideration to desupporting this
>>> combination so that we can get rid of the plague of PGDLLIMPORT marks.
>>
>> No objection here - though I should point out that it's not been offline
>> for a long time (aside from a couple of weeks in January) - it's been
>> happily building most pre-9.2 branches for ages. 9.1 seems to be stuck,
>> along with HEAD, and I forgot to add 9.3. I'm in the process of cleaning
>> that up as time allows, but am happy to drop it instead if we no longer
>> want to support anything that old. We certainly don't use anything
>> resembling that config for the EDB installer builds.
>
> Further discussion pointed out that currawong, for example, seems to want
> PGDLLIMPORT markings but is able to get by without them in some cases that
> narwhal evidently doesn't like. So at this point, desupporting narwhal's
> configuration is clearly premature --- we should instead be looking into
> exactly what is causing the different cases to fail or not fail. I still
> have hopes that we might be able to get rid of PGDLLIMPORT marks, but by
> actually understanding why they seem to be needed in some cases and not
> others, not by just arbitrarily dropping support.
>
> In the meantime, please do get HEAD running again on that machine.

Done:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=narwhal&dt=2014-02-03%2009%3A26%3A43

It's not happy though :-(

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [review] PostgreSQL Service on Windows does not start if data directory given is relative path
From: Rajeev rastogi <rajeev.rast...@huawei.com>
> Please find the attached revised patch.

Thanks, your patch looks good. I confirmed the following:

* Applied cleanly to the latest source tree, and built on Linux and Windows
  (with MSVC) without any warnings.
* Changed to D:\ and ran "pg_ctl register -N pg -D pgdata", and checked the
  registry value ImagePath with regedit. It contained "-D D:\pgdata".
* Changed to D:\pgdata and ran "pg_ctl register -N pg -D ..\pgdata", and
  checked the registry value ImagePath with regedit. It contained
  "-D D:\pgdata".

Finally, please change the function description comment of
get_absolute_path() so that the style follows other functions in path.c.
That is:

/*
 * function_name
 *
 * Function description
 */

Then update the CommitFest entry with the latest patch and let me know of
that. Then, I'll change the patch status to ready for committer.

Regards
MauMau
[HACKERS] Re: [review] PostgreSQL Service on Windows does not start if data directory given is relative path
On 3rd February 2014, MauMau Wrote:
> Thanks, your patch looks good. I confirmed the following:
>
> * Applied cleanly to the latest source tree, and built on Linux and
>   Windows (with MSVC) without any warnings.
> * Changed to D:\ and ran "pg_ctl register -N pg -D pgdata", and checked
>   the registry value ImagePath with regedit. It contained "-D D:\pgdata".
> * Changed to D:\pgdata and ran "pg_ctl register -N pg -D ..\pgdata", and
>   checked the registry value ImagePath with regedit. It contained
>   "-D D:\pgdata".

Thanks for reviewing and testing the patch.

> Finally, please change the function description comment of
> get_absolute_path() so that the style follows other functions in path.c.
> That is:
>
> /*
>  * function_name
>  *
>  * Function description
>  */

I have modified the function description comment as per your suggestion.

> Then update the CommitFest entry with the latest patch and let me know of
> that. Then, I'll change the patch status to ready for committer.

I will update the CommitFest with the latest patch immediately after sending
the mail.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
Hi Simon,

On 03/02/14 10:43, Simon Riggs wrote:
>> Singular:
>>   The following process is holding the lock: A. The request queue consists of: B.
>>
>> Plural:
>>   Following processes are holding the lock: A, B. The request queue consists of: C.
>
> Seems too complex. How about this...
>
> Lock holder(s): A. Lock waiter(s) B
> Lock holder(s): A, B. Lock waiter(s) C

This is basically the same as before, it is even shorter. The complaint was
that I don't use a whole sentence in this error detail. Won't the change
fall foul of the same complaint?

To be honest, I'd like to stick with your earlier proposal:

Singular:
  Process holding the lock: A. Request queue: B

Plural:
  Processes holding the lock: A, B. Request queue: C, D

This seems to be a good trade-off between project guidelines, readability
and parsability.

Best regards,

-- 
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
On Sun, Feb 2, 2014 at 11:13 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Peter Geoghegan <p...@heroku.com> writes:
>> On Wed, Jan 29, 2014 at 1:10 PM, Robert Haas <rh...@postgresql.org> wrote:
>>> Include planning time in EXPLAIN ANALYZE output.
>>
>> Isn't it perhaps a little confusing that "Planning time" may well exceed
>> "Total runtime"? Perhaps s/Total runtime/Execution time/ ?
>
> I'm not really feeling a compelling need to change that. We've been
> displaying total runtime - described exactly that way - for many releases,
> and it surely is confusing to the novice that the time reported can be
> much less than the time reported by psql's \timing option, usually because
> of planning time. But adding the planning time to the output seems to me
> to make that better, not worse. If the user can't figure out that runtime
> != planning time, I'm not sure they'll be able to figure out execution
> time != planning time, either.

One of the reasons it's called "Total runtime", or so I've always assumed,
is because it's more inclusive than the time shown for the root node of the
plan tree. Specifically, it includes the time required to fire triggers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] GIN improvements part2: fast scan
Hi Oleg,

On 3 February 2014 7:53, Oleg Bartunov wrote:
> Tomas, it'd be nice if you use real data in your testing.

I'm using a mailing list archive (the benchmark is essentially a simple
search engine on top of the archive, implemented using the built-in
full-text search). So I think this is a quite 'real' dataset, not something
synthetic. The queries are generated, of course, but I strived to make them
as real as possible. Sure, this is not a hstore-centric benchmark, but the
thread is about GIN indexes, so I think it's fair.

> One very good application of gin fast-scan is dramatic performance
> improvement of the hstore/jsonb @> operator, see slides 57, 58
> http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf.
> I'd like not to lose this benefit :)

Yes, that's something we could/should test, probably. Sadly I don't have a
dataset or a complete real-world test case at hand. Any ideas?

I certainly agree that it'd be very sad to lose the performance gain for
hstore/json. OTOH my fear is that to achieve that gain, we'll noticeably
slow down other important use cases (e.g. full-text search), which is one
of the reasons why I was doing the tests.

regards
Tomas
Re: [HACKERS] narwhal and PGDLLIMPORT
On Mon, Feb 3, 2014 at 5:45 AM, Dave Page <dp...@pgadmin.org> wrote:
> Done:
>
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=narwhal&dt=2014-02-03%2009%3A26%3A43
>
> It's not happy though :-(

Specifically, it says:

dlltool --export-all --output-def libpg_buffercachedll.def pg_buffercache_pages.o
dllwrap -o pg_buffercache.dll --dllname pg_buffercache.dll --def libpg_buffercachedll.def pg_buffercache_pages.o -L../../src/port -L../../src/common -Wl,--allow-multiple-definition -L/mingw/lib -Wl,--as-needed -L../../src/backend -lpostgres
Info: resolving _MainLWLockArray by linking to __imp__MainLWLockArray (auto-import)
fu01.o(.idata$3+0xc): undefined reference to `libpostgres_a_iname'
fu02.o(.idata$3+0xc): undefined reference to `libpostgres_a_iname'
nmth00.o(.idata$4+0x0): undefined reference to `_nm__MainLWLockArray'

I assume I should go stick a DLLIMPORT on MainLWLockArray, unless somebody
has another proposal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [GENERAL] postgres FDW cost estimation options unrecognized in 9.3-beta1
Hi All,

Is there any way to automate the archive deletion process? Any script or
command in an HA setup using pgpool?

Thanks in advance.

Best Regards,
Rajni Baliyan | Database - Consultant
ASHNIK PTE. LTD.
101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533
M: +65 83858518  T: +65 6438 3504 | www.ashnik.com
www.facebook.com/ashnikbiz | www.twitter.com/ashnikbiz

This email may contain confidential, privileged or copyright material and
is solely for the use of the intended recipient(s).

On Fri, Jan 31, 2014 at 11:22 AM, Bruce Momjian <br...@momjian.us> wrote:
> On Fri, Jul 26, 2013 at 06:28:05PM -0400, Tom Lane wrote:
>> Our documentation appears not to disclose this fine point, but a look at
>> the SQL-MED standard says it's operating per spec. The standard also
>> says that ADD is an error if the option is already defined, which is a
>> bit more defensible, but still not exactly what I'd call user-friendly.
>> And the error we issue for that case is pretty misleading too:
>>
>> regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'true') ;
>> ALTER SERVER
>> regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'false') ;
>> ERROR: option "use_remote_estimate" provided more than once
>>
>> I think we could do with both more documentation, and better error
>> messages for these cases. In the SET-where-you-should-use-ADD case,
>> perhaps
>>
>> ERROR: option "use_remote_estimate" has not been set
>> HINT: Use ADD not SET to define an option that wasn't already set.
>>
>> In the ADD-where-you-should-use-SET case, perhaps
>>
>> ERROR: option "use_remote_estimate" is already set
>> HINT: Use SET not ADD to change an option's value.
>>
>> The "provided more than once" wording would be appropriate if the same
>> option is specified more than once in the command text, but I'm not sure
>> that it's worth the trouble to detect that case.
>
> Where are we on this?
>
> -- 
>   Bruce Momjian  <br...@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
>   + Everyone has their own god.
+

-- 
Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] GIN improvements part2: fast scan
On 03/02/14 02:44, Tomas Vondra wrote:
> (2) The question is whether the new patch works fine on rare words. See
>     this for comparison of the patches against HEAD:
>
>     http://www.fuzzy.cz/tmp/gin/3-rare-words.png
>     http://www.fuzzy.cz/tmp/gin/3-rare-words-new.png
>
>     and this is the comparison of the two patches:
>
>     http://www.fuzzy.cz/tmp/gin/patches-rare-words.png
>
>     That seems fine to me - some queries are slower, but we're talking
>     about queries taking 1 or 2 ms, so the measurement error is probably
>     the main cause of the differences.
>
> (3) With higher numbers of frequent words, the differences (vs. HEAD or
>     the previous patch) are not that dramatic as in (1) - the new patch
>     is consistently ~20% faster.

Just thinking: this is about one algorithm being better for frequent words
and another algorithm being better for rare words... We do have this
information (at least for tsvector) in the statistics; would it be possible
to just call the consistent function more often if the statistics give
signs that it actually is a frequent word?

Jesper - heavily dependent on tsvector searches, with both frequent and
rare words.
Re: [HACKERS] bugfix patch for json_array_elements
On 02/02/2014 08:54 PM, Craig Ringer wrote:
> On 02/03/2014 09:09 AM, Craig Ringer wrote:
>> At a guess, we're looking at a case where a new child context is created
>> at every call, so every MemoryContextResetChildren call has to deal with
>> more child contexts.
>
> That would be yes. After a short run, I see 32849 lines like:
>
>   json_array_elements temporary cxt: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
>
> under the context:
>
>   PortalMemory: 8192 total in 1 blocks
>     PortalHeapMemory: 7168 total in 3 blocks
>       ExecutorState: 65600 total in 4 blocks
>         ExprContext: 8192 total in 1 blocks
>           json_array_elements temporary cxt: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
>
> The attached patch deletes the context after use, bringing performance
> back into line. It should be backpatched to 9.3.
>
> [...]
>
> +		MemoryContextDelete(state->tmp_cxt);
> +		state->tmp_cxt = NULL;
> +		PG_RETURN_NULL();

Hmm. I guess I was assuming that the tmp_cxt would be cleaned up at the end
of the function, since it's a child of the CurrentMemoryContext. But if I
got that wrong I'm happy to fix it.

I don't see the point in setting state->tmp_cxt to NULL since it's about to
go out of scope anyway.

cheers

andrew
Re: [HACKERS] narwhal and PGDLLIMPORT
Robert Haas <robertmh...@gmail.com> writes:
> I assume I should go stick a DLLIMPORT on MainLWLockArray, unless somebody
> has another proposal.

Hold up awhile. The reason we're resurrecting it is to get a crosscheck on
what symbols it thinks need DLLIMPORT that no other platform does.

			regards, tom lane
Re: [HACKERS] [review] PostgreSQL Service on Windows does not start if data directory given is relative path
From: Rajeev rastogi <rajeev.rast...@huawei.com>
> I will update the CommitFest with the latest patch immediately after
> sending the mail.

OK, done setting the status to ready for committer.

Regards
MauMau
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: On 2014-02-03 12:00:40 +0800, Craig Ringer wrote: I think it's a good thing personally - we shouldn't be exporting every little internal var in the symbol table. If we built with -fvisibility=hidden on 'nix there'd be no need to complain about commits being on on 'nix then breaking on Windows, 'cos the 'nix build would break in the same place. That's all or nothing though, there's no vars hidden, procs exported option in gcc. To make any form of sense we'd need to have a really separate API layer between internal/external stuff. That doesn't seem likely to arrive anytime soon, if ever. Indeed. Aside from the difficulty of having tighter requirements on just one platform (which, for development purposes, isn't even a mainstream one), it seems completely silly to me that the restriction applies only to variables and not functions. We expose *far* more global functions than global variables; and there's no particular reason to think that calling a random function is any safer than frobbing a random global variable. What we need is to make the Windows platform stop thinking that it is the center of the world, and make it act as much as possible like the Unix-oid platforms. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-03 09:13:02 -0500, Tom Lane wrote: What we need is to make the Windows platform stop thinking that it is the center of the world, and make it act as much as possible like the Unix-oid platforms. What about the completely crazy thing of simply redefining export to include the declspec on windows for backend build? That will possibly blow up the size of the .def files et al a bit, but I have little mercy with that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/03/2014 06:37 PM, Andres Freund wrote: On 2014-02-03 12:00:40 +0800, Craig Ringer wrote: I think it's a good thing personally - we shouldn't be exporting every little internal var in the symbol table. If we built with -fvisibility=hidden on 'nix there'd be no need to complain about commits being on on 'nix then breaking on Windows, 'cos the 'nix build would break in the same place. That's all or nothing though, there's no vars hidden, procs exported option in gcc. I think that'd be an exercise in futility. We're not talking about a general purpose library here, where I agree -fvisibility=hidden is a useful thing, but about the backend. We'd break countless extensions people have written. Most of those have been authored on *nix. To make any form of sense we'd need to have a really separate API layer between internal/external stuff. That doesn't seem likely to arrive anytime soon, if ever. I think all that would achieve is that we'd regularly need to backpatch visibility fixes. And have countless pointless flames about which variables to expose. Fair point. If we're not going to define a proper API, then export control is not useful. And since there isn't a proper API, nor any on the cards, _that_ is a reasonable reason to just export all. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-03 22:23:16 +0800, Craig Ringer wrote: On 02/03/2014 06:37 PM, Andres Freund wrote: I think that'd be an exercise in futility. We're not talking about a general purpose library here, where I agree -fvisibility=hidden is a useful thing, but about the backend. We'd break countless extensions people have written. Most of those have been authored on *nix. To make any form of sense we'd need to have a really separate API layer between internal/external stuff. That doesn't seem likely to arrive anytime soon, if ever. I think all that would achieve is that we'd regularly need to backpatch visibility fixes. And have countless pointless flames about which variables to expose. Fair point. If we're not going to define a proper API, then export control is not useful. And since there isn't a proper API, nor any on the cards, _that_ is a reasonable reason to just export all. We have a (mostly) proper API. Just not an internal/external API split. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Fri, Jan 31, 2014 at 9:29 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao masao.fu...@gmail.com wrote: Hi, The files in pg_stat_tmp directory don't need to be backed up because they are basically reset at the archive recovery. So I think it's worth changing pg_basebackup so that it skips any files in pg_stat_tmp directory. Thought? I think this is good idea, but can't it also avoid PGSTAT_STAT_PERMANENT_TMPFILE along with temp files in pg_stat_tmp All stats files should be excluded. IIRC the PGSTAT_STAT_PERMANENT_TMPFILE refers to just the global one. You want to exclude based on PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the guc stats_temp_directory if it's in PGDATA. Attached patch changes basebackup.c so that it skips all files in both pg_stat_tmp and stats_temp_directory. Even when a user sets stats_temp_directory to the directory other than pg_stat_tmp, we need to skip the files in pg_stat_tmp. Because, per recent change of pg_stat_statements, the external query file is always created there. Committed. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: What about the completely crazy thing of simply redefining export to include the declspec on windows for backend build? That will possibly blow up the size of the .def files et al a bit, but I have little mercy with that. You mean #define extern extern PGDLLIMPORT ? What will that do to function declarations that have extern? Are we sure C compilers are smart enough to not go into an infinite loop on such a macro? Anyway, narwhal's latest results thicken the plot even more. How come pg_buffercache has been building OK on the other Windows critters, when MainLWLockArray lacks PGDLLIMPORT? That lets out the theory about linking to libpq having something to do with it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgresql for cygwin - 3rd
On 02/01/2014 05:46 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 02/01/2014 05:12 PM, Marco Atzeri wrote: is it possible the tsearch test never worked on en_US.utf8 but only on C locale ? Yes, that's more or less what I said, I thought. Maybe we need to test this on other Windows systems in non-C encodings. Let's make sure it's only a Cygwin problem. I'll work on that. You should try to concentrate on the things like prepared_xacts and isolation_test that we know don't work ONLY on Cygwin. Please test the patch I just posted in the pgsql-bugs thread. It looks to me like there are bugs in both the C and non-C locale cases, but only the latter case would be exercised by our regression tests, since we don't use any non-ASCII characters in the tests. This is commit 082c0dfa140b5799bc7eb574d68610dcfaa619ba and friends, right? If so, it appears to have done the trick. brolga is now happy running tsearch with en_US.utf8: http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=brolga&dt=2014-02-03%2011%3A26%3A10&stg=install-check-en_US.utf8 Cool. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On Mon, Jan 27, 2014 at 7:30 PM, Alexander Korotkov aekorot...@gmail.comwrote: On Mon, Jan 27, 2014 at 2:32 PM, Alexander Korotkov aekorot...@gmail.comwrote: On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/26/2014 08:24 AM, Tomas Vondra wrote: Hi! On 25.1.2014 22:21, Heikki Linnakangas wrote: Attached is a new version of the patch set, with those bugs fixed. I've done a bunch of tests with all the 4 patches applied, and it seems to work now. I've done tests with various conditions (AND/OR, number of words, number of conditions) and so far I did not get any crashes, infinite loops or anything like that. I've also compared the results to 9.3 - by dumping the database and running the same set of queries on both machines, and indeed I got 100% match. I also did some performance tests, and that's when I started to worry. For example, I generated and ran 1000 queries that look like this: SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(header & 53 & 32 & useful & dropped)') ORDER BY ts_rank(body_tsvector, to_tsquery('english','(header & 53 & 32 & useful & dropped)')) DESC; i.e. in this case the query always was 5 words connected by AND. This query is a pretty common pattern for fulltext search - sort by a list of words and give me the best ranked results. On 9.3, the script was running for ~23 seconds, on patched HEAD it was ~40. It's perfectly reproducible, I've repeated the test several times with exactly the same results. The test is CPU bound, there's no I/O activity at all. I got the same results with more queries (~100k). Attached is a simple chart with x-axis used for durations measured on 9.3.2, y-axis used for durations measured on patched HEAD. It's obvious a vast majority of queries is up to 2x slower - that's pretty obvious from the chart. Only about 50 queries are faster on HEAD, and 700 queries are more than 50% slower on HEAD (i.e. if the query took 100ms on 9.3, it takes 150ms on HEAD). 
Typically, the EXPLAIN ANALYZE looks something like this (on 9.3): http://explain.depesz.com/s/5tv and on HEAD (same query): http://explain.depesz.com/s/1lI Clearly the main difference is in the Bitmap Index Scan which takes 60ms on 9.3 and 120ms on HEAD. On 9.3 the perf top looks like this: 34.79% postgres [.] gingetbitmap 28.96% postgres [.] ginCompareItemPointers 9.36% postgres [.] TS_execute 5.36% postgres [.] check_stack_depth 3.57% postgres [.] FunctionCall8Coll while on 9.4 it looks like this: 28.20% postgres [.] gingetbitmap 21.17% postgres [.] TS_execute 8.08% postgres [.] check_stack_depth 7.11% postgres [.] FunctionCall8Coll 4.34% postgres [.] shimTriConsistentFn Not sure how to interpret that, though. For example where did the ginCompareItemPointers go? I suspect it's thanks to inlining, and that it might be related to the performance decrease. Or maybe not. Yeah, inlining makes it disappear from the profile, and spreads that time to the functions calling it. The profile tells us that the consistent function is called a lot more than before. That is expected - with the fast scan feature, we're calling consistent not only for potential matches, but also to refute TIDs based on just a few entries matching. If that's effective, it allows us to skip many TIDs and avoid consistent calls, which compensates, but if it's not effective, it's just overhead. I would actually expect it to be fairly effective for that query, so that's a bit surprising. I added counters to see where the calls are coming from, and it seems that about 80% of the calls are actually coming from this little the feature I explained earlier: In addition to that, I'm using the ternary consistent function to check if minItem is a match, even if we haven't loaded all the entries yet. That's less important, but I think for something like rare1 | (rare2 frequent) it might be useful. It would allow us to skip fetching 'frequent', when we already know that 'rare1' matches for the current item. 
I'm not sure if that's worth the cycles, but it seemed like an obvious thing to do, now that we have the ternary consistent function. So, that clearly isn't worth the cycles :-). At least not with an expensive consistent function; it might be worthwhile if we pre-build the truth-table, or cache the results of the consistent function. Attached is a quick patch to remove that, on top of all the other patches, if you want to test the effect. Every single change you did in fast scan seems to be reasonable, but testing shows that something went wrong. Simple test with 3 words of different selectivities. After applying your patches: #
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-03 09:25:57 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: What about the completely crazy thing of simply redefining export to include the declspec on windows for backend build? That will possibly blow up the size of the .def files et al a bit, but I have little mercy with that. You mean #define extern extern PGDLLIMPORT ? What will that do to function declarations that have extern? As far as I understand it's perfectly ok to have that on functions. http://msdn.microsoft.com/en-us/library/aa271769%28v=vs.60%29.aspx even claims it's faster in some scenarios. Are we sure C compilers are smart enough to not go into an infinite loop on such a macro? I'd be really surprised if there were any that wouldn't completely choke on pg otherwise. Especially not as we are only talking about windows where only gcc and msvc is supported these days. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Q: How to use indexer api smartly
My goal is to implement a new index type based on the bitmap index algorithm. I have two main problems: 1. How to get the target list - the list of columns that need to be returned from a query on the index. I want to implement index-only access: today the indexer API returns a row-id and then PG retrieves the data from the table. My idea is to return the list of columns directly from the index. Is that possible? And how do I know what the column list is (assuming the index contains all those columns)? 2. How to implement a query context within the indexer API. The indexer functions (amcostestimate, ambeginscan and amrescan) do not maintain a context. The idea is better memory management of index structures (lock the memory structure for the query's lifetime - until it ends). Thanks, Amihay.
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/03/2014 01:13 AM, Peter Geoghegan wrote: On Sun, Feb 2, 2014 at 5:22 PM, Craig Ringer cr...@2ndquadrant.com wrote: I'm not a fan of MinGW (gcc) on Windows, it's a right pain. It's also the only open source compiler currently supported for PostgreSQL on Windows - practically the only one available. I don't know about you, but I'm not too keen on assuming Microsoft will continue to offer free toolchains that're usable for our purposes. They're crippling their free build tools more and more with each release, which isn't a good trend. I was under the impression that Microsoft had finally come around to implementing a few C99 features in Visual Studio 2013 precisely because they want there to be an open source ecosystem on Windows. It's not so long ago that they were saying they would no longer publish free-as-in-beer command line compilers at all. The outrage made them change their minds, but we really can't rely on only Microsoft compilers for Windows. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On Mon, Feb 3, 2014 at 9:57 AM, Andrew Dunstan and...@dunslane.net wrote: It's not so long ago that they were saying they would no longer publish free-as-in-beer command line compilers at all. The outrage made them change their minds, but we really can't rely on only Microsoft compilers for Windows. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Andrew Dunstan and...@dunslane.net writes: On 02/03/2014 01:13 AM, Peter Geoghegan wrote: On Sun, Feb 2, 2014 at 5:22 PM, Craig Ringer cr...@2ndquadrant.com wrote: I'm not a fan of MinGW (gcc) on Windows, it's a right pain. It's also the only open source compiler currently supported for PostgreSQL on Windows - practically the only one available. I don't know about you, but I'm not too keen on assuming Microsoft will continue to offer free toolchains that're usable for our purposes. They're crippling their free build tools more and more with each release, which isn't a good trend. I was under the impression that Microsoft had finally come around to implementing a few C99 features in Visual Studio 2013 precisely because they want there to be an open source ecosystem on Windows. It's not so long ago that they were saying they would no longer publish free-as-in-beer command line compilers at all. The outrage made them change their minds, but we really can't rely on only Microsoft compilers for Windows. FWIW, I'd definitely not be in favor of desupporting mingw altogether. However, narwhal appears to be running a pretty ancient version: configure: using compiler=gcc.exe (GCC) 3.4.2 (mingw-special) If it turns out that more recent versions have saner behavior, I'd not have the slightest qualms about saying we don't support such old versions anymore. But this is all speculation at the moment. I'm hoping somebody with a Windows build environment (ie, not me) will poke into the situation and figure out exactly why some of these global variables seem to need PGDLLIMPORT while others don't. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On Fri, Jan 31, 2014 at 12:44 PM, Antonin Houska antonin.hou...@gmail.com wrote: In 9.3 I noticed that postmaster considers bgworker crashed (and therefore tries to restart it) even if it has exited with zero status code. I first thought about a patch like the one below, but then noticed that postmaster.c:bgworker_quickdie() signal handler exits with 0 too (when there's no success). Do we need my patch, my patch + something for the handler or no patch at all? I think the word crashed here doesn't actually mean crashed. If a worker dies with an exit status of other than 0 or 1, we call HandleChildCrash(), which effects a system-wide restart. But if it exits with code 0, we don't do that. We just set HaveCrashedWorker so that it gets restarted immediately, and that's it. In other words, exit(0) in a bgworker causes an immediate relaunch, and exit(1) causes the restart interval to be respected, and exit(other) causes a system-wide crash-and-restart cycle. This is admittedly a weird API, and we've had some discussion of whether to change it, but I don't know that we've reached any final conclusion. I'm tempted to propose exactly inverting the current meaning of exit(0). That is, make it mean don't restart me, ever, even if I have a restart interval configured rather than restart me right away, even if I have a restart interval configured. That way, a background process that wants to run until it accomplishes some task could be written to exit(1) on error and exit(0) on success, which seems quite natural. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
Robert Haas robertmh...@gmail.com writes: This is admittedly a weird API, and we've had some discussion of whether to change it, but I don't know that we've reached any final conclusion. I'm tempted to propose exactly inverting the current meaning of exit(0). That is, make it mean don't restart me, ever, even if I have a restart interval configured rather than restart me right away, even if I have a restart interval configured. That way, a background process that wants to run until it accomplishes some task could be written to exit(1) on error and exit(0) on success, which seems quite natural.

So

    exit(0)     -> done, permanently
    exit(1)     -> done until restart interval
    exit(other) -> crash

and there's no way to obtain the restart immediately behavior? I think this is an improvement, but it probably depends on what you think the use-cases are for bgworkers. I can definitely see that there is a need for a bgworker to be just plain done, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Sat, Feb 1, 2014 at 4:20 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-30 14:07:42 -0500, Andrew Dunstan wrote: + para id=functions-json-table + xref linkend=functions-json-creation-table shows the functions that are + available for creating typejson/type values. + (see xref linkend=datatype-json) /para - table id=functions-json-table -titleJSON Support Functions/title + indexterm + primaryarray_to_json/primary + /indexterm + indexterm + primaryrow_to_json/primary + /indexterm + indexterm + primaryto_json/primary + /indexterm + indexterm + primaryjson_build_array/primary + /indexterm + indexterm + primaryjson_build_object/primary + /indexterm + indexterm + primaryjson_object/primary + /indexterm Hm, why are you collecting the indexterms at the top in the contrast to the previous way of collecting them at the point of documentation? diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 1ae9fa0..fd93d9b 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \ tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \ tsvector.o tsvector_op.o tsvector_parser.o \ txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \ - rangetypes_typanalyze.o rangetypes_selfuncs.o + rangetypes_typanalyze.o rangetypes_selfuncs.o \ + jsonb.o jsonb_support.o Odd, most OBJS lines are kept in alphabetical order, but that doesn't seem to be the case here. +/* + * for jsonb we always want the de-escaped value - that's what's in token + */ + strange newline. +static void +jsonb_in_scalar(void *state, char *token, JsonTokenType tokentype) +{ + JsonbInState *_state = (JsonbInState *) state; + JsonbValue v; + + v.size = sizeof(JEntry); + + switch (tokentype) + { + ... + default:/* nothing else should be here in fact */ + break; Shouldn't this at least Assert(false) or something? 
+static void +recvJsonbValue(StringInfo buf, JsonbValue *v, uint32 level, int c) +{ + uint32 hentry = c & JENTRY_TYPEMASK; + + if (hentry == JENTRY_ISNULL) + { + v->type = jbvNull; + v->size = sizeof(JEntry); + } + else if (hentry == JENTRY_ISOBJECT || hentry == JENTRY_ISARRAY || hentry == JENTRY_ISCALAR) + { + recvJsonb(buf, v, level + 1, (uint32) c); + } + else if (hentry == JENTRY_ISFALSE || hentry == JENTRY_ISTRUE) + { + v->type = jbvBool; + v->size = sizeof(JEntry); + v->boolean = (hentry == JENTRY_ISFALSE) ? false : true; + } + else if (hentry == JENTRY_ISNUMERIC) + { + v->type = jbvNumeric; + v->numeric = DatumGetNumeric(DirectFunctionCall3(numeric_recv, PointerGetDatum(buf), + Int32GetDatum(0), Int32GetDatum(-1))); + + v->size = sizeof(JEntry) * 2 + VARSIZE_ANY(v->numeric); What's the *2 here? +static void +recvJsonb(StringInfo buf, JsonbValue *v, uint32 level, uint32 header) +{ + uint32 hentry; + uint32 i; This function and recvJsonbValue call each other recursively, afaics without any limit, shouldn't they check for the stack depth? + hentry = header & JENTRY_TYPEMASK; + + v->size = 3 * sizeof(JEntry); *3? + if (hentry == JENTRY_ISOBJECT) + { + v->type = jbvHash; + v->hash.npairs = header & JB_COUNT_MASK; + if (v->hash.npairs > 0) + { + v->hash.pairs = palloc(sizeof(*v->hash.pairs) * v->hash.npairs); + Hm, if I understand correctly, we're just allocating JB_COUNT_MASK (which is 0x0FFFFFFF) * sizeof(*v->hash.pairs) bytes here, without any crosschecks about the actual length of the data? So with a few bytes the server can be coaxed to allocate a gigabyte of data? Since this immediately calls another input routine, this can be done in a nested fashion, quickly OOMing the server. I think this and several other places really need a bit more input sanity checking. 
+ for (i = 0; i < v->hash.npairs; i++) + { + recvJsonbValue(buf, &v->hash.pairs[i].key, level, pq_getmsgint(buf, 4)); + if (v->hash.pairs[i].key.type != jbvString) + elog(ERROR, "jsonb's key could be only a string"); Shouldn't that be an ereport(ERRCODE_DATATYPE_MISMATCH)? Similar in a few other places. +char * +JsonbToCString(StringInfo out, char *in, int estimated_len) +{ + bool first = true; + JsonbIterator *it; + int type; + JsonbValue
Re: [HACKERS] GIN improvements part2: fast scan
Hi Alexander, On 3 Únor 2014, 15:31, Alexander Korotkov wrote: I found my patch 0005-Ternary-consistent-implementation.patch to be completely wrong. It introduces ternary consistent function to opclass, but don't uses it, because I forgot to include ginlogic.c change into patch. So, it shouldn't make any impact on performance. However, testing results with that patch significantly differs. That makes me very uneasy. Can we now reproduce exact same? Do I understand it correctly that the 0005 patch should give exactly the same performance as the 9.4-heikki branch (as it was applied on it, and effectively did no change). This wasn't exactly what I measured, although the differences were not that significant. I can rerun the tests, if that's what you're asking for. I'll improve the test a bit - e.g. I plan to average multiple runs, to filter out random noise (which might be significant for such short queries). Right version of these two patches in one against current head is attached. I've rerun tests with it, results are /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun postprocessing including graph drawing? Yes, I'll do that. However I'll have to rerun the other tests too, because the previous runs were done on a different machine. I'm a bit confused right now. The previous patches (0005 + 0007) were supposed to be applied on top of the 4 from Heikki (0001-0004), right? AFAIK those were not commited yet, so why is this version against HEAD? 
To summarize, I know of these patch sets:

9.4-heikki (old version)
  0001-Optimize-GIN-multi-key-queries.patch
  0002-Further-optimize-the-multi-key-GIN-searches.patch
  0003-Further-optimize-GIN-multi-key-searches.patch
  0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch

9.4-alex-1 (based on 9.4-heikki)
  0005-Ternary-consistent-implementation.patch

9.4-alex-1 (based on 9.4-alex-1)
  0006-Sort-entries.patch

9.4-heikki (new version)
  gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch

9.4-alex-2 (new version)
  gin-fast-scan.10.patch.gz

Or did I get that wrong? Sometimes test cases are not what we expect. For example:

=# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
                                           QUERY PLAN
 Bitmap Heap Scan on messages  (cost=84.00..88.01 rows=1 width=4)
   Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
   ->  Bitmap Index Scan on messages_body_tsvector_idx  (cost=0.00..84.00 rows=1 width=0)
         Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
 Planning time: 0.257 ms
(5 rows)

5alpha1-initdb'd is 3 gin entries with different frequencies. Why do you find that strange? The way the query is formed or the way it's evaluated? The query generator certainly is not perfect, so it may produce some strange queries. Also, these patches are not intended to change relevance ordering speed. When number of results are high, most of time is relevance calculating and sorting. I propose to remove ORDER BY clause from test cases to see scan speed more clear. Sure, I can do that. Or maybe one set of queries with ORDER BY, the other one without it. I've dump of postgresql.org search queries from Magnus. We can add them to our test case. You mean search queries from the search for mailing list archives? Sure, we add that. 
Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 2014-02-03 09:22:52 -0600, Merlin Moncure wrote: I lost my stomach (or maybe it was the glass of red) somewhere in the middle, but I think this needs a lot of work. Especially the io code doesn't seem ready to me. I'd consider ripping out the send/recv code for 9.4, that seems the biggest can of worms. It will still be usable without. Not having type send/recv functions is somewhat dangerous; it can cause problems for libraries that run everything through the binary wire format. I'd give jsonb a pass on that, being a new type, but would be concerned if hstore had that ability revoked. Yea, removing it for hstore would be a compat problem... offhand note: hstore_send seems pretty simply written and clean; it's a simple nonrecursive iterator... But a send function is pretty pointless without the corresponding recv function... And imo recv simply is to dangerous as it's currently written. I am not saying that it cannot be made work, just that it's still nearly as ugly as when I pointed out several of the dangers some weeks back. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On Thu, Jan 30, 2014 at 8:38 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/30/2014 01:53 AM, Tomas Vondra wrote: (3) A file with explain plans for 4 queries suffering ~2x slowdown, and explain plans with 9.4 master and Heikki's patches is available here: http://www.fuzzy.cz/tmp/gin/queries.txt All the queries have 6 common words, and the explain plans look just fine to me - exactly like the plans for other queries. Two things now caught my eye. First some of these queries actually have words repeated - either exactly like database database or in negated form like !anything anything. Second, while generating the queries, I use dumb frequency, where only exact matches count. I.e. write != written etc. But the actual number of hits may be much higher - for example write matches exactly just 5% documents, but using @@ it matches more than 20%. I don't know if that's the actual cause though. I tried these queries with the data set you posted here: http://www.postgresql.org/message-id/52e4141e.60...@fuzzy.cz. The reason for the slowdown is the extra consistent calls it causes. That's expected - the patch certainly does call consistent in situations where it didn't before, and if the pre-consistent checks are not able to eliminate many tuples, you lose. So, what can we do to mitigate that? Some ideas: 1. Implement the catalog changes from Alexander's patch. That ought to remove the overhead, as you only need to call the consistent function once, not both ways. OTOH, currently we only call the consistent function if there is only one unknown column. If with the catalog changes, we always call the consistent function even if there are more unknown columns, we might end up calling it even more often. 2. Cache the result of the consistent function. 3. Make the consistent function cheaper. (How? Magic?) 4. Use some kind of a heuristic, and stop doing the pre-consistent checks if they're not effective. Not sure what the heuristic would look like. 
I'm fiddling around with the following idea for such a heuristic: 1) Sort entries ascending by their estimated number of TIDs. Assume the number of entries to be n. 2) Sequentially call ternary consistent with the first m entries as GIN_FALSE and the rest n-m as GIN_MAYBE until it returns GIN_FALSE. 3) Decide whether to use fast-scan depending on the number of TIDs in the first m entries and the number of TIDs in the rest (n-m) entries. Something like number_of_TIDs_in_first_m_entries * k < number_of_TIDs_in_rest_n-m_entries where k is some quotient. For sure, it is a rough method, but it should be fast. Without a natural implementation of the ternary consistent function the algorithm will be slightly different: only m values close to n should be checked. I'm going to implement this heuristic against the last version of your patch. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Regression tests failing if not launched on db regression
On Fri, Jan 31, 2014 at 5:48 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Sat, Feb 1, 2014 at 12:42 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 31, 2014 at 2:28 AM, Michael Paquier michael.paqu...@gmail.com wrote: I took a look at this with a view to committing it but on examination I'm not sure this is the best way to proceed. The proposed text documents that the tests should be run in a database called regression, but the larger documentation chapter of which this section is a part never explains how to run them anywhere else, so it feels a bit like telling a ten-year-old not to burn out the clutch. The bit about not changing enable_* probably belongs, if anywhere, in section 30.2, on test evaluation, rather than here. And what about the attached? I have moved all the content to 30.2, and added two paragraphs: one about the planner flags, the other about the database used. Regards, Well, it doesn't really address my first concern, which was that you talk about running the tests in a database named regression, but that's exactly what make check does and it's unclear how you would do anything else without modifying the source code. It's not the purpose of the documentation to tell you all the ways that you could break things if you patch the tree. I also don't want to document exactly which tests would fail if you did hack things like that; that documentation is likely to become outdated. I think the remaining points you raise are worth mentioning. I'm attaching a patch with my proposed rewording of your changes. I made the section about configuration parameters a bit more generic and adjusted the phrasing to sound more natural in English, and I moved your mention of the other issues around a bit. What do you think of this version? The part about the planning parameter looks good, thanks. The places used to mention the databases used also makes more sense. Thanks for your input. OK, committed and back-patched to 9.3. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] KNN-GiST with recheck
On Tue, Jan 28, 2014 at 9:32 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/28/2014 04:12 PM, Alexander Korotkov wrote: On Tue, Jan 28, 2014 at 5:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: 4. (as you mentioned in the other thread: ) It's a modularity violation that you peek into the heap tuple from gist. I think the proper way to do this would be to extend the IndexScan executor node to perform the re-shuffling of tuples that come from the index in the wrong order, or perhaps add a new node type for it. Of course that's exactly what your partial sort patch does :-). I haven't looked at that in detail, but I don't think the approach the partial sort patch takes will work here as is. In the KNN-GiST case, the index is returning tuples roughly in the right order, but a tuple that it returns might in reality belong somewhere later in the ordering. In the partial sort patch, the input stream of tuples is divided into non-overlapping groups, so that the tuples within a group are not sorted, but the groups are. I think the partial sort case is a special case of the KNN-GiST case, if you consider the lower bound of each tuple to be the leading keys that you don't need to sort. Yes. But, for instance, btree accesses the heap for unique checking. Is it really so criminal? :-) Well, it is generally considered an ugly hack in b-tree too. I'm not 100% opposed to doing such a hack in GiST, but would very much prefer not to. This is not only a question of a new node or of extending an existing node. We need to teach the planner/executor that an access method can return the value of some expression which is a lower bound of another expression. AFAICS an access method can currently return only the original indexed datums and TIDs. So, I'm afraid that enormous infrastructure changes are required. And I can hardly imagine what they should look like. Yeah, I'm not sure either. Maybe a new field in IndexScanDesc, along with xs_itup. Or as an attribute of xs_itup itself. This shouldn't look like a hack either. 
Otherwise I see no point in it: it's better to have some isolated hack in the access method than a hack in the planner/executor. So I see the following changes as needed to implement this the right way: 1) implement a new relation between operators: operator1 is a lower bound of operator2; 2) extend the am interface to let it return values of operators; 3) implement a new node for knn-sorting. However, this requires a lot of changes in PostgreSQL infrastructure and may turn out not to be general enough either (we don't know until we have another application). -- With best regards, Alexander Korotkov.
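The reordering this thread circles around — an index that returns tuples ordered only by a lower bound on the real sort key, where the exact key is known after a recheck — can be sketched with a priority queue: a candidate may be emitted once its exact key is no larger than the lower bound of everything the index has yet to return. A toy Python illustration (the function and the `exact_key` recheck callback are hypothetical, not PostgreSQL code):

```python
import heapq

def reorder(index_stream, exact_key):
    """index_stream yields (lower_bound, item) in ascending lower_bound
    order; exact_key(item) computes the true sort key (the 'recheck').
    Yields items in exact-key order, assuming lower_bound <= exact key."""
    pending = []        # heap of (exact, seq, item) awaiting emission
    seq = 0             # tie-breaker so items are never compared directly
    stream = iter(index_stream)
    lookahead = next(stream, None)
    while pending or lookahead is not None:
        # Pull from the index while the next lower bound could still
        # beat the best pending exact key.
        while lookahead is not None and (
                not pending or lookahead[0] <= pending[0][0]):
            lb, item = lookahead
            heapq.heappush(pending, (exact_key(item), seq, item))
            seq += 1
            lookahead = next(stream, None)
        # Now either the stream is exhausted, or every future item has a
        # lower bound above the smallest pending exact key: safe to emit.
        exact, _, item = heapq.heappop(pending)
        yield item

# Items whose lower bound underestimates the true key:
items = [(1, 'a'), (2, 'b'), (2.5, 'c')]
exact = {'a': 3.0, 'b': 2.0, 'c': 2.5}
print(list(reorder(items, lambda i: exact[i])))  # -> ['b', 'c', 'a']
```

The partial-sort case discussed above falls out as the special case where the lower bound equals the leading sort keys that need no re-sorting.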
Re: [HACKERS] GIN improvements part2: fast scan
On Mon, Feb 3, 2014 at 7:24 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3 Únor 2014, 15:31, Alexander Korotkov wrote: I found my patch 0005-Ternary-consistent-implementation.patch to be completely wrong. It introduces a ternary consistent function to the opclass, but doesn't use it, because I forgot to include the ginlogic.c change in the patch. So, it shouldn't make any impact on performance. However, the testing results with that patch differ significantly. That makes me very uneasy. Can we now reproduce the exact same results? Do I understand it correctly that the 0005 patch should give exactly the same performance as the 9.4-heikki branch (as it was applied on it, and effectively did no change)? This wasn't exactly what I measured, although the differences were not that significant. Do I understand correctly that it's 9.4-heikki and 9.4-alex-1 here: http://www.fuzzy.cz/tmp/gin/# For some queries the times differ. I wonder why. I can rerun the tests, if that's what you're asking for. I'll improve the test a bit - e.g. I plan to average multiple runs, to filter out random noise (which might be significant for such short queries). The right version of these two patches, combined into one against current head, is attached. I've rerun the tests with it; the results are in /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun the postprocessing, including graph drawing? Yes, I'll do that. However I'll have to rerun the other tests too, because the previous runs were done on a different machine. I'm a bit confused right now. The previous patches (0005 + 0007) were supposed to be applied on top of the 4 from Heikki (0001-0004), right? AFAIK those were not committed yet, so why is this version against HEAD? 
To summarize, I know of these patch sets:

9.4-heikki (old version)
  0001-Optimize-GIN-multi-key-queries.patch
  0002-Further-optimize-the-multi-key-GIN-searches.patch
  0003-Further-optimize-GIN-multi-key-searches.patch
  0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch
9.4-alex-1 (based on 9.4-heikki)
  0005-Ternary-consistent-implementation.patch
9.4-alex-1 (based on 9.4-alex-1)
  0006-Sort-entries.patch

From these patches I only need to compare 9.4-heikki (old version) and 9.4-alex-1 to resolve my doubts.

9.4-heikki (new version)
  gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch
9.4-alex-2 (new version)
  gin-fast-scan.10.patch.gz

Or did I get that wrong? Except that you mentioned 9.4-alex-1 twice. I'm afraid I made some mess in the numbering. Sometimes test cases are not what we expect. For example:

=# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
QUERY PLAN
Bitmap Heap Scan on messages (cost=84.00..88.01 rows=1 width=4)
  Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery)
  -> Bitmap Index Scan on messages_body_tsvector_idx (cost=0.00..84.00 rows=1 width=0)
       Index Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery)
Planning time: 0.257 ms
(5 rows)

5alpha1-initdb'd is 3 gin entries with different frequencies. Why do you find that strange? The way the query is formed or the way it's evaluated? The query generator certainly is not perfect, so it may produce some strange queries. I just mean that in this case 3 words don't mean 3 gin entries. Also, these patches are not intended to change relevance ordering speed. When the number of results is high, most of the time goes to relevance calculation and sorting. I propose removing the ORDER BY clause from the test cases to see the scan speed more clearly. Sure, I can do that. Or maybe one set of queries with ORDER BY, the other one without it. Good. I've got a dump of postgresql.org search queries from Magnus. 
We can add them to our test case. You mean search queries from the search for the mailing list archives? Sure, we can add that. Yes. I'll transform it into tsquery form and send it to you privately. -- With best regards, Alexander Korotkov.
Re: [HACKERS] bugfix patch for json_array_elements
Andrew Dunstan and...@dunslane.net writes: On 02/02/2014 08:54 PM, Craig Ringer wrote: The attached patch deletes the context after use, bringing performance back into line. It should be backpatched to 9.3. Hmm. I guess I was assuming that the tmp_cxt would be cleaned up at the end of the function, since it's a child of the CurrentMemoryContext. The executor does MemoryContextReset, not MemoryContextResetAndDeleteChildren, on the per-tuple context. That means that child contexts will be reset, not deleted. I seem to recall some discussions about changing that, or even redefining MemoryContextReset to automatically delete child contexts; but it would take a fair amount of research to be sure such a change was safe. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
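Tom's point — that resetting a context recursively resets its children but does not delete the child context objects themselves — is the crux of the leak. A toy Python model of that behavior (purely illustrative; the real API is C: AllocSetContextCreate, MemoryContextReset, MemoryContextDelete):

```python
class MemoryContext:
    """Toy model of a PostgreSQL memory context tree."""
    def __init__(self, parent=None, name=""):
        self.name = name
        self.children = []
        self.chunks = []                 # allocations made in this context
        if parent is not None:
            parent.children.append(self)

    def alloc(self, obj):
        self.chunks.append(obj)

    def reset(self):
        # Like MemoryContextReset: free the chunks and reset children,
        # but keep the child context objects (their headers) alive.
        self.chunks.clear()
        for child in self.children:
            child.reset()

    def reset_and_delete_children(self):
        # Like MemoryContextResetAndDeleteChildren: children go away too.
        self.chunks.clear()
        self.children.clear()

per_tuple = MemoryContext(name="per-tuple")

def leaky_call():
    # Creates a fresh child context per call and never deletes it,
    # mirroring the bug Craig's patch fixes by adding the delete.
    tmp = MemoryContext(parent=per_tuple, name="tmp")
    tmp.alloc("scratch data")

for _ in range(1000):
    leaky_call()
    per_tuple.reset()            # frees the data, keeps the empty contexts

print(len(per_tuple.children))   # 1000 context headers leaked
```

This is why deleting tmp_cxt explicitly (or having the executor use the delete-children variant) matters: each retained empty context still costs header memory and reset work.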
Re: [HACKERS] GIN improvements part2: fast scan
On 3 Únor 2014, 17:08, Alexander Korotkov wrote: On Mon, Feb 3, 2014 at 7:24 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3 Únor 2014, 15:31, Alexander Korotkov wrote: I found my patch 0005-Ternary-consistent-implementation.patch to be completely wrong. It introduces a ternary consistent function to the opclass, but doesn't use it, because I forgot to include the ginlogic.c change in the patch. So, it shouldn't make any impact on performance. However, the testing results with that patch differ significantly. That makes me very uneasy. Can we now reproduce the exact same results? Do I understand it correctly that the 0005 patch should give exactly the same performance as the 9.4-heikki branch (as it was applied on it, and effectively did no change)? This wasn't exactly what I measured, although the differences were not that significant. Do I understand correctly that it's 9.4-heikki and 9.4-alex-1 here: http://www.fuzzy.cz/tmp/gin/# Yes. For some queries the times differ. I wonder why. Not sure. I can rerun the tests, if that's what you're asking for. I'll improve the test a bit - e.g. I plan to average multiple runs, to filter out random noise (which might be significant for such short queries). The right version of these two patches, combined into one against current head, is attached. I've rerun the tests with it; the results are in /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun the postprocessing, including graph drawing? Yes, I'll do that. However I'll have to rerun the other tests too, because the previous runs were done on a different machine. I'm a bit confused right now. The previous patches (0005 + 0007) were supposed to be applied on top of the 4 from Heikki (0001-0004), right? AFAIK those were not committed yet, so why is this version against HEAD? 
To summarize, I know of these patch sets:

9.4-heikki (old version)
  0001-Optimize-GIN-multi-key-queries.patch
  0002-Further-optimize-the-multi-key-GIN-searches.patch
  0003-Further-optimize-GIN-multi-key-searches.patch
  0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch
9.4-alex-1 (based on 9.4-heikki)
  0005-Ternary-consistent-implementation.patch
9.4-alex-1 (based on 9.4-alex-1)
  0006-Sort-entries.patch

From these patches I only need to compare 9.4-heikki (old version) and 9.4-alex-1 to resolve my doubts. OK, understood.

9.4-heikki (new version)
  gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch
9.4-alex-2 (new version)
  gin-fast-scan.10.patch.gz

Or did I get that wrong? Except that you mentioned 9.4-alex-1 twice. I'm afraid I made some mess in the numbering. You're right. It should have been like this:

9.4-alex-1 (based on 9.4-heikki)
  0005-Ternary-consistent-implementation.patch
9.4-alex-2 (based on 9.4-alex-1)
  0006-Sort-entries.patch
9.4-alex-3 (new version, not yet tested)
  gin-fast-scan.10.patch.gz

Sometimes test cases are not what we expect. For example:

=# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
QUERY PLAN
Bitmap Heap Scan on messages (cost=84.00..88.01 rows=1 width=4)
  Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery)
  -> Bitmap Index Scan on messages_body_tsvector_idx (cost=0.00..84.00 rows=1 width=0)
       Index Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery)
Planning time: 0.257 ms
(5 rows)

5alpha1-initdb'd is 3 gin entries with different frequencies. Why do you find that strange? The way the query is formed or the way it's evaluated? The query generator certainly is not perfect, so it may produce some strange queries. I just mean that in this case 3 words don't mean 3 gin entries. Isn't that expected? I mean, that's what to_tsquery may do, right? 
Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
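The ternary consistent check at the heart of these patches (Heikki's 0004 and the ginlogic.c change Alexander forgot to include) can be illustrated with three-valued logic: entries whose posting lists have not been fetched yet are "maybe", and if the consistent function already returns false with some keys at "maybe", those entries never need to be scanned — which is what makes the fast scan fast. A hedged Python sketch, not the actual GIN code:

```python
# Three-valued logic: 't' = match, 'f' = no match, 'm' = not yet
# fetched / unknown (the "maybe" the ternary consistent check adds).
def ternary_and(vals):
    if 'f' in vals:
        return 'f'
    return 'm' if 'm' in vals else 't'

def ternary_or(vals):
    if 't' in vals:
        return 't'
    return 'm' if 'm' in vals else 'f'

# A query shaped like: rare AND (common1 OR common2)
def consistent(check):
    return ternary_and([check['rare'],
                        ternary_or([check['c1'], check['c2']])])

# With only the rare entry inspected, the result is already FALSE,
# so the frequent entries' posting lists need not be scanned at all:
print(consistent({'rare': 'f', 'c1': 'm', 'c2': 'm'}))  # -> f
# If the rare entry matches, the common entries become relevant:
print(consistent({'rare': 't', 'c1': 'm', 'c2': 'm'}))  # -> m
```

A binary consistent function would have to guess concrete values for the unknown entries; the ternary form lets the scan prove an item cannot match before fetching the expensive frequent entries.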
Re: [HACKERS] bgworker crashed or not?
On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: So

exit(0) -> done, permanently
exit(1) -> done until restart interval
exit(other) -> crash

and there's no way to obtain the restart immediately behavior? That's what I was thinking about, yes. Of course, when the restart interval is 0, done until restart interval is the same as restart immediately, so for anyone who wants to *always* restart immediately there is no problem. Where you will run into trouble is if you sometimes want to wait for the restart interval and other times want to restart immediately. But I'm not sure that's a real use case. If it is, I suggest that we assign it some other, more obscure exit code and reserve 0 and 1 for what I believe will probably be the common cases. It would be potentially more useful and more general to have a function BackgroundWorkerSetMyRestartInterval() or similar. That way, a background worker could choose to get restarted after whatever amount of time it likes in a particular instance, rather than only 0 or the interval configured at registration time. But I'm not that excited about implementing that, and certainly not for 9.4. It's not clear that there's a real need, and there are thorny questions like should a postmaster crash and restart cycle cause an immediate restart? and what if some other process wants to poke that background worker and cause it to restart sooner?. There are lots of interesting and potentially useful behaviors here, but I'm wary of letting the engineering get out ahead of the use cases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] slow startup due to LWLockAssign() spinlock
Andres Freund and...@2ndquadrant.com writes: On larger, multi-socket, machines, startup takes a fair bit of time. As I was profiling anyway I looked into it and noticed that just about all of it is spent in LWLockAssign() called by InitBufferPool(). Starting with shared_buffers=48GB on the server Nate Boley provided, takes about 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. Simply modifying LWLockAssign() to not take the spinlock when !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making LWLockAssign() prettier it seems enough of a speedup to be worthwile nonetheless. Hm. This patch only works if the postmaster itself never assigns any LWLocks except during startup. That's *probably* all right, but it seems a bit scary. Is there any cheap way to make the logic actually be what your comment claims, namely Interlocking is not necessary during postmaster startup? I guess we could invent a ShmemInitInProgress global flag ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
Robert Haas robertmh...@gmail.com writes: On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: So

exit(0) -> done, permanently
exit(1) -> done until restart interval
exit(other) -> crash

and there's no way to obtain the restart immediately behavior? That's what I was thinking about, yes. Of course, when the restart interval is 0, done until restart interval is the same as restart immediately, so for anyone who wants to *always* restart immediately there is no problem. Where you will run into trouble is if you sometimes want to wait for the restart interval and other times want to restart immediately. But I'm not sure that's a real use case. If it is, I suggest that we assign it some other, more obscure exit code and reserve 0 and 1 for what I believe will probably be the common cases. Agreed, but after further reflection it seems like if you've declared a restart interval, then done until restart interval is probably the common case. So how about

exit(0) -> done until restart interval, or permanently if there is none
exit(1) -> done permanently, even if a restart interval was declared
exit(other) -> crash

I don't offhand see a point in an exit and restart immediately case. Why exit at all, if you could just keep running? If you *can't* just keep running, it probably means you know you've bollixed something, so that the crash case is probably what to do anyway. It would be potentially more useful and more general to have a function BackgroundWorkerSetMyRestartInterval() or similar. That might be a good idea too, but I think it's orthogonal to what the exit codes mean. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On Mon, Feb 3, 2014 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: So

exit(0) -> done, permanently
exit(1) -> done until restart interval
exit(other) -> crash

and there's no way to obtain the restart immediately behavior? That's what I was thinking about, yes. Of course, when the restart interval is 0, done until restart interval is the same as restart immediately, so for anyone who wants to *always* restart immediately there is no problem. Where you will run into trouble is if you sometimes want to wait for the restart interval and other times want to restart immediately. But I'm not sure that's a real use case. If it is, I suggest that we assign it some other, more obscure exit code and reserve 0 and 1 for what I believe will probably be the common cases. Agreed, but after further reflection it seems like if you've declared a restart interval, then done until restart interval is probably the common case. So how about

exit(0) -> done until restart interval, or permanently if there is none
exit(1) -> done permanently, even if a restart interval was declared
exit(other) -> crash

I think what I proposed is better for two reasons: 1. It doesn't change the meaning of exit(1) vs. 9.3. All background worker code I've seen or heard about (which is admittedly not all there is) does exit(1) because the current exit(0) behavior doesn't seem to be what anyone wants. So I don't think it matters much whether we redefine exit(0), but redefining exit(1) will require version-dependent changes to the background workers in contrib and, most likely, every other background worker anyone has written. 2. If you want a worker that starts up, does something, and doesn't need to be further restarted when that succeeds, then under your definition you return 0 when you fail and 1 when you succeed. Under my definition you do the opposite, which I think is more natural. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
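The two competing mappings can be written down as a small decision function — an illustration of the proposals in this thread, not actual postmaster code, and the behavior names are informal:

```python
def robert_policy(exit_code, restart_interval):
    # Robert's proposal: 0 = done permanently, 1 = done until the
    # restart interval elapses (same as exit(1) in 9.3), other = crash.
    if exit_code == 0:
        return "done-permanently"
    if exit_code == 1:
        # With a zero interval, "wait for the interval" degenerates
        # to an immediate restart, as noted above.
        return ("restart-immediately" if restart_interval == 0
                else "restart-after-interval")
    return "crash-and-restart"

def tom_policy(exit_code, restart_interval):
    # Tom's counter-proposal: 0 = restart per interval (or done if
    # none was declared), 1 = done permanently, other = crash.
    if exit_code == 0:
        return ("done-permanently" if restart_interval is None
                else "restart-after-interval")
    if exit_code == 1:
        return "done-permanently"
    return "crash-and-restart"

print(robert_policy(1, 0))    # -> restart-immediately
print(tom_policy(0, None))    # -> done-permanently
```

Writing it out this way makes Robert's point 2 visible: for a run-once worker, his mapping uses exit(0) for success, while Tom's uses exit(1).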
Re: [HACKERS] Backup throttling
On 01/31/2014 06:26 AM, Fujii Masao wrote: Is there a good place to define the constant, so that both backend and client can use it? I'd say 'include/common' but no existing file seems to be appropriate. I'm not sure if it's worth adding a new file there. If there is no good place to define them, it's okay to define them also on the client side for now. +termBASE_BACKUP [literalLABEL/literal replaceable'label'/replaceable] [literalPROGRESS/literal] [literalFAST/literal] [literalWAL/literal] [literalNOWAIT/literal] [literalMAX_RATE/literal]/term It's better to add something like replaceable'rate'/replaceable just after literalMAX_RATE/literal. + para + literalMAX_RATE/literal does not affect WAL streaming. + /para I don't think that this paragraph is required here because BASE_BACKUP is basically independent from WAL streaming. Why did you choose bytes per second as a valid rate which we can specify? Since the minimum rate is 32kB, isn't it better to use KB per second for that? If we do that, we can easily increase the maximum rate from 1GB to a very large number in the future if required. The attached version addresses all the comments above. +wait_result = WaitLatch(MyWalSnd->latch, WL_LATCH_SET | WL_TIMEOUT +| WL_POSTMASTER_DEATH, (long) (sleep / 1000)); If WL_POSTMASTER_DEATH is triggered, we should exit immediately like other processes do? This is not a problem with this patch; it also exists in current master. But ISTM it's better to solve that together. Thought? Once we're careful about not missing signals, I think PM death should be noticed too. The backup functionality itself would probably manage to finish without the postmaster; however, it's executed under the walsender process. The question is where the !PostmasterIsAlive() check should be added. I think it should go into the main loop of perform_base_backup(), but that's probably not in the scope of this patch. 
Do you think that my patch should only add a comment like Don't forget to set WL_POSTMASTER_DEATH flag when making basebackup.c sensitive to PM death? // Antonin Houska (Tony) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 832524e..704b653 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1772,7 +1772,7 @@ The commands accepted in walsender mode are: /varlistentry varlistentry -termBASE_BACKUP [literalLABEL/literal replaceable'label'/replaceable] [literalPROGRESS/literal] [literalFAST/literal] [literalWAL/literal] [literalNOWAIT/literal]/term +termBASE_BACKUP [literalLABEL/literal replaceable'label'/replaceable] [literalPROGRESS/literal] [literalFAST/literal] [literalWAL/literal] [literalNOWAIT/literal] [literalMAX_RATE/literal replaceable'rate'/replaceable]/term listitem para Instructs the server to start streaming a base backup. @@ -1840,7 +1840,20 @@ The commands accepted in walsender mode are: the waiting and the warning, leaving the client responsible for ensuring the required log is available. /para - /listitem +/listitem + /varlistentry + varlistentry +termliteralMAX_RATE/literal replaceablerate//term +listitem + para + Limit (throttle) the maximum amount of data transferred from server + to client per unit of time. The expected unit is kilobytes per second. + If this option is specified, the value must either be equal to zero + or it must fall within the range from 32 kB through 1 GB (inclusive). + If zero is passed or the option is not specified, no restriction is + imposed on the transfer. 
+ /para +/listitem /varlistentry /variablelist /para diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index c379df5..f8866db 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -189,6 +189,27 @@ PostgreSQL documentation /varlistentry varlistentry + termoption-r replaceable class=parameterrate/replaceable/option/term + termoption--max-rate=replaceable class=parameterrate/replaceable/option/term + listitem + para +The maximum amount of data transferred from server per second. +The purpose is to limit the impact of applicationpg_basebackup/application +on the running server. + /para + para +This option always affects transfer of the data directory. Transfer of +WAL files is only affected if the collection method is literalfetch/literal. + /para + para +Valid values are between literal32 kB/literal and literal1024 MB/literal. +Suffixes literalk/literal (kilobytes) and literalM/literal +(megabytes) are accepted. + /para + /listitem + /varlistentry + + varlistentry
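The throttling mechanism in the patch boils down to tracking how many bytes have gone out and sleeping (via the WaitLatch call quoted above) whenever the transfer is running ahead of what the configured rate allows. A simplified Python model of that loop (illustrative only; the real code waits on the process latch and works in microseconds):

```python
import time

def send_throttled(chunks, max_rate_bytes_per_sec):
    """'Send' chunks, sleeping so the average rate stays under the cap."""
    start = time.monotonic()
    sent = 0
    for chunk in chunks:
        sent += len(chunk)
        # Earliest time at which 'sent' bytes may have been transferred:
        allowed_at = start + sent / max_rate_bytes_per_sec
        delay = allowed_at - time.monotonic()
        if delay > 0:
            # The real code sleeps in WaitLatch(..., WL_TIMEOUT | ...)
            # so it still reacts to latch sets and postmaster death.
            time.sleep(delay)
    return sent

data = [b"x" * 1024] * 8                # 8 kB in 1 kB chunks
print(send_throttled(data, 32 * 1024))  # 8192 bytes in roughly 0.25 s
```

Sleeping toward an absolute "allowed_at" deadline, rather than a fixed per-chunk pause, keeps the long-run average at the cap even when individual sleeps are imprecise.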
Re: [HACKERS] slow startup due to LWLockAssign() spinlock
On 2014-02-03 11:22:45 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On larger, multi-socket, machines, startup takes a fair bit of time. As I was profiling anyway I looked into it and noticed that just about all of it is spent in LWLockAssign() called by InitBufferPool(). Starting with shared_buffers=48GB on the server Nate Boley provided, takes about 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. Simply modifying LWLockAssign() to not take the spinlock when !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making LWLockAssign() prettier it seems enough of a speedup to be worthwile nonetheless. Hm. This patch only works if the postmaster itself never assigns any LWLocks except during startup. That's *probably* all right, but it seems a bit scary. Is there any cheap way to make the logic actually be what your comment claims, namely Interlocking is not necessary during postmaster startup? I guess we could invent a ShmemInitInProgress global flag ... I'd be fine with inventing such a flag, I couldn't find one and decided that this alone didn't merit it, since it seems to be really unlikely that we will start to allocate such resources after startup in the postmaster. Unless we're talking about single user mode obviously, but the spinlock isn't necessary there anyway. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
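The optimization being debated is, at bottom, conditional locking: the shared counter that LWLockAssign() hands out from only needs spinlock protection once child processes exist. A schematic Python rendering (conceptual only; IsUnderPostmaster and ShmemLock are PostgreSQL names used loosely, and a threading.Lock stands in for the spinlock):

```python
import threading

shmem_lock = threading.Lock()      # stands in for the ShmemLock spinlock
counter = {"next": 0}              # stands in for the shared allocation counter

def lwlock_assign(is_under_postmaster):
    # During single-process startup nothing else can touch shared
    # memory, so the spinlock acquisition can be skipped entirely --
    # the change that cut startup from ~12 s to ~2 s in the report.
    if not is_under_postmaster:
        n = counter["next"]
        counter["next"] = n + 1
        return n
    with shmem_lock:               # stands in for SpinLockAcquire/Release
        n = counter["next"]
        counter["next"] = n + 1
        return n

# Startup path: many assignments, no lock traffic.
startup_ids = [lwlock_assign(False) for _ in range(100_000)]

# Once workers exist, the lock keeps concurrent assignment safe.
threads = [threading.Thread(
    target=lambda: [lwlock_assign(True) for _ in range(1000)])
    for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter["next"])  # 104000: no ids lost or duplicated
```

Tom's concern maps directly onto the `is_under_postmaster` test: the fast path is only safe if the "single process" assumption genuinely holds at every call site, which is what an explicit ShmemInitInProgress flag would make checkable.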
Re: [HACKERS] bugfix patch for json_array_elements
On 02/03/2014 11:12 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 02/02/2014 08:54 PM, Craig Ringer wrote: The attached patch deletes the context after use, bringing performance back into line. It should be backpatched to 9.3. Hmm. I guess I was assuming that the tmp_cxt would be cleaned up at the end of the function since it's a child of the CurrentMemoryContext. The executor does MemoryContextReset, not MemoryContextResetAndDeleteChildren, on the per-tuple context. That means that child contexts will be reset, not deleted. I seem to recall some discussions about changing that, or even redefining MemoryContextReset to automatically delete child contexts; but it would take a fair amount of research to be sure such a change was safe. Good to know. Is it worth a note in src/backend/utils/mmgr/README so people are warned against making the same mistake I did? Both of these are set-returning functions operating in materialize mode - not sure if that makes any difference. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
Robert Haas robertmh...@gmail.com writes: On Mon, Feb 3, 2014 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Agreed, but after further reflection it seems like if you've declared a restart interval, then done until restart interval is probably the common case. So how about ... I think what I proposed is better for two reasons: 1. It doesn't change the meaning of exit(1) vs. 9.3. All background worker code I've seen or heard about (which is admittedly not all there is) does exit(1) because the current exit(0) behavior doesn't seem to be what anyone wants. Hm. If that's actually the case, then I agree that preserving the current behavior of exit(1) is useful. I'd been assuming we were breaking things anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] encouraging index-only scans
On Fri, Jan 31, 2014 at 10:22 PM, Bruce Momjian br...@momjian.us wrote: On Thu, Sep 19, 2013 at 02:39:43PM -0400, Robert Haas wrote: Right now, whether or not to autovacuum is the result of a two-pronged test. The first prong is based on the number of updates and deletes relative to table size; that triggers a regular autovacuum. The second prong is based on age(relfrozenxid) and triggers a non-page-skipping vacuum (colloquially, an anti-wraparound vacuum). The typical case in which this doesn't work out well is when the table has a lot of inserts but few or no updates and deletes. So I propose that we change the first prong to count inserts as well as updates and deletes when deciding whether it needs to vacuum the table. We already use that calculation to decide whether to auto-analyze, so it wouldn't be very novel. We know that the work of marking pages all-visible will need to be done at some point, and doing it sooner will result in doing it in smaller batches, which seems generally good. However, I do have one concern: it might lead to excessive index-vacuuming. Right now, we skip the index vac step only if ZERO dead tuples are found during the heap scan. Even one dead tuple (or line pointer) will cause an index vac cycle, which may easily be excessive. So I further propose that we introduce a threshold for index-vac, so that we only do an index vac cycle if the number of dead tuples exceeds, say, 0.1% of the table size. Thoughts? Let the hurling of rotten tomatoes begin. Robert, where are we on this? Should I post a patch? I started working on this at one point but didn't finish the implementation, let alone the no-doubt-onerous performance testing that will be needed to validate whatever we come up with. It would be really easy to cause serious regressions with ill-considered changes in this area, and I don't think many people here have the bandwidth for a detailed study of all the different workloads that might be affected here right this very minute. 
More generally, you're sending all these pings three weeks after the deadline for CF4. I don't think that's a good time to encourage people to *start* revising old patches, or writing new ones. I've also had some further thoughts about the right way to drive vacuum scheduling. I think what we need to do is tightly couple the rate at which we're willing to do vacuuming to the rate at which we're incurring vacuum debt. That is, if we're creating 100kB/s of pages needing vacuum, we vacuum at 2-3MB/s (with default settings). If we're creating 10MB/s of pages needing vacuum, we *still* vacuum at 2-3MB/s. Not shockingly, vacuum gets behind, the database bloats, and everything goes to heck. The rate of vacuuming needs to be tied somehow to the rate at which we're creating stuff that needs to be vacuumed. Right now we don't even have a way to measure that, let alone auto-regulate the aggressiveness of autovacuum on that basis. Similarly, for marking of pages as all-visible, we currently make the same decision whether the relation is getting index-scanned (in which case the failure to mark those pages all-visible may be suppressing the use of index scans or making them less effective) or whether it's not being accessed at all (in which case vacuuming it won't help anything, and might hurt by pushing other pages out of cache). Again, if we had better statistics, we could measure this - counting heap fetches for actual index-only scans plus heap fetches for index scans that might have been planned index-only scans but for the relation having too few all-visible pages doesn't sound like an impossible metric to gather. And if we had that, we could use it to trigger vacuuming, instead of guessing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
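Robert's two proposals can be expressed against the familiar autovacuum trigger formula. A sketch, assuming the documented defaults (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2); the insert-counting flag and the 0.1% index-vacuum threshold are the proposals in this thread, not current behavior:

```python
def needs_autovacuum(reltuples, dead, inserts,
                     base_threshold=50, scale_factor=0.2,
                     count_inserts=False):
    # Current rule: vacuum when dead tuples exceed
    # base_threshold + scale_factor * reltuples.
    # Proposed change: count inserts in the activity total too.
    activity = dead + (inserts if count_inserts else 0)
    return activity > base_threshold + scale_factor * reltuples

def needs_index_vacuum(reltuples, dead, min_fraction=0.001):
    # Proposed: skip the index-vacuum cycle unless dead line pointers
    # exceed ~0.1% of the table (today any nonzero count triggers it).
    return dead > reltuples * min_fraction

# Insert-only table: never triggers the first prong under the current
# rule, but does under the proposed one.
print(needs_autovacuum(1_000_000, dead=0, inserts=500_000))    # False
print(needs_autovacuum(1_000_000, dead=0, inserts=500_000,
                       count_inserts=True))                    # True
```

The insert-only case is exactly the workload Robert describes: pages never get marked all-visible because nothing ever trips the update/delete-based threshold.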
Re: [HACKERS] GIN improvements part2: fast scan
On Mon, Feb 3, 2014 at 8:19 PM, Tomas Vondra t...@fuzzy.cz wrote: Sometimes test cases are not what we expect. For example:

=# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
QUERY PLAN
Bitmap Heap Scan on messages (cost=84.00..88.01 rows=1 width=4)
  Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery)
  -> Bitmap Index Scan on messages_body_tsvector_idx (cost=0.00..84.00 rows=1 width=0)
       Index Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery)
Planning time: 0.257 ms
(5 rows)

5alpha1-initdb'd is 3 gin entries with different frequencies. Why do you find that strange? The way the query is formed or the way it's evaluated? The query generator certainly is not perfect, so it may produce some strange queries. I just mean that in this case 3 words don't mean 3 gin entries. Isn't that expected? I mean, that's what to_tsquery may do, right? Everything is absolutely correct. :-) It just may not be what you expect if you don't get into the details. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part2: fast scan
On 3 February 2014, 19:18, Alexander Korotkov wrote: On Mon, Feb 3, 2014 at 8:19 PM, Tomas Vondra t...@fuzzy.cz wrote: Sometimes test cases are not what we expect. For example: =# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)'); QUERY PLAN Bitmap Heap Scan on messages (cost=84.00..88.01 rows=1 width=4) Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery) - Bitmap Index Scan on messages_body_tsvector_idx (cost=0.00..84.00 rows=1 width=0) Index Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery) Planning time: 0.257 ms (5 rows) 5alpha1-initdb'd is 3 gin entries with different frequencies. Why do you find that strange? The way the query is formed or the way it's evaluated? The query generator certainly is not perfect, so it may produce some strange queries. I just mean that in this case 3 words doesn't mean 3 gin entries. Isn't that expected? I mean, that's what to_tsquery may do, right? Everything is absolutely correct. :-) It just may not be what you expect if you don't look into the details. Well, that's not how I designed the benchmark. I haven't based the benchmark on GIN entries, but on 'natural' words, to simulate real queries. I understand using GIN terms might get more consistent results (e.g. 3 GIN terms with given frequency) than the current approach. However this was partially a goal, to cover a wider range of cases. Also, that's why the benchmark works with relative speedup - comparing the query duration with and without the patch. Tomas
Re: [HACKERS] [PATCH] pg_sleep(interval)
On Sun, Feb 2, 2014 at 4:50 AM, Julien Rouhaud rjuju...@gmail.com wrote: It seems like pg_sleep_until() has issues if used within a transaction, as it uses now() and not clock_timestamp(). Please find attached a patch that solves this issue. For consistency reasons, I also modified pg_sleep_for() to also use clock_timestamp. Good catch! Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
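The arithmetic behind the bug is worth spelling out: inside a transaction, now() is frozen at transaction start, so a sleep duration computed from it is off by the transaction's age. A toy model (invented names, not the backend's actual code):

```c
#include <assert.h>

/* How long pg_sleep_until() should sleep: the distance from the current
 * time to the target, clamped at zero.  Feeding this the transaction-start
 * timestamp (now()) instead of the real clock (clock_timestamp()) inflates
 * the result by however long the transaction has already been running,
 * which is the bug the committed patch fixes. */
static double remaining_sleep(double target, double current)
{
    double d = target - current;

    return d > 0.0 ? d : 0.0;
}
```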
Re: [HACKERS] plpgsql.warn_shadow
Hello, I am not happy with warnings_as_errors; what about stop_on_warning instead? Second question: should these errors be catchable or uncatchable? When I worked on a large project where I had to use an error handler with EXCEPTION WHEN OTHERS, I found it very strange and not useful that all syntax errors were caught by this handler. Any debugging was terribly difficult, and I had to write plpgsql_lint as a solution. Regards Pavel 2014-02-03 Marko Tiikkaja ma...@joh.to: Hi everyone, Here's a new version of the patch. Added new tests and docs and changed the GUCs per discussion. plpgsql.warnings_as_errors only affects compilation at CREATE FUNCTION time: =# set plpgsql.warnings to 'all'; SET =#* set plpgsql.warnings_as_errors to true; SET =#* select foof(1); -- compiled since it's the first call in this session WARNING: variable f1 shadows a previously defined variable LINE 2: declare f1 int; ^ foof -- (1 row) =#* create or replace function foof(f1 int) returns void as $$ declare f1 int; begin end $$ language plpgsql; ERROR: variable f1 shadows a previously defined variable LINE 3: declare f1 int; Currently, plpgsql_warnings is a boolean since there's only one warning we implement. The idea is to make it a bit field of some kind in the future when we add more warnings. Starting that work for 9.4 seemed like overkill, though. I tried to keep things simple. Regards, Marko Tiikkaja
Re: [HACKERS] plpgsql.warn_shadow
On 2014-02-03 9:17 PM, Pavel Stehule wrote: I am not happy from warnings_as_error what about stop_on_warning instead? abort_compilation_on_warning? I think if we're not going to make it more clear that warnings are only promoted to errors at CREATE FUNCTION time, we can't do much better than warnings_as_errors. second question: should be these errors catchable or uncatchable? When I work on large project, where I had to use some error handler of EXCEPTION WHEN OTHERS I found very strange and not useful so all syntax errors was catched by this handler. Any debugging was terribly difficult and I had to write plpgsql_lint as solution. Is this really an issue considering these errors can only happen when someone runs CREATE FUNCTION? Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql.warn_shadow
On 3.2.2014 21:48, Marko Tiikkaja ma...@joh.to wrote: On 2014-02-03 9:17 PM, Pavel Stehule wrote: I am not happy from warnings_as_error what about stop_on_warning instead? abort_compilation_on_warning? I think if we're not going to make it more clear that warnings are only promoted to errors at CREATE FUNCTION time, we can't do much better than warnings_as_errors. second question: should be these errors catchable or uncatchable? When I work on large project, where I had to use some error handler of EXCEPTION WHEN OTHERS I found very strange and not useful so all syntax errors was catched by this handler. Any debugging was terribly difficult and I had to write plpgsql_lint as solution. Is this really an issue considering these errors can only happen when someone runs CREATE FUNCTION? Could you look, please, at what terminology is used in gcc, clang, Perl or Python? I agree with the idea, but the proposed names don't mean anything to me. Regards Pavel Regards, Marko Tiikkaja
[HACKERS] Re: Misaligned BufferDescriptors causing major performance problems on AMD
On Sun, Feb 2, 2014 at 7:13 AM, Andres Freund and...@2ndquadrant.com wrote: Just as reference, we're talking about a performance degradation from 475963.613865 tps to 197744.913556 in a pgbench -S -cj64 just by setting max_connections to 90, from 91... That's pretty terrible. So, I looked into this, and I am fairly certain it's because of the (mis-)alignment of the buffer descriptors. With certain max_connections settings InitBufferPool() happens to get 64byte aligned addresses, with others not. I checked the alignment with gdb to confirm that. I find your diagnosis to be quite plausible. A quick hack (attached) making BufferDescriptor 64byte aligned indeed restored performance across all max_connections settings. It's not surprising that a misaligned buffer descriptor causes problems - there'll be plenty of false sharing of the spinlocks otherwise. Curious that the Intel machine isn't hurt much by this. I think that is explained here: http://www.agner.org/optimize/blog/read.php?i=142v=t With Sandy Bridge, Misaligned memory operands [are] handled efficiently. Now all this hinges on the fact that by a mere accident BufferDescriptors are 64byte in size: Are they 64 bytes in size on REL9_*_STABLE? How about on win64? I think we're reasonably disciplined here already, but long is 32-bits in length even on win64. Looks like it would probably be okay, but as you say, it doesn't seem like something to leave to chance. We could polish up the attached patch and apply it to all the branches, the costs of memory are minimal. But I wonder if we shouldn't instead make ShmemInitStruct() always return cacheline aligned addresses. That will require some fiddling, but it might be a good idea nonetheless? What fiddling are you thinking of? I think we should also consider some more reliable measures to have BufferDescriptors cacheline sized, rather than relying on the happy accident. Debugging alignment issues isn't fun, too much of a guessing game... +1.
Maybe make code that isn't appropriately aligned fail to compile? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
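The "union trickery" alluded to in this thread would look roughly like the sketch below: pad each descriptor to a full cache line so that, given a 64-byte-aligned array base, no two descriptors ever share a line. The field names here are invented stand-ins, not the real sbufdesc definition:

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_SIZE 64

/* Toy stand-in for a buffer descriptor; field names are invented. */
typedef struct ToyBufferDesc
{
    int  buf_id;
    int  flags;
    int  refcount;
    char spinlock;              /* stand-in for slock_t */
} ToyBufferDesc;

/* Pad each element to a full cache line.  Combined with a 64-byte-aligned
 * base address for the array, adjacent descriptors (and hence their
 * spinlocks) never share a cache line, removing the false sharing
 * discussed above. */
typedef union ToyBufferDescPadded
{
    ToyBufferDesc desc;
    char          pad[CACHELINE_SIZE];
} ToyBufferDescPadded;
```

The size guarantee comes from the union, so it holds regardless of how the struct's own fields change, which is exactly the "not left to chance" property being asked for.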
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com wrote: Some background: The setups that triggered me into working on the patchset didn't really have a pgbench like workload, the individual queries were/are more complicated even though it's still a high throughput OLTP workload. And the contention was *much* higher than what I can reproduce with pgbench -S, there was often nearly all time spent in the lwlock's spinlock, and it was primarily the buffer mapping lwlocks, being locked in shared mode. The difference is that instead of locking very few buffers per query like pgbench does, they touched much more. Perhaps I should try to argue for this extension to pgbench again: http://www.postgresql.org/message-id/CAMkU=1w0K3RNhtPuLF8WQoVi6gxgG6mcnpC=-ivjwkjkydp...@mail.gmail.com I think it would do a good job of exercising what you want, provided you set the scale so that all data fit in RAM but not in shared_buffers. Or maybe you want it to fit in shared_buffers, since the buffer mapping lock was contended in shared mode--that suggests the problem is finding the buffer that already has the page, not making a buffer to have the page. Cheers, Jeff
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com wrote: The changed algorithm for lwlock imo is an *algorithmic* improvement, not one for a particular architecture. The advantage being that locking a lwlock which is primarily taken in shared mode will never need to wait or loop. I agree. My point was only that the messaging ought to be that this is something that those with multi-socket Intel systems should take note of. Yes, that branch is used by some of them. But to make that clear to all that are still reading, I have *first* presented the patch findings to -hackers and *then* backported it, and I have referenced the existence of the patch for 9.2 on list before. This isn't some kind of secret sauce deal... No, of course not. I certainly didn't mean to imply that. My point was only that anyone that is affected to the same degree as the party with the 4 socket server might be left with a very poor impression of Postgres if we failed to fix the problem. It clearly rises to the level of a bugfix. That might be something to do later, as it *really* can hurt in practice. We had one server go from load 240 to 11... Well, we have to commit something on master first. But it should be a priority to avoid having this hurt users further, since the problems are highly predictable for certain types of servers. But I think we should first focus on getting the patch ready for master, then we can see where it's going. At the very least I'd like to split off the part modifying the current spinlocks to use the atomics, that seems far too invasive. Agreed. I unfortunately can't tell you that much more, not because it's private, but because it mostly was diagnosed by remote hand debugging, limiting insights considerably. Of course. -- Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
On Mon, Feb 3, 2014 at 4:15 AM, Robert Haas robertmh...@gmail.com wrote: I'm not really feeling a compelling need to change that. We've been displaying total runtime - described exactly that way - for many releases and it surely is confusing to the novice that the time reported can be much less than the time reported by psql's \timing option, usually because of planning time. I think that has a lot more to do with network roundtrip time and protocol/serialization overhead. -- Peter Geoghegan
Re: [HACKERS] Re: Misaligned BufferDescriptors causing major performance problems on AMD
On 2014-02-03 15:17:13 -0800, Peter Geoghegan wrote: On Sun, Feb 2, 2014 at 7:13 AM, Andres Freund and...@2ndquadrant.com wrote: Just as reference, we're talking about a performance degradation from 475963.613865 tps to 197744.913556 in a pgbench -S -cj64 just by setting max_connections to 90, from 91... That's pretty terrible. Yea, I was scared myself. A quick hack (attached) making BufferDescriptor 64byte aligned indeed restored performance across all max_connections settings. It's not surprising that a misaligned buffer descriptor causes problems - there'll be plenty of false sharing of the spinlocks otherwise. Curious that the the intel machine isn't hurt much by this. I think that is explained here: http://www.agner.org/optimize/blog/read.php?i=142v=t With Sandy Bridge, Misaligned memory operands [are] handled efficiently. No, I don't think so. Those improvements afair refer to unaligned accesses as in accessing a 4 byte variable at address % 4 != 0. Now all this hinges on the fact that by a mere accident BufferDescriptors are 64byte in size: Are they 64 bytes in size on REL9_*_STABLE? How about on win64? I think we're reasonably disciplined here already, but long is 32-bits in length even on win64. Looks like it would probably be okay, but as you say, it doesn't seem like something to leave to chance. I haven't checked any branches yet, but I don't remember any recent changes to sbufdesc. And I think we're lucky enough that all the used types are the same width across common 64bit OSs. But I absolutely agree we shouldn't leave this to chance. We could polish up the attached patch and apply it to all the branches, the costs of memory are minimal. But I wonder if we shouldn't instead make ShmemInitStruct() always return cacheline aligned addresses. That will require some fiddling, but it might be a good idea nonetheless? What fiddling are you thinking of? 
Basically always doing a TYPEALIGN(CACHELINE_SIZE, addr) before returning from ShmemAlloc() (and thereby ShmemInitStruct). After I'd written the email I came across an interesting bit of code: void * ShmemAlloc(Size size) { ... /* extra alignment for large requests, since they are probably buffers */ if (size >= BLCKSZ) newStart = BUFFERALIGN(newStart); /* * Preferred alignment for disk I/O buffers. On some CPUs, copies between * user space and kernel space are significantly faster if the user buffer * is aligned on a larger-than-MAXALIGN boundary. Ideally this should be * a platform-dependent value, but for now we just hard-wire it. */ #define ALIGNOF_BUFFER 32 #define BUFFERALIGN(LEN) TYPEALIGN(ALIGNOF_BUFFER, (LEN)) So, we're already trying to make large allocations (which we'll be using here) try to fit to a 32 byte alignment. Which unfortunately isn't enough here, but it explains why max_connections has to be changed by exactly 1 to achieve adequate performance... So, the absolutely easiest thing would be to just change ALIGNOF_BUFFER to 64. The current value was accurate in 2003 (common cacheline size back then), but it really isn't anymore today. Anything relevant is 64byte. But I really think we should also remove the BLCKSZ limit. There's relatively few granular/dynamic usages of ShmemAlloc(), basically just shared memory hashes. And I'd be rather unsurprised if aligning all allocations for hashes resulted in a noticeable speedup. We have frightening bottlenecks in those today. I think we should also consider some more reliable measures to have BufferDescriptors cacheline sized, rather than relying on the happy accident. Debugging alignment issues isn't fun, too much of a guessing game... +1. Maybe make code that isn't appropriately aligned fail to compile? I was wondering about using some union trickery to make sure the array only consists of properly aligned types.
That'd require some changes but luckily code directly accessing the BufferDescriptors array isn't too far spread. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
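For readers not following along in c.h: TYPEALIGN rounds a length up to the next multiple of a power-of-two alignment, which also shows why 32-byte alignment is not enough here; a 32-byte-aligned address is 64-byte-aligned only half the time. A standalone rendering (same shape as the macro in c.h, transcribed by hand):

```c
#include <assert.h>
#include <stdint.h>

/* Round LEN up to the next multiple of ALIGNVAL, which must be a power
 * of two.  This mirrors the TYPEALIGN macro in src/include/c.h. */
#define TYPEALIGN(ALIGNVAL, LEN) \
    (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))
```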
Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
Peter Geoghegan p...@heroku.com writes: On Mon, Feb 3, 2014 at 4:15 AM, Robert Haas robertmh...@gmail.com wrote: I'm not really feeling a compelling need to change that. We've been displaying total runtime - described exactly that way - for many releases and it's surely is confusing to the novice that the time reported can be much less than the time reported by psql's \timing option, usually because of planning time. I think that has a lot more to do with network roundtrip time and protocol/serialization overhead. The problem I'm having with the way it stands now is that one would reasonably expect that Total time is the total of all times counted by EXPLAIN, including main plan execution time, trigger firing time, and now planning time. Since it is not, any longer, a total, I think renaming it would be a good idea. I'm not wedded to execution time in particular, but I don't like total. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] slow startup due to LWLockAssign() spinlock
On 2014-02-03 11:22:45 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On larger, multi-socket, machines, startup takes a fair bit of time. As I was profiling anyway I looked into it and noticed that just about all of it is spent in LWLockAssign() called by InitBufferPool(). Starting with shared_buffers=48GB on the server Nate Boley provided, takes about 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. Simply modifying LWLockAssign() to not take the spinlock when !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making LWLockAssign() prettier it seems enough of a speedup to be worthwile nonetheless. Hm. This patch only works if the postmaster itself never assigns any LWLocks except during startup. That's *probably* all right, but it seems a bit scary. Is there any cheap way to make the logic actually be what your comment claims, namely Interlocking is not necessary during postmaster startup? I guess we could invent a ShmemInitInProgress global flag ... So, here's a flag implementing things with that flag. I kept your name, as it's more in line with ipci.c's naming, but it looks kinda odd besides proc_exit_inprogress. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index c392d4f..ece9674 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -48,6 +48,11 @@ shmem_startup_hook_type shmem_startup_hook = NULL; static Size total_addin_request = 0; static bool addin_request_allowed = true; +/* + * Signals whether shared memory is currently being initialized, while the + * postmaster hasn't forked any children yet. 
+ */ +bool ShmemInitInProgress = false; /* * RequestAddinShmemSpace @@ -97,6 +102,12 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) int numSemas; /* + * We're now initializing shared memory, and we won't have forked + * children yet. + */ + ShmemInitInProgress = true; + + /* * Size of the Postgres shared-memory block is estimated via * moderately-accurate estimates for the big hogs, plus 100K for the * stuff that's too small to bother with estimating. @@ -261,4 +272,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) */ if (shmem_startup_hook) shmem_startup_hook(); + + if(!IsUnderPostmaster) + ShmemInitInProgress = false; } diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 82ef440..0365f9b 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -355,9 +355,11 @@ CreateLWLocks(void) * LWLockAssign - assign a dynamically-allocated LWLock number * * We interlock this using the same spinlock that is used to protect - * ShmemAlloc(). Interlocking is not really necessary during postmaster - * startup, but it is needed if any user-defined code tries to allocate - * LWLocks after startup. + * ShmemAlloc(). Interlocking is not necessary while we're initializing + * shared memory, but it is needed for any code allocating LWLocks after + * startup. Since we allocate large numbers of LWLocks for the buffer pool + * during startup, we avoid taking the spinlock when not needed, as it has + * been shown to slow down startup considerably.
*/ LWLock * LWLockAssign(void) @@ -368,14 +370,17 @@ LWLockAssign(void) volatile int *LWLockCounter; LWLockCounter = (int *) ((char *) MainLWLockArray - 3 * sizeof(int)); - SpinLockAcquire(ShmemLock); + if (!ShmemInitInProgress) + SpinLockAcquire(ShmemLock); if (LWLockCounter[0] = LWLockCounter[1]) { - SpinLockRelease(ShmemLock); + if (!ShmemInitInProgress) + SpinLockRelease(ShmemLock); elog(ERROR, no more LWLocks available); } result = MainLWLockArray[LWLockCounter[0]++].lock; - SpinLockRelease(ShmemLock); + if (!ShmemInitInProgress) + SpinLockRelease(ShmemLock); return result; } diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h index 37eca7a..0f1e00a 100644 --- a/src/include/storage/ipc.h +++ b/src/include/storage/ipc.h @@ -63,6 +63,7 @@ typedef void (*shmem_startup_hook_type) (void); /* ipc.c */ extern bool proc_exit_inprogress; +extern bool ShmemInitInProgress; extern void proc_exit(int code) __attribute__((noreturn)); extern void shmem_exit(int code); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
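The magnitude of the startup cost is easy to reconstruct. With shared_buffers = 48GB and the default 8kB block size there are about 6.3 million buffers, and 9.3-era buffer headers carry two dynamically assigned LWLocks each (the content lock and the I/O-in-progress lock), so InitBufferPool() makes millions of LWLockAssign() calls, each one taking ShmemLock:

```c
#include <assert.h>
#include <stdint.h>

/* Number of shared buffers for a given shared_buffers setting. */
static int64_t nbuffers(int64_t shared_buffers_bytes, int64_t blcksz)
{
    return shared_buffers_bytes / blcksz;
}
```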
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On Sun, Feb 2, 2014 at 12:07 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-02 00:04:41 +0900, Michael Paquier wrote: On Fri, Dec 13, 2013 at 2:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-12-12 11:55:51 -0500, Tom Lane wrote: I'm not, however, terribly thrilled with the suggestions to add implicit casts associated with this type. Implicit casts are generally dangerous. It's a tradeof. Currently we have the following functions returning LSNs as text: * pg_current_xlog_location * pg_current_xlog_insert_location * pg_last_xlog_receive_location * pg_last_xlog_replay_location one view containing LSNs * pg_stat_replication and the following functions accepting LSNs as textual paramters: * pg_xlog_location_diff * pg_xlogfile_name The question is how do we deal with backward compatibility when introducing a LSN type? There might be some broken code around monitoring if we simply replace the type without implicit casts. Given the limited usage, how bad would it really be if we simply made all those take/return the LSN type? As long as the type's I/O representation looks like the old text format, I suspect most queries wouldn't notice. I've asked around inside 2ndq and we could find one single problematic query, so it's really not too bad. Are there some plans to awaken this patch (including changing the output of the functions of xlogfuncs.c)? This would be useful for the differential backup features I am looking at these days. I imagine that it is too late for 9.4 though... I think we should definitely go ahead with the patch per-se, we just added another system view with lsns in it... I don't have a too strong opinion whether to do it in 9.4 or 9.5. It seems fairly low impact to me, and it's an old patch, but I personally don't have the tuits to refresh the patch right now. Please find attached a patch implementing lsn as a datatype, based on the one Robert wrote a couple of years ago. 
Since XLogRecPtr has been changed to a simple uint64, this *refresh* has needed some work. In order to have this data type plugged in more natively with other xlog system functions, I have also added PG_RETURN_LSN and PG_GETARG_LSN to facilitate the interface, making code management easier if the XLogRecPtr or LSN format is changed in the future. Patch contains regression tests as well as a bit of documentation. Perhaps this is too late for 9.4, so if there are no objections I'll simply add this patch to the next commit fest in June for 9.5. Having the system functions use this data type (pg_start_backup, pg_xlog_*, create_rep_slot, etc.) does not look to be that difficult as all the functions in xlogfuncs.c actually use XLogRecPtr before changing it to text, so it would actually simplify the code. I think I'll simply code it, as I'm just looking at it now... Regards, -- Michael *** a/doc/src/sgml/datatype.sgml --- b/doc/src/sgml/datatype.sgml *** *** 155,160 --- 155,166 /row row +entrytypelsn/type/entry +entry/entry +entryLog Sequence Number/entry + /row + + row entrytypemacaddr/type/entry entry/entry entryMAC (Media Access Control) address/entry *** *** 4502,4507 SELECT * FROM pg_attribute --- 4508,4527 /para /sect1 + sect1 id=datatype-lsn +titleacronymLSN/ Type/title + +indexterm zone=datatype-lsn + primaryLSN/primary +/indexterm + +para + The typelsn/type data type can be used to store LSN (Log Sequence + Number) data which is a pointer to a location in the XLOG. This type is a + representation of XLogRecPtr. +/para + /sect1 + sect1 id=datatype-pseudo titlePseudo-Types/title *** a/src/backend/utils/adt/Makefile --- b/src/backend/utils/adt/Makefile *** *** 20,26 OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \ cash.o char.o date.o datetime.o datum.o domains.o \ enum.o float.o format_type.o \ geo_ops.o geo_selfuncs.o int.o int8.o json.o jsonfuncs.o like.o \ !
lockfuncs.o misc.o nabstime.o name.o numeric.o numutils.o \ oid.o oracle_compat.o orderedsetaggs.o \ pseudotypes.o rangetypes.o rangetypes_gist.o \ rowtypes.o regexp.o regproc.o ruleutils.o selfuncs.o \ --- 20,26 cash.o char.o date.o datetime.o datum.o domains.o \ enum.o float.o format_type.o \ geo_ops.o geo_selfuncs.o int.o int8.o json.o jsonfuncs.o like.o \ ! lockfuncs.o lsn.o misc.o nabstime.o name.o numeric.o numutils.o \ oid.o oracle_compat.o orderedsetaggs.o \ pseudotypes.o rangetypes.o rangetypes_gist.o \ rowtypes.o regexp.o regproc.o ruleutils.o selfuncs.o \ *** /dev/null --- b/src/backend/utils/adt/lsn.c *** *** 0 --- 1,180 + /*- + * + * lsn.c + * Internal LSN operations + *
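As background for the patch, the conversion such a datatype encapsulates is small: the textual LSN XXXXXXXX/XXXXXXXX is just the high and low 32-bit halves of the uint64 XLogRecPtr printed in hex. A hedged sketch with toy names (not the patch's actual lsn_in/lsn_out, and without the trailing-garbage validation a real type input function needs):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse "hi/lo" (both hex) into a 64-bit LSN; returns 0 on success.
 * Illustrative only: unlike a real input function, this accepts and
 * ignores trailing garbage. */
static int toy_lsn_in(const char *str, uint64_t *result)
{
    unsigned int hi, lo;

    if (sscanf(str, "%X/%X", &hi, &lo) != 2)
        return -1;              /* invalid input syntax */
    *result = ((uint64_t) hi << 32) | lo;
    return 0;
}

/* Format a 64-bit LSN back into the familiar "hi/lo" text form. */
static void toy_lsn_out(uint64_t lsn, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%X/%X",
             (unsigned int) (lsn >> 32), (unsigned int) lsn);
}
```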
Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
* Tom Lane (t...@sss.pgh.pa.us) wrote: The problem I'm having with the way it stands now is that one would reasonably expect that Total time is the total of all times counted by EXPLAIN, including main plan execution time, trigger firing time, and now planning time. Since it is not, any longer, a total, I think renaming it would be a good idea. I'm not wedded to execution time in particular, but I don't like total. Agreed. Thanks, Stephen
[HACKERS] Proposing pg_hibernate
Please find attached the pg_hibernate extension. It is a set-it-and-forget-it solution to enable hibernation of Postgres shared-buffers. It can be thought of as an amalgam of pg_buffercache and pg_prewarm. It uses the background worker infrastructure. It registers one worker process (BufferSaver) to save the shared-buffer metadata when the server is shutting down, and one worker per database (BlockReader) when restoring the shared buffers. It stores the buffer metadata under $PGDATA/pg_database/, one file per database, and one separate file for global objects. It sorts the list of buffers before storage, so that when it encounters a range of consecutive blocks of a relation's fork, it stores that range as just one entry, hence reducing the storage and I/O overhead. The on-disk binary format, which is used to create the per-database save-files, is defined as: 1. One or more relation filenodes; stored as 'r' relfilenode. 2. Each relation is followed by one or more fork numbers; stored as 'f' forknumber. 3. Each fork number is followed by one or more block numbers; stored as 'b' blocknumber. 4. Each block number is followed by zero or more range numbers; stored as 'N' number. The overall grammar is {r {f {b N* }+ }+ }+ Possible enhancements: - Ability to save/restore only specific databases. - Control how many BlockReaders are active at a time; to avoid I/O storms. - Be smart about lowered shared_buffers across the restart. - Different modes of reading like pg_prewarm does. - Include PgFincore functionality, at least for Linux platforms. The extension currently works with PG 9.3, and may work on 9.4 without any changes; I haven't tested, though. If not, I think it'd be easy to port to HEAD/PG 9.4.
I see that 9.4 has put a cap on maximum background workers via a GUC, and since my aim is to provide a non-intrusive no-tuning-required extension, I'd like to use the new dynamic-background-worker infrastructure in 9.4, which doesn't seem to have any preset limits (I think it's limited by max_connections, but I may be wrong). I can work on 9.4 port, if there's interest in including this as a contrib/ module. To see the extension in action: .) Compile it. .) Install it. .) Add it to shared_preload_libraries. .) Start/restart Postgres. .) Install pg_buffercache extension, to inspect the shared buffers. .) Note the result of pg_buffercache view. .) Work on your database to fill up the shared buffers. .) Note the result of pg_buffercache view, again; there should be more blocks than last time we checked. .) Stop and start the Postgres server. .) Note the output of pg_buffercache view; it should contain the blocks seen just before the shutdown. .) Future server restarts will automatically save and restore the blocks in shared-buffers. The code is also available as Git repository at https://github.com/gurjeet/pg_hibernate/ Demo: $ make -C contrib/pg_hibernate/ $ make -C contrib/pg_hibernate/ install $ vi $B/db/data/postgresql.conf $ grep shared_preload_libraries $PGDATA/postgresql.conf shared_preload_libraries = 'pg_hibernate' # (change requires restart) $ pgstart waiting for server to start done server started $ pgsql -c 'create extension pg_buffercache;' CREATE EXTENSION $ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not null group by reldatabase;' count --- 163 14 (2 rows) $ pgsql -c 'create table test_hibernate as select s as a, s::char(1000) as b from generate_series(1, 10) as s;' SELECT 10 $ pgsql -c 'create index on test_hibernate(a);' CREATE INDEX $ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not null group by reldatabase;' count --- 2254 14 (2 rows) $ pgstop waiting for server to shut down... 
done server stopped $ pgstart waiting for server to start done server started $ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not null group by reldatabase;' count --- 2264 17 (2 rows) There are a few more blocks than the time they were saved, but all the blocks from before the restart are present in shared buffers after the restart. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com http://www.enterprisedb.com pg_hibernate.tgz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
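The range compression described in the proposal ("a range of consecutive blocks ... stored as just one entry") amounts to run-length encoding a sorted block list. An illustrative sketch, not pg_hibernate's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Collapse a sorted list of block numbers into (first, length) runs, so a
 * contiguous range costs one entry instead of one entry per block.  This
 * mirrors the idea behind pg_hibernate's 'b'/'N' records; names are
 * invented.  out_first/out_len must have room for nblocks entries. */
static int encode_ranges(const uint32_t *blocks, int nblocks,
                         uint32_t out_first[], uint32_t out_len[])
{
    int nranges = 0;

    for (int i = 0; i < nblocks; i++)
    {
        if (nranges > 0 &&
            blocks[i] == out_first[nranges - 1] + out_len[nranges - 1])
            out_len[nranges - 1]++;     /* extends the current run */
        else
        {
            out_first[nranges] = blocks[i];
            out_len[nranges] = 1;
            nranges++;
        }
    }
    return nranges;
}
```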
Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun
On Fri, Jan 31, 2014 at 9:09 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I refactored the loop in _bt_moveright to, well, not have that bug anymore. The 'page' and 'opaque' pointers are now fetched at the beginning of the loop. Did I miss something? I think so, yes. You still aren't assigning the value returned by _bt_getbuf() to 'buf'. Since, as I mentioned, _bt_finish_split() ultimately unlocks *and unpins*, it may not be the same buffer as before, so even with the refactoring there are race conditions. A closely related issue is that you haven't mentioned anything about buffer pins/refcount side effects in comments above _bt_finish_split(), even though I believe you should. A minor stylistic concern is that I think it would be better to only have one pair of _bt_finish_split()/_bt_getbuf() calls regardless of the initial value of 'access'. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
Michael Paquier michael.paqu...@gmail.com writes: Please find attached a patch implementing lsn as a datatype, based on the one Robert wrote a couple of years ago. Patch contains regression tests as well as a bit of documentation. Perhaps this is too late for 9.4, so if there are no objections I'll simply add this patch to the next commit fest in June for 9.5. I may have lost count, but aren't a bunch of the affected functions new in 9.4? If so, there's a good argument to be made that we should get this in now, rather than waiting and having an API change for those functions in 9.5. regards, tom lane
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On Tue, Feb 4, 2014 at 10:10 AM, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: Please find attached a patch implementing lsn as a datatype, based on the one Robert wrote a couple of years ago. Patch contains regression tests as well as a bit of documentation. Perhaps this is too late for 9.4, so if there are no objections I'll simply add this patch to the next commit fest in June for 9.5. I may have lost count, but aren't a bunch of the affected functions new in 9.4? If so, there's a good argument to be made that we should get this in now, rather than waiting and having an API change for those functions in 9.5. Cool... I have created a second patch that updates all the system functions to use the new lsn datatype introduced in the 1st patch (pg_start|stop_backup, replication_slot stuff, xlog diff things, etc.) and it is attached. This cleans up quite a bit of code in xlogfuncs.c because we no longer need the LSN-to-cstring transformations! I am also attaching a v2 of the first patch; I noticed that lsn_in introduced in the first patch was using some error messages not consistent with the ones of validate_xlog_location:xlogfuncs.c. Note as well that validate_xlog_location is removed in the 2nd patch, where all the system functions are switched to the new datatype. Regards, -- Michael *** a/doc/src/sgml/datatype.sgml --- b/doc/src/sgml/datatype.sgml *** *** 155,160 --- 155,166 </row> <row> +<entry><type>lsn</type></entry> +<entry></entry> +<entry>Log Sequence Number</entry> + </row> + + <row> <entry><type>macaddr</type></entry> <entry></entry> <entry>MAC (Media Access Control) address</entry> *** *** 4502,4507 SELECT * FROM pg_attribute --- 4508,4527 </para> </sect1> + <sect1 id="datatype-lsn"> +<title><acronym>LSN</> Type</title> + +<indexterm zone="datatype-lsn"> + <primary>LSN</primary> +</indexterm> + +<para> + The <type>lsn</type> data type can be used to store LSN (Log Sequence + Number) data which is a pointer to a location in the XLOG. This type is a + representation of XLogRecPtr.
+/para + /sect1 + sect1 id=datatype-pseudo titlePseudo-Types/title *** a/src/backend/utils/adt/Makefile --- b/src/backend/utils/adt/Makefile *** *** 20,26 OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \ cash.o char.o date.o datetime.o datum.o domains.o \ enum.o float.o format_type.o \ geo_ops.o geo_selfuncs.o int.o int8.o json.o jsonfuncs.o like.o \ ! lockfuncs.o misc.o nabstime.o name.o numeric.o numutils.o \ oid.o oracle_compat.o orderedsetaggs.o \ pseudotypes.o rangetypes.o rangetypes_gist.o \ rowtypes.o regexp.o regproc.o ruleutils.o selfuncs.o \ --- 20,26 cash.o char.o date.o datetime.o datum.o domains.o \ enum.o float.o format_type.o \ geo_ops.o geo_selfuncs.o int.o int8.o json.o jsonfuncs.o like.o \ ! lockfuncs.o lsn.o misc.o nabstime.o name.o numeric.o numutils.o \ oid.o oracle_compat.o orderedsetaggs.o \ pseudotypes.o rangetypes.o rangetypes_gist.o \ rowtypes.o regexp.o regproc.o ruleutils.o selfuncs.o \ *** /dev/null --- b/src/backend/utils/adt/lsn.c *** *** 0 --- 1,180 + /*- + * + * lsn.c + * Internal LSN operations + * + * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * IDENTIFICATION + * src/backend/utils/adt/lsn.c + * + *- + */ + #include postgres.h + + #include funcapi.h + #include libpq/pqformat.h + #include utils/builtins.h + #include utils/lsn.h + + #define MAXLSNLEN 17 + #define MAXLSNCOMPONENT 8 + + /*-- + * Formatting and conversion routines. + *-*/ + + Datum + lsn_in(PG_FUNCTION_ARGS) + { + char *str = PG_GETARG_CSTRING(0); + int len1, len2; + uint32 id, off; + XLogRecPtr result; + + /* Sanity check input format. 
*/ + len1 = strspn(str, "0123456789abcdefABCDEF"); + if (len1 < 1 || len1 > MAXLSNCOMPONENT || str[len1] != '/') + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input syntax for transaction log location: \"%s\"", str))); + len2 = strspn(str + len1 + 1, "0123456789abcdefABCDEF"); + if (len2 < 1 || len2 > MAXLSNCOMPONENT || str[len1 + 1 + len2] != '\0') + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input syntax for transaction log location: \"%s\"", str))); + + /* Decode result. */ + id = (uint32) strtoul(str, NULL, 16); + off = (uint32) strtoul(str + len1 + 1, NULL, 16); + result = (XLogRecPtr) (((uint64) id << 32) | off); + + PG_RETURN_LSN(result);
[HACKERS] Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
Robert Haas robertmh...@gmail.com writes: On Mon, Feb 3, 2014 at 4:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: guaibasaurus doesn't like this patch: you need to be more careful about links, because the regression instructions are supposed to compile as a standalone document. I wonder if these standalone things are really worthwhile. The whole point of this, ten years ago, was that people who were trying to get started with PostgreSQL might have had neither the doc toolchain nor convenient Internet access available. Plain text documentation was essential for getting off the ground. This seems much less relevant today; is the annoyance of not being able to use links everywhere really buying us anything at this point? That's a very fair question. It's a reasonable bet that pretty much nobody actually looks at the text versions of either HISTORY or regress_README anymore. It's conceivable that somebody somewhere makes use of the text version of INSTALL when trying to get PG going on some bare-bones platform ... but really, can't they look it up on the net? How'd they get the PG sources they're installing, anyway? I'm prepared to believe that these things are just dinosaurs now. However, at least in the case of the release notes, we'd have to kill them in all active branches, not only HEAD, if we don't want surprises. Comments? regards, tom lane
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote: Here are the results of a benchmark on Nathan Boley's 64-core, 4 socket server: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/ That's interesting. The maximum number of what you see here (~293125) is markedly lower than what I can get. ... poke around ... Hm, that's partially because you're using pgbench without -M prepared if I see that correctly. The bottleneck in that case is primarily memory allocation. But even after that I am getting higher numbers: ~342497. Trying to nail down the differnce it oddly seems to be your max_connections=80 vs my 100. The profile in both cases is markedly different, way much more spinlock contention with 80. All in Pin/UnpinBuffer(). I updated this benchmark, with your BufferDescriptors alignment patch [1] applied on top of master (while still not using -M prepared in order to keep the numbers comparable). So once again, that's: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/ It made a bigger, fairly noticeable difference, but not so big a difference as you describe here. Are you sure that you saw this kind of difference with only 64 clients, as you mentioned elsewhere [1] (perhaps you fat-fingered [1] -- -cj is ambiguous)? Obviously max_connections is still 80 in the above. Should I have gone past 64 clients to see the problem? The best numbers I see with the [1] patch applied on master is only ~327809 for -S 10 64 clients. Perhaps I've misunderstood. [1] Misaligned BufferDescriptors causing major performance problems on AMD: http://www.postgresql.org/message-id/20140202151319.gd32...@awork2.anarazel.de -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] memory usage of pg_upgrade
On Mon, Sep 9, 2013 at 07:39:00PM -0400, Bruce Momjian wrote: In the case of tablespaces, I should have thought you could keep a hash table of the names and just store an entry id in the table structure. But that's just my speculation without actually looking at the code, so don't take my word for it :-) Yes, please feel free to improve the code. I improved pg_upgrade CPU usage for a large number of objects, but never thought to look at memory usage. It would be a big win to just palloc/pfree the memory, rather than allocate tons of memory. If you don't get to it, I will in a few weeks. Thank you for pointing out this problem. I have researched the cause and the major problem was that I was allocating the maximum path length in a struct rather than allocating just the length I needed, and was not reusing string pointers that I knew were not going to change. The updated attached patch significantly decreases memory consumption:

tables        orig          patch    % decrease
     1      27,168 kB     27,168 kB       0
    1k      46,136 kB     27,920 kB      39
    2k      65,224 kB     28,796 kB      56
    4k     103,276 kB     30,472 kB      70
    8k     179,512 kB     33,900 kB      81
   16k     331,860 kB     40,788 kB      88
   32k     636,544 kB     54,572 kB      91
   64k   1,245,920 kB     81,876 kB      93

As you can see, a database with 64k tables shows a 93% decrease in memory use. I plan to apply this for PG 9.4. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c new file mode 100644 index f04707f..acf8660 *** a/contrib/pg_upgrade/check.c --- b/contrib/pg_upgrade/check.c *** create_script_for_old_cluster_deletion(c *** 707,726 fprintf(script, "\n"); /* remove PG_VERSION? */ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 804) ! fprintf(script, RM_CMD " %s%s%cPG_VERSION\n", fix_path_separator(os_info.old_tablespaces[tblnum]), - fix_path_separator(old_cluster.tablespace_suffix), PATH_SEPARATOR); for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++) ! { !
fprintf(script, RMDIR_CMD " %s%s%c%d\n", fix_path_separator(os_info.old_tablespaces[tblnum]), - fix_path_separator(old_cluster.tablespace_suffix), PATH_SEPARATOR, old_cluster.dbarr.dbs[dbnum].db_oid); - } } else /* * Simply delete the tablespace directory, which might be .old --- 707,724 fprintf(script, "\n"); /* remove PG_VERSION? */ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 804) ! fprintf(script, RM_CMD " %s%cPG_VERSION\n", fix_path_separator(os_info.old_tablespaces[tblnum]), PATH_SEPARATOR); for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++) ! fprintf(script, RMDIR_CMD " %s%c%d\n", fix_path_separator(os_info.old_tablespaces[tblnum]), PATH_SEPARATOR, old_cluster.dbarr.dbs[dbnum].db_oid); } else + { + char *suffix_path = pg_strdup(old_cluster.tablespace_suffix); /* * Simply delete the tablespace directory, which might be .old *** create_script_for_old_cluster_deletion(c *** 728,734 */ fprintf(script, RMDIR_CMD " %s%s\n", fix_path_separator(os_info.old_tablespaces[tblnum]), ! fix_path_separator(old_cluster.tablespace_suffix)); } fclose(script); --- 726,734 */ fprintf(script, RMDIR_CMD " %s%s\n", fix_path_separator(os_info.old_tablespaces[tblnum]), ! fix_path_separator(suffix_path)); ! pfree(suffix_path); ! } } fclose(script); diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c new file mode 100644 index 3d0dd1a..fd083de *** a/contrib/pg_upgrade/info.c --- b/contrib/pg_upgrade/info.c *** create_rel_filename_map(const char *old_ *** 108,127 * relation belongs to the default tablespace, hence relfiles should * exist in the data directories. */ ! strlcpy(map->old_tablespace, old_data, sizeof(map->old_tablespace)); ! strlcpy(map->new_tablespace, new_data, sizeof(map->new_tablespace)); ! strlcpy(map->old_tablespace_suffix, "/base", sizeof(map->old_tablespace_suffix)); ! strlcpy(map->new_tablespace_suffix, "/base", sizeof(map->new_tablespace_suffix)); } else { /* relation belongs to a tablespace, so use the tablespace location */ !
strlcpy(map->old_tablespace, old_rel->tablespace, sizeof(map->old_tablespace)); ! strlcpy(map->new_tablespace, new_rel->tablespace, sizeof(map->new_tablespace)); ! strlcpy(map->old_tablespace_suffix, old_cluster.tablespace_suffix, ! sizeof(map->old_tablespace_suffix)); ! strlcpy(map->new_tablespace_suffix, new_cluster.tablespace_suffix, ! sizeof(map->new_tablespace_suffix));
Re: [HACKERS] Regression tests failing if not launched on db regression
On Tue, Feb 4, 2014 at 12:49 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 31, 2014 at 5:48 PM, Michael Paquier michael.paqu...@gmail.com wrote: The part about the planning parameter looks good, thanks. The places used to mention the databases used also makes more sense. Thanks for your input. OK, committed and back-patched to 9.3. Thanks. -- Michael
Re: [HACKERS] [bug fix] postgres.exe fails to start on Windows Server 2012 due to ASLR
On 02/01/2014 09:53 PM, MauMau wrote: Not all users use PostgreSQL built by EnterpriseDB, so I think src\tools\msvc\build.bat should produce modules that are not affected by ASLR. I completely agree; just saying that any installer can set the key. I'm convinced that setting this flag is appropriate, at least while Pg relies on having the shared memory segment mapped in the same zone in every executable. Just pointing out that there's a workaround in the mean time. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Prohibit row-security + inheritance in 9.4?
On 01/31/2014 11:24 PM, Yeb Havinga wrote: On 2014-01-31 16:05, Stephen Frost wrote: * Yeb Havinga (y.t.havi...@mgrid.net) wrote: IMHO, there is another way to implement this, other than the procedure to override the child-rel-quals with the ones from the parent. At DDL time, synchronize quals on the parent with rls quals of the childs. Isn't this also what happens with constraints? No, we're not going to do that. We don't do it for GRANT and I don't think it makes sense to do it here. This reasoning could go either way. GRANT is on a complete set of rows. This is a restriction on the level of individual rows, and in that sense, it is more like a row-level CHECK constraint. If we wanted to make them the same then we'd throw out the ability to do any kind of changes or actions on the child and then we'd have actual partitioning. We don't have that though, we have inheiritance. I fail to understand this, probably because I do not have a partition use case for inheritance, but rather an information model that is more ontology like. The more specific childs get down the inheritance tree, more columns they get, and their requirements might differ completely in nature from their siblings, and make no sense at all as well when specified at the level of the parent (or even impossible, since the parent does not have all the columns). That's a bit inconsistent with how Pg's inheritance works, though. Conceptually rows in children *are* part of the parent. You cannot see the child columns when querying via the parent, so it's not a problem to constrain the ability to see the extra child columns with row-security. They can only be accessed when querying the child directly, where the child's row-security expression will apply. So you're not exposing information that's specific to the children, and the inherited common cols are, conceptually *part* of the parent, so applying the parent's qual makes sense. 
Then during expansion of the range table, no code is needed to ignore child rls quals and copy parent rels to child rels. This is what's already implemented and isn't a huge amount of code to begin with, so I don't see this as being an argument against having the flexibility. It would seem to me that any additional logic that can be avoided during planning is a good thing. Also, the argument that something is already implemented does not itself make it good to commit. The implementation with the minimum of required logic and complexity is: apply the parent's row-security quals to the children, and don't check children for quals. Any special handling of child rels creates exciting challenges because inheritance expansion happens _after_ a bunch of the query planner and optimizer has run. Injecting new expression trees at this point is messy, especially if you want those expression trees to in turn contain row-security qualified tables, inheritance, etc. As previously discussed, applying the parent qual to children ensures that what's visible when querying a relation that has children is consistent with its row-security qual. If the parent qual is applied only on the parent rel directly, not children, then querying the parent could emit rows not permitted by the parent's row-security qual. I'm not happy with that, and as Stephen pointed out upthread, it isn't really consistent with how permissions work with inheritance otherwise. If instead the parent qual is applied to the parent and all children, and you *also* add the quals of each child, you get a complex, hard to debug, hard to optimize mess. You also run back into the problem mentioned above, that adding quals after inheritance expansion is messy and problematic. It's also really ugly in what's going to be the most common case, where the child quals are the same as the parent quals, as you'll get nested identical quals.
Despite the challenges with it, I think that's the least insane way to respect child quals on parents. It has pretty much zero chance of happening in 9.4, though; the discussed approach of building row-security on top of updatable security barrier views doesn't play well with adding inheritance on inheritance-expanded children. One answer for that would be to keep inheritance as it is for 9.4 (if I can get the remaining issues sorted out) and in 9.5, if possible, allow the addition of a row-security qual that, if it appears on a child rel during inheritance expansion, _is_ expanded. At least, if it proves necessary, which I'm not entirely convinced of. I suggested it as a solution for a requirement worded upthread as It makes absolutely zero sense, in my head anyway, to have rows returned when querying the parent which should NOT be returned based on the quals of the parent. without disregarding rls-quals on childs. I'm not sure I understand what you are saying here. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On Mon, Feb 3, 2014 at 8:53 PM, Christian Kruse christ...@2ndquadrant.com wrote: Hi Simon, On 03/02/14 10:43, Simon Riggs wrote: Singular: The following process is holding the lock: A. The request queue consists of: B. Plural: Following processes are holding the lock: A, B. The request queue consists of: C. Seems too complex. How about this... Lock holder(s): A. Lock waiter(s) B Lock holder(s): A, B. Lock waiter(s) C This is basically the same as before, it is even shorter. The complaint was that I don't use a whole sentence in this error detail. Won't the change draw the same complaint? To be honest, I'd like to stick with your earlier proposal: Singular: Process holding the lock: A. Request queue: B Plural: Processes holding the lock: A, B. Request queue: C, D This seems to be a good trade-off between project guidelines, readability and parsability. ISTM that the phrase "Request queue" is not used much around the lock. Using the phrase "wait queue", or Simon's suggestion, sounds better to me at least. Thoughts? Regards, -- Fujii Masao
[HACKERS] Inconsistency between pg_stat_activity and log_duration
I found an interesting inconsistency between pg_stat_activity and log_duration.

-[ RECORD 1 ]----+------------------------------
datid            | 16392
datname          | test
pid              | 21815
usesysid         | 10
usename          | t-ishii
application_name | psql
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 2014-02-04 12:46:55.178688+09
xact_start       | 2014-02-04 12:47:27.210976+09
query_start      | 2014-02-04 12:47:27.210976+09
state_change     | 2014-02-04 12:47:27.210981+09
waiting          | f
state            | active
query            | select * from pg_stat_activity;

[snip]

-[ RECORD 3 ]----+------------------------------
datid            | 16392
datname          | test
pid              | 21850
usesysid         | 10
usename          | t-ishii
application_name | pgbench
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 2014-02-04 12:47:11.233245+09
xact_start       | 2014-02-04 12:47:11.235236+09
query_start      | 2014-02-04 12:47:11.241084+09
state_change     | 2014-02-04 12:47:11.241085+09
waiting          | f
state            | active
query            | SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = pgpool_regclass('pgbench_accounts') AND c.relpersistence = 'u'

As you can see, at 2014-02-04 12:47:27.210981+09 the query SELECT count(*) FROM pg_catalog.pg_class... is active and it seems still running. On the other side, here is an excerpt from the PostgreSQL log:

21850 2014-02-04 12:47:11.241 JST LOG: execute pgpool21805/pgpool21805: SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = pgpool_regclass('pgbench_accounts') AND c.relpersistence = 'u'
21850 2014-02-04 12:47:11.241 JST LOG: duration: 0.078 ms

The duration was shown as 0.078 ms, thus it seems the query has already finished. The reason why pg_stat_activity thinks that the query in question is still running is that the Sync message has not been sent to the backend yet (at least from what I read in postgres.c). I think this inconsistency is not very intuitive to users.

Best regards,
--
Tatsuo Ishii SRA OSS, Inc.
Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Triggers on foreign tables
On Sun, Feb 02, 2014 at 11:53:51AM +0100, Ronan Dunklau wrote: Le jeudi 30 janvier 2014 14:05:08 Noah Misch a écrit : On Thu, Jan 23, 2014 at 03:17:35PM +0100, Ronan Dunklau wrote: What do you think about this approach ? Is there something I missed which would make it not sustainable ? Seems basically reasonable. I foresee multiple advantages from having one tuplestore per query level as opposed to one for the entire transaction. You would remove the performance trap of backing up the tuplestore by rescanning. It permits reclaiming memory and disk space in AfterTriggerEndQuery() rather than at end of transaction. You could remove ate_ptr1 and ate_ptr2 from AfterTriggerEventDataFDW and just store the flags word: depending on AFTER_TRIGGER_2CTIDS, grab either the next one or the next two tuples from the tuplestore. Using work_mem per AfterTriggerBeginQuery() instead of per transaction is no problem. What do you think of that design change? I agree that this design is better, but I have some objections. We can remove ate_ptr2 and rely on the AFTER_TRIGGER_2CTIDS flag, but the rescanning and ate_ptr1 (renamed ate_tupleindex in the attached patch) can't go away. Consider for example the case of a foreign table with more than one AFTER UPDATE triggers. Unless we store the tuples once for each trigger, we will have to rescan the tuplestore. Will we? Within a given query level, when do (non-deferred) triggers execute in an order other than the enqueue order? To mitigate the effects of this behaviour, I added the option to perform a reverse_seek when the looked-up tuple is nearer from the current index than from the start. If there's still a need to seek within the tuplestore, that should get rid of the O(n^2) effect. I'm hoping that per-query-level tuplestores will eliminate the need to seek entirely. If you do pursue that change, make sure the code still does the right thing when it drops queued entries during subxact abort. 
I don't really understand what should be done at that stage. Since triggers on foreign tables are not allowed to be deferred, everything should be cleaned up at the end of each query, right? So, there shouldn't be any queued entries. I suspect that's right. If you haven't looked over AfterTriggerEndSubXact(), please do so and ensure all its actions still make sense in the context of this new kind of trigger storage. The attached patch checks this, and adds documentation for this limitation. I'm not really sure about how to phrase that correctly in the error message and the documentation. One can store at most INT_MAX foreign tuples, which means that at most INT_MAX inserts, deletes, or half-updates can occur. By half-updates, I mean that for an update two tuples are stored, in contrast to only one for an insert or delete trigger. Besides, I don't know where this disclaimer should be in the documentation. Any advice here? I wouldn't mention that limitation. Maybe it shouldn't be so prominent, but I still think a note somewhere couldn't hurt. Perhaps. There's not much documentation of such implementation upper limits, and there's no usage of INT_MAX outside of parts that discuss writing C code. I'm not much of a visionary when it comes to the documentation; I try to document new things with an amount of detail similar to old features. Should the use of work_mem be documented somewhere, too? I wouldn't. Most uses of work_mem are undocumented, even relatively major ones like count(DISTINCT ...) and CTEs. So, while I'd generally favor a patch documenting all/most of the things that use work_mem, it would be odd to document one new consumer apart from the others. This is the performance trap I mentioned above; it brings potential O(n^2) complexity to certain AFTER trigger execution scenarios. What scenarios do you have in mind? I fixed the problem when there are multiple triggers on a foreign table; do you have any other one?
I'm not aware of such a performance trap in your latest design. For what it's worth, I don't think multiple triggers were ever a problem. In the previous design, a problem arose if you had a scenario like this: Query level 1: queue one million events ... Repeat this section many times: Query level 2: queue one event Query level 3: queue one event Query level 3: execute events Query level 2: execute events -- had to advance through the 1M stored events ... Query level 1: execute events Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] could not create IPv6 socket (AI_ADDRCONFIG)
Hello, I have often seen inquiries about a log message from the PostgreSQL server. LOG: could not create IPv6 socket: Address family not supported by protocol This is emitted in an IPv6-disabled environment, which I reproduced with the following steps on CentOS 6.5:

- Add 'NETWORKING_IPV6=no' into /etc/sysconfig/network
- Create /etc/modprobe.d/disable-ipv6.conf with the following content. options net-pf-10 off install ipv6 /bin/true
- Comment out the entry for ::1 in /etc/hosts
- Reboot.
- [Confirm that ip a and ifconfig -a don't show ipv6 addrs]

The cause is that the server collects available listen addresses by getaddrinfo with hint.ai_flags not having AI_ADDRCONFIG set. pqcomm.c:299 in function StreamServerPort hint.ai_family = family; ! hint.ai_flags = AI_PASSIVE; hint.ai_socktype = SOCK_STREAM; The man page of getaddrinfo says that AI_ADDRCONFIG would help, and I saw it actually did. Well, I excavated some discussions about this option from ancient pgsql-hackers, a decade ago. http://www.postgresql.org/message-id/20030703204355.ga12...@ping.be This looks a little broken behaviour to me. My guess is that it returns an AF_INET6 socket but doesn't support it in the kernel. In that case with the AI_ADDRCONFIG option it shouldn't have returned that address. The question is wether your getaddrinfo() supports that option, and wether it's working or not. I suppose AI_ADDRCONFIG was a quite obscure option at the time, but the time seems to have come when we can use this option. man getaddrinfo NOTES getaddrinfo() supports the address%scope-id notation for specifying the IPv6 scope-ID. AI_ADDRCONFIG, AI_ALL, and AI_V4MAPPED are available since glibc 2.3.3. AI_NUMERICSERV is available since glibc 2.3.4. RHEL/CentOS 4.9 and after seems to satisfy the condition. I don't know about other platforms, but it is enough to define it as 0 if not defined.
In addition, it would not be a problem even if the option doesn't work as expected because the error handling code added at the time the above conversation took place would act as the last resort. There would be no way it can work destructively. Attached patch does this. Any suggestions ? regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c index 2b24793..121f70b 100644 --- a/src/backend/libpq/pqcomm.c +++ b/src/backend/libpq/pqcomm.c @@ -297,7 +297,7 @@ StreamServerPort(int family, char *hostName, unsigned short portNumber, /* Initialize hint structure */ MemSet(hint, 0, sizeof(hint)); hint.ai_family = family; - hint.ai_flags = AI_PASSIVE; + hint.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; hint.ai_socktype = SOCK_STREAM; #ifdef HAVE_UNIX_SOCKETS diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 305d126..1dd7f2a 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -344,7 +344,7 @@ pgstat_init(void) /* * Create the UDP socket for sending and receiving statistic messages */ - hints.ai_flags = AI_PASSIVE; + hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; hints.ai_family = PF_UNSPEC; hints.ai_socktype = SOCK_DGRAM; hints.ai_protocol = 0; diff --git a/src/include/getaddrinfo.h b/src/include/getaddrinfo.h index 6192d1f..685d922 100644 --- a/src/include/getaddrinfo.h +++ b/src/include/getaddrinfo.h @@ -64,6 +64,11 @@ #define AI_PASSIVE 0x0001 #endif +#ifndef AI_ADDRCONFIG +/* Works as NOP if AI_ADDRCONFIG is not defined */ +#define AI_ADDRCONFIG 0x +#endif + #ifndef AI_NUMERICHOST /* * some platforms don't support AI_NUMERICHOST; define as zero if using -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
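The patched hint setup can be tried standalone. The sketch below mirrors the StreamServerPort change (the port number and AF_UNSPEC are illustrative); which address families come back depends on the machine's interface configuration, which is exactly the filtering AI_ADDRCONFIG provides:

```c
/* Standalone version of the hint setup in the patch: request passive
 * TCP listen addresses, filtered by AI_ADDRCONFIG so that families with
 * no configured address on this host are skipped. */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hint, *res = NULL;

    memset(&hint, 0, sizeof(hint));
    hint.ai_family = AF_UNSPEC;
    hint.ai_flags = AI_PASSIVE | AI_ADDRCONFIG;   /* the patched flags */
    hint.ai_socktype = SOCK_STREAM;

    int rc = getaddrinfo(NULL, "5432", &hint, &res);
    assert(rc == 0);

    int n = 0;
    for (struct addrinfo *ai = res; ai; ai = ai->ai_next)
    {
        n++;
        printf("family=%s\n", ai->ai_family == AF_INET  ? "IPv4" :
                              ai->ai_family == AF_INET6 ? "IPv6" : "other");
    }
    assert(n >= 1);
    freeaddrinfo(res);
    return 0;
}
```

On the IPv6-disabled CentOS setup described above, the AF_INET6 entry disappears from the result list, so the server never attempts the doomed IPv6 socket and the log message goes away.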
[HACKERS] Re: Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
On Mon, Feb 03, 2014 at 08:48:06PM -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Feb 3, 2014 at 4:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: guaibasaurus doesn't like this patch: you need to be more careful about links, because the regression instructions are supposed to compile as a standalone document. I wonder if these standalone things are really worthwhile. The whole point of this, ten years ago, was that people who were trying to get started with PostgreSQL might not have had neither the doc toolchain nor convenient Internet access available. Plain text documentation was essential for getting off the ground. This seems much less relevant today; is the annoyance of not being able to use links everywhere really buying us anything at this point? That's a very fair question. It's a reasonable bet that pretty much nobody actually looks at the text versions of either HISTORY or regress_README anymore. It's conceivable that somebody somewhere makes use of the text version of INSTALL when trying to get PG going on some bare-bones platform ... but really, can't they look it up on the net? How'd they get the PG sources they're installing, anyway? I sometimes read text-based INSTALL files when building other projects, but a tiny file giving a URL is almost as good. (The other two generated files do seem much less important.) I'm prepared to believe that these things are just dinosaurs now. However, at least in the case of the release notes, we'd have to kill them in all active branches not only HEAD, if we don't want surprises. I wonder how difficult it would be to make sufficient link data available when building the standalone files. There would be no linking per se; we would just need the referent's text fragment emitted where the xref tag appears. For example, have a stylesheet that produces a file bearing all link IDs and titles appearing anywhere in the documentation. Let the standalone builds include that file. 
-- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] narwhal and PGDLLIMPORT
On Mon, Feb 3, 2014 at 11:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Amit Kapila amit.kapil...@gmail.com writes: Also, as per the link below, it seems that PGDLLIMPORT is required for exported global variables. http://msdn.microsoft.com/en-us/library/8fskxacy(v=vs.80).aspx That was what we'd always assumed, but the fact that postgres_fdw has been working for the last year (on everything except narwhal) puts the lie to that as a blanket assumption. So what I want to know now is how come it wasn't failing; because then we might be able to exploit that to eliminate the need for the PGDLLIMPORT cruft everywhere. Today I tried to debug the same scenario on Win XP 32-bit, but the value of DateStyle is junk on that machine as well. Is it possible to somehow check, on a buildfarm machine (Windows) where the regression test is passing, what the value of DateStyle is? We can use something like the attached patch to print the value in the log. In my test, it prints the values as below: Without using PGDLLIMPORT for DateStyle LOG: Value of DateStyle is: 326903295 Using PGDLLIMPORT for DateStyle LOG: Value of DateStyle is: 1 In the function where it is used, it seems to me that it sets DateStyle to ISO if it is not ISO, and the function configure_remote_session() will set it to ISO initially. So basically, even if the value of DateStyle is junk, it will just do what we wanted, i.e. set it to ISO. Is the failure due to the fact that the test is not expecting DateStyle to be ISO, and because of the junk value of this parameter it gets set to ISO? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com elog_for_datestyle.patch Description: Binary data
Re: [HACKERS] Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
Noah Misch n...@leadboat.com writes: Robert Haas robertmh...@gmail.com writes: I wonder if these standalone things are really worthwhile. I wonder how difficult it would be to make sufficient link data available when building the standalone files. There would be no linking per se; we would just need the referent's text fragment emitted where the xref tag appears. IIRC, that's basically what the workaround is, except it's not very automated. Even if it were automated, though, there's still a problem: such links aren't really *useful* in flat text format. I think that forcing the author to actually think about what to put there in the flat text version is a good thing, if we're going to retain the flat text version at all. regards, tom lane
Re: [HACKERS] Row-security on updatable s.b. views
On 01/30/2014 04:05 PM, Craig Ringer wrote: On 01/30/2014 01:25 PM, Craig Ringer wrote: On 01/29/2014 09:47 PM, Craig Ringer wrote: https://github.com/ringerc/postgres/compare/rls-9.4-upd-sb-views i.e. https://github.com/ringerc/postgres.git , branch rls-9.4-upd-sb-views (subject to rebasing) or the non-rebased tag rls-9.4-upd-sb-views-v2 Pushed an update to the branch. New update tagged rls-9.4-upd-sb-views-v3 . Fixes an issue with rowmarking that stems from the underlying updatable s.b. views patch. Other tests continue to fail; this isn't ready yet. Specifically: - row-security rule recursion detection isn't solved yet, it just overflows the stack. This is now fixed in the newly tagged rls-9.4-upd-sb-views-v4 in g...@github.com:ringerc/postgres.git . I landed up adding a field to RangeTblEntry that keeps track of all the oids of relations row-security expanded to produce this RTE. When testing an RTE for row-security policy, this list is checked to see if the oid of the relation being expanded is already on the list. The list is copied to all RTEs inside sublinks after a relation is expanded, using a query_tree_walker. - COPY doesn't know anything about row-security COPY will ERROR on row-security rels in rls-9.4-upd-sb-views-v4. I'm looking at integrating the approach Kohei KaiGai took in the original patch now, then I'll be moving on to: - I'm just starting to chase some odd errors in the tests, ERROR: failed to find unique expression in subplan tlist and ERROR: could not open file base/16384/30070: No such file or directory. Their cause/origin is not yet known, but they're specific to when row-security policy is being applied. I was hoping these would be fixed by solving the recursion issues, but that was not the case. Input/comments would be appreciated. I haven't looked into this yet. - policies based on current_user don't remember current_user when rows are pulled from a refcursor returned by a security definer function.
This is actually a separate, existing bug, or surprising behaviour. I'd like to address it, but it's really a separate patch. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
Hello, I now have a workable DLL, thanks to your advice. I think both the problems are related and the reason is that dsm_demo.dll is not built properly. Let us first try to solve your second problem, because I think if that is solved, you will not face problem-1. Thank you for your kindness. I understood the situation after successfully getting a correct DLL by using a .def file, following your advice. cl needs __declspec(dllexport) in the symbol definitions to expose them externally when not using a .def file. The PostgreSQL platform seems to offer the macro PGDLLEXPORT for such use. I suppose this should be used in extension module DLLs to expose symbols, like this: - void _PG_init(void); - Datum dsm_demo_create(PG_FUNCTION_ARGS); - Datum dsm_demo_read(PG_FUNCTION_ARGS); === + PGDLLEXPORT void _PG_init(void); + PGDLLEXPORT Datum dsm_demo_create(PG_FUNCTION_ARGS); + PGDLLEXPORT Datum dsm_demo_read(PG_FUNCTION_ARGS); # This hardly seems to be used commonly... I followed these instructions to set up the build environment: http://blog.2ndquadrant.com/compiling-postgresql-extensions-visual-studio-windows/ And the change above enables us to build this module without a .def file. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
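For reference, the .def-file route discussed above lists the exported symbols in a module definition file instead of annotating the declarations. A hypothetical dsm_demo.def along those lines might look like:

```
; dsm_demo.def -- module definition file (hypothetical sketch)
LIBRARY dsm_demo
EXPORTS
    _PG_init
    dsm_demo_create
    dsm_demo_read
```

Passed to the linker (e.g. /DEF:dsm_demo.def), this exposes the same three symbols without any PGDLLEXPORT markers in the source.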
Re: [HACKERS] could not create IPv6 socket (AI_ADDRCONFIG)
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: Hello, I have often seen inquiries about a log message from the PostgreSQL server. LOG: could not create IPv6 socket: Address family not supported by protocol That's merely a harmless log message. - hints.ai_flags = AI_PASSIVE; + hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; This, on the other hand, might actively break things. It did when we had it in before (cf the thread you link to and commit df63503dc). I don't have any faith that systems on which it is broken have vanished from the face of the earth. One good reason not to trust this too much is that getaddrinfo() is fundamentally a userspace DNS access function, and as such it has no very good way to know if there's currently an IPv4 or IPv6 interface configured on the local system. At minimum there are obvious race conditions in that. If we're concerned about users worrying about log messages from this, I'd rather see us downgrade those log messages to DEBUG level than risk breaking the code with behaviors that were proven to be a bad idea a decade ago. But TBH I see no strong need to do anything here. regards, tom lane
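For readers following along, here is a minimal, self-contained sketch of the call under discussion — requesting passive (listening) addresses with and without AI_ADDRCONFIG. This assumes a POSIX system; port 5432 is arbitrary and the helper name is made up, so it is not the actual postmaster code:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Counts the passive (listening) addresses getaddrinfo() reports for the
 * given ai_flags.  Port 5432 is arbitrary; the helper name is made up. */
static int count_passive_addrs(int flags)
{
    struct addrinfo hints, *res, *p;
    int n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* ask for both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags;           /* AI_PASSIVE, optionally | AI_ADDRCONFIG */

    if (getaddrinfo(NULL, "5432", &hints, &res) != 0)
        return -1;
    for (p = res; p != NULL; p = p->ai_next)
        n++;
    freeaddrinfo(res);
    return n;
}

int main(void)
{
    printf("AI_PASSIVE only:            %d address(es)\n",
           count_passive_addrs(AI_PASSIVE));
    printf("AI_PASSIVE | AI_ADDRCONFIG: %d address(es)\n",
           count_passive_addrs(AI_PASSIVE | AI_ADDRCONFIG));
    return 0;
}
```

With only AI_PASSIVE, getaddrinfo() typically returns wildcard entries for both families even when one has no configured interface, and the later socket() call for the unsupported family is what produces the log message; AI_ADDRCONFIG filters those entries up front, subject to the caveats Tom describes.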
Re: [HACKERS] narwhal and PGDLLIMPORT
Amit Kapila amit.kapil...@gmail.com writes: In the function where it is used, it seems to me that it sets DateStyle to ISO if it is not ISO, and the function configure_remote_session() will set it to ISO initially. So basically, even if the value of DateStyle is junk, it will just do what we wanted, i.e. set it to ISO. Is the failure due to the fact that the test is not expecting DateStyle to be ISO, and because of the junk value of this parameter it gets set to ISO? Meh. It might be that the DateStyle usage in postgres_fdw would accidentally fail to malfunction if it saw a bogus value of the variable. But it's hard to believe that this would be true of MainLWLockArray. regards, tom lane
Re: [HACKERS] Row-security on updatable s.b. views
Craig Ringer cr...@2ndquadrant.com writes: I landed up adding a field to RangeTblEntry that keeps track of all the oids of relations row-security expanded to produce this RTE. When testing an RTE for row-security policy, this list is checked to see if the oid of the relation being expanded is already on the list. The list is copied to all RTEs inside sublinks after a relation is expanded using a query_tree_walker. That's impossibly ugly, both as to the idea that an RTE is a mutable data structure for this processing, and as to having to copy the list around (meaning that you can't even claim to have one source of truth for the state...) regards, tom lane
Re: [HACKERS] Row-security on updatable s.b. views
On 02/04/2014 03:14 PM, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: I landed up adding a field to RangeTblEntry that keeps track of all the oids of relations row-security expanded to produce this RTE. When testing an RTE for row-security policy, this list is checked to see if the oid of the relation being expanded is already on the list. The list is copied to all RTEs inside sublinks after a relation is expanded using a query_tree_walker. That's impossibly ugly, both as to the idea that an RTE is a mutable data structure for this processing, and as to having to copy the list around (meaning that you can't even claim to have one source of truth for the state...) I see. Do you have any suggestion about how this might be approached in a manner that would be more satisfactory? It's not strictly necessary to duplicate the parent list when annotating the RTEs in an expanded row-security qual; I did so only out of concern that an object of indeterminate/shared ownership may not be considered desirable. The only time it's strictly necessary to copy the parent list is when a new row-security qual is being expanded. In this case it cannot be shared, because there may be multiple relations with row-security applied, and a separate path is required for each. If the list is shared, infinite recursion is falsely reported in cases like: rel a has a qual referring to b rel b has a qual referring to c rel c has a qual referring to d select ... from a inner join b on ... There's no infinite recursion there, but a simple approach that keeps a global list of expanded quals will think there is, as it'll see b and c expanded twice. So whenever you expand a qual, you have to fork the parent list, copying it. The approach used in the rewriter won't work here, because the rewriter eagerly depth-first recurses when expanding things. In the optimizer, all securityQuals get expanded in one pass, _then_ sublink processing expands any added sublinks in a separate pass.
It's possible the parent list could be associated with the Query that's produced by securityQual expansion instead, but at the time you're expanding row-security on any RTEs within that you don't necessarily have access to the Query node that might be a couple of subquery levels up, since the security qual can be an arbitrary expression. I looked at keeping a structure in PlannerGlobal that tracked the chain of row-security qual expansion, but I couldn't figure out a way to cleanly tell what branch of the query a given RTE that's being expanded was in, and thus how to walk the global tree to check for infinite recursion. The only alternative I see is to immediately and recursively expand row-security entries within subqueries using a walker, as soon as the initial row-security qual is expanded on the top level rel. I'm concerned about code duplication when following that path, since that's pretty much what's already happening with sublink expansion, just using the existing execution paths. If I'm right, and immediate recursive expansion isn't an acceptable alternative, any ideas on how the row-security expansion breadcrumb trail might be stored in a more appropriate manner and accessed from where they're needed? Note that it's trivial to prevent simple, direct recursion. Preventing mutual recursion without also breaking non-recursive cases where rels are used in totally different parts of the query is harder. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
On Tue, Feb 4, 2014 at 12:25 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, I now have a workable DLL, thanks to your advice. I think both the problems are related and the reason is that dsm_demo.dll is not built properly. Let us first try to solve your second problem, because I think if that is solved, you will not face problem-1. Thank you for your kindness. I understood the situation after successfully getting a correct DLL by using a .def file, following your advice. cl needs __declspec(dllexport) in the symbol definitions to expose them externally when not using a .def file. The PostgreSQL platform seems to offer the macro PGDLLEXPORT for such use. I suppose this should be used in extension module DLLs to expose symbols, like this: - void _PG_init(void); - Datum dsm_demo_create(PG_FUNCTION_ARGS); - Datum dsm_demo_read(PG_FUNCTION_ARGS); === + PGDLLEXPORT void _PG_init(void); + PGDLLEXPORT Datum dsm_demo_create(PG_FUNCTION_ARGS); + PGDLLEXPORT Datum dsm_demo_read(PG_FUNCTION_ARGS); # This hardly seems to be used commonly... Yeah, for functions we mainly rely on exporting via the .def file only, and so is the case for this module. Anyway, this is just a test module, so if things work for you by changing them in the above way, it's fine. However, I wonder why it's not generating a .def file for you. I followed these instructions to set up the build environment: http://blog.2ndquadrant.com/compiling-postgresql-extensions-visual-studio-windows/ And the change above enables us to build this module without a .def file. Okay, you can complete your test in the way in which you are able to successfully build it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com