Re: [HACKERS] Draft release notes complete

2012-09-11 Thread Stefan Kaltenbrunner
On 09/10/2012 05:19 PM, Bruce Momjian wrote:
 On Mon, Sep 10, 2012 at 12:06:18PM -0300, Alvaro Herrera wrote:
 It is this kind of run-around that caused me to generate my own doc
 build in the past;  maybe I need to return to doing my own doc build.

 You keep threatening with that.  You are free, of course, to do anything
 you want, and no one will break sweat about it.  I already said I will
 work on getting this up and running, but I can't give you a deadline for
 when it'll be working.
 
 My point is that this frequent doc build feature was removed with no
 discussion, and adding it seems to be some herculean job that requires
 red tape only a government worker would love.

Not sure how you got that impression - but understand all requirements
to something is usually key to implementing a solution, so discussing
those requirements seems like a sensible thing to do.
sysadmin is a volunteer effort and we do our best to deal with both
keeping the existing infrastructure up and improving as we can but
resources are limited and we need to consider the time/effort ration of
stuff.
Anyway alvaro clearly stated he would deal with it but obviously
thatthat is not enough for your urgent demands so there is really not
much we can do about it...

 
 I have already started working on updating my script for git  --- should
 be done shortly, so you can remove my request.

ok


Stefan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gistchoose vs. bloat

2012-09-11 Thread Jeff Davis
On Tue, 2012-09-04 at 19:21 +0400, Alexander Korotkov wrote:

 New version of patch is attached. Parameter randomization was
 introduced. It controls whether to randomize choose. Choose algorithm
 was rewritten.
 
Do you expect it to be bad in any reasonable situations? I'm inclined to
just make it always randomize if it's better. I think it would be hard
for a user to guess when it's better and when not.

Maybe it's useful to turn randomization off for testing purposes, e.g.
to ensure determinism?

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gistchoose vs. bloat

2012-09-11 Thread Alexander Korotkov
On Tue, Sep 11, 2012 at 10:35 AM, Jeff Davis pg...@j-davis.com wrote:

 On Tue, 2012-09-04 at 19:21 +0400, Alexander Korotkov wrote:

  New version of patch is attached. Parameter randomization was
  introduced. It controls whether to randomize choose. Choose algorithm
  was rewritten.
 
 Do you expect it to be bad in any reasonable situations? I'm inclined to
 just make it always randomize if it's better. I think it would be hard
 for a user to guess when it's better and when not.


Randomization should increase IO when index doesn't entirely fit to cache.
Without randomization only fraction of the tree would be used for actual
insertions. While with randomization whole tree would be potentially used
for insertions.


 Maybe it's useful to turn randomization off for testing purposes, e.g.
 to ensure determinism?


Yes, that's another good point. For example, randomization impede
reproducing of bugs.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Question about SSI, subxacts, and aborted read-only xacts

2012-09-11 Thread Jeff Davis
On Mon, 2012-09-10 at 11:15 -0500, Kevin Grittner wrote:
 That's a fair point.  Do you have any suggested wording, or
 suggestions for exactly where in the documentation you think it
 would be most helpful?  The subsection on serializable transactions
 seems like the most obvious location:

Attached. I thought about putting it as a note, but it seems like it's
easy to go overboard with those.

Regards,
Jeff Davis


ssi_doc.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Move postgresql_fdw_validator into dblink

2012-09-11 Thread Shigeru HANADA
I'd like to propose moving postgresql_fdw_validator into contrib/dblink
as dblink's own validator.

Main purpose of this proposal is to reserve the name postgresql_fdw
for concrete FDW for PostgreSQL.  I used to use pgsql_fdw as the name,
but in previous CF I got comments that full product name is appropriate
rather than abbreviation (e.g. pgsql_fdw) for FDW's name.

In addition, this change would avoid potential problem that third-party
product might use this validator and introduce undesirable dependency
between PG core.

This change breaks backward compatibility, but AFAIK no one except
dblink seems to use this validator, so it would not be serious problem.
# Please let me know if any product uses this validator!

Changes in this patch are:

1) Move postgresql_fdw_validator from core backend to contrib/dblink
with renaming to dblink_fdw_validator.  Also I modified this validator
so that it uses PQconndefault() to get libpq's valid options instead of
having its own options list.

2) For ease of use, dblink's CREATE EXTENSION provides default FDW
dblink_fdw which accepts libpq's connection options via user mapping
(user and secret options such as password) and foreign server (all
other options).

3) Bump dblink's version to 1.1.  Of cource upgrade script is provided.

4) Update documents. (Should we mention removal of
postgresql_fdw_validator?)

5) Use simplified postgresql_fdw_validator in regression test
foreign_data.  I didn't change actual test cases because they don't seem
to depend on postgresql_fdw_validator deeply.

Comments and questions are welcome.
--
Shigeru HANADA

diff --git a/contrib/dblink/Makefile b/contrib/dblink/Makefile
index ac63748..a27db88 100644
--- a/contrib/dblink/Makefile
+++ b/contrib/dblink/Makefile
@@ -7,7 +7,7 @@ SHLIB_LINK = $(libpq)
 SHLIB_PREREQS = submake-libpq
 
 EXTENSION = dblink
-DATA = dblink--1.0.sql dblink--unpackaged--1.0.sql
+DATA = dblink--1.1.sql dblink--1.0--1.1.sql dblink--unpackaged--1.0.sql
 
 REGRESS = dblink
 
diff --git a/contrib/dblink/dblink--1.0--1.1.sql 
b/contrib/dblink/dblink--1.0--1.1.sql
new file mode 100644
index 000..f224d3d
--- /dev/null
+++ b/contrib/dblink/dblink--1.0--1.1.sql
@@ -0,0 +1,14 @@
+/* contrib/dblink/dblink--1.0--1.1.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use ALTER EXTENSION dblink UPDATE TO '1.1' to load this file. \quit
+
+CREATE FUNCTION dblink_fdw_validator(
+options text[],
+catalog oid
+)
+RETURNS void
+AS 'MODULE_PATHNAME', 'dblink_fdw_validator'
+LANGUAGE C IMMUTABLE;
+
+CREATE FOREIGN DATA WRAPPER dblink_fdw VALIDATOR dblink_fdw_validator;
diff --git a/contrib/dblink/dblink--1.0.sql b/contrib/dblink/dblink--1.0.sql
deleted file mode 100644
index 1fec9e3..000
--- a/contrib/dblink/dblink--1.0.sql
+++ /dev/null
@@ -1,223 +0,0 @@
-/* contrib/dblink/dblink--1.0.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use CREATE EXTENSION dblink to load this file. \quit
-
--- dblink_connect now restricts non-superusers to password
--- authenticated connections
-CREATE FUNCTION dblink_connect (text)
-RETURNS text
-AS 'MODULE_PATHNAME','dblink_connect'
-LANGUAGE C STRICT;
-
-CREATE FUNCTION dblink_connect (text, text)
-RETURNS text
-AS 'MODULE_PATHNAME','dblink_connect'
-LANGUAGE C STRICT;
-
--- dblink_connect_u allows non-superusers to use
--- non-password authenticated connections, but initially
--- privileges are revoked from public
-CREATE FUNCTION dblink_connect_u (text)
-RETURNS text
-AS 'MODULE_PATHNAME','dblink_connect'
-LANGUAGE C STRICT SECURITY DEFINER;
-
-CREATE FUNCTION dblink_connect_u (text, text)
-RETURNS text
-AS 'MODULE_PATHNAME','dblink_connect'
-LANGUAGE C STRICT SECURITY DEFINER;
-
-REVOKE ALL ON FUNCTION dblink_connect_u (text) FROM public;
-REVOKE ALL ON FUNCTION dblink_connect_u (text, text) FROM public;
-
-CREATE FUNCTION dblink_disconnect ()
-RETURNS text
-AS 'MODULE_PATHNAME','dblink_disconnect'
-LANGUAGE C STRICT;
-
-CREATE FUNCTION dblink_disconnect (text)
-RETURNS text
-AS 'MODULE_PATHNAME','dblink_disconnect'
-LANGUAGE C STRICT;
-
-CREATE FUNCTION dblink_open (text, text)
-RETURNS text
-AS 'MODULE_PATHNAME','dblink_open'
-LANGUAGE C STRICT;
-
-CREATE FUNCTION dblink_open (text, text, boolean)
-RETURNS text
-AS 'MODULE_PATHNAME','dblink_open'
-LANGUAGE C STRICT;
-
-CREATE FUNCTION dblink_open (text, text, text)
-RETURNS text
-AS 'MODULE_PATHNAME','dblink_open'
-LANGUAGE C STRICT;
-
-CREATE FUNCTION dblink_open (text, text, text, boolean)
-RETURNS text
-AS 'MODULE_PATHNAME','dblink_open'
-LANGUAGE C STRICT;
-
-CREATE FUNCTION dblink_fetch (text, int)
-RETURNS setof record
-AS 'MODULE_PATHNAME','dblink_fetch'
-LANGUAGE C STRICT;
-
-CREATE FUNCTION dblink_fetch (text, int, boolean)
-RETURNS setof record
-AS 'MODULE_PATHNAME','dblink_fetch'
-LANGUAGE C STRICT;
-
-CREATE FUNCTION dblink_fetch (text, text, int)
-RETURNS setof record
-AS 'MODULE_PATHNAME','dblink_fetch'
-LANGUAGE C STRICT;
-
-CREATE 

[HACKERS] [Feature Request] explaining sql statements executed in UDF's

2012-09-11 Thread Hassan Syed
Hello Folks,

I hope this is the right place to ask for this feature.

I have just started working with Postgres seriously, and I come from a SQL
Server background. In SQL Server when one develops complex stored
procedures, it is possible to see the query plans of the stored procedure,
even when the invocations are nested.

I wanted to have the same functionality in Postgres so I asked this
question on the dba.stackexchange.com website:

http://dba.stackexchange.com/questions/23355/postgres-query-plan-of-a-udf-invocation-written-in-pgpsql

So, in order to achieve this use-case I have to first enable the feature in
a session and then tail -f the server log. It would be great if we could
have the server look into the UDF's and show the entirety of the query
plans.

This feature is quite beneficial when developing complex nested UDF
implementations. Eventually I hope I can get these query plans displayed
graphically in Pgadmin3.

Hassan


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-11 Thread Amit kapila
On Monday, September 10, 2012 8:20 PM Amit Kapila wrote:
On Sunday, September 09, 2012 1:37 PM Amit Kapila wrote:
On Friday, September 07, 2012 11:19 PM Tom Lane wrote:
Heikki Linnakangas hlinn...@iki.fi writes:
 Would socketpair(2) be simpler?



I've not done anything yet about the potential security issues
associated with untrusted libpq connection strings.  I think this
is still at the proof-of-concept stage; in particular, it's probably
 time to see if we can make it work on Windows before we worry more
about that.

 I have started working on this patch to make it work on Windows. The 3
main things to make it work are:

The patch which contains Windows implementation as well is attached with this 
mail. It contains changes related to following
1. waitpid
2. socketpair
3. fork-exec

The following is still left:
1. Error handling in all paths
2. During test, I found if i try to run with admin user, it throws error but 
psql doesn't comes out.
I will look into this issue. However as in previous mail discussion there 
is a decision pending whether in standalone mode
we need admin user behavior.
3. Will do some more test in Windows.

Currently I have prepared a patch on top of your changes, please let me know if 
that is okay.
Also, it will be better for me if you can tell me how I can further contribute.

With Regards,
Amit Kapiladiff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 33c5a0a..968959b 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -191,6 +191,8 @@ main(int argc, char *argv[])
AuxiliaryProcessMain(argc, argv);   /* does not 
return */
else if (argc  1  strcmp(argv[1], --describe-config) == 0)
GucInfoMain();  /* does not return */
+   else if (argc  1  strncmp(argv[1], --child=, 8) == 0)
+   ChildPostgresMain(argc, argv, get_current_username(progname)); 
/* does not return */
else if (argc  1  strcmp(argv[1], --single) == 0)
PostgresMain(argc, argv, get_current_username(progname)); /* 
does not return */
else
diff --git a/src/backend/postmaster/postmaster.c 
b/src/backend/postmaster/postmaster.c
index 73520a6..c5730bd 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -463,6 +463,7 @@ typedef struct
 
 static void read_backend_variables(char *id, Port *port);
 static void restore_backend_variables(BackendParameters *param, Port *port);
+static void read_standalone_child_variables(char *id, int *psock);
 
 #ifndef WIN32
 static bool save_backend_variables(BackendParameters *param, Port *port);
@@ -4268,6 +4269,97 @@ ExitPostmaster(int status)
proc_exit(status);
 }
 
+
+/*
+ * ChildPostgresMain - start a new-style standalone postgres process
+ *
+ * This may not belong here, but it does share a lot of code with ConnCreate
+ * and BackendInitialize.  Basically what it has to do is set up a
+ * MyProcPort structure and then hand off control to PostgresMain.
+ * Beware that not very much stuff is initialized yet.
+ *
+ * In the future it might be interesting to support a standalone
+ * multiprocess mode in which we have a postmaster process that doesn't
+ * listen for connections, but does supervise autovacuum, bgwriter, etc
+ * auxiliary processes.  So that's another reason why postmaster.c might be
+ * the right place for this.
+ */
+void
+ChildPostgresMain(int argc, char *argv[], const char *username)
+{
+   Port   *port;
+#ifdef WIN32
+   charparamHandleStr[32];
+#endif
+
+   /*
+* Fire up essential subsystems: error and memory management
+*/
+   MemoryContextInit();
+
+   /*
+* Build a Port structure for the client connection
+*/
+   if (!(port = (Port *) calloc(1, sizeof(Port
+   ereport(FATAL,
+   (errcode(ERRCODE_OUT_OF_MEMORY),
+errmsg(out of memory)));
+
+   /*
+* GSSAPI specific state struct must exist even though we won't use it
+*/
+#if defined(ENABLE_GSS) || defined(ENABLE_SSPI)
+   port-gss = (pg_gssinfo *) calloc(1, sizeof(pg_gssinfo));
+   if (!port-gss)
+   ereport(FATAL,
+   (errcode(ERRCODE_OUT_OF_MEMORY),
+errmsg(out of memory)));
+#endif
+
+#ifndef WIN32
+   /* The file descriptor of the client socket is the argument of --child 
*/
+   if (sscanf(argv[1], --child=%d, port-sock) != 1)
+   ereport(FATAL,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg(invalid argument for --child: \%s\, 
argv[1])));
+#else
+/* The file descriptor of the client socket is the argument of --child */
+   if (sscanf(argv[1], --child=%s, paramHandleStr) != 1)
+   ereport(FATAL,
+   

Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-09-11 Thread Alvaro Herrera
Excerpts from Boszormenyi Zoltan's message of vie jun 29 09:11:23 -0400 2012:

 We have some use cases for this patch, when can you post
 a new version? I would test and review it.

What use cases do you have in mind?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-09-11 Thread Kohei KaiGai
2012/9/11 Alvaro Herrera alvhe...@2ndquadrant.com:
 Excerpts from Boszormenyi Zoltan's message of vie jun 29 09:11:23 -0400 2012:

 We have some use cases for this patch, when can you post
 a new version? I would test and review it.

 What use cases do you have in mind?

I'm motivated with this feature to implement background calculation server
to handle accesses to GPU device; to avoid limitation of number of processes
that can use GPU device simultaneously.

Probably, other folks have their use cases.
For example, Zoltan introduced his use case in the upthread as follows:
 - an SQL-driven scheduler, similar to pgAgent, it's generic enough,
   we might port it to this scheme and publish it
 - a huge volume importer daemon, it was written for a very specific
   purpose and for a single client, we cannot publish it.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-09-11 Thread Alvaro Herrera
Excerpts from Kohei KaiGai's message of mar sep 11 12:46:34 -0300 2012:
 2012/9/11 Alvaro Herrera alvhe...@2ndquadrant.com:
  Excerpts from Boszormenyi Zoltan's message of vie jun 29 09:11:23 -0400 
  2012:
 
  We have some use cases for this patch, when can you post
  a new version? I would test and review it.
 
  What use cases do you have in mind?
 
 I'm motivated with this feature to implement background calculation server
 to handle accesses to GPU device; to avoid limitation of number of processes
 that can use GPU device simultaneously.

Hmm, okay, so basically a worker would need a couple of LWLocks, a
shared memory area, and not much else?  Not a database connection.

 Probably, other folks have their use cases.
 For example, Zoltan introduced his use case in the upthread as follows:
  - an SQL-driven scheduler, similar to pgAgent, it's generic enough,
we might port it to this scheme and publish it

Hm, this would benefit from a direct backend connection to get the
schedule data (SPI interface I guess).

  - a huge volume importer daemon, it was written for a very specific
purpose and for a single client, we cannot publish it.

This one AFAIR requires more than one connection, so a direct data
connection is no good -- hence link libpq like walreceiver.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-09-11 Thread Kohei KaiGai
2012/9/11 Alvaro Herrera alvhe...@2ndquadrant.com:
 Excerpts from Kohei KaiGai's message of mar sep 11 12:46:34 -0300 2012:
 2012/9/11 Alvaro Herrera alvhe...@2ndquadrant.com:
  Excerpts from Boszormenyi Zoltan's message of vie jun 29 09:11:23 -0400 
  2012:
 
  We have some use cases for this patch, when can you post
  a new version? I would test and review it.
 
  What use cases do you have in mind?
 
 I'm motivated with this feature to implement background calculation server
 to handle accesses to GPU device; to avoid limitation of number of processes
 that can use GPU device simultaneously.

 Hmm, okay, so basically a worker would need a couple of LWLocks, a
 shared memory area, and not much else?  Not a database connection.

Right. It needs shared memory area to communicate with each backend
and locking mechanism, but my case does not take database accesses
right now.

 Probably, other folks have their use cases.
 For example, Zoltan introduced his use case in the upthread as follows:
  - an SQL-driven scheduler, similar to pgAgent, it's generic enough,
we might port it to this scheme and publish it

 Hm, this would benefit from a direct backend connection to get the
 schedule data (SPI interface I guess).

I also think SPI interface will be first candidate for the daemons that
needs database access. Probably, lower layer interfaces (such as
heap_open and heap_beginscan) are also available if SPI interface
can be used.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-09-11 Thread Alvaro Herrera
Excerpts from Kohei KaiGai's message of mar sep 11 13:25:18 -0300 2012:
 2012/9/11 Alvaro Herrera alvhe...@2ndquadrant.com:

   - an SQL-driven scheduler, similar to pgAgent, it's generic enough,
 we might port it to this scheme and publish it
 
  Hm, this would benefit from a direct backend connection to get the
  schedule data (SPI interface I guess).
 
 I also think SPI interface will be first candidate for the daemons that
 needs database access. Probably, lower layer interfaces (such as
 heap_open and heap_beginscan) are also available if SPI interface
 can be used.

Well, as soon as you have a database connection on which you can run
SPI, you need a lot of stuff to ensure your transaction is aborted in
case of trouble and so on.  At that point you can do direct access as
well.

I think it would be a good design to provide different cleanup routes
for the different use cases: for those that need database connections we
nede to go through AbortOutOfAnyTransaction() or something similar; for
others we can probably get away with much less than that.  Not 100% sure
at this point.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Math and logic mistakes in tsquery_opr_selec

2012-09-11 Thread Tom Lane
While reflecting on
http://archives.postgresql.org/pgsql-performance/2012-09/msg00030.php
I discovered that tsquery selectivity is capable of concluding that
word:* matches less stuff than word:

pub=# explain analyze select * from publications_test where 
to_tsvector('simple', title) @@ to_tsquery('simple', 'database');
   QUERY PLAN   
 
-
 Bitmap Heap Scan on publications_test  (cost=53.27..5717.00 rows=3776 
width=177) (actual time=22.359..57.078 rows=3885 loops=1)
   Recheck Cond: (to_tsvector('simple'::regconfig, title) @@ 
'''database'''::tsquery)
   -  Bitmap Index Scan on ii  (cost=0.00..52.32 rows=3776 width=0) (actual 
time=17.908..17.908 rows=3885 loops=1)
 Index Cond: (to_tsvector('simple'::regconfig, title) @@ 
'''database'''::tsquery)
 Total runtime: 73.254 ms
(5 rows)

pub=# explain analyze select * from publications_test where 
to_tsvector('simple', title) @@ to_tsquery('simple', 'database:*');
QUERY PLAN  
  
--
 Bitmap Heap Scan on publications_test  (cost=41.19..3021.55 rows=1185 
width=177) (actual time=49.031..101.935 rows=6448 loops=1)
   Recheck Cond: (to_tsvector('simple'::regconfig, title) @@ 
'''database'':*'::tsquery)
   -  Bitmap Index Scan on ii  (cost=0.00..40.89 rows=1185 width=0) (actual 
time=43.193..43.193 rows=6448 loops=1)
 Index Cond: (to_tsvector('simple'::regconfig, title) @@ 
'''database'':*'::tsquery)
 Total runtime: 127.576 ms
(5 rows)

Note the smaller estimated rowcount for the second example.  This is
patently ridiculous of course, since database must be included in
what matches database:*.

On investigation it appears that I made multiple logical errors in
commit 97532f7c29468010b87e40a04f8daa3eb097f654, which I have to admit
I didn't think about very hard because it seemed simple.  What the code
currently does for a prefix pattern is to add up the frequencies of MCVs
that match the prefix pattern (database is an MCV in this example),
as well as the total frequency of all MCVs, and then compute

selec = matched / allmcvs;

That looks correct if you don't stop to think, but it isn't.
It is equivalent to

selec = matched + (1 - allmcvs) * (matched / allmcvs);

that is, we're taking the matched frequency as-is, and then assuming
that matched / allmcvs is an appropriate selectivity estimate for the
non-MCV population.  But doing that overweights the more common MCVs.
What we should be doing, and what the comparable and better-tested code
in histogram_selectivity() actually does do, is weight each MCV equally;
there's no reason to assume that more-common MCVs are more
representative of the rest of the lexeme population than less-common
MCVs.  So the calculation should be more like

selec = matched + (1 - allmcvs) * (n_matched / n_mcvs);

The other problem with this math is that simply adding up the
frequencies is the wrong thing, because these aren't most common
*values*, they are most common *elements*.  Any given table row can
contain several different MCEs, so we shouldn't just add the
probabilities as if they were mutually-exclusive events.  We need
to treat them as independent events, so that the summing looks more
like

matched += t-frequency - matched * t-frequency;

(The reason my example comes out so obviously wrong is that allmcvs
actually sums to more than 1 without this correction, so that the
estimate becomes something less than the value of matched.)

Lastly, the code ends by clamping its estimate to be at least
DEFAULT_TS_MATCH_SEL:

/*
 * In any case, never believe that a prefix match has selectivity
 * less than DEFAULT_TS_MATCH_SEL.
 */
selec = Max(DEFAULT_TS_MATCH_SEL, selec);

On reflection, that seems like a horrid idea.  We should probably not
believe the selectivity is exactly zero if the pattern chances to match
none of the MCEs, but setting it equal to the default-for-no-statistics
is way too high.  I'm tempted to use DEFAULT_TS_MATCH_SEL/100 here, but
I wonder if anyone has an idea for a more principled minimum estimate.
In particular, does it make sense to consider the number of MCEs we
have, and/or the length of the pattern?  Having more MCEs seems to make
it less likely that we are underestimating the match frequency, and
longer patterns should match less stuff, too.  But I'm not sure what
to do with those intuitions.

Comments?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make 

Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-09-11 Thread Boszormenyi Zoltan

2012-09-11 17:58 keltezéssel, Alvaro Herrera írta:

Excerpts from Kohei KaiGai's message of mar sep 11 12:46:34 -0300 2012:

2012/9/11 Alvaro Herrera alvhe...@2ndquadrant.com:

Excerpts from Boszormenyi Zoltan's message of vie jun 29 09:11:23 -0400 2012:


We have some use cases for this patch, when can you post
a new version? I would test and review it.

What use cases do you have in mind?


I'm motivated with this feature to implement background calculation server
to handle accesses to GPU device; to avoid limitation of number of processes
that can use GPU device simultaneously.

Hmm, okay, so basically a worker would need a couple of LWLocks, a
shared memory area, and not much else?  Not a database connection.


Probably, other folks have their use cases.
For example, Zoltan introduced his use case in the upthread as follows:

- an SQL-driven scheduler, similar to pgAgent, it's generic enough,
   we might port it to this scheme and publish it

Hm, this would benefit from a direct backend connection to get the
schedule data (SPI interface I guess).


Indeed. And the advantage is that the scheduler's lifetime is exactly
the server's lifetime so there is no need to try reconnecting as soon
as the server goes away and wait until it comes back.


- a huge volume importer daemon, it was written for a very specific
   purpose and for a single client, we cannot publish it.

This one AFAIR requires more than one connection, so a direct data
connection is no good -- hence link libpq like walreceiver.


Yes.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Doc typo: lexems - lexemes

2012-09-11 Thread Dan Scott
I ran across a minor typo while reviewing the full-text search
documentation. Attached is a patch to address the one usage of lexems
in a sea of lexemes.

diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
new file mode 100644
index 978aa54..5305198
*** a/doc/src/sgml/textsearch.sgml
--- b/doc/src/sgml/textsearch.sgml
*** ts_rank(optional replaceable class=P
*** 867,873 
  
listitem
 para
! Ranks vectors based on the frequency of their matching lexems.
 /para
/listitem
   /varlistentry
--- 867,873 
  
listitem
 para
! Ranks vectors based on the frequency of their matching lexemes.
 /para
/listitem
   /varlistentry

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP fix proposal for bug #6123

2012-09-11 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 We discussed it to the point of consensus, and Tom wrote a patch
 to implement that.  Testing in my shop hit problems for which the
 cause was not obvious.  I don't know whether there is a flaw in
 the designed approach that we all missed, a simple programming bug
 of some sort in the patch, or pilot error on this end.
 
It's looking like the last of those.  The problem was in BEFORE
DELETE triggers which, for example, would check that there were the
expected child records (throwing an error if they were missing)
right before deleting them.  When the reissue the DELETE and then
RETURN NULL trick was used to avoid errors with this patch, the
trigger would fail the second time through.  Of course, such
triggers were more than a bit silly and clearly The Wrong Thing To
Do in general.  I don't know how widespread such practice is, but it
will need to be fixed where it exists in order to use the proposed
patch and related workaround for cascade delete-like triggers.
 
Before someone says that foreign keys should just be used, I want to
point out the -like above.  In my experience, for about every ten
cases where a declarative constraint like UNIQUE or FOREIGN KEY can
cover a business rule, there is about one that is logically very
similar to such a declarative constraint but just different enough
that it must be implemented in custom code.  Jeff Davis has been
improving that ratio, but I doubt that the issue will ever disappear
entirely.
 
At any rate, I think we might want to apply Tom's patch for this
while 9.3 is still early in development, to see what else might
shake out from it while there is still plenty of time to fix any
issues.  It's now looking good from my perspective.
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-09-11 Thread Bruce Momjian
On Tue, Sep 11, 2012 at 08:27:49AM +0200, Stefan Kaltenbrunner wrote:
 On 09/10/2012 05:19 PM, Bruce Momjian wrote:
  On Mon, Sep 10, 2012 at 12:06:18PM -0300, Alvaro Herrera wrote:
  It is this kind of run-around that caused me to generate my own doc
  build in the past;  maybe I need to return to doing my own doc build.
 
  You keep threatening with that.  You are free, of course, to do anything
  you want, and no one will break sweat about it.  I already said I will
  work on getting this up and running, but I can't give you a deadline for
  when it'll be working.
  
  My point is that this frequent doc build feature was removed with no
  discussion, and adding it seems to be some herculean job that requires
  red tape only a government worker would love.
 
 Not sure how you got that impression - but understand all requirements
 to something is usually key to implementing a solution, so discussing
 those requirements seems like a sensible thing to do.
 sysadmin is a volunteer effort and we do our best to deal with both
 keeping the existing infrastructure up and improving as we can but
 resources are limited and we need to consider the time/effort ration of
 stuff.
 Anyway alvaro clearly stated he would deal with it but obviously
 thatthat is not enough for your urgent demands so there is really not
 much we can do about it...

Don't know about urgent, but I made this request in May:

http://archives.postgresql.org/pgsql-hackers/2012-05/msg00480.php

and at a certain point, waiting four months and discussing it repeatedly
just isn't an efficient use of my time.

It only took me 15 minutes to implement.  I am guessing the complexity
of the Postgres infrastructure just makes the job much harder to
implement there.  

This is a good example of why some organizations like cloud services,
where they can host things without waiting for the item to get to the
top of the IT TODO list.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Cast Operator Precedence

2012-09-11 Thread David E. Wheeler
Hackers,

I found this surprising:

david=# CREATE DOMAIN STATUS AS INTEGER CHECK ( VALUE IN (1, 2, 3) );
CREATE DOMAIN

david=# select -4::status;
ERROR:  value for domain status violates check constraint status_check

david=# select -1::status;
 ?column? 
--
   -1
(1 row)

david=# select (-1)::status;
ERROR:  value for domain status violates check constraint status_check

So I guess the precedence of :: is higher than -?

Thanks,

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cast Operator Precedence

2012-09-11 Thread Tom Lane
David E. Wheeler da...@justatheory.com writes:
 So I guess the precedence of :: is higher than -?

Sure.  Otherwise, you might get the wrong semantics of -.

http://www.postgresql.org/docs/9.2/static/sql-syntax-lexical.html#SQL-PRECEDENCE

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Correction to comment regarding atomicity of an operation

2012-09-11 Thread Gurjeet Singh
This comment in UpdateFullPageWrites() seems to be inaccurate:

 * It's safe to check the shared full_page_writes without the lock,
 * because we assume that there is no concurrently running process which
 * can update it.

That assumption does not hold on any sane SMP system.

I think the real reason is that we assume that read/write to an integer is
atomic, like we do for TransactionId variables:

heapam.c: TransactionId read/write is assumed atomic anyway.

Best regards,

PS: As usual, I hope I am not missing something very obvious.
-- 
Gurjeet Singh

http://gurjeet.singh.im/


Re: [HACKERS] Doc typo: lexems - lexemes

2012-09-11 Thread Kevin Grittner
Dan Scott  wrote:
 I ran across a minor typo while reviewing the full-text search
 documentation. Attached is a patch to address the one usage of
 lexems in a sea of lexemes.
 
Applied to HEAD.
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Correction to comment regarding atomicity of an operation

2012-09-11 Thread Amit Kapila
On Wednesday, September 12, 2012 5:33 AM Gurjeet Singh wrote:

 

 

 This comment in UpdateFullPageWrites() seems to be inaccurate:

 * It's safe to check the shared full_page_writes without the lock,
 * because we assume that there is no concurrently running process
which
 * can update it.

 That assumption does not hold on any sane SMP system.

Do you able to see any case where it can be updated when being accessed
here.

 

With Regards,

Amit Kapila.



Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-09-11 Thread Amit Kapila
On Tuesday, September 11, 2012 9:09 PM Alvaro Herrera wrote:
Excerpts from Boszormenyi Zoltan's message of vie jun 29 09:11:23 -0400 2012:

 We have some use cases for this patch, when can you post
 a new version? I would test and review it.

 What use cases do you have in mind?

  Wouldn't it be helpful for some features like parallel query in future?

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about SSI, subxacts, and aborted read-only xacts

2012-09-11 Thread Dan Ports
On Mon, Sep 10, 2012 at 10:44:57PM -0700, Jeff Davis wrote:
 For the archives, and for those not following the paper in detail, there
 is one situation in which SSI will abort a read-only transaction.
 
 When there are three transactions forming a dangerous pattern where T1
 (read-only) has a conflict out to T2, and T2 has a conflict out to T3;
 and T3 is committed and T2 is prepared (for two-phase commit). In that
 situation, SSI can't roll back the committed or prepared transactions,
 so it must roll back the read-only transaction (T1).

This is true, but it isn't the only situation where a read-only
transaction can be rolled back -- this can happen even without
two-phase commit involved. 

You can have a situation where two read/write transactions T2 and T3
conflict such that T2 appears to have executed first in the serial
order, but T3 commits before T2. If there's a read-only transaction T1
that takes its snapshot between when T3 and T2 commit, it can't be
allowed to read the data that the other two transactions modified: it'd
see the changes made by T3 but not T2, violating the serial order.

Given a choice, we'd prevent this by aborting one of the read/write
transactions. But if they've both already committed by the time the
read-only transaction T1 does its reads, we'd have to abort it instead.

(Note that this is still an improvement over two-phase locking, which
wouldn't allow any of the transactions to execute concurrently!)


What I was getting at in my previous mail was that there aren't any
situations where COMMIT will return a serialization failure for
a read-only transaction.

Dan

-- 
Dan R. K. PortsUW CSEhttp://drkp.net/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-09-11 Thread Alvaro Herrera
Excerpts from Amit Kapila's message of mié sep 12 00:30:40 -0300 2012:
 On Tuesday, September 11, 2012 9:09 PM Alvaro Herrera wrote:
 Excerpts from Boszormenyi Zoltan's message of vie jun 29 09:11:23 -0400 2012:
 
  We have some use cases for this patch, when can you post
  a new version? I would test and review it.
 
  What use cases do you have in mind?
 
   Wouldn't it be helpful for some features like parallel query in future?

Maybe, maybe not -- but I don't think it's a wise idea to include too
much complexity just to support such a thing.  I would vote to leave
that out for now and just concentrate on getting external stuff working.
There are enough use cases that it's already looking nontrivial.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ossp-uuid Contrib Patch

2012-09-11 Thread David E. Wheeler
On Sep 10, 2012, at 6:01 PM, Peter Eisentraut pete...@gmx.net wrote:

 Well given that OSSP seems to be abandon ware (no activity since July
 2008), it might be time to dump it in favor of something else.
 
 Are there any outstanding issues that would require an update?

Many. Look at all the issues with the Subsys uuid here:

  http://cvs.ossp.org/rptview?rn=1order_by=7order_dir=ASC

Best,

David



smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] ossp-uuid Contrib Patch

2012-09-11 Thread David E. Wheeler
On Sep 10, 2012, at 6:05 PM, Peter Eisentraut pete...@gmx.net wrote:

 Yeah, maybe.  It doesn't even seem to be the standard implementation
 on Linux or Mac.  A bit of research says that Theodore Ts'o's libuuid
 is what comes native with the OS on those platforms.  No idea whether
 the functionality is equivalent, though.
 
 They have different interfaces that would also affect the exposed SQL
 interfaces.  We could provide two different extensions, wrapping each
 library.

Yes, I think the question becomes how much pg wants to depend on abandonware 
for its contrib extensions.

Best,

David



smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] Cast Operator Precedence

2012-09-11 Thread David E. Wheeler
On Sep 11, 2012, at 4:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Sure.  Otherwise, you might get the wrong semantics of -.
 
 http://www.postgresql.org/docs/9.2/static/sql-syntax-lexical.html#SQL-PRECEDENCE

Well, I guess that's what I get for writing test in literal SQL pushed through 
psql. Prepared statements FTW!

David




smime.p7s
Description: S/MIME cryptographic signature