Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Hi David, Sorry, but I do not completely understand your suggestions: 1. IMCS really contains a single patch file, sysv_shmem.patch. Applying this patch is not mandatory for using IMCS: it just solves the problem of supporting > 256GB of shared memory. Right now PostgreSQL is not able to use more than 256GB of shared buffers on Linux with standard 4kB pages. I found a proposal for using the MAP_HUGETLB flag in the commit fest: http://www.postgresql.org/message-id/20131125032920.ga23...@toroid.org but unfortunately it was rejected. Huge pages are used intensively by Oracle and I think they will be useful for improving the performance of PostgreSQL, so it is not just IMCS that can benefit from this patch. My patch is much simpler - I deliberately limited its scope to one file. Certainly, switching huge TLB pages on/off should be done through the postgresql.conf configuration file. In any case, IMCS can be used without this patch: you just cannot use more than 256GB of memory, even if your system has more RAM. 2. I do not understand "The add-on is not formatted as an EXTENSION". IMCS was created as a standard extension - I just looked at the examples of other PostgreSQL extensions included in the PostgreSQL distribution (for example pg_stat_statements). It can be added with the "create extension imcs" command and removed with "drop extension imcs". If there are some violations of the PostgreSQL extension rules, please let me know and I will fix them. But I thought I had done everything the standard way. On 01/04/2014 03:21 AM, David Fetter wrote: On Thu, Jan 02, 2014 at 08:48:24PM +0400, knizhnik wrote: I want to announce implementation of In-Memory Columnar Store extension for PostgreSQL. Vertical representation of data is stored in PostgreSQL shared memory. Thanks for the hard work! I noticed a couple of things about this that probably need some improvement. 1. There are unexplained patches against other parts of PostgreSQL, which means that they may break other parts of PostgreSQL in equally inexplicable ways. Please rearrange the patch so it doesn't require this. This leads to: 2. The add-on is not formatted as an EXTENSION, which would allow people to add it or remove it cleanly. Would you be so kind as to fix these? Cheers, David. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
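For reference, the Linux feature being discussed is the MAP_HUGETLB mmap() flag. A minimal, self-contained sketch of how a huge-page shared mapping is typically requested (purely illustrative; this is not the content of sysv_shmem.patch, and the size chosen is made up):

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stdio.h>

    int main(void)
    {
        /* Size must be a multiple of the huge page size (2MB on typical x86-64). */
        size_t size = (size_t) 1024 * 1024 * 1024;   /* 1GB */
        void *seg = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

        if (seg == MAP_FAILED)
        {
            /* Typically means vm.nr_hugepages has not been configured. */
            perror("mmap(MAP_HUGETLB)");
            return 1;
        }
        printf("mapped %zu bytes with huge pages at %p\n", size, seg);
        munmap(seg, size);
        return 0;
    }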
[HACKERS] cleanup in code
1. Compiling with MSVC shows a warning in relcache.c: 1>e:\workspace\postgresql\master\postgresql\src\backend\utils\cache\relcache.c(3959): warning C4715: 'RelationGetIndexAttrBitmap' : not all control paths return a value Attached is a patch, remove_msvc_warning.patch, to remove the above warning. 2. It seems option K is not used in pg_dump: while ((c = getopt_long(argc, argv, "abcCd:E:f:F:h:ij:K:n:N:oOp:RsS:t:T:U:vwWxZ:", long_options, &optindex)) != -1) I have checked both the docs and the code but didn't find any use of this option. Am I missing something here? Attached is a patch, remove_redundant_option_K_pgdump.patch, to remove this option from the code. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com remove_msvc_warning.patch Description: Binary data remove_redundant_option_K_pgdump.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Fri, Dec 13, 2013 at 4:06 PM, Tom Lane wrote: > BTW, so far as the syntax goes, I'm quite distressed by having to make > REJECTS into a fully-reserved word. It's not reserved according to the > standard, and it seems pretty likely to be something that apps might be > using as a table or column name. I've been looking at this, but I'm having a hard time figuring out a way to eliminate shift/reduce conflicts while not maintaining REJECTS as a fully reserved keyword - I'm pretty sure it's impossible with an LALR parser. I'm not totally enamored with the exact syntax proposed -- I appreciate the flexibility on the one hand, but on the other hand I suppose that REJECTS could just as easily be any number of other words. One possible compromise would be to use a synonym that is not imagined to be in use very widely, although I looked up "reject" in a thesaurus and didn't feel too great about that idea afterwards. Another idea would be to have a REJECTING keyword, as the sort of complement of RETURNING (currently you can still ask for RETURNING, without REJECTS but with ON DUPLICATE KEY LOCK FOR UPDATE if that happens to make sense). I think that would work fine, and might actually be more elegant. Now, REJECTING will probably have to be a reserved keyword, but that seems less problematic, particularly as RETURNING is itself a reserved keyword not described by the standard. In my opinion REJECTING would reinforce the notion of projecting the complement of what RETURNING would project in the same context. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] new json funcs
Here is a patch for the new json functions I mentioned a couple of months ago. These are: json_to_record json_to_recordset json_object json_build_array json_build_object json_object_agg So far there are no docs, but the way these work is illustrated in the regression tests - I hope to have docs within a few days. cheers andrew diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c index af8ddc6..9b9d11f 100644 --- a/src/backend/utils/adt/json.c +++ b/src/backend/utils/adt/json.c @@ -68,6 +68,11 @@ static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims, bool use_line_feeds); static void array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds); +static void +datum_to_json(Datum val, bool is_null, StringInfo result, + TYPCATEGORY tcategory, Oid typoutputfunc, bool key_scalar); +static void add_json(Datum orig_val, bool is_null, StringInfo result, + Oid val_type, bool key_scalar); /* the null action object used for pure validation */ static JsonSemAction nullSemAction = @@ -1217,7 +1222,7 @@ extract_mb_char(char *s) */ static void datum_to_json(Datum val, bool is_null, StringInfo result, - TYPCATEGORY tcategory, Oid typoutputfunc) + TYPCATEGORY tcategory, Oid typoutputfunc, bool key_scalar) { char *outputstr; text *jsontext; @@ -1239,23 +1244,31 @@ datum_to_json(Datum val, bool is_null, StringInfo result, composite_to_json(val, result, false); break; case TYPCATEGORY_BOOLEAN: - if (DatumGetBool(val)) -appendStringInfoString(result, "true"); + if (!key_scalar) +appendStringInfoString(result, DatumGetBool(val) ? "true" : "false"); else -appendStringInfoString(result, "false"); +escape_json(result, DatumGetBool(val) ? "true" : "false"); break; case TYPCATEGORY_NUMERIC: outputstr = OidOutputFunctionCall(typoutputfunc, val); - /* - * Don't call escape_json here if it's a valid JSON number. - */ - dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr; - dummy_lex.input_length = strlen(dummy_lex.input); - json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error); - if (! numeric_error) -appendStringInfoString(result, outputstr); - else + if (key_scalar) + { +/* always quote keys */ escape_json(result, outputstr); + } + else + { +/* + * Don't call escape_json for a non-key if it's a valid JSON number. + */ +dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr; +dummy_lex.input_length = strlen(dummy_lex.input); +json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error); +if (! 
numeric_error) + appendStringInfoString(result, outputstr); +else + escape_json(result, outputstr); + } pfree(outputstr); break; case TYPCATEGORY_JSON: @@ -1273,6 +1286,10 @@ datum_to_json(Datum val, bool is_null, StringInfo result, break; default: outputstr = OidOutputFunctionCall(typoutputfunc, val); + if (key_scalar && *outputstr == '\0') +ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("key value must not be empty"))); escape_json(result, outputstr); pfree(outputstr); break; @@ -1306,7 +1323,7 @@ array_dim_to_json(StringInfo result, int dim, int ndims, int *dims, Datum *vals, if (dim + 1 == ndims) { datum_to_json(vals[*valcount], nulls[*valcount], result, tcategory, - typoutputfunc); + typoutputfunc, false); (*valcount)++; } else @@ -1487,13 +1504,85 @@ composite_to_json(Datum composite, StringInfo result, bool use_line_feeds) else tcategory = TypeCategory(tupdesc->attrs[i]->atttypid); - datum_to_json(val, isnull, result, tcategory, typoutput); + datum_to_json(val, isnull, result, tcategory, typoutput, false); } appendStringInfoChar(result, '}'); ReleaseTupleDesc(tupdesc); } +static void +add_json(Datum orig_val, bool is_null, StringInfo result, Oid val_type, bool key_scalar) +{ +Datum val; + TYPCATEGORY tcategory; + Oid typoutput; + bool typisvarlena; + Oid castfunc = InvalidOid; + +if (val_type == InvalidOid) +ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("could not determine input data type"))); + + + getTypeOutputInfo(val_type, &typoutput, &typisvarlena); + + if (val_type > FirstNormalObjectId) + { + HeapTuple tuple; + Form_pg_cast castForm; + + tuple = SearchSysCache2(CASTSOURCETARGET, + ObjectIdGetDatum(val_type), + ObjectIdGetDatum(JSONOID)); + if (HeapTupleIsValid(tuple)) + { +castForm = (Form_pg_cast) GETSTRUCT(tuple); + +if (castForm->castmethod == COERCION_METHOD_FUNCTION) + castfunc = typoutput = castForm->castfunc; + +ReleaseSysCache(tuple); + } + } + + if (castfunc != InvalidOid) + tcategory = TYPCATEGORY_JSON_CAST; + else if (val_type == RECORDARRAYOID) + tcategory =
[HACKERS] [PATCH] Support for pg_stat_archiver view
Hello, please find attached the patch that adds basic support for the pg_stat_archiver system view, which allows users that have continuous archiving procedures in place to keep track of some important metrics and information. Currently, pg_stat_archiver displays: * archived_wals: number of successfully archived WAL files since start (or the last reset) * last_archived_wal: last successfully archived WAL file * last_archived_wal_time: timestamp of the latest successful WAL archival * stats_reset: time of last stats reset This is an example of output: postgres=# select * from pg_stat_archiver ; -[ RECORD 1 ]--+-- archived_wals | 1 last_archived_wal | 00010001 last_archived_wal_time | 2014-01-04 01:01:08.858648+01 stats_reset| 2014-01-04 00:59:25.895034+01 Similarly to pg_stat_bgwriter, it is possible to reset statistics just for this context, calling the pg_stat_reset_shared('archiver') function. The patch is here for discussion and has been prepared against HEAD. It includes also changes in the documentation and the rules.out test. I plan to add further information to the pg_stat_archiver view, including the number of failed attempts of archival and the WAL and timestamp of the latest failure. However, before proceeding, I'd like to get some feedback on this small patch as well as advice on possible regression tests to be added. Thank you. Cheers, Gabriele -- Gabriele Bartolini - 2ndQuadrant Italia PostgreSQL Training, Services and Support gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 4ec6981..6d45972 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -270,6 +270,14 @@ postgres: user database host + pg_stat_archiverpg_stat_archiver + One row only, showing statistics about the + WAL archiver process's activity. See +for details. + + + + pg_stat_bgwriterpg_stat_bgwriter One row only, showing statistics about the background writer process's activity. See @@ -648,6 +656,49 @@ postgres: user database host + + pg_stat_archiver View + + + + + Column + Type + Description + + + + + + archived_wals + bigint + Number of WAL files that have been successfully archived + + + last_archived_wal + text + Name of the last successfully archived WAL file + + + last_archived_wal_time + timestamp with time zone + Time of the last successful archival operation + + + stats_reset + timestamp with time zone + Time at which these statistics were last reset + + + + + + + The pg_stat_archiver view will always have a + single row, containing data about the archiver process of the cluster. + + + pg_stat_bgwriter View @@ -1613,6 +1664,8 @@ postgres: user database host pg_stat_reset_shared('bgwriter') will zero all the counters shown in the pg_stat_bgwriter view. + Calling pg_stat_reset_shared('archiver') will zero all the + counters shown in the pg_stat_archiver view. 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 575a40f..3a8d7b4 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -672,6 +672,13 @@ CREATE VIEW pg_stat_xact_user_functions AS WHERE P.prolang != 12 -- fast check to eliminate built-in functions AND pg_stat_get_xact_function_calls(P.oid) IS NOT NULL; +CREATE VIEW pg_stat_archiver AS +SELECT +pg_stat_get_archiver_archived_wals() AS archived_wals, +pg_stat_get_archiver_last_archived_wal() AS last_archived_wal, +pg_stat_get_archiver_last_archived_wal_time() AS last_archived_wal_time, +pg_stat_get_archiver_stat_reset_time() AS stats_reset; + CREATE VIEW pg_stat_bgwriter AS SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 2bb572e..3cb1ddc 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -36,6 +36,7 @@ #include "access/xlog_internal.h" #include "libpq/pqsignal.h" #include "miscadmin.h" +#include "pgstat.h" #include "postmaster/fork_process.h" #include "postmaster/pgarch.h" #include "postmaster/postmaster.h" @@ -46,6 +47,7 @@ #include "storage/pmsignal.h" #include "utils/guc.h" #include "utils/ps_status.h" +#include "utils/timestamp.h" /* -- @@ -755,4 +757,12 @@ pgarch_archiveDone(char *xlog) (errcode_for_file_access(), errmsg("could not rename file \"%s\" to \"%s\": %m", rlogready, rlogdone
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
On Thu, Jan 02, 2014 at 08:48:24PM +0400, knizhnik wrote: > I want to announce implementation of In-Memory Columnar Store > extension for PostgreSQL. > Vertical representation of data is stored in PostgreSQL shared memory. Thanks for the hard work! I noticed a couple of things about this that probably need some improvement. 1. There are unexplained patches against other parts of PostgreSQL, which means that they may break other parts of PostgreSQL in equally inexplicable ways. Please rearrange the patch so it doesn't require this. This leads to: 2. The add-on is not formatted as an EXTENSION, which would allow people to add it or remove it cleanly. Would you be so kind as to fix these? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Fri, Jan 3, 2014 at 7:39 AM, Peter Eisentraut wrote: > This patch doesn't apply anymore. Yes, there was some bit-rot. I previously deferred dealing with a shift/reduce conflict implied by commit 1b4f7f93b4693858cb983af3cd557f6097dab67b. I've fixed that problem now using non operator precedence, and performed a clean rebase on master. I've also fixed the basis of your much earlier complaint about breakage of ecpg's regression tests (without adding support for the feature to ecpg). All make check-world tests pass. Patch is attached. I have yet to figure out how to make REJECTS a non-reserved keyword, or even just a type_func_name_keyword, though intuitively I have a sense that the latter ought to be possible. This is the same basic patch as benchmarked above, with various tricks to avoid stronger lock acquisition when that's likely profitable (we can even do _bt_check_unique() with only a shared lock and no hwlock much of the time, on the well-informed suspicion that it won't be necessary to insert, but only to return a TID). There has also been some clean-up to aspects of serializable behavior, but that needs further attention and scrutiny from a subject matter expert, hopefully Heikki. Though it's probably also true that I should find time to think about transaction isolation some more. I've since had another idea relating to performance optimization, which was to hint that the last attempt to insert a key was unsuccessful, so the next one (after the conflicting transaction's commit/abort) of that same value will very likely conflict too, making lock avoidance profitable on average. This appears to be much more effective than the previous woolly heuristic (never published, just benchmarked), which I've left in as an additional reason to avoid heavyweight locking, if only for discussion. This benchmark now shows my approach winning convincingly with this additional "priorConflict" optimization: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/upsert-cmp-2/ If someone had time to independently recreate the benchmark I have here, or perhaps to benchmark the patch in some other way, that would be useful (for full details see my recent e-mail about the prior benchmark, where the exact details are described - this is the same, but with one more run for the priorConflict optimization). Subtleties of visibility also obviously deserve closer inspection, but perhaps I shouldn't be so hasty: No consensus on the way forward looks even close to emerging. How do people feel about my approach now? -- Peter Geoghegan btreelock_insert_on_dup.v6.2014_01_03.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] costing of hash join
Jeff Janes writes: > I'm trying to figure out why hash joins seem to be systematically underused > in my hands. In the case I am immediately looking at it prefers a merge > join with both inputs getting seq scanned and sorted, despite the hash join > being actually 2 to 3 times faster, where inputs and intermediate working > sets are all in memory. I normally wouldn't worry about a factor of 3 > error, but I see this a lot in many different situations. The row > estimates are very close to actual, the errors is only in the cpu estimates. Can you produce a test case for other people to look at? What datatype(s) are the join keys? > A hash join is charged cpu_tuple_cost for each inner tuple for inserting it > into the hash table: Doesn't seem like monkeying with that is going to account for a 3x error. Have you tried using perf or oprofile or similar to see where the time is actually, rather than theoretically, going? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] costing of hash join
I'm trying to figure out why hash joins seem to be systematically underused in my hands. In the case I am immediately looking at, it prefers a merge join with both inputs getting seq scanned and sorted, despite the hash join being actually 2 to 3 times faster, where inputs and intermediate working sets are all in memory. I normally wouldn't worry about a factor of 3 error, but I see this a lot in many different situations. The row estimates are very close to actual; the error is only in the cpu estimates. A hash join is charged cpu_tuple_cost for each inner tuple for inserting it into the hash table: * charge one cpu_operator_cost for each column's hash function. Also, * tack on one cpu_tuple_cost per inner row, to model the costs of * inserting the row into the hashtable. But a sort is not charged a similar amount for inserting a tuple into the sort memory pool: * Also charge a small amount (arbitrarily set equal to operator cost) per * extracted tuple. We don't charge cpu_tuple_cost because a Sort node * doesn't do qual-checking or projection, so it has less overhead than * most plan nodes. Note it's correct to use tuples not output_tuples Are these operations different enough to justify this difference? The qual-checking (and I think projection) needed on a hash join should have already been performed by and costed to the seq scan feeding the hash join, right? Cheers, Jeff
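For a rough sense of the scale involved: with the default settings cpu_tuple_cost = 0.01 and cpu_operator_cost = 0.0025, loading 1 million inner rows with a single join key into the hash table is charged 1,000,000 x (0.01 + 0.0025) = 12,500 cost units, while feeding the same 1 million tuples through a sort is charged only 1,000,000 x 0.0025 = 2,500 units for the corresponding per-tuple component (the sort's comparison costs are added separately). The asymmetry in this one component alone is therefore about a factor of five.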
Re: [HACKERS] Streaming replication bug in 9.3.2, "WAL contains references to invalid pages"
We had the same issues running 9.2.4: [2013-10-15 00:23:01 GMT/0/15396] WARNING: page 8789807 of relation base/16429/2349631976 is uninitialized [2013-10-15 00:23:01 GMT/0/15396] CONTEXT: xlog redo vacuum: rel 1663/16429/2349631976; blk 8858544, lastBlockVacuumed 0 [2013-10-15 00:23:01 GMT/0/15396] PANIC: WAL contains references to invalid pages [2013-10-15 00:23:01 GMT/0/15396] CONTEXT: xlog redo vacuum: rel 1663/16429/2349631976; blk 8858544, lastBlockVacuumed 0 [2013-10-15 00:23:11 GMT/0/15393] LOG: startup process (PID 15396) was terminated by signal 6: Aborted [2013-10-15 00:23:11 GMT/0/15393] LOG: terminating any other active server processes Also on an index. I ended up manually patching the heap files at that block location to "fix" the problem. It happened again about 2 weeks after that, then never again. It hit all connected secondaries. On Fri, Jan 3, 2014 at 12:50 PM, Sergey Konoplev wrote: > On Thu, Jan 2, 2014 at 11:59 AM, Christophe Pettus wrote: >> In both cases, the indicated relation was a primary key index. In one case, >> rebuilding the primary key index caused the problem to go away permanently >> (to date). In the second case, the problem returned even after a full dump >> / restore of the master database (that is, after a dump / restore of the >> master, and reimaging the secondary, the problem returned at the same >> primary key index, although of course with a different OID value). >> >> It looks like this has been experienced on 9.2.6, as well: >> >> >> http://www.postgresql.org/message-id/flat/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jxe6v4r7h...@mail.gmail.com > > This problem worries me a lot too. If someone is interested I still > have a file system copy of the buggy cluster including WAL. > > -- > Kind regards, > Sergey Konoplev > PostgreSQL Consultant and DBA > > http://www.linkedin.com/in/grayhemp > +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 > gray...@gmail.com > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication bug in 9.3.2, "WAL contains references to invalid pages"
On Thu, Jan 2, 2014 at 11:59 AM, Christophe Pettus wrote: > In both cases, the indicated relation was a primary key index. In one case, > rebuilding the primary key index caused the problem to go away permanently > (to date). In the second case, the problem returned even after a full dump / > restore of the master database (that is, after a dump / restore of the > master, and reimaging the secondary, the problem returned at the same primary > key index, although of course with a different OID value). > > It looks like this has been experienced on 9.2.6, as well: > > > http://www.postgresql.org/message-id/flat/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jxe6v4r7h...@mail.gmail.com This problem worries me a lot too. If someone is interested I still have a file system copy of the buggy cluster including WAL. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ISN extension bug? (with patch)
If so, there is only the one-liner patch to consider. This patch doesn't apply anymore. Please submit an updated patch for the commit fest. In src/include/utils/elog.h there is an include for "utils/errcodes.h" which is generated somehow when compiling postgresql but not present by default. So you have to compile postgresql and then the contrib, or use PGXS with an already installed version. With this caveat, the one-liner patch (4 characters removed) reattached does compile for me: sh> git branch ismn2 sh> git checkout ismn2 sh> patch -p1 < ~/ismn-checksum.patch patching file contrib/isn/isn.c sh> ... sh> cd contrib/isn sh> make gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -fpic -I. -I. -I../../src/include -D_GNU_SOURCE -c -o isn.o isn.c gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -fpic -L../../src/port -L../../src/common -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags -shared -o isn.so isn.o sh> -- Fabiendiff --git a/contrib/isn/isn.c b/contrib/isn/isn.c index 3db6b84..c4e2333 100644 --- a/contrib/isn/isn.c +++ b/contrib/isn/isn.c @@ -827,7 +827,7 @@ string2ean(const char *str, bool errorOK, ean13 *result, case ISMN: strncpy(buf, "9790", 4); /* this isn't for sure yet, for now * ISMN it's only 9790 */ - valid = (valid && ((rcheck = checkdig(buf + 3, 10)) == check || magic)); + valid = (valid && ((rcheck = checkdig(buf, 13)) == check || magic)); break; case ISBN: strncpy(buf, "978", 3); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Async query processing
On Fri, Jan 3, 2014 at 11:06 AM, Claudio Freire wrote: > On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane wrote: >> Claudio Freire writes: >>> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer wrote: Loading data into the database isn't such an uncommon task. Not everything is OLTP. >> >>> Truly, but a sustained insert stream of 10 Mbps is certainly way >>> beyond common non-OLTP loads. This is far more specific than non-OLTP. >> >> I think Florian has a good point there, and the reason is this: what >> you are talking about will be of exactly zero use to applications that >> want to see the results of one query before launching the next. Which >> eliminates a whole lot of apps. I suspect that almost the *only* >> common use case in which a stream of queries can be launched without >> feedback is going to be bulk data loading. It's not clear at all >> that pipelining the PQexec code path is the way to better performance >> for that --- why not use COPY, instead? > > You're forgetting ORM workloads. > > ORMs can usually plan the inserts to be in a sequence that both don't > require feedback (except the knowledge that they were successful), and > that do not violate constraints. > > Flushing a whole object hierarchy for instance, can be done without > feedback. Not even serial columns need feedback, since many ORMs > (SQLAlchemy, Hibernate) support allocation of ID sequences in batches > (by issuing a proper select nextval). > > I agree, that with the proposed API, it's too error prone to be > useful. But I also think, if the API is simple and fool-proof enough, > it could be "build them and they will come". I know I'll be happy to > implement support for SQLAlchemy (since it will benefit me), if the > API resembles the proposition below (at least in simplicity). > > Per-query expectations could be such a thing. And it can even work with > PQexec: > > PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);"); > --read-- > PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);"); > --read-- > PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > ... 9 times... > PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC); > do { >// do something useful > } while (PQflush()); > > Here, the PQASYNC flag would temporarily switch to non-blocking I/O, > and buffer what cannot be sent. PQASNC_CORK, would only buffer (only > send if the buffer is full). After any ASYNC call, PQflush would be > necessary (to flush the send queue and to consume the expected > responses), but I can imagine any synchronous call (PQexec, > PQsendQuery or whatever) could detect a non-empty buffer and just > blockingly flush right there. > > This can benefit many useful patterns. ORM flush, is one, if there can > be preallocation of IDs (which I know at least SQLAlchemy and > Hibernate both support). > > Execute-many of prepared statements is another one, quite common. > > I'm not sure what would happen if one of the queries returned an > error. If in a transaction, all the following queries would error out > I'd imagine. 
If not, they would simply be executed blindly.. am I correct? Long term, I'd rather see an optimized 'ORM flush' assemble the data into a structured data set (perhaps a JSON document) and pass it to some receiving routine that decomposed it into records. This is a better way to do things on so many levels. Maybe I'm an old cranky guy yelling at pigeons, but I don't think the current approach that many ORMs take is going to withstand the test of time. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Async query processing
On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane wrote: > Claudio Freire writes: >> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer wrote: >>> Loading data into the database isn't such an uncommon task. Not everything >>> is OLTP. > >> Truly, but a sustained insert stream of 10 Mbps is certainly way >> beyond common non-OLTP loads. This is far more specific than non-OLTP. > > I think Florian has a good point there, and the reason is this: what > you are talking about will be of exactly zero use to applications that > want to see the results of one query before launching the next. Which > eliminates a whole lot of apps. I suspect that almost the *only* > common use case in which a stream of queries can be launched without > feedback is going to be bulk data loading. It's not clear at all > that pipelining the PQexec code path is the way to better performance > for that --- why not use COPY, instead? You're forgetting ORM workloads. ORMs can usually plan the inserts to be in a sequence that both don't require feedback (except the knowledge that they were successful), and that do not violate constraints. Flushing a whole object hierarchy for instance, can be done without feedback. Not even serial columns need feedback, since many ORMs (SQLAlchemy, Hibernate) support allocation of ID sequences in batches (by issuing a proper select nextval). I agree, that with the proposed API, it's too error prone to be useful. But I also think, if the API is simple and fool-proof enough, it could be "build them and they will come". I know I'll be happy to implement support for SQLAlchemy (since it will benefit me), if the API resembles the proposition below (at least in simplicity). Per-query expectations could be such a thing. And it can even work with PQexec: PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);"); --read-- PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);"); --read-- PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); ... 9 times... PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC); do { // do something useful } while (PQflush()); Here, the PQASYNC flag would temporarily switch to non-blocking I/O, and buffer what cannot be sent. PQASNC_CORK, would only buffer (only send if the buffer is full). After any ASYNC call, PQflush would be necessary (to flush the send queue and to consume the expected responses), but I can imagine any synchronous call (PQexec, PQsendQuery or whatever) could detect a non-empty buffer and just blockingly flush right there. This can benefit many useful patterns. ORM flush, is one, if there can be preallocation of IDs (which I know at least SQLAlchemy and Hibernate both support). Execute-many of prepared statements is another one, quite common. I'm not sure what would happen if one of the queries returned an error. If in a transaction, all the following queries would error out I'd imagine. If not, they would simply be executed blindly.. am I correct? 
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
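For context, libpq's existing asynchronous API - which the PQASYNC/PQASYNC_CORK flags sketched above would extend - still processes one query at a time. A minimal sketch of the current pattern (connection string and table are placeholders; a real client would wait on the socket with select/poll instead of spinning):

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=postgres");
        PGresult *res;

        if (PQstatus(conn) != CONNECTION_OK ||
            PQsendQuery(conn, "INSERT INTO a VALUES (1)") == 0)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        while (PQisBusy(conn))              /* normally: wait on PQsocket(conn) first */
            if (PQconsumeInput(conn) == 0)
                break;                      /* connection trouble */

        while ((res = PQgetResult(conn)) != NULL)   /* NULL => query fully processed */
        {
            if (PQresultStatus(res) != PGRES_COMMAND_OK)
                fprintf(stderr, "%s", PQerrorMessage(conn));
            PQclear(res);
        }
        /* Only now can the next PQsendQuery() be issued: one round trip per query. */
        PQfinish(conn);
        return 0;
    }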
Re: [HACKERS] RFC: Async query processing
On Fri, Jan 3, 2014 at 9:46 AM, Florian Weimer wrote: > On 01/03/2014 04:20 PM, Tom Lane wrote: > >> I think Florian has a good point there, and the reason is this: what >> you are talking about will be of exactly zero use to applications that >> want to see the results of one query before launching the next. Which >> eliminates a whole lot of apps. I suspect that almost the *only* >> common use case in which a stream of queries can be launched without >> feedback is going to be bulk data loading. It's not clear at all >> that pipelining the PQexec code path is the way to better performance >> for that --- why not use COPY, instead? > > > The data I encounter has to be distributed across multiple tables. Switching > between the COPY TO commands would again need client-side buffering and > heuristics for sizing these buffers. Lengths of runs vary a lot in my case. > > I also want to use binary mode as a far as possible to avoid the integer > conversion overhead, but some columns use custom enum types and are better > transferred in text mode. > > Some INSERTs happen via stored procedures, to implement de-duplication. > > These issues could be addressed by using temporary staging tables. However, > when I did that in the past, this caused pg_shdepend bloat. Carefully > reusing them when possible might avoid that. Again, due to the variance in > lengths of runs, the staging tables are not always beneficial. > > I understand that pipelining introduces complexity. But solving the issues > described above is no picnic, either. Maybe consider using libpqtypes (http://libpqtypes.esilo.com/)? It transfers most everything in binary (enums notably are handled as strings). A typical usage of libpqtypes would be to arrange multiple records into an array on the client then hand them off to a stored procedure on the server side (perhaps over an asynchronous call while you assemble the next batch). libpqtypes was written for C applications with very high performance requirements (for non performance critical cases we might use json instead). In my experience it's not too difficult to arrange an assembly/push loop that amortizes the round trip overhead to zero; it's not as efficient as COPY but much more flexible and will blow away any scheme that sends data row per query. I agree with Tom that major changes to the libpq network stack is probably not a good idea. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
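A rough sketch of the batch-to-stored-procedure pattern described above, using plain libpq text-mode parameters rather than libpqtypes (the load_items function and the array contents are hypothetical; a real server-side routine would decompose the array into rows):

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres");
        /* One round trip hands a whole batch to a server-side function,
         * e.g. CREATE FUNCTION load_items(int4[]) RETURNS void ... */
        const char *batch = "{1,2,3,4,5,6,7,8,9,10}";
        const char *params[1] = { batch };
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }
        res = PQexecParams(conn, "SELECT load_items($1::int4[])",
                           1, NULL, params, NULL, NULL, 0);
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "%s", PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 0;
    }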
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Michael Paquier escribió: > Hi all, > > Please find attached updated patches for the support of REINDEX > CONCURRENTLY, renamed 2.0 for the occasion: > - 20131114_1_index_drop_comments.patch, patch that updates some > comments in index_drop. This updates only a couple of comments in > index_drop but has not been committed yet. It should be IMO... Pushed this one, thanks. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
On 12/25/13, 6:40 AM, MauMau wrote: > pg_regress must wait for postgres to terminate by calling waitpid(), > because it invoked postgres directly. The attached > pg_regress_pg_stop.patch does this. If you like the combination of this > and the original fix for pg_ctl in one patch, please use > pg_stop_fail_v3.patch. This patch doesn't apply. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Doc fix for VACUUM FREEZE
On 12/17/13, 8:16 PM, Maciek Sakrejda wrote: > (now with patch--sorry about that) This patch doesn't apply. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Async query processing
On 01/03/2014 04:20 PM, Tom Lane wrote: I think Florian has a good point there, and the reason is this: what you are talking about will be of exactly zero use to applications that want to see the results of one query before launching the next. Which eliminates a whole lot of apps. I suspect that almost the *only* common use case in which a stream of queries can be launched without feedback is going to be bulk data loading. It's not clear at all that pipelining the PQexec code path is the way to better performance for that --- why not use COPY, instead? The data I encounter has to be distributed across multiple tables. Switching between the COPY TO commands would again need client-side buffering and heuristics for sizing these buffers. Lengths of runs vary a lot in my case. I also want to use binary mode as far as possible to avoid the integer conversion overhead, but some columns use custom enum types and are better transferred in text mode. Some INSERTs happen via stored procedures, to implement de-duplication. These issues could be addressed by using temporary staging tables. However, when I did that in the past, this caused pg_shdepend bloat. Carefully reusing them when possible might avoid that. Again, due to the variance in lengths of runs, the staging tables are not always beneficial. I understand that pipelining introduces complexity. But solving the issues described above is no picnic, either. -- Florian Weimer / Red Hat Product Security Team -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
This patch doesn't apply anymore. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ISN extension bug? (with patch)
On 12/24/13, 10:29 AM, Fabien COELHO wrote: > >> On 12/22/13, 2:36 AM, Fabien COELHO wrote: >>> I'm not sure whether the policy is to update the version number of the >>> extension for such a change. As the library is always "isn.so", two >>> versions cannot live in parallel anyway. If it is useful, the second >>> patch attached also upgrade the version number. >> >> If you are not changing anything in the SQL, then you don't need to >> change the version number. > > Ok, thanks for the information. I understand that the version number is > about the API, not the implementation. > > If so, there is only the one-liner patch to consider. This patch doesn't apply anymore. Please submit an updated patch for the commit fest. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Async query processing
Claudio Freire writes: > On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer wrote: >> Loading data into the database isn't such an uncommon task. Not everything >> is OLTP. > Truly, but a sustained insert stream of 10 Mbps is certainly way > beyond common non-OLTP loads. This is far more specific than non-OLTP. I think Florian has a good point there, and the reason is this: what you are talking about will be of exactly zero use to applications that want to see the results of one query before launching the next. Which eliminates a whole lot of apps. I suspect that almost the *only* common use case in which a stream of queries can be launched without feedback is going to be bulk data loading. It's not clear at all that pipelining the PQexec code path is the way to better performance for that --- why not use COPY, instead? Or to put it another way, I don't subscribe to "if you build it they will come" for this proposed feature. I think that making any use of it would be so complex and error-prone that the vast majority of apps won't bother. Before we start adding a lot of complexity to libpq's API and internals to support this, you need to make a better case that there would be a significant number of users. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
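For reference, the COPY path suggested here can be driven from libpq roughly as follows (a minimal text-mode sketch; target_table and the generated rows are placeholders):

    #include <stdio.h>
    #include <string.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=postgres");
        PGresult *res = PQexec(conn, "COPY target_table (id, val) FROM STDIN");

        if (PQresultStatus(res) != PGRES_COPY_IN)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }
        PQclear(res);

        for (int i = 0; i < 1000; i++)
        {
            char row[64];
            int  len = snprintf(row, sizeof(row), "%d\tvalue %d\n", i, i);

            /* Rows are streamed without waiting for per-row acknowledgements. */
            if (PQputCopyData(conn, row, len) != 1)
                break;
        }
        PQputCopyEnd(conn, NULL);        /* NULL = no error, finish the COPY */
        res = PQgetResult(conn);         /* final status of the COPY command */
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "%s", PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 0;
    }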
Re: [HACKERS] Changeset Extraction Interfaces
On 2013-12-12 10:01:21 -0500, Robert Haas wrote: > On Thu, Dec 12, 2013 at 7:04 AM, Andres Freund wrote: > > I think there'll always be a bit of a difference between slots for > > physical and logical data, even if 90% of the implementation is the > > same. We can signal that difference by specifying logical/physical as an > > option or having two different sets of commands. > > > > Maybe? > > > > ACQUIRE_REPLICATION_SLOT slot_name PHYSICAL physical_opts > > ACQUIRE_REPLICATION_SLOT slot_name LOGICAL logical_opts > > -- already exists without slot, PHYSICAL arguments > > START_REPLICATION [SLOT slot] [PHYSICAL] RECPTR opt_timeline > > START_REPLICATION SLOT LOGICAL slot plugin_options > > RELEASE_REPLICATION_SLOT slot_name > > I assume you meant START_REPLICATION SLOT slot LOGICAL plugin_options, > but basically this seems OK to me. When writing the code for this, I decided that I need to reneg a bit on those names - they don't work nicely enough on the C level for me. Specifically during a START_REPLICATION we need to temporarily mark the slot as being actively used and mark it unused again afterwards. That's much more Acquire/Release like than the persistent Acquire/Release above for me. The C names in the version I am working on currently are: extern void ReplicationSlotCreate(const char *name); extern void ReplicationSlotDrop(const char *name); extern void ReplicationSlotAcquire(const char *name); extern void ReplicationSlotRelease(void); extern void ReplicationSlotSave(void); which would make the walsender ones CREATE_REPLICATION_SLOT ... START_REPLICATION [SLOT slot] [LOGICAL | PHYSICAL] ... DROP_REPLICATION_SLOT ... where START_REPLICATION internally does acquire/release on the passed SLOT. Does that work for you? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] truncating pg_multixact/members
Hi, On 2014-01-03 11:11:13 -0300, Alvaro Herrera wrote: > > Yeah. Since we expect mxids to be composed at a much lower rate than > > xids, we can keep pg_multixact small without needing to increase the > > rate of full table scans. I don't think that's necessarily true - there have been several pg_controldata outputs posted lately which had more multis used than xids. In workloads using explicit row locking or heavily used FKs that's not that surprising. > > However, it seems to me that we ought to > > have GUCs for mxid_freeze_table_age and mxid_freeze_min_age. There's > > no principled way to derive those values from the corresponding values > > for XIDs, and I can't see any reason to suppose that we know how to > > auto-tune brand new values better than we know how to auto-tune their > > XID equivalents that we've had for years. > > > > One million is probably a reasonable default for mxid_freeze_min_age, > > though. I think setting mxid_freeze_min_age to something lower is fair game, I'd even start at 100k or so. What I think is important is that we do *not* set mxid_freeze_table_age to something very low. People justifiably hate anti-wraparound vacuums. What's your thought about the autovacuum_freeze_max_age equivalent? I am not sure about introducing new GUCs in the back branches; I don't have a problem with it, but I am also not sure it's necessary. Fixing members wraparound into itself seems more important, and once we trigger vacuums via that it doesn't seem to be too important to have low settings. > Also, what would be good names? Peter E. complained recently about the > word MultiXactId being exposed in some error messages; maybe "mxid" is > too short an abbreviation of that. Perhaps > multixactid_freeze_min_age = 1 million > multixactid_freeze_table_age = 3 million > ? I personally am fine with mxid - we use xid in other settings after all. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] truncating pg_multixact/members
Robert Haas escribió: > On Mon, Dec 30, 2013 at 10:59 PM, Alvaro Herrera > wrote: > > One problem I see is length of time before freezing multis: they live > > for far too long, causing the SLRU files to eat way too much disk space. > > I ran burnmulti in a loop, creating multis of 3 members each, with a min > > freeze age of 50 million, and this leads to ~770 files in > > pg_multixact/offsets and ~2900 files in pg_multixact/members. Each file > > is 32 pages long. 256kB apiece. Probably enough to be bothersome. > > > > I think for computing the freezing point for multis, we should slash > > min_freeze_age by 10 or something like that. Or just set a hardcoded > > one million. > > Yeah. Since we expect mxids to be composed at a much lower rate than > xids, we can keep pg_multixact small without needing to increase the > rate of full table scans. However, it seems to me that we ought to > have GUCs for mxid_freeze_table_age and mxid_freeze_min_age. There's > no principled way to derive those values from the corresponding values > for XIDs, and I can't see any reason to suppose that we know how to > auto-tune brand new values better than we know how to auto-tune their > XID equivalents that we've had for years. > > One million is probably a reasonable default for mxid_freeze_min_age, though. I didn't want to propose having new GUCs, but if there's no love for my idea of deriving it from the Xid freeze policy, I guess it's the only solution. Just keep in mind we will need to back-patch these new GUCs to 9.3. Are there objections to this? Also, what would be good names? Peter E. complained recently about the word MultiXactId being exposed in some error messages; maybe "mxid" is too short an abbreviation of that. Perhaps multixactid_freeze_min_age = 1 million multixactid_freeze_table_age = 3 million ? I imagine this stuff would be described somewhere in the docs, perhaps within the "routine maintenance" section somewhere. FWIW the idea of having a glossary sounds good to me. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
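To put the figures quoted above in perspective: each SLRU segment file is 32 pages of 8kB, i.e. 256kB, so ~770 offsets segments plus ~2900 members segments amount to roughly 3670 x 256kB, a bit over 900MB of pg_multixact for that run - which gives an idea of how much disk space a lower multixact freeze age could save.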
Re: [HACKERS] Add CREATE support to event triggers
Robert Haas escribió: > The other thing that bothers me here is that, while a normalized > command string sounds great in theory, as soon as you want to allow > (for example) mapping schema A on node 1 to schema B on node 2, the > wheels come off: you'll have to deparse that normalized command string > so you can change out the schema name and then reassemble it back into > a command string again. So we're going to parse the user input, then > deparse it, hand over the results to the application code, which will > then parse it, modify that, and deparse it again. I have considered several ideas on this front, but most of them turn out to be useless or too cumbersome to use. What seems most adequate is to build a command string containing certain patterns, and an array of replacement values for such patterns; each pattern corresponds to one element that somebody might want modified in the command. As a trivial example, a command such as CREATE TABLE foo (bar INTEGER); would return a string like CREATE TABLE ${table_schema}.${table_name} (bar INTEGER); and the replacement array would be {table_schema => "public", table_name => "foo"} If we additionally provide a function to expand the replacements in the string, we would have the base funcionality of a normalized command string. If somebody wants to move the table to some other schema, they can simply modify the array to suit their taste, and again expand using the provided function; this doesn't require parsing SQL. It's likely that there are lots of fine details that need exploring before this is a fully workable idea -- I have just started work on it, so please bear with me. I think this is basically what you call "a JSON blob". > Finally, I'm very skeptical of the word "normalized". To me, that > sounds like an alias for "modifying the command string in unspecified > ways that big brother thinks will be useful to event trigger authors". > Color me skeptical. What if somebody doesn't want their command > string normalized? What if they want it normalized in a way that's > different from the way that we've chosen to normalize it? I fear that > this whole direction amounts to "we don't know how to design a real > API so let's just do surgery on the command string and call whatever > pops out the API". You might criticize the example above by saying that I haven't considered using a JSON array for the list of table elements; in a sense, I would be being Big Brother and deciding that you (as the user) don't need to mess up with the column/constraints list in a table you're creating. I thought about it and wasn't sure if there was a need to implement that bit in the first iteration of this implementation. One neat thing about this string+replaceables idea is that we can later change what replaceable elements the string has, thus providing more functionality (thus, for example, perhaps the column list can be altered in v2 that was a "constant" in v1), without breaking existing users of the v1. > > but there > > is a slight problem for some kind of objects that are represented partly > > as ALTER state during creation; for example creating a table with a > > sequence uses ALTER SEQ/OWNED BY internally at some point. There might > > be other cases I'm missing, also. (The REFRESH command is nominally > > also supported.) > > There are lots of places in the DDL code where we pass around > constructed parse trees as a substitute for real argument lists. 
I > expect that many of those places will eventually get refactored away, > so it's important that this feature does not end up relying on > accidents of the current code structure. For example, an > AlterTableStmt can actually do a whole bunch of different things in a > single statement: SOME of those are handled by a loop in > ProcessUtilitySlow() and OTHERS are handled internally by AlterTable. > I'm pretty well convinced that that division of labor is a bad design, > and I think it's important that this feature doesn't make that dubious > design decision into documented behavior. Yeah, the submitted patch took care of these elements by invoking the appropriate collection function at all the right places. Most of it happened right in ProcessUtilitySlow, but other bits were elsewhere (for instance, sub-objects created in a complex CREATE SCHEMA command). I mentioned the ALTER SEQUENCE example above because that happens in a code path that wasn't even close to the rest of the stuff. > > Now about the questions I mentioned above: > > > > a) It doesn't work to reverse-parse the statement nodes in all cases; > > there are several unfixable bugs if we only do that. In order to create > > always-correct statements, we need access to the catalogs for the > > created objects. But if we are doing catalog access, then it seems to > > me that we can do away with the statement parse nodes completely and > > just reconstruct the objects from catalog information. Shall we go that > > ro
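To make the replacement-array idea concrete, here is a purely hypothetical sketch of what client-side expansion of such a string could look like (the ${...} syntax and key names are taken from the example above; none of this code is part of the patch):

    #include <stdio.h>
    #include <string.h>

    struct repl { const char *key; const char *val; };

    static const char *
    lookup(const struct repl *r, int n, const char *key, size_t keylen)
    {
        for (int i = 0; i < n; i++)
            if (strlen(r[i].key) == keylen && strncmp(r[i].key, key, keylen) == 0)
                return r[i].val;
        return NULL;
    }

    int main(void)
    {
        const char *cmd = "CREATE TABLE ${table_schema}.${table_name} (bar INTEGER);";
        struct repl r[] = { { "table_schema", "public" }, { "table_name", "foo" } };
        const char *p = cmd;

        while (*p)
        {
            const char *end;
            const char *val;

            if (p[0] == '$' && p[1] == '{' && (end = strchr(p + 2, '}')) != NULL &&
                (val = lookup(r, 2, p + 2, (size_t) (end - p - 2))) != NULL)
            {
                fputs(val, stdout);     /* substitute the replacement value */
                p = end + 1;
            }
            else
                putchar(*p++);
        }
        putchar('\n');                  /* prints: CREATE TABLE public.foo (bar INTEGER); */
        return 0;
    }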
Re: [HACKERS] RFC: Async query processing
On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer wrote: > On 01/02/2014 07:52 PM, Claudio Freire wrote: > >>> No, because this doesn't scale automatically with the bandwidth-delay >>> product. It also requires that the client buffers queries and their >>> parameters even though the network has to do that anyway. >> >> >> Why not? I'm talking about transport-level packets, btw, not libpq >> frames/whatever. >> >> Yes, the network stack will sometimes do that. But the it doesn't have >> to do it. It does it sometimes, which is not the same. > > > The network inevitably buffers because the speed of light is not infinite. > > Here's a concrete example. Suppose the server is 100ms away, and you want > to send data at a constant rate of 10 Mbps. The server needs to acknowledge > the data you sent, but this acknowledgment arrives after 200 ms. As a > result, you've sent 2 Mbits before the acknowledgment arrives, so the > network appears to have buffered 250 KB. This effect can actually be used > for data storage, called "delay line memory", but it is somewhat out of > fashion now. ... >> So, trusting the network start to do the quick start won't work. For >> steady streams of queries, it will work. But not for short bursts, >> which will be the most heavily used case I believe (most apps create >> short bursts of inserts and not continuous streams at full bandwidth). > > > Loading data into the database isn't such an uncommon task. Not everything > is OLTP. Truly, but a sustained insert stream of 10 Mbps is certainly way beyond common non-OLTP loads. This is far more specific than non-OLTP. Buffering will benefit the vast majority of applications that don't do steady, sustained query streams. Which is the vast majority of applications. An ORM doing a flush falls in this category, so it's an overwhelmingly common case. >> And buffering algorithms are quite platform-dependent anyway, so it's >> not the best idea to make libpq highly reliant on them. > > > That is why I think libpq needs to keep sending until the first response > from the server arrives. Batching a fixed number of INSERTs together in a > single conceptual query does not achieve auto-tuning to the buffering > characteristics of the path. Not on its own, but it does improve thoughput during slow start, which benefits OLTP, which is a hugely common use case. As you say, the network will then auto-tune when the query stream is consistent enough, so what's the problem with explicitly buffering a little then? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Async query processing
On 01/02/2014 07:52 PM, Claudio Freire wrote: No, because this doesn't scale automatically with the bandwidth-delay product. It also requires that the client buffers queries and their parameters even though the network has to do that anyway. Why not? I'm talking about transport-level packets, btw, not libpq frames/whatever. Yes, the network stack will sometimes do that. But the it doesn't have to do it. It does it sometimes, which is not the same. The network inevitably buffers because the speed of light is not infinite. Here's a concrete example. Suppose the server is 100ms away, and you want to send data at a constant rate of 10 Mbps. The server needs to acknowledge the data you sent, but this acknowledgment arrives after 200 ms. As a result, you've sent 2 Mbits before the acknowledgment arrives, so the network appears to have buffered 250 KB. This effect can actually be used for data storage, called "delay line memory", but it is somewhat out of fashion now. And buffering algorithms are quite platform-dependent anyway, so it's not the best idea to make libpq highly reliant on them. That is why I think libpq needs to keep sending until the first response from the server arrives. Batching a fixed number of INSERTs together in a single conceptual query does not achieve auto-tuning to the buffering characteristics of the path. So, trusting the network start to do the quick start won't work. For steady streams of queries, it will work. But not for short bursts, which will be the most heavily used case I believe (most apps create short bursts of inserts and not continuous streams at full bandwidth). Loading data into the database isn't such an uncommon task. Not everything is OLTP. -- Florian Weimer / Red Hat Product Security Team -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Make various variables read-only (const)
On Sun, Dec 22, 2013 at 09:43:57PM -0500, Robert Haas wrote: > On Fri, Dec 20, 2013 at 12:01 PM, Oskari Saarenmaa wrote: > > This allows the variables to be moved from .data to .rodata section which > > means that more data can be shared by processes and makes sure that nothing > > can accidentally overwrite the read-only definitions. On a x86-64 Linux > > system this moves roughly 9 kilobytes of previously writable data to the > > read-only data segment in the backend and 4 kilobytes in libpq. > > > > https://github.com/saaros/postgres/compare/constify > > > > 24 files changed, 108 insertions(+), 137 deletions(-) > > This sounds like a broadly good thing, but I've had enough painful > experiences with const to be a little wary. And how much does this > really affect data sharing? Doesn't copy-on-write do the same thing > for writable data? Could we get most of the benefit by const-ifying > one or two large data structures and forget the rest? Thanks for the review and sorry for the late reply, I was offline for a while. As Wim Lewis pointed out in his mail the const data is most likely mixed with non-const data and copy-on-write won't help with all of it. Also, some of the const data includes duplicates and thus .data actually shrinks more than .rodata grows. We'd probably get most of the space-saving benefits by just constifying the biggest variables, but I think applying const to more things will also make things more correct. > Other comments: > > - The first hunk of the patch mutilates the comment it modifies for no > apparent reason. Please revert. > > - Why change the API of transformRelOptions()? The comment was changed to reflect the new API, I modified transformRelOptions to only accept a single valid namespace to make things simpler in the calling code. Nothing used more than one valid namespace anyway, and it allows us to just use a constant "toast" without having to create a 2 char* array with a NULL. > -#define DEF_ENC2NAME(name, codepage) { #name, PG_##name } > +/* The extra NUL-terminator will make sure a warning is raised if the > + * storage space for name is too small, otherwise when strlen(name) == > + * sizeof(pg_enc2name.name) the NUL-terminator would be silently dropped. > + */ > +#define DEF_ENC2NAME(name, codepage) { #name "\0", PG_##name } > > - The above hunk is not related to the primary purpose of this patch. It sort-of is. Without fixed size char-arrays it's not possible to move everything to .rodata, but fixed size char-arrays come with the drawback of silently dropping the NUL-terminator when strlen(str) == sizeof(array), by forcing a NUL-terminator in we always get a warning if it would've been dropped and the size of the array can then be increased. Thanks, Oskari -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
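The effect of the extra "\0" can be shown in isolation with a tiny stand-alone example (the struct and field sizes here are made up; this is not PostgreSQL code):

    #include <stdio.h>

    /* A fixed-size name field, modeled loosely on pg_enc2name. */
    struct enc2name { char name[4]; int enc; };

    /* Valid C, but the implicit NUL terminator is silently dropped because
     * strlen("UTF8") == sizeof(name); the field ends up unterminated. */
    static const struct enc2name quiet = { "UTF8", 6 };

    /* Appending "\0" makes the explicit initializer one byte longer than the
     * array, so the compiler warns ("initializer-string ... too long")
     * instead of truncating silently. */
    static const struct enc2name noisy = { "UTF8" "\0", 6 };

    int main(void)
    {
        printf("%.4s %.4s\n", quiet.name, noisy.name);  /* %.4s: fields may be unterminated */
        return 0;
    }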