Re: [HACKERS] vac truncation scan problems

2015-03-30 Thread Jeff Janes
On Mon, Mar 30, 2015 at 8:54 PM, Jeff Janes  wrote:

> After freeing up the rows at the end of the table so it is eligible for
> truncation, then running a manual VACUUM to actually release the space, I
> kept running into the problem that the truncation scan was consistently
> suspended and then aborted due to a conflicting lock requested/held.
>
> But the perversity is that that conflicting lock request can only be
> coming, as far as I can tell, from the autovac process.  I'm not sure how
> this happens, as I thought autovac never waited for locks but only obtained
> one if it were instantaneously available, but it is the only explanation I
> can think of.
>
> I'm not seeing this in 9.4, but I'm not sure how deterministic it is so
> maybe that is just luck.
>


It looks like the culprit is this:

commit 0d831389749a3baaced7b984205b9894a82444b9
Author: Alvaro Herrera 
Date:   Wed Mar 18 11:52:33 2015 -0300

Rationalize vacuuming options and parameters

I'd guess the autovac nature of the autovac process is getting lost in
there somewhere, but I don't see where.

Cheers,

Jeff


Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2015-03-30 Thread Jehan-Guillaume de Rorthais
On Tue, 3 Mar 2015 11:15:13 -0500
Bruce Momjian  wrote:

> On Tue, Oct 14, 2014 at 01:21:53PM -0400, Bruce Momjian wrote:
> > On Tue, Oct 14, 2014 at 09:20:22AM -0700, Jeff Janes wrote:
> > > On Mon, Oct 13, 2014 at 12:11 PM, Bruce Momjian  wrote:
> > > 
> > > 
> > > I looked into this, and came up with more questions.  Why is
> > > checkpoint_completion_target involved in the total number of WAL
> > > segments?  If checkpoint_completion_target is 0.5 (the default), the
> > > calculation is:
> > > 
> > >         (2 + 0.5) * checkpoint_segments + 1
> > > 
> > > while if it is 0.9, it is:
> > > 
> > >         (2 + 0.9) * checkpoint_segments + 1
> > > 
> > > Is this trying to estimate how many WAL files are going to be created
> > > during the checkpoint?  If so, wouldn't it be (1 +
> > > checkpoint_completion_target), not "2 +"?  My logic is you have the
> > > old WAL files being checkpointed (that's the "1"), plus you have new WAL
> > > files being created during the checkpoint, which would be
> > > checkpoint_completion_target * checkpoint_segments, plus one for the
> > > current WAL file.
> > > 
> > > 
> > > WAL is not eligible to be recycled until there have been 2 successful
> > > checkpoints.
> > > 
> > > So at the end of a checkpoint, you have 1 cycle of WAL which has just
> > > become eligible for recycling,
> > > 1 cycle of WAL which is now expendable but which is kept anyway, and
> > > checkpoint_completion_target worth of WAL which has occurred while the
> > > checkpoint was occurring and is still needed for crash recovery.
> > 
> > OK, so based on this analysis, what is the right calculation?  This?
> > 
> > (1 + checkpoint_completion_target) * checkpoint_segments + 1 +
> > max(wal_keep_segments, checkpoint_segments)
> 
> Now that we have min_wal_size and max_wal_size in 9.5, I don't see any
> value to figuring out the proper formula for backpatching.

I guess it is worth backpatching the documentation, as 9.4 -> 9.1 will still be
supported for the next 4 years or so.
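
For reference, here is a small, purely illustrative calculation of the estimate
discussed above (a sketch only: it ignores recycling details, the parameter
values are arbitrary examples, and 9.5's min_wal_size/max_wal_size make the
formula moot going forward):

#include <stdio.h>

/*
 * Rough upper bound discussed upthread:
 *   (1 + checkpoint_completion_target) * checkpoint_segments + 1
 *     + max(wal_keep_segments, checkpoint_segments)
 */
int
main(void)
{
    double  checkpoint_completion_target = 0.9;
    int     checkpoint_segments = 32;
    int     wal_keep_segments = 0;
    int     keep = (wal_keep_segments > checkpoint_segments) ?
                    wal_keep_segments : checkpoint_segments;
    double  estimate = (1 + checkpoint_completion_target) * checkpoint_segments
                       + 1 + keep;

    printf("estimated max WAL segments: %.0f (~%.0f MB at 16 MB each)\n",
           estimate, estimate * 16);
    return 0;
}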

-- 
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com




Re: [HACKERS] Exposing PG_VERSION_NUM in pg_config

2015-03-30 Thread Michael Paquier
On Tue, Mar 31, 2015 at 9:40 AM, Peter Eisentraut  wrote:

> On 3/30/15 6:29 PM, Michael Paquier wrote:
> >
> >
> > On Tue, Mar 31, 2015 at 5:39 AM, Peter Eisentraut wrote:
> >
> > On 3/25/15 1:32 AM, Michael Paquier wrote:
> > > Well, I have no other cases than ones of the type mentioned
> upthread,
> > > and honestly I am fine as long as we do not apply maths to a
> version
> > > string. So attached is a patch that adds VERSION_NUM in
> > > Makefile.global.
> >
> > How would you make use of this in an extension makefile?
> >
> >
> > One use case is regression test list filtering depending on backend
> version.
>
> I'm interested in the exact syntax you'd use, to compare it to the
> currently used techniques.
>

With VERSION_NUM exposed directly in pg_config, an extension makefile could use
an expression like this:
VERSION_NUM=$(shell $(PG_CONFIG) --version-num)

With its presence in Makefile.global, that's close to what you can do with
pg_config.h already:
VERSION_NUM := $(shell cat `$(PG_CONFIG) --libdir`/pgxs/src/Makefile.global \
                 | perl -ne 'print $$1 and exit if /VERSION_NUM =\s+(\d+)/')
But that looks a little bit like magic...

Another advantage of putting this information in pg_config is for
environments that do not have PGXS installed, for example MSVC.
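
For comparison, C extension code can already branch on the server version at
compile time through PG_VERSION_NUM from pg_config.h; what this thread is after
is having the same number available to makefiles and scripts without parsing a
version string. A trivial compile-time example:

#include "postgres.h"       /* brings in pg_config.h and PG_VERSION_NUM */

#if PG_VERSION_NUM >= 90500
    /* code that relies on a 9.5+ backend API goes here */
#else
    /* fallback for older servers */
#endif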
-- 
Michael


[HACKERS] GUC context information in the document.

2015-03-30 Thread Kyotaro HORIGUCHI
Hello. I had a question about whether changing a certain GUC parameter
needs a restart or not, and similar questions come up every now and then.

As shown below, descriptions of the GUC context are indeed there, but I
believe the average reader of the documentation cannot recognize that
clearly when looking at a single item, without comparing it against several
other items whose behavior he/she already knows.

In addition, the description for PGC_POSTMASTER doesn't say that it can
also be set on the command line, the description for PGC_SIGHUP doesn't
state clearly that reloading the configuration makes it effective, and
PGC_USERSET has no description at all.

If I'm not missing anything, putting stereotyped information about GUC
contexts like the following would be useful.

> shared_buffers (integer), (effective after server restart)

> log_destination (string), (effective after config reload)

> log_min_duration_statement (integer), (effective in-session, superuser only)

> DateStyle (string), (effective in-session)


What do you think about this?

regards,

=
The typical descriptions are listed below.

A. PGC_POSTMASTER

http://www.postgresql.org/docs/devel/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY

> shared_buffers (integer)
>... This parameter can only be set at server start.


B. PGC_SIGHUP

http://www.postgresql.org/docs/devel/static/runtime-config-logging.html#GUC-LOG-DESTINATION

> log_destination (string)
>... This parameter can only be set in the postgresql.conf
>file or on the server command line.

C. PGC_SUSET

http://www.postgresql.org/docs/devel/static/runtime-config-logging.html#GUC-LOG-MIN-DURATION-STATEMENT

> log_min_duration_statement (integer)
> ... Only superusers can change this setting.


D. PGC_USERSET

http://www.postgresql.org/docs/devel/static/runtime-config-client.html#GUC-DATESTYLE

> DateStyle (string)
 ... nothing about GUC context is written ...

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





[HACKERS] vac truncation scan problems

2015-03-30 Thread Jeff Janes
After freeing up the rows at the end of the table so it is eligible for
truncation, then running a manual VACUUM to actually release the space, I
kept running into the problem that the truncation scan was consistently
suspended and then aborted due to a conflicting lock requested/held.

But the perversity is that that conflicting lock request can only be
coming, as far as I can tell, from the autovac process.  I'm not sure how
this happens, as I thought autovac never waited for locks but only obtained
one if it were instantaneously available, but it is the only explanation I
can think of.

I'm not seeing this in 9.4, but I'm not sure how deterministic it is so
maybe that is just luck.

On an otherwise idle system:

pgbench -i -s 1000 -n

alter table pgbench_accounts drop CONSTRAINT pgbench_accounts_pkey ;
delete from pgbench_accounts;
vacuum verbose pgbench_accounts;

As soon as truncation scan starts, it suspends and then quickly terminates.

Anyone have theories on what's going on?

Cheers,

Jeff


Re: [HACKERS] Relation extension scalability

2015-03-30 Thread Amit Kapila
On Mon, Mar 30, 2015 at 8:57 PM, Stephen Frost  wrote:
>
>
> If we're able to extend based on page-level locks rather than the global
> relation locking that we're doing now, then I'm not sure we really need
> to adjust how big the extents are any more.  The reason for making
> bigger extents is because of the locking problem we have now when lots
> of backends want to extend a relation, but, if I'm following correctly,
> that'd go away with Andres' approach.
>

The benefit of extending in bigger chunks in the background is that backends
would need to perform such an operation much less frequently, which in
itself could be a win.

> We don't have full patches for either of these and so I don't mind
> saying that, basically, I'd prefer to see if we still have a big
> bottleneck here with lots of backends trying to extend the same relation
> before we work on adding this particular feature in as it might end up
> being unnecessary.

Agreed. I think it is better to first see the results of the current patch
Andres is working on, and then, if someone is interested and can show a real
benefit from extending relations in bigger chunks, that might be worth
considering.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] What exactly is our CRC algorithm?

2015-03-30 Thread Petr Jelinek

On 25/03/15 18:24, Heikki Linnakangas wrote:

On 03/25/2015 07:20 PM, Andres Freund wrote:

On 2015-03-25 19:18:51 +0200, Heikki Linnakangas wrote:

Or better yet, a direct configure test to check if the
intrinsic exists - that way we get to also use it on Intel compilers,
which I believe also have the same intrinsics.


Maybe I'm missing something, but configure isn't run for msvc?


Good point. On MSVC, we use the pre-built pg_config.h.win32 file
instead. There are already a couple of cases like this in it:

/* Define to 1 if you have the `rint' function. */
#if (_MSC_VER >= 1800)
#define HAVE_RINT 1
#endif

I think we should do that for the CRC32 intrinsic too.



Yeah, 1500 being MSVC 2008. But I think there is little point in putting
it in pg_config.h.win32 when it's only going to be used in two places in a
single header file.
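
For what it's worth, a guard along these lines would be all that's needed in
whichever file it ends up in; this is just a sketch, and the USE_SSE42_CRC32C
symbol here is my placeholder, not the name the patch actually uses:

/* Use the SSE 4.2 CRC32 intrinsics when the compiler is new enough;
 * _MSC_VER 1500 corresponds to Visual Studio 2008. */
#if defined(_MSC_VER) && (_MSC_VER >= 1500)
#define USE_SSE42_CRC32C 1
#endif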


Do you plan to commit this otherwise as is?

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Cleanup double semicolons at the end of source lines

2015-03-30 Thread Petr Jelinek

Hi,

While reading the code I noticed a couple of double semicolons at the end
of lines, so I searched for all of them and replaced them with single
ones. The resulting diff is attached.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 59cb053..94fab18 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -317,7 +317,7 @@ foreign_expr_walker(Node *node,
 			break;
 		case T_ArrayRef:
 			{
-ArrayRef   *ar = (ArrayRef *) node;;
+ArrayRef   *ar = (ArrayRef *) node;
 
 /* Assignment should not be in restrictions. */
 if (ar->refassgnexpr != NULL)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3aa9e42..88ec83c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -14308,7 +14308,7 @@ makeRangeVarFromAnyName(List *names, int position, core_yyscan_t yyscanner)
 			r->relname = strVal(lsecond(names));
 			break;
 		case 3:
-			r->catalogname = strVal(linitial(names));;
+			r->catalogname = strVal(linitial(names));
 			r->schemaname = strVal(lsecond(names));
 			r->relname = strVal(lthird(names));
 			break;
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 2a41eb1..7d6d154 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -226,7 +226,7 @@ start_logical_replication:
 {
 	StartReplicationCmd *cmd;
 	cmd = makeNode(StartReplicationCmd);
-	cmd->kind = REPLICATION_KIND_LOGICAL;;
+	cmd->kind = REPLICATION_KIND_LOGICAL;
 	cmd->slotname = $3;
 	cmd->startpoint = $5;
 	cmd->options = $6;
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d0d7206..f08e288 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -165,7 +165,7 @@ static inline void
 lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
 {
 	if (!lex_accept(lex, token, NULL))
-		report_parse_error(ctx, lex);;
+		report_parse_error(ctx, lex);
 }
 
 /* chars to consider as part of an alphanumeric token */
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 5833401..7e2c359 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -1592,7 +1592,7 @@ jsonb_agg_transfn(PG_FUNCTION_ARGS)
 if (v.type == jbvString)
 {
 	/* copy string values in the aggregate context */
-	char	   *buf = palloc(v.val.string.len + 1);;
+	char	   *buf = palloc(v.val.string.len + 1);
 	snprintf(buf, v.val.string.len + 1, "%s", v.val.string.val);
 	v.val.string.val = buf;
 }
@@ -1753,7 +1753,7 @@ jsonb_object_agg_transfn(PG_FUNCTION_ARGS)
 if (v.type == jbvString)
 {
 	/* copy string values in the aggregate context */
-	char	   *buf = palloc(v.val.string.len + 1);;
+	char	   *buf = palloc(v.val.string.len + 1);
 	snprintf(buf, v.val.string.len + 1, "%s", v.val.string.val);
 	v.val.string.val = buf;
 }
@@ -1811,7 +1811,7 @@ jsonb_object_agg_transfn(PG_FUNCTION_ARGS)
 if (v.type == jbvString)
 {
 	/* copy string values in the aggregate context */
-	char	   *buf = palloc(v.val.string.len + 1);;
+	char	   *buf = palloc(v.val.string.len + 1);
 	snprintf(buf, v.val.string.len + 1, "%s", v.val.string.val);
 	v.val.string.val = buf;
 }
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index a8cdeaa..274f64c 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -2675,7 +2675,7 @@ populate_recordset_worker(FunctionCallInfo fcinfo, const char *funcname,
 
 	/* make these in a sufficiently long-lived memory context */
 	old_cxt = MemoryContextSwitchTo(rsi->econtext->ecxt_per_query_memory);
-	state->ret_tdesc = CreateTupleDescCopy(tupdesc);;
+	state->ret_tdesc = CreateTupleDescCopy(tupdesc);
 	BlessTupleDesc(state->ret_tdesc);
 	state->tuple_store = tuplestore_begin_heap(rsi->allowedModes &
 			   SFRM_Materialize_Random,
diff --git a/src/backend/utils/adt/oracle_compat.c b/src/backend/utils/adt/oracle_compat.c
index 723be9a..8e896eb 100644
--- a/src/backend/utils/adt/oracle_compat.c
+++ b/src/backend/utils/adt/oracle_compat.c
@@ -959,7 +959,7 @@ chr			(PG_FUNCTION_ARGS)
 		if (bytes == 2)
 		{
 			wch[0] = 0xC0 | ((cvalue >> 6) & 0x1F);
-			wch[1] = 0x80 | (cvalue & 0x3F);;
+			wch[1] = 0x80 | (cvalue & 0x3F);
 		}
 		else if (bytes == 3)
 		{



Re: [HACKERS] Streaming replication

2015-03-30 Thread Tatsuo Ishii
> On 31/03/15 12:45, Tatsuo Ishii wrote:
>> In the doc:
>> 
>> 25.2.5. Streaming Replication
>> :
>> The standby connects to the primary, which streams WAL records to the
>> standby as they're generated, without waiting for the WAL file to be
>> filled.
>> 
>> This seems to claim that walsender sends WAL files which have not been
>> fsync'd yet. However, in walsender.c:
>> 
>>  /*
>>   * Streaming the current timeline on a master.
>>   *
>>   * Attempt to send all data that's already been written out and
>>   * fsync'd to disk.  We cannot go further than what's been 
>> written out
>>   * given the current implementation of XLogRead().  And in any 
>> case
>>   * it's unsafe to send WAL that is not securely down to disk on 
>> the
>>   * master: if the master subsequently crashes and restarts, 
>> slaves
>>   * must not have applied any WAL that gets lost on the master.
>>   */
>>  
>> This one says walsender sends WAL records as long as they are
>> fsync'd.
>> 
>> Am I missing something?
>> 
>>
> 
> I think the docs are trying to say that streaming replication doesn't
> wait for a (16MB) WAL *file* to be filled, but sends each (fsync'd) WAL
> record. I had to reread it several times too :-)

Thanks for the explanation. That makes sense. The docs definitely have
room for improvement.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Bug #10432 failed to re-find parent key in index

2015-03-30 Thread Joshua D. Drake


Hello,

Just wondering if what Peter said was the last word on this?

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] PATCH: pgbench - merging transaction logs

2015-03-30 Thread Tomas Vondra
On 29.3.2015 10:58, Fabien COELHO wrote:

>>> So my overall conclusion is:
>>>
>>> (1) The simple thread-shared file approach would save pgbench from
>>> post-processing merge-sort heavy code, for a reasonable cost.
>>
>> No it wouldn't - you're missing the fact that the proposed approach
>> (shared file + fprintf) only works with raw transaction log.
>>
>> It does not work with aggregated log - the threads would have to
>> track the progress of the other threads somehow, in a very non-trivial
>> way (e.g. what if one of the threads executes a long query, and thus
>> does not send the results for a long time?).
> 
> The counters are updated when the transaction is finished anyway?

Yes, but the thread does not know it's time to write the results until
it completes the first transaction after the interval ends ...

Let's say the very first query in thread #1 takes a minute for some
reason, while the other threads process 100 transactions per second. So
before thread #1 can report 0 transactions for the first second, the
other threads already have results for 60 intervals.

I think there's no way to make this work except for somehow tracking the
timestamp of the last submitted results for each thread, and only
flushing results older than the minimum of those timestamps. But that's
not trivial - it certainly is more complicated than just writing into a
shared file descriptor.
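
To make that concrete, here is a minimal sketch of the kind of bookkeeping I
mean; the names and structure are purely illustrative, not a proposed patch:

/* Per-thread progress for the aggregated log (illustrative only). */
typedef struct AggProgress
{
    long        last_reported;  /* end of the last interval this thread reported */
} AggProgress;

/*
 * An interval can only be flushed once every thread has reported at least
 * up to its end, i.e. we may only flush results older than the minimum of
 * last_reported across all threads.
 */
static long
flush_horizon(const AggProgress *threads, int nthreads)
{
    long        min = threads[0].last_reported;
    int         i;

    for (i = 1; i < nthreads; i++)
    {
        if (threads[i].last_reported < min)
            min = threads[i].last_reported;
    }
    return min;                 /* intervals ending before this are safe to write */
}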

>> Another option would be to update shared aggregated results, but
>> that requires locking.
> 
> That is what I had in mind. ISTM that the locking impact would be
> much lower than for logging, the data is just locked for a counter
> update, and if counters are per-thread, a conflict may only occur
> when the data are gathered for actual logging, which would be rather
> infrequent. Even if the counters are shared, the locking would be for
> small time that would imply a low conflict probability. So I do not
> see this one as a significant performance issue.

No, for the reasons I explained above (thread not sending results for a
long time, etc.).

Merging results for each transaction would not have this issue, but it
would also use the lock much more frequently, and that seems like a
pretty bad idea (especially for the workloads with short transactions
that you suggested are a bad match for the detailed log - this would make
the aggregated log bad too).

Also notice that all the threads will try to merge the data (and
thus acquire the lock) at almost the same time - this is especially true
for very short transactions. I would be surprised if this did not cause
issues on read-only workloads with large numbers of threads.

>>> (2) The feature would not be available for the thread-emulation with
>>> this approach, but I do not see this as a particular issue as I
>>> think that it is pretty much only dead code and a maintenance burden.
>>
>> I'm not willing to investigate that, nor am I willing to implement
>> another feature that works only sometimes (I've done that in the past,
>> and I find it a bad practice).
> 
> Hmmm. Keeping an obsolete feature with significant impact on how
> other features can be implemented, so basically a maintenance burden,
> does not look like best practice *either*.

That is not what I said, though.

If thread emulation is considered obsolete - fine, let's remove it. But
that's not really certain at this point, and until it's actually removed
I'm not particularly thrilled by adding a feature that does not work
with --disable-thread-safety.

>> If someone else is willing to try to eliminate the thread
>> emulation, I won't object to that.
> 
> Hmmm. I'll try to trigger a discussion in another thread to test the
> idea.

OK, good.

> 
>> But as I pointed out above, simple fprintf is not going to work
>> for the aggregated log - solving that will need more code (e.g. 
>> maintaining aggregated results for all threads, requiring
>> additional locking etc).
> 
> The code for that is probably simple and short, and my wish is to
> try to avoid an external merge sort post processing, if possible,
> which is not especially cheap anyway, neither in code nor in time.

First, I don't think it's as simple as you think.

Second, while the merge sort is not free, it happens *after* the pgbench
run has completed and does not interfere with it at all.


>>> (3) Optimizing doLog from its current fprintf-based implementation
>>> may be a good thing.
>>
>> That's probably true. The simplest thing we can do right now is 
>> buffering the data into larger chunks and writing those chunks. 
>> That amortizes the costs of locking.
> 
> If it is buffered in the process, that would mean more locking. If
> it is buffered per thread, that would result in out-of-order logs.
> Would that be an issue? It would be fine with me.

You're mixing two things - optimizing doLog and writing into a shared
file. I was talking about optimizing doLog as it stands now, i.e. per
thread, and in that case it does not cause any out-of-order logs.

Als

[HACKERS] pg_rewind tests

2015-03-30 Thread Peter Eisentraut
There are some small issues with the pg_rewind tests.

This technique

check: all
$(prove_check) :: local
$(prove_check) :: remote

for passing arguments to "prove" does not work with the tools included
in Perl 5.8.

While sorting out the portability issues in the TAP framework during the
9.4 release cycle, we had set 5.8 as the oldest Perl version that is
supported.  (It's the Perl version in RHEL 5.)  I suggest using
environment variables instead, unless we want to change that.

Moreover,

if ($test_mode == "local")
...
elsif ($test_mode == "remote")

don't work, because those are numerical comparisons, not string
comparisons.  So the remote branch is never actually run.

Finally, RewindTest.pm should use

use strict;
use warnings;

and the warnings caused by that should be addressed.




Re: [HACKERS] Exposing PG_VERSION_NUM in pg_config

2015-03-30 Thread Peter Eisentraut
On 3/30/15 6:29 PM, Michael Paquier wrote:
> 
> 
> On Tue, Mar 31, 2015 at 5:39 AM, Peter Eisentraut wrote:
> 
> On 3/25/15 1:32 AM, Michael Paquier wrote:
> > Well, I have no other cases than ones of the type mentioned upthread,
> > and honestly I am fine as long as we do not apply maths to a version
> > string. So attached is a patch that adds VERSION_NUM in
> > Makefile.global.
> 
> How would you make use of this in an extension makefile?
> 
> 
> One use case is regression test list filtering depending on backend version.

I'm interested in the exact syntax you'd use, to compare it to the
currently used techniques.





Re: [HACKERS] Streaming replication

2015-03-30 Thread Mark Kirkwood
On 31/03/15 12:45, Tatsuo Ishii wrote:
> In the doc:
> 
> 25.2.5. Streaming Replication
> :
> The standby connects to the primary, which streams WAL records to the
> standby as they're generated, without waiting for the WAL file to be
> filled.
> 
> This seems to claim that walsender sends WAL files which have not been
> fsync'd yet. However, in walsender.c:
> 
>   /*
>* Streaming the current timeline on a master.
>*
>* Attempt to send all data that's already been written out and
>* fsync'd to disk.  We cannot go further than what's been 
> written out
>* given the current implementation of XLogRead().  And in any 
> case
>* it's unsafe to send WAL that is not securely down to disk on 
> the
>* master: if the master subsequently crashes and restarts, 
> slaves
>* must not have applied any WAL that gets lost on the master.
>*/
>   
> This one says walsender sends WAL records as long as they are
> fsync'd.
> 
> Am I missing something?
> 
>

I think the docs are trying to say that streaming replication doesn't
wait for a (16MB) WAL *file* to be filled, but sends each (fsync'd) WAL
record. I had to reread it several times too :-)

Cheers

Mark





Re: [HACKERS] pg_dump / copy bugs with "big lines" ?

2015-03-30 Thread Jim Nasby

On 3/30/15 5:46 AM, Ronan Dunklau wrote:

Hello hackers,

I've tried my luck on pgsql-bugs before, with no success, so I report these
problems here.

The documentation mentions the following limits for sizes:

Maximum Field Size  1 GB
Maximum Row Size1.6 TB

However, it seems like rows bigger than 1GB can't be COPYed out:

ro=# create table test_text (c1 text, c2 text);
CREATE TABLE
ro=# insert into test_text (c1) VALUES (repeat('a', 536870912));
INSERT 0 1
ro=# update test_text set c2 = c1;
UPDATE 1

Then, trying to dump or copy that results in the following error:

ro=# COPY test_text TO '/tmp/test';
ERROR:  out of memory
DÉTAIL : Cannot enlarge string buffer containing 536870913 bytes by 536870912
more bytes.

In fact, the same thing happens when using a simple SELECT:

ro=# select * from test_text ;
ERROR:  out of memory
DÉTAIL : Cannot enlarge string buffer containing 536870922 bytes by 536870912
more bytes.

In the case of COPY, the server uses a StringInfo to output the row. The
problem is, a StringInfo is capped to MAX_ALLOC_SIZE (1GB - 1), but a row
should be able to hold much more than that.


Yeah, shoving a whole row into one StringInfo is ultimately going to 
limit a row to 1G, which is a far cry from what the docs claim. There are
also going to be problems with FE/BE communications, because things like
pq_sendbyte all use StringInfo as a buffer too. So while Postgres can 
store a 1.6TB row, you're going to find a bunch of stuff that doesn't 
work past around 1GB.
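
For context, the error in the example above comes from the string-buffer
growth check; roughly speaking (this is a simplified sketch, not the exact
code in stringinfo.c):

/* A StringInfo may never grow past MaxAllocSize (1 GB - 1), so any single
 * row whose textual output exceeds that fails here, even though the row
 * itself is within the documented 1.6 TB limit. */
if (((Size) needed) >= (MaxAllocSize - (Size) str->len))
    ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
             errmsg("out of memory"),
             errdetail("Cannot enlarge string buffer containing %d bytes by %d more bytes.",
                       str->len, needed)));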



So, is this a bug ? Or is there a caveat I would have missed in the
documentation ?


I suppose that really depends on your point of view. The real question 
is whether we think it's worth fixing, or a good idea to change the 
behavior of StringInfo.


StringInfo uses ints to store the length, so it could possibly be changed,
but then you'd just error out due to MaxAllocSize.


Now perhaps those could both be relaxed, but certainly not to the extent 
that you can shove an entire 1.6TB row into an output buffer.


The other issue is that there are a LOT of places in the code that blindly
copy detoasted data around, so while we technically support 1GB toasted 
values you're probably going to be quite unhappy with performance. I'm 
actually surprised you haven't already seen this with 500MB objects.


So long story short, I'm not sure how worthwhile it would be to try and 
fix this. We probably should improve the docs though.


Have you looked at using large objects for what you're doing? (Note that 
those have their own set of challenges and limitations.)



We also hit a second issue, this time related to bytea encoding.


There's probably several other places this type of thing could be a 
problem. I'm thinking of conversions in particular.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] Streaming replication

2015-03-30 Thread Tatsuo Ishii
In the doc:

25.2.5. Streaming Replication
:
The standby connects to the primary, which streams WAL records to the
standby as they're generated, without waiting for the WAL file to be
filled.

This seems to claim that walsender sends WAL files which have not been
fsync'd yet. However, in walsender.c:

/*
 * Streaming the current timeline on a master.
 *
 * Attempt to send all data that's already been written out and
 * fsync'd to disk.  We cannot go further than what's been 
written out
 * given the current implementation of XLogRead().  And in any 
case
 * it's unsafe to send WAL that is not securely down to disk on 
the
 * master: if the master subsequently crashes and restarts, 
slaves
 * must not have applied any WAL that gets lost on the master.
 */

This one says walsender sends WAL records as long as they are
fsync'd.

Am I missing something?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Doubt about AccessExclusiveLock in ALTER TABLE .. SET ( .. );

2015-03-30 Thread Fabrízio de Royes Mello
On Mon, Mar 30, 2015 at 7:41 PM, Jim Nasby  wrote:
>
> On 3/27/15 2:23 PM, Fabrízio de Royes Mello wrote:
>>
>> Hi all,
>>
>> I'm tweaking some autovacuum settings in a table with high write usage,
>> but with ALTER TABLE .. SET ( .. ) this task was impossible, so I did a
>> catalog update (pg_class) to change the reloptions.
>>
>> Maybe it's a stupid doubt, but why do we need to get an AccessExclusiveLock
>> on the relation to set reloptions if we just touch pg_class tuples
>> (RowExclusiveLock)?
>
>
> For a very long time catalog access was not MVCC safe. I think that's
been changed, so at this point it may be OK to relax the lock, at least in
the case of autovac settings. There may well be other settings in there
where it would not be safe.
>

Hmmm... There is a comment in AlterTableGetLockLevel:

        /*
         * Rel options are more complex than first appears.  Options
         * are set here for tables, views and indexes; for historical
         * reasons these can all be used with ALTER TABLE, so we can't
         * decide between them using the basic grammar.
         *
         * XXX Look in detail at each option to determine lock level,
         * e.g. cmd_lockmode = GetRelOptionsLockLevel((List *)
         * cmd->def);
         */
    case AT_SetRelOptions:      /* Uses MVCC in getIndexes() and
                                 * getTables() */
    case AT_ResetRelOptions:    /* Uses MVCC in getIndexes() and
                                 * getTables() */
        cmd_lockmode = AccessExclusiveLock;
        break;


Maybe it's time to implement "GetRelOptionsLockLevel" to relax the lock for
autovacuum settings (AccessShareLock). For other settings we would continue
using AccessExclusiveLock.

Are there any objections to implementing it that way?
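
Roughly what I have in mind is something like the sketch below; this is only
an illustration, and the exact relaxed lock level (AccessShareLock as
suggested above, or perhaps something stronger like ShareUpdateExclusiveLock)
is the part that needs discussion:

/* Sketch only: choose the ALTER TABLE ... SET (...) lock level from the
 * options being touched.  Autovacuum-related options get the relaxed lock;
 * anything else keeps AccessExclusiveLock. */
static LOCKMODE
GetRelOptionsLockLevel(List *defList)
{
    LOCKMODE    lockmode = AccessShareLock;   /* relaxed level, per above */
    ListCell   *cell;

    foreach(cell, defList)
    {
        DefElem    *def = (DefElem *) lfirst(cell);

        if (strncmp(def->defname, "autovacuum_", 11) != 0)
            lockmode = AccessExclusiveLock;
    }
    return lockmode;
}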

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] How about to have relnamespace and relrole?

2015-03-30 Thread Jim Nasby

On 3/29/15 1:55 PM, Tom Lane wrote:

Andrew Dunstan  writes:

I have just claimed this as committer in the CF, but on reviewing the
emails it looks like there is disagreement about the need for it at all,
especially from Tom and Robert.



I confess I have often wanted regnamespace, particularly, and
occasionally regrole, simply as a convenience. But I'm not going to
commit it against substantial opposition.



Do we need a vote?


My concern about it is basically that I don't see where we stop.
The existing regFOO alias types are provided for object classes which
have nontrivial naming conventions (schema qualification, overloaded
argument types, etc), so that you can't just do "select ... from
catalog where objectname = 'blah'".  That doesn't apply to namespaces
or roles.  So I'm afraid that once this precedent is established,
there will be demands for regFOO for every object class we have,
and I don't want that much clutter.



IMHO the real issue here is it's just a royal PITA to query the catalogs 
in so many cases, especially on an ad-hoc basis. information_schema 
helps in many cases, but it sometimes doesn't have PG-specific stuff 
that you need. And if you're writing code you can't depend on it 
actually being there. (I've also heard it's horribly slow...)


I don't see a good way to really solve that other than reviving 
pg_sysview and pulling that in.


BTW, it would arguably be better if we just exposed the function that 
deals with quoting and schemas; IIRC it doesn't actually need the catalog.



It may be that these two cases are so much more useful than any other
conceivable cases that we can do them and stop, but I don't think that
argument has been made convincingly.


Short of fixing the underlying problem I think regnamespace would
absolutely be worth it. I find myself wanting that all the time.


I generally don't care about roles too much, but maybe that's just me.

FWIW, (and off-topic...) the other thing I find very painful is dealing 
with ACLs. has_*_privilege does what it does well, but if you want to do 
anything else (such as ensure permissions on an object are specified as 
you expect) you're stuck resorting to things like


NOT EXISTS( SELECT has_*_privilege() FROM pg_roles WHERE rolname NOT IN() )

which is awkward because you actually need one of those for every 
permission you want to check.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Exposing PG_VERSION_NUM in pg_config

2015-03-30 Thread David Fetter
On Tue, Mar 31, 2015 at 07:29:07AM +0900, Michael Paquier wrote:
> On Tue, Mar 31, 2015 at 5:39 AM, Peter Eisentraut  wrote:
> 
> > On 3/25/15 1:32 AM, Michael Paquier wrote:
> > > Well, I have no other cases than ones of the type mentioned upthread,
> > > and honestly I am fine as long as we do not apply maths to a version
> > > string. So attached is a patch that adds VERSION_NUM in
> > > Makefile.global.
> >
> > How would you make use of this in an extension makefile?
> >
> 
> One use case is regression test list filtering depending on backend version.

So basically, extension writers get to win a Rube Goldberg if they use
it for actually building software :/

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Bug fix for missing years in make_date()

2015-03-30 Thread David Fetter
On Mon, Mar 30, 2015 at 05:35:29PM -0500, Jim Nasby wrote:
> On 3/26/15 5:26 PM, David Fetter wrote:
> >+ * Note: Non-positive years are take to be BCE.
> 
> s/take/taken/

Good point.  Next patch attached.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index d66f640..610 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -248,10 +248,15 @@ make_date(PG_FUNCTION_ARGS)
tm.tm_mday = PG_GETARG_INT32(2);
 
/*
-* Note: we'll reject zero or negative year values.  Perhaps negatives
-* should be allowed to represent BC years?
+* Note: Non-positive years are taken to be BCE.
 */
-   dterr = ValidateDate(DTK_DATE_M, false, false, false, &tm);
+   if (tm.tm_year >= 0)
+   dterr = ValidateDate(DTK_DATE_M, false, false, false, &tm);
+   else
+   {
+   tm.tm_year = -1 * tm.tm_year;
+   dterr = ValidateDate(DTK_DATE_M, false, false, true, &tm);
+   }
 
if (dterr != 0)
ereport(ERROR,
diff --git a/src/test/regress/expected/date.out 
b/src/test/regress/expected/date.out
index 8923f60..73b3062 100644
--- a/src/test/regress/expected/date.out
+++ b/src/test/regress/expected/date.out
@@ -1191,6 +1191,12 @@ select make_date(2013, 7, 15);
  07-15-2013
 (1 row)
 
+select make_date(-44, 3, 15);  -- Non-positive years are BCE
+   make_date   
+---
+ 03-15-0044 BC
+(1 row)
+
 select make_time(8, 20, 0.0);
  make_time 
 ---
@@ -1204,8 +1210,6 @@ select make_date(2013, 13, 1);
 ERROR:  date field value out of range: 2013-13-01
 select make_date(2013, 11, -1);
 ERROR:  date field value out of range: 2013-11--1
-select make_date(-44, 3, 15);  -- perhaps we should allow this sometime?
-ERROR:  date field value out of range: -44-03-15
 select make_time(10, 55, 100.1);
 ERROR:  time field value out of range: 10:55:100.1
 select make_time(24, 0, 2.1);
diff --git a/src/test/regress/sql/date.sql b/src/test/regress/sql/date.sql
index a62e92a..e6bff17 100644
--- a/src/test/regress/sql/date.sql
+++ b/src/test/regress/sql/date.sql
@@ -279,11 +279,11 @@ select isfinite('infinity'::date), 
isfinite('-infinity'::date), isfinite('today'
 
 -- test constructors
 select make_date(2013, 7, 15);
+select make_date(-44, 3, 15);  -- Non-positive years are BCE
 select make_time(8, 20, 0.0);
 -- should fail
 select make_date(2013, 2, 30);
 select make_date(2013, 13, 1);
 select make_date(2013, 11, -1);
-select make_date(-44, 3, 15);  -- perhaps we should allow this sometime?
 select make_time(10, 55, 100.1);
 select make_time(24, 0, 2.1);



Re: [HACKERS] Doubt about AccessExclusiveLock in ALTER TABLE .. SET ( .. );

2015-03-30 Thread Jim Nasby

On 3/27/15 2:23 PM, Fabrízio de Royes Mello wrote:

Hi all,

I'm tweaking some autovacuum settings in a table with high write usage,
but with ALTER TABLE .. SET ( .. ) this task was impossible, so I did a
catalog update (pg_class) to change the reloptions.

Maybe it's a stupid doubt, but why do we need to get an AccessExclusiveLock
on the relation to set reloptions if we just touch pg_class tuples
(RowExclusiveLock)?


For a very long time catalog access was not MVCC safe. I think that's 
been changed, so at this point it may be OK to relax the lock, at least 
in the case of autovac settings. There may well be other settings in 
there where it would not be safe.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Implementing a join algorithm in Postgres

2015-03-30 Thread Jim Nasby

On 3/27/15 3:01 AM, Ravi Kiran wrote:

I have written a C program which reads from 3 files (each file is a table
having 2 columns and thousands of rows). The program is used to join
those 3 tables, and the algorithm I have written will work only for
those 3 files. Now I want to test this program with Postgres. Can someone
tell me how to do that, and where I should include this program so that I
can successfully implement it in Postgres?


Without seeing the program, no.

This also isn't the correct list; discussion like this should go on 
pgsql-general@.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Bug fix for missing years in make_date()

2015-03-30 Thread Jim Nasby

On 3/26/15 5:26 PM, David Fetter wrote:

+* Note: Non-positive years are take to be BCE.


s/take/taken/
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Exposing PG_VERSION_NUM in pg_config

2015-03-30 Thread Michael Paquier
On Tue, Mar 31, 2015 at 5:39 AM, Peter Eisentraut  wrote:

> On 3/25/15 1:32 AM, Michael Paquier wrote:
> > Well, I have no other cases than ones of the type mentioned upthread,
> > and honestly I am fine as long as we do not apply maths to a version
> > string. So attached is a patch that adds VERSION_NUM in
> > Makefile.global.
>
> How would you make use of this in an extension makefile?
>

One use case is regression test list filtering depending on backend version.
-- 
Michael


Re: [HACKERS] Vacuuming big btree indexes without pages with deleted items

2015-03-30 Thread Jim Nasby

On 3/27/15 5:15 AM, Vladimir Borodin wrote:

Hi all.

I have described [0] a problem with replicas lagging after vacuuming a
relation with a big btree index. Replay gets stuck on a WAL record of
type XLOG_BTREE_VACUUM like this (with lastBlockVacuumed 0):

rmgr: Btree   len (rec/tot): 20/52, tx:  0, lsn:
4115/56126DC0, prev 4115/56126D90, bkp: , desc: vacuum: rel
1663/16420/16796; blk 31222118, lastBlockVacuumed 0

The master writes this record to xlog in the btvacuumscan [1] function after
vacuuming all index pages. And if there are no pages with deleted items, the
xlog record contains lastBlockVacuumed 0.

In btree_xlog_vacuum [2] the replica reads all blocks from lastBlockVacuumed
to the last block of the index while applying this record, because there is
no API in the buffer manager to tell whether a page is unpinned.

So if the index is quite big (200+ GB in the described case) it takes a lot
of time. So the questions are:

1. Isn't there any API in the buffer manager to determine that a
page is not in shared_buffers without reading it?

I don't know offhand, but since XLogReadBufferExtended already has a
mode argument it wouldn't be too hard to add one there.



2. Is it possible not to write an xlog record with lastBlockVacuumed 0
in some cases? For example, when no pages have been deleted.


Possibly, but that's much higher risk. Without studying it, if we wanted 
to mess around with that it might actually make more sense to XLOG a set 
of blkno's that got vacuumed, but I suspect that wouldn't be a win.



Or maybe there are some better ways of improving this situation?

[0]
http://www.postgresql.org/message-id/fe82a9a7-0d52-41b5-a9ed-967f6927c...@simply.name
[1]
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtree.c;hb=refs/heads/REL9_4_STABLE#l813
[2]
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtxlog.c;hb=refs/heads/REL9_4_STABLE#l482

--
May the force be with you…
https://simply.name




--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Ignoring entries generated by autoconf in code tree

2015-03-30 Thread Tom Lane
Peter Eisentraut  writes:
> On 3/25/15 10:34 AM, Tom Lane wrote:
>> Michael Paquier  writes:
>>> When running autoconf from the root tree, autom4te.cache/ is
>>> automatically generated.
>>> Wouldn't it make sense to add an entry in .gitignore for that?

>> Personally, I don't want such a thing, as then I would tend to forget
>> to remove that cache file.  And you do want to remove it.  autoconf
>> goes pretty berserk if the cache file hangs around across significant
>> changes to configure.in, such as if you were to switch branches.
>> (Or at least that used to be true --- last time I got burnt by it
>> was quite some time ago, but possibly that's just because I'm careful
>> about removing the cache file.)

> I think you might be confusing it with config.cache?

No, I've never used config.cache.  I'm not actually sure what autoconf
keeps in autom4te.cache/, but I've definitely seen it generate wacko
output files when that directory still has contents from a previous
run with some other version of the input files.

regards, tom lane




Re: [HACKERS] Ignoring entries generated by autoconf in code tree

2015-03-30 Thread Peter Eisentraut
On 3/25/15 10:34 AM, Tom Lane wrote:
> Michael Paquier  writes:
>> When running autoconf from the root tree, autom4te.cache/ is
>> automatically generated.
>> Wouldn't it make sense to add an entry in .gitignore for that?
> 
> Personally, I don't want such a thing, as then I would tend to forget
> to remove that cache file.  And you do want to remove it.  autoconf
> goes pretty berserk if the cache file hangs around across significant
> changes to configure.in, such as if you were to switch branches.
> (Or at least that used to be true --- last time I got burnt by it
> was quite some time ago, but possibly that's just because I'm careful
> about removing the cache file.)

I think you might be confusing it with config.cache?





Re: [HACKERS] Exposing PG_VERSION_NUM in pg_config

2015-03-30 Thread Peter Eisentraut
On 3/25/15 1:32 AM, Michael Paquier wrote:
> Well, I have no other cases than ones of the type mentioned upthread,
> and honestly I am fine as long as we do not apply maths to a version
> string. So attached is a patch that adds VERSION_NUM in
> Makefile.global.

How would you make use of this in an extension makefile?





Re: [HACKERS] proposal: searching in array function - array_position

2015-03-30 Thread Pavel Stehule
2015-03-30 21:30 GMT+02:00 Alvaro Herrera :

> Pavel Stehule wrote:
> > 2015-03-22 11:30 GMT+01:00 Dean Rasheed :
>
> > > In the public docs, you should s/position/subscript because that's the
> > > term used throughout the docs for an index into an array. I still like
> > > the name array_position() for the function though, because it's
> > > consistent with the existing position() functions.
> >
> > updated patch
>
> Thanks, pushed.
>
>
Thank you very much

Pavel


> --
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] [COMMITTERS] pgsql: Centralize definition of integer limits.

2015-03-30 Thread Andres Freund
On 2015-03-30 15:38:16 -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Mon, Mar 30, 2015 at 7:48 PM, Andres Freund  wrote:
> >> On 2015-03-30 14:01:25 +0900, Michael Paquier wrote:
> >>> My OSX dev box is generating a couple of warnings since this commit:
> >>> pg_dump.c:14548:45: warning: format specifies type 'long' but the
> >>> argument has type 'long long' [-Wformat]
> 
> FWIW, I'm seeing the same thing on my OSX laptop.  I think the explanation
> is simple: both "long" and "long long" are 64 bits on this machine.
> Our configure chooses the former to be "int64", but stdint.h is using
> the latter.  I'm not sure there's any good way to know what stdint.h
> thinks is int64, but we can't do a half-baked integration job like this.
> Either we revert 83ff1618b altogether, or we make int64/uint64 depend
> on definitions from stdint.h rather than being chosen independently by
> configure.

Obviously we need to fix this, but I'd actually say this isn't
originally the fault of that change. The disparity in types already
existed in a bunch of places (timestamp.c, pgbench.c), now it's more
central, that's all.

I'm too fried from the redeye back from pgconf nyc to do anything
complicated, but it seems quite possible to define int64/uint64 based on
the stdint.h types if available. And generally a good idea too. I guess
I'll try that tomorrow, unless Andrew beats me to it.
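
For illustration, basing the typedefs on stdint.h would look roughly like
this (a sketch only, assuming <stdint.h>/<inttypes.h> are present; the real
change would still need configure-time fallbacks for older toolchains):

#include <stdint.h>
#include <inttypes.h>

typedef int64_t int64;
typedef uint64_t uint64;

/* the printf format strings then come from <inttypes.h> instead of
 * guessing between "%ld" and "%lld" */
#define INT64_FORMAT  "%" PRId64
#define UINT64_FORMAT "%" PRIu64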

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Centralize definition of integer limits.

2015-03-30 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Mar 30, 2015 at 7:48 PM, Andres Freund  wrote:
>> On 2015-03-30 14:01:25 +0900, Michael Paquier wrote:
>>> My OSX dev box is generating a couple of warnings since this commit:
>>> pg_dump.c:14548:45: warning: format specifies type 'long' but the
>>> argument has type 'long long' [-Wformat]

FWIW, I'm seeing the same thing on my OSX laptop.  I think the explanation
is simple: both "long" and "long long" are 64 bits on this machine.
Our configure chooses the former to be "int64", but stdint.h is using
the latter.  I'm not sure there's any good way to know what stdint.h
thinks is int64, but we can't do a half-baked integration job like this.
Either we revert 83ff1618b altogether, or we make int64/uint64 depend
on definitions from stdint.h rather than being chosen independently by
configure.

regards, tom lane




Re: [HACKERS] proposal: searching in array function - array_position

2015-03-30 Thread Alvaro Herrera
Pavel Stehule wrote:
> 2015-03-22 11:30 GMT+01:00 Dean Rasheed :

> > In the public docs, you should s/position/subscript because that's the
> > term used throughout the docs for an index into an array. I still like
> > the name array_position() for the function though, because it's
> > consistent with the existing position() functions.
> 
> updated patch

Thanks, pushed.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Error with index on unlogged table

2015-03-30 Thread Alvaro Herrera
Michael Paquier wrote:
> On Thu, Mar 26, 2015 at 1:02 AM, Fabrízio de Royes Mello
>  wrote:

> > I didn't find any other similar bug introduced by 85b506bb.
> >
> > Attached the original patch provided by Michael with some regression tests.
> 
> Thanks for adding a test, this looks fine to me (did some sanity
> checks and tutti-quanti for people wondering). On temporary tables
> this was failing with an error in md.c...

Yeah, I extended the test a bit to use a temp table too, and pushed.
Thanks everybody.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Bug #10432 failed to re-find parent key in index

2015-03-30 Thread Peter Geoghegan
On Mon, Mar 30, 2015 at 7:50 PM, Joshua D. Drake  wrote:
> We have a database that has run into this problem. The version is 9.1.15 on
> Linux. I note in this thread:
>
> http://www.postgresql.org/message-id/cam-w4hp34ppwegtcwjbznwhq0cmu-lxna62vjku8qrtwlob...@mail.gmail.com
>
> That things appear to be fixed in 9.4 but they have not been back-patched?
> What is the current status?

I believe that Heikki said he'd backpatch that when 9.4 was considered
very stable. I don't think that we've reached that level of confidence
in the invasive B-Tree bugfixes that went into 9.4 yet.

-- 
Peter Geoghegan




[HACKERS] Bug #10432 failed to re-find parent key in index

2015-03-30 Thread Joshua D. Drake


Hello,

We have a database that has run into this problem. The version is 9.1.15 
on Linux. I note in this thread:


http://www.postgresql.org/message-id/cam-w4hp34ppwegtcwjbznwhq0cmu-lxna62vjku8qrtwlob...@mail.gmail.com

That things appear to be fixed in 9.4 but they have not been 
back-patched? What is the current status?


Sincerely,

jD
--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




[HACKERS] WAL format changes break the suppression of do-nothing checkpoints.

2015-03-30 Thread Jeff Janes
commit 2c03216d831160bedd72d45f7 has invalidated the part of the docs
saying "If no WAL has been written since the previous checkpoint, new
checkpoints will be skipped even if checkpoint_timeout has passed",
presumably by accident.

It seems that this condition is no longer true when it should be:

if (curInsert == ControlFile->checkPoint +
MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint))

MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) is now 96, but the amount by
which curInsert gets advanced is still 104, as it was before the commit.


Cheers,

Jeff


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-03-30 Thread Peter Geoghegan
On Sat, Mar 28, 2015 at 6:36 PM, Andres Freund  wrote:
> Just had a longer chat with Peter about this patch.

It was a very useful chat. Thanks for making yourself available to do it.

> * Not a fan of the heap flags usage, the reuse seems sketchy to me
> * Explain should show the arbiter index in text as well
> * AddUniqueSpeculative is a bad name, it should refer IndexInfo
> * Work on the ExecInsert() comments
> * Let's remove the planner choosing the "cheapest" arbiter index; it
>   should do all.
> * s/infer_unique_index/infer_arbiter_index/

OK.

> * Not supporting inheritance properly makes me uncomfortable. I don't
>   think users will think that's a acceptable/reasonable restriction.

I'll look into making the inference specification deduce a child relation index.

> * Let's not use t_ctid directly, but add a wrapper

We talked about a union. This seems quite doable.

> * The code uses LockTupleExclusive to lock rows. That means the fkey key
>   locking doesn't work; That's annoying. This means that using upsert
>   will potentially cause deadlocks in other sessions :(. I think you'll
>   have to determine what lock to acquire by fetching the tuple, doing
>   the key comparison, and acquire the appropriate lock. That should be
>   fine.

Looking into the implementation of this. As we discussed, the
difficulty in avoiding a lock escalation within ExecUpdate() is
that we must fetch the row, run EvalPlanQual() to check whether the new
row version generated by updating will require LockTupleExclusive or
LockTupleNoKeyExclusive, then lock the row using the right
lockmode, and only then call ExecUpdate(). Right now, UPSERT benefits
from fetching and locking the row together, so going this way imposes
a little additional complexity. It's probably worth it, though.
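
As a rough standalone sketch of just that lock-mode decision (the struct, the
key comparison and the enum are simplifications, not the executor's actual
data structures):

#include <stdio.h>

/* simplified stand-ins for the backend's row-lock modes */
typedef enum
{
    LOCK_TUPLE_NO_KEY_EXCLUSIVE,    /* key columns unchanged: weaker lock suffices */
    LOCK_TUPLE_EXCLUSIVE            /* key columns modified: strongest row lock */
} RowLockMode;

/* toy row with a single key column and a payload */
typedef struct
{
    int         key;
    const char *payload;
} Row;

/*
 * Models the sequence described above: fetch the existing row, build the
 * would-be new version, and only then decide how strong a lock is needed.
 */
static RowLockMode
choose_lock_mode(const Row *existing, const Row *updated)
{
    if (existing->key == updated->key)
        return LOCK_TUPLE_NO_KEY_EXCLUSIVE;
    return LOCK_TUPLE_EXCLUSIVE;
}

int
main(void)
{
    Row old_row = {42, "old"};
    Row new_row = {42, "new"};      /* key untouched, only the payload changes */

    printf("lock mode: %s\n",
           choose_lock_mode(&old_row, &new_row) == LOCK_TUPLE_NO_KEY_EXCLUSIVE
           ? "NoKeyExclusive" : "Exclusive");
    return 0;
}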

> * I think we should decouple the insertion and wal logging more. I think
>   the promise tuple insertion should be different from the final
>   insertion of the actual tuple. For one it seems cleaner to me, for
>   another it will avoid the ugliness around logical decoding. I think
>   also that the separation will make it more realistic to use something
>   like this for a COPY variant that doesn't raise unique violations and
>   such.

Your COPY argument swung this for me. I'm looking into the implementation.

> * We discussed the infererence and that it should just reuse (code,
>   grammar, docs) the column specification from create index.

Agreed.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-03-30 Thread Stephen Frost
* Heikki Linnakangas (hlinn...@iki.fi) wrote:
> There have been numerous threads on replacing our MD5 authentication
> method, so I started hacking on that to see what it might look like.
> Just to be clear, this is 9.6 material. Attached is a WIP patch
> series that adds support for SCRAM. There's no need to look at the
> details yet, but it demonstrates what the protocol changes and the
> code structure would be like.

Great!  Very glad that you're working on this.

> I'm not wedded to SCRAM - SRP or JPAKE or something else might be
> better. But replacing the algorithm, or adding more of them, should
> be straightforward with this.

Excellent.

> 3. Allow storing multiple verifiers in pg_authid
> 
> 
> Replace the pg_authid.rolpassword text field with an array, and
> rename it to 'rolverifiers'. This allows storing multiple password
> hashes: an MD5 hash for MD5 authentication, and a SCRAM salt and
> stored key for SCRAM authentication, etc. Each element in the array
> is a string that begins with the method's name. For example
> "md5:", or "password:".
> 
> For dump/reload, and for clients that wish to create the hashes in
> the client-side, there is a new option to CREATE/ALTER USER
> commands: PASSWORD VERIFIERS '{ ... }', that allows replacing the
> array.
> 
> The old "ENCRYPTED/UNENCRYPTED PASSWORD 'foo'" options are still
> supported for backwards-compatibility, but it's not clear what it
> should mean now.
> 
> TODO:
> 
> * Password-checking hook needs to be redesigned, to allow for more
> kinds of hashes.
> 
> * With "CREATE USER PASSWORD 'foo'", which hashes/verifiers should
> be generated by default? We currently have a boolean
> password_encryption setting for that. Needs to be a list.

This generally sounds good to me but we definitely need to have that
list of hashes to be used.  The MIT KDC for Kerberos (and I believe all
the other Kerberos implementations) has a similar setting for what will
be stored and what will be allowed for hashing and encryption options.
It's very important that we allow users to tweak this list, as we will
want to encourage users to migrate off of the existing md5 storage
mechanism and on to the SCRAM based one eventually.

Unfortunately, the first major release with this will certainly need to
default to including md5 as we can't have a password update or change
break clients right off the bat.  What I think would be fantastic would
be a warning, perhaps in the first release or maybe the second, which
deprecates md5 as an auth method and is thrown when a password is set
which includes storing an md5-based password.  I'm sure there will be
plenty of discussion about that in the future.

One additional item is that we need to have a way to prefer SCRAM-based
auth while allowing a fall-back to md5 if the client doesn't support it.
This might have to be driven by the client side explicitly saying "I
support SCRAM" from the start to avoid breaking existing clients.
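
To make that concrete, here is a small standalone sketch of picking a verifier
from a "method:data" style list, preferring SCRAM when the client advertises
support and falling back to md5 otherwise (the method names, the array contents
and the helper function are illustrative assumptions, not code from the patch):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* each stored verifier begins with its method name, e.g. "md5:<hash>" */
static const char *
find_verifier(const char *const *verifiers, int nverifiers, const char *method)
{
    size_t len = strlen(method);

    for (int i = 0; i < nverifiers; i++)
    {
        if (strncmp(verifiers[i], method, len) == 0 && verifiers[i][len] == ':')
            return verifiers[i] + len + 1;
    }
    return NULL;
}

int
main(void)
{
    const char *const rolverifiers[] = {
        "scram-sha-1:fake-salt-and-stored-key",
        "md5:fake-md5-hash"
    };
    bool        client_supports_scram = true;  /* learned from the startup exchange */
    const char *method = client_supports_scram ? "scram-sha-1" : "md5";
    const char *verifier = find_verifier(rolverifiers, 2, method);

    /* fall back to md5 if no SCRAM verifier is stored for this role */
    if (verifier == NULL && client_supports_scram)
    {
        method = "md5";
        verifier = find_verifier(rolverifiers, 2, method);
    }

    printf("authenticating with %s (%s)\n", method,
           verifier ? verifier : "no verifier stored");
    return 0;
}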

> 4. Implement SCRAM
> --
> 
> The protocol and the code is structured so that it would be fairly
> easy to add more built-in SASL mechanisms, or to use a SASL library
> to provide more. But for now I'm focusing on adding exactly one new
> built-in mechanism, to replace MD5 in the long term.
> 
> In the protocol, there is a new AuthenticationSASL message,
> alongside the existing AuthenticationMD5, AuthenticationSSPI etc.
> The AuthenticationSASL message contains the name of the SASL
> mechanism used ("SCRAM-SHA-1"). Just like in the GSSAPI/SSPI
> authentication, a number of PasswordMessage and
> AuthenticationSASLContinue messages are exchanged after that,
> carrying the data specified by the SCRAM spec, until the
> authentication succeeds (or not).
> 
> TODO:
> 
> * Per the SCRAM specification, the client sends the username in the
> handshake. But in the FE/BE protocol, we've already sent it in the
> startup packet. In the patch, libpq always sends an empty username
> in the SCRAM exchange, and the username from the startup packet is
> what matters. We could also require it to be the same, but in SCRAM
> the username has to be UTF-8 encoded, while in PostgreSQL the username
> can be in any encoding. That is a source of annoyance in itself, as
> it's not well-defined in PostgreSQL which encoding to use when
> sending a username to the server. But I don't want to try fixing
> that in this patch, so it seems easiest to just require the username
> to be empty.

I don't like having it be empty..  I'm not looking at the spec right at
the moment, but have you confirmed that the username being empty during
the SCRAM discussion doesn't reduce the effectiveness of the
authentication method overall in some way?  Is it ever used in
generation of the authentication verifier, etc?  One way to address the
risk which you bring up about the different encodings might be to simply
discourage using non-UTF8-compliant encodings by throwing a warning or
refusing to

Re: [HACKERS] Relation extension scalability

2015-03-30 Thread Stephen Frost
* David Steele (da...@pgmasters.net) wrote:
> On 3/30/15 6:45 AM, Andres Freund wrote:
> > On 2015-03-30 09:33:57 +0530, Amit Kapila wrote:
> >> Another thing to note here is that during extension we are extending
> >> just one block; won't it make sense to increment it by some bigger
> >> number (we can even take input from the user, where the user
> >> can specify how much to autoextend a relation when the relation doesn't
> >> have any empty space).  During mdextend(), we might increase just one
> >> block, however we can register a request for a background process to
> >> increase the size, similar to what is done for fsync.
> > 
> > I think that's pretty much a separate patch. Made easier by moving
> > things out from under the lock, maybe. Other than that I'd prefer not to
> > mix things. There's a whole bunch of unrelated complexity that I don't
> > want to attach to the topic at the same time (autovacuum truncating
> > again and so on).
> 
> Agreed that it makes more sense for this to be in a separate patch, but
> I definitely like the idea.
> 
> A user configurable setting would be fine, but better would be to learn
> from the current growth rate of the table and extend based on that.
> 
> For instance, if a table is very large but is only growing by a few
> rows a day, there's probably no need for a large extent.  Conversely, an
> initially small table growing by 1GB per minute would definitely benefit
> from large extents and it would be good to be able to track growth and
> compute extent sizes accordingly.
> 
> Of course, a manual setting to start with would cover most use cases.
> Large tables in a database are generally in the minority and known in
> advance.

If we're able to extend based on page-level locks rather than the global
relation locking that we're doing now, then I'm not sure we really need
to adjust how big the extents are any more.  The reason for making
bigger extents is because of the locking problem we have now when lots
of backends want to extend a relation, but, if I'm following correctly,
that'd go away with Andres' approach.

We don't have full patches for either of these and so I don't mind
saying that, basically, I'd prefer to see if we still have a big
bottleneck here with lots of backends trying to extend the same relation
before we work on adding this particular feature in, as it might end up
being unnecessary.  Now, if someone shows up tomorrow with a patch to do
this and Andres' approach ends up not progressing, then we should
certainly consider it (in due time and with consideration to the
activities going on for 9.5, of course).

Thanks!

Stephen




Re: [HACKERS] Parallel Seq Scan

2015-03-30 Thread Robert Haas
On Wed, Mar 25, 2015 at 6:27 AM, Amit Kapila  wrote:
> Apart from that I have moved the initialization of the DSM segment from
> the InitNode phase to ExecFunnel() (on first execution) as per the suggestion
> from Robert.  The main idea is that, as it creates a large shared memory
> segment, we should do the work only when it is really required.

So, suppose we have a plan like this:

Append
-> Funnel
  -> Partial Seq Scan
-> Funnel
  -> Partial Seq Scan
(repeated many times)

In earlier versions of this patch, that was chewing up lots of DSM
segments.  But it seems to me, on further reflection, that it should
never use more than one at a time.  The first funnel node should
initialize its workers and then when it finishes, all those workers
should get shut down cleanly and the DSM destroyed before the next
scan is initialized.

Obviously we could do better here: if we put the Funnel on top of the
Append instead of underneath it, we could avoid shutting down and
restarting workers for every child node.  But even without that, I'm
hoping it's no longer the case that this uses more than one DSM at a
time.  If that's not the case, we should see if we can't fix that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-03-30 Thread Robert Haas
On Wed, Mar 18, 2015 at 11:43 PM, Amit Kapila  wrote:
>> I think I figured out the problem.  That fix only helps in the case
>> where the postmaster noticed the new registration previously but
>> didn't start the worker, and then later notices the termination.
>> What's much more likely to happen is that the worker is started and
>> terminated so quickly that both happen before we create a
>> RegisteredBgWorker for it.  The attached patch fixes that case, too.
>
> Patch fixes the problem and now for Rescan, we don't need to Wait
> for workers to finish.

I realized that there is a problem with this.  If an error occurs in
one of the workers just as we're deciding to kill them all, then the
error won't be reported. Also, the new code to propagate
XactLastRecEnd won't work right, either.  I think we need to find a
way to shut down the workers cleanly.  The idea generally speaking
should be:

1. Tell all of the workers that we want them to shut down gracefully
without finishing the scan.

2. Wait for them to exit via WaitForParallelWorkersToFinish().

My first idea about how to implement this is to have the master detach
all of the tuple queues via a new function TupleQueueFunnelShutdown().
Then, we should change tqueueReceiveSlot() so that it does not throw
an error when shm_mq_send() returns SHM_MQ_DETACHED.  We could modify
the receiveSlot method of a DestReceiver to return bool rather than
void; a "true" value can mean "continue processing" whereas a "false"
value can mean "stop early, just as if we'd reached the end of the
scan".

This design will cause each parallel worker to finish producing the
tuple it's currently in the middle of generating, and then shut down.
You can imagine cases where we'd want the worker to respond faster
than that, though; for example, if it's applying a highly selective
filter condition, we'd like it to stop the scan right away, not when
it finds the next matching tuple.  I can't immediately see a real
clean way of accomplishing that, though.
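
Here is a minimal standalone model of that receiveSlot change (the types and
names are simplified stand-ins for the real DestReceiver and tuple-queue
machinery, not the actual code):

#include <stdbool.h>
#include <stdio.h>

/* stand-in for shm_mq_send()'s result codes */
typedef enum
{
    MQ_OK,
    MQ_DETACHED                 /* the master has detached this queue */
} MqResult;

/* a DestReceiver-like callback: returning false means "stop the scan early" */
typedef struct
{
    bool        (*receiveSlot) (int tuple, void *arg);
    void       *arg;
} Receiver;

static MqResult
fake_send(int tuple, int detach_after)
{
    return (tuple >= detach_after) ? MQ_DETACHED : MQ_OK;
}

static bool
tqueue_receive_slot(int tuple, void *arg)
{
    int detach_after = *(int *) arg;

    /* instead of erroring out on a detached queue, just report "stop" */
    return fake_send(tuple, detach_after) != MQ_DETACHED;
}

int
main(void)
{
    int         detach_after = 3;   /* pretend the master detached after 3 tuples */
    Receiver    dest = {tqueue_receive_slot, &detach_after};

    /* the worker finishes the tuple it is on, then shuts down cleanly */
    for (int tuple = 0; tuple < 10; tuple++)
    {
        if (!dest.receiveSlot(tuple, dest.arg))
        {
            printf("queue detached, stopping at tuple %d\n", tuple);
            break;
        }
        printf("sent tuple %d\n", tuple);
    }
    return 0;
}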

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-03-30 Thread Robert Haas
On Fri, Mar 27, 2015 at 2:34 AM, Amit Kapila  wrote:
> The reason for this problem is that the above tab-completion is executing
> query [1], which contains a subplan for the funnel node, and currently
> we don't have the capability (enough infrastructure) to support execution
> of subplans by parallel workers.  Here one might wonder why we
> have chosen a parallel plan (Funnel node) for such a case, and the
> reason is that subplans are attached after plan generation
> (SS_finalize_plan()); if we want to discard such a plan, it will be
> much more costly and tedious, and not worth the effort, as we have to
> eventually make such a plan work.
>
> Here we have two choices to proceed, first one is to support execution
> of subplans by parallel workers and second is execute/scan locally for
> Funnel node having subplan (don't launch workers).

It looks to me like this is an InitPlan, not a subplan.  There
shouldn't be any problem with a Funnel node having an InitPlan; it
looks to me like all of the InitPlan stuff is handled by common code
within the executor (grep for initPlan), so it ought to work here the
same as it does for anything else.  What I suspect is failing
(although you aren't being very clear about it here) is the passing
down of the parameters set by the InitPlan to the workers.  I think we
need to make that work; it's an integral piece of the executor
infrastructure and we shouldn't leave it out just because it requires
a bit more IPC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Concurrent calls of _hash_getnewbuf()

2015-03-30 Thread Tom Lane
Antonin Houska  writes:
> When doing my experiments with bucket split ([1]), I noticed a comment that
> _hash_getnewbuf should not be called concurrently. However, there's no
> synchronization of calls from _hash_splitbucket in HEAD. I could reproduce
> such concurrent calls using gdb (multiple bucket splits in progress at a
> time).

Ugh.

> [ locking the metapage around the _hash_getnewbuf call ]
> I think it'd also be the easiest fix for _hash_splitbucket. Or should a
> separate ("regular") lock be introduced and used in both cases?

That doesn't really fix the problem: once we release lock on the metapage,
if a new split starts then it would try to create a bucket at the next
higher page number, so that the _hash_getnewbuf() calls might occur "out
of order", leading to failures of the crosschecks therein.

I think the only simple fix here is to rearrange _hash_splitbucket's API
so that the _hash_getnewbuf() call is done before we drop the metapage
lock at all.  Not sure offhand if the best way is to move that call
into _hash_expandtable(), or to change the API definition so that
_hash_splitbucket is responsible for dropping the metapage lock at the
right time.

Obviously, this whole business of needing the metapage lock to add pages
is nasty.  I wonder if we could use the relation extension lock instead
to serialize those operations, rather than requiring a metapage buffer
lock.  But that would be a much more invasive thing.  In any case, the
current state of affairs is that the relation physical length is tightly
tied to the bucket mapping defined by the metapage contents, and we have
to change both of those together.

Actually ... one of the other things that's not being accounted for here
is the possibility of ENOSPC while adding the new page.  Really, we want
to attempt the extension *before* we modify the bucket mapping.  So that
seems to argue for moving the _hash_getnewbuf call into _hash_expandtable,
where it could be done just before we scribble on the metapage.  OTOH,
we're not any less screwed if we get ENOSPC while trying to add an
overflow page later on :-(, since future readers would assume the bucket
split's been performed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2015-03-30 Thread David Steele
On 3/30/15 6:45 AM, Andres Freund wrote:
> On 2015-03-30 09:33:57 +0530, Amit Kapila wrote:
>> Another thing to note here is that during extension we are extending
>> just one block; won't it make sense to increment it by some bigger
>> number (we can even take input from the user, where the user
>> can specify how much to autoextend a relation when the relation doesn't
>> have any empty space).  During mdextend(), we might increase just one
>> block, however we can register a request for a background process to
>> increase the size, similar to what is done for fsync.
> 
> I think that's pretty much a separate patch. Made easier by moving
> things out from under the lock, maybe. Other than that I'd prefer not to
> mix things. There's a whole bunch of unrelated complexity that I don't
> want to attach to the topic at the same time (autovacuum truncating
> again and so on).

Agreed that it makes more sense for this to be in a separate patch, but
I definitely like the idea.

A user configurable setting would be fine, but better would be to learn
from the current growth rate of the table and extend based on that.

For instance, if a table is very large but is only growing by a few
rows a day, there's probably no need for a large extent.  Conversely, an
initially small table growing by 1GB per minute would definitely benefit
from large extents and it would be good to be able to track growth and
compute extent sizes accordingly.

Of course, a manual setting to start with would cover most use cases.
Large tables in a database are generally in the minority and known in
advance.
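
As a rough standalone sketch of what the growth-based sizing described above
might look like (the clamping bounds, the 8kB-block arithmetic and the sampling
window are arbitrary assumptions, not proposed values):

#include <stdio.h>

#define MIN_EXTEND_BLOCKS   1       /* today's behaviour: one 8kB block at a time */
#define MAX_EXTEND_BLOCKS   2048    /* cap the burst at 16MB (arbitrary) */

/*
 * Size the next extension from the recently observed growth rate, so a
 * slow-growing table keeps extending one block at a time while a table
 * growing by ~1GB per minute gets large extents.
 */
static int
next_extend_blocks(double blocks_per_second, double target_seconds)
{
    double want = blocks_per_second * target_seconds;

    if (want < MIN_EXTEND_BLOCKS)
        return MIN_EXTEND_BLOCKS;
    if (want > MAX_EXTEND_BLOCKS)
        return MAX_EXTEND_BLOCKS;
    return (int) want;
}

int
main(void)
{
    /* a few rows per day: effectively zero blocks per second */
    printf("slow table: %d block(s)\n", next_extend_blocks(0.0001, 5.0));

    /* ~1GB per minute is roughly 2184 8kB blocks per second */
    printf("fast table: %d block(s)\n", next_extend_blocks(2184.0, 5.0));
    return 0;
}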

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] getting rid of "thread fork emulation" in pgbench?

2015-03-30 Thread Robert Haas
On Mon, Mar 30, 2015 at 2:00 AM, Fabien COELHO  wrote:
>> I really, really wish you'd stop arguing against the patch to allow
>> merging of pgbench logs on this basis.
>
> Hmmm. I'm lost. I thought that discussing how to best implement a feature
> was part of reviewing a patch.

Of course it is.  But the feature you are talking about and the
feature Tomas is talking about are two different features.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2015-03-30 Thread Simon Riggs
On 30 March 2015 at 01:08, David Rowley  wrote:
> On 18 December 2014 at 02:48, Simon Riggs  wrote:
>>
>> David, if you can update your patch with some docs to explain the
>> behaviour, it looks complete enough to think about committing it in
>> early January, to allow other patches that depend upon it to stand a
>> chance of getting into 9.5. (It is not yet ready, but I see it could
>> be).
>>
>
> I've been thinking of bumping this patch to the June commitfest as the patch
> only exists to provide the basic infrastructure for things like parallel
> aggregation, aggregate before join, and perhaps auto updating materialised
> views.
>
> It seems unlikely that any of those things will happen for 9.5.
> Does anybody object to me moving this to June's commitfest?

If the patch is ready, it should stay in the queue.

A global decision to move all uncommitted patches to June might occur
later, but that's a different decision. I don't think we should be
prejudging that situation; all it would do is hide the extent of the
real problem with reviews.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_dump / copy bugs with "big lines" ?

2015-03-30 Thread Ronan Dunklau
Hello hackers,

I've tried my luck on pgsql-bugs before, with no success, so I'm reporting these
problems here.

The documentation mentions the following limits for sizes:

Maximum Field Size  1 GB
Maximum Row Size    1.6 TB

However, it seems like rows bigger than 1GB can't be COPYed out:

ro=# create table test_text (c1 text, c2 text);
CREATE TABLE
ro=# insert into test_text (c1) VALUES (repeat('a', 536870912));
INSERT 0 1
ro=# update test_text set c2 = c1;
UPDATE 1

Then, trying to dump or copy that results in the following error:

ro=# COPY test_text TO '/tmp/test';
ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 536870913 bytes by 536870912
more bytes.

In fact, the same thing happens when using a simple SELECT:

ro=# select * from test_text ;
ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 536870922 bytes by 536870912
more bytes.

In the case of COPY, the server uses a StringInfo to output the row. The 
problem is, a StringInfo is capped at MaxAllocSize (1GB - 1), but a row
should be able to hold much more than that.

So, is this a bug?  Or is there a caveat I would have missed in the
documentation?
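
For what it's worth, a minimal standalone model of why the first error fires
(the 1GB constant mirrors MaxAllocSize; the buffer logic is a simplification of
enlargeStringInfo, not the actual backend code):

#include <stdbool.h>
#include <stdio.h>

#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)    /* 1GB - 1, as in the backend */

typedef struct
{
    size_t len;                 /* bytes already accumulated in the buffer */
} FakeStringInfo;

/* returns false where the backend raises "Cannot enlarge string buffer ..." */
static bool
fake_enlarge(FakeStringInfo *buf, size_t more)
{
    if (more > MAX_ALLOC_SIZE - buf->len)
    {
        printf("Cannot enlarge string buffer containing %zu bytes by %zu more bytes.\n",
               buf->len, more);
        return false;
    }
    buf->len += more;
    return true;
}

int
main(void)
{
    FakeStringInfo row = {0};

    /* COPY builds the whole output row in one buffer: two ~512MB text fields */
    fake_enlarge(&row, 536870913);  /* first field (plus separators) fits */
    fake_enlarge(&row, 536870912);  /* second field pushes the row past 1GB */
    return 0;
}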

We also hit a second issue, this time related to bytea encoding.

This test case is a bit more complicated, since I had to use an external 
(client) program to insert my data. It involves inserting a string that fit 
into 1GB when encoded in escape format, but is larger than that in hex, and 
another string which fits in 1GB using the hex format, but is larger than that 
in escape:

from psycopg2 import connect
from io import BytesIO

conn = connect(dbname="ro")
cur = conn.cursor()
fullcontent = BytesIO()

# Write a binary string that weighs less
# than 1 GB when escape encoded, but more than
# that if hex encoded
for i in range(200):
    fullcontent.write(b"aaa" * 100)
fullcontent.seek(0)
cur.copy_from(fullcontent, "test_bytea")

fullcontent.seek(0)
fullcontent.truncate()

# Write another binary string that weighs
# less than 1GB when hex encoded, but more than
# that if escape encoded
cur.execute("SET bytea_output = 'hex'")
fullcontent.write(b"x")
for i in range(300):
    fullcontent.write(b"00" * 100)
fullcontent.seek(0)
cur.copy_from(fullcontent, "test_bytea")

cur.execute("COMMIT;")
cur.close()

I couldn't find an invocation of pg_dump which would allow me to dump both 
lines:

ro@ronan_laptop /tmp % PGOPTIONS="-c bytea_output=escape" pg_dump  -Fc  > 
/dev/null
pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult() 
failed.
pg_dump: Error message from server: ERROR:  invalid memory alloc request size 
120001
pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;
ro@ronan_laptop /tmp % PGOPTIONS="-c bytea_output=hex" pg_dump  -Fc  > 
/dev/null
pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult() 
failed.
pg_dump: Error message from server: ERROR:  invalid memory alloc request size 
120003
pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;


Using a COPY with binary format works:

ro=# COPY test_bytea TO '/tmp/test' WITH BINARY;


There seems to be a third issue, with regard to escape encoding: the
backslash character is escaped by adding another backslash. This means that a
field whose size is less than 1GB using the escape encoding will not necessarily
be able to be output once the backslashes are escaped.

For example, let's consider a string consisting of 3 '\' characters:


ro=# select length(c1) from test_bytea;
 length 
--------
      3
(1 row)

ro=# select length(encode(c1, 'escape')) from test_bytea ;
 length 
--------
      6
(1 row)

ro=# set bytea_output to escape;
SET
ro=# copy test_bytea to '/tmp/test.csv' ; 

ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 1073741822 bytes by 1 more
bytes.
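
The size arithmetic behind that can be sketched as follows (the raw byte count
is hypothetical; the two doublings are the escape encoding of a backslash plus
COPY's own backslash escaping):

#include <stdio.h>

int
main(void)
{
    /* a bytea value made of nothing but backslash (0x5c) bytes */
    unsigned long long nbytes = 300000000ULL;   /* hypothetical, ~286MB of raw data */

    unsigned long long escape_encoded = 2 * nbytes;             /* each byte becomes "\\" */
    unsigned long long copy_text_output = 2 * escape_encoded;   /* COPY doubles each backslash again */

    /* the encoded form stays under 1GB, but the COPY text output does not */
    printf("raw: %llu, encode(..., 'escape'): %llu, COPY text output: %llu\n",
           nbytes, escape_encoded, copy_text_output);
    return 0;
}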

I think pg_dump should not error out on any data which was valid upon 
insertion. It seems the fix would be non-trivial, since StringInfo structures
rely on the MaxAllocSize limit. Or am I missing something?

Thank you.

 
-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org



Re: [HACKERS] Relation extension scalability

2015-03-30 Thread Andres Freund
On 2015-03-30 09:33:57 +0530, Amit Kapila wrote:
> In the past, I have observed in one of the Write-oriented tests that
> backends have to flush the pages themselves many times, so
> in the above situation that can lead to a more severe bottleneck.

Yes.

> > I've prototyped solving this for heap relations moving the smgrnblocks()
> > + smgrextend() calls to RelationGetBufferForTuple(). With some care
> > (including a retry loop) it's possible to only do those two under the
> > extension lock. That indeed fixes problems in some of my tests.
> >
> 
> So does this mean that the problem is because of contention on the extension
> lock?

Yes, at least commonly. Obviously the extension lock would be less of a
problem if we were better at having a clean victim buffer ready.

> > I'm not sure whether the above is the best solution however.
> 
> Another thing to note here is that during extension we are extending
> just one block; won't it make sense to increment it by some bigger
> number (we can even take input from the user, where the user
> can specify how much to autoextend a relation when the relation doesn't
> have any empty space).  During mdextend(), we might increase just one
> block, however we can register a request for a background process to
> increase the size, similar to what is done for fsync.

I think that's pretty much a separate patch. Made easier by moving
things out from under the lock, maybe. Other than that I'd prefer not to
mix things. There's a whole bunch of unrelated complexity that I don't
want to attach to the topic at the same time (autovacuum truncating
again and so on).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans with btree_gist

2015-03-30 Thread Heikki Linnakangas

On 03/29/2015 04:30 AM, Andreas Karlsson wrote:

On 03/29/2015 03:25 AM, Andreas Karlsson wrote:

On 03/28/2015 09:36 PM, Andreas Karlsson wrote:

Thanks! Do you know if it is possible to add index-only scan support to
range indexes? I have never looked at those and do not know if they are
lossy.


Seems like range types are not compressed at all so implementing
index-only scans was trivial. A patch is attached.


Noticed a couple of typos, so to avoid having them sneak into the code
here is a version without them.


Thanks, pushed.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Concurrent calls of _hash_getnewbuf()

2015-03-30 Thread Antonin Houska
When doing my experiments with bucket split ([1]), I noticed a comment that
_hash_getnewbuf should not be called concurrently. However, there's no
synchronization of calls from _hash_splitbucket in HEAD. I could reproduce
such concurrent calls using gdb (multiple bucket splits in progress at a
time).

When the function is called from _hash_getovflpage, content lock of metapage
buffer seems to be (mis)used to synchronize the calls:

/*
 * Fetch the page with _hash_getnewbuf to ensure smgr's idea of the
 * relation length stays in sync with ours.  XXX It's annoying to do this
 * with metapage write lock held; would be better to use a lock that
 * doesn't block incoming searches.
 */
newbuf = _hash_getnewbuf(rel, blkno, MAIN_FORKNUM);

I think it'd also be the easiest fix for _hash_splitbucket. Or should a
separate ("regular") lock be introduced and used in both cases?


[1] http://www.postgresql.org/message-id/32423.1427413442@localhost

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
new file mode 100644
index 46c6c96..25c1dd1
*** a/src/backend/access/hash/hashpage.c
--- b/src/backend/access/hash/hashpage.c
*** _hash_splitbucket(Relation rel,
*** 765,771 
--- 765,773 
  	oopaque = (HashPageOpaque) PageGetSpecialPointer(opage);
  
  	nblkno = start_nblkno;
+ 	_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
  	nbuf = _hash_getnewbuf(rel, nblkno, MAIN_FORKNUM);
+ 	_hash_chgbufaccess(rel, metabuf, HASH_WRITE, HASH_NOLOCK);
  	npage = BufferGetPage(nbuf);
  
  	/* initialize the new bucket's primary page */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rounding to even for numeric data type

2015-03-30 Thread Albe Laurenz
Michael Paquier wrote:
> Well, I am not sure about that... But reading this thread, changing the
> default rounding sounds unwelcome. So it may be better to just put in
> words the rounding method used now in the docs, with perhaps a mention
> that this is not completely in-line with the SQL spec if that's not
> the case.

The SQL standard does not care; it says that numbers and other data types
should, whenever necessary, be rounded or truncated in an implementation-
defined fashion.

I cannot find any mention of a round() function.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers