Re: [HACKERS] question about replication docs

2017-06-06 Thread Peter Eisentraut
On 5/29/17 09:11, Dave Cramer wrote:
> The following makes an explicit reference to the simple query protocol
> being the only protocol allowed in walsender mode. It is my
> understanding that this is true for logical replication as well?

It's the same thing (for this purpose).

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] question about replication docs

2017-05-29 Thread Dave Cramer
The following makes an explicit reference to the simple query protocol
being the only protocol allowed in walsender mode. It is my understanding
that this is true for logical replication as well?


51.3. Streaming Replication Protocol
To initiate streaming replication, the frontend sends the replication
parameter in the startup message. A Boolean value of true tells the backend
to go into walsender mode, wherein a small set of replication commands can
be issued instead of SQL statements. Only the simple query protocol can be
used in walsender mode. Replication commands are logged in the server log
when log_replication_commands is enabled. Passing database as the value
instructs walsender to connect to the database specified in the dbname
parameter, which will allow the connection to be used for logical
replication from that database.
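For illustration, here is a minimal libpq sketch (not part of the original mail; the host, user, and dbname values are placeholders) showing that a walsender connection opened with replication=database takes its commands through the ordinary simple query protocol, here IDENTIFY_SYSTEM sent with PQexec():

#include <stdio.h>
#include "libpq-fe.h"

int
main(void)
{
    /* replication=database starts a walsender attached to "dbname",
     * which is what logical replication uses. */
    PGconn     *conn = PQconnectdb("host=localhost dbname=postgres "
                                   "user=replicator replication=database");
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* Replication commands travel over the simple query protocol. */
    res = PQexec(conn, "IDENTIFY_SYSTEM");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("system id: %s, timeline: %s\n",
               PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1));
    else
        fprintf(stderr, "IDENTIFY_SYSTEM failed: %s", PQerrorMessage(conn));

    PQclear(res);
    PQfinish(conn);
    return 0;
}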


Dave Cramer


Re: [HACKERS] Question about toasting code

2017-05-07 Thread Mat Arye
On Sun, May 7, 2017 at 3:48 PM, Tom Lane  wrote:

> Mat Arye  writes:
> > This is in arrayfuncs.c:5022 (PostgreSQL 9.6.2)
>
> > /*
> > * Ensure pass-by-ref stuff is copied into mcontext; and detoast it too if
> > * it's varlena.  (You might think that detoasting is not needed here
> > * because construct_md_array can detoast the array elements later.
> > * However, we must not let construct_md_array modify the ArrayBuildState
> > * because that would mean array_agg_finalfn damages its input, which is
> > * verboten.  Also, this way frequently saves one copying step.)
> > */
>
> > I am a bit confused by the comment.
>
> > Does PG_DETOAST_DATUM_COPY(dvalue) modify dvalue?
>
> No.
>
> What the comment is on about is that construct_md_array does this:
>
> /* make sure data is not toasted */
> if (elmlen == -1)
> elems[i] = PointerGetDatum(PG_DETOAST_DATUM(elems[i]));
>
> that is, it intentionally modifies the passed-in elems array in
> the case that one of the elements is a toasted value.  For most
> callers, the elems array is only transient storage anyway.  But
> it's problematic for makeMdArrayResult, because it would mean
> that the ArrayBuildState is changed --- and while it's not changed
> in a semantically significant way, that's still a problem, because
> the detoasted value would be allocated in a context that is possibly
> shorter-lived than the ArrayBuildState is.  (In a hashed aggregation
> situation, the ArrayBuildState is going to live in storage that is
> persistent across the aggregation, but we are calling makeMdArrayResult
> in the context where we want the result value to be created, which
> is of only per-output-tuple lifespan.)
>
> You could imagine fixing this by having construct_md_array do some
> context switches internally, but that would complicate it.  And in
> any case, fixing the problem where it is allows us to combine the
> possible detoasting with copying the value into the ArrayBuildState's
> context, so we'd probably keep doing that even if it was safe not to.
>
>
Thanks. That clears it up.


> > The thing I am struggling with is the serialize/deserialize
> > functions.
> > Can I detoast inside the serialize function if I use _COPY? Or is there a
> > reason I have to detoast during the sfunc?
>
> Should be able to detoast in serialize if you want.  Are you having
> trouble with that?  (It might be inefficient to do it that way, if
> you have to serialize the same value multiple times.  But I'm not
> sure if that can happen.)
>
>
I haven't run into any actual problems yet, just wanted to figure out a
clean mental model before implementing. Thanks a lot for the clarification.


> regards, tom lane
>


Re: [HACKERS] Question about toasting code

2017-05-07 Thread Tom Lane
Mat Arye  writes:
> This is in arrayfuncs.c:5022 (PostgreSQL 9.6.2)

> /*
> * Ensure pass-by-ref stuff is copied into mcontext; and detoast it too if
> * it's varlena.  (You might think that detoasting is not needed here
> * because construct_md_array can detoast the array elements later.
> * However, we must not let construct_md_array modify the ArrayBuildState
> * because that would mean array_agg_finalfn damages its input, which is
> * verboten.  Also, this way frequently saves one copying step.)
> */

> I am a bit confused by the comment.

> Does PG_DETOAST_DATUM_COPY(dvalue) modify dvalue?

No.

What the comment is on about is that construct_md_array does this:

    /* make sure data is not toasted */
    if (elmlen == -1)
        elems[i] = PointerGetDatum(PG_DETOAST_DATUM(elems[i]));

that is, it intentionally modifies the passed-in elems array in
the case that one of the elements is a toasted value.  For most
callers, the elems array is only transient storage anyway.  But
it's problematic for makeMdArrayResult, because it would mean
that the ArrayBuildState is changed --- and while it's not changed
in a semantically significant way, that's still a problem, because
the detoasted value would be allocated in a context that is possibly
shorter-lived than the ArrayBuildState is.  (In a hashed aggregation
situation, the ArrayBuildState is going to live in storage that is
persistent across the aggregation, but we are calling makeMdArrayResult
in the context where we want the result value to be created, which
is of only per-output-tuple lifespan.)

You could imagine fixing this by having construct_md_array do some
context switches internally, but that would complicate it.  And in
any case, fixing the problem where it is allows us to combine the
possible detoasting with copying the value into the ArrayBuildState's
context, so we'd probably keep doing that even if it was safe not to.

> The thing I am struggling with is the serialize/deserialize functions.
> Can I detoast inside the serialize function if I use _COPY? Or is there a
> reason I have to detoast during the sfunc?

Should be able to detoast in serialize if you want.  Are you having
trouble with that?  (It might be inefficient to do it that way, if
you have to serialize the same value multiple times.  But I'm not
sure if that can happen.)
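
For what it's worth, a minimal sketch of detoasting at serialize time (my own illustration; "MyState" and its "val" member are hypothetical, the point is only that PG_DETOAST_DATUM_COPY() can be applied then without modifying the transition state):

#include "postgres.h"
#include "fmgr.h"

typedef struct MyState
{
    Datum       val;            /* possibly toasted varlena datum */
} MyState;

PG_FUNCTION_INFO_V1(my_agg_serialize);

Datum
my_agg_serialize(PG_FUNCTION_ARGS)
{
    MyState    *state = (MyState *) PG_GETARG_POINTER(0);
    struct varlena *flat;
    bytea      *result;
    Size        len;

    /* Detoast a copy; the state itself is left untouched. */
    flat = PG_DETOAST_DATUM_COPY(state->val);

    /* Pack the flattened value into a bytea to return to the executor. */
    len = VARSIZE_ANY(flat);
    result = (bytea *) palloc(VARHDRSZ + len);
    SET_VARSIZE(result, VARHDRSZ + len);
    memcpy(VARDATA(result), flat, len);

    PG_RETURN_BYTEA_P(result);
}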

regards, tom lane




[HACKERS] Question about toasting code

2017-05-07 Thread Mat Arye
Hi,

I am trying to create a custom aggregate and have run across some puzzling
code while trying to figure out how to implement it.

This is in arrayfuncs.c:5022 (PostgreSQL 9.6.2)

/*
* Ensure pass-by-ref stuff is copied into mcontext; and detoast it too if
* it's varlena.  (You might think that detoasting is not needed here
* because construct_md_array can detoast the array elements later.
* However, we must not let construct_md_array modify the ArrayBuildState
* because that would mean array_agg_finalfn damages its input, which is
* verboten.  Also, this way frequently saves one copying step.)
*/
if (!disnull && !astate->typbyval)
{
    if (astate->typlen == -1)
        dvalue = PointerGetDatum(PG_DETOAST_DATUM_COPY(dvalue));
    else
        dvalue = datumCopy(dvalue, astate->typbyval, astate->typlen);
}

I am a bit confused by the comment.

Does PG_DETOAST_DATUM_COPY(dvalue) modify dvalue? Shouldn't that not
modify the value (as implied by _COPY)? If so, why does the detoasting
have to happen ahead of time rather than at a later stage (in the
finalfunc or serializefunc, etc.)? I understand that at those later stages
you cannot modify the input, but why would you have to in order to detoast?

The thing I am struggling with is the serialize/deserialize functions.
Can I detoast inside the serialize function if I use _COPY? Or is there a
reason I have to detoast during the sfunc?
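
For context, a rough sketch (my own, with a hypothetical MyState struct) of the same detoast-and-copy pattern inside an sfunc, where the copy is made in the aggregate's memory context so it lives as long as the transition state:

#include "postgres.h"
#include "fmgr.h"
#include "utils/memutils.h"

typedef struct MyState
{
    Datum       val;
} MyState;

PG_FUNCTION_INFO_V1(my_agg_sfunc);

Datum
my_agg_sfunc(PG_FUNCTION_ARGS)
{
    MemoryContext aggcontext;
    MemoryContext oldcxt;
    MyState    *state;

    if (!AggCheckCallContext(fcinfo, &aggcontext))
        elog(ERROR, "my_agg_sfunc called in non-aggregate context");

    oldcxt = MemoryContextSwitchTo(aggcontext);

    if (PG_ARGISNULL(0))
        state = (MyState *) palloc0(sizeof(MyState));
    else
        state = (MyState *) PG_GETARG_POINTER(0);

    if (!PG_ARGISNULL(1))
    {
        /* Detoast into aggcontext so the copy lives as long as the state. */
        state->val = PointerGetDatum(PG_DETOAST_DATUM_COPY(PG_GETARG_DATUM(1)));
    }

    MemoryContextSwitchTo(oldcxt);
    PG_RETURN_POINTER(state);
}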


Thanks,
Mat
TimescaleDB


Re: [HACKERS] question: data file update when pg_basebackup in progress

2017-04-25 Thread Michael Paquier
On Wed, Apr 26, 2017 at 1:45 AM, David G. Johnston
 wrote:
> The first write to a page after a checkpoint is always recorded in the WAL
> as a full page write.  Every WAL file since the checkpoint must also be
> copied to the backed up system.  The replay of those WAL files is what
> brings the remote and local system into sync with respect to all changes
> since the backup checkpoint.

This brings up the point that the presence of backup_label in a backup is
critical, as it tells Postgres from which position in the WAL it should
begin recovery to bring the system up to a consistent state.
pg_basebackup also makes sure that the last WAL segment needed is
archived before the backup completes, so that recovery can be carried out
completely.
-- 
Michael




Re: [HACKERS] question: data file update when pg_basebackup in progress

2017-04-25 Thread David G. Johnston
On Tue, Apr 25, 2017 at 9:08 AM, Rui Hai Jiang  wrote:

> When pg_basebackup is launched, a checkpoint is created first, then all
> files are transferred to the  pg_basebackup client.  Is it possible that a
> data page(say page-N) in a data file is changed after the checkpoint and
> before the pg_basebackup is finished?
>

​I believe so.
​

> If this happens, is it possible that only part of the changed page is
> transferred to the pg_basebackup client? I.e., the pg_basebackup client
> gets page-N with part of the old content and part of the new content. How
> does PostgreSQL handle this kind of data page?
>

​The first write to a page after a checkpoint is always recorded in the WAL
as a full page write.  Every ​WAL file since the checkpoint must also be
copied to the backed up system.  The replay of those WAL files is what
brings the remote and local system into sync with respect to all changes
since the backup checkpoint.

David J.


[HACKERS] question: data file update when pg_basebackup in progress

2017-04-25 Thread Rui Hai Jiang
Hello,
I'm looking into how pg_basebackup works, and I have a question (maybe this
is not actually an issue):

When pg_basebackup is launched, a checkpoint is created first, then all files
are transferred to the pg_basebackup client.  Is it possible that a data
page (say page-N) in a data file is changed after the checkpoint and before
pg_basebackup has finished?

If this happens, is it possible that only part of the changed page is
transferred to the pg_basebackup client?  I.e., the pg_basebackup client gets
page-N with part of the old content and part of the new content. How does
PostgreSQL handle this kind of data page?

Thanks,
Rui Hai



[HACKERS] Question about one of the old Autonomous Transaction approach

2017-04-06 Thread Vaishnavi Prabakaran
Hi All,

Regarding the discussion about autonomous transactions in the message ID
below, from quite a while ago, it was mentioned that the proposal for a new
structure "Struct PGAutonomousXACT" was rejected at the PGCon hackers
meeting. Does anyone know why it was rejected? What disadvantage or problem
was identified with that approach?

ca+u5nmkeum4abrqbndlyt5ledektae8rbiyw3977yhmeowq...@mail.gmail.com



I tried to find answers by going through various mails related to
autonomous transactions, with no luck. Any answer, or a hint about where to
look for answers, would be helpful.


Thanks & Regards,
Vaishnavi,
Fujitsu Australia.


[HACKERS] Question about memory contexts in ReindexMultipleTables()

2017-02-17 Thread Andreas Karlsson

Hi,

While working on REINDEX CONCURRENTLY I noticed that the new memory
context created in ReindexMultipleTables() seems pointless.


The purpose claimed in the code for introducing the
ReindexMultipleTables context is to make sure the list of relation IDs we
build survives the commit, since a commit destroys the
TopTransactionContext, and ReindexMultipleTables() first runs one
transaction to list which tables should be reindexed and then reindexes
each of them in a separate transaction.


But in the first transaction, where the list is built, we actually never
use TopTransactionContext; instead PortalHeapMemory is used, which is a
memory context that does not go away until the REINDEX command has
completed. So everything should work in the same way even if we just
remove the ReindexMultipleTables memory context.


Am I missing something? Should the ReindexMultipleTables memory context
be removed, or should we switch to TopTransactionContext at the beginning
of ReindexMultipleTables() so temporary resources used in the initial
transaction can be freed?
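
To make the pattern concrete, a rough sketch (not the actual code) of the part in question: the relation OIDs are collected into a context that must outlive the commit of that first transaction, and the question is whether the portal's own memory would already be enough.

#include "postgres.h"
#include "nodes/pg_list.h"
#include "utils/memutils.h"

/*
 * Sketch only: append each relation OID to a list kept in "keep_context",
 * a context that survives CommitTransactionCommand() (either the private
 * ReindexMultipleTables context or, per the question above, PortalHeapMemory).
 */
static List *
remember_relation(List *relids, Oid relid, MemoryContext keep_context)
{
    MemoryContext oldcontext = MemoryContextSwitchTo(keep_context);

    relids = lappend_oid(relids, relid);
    MemoryContextSwitchTo(oldcontext);
    return relids;
}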


Andreas




Re: [HACKERS] Question about behavior of snapshot too old feature

2016-10-21 Thread Masahiko Sawada
On Mon, Oct 17, 2016 at 10:04 PM, Kevin Grittner  wrote:
> On Sun, Oct 16, 2016 at 9:26 PM, Masahiko Sawada  
> wrote:
>
>> When I set old_snapshot_threshold = 0 I got the error at step #3, which
>> means that the error occurred without table pruning.
>
> The "snapshot too old" error can happen without pruning, but only
> because there is no way to tell the difference between a page that
> has been pruned since the snapshot was taken and a page which has
> had some other kind of modification since the snapshot was taken.
>
> Ignoring false positives for a moment (where the page is updated by
> something other than pruning), what is required for early pruning
> is that the snapshot has expired (which due to "rounding" and
> avoidance of locking could easily take up to a minute or two more
> than the old_snapshot_threshold setting) and then there is page
> pruning due to a vacuum or just HOT pruning from a page read.  At
> some point after that, a read which is part of returning data to
> the user (e.g., not just positioning for index modification) can
> see that the snapshot is too old and that the LSN for the page is
> past the snapshot LSN.  That is when you get the error.
>> We have a regression test for this feature, but it sets
>> old_snapshot_threshold = 0, so I doubt we can test it properly.
>> Am I missing something?
>
> This is a hard feature to test properly, and certainly hard to test
> without the test running for a long time.  The zero setting is
> really not intended to be used in production, but only to allow
> some half-way decent testing that doesn't take extreme lengths of
> time.  If you add some delays of a few minutes each at key points
> in a test, you should be able to get a test that works with a
> setting of 1min.  It is not impossible that we might need to add a
> memory barrier to one or two places to get such tests to behave
> consistently, but I have not been able to spot where, if anywhere,
> that would be.


Thank you for the explanation! I understand now.
When old_snapshot_threshold = 0, it skips allocating the shared memory
area for the xid array and skips some of the logic in order to avoid
using that shared memory, so I was a little concerned about that.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Question about behavior of snapshot too old feature

2016-10-19 Thread Kevin Grittner
On Wed, Oct 19, 2016 at 11:11 AM, Bruce Momjian  wrote:
> On Wed, Oct 19, 2016 at 11:08:28AM -0500, Kevin Grittner wrote:
>> On Wed, Oct 19, 2016 at 10:04 AM, Bruce Momjian  wrote:
>>
>>> Slide 10 of this presentation has an example showing
>>> old_snapshot_threshold set to '1min':
>>>
>>> http://momjian.us/main/writings/pgsql/features.pdf
>>
>> If the presentation is intending to show reasonable values for
>> production use, that would be better as, maybe, '2h'.

> The example is just to illustrate the activities required to trigger it,
> e.g.  pg_sleep(), VACUUM.

Yeah, in such a demonstration you probably don't want to have
everyone sit for 2 hours, so 1 minute makes perfect sense.  :-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Question about behavior of snapshot too old feature

2016-10-19 Thread Bruce Momjian
On Wed, Oct 19, 2016 at 11:08:28AM -0500, Kevin Grittner wrote:
> On Wed, Oct 19, 2016 at 10:04 AM, Bruce Momjian  wrote:
> 
> > Slide 10 of this presentation has an example showing
> > old_snapshot_threshold set to '1min':
> >
> > http://momjian.us/main/writings/pgsql/features.pdf
> 
> If the presentation is intending to show reasonable values for
> production use, that would be better as, maybe, '2h'.
> 
> As the documentation says at:
> 
> https://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-OLD-SNAPSHOT-THRESHOLD

The example is just to illustrate the activities required to trigger it,
e.g.  pg_sleep(), VACUUM.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Question about behavior of snapshot too old feature

2016-10-19 Thread Kevin Grittner
On Wed, Oct 19, 2016 at 10:04 AM, Bruce Momjian  wrote:

> Slide 10 of this presentation has an example showing
> old_snapshot_threshold set to '1min':
>
> http://momjian.us/main/writings/pgsql/features.pdf

If the presentation is intending to show reasonable values for
production use, that would be better as, maybe, '2h'.

As the documentation says at:

https://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-OLD-SNAPSHOT-THRESHOLD

| Useful values for production work probably range from a small
| number of hours to a few days. The setting will be coerced to a
| granularity of minutes, and small numbers (such as 0 or 1min) are
| only allowed because they may sometimes be useful for testing.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Question about behavior of snapshot too old feature

2016-10-19 Thread Bruce Momjian
On Mon, Oct 17, 2016 at 08:04:43AM -0500, Kevin Grittner wrote:
> > We have a regression test for this feature, but it sets
> > old_snapshot_threshold = 0, so I doubt we can test it properly.
> > Am I missing something?
> 
> This is a hard feature to test properly, and certainly hard to test
> without the test running for a long time.  The zero setting is
> really not intended to be used in production, but only to allow
> some half-way decent testing that doesn't take extreme lengths of
> time.  If you add some delays of a few minutes each at key points
> in a test, you should be able to get a test that works with a
> setting of 1min.  It is not impossible that we might need to add a
> memory barrier to one or two places to get such tests to behave
> consistently, but I have not been able to spot where, if anywhere,
> that would be.

Slide 10 of this presentation has an example showing
old_snapshot_threshold set to '1min':

http://momjian.us/main/writings/pgsql/features.pdf

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Question about behavior of snapshot too old feature

2016-10-17 Thread Kevin Grittner
On Sun, Oct 16, 2016 at 9:26 PM, Masahiko Sawada  wrote:

> When I set old_snapshot_threshold = 0 I got the error at step #3, which
> means that the error occurred without table pruning.

The "snapshot too old" error can happen without pruning, but only
because there is no way to tell the difference between a page that
has been pruned since the snapshot was taken and a page which has
had some other kind of modification since the snapshot was taken.

Ignoring false positives for a moment (where the page is updated by
something other than pruning), what is required for early pruning
is that the snapshot has expired (which due to "rounding" and
avoidance of locking could easily take up to a minute or two more
than the old_snapshot_threshold setting) and then there is page
pruning due to a vacuum or just HOT pruning from a page read.  At
some point after that, a read which is part of returning data to
the user (e.g., not just positioning for index modification) can
see that the snapshot is too old and that the LSN for the page is
past the snapshot LSN.  That is when you get the error.
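
As a simplified sketch of that condition (the real check lives in TestForOldSnapshot() and is more involved; this only shows the shape of it, using the 9.6 snapshot fields):

#include "postgres.h"
#include "storage/bufpage.h"
#include "utils/snapmgr.h"

/*
 * Sketch only: raise the error when the snapshot has passed the configured
 * threshold AND the page has been modified (e.g. pruned) past the LSN at
 * which the snapshot was taken.
 */
static void
check_snapshot_too_old(Snapshot snapshot, Page page)
{
    if (old_snapshot_threshold >= 0 &&
        snapshot->whenTaken < GetOldSnapshotThresholdTimestamp() &&
        PageGetLSN(page) > snapshot->lsn)
        ereport(ERROR,
                (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
                 errmsg("snapshot too old")));
}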

> We have a regression test for this feature, but it sets
> old_snapshot_threshold = 0, so I doubt we can test it properly.
> Am I missing something?

This is a hard feature to test properly, and certainly hard to test
without the test running for a long time.  The zero setting is
really not intended to be used in production, but only to allow
some half-way decent testing that doesn't take extreme lengths of
time.  If you add some delays of a few minutes each at key points
in a test, you should be able to get a test that works with a
setting of 1min.  It is not impossible that we might need to add a
memory barrier to one or two places to get such tests to behave
consistently, but I have not been able to spot where, if anywhere,
that would be.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Question about behavior of snapshot too old feature

2016-10-16 Thread Masahiko Sawada
On Fri, Oct 14, 2016 at 11:29 PM, Kevin Grittner  wrote:
> On Fri, Oct 14, 2016 at 8:53 AM, Amit Kapila  wrote:
>> On Fri, Oct 14, 2016 at 1:40 PM, Masahiko Sawada  
>> wrote:
>
>>> For example, I set old_snapshot_threshold = 1min and prepare a table
>>> and two terminals.
>>> And I did the followings steps.
>>>
>>> 1. [Terminal 1] Begin transaction and get snapshot data and wait.
>>>  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
>>>  SELECT * FROM test;
>>>
>>> 2. [Terminal 2] Another session updates test table in order to make
>>> snapshot dirty.
>>>  BEGIN;
>>>  UPDATE test SET c = c + 100;
>>>  COMMIT;
>>>
>>> 3. [Terminal 1] 1 minute after, read the test table again in same
>>> transaction opened at #1. I got no error.
>>> SELECT * FROM test;
>>>
>>> 4. [Terminal 2] Another session reads the test table.
>>>  BEGIN;
>>>  SELECT * FROM test;
>>>  COMMIT;
>>>
>>> 5. [Terminal 1] 1 minute after, read the test table again, and got
>>> "snapshot error" error.
>>>  SELECT * FROM test;
>>>
>>> Since #2 makes a snapshot I got at #1 dirty, I expected to get
>>> "snapshot too old" error at #3 where I read test table again after
>>> enough time. But I could never get "snapshot too old" error at #3.
>>>
>>
>> Here, the basic idea is that till the time corresponding page is not
>> pruned or table vacuuming hasn't triggered, this error won't occur.
>> So, I think what is happening here that during step #4 or step #3, it
>> has pruned the table, after which you started getting error.
>
> The pruning might be one factor.  Another possible issue is that
> effectively it doesn't start timing that 1 minute until the clock
> hits the start of the next minute (i.e., 0 seconds after the next
> minute).  The old_snapshot_threshold does not attempt to guarantee
> that the snapshot too old error will happen at the earliest
> opportunity, but that the error will *not* happen until the
> snapshot is *at least* that old.  Keep in mind that the expected
> useful values for this parameter are from a small number of hours
> to a day or two, depending on the workload.  The emphasis was on
> minimizing overhead, even when it meant the cleanup might not be
> quite as "eager" as it could otherwise be.
>

Thanks! I understand now.
I tested with autovacuum = off, so the table was pruned at step #4.

When I set old_snapshot_threshold = 0 I got the error at step #3, which
means that the error occurred without table pruning.
We have a regression test for this feature, but it sets
old_snapshot_threshold = 0, so I doubt we can test it properly.
Am I missing something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




[HACKERS] Question on "record type has not been registered"

2016-10-16 Thread Jim Nasby
I'm working on a function (attached) that returns a bitmask of NULL
fields in a record. It works fine if I feed it a row directly, but it fails
in this case:


select record_nulls(r), expected, CASE WHEN record_nulls(r) <> expected 
THEN 'BAD' END AS bad, r

from (values(row(NULL,NULL,NULL,2,2,NULL,2,2), '11100100'::varbit),
(row(2),'0')
) v(r,expected)
;
ERROR:  record type has not been registered

I'm not sure why this is failing; a simple SELECT * from that FROM 
clause works fine. I also tried removing the second row in case the 
mismatch of record types was the issue; that didn't help either.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 622bb88..cd185b8 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -25,6 +25,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/typcache.h"
+#include "utils/varbit.h"
 
 
 /*
@@ -1818,3 +1819,79 @@ btrecordimagecmp(PG_FUNCTION_ARGS)
 {
PG_RETURN_INT32(record_image_cmp(fcinfo));
 }
+
+/*
+ * record_nulls: return null map as bit
+ */
+Datum
+record_nulls(PG_FUNCTION_ARGS)
+{
+   HeapTupleHeader rec = PG_GETARG_HEAPTUPLEHEADER(0);
+   Oid tupType;
+   int32   tupTypmod;
+   TupleDesc   tupdesc;
+   HeapTupleData tuple;
+   int         ncolumns,
+               natts,
+               len,        /* varlena length of result */
+               attno;
+   bool        hasnulls;
+   bits8      *bp;         /* ptr to null bitmap in tuple */
+   bits8       x = 0;
+   VarBit     *result;     /* The resulting bit string */
+   bits8      *r;          /* pointer into the result */
+
+   /* Extract type info from the tuple itself */
+   tupType = HeapTupleHeaderGetTypeId(rec);
+   tupTypmod = HeapTupleHeaderGetTypMod(rec);
+   tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
+   natts = tupdesc->natts;
+
+   /* Build a temporary HeapTuple control structure */
+   /* XXX there's probably a cheaper way to do this... */
+   tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
+   ItemPointerSetInvalid(&(tuple.t_self));
+   tuple.t_tableOid = InvalidOid;
+   tuple.t_data = rec;
+   hasnulls = HeapTupleHasNulls(&tuple);
+
+   bp = rec->t_bits;   /* ptr to null bitmap in tuple */
+
+   /* Find how many columns are dropped */
+   ncolumns = natts;
+   if (hasnulls)
+       for (attno = 0; attno < natts; attno++)
+           if (tupdesc->attrs[attno]->attisdropped)
+               ncolumns--;
+
+   len = VARBITTOTALLEN(ncolumns);
+   /* set to 0 so that *r is always initialised and string is zero-padded */
+   result = (VarBit *) palloc0(len);
+   SET_VARSIZE(result, len);
+   VARBITLEN(result) = ncolumns;
+
+   /* If there are no NULLs then we're done */
+   if (hasnulls)
+   {
+       r = VARBITS(result);
+       x = HIGHBIT;
+       for (attno = 0; attno < natts; attno++)
+       {
+           /* Ignore dropped columns in datatype */
+           if (tupdesc->attrs[attno]->attisdropped)
+               continue;
+
+           if (att_isnull(attno, bp))
+               *r |= x;
+           x >>= 1;
+           if (x == 0)
+           {
+               x = HIGHBIT;
+               r++;
+           }
+       }
+   }
+
+   ReleaseTupleDesc(tupdesc);
+   PG_RETURN_VARBIT_P(result);
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index e2d08ba..092d5f4 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -4949,6 +4949,10 @@ DATA(insert OID = 3186 (  record_image_ge   
PGNSP PGUID 12 1 0 0 0 f f f f t f
 DATA(insert OID = 3187 (  btrecordimagecmp    PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 23 "2249 2249" _null_ _null_ _null_ _null_ _null_ btrecordimagecmp _null_ _null_ _null_ ));
 DESCR("less-equal-greater based on byte images");
 
+/* misc record functions */
+DATA(insert OID = 3343 (  record_nulls    PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 1560 "2249" _null_ _null_ _null_ _null_ _null_ record_nulls _null_ _null_ _null_ ));
+DESCR("bitmap of fields in a record that are NULL");
+
 /* Extensions */
 DATA(insert OID = 3082 (  pg_available_extensions  PGNSP PGUID 12 
10 100 0 0 f f f f t t s s 0 0 2249 "" "{19,25,25}" 

Re: [HACKERS] Question about behavior of snapshot too old feature

2016-10-14 Thread Kevin Grittner
On Fri, Oct 14, 2016 at 8:53 AM, Amit Kapila  wrote:
> On Fri, Oct 14, 2016 at 1:40 PM, Masahiko Sawada  
> wrote:

>> For example, I set old_snapshot_threshold = 1min and prepare a table
>> and two terminals.
>> And I did the followings steps.
>>
>> 1. [Terminal 1] Begin transaction and get snapshot data and wait.
>>  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
>>  SELECT * FROM test;
>>
>> 2. [Terminal 2] Another session updates test table in order to make
>> snapshot dirty.
>>  BEGIN;
>>  UPDATE test SET c = c + 100;
>>  COMMIT;
>>
>> 3. [Terminal 1] 1 minute after, read the test table again in same
>> transaction opened at #1. I got no error.
>> SELECT * FROM test;
>>
>> 4. [Terminal 2] Another session reads the test table.
>>  BEGIN;
>>  SELECT * FROM test;
>>  COMMIT;
>>
>> 5. [Terminal 1] 1 minute after, read the test table again, and got
>> "snapshot error" error.
>>  SELECT * FROM test;
>>
>> Since #2 makes a snapshot I got at #1 dirty, I expected to get
>> "snapshot too old" error at #3 where I read test table again after
>> enough time. But I could never get "snapshot too old" error at #3.
>>
>
> Here, the basic idea is that till the time corresponding page is not
> pruned or table vacuuming hasn't triggered, this error won't occur.
> So, I think what is happening here that during step #4 or step #3, it
> has pruned the table, after which you started getting error.

The pruning might be one factor.  Another possible issue is that
effectively it doesn't start timing that 1 minute until the clock
hits the start of the next minute (i.e., 0 seconds after the next
minute).  The old_snapshot_threshold does not attempt to guarantee
that the snapshot too old error will happen at the earliest
opportunity, but that the error will *not* happen until the
snapshot is *at least* that old.  Keep in mind that the expected
useful values for this parameter are from a small number of hours
to a day or two, depending on the workload.  The emphasis was on
minimizing overhead, even when it meant the cleanup might not be
quite as "eager" as it could otherwise be.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Question about behavior of snapshot too old feature

2016-10-14 Thread Amit Kapila
On Fri, Oct 14, 2016 at 1:40 PM, Masahiko Sawada  wrote:
> Hi all,
>
> I have a question about behavior of snapshot too old feature.
>
> For example, I set old_snapshot_threshold = 1min and prepare a table
> and two terminals.
> And I did the followings steps.
>
> 1. [Terminal 1] Begin transaction and get snapshot data and wait.
>  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
>  SELECT * FROM test;
>
> 2. [Terminal 2] Another session updates test table in order to make
> snapshot dirty.
>  BEGIN;
>  UPDATE test SET c = c + 100;
>  COMMIT;
>
> 3. [Terminal 1] 1 minute after, read the test table again in same
> transaction opened at #1. I got no error.
> SELECT * FROM test;
>
> 4. [Terminal 2] Another session reads the test table.
>  BEGIN;
>  SELECT * FROM test;
>  COMMIT;
>
> 5. [Terminal 1] 1 minute after, read the test table again, and got
> "snapshot error" error.
>  SELECT * FROM test;
>
> Since #2 makes a snapshot I got at #1 dirty, I expected to get
> "snapshot too old" error at #3 where I read test table again after
> enough time. But I could never get "snapshot too old" error at #3.
>

Here, the basic idea is that until the corresponding page has been
pruned or table vacuuming has been triggered, this error won't occur.
So, I think what is happening here is that during step #4 or step #3
the table got pruned, after which you started getting the error.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Question about behavior of snapshot too old feature

2016-10-14 Thread Masahiko Sawada
Hi all,

I have a question about the behavior of the snapshot too old feature.

For example, I set old_snapshot_threshold = 1min and prepared a table
and two terminals.
Then I did the following steps.

1. [Terminal 1] Begin transaction and get snapshot data and wait.
 BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
 SELECT * FROM test;

2. [Terminal 2] Another session updates test table in order to make
snapshot dirty.
 BEGIN;
 UPDATE test SET c = c + 100;
 COMMIT;

3. [Terminal 1] One minute later, read the test table again in the same
transaction opened at #1. I got no error.
SELECT * FROM test;

4. [Terminal 2] Another session reads the test table.
 BEGIN;
 SELECT * FROM test;
 COMMIT;

5. [Terminal 1] One minute later, read the test table again, and this time
got the "snapshot too old" error.
 SELECT * FROM test;

Since #2 makes the snapshot I got at #1 stale, I expected to get the
"snapshot too old" error at #3, where I read the test table again after
enough time. But I could never get the "snapshot too old" error at #3.
On the other hand, when I set old_snapshot_threshold = 0 I did get the
error at #3.

Is this expected behavior?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Question / requests.

2016-10-10 Thread Francisco Olarte
On Mon, Oct 10, 2016 at 4:51 AM, Jim Nasby  wrote:
> On 10/5/16 9:58 AM, Francisco Olarte wrote:
>> Is the system catalog a bottleneck for people who has real use for
>> paralell vacuum? I mean, to me someone who does this must have a very
>> big db on a big iron. If that does not consist of thousands and
>> thousands of smallish relations, it will normally be some very big
>> tables and a much smaller catalog.
> Not necessarily. Anyone that makes extensive use of temp tables can end up
> with a very large (and bloated) pg_attribute. AFAIK you can actually create
> "temp" versions of any object that lives in a schema by specifying pg_temp
> as the schema, but in practice I don't think you'll really see anything
> other than pg_attribute get really large. So it would be nice if
> pg_attribute could be done in parallel, but I suspect it's one of the
> catalog tables that could be causing these problems.

I see this, but if you churn through temp tables I'm not sure you should
do a full vacuum on the catalog ( I fear a full catalog vacuum is going to
lock DDL, and this kind of situation is better served by autovacuum
maintaining free space in the catalog so it settles at a stable size ).

I do not think it is necessary to make every operation as fast as
possible; I prefer a simpler system. I feel someone running a
multi-terabyte database which needs full vacuums in parallel due to its
use patterns, and which also churns through lots of temporary tables, a
strange use pattern which mandates full vacuums of pg_attribute ( I
could concoct a situation for this, but not easily ), is a
specialized power DBA who should be able to easily script vacuum
tasks taking the usage pattern into account, much better than any
reasonably simple alternative. After all, -t is there and can be
repeated, for one thing.

Francisco Olarte.




Re: [HACKERS] Question / requests.

2016-10-09 Thread Jim Nasby

On 10/5/16 9:58 AM, Francisco Olarte wrote:

Is the system catalog a bottleneck for people who has real use for
paralell vacuum? I mean, to me someone who does this must have a very
big db on a big iron. If that does not consist of thousands and
thousands of smallish relations, it will normally be some very big
tables and a much smaller catalog.


Not necessarily. Anyone that makes extensive use of temp tables can end 
up with a very large (and bloated) pg_attribute. AFAIK you can actually 
create "temp" versions of any object that lives in a schema by 
specifying pg_temp as the schema, but in practice I don't think you'll 
really see anything other than pg_attribute get really large. So it 
would be nice if pg_attribute could be done in parallel, but I suspect 
it's one of the catalog tables that could be causing these problems.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Question / requests.

2016-10-07 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, Oct 5, 2016 at 10:58 AM, Francisco Olarte
>  wrote:
> > On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas  wrote:
> >> On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera  
> >> wrote:
> > ...
> >>> I wonder if the real answer isn't just to disallow -f with parallel
> >>> vacuuming.
> >> Seems like we should figure out which catalog tables are needed in
> >> order to perform a VACUUM, and force those to be done last and one at
> >> a time.
> >
> > Is the system catalog a bottleneck for people who has real use for
> > paralell vacuum?
> 
> I don't know, but it seems like the documentation for vacuumdb
> currently says, more or less, "Hey, if you use -j with -f, it may not
> work!", which seems unacceptable to me.  It should be the job of the
> person writing the feature to make it work in all cases, not the job
> of the person using the feature to work around the problem when it
> doesn't.

The most interesting use case of vacuumdb is lazy vacuuming, I think, so
committing that patch as it was submitted previously was a good step
forward even if it didn't handle VACUUM FULL 100%.

I agree that it's better to have both modes Just Work in parallel, which
is the point of this subsequent patch.  So let's move forward.  I
support Francisco's effort to make -f work with -j.  I don't have a
strong opinion on which of the various proposals presented so far is the
best way to implement it, but let's figure that out and get it done.

If you want to argue that vacuumdb -f -j not working properly is a bug
in the vacuumdb -j commit, ISTM you're arguing that we should backpatch
whatever we come up with as a bug fix, but I would disagree with that.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Question about pg_control usage

2016-10-07 Thread Jeff Janes
On Fri, Oct 7, 2016 at 8:24 AM, Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote:

> Hi, hackers!
>
> I am examining various global variables in ShmemVariableCache,
> pg_control and pg_controldata.  To force and debug xid wraparound,
> I've implemented a function, that allows to set nextXid value in control
> file manually, but it doesn't work as expected.
>

Why not use pg_resetxlog -x ?

> After an investigation I found that, while loading global variables in
> StartupXlog()
> we read checkPoint location from ControlFile, and then read all the
> information
> from the checkPoint record. And never ever use nextXid stored in
> ControlFile.
>
> What is the purpose of having CheckPoint stored in control file then?
> I thought, that it allows us to start up server after correct shutdown
> without any xlog, as said in the comment in pg_control.h
>
>  * Body of CheckPoint XLOG records.  This is declared here because we keep
>  * a copy of the latest one in pg_control for possible disaster recovery.
>  * Changing this struct requires a PG_CONTROL_VERSION bump.
>
> But as far as I see, we never use this data.
>
> Could you please explain, why don't we use information from control file
> in StartapXlog()?



I believe pg_resetxlog uses the data from the control file to create an
artificial checkpoint record, and injects that record into an artificial
xlog file, which will allow you to start the system.  So it is a small use,
but sometimes a very important one.

Cheers,

Jeff


Re: [HACKERS] Question about pg_control usage

2016-10-07 Thread Tom Lane
Anastasia Lubennikova  writes:
> What is the purpose of having CheckPoint stored in control file then?

IIRC, we use some but not all of the fields.  It seemed prudent (and
simpler) to keep a copy of the whole thing.

regards, tom lane




Re: [HACKERS] Question / requests.

2016-10-07 Thread Francisco Olarte
Robert:

On Fri, Oct 7, 2016 at 3:20 PM, Robert Haas  wrote:
> On Wed, Oct 5, 2016 at 10:58 AM, Francisco Olarte
>  wrote:
>> On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas  wrote:
>>> On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera  
>>> wrote:
>> ...
 I wonder if the real answer isn't just to disallow -f with parallel
 vacuuming.
>>> Seems like we should figure out which catalog tables are needed in
>>> order to perform a VACUUM, and force those to be done last and one at
>>> a time.
>>
>> Is the system catalog a bottleneck for people who has real use for
>> paralell vacuum?

> I don't know, but it seems like the documentation for vacuumdb
> currently says, more or less, "Hey, if you use -j with -f, it may not
> work!", which seems unacceptable to me.  It should be the job of the
> person writing the feature to make it work in all cases, not the job
> of the person using the feature to work around the problem when it
> doesn't.

That may be the case, but the only way to solve it seems to be to
disallow full parallel vacuuming, as suggested.

OTOH what I was asking was just whether people think the time gained by
minimizing the part of pg_catalog that is serially processed in a
full-parallel case would be enough to warrant the increased code
complexity and bug surface.

Anyway, I'll stick to my original plan even if someone decides to fix
or disallow full parallel vacuuming, as I think it has its uses.

Francisco Olarte.




[HACKERS] Question about pg_control usage

2016-10-07 Thread Anastasia Lubennikova

Hi, hackers!

I am examining various global variables in ShmemVariableCache,
pg_control and pg_controldata.  To force and debug xid wraparound,
I've implemented a function that allows setting the nextXid value in the
control file manually, but it doesn't work as expected.

After an investigation I found that, while loading global variables in
StartupXLOG() we read the checkPoint location from ControlFile, and then
read all the information from the checkpoint record. And we never use the
nextXid stored in ControlFile.


What is the purpose of having the CheckPoint stored in the control file then?
I thought that it allows us to start up the server after a clean shutdown
without any xlog, as the comment in pg_control.h says:

 * Body of CheckPoint XLOG records.  This is declared here because we keep
 * a copy of the latest one in pg_control for possible disaster recovery.
 * Changing this struct requires a PG_CONTROL_VERSION bump.

But as far as I see, we never use this data.

Could you please explain why we don't use the information from the control
file in StartupXLOG()?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Question / requests.

2016-10-07 Thread Robert Haas
On Wed, Oct 5, 2016 at 10:58 AM, Francisco Olarte
 wrote:
> On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas  wrote:
>> On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera  
>> wrote:
> ...
>>> I wonder if the real answer isn't just to disallow -f with parallel
>>> vacuuming.
>> Seems like we should figure out which catalog tables are needed in
>> order to perform a VACUUM, and force those to be done last and one at
>> a time.
>
> Is the system catalog a bottleneck for people who has real use for
> paralell vacuum?

I don't know, but it seems like the documentation for vacuumdb
currently says, more or less, "Hey, if you use -j with -f, it may not
work!", which seems unacceptable to me.  It should be the job of the
person writing the feature to make it work in all cases, not the job
of the person using the feature to work around the problem when it
doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Question / requests.

2016-10-05 Thread Francisco Olarte
On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas  wrote:
> On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera  
> wrote:
...
>> I wonder if the real answer isn't just to disallow -f with parallel
>> vacuuming.
> Seems like we should figure out which catalog tables are needed in
> order to perform a VACUUM, and force those to be done last and one at
> a time.

Is the system catalog a bottleneck for people who have real use for
parallel vacuum? I mean, to me someone who does this must have a very
big db on big iron. If that does not consist of thousands and
thousands of smallish relations, it will normally be some very big
tables and a much smaller catalog. Then you can vacuum everything but
the system catalogs in parallel and then vacuum those serially. I do not
have such big dbs, but in my modest case the system catalogs are very fast
to vacuum. If 99% of the time is spent vacuuming non-system tables it does
not make much sense to spend effort on speeding up maybe one half of the
system catalog vacuum ( I mean, if 99% of the time is non-system and half
of the system part can be done in parallel, then with a ten-fold speedup
we go from 99+0.5+0.5 to 9.9+0.5+0.5 = 10.9 with fully serial system
catalogs and to 9.9+0.5+0.05 = 10.45 with a hybrid system vacuum, and with
a 100-fold speedup, in the realm of SF for me, to 0.99+0.5+0.5 = 1.99 and
0.99+0.5+0.05 = 1.54, not that much to be gained )

( Just asking. )

OTOH while I dig into the code I will take a look to see how complex
it will be to build two lists, parallel + serial, and loop over them.
This could be used as a first approach to split on !pg_catalog +
pg_catalog, and used as a base for having an explicit list or some
flag in the catalog later.
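
A rough sketch of that two-list split (my own illustration, ignoring identifier quoting; the function name is hypothetical, and SimpleStringList is what vacuumdb already uses for -t entries):

#include "postgres_fe.h"
#include "libpq-fe.h"
#include "fe_utils/simple_list.h"

/*
 * Sketch only: given a result set of (nspname, relname) rows, put pg_catalog
 * relations on the serial list and everything else on the parallel list.
 */
static void
split_tables(PGresult *res, SimpleStringList *parallel, SimpleStringList *serial)
{
    int         i;

    for (i = 0; i < PQntuples(res); i++)
    {
        const char *nspname = PQgetvalue(res, i, 0);
        const char *relname = PQgetvalue(res, i, 1);
        char        qualified[NAMEDATALEN * 2 + 2];

        snprintf(qualified, sizeof(qualified), "%s.%s", nspname, relname);

        if (strcmp(nspname, "pg_catalog") == 0)
            simple_string_list_append(serial, qualified);
        else
            simple_string_list_append(parallel, qualified);
    }
}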

Francisco Olarte.




Re: [HACKERS] Question / requests.

2016-10-04 Thread Robert Haas
On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera  wrote:
> Robert Haas wrote:
>> On Fri, Sep 30, 2016 at 11:20 AM, Francisco Olarte
>>  wrote:
>> > After some messages due to vacuumdb auto-deadlocking itself on the
>> > system tables when doing paralell vacuum of a full database I
>> > suggested adding some flags to make vacuumdb process schemas. I was
>> > asked wether I could write a patch for that and I am thinking on doing
>> > it.
>>
>> What messages are you seeing, exactly? "auto-deadlocking" isn't a thing.
>
> https://www.postgresql.org/message-id/57EBC9AE.2060302%40163.com
>
> I wonder if the real answer isn't just to disallow -f with parallel
> vacuuming.

Seems like we should figure out which catalog tables are needed in
order to perform a VACUUM, and force those to be done last and one at
a time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Question / requests.

2016-10-04 Thread Francisco Olarte
On Mon, Oct 3, 2016 at 11:44 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Fri, Sep 30, 2016 at 11:20 AM, Francisco Olarte
>> What messages are you seeing, exactly? "auto-deadlocking" isn't a thing.
> https://www.postgresql.org/message-id/57EBC9AE.2060302%40163.com

Besides that even the docs for -j state "Note that using this mode
together with the -f (FULL) option might cause deadlock failures if
certain system catalogs are processed in parallel."

So the only "safe" way to do -j -f seems to be using a table list. My
proposed patch just makes it easy to build that by doing schema
filtering.

> I wonder if the real answer isn't just to disallow -f with parallel
> vacuuming.

It may be. I do not feel it is necessary; the docs are clear, and this may
be like disallowing knives because you can cut yourself. IMO vacuumdb
-f and -j are for people who know what they are doing, though a warning may
be nice anyway.

Anyway, even if the combo is disallowed I feel schema filtering has
its uses. As an example, in some of my systems I have CDR tables
partitioned by timestamp, either monthly or by some other period. As most of
the data does not change, I routinely coalesce many partitions and move
them to a historic schema ( e.g. I make a yearly partition and zap the
monthly ones, and I still inherit from it ). This lets me dump the historic
schema when I change it and dump without the historic schema daily,
greatly reducing dump times, as the historic schema typically contains
>90% of the data and only changes when I have to do a back-fix to
the data, which is very rare. Being able to do the same thing with
vacuumdb could be useful.

So, I'll just carry on executing my plan, but I'm prepared to abort
it at any time if people feel it does not pull its weight. For me, most of
the work is going to be learning how to submit a patch so that it is
usable.

Francisco Olarte.




Re: [HACKERS] Question / requests.

2016-10-03 Thread Alvaro Herrera
Robert Haas wrote:
> On Fri, Sep 30, 2016 at 11:20 AM, Francisco Olarte
>  wrote:
> > After some messages due to vacuumdb auto-deadlocking itself on the
> > system tables when doing paralell vacuum of a full database I
> > suggested adding some flags to make vacuumdb process schemas. I was
> > asked wether I could write a patch for that and I am thinking on doing
> > it.
> 
> What messages are you seeing, exactly? "auto-deadlocking" isn't a thing.

https://www.postgresql.org/message-id/57EBC9AE.2060302%40163.com

I wonder if the real answer isn't just to disallow -f with parallel
vacuuming.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Question / requests.

2016-10-03 Thread Robert Haas
On Fri, Sep 30, 2016 at 11:20 AM, Francisco Olarte
 wrote:
> After some messages due to vacuumdb auto-deadlocking itself on the
> system tables when doing paralell vacuum of a full database I
> suggested adding some flags to make vacuumdb process schemas. I was
> asked wether I could write a patch for that and I am thinking on doing
> it.

What messages are you seeing, exactly? "auto-deadlocking" isn't a thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Question / requests.

2016-09-30 Thread Vitaly Burovoy
On 9/30/16, Francisco Olarte  wrote:
> Hello everyone.

Hello, Francisco!

> Also, although I feel confident in my coding I have zero knowledge of
> developing for postgres,

It is easy enough and all important steps are documented in the wiki.
Also some interesting things can be found in presentations from
hackers about how to hack PostgreSQL.

> and I am not confident in my git or testing habilities.
> I can develop it, as it is just modifying a single libpq
> client program and only in the easy part of the string lists and may
> be emitting a new error (as this can introduce a new failure mode of
> 'no tables to vacuum'), I can test it locally and produce a patch for
> that file, but I'm not confident on integrating it, making git patchs
> or going further, so I would like to know if doing that would be
> enough and then I can give the code to someone to review or integrate
> it.

Do your best and send a patch. No one is good enough to understand
the whole code base at once. There are lots of people who know different
parts of the code and who have the ability to test patches in different
ways.

You can be sure you will get feedback; your code will not be merged into
the code base without deep review and independent testing.

Just be ready to improve your patch according to the feedback, and be
aware that it usually takes several rounds of submission and review before
a patch is committed.

Also, you can follow the discussion of one of the simpler patches in a
commitfest to become familiar with the process.

-- 
Best regards,
Vitaly Burovoy




[HACKERS] Question / requests.

2016-09-30 Thread Francisco Olarte
Hello everyone. I've been using the bugs/general mailing lists for a
while, but never been on hackers, so please take that into account.

After some messages due to vacuumdb auto-deadlocking itself on the
system tables when doing a parallel vacuum of a full database, I
suggested adding some flags to make vacuumdb process schemas. I was
asked whether I could write a patch for that and I am thinking of doing
it.

Having begun to read the developer FAQ, I have searched the TODO list
for similar things, and I'm asking here to find out if someone is already
working on something similar, to avoid duplicating efforts.

What I'm planning to do is to add a couple of include-schema /
exclude-schema options to vacuumdb, so you can first do a parallel
vacuum excluding pg_catalog and then do a serial one including
pg_catalog to finally tidy the db. Or you can move rarely updated
tables to their own schema and avoid vacuuming them. After that I may try
a couple of shortcuts for the system (in case a future
pg_extra_catalog appears). I was planning on reusing the code that gets
all the tables from the catalog, used in parallel vacuums when no tables
are specified, so the modifications are mainly string juggling, as I
feel the extra time / flexibility gained by doing a finely tuned query
does not justify the extra bug surface added.
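
For illustration only, a rough sketch of the kind of catalog query such an
option could expand a schema filter into (excluding pg_catalog is just an
example; the real patch would of course build the filter from the
command-line flags):

SELECT quote_ident(n.nspname) || '.' || quote_ident(c.relname) AS vacuum_target
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'              -- plain tables only
  AND n.nspname <> 'pg_catalog';   -- hypothetical exclude-schema value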

I would like to know if someone is doing something intersecting with this.

Also, although I feel confident in my coding I have zero knowledge of
developing for postgres, and I am not confident in my git or testing
abilities. I can develop it, as it is just modifying a single libpq
client program and only in the easy part of the string lists and maybe
emitting a new error (as this can introduce a new failure mode of
'no tables to vacuum'), I can test it locally and produce a patch for
that file, but I'm not confident on integrating it, making git patches
or going further, so I would like to know if doing that would be
enough and then I can give the code to someone to review or integrate
it.

Waiting for orientation.

 Francisco Olarte.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about MVCC-unsafe commands

2016-08-23 Thread Antonin Houska
Robert Haas  wrote:
> On Tue, Aug 23, 2016 at 10:10 AM, Antonin Houska  wrote:

> CLUSTER preserves xmin/xmax when rewriting, but ALTER TABLE does not.

Indeed, CLUSTER was the command I picked for my experiments. I didn't expect
such a subtle difference. Thanks for the answer.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about MVCC-unsafe commands

2016-08-23 Thread Robert Haas
On Tue, Aug 23, 2016 at 10:10 AM, Antonin Houska  wrote:
> I'm failing to understand why [1] mentions "table-rewriting forms of ALTER
> TABLE" besides TRUNCATE command.
>
> For TRUNCATE it seems clear: if transaction A takes the snapshot before it
> accesses the table first time (typically because isolation level is at least
> REPEATABLE READ) and transaction B manages to commit TRUNCATE soon enough,
> then A sees pg_class entry of the table already affected by B, which has the
> new (empty) relfilenode. (The original pg_class entry is no longer visible by
> catalog snapshot, nor does it contain valid OID of the original relfilenode.)
>
> But IMO heap rewriting changes neither table contents, nor visibility. Can
> anyone explain what I miss?

CLUSTER preserves xmin/xmax when rewriting, but ALTER TABLE does not.

rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# insert into foo values (1);
INSERT 0 1
rhaas=# insert into foo values (2);
INSERT 0 1
rhaas=# select xmin, a from foo;
 xmin | a
--+---
  934 | 1
  935 | 2
(2 rows)

rhaas=# alter table foo alter a type text;
ALTER TABLE
rhaas=# select xmin, a from foo;
 xmin | a
--+---
  936 | 1
  936 | 2
(2 rows)

This is sad.
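
For contrast, a minimal sketch of the CLUSTER side (table and index names are
made up, output elided):

-- CLUSTER rewrites the heap but carries xmin/xmax over unchanged
CREATE TABLE bar (a int);
INSERT INTO bar VALUES (1), (2);
SELECT xmin, a FROM bar;             -- note the xmin values
CREATE INDEX bar_a_idx ON bar (a);
CLUSTER bar USING bar_a_idx;
SELECT xmin, a FROM bar;             -- same xmin values as before the rewrite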

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question about MVCC-unsafe commands

2016-08-23 Thread Antonin Houska
I'm failing to understand why [1] mentions "table-rewriting forms of ALTER
TABLE" besides TRUNCATE command.

For TRUNCATE it seems clear: if transaction A takes the snapshot before it
accesses the table first time (typically because isolation level is at least
REPEATABLE READ) and transaction B manages to commit TRUNCATE soon enough,
then A sees pg_class entry of the table already affected by B, which has the
new (empty) relfilenode. (The original pg_class entry is no longer visible by
catalog snapshot, nor does it contain valid OID of the original relfilenode.)

But IMO heap rewriting changes neither table contents, nor visibility. Can
anyone explain what I miss?

[1] https://www.postgresql.org/docs/9.6/static/mvcc-caveats.html
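
To make the TRUNCATE case concrete, a minimal two-session sketch (table names
are illustrative):

-- Session A, REPEATABLE READ                -- Session B
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM other_table;            -- snapshot taken; t not locked yet
                                             TRUNCATE t;   -- commits at once
SELECT count(*) FROM t;                      -- returns 0: the new, empty
                                             -- relfilenode is read, although
                                             -- A's snapshot predates the TRUNCATE
COMMIT;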
-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about an inconsistency - 3

2016-07-19 Thread Tom Lane
"pet...@gmail.com"  writes:
> In file postgres/postgresql-9.4.4/src/timezone/zic.c
> function stringzone, line 2091, we have

> if (stringrule(result, stdrp, dstrp->r_stdoff, zp->z_gmtoff) != 0)

> Is it OK to have dstrp->r_stdoff as the 3rd argument, or should we
> have stdrp->r_stdoff? In line 2085 dstrp is used in both arguments.

Doubt it, because if the coding were like that there would be no need
to pass the third argument separately at all; the subroutine could just
access rp->r_stdoff for itself.  But this isn't our code, so if you
want an authoritative answer you should ask at
https://mm.icann.org/mailman/listinfo/tz

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about an inconsistency - 2

2016-07-19 Thread Tom Lane
"pet...@gmail.com"  writes:
> In file postgresql-9.4.4/src/backend/utils/adt/format_type.c
> function format_type_internal, line 159, shouldn't array_base_type be used
> instead of type_oid?

IIRC, that was intentional.  Supposing there's a pg_type row with
a corrupted typelem value, it's more useful to point at the bogus
row than to report the junk value with no context.  I suppose you
could argue that it should be more like

elog(ERROR, "cache lookup failed for type %u (typelem of %u)",
 array_base_type, type_oid);

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question about an inconsistency - 2

2016-07-19 Thread pet...@gmail.com
Hi,

In file postgresql-9.4.4/src/backend/utils/adt/format_type.c
function format_type_internal, line 159, shouldn't array_base_type be used
instead of type_oid?

Line 153 searches for array_base_type, so shouldn't we use it
(i.e., array_base_type) to report that the type was not found?
It is similar to the case of searching for type_oid in lines 129-135.

Thanks,
Petru Florin Mihancea




signature.asc
Description: Message signed with OpenPGP using GPGMail


[HACKERS] Question about an inconsistency - 3

2016-07-19 Thread pet...@gmail.com

Hi,

In file postgres/postgresql-9.4.4/src/timezone/zic.c
function stringzone, line 2091, we have

if (stringrule(result, stdrp, dstrp->r_stdoff, zp->z_gmtoff) != 0)

Is it OK to have dstrp->r_stdoff as the 3rd argument, or should we
have stdrp->r_stdoff? In line 2085 dstrp is used in both arguments.

I am not very sure, because I do not understand the semantics of the
invoked operation, but I thought it would be better to ask.

Thanks,
Petru Florin Mihancea


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [HACKERS] Question about an inconsistency - 1

2016-07-06 Thread Tom Lane
"pet...@gmail.com"  writes:
> I have a question regarding the source code in file pg_proc.h 
> (postgresql-9.4.4).

> At line 1224 (copied below), why is the 28th identifier timestamp_eq?

> DATA(insert OID = 1152 (  timestamptz_eq   PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 "1184 1184" _null_ _null_ _null_ _null_ timestamp_eq _null_ _null_ _null_ ));

> I would expect it to be timestamptz_eq (the same as the 5th identifier
> from the same line).

If timestamptz_eq actually existed as a separate C function, then yes that
would be correct.  But see this bit in timestamp.h:

extern int  timestamp_cmp_internal(Timestamp dt1, Timestamp dt2);

/* timestamp comparison works for timestamptz also */
#define timestamptz_cmp_internal(dt1,dt2)   timestamp_cmp_internal(dt1, dt2)

AFAICS there are not similar #defines equating timestamptz_eq to
timestamp_eq and so on, probably because we don't have much need
to refer to those functions in the C code.  But even if we had
such #defines, I think that pg_proc.h could not rely on them.
It has to reference actual C functions.
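
For anyone following along, the mapping is also visible from SQL, since
prosrc holds the C symbol for internal-language functions:

SELECT oid, proname, prosrc
FROM pg_proc
WHERE proname IN ('timestamptz_eq', 'timestamp_eq');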

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question about an inconsistency - 1

2016-07-06 Thread pet...@gmail.com
Hi,

I have a question regarding the source code in file pg_proc.h 
(postgresql-9.4.4).

At line 1224 (copied below), why is the 28th identifier timestamp_eq?

DATA(insert OID = 1152 (  timestamptz_eq   PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 "1184 1184" _null_ _null_ _null_ _null_ timestamp_eq _null_ _null_ _null_ ));

I would expect it to be timestamptz_eq (the same as the 5th identifier from the
same line). Shouldn't it be the same as in line 2940, copied below, which has the
same identifier (timestamp_eq) in both the 28th and 5th positions?

DATA(insert OID = 2052 (  timestamp_eq  PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 "1114 1114" _null_ _null_ _null_ _null_ timestamp_eq _null_ _null_ _null_ ));

A similar question can be asked for lines 1225 to 1229 (but for their specific 
identifiers).

Regards,
Pepi


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-06-23 Thread Bruce Momjian
On Thu, Jun 23, 2016 at 06:42:57AM +, Tsunakawa, Takayuki wrote:
> > From: Bruce Momjian [mailto:br...@momjian.us]
> > We have this text in src/tools/RELEASE_CHANGES:
> > ...
> > This is saying running against a mismatched minor version should be fine
> > for a binary.
> 
> Thanks for a good rationale.
> 
> 
> > I know this thread is old but it bounced around a lot of ideas.  I think
> > there are some open questions:
> > 
> > *  Will a new application link against an older minor-version libpq?
> > *  Will an old application link against a newer minor-version libpq?
> 
> The former does not always hold true if the application uses a new libpq
> function which is not in an older minor version.  But I think the
> backward compatibility is enough.

Yes, I think that is correct, and I think that is covered in the file
posted:

Adding a new function should NOT force an increase in the major version
number.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-06-23 Thread Tsunakawa, Takayuki
> From: Bruce Momjian [mailto:br...@momjian.us]
> We have this text in src/tools/RELEASE_CHANGES:
> ...
> This is saying running against a mismatched minor version should be fine
> for a binary.

Thanks for a good rationale.


> I know this thread is old but it bounced around a lot of ideas.  I think
> there are some open questions:
> 
> *  Will a new application link against an older minor-version libpq?
> *  Will an old application link against a newer minor-version libpq?

The former does not always hold true if the application uses a new libpq
function which is not in an older minor version.  But I think the
backward compatibility is enough.


Regards
Takayuki Tsunakawa




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-06-22 Thread Bruce Momjian
On Mon, May 30, 2016 at 03:04:24AM +, Tsunakawa, Takayuki wrote:
> Hello,
> 
> I'd like to ask you about the policy of application binary compatibility.  
> And have a suggestion as well.
> 
> QUESTION
> ==
> 
> My customer asked me if the following usage is correct.
> 
> - Build an embedded SQL C application with PostgreSQL 9.2.
> - Distribute the app without ecpg nor libpq libraries.  Require users to 
> install PostgreSQL client which includes ecpg and libpq libraries.
> - Use the app with newer PostgreSQL major versions without rebuilding the 
> app.  That is, the users just replaces the PostgreSQL client.
> 
> I expect this is legal, because the so_major versions of ecpg and libpq are 6 
> and 5 respectively for all PostgreSQL 9.x versions so far.  However, I could 
> not find any description of this binary compatibility policy in the manual, 
> so I haven't been able to answer the customer.
> 
> What's the official policy of application binary compatibility?  I found the 
> same discussion in the below thread, but I'm afraid any clear answer wasn't 
> given:
> 
> https://www.postgresql.org/message-id/522f0b6b.1040...@4js.com
> 
> 
> SUGGESTION
> ==
> 
> How about adding an article about application binary compatibility in the 
> following section, as well as chapters for libpq, ECPG, etc?
> 
> 17.6. Upgrading a PostgreSQL Cluster
> https://www.postgresql.org/docs/devel/static/upgrading.html
> 
> There are three kinds of application assets that users are concerned about 
> when upgrading.  Is there anything else to mention?
> 
> * libpq app
> * ECPG app
> * C-language user defined function (and other stuff dependent on it, such as 
> extensions, UDTs, etc.)

I know this thread is old but it bounced around a lot of ideas.  I think
there are some open questions:

*  Will a new application link against an older minor-version libpq?
*  Will an old application link against a newer minor-version libpq?

I think we are all in agreement that a binary cannot run using a libpq
with a different major version number.

We have this text in src/tools/RELEASE_CHANGES:

Major Version
=

The major version number should be updated whenever the source of the
library changes to make it binary incompatible. Such changes include,
but are not limited to:

1. Removing a public function or structure (or typedef, enum, ...)

2. Modifying a public functions arguments.

3. Removing a field from a public structure.

4. Adding a field to a public structure, unless steps have been
previously taken to shield users from such a change, for example by
such structures only ever being allocated/instantiated by a library
function which would give the new field a suitable default value.

Adding a new function should NOT force an increase in the major version
number. (Packagers will see the standard minor number update and install
the new library.)  When the major version is increased all applications
which link to the library MUST be recompiled - this is not desirable. 
When
the major version is updated the minor version gets reset.

Minor Version
=

The minor version number should be updated whenever the functionality of
the library has changed, typically a change in source code between 
releases
would mean an increase in the minor version number so long as it does 
not
require a major version increase.

Given that we make at least minor changes to our libraries in every 
major
PostgreSQL version, we always bump all minor library version numbers at 
the
start of each development cycle as a matter of policy.

This is saying running against a mismatched minor version should be fine
for a binary.

ecpg is a little trickier because it has knowledge of SQL data types and
might not support certain features if the ecpg library isn't _run_
against the same major server version.  My guess is that older ecpg
libraries will run fine against newer servers, but the opposite might
not be true.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-06-01 Thread Michael Meskes
> However, the problem I pointed out is that when the new library is
> incompatible with the older one, say the major version of libpq.dll
> changes from 5 to 6, the application user and/or developer cannot
> notice the incompatibility immediately and easily.  On Unix/Linux,
> the application fails to start because the older library is not
> found.  On the other hand, the application will start on Windows and
> probably cause difficult trouble due to the incompatibility.

I don't think this is a very good situation, but I have no idea if this
can be solved. However, I'd prefer a technical solution over a
documentation one.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Craig Ringer
On 1 June 2016 at 13:09, Tsunakawa, Takayuki  wrote:

> From: pgsql-hackers-ow...@postgresql.org [mailto:
> pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer
>
> While that's probably OK, it's not especially desirable. The typical
> Windows deployment model involves the application bundling all its direct
> dependencies except when those are provided by a runtime redistributable
> designed for that purpose.
>
>
>
>
>
> I think so, too, but I'm not confident that's typical.  Some people may
> think of PostgreSQL binaries as a shared component for their applications
> and place it in one place, just like the PostgreSQL library package is in
> /usr/lib/pgsql.
>

(Your reply formatting seems mangled, mixing my text with yours)

/usr/lib/pgsql works because *nix systems don't typically do binary
distribution.

Windows apps that rely on binary distribution should bundle the libraries
they require.

Maybe a note on windows distribution in the libpq manual would be
warranted. I thought it was so accepted as the way it's done that nobody
would really do anything else.

(Then again, EDB's installers don't bundle their Python, Perl, etc
runtimes, but I think that's partly a legal thing).



> There's a client-only installation method as follows, although I haven't
> checked whether EnterpriseDB, OpenSCG, or any other PostgreSQL-based
> products provide client-only installation.
>
>
> https://www.postgresql.org/docs/devel/static/install-windows-full.html#AEN30192
>

Right, and EDBs installers also let you install just the client AFAIK, but
there's no simple client library redist package, like you'd expect if it
was intended for use that way.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer
While that's probably OK, it's not especially desirable. The typical Windows 
deployment model involves the application bundling all its direct dependencies 
except when those are provided by a runtime redistributable designed for that 
purpose.


I think so, too, but I'm not confident that's typical.  Some people may think 
of PostgreSQL binaries as a shared component for their applications and place 
it in one place, just like the PostgreSQL library package is in /usr/lib/pgsql.


- Use the app with newer PostgreSQL major versions without rebuilding the app.  
That is, the users just replaces the PostgreSQL client.

... especially since there isn't a client-only PostgreSQL distribution for Windows.


There's a client-only installation method as follows, although I haven't 
checked whether EnterpriseDB, OpenSCG, or any other PostgreSQL-based products 
provide client-only installation.
https://www.postgresql.org/docs/devel/static/install-windows-full.html#AEN30192

[Excerpt]
--
If you want to install only the client applications and interface libraries, 
then you can use these commands:

install c:\destination\directory client
--


How about adding an article about application binary compatibility in the 
following section, as well as chapters for libpq, ECPG, etc?

It would be sensible to better define the binary compatibility expectations 
that clients may rely upon and, when they are broken, a managed way in which 
they're broken (soname bump, etc).

If you have an interest in the area it might be worth drafting a proposal after 
taking a look at past binary compatibility discussions on -hackers.

Sure, I'll submit a patch to pgsql-docs.  Thanks to Michael's confirmation, I 
could answer the customer's question, so this is not an immediate task now.  
But I'll do.


- On-disk format
- Wire protocol
- SQL-level (data types, syntax, encoding handling, settings, ...)

Yes, I recognize these items.  I omitted them because:

- On-disk format: this is handled in the PostgreSQL manual article about 
upgrading
- Wire protocol: I believe the compatibility will be retained
- SQL-level: ditto

But if you feel a need for their compatibility description for completeness, 
I'll add it.  ... Yes, the explicit explanation may be necessary so that users 
are assured that the PostgreSQL compatibility policy matches their expectation.


The simplest solution would be to incorporate the soname, so it becomes 
libpq0509.dll for example. Like VS does with the VS runtime. The main downside 
is that because it's not a true soname and the OS has no such concept, the 
linker for everything compiled against that DLL must specify the versioned DLL 
name, it can't just link to 'libpq' .

Although I haven’t examined yet, some directive in .def file might enable 
applications to specify libpq.lib at build time and to link with libpq5.dll at 
run time.


Regards
Takayuki Tsunakawa



Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Craig Ringer
 On 30 May 2016 at 11:04, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> Hello,
>
> I'd like to ask you about the policy of application binary compatibility.
> And have a suggestion as well.
>
> QUESTION
> ==
>
> My customer asked me if the following usage is correct.
>
> - Build an embedded SQL C application with PostgreSQL 9.2.
> - Distribute the app without ecpg nor libpq libraries.  Require users to
> install PostgreSQL client which includes ecpg and libpq libraries.
>

Why?

While that's probably OK, it's not especially desirable. The typical
Windows deployment model involves the application bundling all its direct
dependencies except when those are provided by a runtime redistributable
designed for that purpose.


> - Use the app with newer PostgreSQL major versions without rebuilding the
> app.  That is, the users just replaces the PostgreSQL client.
>

... especially since there isn't a client-only PostgreSQL distribution for
Windows.



> How about adding an article about application binary compatibility in the
> following section, as well as chapters for libpq, ECPG, etc?
>

It would be sensible to better define the binary compatibility expectations
that clients may rely upon and, when they are broken, a managed way in
which they're broken (soname bump, etc).

If you have an interest in the area it might be worth drafting a proposal
after taking a look at past binary compatibility discussions on -hackers.



> There are three kinds of application assets that users are concerned about
> when upgrading.  Is there anything else to mention?
>
> * libpq app
> * ECPG app
> * C-language user defined function (and other stuff dependent on it, such
> as extensions, UDTs, etc.)
>

- On-disk format
- Wire protocol
- SQL-level (data types, syntax, encoding handling, settings, ...)
- ...


>  On the other hand, the application will start on Windows and probably
cause difficult trouble due to the incompatibility.

Yeah, we just write 'libpq.dll' on Windows.

The simplest solution would be to incorporate the soname, so it becomes
libpq0509.dll for example. Like VS does with the VS runtime. The main
downside is that because it's not a true soname and the OS has no such
concept, the linker for everything compiled against that DLL must specify
the versioned DLL name, it can't just link to 'libpq' .

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Tsunakawa, Takayuki
> From: Michael Meskes [mailto:mes...@postgresql.org]
> e.g. a random hit from google: https://www.bottomupcs.com/libraries_and_the_linker.xhtml
> 
> There even is a wikipedia page about it: https://en.wikipedia.org/wiki/Soname

Thank you for good pointers.  The former is particularly nice.

> > BTW, although this may be a separate topic, it may be better that we
> > add the major version in the library names like libpq5.dll and
> > libecpg6.dll, so that the application can fail to run with the
> > incompatible versions of libpq and libecpg.  FYI:
> 
> Does this mean you cannot have two versions of libpq installed on the same
> Windows system at the same time?

No, you can have different versions in separate folders, as in:

C:\Program Files\PostgreSQL\9.2
C:\Program Files\PostgreSQL\9.5

and change the PATH environment variable to point to a newer version when you 
want to use the existing application without rebuilding it.

However, the problem I pointed out is that when the new library is incompatible 
with the older one, say the major version of libpq.dll changes from 5 to 6, the 
application user and/or developer cannot notice the incompatibility immediately 
and easily.  On Unix/Linux, the application fails to start because the older 
library is not found.  On the other hand, the application will start on Windows 
and probably cause difficult trouble due to the incompatibility.

Regards
Takayuki Tsunakawa



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Tsunakawa, Takayuki
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Marco Atzeri
> on cygwin the postgresql binary package already include the library
> versions:
> 
>/usr/bin/cygecpg-6.dll
>/usr/bin/cygecpg_compat-3.dll
>/usr/bin/cygpgtypes-3.dll
>/usr/bin/cygpq-5.dll
> 
> attached the patch used for the build.

Thanks for the information.   I didn't know there's a PostgreSQL binary package 
for Cygwin.

Regards
Takayuki Tsunakawa



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Michael Meskes
> I couldn't find appropriate system documentation.  Regarding Linux, I
> remember I saw some HOWTO on tldp.org website which explains the
> concept of shared library soname, but it's not very friendly for
> users who just want to know the application binary compatibility
> policy of PostgreSQL.  And I don't think there's suitable 

I would expect Unix sysadmins to understand that howto, but there are
others, e.g. a random hit from google: https://www.bottomupcs.com/libraries_and_the_linker.xhtml

There even is a wikipedia page about it: https://en.wikipedia.org/wiki/Soname

> documentation on Windows.  Even if there is any, users will not be
> sure whether PostgreSQL follows those platform-specific
> conventions.  They may have doubts about it, because even the product
> version "PostgreSQL x.y.z" causes misconception that x is the major
> version and y is the minor one.

I don't know anything about the Windows setup in PostgreSQL, but I
would find it fair to assume that PostgreSQL follows conventions.

> BTW, although this may be a separate topic, it may be better that we
> add the major version in the library names like libpq5.dll and
> libecpg6.dll, so that the application can fail to run with the
> incompatible versions of libpq and libecpg.  FYI:

Does this mean you cannot have two versions of libpq installed on the
same Windows system at the same time?

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Marco Atzeri


On 31/05/2016 08:10, Tsunakawa, Takayuki wrote:

From: Michael Meskes [mailto:mes...@postgresql.org]

Yes, but Windows users probably don't understand or know it.  So, I
suggested explicitly describing the application binary compatibility
policy in the PostgreSQL manual.  What do you think about it?


Couldn't you point your customer to the system documentation?

Personally I don't think standard system behavior should be documented for
each application relying on it but ymmv.


I couldn't find appropriate system documentation.  Regarding Linux, I remember I saw some 
HOWTO on tldp.org website which explains the concept of shared library soname, but it's 
not very friendly for users who just want to know the application binary compatibility 
policy of PostgreSQL.  And I don't think there's suitable documentation on Windows.  Even 
if there is any, users will not be sure whether PostgreSQL follows those 
platform-specific conventions.  They may have doubts about it, because even the product 
version "PostgreSQL x.y.z" causes misconception that x is the major version and 
y is the minor one.

So, I suggested documenting the compatibility policy for clarification and user 
friendliness as in the Oracle Database documentation below.

http://docs.oracle.com/database/121/UPGRD/app.htm#UPGRD12547


BTW, although this may be a separate topic, it may be better that we add the 
major version in the library names like libpq5.dll and libecpg6.dll, so that 
the application can fail to run with the incompatible versions of libpq and 
libecpg.  FYI:

https://en.wikipedia.org/wiki/Side-by-side_assembly

[Excerpt]
Microsoft Visual C++ 2005 and 2008 employ SxS with all C runtime libraries. 
However, runtime libraries in Visual C++ 2010 no longer use this technology; 
instead, they include the version number of a DLL in its file name, which means 
that different versions of one DLL will technically be completely different 
DLLs now.


Any comments on these?  If there's no strong objection, I think I'll submit a 
documentation patch in the future.

Regards
Takayuki Tsunakawa



Hi,
on cygwin the postgresql binary package already include
the library versions:

  /usr/bin/cygecpg-6.dll
  /usr/bin/cygecpg_compat-3.dll
  /usr/bin/cygpgtypes-3.dll
  /usr/bin/cygpq-5.dll

attached the patch used for the build.

Regards
Marco


--- origsrc/postgresql-9.4.2/src/Makefile.shlib 2015-05-20 00:33:58.0 +0200
+++ src/Makefile.shlib  2015-05-27 23:01:09.379468300 +0200
@@ -267,7 +267,7 @@ endif
 ifeq ($(PORTNAME), cygwin)
   LINK.shared  = $(CC) -shared
   ifdef SO_MAJOR_VERSION
-shlib  = cyg$(NAME)$(DLSUFFIX)
+shlib  = cyg$(NAME)-$(SO_MAJOR_VERSION)$(DLSUFFIX)
   endif
   haslibarule   = yes
 endif
@@ -359,12 +359,9 @@ ifeq ($(PORTNAME), cygwin)
 # Cygwin case
 
 $(shlib): $(OBJS) | $(SHLIB_PREREQS)
-   $(CC) $(CFLAGS)  -shared -o $@  $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) $(LDAP_LIBS_BE)
+   $(CC) $(CFLAGS)  -shared -o $@ -Wl,--out-implib=$(stlib) $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) $(LDAP_LIBS_BE)
 
-$(stlib): $(OBJS) | $(SHLIB_PREREQS)
-   rm -f $@
-   $(LINK.static) $@ $^
-   $(RANLIB) $@
+$(stlib): $(shlib) ;
 
 else
 
--- origsrc/postgresql-9.4.2/src/interfaces/libpq/Makefile  2015-05-20 00:33:58.0 +0200
+++ src/interfaces/libpq/Makefile   2015-05-27 22:56:43.193200600 +0200
@@ -45,7 +45,7 @@ OBJS += ip.o md5.o
 OBJS += encnames.o wchar.o
 
 ifeq ($(PORTNAME), cygwin)
-override shlib = cyg$(NAME)$(DLSUFFIX)
+override shlib = cyg$(NAME)-$(SO_MAJOR_VERSION)$(DLSUFFIX)
 endif
 
 ifeq ($(PORTNAME), win32)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Tsunakawa, Takayuki
> From: Michael Meskes [mailto:mes...@postgresql.org]
> > Yes, but Windows users probably don't understand or know it.  So, I
> > suggested explicitly describing the application binary compatibility
> > policy in the PostgreSQL manual.  What do you think about it?
> 
> Couldn't you point your customer to the system documentation?
> 
> Personally I don't think standard system behavior should be documented for
> each application relying on it but ymmv.

I couldn't find appropriate system documentation.  Regarding Linux, I remember 
I saw some HOWTO on tldp.org website which explains the concept of shared 
library soname, but it's not very friendly for users who just want to know the 
application binary compatibility policy of PostgreSQL.  And I don't think 
there's suitable documentation on Windows.  Even if there is any, users will 
not be sure whether PostgreSQL follows those platform-specific conventions.  
They may have doubts about it, because even the product version "PostgreSQL 
x.y.z" causes misconception that x is the major version and y is the minor one.

So, I suggested documenting the compatibility policy for clarification and user 
friendliness as in the Oracle Database documentation below.

http://docs.oracle.com/database/121/UPGRD/app.htm#UPGRD12547


BTW, although this may be a separate topic, it may be better that we add the 
major version in the library names like libpq5.dll and libecpg6.dll, so that 
the application can fail to run with the incompatible versions of libpq and 
libecpg.  FYI:

https://en.wikipedia.org/wiki/Side-by-side_assembly

[Excerpt]
Microsoft Visual C++ 2005 and 2008 employ SxS with all C runtime libraries. 
However, runtime libraries in Visual C++ 2010 no longer use this technology; 
instead, they include the version number of a DLL in its file name, which means 
that different versions of one DLL will technically be completely different 
DLLs now.


Any comments on these?  If there's no strong objection, I think I'll submit a 
documentation patch in the future.

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-30 Thread Michael Meskes
> > Sorry I fail to understand what you mean with "legal"? Are you
> > wondering
> > about license restrictions? There are none.
> 
> Sorry, I just meant "correct" or "valid".

Ah ok, thanks for clarifying.

> > As for compatibility, that's what major version numbers are for.
> > This is
> > not a PostgreSQL topic, but a general system one as for instance
> > the same
> > holds for libc.
> 
> Yes, but Windows users probably don't understand or know it.  So, I
> suggested explicitly describing the application binary compatibility
> policy in the PostgreSQL manual.  What do you think about it?

Couldn't you point your customer to the system documentation?

Personally I don't think standard system behavior should be documented
for each application relying on it but ymmv.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-30 Thread Tsunakawa, Takayuki
> From: pgsql-hackers-ow...@postgresql.org
> Sorry I fail to understand what you mean with "legal"? Are you wondering
> about license restrictions? There are none.

Sorry, I just meant "correct" or "valid".


> As for compatibility, that's what major version numbers are for. This is
> not a PostgreSQL topic, but a general system one as for instance the same
> holds for libc.

Yes, but Windows users probably don't understand or know it.  So, I suggested 
explicitly describing the application binary compatibility policy in the 
PostgreSQL manual.  What do you think about it?


Regards
Takayuki Tsunakawa



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-30 Thread Michael Meskes
Hi,

> - Build an embedded SQL C application with PostgreSQL 9.2.
> - Distribute the app without ecpg nor libpq libraries.  Require users
> to install PostgreSQL client which includes ecpg and libpq libraries.
> - Use the app with newer PostgreSQL major versions without rebuilding
> the app.  That is, the users just replaces the PostgreSQL client.
> 
> I expect this is legal, because the so_major versions of ecpg and
> libpq are 6 and 5 respectively for all PostgreSQL 9.x versions so
> far.  However, I could not find any description of this binary
> compatibility policy in the manual, so I haven't been able to answer
> the customer.

Sorry I fail to understand what you mean with "legal"? Are you
wondering about license restrictions? There are none.

As for compatibility, that's what major version numbers are for. This
is not a PostgreSQL topic, but a general system one as for instance the
same holds for libc.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question and suggestion about application binary compatibility policy

2016-05-29 Thread Tsunakawa, Takayuki
Hello,

I'd like to ask you about the policy of application binary compatibility.  And 
have a suggestion as well.

QUESTION
==

My customer asked me if the following usage is correct.

- Build an embedded SQL C application with PostgreSQL 9.2.
- Distribute the app without ecpg nor libpq libraries.  Require users to 
install PostgreSQL client which includes ecpg and libpq libraries.
- Use the app with newer PostgreSQL major versions without rebuilding the app.  
That is, the users just replaces the PostgreSQL client.

I expect this is legal, because the so_major versions of ecpg and libpq are 6 
and 5 respectively for all PostgreSQL 9.x versions so far.  However, I could 
not find any description of this binary compatibility policy in the manual, so 
I haven't been able to answer the customer.

What's the official policy of application binary compatibility?  I found the 
same discussion in the below thread, but I'm afraid any clear answer wasn't 
given:

https://www.postgresql.org/message-id/522f0b6b.1040...@4js.com


SUGGESTION
==

How about adding an article about application binary compatibility in the 
following section, as well as chapters for libpq, ECPG, etc?

17.6. Upgrading a PostgreSQL Cluster
https://www.postgresql.org/docs/devel/static/upgrading.html

There are three kinds of application assets that users are concerned about when 
upgrading.  Is there anything else to mention?

* libpq app
* ECPG app
* C-language user defined function (and other stuff dependent on it, such as 
extensions, UDTs, etc.)

Regards
Takayuki Tsunakawa



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about Restart point and checkpoint_segments

2016-02-18 Thread Benoit Lobréau
Sorry for the noise, I got my answer.

2016-02-16 18:02 GMT+01:00 Benoit Lobréau :

> Hello,
>
> I am using a hot_standby setup on PostgreSQL 9.1
> While I was testing, I found out that only checkpoint_timeout (+ a
> checkpoint since the last restart point) could trigger a restart point.
>
> The code (bgwriter.c) seems to confirm this:
>
> /*
> * Check progress against WAL segments written and checkpoint_segments.
> *
> * We compare the current WAL insert location against the location
> * computed before calling CreateCheckPoint. The code in XLogInsert that
> * actually triggers a checkpoint when checkpoint_segments is exceeded
> * compares against RedoRecptr, so this is not completely accurate.
> * However, it's good enough for our purposes, we're only calculating an
> * estimate anyway.
> */
> if (!RecoveryInProgress())  ===> Only in case of primary
> {
>recptr = GetInsertRecPtr();
>elapsed_xlogs =
>(((double) (int32) (recptr.xlogid -
> ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
>((double) recptr.xrecoff - (double)
> ckpt_start_recptr.xrecoff) / XLogSegSize) /
>CheckPointSegments;
>
>if (progress < elapsed_xlogs)  ===> progress in volume
>{
>ckpt_cached_elapsed = elapsed_xlogs;
>return false;
>}
> }
>
> /*
> * Check progress against time elapsed and checkpoint_timeout.
> */
> gettimeofday(&now, NULL);
> elapsed_time = ((double) ((pg_time_t) now.tv_sec - ckpt_start_time) +
>                 now.tv_usec / 1000000.0) / CheckPointTimeout;
>
>
> if (progress < elapsed_time)  ===> progress in time
> {
>ckpt_cached_elapsed = elapsed_time;
>return false;
> }
>
> /* It looks like we're on schedule. */
> return true;
>
> I also found a post from Simon Riggs [1]: "checkpoint_segments is ignored
> on standby."
>
> The documentation is stating the opposite [2]: "In standby mode, a
> restartpoint is also triggered if checkpoint_segments log segments have
> been replayed since last restartpoint and at least one checkpoint record
> has been replayed."
>
> Since I am not a native english speaker, maybe I misunderstood the
> documentation. But to me, it looks wrong.
> If it's indeed wrong. Could you explain why checkpoint_segments doesn't
> trigger a restart_point in standby mode ?
>
> Thank you
> Benoit
>
> [1]
> http://www.postgresql.org/message-id/CA+U5nMKdf7odZzYNnoRkkCZmJpGEy=oqbu9nan_zva_rtzi...@mail.gmail.com
> [2] http://www.postgresql.org/docs/9.1/static/wal-configuration.html
>


[HACKERS] Question about Restart point and checkpoint_segments

2016-02-16 Thread Benoit Lobréau
Hello,

I am using a hot_standby setup on PostgreSQL 9.1
While I was testing, I found out that only checkpoint_timeout (+ a
checkpoint since the last restart point) could trigger a restart point.

The code (bgwriter.c) seems to confirm this:

/*
* Check progress against WAL segments written and checkpoint_segments.
*
* We compare the current WAL insert location against the location
* computed before calling CreateCheckPoint. The code in XLogInsert that
* actually triggers a checkpoint when checkpoint_segments is exceeded
* compares against RedoRecptr, so this is not completely accurate.
* However, it's good enough for our purposes, we're only calculating an
* estimate anyway.
*/
if (!RecoveryInProgress())  ===> Only in case of primary
{
   recptr = GetInsertRecPtr();
   elapsed_xlogs =
   (((double) (int32) (recptr.xlogid -
ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
   ((double) recptr.xrecoff - (double)
ckpt_start_recptr.xrecoff) / XLogSegSize) /
   CheckPointSegments;

   if (progress < elapsed_xlogs)  ===> progress in volume
   {
   ckpt_cached_elapsed = elapsed_xlogs;
   return false;
   }
}

/*
* Check progress against time elapsed and checkpoint_timeout.
*/
gettimeofday(&now, NULL);
elapsed_time = ((double) ((pg_time_t) now.tv_sec - ckpt_start_time) +
                now.tv_usec / 1000000.0) / CheckPointTimeout;


if (progress < elapsed_time)  ===> progress in time
{
   ckpt_cached_elapsed = elapsed_time;
   return false;
}

/* It looks like we're on schedule. */
return true;

I also found a post from Simon Riggs [1]: "checkpoint_segments is ignored
on standby."

The documentation is stating the opposite [2]: "In standby mode, a
restartpoint is also triggered if checkpoint_segments log segments have
been replayed since last restartpoint and at least one checkpoint record
has been replayed."

Since I am not a native english speaker, maybe I misunderstood the
documentation. But to me, it looks wrong.
If it's indeed wrong. Could you explain why checkpoint_segments doesn't
trigger a restart_point in standby mode ?

Thank you
Benoit

[1]
http://www.postgresql.org/message-id/CA+U5nMKdf7odZzYNnoRkkCZmJpGEy=oqbu9nan_zva_rtzi...@mail.gmail.com
[2] http://www.postgresql.org/docs/9.1/static/wal-configuration.html


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Pavel Stehule
Hi

2016-01-12 11:57 GMT+01:00 Michal Novotny :

> Dear PostgreSQL Hackers,
> I've discovered an issue with dropping a large table (~5T). I was
> thinking drop table is fast operation however I found out my assumption
> was wrong.
>
> Is there any way how to tune it to drop a large table in the matter of
> seconds or minutes? Any configuration variable in the postgresql.conf or
> any tune up options available?
>

drop table should be fast.

There can be two reasons why not:

1. locks - are you sure, so this statement didn't wait on some lock?

2. filesystem issue  - can you check the speed of rm 5TB file on your IO?
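
To check the first possibility, a minimal sketch of a lock query you could run
in another session while the DROP TABLE appears to hang:

SELECT locktype, relation::regclass AS relation, mode, granted, pid
FROM pg_locks
WHERE NOT granted;           -- any row here means something is waiting on a lock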

Regards

Pavel





>
> PostgreSQL version used is PgSQL 9.4.
>
> Thanks a lot!
> Michal
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Marko Tiikkaja

On 12/01/16 12:17, Pavel Stehule wrote:

2016-01-12 12:14 GMT+01:00 Michal Novotny :


Hi Pavel,
thanks for the information. I've been doing more investigation of this
issue and there's autovacuum running on the table however it's
automatically starting even if there is "autovacuum = off" in the
postgresql.conf configuration file.



Real autovacuum is automatically cancelled. It looks like VACUUM started by
cron, maybe?


Not if it's to prevent wraparound, which isn't unlikely if autovacuum=off.


.m


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Michal Novotny
Hi Pavel,
thanks for the information. I've been doing more investigation of this
issue and there's autovacuum running on the table however it's
automatically starting even if there is "autovacuum = off" in the
postgresql.conf configuration file.

The test of rm on a 5T file was fast and did not take 24 hours. I guess
the autovacuum is the issue. Is there any way to disable it? I
killed the process using 'kill -9' yesterday, but the process started again.

Is there any way how to cancel this process and disallow PgSQL to run
autovacuum again and do the drop instead?

Thanks,
Michal

On 01/12/2016 12:01 PM, Pavel Stehule wrote:
> Hi
> 
> 2016-01-12 11:57 GMT+01:00 Michal Novotny  >:
> 
> Dear PostgreSQL Hackers,
> I've discovered an issue with dropping a large table (~5T). I was
> thinking drop table is fast operation however I found out my assumption
> was wrong.
> 
> Is there any way how to tune it to drop a large table in the matter of
> seconds or minutes? Any configuration variable in the postgresql.conf or
> any tune up options available?
> 
> 
> drop table should be fast.
> 
> There can be two reasons why not:
> 
> 1. locks - are you sure, so this statement didn't wait on some lock?
> 
> 2. filesystem issue  - can you check the speed of rm 5TB file on your IO?
> 
> Regards
> 
> Pavel
> 
> 
> 
>  
> 
> 
> PostgreSQL version used is PgSQL 9.4.
> 
> Thanks a lot!
> Michal
> 
> 
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Pavel Stehule
2016-01-12 12:14 GMT+01:00 Michal Novotny :

> Hi Pavel,
> thanks for the information. I've been doing more investigation of this
> issue and there's autovacuum running on the table however it's
> automatically starting even if there is "autovacuum = off" in the
> postgresql.conf configuration file.
>

Real autovacuum is automatically cancelled. It looks like VACUUM started by
cron, maybe?


>
> The test of rm 5T file was fast and not taking 24 hours already. I guess
> the autovacuum is the issue. Is there any way how to disable it? If I
> killed the process using 'kill -9' yesterday the process started again.
>
> Is there any way how to cancel this process and disallow PgSQL to run
> autovacuum again and do the drop instead?
>
> Thanks,
> Michal
>
> On 01/12/2016 12:01 PM, Pavel Stehule wrote:
> > Hi
> >
> > 2016-01-12 11:57 GMT+01:00 Michal Novotny  > >:
> >
> > Dear PostgreSQL Hackers,
> > I've discovered an issue with dropping a large table (~5T). I was
> > thinking drop table is fast operation however I found out my
> assumption
> > was wrong.
> >
> > Is there any way how to tune it to drop a large table in the matter
> of
> > seconds or minutes? Any configuration variable in the
> postgresql.conf or
> > any tune up options available?
> >
> >
> > drop table should be fast.
> >
> > There can be two reasons why not:
> >
> > 1. locks - are you sure, so this statement didn't wait on some lock?
> >
> > 2. filesystem issue  - can you check the speed of rm 5TB file on your IO?
> >
> > Regards
> >
> > Pavel
> >
> >
> >
> >
> >
> >
> > PostgreSQL version used is PgSQL 9.4.
> >
> > Thanks a lot!
> > Michal
> >
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org
> > )
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
> >
> >
>


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Michal Novotny
Hi Andres,

thanks a lot. I've managed to run DROP TABLE and then cancel process
using pg_cancel_backend(autovacuum_pid) and it passed and dropped the 5T
table.

Thanks a lot!
Michal

On 01/12/2016 12:37 PM, Andres Freund wrote:
> Hi,
> 
> On 2016-01-12 12:17:01 +0100, Michal Novotny wrote:
>> thanks a lot for your reply. Unfortunately I've found out most it didn't
>> really start DROP TABLE yet and it's locked on autovacuum running for
>> the table and even if I kill the process it's autostarting again and again.
> 
> Start the DROP TABLE and *then* cancel the autovacuum session. That
> should work.
> 
>> Is there any way how to do the DROP TABLE and bypass/disable autovacuum
>> entirely? Please note the "autovacuum = off" is set in the config file
>> (postgresql.conf).
> 
> That actually is likely to have caused the problem. Every
> autovacuum_freeze_max_age tables need to be vacuumed - otherwise the
> data can't be interpreted correctly anymore at some point. That's called
> 'anti-wraparound vacuum". It's started even if you disabled autovacuum,
> to prevent database corruption.
> 
> If you disable autovacuum, you really should start vacuums in some other
> way.
> 
> Greetings,
> 
> Andres Freund
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Michal Novotny
Hi Andres,
thanks a lot for your reply. Unfortunately I've found out it most likely didn't
really start the DROP TABLE yet; it's blocked on an autovacuum running on
the table, and even if I kill that process it starts again and again.

Is there any way how to do the DROP TABLE and bypass/disable autovacuum
entirely? Please note the "autovacuum = off" is set in the config file
(postgresql.conf).

Thanks a lot,
Michal

On 01/12/2016 12:05 PM, Andres Freund wrote:
> Hi Michal,
> 
> This isn't really a question for -hackers, the list for postgres
> development. -general or -performance would have been more appropriate.
> 
> On 2016-01-12 11:57:05 +0100, Michal Novotny wrote:
>> I've discovered an issue with dropping a large table (~5T). I was
>> thinking drop table is fast operation however I found out my assumption
>> was wrong.
> 
> What exactly did you do, and how long did it take? Is there any chance
> you were actually waiting for the lock on that large table, instead of
> waiting for the actual execution?
> 
>> Is there any way how to tune it to drop a large table in the matter of
>> seconds or minutes? Any configuration variable in the postgresql.conf or
>> any tune up options available?
> 
> The time for dropping a table primarily is spent on three things:
> 1) acquiring the exclusive lock. How long this takes entirely depends on
>the concurrent activity. If there's a longrunning session using that
>table it'll take till that session is finished.
2) The cached portion of that table needs to be evicted from cache. How
>long that takes depends on the size of shared_buffers - but usually
>this is a relatively short amount of time, and only matters if you
>drop many, many relations.
> 3) The time the filesystem takes to actually remove the, in your case
>5000 1GB, files. This will take a while, but shouldn't be minutes.
> 
> 
> Greetings,
> 
> Andres Freund
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Pavel Stehule
2016-01-12 12:22 GMT+01:00 Marko Tiikkaja :

> On 12/01/16 12:17, Pavel Stehule wrote:
>
>> 2016-01-12 12:14 GMT+01:00 Michal Novotny :
>>
>> Hi Pavel,
>>> thanks for the information. I've been doing more investigation of this
>>> issue and there's autovacuum running on the table however it's
>>> automatically starting even if there is "autovacuum = off" in the
>>> postgresql.conf configuration file.
>>>
>>>
>> Real autovacuum is automatically cancelled. It looks like VACUUM started
>> by
>> cron, maybe?
>>
>
> Not if it's to prevent wraparound, which isn't unlikely if autovacuum=off.
>

I didn't know it.

Thank you

Pavel


>
>
> .m
>


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Andres Freund
On 2016-01-12 12:17:09 +0100, Pavel Stehule wrote:
> 2016-01-12 12:14 GMT+01:00 Michal Novotny :
> 
> > Hi Pavel,
> > thanks for the information. I've been doing more investigation of this
> > issue and there's autovacuum running on the table however it's
> > automatically starting even if there is "autovacuum = off" in the
> > postgresql.conf configuration file.
> >
> 
> Real autovacuum is automatically cancelled. It looks like VACUUM started by
> cron, maybe?

Unless it's an anti-wraparound autovacuum...

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Andres Freund
Hi,

On 2016-01-12 12:17:01 +0100, Michal Novotny wrote:
> thanks a lot for your reply. Unfortunately I've found out most it didn't
> really start DROP TABLE yet and it's locked on autovacuum running for
> the table and even if I kill the process it's autostarting again and again.

Start the DROP TABLE and *then* cancel the autovacuum session. That
should work.
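
For example, roughly (the table name and pid here are placeholders; look the
actual pid up first):

-- session 1: requests AccessExclusiveLock and queues behind the autovacuum worker
DROP TABLE big_table;

-- session 2: find and cancel the autovacuum worker that holds the lock
SELECT pid, query FROM pg_stat_activity WHERE query LIKE 'autovacuum:%';
SELECT pg_cancel_backend(12345);  -- pid returned by the previous query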

> Is there any way how to do the DROP TABLE and bypass/disable autovacuum
> entirely? Please note the "autovacuum = off" is set in the config file
> (postgresql.conf).

That actually is likely to have caused the problem. Every
autovacuum_freeze_max_age transactions each table needs to be vacuumed -
otherwise the data can't be interpreted correctly anymore at some point.
That's called "anti-wraparound vacuum". It's started even if you have
disabled autovacuum, to prevent database corruption.

If you disable autovacuum, you really should start vacuums in some other
way.
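
For example, a periodic job along these lines (the table name is a
placeholder):

-- which tables are closest to the wraparound limit?
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;

-- freeze them before an anti-wraparound vacuum gets forced
VACUUM (FREEZE, VERBOSE) public.some_big_table;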

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Michal Novotny
Hi Andres,

On 01/12/2016 12:37 PM, Andres Freund wrote:
> Hi,
> 
> On 2016-01-12 12:17:01 +0100, Michal Novotny wrote:
>> thanks a lot for your reply. Unfortunately I've found out it most likely didn't
>> really start the DROP TABLE yet and it's blocked on autovacuum running on
>> the table, and even if I kill the process it's autostarting again and again.
> 
> Start the DROP TABLE and *then* cancel the autovacuum session. That
> should work.


By cancelling the autovacuum session, do you mean to run
pg_cancel_backend(pid int) *after* running DROP TABLE?


> 
>> Is there any way how to do the DROP TABLE and bypass/disable autovacuum
>> entirely? Please note the "autovacuum = off" is set in the config file
>> (postgresql.conf).


So should I set autovacuum to enable (on) and restart pgsql before doing
DROP TABLE (and pg_cancel_backend() as mentioned above)?


> 
> That actually is likely to have caused the problem. Every
> autovacuum_freeze_max_age transactions each table needs to be vacuumed -
> otherwise the data can't be interpreted correctly anymore at some point.
> That's called "anti-wraparound vacuum". It's started even if you have
> disabled autovacuum, to prevent database corruption.

Ok, any recommendation how to set autovacuum_freeze_max_age?

Thanks,
Michal


> 
> If you disable autovacuum, you really should start vacuums in some other
> way.
> 
> Greetings,
> 
> Andres Freund
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Andres Freund
Hi Michal,

This isn't really a question for -hackers, the list for postgres
development. -general or -performance would have been more appropriate.

On 2016-01-12 11:57:05 +0100, Michal Novotny wrote:
> I've discovered an issue with dropping a large table (~5T). I was
> thinking drop table is a fast operation; however, I found out my assumption
> was wrong.

What exactly did you do, and how long did it take? Is there any chance
you were actually waiting for the lock on that large table, instead of
waiting for the actual execution?
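
Something like this (with 'big_table' as a placeholder) shows whether the
DROP is merely queued behind another lock holder:

SELECT l.pid, l.mode, l.granted, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'big_table'::regclass;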

> Is there any way to tune it to drop a large table in a matter of
> seconds or minutes? Any configuration variable in the postgresql.conf or
> any tune-up options available?

The time for dropping a table primarily is spent on three things:
1) acquiring the exclusive lock. How long this takes entirely depends on
   the concurrent activity. If there's a long-running session using that
   table it'll take till that session is finished.
2) The cached portion of that table needs to be evicted from cache. How
   long that takes depends on the size of shared_buffers - but usually
   this is a relatively short amount of time, and only matters if you
   drop many, many relations.
3) The time the filesystem takes to actually remove the files, in your case
   5000 1GB files. This will take a while, but shouldn't be minutes.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Michal Novotny

On 01/12/2016 12:20 PM, Andres Freund wrote:
> On 2016-01-12 12:17:09 +0100, Pavel Stehule wrote:
>> 2016-01-12 12:14 GMT+01:00 Michal Novotny :
>>
>>> Hi Pavel,
>>> thanks for the information. I've been doing more investigation of this
>>> issue and there's autovacuum running on the table; however, it's
>>> automatically starting even if there is "autovacuum = off" in the
>>> postgresql.conf configuration file.
>>>
>>
>> Real autovacuum is automatically cancelled. It looks like VACUUM started by
>> cron, maybe?
> 
> Unless it's an anti-wraparound autovacuum...
> 
> Andres
> 

Autovacuum is not started by CRON. How should I understand the
"anti-wraparound autovacuum" ?

Thanks,
Michal


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question about DROP TABLE

2016-01-12 Thread Michal Novotny
Dear PostgreSQL Hackers,
I've discovered an issue with dropping a large table (~5T). I was
thinking drop table is a fast operation; however, I found out my assumption
was wrong.

Is there any way to tune it to drop a large table in a matter of
seconds or minutes? Any configuration variable in the postgresql.conf or
any tune-up options available?

PostgreSQL version used is PgSQL 9.4.

Thanks a lot!
Michal


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-17 Thread Kevin Grittner
On Tuesday, November 17, 2015 12:43 AM, konstantin knizhnik 
 wrote:
> On Nov 16, 2015, at 11:21 PM, Kevin Grittner wrote:

>> If you are saying that DTM tries to roll back a transaction after
>> any participating server has entered the RecordTransactionCommit()
>> critical section, then IMO it is broken.  Full stop.  That can't
>> work with any reasonable semantics as far as I can see.
>
> DTM is not trying to rollback committed transaction.
> What he tries to do is to hide this commit.
> As I already wrote, the idea was to implement "lightweight" 2PC
> because prepared transactions mechanism in PostgreSQL adds too much
> overhead and causes some problems with recovery.

The point remains that there must be *some* "point of no return"
beyond which rollback (or "hiding") is not possible.  Until this
point, all heavyweight locks held by the transaction must be
maintained without interruption, data modification of the
transaction must not be visible, and any attempt to update or
delete data updated or deleted by the transaction must block or
throw an error.  It sounds like you are attempting to move the
point at which this "point of no return" is, but it isn't as clear
as I would like.  It seems like all participating nodes are
responsible for notifying the arbiter that they have completed, and
until then the arbiter gets involved in every visibility check,
overriding the "normal" value?

> The transaction is normally committed in xlog, so that it can
> always be recovered in case of node fault.
> But before setting correspondent bit(s) in CLOG and releasing
> locks we first contact arbiter to get global status of transaction.
> If it is successfully locally committed by all nodes, then
> arbiter approves commit and commit of transaction normally
> completed.
> Otherwise arbiter rejects commit. In this case DTM marks
> transaction as aborted in CLOG and returns error to the client.
> XLOG is not changed and in case of failure PostgreSQL will try to
> replay this transaction.
> But during recovery it also tries to restore transaction status
> in CLOG.
> And at this place DTM contacts arbiter to know status of
> transaction.
> If it is marked as aborted in arbiter's CLOG, then it will be
> also marked as aborted in local CLOG.
> And according to PostgreSQL visibility rules no other transaction
> will see changes made by this transaction.

If a node goes through crash and recovery after it has written its
commit information to xlog, how are its heavyweight locks, etc.,
maintained throughout?  For example, does each arbiter node have
the complete set of heavyweight locks?  (Basically, all the
information which can be written to files in pg_twophase must be
held somewhere by all arbiter nodes, and used where appropriate.)
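
For reference, the state that stock PostgreSQL keeps for prepared
transactions is what shows up in pg_prepared_xacts, e.g.:

SELECT transaction, gid, prepared, owner, database FROM pg_prepared_xacts;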

If a participating node is lost after some other nodes have told
the arbiter that they have committed, and the lost node will never
be able to indicate that it is committed or rolled back, what is
the mechanism for resolving that?

>>> We can not just call elog(ERROR,...) in SetTransactionStatus
>>> implementation because inside critical section it cause Postgres
>>> crash with panic message. So we have to remember that transaction is
>>> rejected and report error later after exit from critical section:
>>
>> I don't believe that is a good plan.  You should not enter the
>> critical section for recording that a commit is complete until all
>> the work for the commit is done except for telling the all the
>> servers that all servers are ready.
>
> It is good point.
> May be it is the reason of performance scalability problems we
> have noticed with DTM.

Well, certainly the first phase of two-phase commit can take place
in parallel, and once that is complete then the second phase
(commit or rollback of all the participating prepared transactions)
can take place in parallel.  There is no need to serialize that.
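
With the stock mechanism that is just the usual command sequence (a sketch;
the GID is arbitrary and max_prepared_transactions must be > 0):

BEGIN;
-- ... this node's part of the distributed transaction ...
PREPARE TRANSACTION 'tx42_node1';    -- phase 1, run on every node in parallel

-- phase 2, once every node has prepared successfully (also parallelizable):
COMMIT PREPARED 'tx42_node1';
-- or, if any node failed to prepare:
-- ROLLBACK PREPARED 'tx42_node1';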

> Sorry, some clarification.
> We get 10x slowdown of performance caused by 2pc on very heavy
> load on the IBM system with 256 cores.
> At "normal" servers slowdown of 2pc is smaller - about 2x.

That suggests some contention point, probably on spinlocks.  Were
you able to identify the particular hot spot(s)?


On Tuesday, November 17, 2015 3:09 AM, konstantin knizhnik 
 wrote:
> On Nov 17, 2015, at 10:44 AM, Amit Kapila wrote:

>> I think the general idea is that if Commit is WAL logged, then the
>> operation is considered to committed on local node and commit should
>> happen on any node, only once prepare from all nodes is successful.
>> And after that transaction is not supposed to abort.  But I think you are
>> trying to optimize the DTM in some way to not follow that kind of protocol.
>
> DTM is still following 2PC protocol:
> First transaction is saved in WAL at all nodes and only after it
> commit is completed at all nodes.

So, essentially you are treating the traditional commit point as
phase 1 in a new approach to two-phase 

Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-17 Thread Alvaro Herrera
konstantin knizhnik wrote:

> The transaction is normally committed in xlog, so that it can always be 
> recovered in case of node fault.
> But before setting correspondent bit(s) in CLOG and releasing locks we first 
> contact arbiter to get global status of transaction.
> If it is successfully locally committed by all nodes, then arbiter approves 
> commit and commit of transaction normally completed.
> Otherwise arbiter rejects commit. In this case DTM marks transaction as 
> aborted in CLOG and returns error to the client.
> XLOG is not changed and in case of failure PostgreSQL will try to replay this 
> transaction.
> But during recovery it also tries to restore transaction status in CLOG.
> And at this place DTM contacts arbiter to know status of transaction.
> If it is marked as aborted in arbiter's CLOG, then it will be also marked as
> aborted in local CLOG.
> And according to PostgreSQL visibility rules no other transaction will see 
> changes made by this transaction.

One problem I see with this approach is that the WAL replay can happen
long after it was written; for instance you might have saved a
basebackup and WAL stream and replay it all several days or weeks later,
when the arbiter no longer has information about the XID.  Later
transactions might (will) depend on the aborted state of the transaction
in question, so this effectively corrupts the database.

In other words, while it's reasonable to require that the arbiter can
always be contacted for transaction commit/abort at run time, it's
not reasonable to contact the arbiter during WAL replay.

I think this merits more explanation:

> The transaction is normally committed in xlog, so that it can always be 
> recovered in case of node fault.

Why would anyone want to "recover" a transaction that was aborted?

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-17 Thread konstantin knizhnik

On Nov 17, 2015, at 10:44 AM, Amit Kapila wrote:

> 
> I think the general idea is that if Commit is WAL logged, then the
> operation is considered to committed on local node and commit should
> happen on any node, only once prepare from all nodes is successful.
> And after that transaction is not supposed to abort.  But I think you are
> trying to optimize the DTM in some way to not follow that kind of protocol.

DTM is still following the 2PC protocol:
First the transaction is saved in WAL at all nodes, and only after that is the
commit completed at all nodes.
We try to avoid maintaining separate log files for 2PC (as is done now for
prepared transactions)
and do not want to change the logic of working with WAL.

The DTM approach is based on the assumption that PostgreSQL CLOG and visibility
rules allow "hiding" a transaction even if it is committed in WAL.


> By the way, how will the arbiter do the recovery in a scenario where it
> crashes, won't it need to contact all nodes for the status of in-progress or
> prepared transactions? 

The current answer is that the arbiter cannot crash. To provide fault tolerance we
spawn replicas of the arbiter which are managed using the Raft protocol.
If the master crashes or the network is partitioned, then a new master is chosen.
PostgreSQL backends have a list of possible arbiter addresses. Once the connection
with the arbiter is broken, the backend tries to reestablish the connection using
alternative addresses.
But only the master accepts incoming connections.


> I think it would be better if more detailed design of DTM with respect to
> transaction management and recovery could be updated on wiki for having
> discussion on this topic.  I have seen that you have already updated many
> details of the system, but still the complete picture of DTM is not clear.

I agree.
But please notice that pg_dtm is just one of the possible implementations of 
distributed transaction management.
We are also experimenting with other implementations, for example pg_tsdtm, based
on timestamps. It doesn't require a central arbiter and so shows much better
(almost linear) scalability.
But recovery in the case of pg_tsdtm is even more obscure.
Also, performance of pg_tsdtm greatly depends on system clock synchronization
and network delays. We get about 70k TPS on a cluster with 12 nodes connected
with a 10Gbit network.
But when we run the same test on hosts located in different geographic regions
(several thousand km apart), then performance falls to 15 TPS.
 


> 
> 
> 
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Alvaro Herrera
Konstantin Knizhnik wrote:

> But you may notice that original TransactionIdSetTreeStatus function is void
> - it is not intended to return anything.
> It is called in RecordTransactionCommit in critical section where it is not
> expected that commit may fail.
> But in case of DTM transaction may be rejected by arbiter. XTM API allows to
> control access to CLOG, so everybody will see that transaction is aborted.
> But we in any case have to somehow notify client about abort of transaction.

I think you'll need to rethink how a transaction commits rather
completely, rather than consider localized tweaks to specific functions.
For one thing, the WAL record about transaction commit has already been
written by XactLogCommitRecord much earlier than calling
TransactionIdCommitTree.  So if you were to crash at that point, it
doesn't matter how much the arbiter has rejected the transaction, WAL
replay would mark it as committed.  Also, what about the replication
origin stuff and the TransactionTreeSetCommitTsData() call?

I think you need to involve the arbiter earlier, so that the commit
process can be aborted earlier than those things.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Konstantin Knizhnik

On 11/16/2015 10:54 PM, Alvaro Herrera wrote:

Konstantin Knizhnik wrote:


But you may notice that original TransactionIdSetTreeStatus function is void
- it is not intended to return anything.
It is called in RecordTransactionCommit in critical section where it is not
expected that commit may fail.
But in case of DTM transaction may be rejected by arbiter. XTM API allows to
control access to CLOG, so everybody will see that transaction is aborted.
But we in any case have to somehow notify client about abort of transaction.

I think you'll need to rethink how a transaction commits rather
completely, rather than consider localized tweaks to specific functions.
For one thing, the WAL record about transaction commit has already been
written by XactLogCommitRecord much earlier than calling
TransactionIdCommitTree.  So if you were to crash at that point, it
doesn't matter how much the arbiter has rejected the transaction, WAL
replay would mark it as committed.


Yes, WAL replay will recover this transaction and try to mark it in CLOG as
completed, but ... we have taken control over CLOG using XTM.
And instead of writing directly to CLOG, DTM will contact the arbiter and ask its
opinion concerning this transaction.
If the arbiter doesn't think that it was committed, then it will not be marked as
committed in the local CLOG.



  Also, what about the replication
origin stuff and the TransactionTreeSetCommitTsData() call?

I think you need to involve the arbiter earlier, so that the commit
process can be aborted earlier than those things.





Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Kevin Grittner
On Monday, November 16, 2015 2:47 AM, Konstantin Knizhnik 
 wrote:

> Some time ago at PgConf.Vienna we have proposed the eXtensible
>Transaction Manager API (XTM).
> The idea is to be able to provide custom implementations of
>transaction managers as standard Postgres extensions,
> the primary goal is implementation of a distributed transaction manager.
> It should not only support 2PC, but also provide consistent
>snapshots for global transactions executed at different nodes.
>
> Actually, the current version of the XTM API does not propose any
>particular 2PC model. It can be implemented either at the coordinator side
> (as it is done in our pg_tsdtm implementation based on timestamps
> and not requiring a centralized arbiter), or by an arbiter
> (pg_dtm).

I'm not entirely clear on what you're saying here.  I admit I've
not kept in close touch with the distributed processing discussions
lately -- is there a write-up and/or diagram to give an overview of
where we're at with this effort?

> In the last case 2PC logic is hidden under XTM
> SetTransactionStatus method:
>
>  bool (*SetTransactionStatus)(TransactionId xid, int nsubxids,
>TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
>
> which encapsulates TransactionIdSetTreeStatus in clog.c.
> But you may notice that original TransactionIdSetTreeStatus function
>is void - it is not intended to return anything.
> It is called in RecordTransactionCommit in critical section where it
>is not expected that commit may fail.

This issue, though, seems clear enough.  At some point a
transaction must cross a hard line between when it is not committed
and when it is, since after commit subsequent transactions can then
see the data and modify it.  There has to be some "point of no
return" in order to have any sane semantics.  Entering that critical
section is it.

> But in case of DTM transaction may be rejected by arbiter. XTM API
>allows to control access to CLOG, so everybody will see that
>transaction is aborted. But we in any case have to somehow notify
>client about abort of transaction.

If you are saying that DTM tries to roll back a transaction after
any participating server has entered the RecordTransactionCommit()
critical section, then IMO it is broken.  Full stop.  That can't
work with any reasonable semantics as far as I can see.

> We can not just call elog(ERROR,...) in SetTransactionStatus
>implementation because inside a critical section it causes a Postgres
>crash with panic message. So we have to remember that transaction is
>rejected and report error later after exit from critical section:

I don't believe that is a good plan.  You should not enter the
critical section for recording that a commit is complete until all
the work for the commit is done except for telling the all the
servers that all servers are ready.

> There is one more problem - at this moment the state of transaction
>is TRANS_COMMIT.
> If ERROR handler will try to abort it, then we get yet another fatal
>error: attempt to rollback committed transaction.
> So we need to hide the fact that transaction is actually committed
>in local XLOG.

That is one of pretty much an infinite variety of problems you have
if you don't have a "hard line" for when the transaction is finally
committed.

> This approach works but looks a little bit like hacker approach. It
>requires not only to replace direct call of
>TransactionIdSetTreeStatus with indirect (through the XTM API), but also
>requires making some non-obvious changes in
>RecordTransactionCommit.
>
> So what are the alternatives?
>
> 1. Move RecordTransactionCommit to XTM. In this case we have to copy
>original RecordTransactionCommit to DTM implementation and patch it
>here. It is also not nice, because it will complicate maintenance of
>DTM implementation.
> The primary idea of XTM is to allow development of DTM as standard
>PostgreSQL extension without creating of specific clones of main
>PostgreSQL source tree. But this idea will be compromised if we have
>copy some pieces of PostgreSQL code.
> In some sense it is even worse than maintaining a separate branch -
>in the latter case at least we have some way to perform an automatic merge.

You can have a call in XTM that says you want to record the
commit on all participating servers, but I don't see where that
would involve moving anything we have now out of each participating
server -- it would just need to function like a real,
professional-quality distributed transaction manager doing the
second phase of a two-phase commit.  If any participating server
goes through the first phase and reports that all the heavy lifting
is done, and then is swallowed up in a pyroclastic flow of an
erupting volcano before phase 2 comes around, the DTM must
periodically retry until the administrator cancels the attempt.

> 2. Propose some alternative two-phase commit implementation in
>

[HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Konstantin Knizhnik

Hello,

Some time ago at PgConf.Vienna we proposed the eXtensible Transaction
Manager API (XTM).
The idea is to be able to provide custom implementations of transaction
managers as standard Postgres extensions;

the primary goal is implementation of a distributed transaction manager.
It should not only support 2PC, but also provide consistent snapshots
for global transactions executed at different nodes.


Actually, the current version of the XTM API does not propose any particular
2PC model.
It can be implemented either at the coordinator side
(as it is done in our pg_tsdtm
implementation, based on timestamps and not requiring a centralized
arbiter), or by an arbiter
(pg_dtm). In the latter case the 2PC
logic is hidden under the XTM SetTransactionStatus method:


 bool (*SetTransactionStatus)(TransactionId xid, int nsubxids, 
TransactionId *subxids, XidStatus status, XLogRecPtr lsn);


which encapsulates TransactionIdSetTreeStatus in clog.c.
But you may notice that the original TransactionIdSetTreeStatus function is
void - it is not intended to return anything.
It is called in RecordTransactionCommit in a critical section where it is
not expected that the commit may fail.
But in the case of DTM the transaction may be rejected by the arbiter. The XTM API
allows control over access to CLOG, so everybody will see that the transaction
is aborted. But we in any case have to somehow notify the client about the abort
of the transaction.


We cannot just call elog(ERROR,...) in the SetTransactionStatus
implementation because inside a critical section it causes a Postgres crash
with a panic message. So we have to remember that the transaction is rejected
and report the error later, after exit from the critical section:



/*
 * Now we may update the CLOG, if we wrote a COMMIT record above
 */
if (markXidCommitted) {
committed = TransactionIdCommitTree(xid, nchildren, children);
}
...
/*
 * If we entered a commit critical section, leave it now, and let
 * checkpoints proceed.
 */
if (markXidCommitted)
{
MyPgXact->delayChkpt = false;
END_CRIT_SECTION();
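/*
 * If the arbiter rejected the commit (TransactionIdCommitTree() returned
 * false through the XTM SetTransactionStatus hook), flip the local
 * transaction into abort state and report the error only now, outside
 * the critical section.
 */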
if (!committed) {
CurrentTransactionState->state = TRANS_ABORT;
CurrentTransactionState->blockState = TBLOCK_ABORT_PENDING;
elog(ERROR, "Transaction commit rejected by XTM");
}
}

There is one more problem - at this moment the state of the transaction is
TRANS_COMMIT.
If the ERROR handler tries to abort it, then we get yet another fatal
error: attempt to rollback a committed transaction.
So we need to hide the fact that the transaction is actually committed in
the local XLOG.


This approach works but looks a little bit like a hack. It
requires not only replacing the direct call of TransactionIdSetTreeStatus
with an indirect one (through the XTM API), but also making some non-obvious
changes in RecordTransactionCommit.


So what are the alternatives?

1. Move RecordTransactionCommit to XTM. In this case we have to copy
the original RecordTransactionCommit to the DTM implementation and patch it
there. It is also not nice, because it will complicate maintenance of the DTM
implementation.
The primary idea of XTM is to allow development of DTM as a standard
PostgreSQL extension without creating specific clones of the main
PostgreSQL source tree. But this idea will be compromised if we have to
copy some pieces of PostgreSQL code.
In some sense it is even worse than maintaining a separate branch - in
the latter case at least we have some way to perform an automatic merge.


2. Propose some alternative two-phase commit implementation in
PostgreSQL core. The main motivation for such a "lightweight"
implementation of 2PC in pg_dtm is that the original mechanism of prepared
transactions in PostgreSQL adds too much overhead.
In our benchmarks we have found that a simple credit-debit banking test
(without any DTM) works almost 10 times slower with PostgreSQL 2PC than
without it. This is why we try to propose an alternative solution (right
now pg_dtm is 2 times slower than vanilla PostgreSQL, but it not only
performs 2PC but also provides consistent snapshots).


Maybe somebody can suggest some other solution?
Or give some comments concerning the current approach?

Thanks in advance,
Konstantin,
Postgres Professional



Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Atri Sharma
> I think the general idea is that if Commit is WAL logged, then the
> operation is considered to be committed on the local node and commit should
> happen on any node, only once prepare from all nodes is successful.
> And after that transaction is not supposed to abort.  But I think you are
> trying to optimize the DTM in some way to not follow that kind of protocol.
> By the way, how will the arbiter do the recovery in a scenario where it
> crashes, won't it need to contact all nodes for the status of in-progress or
> prepared transactions?
> I think it would be better if more detailed design of DTM with respect to
> transaction management and recovery could be updated on wiki for having
> discussion on this topic.  I have seen that you have already updated many
> details of the system, but still the complete picture of DTM is not clear.

I agree.

I have not been following this discussion but from what I have read above I
think the recovery model in this design is broken. You have to follow some
protocol, whichever you choose.

I think you can try using something like Paxos, if you are looking for a
more reliable model but don't want the overhead of 3PC.


Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Amit Kapila
On Tue, Nov 17, 2015 at 12:12 PM, konstantin knizhnik <
k.knizh...@postgrespro.ru> wrote:

> Thank you for your response.
>
>
> On Nov 16, 2015, at 11:21 PM, Kevin Grittner wrote:
>
> I'm not entirely clear on what you're saying here.  I admit I've
> not kept in close touch with the distributed processing discussions
> lately -- is there a write-up and/or diagram to give an overview of
> where we're at with this effort?
>
>
> https://wiki.postgresql.org/wiki/DTM
>
>
> If you are saying that DTM tries to roll back a transaction after
> any participating server has entered the RecordTransactionCommit()
> critical section, then IMO it is broken.  Full stop.  That can't
> work with any reasonable semantics as far as I can see.
>
>
> DTM is not trying to rollback committed transaction.
> What he tries to do is to hide this commit.
> As I already wrote, the idea was to implement "lightweight" 2PC because
> prepared transactions mechanism in PostgreSQL adds too much overhead and
> causes some problems with recovery.
>
> The transaction is normally committed in xlog, so that it can always be
> recovered in case of node fault.
> But before setting correspondent bit(s) in CLOG and releasing locks we
> first contact arbiter to get global status of transaction.
> If it is successfully locally committed by all nodes, then arbiter
> approves commit and commit of transaction normally completed.
> Otherwise arbiter rejects commit. In this case DTM marks transaction as
> aborted in CLOG and returns error to the client.
> XLOG is not changed and in case of failure PostgreSQL will try to replay
> this transaction.
> But during recovery it also tries to restore transaction status in CLOG.
> And at this place DTM contacts arbiter to know status of transaction.
>

I think the general idea is that if Commit is WAL logged, then the
operation is considered to be committed on the local node and commit should
happen on any node, only once prepare from all nodes is successful.
And after that transaction is not supposed to abort.  But I think you are
trying to optimize the DTM in some way to not follow that kind of protocol.
By the way, how will the arbiter do the recovery in a scenario where it
crashes, won't it need to contact all nodes for the status of in-progress or
prepared transactions?
I think it would be better if more detailed design of DTM with respect to
transaction management and recovery could be updated on wiki for having
discussion on this topic.  I have seen that you have already updated many
details of the system, but still the complete picture of DTM is not clear.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread konstantin knizhnik
Thank you for your response.


On Nov 16, 2015, at 11:21 PM, Kevin Grittner wrote:
> I'm not entirely clear on what you're saying here.  I admit I've
> not kept in close touch with the distributed processing discussions
> lately -- is there a write-up and/or diagram to give an overview of
> where we're at with this effort?

https://wiki.postgresql.org/wiki/DTM

> 
> If you are saying that DTM tries to roll back a transaction after
> any participating server has entered the RecordTransactionCommit()
> critical section, then IMO it is broken.  Full stop.  That can't
> work with any reasonable semantics as far as I can see.

DTM is not trying to roll back a committed transaction.
What it tries to do is to hide this commit.
As I already wrote, the idea was to implement "lightweight" 2PC because
the prepared transactions mechanism in PostgreSQL adds too much overhead and
causes some problems with recovery.

The transaction is normally committed in xlog, so that it can always be
recovered in case of node fault.
But before setting the corresponding bit(s) in CLOG and releasing locks we first
contact the arbiter to get the global status of the transaction.
If it is successfully locally committed by all nodes, then the arbiter approves
the commit and the commit of the transaction completes normally.
Otherwise the arbiter rejects the commit. In this case DTM marks the transaction as
aborted in CLOG and returns an error to the client.
XLOG is not changed and in case of failure PostgreSQL will try to replay this
transaction.
But during recovery it also tries to restore the transaction status in CLOG.
And at this place DTM contacts the arbiter to learn the status of the transaction.
If it is marked as aborted in the arbiter's CLOG, then it will be also marked as
aborted in the local CLOG.
And according to PostgreSQL visibility rules no other transaction will see
changes made by this transaction.



> 
>> We can not just call elog(ERROR,...) in SetTransactionStatus
>>   implementation because inside critical section it cause Postgres
>>   crash with panic message. So we have to remember that transaction is
>>   rejected and report error later after exit from critical section:
> 
> I don't believe that is a good plan.  You should not enter the
> critical section for recording that a commit is complete until all
> the work for the commit is done except for telling the all the
> servers that all servers are ready.

It is a good point.
Maybe it is the reason for the performance scalability problems we have noticed
with DTM.

>> In our benchmarks we have found that simple credit-debit banking
>>   test (without any DTM) works almost 10 times slower with PostgreSQL
>>   2PC than without it. This is why we try to propose alternative
>>   solution (right now pg_dtm is 2 times slower than vanilla
>>   PostgreSQL, but it not only performs 2PC but also provide consistent
>>   snapshots).
> 
> Are you talking about 10x the latency on a commit, or that the
> overall throughput under saturation load is one tenth of running
> without something to guarantee the transactional integrity of the
> whole set of nodes?  The former would not be too surprising, while
> the latter would be rather amazing.

Sorry, some clarification.
We get a 10x slowdown of performance caused by 2PC under very heavy load on the IBM
system with 256 cores.
On "normal" servers the slowdown from 2PC is smaller - about 2x.

> 
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Question about TEMP tables

2015-03-18 Thread David G. Johnston
On Tuesday, March 17, 2015, Воронин Дмитрий carriingfat...@yandex.ru
wrote:

   Make sure to show your full command(s) and the full, exact text of any
 errors.

 OK, I use PostgreSQL version 9.4.1.

 I create cluster 'main' and connect to it. After cluster init we have
 those shemas:

 postgres=# SELECT nspname FROM pg_namespace ;
   nspname
 
  pg_toast
  pg_temp_1
  pg_toast_temp_1
  pg_catalog
  public
  information_schema
 (6 rows)

 Owner of those schemas is postgres (OID 10).

 Now we try to create TEMP TABLE, for example:

 postgres=# CREATE TEMP TABLE temptable();
 CREATE TABLE

 Show namespaces:

 postgres=# SELECT nspname FROM pg_namespace ;
   nspname
 
  pg_toast
  pg_temp_1
  pg_toast_temp_1
  pg_catalog
  public
  information_schema
  pg_temp_2
  pg_toast_temp_2
 (8 rows)

 Now we create a new database testdb and connect to it:

 CREATE DATABASE testdb;
 \c testdb

 SHOW namespaces of testdb (we already connect to it):

 testdb=# SELECT nspname FROM pg_namespace ;
   nspname
 
  pg_toast
  pg_temp_1
  pg_toast_temp_1
  pg_catalog
  public
  information_schema
 (6 rows)

 OK, namespaces pg_temp_2 and pg_toast_temp_2 are not visible. But
 pg_temp_1 and pg_toast_temp_1 are visible. WHY?

 If we create some temp objects in testdb, Postgres will create namespaces
 pg_temp_3 and pg_toast_temp_3.

 Try to create temp table at pg_temp_1:


As I note below, you don't get to choose; you just say CREATE TEMP TABLE
schemaless_name


 CREATE TEMP TABLE pg_temp_1.temptable();
 ERROR: cannot create relations in temporary schemas of other sessions

 I get this error if I create some TEMP objects in the postgres database.

 --
 Best regards, Dmitry Voronin


Schemas are not global and so can vary between databases.

You do not specify the schema in which temp tables are created.  The system
auto-assigns them, and also creates them based on need.

Temporary objects only survive for the life of the session creating them.

Empty temp schemas are ugly but aside from ignoring/hiding them from your
viewer there isn't much worth doing.  The system will just recreate them if
you drop them manually.

It will create numbers potentially up to the number of simultaneous
connections you allow.  It may have affinity but that is an implementation
detail you shouldn't care about.
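
The session's own temp schema, once it exists, can be looked up with
pg_my_temp_schema(), e.g.:

SELECT nspname
FROM pg_namespace
WHERE oid = pg_my_temp_schema();  -- no row until the session creates a temp object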

David J.


[HACKERS] Question about TEMP tables

2015-03-17 Thread Воронин Дмитрий
Hello, all.

We can create temp namespaces and temp objects contained in them. So, for
example, a temp table will be created in pg_temp_N (N - backendID). But after
cluster init we have pg_temp_1 and pg_toast_temp_1 namespaces with OIDs 11333
and 11334. Those namespaces are visible from any cluster database, but we
cannot create any temp objects in them (please, correct me).

So, how can we use those namespaces and what are they needed for?
 
Thank you.
 
-- 
Best regards, Dmitry Voronin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about TEMP tables

2015-03-17 Thread Воронин Дмитрий
  Make sure to show your full command(s) and the full, exact text of any 
 errors.

OK, I use PostgreSQL version 9.4.1.

I create cluster 'main' and connect to it. After cluster init we have those 
schemas:

postgres=# SELECT nspname FROM pg_namespace ;
  nspname   

 pg_toast
 pg_temp_1
 pg_toast_temp_1
 pg_catalog
 public
 information_schema
(6 rows)

Owner of those schemas is postgres (OID 10). 

Now we try to create TEMP TABLE, for example:

postgres=# CREATE TEMP TABLE temptable();
CREATE TABLE

Show namespaces:

postgres=# SELECT nspname FROM pg_namespace ;
  nspname   

 pg_toast
 pg_temp_1
 pg_toast_temp_1
 pg_catalog
 public
 information_schema
 pg_temp_2
 pg_toast_temp_2
(8 rows)

Now we create a new database testdb and connect to it:

CREATE DATABASE testdb;
\c testdb

SHOW namespaces of testdb (we already connect to it):

testdb=# SELECT nspname FROM pg_namespace ;
  nspname   

 pg_toast
 pg_temp_1
 pg_toast_temp_1
 pg_catalog
 public
 information_schema
(6 rows)

OK, namespaces pg_temp_2 and pg_toast_temp_2 are not visible. But pg_temp_1 and 
pg_toast_temp_1 are visible. WHY?

If we create some temp objects in testdb, Postgres will create namespaces
pg_temp_3 and pg_toast_temp_3.

Try to create temp table at pg_temp_1:

CREATE TEMP TABLE pg_temp_1.temptable();
ERROR: cannot create relations in temporary schemas of other sessions

I get this error if I create some TEMP objects in the postgres database.

-- 
Best regards, Dmitry Voronin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question about TEMP tables

2015-03-16 Thread Dmitry Voronin
Hello, all.

We can create temp namespaces and temp objects contained in them. So, for
example, a temp table will be created in pg_temp_N (N - backendID). But after
cluster init we have pg_temp_1 and pg_toast_temp_1 namespaces with OIDs 11333
and 11334. Those namespaces are visible from any cluster database, but we
cannot create any temp objects in them (please, correct me).

So, how can we use those namespaces and what are they needed for?

Thank you.

-- 
Best regards, Dmitry Voronin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about TEMP tables

2015-03-16 Thread Craig Ringer
On 16 March 2015 at 16:31, Dmitry Voronin carriingfat...@yandex.ru wrote:

 Hello, all.

 We can create temp namespaces and temp objects contained in them. So, for
 example, a temp table will be created in pg_temp_N (N - backendID). But after
 cluster init we have pg_temp_1 and pg_toast_temp_1 namespaces with OIDs
 11333 and 11334. Those namespaces are visible from any cluster database,
 but we cannot create any temp objects (please, correct me).


This is better suited to the pgsql-general or pgsql-admin mailing lists.

Make sure to show your full command(s) and the full, exact text of any
errors.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-10 Thread Noah Misch
I'm okay with any of the proposed designs or with dropping the idea.  Closing
the loop on a few facts:

On Sat, Mar 07, 2015 at 04:34:41PM -0600, Jim Nasby wrote:
 If we go that route, does it still make sense to explicitly use
 repalloc_huge? It will just cut over to that at some point (128M?) anyway,
 and if you're vacuuming a small relation presumably it's not worth messing
 with.

repalloc_huge() differs from repalloc() only in the size ceiling beyond which
they raise errors.  repalloc() raises errors for requests larger than ~1 GiB,
while repalloc_huge() is practically unconstrained on 64-bit and permits up to
~2 GiB on 32-bit.

On Mon, Mar 09, 2015 at 05:12:22PM -0500, Jim Nasby wrote:
 Speaking of which... people have referenced allowing > 1GB of dead tuples,
 which means allowing maintenance_work_mem > MAX_KILOBYTES. The comment for
 that says:
 
 /* upper limit for GUC variables measured in kilobytes of memory */
 /* note that various places assume the byte size fits in a long variable
 */
 
 So I'm not sure how well that will work. I think that needs to be a separate
 patch.

On LP64 platforms, MAX_KILOBYTES already covers maintenance_work_mem values up
to ~2 TiB.  Raising the limit on ILP32 platforms is not worth the trouble.
Raising the limit on LLP64 platforms is a valid but separate project.

nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-09 Thread Alvaro Herrera
Robert Haas wrote:
 On Sat, Mar 7, 2015 at 5:49 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
  I was thinking the simpler route of just repalloc'ing... the memcpy would
  suck, but much less so than the extra index pass. 64M gets us 11M tuples,
  which probably isn't very common.
 
  That has the chance of considerably increasing the peak memory usage
  though, as you obviously need both the old and new allocation during the
  repalloc().
 
  And in contrast to the unused memory at the tail of the array, which
  will usually not be actually allocated by the OS at all, this is memory
  that's actually read/written respectively.
 
 Yeah, I'm not sure why everybody wants to repalloc() that instead of
 making several separate allocations as needed.  That would avoid
 increasing peak memory usage, and would avoid any risk of needing to
 copy the whole array.  Also, you could grow in smaller chunks, like
 64MB at a time instead of 1GB or more at a time.  Doubling an
 allocation that's already 1GB or more gets big in a hurry.

Yeah, a chunk list rather than a single chunk seemed a good idea to me
too.

Also, I think the idea of starting with an allocation assuming some
small number of dead tuples per page made sense -- and by the time that
space has run out, you have a better estimate of actual density of dead
tuples, so you can do the second allocation based on that new estimate
(but perhaps clamp it at say 1 GB, just in case you just scanned a
portion of the table with an unusually high percentage of dead tuples.)

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-09 Thread Robert Haas
On Sat, Mar 7, 2015 at 5:49 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
 I was thinking the simpler route of just repalloc'ing... the memcpy would
 suck, but much less so than the extra index pass. 64M gets us 11M tuples,
 which probably isn't very common.

 That has the chance of considerably increasing the peak memory usage
 though, as you obviously need both the old and new allocation during the
 repalloc().

 And in contrast to the unused memory at the tail of the array, which
 will usually not be actually allocated by the OS at all, this is memory
 that's actually read/written respectively.

Yeah, I'm not sure why everybody wants to repalloc() that instead of
making several separate allocations as needed.  That would avoid
increasing peak memory usage, and would avoid any risk of needing to
copy the whole array.  Also, you could grow in smaller chunks, like
64MB at a time instead of 1GB or more at a time.  Doubling an
allocation that's already 1GB or more gets big in a hurry.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-09 Thread Jim Nasby

On 3/9/15 12:28 PM, Alvaro Herrera wrote:

Robert Haas wrote:

On Sat, Mar 7, 2015 at 5:49 PM, Andres Freund and...@2ndquadrant.com wrote:

On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:

I was thinking the simpler route of just repalloc'ing... the memcpy would
suck, but much less so than the extra index pass. 64M gets us 11M tuples,
which probably isn't very common.


That has the chance of considerably increasing the peak memory usage
though, as you obviously need both the old and new allocation during the
repalloc().

And in contrast to the unused memory at the tail of the array, which
will usually not be actually allocated by the OS at all, this is memory
that's actually read/written respectively.


Yeah, I'm not sure why everybody wants to repalloc() that instead of
making several separate allocations as needed.  That would avoid
increasing peak memory usage, and would avoid any risk of needing to
copy the whole array.  Also, you could grow in smaller chunks, like
64MB at a time instead of 1GB or more at a time.  Doubling an
allocation that's already 1GB or more gets big in a hurry.


Yeah, a chunk list rather than a single chunk seemed a good idea to me
too.


That will be significantly more code than a simple repalloc, but as long 
as people are OK with that I can go that route.



Also, I think the idea of starting with an allocation assuming some
small number of dead tuples per page made sense -- and by the time that
space has run out, you have a better estimate of actual density of dead
tuples, so you can do the second allocation based on that new estimate
(but perhaps clamp it at say 1 GB, just in case you just scanned a
portion of the table with an unusually high percentage of dead tuples.)


I like the idea of using a fresh idea of dead tuple density when we need 
more space. We would also clamp this at maintenance_work_mem, not a 
fixed 1GB.


Speaking of which... people have referenced allowing > 1GB of dead
tuples, which means allowing maintenance_work_mem > MAX_KILOBYTES. The
comment for that says:


/* upper limit for GUC variables measured in kilobytes of memory */
/* note that various places assume the byte size fits in a long 
variable */


So I'm not sure how well that will work. I think that needs to be a 
separate patch.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-09 Thread Andres Freund
On 2015-03-09 17:12:22 -0500, Jim Nasby wrote:
 That will be significantly more code than a simple repalloc, but as long as
 people are OK with that I can go that route.

I still would like to see some actual evidence of need for change before
we invest more time/code here.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-07 Thread Jim Nasby

On 3/7/15 12:48 AM, Noah Misch wrote:

On Sat, Mar 07, 2015 at 12:46:42AM -0500, Tom Lane wrote:

Noah Misch n...@leadboat.com writes:

On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:

I was thinking the simpler route of just repalloc'ing... the memcpy would
suck, but much less so than the extra index pass. 64M gets us 11M tuples,
which probably isn't very common.



+1.  Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
growth at vac_work_mem.


+1 for repalloc'ing at need, but I'm not sure about the start far below
64 MiB part.  64MB is a pretty small amount on nearly any machine these
days (and for anybody who thinks it isn't, that's why maintenance_work_mem
is a tunable).


True; nothing would explode, especially since the allocation would be strictly
smaller than it is today.  However, I can't think of a place in PostgreSQL
where a growable allocation begins so aggressively, nor a reason to break new
ground in that respect.  For comparison, tuplestore/tuplesort start memtupsize
at 1 KiB.  (One could make a separate case for that practice being wrong.)


A different line of thought is that it would seem to make sense to have
the initial allocation vary depending on the relation size.  For instance,
you could assume there might be 10 dead tuples per page, and hence try to
alloc that much if it fits in vac_work_mem.


Sounds better than a fixed 64 MiB start, though I'm not sure it's better than
a fixed 256 KiB start.


In the case of vacuum, I think we presumably have a pretty good 
indicator of how much space we should need; namely reltuples * 
autovacuum_scale_factor. There shouldn't be too much more space needed 
than that if autovac is keeping up with things.
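
Roughly (a sketch; 'some_table' is a placeholder, the relevant GUCs are
autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor, and
per-table overrides are ignored):

SELECT c.relname,
       current_setting('autovacuum_vacuum_threshold')::bigint
         + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples
         AS dead_tuples_at_next_autovacuum
FROM pg_class c
WHERE c.relname = 'some_table';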


If we go that route, does it still make sense to explicitly use 
repalloc_huge? It will just cut over to that at some point (128M?) 
anyway, and if you're vacuuming a small relation presumably it's not 
worth messing with.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-07 Thread Andres Freund
On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
 I was thinking the simpler route of just repalloc'ing... the memcpy would
 suck, but much less so than the extra index pass. 64M gets us 11M tuples,
 which probably isn't very common.

That has the chance of considerably increasing the peak memory usage
though, as you obviously need both the old and new allocation during the
repalloc().

And in contrast to the unused memory at the tail of the array, which
will usually not be actually allocated by the OS at all, this is memory
that's actually read/written respectively.

I've to say, I'm rather unconvinced that it's worth changing stuff
around here. If overcommit is enabled, vacuum won't fail unless the
memory is actually used (= no problem). If overcommit is disabled and
you get memory allocation failures, you're probably already running awfully
close to the maximum of your configuration and you're better off
adjusting it.  I'm not aware of any field complaints about this and thus
I'm not sure it's worth fiddling with this.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-07 Thread Jim Nasby

On 3/7/15 4:49 PM, Andres Freund wrote:

On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:

I was thinking the simpler route of just repalloc'ing... the memcpy would
suck, but much less so than the extra index pass. 64M gets us 11M tuples,
which probably isn't very common.


That has the chance of considerably increasing the peak memory usage
though, as you obviously need both the old and new allocation during the
repalloc().

And in contrast to the unused memory at the tail of the array, which
will usually not be actually allocated by the OS at all, this is memory
that's actually read/written respectively.


That leaves me wondering why we bother with dynamic resizing in other 
areas (like sorts, for example) then? Why not just palloc work_mem and 
be done with it? What makes those cases different?



I've to say, I'm rather unconvinced that it's worth changing stuff
around here. If overcommit is enabled, vacuum won't fail unless the
memory is actually used (= no problem). If overcommit is disabled and
you get memory allocations, you're probably already running awfully
close to the maximum of your configuration and you're better off
adjusting it.  I'm not aware of any field complaints about this and thus
I'm not sure it's worth fiddling with this.


Perhaps; Noah seems to be the one one who's seen this.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

