Re: [HACKERS] Sequence Access Method WIP

2016-04-05 Thread Petr Jelinek

On 04/04/16 15:53, Fabrízio de Royes Mello wrote:



On Thu, Mar 31, 2016 at 9:19 AM, Petr Jelinek > wrote:
 >
 > Hi,
 >
 > new version attached that should fix the issue. It was alignment -
honestly don't know what I was thinking using fixed alignment when the
AMs can define their own type.
 >

Yeah... now all works fine... I had a minor issue when apply to the
current master but the attached fix it and I also added tab-complete
support for CREATE SEQUENCE...USING and ALTER SEQUENCE...USING.



Cool thanks.


I ran all the regression tests and all passed.

I have just one question about de 0002 patch:
- There are some reason to not use TransactionId instead of uint32 in
GaplessSequenceState struct?



I missed we have that :)

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-03-30 Thread Petr Jelinek

Hi,

Thanks for review.

On 30/03/16 15:22, Jose Luis Tallon wrote:

[Partial review] Evaluated: 0002-gapless-seq-2016-03-29-2.patch
Needs updating code copyright years ... or is this really from 2013? [ 
contrib/gapless_seq/gapless_seq.c ]
Patch applies cleanly to current master 
(3063e7a84026ced2aadd2262f75eebbe6240f85b)



Ouch good point, it does show how long the whole sequence am thing has 
been around.



DESIGN
The decision to hardcode the schema GAPLESS_SEQ_NAMESPACE ("gapless_seq") and 
VALUES_TABLE_NAME ("seqam_gapless_values") strikes me a bit: while I understand the 
creation of a private schema named after the extension, it seems overkill for just a single table.
I would suggest to devote some reserved schema name for internal implementation 
details and/or AM implementation details, if deemed reasonable.
On the other hand, creating the schema/table upon extension installation makes 
the values table use the default tablespace for the database, which can be good 
for concurrency --- direct writes to less loaded storage
(Note that users may want to move this table into an SSD-backed tablespace 
or equivalently fast storage ... moreso when many --not just one-- gapless 
sequences are being used)



Schema is needed for the handler function as well. In general I don't 
want to add another internal schema that will be empty when no sequence 
AM is installed.



Yet, there is <20141203102425.gt2...@alap3.anarazel.de> where Andres argues 
against anything different than one-page-per-sequence implementations.
I guess this patch changes everything in this respect.


Not really, gapless just needs table for transactionality and as such is 
an exception. It does not change how the nontransactional sequence 
storage works though. I agree with Andres on this one.



CODE
   seqam_gapless_init(Relation seqrel, Form_pg_sequence seq, int64 
restart_value,
 bool restart_requested, bool is_init)
   -> is_init sounds confusing; suggest renaming to "initialize" or "initial" to avoid 
reading as "is_initIALIZED"


Sounds good.


DEPENDS ON 0001-seqam-v10.patch , which isn't commited yet --- and it doesn't 
apply cleanly to current git master.
Please update/rebase the patch and resubmit.



The current version of seqam is 0001-seqam-2016-03-29 which should apply 
correctly.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-03-30 Thread Jose Luis Tallon
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

[Partial review] Evaluated: 0002-gapless-seq-2016-03-29-2.patch
Needs updating code copyright years ... or is this really from 2013? [ 
contrib/gapless_seq/gapless_seq.c ]
Patch applies cleanly to current master 
(3063e7a84026ced2aadd2262f75eebbe6240f85b)
It does compile cleanly.

DESIGN
The decision to hardcode the schema GAPLESS_SEQ_NAMESPACE ("gapless_seq") and 
VALUES_TABLE_NAME ("seqam_gapless_values") strikes me a bit: while I understand 
the creation of a private schema named after the extension, it seems overkill 
for just a single table.
I would suggest to devote some reserved schema name for internal implementation 
details and/or AM implementation details, if deemed reasonable.
On the other hand, creating the schema/table upon extension installation makes 
the values table use the default tablespace for the database, which can be good 
for concurrency --- direct writes to less loaded storage
   (Note that users may want to move this table into an SSD-backed tablespace 
or equivalently fast storage ... moreso when many --not just one-- gapless 
sequences are being used)

Yet, there is <20141203102425.gt2...@alap3.anarazel.de> where Andres argues 
against anything different than one-page-per-sequence implementations.
   I guess this patch changes everything in this respect.

CODE
  seqam_gapless_init(Relation seqrel, Form_pg_sequence seq, int64 restart_value,
bool restart_requested, bool is_init)
  -> is_init sounds confusing; suggest renaming to "initialize" or "initial" to 
avoid reading as "is_initIALIZED"

DEPENDS ON 0001-seqam-v10.patch , which isn't commited yet --- and it doesn't 
apply cleanly to current git master.
Please update/rebase the patch and resubmit.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-03-29 Thread Fabrízio de Royes Mello
On Tue, Mar 29, 2016 at 5:58 PM, Petr Jelinek  wrote:
>
> On 29/03/16 22:08, Fabrízio de Royes Mello wrote:
>>
>>
>>
>> On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek > > wrote:
>>  >
>>  > On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>>  >>
>>  >>
>>  >>  >
>>  >>  > Hmm I am unable to reproduce this. What OS? Any special configure
>>  >> flags you use?
>>  >>  >
>>  >>
>>  >> In my environment the error remains with your last patches.
>>  >>
>>  >> I didn't use any special.
>>  >>
>>  >> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>>  >> --enable-coverage --enable-tap-tests --enable-depend
>>  >> make -s -j8 install
>>  >> make check-world
>>  >>
>>  >>
>>  >> My environment:
>>  >>
>>  >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>>  >> $ uname -a
>>  >> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37
UTC
>>  >> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>  >>
>>  >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>>  >> $ gcc --version
>>  >> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>>  >> Copyright (C) 2013 Free Software Foundation, Inc.
>>  >> This is free software; see the source for copying conditions.  There
>> is NO
>>  >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
>> PURPOSE.
>>  >>
>>  >
>>  > Hmm nothing special indeed, still can't reproduce, I did one blind
>> try for a fix. Can you test with attached?
>>  >
>>
>> Same error... Now I'll leave in a trip but when I arrive I'll try to
>> figure out what happening debugging the code.
>>
>
> Okay, thanks.
>

Hi,

After some debugging seems that the "seqstate->xid" in "wait_for_sequence"
isn't properly initialized or initialized with the wrong value (1258291200).

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Sequence Access Method WIP

2016-03-29 Thread Petr Jelinek

On 29/03/16 22:08, Fabrízio de Royes Mello wrote:



On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek > wrote:
 >
 > On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
 >>
 >>
 >>  >
 >>  > Hmm I am unable to reproduce this. What OS? Any special configure
 >> flags you use?
 >>  >
 >>
 >> In my environment the error remains with your last patches.
 >>
 >> I didn't use any special.
 >>
 >> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
 >> --enable-coverage --enable-tap-tests --enable-depend
 >> make -s -j8 install
 >> make check-world
 >>
 >>
 >> My environment:
 >>
 >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
 >> $ uname -a
 >> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
 >> 2016 x86_64 x86_64 x86_64 GNU/Linux
 >>
 >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
 >> $ gcc --version
 >> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
 >> Copyright (C) 2013 Free Software Foundation, Inc.
 >> This is free software; see the source for copying conditions.  There
is NO
 >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
 >>
 >
 > Hmm nothing special indeed, still can't reproduce, I did one blind
try for a fix. Can you test with attached?
 >

Same error... Now I'll leave in a trip but when I arrive I'll try to
figure out what happening debugging the code.



Okay, thanks.

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-03-29 Thread Fabrízio de Royes Mello
On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek  wrote:
>
> On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>>
>>
>>  >
>>  > Hmm I am unable to reproduce this. What OS? Any special configure
>> flags you use?
>>  >
>>
>> In my environment the error remains with your last patches.
>>
>> I didn't use any special.
>>
>> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>> --enable-coverage --enable-tap-tests --enable-depend
>> make -s -j8 install
>> make check-world
>>
>>
>> My environment:
>>
>> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> $ uname -a
>> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> $ gcc --version
>> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions.  There is
NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
>>
>
> Hmm nothing special indeed, still can't reproduce, I did one blind try
for a fix. Can you test with attached?
>

Same error... Now I'll leave in a trip but when I arrive I'll try to figure
out what happening debugging the code.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Sequence Access Method WIP

2016-03-29 Thread Petr Jelinek

On 29/03/16 19:46, Fabrízio de Royes Mello wrotez


 >
 > Hmm I am unable to reproduce this. What OS? Any special configure
flags you use?
 >

In my environment the error remains with your last patches.

I didn't use any special.

./configure --prefix=/home/fabrizio/pgsql --enable-cassert
--enable-coverage --enable-tap-tests --enable-depend
make -s -j8 install
make check-world


My environment:

fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
$ uname -a
Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux

fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
$ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



Hmm nothing special indeed, still can't reproduce, I did one blind try 
for a fix. Can you test with attached?


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From 7f42f2b420e2b931e1ca013f3fdeaccf302f3618 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Fri, 4 Mar 2016 15:03:44 +0100
Subject: [PATCH 2/2] gapless seq

---
 contrib/Makefile |   1 +
 contrib/gapless_seq/Makefile |  63 
 contrib/gapless_seq/expected/concurrency.out |  31 ++
 contrib/gapless_seq/expected/gapless_seq.out | 145 
 contrib/gapless_seq/gapless_seq--1.0.sql |  57 +++
 contrib/gapless_seq/gapless_seq.c| 530 +++
 contrib/gapless_seq/gapless_seq.control  |   6 +
 contrib/gapless_seq/specs/concurrency.spec   |  29 ++
 contrib/gapless_seq/sql/gapless_seq.sql  |  61 +++
 doc/src/sgml/contrib.sgml|   1 +
 doc/src/sgml/filelist.sgml   |   1 +
 doc/src/sgml/gapless-seq.sgml|  24 ++
 12 files changed, 949 insertions(+)
 create mode 100644 contrib/gapless_seq/Makefile
 create mode 100644 contrib/gapless_seq/expected/concurrency.out
 create mode 100644 contrib/gapless_seq/expected/gapless_seq.out
 create mode 100644 contrib/gapless_seq/gapless_seq--1.0.sql
 create mode 100644 contrib/gapless_seq/gapless_seq.c
 create mode 100644 contrib/gapless_seq/gapless_seq.control
 create mode 100644 contrib/gapless_seq/specs/concurrency.spec
 create mode 100644 contrib/gapless_seq/sql/gapless_seq.sql
 create mode 100644 doc/src/sgml/gapless-seq.sgml

diff --git a/contrib/Makefile b/contrib/Makefile
index d12dd63..d2a0620 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -19,6 +19,7 @@ SUBDIRS = \
 		earthdistance	\
 		file_fdw	\
 		fuzzystrmatch	\
+		gapless_seq	\
 		hstore		\
 		intagg		\
 		intarray	\
diff --git a/contrib/gapless_seq/Makefile b/contrib/gapless_seq/Makefile
new file mode 100644
index 000..9378b93
--- /dev/null
+++ b/contrib/gapless_seq/Makefile
@@ -0,0 +1,63 @@
+# contrib/gapless_seq/Makefile
+
+MODULE_big = gapless_seq
+OBJS = gapless_seq.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+
+EXTENSION = gapless_seq
+DATA = gapless_seq--1.0.sql
+
+EXTRA_CLEAN = $(pg_regress_clean_files) ./regression_output ./isolation_output
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/gapless_seq
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check: regresscheck isolationcheck
+
+submake-regress:
+	$(MAKE) -C $(top_builddir)/src/test/regress all
+
+submake-isolation:
+	$(MAKE) -C $(top_builddir)/src/test/isolation all
+
+submake-gapless_seq:
+	$(MAKE) -C $(top_builddir)/contrib/gapless_seq
+
+REGRESSCHECKS=gapless_seq
+
+regresscheck: all | submake-regress submake-gapless_seq temp-install
+	$(MKDIR_P) regression_output
+	$(pg_regress_check) \
+	--temp-instance=./tmp_check \
+	--outputdir=./regression_output \
+	$(REGRESSCHECKS)
+
+regresscheck-install-force: | submake-regress submake-gapless_seq temp-install
+	$(pg_regress_installcheck) \
+	$(REGRESSCHECKS)
+
+ISOLATIONCHECKS=concurrency
+
+isolationcheck: all | submake-isolation submake-gapless_seq temp-install
+	$(MKDIR_P) isolation_output
+	$(pg_isolation_regress_check) \
+	--outputdir=./isolation_output \
+	$(ISOLATIONCHECKS)
+
+isolationcheck-install-force: all | submake-isolation submake-gapless_seq temp-install
+	$(pg_isolation_regress_installcheck) \
+	$(ISOLATIONCHECKS)
+
+PHONY: submake-gapless_seq submake-regress check \
+	regresscheck regresscheck-install-force \
+	isolationcheck isolationcheck-install-force
+
+temp-install: EXTRA_INSTALL=contrib/gapless_seq
diff --git a/contrib/gapless_seq/expected/concurrency.out b/contrib/gapless_seq/expected/concurrency.out
new file mode 100644
index 000..ec6a098
--- /dev/null
+++ b/contrib/gapless_seq/expected/concurrency.out
@@ -0,0 +1,31 @@
+Parsed 

Re: [HACKERS] Sequence Access Method WIP

2016-03-29 Thread Fabrízio de Royes Mello
On Tue, Mar 29, 2016 at 2:26 PM, Petr Jelinek  wrote:
>
> On 29/03/16 18:50, Fabrízio de Royes Mello wrote:
>>
>>
>>
>> On Tue, Mar 29, 2016 at 12:25 PM, David Steele > > wrote:
>>  >
>>  > Hi Petr,
>>  >
>>  > On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:
>>  >
>>  >> fabrizio@bagual:~/pgsql
>>  >> $ bin/pg_dump > /tmp/fabrizio.sql
>>  >> pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name"
>>  >> does not exist
>>  >> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN
i...
>>  >> ^
>>  >> pg_dump: [archiver (db)] query was: SELECT sequence_name,
start_value,
>>  >> increment_by, CASE WHEN increment_by > 0 AND max_value =
>>  >> 9223372036854775807 THEN NULL  WHEN increment_by < 0 AND
max_value =
>>  >> -1 THEN NULL  ELSE max_value END AS max_value, CASE WHEN
>>  >> increment_by > 0 AND min_value = 1 THEN NULL  WHEN increment_by
< 0
>>  >> AND min_value = -9223372036854775807 THEN NULL  ELSE min_value
END
>>  >> AS min_value, cache_value, is_cycled FROM x
>>  >
>>  >
>>  > It looks like a new patch is needed.  I've marked this "waiting on
>> author".
>>  >
>>
>
> Yeah there were some incompatible commits since my last rebase, fixed,
along with the pg_dump bugs..
>

Now all applies without errors, build and "make check" too.



>> But there are other issue in the gapless-seq extension when I ran "make
>> check":
>>
>>   43   CREATE EXTENSION gapless_seq;
>>   44   CREATE SEQUENCE test_gapless USING gapless;
>>   45   SELECT nextval('test_gapless'::regclass);
>>   46 ! ERROR:  could not access status of transaction 1275068416
>>   47 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or
>> directory.
>>   48   BEGIN;
>>   49 SELECT nextval('test_gapless'::regclass);
>>   50 ! ERROR:  could not access status of transaction 1275068416
>>   51 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or
>> directory.
>>   52 SELECT nextval('test_gapless'::regclass);
>>   53 ! ERROR:  current transaction is aborted, commands ignored until
>> end of transaction block
>>   54 SELECT nextval('test_gapless'::regclass);
>>   55 ! ERROR:  current transaction is aborted, commands ignored until
>> end of transaction block
>>   56   ROLLBACK;
>>   57   SELECT nextval('test_gapless'::regclass);
>>   58 ! ERROR:  could not access status of transaction 1275068416
>>   59 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or
>> directory.
>>
>>
>> And I see the same running manually:
>>
>> fabrizio=# create extension gapless_seq;
>> CREATE EXTENSION
>> fabrizio=# create sequence x using gapless;
>> CREATE SEQUENCE
>> fabrizio=# select nextval('x');
>> ERROR:  could not access status of transaction 1258291200
>> DETAIL:  Could not open file "pg_subtrans/4B00": No such file or
directory.
>>
>> Regards,
>>
>
> Hmm I am unable to reproduce this. What OS? Any special configure flags
you use?
>

In my environment the error remains with your last patches.

I didn't use any special.

./configure --prefix=/home/fabrizio/pgsql --enable-cassert
--enable-coverage --enable-tap-tests --enable-depend
make -s -j8 install
make check-world


My environment:

fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
$ uname -a
Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux

fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
$ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Sequence Access Method WIP

2016-03-29 Thread David Steele

Hi Petr,

On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:


fabrizio@bagual:~/pgsql
$ bin/pg_dump > /tmp/fabrizio.sql
pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name"
does not exist
LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
^
pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
increment_by, CASE WHEN increment_by > 0 AND max_value =
9223372036854775807 THEN NULL  WHEN increment_by < 0 AND max_value =
-1 THEN NULL  ELSE max_value END AS max_value, CASE WHEN
increment_by > 0 AND min_value = 1 THEN NULL  WHEN increment_by < 0
AND min_value = -9223372036854775807 THEN NULL  ELSE min_value END
AS min_value, cache_value, is_cycled FROM x


It looks like a new patch is needed.  I've marked this "waiting on author".

Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-03-28 Thread Fabrízio de Royes Mello
On Thu, Mar 24, 2016 at 6:12 PM, Petr Jelinek  wrote:
>
> Hi,
>
> I rebased this on top of the recently committed CREATE ACCESS METHOD.
>

Hi,

I got the above error trying to apply to the current master:

$ git apply /home/fabrizio/Downloads/0001-seqam-2016-03-24.patch
error: patch failed: src/backend/commands/amcmds.c:29
error: src/backend/commands/amcmds.c: patch does not apply


There are a wrong definition at the beginning of the amcmds.c:

 34 <<< ours
 35 static Oid  lookup_index_am_handler_func(List *handler_name, char
amtype);
 36 static const char *get_am_type_string(char amtype);
 37 ===
 38 static Oid  lookup_am_handler_func(List *handler_name, char amtype);
 39 static char *get_am_type_string(char amtype);
 40 >>> theirs


After this small fix I can build and ran regress tests without errors.

But running "check-world" I got the error:

make[1]: Leaving directory `/data/postgresql/src/test/regress'
make: Leaving directory `/data/postgresql'
+ pg_dumpall -f /data/postgresql/src/bin/pg_upgrade/tmp_check/dump1.sql
ok 9 - dropuser foobar1 exit code 0
ok 10 - SQL DROP ROLE run: SQL found in server log
ok 11 - fails with nonexistent user
ok
t/080_pg_isready.pl ...
1..10
ok 1 - pg_isready --help exit code 0
ok 2 - pg_isready --help goes to stdout
ok 3 - pg_isready --help nothing to stderr
ok 4 - pg_isready --version exit code 0
ok 5 - pg_isready --version goes to stdout
ok 6 - pg_isready --version nothing to stderr
ok 7 - pg_isready with invalid option nonzero exit code
ok 8 - pg_isready with invalid option prints error message
ok 9 - fails with no server running
pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name" does
not exist
LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
   ^
pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
increment_by, CASE WHEN increment_by > 0 AND max_value =
9223372036854775807 THEN NULL  WHEN increment_by < 0 AND max_value = -1
THEN NULL  ELSE max_value END AS max_value, CASE WHEN increment_by > 0
AND min_value = 1 THEN NULL  WHEN increment_by < 0 AND min_value =
-9223372036854775807 THEN NULL  ELSE min_value END AS min_value,
cache_value, is_cycled FROM check_seq
pg_dumpall: pg_dump failed on database "regression", exiting
+ pg_dumpall1_status=1
+ [ /data/postgresql != /data/postgresql ]
+
/data/postgresql/src/bin/pg_upgrade/tmp_check/install//home/fabrizio/pgsql/bin/pg_ctl
-m fast stop
waiting for server to shut down done
server stopped
+ [ -n  ]
+ [ -n  ]
+ [ -n 1 ]
+ echo pg_dumpall of pre-upgrade database cluster failed
pg_dumpall of pre-upgrade database cluster failed
+ exit 1
+ rm -rf /tmp/pg_upgrade_check-3NUa0X
make[2]: *** [check] Error 1
make[2]: Leaving directory `/data/postgresql/src/bin/pg_upgrade'
make[1]: *** [check-pg_upgrade-recurse] Error 2
make[1]: *** Waiting for unfinished jobs



And testing pg_dump itself I got the same error trying to dump a database
that contains a sequence.

fabrizio=# create sequence x;
CREATE SEQUENCE
fabrizio=# \ds
  List of relations
 Schema | Name |   Type   |  Owner
+--+--+--
 public | x| sequence | fabrizio
(1 row)

fabrizio=# \d x
  Sequence "public.x"
Column|   Type|Value
--+---+-
 start_value  | bigint| 1
 increment_by | bigint| 1
 max_value| bigint| 9223372036854775807
 min_value| bigint| 1
 cache_value  | bigint| 1
 is_cycled| boolean   | f
 amstate  | seqam_local_state | (1,f,0)
Access Method: local

fabrizio=# \q

fabrizio@bagual:~/pgsql
$ bin/pg_dump > /tmp/fabrizio.sql
pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name" does
not exist
LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
   ^
pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
increment_by, CASE WHEN increment_by > 0 AND max_value =
9223372036854775807 THEN NULL  WHEN increment_by < 0 AND max_value = -1
THEN NULL  ELSE max_value END AS max_value, CASE WHEN increment_by > 0
AND min_value = 1 THEN NULL  WHEN increment_by < 0 AND min_value =
-9223372036854775807 THEN NULL  ELSE min_value END AS min_value,
cache_value, is_cycled FROM x


Toninght I'll review some parts of the code.

Regards,


Att,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Sequence Access Method WIP

2016-03-20 Thread Petr Jelinek

On 19/03/16 23:02, Alvaro Herrera wrote:

Petr Jelinek wrote:


And finally the 0003-gapless-seq is example contrib module that implements
dependably and transitionally safe gapless sequence access method. It's
obviously slow as it has to do locking and basically serialize all the
changes to sequence so only one transaction may use it at a time but it's
truly gapless. It also serves as an example of use of the api and as a test.


I'm trying to figure out this one, and I think I found something very
surprising.  This code contains this

+#define GAPLESS_SEQ_NAMESPACE "gapless_seq"
+#define VALUES_TABLE_NAME "seqam_gapless_values"

which as far as I understand is something like a side table where values
for all the sequences are stored.  Is that right?  If it is, I admit I
didn't realize that these sequences worked in this way.  This seems
broken to me.  I had been under the impression that the values were
always stored in the sequence's relation.  Maybe I'm misunderstanding;
please explain how this works.



Yes, I did explain in the original submission that added gapless 
sequence, which is about million messages upthread so it's 
understandable that it's hard to follow.


The sequence am API is non-transactional (and does not use MVCC) as 
sequences are non-trasactional and this provides significant performance 
boost. But the gapless sequence needs to handle things like 
subtransactions and rollbacks so it needs MVCC, for this reasons it also 
uses external table to handle that, the locking and stuff is still 
handled using normal sequence api but the MVCC part is handled by side 
table yes.


Just as a note to you and anybody who wasn't following in the beginning 
of this quite long thread - one of the goals of the sequence am api was 
to abstract the actual storage and WAL logging of the sequences away 
from the extension, so there is no generic WAL or anything like the 
index am provides here. At this point though, it might be worth 
reevaluating that approach if the generic WAL gets in.




FWIW I find it annoying that this submission's 0001 patch and
Alexander's 0001 have so much in common, yet they aren't similar enough
to consider that either is the definite form.  Also, you have the SGML
docs for CREATE ACCESS METHOD in 0002 instead of 0001, so comparing both
versions isn't trivial.


Well it's relatively annoying for the patch author as well. I tried to 
write it so that it's as easy to merge as possible - the common code is 
in my 0001, except for the SQL (gram.y, docs) because the SQL does not 
have any meaning until either indexam or seqeunceam gets in (which also 
makes it impossible to submit as separate infrastructure patch, because 
there is no point of having the code in without the stuff that either 
indexam or sequenceam provides on top of it).


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-03-19 Thread David Steele
On 3/4/16 11:09 PM, Petr Jelinek wrote:

> But first here is updated patch for sequence access methods. I went with
> the previously discussed custom type as this gives us proper control
> over the width of the state and making sure that it's not gonna be
> toastable, etc and we need this as sequence is limited to single page.
> 
> Attached are 3 separate files. The first one (0001-create-am) is mainly
> separate for the reason that there is some overlap with Alexander's
> index access methods patch, so I extracted common code into separate
> patch as it will make it easier to merge in case we are lucky enough to
> get these patches in (but it's still based on Alexander's code). It
> provides infra for defining new access methods from SQL, although
> without the parser and without any actual access methods.
> 
> The 0002-seqam provides the SQL interface and support for sequence
> access methods on top of the infra patch, and also provides all the
> sequence access methods infrastructure and abstraction and documentation.
> 
> And finally the 0003-gapless-seq is example contrib module that
> implements dependably and transitionally safe gapless sequence access
> method. It's obviously slow as it has to do locking and basically
> serialize all the changes to sequence so only one transaction may use it
> at a time but it's truly gapless. It also serves as an example of use of
> the api and as a test.

As far as I can see Petr has addressed all the outstanding issues in
this patch and it's ready for a review.  The patch applies with some
easily-fixed conflicts:

$ git apply -3 ../other/0002-seqam-2015-03-05.patch
error: patch failed: src/include/catalog/pg_proc.h:5225
Falling back to three-way merge...
Applied patch to 'src/include/catalog/pg_proc.h' with conflicts.

Simon, you are signed up to review.  Do you have an idea of when you'll
be doing that?

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-03-19 Thread Alvaro Herrera
Petr Jelinek wrote:

> And finally the 0003-gapless-seq is example contrib module that implements
> dependably and transitionally safe gapless sequence access method. It's
> obviously slow as it has to do locking and basically serialize all the
> changes to sequence so only one transaction may use it at a time but it's
> truly gapless. It also serves as an example of use of the api and as a test.

I'm trying to figure out this one, and I think I found something very
surprising.  This code contains this

+#define GAPLESS_SEQ_NAMESPACE "gapless_seq"
+#define VALUES_TABLE_NAME "seqam_gapless_values"

which as far as I understand is something like a side table where values
for all the sequences are stored.  Is that right?  If it is, I admit I
didn't realize that these sequences worked in this way.  This seems
broken to me.  I had been under the impression that the values were
always stored in the sequence's relation.  Maybe I'm misunderstanding;
please explain how this works.


FWIW I find it annoying that this submission's 0001 patch and
Alexander's 0001 have so much in common, yet they aren't similar enough
to consider that either is the definite form.  Also, you have the SGML
docs for CREATE ACCESS METHOD in 0002 instead of 0001, so comparing both
versions isn't trivial.

Anyway I'm back at reviewing Alexander's 0001, which should be okay as a
basis for this series regardless of any edits I suggest there.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-02-16 Thread Alexander Korotkov
On Sat, Jan 30, 2016 at 3:37 PM, Petr Jelinek  wrote:

> On 29 January 2016 at 23:59, Robert Haas  wrote:
> > On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane  wrote:
> >> Alexander Korotkov  writes:
> >>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> >>> wrote:
>  I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or
> something
>  like that.
> >>
> >>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name
> HANDLER
> >>> handler;", but I don't insist.
> >>
> >> I think that Alvaro's idea is less likely to risk future grammar
> >> conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
> >> [ CONCURRENTLY ] to make me really wary of variables in the
> statement-name
> >> part of the syntax.
> >
> > Strong +1.  If we put the type of access method immediately after
> > CREATE, I'm almost positive we'll regret it for exactly that reason.
> >
>
> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
> conflict now, that's why my proposal was different, I didn't want to
> add more keywords. I think Alvaro's proposal is fine as well.
>
> The other point is that we are creating ACCESS METHOD object so that's
> what should be after CREATE.
>
> In any case this is slightly premature IMHO as DDL is somewhat unless
> until we have sequence access methods implementation we can agree on,
> or the generic WAL logging so that custom indexes can be crash safe.


I've changed syntax of CREATE ACCESS METHOD syntax in the thread about
index access method extendability.
Now it is "CREATE ACCESS METHOD name TYPE INDEX HANDLER handler;". New
column amtype of pg_am stores access method type.
http://www.postgresql.org/message-id/capphfdu9gln7kuicwegsp50caamwx8q-jwzbpehc92rvfhz...@mail.gmail.com
It could be easily extended to "CREATE ACCESS METHOD name TYPE SEQUENCE
HANDLER handler;".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Sequence Access Method WIP

2016-01-30 Thread Petr Jelinek
On 29 January 2016 at 23:59, Robert Haas  wrote:
> On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane  wrote:
>> Alexander Korotkov  writes:
>>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera 
>>> wrote:
 I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
 like that.
>>
>>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>>> handler;", but I don't insist.
>>
>> I think that Alvaro's idea is less likely to risk future grammar
>> conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
>> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
>> part of the syntax.
>
> Strong +1.  If we put the type of access method immediately after
> CREATE, I'm almost positive we'll regret it for exactly that reason.
>

Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
conflict now, that's why my proposal was different, I didn't want to
add more keywords. I think Alvaro's proposal is fine as well.

The other point is that we are creating ACCESS METHOD object so that's
what should be after CREATE.

In any case this is slightly premature IMHO as DDL is somewhat unless
until we have sequence access methods implementation we can agree on,
or the generic WAL logging so that custom indexes can be crash safe.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-30 Thread Petr Jelinek
On 30 January 2016 at 13:48, Robert Haas  wrote:
> On Sat, Jan 30, 2016 at 7:37 AM, Petr Jelinek  wrote:
>> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
>> conflict now, that's why my proposal was different, I didn't want to
>> add more keywords. I think Alvaro's proposal is fine as well.
>
> I missed your proposal, I guess, so please don't take as having any
> position on whether it's better or worse than Alvaro's.  I was only
> intending to vote for the proposition that the type of access method
> should follow the name of the access method.
>

No worries didn't mean it that way.

>> In any case this is slightly premature IMHO as DDL is somewhat unless
>> until we have sequence access methods implementation we can agree on,
>> or the generic WAL logging so that custom indexes can be crash safe.
>
> Generic WAL logging seems like a *great* idea to me.  But I think it's
> largely independent from the access method stuff.  If we have generic
> WAL logging, people can create WAL-logged stuff that is not a new
> access method.  If we have access methods, they can create new access
> methods that are not WAL-logged.  If we have both, then they can
> create WAL-logged access methods which of course is the payoff pitch,
> but I don't see it as necessary or desirable for the two systems to be
> tied together in any way.

Okay, I am only debating the usefulness of DDL for access methods in
the current situation where we only have custom index access methods
which can't create WAL records. In other words trying to nudge people
slightly back towards the actual patch(es).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-30 Thread Robert Haas
On Sat, Jan 30, 2016 at 7:37 AM, Petr Jelinek  wrote:
> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
> conflict now, that's why my proposal was different, I didn't want to
> add more keywords. I think Alvaro's proposal is fine as well.

I missed your proposal, I guess, so please don't take as having any
position on whether it's better or worse than Alvaro's.  I was only
intending to vote for the proposition that the type of access method
should follow the name of the access method.

> The other point is that we are creating ACCESS METHOD object so that's
> what should be after CREATE.

Agreed.

> In any case this is slightly premature IMHO as DDL is somewhat unless
> until we have sequence access methods implementation we can agree on,
> or the generic WAL logging so that custom indexes can be crash safe.

Generic WAL logging seems like a *great* idea to me.  But I think it's
largely independent from the access method stuff.  If we have generic
WAL logging, people can create WAL-logged stuff that is not a new
access method.  If we have access methods, they can create new access
methods that are not WAL-logged.  If we have both, then they can
create WAL-logged access methods which of course is the payoff pitch,
but I don't see it as necessary or desirable for the two systems to be
tied together in any way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Petr Jelinek
On 29 January 2016 at 14:48, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> I would guess that the DDL boilterplate should come from Alexander
>> Korotkov's patch, right?  I think a first easy step may be to combine
>> parts both patches so that we get the "amkind" column from this patch
>> and the DDL support from Alexander's patch (means that his proposed
>> command needs a new clause to specify the amkind);
>
> Uh, what?  Surely we would provide a bespoke command for each possible
> sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> that the provided function has the right signature, and then it would put
> the correct amkind into the pg_am entry automatically.
>
> If we skimp on this infrastructure then we're just relying on the
> user not to make a typo, which doesn't seem like a good idea.
>

Yeah it has to be completely separate command for both that do
completely different sanity checks. It can't even be called CREATE
INDEX/SEQUENCE ACCESS METHOD unless we are willing to make ACCESS a
keyword due to preexisting CREATE INDEX/SEQUENCE  commands. The
previous version of the patch which used somewhat different underlying
catalog structure already had DDL and honestly writing the DDL part is
quite easy. I used CREATE ACCESS METHOD FOR SEQUENCE there.

> (Agreed though that a reasonable first step would be to add amkind to
> pg_am and make the appropriate adjustments to existing code, ie check
> that amkind is correct when fetching an index handler.  I considered
> putting that into the AM API refactor patch, but desisted.)
>

Well I was wondering how to handle this as well, the submitted patch
currently just adds Asserts, because IMHO the actual ERROR should be
thrown in the DDL not in the code that just uses it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Tom Lane
Alvaro Herrera  writes:
> I would guess that the DDL boilterplate should come from Alexander
> Korotkov's patch, right?  I think a first easy step may be to combine
> parts both patches so that we get the "amkind" column from this patch
> and the DDL support from Alexander's patch (means that his proposed
> command needs a new clause to specify the amkind);

Uh, what?  Surely we would provide a bespoke command for each possible
sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
that the provided function has the right signature, and then it would put
the correct amkind into the pg_am entry automatically.

If we skimp on this infrastructure then we're just relying on the
user not to make a typo, which doesn't seem like a good idea.

(Agreed though that a reasonable first step would be to add amkind to
pg_am and make the appropriate adjustments to existing code, ie check
that amkind is correct when fetching an index handler.  I considered
putting that into the AM API refactor patch, but desisted.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Alvaro Herrera
Petr Jelinek wrote:
> On 18 January 2016 at 09:19, Craig Ringer  wrote:
> > Needs rework after the commit of https://commitfest.postgresql.org/8/336/
> 
> Here is version that applies to current master. There is some work to
> do (mostly cleanup) and the DDL is missing, but that's because I want
> to know what people think about the principle of how it works now and
> if it makes sense to finish it this way (explained in the original
> submission for Jan CF).

I would guess that the DDL boilterplate should come from Alexander
Korotkov's patch, right?  I think a first easy step may be to combine
parts both patches so that we get the "amkind" column from this patch
and the DDL support from Alexander's patch (means that his proposed
command needs a new clause to specify the amkind); then the really tough
stuff in both Alexander's and this patch can be rebased on top of that.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Alvaro Herrera
Petr Jelinek wrote:
> On 29 January 2016 at 14:48, Tom Lane  wrote:
> > Alvaro Herrera  writes:

> > Uh, what?  Surely we would provide a bespoke command for each possible
> > sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> > that the provided function has the right signature, and then it would put
> > the correct amkind into the pg_am entry automatically.

I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
like that.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Tom Lane
Alvaro Herrera  writes:
>> On 29 January 2016 at 14:48, Tom Lane  wrote:
> Uh, what?  Surely we would provide a bespoke command for each possible
> sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> that the provided function has the right signature, and then it would put
> the correct amkind into the pg_am entry automatically.

> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
> like that.

That seems okay.  I had the impression you were proposing
CREATE ACCESS METHOD foobar TYPE 'i' USING functionname
or something like that, where it would be totally up to the user that
the amkind matched the function.  That seems unnecessarily error-prone,
not to mention that it would close the door forever on any hope that
we might allow non-superusers to execute such commands.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Tom Lane
Alexander Korotkov  writes:
> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera 
> wrote:
>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>> like that.

> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
> handler;", but I don't insist.

I think that Alvaro's idea is less likely to risk future grammar
conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
[ CONCURRENTLY ] to make me really wary of variables in the statement-name
part of the syntax.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Robert Haas
On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane  wrote:
> Alexander Korotkov  writes:
>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera 
>> wrote:
>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>>> like that.
>
>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>> handler;", but I don't insist.
>
> I think that Alvaro's idea is less likely to risk future grammar
> conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
> part of the syntax.

Strong +1.  If we put the type of access method immediately after
CREATE, I'm almost positive we'll regret it for exactly that reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Alexander Korotkov
On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera 
wrote:

> Petr Jelinek wrote:
> > On 29 January 2016 at 14:48, Tom Lane  wrote:
> > > Alvaro Herrera  writes:
>
> > > Uh, what?  Surely we would provide a bespoke command for each possible
> > > sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to
> check
> > > that the provided function has the right signature, and then it would
> put
> > > the correct amkind into the pg_am entry automatically.
>
> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
> like that.
>

Ok! Let's nail down the syntax and I can integrate it into my createam
patch.
I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
handler;", but I don't insist.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Sequence Access Method WIP

2016-01-18 Thread Craig Ringer
Needs rework after the commit of https://commitfest.postgresql.org/8/336/
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-18 Thread Craig Ringer
On 1 January 2016 at 07:51, Petr Jelinek  wrote:

>
> Other than that, this is based on the new am api by Alexander Korotkov
> [1]. It extends it by adding another column called amkind to the pg_am
> which can have either value "i" for index or "S" for sequence (same as
> relkind in pg_class for those).
>

This patch will no longer apply after 65c5fcd353 (
http://github.com/postgres/postgres/commit/65c5fcd353) as outlined in
http://www.postgresql.org/message-id/10804.1453077...@sss.pgh.pa.us .

Setting waiting-on-author in the CF app.

The good news is that the commit of the pg_am rework greatly eases the path
of this patch into core.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Sequence Access Method WIP

2015-09-16 Thread Thom Brown
On 28 July 2015 at 19:51, Petr Jelinek  wrote:

> On 2015-07-28 20:11, Heikki Linnakangas wrote:
>
>>
>> Petr, is this enough feedback on this patch for this commitfest, or are
>> there some other issues you want to discuss before I mark this as
>> returned?
>>
>>
> You can mark it as returned, I didn't have much time to actually do much
> useful work on this in the current CF.


Is this now dependant on the work Alexander Korotkov is doing on the AM
interface?

Thom


Re: [HACKERS] Sequence Access Method WIP

2015-09-16 Thread Petr Jelinek

On 2015-09-16 13:21, Thom Brown wrote:

On 28 July 2015 at 19:51, Petr Jelinek > wrote:

On 2015-07-28 20:11, Heikki Linnakangas wrote:


Petr, is this enough feedback on this patch for this commitfest,
or are
there some other issues you want to discuss before I mark this
as returned?


You can mark it as returned, I didn't have much time to actually do
much useful work on this in the current CF.


Is this now dependant on the work Alexander Korotkov is doing on the AM
interface?



Yes.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-07-28 Thread Heikki Linnakangas
So, we have this patch in the commitfest again. Let's see where we are, 
and try to find a consensus on what needs to be done before this can be 
committed.


On 06/17/2015 06:51 PM, Petr Jelinek wrote:

On 2015-06-15 11:32, Vik Fearing wrote:

I've been looking at these patches a bit and here are some comments:


Thanks for looking at this.


+1, thanks Vik.


As mentioned upthread, this patch isn't a seamless replacement for
what's already there because of the amdata field.  I wasn't part of the
conversation of FOSDEM unfortunately, and there's not enough information
in this thread to know why this solution is preferred over each seqam
having its own table type with all the columns it needs.  I see that
Heikki is waffling a bit between the two, and I have a fairly strong
opinion that amdata should be split into separate columns.  The patch
already destroys and recreates what it needs when changing access method
via ALTER SEQUENCE, so I don't really see what the problem is.


FOSDEM was just about agreeing that amdata is simpler after we discussed
it with Heikki. Nothing too important you missed there I guess.

I can try to summarize what are the differences:
- amdata is somewhat simpler in terms of code for both init, alter and
DDL, since with custom columns you have to specify them somehow and deal
with them in catalog, also ALTER SEQUENCE USING means that there are
going to be colums marked as deleted which produces needless waste, etc
- amdata make it easier to change the storage model as the tuple
descriptor is same for all sequences
- the separate columns are much nicer from user point of view
- my opinion is that separate columns also more nicely separate state
from options and I think that if we move to separate storage model,
there can be state table per AM which solves the tuple descriptor issue
- there is probably some slight performance benefit to amdata but I
don't think it's big enough to be important

I personally have slight preference to separate columns design, but I am
ok with both ways honestly.


Regarding the amdata issue, I'm also leaning towards set of columns. 
I've felt that way all along, but not very strongly, so I relented at 
some point when Andres felt strongly that a single column would be 
better. But the more I think about it, the more I feel that separate 
columns really would be better. As evidence, I offer this recent thread:


Tom said 
(http://www.postgresql.org/message-id/8739.1436893...@sss.pgh.pa.us):

I really don't see what's wrong with SELECT last_value FROM sequence,
especially since that has worked in every Postgres version since 6.x.
Anyone slightly worried about backwards compatibility wouldn't use
an equivalent function even if we did add one.


If we went with the single amdata column, that would break. Or we'd need 
to leave last_value as a separate column anyway, and leave it unused for 
sequence AMs where it's not applicable. But that's a bit ugly too.


Jim Nasby said in the same thread:

FWIW, I think it'd be better to have a pg_sequences view that's the
equivalent of SELECT * FROM sequence for every sequence in the
database. That would let you get whatever info you needed.


Creating such a view would be difficult if all the sequences have a 
different set of columns. But when you think about it, it's not really 
any better with a single amdata column. You can't easily access the data 
in the amdata column that way either.


Anyway, that's my opinion. Several others have weighed in to support 
separate columns, too, so I think that is the consensus. Separate 
columns it is.



There is no \d command for sequence access methods.  Without querying
pg_seqam directly, how does one discover what's available?


Good point.


Well, you can query pg_seqam. I don't think this deserves a \d command.


On the whole, I think this is a pretty good patchset.  Aside from the
design decision of whether amdata is a single opaque column or a set of
columns, there are only a few things that need to be changed before it's
ready for committer, and those things are mostly documentation.


Unfortunately the amdata being opaque vs set of columns is the main
issue here.


There was discussion on another thread on how the current sequence AM 
API is modeled after the indexam API, at 
http://www.postgresql.org/message-id/3896.1437059...@sss.pgh.pa.us. Will 
need to do something about that too.


Petr, is this enough feedback on this patch for this commitfest, or are 
there some other issues you want to discuss before I mark this as returned?


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-07-28 Thread Petr Jelinek

On 2015-07-28 20:11, Heikki Linnakangas wrote:


Petr, is this enough feedback on this patch for this commitfest, or are
there some other issues you want to discuss before I mark this as returned?



You can mark it as returned, I didn't have much time to actually do much 
useful work on this in the current CF.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-06-17 Thread Petr Jelinek

On 2015-06-15 11:32, Vik Fearing wrote:

I've been looking at these patches a bit and here are some comments:



Thanks for looking at this.


Patch 1: seqam

I would like to see an example in the docs for CREATE SEQUENCE.  That's
perhaps not possible (or desirable) with only the local seqam?  Not sure.



It is possible to have example with local seqam, it might be confusing 
though, given it produces same results as not putting USING in the query.



In the docs for pg_class, there is no mention that relam refers to
pg_seqam for sequences but pg_am for all other types.

There are errant half-sentences in the documentation, for example to
the options in the CREATE SEQUENCE or ALTER SEQUENCE statement. in
Sequence Access Method Functions.


I think that's the side effect of all the rebases and rewrites over the 
2y(!) that this has been going forward and back. It can be easily fixed 
by proof reading before final submission. I didn't pay too much 
attention yet because it's not clear how the docs should look like if 
there is no real agreement on the api. (This applies to other comments 
about docs as well)




I'd prefer a README instead of the long comment at the start of seqam.c.
  The other ams do that.



OK, since things have been moved to separate directory, README is 
doable, I personally prefer the docs in the main .c file usually but I 
know project uses README sometimes for this.



As mentioned upthread, this patch isn't a seamless replacement for
what's already there because of the amdata field.  I wasn't part of the
conversation of FOSDEM unfortunately, and there's not enough information
in this thread to know why this solution is preferred over each seqam
having its own table type with all the columns it needs.  I see that
Heikki is waffling a bit between the two, and I have a fairly strong
opinion that amdata should be split into separate columns.  The patch
already destroys and recreates what it needs when changing access method
via ALTER SEQUENCE, so I don't really see what the problem is.



FOSDEM was just about agreeing that amdata is simpler after we discussed 
it with Heikki. Nothing too important you missed there I guess.


I can try to summarize what are the differences:
- amdata is somewhat simpler in terms of code for both init, alter and 
DDL, since with custom columns you have to specify them somehow and deal 
with them in catalog, also ALTER SEQUENCE USING means that there are 
going to be colums marked as deleted which produces needless waste, etc
- amdata make it easier to change the storage model as the tuple 
descriptor is same for all sequences

- the separate columns are much nicer from user point of view
- my opinion is that separate columns also more nicely separate state 
from options and I think that if we move to separate storage model, 
there can be state table per AM which solves the tuple descriptor issue
- there is probably some slight performance benefit to amdata but I 
don't think it's big enough to be important


I personally have slight preference to separate columns design, but I am 
ok with both ways honestly.




There is no \d command for sequence access methods.  Without querying
pg_seqam directly, how does one discover what's available?



Good point.



Patch 2: seqam ddl

When defining a new access method for sequences, it is possible to list
the arguments multiple times (last one wins).  Other defel loops raise
an error if the argument is specified more than once.  I haven't looked
at all of such loops to see if this is the only odd man out or not, but
I prefer the error behavior.



Hmm yeah, there should be error. I think only tsearch doesn't enforce 
errors from the existing stuff, should probably be fixed as well 
(separately of course).




Patch 3: gapless_seq

I really like the idea of having a gapless sequence in contrib.
Although it has big potential to be abused, doing it manually when it's
needed (like for invoices, at least in France) is a major pain.  So big
+1 for including this.



Yeah, people make gapless sequences regardless, it's better to provide 
them one that behaves correctly, also it's quite good test for the API.




It would be nice to be able to specify an access method when declaring a
serial or bigserial column.  This could be a separate patch later on,
though.



The patch originally had GUC for this, but Heikki didn't like it so it's 
left for the future developments.




On the whole, I think this is a pretty good patchset.  Aside from the
design decision of whether amdata is a single opaque column or a set of
columns, there are only a few things that need to be changed before it's
ready for committer, and those things are mostly documentation.



Unfortunately the amdata being opaque vs set of columns is the main 
issue here.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list 

Re: [HACKERS] Sequence Access Method WIP

2015-05-13 Thread Petr Jelinek

On 13/05/15 12:56, Heikki Linnakangas wrote:

On 05/13/2015 07:10 AM, Alvaro Herrera wrote:

Heikki, do you have time to go through this at this point?


I'm afraid I won't :-(. I did intend to, but looking at the calendar, I
won't have the time to review this thoroughly enough to commit. Sorry.

We discussed using a single amdata column vs. any number of am-specific
columns. We settled on amdata, but I'm still not 100% convinced that's
the best approach. Just as a data point, this removes the log_cnt field
and moves it into amdata in a non-human-readable format. So for someone
who only uses the local seqam, this just makes things slightly worse.
For more complicated seqam's, it would be even more important to display
the state in a human-readable format. Perhaps it's OK that each seqam
provides its own functions or similar to do that, but I'd like to
revisit that decision.



I do think it's ok for seqam to provide functions that can give you the 
data in human readable form.


I think main argument against custom human readable columns is that it 
will kill any possibility to have common storage for sequences.


There is also additional complexity in the API required for that.

The performance implications are probably small as one could still 
define opaque bytea column and store the data same way a now.



I still don't like the serial_sequenceam GUC. Not sure what to do
instead. Needs some thought.



I think it would be ok if this issue was solved by follow-up patch at 
later time.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-05-13 Thread Heikki Linnakangas

On 05/13/2015 07:10 AM, Alvaro Herrera wrote:

Heikki, do you have time to go through this at this point?


I'm afraid I won't :-(. I did intend to, but looking at the calendar, I 
won't have the time to review this thoroughly enough to commit. Sorry.


I haven't looked at the CREATE/DROP ACCESS METHOD FOR SEQUENCE syntax 
patch at all yet.


We discussed using a single amdata column vs. any number of am-specific 
columns. We settled on amdata, but I'm still not 100% convinced that's 
the best approach. Just as a data point, this removes the log_cnt field 
and moves it into amdata in a non-human-readable format. So for someone 
who only uses the local seqam, this just makes things slightly worse. 
For more complicated seqam's, it would be even more important to display 
the state in a human-readable format. Perhaps it's OK that each seqam 
provides its own functions or similar to do that, but I'd like to 
revisit that decision.


I still don't like the serial_sequenceam GUC. Not sure what to do 
instead. Needs some thought.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-05-13 Thread Simon Riggs
On 13 May 2015 at 11:56, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 05/13/2015 07:10 AM, Alvaro Herrera wrote:

 Heikki, do you have time to go through this at this point?


 I'm afraid I won't :-(. I did intend to, but looking at the calendar, I
 won't have the time to review this thoroughly enough to commit. Sorry.

 I haven't looked at the CREATE/DROP ACCESS METHOD FOR SEQUENCE syntax
 patch at all yet.

 We discussed using a single amdata column vs. any number of am-specific
 columns. We settled on amdata, but I'm still not 100% convinced that's the
 best approach. Just as a data point, this removes the log_cnt field and
 moves it into amdata in a non-human-readable format. So for someone who
 only uses the local seqam, this just makes things slightly worse. For more
 complicated seqam's, it would be even more important to display the state
 in a human-readable format. Perhaps it's OK that each seqam provides its
 own functions or similar to do that, but I'd like to revisit that decision.

 I still don't like the serial_sequenceam GUC. Not sure what to do instead.
 Needs some thought.


This has had around 2 years of thought at this point. I don't agree it
needs more thought.

There is one clear use case for this and it is of benefit to many
distributed architectures.

I don't see what calamity will occur if we commit this. If you don't want a
sequence AM, don't ever use this.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


Re: [HACKERS] Sequence Access Method WIP

2015-05-13 Thread Petr Jelinek

On 13/05/15 14:15, Heikki Linnakangas wrote:

I don't see what calamity will occur if we commit this. If you don't
want a
sequence AM, don't ever use this.


I'd like the API to be good for its purpose. Also, I did mention that
the the current patch makes the situation slightly worse for people who
never use this: it makes the log_cnt field non human-readable. That's a
really minor thing, but it shows that it *does* matter how this is
implemented, even if you only ever use the local seq AM.



It definitely does matter.

I don't think we'll find perfect compromise here though, you can either 
do it one way or the other. Trust me it does not make me happy either, I 
like perfect solutions too, but when there is lack of perfect solution I 
prefer the simpler one.


Both of the solutions have drawbacks
 - amdata has opaque blob which does not store data in user visible way 
but that can be worked around by providing function that shows it in 
human readable way (and the dump function for many sequence types 
actually does that).
 - multiple columns basically kill any future ability to unify the 
storage for sequences and also adds complexity, especially around alter 
table (since it means drop/add column and stuff)


But I already wrote both versions anyway so from that perspective it 
does not matter much which part we merge.


(As a side-node I would have preferred to have this discussion earlier 
than 2 days before feature freeze because the current implementation is 
something that we agreed on several months ago so there was plenty of 
time to revisit that decision.)


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-05-13 Thread Heikki Linnakangas

On 05/13/2015 02:12 PM, Simon Riggs wrote:

This has had around 2 years of thought at this point. I don't agree it
needs more thought.


Noted.


There is one clear use case for this and it is of benefit to many
distributed architectures.


Right. What's your point?


I don't see what calamity will occur if we commit this. If you don't want a
sequence AM, don't ever use this.


I'd like the API to be good for its purpose. Also, I did mention that 
the the current patch makes the situation slightly worse for people who 
never use this: it makes the log_cnt field non human-readable. That's a 
really minor thing, but it shows that it *does* matter how this is 
implemented, even if you only ever use the local seq AM.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-04-22 Thread Petr Jelinek

On 20/04/15 17:50, Heikki Linnakangas wrote:

On 03/15/2015 09:07 PM, Petr Jelinek wrote:

Slightly updated version of the patch.

Mainly rebased against current master (there were several conflicts) and
fixed some typos, no real functional change.

I also attached initial version of the API sgml doc.


I finally got around to have another round of review on this. I fixed a
couple of little bugs, did some minor edition on comments etc. See
attached. It is also available in my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch seqam,
if you want to look at individual changes. It combines your patches 1
and 4, I think those need to be applied together. I haven't looked at
the DDL changes yet.


Thanks!



I'm fairly happy with the alloc API now. I'm not sure it's a good idea
for the AM to access the sequence tuple directly, though. I would seem
cleaner if it only manipulated the amdata Datum. But it's not too bad as
it is.


Yeah, I was thinking about this myself I just liked sending 10 
parameters to the function less than this.




The division of labour between sequence.c and the AM, in the init and
the get/set_state functions, is a bit more foggy:

* Do we really need a separate amoptions() method and an init() method,
when the only caller to amoptions() is just before the init() method?
The changes in extractRelOptions suggest that it would call
am_reloptions for sequences too, but that doesn't actually seem to be
happening.


Hmm yes it should and I am sure it did at some point, must have messed 
it during one of the rebases :(


And it's the reason why we need separate API function.



postgres=# create sequence fooseq using local with (garbage=option);
CREATE SEQUENCE

Invalid options probably should throw an error.

* Currently, the AM's init function is responsible for basic sanity
checking, like min  max. It also has to extract the RESTART value from
the list of options. I think it would make more sense to move that to
sequence.c. The AM should only be responsible for initializing the
'amdata' portion, and for handling any AM-specific options. If the AM
doesn't support some options, like MIN and MAX value, it can check them
and throw an error, but it shouldn't be responsible for just passing
them through from the arguments to the sequence tuple.


Well then we need to send restart as additional parameter to the init 
function as restart is normally not stored anywhere.


The checking is actually if new value is withing min/max but yes that 
can be done inside sequence.c I guess.




* It might be better to form the sequence tuple before calling the init
function, and pass the ready-made tuple to it. All the other API
functions deal with the tuple (after calling sequence_read_tuple), so
that would be more consistent. The init function would need to
deconstruct it to modify it, but we're not concerned about performance
here.


Right, this is actually more of a relic of the custom columns 
implementation where I didn't want to build the tuple with NULLs for 
columns that might have been specified as NOT NULL, but with the amdata 
approach it can create the tuple safely.




* The transformations of the arrays in get_state() and set_state()
functions are a bit complicated. The seqam_get_state() function returns
two C arrays, and pg_sequence_get_state() turns them into a text[]
array. Why not construct the text[] array directly in the AM? I guess
it's a bit more convenient to the AM, if the pg_sequence_get_state() do
that, but if that's an issue, you could create generic helper function
to turn two C string arrays into text[], and call that from the AM.


Yeah that was exactly the reasoning. Helper function works for me (will 
check what Álvaro's suggested, maybe it can be moved somewhere and reused).





seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));

for (i = 0; i  count; i++)
{
if (pg_strcasecmp(keys[i], last_value) == 0)
seq-last_value = DatumGetInt64(DirectFunctionCall1(int8in,

CStringGetDatum(values[i])));
else if (pg_strcasecmp(keys[i], is_called) == 0)
seq-is_called = DatumGetBool(DirectFunctionCall1(boolin,

CStringGetDatum(values[i])));
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
 errmsg(invalid state key \%s\ for local
sequence,
keys[i])));
}

sequence_save_tuple(seqh, NULL, true);


If that error happens after having already processed a last_value or
is_called entry, you have already modified the on-disk tuple. AFAICS
that's the only instance of that bug, but sequence_read_tuple - modify
tuple in place - sequence_save_tuple pattern is quite unsafe in general.
If you modify a tuple directly in a Buffer, you should have a critical
section around it. It would make sense to start a critical section in
sequence_read_tuple(), except that not all callers want to modify the
tuple in place. 

Re: [HACKERS] Sequence Access Method WIP

2015-04-20 Thread Heikki Linnakangas

On 03/15/2015 09:07 PM, Petr Jelinek wrote:

Slightly updated version of the patch.

Mainly rebased against current master (there were several conflicts) and
fixed some typos, no real functional change.

I also attached initial version of the API sgml doc.


Thanks!

With the patch, pg_class.relam column references to the pg_seqam table 
for sequences, but pg_indexam for indexes. I believe it's the first 
instance where we reuse a foreign key column like that. It's not a 
real foreign key, of course - that wouldn't work with a real foreign key 
at all - but it's a bit strange. That makes me a bit uncomfortable. How 
do others feel about that?


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-04-20 Thread Andres Freund
On 2015-04-20 12:49:39 +0300, Heikki Linnakangas wrote:
 With the patch, pg_class.relam column references to the pg_seqam table for
 sequences, but pg_indexam for indexes. I believe it's the first instance
 where we reuse a foreign key column like that. It's not a real foreign
 key, of course - that wouldn't work with a real foreign key at all - but
 it's a bit strange. That makes me a bit uncomfortable. How do others feel
 about that?

Hm. I'd modeled it more as an extension of the 'relkind' column
mentally. I.e. it further specifies how exactly the relation is
behaving. Given that the field has been added to pg_class and not
pg_index, combined with it not having index in its name, makes me think
that it actually was intended to be used the way it's done in the patch.

It's not the first column that behaves that way btw, at least pg_depend
comes to mind.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-04-20 Thread Petr Jelinek

On 20/04/15 12:05, Andres Freund wrote:

On 2015-04-20 12:49:39 +0300, Heikki Linnakangas wrote:

With the patch, pg_class.relam column references to the pg_seqam table for
sequences, but pg_indexam for indexes. I believe it's the first instance
where we reuse a foreign key column like that. It's not a real foreign
key, of course - that wouldn't work with a real foreign key at all - but
it's a bit strange. That makes me a bit uncomfortable. How do others feel
about that?


Hm. I'd modeled it more as an extension of the 'relkind' column
mentally. I.e. it further specifies how exactly the relation is
behaving. Given that the field has been added to pg_class and not
pg_index, combined with it not having index in its name, makes me think
that it actually was intended to be used the way it's done in the patch.



That's how I think about it too. It's also short for access method, 
nothing really suggests to me that it should be index specific by design.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-04-20 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 * The transformations of the arrays in get_state() and set_state() functions
 are a bit complicated. The seqam_get_state() function returns two C arrays,
 and pg_sequence_get_state() turns them into a text[] array. Why not
 construct the text[] array directly in the AM? I guess it's a bit more
 convenient to the AM, if the pg_sequence_get_state() do that, but if that's
 an issue, you could create generic helper function to turn two C string
 arrays into text[], and call that from the AM.

Um, see strlist_to_textarray() in objectaddress.c if you do that.  Maybe
we need some generic place to store that kind of helper function.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-02-17 Thread Petr Jelinek

On 17/02/15 23:11, Robert Haas wrote:

On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek p...@2ndquadrant.com wrote:

sending new version that is updated along the lines of what we discussed at
FOSDEM, which means:

- back to single bytea amdata column (no custom columns)




Well, the main argument is still future possibility of moving into 
single table for storage. And when we discussed about it in person we 
agreed that there is not too big advantage in having separate columns 
since there need to be dump/restore state interfaces anyway so you can 
see the relevant state via those as we made the output more human 
readable (and the psql support reflects that).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-02-17 Thread Petr Jelinek

On 18/02/15 02:59, Petr Jelinek wrote:

On 17/02/15 23:11, Robert Haas wrote:

On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek p...@2ndquadrant.com
wrote:

sending new version that is updated along the lines of what we
discussed at
FOSDEM, which means:

- back to single bytea amdata column (no custom columns)




Well, the main argument is still future possibility of moving into
single table for storage. And when we discussed about it in person we
agreed that there is not too big advantage in having separate columns
since there need to be dump/restore state interfaces anyway so you can
see the relevant state via those as we made the output more human
readable (and the psql support reflects that).



Also makes the patch a little simpler obviously.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-02-17 Thread Robert Haas
On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 sending new version that is updated along the lines of what we discussed at
 FOSDEM, which means:

 - back to single bytea amdata column (no custom columns)

Why?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-01-28 Thread Heikki Linnakangas

On 01/23/2015 02:34 AM, Petr Jelinek wrote:

On 22/01/15 17:02, Petr Jelinek wrote:


The new version (the one that is not submitted yet) of gapless sequence
is way more ugly and probably not best example either but does guarantee
gaplessness (it stores the last value in it's own value table). So I am
not sure if it should be included either...


Here it is as promised.


I generally like the division of labour between the seqam 
implementations and the API now.


I don't like the default_sequenceam GUC. That seems likely to create 
confusion. And it's not really enough for something like a remote 
sequence AM anyway that definitely needs some extra reloptions, like the 
hostname of the remote server. The default should be 'local', unless you 
specify something else with CREATE SEQUENCE USING - no GUCs.


Some comments on pg_dump:

* In pg_dump's dumpSequence function, check the server version number 
instead of checking whether pg_sequence_dump_state() function exists. 
That's what we usually do when new features are introduced. And there's 
actually a bug there: you have the check backwards. (try dumping a 
database with any sequences in it; it fails)


* pg_dump should not output a USING clause when a sequence uses the 
'local' sequence. No point in adding such noise - making the SQL command 
not standard-compatible - for the 99% of databases that don't use other 
sequence AMs.


* Be careful to escape all strings correctly in pg_dump. I think you're 
missing escaping for the 'state' variable at least.


In sequence_save_tuple:

else
{
/*
 * New tuple was not sent, so the original tuple was probably 
just
 * changed inline, all we need to do is mark the buffer dirty 
and
 * optionally log the update tuple.
 */
START_CRIT_SECTION();

MarkBufferDirty(seqh-buf);

if (do_wal)
log_sequence_tuple(seqh-rel, seqh-tup, seqh-buf, 
page);

END_CRIT_SECTION();
}


This is wrong when checksums are enabled and !do_wal. I believe this 
should use MarkBufferDirtyHint().



Notable changes:
- The gapless sequence rewritten to use the transactional storage as
that's the only way to guarantee gaplessness between dump and restore.


Can you elaborate?

Using the auxiliary seqam_gapless_values is a bit problematic. First of 
all, the event trigger on DROP SEQUENCE doesn't fire if you drop the 
whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly, 
updating a row in a table for every nextval() call is pretty darn 
expensive. But I don't actually see a problem with dump and restore.


Can you rewrite it without using the values table? AFAICS, there are a 
two of problems to solve:


1. If the top-level transaction aborts, you need to restore the old 
value. I suggest keeping two values in the sequence tuple: the old, 
definitely committed one, and the last value. The last value is only 
considered current if the associated XID committed; otherwise the old 
value is current. When a transaction is about to commit, it stores its 
top-level XID and the new value in the last field, and copies the 
previously current value to the old field.


2. You need to track the last value on a per-subtransaction basis, until 
the transaction commits/rolls back, in order to have rollback to 
savepoint to retreat the sequence's value. That can be done in 
backend-private memory, maintaining a stack of subtransactions and the 
last value of each. Use RegisterSubXactCallback to hook into subxact 
commit/abort. Just before top-level commit (in pre-commit callback), 
update the sequence tuple with the latest value, so that it gets 
WAL-logged before the commit record.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-01-28 Thread Petr Jelinek

On 28/01/15 18:09, Heikki Linnakangas wrote:

On 01/23/2015 02:34 AM, Petr Jelinek wrote:

On 22/01/15 17:02, Petr Jelinek wrote:


The new version (the one that is not submitted yet) of gapless sequence
is way more ugly and probably not best example either but does guarantee
gaplessness (it stores the last value in it's own value table). So I am
not sure if it should be included either...


Here it is as promised.


I generally like the division of labour between the seqam
implementations and the API now.

I don't like the default_sequenceam GUC. That seems likely to create
confusion. And it's not really enough for something like a remote
sequence AM anyway that definitely needs some extra reloptions, like the
hostname of the remote server. The default should be 'local', unless you
specify something else with CREATE SEQUENCE USING - no GUCs.



Hmm, I am not too happy about this, I want SERIAL to work with custom 
sequences (as long as it's possible for the sequence to work that way at 
least). That's actually important feature for me. Your argument about 
this being potential problem for some sequenceAMs is valid though.



Some comments on pg_dump:

* In pg_dump's dumpSequence function, check the server version number
instead of checking whether pg_sequence_dump_state() function exists.
That's what we usually do when new features are introduced. And there's
actually a bug there: you have the check backwards. (try dumping a
database with any sequences in it; it fails)


Right.


* pg_dump should not output a USING clause when a sequence uses the
'local' sequence. No point in adding such noise - making the SQL command
not standard-compatible - for the 99% of databases that don't use other
sequence AMs.


Well this only works if we remove the GUC. Because otherwise if GUC is 
there then you always need to either add USING clause or set the GUC in 
advance (like we do with search_path for example).



* Be careful to escape all strings correctly in pg_dump. I think you're
missing escaping for the 'state' variable at least.


Ouch.


In sequence_save_tuple:

else
{
/*
 * New tuple was not sent, so the original tuple was probably
just
 * changed inline, all we need to do is mark the buffer dirty and
 * optionally log the update tuple.
 */
START_CRIT_SECTION();

MarkBufferDirty(seqh-buf);

if (do_wal)
log_sequence_tuple(seqh-rel, seqh-tup, seqh-buf, page);

END_CRIT_SECTION();
}


This is wrong when checksums are enabled and !do_wal. I believe this
should use MarkBufferDirtyHint().



Oh ok, didn't realize.


Notable changes:
- The gapless sequence rewritten to use the transactional storage as
that's the only way to guarantee gaplessness between dump and restore.


Can you elaborate?

Using the auxiliary seqam_gapless_values is a bit problematic. First of
all, the event trigger on DROP SEQUENCE doesn't fire if you drop the
whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly,


Yeah but at worst there are some unused rows there, it's not too bad. I 
could also create relation per sequence so that dependency code would 
handle everything correctly, but seems bit too expensive to create not 
one but two relations per sequence...



updating a row in a table for every nextval() call is pretty darn
expensive.


Yes it's expensive, but the gapless sequence also serializes access so 
it will never be speed daemon.



But I don't actually see a problem with dump and restore.


The problem is that the tuple stored in the sequence relation is always 
the one with latest state (and with frozen xid), so pg_dump has no way 
of getting the value as it was at the time it took the snapshot. This 
means that after dump/restore cycle, the sequence can be further away 
than the table it's attached to and you end up with a gap there.




Can you rewrite it without using the values table? AFAICS, there are a
two of problems to solve:

1. If the top-level transaction aborts, you need to restore the old
value. I suggest keeping two values in the sequence tuple: the old,
definitely committed one, and the last value. The last value is only
considered current if the associated XID committed; otherwise the old
value is current. When a transaction is about to commit, it stores its
top-level XID and the new value in the last field, and copies the
previously current value to the old field.



Yes, this is how the previous implementation worked.


2. You need to track the last value on a per-subtransaction basis, until
the transaction commits/rolls back, in order to have rollback to
savepoint to retreat the sequence's value. That can be done in
backend-private memory, maintaining a stack of subtransactions and the
last value of each. Use RegisterSubXactCallback to hook into subxact
commit/abort. Just before top-level commit (in pre-commit callback),
update the sequence tuple with the latest value, so that it gets
WAL-logged 

Re: [HACKERS] Sequence Access Method WIP

2015-01-22 Thread Heikki Linnakangas

On 01/12/2015 11:33 PM, Petr Jelinek wrote:

Second patch adds DDL support. I originally wanted to make it
CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES
which does not need to change anything (besides adding METHOD to
unreserved keywords).
The DDL support uses the DefineStmt infra with some very small change as
the sequence ams are not schema qualified, but I think it's acceptable
and saves considerable amount of boilerplate.


Do we need DDL commands for this at all? I could go either way on that 
question. We recently had a discussion on that wrt. index access methods 
[1], and Tom opined that providing DDL for creating index access methods 
is not worth it. The extension can just insert the rows into pg_seqam 
with INSERT. Do we expect sequence access methods as extensions to be 
more popular than index access methods? Maybe, because the WAL-logging 
problem doesn't exist. But OTOH, if you're writing something like a 
replication system that needs global sequences as part of it, there 
aren't that many of those, and the installation scripts will need to 
deal with more complicated stuff than inserting a row in pg_seqam.


[1] http://www.postgresql.org/message-id/26822.1414516...@sss.pgh.pa.us


And third patch is gapless sequence implementation updated to work with
the new DDL support with some tests added for checking if dependencies
work correctly. It also acts as example on how to make custom AMs.


I'll take a look at that to see how the API works, but we're not going 
to include it in the source tree, because it doesn't actually guarantee 
gaplessness. That makes it a pretty dangerous example. I'm sure we can 
come up with a better example that might even be useful. How about a 
Lamport's clock sequence, which advances once per second, in addition to 
when anyone calls nextval() ? Or a remote sequence that uses an FDW to 
call nextval() in a foreign server?


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-01-22 Thread Petr Jelinek

On 22/01/15 16:50, Heikki Linnakangas wrote:

On 01/12/2015 11:33 PM, Petr Jelinek wrote:

Second patch adds DDL support. I originally wanted to make it
CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES
which does not need to change anything (besides adding METHOD to
unreserved keywords).
The DDL support uses the DefineStmt infra with some very small change as
the sequence ams are not schema qualified, but I think it's acceptable
and saves considerable amount of boilerplate.


Do we need DDL commands for this at all? I could go either way on that
question. We recently had a discussion on that wrt. index access methods
[1], and Tom opined that providing DDL for creating index access methods
is not worth it. The extension can just insert the rows into pg_seqam
with INSERT. Do we expect sequence access methods as extensions to be
more popular than index access methods? Maybe, because the WAL-logging
problem doesn't exist. But OTOH, if you're writing something like a
replication system that needs global sequences as part of it, there
aren't that many of those, and the installation scripts will need to
deal with more complicated stuff than inserting a row in pg_seqam.


I don't know about popularity, and I've seen the discussion about 
indexes. The motivation for DDL for me was handling dependencies 
correctly, that's all. If we say we don't care about that (and allow 
DROP EXTENSION even though user has sequences that are using the AM) 
then we don't need DDL.




[1] http://www.postgresql.org/message-id/26822.1414516...@sss.pgh.pa.us


And third patch is gapless sequence implementation updated to work with
the new DDL support with some tests added for checking if dependencies
work correctly. It also acts as example on how to make custom AMs.


I'll take a look at that to see how the API works, but we're not going
to include it in the source tree, because it doesn't actually guarantee
gaplessness. That makes it a pretty dangerous example. I'm sure we can
come up with a better example that might even be useful. How about a
Lamport's clock sequence, which advances once per second, in addition to
when anyone calls nextval() ? Or a remote sequence that uses an FDW to
call nextval() in a foreign server?



I have updated patch ready, just didn't submit it because I am otherwise 
busy this week, I hope to get to it today evening or tomorrow morning, 
so I'd wait until that with looking at the patch.


The new version (the one that is not submitted yet) of gapless sequence 
is way more ugly and probably not best example either but does guarantee 
gaplessness (it stores the last value in it's own value table). So I am 
not sure if it should be included either...


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 10:50 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 01/12/2015 11:33 PM, Petr Jelinek wrote:
 Second patch adds DDL support. I originally wanted to make it
 CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
 a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES
 which does not need to change anything (besides adding METHOD to
 unreserved keywords).
 The DDL support uses the DefineStmt infra with some very small change as
 the sequence ams are not schema qualified, but I think it's acceptable
 and saves considerable amount of boilerplate.

 Do we need DDL commands for this at all? I could go either way on that
 question. We recently had a discussion on that wrt. index access methods
 [1], and Tom opined that providing DDL for creating index access methods is
 not worth it. The extension can just insert the rows into pg_seqam with
 INSERT. Do we expect sequence access methods as extensions to be more
 popular than index access methods? Maybe, because the WAL-logging problem
 doesn't exist. But OTOH, if you're writing something like a replication
 system that needs global sequences as part of it, there aren't that many of
 those, and the installation scripts will need to deal with more complicated
 stuff than inserting a row in pg_seqam.

I think the main reason we don't need DDL for pg_am is because there's
no real hope of anybody actually inserting anything in there whether
we have a command for it or not.  If we actually expect this to be
used, I think it should probably have some kind of DDL, because
otherwise what will pg_dump emit?  Nothing is bad because then dumps
won't restore, and direct inserts to pg_seqam doesn't seem great
either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-01-13 Thread Petr Jelinek

On 13/01/15 13:24, Tomas Vondra wrote:

On 12.1.2015 22:33, Petr Jelinek wrote:

On 15/12/14 11:36, Petr Jelinek wrote:

On 10/12/14 03:33, Petr Jelinek wrote:

On 24/11/14 12:16, Heikki Linnakangas wrote:

About the rough edges:
- The AlterSequence is not prettiest code around as we now have to
create new relation when sequence AM is changed and I don't know how to
do that nicely
- I am not sure if I did the locking, command order and dependency
handling in AlterSequence correcly


This version does AlterSequence differently and better. Didn't attach
the gapless sequence again as that one is unchanged.




And another version, separated into patch-set of 3 patches.
First patch contains the seqam patch itself, not many changes there,
mainly docs/comments related. What I wrote in the previous email for
version 3 still applies.


I did a review of the first part today - mostly by reading through the
diff. I plan to do a bit more thorough testing in a day or two. I'll
also look at the two (much smaller) patches.



Thanks!


comments/questions/general nitpicking:

(1) Why treating empty string as equal to 'local'? Isn't enforcing the
 actual value a better approach?



Álvaro mentioned on IM also, I already changed it to saner normal GUC 
with 'local' as default value in my working copy



(2) NITPICK: Maybe we could use access_method in the docs (instead of
 sequence_access_method), as the 'sequence' part is clear from the
 context?


Yes.


(3) Why index_reloptions / sequence_reloptions when both do exactly the
 same thing (call to common_am_reloptions)? I'd understand this if
 the patch(es) then change the sequence_reloptions() but that's not
 happening. Maybe that's expected to happen?


That's leftover from the original design where AM was supposed to call 
this, since this is not exposed to AM anymore I think it can be single 
function now.




(4) DOCS: Each sequence can only use one access method at a time.

 Does that mean a sequence can change the access method during it's
 lifetime? My understanding is that the AM is fixed after creating
 the sequence?



Oh, I forgot to add ALTER SEQUENCE USING into docs, you can change AM 
even though you probably don't want to do it often, but for migrations 
it's useful.



(8) check_default_seqam without a transaction

  * If we aren't inside a transaction, we cannot do database access so
  * cannot verify the name. Must accept the value on faith.

 In which situation this happens? Wouldn't it be better to simply
 enforce the transaction and fail otherwise?


Reading postgresql.conf during startup, I don't think we want to fail in 
that scenario ;)



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-01-13 Thread Tomas Vondra
On 12.1.2015 22:33, Petr Jelinek wrote:
 On 15/12/14 11:36, Petr Jelinek wrote:
 On 10/12/14 03:33, Petr Jelinek wrote:
 On 24/11/14 12:16, Heikki Linnakangas wrote:

 About the rough edges:
 - The AlterSequence is not prettiest code around as we now have to
 create new relation when sequence AM is changed and I don't know how to
 do that nicely
 - I am not sure if I did the locking, command order and dependency
 handling in AlterSequence correcly

 This version does AlterSequence differently and better. Didn't attach
 the gapless sequence again as that one is unchanged.


 
 And another version, separated into patch-set of 3 patches.
 First patch contains the seqam patch itself, not many changes there,
 mainly docs/comments related. What I wrote in the previous email for
 version 3 still applies.

I did a review of the first part today - mostly by reading through the
diff. I plan to do a bit more thorough testing in a day or two. I'll
also look at the two (much smaller) patches.

comments/questions/general nitpicking:

(1) Why treating empty string as equal to 'local'? Isn't enforcing the
actual value a better approach?

(2) NITPICK: Maybe we could use access_method in the docs (instead of
sequence_access_method), as the 'sequence' part is clear from the
context?

(3) Why index_reloptions / sequence_reloptions when both do exactly the
same thing (call to common_am_reloptions)? I'd understand this if
the patch(es) then change the sequence_reloptions() but that's not
happening. Maybe that's expected to happen?

(4) DOCS: Each sequence can only use one access method at a time.

Does that mean a sequence can change the access method during it's
lifetime? My understanding is that the AM is fixed after creating
the sequence?

(5) DOCS/NITPICK: SeqAM / SeqAm ... (probably should be the same?)

(6) cache lookup failed for relation %u

I believe this should reference 'sequence access method' instead of
a relation.

(7) seqam_init

I believe this is a bug (not setting nulls[4] but twice nulls[3]):

+   fcinfo.argnull[0] = seqparams == NULL;
+   fcinfo.argnull[1] = reloptions_parsed == NULL;
+   fcinfo.argnull[2] = false;
+   fcinfo.argnull[3] = false;
+   fcinfo.argnull[3] = false;

(8) check_default_seqam without a transaction

 * If we aren't inside a transaction, we cannot do database access so
 * cannot verify the name.  Must accept the value on faith.

In which situation this happens? Wouldn't it be better to simply
enforce the transaction and fail otherwise?

regards

-- 
Tomas Vondrahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-12-15 Thread Petr Jelinek

On 10/12/14 03:33, Petr Jelinek wrote:

On 24/11/14 12:16, Heikki Linnakangas wrote:

About the rough edges:
- The AlterSequence is not prettiest code around as we now have to
create new relation when sequence AM is changed and I don't know how to
do that nicely
- I am not sure if I did the locking, command order and dependency
handling in AlterSequence correcly


This version does AlterSequence differently and better. Didn't attach 
the gapless sequence again as that one is unchanged.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..818da15 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	= brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS	= brin common gin gist hash heap index nbtree rmgrdesc spgist \
+			  transam sequence
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c16b38e..35818c0 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -321,6 +321,7 @@ static bool need_initialization = true;
 static void initialize_reloptions(void);
 static void parse_one_reloption(relopt_value *option, char *text_str,
 	int text_len, bool validate);
+static bytea *common_am_reloptions(RegProcedure amoptions, Datum reloptions, bool validate);
 
 /*
  * initialize_reloptions
@@ -821,7 +822,8 @@ untransformRelOptions(Datum options)
  * instead.
  *
  * tupdesc is pg_class' tuple descriptor.  amoptions is the amoptions regproc
- * in the case of the tuple corresponding to an index, or InvalidOid otherwise.
+ * in the case of the tuple corresponding to an index or sequence, InvalidOid
+ * otherwise.
  */
 bytea *
 extractRelOptions(HeapTuple tuple, TupleDesc tupdesc, Oid amoptions)
@@ -854,6 +856,9 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc, Oid amoptions)
 		case RELKIND_INDEX:
 			options = index_reloptions(amoptions, datum, false);
 			break;
+		case RELKIND_SEQUENCE:
+			options = sequence_reloptions(amoptions, datum, false);
+			break;
 		case RELKIND_FOREIGN_TABLE:
 			options = NULL;
 			break;
@@ -1299,13 +1304,31 @@ heap_reloptions(char relkind, Datum reloptions, bool validate)
 
 /*
  * Parse options for indexes.
+ */
+bytea *
+index_reloptions(RegProcedure amoptions, Datum reloptions, bool validate)
+{
+	return common_am_reloptions(amoptions, reloptions, validate);
+}
+
+/*
+ * Parse options for sequences.
+ */
+bytea *
+sequence_reloptions(RegProcedure amoptions, Datum reloptions, bool validate)
+{
+	return common_am_reloptions(amoptions, reloptions, validate);
+}
+
+/*
+ * Parse options for indexes or sequences.
  *
  *	amoptions	Oid of option parser
  *	reloptions	options as text[] datum
  *	validate	error flag
  */
-bytea *
-index_reloptions(RegProcedure amoptions, Datum reloptions, bool validate)
+static bytea *
+common_am_reloptions(RegProcedure amoptions, Datum reloptions, bool validate)
 {
 	FmgrInfo	flinfo;
 	FunctionCallInfoData fcinfo;
diff --git a/src/backend/access/sequence/Makefile b/src/backend/access/sequence/Makefile
new file mode 100644
index 000..01a0dc8
--- /dev/null
+++ b/src/backend/access/sequence/Makefile
@@ -0,0 +1,17 @@
+#-
+#
+# Makefile--
+#Makefile for access/sequence
+#
+# IDENTIFICATION
+#src/backend/access/sequence/Makefile
+#
+#-
+
+subdir = src/backend/access/sequence
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = seqam.o seqlocal.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/sequence/seqam.c b/src/backend/access/sequence/seqam.c
new file mode 100644
index 000..ce57f16
--- /dev/null
+++ b/src/backend/access/sequence/seqam.c
@@ -0,0 +1,428 @@
+/*-
+ *
+ * seqam.c
+ *	  sequence access method routines
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/sequence/seqam.c
+ *
+ *
+ * Sequence access method allows the SQL Standard Sequence objects to be
+ * managed according to either the default access method or a pluggable
+ * replacement. Each sequence can only use one access method at a time,
+ * though different sequence access methods can be in use by different
+ * sequences at the same time.
+ *
+ * The SQL Standard assumes that each Sequence object is completely controlled
+ * from the current database node, preventing any form 

Re: [HACKERS] Sequence Access Method WIP

2014-12-04 Thread Jim Nasby

On 12/3/14, 8:50 AM, José Luis Tallón wrote:



May I possibly suggest a file-per-schema model instead? This approach would
certainly solve the excessive i-node consumption problem that --I guess--
Andres is trying to address here.

I don't think that really has any advantages.


Just spreading the I/O load, nothing more, it seems:

Just to elaborate a bit on the reasoning, for completeness' sake:
Given that a relation's segment maximum size is 1GB, we'd have (1048576/8)=128k 
sequences per relation segment.
Arguably, not many real use cases will have that many sequences save for 
*massively* multi-tenant databases.

The downside being that all that random I/O --- in general, it can't really be sequential 
unless there are very very few sequences--- can't be spread to other spindles. Create a 
sequence_default_tablespace GUC + ALTER SEQUENCE SET TABLESPACE, to use an 
SSD for this purpose maybe?


Why not? RAID arrays typically use stripe sizes in the 128-256k range, which 
means only 16 or 32 sequences per stripe.

It still might make sense to allow controlling what tablespace a sequence is 
in, but IMHO the default should just be pg_default.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-12-04 Thread Andres Freund
 May I possibly suggest a file-per-schema model instead? This approach would
 certainly solve the excessive i-node consumption problem that --I guess--
 Andres is trying to address here.
 I don't think that really has any advantages.
 
 Just spreading the I/O load, nothing more, it seems:
 
 Just to elaborate a bit on the reasoning, for completeness' sake:
 Given that a relation's segment maximum size is 1GB, we'd have
 (1048576/8)=128k sequences per relation segment.
 Arguably, not many real use cases will have that many sequences save for
 *massively* multi-tenant databases.
 
 The downside being that all that random I/O --- in general, it can't really
 be sequential unless there are very very few sequences--- can't be spread to
 other spindles. Create a sequence_default_tablespace GUC + ALTER SEQUENCE
 SET TABLESPACE, to use an SSD for this purpose maybe?
  (I could take a shot at the patch, if deemed worthwhile)

I think that's just so far outside the sane usecases that I really don't
see us adding complexity to reign it in. If your frequently used
sequences get flushed out to disk by anything but the checkpointer the
schema is just badly designed...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-12-03 Thread José Luis Tallón

On 12/02/2014 08:21 PM, Andres Freund wrote:

[snip]

2. Instead of the single amdata field, make it possible for the
implementation to define any number of fields with any datatype in the
tuple. That would make debugging, monitoring etc. easier.

My main problem with that approach is that it pretty much nails the door
shut for moving sequences into a catalog table instead of the current,
pretty insane, approach of a physical file per sequence.


Hmm...  having done my fair bit of testing, I can say that this isn't 
actually that bad (though depends heavily on the underlying filesystem 
and workload, of course)
With this approach, I fear extreme I/O contention with an update-heavy 
workload... unless all sequence activity is finally WAL-logged and hence 
writes to the actual files become mostly sequential and asynchronous.


May I possibly suggest a file-per-schema model instead? This approach 
would certainly solve the excessive i-node consumption problem that --I 
guess-- Andres is trying to address here.
Moreover, the one file per schema for sequences solution would fit a 
quite common model of grouping tables (in schemas) for physical 
[tablespace] location purposes

Currently, with
our without seqam, it'd not be all that hard to force it into a catalog,
taking care to to force each tuple onto a separate page...


IMHO, this is jst as wasteful as the current approach (one-page file per 
sequence) in terms of disk usage and complicates the code a bit  but 
I really don't see how we can have more than one sequence state per page 
without severe (page) locking problems.
However, someone with deeper knowledge of page pinning and buffer 
manager internals could certainly devise a better solution...


Just my 2c

Thanks,

/ J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-12-03 Thread Andres Freund
On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:
 On 12/02/2014 08:21 PM, Andres Freund wrote:
 [snip]
 2. Instead of the single amdata field, make it possible for the
 implementation to define any number of fields with any datatype in the
 tuple. That would make debugging, monitoring etc. easier.
 My main problem with that approach is that it pretty much nails the door
 shut for moving sequences into a catalog table instead of the current,
 pretty insane, approach of a physical file per sequence.
 
 Hmm...  having done my fair bit of testing, I can say that this isn't
 actually that bad (though depends heavily on the underlying filesystem and
 workload, of course)
 With this approach, I fear extreme I/O contention with an update-heavy
 workload... unless all sequence activity is finally WAL-logged and hence
 writes to the actual files become mostly sequential and asynchronous.

I don't think the WAL logging would need to change much in comparison to
the current solution. We'd just add the page number to the WAL record.

The biggest advantage would be to require fewer heavyweight locks,
because the pg_sequence one could be a single fastpath lock. Currently
we have to take the sequence's relation lock (heavyweight) and then the
the page level lock (lwlock) for every single sequence used.

 May I possibly suggest a file-per-schema model instead? This approach would
 certainly solve the excessive i-node consumption problem that --I guess--
 Andres is trying to address here.

I don't think that really has any advantages.

 Currently, with
 our without seqam, it'd not be all that hard to force it into a catalog,
 taking care to to force each tuple onto a separate page...
 
 IMHO, this is jst as wasteful as the current approach (one-page file per
 sequence) in terms of disk usage and complicates the code a bit  but I
 really don't see how we can have more than one sequence state per page
 without severe (page) locking problems.

The overhead of a file is much more than wasting the remainder of a
page. Alone the requirement of separate fsyncs and everything is pretty
bothersome. The generated IO patterns are also much worse...

 However, someone with deeper knowledge of page pinning and buffer manager
 internals could certainly devise a better solution...

I think there's pretty much no chance of accepting more than one page
per

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-12-03 Thread Petr Jelinek

On 02/12/14 20:21, Andres Freund wrote:

On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote:

To be clear: I don't think this API is very good for its stated purpose, for
implementing global sequences for use in a cluster. For the reasons I've
mentioned before.  I'd like to see two changes to this proposal:
...
2. Instead of the single amdata field, make it possible for the
implementation to define any number of fields with any datatype in the
tuple. That would make debugging, monitoring etc. easier.


My main problem with that approach is that it pretty much nails the door
shut for moving sequences into a catalog table instead of the current,
pretty insane, approach of a physical file per sequence. Currently, with
our without seqam, it'd not be all that hard to force it into a catalog,
taking care to to force each tuple onto a separate page...



I don't know, I think if we decide to change storage format we can do 
serialization/conversion in seqam layer, it does not seem to matter too 
much if the serialization into some opaque type is done in AM itself or 
by the API layer. Or we can have one relation for all sequences in 
single AM, etc. In general I don't think that the custom columns for AM 
approach prohibits future storage changes.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-12-03 Thread José Luis Tallón

On 12/03/2014 11:24 AM, Andres Freund wrote:

On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:

snip]

I don't think the WAL logging would need to change much in comparison to
the current solution. We'd just add the page number to the WAL record.

The biggest advantage would be to require fewer heavyweight locks,
because the pg_sequence one could be a single fastpath lock. Currently
we have to take the sequence's relation lock (heavyweight) and then the
the page level lock (lwlock) for every single sequence used.


Got it, thank you for the explanation.


May I possibly suggest a file-per-schema model instead? This approach would
certainly solve the excessive i-node consumption problem that --I guess--
Andres is trying to address here.

I don't think that really has any advantages.


Just spreading the I/O load, nothing more, it seems:

Just to elaborate a bit on the reasoning, for completeness' sake:
Given that a relation's segment maximum size is 1GB, we'd have 
(1048576/8)=128k sequences per relation segment.
Arguably, not many real use cases will have that many sequences save 
for *massively* multi-tenant databases.


The downside being that all that random I/O --- in general, it can't 
really be sequential unless there are very very few sequences--- can't 
be spread to other spindles. Create a sequence_default_tablespace GUC 
+ ALTER SEQUENCE SET TABLESPACE, to use an SSD for this purpose maybe?

 (I could take a shot at the patch, if deemed worthwhile)


[snip]

The overhead of a file is much more than wasting the remainder of a
page. Alone the requirement of separate fsyncs and everything is pretty
bothersome. The generated IO patterns are also much worse...


Yes, you are right. I stand corrected.


[snip]
I think there's pretty much no chance of accepting more than one page
per sequence


Definitively.


Thanks,

J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-12-02 Thread Andres Freund
On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote:
 To be clear: I don't think this API is very good for its stated purpose, for
 implementing global sequences for use in a cluster. For the reasons I've
 mentioned before.  I'd like to see two changes to this proposal:
 
 1. Make the AM implementation solely responsible for remembering the last
 value. (if it's a global or remote sequence, the current value might not be
 stored in the local server at all)

I think that reason isn't particularly good. The practical applicability
for such a implementation doesn't seem to be particularly large.

 2. Instead of the single amdata field, make it possible for the
 implementation to define any number of fields with any datatype in the
 tuple. That would make debugging, monitoring etc. easier.

My main problem with that approach is that it pretty much nails the door
shut for moving sequences into a catalog table instead of the current,
pretty insane, approach of a physical file per sequence. Currently, with
our without seqam, it'd not be all that hard to force it into a catalog,
taking care to to force each tuple onto a separate page...


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-24 Thread Heikki Linnakangas

On 11/08/2014 04:21 PM, Simon Riggs wrote:

On 5 November 2014 17:32, Heikki Linnakangas hlinnakan...@vmware.com wrote:


Why does sequence_alloc need the current value? If it's a remote seqam,
the current value is kept in the remote server, and the last value that was
given to this PostgreSQL server is irrelevant.

That irks me with this API. The method for acquiring a new value isn't fully
abstracted behind the AM interface, as sequence.c still needs to track it
itself. That's useful for the local AM, of course, and maybe some others,
but for others it's totally useless.


Please bear in mind what seqam is for...

At present it's only use is to provide Global Sequences. There's a few
ways of doing that, but they all look sorta similar.


Ok.


The only other use we thought of was shot down, so producing the
perfect API isn't likely to help anyone. It's really not worth the
effort to produce a better API.


Your argument seems to be that because this API is only used for 
creating global sequences, it's not worth it to make it better for that 
purpose. That doesn't make much sense, so that's probably not what you 
meant. I'm confused.


To be clear: I don't think this API is very good for its stated purpose, 
for implementing global sequences for use in a cluster. For the reasons 
I've mentioned before.  I'd like to see two changes to this proposal:


1. Make the AM implementation solely responsible for remembering the 
last value. (if it's a global or remote sequence, the current value 
might not be stored in the local server at all)


2. Instead of the single amdata field, make it possible for the 
implementation to define any number of fields with any datatype in the 
tuple. That would make debugging, monitoring etc. easier.



BDR doesn't require Global Sequences, nor are Global Sequences
restricted in their use to just BDR - lots of cluster configurations
would want something like this.


Sure. (I don't see how that's relevant to this discussion, however).

(marking this as Returned with Feedback in the commitfest)

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-09 Thread Heikki Linnakangas

On 11/08/2014 01:57 AM, Petr Jelinek wrote:

My main problem is actually not with having tuple per-seqAM, but more
with the fact that Heikki does not want to have last_value as compulsory
column/parameter. How is the new AM then supposed to know where to pick
up and if it even can pick up?


Call nextval(), and use the value it returns as the starting point for 
the new AM.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-09 Thread Petr Jelinek

On 09/11/14 20:47, Heikki Linnakangas wrote:

On 11/08/2014 01:57 AM, Petr Jelinek wrote:

My main problem is actually not with having tuple per-seqAM, but more
with the fact that Heikki does not want to have last_value as compulsory
column/parameter. How is the new AM then supposed to know where to pick
up and if it even can pick up?


Call nextval(), and use the value it returns as the starting point for
the new AM.



Hah, so simple, works for me.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-08 Thread Craig Ringer
On 11/08/2014 12:35 AM, Petr Jelinek wrote:

 Do you think it'd be simple to provide a blocking, transactional
 sequence allocator via this API? i.e. gapless sequences, much the same
 as typically implemented with SELECT ... FOR UPDATE on a counter table.

 It might be more digestible standalone, and would be a handy contrib/
 example extension demonstrating use of the API.

 
 Yes I think that's doable (once we iron out the API we can agree on).

Cool; I think that'd be an interesting self-contained example that'd
also have real-world uses ... and provide a nice succinct way to answer
those semi-regular how do I create a sequence that doesn't have holes
questions.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-08 Thread Petr Jelinek

On 08/11/14 03:10, Robert Haas wrote:

On Fri, Nov 7, 2014 at 7:26 PM, Petr Jelinek p...@2ndquadrant.com wrote:

My main problem is actually not with having tuple per-seqAM, but more
with the fact that Heikki does not want to have last_value as compulsory
column/parameter. How is the new AM then supposed to know where to pick
up and if it even can pick up?


That seems pretty well impossible to know anyway.  If the pluggable AM
was handing out values at random or in some unpredictable fashion,
there may be no well-defined point where it's safe for the default AM
to resume.  Granted, in the case of replication, it probably is
possible, and maybe that's important enough to be worth catering to.


While this is true, I think 99% of this use-case in practice is about 
converting existing schema with local sequence to some other sequence 
and perhaps fixing schema after people create sequence using wrong AM 
because their default is not what they thought it is.





And obviously, once the last_value is part of the compulsory columns we
again have to WAL log all the time for the use-case which Heikki is using as
model, so it does not help there (just to clear what my point was about).


But I don't know what to do about that.  :-(



Me neither and I don't think this is actually solvable, we either have 
last_value and logcnt as mandatory columns and the central sequence 
server example Heikki is talking about will be impacted by this, or we 
leave those columns to AM implementations and we lose the ability to do 
AM change for majority of cases, and also IMHO most AMs will then have 
to implement their own last_value and logcnt logic anyway.


Honestly, I am still not convinced that the centralized sequence server 
with no local caching is something we should be optimizing for as that 
one will suffer from performance problems anyway. And it can ignore the 
last_value input from postgres if it choses to, so it's not like the 
currently proposed patch forbids implementation of such AMs.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-08 Thread Simon Riggs
On 5 November 2014 17:32, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Why does sequence_alloc need the current value? If it's a remote seqam,
 the current value is kept in the remote server, and the last value that was
 given to this PostgreSQL server is irrelevant.

 That irks me with this API. The method for acquiring a new value isn't fully
 abstracted behind the AM interface, as sequence.c still needs to track it
 itself. That's useful for the local AM, of course, and maybe some others,
 but for others it's totally useless.

Please bear in mind what seqam is for...

At present it's only use is to provide Global Sequences. There's a few
ways of doing that, but they all look sorta similar.

The only other use we thought of was shot down, so producing the
perfect API isn't likely to help anyone. It's really not worth the
effort to produce a better API.

BDR doesn't require Global Sequences, nor are Global Sequences
restricted in their use to just BDR - lots of cluster configurations
would want something like this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-08 Thread Robert Haas
On Sat, Nov 8, 2014 at 6:17 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 Honestly, I am still not convinced that the centralized sequence server with
 no local caching is something we should be optimizing for as that one will
 suffer from performance problems anyway. And it can ignore the last_value
 input from postgres if it choses to, so it's not like the currently proposed
 patch forbids implementation of such AMs.

I can buy that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-07 Thread Alvaro Herrera
Craig Ringer wrote:
 On 11/05/2014 05:01 AM, Petr Jelinek wrote:
  I guess I could port BDR sequences to this if it would help (once we
  have bit more solid agreement that the proposed API at least
  theoretically seems ok so that I don't have to rewrite it 10 times if at
  all possible).
 
 Because the BDR sequences rely on all the other BDR machinery I suspect
 that'd be a pretty big thing to review and follow for someone who
 doesn't know the BDR code.
 
 Do you think it'd be simple to provide a blocking, transactional
 sequence allocator via this API? i.e. gapless sequences, much the same
 as typically implemented with SELECT ... FOR UPDATE on a counter table.

I think that would be a useful contrib module too; gapless allocation is
a frequent user need.  If we can build it in some reasonable way with the
seqam API, it'd show the API is good.

But on the other hand, since BDR sequences are the ultimate goal of this
patch, we need to make sure that they can be made to work with whatever
the seqam API ends up being.  It's not necessary for reviewers here to
review that code, but the BDR developers need to say yes, this API
works for us -- just like the slony developers (well, ssinger) did for
logical decoding.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-07 Thread Petr Jelinek

On 06/11/14 11:22, Craig Ringer wrote:

On 11/05/2014 05:01 AM, Petr Jelinek wrote:

I guess I could port BDR sequences to this if it would help (once we
have bit more solid agreement that the proposed API at least
theoretically seems ok so that I don't have to rewrite it 10 times if at
all possible).


Because the BDR sequences rely on all the other BDR machinery I suspect
that'd be a pretty big thing to review and follow for someone who
doesn't know the BDR code.

Do you think it'd be simple to provide a blocking, transactional
sequence allocator via this API? i.e. gapless sequences, much the same
as typically implemented with SELECT ... FOR UPDATE on a counter table.

It might be more digestible standalone, and would be a handy contrib/
example extension demonstrating use of the API.



Yes I think that's doable (once we iron out the API we can agree on).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-07 Thread Petr Jelinek

On 08/11/14 00:57, Petr Jelinek wrote:

On 08/11/14 00:45, Robert Haas wrote:

On Nov 5, 2014, at 5:43 PM, Petr Jelinek p...@2ndquadrant.com wrote:

I don't see how to make that work with ALTER SEQUENCE USING to be
honest and I do care quite a lot about that use-case (I think the
ability to convert the local sequences to 3rd party ones and back
is very important).


What specific problems do you foresee?  There's an issue if something
depends on one of the added sequence columns, but if that is the case
then you had *better* fail.

I think that the debugability value of making extra sequence columns
human-readable is quite high.



My main problem is actually not with having tuple per-seqAM, but more
with the fact that Heikki does not want to have last_value as compulsory
column/parameter. How is the new AM then supposed to know where to pick
up and if it even can pick up?



And obviously, once the last_value is part of the compulsory columns we 
again have to WAL log all the time for the use-case which Heikki is 
using as model, so it does not help there (just to clear what my point 
was about).



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-07 Thread Robert Haas
On Fri, Nov 7, 2014 at 7:26 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 My main problem is actually not with having tuple per-seqAM, but more
 with the fact that Heikki does not want to have last_value as compulsory
 column/parameter. How is the new AM then supposed to know where to pick
 up and if it even can pick up?

That seems pretty well impossible to know anyway.  If the pluggable AM
was handing out values at random or in some unpredictable fashion,
there may be no well-defined point where it's safe for the default AM
to resume.  Granted, in the case of replication, it probably is
possible, and maybe that's important enough to be worth catering to.

 And obviously, once the last_value is part of the compulsory columns we
 again have to WAL log all the time for the use-case which Heikki is using as
 model, so it does not help there (just to clear what my point was about).

But I don't know what to do about that.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-06 Thread Craig Ringer
On 11/05/2014 05:01 AM, Petr Jelinek wrote:
 I guess I could port BDR sequences to this if it would help (once we
 have bit more solid agreement that the proposed API at least
 theoretically seems ok so that I don't have to rewrite it 10 times if at
 all possible).

Because the BDR sequences rely on all the other BDR machinery I suspect
that'd be a pretty big thing to review and follow for someone who
doesn't know the BDR code.

Do you think it'd be simple to provide a blocking, transactional
sequence allocator via this API? i.e. gapless sequences, much the same
as typically implemented with SELECT ... FOR UPDATE on a counter table.

It might be more digestible standalone, and would be a handy contrib/
example extension demonstrating use of the API.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-05 Thread Heikki Linnakangas

On 11/04/2014 11:01 PM, Petr Jelinek wrote:

On 04/11/14 13:11, Heikki Linnakangas wrote:

On 10/13/2014 01:01 PM, Petr Jelinek wrote:

Only the alloc and reloptions methods are required (and implemented by
the local AM).

The caching, xlog writing, updating the page, etc is handled by backend,
the AM does not see the tuple at all. I decided to not pass even the
struct around and just pass the relevant options because I think if we
want to abstract the storage properly then the AM should not care about
how the pg_sequence looks like at all, even if it means that the
sequence_alloc parameter list is bit long.


Hmm. The division of labour between the seqam and commands/sequence.c
still feels a bit funny. sequence.c keeps track of how many values have
been WAL-logged, and thus usable immediately, but we still call
sequence_alloc even when using up those already WAL-logged values. If
you think of using this for something like a centralized sequence server
in a replication cluster, you certainly don't want to make a call to the
remote server for every value - you'll want to cache them.

With the local seqam, there are two levels of caching. Each backend
caches some values (per the CACHE value option in CREATE SEQUENCE). In
addition to that, the server WAL-logs 32 values at a time. If you have a
remote seqam, it would most likely add a third cache, but it would
interact in strange ways with the second cache.

Considering a non-local seqam, the locking is also a bit strange. The
server keeps the sequence page locked throughout nextval(). But if the
actual state of the sequence is maintained elsewhere, there's no need to
serialize the calls to the remote allocator, i.e. the sequence_alloc()
calls.

I'm not exactly sure what to do about that. One option is to completely
move the maintenance of the current value, i.e. sequence.last_value,
to the seqam. That makes sense from an abstraction point of view. For
example with a remote server managing the sequence, storing the
current value in the local catalog table makes no sense as it's always
going to be out-of-date. The local seqam would store it as part of the
am-private data. However, you would need to move the responsibility of
locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
provide an API that the seqam can call to do that. Perhaps just let the
seqam call heap_inplace_update on the sequence relation.


My idea of how this works is - sequence_next handles the allocation and
the level2 caching (WAL logged cache) via amdata if it supports it, or
returns single value if it doesn't - then the WAL will always just write
the 1 value and there will basically be no level2 cache, since it is the
sequence_next who controls how much will be WAL-logged, what backend
asks for is just suggestion.


Hmm, so the AM might return an nallocated value less than the fetch 
value that sequence.c requested? As the patch stands, wouldn't that make 
sequence.c write a WAL record more often?


In fact, if the seqam manages the current value outside the database 
(e.g. a remote seqam that gets the value from another server), 
nextval() never needs to write a WAL record.



The third level caching as you say should be solved by the
sequence_request_update and sequence_update callback - that will enable
the sequence AM to handle this kind of caching asynchronously without
blocking the sequence_next unnecessarily.

That way it's possible to implement many different strategies, from no
cache at all, to three levels of caching, with and without blocking of
calls to sequence_next.


It's nice that asynchronous operation is possible, but that seems like a 
more advanced feature. Surely an approach where you fetch a bunch of 
values when needed is going to be more common. Let's focus on the simple 
synchronous case first, and make sure the API works well for that.



For the amdata handling (which is the AM's private data variable) the
API assumes that (Datum) 0 is NULL, this seems to work well for
reloptions so should work here also and it simplifies things a little
compared to passing pointers to pointers around and making sure
everything is allocated, etc.

Sadly the fact that amdata is not fixed size and can be NULL made the
page updates of the sequence relation quite more complex that it used to
be.


It would be nice if the seqam could define exactly the columns it needs,
with any datatypes. There would be a set of common attributes:
sequence_name, start_value, cache_value, increment_by, max_value,
min_value, is_cycled. The local seqam would add last_value, log_cnt
and is_called to that. A remote seqam that calls out to some other
server might store the remote server's hostname etc.

There could be a seqam function that returns a TupleDesc with the
required columns, for example.


Wouldn't that somewhat bloat catalog if we had new catalog table for
each sequence AM?


No, that's not what I meant. The number of catalog tables would be the 
same as today. Sequences 

Re: [HACKERS] Sequence Access Method WIP

2014-11-05 Thread Petr Jelinek

On 05/11/14 13:45, Heikki Linnakangas wrote:

On 11/04/2014 11:01 PM, Petr Jelinek wrote:

On 04/11/14 13:11, Heikki Linnakangas wrote:

On 10/13/2014 01:01 PM, Petr Jelinek wrote:

Only the alloc and reloptions methods are required (and implemented by
the local AM).

The caching, xlog writing, updating the page, etc is handled by
backend,
the AM does not see the tuple at all. I decided to not pass even the
struct around and just pass the relevant options because I think if we
want to abstract the storage properly then the AM should not care about
how the pg_sequence looks like at all, even if it means that the
sequence_alloc parameter list is bit long.


Hmm. The division of labour between the seqam and commands/sequence.c
still feels a bit funny. sequence.c keeps track of how many values have
been WAL-logged, and thus usable immediately, but we still call
sequence_alloc even when using up those already WAL-logged values. If
you think of using this for something like a centralized sequence server
in a replication cluster, you certainly don't want to make a call to the
remote server for every value - you'll want to cache them.

With the local seqam, there are two levels of caching. Each backend
caches some values (per the CACHE value option in CREATE SEQUENCE). In
addition to that, the server WAL-logs 32 values at a time. If you have a
remote seqam, it would most likely add a third cache, but it would
interact in strange ways with the second cache.

Considering a non-local seqam, the locking is also a bit strange. The
server keeps the sequence page locked throughout nextval(). But if the
actual state of the sequence is maintained elsewhere, there's no need to
serialize the calls to the remote allocator, i.e. the sequence_alloc()
calls.

I'm not exactly sure what to do about that. One option is to completely
move the maintenance of the current value, i.e. sequence.last_value,
to the seqam. That makes sense from an abstraction point of view. For
example with a remote server managing the sequence, storing the
current value in the local catalog table makes no sense as it's always
going to be out-of-date. The local seqam would store it as part of the
am-private data. However, you would need to move the responsibility of
locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
provide an API that the seqam can call to do that. Perhaps just let the
seqam call heap_inplace_update on the sequence relation.


My idea of how this works is - sequence_next handles the allocation and
the level2 caching (WAL logged cache) via amdata if it supports it, or
returns single value if it doesn't - then the WAL will always just write
the 1 value and there will basically be no level2 cache, since it is the
sequence_next who controls how much will be WAL-logged, what backend
asks for is just suggestion.


Hmm, so the AM might return an nallocated value less than the fetch
value that sequence.c requested? As the patch stands, wouldn't that make
sequence.c write a WAL record more often?



That's correct, that's also why you usually want to have some form of 
local caching if possible.



In fact, if the seqam manages the current value outside the database
(e.g. a remote seqam that gets the value from another server),
nextval() never needs to write a WAL record.


Sure it does, you need to keep the current state in Postgres also, at 
least the current value so that you can pass correct input to 
sequence_alloc(). And you need to do this in crash-safe way so WAL is 
necessary.


I think sequences will cache in amdata if possible for that type of 
sequence and in case where it's not and it's caching on some external 
server then you'll most likely get bigger overhead from network than WAL 
anyway...





For the amdata handling (which is the AM's private data variable) the
API assumes that (Datum) 0 is NULL, this seems to work well for
reloptions so should work here also and it simplifies things a little
compared to passing pointers to pointers around and making sure
everything is allocated, etc.

Sadly the fact that amdata is not fixed size and can be NULL made the
page updates of the sequence relation quite more complex that it
used to
be.


It would be nice if the seqam could define exactly the columns it needs,
with any datatypes. There would be a set of common attributes:
sequence_name, start_value, cache_value, increment_by, max_value,
min_value, is_cycled. The local seqam would add last_value, log_cnt
and is_called to that. A remote seqam that calls out to some other
server might store the remote server's hostname etc.

There could be a seqam function that returns a TupleDesc with the
required columns, for example.


Wouldn't that somewhat bloat catalog if we had new catalog table for
each sequence AM?


No, that's not what I meant. The number of catalog tables would be the
same as today. Sequences look much like any other relation, with entries
in pg_attribute catalog table for all the attributes for each 

Re: [HACKERS] Sequence Access Method WIP

2014-11-05 Thread Heikki Linnakangas

On 11/05/2014 05:07 PM, Petr Jelinek wrote:

On 05/11/14 13:45, Heikki Linnakangas wrote:

In fact, if the seqam manages the current value outside the database
(e.g. a remote seqam that gets the value from another server),
nextval() never needs to write a WAL record.


Sure it does, you need to keep the current state in Postgres also, at
least the current value so that you can pass correct input to
sequence_alloc(). And you need to do this in crash-safe way so WAL is
necessary.


Why does sequence_alloc need the current value? If it's a remote 
seqam, the current value is kept in the remote server, and the last 
value that was given to this PostgreSQL server is irrelevant.


That irks me with this API. The method for acquiring a new value isn't 
fully abstracted behind the AM interface, as sequence.c still needs to 
track it itself. That's useful for the local AM, of course, and maybe 
some others, but for others it's totally useless.



For the amdata handling (which is the AM's private data variable) the
API assumes that (Datum) 0 is NULL, this seems to work well for
reloptions so should work here also and it simplifies things a little
compared to passing pointers to pointers around and making sure
everything is allocated, etc.

Sadly the fact that amdata is not fixed size and can be NULL made the
page updates of the sequence relation quite more complex that it
used to
be.


It would be nice if the seqam could define exactly the columns it needs,
with any datatypes. There would be a set of common attributes:
sequence_name, start_value, cache_value, increment_by, max_value,
min_value, is_cycled. The local seqam would add last_value, log_cnt
and is_called to that. A remote seqam that calls out to some other
server might store the remote server's hostname etc.

There could be a seqam function that returns a TupleDesc with the
required columns, for example.


Wouldn't that somewhat bloat catalog if we had new catalog table for
each sequence AM?


No, that's not what I meant. The number of catalog tables would be the
same as today. Sequences look much like any other relation, with entries
in pg_attribute catalog table for all the attributes for each sequence.
Currently, all sequences have the same set of attributes, sequence_name,
last_value and so forth. What I'm proposing is that there would a set of
attributes that are common to all sequences, but in addition to that
there could be any number of AM-specific attributes.


Oh, that's interesting idea, so the AM interfaces would basically return
updated tuple and there would be some description function that returns
tupledesc.


Yeah, something like that.


I am bit worried that this would kill any possibility of
ALTER SEQUENCE USING access_method. Plus I don't think it actually
solves any real problem - serializing the internal C structs into bytea
is not any harder than serializing them into tuple IMHO.


I agree that serialization to bytea isn't that difficult, but it's still 
nicer to work directly with the correct data types. And it makes the 
internal state easily accessible for monitoring and debugging purposes.



It also does not really solve the amdata being dynamic
size issue.


Yes it would. There would not be a single amdata attribute, but the AM
could specify any number of custom attributes, which could be fixed size
or varlen. It would be solely the AM's responsibility to set the values
of those attributes.



That's not the issue I was referring to, I was talking about the page
replacement code which is not as simple now that we have potentially
dynamic size tuple and if tuples were different for different AMs the
code would still have to be able to handle that case. Setting the values
in tuple itself is not too complicated.


I don't see the problem with that. We deal with variable-sized tuples in 
heap pages all the time. The max size of amdata (or the extra 
AM-specific columns) is going to be determined by the block size, though.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-05 Thread Petr Jelinek

On 05/11/14 18:32, Heikki Linnakangas wrote:

On 11/05/2014 05:07 PM, Petr Jelinek wrote:

On 05/11/14 13:45, Heikki Linnakangas wrote:

In fact, if the seqam manages the current value outside the database
(e.g. a remote seqam that gets the value from another server),
nextval() never needs to write a WAL record.


Sure it does, you need to keep the current state in Postgres also, at
least the current value so that you can pass correct input to
sequence_alloc(). And you need to do this in crash-safe way so WAL is
necessary.


Why does sequence_alloc need the current value? If it's a remote
seqam, the current value is kept in the remote server, and the last
value that was given to this PostgreSQL server is irrelevant.


Hmm, I am not sure if I see this usecase as practical TBH, but I also 
don't see fundamental problem with it.



That irks me with this API. The method for acquiring a new value isn't
fully abstracted behind the AM interface, as sequence.c still needs to
track it itself. That's useful for the local AM, of course, and maybe
some others, but for others it's totally useless.


Hmm, I think that kind of abstraction can only be done by passing 
current tuple and returning updated tuple (yes I realize that it's what 
you have been saying basically).


In general it sounds like the level of abstraction you'd want would be 
one where AM cares about everything except the the code that does the 
actual writes to page and WAL (but when to do those would still be 
controlled completely by AM?) and the SQL interface.
I don't see how to make that work with ALTER SEQUENCE USING to be honest 
and I do care quite a lot about that use-case (I think the ability to 
convert the local sequences to 3rd party ones and back is very important).




That's not the issue I was referring to, I was talking about the page
replacement code which is not as simple now that we have potentially
dynamic size tuple and if tuples were different for different AMs the
code would still have to be able to handle that case. Setting the values
in tuple itself is not too complicated.


I don't see the problem with that. We deal with variable-sized tuples in
heap pages all the time. The max size of amdata (or the extra
AM-specific columns) is going to be determined by the block size, though.



Glad to hear that. Yes the limit is block size, I think we can live with 
that at least for the moment...


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-04 Thread Heikki Linnakangas

On 10/13/2014 01:01 PM, Petr Jelinek wrote:

Hi,

I rewrote the patch with different API along the lines of what was
discussed.


Thanks, that's better.

It would be good to see an alternative seqam to implement this API, to 
see how it really works. The local one is too dummy to expose any 
possible issues.



The API now consists of following functions:
sequence_alloc - allocating range of new values
The function receives the sequence relation, current value, number of
requested values amdata and relevant sequence options like min/max and
returns new amdata, new current value, number of values allocated and
also if it needs wal write (that should be returned if amdata has
changed plus other reasons the AM might have to force the wal update).

sequence_setval - notification that setval is happening
This function gets sequence relation, previous value and new value plus
the amdata and returns amdata (I can imagine some complex sequence AMs
will want to throw error that setval can't be done on them).

sequence_request_update/sequence_update - used for background processing
Basically AM can call the sequence_request_update and backend will then
call the sequence_update method of an AM with current amdata and will
write the updated amdata to disk

sequence_seqparams - function to process/validate the standard sequence
options like start position, min/max, increment by etc by the AM, it's
called in addition to the standard processing

sequence_reloptions - this is the only thing that remained unchanged
from previous patch, it's meant to pass custom options to the AM

Only the alloc and reloptions methods are required (and implemented by
the local AM).

The caching, xlog writing, updating the page, etc is handled by backend,
the AM does not see the tuple at all. I decided to not pass even the
struct around and just pass the relevant options because I think if we
want to abstract the storage properly then the AM should not care about
how the pg_sequence looks like at all, even if it means that the
sequence_alloc parameter list is bit long.


Hmm. The division of labour between the seqam and commands/sequence.c 
still feels a bit funny. sequence.c keeps track of how many values have 
been WAL-logged, and thus usable immediately, but we still call 
sequence_alloc even when using up those already WAL-logged values. If 
you think of using this for something like a centralized sequence server 
in a replication cluster, you certainly don't want to make a call to the 
remote server for every value - you'll want to cache them.


With the local seqam, there are two levels of caching. Each backend 
caches some values (per the CACHE value option in CREATE SEQUENCE). In 
addition to that, the server WAL-logs 32 values at a time. If you have a 
remote seqam, it would most likely add a third cache, but it would 
interact in strange ways with the second cache.


Considering a non-local seqam, the locking is also a bit strange. The 
server keeps the sequence page locked throughout nextval(). But if the 
actual state of the sequence is maintained elsewhere, there's no need to 
serialize the calls to the remote allocator, i.e. the sequence_alloc() 
calls.


I'm not exactly sure what to do about that. One option is to completely 
move the maintenance of the current value, i.e. sequence.last_value, 
to the seqam. That makes sense from an abstraction point of view. For 
example with a remote server managing the sequence, storing the 
current value in the local catalog table makes no sense as it's always 
going to be out-of-date. The local seqam would store it as part of the 
am-private data. However, you would need to move the responsibility of 
locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to 
provide an API that the seqam can call to do that. Perhaps just let the 
seqam call heap_inplace_update on the sequence relation.



For the amdata handling (which is the AM's private data variable) the
API assumes that (Datum) 0 is NULL, this seems to work well for
reloptions so should work here also and it simplifies things a little
compared to passing pointers to pointers around and making sure
everything is allocated, etc.

Sadly the fact that amdata is not fixed size and can be NULL made the
page updates of the sequence relation quite more complex that it used to
be.


It would be nice if the seqam could define exactly the columns it needs, 
with any datatypes. There would be a set of common attributes: 
sequence_name, start_value, cache_value, increment_by, max_value, 
min_value, is_cycled. The local seqam would add last_value, log_cnt 
and is_called to that. A remote seqam that calls out to some other 
server might store the remote server's hostname etc.


There could be a seqam function that returns a TupleDesc with the 
required columns, for example.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] Sequence Access Method WIP

2014-11-04 Thread Petr Jelinek

On 04/11/14 13:11, Heikki Linnakangas wrote:

On 10/13/2014 01:01 PM, Petr Jelinek wrote:

Hi,

I rewrote the patch with different API along the lines of what was
discussed.


Thanks, that's better.

It would be good to see an alternative seqam to implement this API, to
see how it really works. The local one is too dummy to expose any
possible issues.



Yeah, I don't know what that would be, It's hard for me to find 
something that would sever purely demonstration purposes. I guess I 
could port BDR sequences to this if it would help (once we have bit more 
solid agreement that the proposed API at least theoretically seems ok so 
that I don't have to rewrite it 10 times if at all possible).



...
Only the alloc and reloptions methods are required (and implemented by
the local AM).

The caching, xlog writing, updating the page, etc is handled by backend,
the AM does not see the tuple at all. I decided to not pass even the
struct around and just pass the relevant options because I think if we
want to abstract the storage properly then the AM should not care about
how the pg_sequence looks like at all, even if it means that the
sequence_alloc parameter list is bit long.


Hmm. The division of labour between the seqam and commands/sequence.c
still feels a bit funny. sequence.c keeps track of how many values have
been WAL-logged, and thus usable immediately, but we still call
sequence_alloc even when using up those already WAL-logged values. If
you think of using this for something like a centralized sequence server
in a replication cluster, you certainly don't want to make a call to the
remote server for every value - you'll want to cache them.

With the local seqam, there are two levels of caching. Each backend
caches some values (per the CACHE value option in CREATE SEQUENCE). In
addition to that, the server WAL-logs 32 values at a time. If you have a
remote seqam, it would most likely add a third cache, but it would
interact in strange ways with the second cache.

Considering a non-local seqam, the locking is also a bit strange. The
server keeps the sequence page locked throughout nextval(). But if the
actual state of the sequence is maintained elsewhere, there's no need to
serialize the calls to the remote allocator, i.e. the sequence_alloc()
calls.

I'm not exactly sure what to do about that. One option is to completely
move the maintenance of the current value, i.e. sequence.last_value,
to the seqam. That makes sense from an abstraction point of view. For
example with a remote server managing the sequence, storing the
current value in the local catalog table makes no sense as it's always
going to be out-of-date. The local seqam would store it as part of the
am-private data. However, you would need to move the responsibility of
locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
provide an API that the seqam can call to do that. Perhaps just let the
seqam call heap_inplace_update on the sequence relation.



My idea of how this works is - sequence_next handles the allocation and 
the level2 caching (WAL logged cache) via amdata if it supports it, or 
returns single value if it doesn't - then the WAL will always just write 
the 1 value and there will basically be no level2 cache, since it is the 
sequence_next who controls how much will be WAL-logged, what backend 
asks for is just suggestion.


The third level caching as you say should be solved by the 
sequence_request_update and sequence_update callback - that will enable 
the sequence AM to handle this kind of caching asynchronously without 
blocking the sequence_next unnecessarily.


That way it's possible to implement many different strategies, from no 
cache at all, to three levels of caching, with and without blocking of 
calls to sequence_next.



For the amdata handling (which is the AM's private data variable) the
API assumes that (Datum) 0 is NULL, this seems to work well for
reloptions so should work here also and it simplifies things a little
compared to passing pointers to pointers around and making sure
everything is allocated, etc.

Sadly the fact that amdata is not fixed size and can be NULL made the
page updates of the sequence relation quite more complex that it used to
be.


It would be nice if the seqam could define exactly the columns it needs,
with any datatypes. There would be a set of common attributes:
sequence_name, start_value, cache_value, increment_by, max_value,
min_value, is_cycled. The local seqam would add last_value, log_cnt
and is_called to that. A remote seqam that calls out to some other
server might store the remote server's hostname etc.

There could be a seqam function that returns a TupleDesc with the
required columns, for example.



Wouldn't that somewhat bloat catalog if we had new catalog table for 
each sequence AM? It also does not really solve the amdata being dynamic 
size issue. I think just dynamic field where AM stores whatever it 
wants is enough (ie the current solution), it's 

Re: [HACKERS] Sequence Access Method WIP

2014-10-13 Thread Petr Jelinek

Hi,

I rewrote the patch with different API along the lines of what was 
discussed.


The API now consists of following functions:
sequence_alloc - allocating range of new values
The function receives the sequence relation, current value, number of 
requested values amdata and relevant sequence options like min/max and 
returns new amdata, new current value, number of values allocated and 
also if it needs wal write (that should be returned if amdata has 
changed plus other reasons the AM might have to force the wal update).


sequence_setval - notification that setval is happening
This function gets sequence relation, previous value and new value plus 
the amdata and returns amdata (I can imagine some complex sequence AMs 
will want to throw error that setval can't be done on them).


sequence_request_update/sequence_update - used for background processing
Basically AM can call the sequence_request_update and backend will then 
call the sequence_update method of an AM with current amdata and will 
write the updated amdata to disk


sequence_seqparams - function to process/validate the standard sequence 
options like start position, min/max, increment by etc by the AM, it's 
called in addition to the standard processing


sequence_reloptions - this is the only thing that remained unchanged 
from previous patch, it's meant to pass custom options to the AM


Only the alloc and reloptions methods are required (and implemented by 
the local AM).


The caching, xlog writing, updating the page, etc is handled by backend, 
the AM does not see the tuple at all. I decided to not pass even the 
struct around and just pass the relevant options because I think if we 
want to abstract the storage properly then the AM should not care about 
how the pg_sequence looks like at all, even if it means that the 
sequence_alloc parameter list is bit long.


For the amdata handling (which is the AM's private data variable) the 
API assumes that (Datum) 0 is NULL, this seems to work well for 
reloptions so should work here also and it simplifies things a little 
compared to passing pointers to pointers around and making sure 
everything is allocated, etc.


Sadly the fact that amdata is not fixed size and can be NULL made the 
page updates of the sequence relation quite more complex that it used to 
be. There are probably some optimizations possible there but I think the 
patch is good enough for the review now, so I am adding it to October 
commitfest.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index c32088f..aea4a14 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,6 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	= common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS	= common gin gist hash heap index nbtree rmgrdesc spgist transam sequence
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index e0b81b9..c5b7e0a 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -306,6 +306,7 @@ static bool need_initialization = true;
 static void initialize_reloptions(void);
 static void parse_one_reloption(relopt_value *option, char *text_str,
 	int text_len, bool validate);
+static bytea *common_am_reloptions(RegProcedure amoptions, Datum reloptions, bool validate);
 
 /*
  * initialize_reloptions
@@ -806,7 +807,8 @@ untransformRelOptions(Datum options)
  * instead.
  *
  * tupdesc is pg_class' tuple descriptor.  amoptions is the amoptions regproc
- * in the case of the tuple corresponding to an index, or InvalidOid otherwise.
+ * in the case of the tuple corresponding to an index or sequence, InvalidOid
+ * otherwise.
  */
 bytea *
 extractRelOptions(HeapTuple tuple, TupleDesc tupdesc, Oid amoptions)
@@ -839,6 +841,9 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc, Oid amoptions)
 		case RELKIND_INDEX:
 			options = index_reloptions(amoptions, datum, false);
 			break;
+		case RELKIND_SEQUENCE:
+			options = sequence_reloptions(amoptions, datum, false);
+			break;
 		case RELKIND_FOREIGN_TABLE:
 			options = NULL;
 			break;
@@ -1284,13 +1289,31 @@ heap_reloptions(char relkind, Datum reloptions, bool validate)
 
 /*
  * Parse options for indexes.
+ */
+bytea *
+index_reloptions(RegProcedure amoptions, Datum reloptions, bool validate)
+{
+	return common_am_reloptions(amoptions, reloptions, validate);
+}
+
+/*
+ * Parse options for sequences.
+ */
+bytea *
+sequence_reloptions(RegProcedure amoptions, Datum reloptions, bool validate)
+{
+	return common_am_reloptions(amoptions, reloptions, validate);
+}
+
+/*
+ * Parse options for indexes or sequences.
  *
  *	amoptions	Oid of option parser
  *	reloptions	options as text[] 

Re: [HACKERS] Sequence Access Method WIP

2014-09-16 Thread Andres Freund
On 2014-09-15 01:38:52 +0200, Petr Jelinek wrote:
 - int64 minv, maxv, incby, bool is_cycled - these are basically options
 giving info about how the new numbers are allocated (I guess some
 implementations are not going to support all of those)
 - bool is_called - the current built-in sequence generator behaves
 differently based on it and I am not sure we can get over it (it could
 perhaps be done in back-end independently of AM?)

I think we might be able to get rid of is_called entirely. Or at least
get rid of it from the view of the AMs.

 Also makes me think that the seqam options interface should also be passed
 the minv/maxv/incby/is_cycled etc options for validation, not just the
 amoptions.

Sup.

BTW: Is 'is_cycled' a horrible name, or is that just me? Horribly easy
to confuse with the fact that a sequence has already wrapped around...

 And it should return:
 
 int64 value - the first value allocated.
 int nvalues - the number of values allocated.
 am_private - updated private data.
 
 
 There is also more needed than this, you need:
 - int64 value - first value allocated (value to be returned)
 - int64 nvalues - number of values allocated
 - int64 last - last cached value (used for cached/last_value)
 - int64 next - last logged value (used for wal logging)
 - am_private - updated private data, must be possible to return as null

 I personally don't like that we need all the nvalues, next and last as
 it makes the seqam a little bit too aware of the sequence logging internals
 in my opinion but I haven't found a way around it - it's impossible for
 backend to know how the AM will act around incby/maxv/minv/cycling so it
 can't really calculate these values by itself, unless ofcourse we fix the
 behavior and require seqams to behave predictably, but that somewhat breaks
 the whole idea of leaving the allocation to the seqam. Obviously it would
 also work to return list of allocated values and then backend could
 calculate the value, nvalues, last, next from that list by itself,
 but I am worried about performance of that approach.

Yea, it's far from pretty.

I'm not convinced that the AM ever needs to/should care about
caching. To me that's more like a generic behaviour. So it probably
should be abstracted away from the individual AMs.

I think the allocation routine might also need to be able to indicate
whether WAL logging is needed or not.

One thing I want attention to be paid to is that the interface should be
able to support 'gapless' sequences. I.e. where nextval() (and thus
alloc) needs to wait until the last caller to it finished. That very
well can be relevant from the locking *and* WAL logging perspective.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-09-16 Thread Petr Jelinek

On 16/09/14 14:17, Andres Freund wrote:

On 2014-09-15 01:38:52 +0200, Petr Jelinek wrote:


There is also more needed than this, you need:
- int64 value - first value allocated (value to be returned)
- int64 nvalues - number of values allocated
- int64 last - last cached value (used for cached/last_value)
- int64 next - last logged value (used for wal logging)
- am_private - updated private data, must be possible to return as null



I personally don't like that we need all the nvalues, next and last as
it makes the seqam a little bit too aware of the sequence logging internals
in my opinion but I haven't found a way around it - it's impossible for
backend to know how the AM will act around incby/maxv/minv/cycling so it
can't really calculate these values by itself, unless ofcourse we fix the
behavior and require seqams to behave predictably, but that somewhat breaks
the whole idea of leaving the allocation to the seqam. Obviously it would
also work to return list of allocated values and then backend could
calculate the value, nvalues, last, next from that list by itself,
but I am worried about performance of that approach.


Yea, it's far from pretty.

I'm not convinced that the AM ever needs to/should care about
caching. To me that's more like a generic behaviour. So it probably
should be abstracted away from the individual AMs.

I think the allocation routine might also need to be able to indicate
whether WAL logging is needed or not.


Well that means we probably want to return first allocated value, last 
allocated value and then some boolean that tells backend if to wal log 
the sequence or not (number of values allocated does not really seem to 
be important unless I am missing something).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-09-14 Thread Petr Jelinek

On 18/11/13 11:50, Heikki Linnakangas wrote:


I don't think the sequence AM should be in control of 'cached'. The
caching is done outside the AM. And log_cnt probably should be passed to
the _alloc function directly as an argument, ie. the server code asks
the AM to allocate N new values in one call.

I'm thinking that the alloc function should look something like this:

seqam_alloc(Relation seqrel, int nrequested, Datum am_private)



I was looking at this a bit today and what I see is that it's not that 
simple.


Minimum input the seqam_alloc needs is:
- Relation seqrel
- int64 minv, maxv, incby, bool is_cycled - these are basically options 
giving info about how the new numbers are allocated (I guess some 
implementations are not going to support all of those)
- bool is_called - the current built-in sequence generator behaves 
differently based on it and I am not sure we can get over it (it could 
perhaps be done in back-end independently of AM?)

- int64 nrequested - number of requested values
- Datum am_private - current private data

In this light I agree with what Andres wrote - let's just send the whole 
Form_pg_sequence object.


Also makes me think that the seqam options interface should also be 
passed the minv/maxv/incby/is_cycled etc options for validation, not 
just the amoptions.




And it should return:

int64 value - the first value allocated.
int nvalues - the number of values allocated.
am_private - updated private data.



There is also more needed than this, you need:
- int64 value - first value allocated (value to be returned)
- int64 nvalues - number of values allocated
- int64 last - last cached value (used for cached/last_value)
- int64 next - last logged value (used for wal logging)
- am_private - updated private data, must be possible to return as null

I personally don't like that we need all the nvalues, next and 
last as it makes the seqam a little bit too aware of the sequence 
logging internals in my opinion but I haven't found a way around it - 
it's impossible for backend to know how the AM will act around 
incby/maxv/minv/cycling so it can't really calculate these values by 
itself, unless ofcourse we fix the behavior and require seqams to behave 
predictably, but that somewhat breaks the whole idea of leaving the 
allocation to the seqam. Obviously it would also work to return list of 
allocated values and then backend could calculate the value, 
nvalues, last, next from that list by itself, but I am worried 
about performance of that approach.




The backend code handles the caching and logging of values. When it has
exhausted all the cached values (or doesn't have any yet), it calls the
AM's alloc function to get a new batch. The AM returns the new batch,
and updates its private state as necessary. Then the backend code
updates the relation file with the new values and the AM's private data.
WAL-logging and checkpointing is the backend's responsibility.



Agreed here.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-26 Thread Heikki Linnakangas

On 11/24/13 19:15, Simon Riggs wrote:

On 18 November 2013 07:36, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 14.11.2013 22:10, Simon Riggs wrote:


Includes test extension which allows sequences without gaps - gapless.


I realize this is just for demonstration purposes, but it's worth noting
that it doesn't actually guarantee that when you use the sequence to
populate a column in the table, the column would not have gaps. Sequences
are not transactional, so rollbacks will still produce gaps. The
documentation is misleading on that point. Without a strong guarantee, it's
a pretty useless extension.


True.

If I fix that problem, I should change the name to lockup sequences,
since only one transaction at a time could use the nextval.

Should I change the documentation, or just bin the idea?


Just bin it. It would be useful if it could guarantee gaplessness, but I 
don't see how to do that.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-26 Thread Heikki Linnakangas

On 11/25/13 12:00, Simon Riggs wrote:

On 25 November 2013 04:01, Heikki Linnakangas hlinnakan...@vmware.com wrote:


The proposed changes to alloc() would still suffer from all the problems
that I complained about. Adding a new API alongside doesn't help with that.


You made two proposals. I suggested implementing both.

What would you have me do?


Dunno. I do know that the proposed changes to alloc() are not a good 
API. You could implement the other API, I think that has a chance of 
being a cleaner API, but looking at the BDR extension that would use 
that facility, I'm not sure how useful that would be for you. (and I'd 
really like to see an actual implementation of whatever API we come up 
with, before we commit to maintaining it).


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-25 Thread Heikki Linnakangas

On 24.11.2013 19:23, Simon Riggs wrote:

On 18 November 2013 07:06, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 18.11.2013 13:48, Simon Riggs wrote:


On 18 November 2013 07:50, Heikki Linnakangas hlinnakan...@vmware.com
wrote:


It doesn't go far enough, it's still too *low*-level. The sequence AM
implementation shouldn't need to have direct access to the buffer page at
all.



I don't think the sequence AM should be in control of 'cached'. The
caching
is done outside the AM. And log_cnt probably should be passed to the
_alloc
function directly as an argument, ie. the server code asks the AM to
allocate N new values in one call.


I can't see what the rationale of your arguments is. All index Ams
write WAL and control buffer locking etc..


Index AM's are completely responsible for the on-disk structure, while with
the proposed API, both the AM and the backend are intimately aware of the
on-disk representation. Such a shared responsibility is not a good thing in
an API. I would also be fine with going 100% to the index AM direction, and
remove all knowledge of the on-disk layout from the backend code and move it
into the AMs. Then you could actually implement the discussed store all
sequences in a single file change by writing a new sequence AM for it.


I think the way to resolve this is to do both of these things, i.e. a
two level API

1. Implement SeqAM API at the most generic level. Add a nextval() call
as well as alloc()

2. Also implement the proposed changes to alloc()


The proposed changes to alloc() would still suffer from all the problems 
that I complained about. Adding a new API alongside doesn't help with that.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-25 Thread Simon Riggs
On 25 November 2013 04:01, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 The proposed changes to alloc() would still suffer from all the problems
 that I complained about. Adding a new API alongside doesn't help with that.

You made two proposals. I suggested implementing both.

What would you have me do?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-24 Thread Simon Riggs
On 18 November 2013 07:36, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 14.11.2013 22:10, Simon Riggs wrote:

 Includes test extension which allows sequences without gaps - gapless.


 I realize this is just for demonstration purposes, but it's worth noting
 that it doesn't actually guarantee that when you use the sequence to
 populate a column in the table, the column would not have gaps. Sequences
 are not transactional, so rollbacks will still produce gaps. The
 documentation is misleading on that point. Without a strong guarantee, it's
 a pretty useless extension.

True.

If I fix that problem, I should change the name to lockup sequences,
since only one transaction at a time could use the nextval.

Should I change the documentation, or just bin the idea?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-24 Thread Simon Riggs
On 18 November 2013 07:06, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 18.11.2013 13:48, Simon Riggs wrote:

 On 18 November 2013 07:50, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 It doesn't go far enough, it's still too *low*-level. The sequence AM
 implementation shouldn't need to have direct access to the buffer page at
 all.


 I don't think the sequence AM should be in control of 'cached'. The
 caching
 is done outside the AM. And log_cnt probably should be passed to the
 _alloc
 function directly as an argument, ie. the server code asks the AM to
 allocate N new values in one call.


 I can't see what the rationale of your arguments is. All index Ams
 write WAL and control buffer locking etc..


 Index AM's are completely responsible for the on-disk structure, while with
 the proposed API, both the AM and the backend are intimately aware of the
 on-disk representation. Such a shared responsibility is not a good thing in
 an API. I would also be fine with going 100% to the index AM direction, and
 remove all knowledge of the on-disk layout from the backend code and move it
 into the AMs. Then you could actually implement the discussed store all
 sequences in a single file change by writing a new sequence AM for it.

I think the way to resolve this is to do both of these things, i.e. a
two level API

1. Implement SeqAM API at the most generic level. Add a nextval() call
as well as alloc()

2. Also implement the proposed changes to alloc()

So the SeqAM would implement either nextval() or alloc() but not both

global sequences as envisaged for BDR would use a special alloc() call.

I don't think that is too much work, but I want to do this just once...

Thoughts on exact next steps for implementation please?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-18 Thread Heikki Linnakangas

On 15.11.2013 21:00, Simon Riggs wrote:

On 15 November 2013 15:48, Peter Eisentraut pete...@gmx.net wrote:

Also, you set this to returned with feedback in the CF app.  Please
verify whether that was intentional.


Not sure that was me, if so, corrected.


It was me, sorry. I figured this needs such a large restructuring that 
it's not going to be committable in this commitfest. But I'm happy to 
keep the discussion going if you think otherwise.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-18 Thread Heikki Linnakangas

On 15.11.2013 20:21, Andres Freund wrote:

On 2013-11-15 20:08:30 +0200, Heikki Linnakangas wrote:

It's pretty hard to review the this without seeing the other
implementation you're envisioning to use this API. But I'll try:


We've written a distributed sequence implementation against it, so it
works at least for that use case.

While certainly not release worthy, it still might be interesting to
look at.
https://urldefense.proofpoint.com/v1/url?u=http://git.postgresql.org/gitweb/?p%3Dusers/andresfreund/postgres.git%3Ba%3Dblob%3Bf%3Dcontrib/bdr/bdr_seq.c%3Bh%3Dc9480c8021882f888e35080f0e7a7494af37ae27%3Bhb%3Dbdrk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=xGch4oNJbpD%2BKPJECmgw4SLBhytSZLBX7UnkZhtNcpw%3D%0Am=Rbmo%2BDar4PZrQmGH2adz7cCgqRl2%2FXH845YIA1ifS7A%3D%0As=8d287f35070fe7cb586f10b5bfe8664ad29e986b5f2d2286c4109e09f615668d

bdr_sequencer_fill_sequence pre-allocates ranges of values,
bdr_sequence_alloc returns them upon nextval().


Thanks. That pokes into the low-level details of sequence tuples, just 
like I was afraid it would.



I wonder if the level of abstraction is right. The patch assumes that the
on-disk storage of all sequences is the same, so the access method can't
change that.  But then it leaves the details of actually updating the on-disk
block, WAL-logging and all, as the responsibility of the access method.
Surely that's going to look identical in all the seqams, if they all use the
same on-disk format. That also means that the sequence access methods can't
be implemented as plugins, as plugins can't do WAL-logging.


Well, it exposes log_sequence_tuple() - together with the added am
private column of pg_squence that allows to do quite a bit of different
things. I think unless we really implement pluggable RMGRs - which I
don't really see coming - we cannot be radically different.


The proposed abstraction leaks like a sieve. The plugin should not need 
to touch anything but the private amdata field. log_sequence_tuple() is 
way too low-level.


Just as a thought-experiment, imagine that we decided to re-implement 
sequences so that all the sequence values are stored in one big table, 
or flat-file in the data directory, instead of the current 
implementation where we have a one-block relation file for each 
sequence. If the sequence access method API is any good, it should stay 
unchanged. That's clearly not the case with the proposed API.



The comment in seqam.c says that there's a private column reserved for
access method-specific data, called am_data, but that seems to be the only
mention of am_data in the patch.


It's amdata, not am_data :/. Directly at the end of pg_sequence.


Ah, got it.


Stepping back a bit and looking at this problem from a higher level, why 
do you need to hack this stuff into the sequences? Couldn't you just 
define a new set of functions, say bdr_currval() and bdr_nextval(), to 
operate on these new kinds of sequences? You're not making much use of 
the existing sequence infrastructure, anyway, so it might be best to 
just keep the implementation completely separate. If you need it for 
compatibility with applications, you could create facade 
currval/nextval() functions that call the built-in version or the bdr 
version depending on the argument.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-18 Thread Andres Freund
On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
 On 15.11.2013 20:21, Andres Freund wrote:
 Well, it exposes log_sequence_tuple() - together with the added am
 private column of pg_squence that allows to do quite a bit of different
 things. I think unless we really implement pluggable RMGRs - which I
 don't really see coming - we cannot be radically different.
 
 The proposed abstraction leaks like a sieve. The plugin should not need to
 touch anything but the private amdata field. log_sequence_tuple() is way too
 low-level.

Why? It's *less* low level than a good part of other crash recovery code
we have. I have quite some doubts about the layering, but it's imo
pretty sensible to centralize the wal logging code that plugins couldn't
write.

If you look at what e.g the _alloc callback currently gets passed.
a) Relation: Important for metadata like TupleDesc, name etc.
b) SeqTable entry: Important to setup state for cur/lastval
c) Buffer of the tuple: LSN etc.
d) HeapTuple itself: a place to store the tuples changed state

And if you then look at what gets modified in that callback:
Form_pg_sequence-amdata
-is_called
-last_value
-log_cnt
SeqTable-last
SeqTable-cached
SeqTable-last_valid

You need is_called, last_valid because of our current representation of
sequences state (which we could change, to remove that need). The rest
contains values that need to be set depending on how you want your
sequence to behave:
* Amdata is obvious.
* You need log_cnt to influence/remember how big the chunks are you WAL log at
  once are.  Which e.g. you need to control if want a sequence that
  doesn't leak values in crashes
* cached is needed to control the per-backend caching.


 Just as a thought-experiment, imagine that we decided to re-implement
 sequences so that all the sequence values are stored in one big table, or
 flat-file in the data directory, instead of the current implementation where
 we have a one-block relation file for each sequence. If the sequence access
 method API is any good, it should stay unchanged. That's clearly not the
 case with the proposed API.

I don't think we can entirely abstract away the reality that now they are
based on relations and might not be at some later point. Otherwise we'd
have to invent an API that copied all the data that's stored in struct
RelationData. Like name, owner, ...
Which non SQL accessible API we provide has a chance of providing that
level of consistency in the face of fundamental refactoring? I'd say
none.

 Stepping back a bit and looking at this problem from a higher level, why do
 you need to hack this stuff into the sequences? Couldn't you just define a
 new set of functions, say bdr_currval() and bdr_nextval(), to operate on
 these new kinds of sequences?

Basically two things:
a) User interface. For one everyone would need to reimplement the entire
sequence DDL from start. For another it means it's hard to write
(client) code that doesn't depend on a specific implementation.
b) It's not actually easy to get similar semantics in userspace. How
would you emulate the crash-safe but non-transactional semantics of
sequences without copying most of sequence.c? Without writing
XLogInsert()s which we cannot do from a out-of-tree code?

 You're not making much use of the existing sequence infrastructure, anyway, 
 so it might be best to just keep the
 implementation completely separate. If you need it for compatibility with
 applications, you could create facade currval/nextval() functions that call
 the built-in version or the bdr version depending on the argument.

That doesn't get you very far:
a) the default functions created by sequences are pg_catalog
prefixed. So you'd need to hack up the catalog to get your own functions
to be used if you want the application to work transparently. In which
you need to remember the former function because you now cannot call it
normally anymore. Yuck.
b) One would need nearly all of sequence.c again. You need the state
across calls, the locking, the WAL logging, DDL support.  Pretty much
the only thing *not* used would be nextval_internal() and do_setval().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-18 Thread Heikki Linnakangas

On 18.11.2013 11:48, Andres Freund wrote:

On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:

On 15.11.2013 20:21, Andres Freund wrote:

Well, it exposes log_sequence_tuple() - together with the added am
private column of pg_squence that allows to do quite a bit of different
things. I think unless we really implement pluggable RMGRs - which I
don't really see coming - we cannot be radically different.


The proposed abstraction leaks like a sieve. The plugin should not need to
touch anything but the private amdata field. log_sequence_tuple() is way too
low-level.


Why? It's *less* low level than a good part of other crash recovery code
we have. I have quite some doubts about the layering, but it's imo
pretty sensible to centralize the wal logging code that plugins couldn't
write.


It doesn't go far enough, it's still too *low*-level. The sequence AM 
implementation shouldn't need to have direct access to the buffer page 
at all.



If you look at what e.g the _alloc callback currently gets passed.
a) Relation: Important for metadata like TupleDesc, name etc.
b) SeqTable entry: Important to setup state for cur/lastval
c) Buffer of the tuple: LSN etc.
d) HeapTuple itself: a place to store the tuples changed state

And if you then look at what gets modified in that callback:
Form_pg_sequence-amdata
 -is_called
 -last_value
 -log_cnt
SeqTable-last
SeqTable-cached
SeqTable-last_valid

You need is_called, last_valid because of our current representation of
sequences state (which we could change, to remove that need). The rest
contains values that need to be set depending on how you want your
sequence to behave:
* Amdata is obvious.
* You need log_cnt to influence/remember how big the chunks are you WAL log at
   once are.  Which e.g. you need to control if want a sequence that
   doesn't leak values in crashes
* cached is needed to control the per-backend caching.


I don't think the sequence AM should be in control of 'cached'. The 
caching is done outside the AM. And log_cnt probably should be passed to 
the _alloc function directly as an argument, ie. the server code asks 
the AM to allocate N new values in one call.


I'm thinking that the alloc function should look something like this:

seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

And it should return:

int64 value - the first value allocated.
int nvalues - the number of values allocated.
am_private - updated private data.

The backend code handles the caching and logging of values. When it has 
exhausted all the cached values (or doesn't have any yet), it calls the 
AM's alloc function to get a new batch. The AM returns the new batch, 
and updates its private state as necessary. Then the backend code 
updates the relation file with the new values and the AM's private data. 
WAL-logging and checkpointing is the backend's responsibility.



Just as a thought-experiment, imagine that we decided to re-implement
sequences so that all the sequence values are stored in one big table, or
flat-file in the data directory, instead of the current implementation where
we have a one-block relation file for each sequence. If the sequence access
method API is any good, it should stay unchanged. That's clearly not the
case with the proposed API.


I don't think we can entirely abstract away the reality that now they are
based on relations and might not be at some later point. Otherwise we'd
have to invent an API that copied all the data that's stored in struct
RelationData. Like name, owner, ...
Which non SQL accessible API we provide has a chance of providing that
level of consistency in the face of fundamental refactoring? I'd say
none.


I'm not thinking that we'd change sequences to not be relations. I'm 
thinking that we might not want to store the state as a one-page file 
anymore. In fact that was just discussed in the other thread titled 
init_sequence spill to hash table.



b) It's not actually easy to get similar semantics in userspace. How
would you emulate the crash-safe but non-transactional semantics of
sequences without copying most of sequence.c? Without writing
XLogInsert()s which we cannot do from a out-of-tree code?


heap_inplace_update()

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-18 Thread Andres Freund
On 2013-11-18 12:50:21 +0200, Heikki Linnakangas wrote:
 On 18.11.2013 11:48, Andres Freund wrote:
 I don't think the sequence AM should be in control of 'cached'. The caching
 is done outside the AM. And log_cnt probably should be passed to the _alloc
 function directly as an argument, ie. the server code asks the AM to
 allocate N new values in one call.

Sounds sane.

 I'm thinking that the alloc function should look something like this:
 
 seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

I don't think we can avoid giving access to the other columns of
pg_sequence, stuff like increment, limits et all need to be available
for reading, so that'd also need to get passed in. And we need to signal
that am_private can be NULL, otherwise we'd end up with ugly ways to
signal that.
So I'd say to pass in the entire tuple, and return a copy? Alternatively
we can return am_private as a 'struct varlena *', so we can properly
signal empty values.

We also need a way to set am_private from outside
seqam_alloc/setval/... Many of the fancier sequences that people talked
about will need preallocation somewhere in the background. As proposed
that's easy enough to write using log_sequence_tuple(), this way we'd
need something that calls a callback with the appropriate buffer lock
held. So maybe a seqam_update(Relation seqrel, ...) callback that get's
called when somebody executes pg_sequence_update(oid)?

It'd probably a good idea to provide a generic function for checking
whether a new value falls in the boundaries of the sequence's min, max +
error handling.

 I'm not thinking that we'd change sequences to not be relations. I'm
 thinking that we might not want to store the state as a one-page file
 anymore. In fact that was just discussed in the other thread titled
 init_sequence spill to hash table.

Yes, I read and even commented in that thread... But nothing in the
current proposed API would prevent you from going in that direction,
you'd just get passed in a different tuple/buffer.

 b) It's not actually easy to get similar semantics in userspace. How
 would you emulate the crash-safe but non-transactional semantics of
 sequences without copying most of sequence.c? Without writing
 XLogInsert()s which we cannot do from a out-of-tree code?
 
 heap_inplace_update()

That gets the crashsafe part partially (doesn't allow making the tuple
wider than before), but not the caching/stateful part et al. The point
is that you need most of sequence.c again.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-18 Thread Simon Riggs
On 18 November 2013 07:50, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 It doesn't go far enough, it's still too *low*-level. The sequence AM
 implementation shouldn't need to have direct access to the buffer page at
 all.

 I don't think the sequence AM should be in control of 'cached'. The caching
 is done outside the AM. And log_cnt probably should be passed to the _alloc
 function directly as an argument, ie. the server code asks the AM to
 allocate N new values in one call.

I can't see what the rationale of your arguments is. All index Ams
write WAL and control buffer locking etc..

Do you have a new use case that shows why changes should happen? We
can't just redesign things based upon arbitrary decisions about what
things should or should not be possible via the API.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-18 Thread Heikki Linnakangas

On 18.11.2013 13:48, Simon Riggs wrote:

On 18 November 2013 07:50, Heikki Linnakangas hlinnakan...@vmware.com wrote:


It doesn't go far enough, it's still too *low*-level. The sequence AM
implementation shouldn't need to have direct access to the buffer page at
all.



I don't think the sequence AM should be in control of 'cached'. The caching
is done outside the AM. And log_cnt probably should be passed to the _alloc
function directly as an argument, ie. the server code asks the AM to
allocate N new values in one call.


I can't see what the rationale of your arguments is. All index Ams
write WAL and control buffer locking etc..


Index AM's are completely responsible for the on-disk structure, while 
with the proposed API, both the AM and the backend are intimately aware 
of the on-disk representation. Such a shared responsibility is not a 
good thing in an API. I would also be fine with going 100% to the index 
AM direction, and remove all knowledge of the on-disk layout from the 
backend code and move it into the AMs. Then you could actually implement 
the discussed store all sequences in a single file change by writing a 
new sequence AM for it.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-18 Thread Heikki Linnakangas

On 14.11.2013 22:10, Simon Riggs wrote:

Includes test extension which allows sequences without gaps - gapless.


I realize this is just for demonstration purposes, but it's worth noting 
that it doesn't actually guarantee that when you use the sequence to 
populate a column in the table, the column would not have gaps. 
Sequences are not transactional, so rollbacks will still produce gaps. 
The documentation is misleading on that point. Without a strong 
guarantee, it's a pretty useless extension.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-15 Thread Heikki Linnakangas

On 14.11.2013 22:10, Simon Riggs wrote:

On 16 January 2013 00:40, Simon Riggs si...@2ndquadrant.com wrote:


SeqAm allows you to specify a plugin that alters the behaviour for
sequence allocation and resetting, aimed specifically at clustering
systems.

New command options on end of statement allow syntax
   CREATE SEQUENCE foo_seq
   USING globalseq


Production version of this, ready for commit to PG 9.4

Includes test extension which allows sequences without gaps - gapless.

Test using seq_test.psql after creating extension.

No dependencies on other patches.


It's pretty hard to review the this without seeing the other 
implementation you're envisioning to use this API. But I'll try:


I wonder if the level of abstraction is right. The patch assumes that 
the on-disk storage of all sequences is the same, so the access method 
can't change that. But then it leaves the details of actually updating 
the on-disk block, WAL-logging and all, as the responsibility of the 
access method. Surely that's going to look identical in all the seqams, 
if they all use the same on-disk format. That also means that the 
sequence access methods can't be implemented as plugins, as plugins 
can't do WAL-logging.


The comment in seqam.c says that there's a private column reserved for 
access method-specific data, called am_data, but that seems to be the 
only mention of am_data in the patch.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-15 Thread Andres Freund
On 2013-11-15 20:08:30 +0200, Heikki Linnakangas wrote:
 It's pretty hard to review the this without seeing the other
 implementation you're envisioning to use this API. But I'll try:

We've written a distributed sequence implementation against it, so it
works at least for that use case.

While certainly not release worthy, it still might be interesting to
look at.
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/bdr/bdr_seq.c;h=c9480c8021882f888e35080f0e7a7494af37ae27;hb=bdr

bdr_sequencer_fill_sequence pre-allocates ranges of values,
bdr_sequence_alloc returns them upon nextval().

 I wonder if the level of abstraction is right. The patch assumes that the
 on-disk storage of all sequences is the same, so the access method can't
 change that.  But then it leaves the details of actually updating the on-disk
 block, WAL-logging and all, as the responsibility of the access method.
 Surely that's going to look identical in all the seqams, if they all use the
 same on-disk format. That also means that the sequence access methods can't
 be implemented as plugins, as plugins can't do WAL-logging.

Well, it exposes log_sequence_tuple() - together with the added am
private column of pg_squence that allows to do quite a bit of different
things. I think unless we really implement pluggable RMGRs - which I
don't really see coming - we cannot be radically different.

 The comment in seqam.c says that there's a private column reserved for
 access method-specific data, called am_data, but that seems to be the only
 mention of am_data in the patch.

It's amdata, not am_data :/. Directly at the end of pg_sequence.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-15 Thread Simon Riggs
On 15 November 2013 15:08, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 I wonder if the level of abstraction is right.

That is the big question and not something to shy away from.

What is presented is not the first thought, by a long way. Andres'
contribution to the patch is mainly around this point, so the seq am
is designed with the needs of the main use case in mind.

I'm open to suggested changes but I would say that practical usage
beats changes suggested for purity.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2013-11-15 Thread Peter Eisentraut
On 11/14/13, 3:10 PM, Simon Riggs wrote:
 On 16 January 2013 00:40, Simon Riggs si...@2ndquadrant.com wrote:
 
 SeqAm allows you to specify a plugin that alters the behaviour for
 sequence allocation and resetting, aimed specifically at clustering
 systems.

 New command options on end of statement allow syntax
   CREATE SEQUENCE foo_seq
   USING globalseq
 
 Production version of this, ready for commit to PG 9.4

This patch doesn't apply anymore.

Also, you set this to returned with feedback in the CF app.  Please
verify whether that was intentional.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >