Re: [HACKERS] TABLESAMPLE patch

2015-04-22 Thread Petr Jelinek

On 19/04/15 01:24, Michael Paquier wrote:

On Sat, Apr 18, 2015 at 8:38 PM, Michael Paquier
michael.paqu...@gmail.com wrote:

On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote:

On 10/04/15 06:46, Michael Paquier wrote:

13) Some regression tests with pg_tablesample_method would be welcome.


Not sure what you mean by that.


I meant a sanity check on pg_tablesample_method to be sure that
tsminit, tsmnextblock and tsmnexttuple are always defined as they are
mandatory functions. So the idea is to add a query like the following and to be
sure that it returns no rows:
SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR
tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL;


Yesterday was a long day. I meant IS NULL and not IS NOT NULL, but I
am sure you guessed it that way already..



Yes, I guessed that and it's a very reasonable request. I guess it should 
look like the attached (I don't want to send a new version of everything 
just for this).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From d059c792a1e864e38f321cea51169ea0b3c5caab Mon Sep 17 00:00:00 2001
From: Petr Jelinek pjmo...@pjmodos.net
Date: Wed, 22 Apr 2015 21:28:29 +0200
Subject: [PATCH 7/7] tablesample: add catalog regression test

---
 src/test/regress/expected/tablesample.out | 15 +++
 src/test/regress/sql/tablesample.sql  | 13 +
 2 files changed, 28 insertions(+)

diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
index 271638d..04e5eb8 100644
--- a/src/test/regress/expected/tablesample.out
+++ b/src/test/regress/expected/tablesample.out
@@ -209,6 +209,21 @@ SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5);
 ERROR:  syntax error at or near TABLESAMPLE
 LINE 1: ...CT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPL...
  ^
+-- catalog sanity
+SELECT *
+FROM pg_tablesample_method
+WHERE tsminit IS NULL
+   OR tsmseqscan IS NULL
+   OR tsmpagemode IS NULL
+   OR tsmnextblock IS NULL
+   OR tsmnexttuple IS NULL
+   OR tsmend IS NULL
+   OR tsmreset IS NULL
+   OR tsmcost IS NULL;
+ tsmname | tsmseqscan | tsmpagemode | tsminit | tsmnextblock | tsmnexttuple | tsmexaminetuple | tsmend | tsmreset | tsmcost 
+-++-+-+--+--+-++--+-
+(0 rows)
+
 -- done
 DROP TABLE test_tablesample CASCADE;
 NOTICE:  drop cascades to 2 other objects
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
index 2f4b7de..7b3eb9b 100644
--- a/src/test/regress/sql/tablesample.sql
+++ b/src/test/regress/sql/tablesample.sql
@@ -57,5 +57,18 @@ SELECT * FROM query_select TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
 
 SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5);
 
+-- catalog sanity
+
+SELECT *
+FROM pg_tablesample_method
+WHERE tsminit IS NULL
+   OR tsmseqscan IS NULL
+   OR tsmpagemode IS NULL
+   OR tsmnextblock IS NULL
+   OR tsmnexttuple IS NULL
+   OR tsmend IS NULL
+   OR tsmreset IS NULL
+   OR tsmcost IS NULL;
+
 -- done
 DROP TABLE test_tablesample CASCADE;
-- 
1.9.1




Re: [HACKERS] Sequence Access Method WIP

2015-04-22 Thread Petr Jelinek

On 20/04/15 17:50, Heikki Linnakangas wrote:

On 03/15/2015 09:07 PM, Petr Jelinek wrote:

Slightly updated version of the patch.

Mainly rebased against current master (there were several conflicts) and
fixed some typos, no real functional change.

I also attached initial version of the API sgml doc.


I finally got around to having another round of review on this. I fixed a
couple of little bugs, did some minor editing of comments, etc. See
attached. It is also available in my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch seqam,
if you want to look at individual changes. It combines your patches 1
and 4, I think those need to be applied together. I haven't looked at
the DDL changes yet.


Thanks!



I'm fairly happy with the alloc API now. I'm not sure it's a good idea
for the AM to access the sequence tuple directly, though. It would seem
cleaner if it only manipulated the amdata Datum. But it's not too bad as
it is.


Yeah, I was thinking about this myself, I just liked sending 10 
parameters to the function even less than this.




The division of labour between sequence.c and the AM, in the init and
the get/set_state functions, is a bit more foggy:

* Do we really need a separate amoptions() method and an init() method,
when the only caller to amoptions() is just before the init() method?
The changes in extractRelOptions suggest that it would call
am_reloptions for sequences too, but that doesn't actually seem to be
happening.


Hmm, yes it should, and I am sure it did at some point; I must have messed 
it up during one of the rebases :(


And that's the reason why we need a separate API function.



postgres=# create sequence fooseq using local with (garbage=option);
CREATE SEQUENCE

Invalid options probably should throw an error.

* Currently, the AM's init function is responsible for basic sanity
checking, like min < max. It also has to extract the RESTART value from
the list of options. I think it would make more sense to move that to
sequence.c. The AM should only be responsible for initializing the
'amdata' portion, and for handling any AM-specific options. If the AM
doesn't support some options, like MIN and MAX value, it can check them
and throw an error, but it shouldn't be responsible for just passing
them through from the arguments to the sequence tuple.


Well then we need to send restart as an additional parameter to the init 
function, as restart is normally not stored anywhere.


The checking is actually whether the new value is within min/max, but yes, 
that can be done inside sequence.c I guess.




* It might be better to form the sequence tuple before calling the init
function, and pass the ready-made tuple to it. All the other API
functions deal with the tuple (after calling sequence_read_tuple), so
that would be more consistent. The init function would need to
deconstruct it to modify it, but we're not concerned about performance
here.


Right, this is actually more of a relic of the custom columns 
implementation where I didn't want to build the tuple with NULLs for 
columns that might have been specified as NOT NULL, but with the amdata 
approach it can create the tuple safely.




* The transformations of the arrays in get_state() and set_state()
functions are a bit complicated. The seqam_get_state() function returns
two C arrays, and pg_sequence_get_state() turns them into a text[]
array. Why not construct the text[] array directly in the AM? I guess
it's a bit more convenient to the AM if pg_sequence_get_state() does
that, but if that's an issue, you could create a generic helper function
to turn two C string arrays into text[], and call that from the AM.


Yeah, that was exactly the reasoning. A helper function works for me (I will 
check what Álvaro suggested, maybe it can be moved somewhere and reused).
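
For illustration, a minimal sketch of what such a helper could look like (the
function name and the flat key/value layout are assumptions for the example,
not part of the patch), built on the existing construct_array() machinery:

/* Needs utils/array.h, utils/builtins.h and catalog/pg_type.h. */
static ArrayType *
strings_to_text_array(const char **keys, const char **values, int count)
{
	Datum	   *elems = palloc(sizeof(Datum) * count * 2);
	int			i;

	/* Interleave the two arrays as key1, value1, key2, value2, ... */
	for (i = 0; i < count; i++)
	{
		elems[2 * i] = CStringGetTextDatum(keys[i]);
		elems[2 * i + 1] = CStringGetTextDatum(values[i]);
	}

	/* text is a varlena type: elmlen -1, not by-value, 'i' alignment */
	return construct_array(elems, count * 2, TEXTOID, -1, false, 'i');
}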





seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));

for (i = 0; i < count; i++)
{
	if (pg_strcasecmp(keys[i], "last_value") == 0)
		seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
											CStringGetDatum(values[i])));
	else if (pg_strcasecmp(keys[i], "is_called") == 0)
		seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
											CStringGetDatum(values[i])));
	else
		ereport(ERROR,
				(errcode(ERRCODE_SYNTAX_ERROR),
				 errmsg("invalid state key \"%s\" for local sequence",
						keys[i])));
}

sequence_save_tuple(seqh, NULL, true);


If that error happens after having already processed a last_value or
is_called entry, you have already modified the on-disk tuple. AFAICS
that's the only instance of that bug, but the sequence_read_tuple ->
modify tuple in place -> sequence_save_tuple pattern is quite unsafe in
general. If you modify a tuple directly in a Buffer, you should have a
critical section around it. It would make sense to start a critical
section in sequence_read_tuple(), except that not all callers want to
modify the tuple in place
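
For illustration, a minimal sketch of the validate-first pattern being
suggested, reusing the names from the quoted code (whether
sequence_save_tuple() itself belongs inside the critical section depends on
how it does its WAL logging, so treat the exact boundaries as an assumption):

int64	last_value = seq->last_value;
bool	is_called = seq->is_called;

/* First pass: parse and validate everything; throwing an error here is safe. */
for (i = 0; i < count; i++)
{
	if (pg_strcasecmp(keys[i], "last_value") == 0)
		last_value = DatumGetInt64(DirectFunctionCall1(int8in,
										CStringGetDatum(values[i])));
	else if (pg_strcasecmp(keys[i], "is_called") == 0)
		is_called = DatumGetBool(DirectFunctionCall1(boolin,
										CStringGetDatum(values[i])));
	else
		ereport(ERROR,
				(errcode(ERRCODE_SYNTAX_ERROR),
				 errmsg("invalid state key \"%s\" for local sequence",
						keys[i])));
}

/* Only now touch the buffer; nothing in this section may elog(ERROR). */
START_CRIT_SECTION();
seq->last_value = last_value;
seq->is_called = is_called;
sequence_save_tuple(seqh, NULL, true);
END_CRIT_SECTION();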

Re: [HACKERS] Replication identifiers, take 4

2015-04-22 Thread Petr Jelinek

On 21/04/15 22:36, Andres Freund wrote:

On 2015-04-21 16:26:08 -0400, Robert Haas wrote:

On Tue, Apr 21, 2015 at 8:08 AM, Andres Freund and...@anarazel.de wrote:

I've now named the functions:

* pg_replication_origin_create
* pg_replication_origin_drop
* pg_replication_origin_get (map from name to id)
* pg_replication_progress_setup_origin : configure session to replicate
   from a specific origin
* pg_replication_progress_reset_origin
* pg_replication_progress_setup_tx_details : configure per transaction
   details (LSN and timestamp currently)
* pg_replication_progress_is_replaying : Is an origin configured for the session
* pg_replication_progress_advance : manually set the replication
   progress to a value. Primarily useful for copying values from other
   systems and such.
* pg_replication_progress_get : How far did replay progress for a
   certain origin
* pg_get_replication_progress : SRF returning the replay progress for
   all origins.

Any comments?


Why are we using functions for this rather than DDL?


Unless I miss something, the only two we really could use DDL for are
pg_replication_origin_create/pg_replication_origin_drop. We could use
DDL for them if we really want, but I'm not really seeing the advantage.



I think the only value of having DDL for this would be consistency 
(catalog objects are created via DDL) as it looks like something that 
will be called only by extensions and not users during normal operation.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Sequence Access Method WIP

2015-04-20 Thread Petr Jelinek

On 20/04/15 12:05, Andres Freund wrote:

On 2015-04-20 12:49:39 +0300, Heikki Linnakangas wrote:

With the patch, the pg_class.relam column references the pg_seqam table for
sequences, but pg_indexam for indexes. I believe it's the first instance
where we reuse a foreign key column like that. It's not a real foreign
key, of course - that wouldn't work with a real foreign key at all - but
it's a bit strange. That makes me a bit uncomfortable. How do others feel
about that?


Hm. I'd modeled it more as an extension of the 'relkind' column
mentally. I.e. it further specifies how exactly the relation is
behaving. Given that the field has been added to pg_class and not
pg_index, combined with it not having index in its name, makes me think
that it actually was intended to be used the way it's done in the patch.



That's how I think about it too. It's also short for "access method"; 
nothing really suggests to me that it should be index-specific by design.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Replication identifiers, take 4

2015-04-17 Thread Petr Jelinek

On 17/04/15 22:36, Simon Riggs wrote:


I said that IMO the difference in WAL size is so small that we
should just use 4-byte OIDs for the replication identifiers, instead
of trying to make do with 2 bytes. Not because I find it too likely
that you'll run out of IDs (although it could happen), but more to
make replication IDs more like all other system objects we have.
Andreas did some pgbench benchmarking to show that the difference in
WAL size is about 10%. The WAL records generated by pgbench happen
to have just the right sizes so that the 2-3 extra bytes bump them
over to the next alignment boundary. That's why there is such a big
difference - on average it'll be less. I think that's acceptable,
Andreas seems to think otherwise. But if the WAL size really is so
precious, we could remove the two padding bytes from XLogRecord,
instead of dedicating them for the replication ids. That would be an
even better use for them.


The argument to move to 4 bytes is a poor one. If it was reasonable in
terms of code or cosmetic value then all values used in the backend
would be 4 bytes. We wouldn't have any 2 byte values anywhere. But we
don't do that.

The change does nothing useful, since I doubt anyone will ever
need > 32768 nodes in their cluster.



And if they did, there would be other, much bigger problems than the 
replication identifier being 16-bit (the limit is actually 65534 as it's 
unsigned, btw).


Considering the importance and widespread use of replication I think we 
should really make sure the related features have small overhead.



Increasing WAL size for any non-zero amount is needlessly wasteful for a
change with only cosmetic value. But for a change that has significant
value for database resilience, it is a sensible use of bytes.

+1 to Andres' very reasonable suggestion. Lets commit this and go home.



+1

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Replication identifiers, take 4

2015-04-11 Thread Petr Jelinek

On 10/04/15 18:03, Andres Freund wrote:

On 2015-04-07 17:08:16 +0200, Andres Freund wrote:

I'm starting benchmarks now.


What I'm benchmarking here is the WAL overhead, since that's what we're
debating.

The test setup I used was a pgbench scale 10 instance. I've run with
full_page_write=off to have more reproducible results. This of course
over-emphasizes the overhead a bit. But for a long checkpoint interval
and a memory resident working set it's not that unrealistic.

I ran 50k transactions in a single b
baseline:
- 20445024
- 20437128
- 20436864
- avg: 20439672
extern 2byte identifiers:
- 23318368
- 23148648
- 23128016
- avg: 23198344
- avg overhead: 13.5%
padding 2byte identifiers:
- 21160408
- 21319720
- 21164280
- avg: 21214802
- avg overhead: 3.8%
extern 4byte identifiers:
- 23514216
- 23540128
- 23523080
- avg: 23525808
- avg overhead: 15.1%

To me that shows pretty clearly that a) reusing the padding is
worthwhile, and b) even without that, using 2-byte instead of 4-byte
identifiers is beneficial.



My opinion is that a 10% difference in WAL size is quite a high price to 
pay just so that we can keep the padding for some other, yet unknown 
feature that hasn't come up in several years and would need those 2 bytes.


But if we are willing to pay it then we can really go all the way and 
just use Oids...


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] TABLESAMPLE patch

2015-04-10 Thread Petr Jelinek

On 10/04/15 21:26, Peter Eisentraut wrote:

On 4/9/15 8:58 PM, Petr Jelinek wrote:

Well, you can have two approaches to this: either allow some specific
set of keywords that can be used to specify the limit, or you let sampling
methods interpret the parameters; I believe the latter is more flexible.
There is nothing stopping somebody from writing a sampling method which takes
the limit as a number of rows, or anything else.

Also for example for BERNOULLI to work correctly you'd need to convert
the number of rows to fraction of table anyway (and that's exactly what
the one database which has this feature does internally) and then it's
no different than passing (SELECT 100/reltuples*number_of_rows FROM
tablename) as a parameter.


What is your intended use case for this feature?  I know that "give me
100 random rows from this table quickly" is a common use case, but
that's kind of cumbersome if you need to apply formulas like that.  I'm
not sure what the use of a percentage is.  Presumably, the main use of
this features is on large tables.  But then you might not even know how
large it really is, and even saying 0.1% might be more than you wanted
to handle.



My main intended use-case is analytics on very big tables. The 
percentages of population vs confidence levels are pretty well mapped 
there and you can get quite big speedups if you are fine with getting 
results with slightly smaller confidence (ie you care about ballpark 
figures).


But this was not really my point; BERNOULLI just does not work well 
with a row limit by definition, since it applies the probability to each 
individual row. While you can get the probability from a percentage very 
easily (just divide by 100), to get it for a specific target number of rows 
you have to know the total number of source rows, and that's not something 
we can do very accurately, so you won't get 500 rows but approximately 500 rows.
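
For illustration, the conversion being described, with a hypothetical table
name (the sub-select-as-argument form is the one suggested upthread, so
whether the grammar accepts it is an assumption of this sketch):

-- Percentage for a target of roughly 500 rows, from the planner's estimate:
SELECT 100.0 * 500 / greatest(reltuples, 1) AS sample_percent
FROM pg_class
WHERE oid = 'big_table'::regclass;

-- Used directly as the BERNOULLI argument:
SELECT *
FROM big_table
TABLESAMPLE BERNOULLI (
    (SELECT 100.0 * 500 / greatest(reltuples, 1)
     FROM pg_class WHERE oid = 'big_table'::regclass)
);

Because reltuples is itself only an estimate, the result is still only
approximately 500 rows, which is the point being made above.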


In any case, for "give me 500 somewhat random rows from the table quickly" 
you probably want SYSTEM sampling anyway, as it will be orders of 
magnitude faster on big tables, and yes, even 0.1% might be more than you 
wanted in that case. I am not against having row-limit input for methods 
which can work with it, like SYSTEM, but that's easily doable by 
adding a separate sampling method which accepts rows (even if the sampling 
algorithm itself is the same). In the current approach all you'd have to do is 
write a different init function for the sampling method and register it 
under a new name (yes, it won't be named SYSTEM but for example 
SYSTEM_ROWLIMIT).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] TABLESAMPLE patch

2015-04-10 Thread Petr Jelinek

On 10/04/15 22:16, Tomas Vondra wrote:



On 04/10/15 21:57, Petr Jelinek wrote:

On 10/04/15 21:26, Peter Eisentraut wrote:

But this was not really my point, the BERNOULLI just does not work
well with row-limit by definition, it applies probability on each
individual row and while you can get probability from percentage very
easily (just divide by 100), to get it for specific target number of
rows you have to know total number of source rows and that's not
something we can do very accurately so then you won't get 500 rows
but approximately 500 rows.


It's actually even trickier. Even if you happen to know the exact number
of rows in the table, you can't just convert that into a percentage like
that and use it for BERNOULLI sampling. It may give you a different number
of result rows, because each row is sampled independently.

That is why we have Vitter's algorithm for reservoir sampling, which
works very differently from BERNOULLI.



Hmm, this actually gives me an idea - perhaps we could expose Vitter's 
reservoir sampling as another sampling method for people who want the 
"give me 500 rows from the table fast" behaviour? We already have it 
implemented, it's just a matter of adding the glue.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] TABLESAMPLE patch

2015-04-09 Thread Petr Jelinek

On 09/04/15 11:37, Simon Riggs wrote:

On 9 April 2015 at 04:52, Simon Riggs si...@2ndquadrant.com wrote:


TABLESAMPLE BERNOULLI could work in this case, or any other non-block
based sampling mechanism. Whether it does work yet is another matter.

This query should be part of the test suite and should generate a
useful message or work correctly.


The SQL Standard does allow the WITH query given. It makes no mention
of the obvious point that SYSTEM-defined mechanisms might not work,
but that is for the implementation to define, AFAICS.


Yes, the SQL Standard allows this, and the reason why they don't define what 
happens with SYSTEM is that they actually don't define how SYSTEM should 
behave, except that it should return approximately the given percentage of 
rows; the actual behavior is left to the DBMS. The reason why other 
databases like MSSQL or DB2 have chosen this to be block sampling is that it 
makes the most sense (and is fastest) on tables, and those databases don't 
support TABLESAMPLE on anything else at all.




On balance, in this release, I would be happier to exclude sampled
results from queries, and only allow sampling against base tables.



I think so too, considering how late in the last CF we are. Especially 
given my note about MSSQL and DB2 above.


In any case I don't see any fundamental issues with extending the 
current implementation with subquery support. I think most of the 
work there is actually in the parser/analyzer and planner. The sampling 
methods will just not receive the request for the next blockid and the 
tupleid from that block when the source of the data is a subquery, and if 
they want to support a subquery as the source of sampling they will have to 
provide the examinetuple interface (which is already there and optional; 
the test/example custom sampling method is using it).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Petr Jelinek

On 06/04/15 12:33, Amit Kapila wrote:

On Sat, Apr 4, 2015 at 8:25 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 
  Yes I want extensibility here. And I think the tablesample method
arguments are same thing as function arguments given that in the end
they are arguments for the init function of tablesampling method.
 
  I would be ok with just expr_list, naming parameters here isn't
overly important, but ability to have different types and numbers of
parameters for custom TABLESAMPLE methods *is* important.
 

Yeah, named parameters is one reason which I think won't
be required for sample methods and neither the same is
mentioned in docs (if user has to use, what is the way he
can pass the same) and another is number of arguments
for sampling methods which is currently seems to be same
as FUNC_MAX_ARGS, I think that is sufficient but do we
want to support that many arguments for sampling method.



I think I'll go with simple list of a_exprs. The reason for that is that 
I can foresee sampling methods that need multiple parameters, but I 
don't think naming them is very important. Also adding support for 
naming parameters can be done in the future if we decide so without 
breaking compatibility. Side benefit is that it makes hinting about what 
is wrong with input somewhat easier.


I don't think we need to come up with a different limit from 
FUNC_MAX_ARGS. I don't think any sampling method would need that many 
parameters, but I also don't see what an additional smaller limit would give us.




 
  2.
  postgres=# explain update test_tablesample TABLESAMPLE system(30) set id
  = 2;
  ERROR:  syntax error at or near TABLESAMPLE
  LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...
 
  postgres=# Delete from test_tablesample TABLESAMPLE system(30);
  ERROR:  syntax error at or near TABLESAMPLE
  LINE 1: Delete from test_tablesample TABLESAMPLE system(30);
 
  Isn't TABLESAMPLE clause suppose to work with Update/Delete
  statements?
 
 
  It's supported in the FROM part of UPDATE and USING part of DELETE. I
think that that's sufficient.
 

But I think the Update on target table with sample scan is
supported via views which doesn't seem to be the right thing
in case you just want to support it via FROM/USING, example

postgres=# create view vw_test As select * from test_tablesample
TABLESAMPLE system(30);
postgres=# explain update vw_test set id = 4;
 QUERY PLAN
---
  Update on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
   ->  Sample Scan on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
(2 rows)



Right, I'll make those views not auto-updatable.



  Standard is somewhat useless for UPDATE and DELETE as it only defines
quite limited syntax there. From what I've seen when doing research
MSSQL also only supports it in their equivalent of FROM/USING list,
Oracle does not seem to support their SAMPLING clause outside of SELECTs
at all and if I got the cryptic DB2 manual correctly I think they don't
support it outside of (sub)SELECTs either.
 

By the way, what is the usecase to support sample scan in
Update or Delete statement?



Well, for the USING/FROM part the use case is the same as for SELECT - 
providing a sample of the data for the query (it can also be useful for 
getting pseudo-random rows fast). And if we didn't support it, it could 
still be done using a sub-select, so why not have it directly.
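
For illustration, a hypothetical example of the supported placement (table
and column names made up): the sample applies to the table in USING, not to
the DELETE target itself.

DELETE FROM orders
USING customers TABLESAMPLE SYSTEM (10)
WHERE orders.customer_id = customers.id;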



Also, isn't it better to mention it in the docs for Update and
Delete in case we are going to support the tablesample clause
for them?



Most of the other clauses that we support in FROM are not mentioned in the 
UPDATE/DELETE docs; both of those commands just say something like 
"refer to the SELECT FROM docs for more info". Do you think TABLESAMPLE 
deserves special treatment in this regard?



 
  7.
  ParseTableSample()
  {
  ..
  arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
  ..
  }
 
  What is the reason for coercing value of REPEATABLE clause to INT4OID
  when numeric value is expected for the clause.  If user gives the
  input value as -2.3, it will generate a seed which doesn't seem to
  be right.
 
 
  Because the REPEATABLE is numeric expression so it can produce
whatever number but we need int4 internally (well float4 would also work
just the code would be slightly uglier).
 

Okay, I understand that part. Here the real point is why not just expose
it as int4 to the user rather than telling in the docs that it is numeric
when we actually neither expect nor use it as numeric.

Even Oracle supports it as an int, with the below description:
"The seed_value must be an integer between 0 and 4294967295"


Well, the thing with the SQL Standard's "numeric value expression" is that 
it actually does not mean the numeric data type, it's just a simple arithmetic 
expression with some rules given by the standard, but the output data 
type can be either implementation

Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Petr Jelinek

On 06/04/15 15:07, Amit Kapila wrote:

On Mon, Apr 6, 2015 at 5:56 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 
  On 06/04/15 12:33, Amit Kapila wrote:
 
 
  But I think the Update on target table with sample scan is
  supported via views which doesn't seem to be the right thing
  in case you just want to support it via FROM/USING, example
 
  postgres=# create view vw_test As select * from test_tablesample
  TABLESAMPLE system(30);
  postgres=# explain update vw_test set id = 4;
   QUERY PLAN
 
---
Update on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
  ->  Sample Scan on test_tablesample  (cost=0.00..4.04 rows=4
width=210)
  (2 rows)
 
 
  Right, I'll make those views not auto-updatable.
 
 
Standard is somewhat useless for UPDATE and DELETE as it only defines
  quite limited syntax there. From what I've seen when doing research
  MSSQL also only supports it in their equivalent of FROM/USING list,
  Oracle does not seem to support their SAMPLING clause outside of SELECTs
  at all and if I got the cryptic DB2 manual correctly I think they don't
  support it outside of (sub)SELECTs either.
   
 
  By the way, what is the usecase to support sample scan in
  Update or Delete statement?
 
 
  Well for the USING/FROM part the use-case is same as for SELECT -
providing sample of the data for the query (it can be useful also for
getting pseudo random rows fast). And if we didn't support it, it could
still be done using sub-select so why not have it directly.
 

I can understand why someone wants to read sample data via
SELECT, but I am not clearly able to understand why someone wants
to Update or Delete random data in a table, and if there is a valid
case, then why only based on sub-selects used in the where clause
or a table reference in the FROM/USING list.  Can't we keep it simple
such that we either support Update/Delete based on the Tablesample
clause or prohibit it in all cases?



Well, I don't understand why somebody would do it either, but then again, 
during research for this feature I've found questions on Stack Overflow 
and similar sites about how to do it, so people must have use cases.


And in any case, as you say, a sub-select would work there, so there is no 
reason to explicitly disable it. Plus there is already a difference 
between what can be the target table in DELETE/UPDATE versus what can be 
in the FROM/USING clause, and I think the TABLESAMPLE behavior follows 
that separation nicely - it's well demonstrated by the fact that we 
would have to add explicit exceptions in some places in the code to disallow it.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Petr Jelinek

On 06/04/15 11:02, Simon Riggs wrote:

On 2 April 2015 at 17:36, Petr Jelinek p...@2ndquadrant.com wrote:


so here is version 11.


Looks great.

Comment on docs:

The SELECT docs refer only to SYSTEM and BERNOULLI. It doesn't mention
that if other methods are available they could be used also. The
phrasing was "sampling method can be one of list".



Will reword.


Are we ready for a final detailed review and commit?



I plan to send v12 in the evening with some additional changes that came 
up from Amit's comments + some improvements to error reporting. I think 
it will be ready then.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] TABLESAMPLE patch

2015-04-04 Thread Petr Jelinek
 the calculations much easier and 
faster than dealing with Numeric would.



9.
postgres=# explain SELECT t1.id FROM test_tablesample as
t1 TABLESAMPLE SYSTEM (10), test_tablesample as t2 TABLESAMPLE BERNOULLI (20);
  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..4.05 rows=100 width=4)
   ->  Sample Scan on test_tablesample t1  (cost=0.00..0.01 rows=1 width=4)
   ->  Sample Scan on test_tablesample t2  (cost=0.00..4.02 rows=2 width=0)
(3 rows)

Isn't it better to display the sample method name in explain?
I think it can help in case of join queries.
We can use the below format to display:
Sample Scan (System) on test_tablesample ...


Good idea.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] PATCH: nbklno in _hash_splitbucket unused (after ed9cc2b)

2015-04-04 Thread Petr Jelinek

On 05/04/15 01:03, Tomas Vondra wrote:

Hi there,

the changes in _hash_splitbucket() made nblkno unused when compiled
without asserts. Attached is a simple fix to get rid of the warnings.



This has already been fixed, see b7e1652d5.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Abbreviated keys for Numeric

2015-04-02 Thread Petr Jelinek

On 02/04/15 21:21, Robert Haas wrote:

On Thu, Apr 2, 2015 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote:

I think this is really nice work, so I have committed this version.  I
made a few fairly minor changes, hopefully without breaking anything
in the process:

- I adjusted things for recent commits around INT{32,64}_{MIN,MAX}.
- I removed (void) ssup; which I do not think is normal PostgreSQL style.
- Removed the #if DEC_DIGITS != 4 check.  The comment is great, but I
think we don't need to protect against #if 0 code getting re-enabled.
- I removed the too-clever (at least IMHO) handing of TRACE_SORT in
favor of just using #ifdef around each occurrence.
- I also declared trace_sort in guc.h, where various other GUCs are
declared, instead of declaring it privately in each file that needed
it.
- Changed some definitions to depend on SIZEOF_DATUM rather than
USE_FLOAT8_BYVAL.  Hopefully I didn't muff this; please check it.
- Fixed an OID conflict.
- And of course, bumped catversion.


And that's broken the buildfarm.  Argh.



That's because of this:

+#ifdef SIZEOF_DATUM == 8

You need just #if there.
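
For clarity: #ifdef only tests whether the macro is defined and ignores (or
warns about) the trailing comparison, so the corrected directive is simply:

#if SIZEOF_DATUM == 8			/* not: #ifdef SIZEOF_DATUM == 8 */
/* ... 64-bit Datum specific definitions ... */
#endif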


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Abbreviated keys for Numeric

2015-04-02 Thread Petr Jelinek

On 02/04/15 22:22, Robert Haas wrote:

On Thu, Apr 2, 2015 at 4:21 PM, Petr Jelinek p...@2ndquadrant.com wrote:

On 02/04/15 21:21, Robert Haas wrote:

On Thu, Apr 2, 2015 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote:


I think this is really nice work, so I have committed this version.  I
made a few fairly minor changes, hopefully without breaking anything
in the process:

- I adjusted things for recent commits around INT{32,64}_{MIN,MAX}.
- I removed (void) ssup; which I do not think is normal PostgreSQL style.
- Removed the #if DEC_DIGITS != 4 check.  The comment is great, but I
think we don't need to protect against #if 0 code getting re-enabled.
- I removed the too-clever (at least IMHO) handing of TRACE_SORT in
favor of just using #ifdef around each occurrence.
- I also declared trace_sort in guc.h, where various other GUCs are
declared, instead of declaring it privately in each file that needed
it.
- Changed some definitions to depend on SIZEOF_DATUM rather than
USE_FLOAT8_BYVAL.  Hopefully I didn't muff this; please check it.
- Fixed an OID conflict.
- And of course, bumped catversion.



And that's broken the buildfarm.  Argh.


That's because of this:

+#ifdef SIZEOF_DATUM == 8

You need just #if there.


Thanks.  I actually pushed a fix for that about 25 minutes ago;
hopefully that is all that is needed.



Ok, the git.pg.org was somewhat behind. It did fix it for me when I 
tested it locally.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Unused variable in hashpage.c

2015-04-02 Thread Petr Jelinek

Hi,

my compiler complains about an unused variable nblkno in _hash_splitbucket 
in a no-assert build. It looks like a relic of commit ed9cc2b5d, which 
removed the only use of that variable besides the Assert.


Looking at the code:
nblkno = start_nblkno;
Assert(nblkno == BufferGetBlockNumber(nbuf));

I was originally thinking of simply changing the Assert to use 
start_nblkno, but then I noticed that start_nblkno is actually not 
used anywhere in the function either (it's one of the input parameters). And 
given that the only caller of that function obtains the buffer it 
sends as nbuf from that same start_nblkno, I think the Assert is 
somewhat pointless there anyway.


So the best way to solve this seems to be removing start_nblkno from 
_hash_splitbucket altogether. The attached patch does just that.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 9a77945..2f82b1e 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -39,7 +39,6 @@ static bool _hash_alloc_buckets(Relation rel, BlockNumber firstblock,
 static void _hash_splitbucket(Relation rel, Buffer metabuf,
   Buffer nbuf,
   Bucket obucket, Bucket nbucket,
-  BlockNumber start_oblkno,
   BlockNumber start_nblkno,
   uint32 maxbucket,
   uint32 highmask, uint32 lowmask);
@@ -666,8 +665,7 @@ _hash_expandtable(Relation rel, Buffer metabuf)
 	/* Relocate records to the new bucket */
 	_hash_splitbucket(rel, metabuf, buf_nblkno,
 	  old_bucket, new_bucket,
-	  start_oblkno, start_nblkno,
-	  maxbucket, highmask, lowmask);
+	  start_oblkno, maxbucket, highmask, lowmask);
 
 	/* Release bucket locks, allowing others to access them */
 	_hash_droplock(rel, start_oblkno, HASH_EXCLUSIVE);
@@ -758,13 +756,11 @@ _hash_splitbucket(Relation rel,
   Bucket obucket,
   Bucket nbucket,
   BlockNumber start_oblkno,
-  BlockNumber start_nblkno,
   uint32 maxbucket,
   uint32 highmask,
   uint32 lowmask)
 {
 	BlockNumber oblkno;
-	BlockNumber nblkno;
 	Buffer		obuf;
 	Page		opage;
 	Page		npage;
@@ -781,8 +777,6 @@ _hash_splitbucket(Relation rel,
 	opage = BufferGetPage(obuf);
 	oopaque = (HashPageOpaque) PageGetSpecialPointer(opage);
 
-	nblkno = start_nblkno;
-	Assert(nblkno == BufferGetBlockNumber(nbuf));
 	npage = BufferGetPage(nbuf);
 
 	/* initialize the new bucket's primary page */



Re: [HACKERS] TABLESAMPLE patch

2015-04-01 Thread Petr Jelinek

On 01/04/15 17:52, Robert Haas wrote:

On Wed, Apr 1, 2015 at 9:49 AM, Amit Kapila amit.kapil...@gmail.com wrote:

I am still not sure whether it is okay to move REPEATABLE from
unreserved to other category.  In-fact last weekend I have spent some
time to see the exact reason for shift/reduce errors and tried some ways
but didn't find a way to get away with the same.  Now I am planning to
spend some more time on the same probably in next few days and then
still if I cannot find a way, I will share my findings and then once
re-review
the changes made by Petr in last version.  I think overall the patch is in
good shape now although I haven't looked into DDL support part of the
patch which I thought could be done in a separate patch as well.


That seems like a legitimate concern.  We usually try not to make
keywords more reserved in PostgreSQL than they are in the SQL
standard, and REPEATABLE is apparently non-reserved there:

http://www.postgresql.org/docs/devel/static/sql-keywords-appendix.html

This also makes "method" an unreserved keyword, which I'm not wild
about either.  Adding a new keyword doesn't cost *much*, but is this
SQL-mandated syntax or something we created?  If the latter, can we
find something to call it that doesn't require new keywords?



REPEATABLE is mandated by the standard. I did try for quite some time to 
make it unreserved but was not successful (I can only make it unreserved 
if I make it mandatory, but that's not a solution). I haven't in fact even 
been able to find out what it actually conflicts with...


METHOD is something I added. I guess we could find a way to name this 
differently if we really tried. The reason why I picked METHOD was that 
I already added the same unreserved keyword in the sequence AMs patch 
and in that one any other name does not really make sense.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] TABLESAMPLE patch

2015-04-01 Thread Petr Jelinek

On 01/04/15 18:38, Robert Haas wrote:

On Wed, Apr 1, 2015 at 12:15 PM, Petr Jelinek p...@2ndquadrant.com wrote:

REPEATABLE is mandated by standard. I did try for quite some time to make it
unreserved but was not successful (I can only make it unreserved if I make
it mandatory but that's not a solution). I haven't been in fact even able to
find out what it actually conflicts with...


Yeah, that can be hard to figure out.  Did you run bison with -v and
poke around in gram.output?



Oh, no I didn't (I didn't know gram.output would be generated). This 
helped quite a bit. Thanks.


I now found the reason: it's conflicting with the alias, but that's my 
mistake - the alias should come before the TABLESAMPLE clause as per the 
standard and I put it after in the parser. Now that I put it in the correct 
place, REPEATABLE can be an unreserved keyword. This change requires making 
TABLESAMPLE a type_func_name_keyword, but that's probably not really an 
issue as TABLESAMPLE is a reserved keyword per the standard.
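
For illustration, with the corrected ordering the alias sits between the
table name and the sample clause (the table name is reused from the
regression tests quoted earlier in this digest):

SELECT t.id
FROM test_tablesample AS t TABLESAMPLE BERNOULLI (10) REPEATABLE (0);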


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Replication identifiers, take 4

2015-03-28 Thread Petr Jelinek

So I took a more in-depth look; I do have a couple of comments.

I would really like to have something like a "Logical Replication 
Infrastructure" doc section that would have both decoding and 
identifiers (and possibly even CommitTs) underneath it.


There is typo in docs:

+ para
+   The optional functionfilter_by_origin_cb/function callback
+   is called to determine wheter data that has been replayed


wheter - whether


And finally, I have an issue with how the new identifiers are allocated. 
Currently, if you create identifier 'foo', remove identifier 'foo' and 
create identifier 'bar', the identifier 'bar' will have the same id as the 
old 'foo' identifier. This can be a problem because the identifier id is 
used as the origin of the data, and a replication solution using the 
replication identifiers can end up thinking that data came from node 
'bar' even though it came from node 'foo', which no longer exists. 
This can have bad effects, for example on conflict detection or when 
debugging replication problems.


Maybe another reason to use standard Oids?

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Replication identifiers, take 4

2015-03-24 Thread Petr Jelinek

On 24/03/15 16:33, Andres Freund wrote:

Hi,

Here's the next version of this patch. I've tried to address the biggest
issue (documentation) and some more. Now that both the more flexible
commit WAL record format and the BKI_FORCE_NOT_NULL thing is in, it
looks much cleaner.



Nice, I see you also did the closer integration with CommitTs. I 
have only skimmed the patch so far, but it looks quite good; I'll take a 
closer look around the end of the week.




I'd greatly appreciate some feedback on the documentation. I'm not
entirely sure into how much detail to go; and where exactly in the docs
to place it. I do wonder if we shouldn't merge this with the logical
decoding section and whether we could also document commit timestamps
somewhere in there.


Perhaps we should have some "Logical replication developer documentation" 
section and put all those three as subsections of that?



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-03-22 Thread Petr Jelinek

On 22/03/15 10:35, Andres Freund wrote:

On March 22, 2015 10:34:04 AM GMT+01:00, Michael Paquier 
michael.paqu...@gmail.com wrote:

On Sun, Mar 22, 2015 at 6:22 PM, Andres Freund and...@2ndquadrant.com

That's due to a different patch though, right? When I checked earlier

only jacana had problems due to this, and it looked like random memory
was being output. It's interesting that that's on the one windows (not
cygwin) critter that does the 128bit dance...

Yes, sorry, the e+000 stuff is from 959277a. This patch has visibly
broken that:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacanadt=2015-03-21%2003%3A01%3A21


That's the stuff looking like random memory that I talk about above...



If you look at it closely, it's actually not random memory. At least in 
the first 2 failing tests, which are not obfuscated by aggregates on top 
of aggregates. It looks like the first NumericDigit is ok and the second one 
is corrupted (there are only 2 NumericDigits in those numbers). Of 
course the conversion to Numeric is done from the end, so it looks like 
only the last computation/pointer change/something stays ok while the 
rest got corrupted.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-03-22 Thread Petr Jelinek

On 22/03/15 13:59, Andreas Karlsson wrote:

On 03/22/2015 11:47 AM, Petr Jelinek wrote:

On 22/03/15 10:35, Andres Freund wrote:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacanadt=2015-03-21%2003%3A01%3A21




That's the stuff looking like random memory that I talk about above...



If you look at it closely, it's actually not random memory. At least in
the first 2 failing tests which are not obfuscated by aggregates on top
of aggregates. It looks like first NumericDigit is ok and the second one
is corrupted (there are only 2 NumericDigits in those numbers). Of
course the conversion to Numeric is done from the end so it looks like
only the last computation/pointer change/something stays ok while the
rest got corrupted.


Would this mean the bug is most likely somewhere in
int128_to_numericvar()? Maybe that version of gcc has a bug in some
__int128 operator or I messed up the code there somehow.



Yeah, that's what I was thinking too, and I went through the function 
and didn't find anything suspicious (besides, it's the same as the 64-bit 
version except for the int128 use).


It really might be some combination of the arithmetic plus the 
conversion-to-16-bit-uint bug in the compiler. It would be worth trying 
to produce a test case and running it standalone, maybe?
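
For illustration, a minimal standalone sketch of such a test case (only an
assumption of what to try; it mimics the divide-by-NBASE loop of
int128_to_numericvar(), keeping 16-bit remainders, and needs a compiler with
__int128 support):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	__int128	val = (__int128) 1234567890123456789LL * 100000;	/* > 64 bits */
	uint16_t	digits[16];
	int			ndigits = 0;

	while (val != 0)
	{
		uint16_t	rem = (uint16_t) (val % 10000);		/* narrowing step under test */

		val /= 10000;
		digits[ndigits++] = rem;
	}

	for (int i = ndigits - 1; i >= 0; i--)
		printf("%04u", (unsigned) digits[i]);
	printf("\n");		/* expect 123456789012345678900000 */
	return 0;
}

If the printed value differs from the expected one only in some digit groups,
that would point at the compiler rather than at the patch.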


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] INT64_MIN and _MAX

2015-03-21 Thread Petr Jelinek

On 21/03/15 23:45, Andrew Gierth wrote:

A couple of places (adt/timestamp.c and pgbench.c) have this:

#ifndef INT64_MAX
#define INT64_MAX   INT64CONST(0x7FFF)
#endif

#ifndef INT64_MIN
#define INT64_MIN   (-INT64CONST(0x7FFF) - 1)
#endif

On the other hand, int8.c uses the INT64_MIN expression directly inline.

On the third hand, INT64_MIN etc. would typically be defined in stdint.h
if it exists.

So wouldn't it make more sense to move these definitions into c.h and
standardize their usage?



I was thinking the same when I saw Peter's version of the Numeric 
abbreviations patch. So +1 for that.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-03-16 Thread Petr Jelinek

On 16/03/15 00:37, Andreas Karlsson wrote:

On 03/15/2015 09:47 PM, Petr Jelinek wrote:

It's almost the same thing as you wrote but the 128 bit and 64 bit
optimizations are put on the same level of optimized routines.

But this is nitpicking at this point, I am happy with the patch as it
stands right now.


Do you think it is ready for committer?



In my opinion, yes.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Petr Jelinek

On 15/03/15 14:51, Magnus Hagander wrote:

On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund and...@2ndquadrant.com wrote:

On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
/*
 * Override any inconsistent requests. Note that this is a change
 * of behaviour in 9.5; prior to this we simply ignored a request
 * to pause if hot_standby = off, which was surprising behaviour.
 */
if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
    recoveryTargetActionSet &&
    standbyState == STANDBY_DISABLED)
    recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;

While it's easy enough to fix I rather dislike the whole intent here
though. *Silently* switching the mode of operation in a rather
significant way seems like a bad idea to me. At the very least we need
to emit a LOG message about this; but I think it'd be much better to
error out instead.

9.5's behaviour was already quite surprising. But changing things to a
different surprising behaviour seems like a bad idea.


+1. Especially for sensitive operations like this, having
predictable-behavior-or-error is usually the best choice.



Thinking about it again now, it does seem that ignoring a user setting 
because it's in conflict with another user setting is a bad idea, and I 
think we in general throw errors on those.


So +1 from me also.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-03-15 Thread Petr Jelinek

On 14/03/15 04:17, Andreas Karlsson wrote:

On 03/13/2015 10:22 PM, Peter Geoghegan wrote:

On Thu, Mar 12, 2015 at 6:23 PM, Andreas Karlsson andr...@proxel.se
wrote:

/*
  * Integer data types use Numeric accumulators to share code and
avoid risk
  * of overflow.  To speed up aggregation 128-bit integer
accumulators are
  * used instead where sum(X) or sum(X*X) fit into 128-bits, and
there is
  * platform support.
  *
  * For int2 and int4 inputs sum(X) will fit into a 64-bit
accumulator, hence
  * we use faster special-purpose accumulator routines for SUM and
AVG of
  * these datatypes.
  */

#ifdef HAVE_INT128
typedef struct Int128AggState


Not quite. Refer to the 128-bit integer accumulators as
"special-purpose accumulator routines" instead. Then, in the case of
the extant 64-bit accumulators, refer to them by the shorthand
"integer accumulators". Otherwise it's the wrong way around.


I disagree. The term "integer accumulators" is confusing. Right now I do
not have any better ideas for how to write that comment, so I submit the
next version of the patch with the comment as I wrote it above. Feel
free to come up with a better wording, my English is not always up to
par when writing technical texts.



I don't like the term "integer accumulators" either, as "integer" is 
platform specific. I would phrase it like this:


/*
 * Integer data types in general use Numeric accumulators to share code
 * and avoid risk of overflow.
 *
 * However for performance reasons some of the optimized special-purpose
 * accumulator routines are used when possible.
 *
 * On platforms with 128-bit integer support, the 128-bit routines will be
 * used when sum(X) or sum(X*X) fit into 128-bit.
 *
 * For int2 and int4 inputs, the N and sum(X) fit into 64-bit so the 64-bit
 * accumulators will be used for SUM and AVG of these data types.
 */

It's almost the same thing as you wrote but the 128 bit and 64 bit 
optimizations are put on the same level of optimized routines.


But this is nitpicking at this point, I am happy with the patch as it 
stands right now.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-03-14 Thread Petr Jelinek

On 22/02/15 01:59, Petr Jelinek wrote:

On 21/02/15 22:09, Andrew Dunstan wrote:


On 02/16/2015 09:05 PM, Petr Jelinek wrote:


I found one more issue with the 1.2--1.3 upgrade script, the DROP
FUNCTION pg_stat_statements(); should be DROP FUNCTION
pg_stat_statements(bool); since in 1.2 the function identity has
changed.





I think all the outstanding issues are fixed in this patch.



I think PGSS_FILE_HEADER should also be updated, otherwise it's good.




I see you marked this as 'needs review', I am marking it as 'ready for 
committer' as it looks good to me.


Just don't forget to update the PGSS_FILE_HEADER while committing (I 
think that one can be treated the same way as catversion, i.e. be updated by 
the committer and not in the patch submission).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] recovery_target_action doesn't work for anything but shutdown

2015-03-13 Thread Petr Jelinek

On 13/03/15 15:06, Petr Jelinek wrote:

On 12/03/15 15:53, Andres Freund wrote:

On 2015-03-12 15:52:02 +0100, Andres Freund wrote:

I guess what you actually intended to test was StandbyModeRequested?


Err, EnableHotStandby.



Yeah, pause does not work currently. This change was made by the committer



Actually the code is from me; the committer just requested the change, sorry.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] recovery_target_action doesn't work for anything but shutdown

2015-03-13 Thread Petr Jelinek

On 12/03/15 15:53, Andres Freund wrote:

On 2015-03-12 15:52:02 +0100, Andres Freund wrote:

I guess what you actually intended to test was StandbyModeRequested?


Err, EnableHotStandby.



Yeah, pause does not work currently. This change was made by the committer, 
and I think the intent was to make it a little safer by silently changing 
to shutdown instead of silently changing to promote, which is essentially 
what was happening before the patch.


But yes the check should have been against EnableHotStandby it seems.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] mogrify and indent features for jsonb

2015-03-13 Thread Petr Jelinek

On 01/03/15 16:49, Andrew Dunstan wrote:


On 03/01/2015 05:03 AM, Petr Jelinek wrote:

On 23/02/15 18:15, Dmitry Dolgov wrote:

Hi, Petr, thanks for the review.

  I think it would be better if the indent printing didn't put the
start of array ([) and start of dictionary ({) on a separate line
Did you mean this?

 [
     {
         "a": 1,
         "b": 2
     }
 ]

I tried to verify this in several ways (http://jsonprettyprint.com/,
JSON.stringify, json.tool from python), and I always get this result
(a new line after ([) and ({) ).


No, I mean new lines before the ([) and ({) - try pretty printing
something like '{"a":["b", "c"], "d": {"e":"f"}}' using that tool you
pasted and see what your patch outputs to see what I mean.
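
For illustration, this is the shape of output that JSON.stringify(value, null, 4)
and python's json.tool produce for that value - the opening bracket stays on
the same line as its key:

{
    "a": [
        "b",
        "c"
    ],
    "d": {
        "e": "f"
    }
}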





Please try this patch and see if it's doing what you want.



Yes, this produces output that looks like what javascript/python produce.


(the other stuff I commented about, mainly the h_atoi issue, is still 
unfixed though)


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-03-10 Thread Petr Jelinek

On 10/03/15 10:54, Amit Kapila wrote:

On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek p...@2ndquadrant.com
mailto:p...@2ndquadrant.com wrote:
 
  Ok now I think I finally understand what you are suggesting - you are
saying let's go over whole page while tsmnexttuple returns something,
and do the visibility check and other stuff in that code block under the
buffer lock and cache the resulting valid tuples in some array and then
return those tuples one by one from that cache?
 

Yes, this is what I am suggesting.

And if the caller will try to do it in one step and cache the
  visibility info then we'll end up with pretty much same structure as
  rs_vistuples - there isn't saner way to cache this info other than
  ordered vector of tuple offsets, unless we assume that most pages have
  close to MaxOffsetNumber of tuples which they don't, so why not just use
  the heapgetpage directly and do the binary search over rs_vistuples.
   
 
  The downside of doing it via heapgetpage is that it will do
  visibility test for tuples which we might not even need (I think
  we should do visibility test for tuples returned by tsmnexttuple).
 
 
  Well, heapgetpage can either read visibility data for whole page or
not, depending on if we want pagemode reading or not. So we can use the
pagemode for sampling methods where it's feasible (like system) and not
use pagemode where it's not (like bernoulli) and then either use the
rs_vistuples or call HeapTupleSatisfiesVisibility individually again
depending if the method is using pagemode or not.
 

Yeah, but as mentioned above, this has some downside, but go
for it only if you feel that above suggestion is making code complex,
which I think should not be the case as we are doing something similar
in acquire_sample_rows().



I think your suggestion is actually simpler code-wise; I am just 
somewhat worried by the fact that no other scan node uses that kind of 
caching, and there is probably a reason for that.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-03-10 Thread Petr Jelinek

On 10/03/15 04:43, Amit Kapila wrote:

On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek p...@2ndquadrant.com
mailto:p...@2ndquadrant.com wrote:
 
  On 09/03/15 04:51, Amit Kapila wrote:
 
  On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek p...@2ndquadrant.com
mailto:p...@2ndquadrant.com
Double checking for tuple visibility is the only downside I can think
  of.
 
  That will happen if we use heapgetpage and the way currently
  code is written in patch, however we can easily avoid double
  checking if we don't call heapgetpage and rather do the required
  work at caller's place.
 
 
  What's the point of pagemode then if the caller code does the
visibility checks still one by one on each call. I thought one of the
points of pagemode was to do this in one step (and one buffer lock).
 

You only need one buffer lock for doing at caller's location
as well something like we do in acquire_sample_rows().



Ok, now I think I finally understand what you are suggesting - you are 
saying let's go over the whole page while tsmnexttuple returns something, 
do the visibility check and other stuff in that code block under the 
buffer lock, cache the resulting valid tuples in some array, and then 
return those tuples one by one from that cache?



  And if the caller will try to do it in one step and cache the
visibility info then we'll end up with pretty much same structure as
rs_vistuples - there isn't saner way to cache this info other than
ordered vector of tuple offsets, unless we assume that most pages have
close to MaxOffsetNumber of tuples which they don't, so why not just use
the heapgetpage directly and do the binary search over rs_vistuples.
 

The downside of doing it via heapgetpage is that it will do
visibility test for tuples which we might not even need (I think
we should do visibility test for tuples returned by tsmnexttuple).



Well, heapgetpage can either read visibility data for the whole page or not, 
depending on whether we want pagemode reading. So we can use pagemode 
for sampling methods where it's feasible (like system) and not 
use it where it's not (like bernoulli), and then either use the 
rs_vistuples or call HeapTupleSatisfiesVisibility individually, 
depending on whether the method is using pagemode. This is how sequential 
scan does it, afaik.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: disallow operator => and use it for named parameters

2015-03-10 Thread Petr Jelinek

On 10/03/15 17:01, Pavel Stehule wrote:



2015-03-10 16:50 GMT+01:00 Tom Lane t...@sss.pgh.pa.us
mailto:t...@sss.pgh.pa.us:

Robert Haas robertmh...@gmail.com mailto:robertmh...@gmail.com
writes:

 Committed with a few documentation tweaks.

Was there any consideration given to whether ruleutils should start
printing NamedArgExprs with =>?  Or do we think that needs to wait?


I didn't think about it. I don't see any reason why it would have to use
the deprecated syntax.



There is one: loading the output into an older version of Postgres. I don't 
know if that's an important one, though.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-03-09 Thread Petr Jelinek

On 09/03/15 13:39, Andreas Karlsson wrote:

On 03/07/2015 07:18 PM, Petr Jelinek wrote:


What I am wondering is if those numeric_int16_* functions that also deal
with either the Int128AggState or NumericAggState should be renamed in
similar fashion.


You mean something like numeric_poly_sum instead of numeric_int16_sum? I
personally am not fond of either name. While numeric_int16_* incorrectly
implies we have a int16 SQL type numeric_poly_* does not tell us that
this is an optimized version which uses a smaller state.


Yes, that's what I mean, since the int16 in the name is misleading given 
that in at least some builds int16 won't be used. You could always 
have a numeric function, an int16 function and a poly function which 
decides between them, but that's probably overkill.




The worst part of writing this patch has always been naming functions
and types. :)


Naming is hard :)


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-03-09 Thread Petr Jelinek

On 09/03/15 18:39, David Fetter wrote:

On Mon, Mar 09, 2015 at 01:39:04PM +0100, Andreas Karlsson wrote:

On 03/07/2015 07:18 PM, Petr Jelinek wrote:


What I am wondering is if those numeric_int16_* functions that also deal
with either the Int128AggState or NumericAggState should be renamed in
similar fashion.


You mean something like numeric_poly_sum instead of numeric_int16_sum? I
personally am not fond of either name. While numeric_int16_* incorrectly
implies we have a int16 SQL type numeric_poly_* does not tell us that this
is an optimized version which uses a smaller state.


Would it be simpler to write a separate patch to add an int16 SQL type
so that this implication is correct?



No, because then you'd need to emulate the type on platforms where it 
does not exist and the patch would be several times more complex for no 
useful reason.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-03-09 Thread Petr Jelinek

On 09/03/15 04:51, Amit Kapila wrote:

On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek p...@2ndquadrant.com
mailto:p...@2ndquadrant.com wrote:
 
  On 05/03/15 09:21, Amit Kapila wrote:
 
  On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek p...@2ndquadrant.com
mailto:p...@2ndquadrant.com
  mailto:p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote:
   
   
I didn't add the whole page visibility caching as the tuple ids we
  get from sampling methods don't map well to the visibility info we get
  from heapgetpage (it maps to the values in the rs_vistuples array not to
  to its indexes). Commented about it in code also.
   
 
  I think we should set pagemode for system sampling as it can
  have dual benefit, one is it will allow us caching tuples and other
  is it can allow us pruning of page which is done in heapgetpage().
  Do you see any downside to it?
 
 
  Double checking for tuple visibility is the only downside I can think
of.

That will happen if we use heapgetpage and the way currently
code is written in patch, however we can easily avoid double
checking if we don't call heapgetpage and rather do the required
work at caller's place.



What's the point of pagemode then, if the caller code still does the visibility 
checks one by one on each call? I thought one of the points of 
pagemode was to do this in one step (and under one buffer lock).


And if the caller tries to do it in one step and cache the visibility 
info, then we'll end up with pretty much the same structure as rs_vistuples - 
there isn't a saner way to cache this info than an ordered vector of 
tuple offsets, unless we assume that most pages have close to 
MaxOffsetNumber tuples, which they don't. So why not just use 
heapgetpage directly and do a binary search over rs_vistuples?


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-03-07 Thread Petr Jelinek

On 04/03/15 23:51, Andreas Karlsson wrote:

On 01/29/2015 12:28 AM, Peter Geoghegan wrote:

* I'm not sure about the idea of polymorphic catalog functions (that
return the type internal, but the actual struct returned varying
based on build settings).

I tend to think that things would be better if there was always a
uniform return type for such internal type returning functions, but
that *its* structure varied according to the availability of int128
(i.e. HAVE_INT128) at compile time. What we should probably do is have
a third aggregate struct, that encapsulates this idea of (say)
int2_accum() piggybacking on one of either Int128AggState or
NumericAggState directly. Maybe that would be called PolyNumAggState.
Then this common code is all that is needed on both types of build (at
the top of int4_accum(), for example):

PolyNumAggState *state;

state = PG_ARGISNULL(0) ? NULL : (PolyNumAggState *)
PG_GETARG_POINTER(0);

I'm not sure if I'd do this with a containing struct or a simple
#ifdef HAVE_INT128 typedef #else ... , but I think it's an
improvement either way.


Not entirely sure exactly what I mean but I have changed to something
like this, relying on typedef, in the attached version of the patch. I
think it looks better after the changes.



I think this made the patch much better indeed.
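
As a hedged illustration of that typedef approach (the exact names in the
patch/numeric.c may differ slightly), the idea is a single transition-state
name whose definition depends on whether 128-bit integers are available at
build time:

    #ifdef HAVE_INT128
    typedef Int128AggState PolyNumAggState;
    #define makePolyNumAggState makeInt128AggState
    #else
    typedef NumericAggState PolyNumAggState;
    #define makePolyNumAggState makeNumericAggState
    #endif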

What I am wondering is if those numeric_int16_* functions that also deal 
with either the Int128AggState or NumericAggState should be renamed in 
similar fashion.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-03-07 Thread Petr Jelinek

On 05/03/15 09:21, Amit Kapila wrote:

On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek p...@2ndquadrant.com
mailto:p...@2ndquadrant.com wrote:
 
 
  I didn't add the whole page visibility caching as the tuple ids we
get from sampling methods don't map well to the visibility info we get
from heapgetpage (it maps to the values in the rs_vistuples array not to
to its indexes). Commented about it in code also.
 

I think we should set pagemode for system sampling as it can
have dual benefit, one is it will allow us caching tuples and other
is it can allow us pruning of page which is done in heapgetpage().
Do you see any downside to it?


Double-checking for tuple visibility is the only downside I can think 
of, plus some added code complexity of course. I guess we could use a 
binary search on rs_vistuples (it's already sorted) so that info won't 
be completely useless. Not sure if it's worth it compared to the normal 
visibility check, but it's not hard to do.
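
For illustration, a standalone sketch (simplified types and a hypothetical
function name; the real rs_vistuples array lives in the heap scan descriptor)
of what such a lookup would amount to:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint16_t OffsetNumber;

/* rs_vistuples-style array is kept in ascending offset order */
static bool
offset_is_visible(const OffsetNumber *vistuples, int ntuples,
                  OffsetNumber target)
{
    int lo = 0;
    int hi = ntuples - 1;

    while (lo <= hi)
    {
        int          mid = lo + (hi - lo) / 2;
        OffsetNumber cur = vistuples[mid];

        if (cur == target)
            return true;
        if (cur < target)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}

int
main(void)
{
    OffsetNumber vis[] = {1, 2, 5, 9, 12};  /* sorted, like rs_vistuples */

    printf("%d %d\n", offset_is_visible(vis, 5, 9),
           offset_is_visible(vis, 5, 4));   /* prints "1 0" */
    return 0;
}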


I personally don't see the page pruning in TABLESAMPLE as all that 
important though, considering that in practice it will only scan a tiny 
portion of a relation, and usually one that does not get many updates (in 
practice the main use case is BI).




Few other comments:

1.
Current patch fails to apply, minor change is required:
patching file `src/backend/utils/misc/Makefile'
Hunk #1 FAILED at 15.
1 out of 1 hunk FAILED -- saving rejects to
src/backend/utils/misc/Makefile.rej


Ah bitrot over time.



2.
Few warnings in code (compiled on windows(msvc))
1src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' :
truncation from 'double' to 'float4'
1src\backend\utils\tablesample\system.c(177): warning C4305: '=' :
truncation from 'double' to 'float4'



I think this is just compiler stupidity, but hopefully it's fixed now (I don't 
have MSVC to check, and the other compilers I tried don't complain).



3.
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+TableSampleClause *tablesample)
{
..
+/*
+* Page at a time mode is useless for us as we need to check visibility
+* of tuples individually because tuple offsets returned by sampling
+* methods map to rs_vistuples values and not to its indexes.
+*/
+node->ss.ss_currentScanDesc->rs_pageatatime = false;
..
}

Modifying scandescriptor in nodeSamplescan.c looks slightly odd,
Do we modify this way at anyother place?

I think it is better to either teach heap_beginscan_* api's about
it or expose it via new API in heapam.c



Yeah, I agree; I taught heap_beginscan_strat about it, as that one is 
the advanced API.




4.
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
{
..
+*tuples = path->rows * samplesize;
}

It seems above calculation considers all rows in table are of
equal size and hence multiplying directly with samplesize will
give estimated number of rows for sample method, however if
row size varies then this calculation might not give right
results.  I think if possible we should consider the possibility
of rows with varying sizes else we can at least write a comment
to indicate the possibility of improvement for future.



I am not sure how we would know what size the tuples in the random blocks 
we are going to read later would have. That said, I am sure that the 
costing can be improved, even if I don't know how myself.




6.
@@ -13577,6 +13608,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+| REPEATABLE

Have you tried to investigate the reason why it is giving shift/reduce
error for unreserved category and if we are not able to resolve that,
then at least we can try to make it in some less restrictive category.
I tried (on windows) by putting it in (type_func_name_keyword:) and
it seems to be working.

To investigate, you can try with information at below link:
http://www.gnu.org/software/bison/manual/html_node/Understanding.html

Basically I think we should try some more before concluding
to change the category of REPEATABLE and especially if we
want to make it a RESERVED keyword.


Yes, it can be moved to type_func_name_keyword, which is not all that much 
better but at least something. I did try to play with this already 
during the first submission but could not find a way to move it to something 
less restrictive.


I could not even pinpoint what exactly the shift/reduce issue is, except 
that it has something to do with the fact that the REPEATABLE clause is 
optional (at least I didn't have the problem when it wasn't optional).




On a related note, it seems you have agreed upthread with
Kyotaro-san for below change, but I don't see the same in latest patch:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
 ;

  opt_repeatable_clause:
-   REPEATABLE '(' AexprConst ')'   { $$ = (Node *) $3; }
+   REPEATABLE '(' a_expr ')'   { $$ = (Node *) $3; }

Re: [HACKERS] proposal: searching in array function - array_position

2015-03-07 Thread Petr Jelinek

On 22/02/15 12:19, Pavel Stehule wrote:



2015-02-22 3:00 GMT+01:00 Petr Jelinek p...@2ndquadrant.com
mailto:p...@2ndquadrant.com:

On 28/01/15 08:15, Pavel Stehule wrote:



2015-01-28 0:01 GMT+01:00 Jim Nasby jim.na...@bluetreble.com
mailto:jim.na...@bluetreble.com:

 On 1/27/15 4:36 AM, Pavel Stehule wrote:


 It is only partially identical - I would to use cache for
 array_offset, but it is not necessary for array_offsets ..
 depends how we would to modify current API to support
externally
 cached data.


 Externally cached data?


Some from these functions has own caches for minimize access to
typcache
(array_map, array_cmp is example). And in first case, I am trying to
push these information from fn_extra, in second case I don't do it,
because I don't expect a repeated call (and I am expecting so
type cache
will be enough).


You actually do caching via fn_extra in both case and I think that's
the correct way, and yes that part can be moved common function.

I also see that the documentation does not say what is returned by
array_offset if nothing is found (it's documented in code but not in
sgml).


rebased + fixed docs



You still duplicate the type cache code, but many other array functions 
do that too, so I will not hold that against you. (Maybe somebody should 
write a separate patch that puts all that duplicated code into a common 
function?)


I think this patch is ready for committer so I am going to mark it as such.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mogrify and indent features for jsonb

2015-03-01 Thread Petr Jelinek

On 23/02/15 18:15, Dmitry Dolgov wrote:

Hi, Petr, thanks for the review.

  I think it would be better if the ident printing didn't put the
start of array ([) and start of dictionary ({) on separate line
Did you mean this?

 [
 {
 "a": 1,
 "b": 2
 }
 ]

I tried to verify this in several ways (http://jsonprettyprint.com/,
JSON.stringify, json.tool from python), and I always get this result
(new line after ([) and ({) ).


No, I mean new lines before the ([) and ({) - try pretty printing 
something like '{"a": ["b", "c"], "d": {"e": "f"}}' using that tool you 
pasted and see what your patch outputs to see what I mean.
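
For reference, this is roughly what python's json module produces for that
document (note that the [ and { stay on the same line as their key, and only
the elements are pushed to new lines):

{
    "a": [
        "b",
        "c"
    ],
    "d": {
        "e": "f"
    }
}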



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication identifiers, take 4

2015-02-22 Thread Petr Jelinek

On 22/02/15 09:57, Andres Freund wrote:

On 2015-02-19 00:49:50 +0100, Petr Jelinek wrote:

On 16/02/15 10:46, Andres Freund wrote:

On 2015-02-16 11:34:10 +0200, Heikki Linnakangas wrote:


At a quick glance, this basic design seems workable. I would suggest
expanding the replication IDs to regular 4 byte oids. Two extra bytes is a
small price to pay, to make it work more like everything else in the system.


I don't know. Growing from 3 to 5 byte overhead per relevant record (or
even 0 to 5 in case the padding is reused) is rather noticeable. If we
later find it to be a limit (I seriously doubt that), we can still
increase it in a major release without anybody really noticing.



I agree that limiting the overhead is important.

But I have related though about this - I wonder if it's worth to try to map
this more directly to the CommitTsNodeId.


Maybe. I'd rather go the other way round though;
replication_identifier.c/h's stuff seems much more generic than
CommitTsNodeId.



Probably not more generic, but definitely nicer, as the nodes are named; 
and yes, it has a richer API.



I mean it is currently using that
for the actual storage, but I am thinking of having the Replication
Identifiers be the public API and the CommitTsNodeId stuff be just hidden
implementation detail. This thought is based on the fact that CommitTsNodeId
provides somewhat meaningless number while Replication Identifiers give us
nice name for that number. And if we do that then the type should perhaps be
same for both?


I'm not sure. Given that this is included in a significant number of
recordsd I'd really rather not increase the overhead as described
above. Maybe we can just limit CommitTsNodeId to the same size for now?



That would make sense.

I also wonder about the default nodeid infrastructure that committs 
provides. I can't think of a use case it can serve which isn't solved by 
replication identifiers - in the end the main reason I added that infra 
was to make it possible to write something like replication identifiers 
as part of an extension. In any case I don't think the default nodeid can 
be used in parallel with replication identifiers, since one will overwrite 
the SLRU record for the other. Maybe it's enough if this is documented, 
but I think it might be better if this patch removed that default committs 
nodeid infrastructure. It's just a few lines of code which nobody should 
be using yet.


Thinking about this some more and rereading the code, I see that you are 
calling TransactionTreeSetCommitTsData during xlog replay but not 
during transaction commit; that does not seem correct, as the local 
records will not have any nodeid/origin.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Redesigning checkpoint_segments

2015-02-22 Thread Petr Jelinek

On 23/02/15 03:24, Robert Haas wrote:

On Sat, Feb 21, 2015 at 11:29 PM, Petr Jelinek p...@2ndquadrant.com wrote:

I am wondering a bit about interaction with wal_keep_segments.
One thing is that wal_keep_segments is still specified in number of segments
and not size units, maybe it would be worth to change it also?
And the other thing is that, if set, the wal_keep_segments is the real
max_wal_size from the user perspective (not from perspective of the
algorithm in this patch, but user does not really care about that) which is
somewhat weird given the naming.


It seems like wal_keep_segments is more closely related to
wal_*min*_size.  The idea of both settings is that each is a minimum
amount of WAL we want to keep around for some purpose.  But they're
not quite the same, I guess, because wal_min_size just forces us to
keep that many files around - they can be overwritten whenever.
wal_keep_segments is an amount of actual WAL data we want to keep
around.


Err yes of course, min not max :)



Would it make sense to require that wal_keep_segments = wal_min_size?



It would to me; the patch as it stands is confusing in the sense that you 
can set min and max but then wal_keep_segments somewhat overrides those.


And BTW this brings up another point: I actually don't see a check for 
min_wal_size <= max_wal_size anywhere in the patch.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-21 Thread Petr Jelinek

On 21/02/15 22:09, Andrew Dunstan wrote:


On 02/16/2015 09:05 PM, Petr Jelinek wrote:


I found one more issue with the 1.2--1.3 upgrade script, the DROP
FUNCTION pg_stat_statements(); should be DROP FUNCTION
pg_stat_statements(bool); since in 1.2 the function identity has changed.





I think all the outstanding issues are fixed in this patch.



I think PGSS_FILE_HEADER should be also updated, otherwise it's good.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Redesigning checkpoint_segments

2015-02-21 Thread Petr Jelinek

On 13/02/15 18:43, Heikki Linnakangas wrote:


Ok, I don't hear any loud objections to min_wal_size and max_wal_size,
so let's go with that then.

Attached is a new version of this. It now comes in four patches. The
first three are just GUC-related preliminary work, the first of which I
posted on a separate thread today.




The 0001 patch is very nice, I would go ahead and commit it.

Not really sure I see the need for 0002 but it should not harm anything 
so why not.


The 0003 should be part of 0004 IMHO as it does not really do anything 
by itself.


I am wondering a bit about the interaction with wal_keep_segments.
One thing is that wal_keep_segments is still specified in number of 
segments and not in size units; maybe it would be worth changing that as well?
And the other thing is that, if set, wal_keep_segments is the real 
max_wal_size from the user perspective (not from the perspective of the 
algorithm in this patch, but the user does not really care about that), which 
is somewhat weird given the naming.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: searching in array function - array_position

2015-02-21 Thread Petr Jelinek

On 28/01/15 08:15, Pavel Stehule wrote:



2015-01-28 0:01 GMT+01:00 Jim Nasby jim.na...@bluetreble.com
mailto:jim.na...@bluetreble.com:

On 1/27/15 4:36 AM, Pavel Stehule wrote:


It is only partially identical - I would like to use a cache for
array_offset, but it is not necessary for array_offsets ..
depends on how we would modify the current API to support externally
cached data.


Externally cached data?


Some of these functions have their own caches to minimize access to the typcache
(array_map, array_cmp for example). And in the first case, I am trying to
pass this information via fn_extra; in the second case I don't do it,
because I don't expect a repeated call (and I am expecting the type cache
will be enough).



You actually do the caching via fn_extra in both cases, and I think that's the 
correct way; and yes, that part can be moved to a common function.
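
For illustration, a hedged sketch of that fn_extra caching pattern (the
struct and function names below are made up, and it assumes the usual
PostgreSQL fmgr/array headers rather than being the patch's actual code):

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"
#include "utils/lsyscache.h"

/* hypothetical per-query cache kept in fn_extra across calls */
typedef struct ArrayOffsetCache
{
    Oid     element_type;
    int16   typlen;
    bool    typbyval;
    char    typalign;
} ArrayOffsetCache;

Datum
array_offset_sketch(PG_FUNCTION_ARGS)       /* hypothetical function */
{
    ArrayType          *arr = PG_GETARG_ARRAYTYPE_P(0);
    ArrayOffsetCache   *cache = (ArrayOffsetCache *) fcinfo->flinfo->fn_extra;

    if (cache == NULL)
    {
        /* first call in this query: build the cache in fn_mcxt so it survives */
        cache = MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt,
                                       sizeof(ArrayOffsetCache));
        cache->element_type = ARR_ELEMTYPE(arr);
        get_typlenbyvalalign(cache->element_type,
                             &cache->typlen, &cache->typbyval,
                             &cache->typalign);
        fcinfo->flinfo->fn_extra = cache;
    }

    /* ... deconstruct the array using cache->typlen etc. and search here ... */

    PG_RETURN_NULL();
}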


I also see that the documentation does not say what is returned by 
array_offset if nothing is found (it's documented in code but not in sgml).



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication identifiers, take 4

2015-02-21 Thread Petr Jelinek
Now that the issue with padding no longer exists, since the patch works 
both with and without padding, I went through the code and here are some 
comments I have (in no particular order).


In CheckPointReplicationIdentifier:

+ * FIXME: Add a CRC32 to the end.


The function already does it (I guess you forgot to remove the comment).

Using max_replication_slots as the limit for replication_identifier states 
does not really make sense to me, as replication identifiers track remote 
info while slots are local, and in the case of master-slave replication 
you need replication identifiers but don't need slots.


In bootstrap.c:

 #define MARKNOTNULL(att) \
     ((att)->attlen > 0 || \
      (att)->atttypid == OIDVECTOROID || \
-     (att)->atttypid == INT2VECTOROID)
+     (att)->atttypid == INT2VECTOROID || \
+     strcmp(NameStr((att)->attname), "riname") == 0 \
+    )


Huh? Can this be solved in a nicer way?

Since we call XLogFlush with local_lsn as a parameter, shouldn't we check 
that it's actually within a valid range?

Currently we'll get errors like this if it's set to an invalid value:
ERROR:  xlog flush request 123/123 is not satisfied --- flushed only to 
0/168FB18


In AdvanceReplicationIndentifier:

+   /*
+* XXX: should we restore into a hashtable and dump into shmem only 
after
+* recovery finished?
+*/


Probably not, given that the function is also callable via the SQL interface.

As I wrote in another email, I would like to integrate RepNodeId and 
CommitTSNodeId into a single thing.


There are no docs for the new SQL interfaces.

replication_identifier.c might deserve some intro/notes text.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bootstrap DATA is a pita

2015-02-20 Thread Petr Jelinek

On 21/02/15 04:22, Peter Eisentraut wrote:

I violently support this proposal.


Maybe something roughly like:

# pg_type.data
CatalogData(
    'pg_type',
    [
     {
        oid => 2249,
        data => {typname => 'cstring', typlen => -2, typbyval => 1, fake => 
'...'},
        oiddefine => 'CSTRINGOID'
     }
    ]
);


One concern I have with this is that in my experience different tools
and editors have vastly different ideas on how to format these kinds of
nested structures.  I'd try out YAML, or even a homemade fake YAML over
this.



+1 for the idea and +1 for YAML(-like) syntax.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: disallow operator => and use it for named parameters

2015-02-19 Thread Petr Jelinek

On 19/01/15 17:14, Pavel Stehule wrote:



2015-01-19 14:27 GMT+01:00 Robert Haas robertmh...@gmail.com
mailto:robertmh...@gmail.com:

On Mon, Jan 19, 2015 at 2:59 AM, Pavel Stehule
pavel.steh...@gmail.com mailto:pavel.steh...@gmail.com wrote:
 I think you should just remove the WARNING, not change it to an error.
 If somebody wants to quote the operator name to be able to continue
 using it, I think that's OK.

 It looks so quoting doesn't help here

 + CREATE OPERATOR = (
 +leftarg = int8,-- right unary
 +procedure = numeric_fac
 + );
 + ERROR:  syntax error at or near (
 + LINE 1: CREATE OPERATOR = (
 +  ^

Well then the error check is just dead code.  Either way, you don't
need it.


yes, I removed it



I am marking this as Ready for Committer; the patch is trivial and works 
as expected, and there is nothing to be added to it IMHO.


The => operator was deprecated for several years, so it should not be 
too controversial either.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication identifiers, take 4

2015-02-18 Thread Petr Jelinek

On 16/02/15 10:46, Andres Freund wrote:

On 2015-02-16 11:34:10 +0200, Heikki Linnakangas wrote:


At a quick glance, this basic design seems workable. I would suggest
expanding the replication IDs to regular 4 byte oids. Two extra bytes is a
small price to pay, to make it work more like everything else in the system.


I don't know. Growing from 3 to 5 byte overhead per relevant record (or
even 0 to 5 in case the padding is reused) is rather noticeable. If we
later find it to be a limit (I seriously doubt that), we can still
increase it in a major release without anybody really noticing.



I agree that limiting the overhead is important.

But I have related though about this - I wonder if it's worth to try to 
map this more directly to the CommitTsNodeId. I mean it is currently 
using that for the actual storage, but I am thinking of having the 
Replication Identifiers be the public API and the CommitTsNodeId stuff 
be just hidden implementation detail. This thought is based on the fact 
that CommitTsNodeId provides somewhat meaningless number while 
Replication Identifiers give us nice name for that number. And if we do 
that then the type should perhaps be same for both?


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-02-17 Thread Petr Jelinek

On 17/02/15 23:11, Robert Haas wrote:

On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek p...@2ndquadrant.com wrote:

sending new version that is updated along the lines of what we discussed at
FOSDEM, which means:

- back to single bytea amdata column (no custom columns)




Well, the main argument is still future possibility of moving into 
single table for storage. And when we discussed about it in person we 
agreed that there is not too big advantage in having separate columns 
since there need to be dump/restore state interfaces anyway so you can 
see the relevant state via those as we made the output more human 
readable (and the psql support reflects that).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mogrify and indent features for jsonb

2015-02-17 Thread Petr Jelinek

Hi,

I looked at the patch and have several comments.

First let me say that modifying the individual paths of the json is the 
feature I miss the most in the current implementation so I am happy that 
this patch was submitted.


I would prefer slightly the patch split into two parts, one for the 
indent printing and one with the manipulation functions, but it's not 
too big patch so it's not too bad as it is.


There is one compiler warning that I see:
jsonfuncs.c:3533:1: warning: no previous prototype for 
‘jsonb_delete_path’ [-Wmissing-prototypes]

 jsonb_delete_path(PG_FUNCTION_ARGS)

I think it would be better if the indent printing didn't put the start of 
an array ([) and the start of a dictionary ({) on a separate line, since most 
pretty-printing implementations outside of Postgres (like 
JSON.stringify or python's json module) don't do that. This is purely 
about consistency with the rest of the world.


The json_ident might be better named as json_pretty perhaps?

I don't really understand the point of the h_atoi() function. What's wrong 
with using strtol like pg_atoi does? Also, there is no integer overflow 
check anywhere in that function.
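
For illustration, a small standalone sketch (hypothetical names, not the
patch's h_atoi) of the strtol-based parsing with the missing checks:

#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <stdbool.h>

/* returns false on overflow or trailing garbage, true otherwise */
static bool
parse_array_index(const char *str, int *result)
{
    char   *end;
    long    val;

    errno = 0;
    val = strtol(str, &end, 10);
    if (errno == ERANGE || val > INT_MAX || val < INT_MIN)
        return false;           /* out of range for int */
    if (end == str || *end != '\0')
        return false;           /* empty string or trailing garbage */
    *result = (int) val;
    return true;
}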


There is a ton of end-of-line whitespace mess in the jsonb_indent docs.

Otherwise everything I tried so far works as expected. The code looks OK 
as well, except maybe replacePath could use a couple of comments (for 
example the line which uses h_atoi) to make it easier to follow.


About the:

+   /* XXX : why do we need this assertion? The functions is declared to 
take text[] */
+   Assert(ARR_ELEMTYPE(path) == TEXTOID);


I wonder about this also; some functions do that, some don't, and I am not 
really sure what the rule is there myself.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-02-17 Thread Petr Jelinek

On 18/02/15 02:59, Petr Jelinek wrote:

On 17/02/15 23:11, Robert Haas wrote:

On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek p...@2ndquadrant.com
wrote:

sending new version that is updated along the lines of what we
discussed at
FOSDEM, which means:

- back to single bytea amdata column (no custom columns)




Well, the main argument is still future possibility of moving into
single table for storage. And when we discussed about it in person we
agreed that there is not too big advantage in having separate columns
since there need to be dump/restore state interfaces anyway so you can
see the relevant state via those as we made the output more human
readable (and the psql support reflects that).



Also makes the patch a little simpler obviously.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-17 Thread Petr Jelinek

On 17/02/15 16:12, Andres Freund wrote:

On 2015-02-17 15:50:39 +0100, Petr Jelinek wrote:

On 17/02/15 03:07, Petr Jelinek wrote:

On 17/02/15 03:03, Andrew Dunstan wrote:

On 02/16/2015 08:57 PM, Andrew Dunstan wrote:

Average of 3 runs of read-only pgbench on my system all with
pg_stat_statement activated:
HEAD:  20631
SQRT:  20533
SQRTD: 20592



So using sqrtd the cost is 0.18%. I think that's acceptable.



Actually, sqrt/sqrtd is not called in accumulating the stats, only in
the reporting function. So it looks like the difference here should be
noise. Maybe we need some longer runs.



Yes there are variations between individual runs so it might be really
just that, I can leave it running for much longer time tomorrow.



Ok so I let it run for more than hour on a different system, the difference
is negligible - 14461 vs 14448 TPS. I think there is bigger difference
between individual runs than between the two versions...


These numbers sound like you measured them without concurrency, am I
right? If so, the benchmark isn't that interesting - the computation
happens while a spinlock is held, and that'll mainly matter if there are
many backends running at the same time.



It's pgbench -j16 -c32


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-17 Thread Petr Jelinek

On 17/02/15 03:07, Petr Jelinek wrote:

On 17/02/15 03:03, Andrew Dunstan wrote:


On 02/16/2015 08:57 PM, Andrew Dunstan wrote:


On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com
wrote:

We definitely want this feature, I wished to have this info many
times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with
pg_stat_statement activated:
HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.



Actually, sqrt/sqrtd is not called in accumulating the stats, only in
the reporting function. So it looks like the difference here should be
noise. Maybe we need some longer runs.



Yes there are variations between individual runs so it might be really
just that, I can leave it running for much longer time tomorrow.



Ok, so I let it run for more than an hour on a different system; the 
difference is negligible - 14461 vs 14448 TPS. I think there is a bigger 
difference between individual runs than between the two versions...


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Petr Jelinek

On 21/01/15 17:32, Andrew Dunstan wrote:


On 01/21/2015 11:21 AM, Arne Scheffer wrote:




Why is it a bad thing to call the column stddev_samp analog to the
aggregate function or make a note in the documentation, that the
sample stddev is used to compute the solution?


...

But I will add a note to the documentation, that seems reasonable.



I agree this is worth mentioning in the doc.

In any case, here is some review from me:

We definitely want this feature; I have wished for this info many times.

The patch has a couple of issues though:

The 1.2 to 1.3 upgrade file is named pg_stat_statements--1.2-1.3.sql and 
should be renamed to pg_stat_statements--1.2--1.3.sql (two dashes).


There is a new sqrtd function for fast sqrt calculation, which I think is 
a good idea, as it will reduce the overhead of the additional calculation 
and this is not something where a little loss of precision would matter 
too much. The problem is that the new function is actually not used 
anywhere in the code; I only see uses of plain sqrt.
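
To illustrate why the hot path should not care about sqrt vs sqrtd at all,
here is a standalone sketch (hypothetical names, not the patch itself) of
the usual online accumulation: Welford's update needs no square root while
accumulating, and the root (population form here; divide by calls - 1 for
the sample form mentioned earlier in the thread) is only taken in the
reporting function, where sqrtd would be a drop-in replacement:

#include <math.h>
#include <stdio.h>

typedef struct Counters
{
    long    calls;
    double  min_time;
    double  max_time;
    double  mean_time;
    double  sum_var_time;   /* sum of squared deviations from the mean */
} Counters;

static void
accum_time(Counters *c, double total_time)
{
    double  old_mean = c->mean_time;

    c->calls++;
    if (c->calls == 1)
    {
        c->min_time = c->max_time = c->mean_time = total_time;
        return;
    }
    if (total_time < c->min_time)
        c->min_time = total_time;
    if (total_time > c->max_time)
        c->max_time = total_time;
    c->mean_time += (total_time - old_mean) / c->calls;
    c->sum_var_time += (total_time - old_mean) * (total_time - c->mean_time);
}

static double
report_stddev(const Counters *c)
{
    return (c->calls > 1) ? sqrt(c->sum_var_time / c->calls) : 0.0;
}

int
main(void)
{
    Counters c = {0};

    accum_time(&c, 1.0);
    accum_time(&c, 2.0);
    accum_time(&c, 4.0);
    printf("mean=%g stddev=%g\n", c.mean_time, report_stddev(&c));
    return 0;
}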


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Petr Jelinek

On 17/02/15 01:57, Peter Geoghegan wrote:

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote:

We definitely want this feature, I wished to have this info many times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with 
pg_stat_statement activated:

HEAD:  20631
SQRT:  20533
SQRTD: 20592


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Petr Jelinek

On 17/02/15 03:03, Andrew Dunstan wrote:


On 02/16/2015 08:57 PM, Andrew Dunstan wrote:


On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com
wrote:

We definitely want this feature, I wished to have this info many
times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with
pg_stat_statement activated:
HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.



Actually, sqrt/sqrtd is not called in accumulating the stats, only in
the reporting function. So it looks like the difference here should be
noise. Maybe we need some longer runs.



Yes, there are variations between individual runs, so it might really be 
just that; I can leave it running for a much longer time tomorrow.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Petr Jelinek

On 17/02/15 02:57, Andrew Dunstan wrote:


On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com
wrote:

We definitely want this feature, I wished to have this info many times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with
pg_stat_statement activated:
HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.



I think so too.

I found one more issue with the 1.2--1.3 upgrade script, the DROP 
FUNCTION pg_stat_statements(); should be DROP FUNCTION 
pg_stat_statements(bool); since in 1.2 the function identity has changed.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication Helpers WIP for discussion

2015-02-15 Thread Petr Jelinek

On 13/02/15 14:04, Petr Jelinek wrote:

On 13/02/15 08:48, Michael Paquier wrote:


Looking at this patch, I don't see what we actually gain much here
except a decoder plugin that speaks a special protocol for a special
background worker that has not been presented yet. What actually is the
value of that defined as a contrib/ module in-core. Note that we have
already test_decoding to basically test the logical decoding facility,
used at least at the SQL level to get logical changes decoded.

Based on those reasons I am planning to mark this as rejected (it has no
documentation as well). So please speak up if you think the contrary,
but it seems to me that this could live happily out of core.


I think you are missing point of this, it's not meant to be committed in
this form at all and even less as contrib module. It was meant as basis
for in-core logical replication discussion, but sadly I didn't really
have time to pursue it in this CF in the end.



That being said, and looking at the size of the February CF, I think I am 
fine with dropping this in the 9.5 cycle; it does not seem likely that 
anything useful will be done with this fast enough to get into 9.5, so 
there is no point in spending committer resources on it in the final CF.


I will pick it up again after the CF is done.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication Helpers WIP for discussion

2015-02-13 Thread Petr Jelinek

On 13/02/15 08:48, Michael Paquier wrote:



On Mon, Dec 22, 2014 at 10:26 PM, Robert Haas robertmh...@gmail.com
mailto:robertmh...@gmail.com wrote:

On Fri, Dec 19, 2014 at 8:40 AM, Petr Jelinek p...@2ndquadrant.com
mailto:p...@2ndquadrant.com wrote:
 What I hope to get from this is agreement on the general approach and
 protocol so that we can have common base which will both make it easier to
 create external logical replication solutions and also eventually lead to
 full logical replication inside core PostgreSQL.

The protocol is a really important topic which deserves its own
discussion.  Andres has mentioned some of the ideas he has in mind -
which I think are similar to what you did here - but there hasn't
really been a thread devoted to discussing that topic specifically.  I
think that would be a good idea: lay out what you have in mind, and
why, and solicit feedback.


Looking at this patch, I don't see what we actually gain much here
except a decoder plugin that speaks a special protocol for a special
background worker that has not been presented yet. What actually is the
value of that defined as a contrib/ module in-core. Note that we have
already test_decoding to basically test the logical decoding facility,
used at least at the SQL level to get logical changes decoded.

Based on those reasons I am planning to mark this as rejected (it has no
documentation as well). So please speak up if you think the contrary,
but it seems to me that this could live happily out of core.


I think you are missing the point of this; it's not meant to be committed in 
this form at all, and even less as a contrib module. It was meant as a basis 
for discussion of in-core logical replication, but sadly I didn't really 
have time to pursue it in this CF in the end.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Redesigning checkpoint_segments

2015-02-03 Thread Petr Jelinek

On 03/02/15 16:50, Robert Haas wrote:

On Tue, Feb 3, 2015 at 10:44 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:


That's the whole point of this patch. max_checkpoint_segments = 10, or
max_checkpoint_segments = 160 MB, means that the system will begin a
checkpoint so that when the checkpoint completes, and it truncates away or
recycles old WAL, the total size of pg_xlog is 160 MB.

That's different from our current checkpoint_segments setting. With
checkpoint_segments, the documented formula for calculating the disk usage
is (2 + checkpoint_completion_target) * checkpoint_segments * 16 MB. That's
a lot less intuitive to set.


Hmm, that's different from what I was thinking.  We probably shouldn't
call that max_checkpoint_segments, then.  I got confused and thought
you were just trying to decouple the number of segments that it takes
to trigger a checkpoint from the number we keep preallocated.

But I'm confused: how can we know how much new WAL will be written
before the checkpoint completes?



The preallocation is based on the estimated size of the next checkpoint, which 
is basically a running average of the previous checkpoints with some 
additional adjustments for unsteady behavior (the last checkpoint has a 
higher weight in the formula).


(we also still internally have CheckPointSegments, which is 
calculated the way Heikki described above)
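
For illustration, a standalone sketch (not the exact xlog.c formula) of the
kind of running average described above: decay the estimate slowly, but jump
straight up when the last checkpoint wrote more WAL than estimated, so the
most recent checkpoint effectively carries the most weight:

#include <stdio.h>

static double CheckPointDistanceEstimate = 0;   /* WAL bytes per checkpoint cycle */

static void
update_checkpoint_distance(double nbytes)
{
    if (nbytes > CheckPointDistanceEstimate)
        CheckPointDistanceEstimate = nbytes;    /* react immediately to growth */
    else
        CheckPointDistanceEstimate =
            0.90 * CheckPointDistanceEstimate + 0.10 * nbytes;  /* decay slowly */
}

int
main(void)
{
    double  samples[] = {100e6, 160e6, 120e6, 80e6};
    int     i;

    for (i = 0; i < 4; i++)
    {
        update_checkpoint_distance(samples[i]);
        printf("estimate after checkpoint %d: %.0f\n", i + 1,
               CheckPointDistanceEstimate);
    }
    return 0;
}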


In any case, I don't like the max_checkpoint_segments naming too much, 
and I don't even like the number of segments as a limit too much; I think 
the ability to set this as an actual size is quite a nice property of this 
patch, and as Heikki says the numbers don't map that well to the old ones 
in practice.


I did some code reading and I do like the patch. Basically the only negative 
thing I can say is that I am not a big fan of the _logSegNo variable name, but 
that's not new in this patch; we use it all over the place in xlog.


I would vote for a bigger default for checkpoint_wal_size (or whatever 
it ends up being named) though, since the current one is not much bigger in 
practice than what we have now, and that one is way too conservative.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting rid of wal_level=archive and default to hot_standby + wal_senders

2015-02-03 Thread Petr Jelinek

On 03/02/15 13:51, Magnus Hagander wrote:

On Tue, Feb 3, 2015 at 1:43 PM, Andres Freund and...@2ndquadrant.com
mailto:and...@2ndquadrant.com wrote:

Hi,

I think these days there's no reason for the split between the archive
and hot_standby wal levels. The split was made out of volume and
stability concerns. I think we can by now be confident about the
wal_level = hot_standby changes (note I'm not proposing hot_standby =
on).

So let's remove the split. It just gives users choice between two
options
that don't have a meaningful difference.


+1.



+1 too



Additionally I think we should change the default for wal_level to
hot_standby and max_wal_senders (maybe to 5). That way users can use
pg_basebackup and setup streaming standbys without having to restart the
primary. I think that'd be a important step in making setup easier.


Yes, please!

Those who want to optimize their WAL size can set it back to minimal,
but let's make the default the one that makes life *easy* for people.

The other option, which would be more complicated (I have a
semi-finished patch that I never got time to clean up) would be for
pg_basebackup to be able to dynamically raise the value of wal_level
during it's run. It would not help with the streaming standby part, but
it would help with pg_basebackup. That could be useful independent - for
those who prefer using wal_level=minimal and also pg_basebackup..




This is not that easy to do; let's do it one step at a time.



Comments?

Additionally, more complex and further into the future, I wonder if we
couldn't also get rid of wal_level = logical by automatically switching
to it whenever logical slots are active.



If it can be safely done online, I definitely think that would be a good
goal to have. If we could do the same for hot_standby if you had
physical slots, that might also be a good idea?



+many for the logical one; physical would be nice, but I think it's again in 
the not-so-easy category and maybe better as a next step, if at all.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-01-31 Thread Petr Jelinek

On 31/01/15 14:27, Amit Kapila wrote:

On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek p...@2ndquadrant.com
mailto:p...@2ndquadrant.com wrote:

On 19/01/15 07:08, Amit Kapila wrote:

On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek
p...@2ndquadrant.com mailto:p...@2ndquadrant.com
mailto:p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote:


I think that's actually good to have, because we still do costing
and the partial index might help produce better estimate of number
of returned rows for tablesample as well.


I don't understand how will it help, because for tablesample scan
it doesn't consider partial index at all as per patch.



Oh, I think we were talking about different things then; I thought you 
were talking about the index checks that the planner/optimizer sometimes 
does to get more accurate statistics. I'll take another look then.




Well similar, not same as we are not always fetching whole page or
doing visibility checks on all tuples in the page, etc. But I don't
see why it can't be inside nodeSamplescan. If you look at bitmap
heap scan, that one is also essentially somewhat modified sequential
scan and does everything within the node nodeBitmapHeapscan because
the algorithm is not used anywhere else, just like sample scan.


I don't mind doing everything in nodeSamplescan, however if
you can split the function, it will be easier to read and understand,
if you see in nodeBitmapHeapscan, that also has function like
bitgetpage().
This is just a suggestion and if you think that it can be splitted,
then it's okay, otherwise leave it as it is.


Yeah, I can split it into a separate function within nodeSamplescan; that 
sounds reasonable.





Refer parameter (HeapScanDesc-rs_syncscan) and syncscan.c.
Basically during sequiantial scan on same table by different
backends, we attempt to keep them synchronized to reduce the I/O.


Ah this, yes, it makes sense for bernoulli, not for system though. I
guess it should be used for sampling methods that use SAS_SEQUENTIAL
strategy.


Have you taken care of this in your latest patch?


Not yet. I think I will need to make the strategy a property of the 
sampling method instead of returning it from the costing function, so that the 
info can be used by the scan.
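
Roughly what I have in mind (a sketch only; apart from SAS_SEQUENTIAL, which is
mentioned below, the names here are made up and not taken from the patch):

typedef enum SampleScanStrategy
{
	SAS_SEQUENTIAL,		/* blocks visited in order; synchronized scans make sense */
	SAS_RANDOM			/* blocks visited out of order; skip the syncscan machinery */
} SampleScanStrategy;

typedef struct TableSampleMethodInfo
{
	SampleScanStrategy	strategy;	/* fixed per sampling method */
	/* ... the existing tsminit/tsmnextblock/tsmnexttuple callbacks ... */
} TableSampleMethodInfo;

/*
 * The scan node could then simply do something like:
 *     scan->rs_syncscan = (method->strategy == SAS_SEQUENTIAL);
 */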




Oh and BTW when I delete 50k of tuples (make them invisible) the
results of 20 runs are between 30880 and 40063 rows.


This is between 60% and 80%, lower than what is expected,
but I guess we can't do much about this except trying the
reverse order for the visibility test and the sample tuple call;
you can decide if you want to try that once just to see if that
is better.



No, that's because I can't write properly, the lower number was supposed 
to be 39880 which is well within the tolerance, sorry for the confusion 
 (9 and 0 are just too close).



Anyway, attached is new version with some updates that you mentioned
(all_visible, etc).
I also added optional interface for the sampling method to access
the tuple contents as I can imagine sampling methods that will want
to do that.

+/*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples.
+ */
+if (OidIsValid(node->tsmreturntuple.fn_oid))
+{
+	found = DatumGetBool(FunctionCall4(&node->tsmreturntuple,
+									   PointerGetDatum(node),
+									   UInt32GetDatum(blockno),
+									   PointerGetDatum(tuple),
+									   BoolGetDatum(visible)));
+	/* XXX: better error */
+	if (found && !visible)
+		elog(ERROR, "Sampling method wanted to return invisible tuple");
+}

You have mentioned in the comment that we let it examine even invisible
tuples, but it is not clear why you want to allow examining an invisible
tuple and then later return an error; why can't it just skip invisible tuples?



Well, I think returning invisible tuples to the user is a bad idea and that's 
why the check, but I also think it might make sense to let the sampling 
function examine the invisible tuple in case it wants to collect 
some kind of stats about the scan and wants to use those for making the 
decision about returning other tuples. The interface should probably be 
called tsmexaminetuple instead, to make the intention clearer.



1.
How about statistics (pgstat_count_heap_getnext()) during
SampleNext as we do in heap_getnext?


Right, will add.



2.
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
..
+/*
+* Lock the buffer so we can safely assess tuple
+* visibility later.
+*/
+LockBuffer(buffer, BUFFER_LOCK_SHARE);
..
}

When is this content lock released, shouldn't we release it after
checking visibility of tuple?


Here,
+   if (!OffsetNumberIsValid(tupoffset))
+   {
+   UnlockReleaseBuffer(buffer);

but yes, you are correct, it should just be released there and we can 
unlock right after the visibility check.
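
Something along these lines, i.e. the content lock held only across the
visibility test, while the pin is kept until we move off the page (a sketch
only; the helper name is made up and the headers are approximate):

#include "postgres.h"
#include "access/htup.h"
#include "storage/bufmgr.h"
#include "utils/tqual.h"

static bool
sample_tuple_visible(HeapTuple tuple, Snapshot snapshot, Buffer buffer)
{
	bool		visible;

	/* take the content lock only for the duration of the visibility check */
	LockBuffer(buffer, BUFFER_LOCK_SHARE);
	visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

	return visible;
}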




3.
+static TupleTableSlot *
+SampleNext(SampleScanState

Re: [HACKERS] TABLESAMPLE patch

2015-01-28 Thread Petr Jelinek

On 28/01/15 09:41, Kyotaro HORIGUCHI wrote:


As an issue related to size esmation, I got a explain result as
following,

=# explain (analyze on, buffers on) select a from t1 tablesample system(10) where 
a  5;
QUERY PLAN

  Sample Scan on t1  (cost=0.00..301.00 rows=1 width=4) (actual time=0.015..2.920 rows=4294 loops=1)
Filter: (a  5)
Rows Removed by Filter: 5876
Buffers: shared hit=45

The actual row count is about twice the estimate. tsm_system_cost
estimates the number of result rows using baserel->tuples,
not baserel->rows, so it doesn't reflect the scan quals. Did
the code come from some other intention?



No, that's a bug.


 4.
 SampleNext()
 a.
 Isn't it better to code SampleNext() similar to SeqNext() and
 IndexNext(), which encapsulate the logic to get the tuple in
 another function(tbs_next() or something like that)?


 Possibly, my thinking was that unlike the index_getnext() and
 heap_getnext(), this function would not be called from any other
 place so adding one more layer of abstraction didn't seem useful.
 And it would mean creating new ScanDesc struct, etc.


I think adding additional abstraction would simplify the function
and reduce the chance of discrepancy; otherwise I think we would need to
create an almost duplicate version of the code for the heapgetpage() method.
Yeah, I agree that we need to build a structure like ScanDesc, but
it will still be worth it to keep the code consistent.


Well similar, not same as we are not always fetching whole page or doing
visibility checks on all tuples in the page, etc. But I don't see why it
can't be inside nodeSamplescan. If you look at bitmap heap scan, that
one is also essentially somewhat modified sequential scan and does
everything within the node nodeBitmapHeapscan because the algorithm is
not used anywhere else, just like sample scan.


As a general opinion, I'll vote for Amit's comment, since three
or four similar instances seem to be enough reason to abstract
it. On the other hand, since the scan processes are dispatched
in ExecProcNode by the nodeTag of the scan nodes, reunifying the
control in an abstraction layer after that point could be said to
introduce useless annoyance. In short, abstraction at the level
of *Next() would bring the necessity of additional changes around
the execution node dispatch.



Yes, that's my view too. I would generally be for that change also and 
it would be worth it if the code was used in more than one place, but as 
it is it seems like it will just add code/complexity for no real 
benefit. It would make sense in case we used sequential scan node 
instead of the new node as Amit also suggested, but I remain unconvinced 
that mixing sampling and sequential scan into single scan node would be 
a good idea.




How will it get smoothed out for cases when, let us say,
more than 50% of tuples are not visible?  I think it's
due to Postgres's non-overwriting storage architecture
that we maintain multiple versions of rows in the same heap;
otherwise, for a different kind of architecture (like MySQL/Oracle)
where multiple row versions are not maintained in the same
heap, the same function (with the same percentage) can return
an entirely different number of rows.



I don't know how else to explain, if we loop over every tuple in the
relation and return it with equal probability then visibility checks
don't matter as the percentage of visible tuples will be same in the
result as in the relation.


Surely it might yield effectively the same result. Even so,
this code needs an explanatory comment about what is going on
around it, or should be written so that MVCC-minded people won't get
confused, I suppose.



Yes it does, but as I am failing to explain even here, it's not clear to 
me what to write there. From my point of view it's just an effect of the 
essential property of Bernoulli sampling, which is that every element has an 
equal probability of being included in the sample. It comes from the 
fact that we do a Bernoulli trial (in the code it's the while 
(sampler_random_fract(sampler->randstate) > probability) loop) on every 
individual tuple.
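
To spell it out, the per-tuple trial looks roughly like this (a sketch with
illustrative names, not the patch code itself):

/*
 * Pick the next offset on the page to return; every offset is subjected
 * to its own independent coin flip, visible or not, which is why the
 * visible fraction of the sample matches the visible fraction of the
 * relation.
 */
static OffsetNumber
bernoulli_next_offset(SamplerRandomState randstate, double probability,
					  OffsetNumber tupoffset, OffsetNumber maxoffset)
{
	tupoffset = (tupoffset == InvalidOffsetNumber) ?
		FirstOffsetNumber : OffsetNumberNext(tupoffset);

	/* skip offsets that lose their coin flip */
	while (tupoffset <= maxoffset &&
		   sampler_random_fract(randstate) > probability)
		tupoffset = OffsetNumberNext(tupoffset);

	return (tupoffset > maxoffset) ? InvalidOffsetNumber : tupoffset;
}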


This means that the ratio of visible to invisible tuples should be the same 
in the sample as it is in the relation. We then just skip the invisible 
tuples and get the correct percentage of the visible ones (this has the 
performance benefit of not having to do a visibility check on all tuples).


If that weren't true, then Bernoulli sampling would simply not 
work as intended, as the same property is the reason why it's used in 
statistics - the trends should look the same in the sample as they do in 
the source population.


Obviously there is some variation in practice as we don't have a perfect 
random generator, but that's independent of the algorithm.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7

Re: [HACKERS] TABLESAMPLE patch

2015-01-28 Thread Petr Jelinek

On 28/01/15 08:23, Kyotaro HORIGUCHI wrote:

Hi, I took a look on this and found nice.

By the way, the parameter for REPEATABLE seems to be allowed to be an
expression in ParseTableSample, but the grammar rejects it.



It wasn't my intention to support it, but you are correct, the code is 
generic enough that we can support it.



The following change seems enough.



Seems about right, thanks.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-01-28 Thread Petr Jelinek

On 28/01/15 18:09, Heikki Linnakangas wrote:

On 01/23/2015 02:34 AM, Petr Jelinek wrote:

On 22/01/15 17:02, Petr Jelinek wrote:


The new version (the one that is not submitted yet) of the gapless sequence
is way more ugly and probably not the best example either, but it does guarantee
gaplessness (it stores the last value in its own value table). So I am
not sure if it should be included either...


Here it is as promised.


I generally like the division of labour between the seqam
implementations and the API now.

I don't like the default_sequenceam GUC. That seems likely to create
confusion. And it's not really enough for something like a remote
sequence AM anyway that definitely needs some extra reloptions, like the
hostname of the remote server. The default should be 'local', unless you
specify something else with CREATE SEQUENCE USING - no GUCs.



Hmm, I am not too happy about this, I want SERIAL to work with custom 
sequences (as long as it's possible for the sequence to work that way, at 
least). That's actually an important feature for me. Your argument about 
this being a potential problem for some sequence AMs is valid though.



Some comments on pg_dump:

* In pg_dump's dumpSequence function, check the server version number
instead of checking whether pg_sequence_dump_state() function exists.
That's what we usually do when new features are introduced. And there's
actually a bug there: you have the check backwards. (try dumping a
database with any sequences in it; it fails)


Right.


* pg_dump should not output a USING clause when a sequence uses the
'local' sequence. No point in adding such noise - making the SQL command
not standard-compatible - for the 99% of databases that don't use other
sequence AMs.


Well, this only works if we remove the GUC, because otherwise, if the GUC is 
there, then you always need to either add the USING clause or set the GUC in 
advance (like we do with search_path, for example).



* Be careful to escape all strings correctly in pg_dump. I think you're
missing escaping for the 'state' variable at least.


Ouch.


In sequence_save_tuple:

else
{
	/*
	 * New tuple was not sent, so the original tuple was probably just
	 * changed inline, all we need to do is mark the buffer dirty and
	 * optionally log the update tuple.
	 */
	START_CRIT_SECTION();

	MarkBufferDirty(seqh->buf);

	if (do_wal)
		log_sequence_tuple(seqh->rel, seqh->tup, seqh->buf, page);

	END_CRIT_SECTION();
}


This is wrong when checksums are enabled and !do_wal. I believe this
should use MarkBufferDirtyHint().



Oh ok, didn't realize.


Notable changes:
- The gapless sequence rewritten to use the transactional storage as
that's the only way to guarantee gaplessness between dump and restore.


Can you elaborate?

Using the auxiliary seqam_gapless_values is a bit problematic. First of
all, the event trigger on DROP SEQUENCE doesn't fire if you drop the
whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly,


Yeah, but at worst there are some unused rows there, it's not too bad. I 
could also create a relation per sequence so that the dependency code would 
handle everything correctly, but it seems a bit too expensive to create not 
one but two relations per sequence...



updating a row in a table for every nextval() call is pretty darn
expensive.


Yes, it's expensive, but the gapless sequence also serializes access, so 
it will never be a speed demon.



But I don't actually see a problem with dump and restore.


The problem is that the tuple stored in the sequence relation is always 
the one with the latest state (and with a frozen xid), so pg_dump has no way 
of getting the value as it was at the time it took the snapshot. This 
means that after a dump/restore cycle, the sequence can be further ahead 
than the table it's attached to, and you end up with a gap there.




Can you rewrite it without using the values table? AFAICS, there are a
two of problems to solve:

1. If the top-level transaction aborts, you need to restore the old
value. I suggest keeping two values in the sequence tuple: the old,
definitely committed one, and the last value. The last value is only
considered current if the associated XID committed; otherwise the old
value is current. When a transaction is about to commit, it stores its
top-level XID and the new value in the last field, and copies the
previously current value to the old field.



Yes, this is how the previous implementation worked.
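
For reference, the rough shape of that scheme (field and function names here
are illustrative, not taken from any submitted version of the patch):

typedef struct GaplessSequenceState
{
	int64			old_value;	/* last definitely committed value */
	int64			last_value;	/* value written by last_xid */
	TransactionId	last_xid;	/* top-level xact that wrote last_value */
} GaplessSequenceState;

static int64
gapless_current_value(GaplessSequenceState *state)
{
	/* last_value only counts if the transaction that wrote it committed */
	if (TransactionIdIsValid(state->last_xid) &&
		TransactionIdDidCommit(state->last_xid))
		return state->last_value;

	return state->old_value;
}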


2. You need to track the last value on a per-subtransaction basis, until
the transaction commits/rolls back, in order to have rollback to
savepoint to retreat the sequence's value. That can be done in
backend-private memory, maintaining a stack of subtransactions and the
last value of each. Use RegisterSubXactCallback to hook into subxact
commit/abort. Just before top-level commit (in pre-commit callback),
update the sequence tuple with the latest value, so that it gets
WAL-logged

Re: [HACKERS] Re: Abbreviated keys for Numeric

2015-01-26 Thread Petr Jelinek

On 27/01/15 00:51, Andres Freund wrote:

On 2015-01-26 15:35:44 -0800, Peter Geoghegan wrote:

On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:

Obvious overheads in float8 comparison include having to check for NaN,
and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces
a store/load to memory rather than just using a register. Looking at
those might be more beneficial than messing with abbreviations.


Aren't there issues with the alignment of double precision floating
point numbers on x86, too? Maybe my information there is at least
partially obsolete. But it seems we'd have to control for this to be
sure.


I think getting rid of the function call for DatumGetFloat8() would be
quite the win. On x86-64 the conversion then should amount to mov
%rd?,-0x8(%rsp);movsd -0x8(%rsp),%xmm0 - that's pretty cheap. Both
instructions have a cycle count of 1 + L1 access latency (4) + 2 because
they use the same execution port. So it's about 12 fully pipelineable
cycles, 2 if the pipeline can be kept busy otherwise. I doubt that'd be
noticeable if the conversion were inlined.



IIRC DatumGetFloat8 was quite visible in the perf profile when I was writing 
the array version of width_bucket. It was one of the motivations for 
making a special float8 version, since not having to call it had a 
significant effect. Sadly I don't remember anymore whether it was the function 
call itself or the conversion.
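
For what it's worth, once inlined, the conversion on a 64-bit Datum is just a
reinterpretation of the 8 bytes. A generic sketch (not the fmgr.c code, just
an illustration of what the inlined path amounts to):

#include <stdint.h>

static inline double
datum_get_float8_inline(uint64_t datum)
{
	union
	{
		uint64_t	value;
		double		retval;
	}			u;

	u.value = datum;			/* reinterpret the Datum bits; no out-of-line call */
	return u.retval;
}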


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-01-22 Thread Petr Jelinek

On 22/01/15 16:50, Heikki Linnakangas wrote:

On 01/12/2015 11:33 PM, Petr Jelinek wrote:

Second patch adds DDL support. I originally wanted to make it
CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
a reserved keyword, so I went for CREATE ACCESS METHOD FOR SEQUENCES
which does not need to change anything (besides adding METHOD to the
unreserved keywords).
The DDL support uses the DefineStmt infra with some very small changes, as
the sequence AMs are not schema qualified, but I think it's acceptable
and saves a considerable amount of boilerplate.


Do we need DDL commands for this at all? I could go either way on that
question. We recently had a discussion on that wrt. index access methods
[1], and Tom opined that providing DDL for creating index access methods
is not worth it. The extension can just insert the rows into pg_seqam
with INSERT. Do we expect sequence access methods as extensions to be
more popular than index access methods? Maybe, because the WAL-logging
problem doesn't exist. But OTOH, if you're writing something like a
replication system that needs global sequences as part of it, there
aren't that many of those, and the installation scripts will need to
deal with more complicated stuff than inserting a row in pg_seqam.


I don't know about popularity, and I've seen the discussion about 
indexes. The motivation for DDL for me was handling dependencies 
correctly, that's all. If we say we don't care about that (and allow 
DROP EXTENSION even though user has sequences that are using the AM) 
then we don't need DDL.




[1] http://www.postgresql.org/message-id/26822.1414516...@sss.pgh.pa.us


And third patch is gapless sequence implementation updated to work with
the new DDL support with some tests added for checking if dependencies
work correctly. It also acts as example on how to make custom AMs.


I'll take a look at that to see how the API works, but we're not going
to include it in the source tree, because it doesn't actually guarantee
gaplessness. That makes it a pretty dangerous example. I'm sure we can
come up with a better example that might even be useful. How about a
Lamport's clock sequence, which advances once per second, in addition to
when anyone calls nextval() ? Or a remote sequence that uses an FDW to
call nextval() in a foreign server?



I have an updated patch ready, I just didn't submit it because I am otherwise 
busy this week. I hope to get to it this evening or tomorrow morning, 
so I'd wait until then before looking at the patch.


The new version (the one that is not submitted yet) of the gapless sequence 
is way more ugly and probably not the best example either, but it does guarantee 
gaplessness (it stores the last value in its own value table). So I am 
not sure if it should be included either...


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-01-22 Thread Petr Jelinek

On 23/01/15 00:40, Andreas Karlsson wrote:


- Renamed some things from int12 to int128, there are still some places
with int16 which I am not sure what to do with.



I'd vote for renaming them to int128 too; there are enough C functions 
that use int16 for a 16-bit integer that this is going to be confusing 
otherwise.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2015-01-22 Thread Petr Jelinek

On 05/01/15 17:50, Petr Jelinek wrote:

On 05/01/15 16:17, Petr Jelinek wrote:

On 05/01/15 07:28, Fujii Masao wrote:

On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao masao.fu...@gmail.com
wrote:

On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

Pushed with some extra cosmetic tweaks.


I got the following assertion failure when I executed
pg_xact_commit_timestamp()
in the standby server.

=# select pg_xact_commit_timestamp('1000'::xid);
TRAP: FailedAssertion(!(((oldestCommitTs) != ((TransactionId) 0)) ==
((newestCommitTs) != ((TransactionId) 0))), File: commit_ts.c,
Line: 315)
server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
The connection to the server was lost. Attempting reset: 2014-12-04
12:01:08 JST sby1 LOG:  server process (PID 15545) was terminated by
signal 6: Aborted
2014-12-04 12:01:08 JST sby1 DETAIL:  Failed process was running:
select pg_xact_commit_timestamp('1000'::xid);

The way to reproduce this problem is

#1. set up and start the master and standby servers with
track_commit_timestamp disabled
#2. enable track_commit_timestamp in the master and restart the master
#3. run some write transactions
#4. enable track_commit_timestamp in the standby and restart the
standby
#5. execute select pg_xact_commit_timestamp('1000'::xid) in the
standby

BTW, at the step #4, I got the following log messages. This might be
a hint for
this problem.

LOG:  file pg_commit_ts/ doesn't exist, reading as zeroes
CONTEXT:  xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09;
inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache
45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608
relcache 16384


This problem still happens in the master.

Regards,



Attached patch fixes it, I am not sure how happy I am with the way I did
it though.



Added more comments and made the error message bit clearer.



Fujii, Alvaro, did one of you have a chance to look at this fix?


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-01-17 Thread Petr Jelinek

On 17/01/15 13:46, Amit Kapila wrote:

On Sun, Jan 11, 2015 at 1:29 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 
 
  In second patch which implements the TABLESAMPLE itself I changed the
implementation of random generator because when I looked at the code
again I realized the old one would produce wrong results if there were
multiple TABLESAMPLE statements in same query or multiple cursors in
same transaction.
 

I have looked into this patch and would like to share my
findings with you.


Thanks a lot for this.



1.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+/*
+* There is only one plan to consider but we still need to set
+* parameters for RelOptInfo.
+*/
+set_cheapest(rel);
}

It seems we already call set_cheapest(rel) in set_rel_pathlist()
which is a caller of set_tablesample_rel_pathlist(), so why do
we need it inside set_tablesample_rel_pathlist()?


Ah, this changed after I started working on this patch and I didn't 
notice - previously all the set_something_rel_pathlist called 
set_cheapest() individually. I will change the code.




2.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+/* We only do sample scan if it was requested */
+add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
}

Do we need to add_path, if there is only one path?


Good point, we can probably just set the pathlist directly in this case.
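
Something like this, I think (a sketch, mirroring the path-creation call
quoted above):

/* only one possible path, so no need to go through add_path() */
rel->pathlist = list_make1(create_samplescan_path(root, rel, required_outer));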



3.
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+else if (rte->tablesample != NULL)
+{
+/* Build sample scan on relation */
+set_tablesample_rel_pathlist(root, rel, rte);
+}

Have you considered having a different RTE for this?



I assume you mean a different RTEKind; yes, I did, but the fact is that 
it's still a relation, just with a different scan type, and I didn't want 
to modify every piece of code which deals with RTE_RELATION to also deal 
with a new RTE kind for the sample scan, as it seems unnecessary. I am not 
strongly opinionated about this one though.



4.
SampleNext()
a.
Isn't it better to code SampleNext() similar to SeqNext() and
IndexNext(), which encapsulate the logic to get the tuple in
another function(tbs_next() or something like that)?


Possibly, my thinking was that unlike the index_getnext() and 
heap_getnext(), this function would not be called from any other place 
so adding one more layer of abstraction didn't seem useful. And it would 
mean creating new ScanDesc struct, etc.




b.
Also don't we want to handle pruning of page while
scanning (heap_page_prune_opt()) and I observed
in heap scan API's after visibility check we do check
for serializable conflict (CheckForSerializableConflictOut()).


Both good points, will add.


Another thing is, don't we want to do anything about synchronized scans
for these methods, as they are doing a heap scan?


I don't follow this one tbh.



c.
for bernoulli method, it will first get the tupoffset with
the help of function and then apply visibility check, it seems
that could reduce the sample set way lower than expected
in case lot of tuples are not visible, shouldn't it be done in reverse
way(first visibility check, then call function to see if we want to
include it in the sample)?
I think something similar is done in acquire_sample_rows which
seems right to me.


I don't think so; the way bernoulli works is that it returns every tuple 
with equal probability, so the visible tuples have the same chance of being 
returned as the invisible ones, and the issue should be smoothed away 
automatically (IMHO).


acquire_sample_rows has a limit on the number of rows it returns, so that's 
why it has to do the visibility check first, as the problem you 
described applies there.


The reason for using the probability instead of a tuple limit is the fact 
that there is no way to accurately guess the number of tuples in the table 
at the beginning of the scan. This method should actually be better at 
returning the correct number of tuples without depending on how many of 
them are visible and how much space in the blocks is used.




5.

CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
generate_series(0, 9) s(i) ORDER BY i;

postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
  id

   1
   3
   4
   7
   8
   9
(6 rows)


postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
  id

   0
   1
   2
   3
   4
   5
   6
   7
   8
   9
(10 rows)

So the above test yields 60% of rows the first time and 100% of rows the next
time, when the test requested 80%.
I understand that the sample percentage will result in an approximate
number of rows; however, I just wanted us to check whether the
implementation has any problem or not.


I think this is caused by random generator not producing smooth enough 
random

Re: [HACKERS] Turning recovery.conf into GUCs

2015-01-14 Thread Petr Jelinek

On 14/01/15 14:24, Robert Haas wrote:

On Mon, Jan 5, 2015 at 8:43 PM, Peter Eisentraut pete...@gmx.net wrote:

I have previously argued for a different approach: read recovery.conf
as a configuration file after postgresql.conf, but keep it as a file to
start recovery.  That doesn't break any old workflows, it gets you all
the advantages of having recovery parameters in the GUC system, and it
keeps all the options open for improving things further in the future.


But this is confusing, too, because it means that if you set a
parameter in both postgresql.conf and recovery.conf, you'll end up
with the recovery.conf value of the parameter even after recovery is
done.  Until you restart, and then you won't.  That's weird.

I think your idea of adding read-only GUCs with the same names as all
of the recovey.conf parameters is a clear win.  Even if we do nothing
more than that, it makes the values visible from the SQL level, and
that's good.  But I think we should go further and make them really be
GUCs.  Otherwise, if you want to be able to change one of those
parameters other than at server startup time, you've got to invent a
separate system for reloading them on SIGHUP.  If you make them part
of the GUC mechanism, you can use that.  I think it's quite safe to
say that the whole reason we *have* a GUC mechanism is so that all of
our configuration can be done through one grand, unified mechanism
rather than being fragmented across many files.


This is basically what the patch which is in the commitfest does, no?



Some renaming might be in order.  Heikki previously suggested merging
all of the recovery_target_whatever settings down into a single
parameter recovery_target='kindofrecoverytarget furtherdetailsgohere',
like recovery_target='xid 1234' or recovery_target='name bob'.  Maybe
that would be more clear (or not).


I was thinking about this a lot myself while reviewing the patch too; it 
seems strange to have multiple config parameters for what is essentially the 
same thing. My thinking was similar, except I'd use ':' as the separator 
('kindofrecoverytarget:furtherdetailsgohere'). I think, though, that while it 
is technically nicer to do this, it might hurt usability for users.



Maybe standby_mode needs a better name.


I actually think standby_mode should be merged with hot_standby (as in 
standby_mode = 'hot'/'warm'/'off' or something).
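
As an enum GUC that could look roughly like this (purely illustrative, the
value names are made up):

typedef enum StandbyModeSetting
{
	STANDBY_MODE_OFF,
	STANDBY_MODE_WARM,
	STANDBY_MODE_HOT
} StandbyModeSetting;

static const struct config_enum_entry standby_mode_options[] = {
	{"off", STANDBY_MODE_OFF, false},
	{"warm", STANDBY_MODE_WARM, false},
	{"hot", STANDBY_MODE_HOT, false},
	{NULL, 0, false}
};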



But I think the starting point for this discussion shouldn't be
why in the world would we merge recovery.conf into postgresql.conf?
but why, when we've otherwise gone to such trouble to push all of our
configuration through a single, unified mechanism that offers many
convenient features, do we continue to suffer our recovery.conf
settings to go through some other, less-capable mechanism?.



+1

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2015-01-13 Thread Petr Jelinek

On 13/01/15 13:24, Tomas Vondra wrote:

On 12.1.2015 22:33, Petr Jelinek wrote:

On 15/12/14 11:36, Petr Jelinek wrote:

On 10/12/14 03:33, Petr Jelinek wrote:

On 24/11/14 12:16, Heikki Linnakangas wrote:

About the rough edges:
- The AlterSequence is not prettiest code around as we now have to
create new relation when sequence AM is changed and I don't know how to
do that nicely
- I am not sure if I did the locking, command order and dependency
handling in AlterSequence correcly


This version does AlterSequence differently and better. Didn't attach
the gapless sequence again as that one is unchanged.




And another version, separated into patch-set of 3 patches.
First patch contains the seqam patch itself, not many changes there,
mainly docs/comments related. What I wrote in the previous email for
version 3 still applies.


I did a review of the first part today - mostly by reading through the
diff. I plan to do a bit more thorough testing in a day or two. I'll
also look at the two (much smaller) patches.



Thanks!


comments/questions/general nitpicking:

(1) Why treat the empty string as equal to 'local'? Isn't enforcing the
 actual value a better approach?



Álvaro mentioned this on IM as well; I already changed it to a saner normal GUC 
with 'local' as the default value in my working copy.



(2) NITPICK: Maybe we could use access_method in the docs (instead of
 sequence_access_method), as the 'sequence' part is clear from the
 context?


Yes.


(3) Why index_reloptions / sequence_reloptions when both do exactly the
 same thing (call to common_am_reloptions)? I'd understand this if
 the patch(es) then change the sequence_reloptions() but that's not
 happening. Maybe that's expected to happen?


That's a leftover from the original design where the AM was supposed to call 
this; since this is not exposed to the AM anymore, I think it can be a single 
function now.




(4) DOCS: "Each sequence can only use one access method at a time."

 Does that mean a sequence can change the access method during its
 lifetime? My understanding is that the AM is fixed after creating
 the sequence?



Oh, I forgot to add ALTER SEQUENCE USING to the docs; you can change the AM, 
even though you probably don't want to do it often, but for migrations 
it's useful.



(8) check_default_seqam without a transaction

  * If we aren't inside a transaction, we cannot do database access so
  * cannot verify the name. Must accept the value on faith.

 In which situation does this happen? Wouldn't it be better to simply
 enforce the transaction and fail otherwise?


Reading postgresql.conf during startup, I don't think we want to fail in 
that scenario ;)
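
The usual shape of such a check hook is roughly this (a sketch; the
catalog-lookup helper is hypothetical, not from the patch):

static bool
check_default_seqam(char **newval, void **extra, GucSource source)
{
	/*
	 * Outside a transaction (e.g. while reading postgresql.conf during
	 * startup) we cannot do catalog access, so accept the value on faith.
	 */
	if (!IsTransactionState())
		return true;

	if (!sequence_am_exists(*newval))	/* hypothetical lookup helper */
	{
		GUC_check_errdetail("Sequence access method \"%s\" does not exist.",
							*newval);
		return false;
	}

	return true;
}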



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-01-11 Thread Petr Jelinek

On 11/01/15 08:56, Kohei KaiGai wrote:

2015-01-11 10:40 GMT+09:00 Jim Nasby jim.na...@bluetreble.com:


Yeah there are actually several places in the code where relid means
index in range table and not oid of relation, it still manages to
confuse
me. Nothing this patch can do about that.


Well, since it's confused 3 of us now... should we change it (as a
separate patch)? I'm willing to do that work but don't want to waste
time if
it'll just be rejected.

Any other examples of this I should fix too?


Sorry, to clarify... any other items besides Scan.scanrelid that I should
fix?


This naming is a little bit confusing; however, I don't think it should be
changed, because this structure has been used for a long time, so reworking it
will prevent back-patching when we find bugs around scanrelid.


We can still backpatch; it just requires more work. And how many bugs do we
actually expect to find around this anyway?

If folks think this just isn't worth fixing fine, but I find the
backpatching argument rather dubious.


Even though there is no problem with the Scan structure itself, a bugfix may
use the variable name scanrelid. If we renamed it in v9.5, we would
also need a little adjustment to apply such a bugfix to prior versions.
It seems to me a waste of time for committers.



I tend to agree, especially as there are multiple places in the code this 
would affect - RelOptInfo and RestrictInfo have the same issue, etc.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-01-11 Thread Petr Jelinek

On 11/01/15 05:07, Andreas Karlsson wrote:

On 01/11/2015 02:36 AM, Andres Freund wrote:

@@ -3030,6 +3139,18 @@ int8_avg_accum(PG_FUNCTION_ARGS)
  Datum
  int2_accum_inv(PG_FUNCTION_ARGS)
  {
+#ifdef HAVE_INT128
+Int16AggState *state;
+
+state = PG_ARGISNULL(0) ? NULL : (Int16AggState *)
PG_GETARG_POINTER(0);
+
+/* Should not get here with no state */
+if (state == NULL)
+elog(ERROR, "int2_accum_inv called with NULL state");
+
+if (!PG_ARGISNULL(1))
+do_int16_discard(state, (int128) PG_GETARG_INT16(1));
+#else
  NumericAggState *state;

  state = PG_ARGISNULL(0) ? NULL : (NumericAggState *)
PG_GETARG_POINTER(0);
@@ -3049,6 +3170,7 @@ int2_accum_inv(PG_FUNCTION_ARGS)
  if (!do_numeric_discard(state, newval))
  elog(ERROR, "do_numeric_discard failed unexpectedly");
  }


Hm. It might be nicer to move the if (!state) elog() outside the ifdef,
and add curly parens inside the ifdef.


The reason I did so was because the type of the state differs and I did
not feel like having two ifdef blocks. I have no strong opinion about
this though.



I think Andres means you should move the NULL check before the ifdef and 
then use curly braces inside the ifdef/else so that you can define 
the state variable there. That can be done with a single ifdef.


int2_accum_inv(PG_FUNCTION_ARGS)
{
... null check ...
{
#ifdef HAVE_INT128
Int16AggState *state;
...
#else
NumericAggState *state;
...
#endif
PG_RETURN_POINTER(state);
}
}

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-01-09 Thread Petr Jelinek

On 10/01/15 01:19, Kohei KaiGai wrote:

2015-01-10 8:18 GMT+09:00 Jim Nasby jim.na...@bluetreble.com:

On 1/6/15, 5:43 PM, Kouhei Kaigai wrote:


scan_relid != InvalidOid






Ideally, they should be OidIsValid(scan_relid)



Scan.scanrelid is an index into the range-table list, not an object id.
So, InvalidOid or OidIsValid() are not a good choice.



I think the name needs to change then; scan_relid certainly looks like the
OID of a relation.

scan_index?


Yep, I had the same impression when I looked at the code the first time;
however, it is defined as below. Not a matter of the custom-scan patch itself.

/*
  * ==
  * Scan nodes
  * ==
  */
typedef struct Scan
{
 Planplan;
 Index   scanrelid;  /* relid is index into the range table */
} Scan;



Yeah, there are actually several places in the code where relid means an 
index into the range table and not the OID of a relation; it still manages to 
confuse me. Nothing this patch can do about that.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XLOG_PARAMETER_CHANGE handling of wal_log_hints

2015-01-07 Thread Petr Jelinek

On 07/01/15 00:59, Michael Paquier wrote:

On Wed, Jan 7, 2015 at 4:24 AM, Petr Jelinek p...@2ndquadrant.com wrote:

Hi,

when I was fixing how commit_ts handles XLOG_PARAMETER_CHANGE, I noticed
that for wal_log_hints we assign the value in ControlFile to the current value
instead of the value that comes from the WAL record.

ISTM it has just informational value, so it should not have any practical
impact, but it looks like a bug anyway, so here is a simple patch to change
that.

Right. That's something that should get fixed in 9.4 as well..

  ControlFile->track_commit_timestamp = track_commit_timestamp;
The new value of track_commit_timestamp should be taken from xlrec as well
on the master, btw.



That's part of the larger fix for CommitTs that I sent separately in 
response to the bug report from Fujii - there is much more to be done 
there for CommitTs than just this.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-01-06 Thread Petr Jelinek

On 07/01/15 00:05, Jim Nasby wrote:

On 1/6/15, 8:17 AM, Kouhei Kaigai wrote:

The attached patch is newer revision of custom-/foreign-join
interface.


Shouldn't instances of

scan_relid > 0

be

scan_relid != InvalidOid



Ideally, they should be OidIsValid(scan_relid)


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] XLOG_PARAMETER_CHANGE handling of wal_log_hints

2015-01-06 Thread Petr Jelinek

Hi,

when I was fixing how commit_ts handles XLOG_PARAMETER_CHANGE, I 
noticed that for wal_log_hints we assign the value in ControlFile to 
the current value instead of the value that comes from the WAL record.


ISTM it has just informational value, so it should not have any practical 
impact, but it looks like a bug anyway, so here is a simple patch to change 
that.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5cc7e47..be67da4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8967,7 +8967,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
-		ControlFile->wal_log_hints = wal_log_hints;
+		ControlFile->wal_log_hints = xlrec.wal_log_hints;
 		ControlFile->track_commit_timestamp = track_commit_timestamp;
 
 		/*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2015-01-05 Thread Petr Jelinek

On 05/01/15 16:17, Petr Jelinek wrote:

On 05/01/15 07:28, Fujii Masao wrote:

On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao masao.fu...@gmail.com
wrote:

On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

Pushed with some extra cosmetic tweaks.


I got the following assertion failure when I executed
pg_xact_commit_timestamp()
in the standby server.

=# select pg_xact_commit_timestamp('1000'::xid);
TRAP: FailedAssertion(!(((oldestCommitTs) != ((TransactionId) 0)) ==
((newestCommitTs) != ((TransactionId) 0))), File: commit_ts.c,
Line: 315)
server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
The connection to the server was lost. Attempting reset: 2014-12-04
12:01:08 JST sby1 LOG:  server process (PID 15545) was terminated by
signal 6: Aborted
2014-12-04 12:01:08 JST sby1 DETAIL:  Failed process was running:
select pg_xact_commit_timestamp('1000'::xid);

The way to reproduce this problem is

#1. set up and start the master and standby servers with
track_commit_timestamp disabled
#2. enable track_commit_timestamp in the master and restart the master
#3. run some write transactions
#4. enable track_commit_timestamp in the standby and restart the standby
#5. execute select pg_xact_commit_timestamp('1000'::xid) in the
standby

BTW, at the step #4, I got the following log messages. This might be
a hint for
this problem.

LOG:  file pg_commit_ts/ doesn't exist, reading as zeroes
CONTEXT:  xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09;
inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache
45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608
relcache 16384


This problem still happens in the master.

Regards,



Attached patch fixes it, I am not sure how happy I am with the way I did
it though.



Added more comments and made the error message bit clearer.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index ca074da..59d19a0 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -557,6 +557,12 @@ StartupCommitTs(void)
 	TransactionId xid = ShmemVariableCache->nextXid;
 	int			pageno = TransactionIdToCTsPage(xid);
 
+	if (track_commit_timestamp)
+	{
+		ActivateCommitTs();
+		return;
+	}
+
 	LWLockAcquire(CommitTsControlLock, LW_EXCLUSIVE);
 
 	/*
@@ -571,14 +577,30 @@ StartupCommitTs(void)
  * This must be called ONCE during postmaster or standalone-backend startup,
  * when commit timestamp is enabled.  Must be called after recovery has
  * finished.
+ */
+void
+CompleteCommitTsInitialization(void)
+{
+	if (!track_commit_timestamp)
+		DeactivateCommitTs(true);
+}
+
+/*
+ * This must be called when track_commit_timestamp is turned on.
+ * Note that this only happens during postmaster or standalone-backend startup
+ * or during WAL replay.
+ *
+ * The reason why this SLRU needs separate activation/deactivation functions is
+ * that it can be enabled/disabled during start and the activation/deactivation
+ * on master is propagated to slave via replay. Other SLRUs don't have this
+ * property and they can be just initialized during normal startup.
  *
  * This is in charge of creating the currently active segment, if it's not
  * already there.  The reason for this is that the server might have been
  * running with this module disabled for a while and thus might have skipped
  * the normal creation point.
  */
-void
-CompleteCommitTsInitialization(void)
+void ActivateCommitTs(void)
 {
 	TransactionId xid = ShmemVariableCache->nextXid;
 	int			pageno = TransactionIdToCTsPage(xid);
@@ -591,22 +613,6 @@ CompleteCommitTsInitialization(void)
 	LWLockRelease(CommitTsControlLock);
 
 	/*
-	 * If this module is not currently enabled, make sure we don't hand back
-	 * possibly-invalid data; also remove segments of old data.
-	 */
-	if (!track_commit_timestamp)
-	{
-		LWLockAcquire(CommitTsLock, LW_EXCLUSIVE);
-		ShmemVariableCache->oldestCommitTs = InvalidTransactionId;
-		ShmemVariableCache->newestCommitTs = InvalidTransactionId;
-		LWLockRelease(CommitTsLock);
-
-		TruncateCommitTs(ReadNewTransactionId());
-
-		return;
-	}
-
-	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
 	 * need to set the oldest and newest values to the next Xid; that way, we
 	 * will not try to read data that might not have been set.
@@ -641,6 +647,35 @@ CompleteCommitTsInitialization(void)
 }
 
 /*
+ * This must be called when track_commit_timestamp is turned off.
+ * Note that this only happens during postmaster or standalone-backend startup
+ * or during WAL replay.
+ *
+ * Resets CommitTs into invalid state

[HACKERS] event trigger pg_dump fix

2015-01-05 Thread Petr Jelinek
Hi,

Attached is a fix for
http://www.postgresql.org/message-id/1420492505.23613.15.ca...@bloodnok.com

It was dumping the comment with a NULL owner while the function expects
an empty string for objects without an owner (there is a pg_strdup down the
line).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 6658fda..6541463 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -15106,7 +15106,7 @@ dumpEventTrigger(Archive *fout, DumpOptions *dopt, EventTriggerInfo *evtinfo)
 				 query->data, "", NULL, NULL, 0, NULL, NULL);
 
 	dumpComment(fout, dopt, labelq->data,
-				NULL, NULL,
+				NULL, "",
 				evtinfo->dobj.catId, 0, evtinfo->dobj.dumpId);
 
 	destroyPQExpBuffer(query);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] event trigger pg_dump fix

2015-01-05 Thread Petr Jelinek

On 06/01/15 01:02, Tom Lane wrote:

Petr Jelinek p...@2ndquadrant.com writes:

Attached is fix for
http://www.postgresql.org/message-id/1420492505.23613.15.ca...@bloodnok.com



It was dumping comment with NULL owner while the function expects
empty string for objects without owner (there is pg_strdup down the
line).


This isn't right either: event triggers do have owners.



Bah, of course, stupid me.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 6658fda..00b87e7 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -15106,7 +15106,7 @@ dumpEventTrigger(Archive *fout, DumpOptions *dopt, EventTriggerInfo *evtinfo)
 				 query->data, "", NULL, NULL, 0, NULL, NULL);
 
 	dumpComment(fout, dopt, labelq->data,
-				NULL, NULL,
+				NULL, evtinfo->evtowner,
 				evtinfo->dobj.catId, 0, evtinfo->dobj.dumpId);
 
 	destroyPQExpBuffer(query);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2015-01-05 Thread Petr Jelinek

On 24/12/14 20:11, Alex Shulgin wrote:

Alex Shulgin a...@commandprompt.com writes:


- the StandbyModeRequested and StandbyMode should be lowercased like
the rest of the GUCs


Yes, except that standby_mode is linked to StandbyModeRequested, not the
other one.  I can try to see if there's a sane way out of this.


As I see it, renaming these will only add noise to this patch, and there
is also ArchiveRecoveryRequested / InArchiveRecovery.  This is going to
be tricky and I'm not sure it's really worth the effort.



I don't buy that, to be honest; most (if not all) places that would be 
affected are already in the diff as part of the context around other renames 
anyway, and it just does not seem right to rename some variables and not 
the others.


I still think there should be some thought given to merging 
hot_standby and standby_mode, but I think we'd need the opinion of more 
people (especially some committers) on this one.



- The whole CopyErrorData and memory context logic is not needed in
check_recovery_target_time() as the FlushErrorState() is not called
there


You must be right.  I recall having some trouble with strings being
free'd prematurely, but if it's ERROR, then there should be no need for
that.  I'll check again.


Hrm, after going through this again I'm pretty sure that was correct:
the only way to obtain the current error message is to use
CopyErrorData(), but that requires you to switch back to a non-error
memory context first (there is an Assert about that).


Right, my bad.



The FlushErrorState() call is not there because it's handled by the hook
caller (or it can exit via ereport() with elevel==ERROR).



Yes, I've seen that; shouldn't you FreeErrorData(edata) then, though? I 
guess it does not matter too much considering that you are going to 
throw an error and die eventually anyway once you are in that code path, 
maybe just for clarity...
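
For the archives, the shape of the pattern being discussed, as the body of a
check hook (a sketch only; the actual patch differs in details and, as noted
above, leaves FlushErrorState() to the hook caller):

	MemoryContext oldcontext = CurrentMemoryContext;
	ErrorData  *edata;

	PG_TRY();
	{
		/* parse the value; may ereport(ERROR) on bad input */
		(void) DirectFunctionCall3(timestamptz_in,
								   CStringGetDatum(*newval),
								   ObjectIdGetDatum(InvalidOid),
								   Int32GetDatum(-1));
	}
	PG_CATCH();
	{
		/* must leave the error context before calling CopyErrorData() */
		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();

		GUC_check_errdetail("%s", edata->message);
		FreeErrorData(edata);	/* not strictly necessary, but tidy */
		return false;			/* FlushErrorState() left to the caller here */
	}
	PG_END_TRY();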


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_min_apply_delay with a negative value

2015-01-05 Thread Petr Jelinek

On 05/01/15 20:44, Robert Haas wrote:

On Sat, Jan 3, 2015 at 12:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Of course, if recovery_min_apply_delay were a proper GUC, we'd just
configure it with a minimum value of zero and be done :-(


Amen.  We should *really* convert all of the recovery.conf parameters
to be GUCs.



Well, there is an ongoing effort on that, and I think the patch is very 
close to the state where a committer should take a look, IMHO. I have only a 
couple of gripes with it now, and one of them needs the opinions of others 
anyway.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2015-01-05 Thread Petr Jelinek

On 05/01/15 07:28, Fujii Masao wrote:

On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao masao.fu...@gmail.com wrote:

On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

Pushed with some extra cosmetic tweaks.


I got the following assertion failure when I executed pg_xact_commit_timestamp()
in the standby server.

=# select pg_xact_commit_timestamp('1000'::xid);
TRAP: FailedAssertion(!(((oldestCommitTs) != ((TransactionId) 0)) ==
((newestCommitTs) != ((TransactionId) 0))), File: commit_ts.c,
Line: 315)
server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
The connection to the server was lost. Attempting reset: 2014-12-04
12:01:08 JST sby1 LOG:  server process (PID 15545) was terminated by
signal 6: Aborted
2014-12-04 12:01:08 JST sby1 DETAIL:  Failed process was running:
select pg_xact_commit_timestamp('1000'::xid);

The way to reproduce this problem is

#1. set up and start the master and standby servers with
track_commit_timestamp disabled
#2. enable track_commit_timestamp in the master and restart the master
#3. run some write transactions
#4. enable track_commit_timestamp in the standby and restart the standby
#5. execute select pg_xact_commit_timestamp('1000'::xid) in the standby

BTW, at the step #4, I got the following log messages. This might be a hint for
this problem.

LOG:  file pg_commit_ts/ doesn't exist, reading as zeroes
CONTEXT:  xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09;
inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache
45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608
relcache 16384


This problem still happens in the master.

Regards,



Attached patch fixes it, I am not sure how happy I am with the way I did 
it though.


And while at it, I noticed that the redo code for XLOG_PARAMETER_CHANGE sets
ControlFile->wal_log_hints = wal_log_hints;
shouldn't it be
ControlFile->wal_log_hints = xlrec.wal_log_hints;
instead?

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index ca074da..fcfccf8 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -557,6 +557,12 @@ StartupCommitTs(void)
 	TransactionId xid = ShmemVariableCache->nextXid;
 	int			pageno = TransactionIdToCTsPage(xid);
 
+	if (track_commit_timestamp)
+	{
+		ActivateCommitTs();
+		return;
+	}
+
 	LWLockAcquire(CommitTsControlLock, LW_EXCLUSIVE);
 
 	/*
@@ -571,14 +577,25 @@ StartupCommitTs(void)
  * This must be called ONCE during postmaster or standalone-backend startup,
  * when commit timestamp is enabled.  Must be called after recovery has
  * finished.
+ */
+void
+CompleteCommitTsInitialization(void)
+{
+	if (!track_commit_timestamp)
+		DeactivateCommitTs(true);
+}
+
+/*
+ * This must be called when track_commit_timestamp is turned on.
+ * Note that this only happens during postmaster or standalone-backend startup
+ * or during WAL replay.
  *
  * This is in charge of creating the currently active segment, if it's not
  * already there.  The reason for this is that the server might have been
  * running with this module disabled for a while and thus might have skipped
  * the normal creation point.
  */
-void
-CompleteCommitTsInitialization(void)
+void ActivateCommitTs(void)
 {
 	TransactionId xid = ShmemVariableCache->nextXid;
 	int			pageno = TransactionIdToCTsPage(xid);
@@ -591,22 +608,6 @@ CompleteCommitTsInitialization(void)
 	LWLockRelease(CommitTsControlLock);
 
 	/*
-	 * If this module is not currently enabled, make sure we don't hand back
-	 * possibly-invalid data; also remove segments of old data.
-	 */
-	if (!track_commit_timestamp)
-	{
-		LWLockAcquire(CommitTsLock, LW_EXCLUSIVE);
-		ShmemVariableCache->oldestCommitTs = InvalidTransactionId;
-		ShmemVariableCache->newestCommitTs = InvalidTransactionId;
-		LWLockRelease(CommitTsLock);
-
-		TruncateCommitTs(ReadNewTransactionId());
-
-		return;
-	}
-
-	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
 	 * need to set the oldest and newest values to the next Xid; that way, we
 	 * will not try to read data that might not have been set.
@@ -641,6 +642,35 @@ CompleteCommitTsInitialization(void)
 }
 
 /*
+ * This must be called when track_commit_timestamp is turned off.
+ * Note that this only happens during postmaster or standalone-backend startup
+ * or during WAL replay.
+ *
+ * Resets CommitTs into invalid state to make sure we don't hand back
+ * possibly-invalid data; also removes segments of old data.
+ */
+void
+DeactivateCommitTs(bool do_wal)
+{
+	TransactionId xid = ShmemVariableCache->nextXid

Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-29 Thread Petr Jelinek

On 29/12/14 11:16, Andres Freund wrote:

On 2014-12-29 12:06:07 +0200, Heikki Linnakangas wrote:

That's a little bit better, but I have to say I'm still not impressed. There
are so many implicit assumptions in the system. The first assumption is that
a 32-bit node id is sufficient.


Seriously? Are we going to build facilities for replication systems with
that many nodes? It seems absolutely unrealistic that a) somebody does
this b) that we'll blindly meet the demands of such a super hypothetical
scenario.



+1. Honestly, if somebody ever has a setup with more nodes than what 
fits into 32 bits, they will run into bigger problems than the nodeid being 
too small.



Then there's the assumption that the node id should be sticky,
i.e. it's set for the whole session. Again, I'm sure that's useful for
many systems, but I could just as easily imagine that you'd want it to
reset after every commit.


It's trivial to add that/reset it manually. So what?


Yes, you can reset it in the extension after commit, or you can actually 
override both the commit timestamp and the nodeid after commit if you so wish.
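
Just to illustrate, a minimal sketch of how an extension could do that
reset itself, using the ordinary transaction callback machinery - this is
not code from any posted patch, and it assumes CommitTsSetDefaultNodeId()
takes the node id to set and that 0 means "no node id":

#include "postgres.h"
#include "access/commit_ts.h"
#include "access/xact.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* Forget the default node id as soon as the local transaction ends. */
static void
reset_nodeid_callback(XactEvent event, void *arg)
{
	if (event == XACT_EVENT_COMMIT || event == XACT_EVENT_ABORT)
		CommitTsSetDefaultNodeId(0);	/* assumption: 0 = "no node id" */
}

void
_PG_init(void)
{
	RegisterXactCallback(reset_nodeid_callback, NULL);
}

Whether one also wants the reset on abort is of course up to the
replication solution.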





To be honest, I think this patch should be reverted. Instead, we should
design a system where extensions can define their own SLRUs to store
additional per-transaction information. That way, each extension can have as
much space per transaction as needed, and support functions that make most
sense with the information. Commit timestamp tracking would be one such
extension, and for this node ID stuff, you could have another one (or
include it in the replication extension).


If somebody wants that they should develop it. But given that we, based
on previous discussions, don't want to run user defined code in the
relevant phase during transaction commit *and* replay I don't think it'd
be all that easy to do it fast and flexible.



Right, I would love to have custom SLRUs, but I don't see it happening 
given those two restrictions; otherwise I would have written the CommitTs 
patch that way in the first place...



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit timestamp abbreviations

2014-12-24 Thread Petr Jelinek

On 24/12/14 15:15, Bruce Momjian wrote:

On Tue, Dec 23, 2014 at 06:00:21PM -0300, Alvaro Herrera wrote:

Bruce Momjian wrote:

I noticed this when looking at the allocated shared memory structures in
head:

shared memory alignment 64-byte of CommitTs Ctl:  0
shared memory alignment 64-byte of CommitTs shared:  0

I thought we got rid of the idea that 'Ts' means timestamp.  Was this
part forgotten?


Do you have a specific reference?  That's not the concern I remember,
and I sure don't want to re-read that whole thread again.


I remember the issue of using _ts and 'ts' inconsistently, and I thought
we were going to spell out timestamp in more places, but maybe I am
remembering incorrectly.



The change was from committs to commit_ts or CommitTs, depending on the place.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-23 Thread Petr Jelinek

On 12/12/14 16:34, Alex Shulgin wrote:

Alex Shulgin a...@commandprompt.com writes:


Alex Shulgin a...@commandprompt.com writes:


Here's an attempt to revive this patch.


Here's the patch rebased against current HEAD, that is including the
recently committed action_at_recovery_target option.

The default for the new GUC is 'pause', as in HEAD, and
pause_at_recovery_target is removed completely in favor of it.

I've also taken the liberty to remove that part that errors out when
finding $PGDATA/recovery.conf.  Now get your rotten tomatoes ready. ;-)


This was rather short-sighted, so I've restored that part.

Also, rebased on current HEAD, following the rename of
action_at_recovery_target to recovery_target_action.



Hi,

I had a quick look; the patch does not apply cleanly anymore, but it's 
just the release notes, so nothing too bad.


I did not do any testing yet, but I have a few comments about the code:

- the patch mixes the functionality change with the lowercasing of the 
config variables; I wonder if those could be separated into 2 separate 
diffs - it would make review somewhat easier, but I can live with it as 
it is if it's too much work to separate into 2 patches


- the StandbyModeRequested and StandbyMode should be lowercased like the 
rest of the GUCs


- I am wondering if StandbyMode and hot_standby should be merged into 
one GUC if we are breaking backwards compatibility anyway


- Could you explain the reasoning behind:
+   if ((*newval)[0] == 0)
+   xid = InvalidTransactionId;
+   else

in check_recovery_target_xid() (and the same check in 
check_recovery_target_timeline()) - isn't the strtoul that is called 
later enough? (See the sketch at the end of this mail for how I read the 
current shape.)


- The whole CopyErrorData and memory context logic is not needed in 
check_recovery_target_time() as the FlushErrorState() is not called there


- The new code in StartupXLOG() like
+   if (recovery_target_xid_string != NULL)
+   InitRecoveryTarget(RECOVERY_TARGET_XID);
+
+   if (recovery_target_time_string != NULL)
+   InitRecoveryTarget(RECOVERY_TARGET_TIME);
+
+   if (recovery_target_name != NULL)
+   InitRecoveryTarget(RECOVERY_TARGET_NAME);
+
+   if (recovery_target_string != NULL)
+   InitRecoveryTarget(RECOVERY_TARGET_IMMEDIATE);

would probably be better in a separate function that you then call from 
StartupXLOG(), as StartupXLOG() is crazy long already, so I think it's 
preferable not to make it worse.


- I wonder if we should rename trigger_file to standby_trigger_file
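
Regarding the check_recovery_target_xid() question above, this is roughly
the shape I read it as - just my sketch of the intent based on the quoted
fragment, not the actual patch code. The point seems to be that the empty
string must be accepted as "no recovery target" before strtoul() would
otherwise reject it:

#include "postgres.h"
#include "utils/guc.h"

static bool
check_recovery_target_xid(char **newval, void **extra, GucSource source)
{
	if ((*newval)[0] == '\0')
		return true;			/* unset - nothing to parse */
	else
	{
		char	   *endp;

		errno = 0;
		(void) strtoul(*newval, &endp, 0);
		if (endp == *newval || *endp != '\0' || errno == ERANGE)
			return false;		/* not a valid transaction id */
		return true;
	}
}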


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2014-12-22 Thread Petr Jelinek

On 21/12/14 18:38, Tomas Vondra wrote:

Hi,

On 18.12.2014 13:14, Petr Jelinek wrote:

Hi,

v2 version of this patch is attached.


I did a review of this v2 patch today. I plan to do a bit more testing,
but these are my comments/questions so far:


Thanks for looking at it!



(0) There's a TABLESAMPLE page at the wiki, not updated since 2012:

 https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation

 We should either update it or mark it as obsolete I guess. Also,
 I'd like to know what's the status regarding the TODO items
 mentioned there. Are those still valid with this patch?


I will have to look in more detail over the holidays and update it, but 
the general info about table sampling there applies and will apply to 
any patch trying to implement it.


A quick look at the mail thread suggests that at least the concerns 
mentioned in the mailing list should not apply to this implementation.
And looking at the patch, the problem with BERNOULLI sampling should not 
exist either, as I use a completely different implementation for that.


I do also have some issues with joins which I plan to look at, but it's 
a different one (my optimizer code overestimates the number of rows).



(1) The patch adds a new catalog, but does not bump CATVERSION.



I thought this was always done by the committer?


(2) The catalog naming (pg_tablesamplemethod) seems a bit awkward,
 as it squishes everything into a single chunk. That's inconsistent
 with naming of the other catalogs. I think pg_table_sample_method
 would be better.


Fair point, but perhaps pg_tablesample_method then, as tablesample is 
used as a single word everywhere, including in the standard.




(3) There are a few more strange naming decisions, but that's mostly
 because of the SQL standard requires that naming. I mean SYSTEM and
 BERNOULLI method names, and also the fact that the probability is
 specified as 0-100 value, which is inconsistent with other places
 (e.g. percentile_cont uses the usual 0-1 probability notion). But
 I don't think this can be fixed, that's what the standard says.


Yeah, I don't exactly love that either but what standard says, standard 
says.




(4) I noticed there's an interesting extension in SQL Server, which
 allows specifying PERCENT or ROWS, so you can say

   SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT);

 or

   SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS);

 That seems handy, and it'd make migration from SQL Server easier.
 What do you think?


Well, doing it exactly this way somewhat kills the extensibility which 
was one of the main goals for me - I allow any kind of parameters for 
sampling, and the handling of those depends solely on the sampling 
method. This means that in my approach, if you wanted to change what you 
are limiting, you'd have to write a new sampling method, and the query 
would then look like SELECT * FROM table TABLESAMPLE SYSTEM_ROWLIMIT (2500); 
or some such (depending on how you name the sampling method). Or SELECT 
* FROM table TABLESAMPLE SYSTEM (2500, 'ROWS'); if we chose to extend 
the SYSTEM sampling method, that would also be doable.


The reason for this is that I don't want to limit too much what 
parameters can be sent to the sampling method (I even envision a 
SYSTEM_TIMED sampling method that would take its limit as a time 
interval, for example).
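
To make the extensibility point concrete, a row-limiting method would
simply interpret its own arguments in its init function. A very rough
sketch - every name here is hypothetical, only the general shape (an init
function receiving the TABLESAMPLE arguments) follows the patch:

#include "postgres.h"
#include "fmgr.h"

/* hypothetical per-scan state kept by the sampling method */
typedef struct RowLimitSamplerState
{
	int64		target_rows;	/* how many rows the user asked for */
	int64		returned_rows;	/* how many rows we have produced so far */
} RowLimitSamplerState;

Datum
tsm_system_rowlimit_init(PG_FUNCTION_ARGS)
{
	RowLimitSamplerState *sampler = (RowLimitSamplerState *) PG_GETARG_POINTER(0);
	int64		rowlimit = PG_GETARG_INT64(1);

	if (rowlimit <= 0)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("row limit must be positive")));

	sampler->target_rows = rowlimit;
	sampler->returned_rows = 0;

	PG_RETURN_VOID();
}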




(5) I envision a lot of confusion because of the REPEATABLE clause.
 With READ COMMITTED, it's not really repeatable because of changes
 done by the other users (and maybe things like autovacuum). Shall
 we mention this in the documentation?


Yes, the docs need improvement and this should be mentioned; the code 
comments too - I found a few places which I forgot to update when 
changing the code and which now have misleading comments.




(6) This seems slightly wrong, because of long/uint32 mismatch:

   long seed = PG_GETARG_UINT32(1);

 I think uint32 would be more appropriate, no?


Probably, although I need long later in the algorithm anyway.



(7) NITPICKING: I think a 'sample_rate' would be a better name here:

   double percent = sampler-percent;


Ok.



(8) NITPICKING: InitSamplingMethod contains a command with ';;'

   fcinfo.arg[i] = (Datum) 0;;


Yeah :)



(9) The current regression tests only use the REPEATABLE cases. I
 understand queries without this clause are RANDOM, but maybe we
 could do something like this:

SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM (
  SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50)
) foo;

  Granted, there's still a small probability of false positive, but
  maybe that's sufficiently small? Or is the amount of code this
  tests negligible?


Well, depending on the fillfactor and the limit it could be made quite 
reliable, I think. I also want to add a test with a VIEW (I think v2 has 
a bug there) and perhaps some JOINs.




(10) In the initial patch you mentioned it's possible to write custom

Re: [HACKERS] TABLESAMPLE patch

2014-12-22 Thread Petr Jelinek

On 22/12/14 20:14, Jaime Casanova wrote:

On Thu, Dec 18, 2014 at 7:14 AM, Petr Jelinek p...@2ndquadrant.com wrote:

Hi,

v2 version of this patch is attached.



a few more tests revealed that passing null as the sample size
argument works, and it shouldn't.


Fixed.


in repeatable it gives an error if I use null as an argument, but it gives
a syntax error, and it should be a data exception (data exception --
invalid repeat argument in a sample clause) according to the standard


Also fixed.



also you need to add CHECK_FOR_INTERRUPTS somewhere, i tried with a
big table and had to wait a long time for it to finish


Ah yeah, I can't rely on the CHECK_FOR_INTERRUPTS in ExecScan because it 
might take a while to fetch a row if the percentage is very small and the 
table is big... Fixed.
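
Schematically, the fix amounts to checking for interrupts inside the
sampling scan's own fetch loop, something like this (illustrative only -
the names are made up and this is not the exact patch code):

#include "postgres.h"
#include "access/htup.h"
#include "miscadmin.h"

typedef struct SampleScanState SampleScanState;	/* hypothetical, opaque here */

extern HeapTuple sampling_method_nexttuple(SampleScanState *scanstate);	/* hypothetical */
extern bool sampling_method_done(SampleScanState *scanstate);	/* hypothetical */

/*
 * With a very small percentage most iterations produce no tuple, so the
 * loop itself must check for interrupts instead of relying on ExecScan()
 * doing it once per returned row.
 */
static HeapTuple
samplescan_fetch_next(SampleScanState *scanstate)
{
	for (;;)
	{
		HeapTuple	tuple;

		CHECK_FOR_INTERRUPTS();

		tuple = sampling_method_nexttuple(scanstate);
		if (tuple != NULL || sampling_method_done(scanstate))
			return tuple;
	}
}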



Attached is v3 which, besides the fixes mentioned above, also includes 
the changes discussed with Tomas (except the CREATE/DROP TABLESAMPLE 
METHOD), a fix for the crash with FETCH FIRST, and is rebased against 
current master.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..250ae29 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
 
 <phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
 
-[ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+[ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
 [ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
 <replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
 [ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,38 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
  </varlistentry>
 
  <varlistentry>
+  <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+  <listitem>
+   <para>
+    Table sample clause after
+    <replaceable class="parameter">table_name</replaceable> indicates that
+    a <replaceable class="parameter">sampling_method</replaceable> should
+    be used to retrieve a subset of rows in the table.
+    The <replaceable class="parameter">sampling_method</replaceable> can be
+    one of:
+    <itemizedlist>
+     <listitem>
+      <para><literal>SYSTEM</literal></para>
+     </listitem>
+     <listitem>
+      <para><literal>BERNOULLI</literal></para>
+     </listitem>
+    </itemizedlist>
+    Both of those sampling methods currently accept only a single argument
+    which is the percent (floating point from 0 to 100) of the rows to
+    be returned.
+    The <literal>SYSTEM</literal> sampling method does block level
+    sampling with each block having the same chance of being selected and
+    returns all rows from each selected block.
+    The <literal>BERNOULLI</literal> method scans the whole table and returns
+    individual rows with equal probability.
+    The optional numeric parameter <literal>REPEATABLE</literal> is used
+    as a random seed for sampling.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><replaceable class="parameter">alias</replaceable></term>
   <listitem>
    <para>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..595737c 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	= brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS	= brin common gin gist hash heap index nbtree rmgrdesc spgist \
+			  transam tsm
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/tsm/Makefile b/src/backend/access/tsm/Makefile
new file mode 100644
index 000..73bbbd7
--- /dev/null
+++ b/src/backend/access/tsm/Makefile
@@ -0,0 +1,17

Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-19 Thread Petr Jelinek

On 10/12/14 16:03, Petr Jelinek wrote:

On 10/12/14 04:26, Michael Paquier wrote:

On Thu, Dec 4, 2014 at 9:26 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Yeah, it was raised. I don't think it was really addressed. There was
lengthy discussion on whether to include LSN, node id, and/or some other
information, but there was no discussion on:

* What is a node ID?
* How is it used?
* Who assigns it?

etc.

None of those things are explained in the documentation nor code
comments.





Could it be possible to get a refresh on that before it bitrots too
much or we'll simply forget about it? In the current state, there is
no documentation able to explain what is the point of the nodeid
stuff, how to use it and what use cases it is aimed at. The API in
committs.c should as well be part of it. If that's not done, I think
that it would be fair to remove this portion and simply WAL-log the
commit timestamp its GUC is activated.


I have this on my list so it's not being forgotten and I don't see it
bitrotting unless we are changing the XLog API again. I have a bit of a busy
schedule right now, though; can it wait till next week? - I will post
some code documentation patches then.



Hi,

as promised I am sending a code-comment patch that explains the use of 
the commit timestamps + nodeid C API for conflict resolution; comments 
welcome.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index ca074da..fff1fdd 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -6,6 +6,62 @@
  * This module is a pg_clog-like system that stores the commit timestamp
  * for each transaction.
  *
+ * When track_commit_timestamp is enabled, this module will keep track of the
+ * commit timestamp for each transaction. It also provides an API for
+ * optionally storing the nodeid (origin) of each transaction. The main purpose
+ * of this functionality is to help with conflict detection and resolution for
+ * replication systems.
+ *
+ * The following example shows how to use the API provided by this module to
+ * handle UPDATE conflicts coming from a replication stream:
+ * void
+ * update_tuple(Relation relation, HeapTuple remote_tuple,
+ *              TimestampTz remote_commit_ts, CommitTsNodeId remote_node_id)
+ * {
+ *     bool          exists;
+ *     HeapTupleData local_tuple;
+ *
+ *     // Find existing tuple with the same PK/unique index combination.
+ *     exists = find_local_tuple(relation, remote_tuple, &local_tuple);
+ *
+ *     // The tuple was found.
+ *     if (exists)
+ *     {
+ *         TransactionId  xmin;
+ *         TimestampTz    local_commit_ts;
+ *         CommitTsNodeId local_node_id;
+ *
+ *         xmin = HeapTupleHeaderGetXmin(local_tuple.t_data);
+ *         TransactionIdGetCommitTsData(xmin, &local_commit_ts, &local_node_id);
+ *
+ *         // The new tuple comes from a different node than the locally saved
+ *         // tuple and the remote commit timestamp is not newer than the local
+ *         // commit timestamp; this is an UPDATE/UPDATE conflict (the tuple was
+ *         // UPDATEd on different nodes at the same time).
+ *         if (remote_node_id != local_node_id && remote_commit_ts <= local_commit_ts)
+ *         {
+ *             if (remote_commit_ts < local_commit_ts)
+ *                 return; // Keep the local tuple.
+ *
+ *             // Handle the conflict in a consistent manner.
+ *         }
+ *         else
+ *         {
+ *             // The new tuple either comes from the same node as the old tuple
+ *             // and/or has a newer commit timestamp than the local tuple;
+ *             // apply the UPDATE.
+ *         }
+ *     }
+ *     else
+ *     {
+ *         // Tuple not found (possibly an UPDATE/DELETE conflict), handle it
+ *         // in a consistent manner.
+ *     }
+ * }
+ *
+ * See default_node_id and CommitTsSetDefaultNodeId for an explanation of how
+ * to set the nodeid when applying transactions.
+ *
  * XLOG interactions: this module generates an XLOG record whenever a new
  * CommitTs page is initialized to zeroes.  Also, one XLOG record is
  * generated for setting of values when the caller requests it; this allows
@@ -49,6 +105,15 @@
  */
 
 /*
+ * CommitTimestampEntry
+ *
+ * Record containing information about the transaction commit timestamp and
+ * the nodeid.
+ *
+ * The nodeid provides an I/O-efficient way for replication systems to store
+ * information about the origin of the transaction. Currently the nodeid is an
+ * opaque value whose meaning is defined by the replication system itself.
+ *
  * We need 8+4 bytes per xact.  Note that enlarging this struct might mean
  * the largest possible file name is more than 5 chars long; see
  * SlruScanDirectory.
@@ -93,6 +158,21 @@ CommitTimestampShared	*commitTsShared;
 /* GUC variable */
 bool	track_commit_timestamp;
 
+/*
+ * The default_node_id

Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-19 Thread Petr Jelinek

On 19/12/14 13:17, Michael Paquier wrote:

On Fri, Dec 19, 2014 at 6:30 PM, Petr Jelinek p...@2ndquadrant.com wrote:

On 10/12/14 16:03, Petr Jelinek wrote:


On 10/12/14 04:26, Michael Paquier wrote:


On Thu, Dec 4, 2014 at 9:26 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:


Yeah, it was raised. I don't think it was really addressed. There was
lengthy discussion on whether to include LSN, node id, and/or some other
information, but there was no discussion on:

* What is a node ID?
* How is it used?
* Who assigns it?

etc.

None of those things are explained in the documentation nor code
comments.





Could it be possible to get a refresh on that before it bitrots too
much or we'll simply forget about it? In the current state, there is
no documentation able to explain what is the point of the nodeid
stuff, how to use it and what use cases it is aimed at. The API in
committs.c should as well be part of it. If that's not done, I think
that it would be fair to remove this portion and simply WAL-log the
commit timestamp when its GUC is activated.



I have this on my list so it's not being forgotten and I don't see it
bitrotting unless we are changing the XLog API again. I have a bit of a busy
schedule right now, though; can it wait till next week? - I will post
some code documentation patches then.

as promised I am sending a code-comment patch that explains the use of the
commit timestamps + nodeid C API for conflict resolution; comments welcome.

Having some documentation with this example in doc/ would be more fruitful IMO.



I am not sure I see the point in that; the GUC and SQL interfaces are 
documented in doc/ and we usually don't put documentation for C 
interfaces there.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication Helpers WIP for discussion

2014-12-19 Thread Petr Jelinek

On 15/12/14 19:42, Robert Haas wrote:

On Mon, Dec 15, 2014 at 12:57 AM, Petr Jelinek p...@2ndquadrant.com wrote:

we've made few helper functions for making logical replication easier, I
bundled it into contrib module as this is mainly for discussion at this time
(I don't expect this to get committed any time soon, but it is good way to
iron out protocol, etc).

I created sample logical decoding plugin that uses those functions and which
can be used for passing DML changes in platform/version independent
(hopefully) format.

I will post sample apply BG worker also once I get some initial feedback
about this.

It's hard to write tests for this as the binary changes contain transaction
ids and timestamps so the data changes constantly.

This is of course based on the BDR work Andres, Craig and myself have been
doing.


I can't understand, either from what you've written here or the rather
sparse comments in the patch, what this might be good for.



What I tried to achieve here is to provide a solution to many of the 
common problems faced by logical replication solutions. I believe the 
first step in designing logical replication (now that we have logical 
decoding) is making the output plugin and an efficient protocol, so I 
started with that.


The code itself provides two main parts:
First is the lrep_utils common utility functions that solve things like 
transporting DML statements and, more importantly, the changed data in an 
efficient manner, trying not to do any conversion when it's not needed 
(when architecture/version matches) but falling back to the binary/textual 
IO representation of individual types so that cross-platform/version 
replication works too. I think those should eventually end up in core 
(i.e. not in contrib) as they are helper functions likely to be shared by 
multiple extensions, but for now I keep them with the rest of the 
contrib module as I feel better experimenting inside that module.
There are also read functions that show what the other side could look 
like, but they are currently unused as the example apply worker is not 
part of the submission yet.


The second part is an extensible output plugin which serves both as an 
example of the intended use of those common utility functions and as an 
actual working solution that can be used as a base for several 
replication solutions.
It provides hooks for the replication solutions built on top of it that 
can be used for deciding whether to replicate a specific action on a 
specific object, and also for injecting additional information into both 
the BEGIN and COMMIT messages - this can be useful, for example, when you 
are forwarding changes from another node and you wish to pass the 
information about the original node to the target one.
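
Purely as an illustration of the kind of hooks I mean - these exact
typedefs are hypothetical, not the actual lrep_utils API, just the shape
of it:

#include "postgres.h"
#include "lib/stringinfo.h"

/* decide whether a change of the given kind on this relation gets replicated */
typedef bool (*LRepChangeFilterHook) (Oid relid, char change_kind, void *hook_arg);

/* append extra, solution-specific payload to the BEGIN or COMMIT message */
typedef void (*LRepTxnMessageHook) (StringInfo out, TransactionId xid, void *hook_arg);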


What I hope to get from this is agreement on the general approach and 
protocol so that we can have a common base which will both make it easier 
to create external logical replication solutions and also eventually 
lead to full logical replication inside core PostgreSQL.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2014-12-18 Thread Petr Jelinek

Hi,

v2 version of this patch is attached.


On 16/12/14 09:31, Petr Jelinek wrote:

On 16/12/14 08:43, Jaime Casanova wrote:


Sadly when the jsonb functions patch was committed a few oids were
used, so you should update the ones you are using, at least to make
the patch easier for testing.


Will do.


Done.



The test added for this failed, attached is the diff. I didn't look
up why it failed



It isn't immediately obvious to me why, will look into it.


Fixed.




Finally, I created a view with a tablesample clause. I used the view
and the tablesample worked, then dumped and restored and the
tablesample clause went away... actually pg_get_viewdef() didn't see
it at all.



Yeah, as I mentioned in the submission the ruleutils support is not
there yet, so that's expected.



Also fixed.

I also added proper costing/row estimation. I consider this patch 
feature-complete now; the docs could still use improvement, though.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..250ae29 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
 
 <phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
 
-[ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+[ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
 [ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
 <replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
 [ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,38 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
  </varlistentry>
 
  <varlistentry>
+  <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+  <listitem>
+   <para>
+    Table sample clause after
+    <replaceable class="parameter">table_name</replaceable> indicates that
+    a <replaceable class="parameter">sampling_method</replaceable> should
+    be used to retrieve a subset of rows in the table.
+    The <replaceable class="parameter">sampling_method</replaceable> can be
+    one of:
+    <itemizedlist>
+     <listitem>
+      <para><literal>SYSTEM</literal></para>
+     </listitem>
+     <listitem>
+      <para><literal>BERNOULLI</literal></para>
+     </listitem>
+    </itemizedlist>
+    Both of those sampling methods currently accept only a single argument
+    which is the percent (floating point from 0 to 100) of the rows to
+    be returned.
+    The <literal>SYSTEM</literal> sampling method does block level
+    sampling with each block having the same chance of being selected and
+    returns all rows from each selected block.
+    The <literal>BERNOULLI</literal> method scans the whole table and returns
+    individual rows with equal probability.
+    The optional numeric parameter <literal>REPEATABLE</literal> is used
+    as a random seed for sampling.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><replaceable class="parameter">alias</replaceable></term>
   <listitem>
    <para>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..595737c 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	= brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS	= brin common gin gist hash heap index nbtree rmgrdesc spgist \
+			  transam tsm
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/tsm/Makefile b/src/backend/access/tsm/Makefile
new file mode 100644
index 000..73bbbd7
--- /dev/null
+++ b/src/backend/access/tsm/Makefile
@@ -0,0 +1,17 @@
+#-
+#
+# Makefile--
+#Makefile for access/tsm

Re: [HACKERS] Combining Aggregates

2014-12-17 Thread Petr Jelinek

On 17/12/14 11:02, Atri Sharma wrote:



On Wed, Dec 17, 2014 at 3:23 PM, Simon Riggs si...@2ndquadrant.com wrote:

KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.

Following WIP patch adds an extra function called a combining
function, that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.


Also, should we not have a sanity check for the user function provided?


Looking at the code, yes, there seems to be an XXX comment about that.
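
Something along those lines is presumably what the XXX is about; as a
sketch of one possible check (not code from the patch), it could verify
that the combine function takes two arguments of the transition type and
returns that same type:

#include "postgres.h"
#include "utils/lsyscache.h"

static void
check_combine_function(Oid combinefn, Oid aggtranstype)
{
	Oid			rettype;
	Oid		   *argtypes;
	int			nargs;

	rettype = get_func_signature(combinefn, &argtypes, &nargs);

	if (nargs != 2 ||
		argtypes[0] != aggtranstype ||
		argtypes[1] != aggtranstype ||
		rettype != aggtranstype)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
				 errmsg("combine function must take and return the aggregate's transition type")));

	pfree(argtypes);
}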


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2014-12-17 Thread Petr Jelinek

On 17/12/14 19:55, Alvaro Herrera wrote:

I noticed that this makes REPEATABLE a reserved keyword, which is
currently an unreserved one.  Can we avoid that?



I added it because I did not find any other way to fix the shift/reduce 
conflicts that bison complained about. I am no bison expert though.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2014-12-16 Thread Petr Jelinek

On 16/12/14 08:43, Jaime Casanova wrote:

On Wed, Dec 10, 2014 at 6:24 PM, Petr Jelinek p...@2ndquadrant.com wrote:

Hello,

Attached is a basic implementation of the TABLESAMPLE clause. It's a SQL
standard clause and a couple of people tried to submit it before, so I think
I don't need to explain at length what it does - basically it returns a
random sample of a table using a specified sampling method.



Tablesample, yay!

Sadly when the jsonb functions patch was committed a few oids were
used, so you should update the ones you are using, at least to make
the patch easier for testing.


Will do.



The test added for this failed, attached is the diff. I didn't look
up why it failed



It isn't immediately obvious to me why, will look into it.



Finally, I created a view with a tablesample clause. I used the view
and the tablesample worked, then dumped and restored and the
tablesample clause went away... actually pg_get_viewdef() didn't see
it at all.



Yeah, as I mentioned in the submission the ruleutils support is not 
there yet, so that's expected.



will look at the patch more closely tomorrow when my brain wakes up ;)



Thanks!



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2014-12-15 Thread Petr Jelinek

On 15/12/14 09:12, Michael Paquier wrote:

On Wed, Dec 10, 2014 at 6:50 PM, Noah Misch n...@leadboat.com wrote:

On Mon, Dec 08, 2014 at 02:23:39AM +0100, Petr Jelinek wrote:

On 08/12/14 00:56, Noah Misch wrote:

The commit_ts test suite gives me the attached diff on a 32-bit MinGW build
running on 64-bit Windows Server 2003.  I have not checked other Windows
configurations; the suite does pass on GNU/Linux.


Hmm, I wonder if < now() needs to be changed to <= now() in those queries
to make them work correctly on that platform. I don't have a machine with that
environment handy right now, so I would appreciate it if you could try that; in
case you don't have time for that, I will try to set up something later...


I will try that, though perhaps not until next week.


FWIW, I just tried that with MinGW-32 and I can see the error on Win7.
I also checked that changing < now() to <= now() fixed the
problem, so your assumption was right, Petr.
Regards,



Cool, thanks, I think it was the time granularity problem in Windows.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-12-15 Thread Petr Jelinek

On 10/12/14 03:33, Petr Jelinek wrote:

On 24/11/14 12:16, Heikki Linnakangas wrote:

About the rough edges:
- The AlterSequence is not the prettiest code around, as we now have to
create a new relation when the sequence AM is changed, and I don't know
how to do that nicely
- I am not sure if I did the locking, command order and dependency
handling in AlterSequence correctly


This version does AlterSequence differently and better. Didn't attach 
the gapless sequence again as that one is unchanged.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..818da15 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	= brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS	= brin common gin gist hash heap index nbtree rmgrdesc spgist \
+			  transam sequence
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c16b38e..35818c0 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -321,6 +321,7 @@ static bool need_initialization = true;
 static void initialize_reloptions(void);
 static void parse_one_reloption(relopt_value *option, char *text_str,
 	int text_len, bool validate);
+static bytea *common_am_reloptions(RegProcedure amoptions, Datum reloptions, bool validate);
 
 /*
  * initialize_reloptions
@@ -821,7 +822,8 @@ untransformRelOptions(Datum options)
  * instead.
  *
  * tupdesc is pg_class' tuple descriptor.  amoptions is the amoptions regproc
- * in the case of the tuple corresponding to an index, or InvalidOid otherwise.
+ * in the case of the tuple corresponding to an index or sequence, InvalidOid
+ * otherwise.
  */
 bytea *
 extractRelOptions(HeapTuple tuple, TupleDesc tupdesc, Oid amoptions)
@@ -854,6 +856,9 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc, Oid amoptions)
 		case RELKIND_INDEX:
 			options = index_reloptions(amoptions, datum, false);
 			break;
+		case RELKIND_SEQUENCE:
+			options = sequence_reloptions(amoptions, datum, false);
+			break;
 		case RELKIND_FOREIGN_TABLE:
 			options = NULL;
 			break;
@@ -1299,13 +1304,31 @@ heap_reloptions(char relkind, Datum reloptions, bool validate)
 
 /*
  * Parse options for indexes.
+ */
+bytea *
+index_reloptions(RegProcedure amoptions, Datum reloptions, bool validate)
+{
+	return common_am_reloptions(amoptions, reloptions, validate);
+}
+
+/*
+ * Parse options for sequences.
+ */
+bytea *
+sequence_reloptions(RegProcedure amoptions, Datum reloptions, bool validate)
+{
+	return common_am_reloptions(amoptions, reloptions, validate);
+}
+
+/*
+ * Parse options for indexes or sequences.
  *
  *	amoptions	Oid of option parser
  *	reloptions	options as text[] datum
  *	validate	error flag
  */
-bytea *
-index_reloptions(RegProcedure amoptions, Datum reloptions, bool validate)
+static bytea *
+common_am_reloptions(RegProcedure amoptions, Datum reloptions, bool validate)
 {
 	FmgrInfo	flinfo;
 	FunctionCallInfoData fcinfo;
diff --git a/src/backend/access/sequence/Makefile b/src/backend/access/sequence/Makefile
new file mode 100644
index 000..01a0dc8
--- /dev/null
+++ b/src/backend/access/sequence/Makefile
@@ -0,0 +1,17 @@
+#-
+#
+# Makefile--
+#Makefile for access/sequence
+#
+# IDENTIFICATION
+#src/backend/access/sequence/Makefile
+#
+#-
+
+subdir = src/backend/access/sequence
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = seqam.o seqlocal.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/sequence/seqam.c b/src/backend/access/sequence/seqam.c
new file mode 100644
index 000..ce57f16
--- /dev/null
+++ b/src/backend/access/sequence/seqam.c
@@ -0,0 +1,428 @@
+/*-
+ *
+ * seqam.c
+ *	  sequence access method routines
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/sequence/seqam.c
+ *
+ *
+ * Sequence access method allows the SQL Standard Sequence objects to be
+ * managed according to either the default access method or a pluggable
+ * replacement. Each sequence can only use one access method at a time,
+ * though different sequence access methods can be in use by different
+ * sequences at the same time.
+ *
+ * The SQL Standard assumes that each Sequence object is completely controlled
+ * from the current database node, preventing any form
