Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-12-13 Thread Amit Kapila
On Sat, Dec 13, 2014 at 10:48 PM, Tom Lane  wrote:
>
> Andrew Dunstan  writes:
> > On 11/20/2014 02:27 AM, Amit Kapila wrote:
> >> Now the part where I would like to receive feedback before revising the
> >> patch is on the coding style.  It seems to me from Tom's comments that
> >> he is not happy with the code, now I am not sure which part of the patch
> >> he thinks needs change.  Tom if possible, could you be slightly more
> >> specific about your concern w.r.t code?
>
> > In view of the request above for comments from Tom, I have moved this
> > back to "Needs Review".
>
> Sorry, I was not paying very close attention to this thread and missed
> the request for comments.  A few such:
>
> 1. The patch is completely naive about what might be in the symlink
> path string; eg embedded spaces in the path would break it.  On at
> least some platforms, newlines could be in the path as well.  I'm not
> sure about how to guard against this while maintaining human readability
> of the file.
>

I will look into this and see what best can be done.
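For illustration, one possible way to keep the file human-readable while
guarding against awkward characters (a rough sketch only, not part of the
patch; the function name is made up) is to put the path last on each line
and backslash-escape just the characters that would break line-based parsing:

#include <stdio.h>

/*
 * Hypothetical helper: write one "oid path" line of the tablespace/symlink
 * map file.  Backslashes and newlines in the path are backslash-escaped so a
 * reader can split on the first space and unescape the rest; embedded spaces
 * need no escaping because the path is the last field on the line.
 */
static void
write_tablespace_map_line(FILE *fp, unsigned int oid, const char *path)
{
	fprintf(fp, "%u ", oid);
	for (const char *p = path; *p; p++)
	{
		if (*p == '\\')
			fputs("\\\\", fp);
		else if (*p == '\n')
			fputs("\\n", fp);
		else
			fputc(*p, fp);
	}
	fputc('\n', fp);
}

int
main(void)
{
	write_tablespace_map_line(stdout, 16385, "/mnt/ts with space/data");
	return 0;
}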

> 2. There seems to be more going on here than what is advertised, eg
> why do we need to add an option to the BASE_BACKUP command

This is to ensure that the symlink file is generated only for the tar format;
the server is not aware of whether the backup is being taken in plain or tar
format.  We don't want to do it for the plain format because there the client
(pg_basebackup) can already update the symlinks via the -T option, and backing
up the symlink file in that case could lead to spurious symlinks after
archive recovery.  I have given the reason why we want to do this only for
the tar format in my initial mail.

> (and if
> we do need it, doesn't it need to be documented in protocol.sgml)?

I shall take care of it in the next version of the patch.

> And why is the RelationCacheInitFileRemove call relocated?
>

Because it assumes that the tablespace directory pg_tblspc is in
place, and it removes the files by reading the pg_tblspc directory
as well.  Since we now set up the symlinks in pg_tblspc after reading
the symlink file, we should remove the relcache init files only once
the symlinks have been set up in the pg_tblspc directory.

> 3. Not terribly happy with the changes made to the API of
> do_pg_start_backup, eg having to be able to parse "DIR *" in its
> arguments seems like a lot of #include creep.  xlog.h is pretty
> central so I'm not happy about plastering more #includes in it.
>

The reason for adding a new include in xlog.c is the use of the tablespaceinfo
structure, which I have now kept in basebackup.h.

I have done it this way because do_pg_start_backup already contains
functionality common to both non-exclusive and exclusive backups, and this
feature also needs work common to both: generating the symlink label
information for non-exclusive backups and writing the symlink label file for
exclusive backups using that information.  Doing it this way seems right to
me, as we already do something similar for the backup label file.

Another possibility would be to write a new function in xlogutils.c to handle
the symlink label work and then use it from xlog.c; that way we could avoid
any new include in xlog.c.  However, xlogutils.c would then need an include
to make it aware of the tablespaceinfo structure.

> 4. In the same vein, publicly declaring a struct with a name as
> generic as "tablespaceinfo" doesn't seem like a great idea, when
> its usage is far from generic.
>

This is related to the above point: we need to use this structure for both
non-exclusive and exclusive backups, and the work for exclusive backups is
done outside of basebackup.c, which is why we need to expose it.

Any better ideas for addressing points 3 and 4?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Commitfest problems

2014-12-13 Thread Mark Kirkwood

On 13/12/14 22:37, Craig Ringer wrote:

On 12/12/2014 06:02 AM, Josh Berkus wrote:


Speaking as the originator of commitfests, they were *always* intended
to be a temporary measure, a step on the way to something else like
continuous integration.


I'd really like to see the project revisit some of the underlying
assumptions that're being made in this discussion:

- Patches must be email attachments to a mailing list

- Changes must be committed by applying a diff

... and take a look at some of the options a git-based workflow might
offer, especially in combination with some of the tools out there that
help track working branches, run CI, etc.

Having grown used to push/pull workflows with CI integration I find the
PostgreSQL patch workflow very frustrating, especially for larger
patches. It's particularly annoying to see a patch series squashed into
a monster patch whenever it changes hands or gets rebased, because it's
being handed around as a great honking diff not a git working branch.

Is it time to stop using git like CVS?

(/hides)



Having dealt with other projects that use a more git-centric + CI 
approach, I can say that trying to do reviews can be just as frustrating in 
that case too:


- quirky and annoying web interfaces
- changesets "expiring" in the middle of you reviewing them
- pulls and rebases actually making it harder to see what was 
changed in new changeset versions


I think the basic issue is that reviewing is hard, and while our 
systematisation of the workflow is really primitive, and could be much 
better (that seems to be being worked on), the *tool* is not really the 
problem.


regards

Mark










Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-13 Thread Michael Paquier
On Sun, Dec 14, 2014 at 1:16 PM, Andres Freund  wrote:
> On 2014-12-14 09:56:59 +0900, Michael Paquier wrote:
>> On Sun, Dec 14, 2014 at 5:45 AM, Simon Riggs  wrote:
>> > On 13 December 2014 at 14:36, Michael Paquier  
>> > wrote:
>> >
>> >> Something to be aware of btw is that this patch introduces an
>> >> additional 8 bytes per block image in WAL as it contains additional
>> >> information to control the compression. In this case this is the
>> >> uint16 compress_len present in XLogRecordBlockImageHeader.
>> >
>> > So we add 8 bytes to all FPWs, or only for compressed FPWs?
>> In this case that was all. We could still use xl_info to put a flag
>> telling that blocks are compressed, but it feels more consistent to
>> have a way to identify if a block is compressed inside its own header.
>
> Your 'consistency' argument doesn't convince me.

Could you be more precise (perhaps my use of the word "consistent" was
incorrect here)? Isn't it the most natural approach to have the
compression information of each block in its own header? There may
be blocks that are marked as incompressible within a set, so we need
to track for each block individually whether it is compressed. Now,
instead of an additional uint16 to store the compressed length of the
block, we could take 1 bit from hole_length and 1 bit from hole_offset
to store a status flag indicating whether a block is compressed. If we do
so, the tradeoff is having to fill the block hole with zeros and compress
BLCKSZ worth of data every time, costing more CPU. But doing so we
would still use only 4 bytes of block information, making the default
case, i.e. with the compression switch off, behave like HEAD in terms of
pure record size.
This second method has also been mentioned upthread a couple of times.
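To make the bit-stealing idea concrete, here is a rough sketch (not the
actual WAL format; the struct and macro names are illustrative only) of a
4-byte block image header where the high bit of each field is reserved as a
flag, leaving 15 bits for the value itself, which is enough for the default
BLCKSZ:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: one flag bit stolen from each 15-bit field. */
typedef struct BlockImageHeaderSketch
{
	uint16_t	hole_offset;	/* 15-bit offset + "is compressed" flag */
	uint16_t	hole_length;	/* 15-bit length + second stolen flag bit */
} BlockImageHeaderSketch;

#define BLKIMG_IS_COMPRESSED	0x8000	/* stored in hole_offset */
#define BLKIMG_SPARE_FLAG		0x8000	/* stored in hole_length */
#define BLKIMG_VALUE_MASK		0x7FFF

static inline bool
blkimg_is_compressed(const BlockImageHeaderSketch *h)
{
	return (h->hole_offset & BLKIMG_IS_COMPRESSED) != 0;
}

static inline uint16_t
blkimg_hole_offset(const BlockImageHeaderSketch *h)
{
	return (uint16_t) (h->hole_offset & BLKIMG_VALUE_MASK);
}

static inline uint16_t
blkimg_hole_length(const BlockImageHeaderSketch *h)
{
	return (uint16_t) (h->hole_length & BLKIMG_VALUE_MASK);
}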
-- 
Michael




Re: [HACKERS] pgbench -f and vacuum

2014-12-13 Thread David Rowley
On 14 December 2014 at 13:50, Jim Nasby  wrote:
>
> On 12/13/14, 6:17 PM, Tatsuo Ishii wrote:
>
>> The problem with the "-f implies -n" approach is that it breaks backward
>> compatibility. There are use cases that use a custom script *and* the
>> pgbench_* tables; for example, a particular user wants to use the standard
>> pgbench tables but is not satisfied with the built-in scenario. I know
>> at least one user who does it this way.
>>
>
> If we care enough about that case to attempt the vacuum anyway then we
> need to do something about the error message; either squelch it or check
> for the existence of the tables before attempting to vacuum. Since there's
> no way to squelch in the server logfile, I think checking for the table is
> the right answer.
>
>
Maybe someone can write a patch for VACUUM IF EXISTS ... :-)

/me hides

Regards

David Rowley


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-13 Thread Andres Freund
On 2014-12-14 09:56:59 +0900, Michael Paquier wrote:
> On Sun, Dec 14, 2014 at 5:45 AM, Simon Riggs  wrote:
> > On 13 December 2014 at 14:36, Michael Paquier  
> > wrote:
> >
> >> Something to be aware of btw is that this patch introduces an
> >> additional 8 bytes per block image in WAL as it contains additional
> >> information to control the compression. In this case this is the
> >> uint16 compress_len present in XLogRecordBlockImageHeader.
> >
> > So we add 8 bytes to all FPWs, or only for compressed FPWs?
> In this case that was all. We could still use xl_info to put a flag
> telling that blocks are compressed, but it feels more consistent to
> have a way to identify if a block is compressed inside its own header.

Your 'consistency' argument doesn't convince me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pgbench -f and vacuum

2014-12-13 Thread Tatsuo Ishii
> If we care enough about that case to attempt the vacuum anyway then we
> need to do something about the error message; either squelch it or
> check for the existence of the tables before attempting to
> vacuum. Since there's no way to squelch in the server logfile, I think
> checking for the table is the right answer.

Fair enough. I will come up with a "check for the tables before vacuum"
approach.
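A minimal libpq sketch of what such a check might look like (a hypothetical
helper for illustration, not the actual pgbench code): query the catalog for
one of the standard tables and skip the pre-test vacuum when it is absent.

#include <stdbool.h>
#include <stdio.h>
#include <libpq-fe.h>

/*
 * Hypothetical helper: return true if the standard pgbench tables appear to
 * exist in the connected database, so the caller can decide whether to run
 * the usual "vacuum pgbench_*" statements at all.
 */
static bool
pgbench_tables_exist(PGconn *con)
{
	PGresult   *res;
	bool		found;

	res = PQexec(con,
				 "SELECT 1 FROM pg_catalog.pg_class c "
				 "JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace "
				 "WHERE c.relname = 'pgbench_branches' AND c.relkind = 'r'");
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "existence check failed: %s", PQerrorMessage(con));
		PQclear(res);
		return false;
	}
	found = (PQntuples(res) > 0);
	PQclear(res);
	return found;
}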

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-13 Thread Michael Paquier
On Sun, Dec 14, 2014 at 5:45 AM, Simon Riggs  wrote:
> On 13 December 2014 at 14:36, Michael Paquier  
> wrote:
>
>> Something to be aware of btw is that this patch introduces an
>> additional 8 bytes per block image in WAL as it contains additional
>> information to control the compression. In this case this is the
>> uint16 compress_len present in XLogRecordBlockImageHeader.
>
> So we add 8 bytes to all FPWs, or only for compressed FPWs?
In this case that was all. We could still use xl_info to put a flag
telling that blocks are compressed, but it feels more consistent to
have a way to identify if a block is compressed inside its own header.
-- 
Michael




Re: [HACKERS] pgbench -f and vacuum

2014-12-13 Thread Jim Nasby

On 12/13/14, 6:17 PM, Tatsuo Ishii wrote:

The problem with the "-f implies -n" approach is that it breaks backward
compatibility. There are use cases that use a custom script *and* the pgbench_*
tables; for example, a particular user wants to use the standard
pgbench tables but is not satisfied with the built-in scenario. I know
at least one user who does it this way.


If we care enough about that case to attempt the vacuum anyway then we need to 
do something about the error message; either squelch it or check for the 
existence of the tables before attempting to vacuum. Since there's no way to 
squelch in the server logfile, I think checking for the table is the right 
answer.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] split builtins.h to quote.h

2014-12-13 Thread Michael Paquier
On Sun, Dec 14, 2014 at 1:00 AM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> On 11/08/2014 12:37 AM, Michael Paquier wrote:
>>> Well, yes :) I missed that. Note that I am leaning to Robert's
>>> direction as well to do a clear separation... Now if the final
>>> consensus is different, then let's use the patch attached that puts
>>> the SQL functions to builtins.h, and the rest in quote.h.
>
>> I am unclear about what the consensus is on this, and don't have strong
>> feelings either way. Do we need a vote? It's not of earth-shattering
>> importance, but my slight inclination would be to do the minimally
>> invasive thing where there is disagreement.
>
> Well, the minimally invasive thing would be to reject the patch
> altogether.  Do we really need this?
>
> In a quick look, the patch seems to result in strictly increasing the
> number of #include's needed, which ISTM is not a positive sign for a
> refactoring, especially given the number of files it hits.  If there
> had been some #include's removed as well, I'd be happier.
Let's do so then. I marked it as rejected.
-- 
Michael




Re: [HACKERS] On partitioning

2014-12-13 Thread Amit Langote
On Sun, Dec 14, 2014 at 1:40 AM, José Luis Tallón
 wrote:
> On 12/12/2014 05:43 AM, Amit Langote wrote:
>
> Amit: mind if I add the DB2 syntax for partitioning to the wiki, too?
>
> This might as well help with deciding the final form of partitioning
> (and define the first implementation boundaries, too)
>

Please go ahead.

Thanks,
Amit




Re: [HACKERS] On partitioning

2014-12-13 Thread Amit Langote
On Sun, Dec 14, 2014 at 1:57 AM, José Luis Tallón
 wrote:
> On 12/13/2014 03:09 AM, Alvaro Herrera wrote:
>>
>> [snip]
>> Arbitrary SQL expressions (including functions) are not the thing to use
>> for partitioning -- at least that's how I understand this whole
>> discussion.  I don't think you want to do "proofs" as such -- they are
>> expensive.
>
>
> Yup. Plus, it looks like (from reading Oracle's documentation) they end up
> converting the LESS THAN clauses into range lists internally.
> Anyone that can attest to this? (or just disprove it, if I'm wrong)
>
> I just suggested using the existing RangeType infrastructure for this ( <<,
> >> and && operators, specifically, might do the trick) before reading your
> mail citing BRIN.
> ... which might as well allow some interesting runtime optimizations
> when range partitioning is used and *a huge* number of partitions get
> defined --- I'm specifically thinking about massive OLTP with very deep
> (say, 5 years' worth) archival partitioning where it would be inconvenient
> to have the tuple routing information always in memory.
> I'm specifically suggesting some ( range_value -> partitionOID) mapping
> using a BRIN index for this --- it could be auto-created just like we do for
> primary keys.
>
> Just my 2c

Since we are keen on being able to reuse existing infrastructure, I
think this RangeType and ArrayType stuff is worth thinking about,
though I am afraid we may lose a certain level of generality of
expression, which we might very well be able to afford. That is
difficult to say definitively without studying it in a little more
detail, which I haven't quite done yet. We may be able to go
somewhere with it, perhaps. And of course the original designers of the
infrastructure in question would be better able to vouch for it, I
think.

Thanks,
Amit




Re: [HACKERS] pgbench -f and vacuum

2014-12-13 Thread Tatsuo Ishii
> On 14 December 2014 at 04:39, Tom Lane  wrote:
>>
>> Tatsuo Ishii  writes:
>> > Currently pgbench -f (run custom script) executes vacuum against
>> > pgbench_* tables before starting benchmarking if -n (or --no-vacuum)
>> > is not specified. If those tables do not exist, pgbench fails. To
>> > prevent this, -n must be specified. For me this behavior seems insane
>> > because "-f" does not necessarily suppose the existence of the
>> > pgbench_* tables.  Attached patch prevents pgbench from exiting even
>> > if those tables do not exist.
>>
>> I don't particularly care for this approach.  I think if we want to
>> do something about this, we should just make -f imply -n.  Although
>> really, given the lack of complaints so far, it seems like people
>> manage to deal with this state of affairs just fine.  Do we really
>> need to do anything?
>>
>>
>>
> I also find this vacuuming of non-existent tables rather bizarre. I think
> the first time I ever used pgbench I was confronted by the pgbench* tables
> not existing, despite the fact that I was trying to run my own script.
> Looking at --help it mentioned nothing about the pgbench_* tables, so I was
> pretty confused until I opened up the online docs.
> 
> I'm not really a fan of how this is done in the proposed patch, I'd vote
> for either skipping vacuum if -f is specified, or just doing a database
> wide vacuum in that case. Though, that might surprise a few people, so
> maybe the first option is better.

The problem with the "-f implies -n" approach is that it breaks backward
compatibility. There are use cases that use a custom script *and* the pgbench_*
tables; for example, a particular user wants to use the standard
pgbench tables but is not satisfied with the built-in scenario. I know
at least one user who does it this way.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] operator does not exist: character varying[] <> character[]

2014-12-13 Thread Jim Nasby

On 12/12/14, 7:16 PM, Tom Lane wrote:

Jim Nasby  writes:

I'd say that array_eq (and probably _cmp) just needs to be taught to fall back 
to what oper() does, but this part of the commit message gives me pause:



"Change the operator search algorithms to look for appropriate btree or hash index 
opclasses, instead of assuming operators named '<' or '=' have the right semantics."


As it should.  array_cmp is the basis for a btree opclass, therefore
it must *not* use operators that are not themselves btree operators.

Quite aside from that, we need to move even further away from having
internal system operations depend on operator-name-based lookups;
see for instance the recent complaints over stuff like IS DISTINCT FROM
failing for types whose operators aren't in the search path.


Agreed, but in a way that's not what we're doing here; we're trying to utilize 
an operator the user asked us to use. Of course, array_eq assumes that we want 
equality; I think that's a problem itself. rowtypes suffer from this too, but 
presumably it's not as big a deal because we require typid's to match exactly.


It's arguable that the typcache code should be taught to look for
binary-compatible opclasses if it can't find one directly for the
specified type.  I'm not sure offhand what rules we'd need to make
to ensure such a search would yield deterministic results, though.


The risk I see is what happens when someone adds a new operator or cast and 
suddenly we have multiple paths. That should be fine for regular comparisons, 
but probably not in an index.


Another possibility is that we might be able to extend the "text_ops"
btree operator family to include an opclass entry for varchar, rather than
relying on binary compatibility to find the text opclass.  But that would
also require some careful thought to understand what the relaxed
invariants should be for the opfamily structure as a whole.  We don't want
to add more actual operators, for fear of creating ambiguous-operator
lookup failures.


Yeah, to me that sounds like heading back down the road of assuming = means = 
and the other fun we had before classes and families... but maybe I'm just 
being paranoid.

I have an alternative idea, though I'm not sure it's worth the work. Instead of 
having special array-only operators, we could instead apply regular operators to 
arrays. I believe we can do this and reuse existing operators if we store: an 
expression describing how to combine the result of the previous iteration with 
the current one (ie: for < this would be (prev AND current)); a result value 
that should stop iteration (for comparison operators that would be false); and 
what to do with arrays of different sizes. In the last case, you're either going 
to use that to provide a final result, substitute a specific value for any missing 
elements, or throw an error.
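To illustrate the iteration scheme (a generic, self-contained sketch of the
idea, not PostgreSQL internals and not a proposal for the actual catalog
representation), using "<" on integers as the element operator:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Element-level operator; "<" on ints stands in for any operator here. */
static bool
elem_lt(int a, int b)
{
	return a < b;
}

/*
 * Combine per-element results with AND (the "prev AND current" expression
 * above) and stop as soon as the stop value (false, for comparison
 * operators) is reached.  Arrays of different lengths are treated as an
 * error here, which is one of the three options mentioned above.
 */
static bool
array_apply_lt(const int *a, const int *b, size_t na, size_t nb, bool *error)
{
	bool		result = true;

	*error = false;
	if (na != nb)
	{
		*error = true;
		return false;
	}
	for (size_t i = 0; i < na; i++)
	{
		result = result && elem_lt(a[i], b[i]);	/* prev AND current */
		if (!result)
			return false;		/* stop value reached */
	}
	return result;
}

int
main(void)
{
	int			x[] = {1, 2, 3};
	int			y[] = {2, 3, 4};
	bool		error;

	printf("%d\n", (int) array_apply_lt(x, y, 3, 3, &error));	/* prints 1 */
	return 0;
}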

Pros:
With some additional information in the catalog, we could provide a lot more 
array operations, using existing operator functions.
We can use the same operator search logic we use for elements. If you can 
perform an operation on 2 elements and that operator has array support, then 
we're good to go. If we'd perform casting on the elements, we'd do the same 
casting with the array values.
We no longer need to assume things like = means equal. If text = int actually 
meant length(text) = int then as long as that operator had array support you 
could do text[] = int[].
This keeps all operator logic together, regardless of whether the input is an 
array or a base element. Not

Cons:
Could this cause problems in the planner? Selectivity estimation comes to mind.
transformAExprOp/make_op would need to do something different for arrays, 
because the function would need more information. If we can just extend OpExpr 
then maybe this isn't a big deal. (A simple grep shows 401 instances of OpExpr 
in src/backend).
I don't think we could use this same technique with rowtypes.
We'd still allow for array-specific operators; perhaps that might cause issues 
or at least confusion.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] CINE in CREATE TABLE AS ... and CREATE MATERIALIZED VIEW ...

2014-12-13 Thread Fabrízio de Royes Mello
On Saturday, December 13, 2014, Andrew Dunstan 
wrote:

>
> On 11/03/2014 07:35 AM, Fabrízio de Royes Mello wrote:
>
>> On Mon, Nov 3, 2014 at 3:12 AM, Rushabh Lathia wrote:
>> >
>> > Patch looks good, assigning to committer.
>> >
>>
>> Thanks for your review!
>>
>>
>
> Committed.
>
>
Thanks.

Fabrízio


-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Github: http://github.com/fabriziomello


Re: [HACKERS] pgbench -f and vacuum

2014-12-13 Thread David Rowley
On 14 December 2014 at 04:39, Tom Lane  wrote:
>
> Tatsuo Ishii  writes:
> > Currently pgbench -f (run custom script) executes vacuum against
> > pgbench_* tables before starting benchmarking if -n (or --no-vacuum)
> > is not specified. If those tables do not exist, pgbench fails. To
> > prevent this, -n must be specified. For me this behavior seems insane
> > because "-f" does not necessarily suppose the existence of the
> > pgbench_* tables.  Attached patch prevents pgbench from exiting even
> > if those tables do not exist.
>
> I don't particularly care for this approach.  I think if we want to
> do something about this, we should just make -f imply -n.  Although
> really, given the lack of complaints so far, it seems like people
> manage to deal with this state of affairs just fine.  Do we really
> need to do anything?
>
>
>
I also find this vacuuming of non-existent tables rather bizarre. I think
the first time I ever used pgbench I was confronted by the pgbench* tables
not existing, despite the fact that I was trying to run my own script.
Looking at --help it mentioned nothing about the pgbench_* tables, so I was
pretty confused until I opened up the online docs.

I'm not really a fan of how this is done in the proposed patch, I'd vote
for either skipping vacuum if -f is specified, or just doing a database
wide vacuum in that case. Though, that might surprise a few people, so
maybe the first option is better.

Regards

David Rowley


[HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL

2014-12-13 Thread Andreas Karlsson

Hi,

I have attached a patch with the current status of my work on reducing 
the lock level of trigger and foreign key related DDL.


This patch reduces the lock level of the following commands from ACCESS 
EXCLUSIVE to SHARE ROW EXCLUSIVE, and it does the same for the 
referenced table of a foreign key.


ADD TRIGGER
ALTER TRIGGER
ALTER TABLE ... ADD FOREIGN KEY
ALTER TABLE ... DISABLE TRIGGER
ALTER TABLE ... ENABLE TRIGGER

The patch currently does not reduce the lock level of the following 
two commands, but I think it would be really nice to fix those too, and 
here I would like some feedback and ideas.


DROP TRIGGER
ALTER TABLE ... DROP CONSTRAINT -- For foreign keys

Foreign keys and triggers are fixed at the same time because foreign 
keys are implemented with two triggers, one at each of the involved tables.


The idea behind the patch is that since we started using MVCC snapshots 
we no longer need to hold an exclusive lock on the table when adding a 
trigger. Theoretically we only need to lock out writers of the rows of 
the table (SHARE ROW EXCLUSIVE/SHARE), but as noted by Robert and Noah 
just reducing the lock level will break pg_dump since 
pg_get_triggerdef() does not use the current snapshot when reading the 
catalog.


I have fixed pg_get_triggerdef() to use the current snapshot, the same way 
commit e5550d5fec66aa74caad1f79b79826ec64898688 fixed pg_get_constraintdef(), 
and this fixes the code for the ALTER TRIGGER case (ADD TRIGGER was 
already safe). But to fix the DROP TRIGGER case we also need to 
make the dumping of the WHEN clause (which is dumped by get_rule_expr()) 
use the current snapshot.


get_rule_expr() relies heavily on the catcache, which to me does not look 
like it could easily be (and probably should not even be) made to use 
the current snapshot. Refactoring ruleutils.c to rely less on the 
catcache seems like a reasonable thing to do if we want to reduce the 
weirdness of how it ignores MVCC, but it is quite a bit of work and I 
fear it could give us performance regressions.


Do you have any ideas for how to fix get_rule_expr()? Is this patch 
worthwhile even without reducing the lock levels of the drop commands?


Andreas

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1e737a0..ece6ed5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2862,13 +2862,8 @@ AlterTableGetLockLevel(List *cmds)
 				break;
 
 				/*
-				 * These subcommands affect write operations only. XXX
-				 * Theoretically, these could be ShareRowExclusiveLock.
+				 * These subcommands affect write operations only.
 				 */
-			case AT_ColumnDefault:
-			case AT_ProcessedConstraint:		/* becomes AT_AddConstraint */
-			case AT_AddConstraintRecurse:		/* becomes AT_AddConstraint */
-			case AT_ReAddConstraint:	/* becomes AT_AddConstraint */
 			case AT_EnableTrig:
 			case AT_EnableAlwaysTrig:
 			case AT_EnableReplicaTrig:
@@ -2877,6 +2872,17 @@ AlterTableGetLockLevel(List *cmds)
 			case AT_DisableTrig:
 			case AT_DisableTrigAll:
 			case AT_DisableTrigUser:
+				cmd_lockmode = ShareRowExclusiveLock;
+				break;
+
+				/*
+				 * These subcommands affect write operations only. XXX
+				 * Theoretically, these could be ShareRowExclusiveLock.
+				 */
+			case AT_ColumnDefault:
+			case AT_ProcessedConstraint:		/* becomes AT_AddConstraint */
+			case AT_AddConstraintRecurse:		/* becomes AT_AddConstraint */
+			case AT_ReAddConstraint:	/* becomes AT_AddConstraint */
 			case AT_AlterConstraint:
 			case AT_AddIndex:	/* from ADD CONSTRAINT */
 			case AT_AddIndexConstraint:
@@ -2913,11 +2919,9 @@ AlterTableGetLockLevel(List *cmds)
 			/*
 			 * We add triggers to both tables when we add a
 			 * Foreign Key, so the lock level must be at least
-			 * as strong as CREATE TRIGGER. XXX Might be set
-			 * down to ShareRowExclusiveLock though trigger
-			 * info is accessed by pg_get_triggerdef
+			 * as strong as CREATE TRIGGER.
 			 */
-			cmd_lockmode = AccessExclusiveLock;
+			cmd_lockmode = ShareRowExclusiveLock;
 			break;
 
 		default:
@@ -6034,16 +6038,13 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
 	ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);
 
 	/*
-	 * Grab an exclusive lock on the pk table, so that someone doesn't delete
-	 * rows out from under us. (Although a lesser lock would do for that
-	 * purpose, we'll need exclusive lock anyway to add triggers to the pk
-	 * table; trying to start with a lesser lock will just create a risk of
-	 * deadlock.)
+	 * Grab a share lock on the pk table, so that someone doesn't delete
+	 * rows out from under us.
 	 */
 	if (OidIsValid(fkconstraint->old_pktable_oid))
-		pkrel = heap_open(fkconstraint->old_pktable_oid, AccessExclusiveLock);
+		pkrel = heap_open(fkconstraint->old_pktable_oid, ShareRowExclusiveLock);
 	else
-		pkrel = heap_openrv(fkconstraint->pktable, AccessExclusiveLock);
+		pkrel = heap_openrv(fkconstraint->pktable, ShareRowExclusiveLock);

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-13 Thread Simon Riggs
On 13 December 2014 at 14:36, Michael Paquier  wrote:

> Something to be aware of btw is that this patch introduces an
> additional 8 bytes per block image in WAL as it contains additional
> information to control the compression. In this case this is the
> uint16 compress_len present in XLogRecordBlockImageHeader.

So we add 8 bytes to all FPWs, or only for compressed FPWs?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Commitfest problems

2014-12-13 Thread Álvaro Hernández Tortosa


On 12/12/14 20:43, Josh Berkus wrote:

On 12/12/2014 11:35 AM, Alvaro Herrera wrote:

Uh, really?  Last I looked at the numbers from SPI treasurer reports,
they are not impressive enough to hire a full-time engineer, let alone a
senior one.

The Linux Foundation has managed to pay for Linus Torvalds somehow, so
it does sound possible.  We have a number of companies making money all
over the globe, at least.


I think Álvaro (Herrera) got to the real point when he suggested 
funding a developer. Or three, as was also suggested. But to really nail it 
down, I'd say not only for the CFM role. I think, overall, the PostgreSQL 
community really needs to seriously consider raising funds and using them to 
pay for full-time development by some senior developers.


That would also allow development to be oriented more toward what 
the community really wants, rather than what other companies or 
individuals work on (which is absolutely great, of course).



You're looking at this wrong.  We have that amount of money in the
account based on zero fundraising whatsoever, which we don't do because
we don't spend the money.  We get roughly $20,000 per year just by
putting up a "donate" link, and not even promoting it.

So, what this would take is:

1) a candidate who is currently a known major committer


I think it would be even better to sell this approach as a 
long-term strategy, not tied to any particular candidate. Sure, some 
known major committer is certainly a good selling point; but a 
well-communicated strategy for long-term, foundation-like fundraising 
to improve PostgreSQL in certain ways is the way to go.


2) clear goals for what this person would spend their time doing
+1. That may be the Core Team, based on feedback/input from the whole 
list, or something like that.


3) buy-in from the Core Team, the committers, and the general hackers
community (including buy-in to the idea of favorable publicity for
funding supporters)

+1


4) an organizing committee with the time to deal with managing
foundation funds

+1. Absolutely necessary, otherwise funding will not work


If we had those four things, the fundraising part would be easy.  I
speak as someone who used to raise $600,000 per year for a non-profit in
individual gifts alone.


I know it sounds difficult, and surely it is, but I believe the 
PostgreSQL community should be able to raise, globally, some millions 
per year to stably, and permanently, fund this community-guided 
development and have our best developers devoted 100% to PostgreSQL.



Regards,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] CINE in CREATE TABLE AS ... and CREATE MATERIALIZED VIEW ...

2014-12-13 Thread Andrew Dunstan


On 11/03/2014 07:35 AM, Fabrízio de Royes Mello wrote:
On Mon, Nov 3, 2014 at 3:12 AM, Rushabh Lathia wrote:

>
> Patch looks good, assigning to committer.
>

Thanks for your review!




Committed.

cheers

andrew




Re: [HACKERS] On partitioning

2014-12-13 Thread José Luis Tallón

On 12/13/2014 05:57 PM, José Luis Tallón wrote:

On 12/13/2014 03:09 AM, Alvaro Herrera wrote:

[snip]
Arbitrary SQL expressions (including functions) are not the thing to use
for partitioning -- at least that's how I understand this whole
discussion.  I don't think you want to do "proofs" as such -- they are
expensive.


Yup. Plus, it looks like (from reading Oracle's documentation) they 
end up converting the LESS THAN clauses into range lists internally.

Anyone that can attest to this? (or just disprove it, if I'm wrong)

I just suggested using the existing RangeType infrastructure for this 
( <<, >> and && operators, specifically, might do the trick) before 
reading your mail citing BRIN.
... which might as well allow some interesting runtime 
optimizations when range partitioning is used and *a huge* number of 
partitions get defined --- I'm specifically thinking about massive 
OLTP with very deep (say, 5 years' worth) archival partitioning where 
it would be inconvenient to have the tuple routing information always 
in memory.
I'm specifically suggesting some ( range_value -> partitionOID) 
mapping using a BRIN index for this --- it could be auto-created just 
like we do for primary keys.


Reviewing the existing documentation on this topic, I stumbled upon an 
e-mail by Simon Riggs from almost seven years ago

http://www.postgresql.org/message-id/1199296574.7260.149.ca...@ebony.site

where he suggested a way of physically partitioning tables by using 
segments, in a way that sounds quite close to what we are proposing 
here.


ISTM that the partitioning meta-data might very well be augmented a bit 
in the direction Simon pointed to, adding support for "effectively 
read-only" and/or "explicitly marked read-only" PARTITIONS (not segments 
in this case) for an additional optimization. We would need some syntax 
additions (ALTER PARTITION  SET READONLY) in this case.

This feature can be added later on, of course.


I'd like to explicitly point out the potentially performance-enhancing 
effect of fillfactor=100 (cfr. 
http://www.postgresql.org/docs/9.3/static/sql-createtable.html) and 
partitions marked "effectively read-only" (cfr. Simon's proposal) when 
coupled with a "fullscan analyze" vs. the regular sample-based analyze 
that autovacuum performs.
When a partition consists of multiple *segments*, a generalization of 
the proposed BRIN index (to cover segments in addition to partitions) 
will further speed up scans.





Just for the record, allowing some partitions to be moved to foreign 
tables (i.e. foreign servers, via postgres_fdw) will multiply the 
usefulness of this "partitioned-table-wide" BRIN index, which then 
becomes a real "global index".



Just my 2c


Thanks,

/ J.L.









Re: [HACKERS] add modulo (%) operator to pgbench

2014-12-13 Thread Andrew Dunstan


On 12/13/2014 01:19 PM, Fabien COELHO wrote:


As it doesn't have documentation, I'm inclined to say we should mark 
this as Waiting on Author or Returned with Feedback.


I'm planning to have a detailed look at Robert's patch before the end 
of the year. I could update the pgbench documentation while doing that.




Ok, I think for now that means Returned with Feedback.

cheers

andrew




Re: [HACKERS] add modulo (%) operator to pgbench

2014-12-13 Thread Fabien COELHO


As it doesn't have documentation, I'm inclined to say we should mark this as 
Waiting on Author or Returned with Feedback.


I'm planning to have a detailed look at Robert's patch before the end of 
the year. I could update the pgbench documentation while doing that.


--
Fabien.




Re: [HACKERS] On partitioning

2014-12-13 Thread David Fetter
On Fri, Dec 12, 2014 at 09:03:12AM -0500, Robert Haas wrote:

> Yeah, range and list partition definitions are very similar, but
> hash partition definitions are a different kettle of fish.  I don't
> think we really need hash partitioning for anything right away -
> it's pretty useless unless you've got, say, a way for the partitions
> to be foreign tables living on remote servers -

There's a patch enabling exactly this feature in the queue for 9.5.

https://commitfest.postgresql.org/action/patch_view?id=1386

> but we shouldn't pick a design that will make it really hard to add
> later.

Indeed not :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] On partitioning

2014-12-13 Thread Jim Nasby

On 12/12/14, 3:48 PM, Robert Haas wrote:

On Fri, Dec 12, 2014 at 4:28 PM, Jim Nasby  wrote:

Sure.  Mind you, I'm not proposing that the syntax I just mooted is
actually for the best.  What I'm saying is that we need to talk about
it.


Frankly, if we're going to require users to explicitly define each partition
then I think the most appropriate API would be a function. Users will be
writing code to create new partitions as needed, and it's generally easier
to write code that calls a function as opposed to glomming a text string
together and passing that to EXECUTE.


I have very little idea what the API you're imagining would actually
look like from this description, but it sounds like a terrible idea.
We don't want to make this infinitely general.  We need a *fast* way
to go from a value (or list of values, one per partitioning column) to
a partition OID, and the way to get there is not to call arbitrary
user code.


You were talking about the syntax for partition creation/definition; that's the 
API I was referring to.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] pg_rewind in contrib

2014-12-13 Thread David Fetter
On Fri, Dec 12, 2014 at 10:06:32AM -0500, Tom Lane wrote:
> Heikki Linnakangas  writes:
> > I'd like to include pg_rewind in contrib.
> 
> I don't object to adding the tool as such, but let's wait to see
> what happens with Peter's proposal to move contrib command-line
> tools into src/bin/.  If it should be there it'd be less code churn
> if it went into there in the first place.

+1 for putting it directly in src/bin.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-12-13 Thread Tom Lane
Andrew Dunstan  writes:
> On 11/20/2014 02:27 AM, Amit Kapila wrote:
>> Now the part where I would like to receive feedback before revising the
>> patch is on the coding style.  It seems to me from Tom's comments that
>> he is not happy with the code, now I am not sure which part of the patch
>> he thinks needs change.  Tom if possible, could you be slightly more
>> specific about your concern w.r.t code?

> In view of the request above for comments from Tom, I have moved this 
> back to "Needs Review".

Sorry, I was not paying very close attention to this thread and missed
the request for comments.  A few such:

1. The patch is completely naive about what might be in the symlink
path string; eg embedded spaces in the path would break it.  On at
least some platforms, newlines could be in the path as well.  I'm not
sure about how to guard against this while maintaining human readability
of the file.

2. There seems to be more going on here than what is advertised, eg
why do we need to add an option to the BASE_BACKUP command (and if
we do need it, doesn't it need to be documented in protocol.sgml)?
And why is the RelationCacheInitFileRemove call relocated?

3. Not terribly happy with the changes made to the API of
do_pg_start_backup, eg having to be able to parse "DIR *" in its
arguments seems like a lot of #include creep.  xlog.h is pretty
central so I'm not happy about plastering more #includes in it.

4. In the same vein, publicly declaring a struct with a name as
generic as "tablespaceinfo" doesn't seem like a great idea, when
its usage is far from generic.

regards, tom lane




Re: [HACKERS] moving Orafce from pgFoundry - pgFoundry management

2014-12-13 Thread David Fetter
On Sat, Dec 13, 2014 at 11:19:08AM +0100, Pavel Stehule wrote:
> Hi
> 
> the Orafce package on pgFoundry is obsolete - we migrated to github a few years
> ago. Please, can somebody modify the description to mention this migration? Or
> drop the project there.

Pavel,

I've removed pgFoundry references from the IRC documentation bot and
replaced them with references to github.

Marc,

Could you please remove the Orafce project from pgFoundry?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] On partitioning

2014-12-13 Thread José Luis Tallón

On 12/13/2014 03:09 AM, Alvaro Herrera wrote:

[snip]
Arbitrary SQL expressions (including functions) are not the thing to use
for partitioning -- at least that's how I understand this whole
discussion.  I don't think you want to do "proofs" as such -- they are
expensive.


Yup. Plus, it looks like (from reading Oracle's documentation) they end 
up converting the LESS THAN clauses into range lists internally.

Can anyone attest to this? (or just disprove it, if I'm wrong)

I just suggested using the existing RangeType infrastructure for this ( 
<<, >> and && operators, specifically, might do the trick) before 
reading your mail citing BRIN.
... which might as well allow some interesting runtime 
optimizations when range partitioning is used and *a huge* number of 
partitions get defined --- I'm specifically thinking about massive OLTP 
with very deep (say, 5 years' worth) archival partitioning where it 
would be inconvenient to have the tuple routing information always in 
memory.
I'm specifically suggesting some ( range_value -> partitionOID) mapping 
using a BRIN index for this --- it could be auto-created just like we do 
for primary keys.


Just my 2c


Thanks,

/ J.L.





Re: [HACKERS] On partitioning

2014-12-13 Thread José Luis Tallón

On 12/12/2014 05:43 AM, Amit Langote wrote:

[snip]
In case of what we would have called a 'LIST' partition, this could look like

... FOR VALUES (val1, val2, val3, ...)

Assuming we only support partition key to contain only one column in such a 
case.


Hmmm….

[...] PARTITION BY LIST(col1 [, col2, ...])

just like we do for indexes would do.


and CREATE PARTITION child_name OF parent_name
FOR [VALUES] (val1a,val2a), (val1b,val2b), (val1c,val2c)
[IN tblspc_name]

just like we do for multi-valued inserts.


In case of what we would have called a 'RANGE' partition, this could look like

... FOR VALUES (val1min, val2min, ...) TO (val1max, val2max, ...)

How about BETWEEN ... AND ... ?


Unless I'm missing something obvious, we already have range types for 
this, don't we?


...   PARTITION BY RANGE (col)

CREATE PARTITION child_name OF parent_name
FOR [VALUES] '[val1min,val1max)', '[val2min,val2max)', 
'[val3min,val3max)'

[IN tblspc_name]

and I guess this should simplify a fully flexible implementation (if you 
can construct a RangeType for it, you can use that for partitioning).
This would replace the ugly (IMHO) "VALUES LESS THAN" syntax with a 
more flexible one
(even though it might end up being converted into "less than" 
boundaries internally for implementation/optimization purposes)


In both cases we would need to allow for overflows / default partition 
different from the parent table.



Plus some ALTER PARTITION part_name TABLESPACE=tblspc_name


The main problem being that we are assuming named partitions here, which 
might not be that practical at all.



[snip]
I would include the noise keyword VALUES just for readability if 
anything. 


+1


FWIW, deviating from already "standard" syntax (Oracle-like --as 
implemented by PPAS for example-- or DB2-like) is quite 
counter-productive unless we have very good reasons for it... which 
doesn't mean that we have to do it exactly like they do (especially if we 
would like to go the incremental implementation route).


Amit: mind if I add the DB2 syntax for partitioning to the wiki, too?

This might as well help with deciding the final form of 
partitioning (and define the first implementation boundaries, too)



Thanks,

/ J.L.






Re: [HACKERS] duplicate #define

2014-12-13 Thread Heikki Linnakangas

On 12/13/2014 04:45 PM, Mark Dilger wrote:

In commit 2c03216d831160bedd72d45f712601b6f7d03f1c, the
following define occurs twice in src/include/access/xlogrecord.h:

#define SizeOfXLogRecordDataHeaderLong (sizeof(uint8) + sizeof(uint32))

It is no big deal, as the definitions don't contradict each other.  We
could probably live with just one, though.

Patch attached


Thanks, fixed!

- Heikki





Re: [HACKERS] add modulo (%) operator to pgbench

2014-12-13 Thread Andrew Dunstan


On 11/24/2014 07:26 AM, Heikki Linnakangas wrote:

On 09/25/2014 05:10 AM, Robert Haas wrote:
On Wed, Sep 24, 2014 at 2:34 PM, Fabien COELHO  
wrote:

Sigh.

How to transform a trivial 10 lines patch into a probably 500+ lines 
project
involving flex & bison & some non trivial data structures, and which 
may get

rejected on any ground...

Maybe I'll set that as a student project.


I think you're making a mountain out of a molehill.  I implemented
this today in about three hours.  The patch is attached.  It needs
more testing, documentation, and possibly some makefile adjustments,
but it seems to basically work.


Looks good to me. The new modulo operator needs documentation, and it 
could use a pgindent run, but other than that I think this is ready 
for commit.


It would be nice to go even further, and replace process_file, 
process_builtin, and process_commands altogether with a Bison parser. 
But this is definitely a step in the right direction.





As it doesn't have documentation, I'm inclined to say we should mark 
this as Waiting on Author or Returned with Feedback.


cheers

andrew




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-12-13 Thread Andrew Dunstan


On 11/20/2014 02:27 AM, Amit Kapila wrote:
On Wed, Nov 19, 2014 at 11:46 PM, Robert Haas wrote:

> On Tue, Nov 18, 2014 at 9:19 AM, Alvaro Herrera wrote:
> >> Right, but they provide same functionality as symlinks and now we
> >> are even planning to provide this feature for both linux and windows as
> >> both Tom and Robert seem to feel it's better that way.  Anyhow,
> >> I think naming any entity generally differs based on individual's
> >> perspective, so we can go with the name which appeals to more people.
> >> In case nobody else has any preference, I will change it to what both
> >> of us can agree upon (either 'tablespace catalog', 'tablespace_info' ...).
> >
> > Well, I have made my argument.  Since you're the submitter, feel free to
> > select what you think is the best name.
>
> For what it's worth, I, too, dislike having symlink in the name.
> Maybe "tablespace_map"?

Sounds good to me as well.

To summarize the situation of this patch, I have received below comments
on which I am planning to work:

1. Change the name of file containing tablespace path information.
2. Store tablespace name as well along with oid and path to make the
information Human readable.
3. Make the code generic (remove the #ifdef Win32 macros, change the
comments that refer to this functionality as Windows-only, and see if any
more changes are required to make it work on linux.)

Now the part where I would like to receive feedback before revising the
patch is on the coding style.  It seems to me from Tom's comments that
he is not happy with the code, now I am not sure which part of the patch
he thinks needs change.  Tom if possible, could you be slightly more
specific about your concern w.r.t code?

I have attached a rebased (on top of commit-8d7af8f) patch, just incase
some one wants to apply and check it.




In view of the request above for comments from Tom, I have moved this 
back to "Needs Review".


cheers

andrew




Re: [HACKERS] split builtins.h to quote.h

2014-12-13 Thread Tom Lane
Andrew Dunstan  writes:
> On 11/08/2014 12:37 AM, Michael Paquier wrote:
>> Well, yes :) I missed that. Note that I am leaning to Robert's
>> direction as well to do a clear separation... Now if the final
>> consensus is different, then let's use the patch attached that puts
>> the SQL functions to builtins.h, and the rest in quote.h.

> I am unclear about what the consensus is on this, and don't have strong 
> feelings either way. Do we need a vote? It's not of earth-shattering 
> importance, but my slight inclination would be to do the minimally 
> invasive thing where there is disagreement.

Well, the minimally invasive thing would be to reject the patch
altogether.  Do we really need this?

In a quick look, the patch seems to result in strictly increasing the
number of #include's needed, which ISTM is not a positive sign for a
refactoring, especially given the number of files it hits.  If there
had been some #include's removed as well, I'd be happier.

regards, tom lane




Re: [HACKERS] split builtins.h to quote.h

2014-12-13 Thread Andrew Dunstan


On 11/08/2014 12:37 AM, Michael Paquier wrote:

On Sat, Nov 8, 2014 at 5:55 AM, Alvaro Herrera  wrote:

Michael Paquier wrote:

On Fri, Nov 7, 2014 at 2:31 AM, Alvaro Herrera  wrote:

I thought the consensus was that the SQL-callable function declarations
should remain in builtins.h -- mainly so that quote.h does not need to
include fmgr.h.

Moving everything to quote.h is done in line with what Tom and Robert
suggested, builtins.h being a refuge for function declarations that
have nowhere else to go. Suggestion from here:
http://www.postgresql.org/message-id/ca+tgmozf3dkptua6ex6gxlnnd-nms-fbjcxroitwffh-+6y...@mail.gmail.com

Did you miss this one?
http://www.postgresql.org/message-id/31728.1413209...@sss.pgh.pa.us

Well, yes :) I missed that. Note that I am leaning to Robert's
direction as well to do a clear separation... Now if the final
consensus is different, then let's use the patch attached that puts
the SQL functions to builtins.h, and the rest in quote.h.



I am unclear about what the consensus is on this, and don't have strong 
feelings either way. Do we need a vote? It's not of earth-shattering 
importance, but my slight inclination would be to do the minimally 
invasive thing where there is disagreement.


cheers

andrew





Re: [HACKERS] pgbench -f and vacuum

2014-12-13 Thread Tom Lane
Tatsuo Ishii  writes:
> Currently pgbench -f (run custom script) executes vacuum against
> pgbench_* tables before starting benchmarking if -n (or --no-vacuum)
> is not specified. If those tables do not exist, pgbench fails. To
> prevent this, -n must be specified. For me this behavior seems insane
> because "-f" does not necessarily suppose the existence of the
> pgbench_* tables.  Attached patch prevents pgbench from exiting even
> if those tables do not exist.

I don't particularly care for this approach.  I think if we want to
do something about this, we should just make -f imply -n.  Although
really, given the lack of complaints so far, it seems like people
manage to deal with this state of affairs just fine.  Do we really
need to do anything?

regards, tom lane




Re: [HACKERS] 9.4rc bug in percentile_cont

2014-12-13 Thread Tom Lane
Andrew Gierth  writes:
> Just got a report on IRC of a bug in the array version of
> percentile_cont; if two of the requested percentiles were between the
> same pair of input rows, the result could be wrong or an error would
> be generated.

Oooh, good catch.

> Proposed patch (against current master) attached.

I think this patch could be improved a bit; will work on it.

regards, tom lane




Re: [HACKERS] Status of CF 2014-10 and upcoming 2014-12

2014-12-13 Thread Andrew Dunstan


On 12/13/2014 10:00 AM, Michael Paquier wrote:

Hi all,

Looking at the CF app as of today, there is the following status for
pending patches:
- Needs Review: 18
- Waiting on Author: 1
- Ready for Committer: 8

And the next coming fest that should begin on Monday has this status:
- Needs Review: 34
- Waiting on Author: 3
- Ready for Committer: 1

I would like to volunteer to be the CFM of the upcoming commit fest as
I think that I will be able to find enough room to do it correctly. On
top of that, I think that it would be good to clean up the current CF,
meaning doing a last round of looking over the patches and moving the
entries to the next CF, or changing their status according to what has
been done. I'd suspect that most of them will be simply moved, but it
would still be good to have a last look... In any case, I will be able
to do that on Monday.
Opinions?



I will look at clearing out a few of the Ready For Committer patches 
today and tomorrow.


cheers

andrew




Re: [HACKERS] [TODO] Track number of files ready to be archived in pg_stat_archiver

2014-12-13 Thread Michael Paquier
On Sat, Dec 13, 2014 at 11:53 PM, Julien Rouhaud
 wrote:
> I agree with you about the problems of the v2 patch I originally sent.
> I think this v3 is the right way of keeping track of .ready files, so
> it's ok for me. The v3 also still applies well on current head.
Simon made a good point that we can already estimate the number of
files to be archived with the information we have now, given the way
the archiver logic works. This information would still be useful for a
freshly promoted node that needs to archive a bunch of files, btw...
But there are also discussions about having a node archive only the
WAL files it produces, i.e. a master archiving only WAL files on its
current timeline, so we wouldn't really need this patch if that's done.
-- 
Michael




[HACKERS] Status of CF 2014-10 and upcoming 2014-12

2014-12-13 Thread Michael Paquier
Hi all,

Looking at the CF app as of today, there is the following status for
pending patches:
- Needs Review: 18
- Waiting on Author: 1
- Ready for Committer: 8

And the next coming fest that should begin on Monday has this status:
- Needs Review: 34
- Waiting on Author: 3
- Ready for Committer: 1

I would like to volunteer to be the CFM of the upcoming commit fest as
I think that I will be able to find enough room to do it correctly. On
top of that, I think that it would be good to clean up the current CF,
meaning doing a last round of looking over the patches and moving the
entries to the next CF, or changing their status according to what has
been done. I'd suspect that most of them will be simply moved, but it
would still be good to have a last look... In any case, I will be able
to do that on Monday.
Opinions?
-- 
Michael




Re: [HACKERS] [TODO] Track number of files ready to be archived in pg_stat_archiver

2014-12-13 Thread Julien Rouhaud

On 18/11/2014 08:36, Michael Paquier wrote:
> On Wed, Oct 22, 2014 at 12:50 AM, Brightwell, Adam 
>  wrote:
>> Though, I would think that the general desire would be to keep
>> the patch relevant ONLY to the necessary changes.  I would not
>> qualify making those types of changes as relevant, IMHO.  I do
>> think this is potential for cleanup, however, I would suspect
>> that would be best done in a separate patch.  But again, I'd
>> defer to a committer whether such changes are even 
>> necessary/acceptable.
> 
> I have been looking at this patch, and I think that it is a mistake
> to count the .ready files present in archive_status when calling 
> pg_stat_get_archiver(). If there are many files waiting to be 
> archived, this penalizes the run time of this function, and the 
> application behind relying on those results, not to mention that 
> actually the loop used to count the .ready files is a copy of what
> is in pgarch.c. Hence I think that we should simply count them in 
> pgarch_readyXlog, and then return a value back to 
> pgarch_ArchiverCopyLoop, value that could be decremented by 1 each 
> time a file is successfully archived to keep the stats as precise
> as possible, and let the stats show useful information while the 
> archiver process is within a single loop of 
> pgarch_ArchiverCopyLoop. This way, we just need to extend 
> PgStat_MsgArchiver with a new counter to track this number.
> 
> The attached patch, based on v2 sent previously, does so.
> Thoughts?
> 
> 
> 
> 

Sorry for this late answer.

I agree with you about the problems of the v2 patch I originally sent.
I think this v3 is the right way of keeping track of .ready files, so
it's ok for me. The v3 also still applies well on current head.
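
(For anyone following along, the directory scan in question boils down to
something like the self-contained sketch below; this is illustrative only,
not the pgarch.c code, and the v3 patch folds the counting into
pgarch_readyXlog() rather than doing a separate scan at stats-read time.)

#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count "*.ready" status files; returns -1 if the directory is unreadable. */
static int
count_ready_files(const char *archive_status_dir)
{
	DIR		   *dir = opendir(archive_status_dir);
	struct dirent *de;
	int			nready = 0;

	if (dir == NULL)
		return -1;

	while ((de = readdir(dir)) != NULL)
	{
		size_t		len = strlen(de->d_name);

		/* a segment waiting to be archived is flagged as "<segment>.ready" */
		if (len > 6 && strcmp(de->d_name + len - 6, ".ready") == 0)
			nready++;
	}
	closedir(dir);
	return nready;
}

int
main(void)
{
	printf("ready files: %d\n", count_ready_files("pg_xlog/archive_status"));
	return 0;
}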


Regards.
-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




[HACKERS] Custom timestamp format in logs

2014-12-13 Thread Michael Paquier
Hi all,

This week, we heard about a user wanting to use a custom timestamp
format across a set of services to improve the debuggability of the
whole set, Postgres being one of them. Unfortunately datestyle does
not apply to the logs. Would it be worth adding a new GUC to control
the timestamp format in the logs?

We could still redirect the logs to syslog and have a custom
timestamp format there, but for this particular user syslog
was a no-go. Looking at the code, the timestamp format is hardcoded in
setup_formatted_log_time and setup_formatted_start_time when calling
pg_strftime in elog.c, so it doesn't seem too complicated to do.
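
As a rough, self-contained illustration of the idea (not the actual elog.c
code): feed a GUC-controlled format string to the strftime-style call
instead of the hardcoded one. "log_timestamp_format" below is a
hypothetical GUC name, and portable strftime()/localtime() stand in for
pg_strftime() and log_timezone:

#include <stdio.h>
#include <time.h>

/* hypothetical GUC; the default mimics the current hardcoded format */
static const char *log_timestamp_format = "%Y-%m-%d %H:%M:%S %Z";

static void
format_log_timestamp(char *buf, size_t buflen)
{
	time_t		now = time(NULL);

	if (strftime(buf, buflen, log_timestamp_format, localtime(&now)) == 0)
		buf[0] = '\0';			/* empty result or buffer too small */
}

int
main(void)
{
	char		ts[128];

	format_log_timestamp(ts, sizeof(ts));
	printf("%s [12345] LOG:  example message\n", ts);
	return 0;
}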

Opinions? This thread is here for that.
Regards,
-- 
Michael




[HACKERS] duplicate #define

2014-12-13 Thread Mark Dilger
In commit 2c03216d831160bedd72d45f712601b6f7d03f1c, the
following define occurs twice in src/include/access/xlogrecord.h:

#define SizeOfXLogRecordDataHeaderLong (sizeof(uint8) + sizeof(uint32))

It is no big deal, as the definitions don't contradict each other.  We
could probably live with just one, though.

Patch attached


Mark Dilger
diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index 11ddfac..fbfad5f 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -174,7 +174,4 @@ typedef struct XLogRecordDataHeaderLong
 #define XLR_BLOCK_ID_DATA_SHORT255
 #define XLR_BLOCK_ID_DATA_LONG 254
 
-#define SizeOfXLogRecordDataHeaderLong (sizeof(uint8) + sizeof(uint32))
-
-
 #endif   /* XLOGRECORD_H */



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-13 Thread Michael Paquier
On Fri, Dec 12, 2014 at 11:50 PM, Michael Paquier
 wrote:
>
>
> On Wed, Dec 10, 2014 at 11:25 PM, Bruce Momjian  wrote:
>>
>> On Wed, Dec 10, 2014 at 07:40:46PM +0530, Rahila Syed wrote:
>> > The tests ran for around 30 mins.Manual checkpoint was run before each
>> > test.
>> >
>> > Compression   WAL generated%compressionLatency-avg   CPU usage
>> > (seconds)  TPS
>> > Latency
>> > stddev
>> >
>> >
>> > on  1531.4 MB  ~35 %  7.351 ms
>> >   user diff: 562.67s system diff: 41.40s  135.96
>> >   13.759 ms
>> >
>> >
>> > off  2373.1 MB 6.781
>> > ms
>> >   user diff: 354.20s  system diff: 39.67s147.40
>> >   14.152 ms
>> >
>> > The compression obtained is quite high close to 35 %.
>> > CPU usage at user level when compression is on is quite noticeably high
>> > as
>> > compared to that when compression is off. But gain in terms of reduction
>> > of WAL
>> > is also high.
>>
>> I am sorry but I can't understand the above results due to wrapping.
>> Are you saying compression was twice as slow?
>
>
> I got curious to see how the compression of an entire record would perform
> and how it compares for small WAL records, and here are some numbers based
> on the patch attached, this patch compresses the whole record including the
> block headers, letting only XLogRecord out of it with a flag indicating that
> the record is compressed (note that this patch contains a portion for replay
> untested, still this patch gives an idea on how much compression of the
> whole record affects user CPU in this test case). It uses a buffer of 4 *
> BLCKSZ, if the record is longer than that compression is simply given up.
> Those tests are using the hack upthread calculating user and system CPU
> using getrusage() when a backend.
>
> Here is the simple test case I used with 512MB of shared_buffers and small
> records, filling up a bunch of buffers, dirtying them and them compressing
> FPWs with a checkpoint.
> #!/bin/bash
> psql <<EOF
> SELECT pg_backend_pid();
> CREATE TABLE aa (a int);
> CREATE TABLE results (phase text, position pg_lsn);
> CREATE EXTENSION IF NOT EXISTS pg_prewarm;
> ALTER TABLE aa SET (FILLFACTOR = 50);
> INSERT INTO results VALUES ('pre-insert', pg_current_xlog_location());
> INSERT INTO aa VALUES (generate_series(1,700)); -- 484MB
> SELECT pg_size_pretty(pg_relation_size('aa'::regclass));
> SELECT pg_prewarm('aa'::regclass);
> CHECKPOINT;
> INSERT INTO results VALUES ('pre-update', pg_current_xlog_location());
> UPDATE aa SET a = 700 + a;
> CHECKPOINT;
> INSERT INTO results VALUES ('post-update', pg_current_xlog_location());
> SELECT * FROM results;
> EOF
Re-using this test case, I have produced more results by changing the
fillfactor of the table:
=# select test || ', ffactor ' || ffactor, pg_size_pretty(post_update
- pre_update), user_diff, system_diff from results;
           ?column?            | pg_size_pretty | user_diff | system_diff
-------------------------------+----------------+-----------+-------------
 FPW on + 2 bytes, ffactor 50  | 582 MB         | 42.391894 |    0.807444
 FPW on + 2 bytes, ffactor 20  | 229 MB         | 14.330304 |    0.729626
 FPW on + 2 bytes, ffactor 10  | 117 MB         |  7.335442 |    0.570996
 FPW off + 2 bytes, ffactor 50 | 746 MB         | 25.330391 |    1.248503
 FPW off + 2 bytes, ffactor 20 | 293 MB         | 10.537475 |    0.755448
 FPW off + 2 bytes, ffactor 10 | 148 MB         |  5.762775 |    0.763761
 HEAD, ffactor 50              | 746 MB         | 25.181729 |    1.133433
 HEAD, ffactor 20              | 293 MB         |  9.962242 |    0.765970
 HEAD, ffactor 10              | 148 MB         |  5.693426 |    0.775371
 Record, ffactor 50            | 582 MB         | 54.904374 |    0.678204
 Record, ffactor 20            | 229 MB         | 19.798268 |    0.807220
 Record, ffactor 10            | 116 MB         |  9.401877 |    0.668454
(12 rows)

The following tests are run:
- "Record" means the record-level compression
- "HEAD" is postgres at 1c5c70df
- "FPW off" is HEAD + patch with switch set to off
- "FPW on" is HEAD + patch with switch set to on
The gain in compression has a linear profile with the length of page
hole. There was visibly some noise in the tests: you can see that the
CPU of "FPW off" is a bit higher than HEAD.

Something to be aware of btw is that this patch introduces an
additional 8 bytes per block image in WAL as it contains additional
information to control the compression. In this case this is the
uint16 compress_len present in XLogRecordBlockImageHeader. In the case
of the measurements done, knowing that 63638 FPWs have been written,
there is a difference of a bit less than 500k in WAL between HEAD and
"FPW off" in favor of HEAD. The gain with compression is welcome,
still for the default there is a small price to track down if a block
is compressed or not. T
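
To make the quoted record-level experiment a bit more concrete, the
"compress the whole record, give up if it is too big or does not shrink"
logic amounts to roughly the sketch below; zlib's compress2() is used here
purely as a stand-in for the pglz call in the actual patch:

#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define BLCKSZ			8192
#define COMPRESS_BUFSZ	(4 * BLCKSZ)

/*
 * Try to compress a whole record into a fixed scratch buffer.  Returns the
 * compressed length, or -1 when compression is given up (record too long
 * for the buffer, compressor failure, or no space actually saved); the
 * caller then writes the record uncompressed and leaves its flag unset.
 */
static long
compress_record(const char *rec, unsigned long reclen, char *out)
{
	uLongf		outlen = COMPRESS_BUFSZ;

	if (reclen > COMPRESS_BUFSZ)
		return -1;

	if (compress2((Bytef *) out, &outlen,
				  (const Bytef *) rec, reclen, Z_DEFAULT_COMPRESSION) != Z_OK)
		return -1;

	if (outlen >= reclen)
		return -1;

	return (long) outlen;
}

int
main(void)
{
	char		record[1024];
	char		scratch[COMPRESS_BUFSZ];

	memset(record, 'x', sizeof(record));	/* highly compressible payload */
	printf("compressed length: %ld\n",
		   compress_record(record, sizeof(record), scratch));
	return 0;
}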

Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-13 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

With the high-priority questions out of the way, time to tackle the
rest:

 Tom> My single biggest complaint is about the introduction of struct
 Tom> GroupedVar.  If we stick with that, we're going to have to teach
 Tom> an extremely large number of places that know about Vars to also
 Tom> know about GroupedVars.  This will result in code bloat and
 Tom> errors of omission.  If you think the latter concern is
 Tom> hypothetical, note that you can't get 40 lines into gsp1.patch
 Tom> without finding such an omission, namely the patch fails to
 Tom> teach pg_stat_statements.c about GroupedVars.  (That also points
 Tom> up that some of the errors of omission will be in third-party
 Tom> code that we can't fix easily.)

Except that GroupedVar is created only late in planning, and so only a
small proportion of places need to know about it (and certainly
pg_stat_statements does not). It also can't end up attached to any
foreign scan or otherwise potentially third-party plan node.

 Tom> I think you should get rid of that concept and instead implement
 Tom> the behavior by having nodeAgg.c set the relevant fields of the
 Tom> representative tuple slot to NULL, so that a regular Var does
 Tom> the right thing.

We did consider that. Messing with the null flags of the slot didn't
seem like an especially clean approach. But if that's how you want
it...

 Tom> I don't really have any comments on the algorithms yet, having
 Tom> spent too much time trying to figure out underdocumented data
 Tom> structures to get to the algorithms.  However, noting the
 Tom> addition of list_intersection_int() made me wonder whether you'd
 Tom> not be better off reducing the integer lists to bitmapsets a lot
 Tom> sooner, perhaps even at parse analysis.

list_intersection_int should not be time-critical; common queries do
not call it at all (simple cube or rollup clauses always have an empty
grouping set, causing the intersection test to bail immediately), and
in pathological worst-case constructions like putting a dozen
individually grouped columns in front of a 12-d cube (thus calling it
4096 times on lists at least 12 nodes long) it doesn't account for
more than a small percentage even with optimization off and debugging
and asserts on.

The code uses the list representation almost everywhere in parsing and
planning because in some places the order of elements matters, and I
didn't want to keep swapping between a bitmap and a list
representation.

(We _do_ use bitmapsets where we're potentially going to be doing an
O(N^2) number of subset comparisons to build the graph adjacency
list for computing the minimal set of sort operations, and at
execution time.)

I didn't even consider using bitmaps for the output of parse analysis
because at that stage we want to preserve most of the original query
substructure (otherwise view deparse won't look anything like the
original query did).

 Tom> list_intersection_int() is going to be O(N^2) by nature.  Maybe
 Tom> N can't get large enough to matter in this context, but I do see
 Tom> places that seem to be concerned about performance.

My main feeling on performance is that simple cube and rollup clauses
or short lists of grouping sets should parse and plan very quickly;
more complex cases should parse and plan fast enough that execution
time on any nontrivial input will swamp the parse/plan time; and the
most complex cases that aren't outright rejected should plan in no
more than a few seconds extra. (We're limiting to 4096 grouping sets
in any query level, which is comparable to other databases and seems
quite excessively high compared to what people are actually likely to
need.)

(don't be fooled by the excessive EXPLAIN time on some queries. There
are performance issues in EXPLAIN output generation that have nothing
to do with this patch, and which I've not pinned down.)

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-13 Thread Simon Riggs
On 12 December 2014 at 21:40, Robert Haas  wrote:
> On Fri, Dec 12, 2014 at 1:51 PM, Simon Riggs  wrote:
>> What I don't understand is why we aren't working on double buffering,
>> since that cost would be paid in a background process and would be
>> evenly spread out across a checkpoint. Plus we'd be able to remove
>> FPWs altogether, which is like 100% compression.
>
> The previous patch to implement that - by somebody at vmware - was an
> epic fail.  I'm not opposed to seeing somebody try again, but it's a
> tricky problem.  When the double buffer fills up, then you've got to
> finish flushing the pages whose images are stored in the buffer to
> disk before you can overwrite it, which acts like a kind of
> mini-checkpoint.  That problem might be solvable, but let's use this
> thread to discuss this patch, not some other patch that someone might
> have chosen to write but didn't.

No, I think it's relevant.

WAL compression looks to me like a short term tweak, not the end game.

On that basis, we should go for simple and effective, user-settable
compression of FPWs and not spend too much Valuable Committer Time on
it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Compression of full-page-writes

2014-12-13 Thread Amit Kapila
On Tue, Dec 9, 2014 at 10:45 AM, Amit Kapila 
wrote:
> On Mon, Dec 8, 2014 at 3:17 PM, Simon Riggs  wrote:
> >
> > On 8 December 2014 at 11:46, Michael Paquier 
wrote:
> > > I don't really like those new names, but I'd prefer
> > > wal_compression_level if we go down that road with 'none' as default
> > > value. We may still decide in the future to support compression at the
> > > record level instead of context level, particularly if we have an API
> > > able to do palloc_return_null_at_oom, so the idea of WAL compression
> > > is not related only to FPIs IMHO.
> >
> > We may yet decide, but the pglz implementation is not effective on
> > smaller record lengths. Nor has any testing been done to show that is
> > even desirable.
> >
>
> It's even much worse for non-compressible (or less-compressible)
> WAL data.

To check the actual effect, I have run a few tests with the patch
(0001-Move-pg_lzcompress.c-to-src-common,
0002-Support-compression-for-full-page-writes-in-WAL) and the
data shows that for the worst case (9 short and 1 long, short changed)
there is a dip of ~56% in runtime where the compression is low (~20%),
and a ~35% dip in runtime for the small record size
(two short fields, no change) where compression is ~28%. For the best
case (one short and one long field, no change), the compression is
more than 2x and there is an improvement in runtime of ~4%.
Note that in the worst case I am using a random string, due to which
the compression is low, and it seems to me that this worst case is not
by far the worst, because we see some compression there as well.  I
think this might not be the best test to measure the effect of this
patch, but it still has data for various compression ratios, which
could indicate the value of this patch.  The test case used to take
the data below is attached to this mail.

Seeing this data, one way to mitigate the cases where it can cause a
performance impact is to have a table-level compression flag, which
we also discussed last year during development of WAL compression
for the Update operation.

Performance Data
-
m/c configuration -
IBM POWER-8 24 cores, 192 hardware threads
RAM = 492GB
Non-default parameters -
checkpoint_segments - 256
checkpoint_timeout - 15 min


wal_compression=off

testname | wal_generated | duration
-----------------------------------------+---------------+------------------
 two short fields, no change | 540055720 | 12.1288201808929
 two short fields, no change | 542911816 | 11.8804960250854
 two short fields, no change | 540063400 | 11.7856659889221
 two short fields, one changed   | 540055792 | 11.9835240840912
 two short fields, one changed   | 540056624 | 11.9008920192719
 two short fields, one changed   | 540059560 |  12.064150094986
 two short fields, both changed  | 581813832 | 10.290940847
 two short fields, both changed  | 579823384 | 12.4431331157684
 two short fields, both changed  | 579896448 | 12.5214929580688
 one short and one long field, no change | 320058048 | 5.04950094223022
 one short and one long field, no change | 321150040 | 5.24907302856445
 one short and one long field, no change | 320055072 | 5.07368278503418
 ten tiny fields, all changed| 620765680 | 14.2868521213531
 ten tiny fields, all changed| 620681176 | 14.2786719799042
 ten tiny fields, all changed| 620684600 |  14.21634316
 hundred tiny fields, all changed| 306317512 | 6.98173499107361
 hundred tiny fields, all changed| 308039000 | 7.03955984115601
 hundred tiny fields, all changed| 307117016 | 7.11708188056946
 hundred tiny fields, half changed   | 306483392 | 7.06978106498718
 hundred tiny fields, half changed   | 309336056 | 7.07678198814392
 hundred tiny fields, half changed   | 306317432 | 7.02817606925964
 hundred tiny fields, half nulled| 219931376 | 6.29952597618103
 hundred tiny fields, half nulled| 221001240 | 6.34559392929077
 hundred tiny fields, half nulled| 219933072 | 6.36759996414185
 9 short and 1 long, short changed   | 253761248 | 4.37235498428345
 9 short and 1 long, short changed   | 253763040 | 4.34973502159119
 9 short and 1 long, short changed   | 253760280 | 4.34902000427246
(27 rows)




wal_compression = on

testname | wal_generated | duration
-----------------------------------------+---------------+------------------
 two short fields, no change | 420569264 | 18.1419389247894
 two short fields, no change | 423401960 | 16.0569458007812
 two short fields, no change | 420568240 | 15.9060699939728
 two short fields, one changed   | 420769880 | 15.4179458618164
 two short fields, one changed  

Re: [HACKERS] Reducing lock strength of adding foreign keys

2014-12-13 Thread Andreas Karlsson

On 10/28/2014 01:33 AM, Noah Misch wrote:

ALTER TRIGGER is not bad; like you say, change pg_get_triggerdef_worker() the
way commit e5550d5 changed pg_get_constraintdef_worker().  DROP TRIGGER is
more difficult.  pg_trigger.tgqual of a dropped trigger may reference other
dropped objects, which calls for equipping get_rule_expr() to use the
transaction snapshot.  That implicates quite a bit of ruleutils.c code.


I started looking into this again and fixed 
pg_get_constraintdef_worker() as suggested.


But I have no idea how to fix get_rule_expr() since it relies on doing 
lookups in the catcache. Replacing these with uncached lookups sounds 
like it could cause quite some slowdown. Any ideas?


Indexes should suffer from the same problems since they too may 
contain expressions, but they seem to solve this by having a higher lock 
level on DROP INDEX. I do wonder about the CONCURRENTLY case, though.


By the way, unless I am mistaken there is currently no protection 
against having a concurrent ALTER FUNCTION ... RENAME mess up what is 
dumped by pg_get_triggerdef().


--
Andreas Karlsson




[HACKERS] moving Orafce from pgFoundry - pgFoundry management

2014-12-13 Thread Pavel Stehule
Hi

The Orafce package on pgFoundry is obsolete - we migrated to GitHub a few
years ago. Please, can somebody update its description to mention this
migration? Or drop the project there.

Regards

Pavel


[HACKERS] pgbench -f and vacuum

2014-12-13 Thread Tatsuo Ishii
Currently pgbench -f (run custom script) executes vacuum against
pgbench_* tables before starting benchmarking if -n (or --no-vacuum)
is not specified. If those tables do not exist, pgbench fails. To
prevent this, -n must be specified. For me this behavior seems insane
because "-f" does not necessarily suppose the existence of the
pgbench_* tables.  Attached patch prevents pgbench from exiting even
if those tables do not exist. Here is the sample session:

./pgbench -f /tmp/a.sql test2
starting vacuum...ERROR:  relation "pgbench_branches" does not exist
ERROR:  relation "pgbench_tellers" does not exist
ERROR:  relation "pgbench_history" does not exist
end.
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average: 0.000 ms
tps = 5977.286312 (including connections establishing)
tps = 15822.784810 (excluding connections establishing)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 3453a1f..0a48646 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -605,6 +605,22 @@ executeStatement(PGconn *con, const char *sql)
 	PQclear(res);
 }
 
+/* Call PQexec(), but do not exit() on failure; return -1 instead. */
+static int
+executeStatement2(PGconn *con, const char *sql)
+{
+	PGresult   *res;
+
+	res = PQexec(con, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		fprintf(stderr, "%s", PQerrorMessage(con));
+		return -1;
+	}
+	PQclear(res);
+	return 0;
+}
+
 /* set up a connection to the backend */
 static PGconn *
 doConnect(void)
@@ -3193,15 +3209,19 @@ main(int argc, char **argv)
 	if (!is_no_vacuum)
 	{
 		fprintf(stderr, "starting vacuum...");
-		executeStatement(con, "vacuum pgbench_branches");
-		executeStatement(con, "vacuum pgbench_tellers");
-		executeStatement(con, "truncate pgbench_history");
+		if (executeStatement2(con, "vacuum pgbench_branches") && ttype != 3)
+			exit(1);
+		if (executeStatement2(con, "vacuum pgbench_tellers") && ttype != 3)
+			exit(1);
+		if (executeStatement2(con, "truncate pgbench_history") && ttype != 3)
+			exit(1);
 		fprintf(stderr, "end.\n");
 
 		if (do_vacuum_accounts)
 		{
 			fprintf(stderr, "starting vacuum pgbench_accounts...");
-			executeStatement(con, "vacuum analyze pgbench_accounts");
+			if (executeStatement2(con, "vacuum analyze pgbench_accounts") && ttype != 3)
+exit(1);
 			fprintf(stderr, "end.\n");
 		}
 	}



Re: [HACKERS] Commitfest problems

2014-12-13 Thread Craig Ringer
On 12/12/2014 06:01 AM, David G Johnston wrote:
> The "patch list" concept should be formalized, and should include a
> "targeted release" concept.

IMO, the "patch list" concept should be discarded in favour of a
"working tree list".

At this point, given the challenges the CF process faces, I can't say
I'd be sad if Pg adopted GitHub's push/pull process on their
infrastructure ... but I'm aware that's not going to happen, and I
understand at least some of the reasons why.

I'd be really happy to see the CF somewhat closer to that than a bunch
of message-id copy-and-pasting and patch emailing though.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] moving from contrib to bin

2014-12-13 Thread Christoph Berg
Re: Alvaro Herrera 2014-12-12 <20141212203700.gb1...@alvh.no-ip.org>
> > Pardon me for not knowing much about Debian packages, but how would
> > that work exactly?  Is it possible to do make install-client, then
> > package the installed files, then rm -rf the install tree, then
> > repeat for install-server and install-contrib?  In the RPM world
> > this would never work because the build/install step happens in
> > toto before the packaging step.
> 
> Uh, couldn't you just run "make install-client DESTDIR=.../client" for
> client-only files, and so on?  You would end up with separate
> directories containing files for each subpackage.

Exactly. You don't need to use DESTDIR=debian/tmp, you can "make
install-client DESTDIR=$(CURDIR)/debian/postgresql-client-9.4" and
skip the intermediate location for some of the files.

> > Even without that, it seems like it'd be hard to make it entirely
> > automatic since some files would be installed in multiple cases (and
> > directories even more so).
> 
> Yeah, you would need to fix that somehow.

Directories shipped in multiple packages are not a problem for dpkg
(/usr/bin!). Files must not be installed twice, but if that happened,
I'd deem that a bug in the Makefile.

The src/contrib/doc (and config) directories already have separate
install targets. These do exactly what we need for server/contrib/doc
installation, so I'll likely move to using these to remove some
complexity from debian/*.install.

On top of that, a separate "install-client" or "make -C client
install" target would be very nice, as currently not only the client
binaries have to be picked from the server tree, but also the manpages
and locale files.

The same would be nice for end-users who just want a client install
when building from source. At the moment they need to install the full
server.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] Commitfest problems

2014-12-13 Thread Craig Ringer
On 12/12/2014 06:02 AM, Josh Berkus wrote:
> 
> Speaking as the originator of commitfests, they were *always* intended
> to be a temporary measure, a step on the way to something else like
> continuous integration.

I'd really like to see the project revisit some of the underlying
assumptions that're being made in this discussion:

- Patches must be email attachments to a mailing list

- Changes must be committed by applying a diff

... and take a look at some of the options a git-based workflow might
offer, especially in combination with some of the tools out there that
help track working branches, run CI, etc.

Having grown used to push/pull workflows with CI integration I find the
PostgreSQL patch workflow very frustrating, especially for larger
patches. It's particularly annoying to see a patch series squashed into
a monster patch whenever it changes hands or gets rebased, because it's
being handed around as a great honking diff not a git working branch.

Is it time to stop using git like CVS?

(/hides)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Making BackgroundWorkerHandle a complete type or offering a worker enumeration API?

2014-12-13 Thread Craig Ringer
Hi all

While working on BDR, I've run into a situation that I think highlights
a limitation of the dynamic bgworker API that should be fixed.
Specifically, when the postmaster crashes and restarts, shared memory
gets cleared but dynamic bgworkers don't get unregistered, and that's a
mess.

I have a few suggestions and would like your thoughts / preferences. If
I don't hear anything I'll post a patch to add a
BGW_UNREGISTER_ON_POSTMASTER_RESTART flag soon.


Details:


I think we need one of:

* An API to enumerate registered BackgroundWorker s and get their
BackgroundWorkerHandle s so a restarting extension can clean up its old
workers from before the restart by killing and unregistering them using
their handles;

* Make background workers unconditionally unregistered on postmaster
restart. This would probably have been the correct design, but I think
it's too late to change as an unconditional default now.

* Add a BGW_UNREGISTER_ON_POSTMASTER_CRASH (or whatever) flag that lets
extensions tell the postmaster to discard a dynamic bgworker when it
restarts and clears shmem. There was some earlier discussion (see below)
on that when unregistration on exit code 0 got added, but it didn't make
it in the final patch.

An API to enumerate bgworkers seems pretty easy to add, and generally
useful. The worker lib + funcname + argument + name should be sufficient
to identify a worker usefully. Worth having? When it came up before it
was bounced because we were too close to releasing 9.4 (heh) for proper
API discussion and review.

Overall I'd prefer to have a simple flag to unregister workers on
postmaster restart, but I think an enumeration API could be generally a
useful thing.

Note that in the prior thread related to this:

http://www.postgresql.org/message-id/534e250f.2060...@2ndquadrant.com

I was at that time passing pointers into postmaster-allocated memory
directly to bgworkers. That's no longer the case. The same problem
exists when you pass an offset into a shared memory array if you can't
guarantee that the array is always initialized with the same entries in
the same order when the postmaster restarts, though.

See in particular:

http://www.postgresql.org/message-id/20140416112144.gd17...@awork2.anarazel.de

--- Explanation ---

The latest BDR extension has a single static bgworker registered at
shared_preload_libraries time. This worker launches one dynamic bgworker
per database. Those dynamic bgworkers are in turn responsible for
launching workers for each connection to another node in the mesh
topology (and for various other tasks). They communicate via shared
memory blocks, where each worker has an offset into a shared memory array.

That's all fine until the postmaster crashes and restarts, zeroing
shared memory. The dynamic background workers are killed by the
postmaster, but *not* unregistered. Workers only get unregistered if
they exit with exit code 0, which isn't the case when they're killed, or
when their restart interval is BGW_NO_RESTART .

This means that when the workers start they try to access their shared
memory blocks via the offsets passed as arguments in the
BackgroundWorker struct and find that their shared memory block is
zeroed out, so they can do nothing but exit. Or worse, their shared
memory block might've since been initialized with data for some other
unrelated worker launched after the postmaster restart.

Meanwhile the static worker that manages BDR has been restarted and has
to set up shared memory and register workers. It doesn't have any way of
knowing which workers from before the postmaster restart were registered
or what their offsets into shared memory are. So it has to launch a new
batch of workers, but it has no way to stop the old workers from seeing
the new workers' shared memory blocks as their own.

To work around this, at Andres's suggestion I added a "worker generation
number" as a postmaster var that gets copied into the BDR shared memory
control segment at startup. The generation number is passed to workers
as the high 16 bits of their int32 worker argument, with the low 16 bits
being their offset into the shared memory array.

Workers check the generation number before trying to access their shared
memory blocks and proc_exit(0) if they see they're from a previous
generation or if the global generation number is zero (indicating shmem
got zeroed and hasn't been set up again yet).
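
For illustration, the packing and the startup check boil down to something
like the sketch below (the names are made up for this sketch, not the
actual BDR code, and the real worker would proc_exit(0) instead of
returning a flag):

#include <stdint.h>
#include <stdio.h>

/* Pack a 16-bit generation and a 16-bit shmem slot index into the int32
 * argument handed to a dynamic background worker. */
#define MAKE_WORKER_ARG(gen, slot) \
	((int32_t) (((uint32_t) (gen) << 16) | ((uint32_t) (slot) & 0xffff)))
#define WORKER_ARG_GENERATION(arg)	((uint16_t) (((uint32_t) (arg)) >> 16))
#define WORKER_ARG_SLOT(arg)		((uint16_t) ((uint32_t) (arg) & 0xffff))

/* A worker belongs to the current generation only if the generation in its
 * argument is nonzero and matches the one recorded in shared memory. */
static int
worker_is_current(int32_t arg, uint16_t current_generation)
{
	return current_generation != 0 &&
		WORKER_ARG_GENERATION(arg) == current_generation;
}

int
main(void)
{
	int32_t		arg = MAKE_WORKER_ARG(3, 42);

	printf("gen=%u slot=%u current=%d\n",
		   (unsigned) WORKER_ARG_GENERATION(arg),
		   (unsigned) WORKER_ARG_SLOT(arg),
		   worker_is_current(arg, 3));
	return 0;
}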

This works, but it's ugly, especially the need to jam the generation
number into the worker argument. I think this problem will be common as
adoption of dynamic bgworkers increases and should be fixed in 9.5 if
possible.

So: comments, preferences?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

