Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-04-17 Thread Amit Kapila
On Fri, Apr 3, 2015 at 9:07 PM, Abhijit Menon-Sen 
wrote:
>
> At 2015-03-31 22:43:49 +0530, a...@2ndquadrant.com wrote:
> >
> > I'm just posting this WIP patch where I've renamed fastbloat to
> > pgstatbloat as suggested by Tomas, and added in the documentation, and
> > so on.
>
> Here's the revised version that also adds the count of RECENTLY_DEAD and
> INSERT/DELETE_IN_PROGRESS tuples to the call to vac_estimate_reltuples.
>

I think you have missed addressing the point below:

>> 4) prefetch
>>
>>fbstat_heap is using visibility map to skip fully-visible pages,
>>which is nice, but if we skip too many pages it breaks readahead
>>similarly to bitmap heap scan. I believe this is another place where
>>effective_io_concurrency (i.e. prefetch) would be appropriate.
>>

> Good point.  We can even think of using the technique used by Vacuum,
> which is to skip only when we can skip at least SKIP_PAGES_THRESHOLD
> pages.
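
For illustration, a rough sketch of how visibility-map skipping could be
combined with prefetching, assuming the 9.5-era visibilitymap_test() and
PrefetchBuffer() interfaces; this is not code from the patch, and the
prefetch distance would presumably come from effective_io_concurrency:

#include "postgres.h"
#include "access/visibilitymap.h"
#include "storage/bufmgr.h"

static void
prefetch_not_all_visible(Relation rel, BlockNumber start, BlockNumber nblocks,
                         int prefetch_distance)
{
    BlockNumber blkno;
    Buffer      vmbuffer = InvalidBuffer;

    /* Hint the kernel about upcoming reads of pages we will not skip. */
    for (blkno = start;
         blkno < nblocks && blkno < start + prefetch_distance;
         blkno++)
    {
        if (!visibilitymap_test(rel, blkno, &vmbuffer))
            PrefetchBuffer(rel, MAIN_FORKNUM, blkno);
    }

    if (BufferIsValid(vmbuffer))
        ReleaseBuffer(vmbuffer);
}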


Apart from this, one minor point:
+typedef struct pgstatbloat_output_type
+{
+ uint64 table_len;
+ uint64 tuple_count;
+ uint64 misc_count;


misc_count sounds out of place; maybe non_live_count?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_upgrade in 9.5 broken for adminpack

2015-04-17 Thread Bruce Momjian
On Thu, Apr 16, 2015 at 11:29:07PM -0700, Jeff Janes wrote:
> Of course after sending that it became obvious.  The C function is not getting
> called because the SQL function is marked as being strict, yet is called with
> NULL arguments.
> 
> Trivial patch attached to unset strict flag in pg_proc.h.
> 
> But  CATALOG_VERSION_NO probably needs another bump as well.

Patch applied and catversion bumped.  Thanks.
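
For background, a conceptual sketch of the strict-function behaviour at
issue; this paraphrases what the fmgr/executor machinery does rather than
quoting it, and assumes the pre-v12 FunctionCallInfoData layout with an
argnull[] array:

    /* A function marked STRICT returns NULL without its C implementation
     * ever being called if any argument is NULL. */
    if (fcinfo->flinfo->fn_strict)
    {
        int     i;

        for (i = 0; i < fcinfo->nargs; i++)
        {
            if (fcinfo->argnull[i])
            {
                fcinfo->isnull = true;
                return (Datum) 0;   /* strict function body never runs */
            }
        }
    }

So with create_empty_extension() declared strict and pg_upgrade passing NULL
arguments, the function body was simply skipped.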

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Moving on to close the current CF 2015-02

2015-04-17 Thread Michael Paquier
On Sat, Apr 18, 2015 at 12:03 AM, Simon Riggs wrote:
> On 17 April 2015 at 08:22, Michael Paquier wrote:
>
>> To committers, here are the patches that seem on top of the list:
> I'm pretty sure Committers are the people to decide which patches can be
> committed, but thanks for the opinion.

Note the word "seem"; my previous statement was based only on the
status of the patches already in the CF app and nothing else, where
those patches have visibly already been reviewed and are waiting for
committer input.
Thanks,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-04-17 Thread Kouhei Kaigai
Hanada-san,

Thanks for your work. I have nothing more to comment on (at this moment).
I hope a committer will review / comment on the couple of features.

Best regards,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: Shigeru HANADA [mailto:shigeru.han...@gmail.com]
> Sent: Friday, April 17, 2015 1:44 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Ashutosh Bapat; Robert Haas; Tom Lane; Thom Brown;
> pgsql-hackers@postgreSQL.org
> Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] 
> Custom
> Plan API)
> 
> Kaigai-san,
> 
> On 2015/04/17 10:13, Kouhei Kaigai wrote:
> 
> > Hanada-san,
> >
> >> I merged explain patch into foreign_join patch.
> >>
> >> Now v12 is the latest patch.
> >>
> > It contains many garbage lines... Please ensure the
> > patch is correctly based on the latest master +
> > custom_join patch.
> 
> Oops, sorry.  I’ve re-created the patch as v13, based on Custom/Foreign join 
> v11
> patch and latest master.
> 
> It contains an EXPLAIN enhancement: a new subitem “Relations” shows the relations
> and joins, including order and type, processed by the foreign scan.
> 
> --
> Shigeru HANADA
> shigeru.han...@gmail.com
> 
> 
> 
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication identifiers, take 4

2015-04-17 Thread Petr Jelinek

On 17/04/15 22:36, Simon Riggs wrote:


>> I said that IMO the difference in WAL size is so small that we
>> should just use 4-byte OIDs for the replication identifiers, instead
>> of trying to make do with 2 bytes. Not because I find it too likely
>> that you'll run out of IDs (although it could happen), but more to
>> make replication IDs more like all other system objects we have.
>> Andres did some pgbench benchmarking to show that the difference in
>> WAL size is about 10%. The WAL records generated by pgbench happen
>> to have just the right sizes so that the 2-3 extra bytes bump them
>> over to the next alignment boundary. That's why there is such a big
>> difference - on average it'll be less. I think that's acceptable,
>> Andres seems to think otherwise. But if the WAL size really is so
>> precious, we could remove the two padding bytes from XLogRecord,
>> instead of dedicating them for the replication ids. That would be an
>> even better use for them.
>
> The argument to move to 4 bytes is a poor one. If it was reasonable in
> terms of code or cosmetic value then all values used in the backend
> would be 4 bytes. We wouldn't have any 2 byte values anywhere. But we
> don't do that.
>
> The change does nothing useful, since I doubt anyone will ever need >32768
> nodes in their cluster.

And if they did there would be other, much bigger problems than the
replication identifier being 16-bit (the limit is actually 65535, as
it's unsigned, btw).

Considering the importance and widespread use of replication, I think we
should really make sure the related features have small overhead.

> Increasing WAL size for any non-zero amount is needlessly wasteful for a
> change with only cosmetic value. But for a change that has significant
> value for database resilience, it is a sensible use of bytes.
>
> +1 to Andres' very reasonable suggestion. Let's commit this and go home.

+1

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication identifiers, take 4

2015-04-17 Thread Simon Riggs
On 17 April 2015 at 19:18, Heikki Linnakangas  wrote:


> To be honest, I'm not entirely sure what we're arguing over.


When arguing over something you consider small, it is customary to allow
the author precedence. We can't do things our own way all the time.

I didn't much like pg_rewind, but it doesn't hurt and you like it, so I
didn't object. We've all got better things to do.


> I said that IMO the difference in WAL size is so small that we should just
> use 4-byte OIDs for the replication identifiers, instead of trying to make
> do with 2 bytes. Not because I find it too likely that you'll run out of
> IDs (although it could happen), but more to make replication IDs more like
> all other system objects we have. Andres did some pgbench benchmarking to
> show that the difference in WAL size is about 10%. The WAL records
> generated by pgbench happen to have just the right sizes so that the 2-3
> extra bytes bump them over to the next alignment boundary. That's why there
> is such a big difference - on average it'll be less. I think that's
> acceptable, Andres seems to think otherwise. But if the WAL size really is
> so precious, we could remove the two padding bytes from XLogRecord, instead
> of dedicating them for the replication ids. That would be an even better
> use for them.


The argument to move to 4 bytes is a poor one. If it was reasonable in
terms of code or cosmetic value then all values used in the backend would
be 4 bytes. We wouldn't have any 2 byte values anywhere. But we don't do
that.

The change does nothing useful, since I doubt anyone will ever need >32768
nodes in their cluster.

Increasing WAL size for any non-zero amount is needlessly wasteful for a
change with only cosmetic value. But for a change that has significant
value for database resilience, it is a sensible use of bytes.

+1 to Andres' very reasonable suggestion. Let's commit this and go home.

-- 
Simon Riggs   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-17 Thread Dean Rasheed
On 17 April 2015 at 12:54, Stephen Frost  wrote:
> Dean, I've been working through your patches over the past couple of
> days (apologies for the lack of updates, just been busy) and hope to
> push them very shortly (ie: by the end of the weekend).
>

Cool. Thanks.

> One thing that I was hoping to discuss a bit is that I've gone ahead and
> added another set of hooks, so we can have both "permissive" and
> "restrictive" policies be provided from the hook.  It's a bit late to
> make the grammar and other changes which would be required to add a
> "restrictive" policy option to the built-in RLS, but adding the hooks is
> relatively low-impact.
>

Sounds interesting. Perhaps that discussion should be moved to a new thread.

> I'm also going to be including a test_rls_hooks module into
> src/test/modules which will test those hooks and provide an example of
> how to use them.
>

Good idea. I had been thinking that it would be good to test RLS hooks.

> As for the discussion- there was some concern raised about extensions
> being able to "override" built-in policies by using the hooks a certain
> way.  I don't entirely follow the logic behind that concern as an
> extension has the ability to read the files on disk directly from C
> code, should it be written to do so, and so not providing a hook to add
> "permissive" policies is denying useful functionality for very question
> gain, in my view at least.
>
> Thoughts?
>

Yeah, perhaps that concern is somewhat overblown and shouldn't stand
in the way of allowing a hook to add permissive policies.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-17 Thread Peter Geoghegan
On Fri, Apr 17, 2015 at 1:30 AM, Andres Freund  wrote:
>> However, like my second approach, there would be a "speculative
>> affirmation" WAL record.
>
> I think there should be one, but it's not required for the approach. The
> 'pending speculative insertion' can just be completed whenever there's a
> insert/update/delete that's not a super deletion.
>
> I.e. in the REORDER_BUFFER_CHANGE_INSERT/... case ReorderBufferCommit()
> just add something like:
>
> if (pending_insertion != NULL)
> {
>     if (new_record_is_super_deletion)
>         ReorderBufferReturnTupleBuf(pending_insertion);
>     else
>         rb->apply_change(rb, txn, relation, pending_insertion);
> }
> ...
>
> You'll have to be careful to store the pending_insertion *after*
> ReorderBufferToastReplace(), but that should not be a problem.

Okay. It seems like what you're saying is that I should be prepared to
have to deal with a REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT change (or
multiple such changes) from within ReorderBufferCommit() between a
speculative insertion and a super deletion, but that I can safely
assume that once some other insert/update/delete is encountered (or
once all changes have been drained from the reorder buffer), I can
then apply the speculative insertion as a regular insertion.

Is that what you're saying, in a nutshell? IOW, when you said this:

"""
I earlier thought that'd not be ok because there could be a new catalog
snapshot inbetween, but I was mistaken: The locking on the source
transaction prevents problems.
"""

What you meant was that you'd decided that this pattern is not broken,
*provided* that the implementation simply account for the fact that a
REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT change could come before some
other (non-REORDER_BUFFER_CHANGE_INTERNAL_DELETE, A.K.A.
non-superdelete) change came? And that we might need to be more
"patient" about deciding that a speculative insertion succeeded (more
"patient" than considering only one single non-superdelete change,
which can be anything else)?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-17 Thread Peter Geoghegan
On Fri, Apr 17, 2015 at 11:19 AM, Heikki Linnakangas  wrote:
>> because he wanted to preserve those by doing the MagicOffsetNumber
>> thing.
>
>
> Yes, that's the way to go.
>
> Glad we cleared that up :-).

Got it, thanks!


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-17 Thread Heikki Linnakangas

On 04/17/2015 09:02 PM, Peter Geoghegan wrote:

> I'm slightly surprised that Heikki now wants
> to use an infomask2 bit (if that is what you mean),


No, I don't.


> because he wanted to preserve those by doing the MagicOffsetNumber
> thing.


Yes, that's the way to go.

Glad we cleared that up :-).

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2015-04-17 Thread Qingqing Zhou
On Sun, Mar 29, 2015 at 11:56 AM, Andres Freund  wrote:
> I'm not sure whether the above is the best solution however. For one I
> think it's not necessarily a good idea to opencode this in hio.c - I've
> not observed it, but this probably can happen for btrees and such as
> well. For another, this is still a exclusive lock while we're doing IO:
> smgrextend() wants a page to write out, so we have to be careful not to
> overwrite things.
>
I think relaxing the global lock will mostly fix the contention.
However, several people suggested that extending by many pages has
other benefits. This hints at a more fundamental change in our
storage model. Currently we map one file per relation. While that is
simple and robust, considering partitioned tables, and maybe columnar
storage integrated into the core later, this model needs some further
thought. Think about a table with 1000 partitions and 100 columns:
that is 100K files, not to speak of other forks - surely we can keep
challenging the file system's limits or playing around with VFDs, but
we have a chance now to think ahead.

Most commercial databases employ a DMS storage model, where the
database manages object mapping and free space itself, so different
objects share storage within a few files. Surely it has historic
reasons, but it has several advantages over the current model:
- removes fd pressure
- removes double buffering (by introducing ADIO)
- controlled layout and access patterns (sequential and read-ahead)
- better quota management
- potentially better performance

Considering the platforms supported and the stabilization period
needed, we should support both the current storage model and a DMS
model. I will stop here to see if this deserves further discussion.
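
To make that concrete, a purely illustrative sketch of what a mapping
entry in such a model might look like (made-up names; nothing like this
exists in PostgreSQL today):

typedef struct DmsExtentMapEntry
{
    Oid         relid;           /* which relation the extent belongs to */
    uint32      rel_extent_no;   /* extent number within that relation */
    uint16      container_id;    /* which shared container file */
    uint32      container_off;   /* extent offset within the container */
} DmsExtentMapEntry;

The storage manager would look blocks up through such a map instead of
deriving a per-relation filename, so many relations can share a handful
of container files.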


Regards,
Qingqing


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication identifiers, take 4

2015-04-17 Thread Heikki Linnakangas

On 04/17/2015 08:36 PM, Simon Riggs wrote:

On 17 April 2015 at 18:12, Heikki Linnakangas  wrote:


On 04/17/2015 12:04 PM, Simon Riggs wrote:


On 17 April 2015 at 09:54, Andres Freund  wrote:

  Hrmpf. Says the person that used a lot of padding, without much

discussion, for the WAL level infrastructure making pg_rewind more
maintainable.


Sounds bad. What padding are we talking about?


In the new WAL format, the data chunks are stored unaligned, without
padding, to save space. The new format is quite different to the old one,
so it's not straightforward to compare how much that saved.


The key point here is the whole WAL format was changed to accommodate a
minor requirement for one utility. Please notice that nobody tried to stop
you doing that.

The changes Andres is requesting have a very significant effect on a major
new facility. Perhaps there is concern that it is an external utility?

If we can trust Heikki to include code into core that was written
externally then I think we can do the same for Andres.


I'm not concerned of the fact it is an external utility. Well, it 
concerns me a little bit, because that means that it'll get little 
testing with PostgreSQL. But that has nothing to do with the WAL size 
question.



I think it's time to stop the padding discussion and commit something
useful. We need this.


To be honest, I'm not entirely sure what we're arguing over. I said that 
IMO the difference in WAL size is so small that we should just use 
4-byte OIDs for the replication identifiers, instead of trying to make 
do with 2 bytes. Not because I find it too likely that you'll run out of 
IDs (although it could happen), but more to make replication IDs more 
like all other system objects we have. Andres did some pgbench 
benchmarking to show that the difference in WAL size is about 10%. The 
WAL records generated by pgbench happen to have just the right sizes so 
that the 2-3 extra bytes bump them over to the next alignment boundary. 
That's why there is such a big difference - on average it'll be less. I 
think that's acceptable, Andres seems to think otherwise. But if the 
WAL size really is so precious, we could remove the two padding bytes 
from XLogRecord, instead of dedicating them for the replication ids. 
That would be an even better use for them.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-17 Thread Peter Geoghegan
On Fri, Apr 17, 2015 at 1:38 AM, Andres Freund  wrote:
>> Do you just mean that you think that speculative insertions should be
>> explicitly affirmed by a second record (making it not a speculative
>> tuple, but rather, a fully fledged tuple)? IOW, an MVCC snapshot has
>> no chance of seeing a tuple until it was affirmed by a second in-place
>> modification, regardless of tuple xmin xact commit status?
>
> Yes. I think

Good.

> a) HEAP_SPECULATIVE should never be visible outside in a
> committed transaction. That allows us to redefine what exactly the bit
> means and such after a simple restart. On IM Heikki said he wants to
> replace this by a bit in the item pointer, but I don't think that
> changes things substantially.

I guess you envisage that HEAP_SPECULATIVE is an infomask2 bit? I
haven't been using one of those for a couple of revisions now,
preferring to use the offset MagicOffsetNumber to indicate a
speculative t_ctid value. I'm slightly surprised that Heikki now wants
to use an infomask2 bit (if that is what you mean), because he wanted
to preserve those by doing the MagicOffsetNumber thing. But I guess
we'd still be preserving the bit under this scheme (since it's always
okay to do something different with the bit after a restart).

Why is it useful to consume an infomask2 bit after all? Why did Heikki
change his mind - due to wanting representational redundancy? Or do I
have it all wrong?

> b) t_ctid should not contain a speculative token in committed
> (i.e. visible to other sessions using mvcc semantics) tuple. Right now
> (i.e. master) t_ctid will point to itself for non-updated tuples. I
> don't think it's good to have something that's not an actual
> ctid in there. The number of places that look into t_ctid for
> in-progress insertions of other sessions is smaller than the ones that
> look at tuples in general.

Right. So if a tuple is committed, it should not have set
HEAP_SPECULATIVE/have a t_ctid offset of MagicOffsetNumber. But a
non-ctid t_ctid (a speculative token) remains possible for
non-committed tuples visible to some types of snapshots (in
particular, dirty snapshots).
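
For concreteness, a rough sketch of one way a token could be stashed in
t_ctid's offset field (names such as MagicOffsetNumber and the helpers are
illustrative here, not necessarily what the patch does):

#include "postgres.h"
#include "access/htup_details.h"
#include "storage/itemptr.h"

/* An offset value that can never be a real line pointer index. */
#define MagicOffsetNumber   ((OffsetNumber) 0xfffe)

/* Stash a speculative-insertion token in the tuple's t_ctid. */
static inline void
tuple_set_speculative_token(HeapTupleHeader tup, uint32 token)
{
    ItemPointerSet(&tup->t_ctid, (BlockNumber) token, MagicOffsetNumber);
}

/* Does this t_ctid carry a token rather than a real ctid? */
static inline bool
tuple_is_speculative(HeapTupleHeader tup)
{
    return ItemPointerGetOffsetNumber(&tup->t_ctid) == MagicOffsetNumber;
}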

> c) Cleaning the flag/ctid after a completed speculative insertion makes
> it far less likely that we end up waiting on a other backend when we
> wouldn't have to. If a session inserts a large number of tuples
> speculatively (surely *not* an unlikely thing in the long run) it gets
> rather likely that tokens are reused. Which means if another backend
> touches these in-progress records it's quite possible that the currently
> acquired token is the same as the one on a tuple that has actually
> finished inserting.

It's more important than that, actually. If the inserter fails to
clear its tuple's speculative token, and then releases its token
lmgr lock, it will cause livelock with many upserting sessions.
Coordinating which other session gets to lazily clear the speculative
token (cleaning up after the original inserter) seemed quite hazardous
when I looked into it. Maybe you could fix it by interleaving buffer
locks and lmgr locks, but we can't do that.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication identifiers, take 4

2015-04-17 Thread Simon Riggs
On 17 April 2015 at 18:12, Heikki Linnakangas  wrote:

> On 04/17/2015 12:04 PM, Simon Riggs wrote:
>
>> On 17 April 2015 at 09:54, Andres Freund  wrote:
>>
>>  Hrmpf. Says the person that used a lot of padding, without much
>>> discussion, for the WAL level infrastructure making pg_rewind more
>>> maintainable.
>>>
>>
>> Sounds bad. What padding are we talking about?
>>
>
> In the new WAL format, the data chunks are stored unaligned, without
> padding, to save space. The new format is quite different to the old one,
> so it's not straightforward to compare how much that saved.


The key point here is the whole WAL format was changed to accommodate a
minor requirement for one utility. Please notice that nobody tried to stop
you doing that.

The changes Andres is requesting have a very significant effect on a major
new facility. Perhaps there is concern that it is an external utility?

If we can trust Heikki to include code into core that was written
externally then I think we can do the same for Andres.

I think it's time to stop the padding discussion and commit something
useful. We need this.

-- 
Simon Riggs   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] TABLESAMPLE patch

2015-04-17 Thread Simon Riggs
On 17 April 2015 at 14:54, Petr Jelinek  wrote:


> I agree that DDL patch is not that important to get in (and I made it last
> patch in the series now), which does not mean somebody can't write the
> extension with new tablesample method.
>
>
> In any case attached another version.
>
> Changes:
> - I addressed the comments from Michael
>
> - I moved the interface between nodeSampleScan and the actual sampling
> method to its own .c file and added TableSampleDesc struct for it. This
> makes the interface cleaner and will make it more straightforward to extend
> for subqueries in the future (nothing really changes just some functions
> were renamed and moved). Amit suggested this at some point and I thought
> it's not needed at that time but with the possible future extension to
> subquery support I changed my mind.
>
> - renamed heap_beginscan_ss to heap_beginscan_sampling to avoid confusion
> with sync scan
>
> - reworded some things and more typo fixes
>
> - Added two sample contrib modules demonstrating row limited and time
> limited sampling. I am using linear probing for both of those as the
> builtin block sampling is not well suited for row limited or time limited
> sampling. For row limited I originally thought of using the Vitter's
> reservoir sampling but that does not fit well with the executor as it needs
> to keep the reservoir of all the output tuples in memory which would have
> horrible memory requirements if the limit was high. The linear probing
> seems to work quite well for the use case of "give me 500 random rows from
> table".
>

For me, the DDL changes are something we can leave out for now, as a way to
minimize the change surface.

I'm now moving to final review of patches 1-5. Michael requested patch 1 to
be split out. If I commit, I will keep that split, but I am considering all
of this as a single patchset. I've already spent a few days reviewing, so I
don't expect this will take much longer.
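
As an aside on the reservoir-sampling point above: a minimal sketch of
Vitter's Algorithm R, illustrating why a row-limited sample has to keep the
whole k-row reservoir in memory at once (names are illustrative, not from
the patch):

#include <stdlib.h>

/* Fill "reservoir" (capacity k) with a uniform random sample of the items
 * produced by next_item(); returns how many slots were actually filled. */
static int
reservoir_sample(int k, int (*next_item)(int *out), int *reservoir)
{
    int     item;
    long    seen = 0;

    while (next_item(&item))
    {
        if (seen < k)
            reservoir[seen] = item;             /* initial fill phase */
        else
        {
            long    j = random() % (seen + 1);  /* keep with probability k/(seen+1) */

            if (j < k)
                reservoir[j] = item;            /* evict a random earlier pick */
        }
        seen++;
    }
    return (seen < (long) k) ? (int) seen : k;
}

With k = 500 this is cheap, but with a large row limit the reservoir itself
becomes the memory problem described above, which is why linear probing was
used instead.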

-- 
Simon Riggs   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Moving on to close the current CF 2015-02

2015-04-17 Thread Simon Riggs
On 17 April 2015 at 08:22, Michael Paquier 
wrote:


> To committers, here are the patches that seem on top of the list:
>

I'm pretty sure Committers are the people to decide which patches can be
committed, but thanks for the opinion.

-- 
Simon Riggs   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Replication identifiers, take 4

2015-04-17 Thread Heikki Linnakangas

On 04/17/2015 12:04 PM, Simon Riggs wrote:

On 17 April 2015 at 09:54, Andres Freund  wrote:


Hrmpf. Says the person that used a lot of padding, without much
discussion, for the WAL level infrastructure making pg_rewind more
maintainable.


Sounds bad. What padding are we talking about?


In the new WAL format, the data chunks are stored unaligned, without 
padding, to save space. The new format is quite different to the old 
one, so it's not straightforward to compare how much that saved. The 
fixed-size XLogRecord header is 8 bytes shorter in the new format, 
because it doesn't have the xl_len field anymore. But the same 
information is stored elsewhere in the record, where it takes 2 or 5 
bytes (XLogRecordDataHeaderShort/Long).


But it's a fair point that we could've just made small adjustments to 
the old format, without revamping every record type and the way the 
block information is stored, and that the space saving of the new format 
should be compared with that instead, for a fair comparison.


As an example, one simple thing we could've done with the old format: 
remove xl_len, and store the length in place of the two unused padding 
bytes instead, as long as it fits in 16 bits. For longer records, set a 
flag and store it right after XLogRecord header. For practically all WAL 
records, that would've shrunk XLogRecord from 32 to 24 bytes, and made 
each record 8 bytes shorter.
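
For illustration, a hypothetical sketch of what such an adjusted 9.4-style
header could have looked like (with the remaining small fields reordered to
the end so the dropped xl_len really saves a full 8 bytes after alignment;
names other than the original fields are made up, and nothing like this was
actually implemented):

typedef struct XLogRecordCompact
{
    uint32          xl_tot_len;   /* total length of entire record */
    TransactionId   xl_xid;       /* xact id */
    XLogRecPtr      xl_prev;      /* ptr to previous record in log */
    pg_crc32        xl_crc;       /* CRC for this record */
    uint8           xl_info;      /* flag bits, incl. "long rmgr-data length
                                   * stored right after the header" */
    RmgrId          xl_rmid;      /* resource manager */
    uint16          xl_len16;     /* rmgr data length, when it fits in 16 bits */
} XLogRecordCompact;              /* 24 bytes, vs 32 for the 9.4 XLogRecord */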


I ran the same pgbench test Andres used, with scale 10, and 5 
transactions, and compared the WAL size between master and 9.4:


master: 20738352
9.4: 23915800

According to pg_xlogdump, there were 301153 WAL records. If you take the 
9.4 figure, and imagine that we had saved those 8 bytes on each WAL 
record, 9.4 would've been 21506576 bytes instead. So yeah, we could've 
achieved much of the WAL savings with that much smaller change. That's a 
useful thing to compare with.


BTW, those numbers are with wal_level=minimal. With wal_level=logical, 
the WAL size from the same test on master was 26503520 bytes. That's 
quite a bump. Looking at pg_xlogdump output, it seems that it's all 
because the commit records are wider.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-17 Thread Peter Geoghegan
On Fri, Apr 17, 2015 at 4:54 AM, Stephen Frost  wrote:
> Dean, I've been working through your patches over the past couple of
> days (apologies for the lack of updates, just been busy) and hope to
> push them very shortly (ie: by the end of the weekend).
>
> One thing that I was hoping to discuss a bit is that I've gone ahead and
> added another set of hooks, so we can have both "permissive" and
> "restrictive" policies be provided from the hook.  It's a bit late to
> make the grammar and other changes which would be required to add a
> "restrictive" policy option to the built-in RLS, but adding the hooks is
> relatively low-impact.

I came up with something that is along the lines of what we discussed.
I'll wait for you to push Dean's code, and rebase on top of that
before submitting what I came up with. Maybe this can be rolled into
my next revision if I work through Andres' most recent feedback
without much delay.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Moving on to close the current CF 2015-02

2015-04-17 Thread Magnus Hagander
On Fri, Apr 17, 2015 at 9:23 AM, Michael Paquier 
wrote:

> On Fri, Apr 17, 2015 at 4:22 PM, Michael Paquier wrote:
> > @Magnus: having the possibility to mark a patch as "returned with
> > feedback" without bumping it to the next CF automatically would be
> > cool to being moving on.
> Meh. "cool to have to help moving on".
>
>
Yeah, it's at the top of my list of priorities once I get some time to
spend on community stuff. Hopefully I can get around to it next week. There
is a small chance I can do it before then, but it is indeed small...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] ONLY in queries by RI triggers

2015-04-17 Thread Andres Freund
On 2015-04-17 17:35:16 +0300, Vladimir Borodin wrote:
> A long time ago in 04b31609b63ce77fb9273193f07cf21b2a7176af ONLY
> keyword was added to all queries in
> src/backend/utils/adt/ri_triggers.c. Since that time foreign keys do
> not work with inheritance trees and it is mentioned in the
> documentation for all versions since at least 7.3.
> 
> I wonder what are the pitfalls of removing ONLY keyword from those
> queries? I have made such change, it passes all tests and foreign keys
> for partitioned tables do work, but I suppose that there are lots of
> places where it could break something.

Unique keys don't work across partitions, and the FK triggers rely on
one side being unique...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ONLY in queries by RI triggers

2015-04-17 Thread Vladimir Borodin
Hi all.

A long time ago in 04b31609b63ce77fb9273193f07cf21b2a7176af ONLY keyword was 
added to all queries in src/backend/utils/adt/ri_triggers.c. Since that time 
foreign keys do not work with inheritance trees and it is mentioned in the 
documentation for all versions since at least 7.3. 

I wonder what are the pitfalls of removing ONLY keyword from those queries? I 
have made such change, it passes all tests and foreign keys for partitioned 
tables do work, but I suppose that there are lots of places where it could 
break something.

Thanks in advance.

--
May the force be with you…
https://simply.name



[HACKERS] Buildfarm client version 4.15 released

2015-04-17 Thread Andrew Dunstan
I have just released version 4.15 of the PostgreSQL Buildfarm Client.
It can be downloaded at



Here's what's changed:

 * support the new location for pg_upgrade
 * support running tests of client programs
 * support building, installing and running testmodules
 * use a default ccache directory
 * improve logging when running pg_upgrade tests
 * handle odd location of Python3 regression tests
 * add timestamp to default log_line_prefix
 * make qw() errors in the config file fatal (helps detect errors)
 * minor bug fixes for web script settings.
 * allow for using linked git directories in non-master branches

The last item might need a little explanation.  Essentially this can 
reduce quite dramatically the amount of space required if you are 
building on more than one branch. Instead of keeping, say, 6 checked out 
repos for the current six tested branches, we keep one and link all the 
others to it. This works almost exactly the way git-new-workdir does (I 
stole the logic from there). This doesn't work in a couple of 
situations: if you are using Windows or if you are using git-reference. 
In these cases the new setting is simply ignored.


To enable this new setting in an existing installation, do the following 
after installing the new release:


 * in your config file, add this setting:
   git_use_workdirs => 1,
 * remove the pgsql directory in each branch directory other than HEAD

Another good thing to do in existing installations would be to add "%m" 
to the beginning of the log_line_prefix setting in the extra_config stanza.


Enjoy!

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for xidin

2015-04-17 Thread Tom Lane
"Zhang Zq"  writes:
> The implementation of 'xidin' uses only 'strtoul' to cast from string to
> xid, which can cause confusion in some cases. For example, the SQL
> "select c1 from test where xmin='abc'" can be executed. It can also lead
> to mistakes: I wanted to query "select c1 from test where xmin='0x10'"
> but wrote 'Ox10' ('0' mistyped as 'O'), and the result is obviously wrong.

> The patch will correct it. I have just copied some code from 'OID'. Do we
> need to extract the common code?

This seems like an awful lot of code to solve a problem that will never
occur in practice.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-04-17 Thread Heikki Linnakangas

On 04/17/2015 03:58 PM, Tom Lane wrote:

Heikki Linnakangas  writes:

On 03/28/2015 11:24 PM, Tom Lane wrote:

+  * Macros for iterating through elements of a flat or expanded array.



How about a struct instead?



struct ArrayIter {
    Datum  *datumptr;
    bool   *isnullptr;
    char   *dataptr;
    bits8  *bitmapptr;
    int     bitmask;
};



Seems more natural.


Yes, and much less efficient I'm afraid.  Most compilers would be unable
to put the variables into registers, which is important for these inner
loops.


That would surprise me. Surely most compilers know to keep fields of a 
struct in registers, when the struct itself or a pointer to it is not 
passed anywhere.



How about turning these into functions?


Likewise.  The point of doing it like this was to avoid taking an
efficiency hit compared to the existing code.

It's conceivable that we could avoid such a hit by marking the functions
all "inline", but I'm not certain that they'd get inlined, and the
question of whether the variables could be in registers would remain.


Ok, this one I believe.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] patch for xidin

2015-04-17 Thread Zhang Zq
hi,
   The implementation of 'xidin' uses only 'strtoul' to cast from string to
xid, which can cause confusion in some cases. For example, the SQL
"select c1 from test where xmin='abc'" can be executed. It can also lead to
mistakes: I wanted to query "select c1 from test where xmin='0x10'" but
wrote 'Ox10' ('0' mistyped as 'O'), and the result is obviously wrong.

The patch will correct it. I have just copied some code from 'OID'. Do we
need to extract the common code?
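
For illustration, a hypothetical sketch of a stricter parser in the spirit
of oidin's extra checks (standalone C, not the code from the attached
patch):

#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Return 0 and set *result on success, -1 on malformed or out-of-range
 * input; rejects things like "abc", "Ox10", and trailing garbage that a
 * bare strtoul() call silently tolerates. */
static int
xid_strict_parse(const char *s, uint32_t *result)
{
    char           *endptr;
    unsigned long   val;

    errno = 0;
    val = strtoul(s, &endptr, 0);        /* base 0: decimal, 0x..., octal */
    if (endptr == s)
        return -1;                        /* no digits consumed at all */
    while (*endptr != '\0' && isspace((unsigned char) *endptr))
        endptr++;
    if (*endptr != '\0')
        return -1;                        /* trailing junk after the number */
    if (errno == ERANGE || val > UINT32_MAX)
        return -1;                        /* does not fit in a 32-bit xid */
    *result = (uint32_t) val;
    return 0;
}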

Thanks.


xid.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication identifiers, take 4

2015-04-17 Thread Simon Riggs
On 17 April 2015 at 09:54, Andres Freund  wrote:


> Hrmpf. Says the person that used a lot of padding, without much
> discussion, for the WAL level infrastructure making pg_rewind more
> maintainable.


Sounds bad. What padding are we talking about?

--
Simon Riggs   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] pg_upgrade in 9.5 broken for adminpack

2015-04-17 Thread Bruce Momjian
On Fri, Apr 17, 2015 at 04:11:44PM +0900, Michael Paquier wrote:
> On Fri, Apr 17, 2015 at 3:29 PM, Jeff Janes wrote:
> > Of course after sending that it became obvious.  The C function is not
> > getting called because the SQL function is marked as being strict, yet is
> > called with NULL arguments.
> >
> > Trivial patch attached to unset strict flag in pg_proc.h.
> >
> > But  CATALOG_VERSION_NO probably needs another bump as well.
> 
> This looks good to me. I have tested your fix and the problem goes
> away. create_empty_extension() was the only function not declared as
> STRICT before 30982be4 (see install_support_functions_in_new_db() in
> contrib/pg_upgrade/function.c).

OK, as I am not the author of that change, I am going to wait a little
while for the author to commit the fix.  I can see it is an easy mistake
to make.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-04-17 Thread Tom Lane
Heikki Linnakangas  writes:
> On 03/28/2015 11:24 PM, Tom Lane wrote:
>> +  * Macros for iterating through elements of a flat or expanded array.

> How about a struct instead?

> struct ArrayIter {
>     Datum  *datumptr;
>     bool   *isnullptr;
>     char   *dataptr;
>     bits8  *bitmapptr;
>     int     bitmask;
> };

> Seems more natural.

Yes, and much less efficient I'm afraid.  Most compilers would be unable
to put the variables into registers, which is important for these inner
loops.

> How about turning these into functions?

Likewise.  The point of doing it like this was to avoid taking an
efficiency hit compared to the existing code.

It's conceivable that we could avoid such a hit by marking the functions
all "inline", but I'm not certain that they'd get inlined, and the
question of whether the variables could be in registers would remain.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-17 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> In all of this, I think we should try to keep things as simple as
> possible, to give the end user a chance to understand it --- although
> I'm not sure I've achieved that with my explanation :-)

Thanks a lot for this.  It goes along with my thinking also and matches,
I believe, what I had explained to Peter on our call.

Peter, please let me know if you agree.

Dean, I've been working through your patches over the past couple of
days (apologies for the lack of updates, just been busy) and hope to
push them very shortly (ie: by the end of the weekend).

One thing that I was hoping to discuss a bit is that I've gone ahead and
added another set of hooks, so we can have both "permissive" and
"restrictive" policies be provided from the hook.  It's a bit late to
make the grammar and other changes which would be required to add a
"restrictive" policy option to the built-in RLS, but adding the hooks is
relatively low-impact.

I'm also going to be including a test_rls_hooks module into
src/test/modules which will test those hooks and provide an example of
how to use them.

As for the discussion- there was some concern raised about extensions
being able to "override" built-in policies by using the hooks a certain
way.  I don't entirely follow the logic behind that concern as an
extension has the ability to read the files on disk directly from C
code, should it be written to do so, and so not providing a hook to add
"permissive" policies is denying useful functionality for very question
gain, in my view at least.

Thoughts?

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-04-17 Thread Heikki Linnakangas

On 03/28/2015 11:24 PM, Tom Lane wrote:

+ /*
+  * Macros for iterating through elements of a flat or expanded array.
+  * Use "ARRAY_ITER  ARRAY_ITER_VARS(name);" to declare the local variables
+  * needed for an iterator (more than one set can be used in the same function,
+  * if they have different names).
+  * Use "ARRAY_ITER_SETUP(name, arrayptr);" to prepare to iterate, and
+  * "ARRAY_ITER_NEXT(name, index, datumvar, isnullvar, ...);" to fetch the
+  * next element into datumvar/isnullvar.  "index" must be the zero-origin
+  * element number; we make caller provide this since caller is generally
+  * counting the elements anyway.
+  */
+ #define ARRAY_ITER				/* dummy type name to keep pgindent happy */
+
+ #define ARRAY_ITER_VARS(iter) \
+   Datum  *iter##datumptr; \
+   bool   *iter##isnullptr; \
+   char   *iter##dataptr; \
+   bits8  *iter##bitmapptr; \
+   int iter##bitmask


How about a struct instead?

struct ArrayIter {
    Datum  *datumptr;
    bool   *isnullptr;
    char   *dataptr;
    bits8  *bitmapptr;
    int     bitmask;
};

Seems more natural.


+ #define ARRAY_ITER_SETUP(iter, arrayptr) \
[long and complicated macro]
+
+ #define ARRAY_ITER_NEXT(iter,i, datumvar,isnullvar, elmlen,elmbyval,elmalign) \
[another long and complicated macro]


How about turning these into functions? We have a bunch of macros like 
this, but IMHO functions are much more readable and easier to debug, so 
I would prefer functions in new code.


In general, refactoring the array iteration code to a macro/function 
like this is a good idea. It would make sense to commit that separately, 
regardless of the rest of the patch.
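
For illustration, a minimal sketch of what an inline-function flavour of
this could look like, built on the ArrayIter struct suggested above and
covering only the flat (non-expanded) case; this is not code from the
patch:

#include "postgres.h"
#include "access/tupmacs.h"

static inline void
array_iter_next_flat(struct ArrayIter *it, int index,
                     Datum *datum, bool *isnull,
                     int elmlen, bool elmbyval, char elmalign)
{
    if (it->bitmapptr &&
        (it->bitmapptr[index / 8] & (1 << (index % 8))) == 0)
    {
        /* element is NULL; the data pointer does not advance */
        *isnull = true;
        *datum = (Datum) 0;
    }
    else
    {
        *isnull = false;
        *datum = fetch_att(it->dataptr, elmbyval, elmlen);
        it->dataptr = att_addlength_pointer(it->dataptr, elmlen, it->dataptr);
        it->dataptr = (char *) att_align_nominal(it->dataptr, elmalign);
    }
}

Whether the compiler keeps it->dataptr in a register across such calls is
exactly the question raised above, so this only shows the shape of the
interface.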


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


fix xlogdump percentage display (was Re: [HACKERS] Replication identifiers, take 4)

2015-04-17 Thread Abhijit Menon-Sen
At 2015-04-17 10:54:51 +0200, and...@anarazel.de wrote:
>
> (The FPI percentage display above is arguably borked. Interesting.)

Sorry for the trouble. Patch attached.

-- Abhijit
>From 1e5c5d5948030e8ff6ccdd2309a97fb1e116d8e2 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen 
Date: Fri, 17 Apr 2015 14:45:41 +0530
Subject: Don't divide by zero when calculating percentages

---
 contrib/pg_xlogdump/pg_xlogdump.c | 53 ---
 1 file changed, 38 insertions(+), 15 deletions(-)

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index 4f297e9..3f61c32 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -489,18 +489,36 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
  */
 static void
 XLogDumpStatsRow(const char *name,
- uint64 n, double n_pct,
- uint64 rec_len, double rec_len_pct,
- uint64 fpi_len, double fpi_len_pct,
- uint64 total_len, double total_len_pct)
+ uint64 n, uint64 total_count,
+ uint64 rec_len, uint64 total_rec_len,
+ uint64 fpi_len, uint64 total_fpi_len,
+ uint64 tot_len, uint64 total_len)
 {
+	double		n_pct, rec_len_pct, fpi_len_pct, tot_len_pct;
+
+	n_pct = 0;
+	if (total_count != 0)
+		n_pct = 100 * (double) n / total_count;
+
+	rec_len_pct = 0;
+	if (total_rec_len != 0)
+		rec_len_pct = 100 * (double) rec_len / total_rec_len;
+
+	fpi_len_pct = 0;
+	if (total_fpi_len != 0)
+		fpi_len_pct = 100 * (double) fpi_len / total_fpi_len;
+
+	tot_len_pct = 0;
+	if (total_len != 0)
+		tot_len_pct = 100 * (double) tot_len / total_len;
+
 	printf("%-27s "
 		   "%20" INT64_MODIFIER "u (%6.02f) "
 		   "%20" INT64_MODIFIER "u (%6.02f) "
 		   "%20" INT64_MODIFIER "u (%6.02f) "
 		   "%20" INT64_MODIFIER "u (%6.02f)\n",
 		   name, n, n_pct, rec_len, rec_len_pct, fpi_len, fpi_len_pct,
-		   total_len, total_len_pct);
+		   tot_len, tot_len_pct);
 }
 
 
@@ -515,6 +533,7 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 	uint64		total_rec_len = 0;
 	uint64		total_fpi_len = 0;
 	uint64		total_len = 0;
+	double		rec_len_pct, fpi_len_pct;
 
 	/* ---
 	 * Make a first pass to calculate column totals:
@@ -557,10 +576,8 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 			tot_len = rec_len + fpi_len;
 
 			XLogDumpStatsRow(desc->rm_name,
-			 count, 100 * (double) count / total_count,
-			 rec_len, 100 * (double) rec_len / total_rec_len,
-			 fpi_len, 100 * (double) fpi_len / total_fpi_len,
-			 tot_len, 100 * (double) tot_len / total_len);
+			 count, total_count, rec_len, total_rec_len,
+			 fpi_len, total_fpi_len, tot_len, total_len);
 		}
 		else
 		{
@@ -583,10 +600,8 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 	id = psprintf("UNKNOWN (%x)", rj << 4);
 
 XLogDumpStatsRow(psprintf("%s/%s", desc->rm_name, id),
- count, 100 * (double) count / total_count,
- rec_len, 100 * (double) rec_len / total_rec_len,
- fpi_len, 100 * (double) fpi_len / total_fpi_len,
- tot_len, 100 * (double) tot_len / total_len);
+ count, total_count, rec_len, total_rec_len,
+ fpi_len, total_fpi_len, tot_len, total_len);
 			}
 		}
 	}
@@ -601,14 +616,22 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 	 * them from the earlier ones, and are thus up to 9 characters long.
 	 */
 
+	rec_len_pct = 0;
+	if (total_len != 0)
+		rec_len_pct = 100 * (double) total_rec_len / total_len;
+
+	fpi_len_pct = 0;
+	if (total_len != 0)
+		fpi_len_pct = 100 * (double) total_fpi_len / total_len;
+
 	printf("%-27s "
 		   "%20" INT64_MODIFIER "u %-9s"
 		   "%20" INT64_MODIFIER "u %-9s"
 		   "%20" INT64_MODIFIER "u %-9s"
 		   "%20" INT64_MODIFIER "u %-6s\n",
 		   "Total", stats->count, "",
-		   total_rec_len, psprintf("[%.02f%%]", 100 * (double)total_rec_len / total_len),
-		   total_fpi_len, psprintf("[%.02f%%]", 100 * (double)total_fpi_len / total_len),
+		   total_rec_len, psprintf("[%.02f%%]", rec_len_pct),
+		   total_fpi_len, psprintf("[%.02f%%]", fpi_len_pct),
 		   total_len, "[100%]");
 }
 
-- 
1.9.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] KNN-GiST with recheck

2015-04-17 Thread Alexander Korotkov
On Wed, Feb 25, 2015 at 12:15 PM, Alexander Korotkov 
wrote:

> Hi!
>
> On Tue, Feb 24, 2015 at 5:39 PM, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
>
>> On 17.2.2015 14:21, Alexander Korotkov wrote:
>> > On Sun, Feb 15, 2015 at 2:08 PM, Alexander Korotkov
>> > mailto:aekorot...@gmail.com>> wrote:
>> >
>> > Revised patch with reordering in GiST is attached
>> > (knn-gist-recheck-in-gist.patch) as well as testing script (test.py).
>>
>> I meant to do a bit of testing on this (assuming it's still needed), but
>> the patches need rebasing - Heikki fixed a few issues, so they don't
>> apply cleanly.
>>
>
> Both patches are revised.
>

Both patches are rebased against current master.

--
With best regards,
Alexander Korotkov.


knn-gist-recheck-8.patch
Description: Binary data


knn-gist-recheck-in-gist-3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication identifiers, take 4

2015-04-17 Thread Andres Freund
On 2015-04-12 22:02:38 +0300, Heikki Linnakangas wrote:
> This needs to be weighed against removing the padding bytes
> altogether.

Hrmpf. Says the person that used a lot of padding, without much
discussion, for the WAL level infrastructure making pg_rewind more
maintainable. And you deemed it perfectly ok to use them up to avoid
*increasing* the WAL size with the *additional data* (which so far
nothing but pg_rewind needs in that way), while it perfectly well could
have been used to shrink the WAL size to less than it now is. And those
changes are *far*, *far* harder to back out/refactor than these (which are
pretty localized and thus can easily be changed); to the point that I
think it's infeasible to do so...


If you want to shrink the WAL size, send in a patch independently. Not
as a way to block somebody else implementing something.


> I'm surprised there's such a big difference between the "extern" and
> "padding" versions above. At a quick approximation, storing the ID as a
> separate "fragment", along with XLogRecordDataHeaderShort and
> XLogRecordDataHeaderLong, should add one byte of overhead plus the ID
> itself. So that would be 3 extra bytes for 2-byte identifiers, or 5 bytes
> for 4-byte identifiers. Does that mean that the average record length is
> only about 30 bytes?

Yes, nearly. That's  xlogdump --stats=record from the above scenario with
replication identifiers used and reusing the padding:

Type                        N      (%)    Record size      (%)   FPI size     (%)   Combined size      (%)
----                        -      ---    -----------      ---   --------     ---   -------------      ---
Transaction/COMMIT      50003  ( 16.89)       2600496  ( 23.38)         0  ( -nan)       2600496  ( 23.38)
CLOG/ZEROPAGE               1  (  0.00)            28  (  0.00)         0  ( -nan)            28  (  0.00)
Standby/RUNNING_XACTS       5  (  0.00)           248  (  0.00)         0  ( -nan)           248  (  0.00)
Heap2/CLEAN             46034  ( 15.55)       1473088  ( 13.24)         0  ( -nan)       1473088  ( 13.24)
Heap2/VISIBLE               2  (  0.00)            56  (  0.00)         0  ( -nan)            56  (  0.00)
Heap/INSERT             49682  ( 16.78)       1341414  ( 12.06)         0  ( -nan)       1341414  ( 12.06)
Heap/HOT_UPDATE        150013  ( 50.67)       5700494  ( 51.24)         0  ( -nan)       5700494  ( 51.24)
Heap/INPLACE                5  (  0.00)           130  (  0.00)         0  ( -nan)           130  (  0.00)
Heap/INSERT+INIT          318  (  0.11)          8586  (  0.08)         0  ( -nan)          8586  (  0.08)
Btree/VACUUM                2  (  0.00)            56  (  0.00)         0  ( -nan)            56  (  0.00)
                       ------              ----------             --------             ----------
Total                  296065                11124596  [100.00%]         0  [0.00%]      11124596  [100%]

(The FPI percentage display above is arguably borked. Interesting.)

So the average record size is ~37.5 bytes including the increased commit
record size due to the origin information (which is the part that
increases the size for that version that reuses the padding).

This *most definitely* isn't representative of every workload. But it
*is* *a* common type of workload.

Note that --stats will *not* show the size difference in xlog records
when adding data as an additional chunk vs. padding as it uses
XLogRecGetDataLen() to compute the record length... That confused me for
a while.

> That doesn't sound right, 30 bytes is very little.

Well, it's mostly HOT_UPDATES and INSERTS into not indexed tables. So
that's not too surprising. Obviously that'd look different with FPIs
enabled.

> Perhaps the size
> of the records created by pgbench happen to cross a 8-byte alignment
> boundary at that point, making a big difference. In another workload,
> there might be no difference at all, due to alignment.

Right.
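
To illustrate the alignment point (assuming 8-byte MAXALIGN, as on common
64-bit platforms): a 2-3 byte addition only costs WAL space when it pushes
the record across an alignment boundary.

#define MAXALIGN_UP(len)   (((uintptr_t) (len) + 7) & ~(uintptr_t) 7)

/* unaligned 30 bytes: MAXALIGN_UP(30) = 32, MAXALIGN_UP(30 + 2) = 32  -> no growth
 * unaligned 31 bytes: MAXALIGN_UP(31) = 32, MAXALIGN_UP(31 + 3) = 40  -> +8 bytes */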

> Also, you don't need to tag every record type with the replication ID. All
> indexam records can skip it, for starters, since logical decoding doesn't
> care about them. That should remove a fair amount of bloat.

Yes. I mentioned that. It's additional complexity because now the
decision has to be made at each xlog insertion callsite, making
refactoring this into a different representation a bit harder. I don't
think it will make that much of a difference.

Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-17 Thread Andres Freund
On 2015-04-16 09:43:54 -0700, Peter Geoghegan wrote:
> On Thu, Apr 16, 2015 at 2:23 AM, Andres Freund  wrote:
> > I'm, completely independent of logical decoding, of the *VERY* strong
> > opinion that 'speculative insertions' should never be visible when
> > looking with normal snapshots. For one it allows to simplify
> > considerations around wraparound (which has proven to be a very good
> > idea, c.f. multixacts + vacuum causing data corruption years after it
> > was thought to be harmless). For another it allows to reclaim/redefine
> > the bit after a database restart/upgrade. Given how complex this is and
> > how scarce flags are that seems like a really good property.
> >
> > And avoiding those flags to be visible to the outside requires a WAL
> > record, otherwise it won't be correct on the standby.
> 
> I'm a bit distracted here, and not sure exactly what you mean. What's
> a normal snapshot?

Normal visibility semantics, i.e. SnapshotMVCC, not SnapshotDirty.

> Do you just mean that you think that speculative insertions should be
> explicitly affirmed by a second record (making it not a speculative
> tuple, but rather, a fully fledged tuple)? IOW, an MVCC snapshot has
> no chance of seeing a tuple until it was affirmed by a second in-place
> modification, regardless of tuple xmin xact commit status?

Yes. I think

a) HEAP_SPECULATIVE should never be visible outside in a
committed transaction. That allows us to redefine what exactly the bit
means and such after a simple restart. On IM Heikki said he wants to
replace this by a bit in the item pointer, but I don't think that
changes things substantially.

b) t_ctid should not contain a speculative token in committed
(i.e. visible to other sessions using mvcc semantics) tuple. Right now
(i.e. master) t_ctid will point to itself for non-updated tuples. I
don't think it's good to have something that's not an actual
ctid in there. The number of places that look into t_ctid for
in-progress insertions of other sessions is smaller than the ones that
look at tuples in general.

c) Cleaning the flag/ctid after a completed speculative insertion makes
it far less likely that we end up waiting on a other backend when we
wouldn't have to. If a session inserts a large number of tuples
speculatively (surely *not* an unlikely thing in the long run) it gets
rather likely that tokens are reused. Which means if another backend
touches these in-progress records it's quite possible that the currently
acquired token is the same as the one on a tuple that has actually
finished inserting.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-17 Thread Andres Freund
On 2015-04-16 10:25:29 -0700, Peter Geoghegan wrote:
> On Thu, Apr 16, 2015 at 2:18 AM, Andres Freund  wrote:
> > If we go that way that's the way I think it should be done: Whenever we
> > encounter a speculative record we 'unlink' it from the changes that will
> > be reused for spooling from disk and do nothing further. Then we just
> > continue reading through the records.
>
> You mean we continue iterating through *changes* from ReorderBufferCommit()?

changes, records, not much of a difference here.

> > If the next thing we encounter is
> > a super-deletion we throw away that record. If it's another type of
> > change (or even bettter a 'speculative insertion succeeded' record)
> > insert it.
>
> By "insert it", I gather you mean report to the the logical decoding
> plugin as an insert change.

Yes.

> We'd have coordination between records originally decoded into
> changes, maybe "intercepting" them during xact reassembly, like my
> first approach. We'd mostly do the same thing as the first approach,
> actually. The major difference would be that there'd be explicit
> "speculative affirmation" WAL records. But we wouldn't rely on those
> affirmation records within ReorderBufferCommit() - we'd rely on the
> *absence* of a super deletion WAL record (to report an insert change
> to the decoding plugin). To emphasize, like my first approach, it
> would be based on an *absence* of a super deletion WAL record.

> However, like my second approach, there would be a "speculative
> affirmation" WAL record.

I think there should be one, but it's not required for the approach. The
'pending speculative insertion' can just be completed whenever there's a
insert/update/delete that's not a super deletion.

I.e. in the REORDER_BUFFER_CHANGE_INSERT/... case of ReorderBufferCommit(),
just add something like:

if (pending_insertion != NULL)
{
    if (new_record_is_super_deletion)
        /* the pending insertion was aborted, just discard it */
        ReorderBufferReturnTupleBuf(pending_insertion);
    else
        /* no super deletion followed, so the insertion completed; emit it */
        rb->apply_change(rb, txn, relation, pending_insertion);
}
...

You'll have to be careful to store the pending_insertion *after*
ReorderBufferToastReplace(), but that should not be a problem.


> Right now, I'm tired, so bear with me. What I think I'm not quite
> getting here is how the new type of "affirmation" record affects
> visibility (or anything else, actually). Clearly dirty snapshots need
> to see the record to set a speculative insertion token for their
> caller to wait on (and HeapTupleSatisfiesVacuum() cannot see the
> speculatively inserted tuple as reclaimable, of course). They need
> this *before* the speculative insertion is affirmed, of course.
>
> Maybe you mean: If the speculative insertion xact is in progress, then
> the tuple is visible to several types of snapshots (dirty, vacuum,
> self, any). If it is not, then tuples are not visible because they are
> only speculative (and not *confirmed*). If they were confirmed, and
> the xact was committed, then those tuples are logically and physically
> indistinguishable from tuples that were inserted in the ordinary
> manner.
>
> Is that it? Basically, the affirmation records have nothing much to do
> with logical decoding in particular. But you still need to super
> delete, so that several types of snapshots ((dirty, vacuum, self, any)
> *stop* seeing the tuple as visible independent of the inserting xact
> being in progress.

I have no idea what this has to do with the email you responded to.
Maybe it's meant more as a response to my separate email saying that I
want HEAP_SPECULATIVE to be cleared?

Greetings,

Andres Freund




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-17 Thread Dean Rasheed
On 9 April 2015 at 22:18, Peter Geoghegan  wrote:
> The big idea (the fine details of which Stephen appeared to be in
> tentative agreement with [1]) is that an UPSERT is a hybrid between an
> INSERT and an UPDATE, and not simply an INSERT and separate UPDATE
> tied together. So at the very least the exact behavior of such a
> hybrid cannot be assumed to be any particular way just from
> generalizing from known behaviors for INSERT and UPDATE (which is a
> usability issue, since the fine details of RLS are already complex
> enough without UPSERT).
>

I think that you're over-complicating this. From a usability point of
view, I think that the best approach is to keep this as simple as
possible and make the behaviour consistent with an INSERT and an
UPDATE tied together, as is suggested by the new syntax. The key point
is that, if you are using the RLS INSERT and UPDATE policies for this
new command, the rule should be that the user has permission to
insert/update a new/existing row if and only if they would have had
permission to do so using a stand-alone INSERT/UPDATE command.

On the other hand, if you believe that the behaviour should be
something other than that, then I think that it would need a new
dedicated kind of RLS policy for this command because, as I will
attempt to explain below, merging together the quals from INSERT and
UPDATE policies leads to logical problems.


> The INSERT policies are only enforced when a tuple is inserted
> because, when the alternative path isn't taken, it's really just
> an INSERT.
>

Agreed.


> For the UPDATE path, where the stickiness/hybridness begins...
> 
> I'd appreciate it if you explicitly outlined what policies you feel
> should be enforce at each of the 3 junctures within an UPSERT (post
> INSERT, pre-UPDATE, post-UPDATE). I would also like you to be very
> explicit about whether or not RLS WITH CHECK policies and USING quals
> (presumably enforced as RLS WITH CHECK policies) from both INSERT and
> UPDATE policies should be enforced at each point. In particular,
> should I ever not treat RLS WCO and USING quals equivalently? (recall
> that we don't want to elide an UPDATE silently, which makes much less
> sense for UPSERT...I had assumed that whatever else, we'd always treat
> WCO and USING quals from UPDATE (and ALL) policies equivalently, but
> perhaps not).

OK, I'll try to explicitly outline how I think this ought to work:

For INSERTs and UPDATEs, there are 3 kinds of RLS quals that come into play:

1). insertCheckQuals - the logical OR of the quals from all INSERT
WITH CHECK policy clauses. These give the user permission to insert
into the table, provided that the new rows satisfy at least one of
these quals.

2). updateUsingQuals - the logical OR of the quals from all UPDATE
USING policy clauses. These give the user permission to update
existing rows in the table, where the existing rows satisfy at least
one of these quals.

3). updateCheckQuals - the logical OR of the quals from all UPDATE
WITH CHECK policy clauses. These give the user permission to update
the table, provided that the new rows satisfy at least one of these
quals.

In general these may all differ from one another.
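
As a purely illustrative sketch (hypothetical table, column and policy
names, not taken from any real schema), the three kinds of quals might
look like this:

  CREATE POLICY acct_insert ON accounts FOR INSERT
    WITH CHECK (owner = current_user);                 -- insertCheckQuals

  CREATE POLICY acct_update ON accounts FOR UPDATE
    USING (owner = current_user)                       -- updateUsingQuals
    WITH CHECK (owner = current_user AND NOT frozen);  -- updateCheckQuals

With a setup like that, no single merged qual can stand in for all three
roles, which is part of why combining them for this new command gets
awkward.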

If we go forward with the idea that RLS quals should be checked before
attempting to insert/update any data, as we do for regular permission
checks, then stand-alone INSERT and UPDATE commands would work
conceptually as follows:

INSERT:
  1. Check NEW against insertCheckQuals (error if the result is not true)
  2. Do the actual insert of NEW

UPDATE:
  1. Check OLD against updateUsingQuals (skip rows that don't match)
  2. Check NEW against updateCheckQuals (error if the result is not true)
  3. Do the actual update (OLD -> NEW)

Of course that's an over-simplification. The updateUsingQuals are
actually merged with the user-supplied WHERE clause in a way dependent
on the presence of leaky functions, but conceptually it matches the
above description.

I think that there is universal agreement that an INSERT .. ON
CONFLICT UPDATE that follows the insert path ought to match the pure
INSERT case, and only check the insertCheckQuals. That just leaves the
question of what to do on the update path, where things get more
complicated because there are 3 tuples involved in the process:

1). t1 - the tuple originally proposed for insertion, but which could
not be inserted due to a conflict with an existing row in the table
(aka EXCLUDED).

2). t2 - the existing row in the table that prevented t1 from being
inserted (aka TARGET).

3). t3 - the final new row resulting from the update path. In general,
we allow this to be quite different from t1 and t2.
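
To make those three tuples concrete, here is a hedged example written
with the syntax used in this thread (hypothetical table and values; the
exact grammar may differ in the final patch):

  INSERT INTO counters (key, val) VALUES ('x', 1)   -- proposes t1 (EXCLUDED)
  ON CONFLICT UPDATE
  SET val = TARGET.val + EXCLUDED.val;              -- produces t3

If a row with key = 'x' already exists, that existing row is t2 (TARGET),
and the resulting t3 generally differs from both t1 and t2.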

If we think of INSERT .. ON CONFLICT UPDATE as an INSERT and an UPDATE
tied together, then logically the following would happen if the update
path were taken:

INSERT .. ON CONFLICT UPDATE (update path):
  1. Check t1 against insertCheckQuals (error if not true)
  2. Detect that t1 conflicts with t2
  3. Test user-supplied auxilia

Re: [HACKERS] Moving on to close the current CF 2015-02

2015-04-17 Thread Michael Paquier
On Fri, Apr 17, 2015 at 4:22 PM, Michael Paquier wrote:
> @Magnus: having the possibility to mark a patch as "returned with
> feedback" without bumping it to the next CF automatically would be
> cool to being moving on.
Meh. "cool to have to help moving on".
-- 
Michael




Re: [HACKERS] Moving on to close the current CF 2015-02

2015-04-17 Thread Michael Paquier
On Thu, Mar 26, 2015 at 10:50 AM, Michael Paquier
 wrote:
> On Thu, Mar 26, 2015 at 10:41 AM, Bruce Momjian  wrote:
>> On Thu, Mar 26, 2015 at 10:22:11AM +0900, Michael Paquier wrote:
>>> Hi all,
>>>
>>> Visibly there is no commit fest manager this time (was I?), and people
>>> may think that I still am the CFM for 2015-02, continuously after
>>> 2014-12 and that I am severely slacking on my duties. Honestly I
>>> thought that I was not and that it was clear enough... Still, biting
>>> the bullet to make things move on, should I launch a VACUUM FULL on
>>> the current entries of the CF to brush up things that could get in
>>> 9.5? The "current" CF officially finished two weeks ago.
>>
>> Usually the last commitfest is double the typical length.
>
> Oh, OK. So this leaves 3 weeks...

Three weeks later, here we are. There are still 31 patches in "Need
Review" state, 16 in "Waiting on Author" state and 9 marked as "Ready
for committer".

To committers: here are the patches that seem to be at the top of the list:
- Do not vacuum pgbench tables if they do not exist when pgbench -f given
- Abbreviated key support for Datum sorts
- transforms
- Join pushdown support for foreign tables
- REINDEX xxx VERBOSE
- regrole and regnamespace
- pg_basebackup vs. Windows and tablespaces (Extend base backup to
include symlink file used to restore symlinks)
- catalog view to pg_hba.conf file

Here are two patches that did not attract much attention and that I think
could be dropped right away (I was involved in them, so that's perhaps
easier for me to say):
- GIN fillfactor
- Turn recovery.conf parameters into GUCs

I would also recommend that authors with patches in the "Waiting on
author" state update their patches according to their current status.

@Magnus: having the possibility to mark a patch as "returned with
feedback" without bumping it to the next CF automatically would be
cool to being moving on.
Regards,
-- 
Michael




Re: [HACKERS] pg_upgrade in 9.5 broken for adminpack

2015-04-17 Thread Michael Paquier
On Fri, Apr 17, 2015 at 3:29 PM, Jeff Janes wrote:
> Of course after sending that it became obvious.  The C function is not
> getting called because the SQL function is marked as being strict, yet is
> called with NULL arguments.
>
> Trivial patch attached to unset strict flag in pg_proc.h.
>
> But  CATALOG_VERSION_NO probably needs another bump as well.

This looks good to me. I have tested your fix and the problem goes
away. create_empty_extension() was the only function not declared as
STRICT before 30982be4 (see install_support_functions_in_new_db() in
contrib/pg_upgrade/function.c).
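
For anyone following along, the effect of STRICT here is easy to see with
a toy function (hypothetical example, not from the patch):

  CREATE FUNCTION add_one(int) RETURNS int
    AS 'SELECT $1 + 1' LANGUAGE sql STRICT;

  SELECT add_one(NULL);  -- returns NULL, the function body is never run

That is what happened to create_empty_extension(): marked STRICT but
called with NULL arguments, so its C code was silently skipped.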
Regards,
-- 
Michael

