Re: [HACKERS] [rfc] overhauling pgstat.stat
Sent from my iPad

On 04-Sep-2013, at 10:54, Satoshi Nagayasu sn...@uptime.jp wrote:

> Hi,
>
> (2013/09/04 12:52), Atri Sharma wrote:
>> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu sn...@uptime.jp wrote:
>>> Hi,
>>>
>>> I'm considering overhauling pgstat.stat, and would like to know how
>>> many people are interested in this topic.
>>>
>>> As you may know, this file can be hundreds of MB in size, because
>>> pgstat.stat holds all access statistics in each database, and the
>>> entire pgstat.stat needs to be read/written frequently. As a result,
>>> pgstat.stat often generates massive I/O, particularly when a database
>>> has a large number of tables.
>>>
>>> To support multi-tenancy, or just a large number of tables (up to 10k
>>> tables in a single database), I think pgstat.stat needs to be
>>> overhauled. I think using a heap and a btree in pgstat.stat would be
>>> preferable, to reduce read/write volume and to allow updating access
>>> statistics for specific tables in the pgstat.stat file. Is this good
>>> for us?
>>
>> Hi,
>>
>> Nice thought. I/O reduction in pgstat can be really helpful.
>>
>> I am trying to think of our aim here. Would we be looking to split
>> pgstat per table, so that the I/O write happens for only a portion of
>> pgstat? Or reduce the I/O in general?
>
> I prefer the latter.
>
> Under the current implementation, DBAs need to split a single database
> into many smaller databases, taking the access locality of the tables
> into account. That's difficult, and it could change in the future.
>
> And splitting the statistics data into many files (per table, for
> example) would cause another performance issue when collecting/showing
> statistics all at once. Just my guess though.
>
> So, I'm looking for a new way to reduce I/O for the statistics data in
> general.
>
> Regards,
>
>> If the latter, how would using a BTree help us? I would rather go for
>> a range tree or something. But again, I may be completely wrong.
>> Please elaborate a bit more on the solution we are trying to achieve.
>> It seems really interesting.
>>
>> Regards,
>>
>> Atri

Right, thanks. How would using a heap and a BTree help here? Are we
looking at a priority queue which supports the main storage system of
the stats?

Regards,

Atri
Re: [HACKERS] [9.3 doc fix] clarification of Solaris versions
From: Robert Haas robertmh...@gmail.com

> OK, patch committed and back-patched to 9.3. The patch file turned out
> to be sorta garbled. I'm not sure if a broken version of diff was used
> to generate this or whether MauMau hand-edited it after the fact, but
> the number of lines indicated in the control lines didn't match the
> actual hunks, and patch threw up. So it took me 20 minutes to do what
> should have taken 5, but now it's done.

Thank you, and I'm sorry I caused you so much trouble. I edited the
patch by hand, and was careless about the messy effect. I'll avoid
hand-editing.

Regards
MauMau
Re: [HACKERS] [rfc] overhauling pgstat.stat
(2013/09/04 15:23), Atri Sharma wrote:
> [... thread history trimmed; see the first message above ...]
>
> Right, thanks. How would using a heap and a BTree help here? Are we
> looking at a priority queue which supports the main storage system of
> the stats?

For example, when you read only a single block from your table, you
then need to write all the values in your database statistics. That
often generates a large amount of I/O.

However, if random access is allowed in the statistics, you can update
only a single record for the specific table which you read. It would be
less than 100 bytes for each table.

I have no idea how a priority queue could work here so far. However, if
the statistics are overhauled, PostgreSQL would be able to host a much
larger number of customers more easily.

Regards,
--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp
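[To make the access pattern concrete, here is a minimal C sketch of the
single-record update Satoshi describes. The fixed-size per-table entry
and the index lookup routine (stats_index_lookup) are hypothetical
illustrations, not actual PostgreSQL code; the point is only that an
index over fixed-size records turns a whole-file rewrite into an
in-place update of ~100 bytes.]

    #include <stdio.h>
    #include <stdint.h>

    typedef struct StatsEntry
    {
        uint32_t    table_oid;      /* key */
        uint64_t    blocks_read;    /* example counters */
        uint64_t    tuples_fetched;
    } StatsEntry;

    /* Hypothetical index lookup: byte offset of the entry, or -1. */
    extern long stats_index_lookup(uint32_t table_oid);

    /* statfile must be opened in update mode, e.g. fopen(path, "r+b"). */
    static int
    stats_update_one(FILE *statfile, uint32_t table_oid, uint64_t blocks_delta)
    {
        long        off = stats_index_lookup(table_oid);
        StatsEntry  ent;

        if (off < 0)
            return -1;

        /* Read, modify, and rewrite one fixed-size record in place. */
        if (fseek(statfile, off, SEEK_SET) != 0 ||
            fread(&ent, sizeof(ent), 1, statfile) != 1)
            return -1;

        ent.blocks_read += blocks_delta;

        if (fseek(statfile, off, SEEK_SET) != 0 ||
            fwrite(&ent, sizeof(ent), 1, statfile) != 1)
            return -1;

        return 0;
    }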
[HACKERS] Re: Is it necessary to rewrite table while increasing the scale of datatype numeric???
On Wed, Sep 04, 2013 at 12:08:48PM +0800, wangs...@highgo.com.cn wrote:
> I find that it takes a long time when I increase the scale of a
> numeric datatype. By checking the code, I found that's because it
> needs to rewrite that table's file. After checking that table's data
> file, I found only the parameter n_header changed. And I found the
> data in that numeric field never changed. So I think it's not
> necessary to rewrite the table's file in this case.

n_header is part of the numeric field's data. That's not just pedantry:
the display scale stored in n_header affects how numeric_out() formats
the value.

--
Noah Misch EnterpriseDB http://www.enterprisedb.com
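[As a loose illustration of Noah's point: the stored digits are
unchanged, but the display scale in the header decides how many
fractional digits numeric_out() emits, so two values with identical
digits format differently. The C below is an analogy only -- numeric
actually stores base-10000 digits, not doubles -- with dscale playing
the role of printf's precision.]

    #include <stdio.h>

    static void
    print_with_dscale(double value, int dscale)
    {
        printf("%.*f\n", dscale, value);
    }

    int
    main(void)
    {
        print_with_dscale(1.5, 1);   /* "1.5"   -- like NUMERIC(10,1) */
        print_with_dscale(1.5, 3);   /* "1.500" -- like NUMERIC(10,3) */
        return 0;
    }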
Re: [HACKERS] [rfc] overhauling pgstat.stat
Satoshi,

* Satoshi Nagayasu (sn...@uptime.jp) wrote:
> (2013/09/04 13:07), Alvaro Herrera wrote:
>> We already changed it:
>>
>> commit 187492b6c2e8cafc5b39063ca3b67846e8155d24
>> Author: Alvaro Herrera alvhe...@alvh.no-ip.org
>> Date: Mon Feb 18 17:56:08 2013 -0300
>>
>>     Split pgstat file in smaller pieces
>
> Thanks for the comments. I forgot to mention that.
>
> Yes, we have already split the single pgstat.stat file into several
> pieces. However, we still need to read/write a large amount of
> statistics data when we have a large number of tables in a single
> database, or multiple databases being accessed. Right?

Would simply also splitting per tablespace help?

> I think the issue here is that it is necessary to write/read
> statistics data even if it's not actually changed. So, I'm wondering
> how we can minimize read/write operations on these statistics data
> files by using a heap and a btree.

It does sound like an interesting idea to use a heap/btree instead, but
I wonder about the effort involved, particularly around coordinating
access. We wouldn't want to end up introducing additional contention
points by doing this..

Thanks,

Stephen
Re: [HACKERS] [rfc] overhauling pgstat.stat
Sent from my iPad

On 04-Sep-2013, at 15:39, Satoshi Nagayasu sn...@uptime.jp wrote:

> [...]
>
> For example, when you read only a single block from your table, you
> then need to write all the values in your database statistics. That
> often generates a large amount of I/O.
>
> However, if random access is allowed in the statistics, you can update
> only a single record for the specific table which you read. It would
> be less than 100 bytes for each table.
>
> I have no idea how a priority queue could work here so far. However,
> if the statistics are overhauled, PostgreSQL would be able to host a
> much larger number of customers

Ah, now I get it. Thanks a ton for the detailed explanation.

Yes, a BTree will sufficiently isolate per-table stats here and allow
for random access.

Another thing I can think of is having a write-back cache which could
probably be used as a buffer before the actual stats write. I am just
musing here though.

Regards,

Atri
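[A sketch of Atri's write-back idea, with hypothetical names
throughout: per-table deltas accumulate in a small direct-mapped
in-memory cache and reach the stats file only on a slot collision or at
flush time -- the same policy a direct-mapped write-back CPU cache
uses. flush_slot() stands in for whatever actually persists an entry.]

    #include <stdint.h>

    typedef struct CacheSlot
    {
        uint32_t    table_oid;
        uint64_t    delta;      /* counts accumulated since last flush */
        int         dirty;
    } CacheSlot;

    #define CACHE_SLOTS 256
    static CacheSlot cache[CACHE_SLOTS];

    /* Hypothetical: writes the entry to the stats file, clears dirty. */
    extern void flush_slot(CacheSlot *slot);

    static void
    cache_bump(uint32_t table_oid, uint64_t n)
    {
        CacheSlot  *s = &cache[table_oid % CACHE_SLOTS];

        /* Write back the previous occupant before reusing the slot. */
        if (s->dirty && s->table_oid != table_oid)
            flush_slot(s);

        if (!s->dirty)
        {
            s->table_oid = table_oid;
            s->delta = 0;
            s->dirty = 1;
        }
        s->delta += n;
    }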
Re: [HACKERS] [rfc] overhauling pgstat.stat
2013/9/4 Atri Sharma atri.j...@gmail.com:
> [...]
>
> Ah, now I get it. Thanks a ton for the detailed explanation.
>
> Yes, a BTree will sufficiently isolate per-table stats here and allow
> for random access.
>
> Another thing I can think of is having a write-back cache which could
> probably be used as a buffer before the actual stats write. I am just
> musing here though.

We very successfully use a tmpfs volume for pgstat files (using a
backport of multiple statfiles from 9.3 to 9.1).

Regards

Pavel

> Regards,
>
> Atri
Re: [HACKERS] logical changeset generation v5
On 2013-09-04 10:02:05 -0400, Robert Haas wrote:
> On Tue, Sep 3, 2013 at 7:10 PM, Andres Freund and...@2ndquadrant.com wrote:
>>> I don't think it particularly needs to be configurable, but I wonder
>>> if we can't be a bit smarter about when we do it. For example,
>>> suppose we logged it every 15 s but only until we log a
>>> non-overflowed snapshot.
>>
>> There's actually more benefits than just overflowed snapshots
>> (pruning of the known xids machinery, exclusive lock cleanup).
>
> I know that, but I thought the master and slave could only lose sync
> on those things after a master crash, and that once per checkpoint
> cycle was enough for those other benefits. Am I wrong?

The xid tracking can keep track without the additional records, but it
sometimes needs a good bit more memory to do so if the primary burns
through xids quite fast. Every time we see a running-xacts record we
can do cleanup (that's the ExpireOldKnownAssignedTransactionIds() in
ProcArrayApplyRecoveryInfo()).

>> The problem with using dbname=replication as a trigger for anything
>> is that we actually allow databases to be created with that name.
>> Perhaps that was a design mistake.
>
> It seemed like a good idea at the time, but maybe it wasn't. I'm not
> sure where to go with it at this point; a forcible backward
> compatibility break would probably screw things up for a lot of
> people.

Yes, breaking things now doesn't seem like a good idea.

>> I wondered about turning replication from a boolean into something
>> like off|0, on|1, database. dbname= gets only used in the latter
>> variant. That would be compatible with previous versions and would
>> even support using old tools (since all of them seem to do
>> replication=1).
>
> I don't love that, but I don't hate it, either.

Ok. Will update the patch that way. Seems better than its current
state.

> But it still doesn't answer the following question, which I think is
> important: if I (or someone else) commits this patch, how will that
> make things better for users? At the moment it's just adding a knob
> that doesn't do anything for you when you twist it.

I am not sure it's a good idea to commit it before we're sure we're
going to commit the changeset extraction. It's an independently
reviewable and testable piece of code that's simple enough to
understand quickly, in contrast to the large changeset extraction
patch. That's why I kept it separate.

On the other hand, as you know, it's not without precedent to commit
pieces of infrastructure that aren't really useful for the end user
(think FDW).

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
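[A sketch of the option parsing Andres proposes, with illustrative
names rather than the actual libpq/walsender code: replication= stays
backward compatible as a boolean (old tools send replication=1), and a
new "database" value is the only mode in which dbname= is honored.]

    #include <string.h>

    typedef enum
    {
        REPL_OFF,
        REPL_PHYSICAL,      /* the pre-existing boolean "on" behavior */
        REPL_DATABASE       /* new: connect to a database for changeset
                             * extraction; dbname= is used only here */
    } ReplMode;

    static int
    parse_replication_option(const char *val, ReplMode *mode)
    {
        if (strcmp(val, "off") == 0 || strcmp(val, "0") == 0)
            *mode = REPL_OFF;
        else if (strcmp(val, "on") == 0 || strcmp(val, "1") == 0)
            *mode = REPL_PHYSICAL;
        else if (strcmp(val, "database") == 0)
            *mode = REPL_DATABASE;
        else
            return -1;      /* unrecognized value */
        return 0;
    }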
Re: [HACKERS] The PostgreSQL License requires LICENSE file?
On 09/03/2013 11:20 PM, Tomonari Katsumata wrote:
> I understand that the PostgreSQL license does not require any specific
> file name, and that a missing LICENSE file is not a problem.
> I'm sorry for the stupid question.

No, thank *you* for pointing it out! None of us had actually looked at
that page.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Tue, 2013-09-03 at 15:42 -0400, Robert Haas wrote:
> Although this is true, the administrator's estimate of whether that
> guarantee is or is not provided might not be as consistent as the
> hardware behavior itself.
>
> I am generally of the feeling that having to restart the server to
> change a setting sucks, and while it surely sucks less for this
> setting than some, mostly because few people change this setting in
> the first place, I'm still not convinced that this is moving in the
> right direction.

I think code complexity matters quite a lot. If we can eliminate some
complex code in a complex area, and all we give up is a feature with
essentially no use case, that sounds like we're moving in the right
direction to me.

I suppose some might be using it as a hack when they really just want
to temporarily disable WAL during a load or something. Seems like a
blunt tool though, and I haven't heard of anyone doing that or
suggesting it. And it doesn't store the page hole anyway, so the FPI
during a load is ordinarily quite small.

>> Then, I intend to write another patch to make the full-page writes
>> for checksums honor the full_page_writes setting. That will be
>> easier to write once it's a PGC_POSTMASTER.
>
> I don't think I know exactly what this means.

XLogSaveBufferForHint() calls XLogCheckBuffer() but doesn't also look
at the full_page_writes setting (like XLogInsert() does). That means
that if checksums are enabled and full_page_writes is off, we'll still
write some full-page images for checksums. I'd like to remedy that.

Regards,
Jeff Davis
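[A minimal sketch of the guard Jeff describes adding, using simplified
stand-in names rather than the real xlog.c routines and signatures: the
hint-bit FPI path would additionally test full_page_writes, which is
only safe to treat as fixed for a backend's lifetime once the GUC is
PGC_POSTMASTER.]

    #include <stdbool.h>

    /* Hypothetical stand-ins for the real XLog routines. */
    extern bool buffer_needs_backup(void *buffer);
    extern void emit_full_page_image(void *buffer);

    static void
    save_buffer_for_hint(void *buffer,
                         bool data_checksums,
                         bool full_page_writes)
    {
        /*
         * Current behavior consults only data_checksums; the proposed
         * change is the additional full_page_writes test, so hint-bit
         * FPIs are skipped when full-page writes are disabled.
         */
        if (data_checksums && full_page_writes &&
            buffer_needs_backup(buffer))
            emit_full_page_image(buffer);
    }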
Re: [HACKERS] [tiny doc fix] statistics are not retained across immediate shutdown
On 2013-09-04 10:53:19 -0400, Tom Lane wrote:
> Tsunakawa, Takayuki tsunakawa.ta...@jp.fujitsu.com writes:
>> I propose a tiny fix to clarify this. Please find the attached patch.
>
> That's not an accurate description of what happens, though. AFAIR, we
> do not throw away pg_stats files as a result of recovery.

StartupXLOG() does a pgstat_reset_all() in the if (InRecovery) branch.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] logical changeset generation v5
On Tue, Sep 3, 2013 at 7:10 PM, Andres Freund and...@2ndquadrant.com wrote:
>> I don't think it particularly needs to be configurable, but I wonder
>> if we can't be a bit smarter about when we do it. For example,
>> suppose we logged it every 15 s but only until we log a
>> non-overflowed snapshot.
>
> There's actually more benefits than just overflowed snapshots (pruning
> of the known xids machinery, exclusive lock cleanup).

I know that, but I thought the master and slave could only lose sync on
those things after a master crash, and that once per checkpoint cycle
was enough for those other benefits. Am I wrong?

> The patch as-is only writes if there has been WAL written since the
> last time it logged a running_xacts. I think it's not worth building
> more smarts than that?

Hmm, maybe.

>> Because I don't see any reason to believe that this WAL record is any
>> more important or urgent than any other WAL record that might get
>> logged.
>
> I can't follow the logic behind that statement. Just about all WAL
> records are either pretty immediately flushed afterwards or are done
> in the context of a transaction where we flush (or do a
> XLogSetAsyncXactLSN) at transaction commit. XLogBackgroundFlush()
> won't necessarily flush the running_xacts record.

OK, this was the key point I was missing.

>> It seems we need some more design there. Perhaps entering replication
>> mode could be triggered by writing either dbname=replication or
>> replication=yes. But then, do the replication commands simply become
>> SQL commands? I've certainly seen hackers use them that way. And I
>> can imagine that being a sensible approach, but this patch seems like
>> it's only covering a fairly small fraction of what really ought to be
>> a single commit.
>
> Yes. I think you're right that we need more input/design here. I've
> previously started threads about it, but nobody replied :(.
>
> The problem with using dbname=replication as a trigger for anything is
> that we actually allow databases to be created with that name. Perhaps
> that was a design mistake.

It seemed like a good idea at the time, but maybe it wasn't. I'm not
sure where to go with it at this point; a forcible backward
compatibility break would probably screw things up for a lot of people.

> I wondered about turning replication from a boolean into something
> like off|0, on|1, database. dbname= gets only used in the latter
> variant. That would be compatible with previous versions and would
> even support using old tools (since all of them seem to do
> replication=1).

I don't love that, but I don't hate it, either. But it still doesn't
answer the following question, which I think is important: if I (or
someone else) commits this patch, how will that make things better for
users? At the moment it's just adding a knob that doesn't do anything
for you when you twist it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.4] row level security
On Wed, Sep 4, 2013 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Wed, Sep 4, 2013 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Well, the security-barrier view stuff did not present itself as a
>>> 100% solution. But perhaps more to the point, it was conceptually
>>> simple to implement, ie don't flatten views if they have this bit
>>> set, and don't push down quals into such sub-selects unless they're
>>> marked leakproof.
>>
>> Right. IMHO, this new feature should be similarly simple: when an
>> unprivileged user references a table, treat that as a reference to a
>> leakproof view over the table, with the RLS qual injected into the
>> view.
>
> And for insert/update/delete, we do what exactly?

The same mechanism will prevent UPDATE and DELETE from seeing any rows
the user shouldn't be able to touch. Simon and Greg are arguing that
when an INSERT or UPDATE happens, we ought to also check that the NEW
row matches the RLS qual. I don't find that to be terribly important,
because you can accomplish the same thing with a per-row trigger today;
and I also don't think everyone will want that behavior. Some people
will, I'm pretty sure, want to let users give away rows, either
unconditionally or subject to defined restrictions. Perhaps it's worth
having, but it's a separate feature, IMHO.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] 9.4 regression
On 2013-09-04 08:01:30 -0700, Jeff Davis wrote:
> On Thu, 2013-08-08 at 17:42 -0400, Tom Lane wrote:
>> [...]
>>
>>> Should this feature be reconsidered?
>>
>> Well, ext4 isn't the whole world, but it's sure a big part of it. The
>> real point though is that obviously we didn't do enough performance
>> testing, so we'd better do more before deciding what to do.
>
> Greg Smith, do you have some thoughts about this? More tests we should
> run? We may need to revert the patch.

I'd vote for adding zeroing *after* the fallocate() first. That's
what's suggested by kernel hackers and what several other large
applications do. Also, it looks like it's what we would have to do if
we ever get to use fallocate for relation extension, where we would
have actual benefits from it.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [v9.4] row level security
Robert Haas robertmh...@gmail.com writes:
> On Fri, Aug 30, 2013 at 3:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I think it's entirely sensible to question whether we should reject
>> (not hold up) RLS if it has major covert-channel problems.
>
> We've already had this argument before, about the security_barrier
> view stuff, and that code got committed and is already released. So
> the horse is already out of the barn and no amount of wishing will put
> it back in.

Well, the security-barrier view stuff did not present itself as a 100%
solution. But perhaps more to the point, it was conceptually simple to
implement, ie don't flatten views if they have this bit set, and don't
push down quals into such sub-selects unless they're marked leakproof.

> I haven't reviewed this patch in a long time, but I would expect that
> it's basically just reusing that same infrastructure; in fact, I'd
> expect that it's little more than syntactic sugar around that
> infrastructure.

I've not read it in great detail, but it isn't that. It's whacking the
planner around in ways that I have no confidence in, and probably still
wouldn't have any confidence in if they'd been adequately documented.

regards, tom lane
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On 2013-09-04 07:57:15 -0700, Jeff Davis wrote:
> XLogSaveBufferForHint() calls XLogCheckBuffer() but doesn't also look
> at the full_page_writes setting (like XLogInsert() does). That means
> that if checksums are enabled and full_page_writes is off, we'll
> still write some full-page images for checksums. I'd like to remedy
> that.

I don't think that's really as easy as it sounds without removing the
ability to do base backups with full_page_writes = off. The
interlocking that would require makes things complex...

Personally I'd rather forbid enabling checksums in combination with
full_page_writes = off. That combination doesn't seem like a good idea
to me, and I am far from convinced it's actually going to work in all
corner cases.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [v9.4] row level security
On Sun, Sep 1, 2013 at 11:47 PM, Greg Smith g...@2ndquadrant.com wrote:
> And if someone can INSERT values that they can't actually see once
> they're committed, that's a similarly bad thing we should describe.

This is desirable in some cases but not others. If the goal is
compartmentalization, then it's sensible to prevent this. But you might
also have a drop-box environment - e.g. a student submits coursework to
a professor, and can't access the submitted work after it's submitted.
FWIW, my CS classes in college had a tool that worked just this way.

Or maybe an analyst writes a report and is then permitted to give away
the document to his boss for revisions. Once the ownership of the
document has changed, the analyst can't see it any more, because he can
only see the documents he owns. And maybe he's not permitted to give
away documents to just anyone (polluting their sandbox), but he can
give them to his boss (who expects to receive them).

The point is that we should be in the business of providing mechanism,
not policy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.4] row level security
2013/9/3 Bruce Momjian br...@momjian.us:
> On Sun, Sep 1, 2013 at 11:05:58AM -0700, Josh Berkus wrote:
>>> The security community also concludes it is an unavoidable nature of
>>> systems, as long as humans can observe system behavior and infer
>>> something; thus, security evaluation criteria do not require
>>> eliminating covert channels, and do not pay attention to covert
>>> channels for products installed in an environment with basic
>>> robustness (that means non-military, regular enterprise usage).
>>
>> To be completely blunt, the security community does not understand
>> databases. At all. I'd think if anything had become clear through the
>> course of work on SEPostgres, it would be that.
>
> Agreed. The security community realizes these covert channels exist,
> but doesn't really have any recommendations on how to avoid them. You
> could argue that avoiding them is too tied to specific database
> implementations, but there are general channels, like insert with a
> unique key, that should at least have well-defined solutions.

The security community also provides an extreme solution, but I don't
think it is suitable for the flexible security policy PostgreSQL wants.
Their extreme solution manipulates the definition of the PK, which
automatically becomes a combined key of the user-given key and the
security level, applied mandatorily. Thus it does not conflict even if
two users with different security levels try to insert a row with the
same primary key. This technology is called polyinstantiation:
http://en.wikipedia.org/wiki/Polyinstantiation

However, of course, I'm not in favor of porting this technology to the
PostgreSQL world. Its side-effects are too large for the problem being
solved.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] 9.4 regression
On Thu, 2013-08-08 at 17:42 -0400, Tom Lane wrote:
> Jon Nelson jnelson+pg...@jamponi.net writes:
>> At this point I'm convinced that the issue is a pathological case in
>> ext4. The performance impact disappears as soon as the unwritten
>> extent(s) are written to with real data. Thus, even though allocating
>> files with posix_fallocate is - frequently - orders of magnitude
>> quicker than doing it with write(2), the subsequent re-write can be
>> more expensive. At least, that's what I'm gathering from the various
>> threads. Why this issue didn't crop up in earlier testing, and why I
>> can't seem to make test_fallocate do it (even when I modify
>> test_fallocate to write to the newly-allocated file in a
>> mostly-random fashion), has me baffled.
>
> Does your test program use all the same writing options that the real
> WAL writes do (like O_DIRECT)?
>
>> Should this feature be reconsidered?
>
> Well, ext4 isn't the whole world, but it's sure a big part of it. The
> real point though is that obviously we didn't do enough performance
> testing, so we'd better do more before deciding what to do.

Greg Smith, do you have some thoughts about this? More tests we should
run? We may need to revert the patch.

Regards,
Jeff Davis
Re: [HACKERS] [v9.4] row level security
On Fri, Aug 30, 2013 at 3:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> I think it's entirely sensible to question whether we should reject
> (not hold up) RLS if it has major covert-channel problems.

We've already had this argument before, about the security_barrier view
stuff, and that code got committed and is already released. So the
horse is already out of the barn and no amount of wishing will put it
back in.

I haven't reviewed this patch in a long time, but I would expect that
it's basically just reusing that same infrastructure; in fact, I'd
expect that it's little more than syntactic sugar around that
infrastructure. (If it's instead introducing a whole new mechanism,
then I think that's reason enough to reject it right there.) My main
question about this is whether that syntactic sugar is really worth
having given that it doesn't add any real new functionality, not about
the covert channel issues, which are already a done deal.

And frankly, I'm with the group that says the covert channel issues are
not really a big deal. In many real-world cases, the user can control
only the values that get subbed into queries that get sent to the
database, not the queries themselves, which eliminates a large category
of attacks. Real-world example, from my last job: sales reps only got
to see their own accounts, not the accounts of other sales reps. They
could input new accounts (with sales_rep_id set to their ID) and they
could query the list of accounts (limited to those where sales_rep_id
matched their ID) - pulling either all of them or searching by account
name, both through a web application. Yeah, a sales rep could have
launched a timing attack through the web interface, and they could
also have polled for the existence of accounts by trying to create
accounts with names that might already exist in the system, to see
whether a duplicate got flagged. But neither attack had enough useful
bandwidth to matter; a sales rep wishing to learn our full account list
(so that he could try to poach those accounts after leaving the
company) could have learned a lot more a lot faster via social
engineering, and with less risk of being caught doing something that
would have resulted in his or her immediate termination.

The point is, I don't think RLS needs to solve every problem. What it
needs to do is solve one problem well, so that it can be used in
combination with other techniques to achieve a certain set of security
objectives. If, in a particular environment, EXPLAIN is an issue, then
it can be blocked in that environment via a variety of well-understood
techniques. That's not very hard to do even without core support, and
if we want to add core support, fine, but that's a separate patch. This
is a patch to add row-level security, and it deserves to be judged on
how well or poorly it does that, not on whether it solves covert
channel problems that represent a mostly orthogonal set of technical
issues.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] 9.4 regression
* Andres Freund (and...@2ndquadrant.com) wrote:
> I'd vote for adding zeroing *after* the fallocate() first. That's
> what's suggested by kernel hackers and what several other large
> applications do. Also, it looks like it's what we would have to do if
> we ever get to use fallocate for relation extension, where we would
> have actual benefits from it.

Does that actually end up doing anything different from what we were
doing pre-patch here? At best, it *might* end up using a larger extent,
but unless we can actually be confident that it does, I'm not convinced
the additional complexity is worth it and would rather see this simply
reverted. One might ask why the kernel guys aren't doing this
themselves, or figuring out why it's necessary, to make it worthwhile.

Thanks,

Stephen
Re: [HACKERS] [v9.4] row level security
Robert Haas robertmh...@gmail.com writes:
> On Wed, Sep 4, 2013 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Robert Haas robertmh...@gmail.com writes:
>>> Right. IMHO, this new feature should be similarly simple: when an
>>> unprivileged user references a table, treat that as a reference to a
>>> leakproof view over the table, with the RLS qual injected into the
>>> view.
>>
>> And for insert/update/delete, we do what exactly?
>
> The same mechanism will prevent UPDATE and DELETE from seeing any rows
> the user shouldn't be able to touch.

No, it won't, because we don't support direct update/delete on views
(and if you look, you'll notice the auto-updatable-view stuff doesn't
think a security-barrier view is auto-updatable). AFAICT, to deal with
update/delete the RLS patch needs to constrain the order of qual
application without the crutch of having a separate level of subquery;
and it's that behavior that I have zero confidence in, either as to
whether it works as submitted or as to our odds of not breaking it in
the future.

regards, tom lane
Re: [HACKERS] 9.4 regression
Hi,

On 2013-09-04 11:15:37 -0400, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
>> I'd vote for adding zeroing *after* the fallocate() first. That's
>> what's suggested by kernel hackers and what several other large
>> applications do.
>
> Does that actually end up doing anything different from what we were
> doing pre-patch here? At best, it *might* end up using a larger
> extent, but unless we can actually be confident that it does, I'm not
> convinced the additional complexity is worth it and would rather see
> this simply reverted.

It does end up with larger extents. That's why e.g. several of the
virtualization solutions do it for container files.

I was sceptical of the idea of using fallocate() for WAL file
allocation from the get-go and voiced that repeatedly, but the argument
was made that it's a convenient testcase to familiarize ourselves with
fallocate(). As that argument won, we should stick to it and learn the
ropes. Part of that is zeroing, so we should do that. It's a single
write() plus error checking, so I don't really see much complexity that
way.

> One might ask why the kernel guys aren't doing this themselves, or
> figuring out why it's necessary, to make it worthwhile.

So, if I understood Ted correctly, the reason it's slower is that
fallocate() only marks the extents as containing 'zero' without
actually touching the data. When a partial write into the area is then
done, the rest of the page has to be actively zeroed on disk. I am not
sure that explains all of the slowdown, but given that, according to
tests published in this thread, the additional zeroing gets us back to
the former performance plus a small win...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
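[A sketch of the allocate-then-zero pattern under discussion, written
as a standalone C function rather than PostgreSQL's actual WAL-segment
creation code: posix_fallocate() reserves a (hopefully contiguous)
extent, and an explicit zero-fill then writes every block, so ext4
never has to zero the remainder of a page on the fly during later
partial writes.]

    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define SEGMENT_SIZE (16 * 1024 * 1024)   /* 16 MB WAL segment */
    #define ZERO_CHUNK   (64 * 1024)

    static int
    allocate_wal_segment(int fd)
    {
        char   *zeros;
        off_t   written;

        /* Reserve the extent up front; returns an errno on failure. */
        if (posix_fallocate(fd, 0, SEGMENT_SIZE) != 0)
            return -1;

        /* Then write real zeros so no extent stays "unwritten". */
        zeros = calloc(1, ZERO_CHUNK);
        if (zeros == NULL)
            return -1;

        for (written = 0; written < SEGMENT_SIZE; written += ZERO_CHUNK)
        {
            if (write(fd, zeros, ZERO_CHUNK) != ZERO_CHUNK)
            {
                free(zeros);
                return -1;
            }
        }
        free(zeros);

        /* Make the data and metadata durable. */
        if (fsync(fd) != 0)
            return -1;
        return 0;
    }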
Re: [HACKERS] getting rid of maintainer-check
On 2013-09-03 22:41:17 -0400, Peter Eisentraut wrote:
> The maintainer-check target never really caught on, I think. Most
> people don't run it, and that in turn annoys those who do. Also, it
> doesn't provide much functionality. I propose that we get rid of it
> and roll the functionality into the regular build. Specifically:
>
> - Running duplicate_oids during the regular build was already
>   discussed elsewhere recently. There are some details to be resolved
>   there, but it's doable.

Maybe we should also badger cpluspluscheck into a state where it can be
run as part of a normal build if a C++ compiler was detected? I think
it misses vpath support, and it might be dependent on some bashisms.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] getting rid of maintainer-check
On Tue, Sep 3, 2013 at 10:41 PM, Peter Eisentraut pete...@gmx.net wrote:
> The maintainer-check target never really caught on, I think. Most
> people don't run it, and that in turn annoys those who do. Also, it
> doesn't provide much functionality. I propose that we get rid of it
> and roll the functionality into the regular build. Specifically:
>
> - Running duplicate_oids during the regular build was already
>   discussed elsewhere recently. There are some details to be resolved
>   there, but it's doable.
>
> - Checking for tabs in SGML files can be run during the regular
>   documentation build without problems.
>
> - The NLS checks can also be run during the regular NLS-enabled build.
>
> That's it. Any concerns?

I can't speak for anyone else, but personally I think that sounds like
a significant improvement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.4] row level security
* Robert Haas (robertmh...@gmail.com) wrote:
> On Sun, Sep 1, 2013 at 11:47 PM, Greg Smith g...@2ndquadrant.com wrote:
>> And if someone can INSERT values that they can't actually see once
>> they're committed, that's a similarly bad thing we should describe.
>
> This is desirable in some cases but not others. If the goal is
> compartmentalization, then it's sensible to prevent this. But you
> might also have a drop-box environment - e.g. a student submits
> coursework to a professor, and can't access the submitted work after
> it's submitted. FWIW, my CS classes in college had a tool that worked
> just this way.

Agreed, and part of the discussion that I had w/ KaiGai and Simon was
that we should provide a way to let the user pick which they'd like.
This is the concept around 'insert privileges' being different from
'select privileges' wrt RLS.

> The point is that we should be in the business of providing mechanism,
> not policy.

++

Thanks,

Stephen
Re: [HACKERS] [v9.4] row level security
Robert Haas robertmh...@gmail.com wrote:
> Tom Lane t...@sss.pgh.pa.us wrote:
>> Robert Haas robertmh...@gmail.com writes:
>>> IMHO, this new feature should be similarly simple: when an
>>> unprivileged user references a table, treat that as a reference to a
>>> leakproof view over the table, with the RLS qual injected into the
>>> view.
>>
>> And for insert/update/delete, we do what exactly?
>
> The same mechanism will prevent UPDATE and DELETE from seeing any rows
> the user shouldn't be able to touch.

+1

> Simon and Greg are arguing that when an INSERT or UPDATE happens, we
> ought to also check that the NEW row matches the RLS qual. I don't
> find that to be terribly important because you can accomplish the same
> thing with a per-row trigger today; and I also don't think everyone
> will want that behavior.

As an example from my Wisconsin Courts days: source documents come in
which need to be entered, and which may contain a Social Security
Number, and most of the Clerk of Courts staff should be authorized to
enter that into the database. Once it is entered, most people should
not be allowed to view it, including many of the people who were
authorized to enter it initially.

It's one thing for a line staff person to have a handful of documents
with SSNs come across their desk on a given day; it's quite another if
they could dump a list of names, addresses, dates of birth, and SSNs
for the entire database on demand. In a way this issue is similar to
the covert channel issue -- volume matters.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [tiny doc fix] statistics are not retained across immediate shutdown
Tsunakawa, Takayuki tsunakawa.ta...@jp.fujitsu.com writes:
> I propose a tiny fix to clarify this. Please find the attached patch.

That's not an accurate description of what happens, though. AFAIR, we
do not throw away pg_stats files as a result of recovery.

regards, tom lane
Re: [HACKERS] [v9.4] row level security
Robert Haas robertmh...@gmail.com writes:
> On Wed, Sep 4, 2013 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Well, the security-barrier view stuff did not present itself as a
>> 100% solution. But perhaps more to the point, it was conceptually
>> simple to implement, ie don't flatten views if they have this bit
>> set, and don't push down quals into such sub-selects unless they're
>> marked leakproof.
>
> Right. IMHO, this new feature should be similarly simple: when an
> unprivileged user references a table, treat that as a reference to a
> leakproof view over the table, with the RLS qual injected into the
> view.

And for insert/update/delete, we do what exactly?

regards, tom lane
Re: [HACKERS] 9.4 regression
Stephen Frost sfr...@snowman.net writes:
> * Andres Freund (and...@2ndquadrant.com) wrote:
>> I'd vote for adding zeroing *after* the fallocate() first. That's
>> what's suggested by kernel hackers and what several other large
>> applications do. Also, it looks like it's what we would have to do if
>> we ever get to use fallocate for relation extension, where we would
>> have actual benefits from it.
>
> Does that actually end up doing anything different from what we were
> doing pre-patch here? At best, it *might* end up using a larger
> extent, but unless we can actually be confident that it does, I'm not
> convinced the additional complexity is worth it and would rather see
> this simply reverted. One might ask why the kernel guys aren't doing
> this themselves, or figuring out why it's necessary, to make it
> worthwhile.

The larger picture is that that isn't the committed behavior, but a
different one, one which would need performance testing. At this point,
I vote for reverting the patch and allowing it to be resubmitted for a
fresh round of testing with the zeroing added. And this time we'll need
to do the testing more carefully.

regards, tom lane
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
Jeff Davis pg...@j-davis.com writes:
> On Tue, 2013-09-03 at 15:42 -0400, Robert Haas wrote:
>> Although this is true, the administrator's estimate of whether that
>> guarantee is or is not provided might not be as consistent as the
>> hardware behavior itself. I am generally of the feeling that having
>> to restart the server to change a setting sucks, and while it surely
>> sucks less for this setting than some, mostly because few people
>> change this setting in the first place, I'm still not convinced that
>> this is moving in the right direction.
>
> I think code complexity matters quite a lot. If we can eliminate some
> complex code in a complex area, and all we give up is a feature with
> essentially no use case, that sounds like we're moving in the right
> direction to me.

Isn't this whole discussion academic in view of Andres' point?
http://www.postgresql.org/message-id/20130903121935.gb5...@awork2.anarazel.de

regards, tom lane
Re: [HACKERS] [v9.4] row level security
2013/9/4 Tom Lane t...@sss.pgh.pa.us:
> Robert Haas robertmh...@gmail.com writes:
>> On Wed, Sep 4, 2013 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Well, the security-barrier view stuff did not present itself as a
>>> 100% solution. But perhaps more to the point, it was conceptually
>>> simple to implement, ie don't flatten views if they have this bit
>>> set, and don't push down quals into such sub-selects unless they're
>>> marked leakproof.
>>
>> Right. IMHO, this new feature should be similarly simple: when an
>> unprivileged user references a table, treat that as a reference to a
>> leakproof view over the table, with the RLS qual injected into the
>> view.
>
> And for insert/update/delete, we do what exactly?

This patch does not touch INSERT, because that check would be applied
around the place where we usually fire before-row-insert triggers; it
is not related to the planner.

Regarding UPDATE and DELETE, this patch is also enhanced so that the
update or delete mechanism can take a sub-query on top of the table
scan plan. So, its EXPLAIN output shows as follows:

postgres=# EXPLAIN (costs off) UPDATE customer SET email = 'al...@example.com';
                  QUERY PLAN
----------------------------------------------
 Update on customer
   ->  Subquery Scan on customer
         ->  Seq Scan on customer customer_1
               Filter: (current_user() = uname)

You can see this UPDATE has a Subquery Scan instead of a regular
relation scan.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] [v9.4] row level security
2013/9/1 Greg Stark st...@mit.edu:
> On Sun, Sep 1, 2013 at 8:31 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
>> Or, any other criteria even though? My (current) preference is plan
>> (c): we will be able to fix up *this* covert channel with reasonable
>> effort on the explain code. Probably, we can close it if we don't
>> print filtered rows under a sub-query with the security-barrier
>> attribute.
>
> I think the criteria being discussed in this thread are too strict. It
> may be the case that Postgres cannot make a strong general case that
> it protects against covert channels. However, it may still be able to
> make the much weaker case that it is *possible* to arrange your
> database such that the covert channels are kept under control.

Yes. I have to admit it is difficult to determine a strict and regular
rule to handle the covert-channel scenario.

Sorry, the latter half of this sentence is unclear to me. Are you
saying that even if we could have a strict rule, we may still have many
possible covert channels for information leakage?

> So I think up above Tom is wrong about why it's OK that INSERT leaks
> keys when it reports a unique key violation. The reason why it's OK
> that there's a covert channel there is that the DBA can avoid that
> covert channel by being careful when creating unique constraints. He
> or she should be aware that creating a unique constraint implicitly
> provides a kind of limited access to data to users who have INSERT
> privilege even if they lack the real SELECT privilege.

IIRC, we discussed this a few years ago and concluded that the above
information leakage scenario should be described in the documentation,
along with the way to avoid leaking valuable information by using
alternative keys.

> Likewise, as long as the covert channels in RLS are things the DBA has
> even a modicum of control over I wouldn't be too worried. Afaict from
> skimming the thread it looks like creating any indexes on columns
> leaks what values of the index key exist in the table. Is it the case
> that non-indexed columns do not get leaked?

According to the scenario reported by Korotkov, he could find the
number of rows being filtered by a given qualifier, which implies the
existence of a row with a value in a particular range. Its solution is
what I noted above, and I'm working on it now.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
Boguk, Maksym maks...@fast.au.fujitsu.com writes: Hi, my task is implementing ANSI NATIONAL character string types as part of PostgreSQL core. No, that's not a given. You have a problem to solve, ie store some UTF8 strings in a database that's mostly just 1-byte data. It is not clear that NATIONAL CHARACTER is the best solution to that problem. And I don't think that you're going to convince anybody that this is an improvement in spec compliance, because there's too much gap between what you're doing here and what it says in the spec. Both of these approaches require a dump/restore of the whole database, which is not always an option. That's a disadvantage, agreed, but it's not a large enough one to reject the approach, because what you want to do also has very significant disadvantages. I think it is extremely likely that we will end up rejecting a patch based on NATIONAL CHARACTER altogether. It will require too much duplicative code, it requires too many application-side changes to make use of the functionality, and it will break any applications that are relying on the current behavior of that syntax. But the real problem is that you're commandeering syntax defined in the SQL spec for what is in the end quite a narrow usage. I agree that the use-case will be very handy for some applications ... but if we were ever to try to achieve real spec compliance for the SQL features around character sets, this doesn't look like a step on the way to that. I think you'd be well advised to take a hard look at the specialized-database-encoding approach. From here it looks like a 99% solution for about 1% of the effort; and since it would be quite uninvasive to the system as a whole, it's unlikely that such a patch would get rejected. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
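At the SQL level, one reading of the specialized-database-encoding approach (this is my illustration, not a design spelled out in the thread) is simply to give the multi-byte data its own database created with the desired encoding:

    -- illustrative names; template0 is needed when overriding locale settings
    CREATE DATABASE app_utf8
        TEMPLATE = template0
        ENCODING = 'UTF8'
        LC_COLLATE = 'C'
        LC_CTYPE = 'C';

The encoding then applies database-wide, with no per-column character set machinery required.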
Re: [HACKERS] [v9.4] row level security
On Wed, Sep 4, 2013 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote: Well, the security-barrier view stuff did not present itself as a 100% solution. But perhaps more to the point, it was conceptually simple to implement, ie don't flatten views if they have this bit set, and don't push down quals into such sub-selects unless they're marked leakproof. Right. IMHO, this new feature should be similarly simple: when an unprivileged user references a table, treat that as a reference to a leakproof view over the table, with the RLS qual injected into the view. I haven't reviewed this patch in a long time, but I would expect that it's basically just reusing that same infrastructure; in fact, I'd expect that it's little more than syntactic sugar around that infrastructure. I've not read it in great detail, but it isn't that. It's whacking the planner around in ways that I have no confidence in, and probably still wouldn't have any confidence in if they'd been adequately documented. If that's the case, then I agree that we should not accept it, at least in its present form. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?
On Tue, Sep 3, 2013 at 9:08 PM, wangs...@highgo.com.cn wrote: Hi, Hackers! I find that it takes a long time when I increase the scale of a numeric datatype. By checking the code, I found that's because it needs to rewrite that table's file. After checking that table's data file, I found that only the parameter n_header changed. And I found the data in that numeric field never changed. So I think it's not necessary to rewrite the table's file in this case. If anyone has more ideas about this, please come talk about it! This was fixed in version 9.2. You must be using an older version. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
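For readers who want to check this on their own build, here is a quick test; the table is made up, and the case I'd expect to skip the rewrite on 9.2+ is the one that only widens the typmod (increasing precision at the same scale -- a scale change may still force a rewrite):

    CREATE TABLE t (n numeric(10,2));
    SELECT relfilenode FROM pg_class WHERE relname = 't';   -- note the value
    ALTER TABLE t ALTER COLUMN n TYPE numeric(14,2);        -- widen precision only
    SELECT relfilenode FROM pg_class WHERE relname = 't';   -- unchanged => no rewrite

If the relfilenode changed, the ALTER rewrote the table.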
Re: [HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
On 7/19/13 11:59 AM, Atri Sharma wrote: Hi all, This is our current work-in-progress patch for WITHIN GROUP. This patch needs to be rebased. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
2013/9/4 Kevin Grittner kgri...@ymail.com: Robert Haas robertmh...@gmail.com wrote: Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: IMHO, this new feature should be similarly simple: when an unprivileged user references a table, treat that as a reference to a leakproof view over the table, with the RLS qual injected into the view. And for insert/update/delete, we do what exactly? The same mechanism will prevent UPDATE and DELETE from seeing any rows the user shouldn't be able to touch. +1 Simon and Greg are arguing that when an INSERT or UPDATE happens, we ought to also check that the NEW row matches the RLS qual. I don't find that to be terribly important because you can accomplish the same thing with a per-row trigger today; and I also don't think everyone will want that behavior. As an example from my Wisconsin Courts days, source documents come in which need to be entered, which may contain a Social Security Number, and most of the Clerk of Courts staff should be authorized to enter that into the database. Once it is entered, most people should not be allowed to view it, including many of the people who were authorized to enter it initially. It's one thing for a line staff person to have a handful of documents come across their desk with SSN on a given day; it's quite another if they could dump a list of names, addresses, dates of birth, and SSNs for the entire database on demand. In a way this issue is similar to the covert channel issue -- volume matters. I think an important property of this behavior is that it is configurable. In cases where the reader and writer sides need the same security policy, one configuration can apply a consistent policy both to fetching rows from the table and to writing rows to it. If users don't want the security policy checked on the writer side, all they need to do is set a security policy for SELECT only -- even though that functionality is not implemented yet. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
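Robert's point that the write-side check is already achievable with a per-row trigger could look like this; a rough sketch reusing the (current_user() = uname) policy from the EXPLAIN example upthread:

    CREATE FUNCTION enforce_write_policy() RETURNS trigger AS $$
    BEGIN
        -- reject rows the writer would not be allowed to see afterward
        IF NEW.uname <> current_user THEN
            RAISE EXCEPTION 'new row violates row security policy';
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER customer_write_policy
        BEFORE INSERT OR UPDATE ON customer
        FOR EACH ROW EXECUTE PROCEDURE enforce_write_policy();

Installations that want Kevin's enter-but-not-read behavior simply omit the trigger and define the reader-side policy alone.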
Re: [HACKERS] Eliminating pg_catalog.pg_rewrite.ev_attr
Is this transformation correct? If I read this correctly, you're missing the rangeTableEntry_used() condition, no?

*** a/src/backend/rewrite/rewriteHandler.c
--- b/src/backend/rewrite/rewriteHandler.c
*************** matchLocks(CmdType event,
*** 1273,1287 ****
  			}
  		}

! 		if (oneLock->event == event)
! 		{
! 			if (parsetree->commandType != CMD_SELECT ||
! 				(oneLock->attrno == -1 ?
! 				 rangeTableEntry_used((Node *) parsetree, varno, 0) :
! 				 attribute_used((Node *) parsetree,
! 								varno, oneLock->attrno, 0)))
! 				matching_locks = lappend(matching_locks, oneLock);
! 		}
  	}

  	return matching_locks;
--- 1273,1280 ----
  			}
  		}

! 		if (oneLock->event == event && parsetree->commandType != CMD_SELECT)
! 			matching_locks = lappend(matching_locks, oneLock);
  	}

  	return matching_locks;

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Wed, 2013-09-04 at 11:32 -0400, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: I think code complexity matters quite a lot. If we can eliminate some complex code in a complex area, and all we give up is a feature with essentially no use case, that sounds like we're moving in the right direction to me. Isn't this whole discussion academic in view of Andres' point? Maybe "complex code" was an overstatement. We'd be able to eliminate the XLOG_FPW_CHANGE record, UpdateFullPageWrites(), and one of the members of XLogCtlInsert; and make xlog.c slightly shorter in the process. The first time I looked at doing the patch to honor full_page_writes=off when checksums are on, the fact that fullPageWrites was changeable was a distraction. Since I saw little or no value in what the code offered, my instinct was to see if we could get rid of it. It looks like Simon went to significant effort to maintain full_page_writes as PGC_SIGHUP: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=8366c780 Maybe he has the best perspective on the value versus complexity? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
Sent from my iPad On 04-Sep-2013, at 21:38, Peter Eisentraut pete...@gmx.net wrote: On 7/19/13 11:59 AM, Atri Sharma wrote: Hi all, This is our current work-in-progress patch for WITHIN GROUP. This patch needs to be rebased. This version of the patch is quite old. We will be sending an updated patch before the start of the September commitfest, with all the points you mentioned taken care of. Thanks for the points. Regards, Atri -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Eliminating pg_catalog.pg_rewrite.ev_attr
This was previously discussed here: http://www.postgresql.org/message-id/flat/24836.1370713...@sss.pgh.pa.us#24836.1370713...@sss.pgh.pa.us The attached patch implements what I think we agreed on. To recap, ev_attr was present in pg_rewrite at the point that Postgres95 version 1.01 source code was imported to version control, with a default of -1 to mean all columns. It became obsolete in 2002 with commit 95ef6a344821655ce4d0a74999ac49dd6af6d342, which went into PostgreSQL version 7.3, removing the ability to define a rule on a specific column; however, this column and over 100 lines of vestigial code were left behind. Later code was written as though 0 was used to mean all columns, as is done elsewhere in the code, although pre-existing code was not changed to match. That inconsistency didn't much matter since there was no way to define anything which exercised the code, short of hacking the system tables directly. The patch removes the obsolete column from pg_rewrite, and all the vestigial code I was able to find. The justification for the patch is to eliminate over 100 lines of code from an area which is confusing enough without it. Unless someone has an objection or thinks this needs to go through the CF process, I will commit it tomorrow, with a catversion bump. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 5039,5052 ****
      </row>

      <row>
-      <entry><structfield>ev_attr</structfield></entry>
-      <entry><type>int2</type></entry>
-      <entry></entry>
-      <entry>The column this rule is for (currently, always -1 to
-       indicate the whole table)</entry>
-     </row>
-
-     <row>
       <entry><structfield>ev_type</structfield></entry>
       <entry><type>char</type></entry>
       <entry></entry>
--- 5039,5044 ----

*** a/src/backend/rewrite/rewriteDefine.c
--- b/src/backend/rewrite/rewriteDefine.c
***************
*** 58,64 ****
  static Oid InsertRule(char *rulname,
  		   int evtype,
  		   Oid eventrel_oid,
- 		   AttrNumber evslot_index,
  		   bool evinstead,
  		   Node *event_qual,
  		   List *action,
--- 58,63 ----
*************** InsertRule(char *rulname,
*** 86,92 ****
  	namestrcpy(&rname, rulname);
  	values[Anum_pg_rewrite_rulename - 1] = NameGetDatum(&rname);
  	values[Anum_pg_rewrite_ev_class - 1] = ObjectIdGetDatum(eventrel_oid);
- 	values[Anum_pg_rewrite_ev_attr - 1] = Int16GetDatum(evslot_index);
  	values[Anum_pg_rewrite_ev_type - 1] = CharGetDatum(evtype + '0');
  	values[Anum_pg_rewrite_ev_enabled - 1] = CharGetDatum(RULE_FIRES_ON_ORIGIN);
  	values[Anum_pg_rewrite_is_instead - 1] = BoolGetDatum(evinstead);
--- 85,90 ----
*************** InsertRule(char *rulname,
*** 117,123 ****
  		 * When replacing, we don't need to replace every attribute
  		 */
  		MemSet(replaces, false, sizeof(replaces));
- 		replaces[Anum_pg_rewrite_ev_attr - 1] = true;
  		replaces[Anum_pg_rewrite_ev_type - 1] = true;
  		replaces[Anum_pg_rewrite_is_instead - 1] = true;
  		replaces[Anum_pg_rewrite_ev_qual - 1] = true;
--- 115,120 ----
*************** DefineQueryRewrite(char *rulename,
*** 238,244 ****
  				   List *action)
  {
  	Relation	event_relation;
- 	int			event_attno;
  	ListCell   *l;
  	Query	   *query;
  	bool		RelisBecomingView = false;
--- 235,240 ----
*************** DefineQueryRewrite(char *rulename,
*** 495,501 ****
  	/*
  	 * This rule is allowed - prepare to install it.
  	 */
- 	event_attno = -1;

  	/* discard rule if it's null action and not INSTEAD; it's a no-op */
  	if (action != NIL || is_instead)
--- 491,496 ----
*************** DefineQueryRewrite(char *rulename,
*** 503,509 ****
  		ruleId = InsertRule(rulename,
  							event_type,
  							event_relid,
- 							event_attno,
  							is_instead,
  							event_qual,
  							action,
--- 498,503 ----

*** a/src/backend/rewrite/rewriteHandler.c
--- b/src/backend/rewrite/rewriteHandler.c
*************** matchLocks(CmdType event,
*** 1273,1287 ****
  			}
  		}

! 		if (oneLock->event == event)
! 		{
! 			if (parsetree->commandType != CMD_SELECT ||
! 				(oneLock->attrno == -1 ?
! 				 rangeTableEntry_used((Node *) parsetree, varno, 0) :
! 				 attribute_used((Node *) parsetree,
! 								varno, oneLock->attrno, 0)))
! 				matching_locks = lappend(matching_locks, oneLock);
! 		}
  	}

  	return matching_locks;
--- 1273,1280 ----
  			}
  		}

! 		if (oneLock->event == event && parsetree->commandType != CMD_SELECT)
! 			matching_locks = lappend(matching_locks, oneLock);
  	}

  	return matching_locks;
*************** static Query *
*** 1295,1301 ****
  ApplyRetrieveRule(Query *parsetree,
  				  RewriteRule *rule,
  				  int rt_index,
- 				  bool relation_level,
  				  Relation relation,
  				  List *activeRIRs,
  				  bool forUpdatePushedDown)
--- 1288,1293 ----
*************** ApplyRetrieveRule(Query *parsetree,
*** 1309,1316 ****
  		elog(ERROR, "expected just one rule action");
  	if (rule->qual != NULL)
Re: [HACKERS] [v9.4] row level security
2013/9/4 Tom Lane t...@sss.pgh.pa.us: Robert Haas robertmh...@gmail.com writes: On Wed, Sep 4, 2013 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Right. IMHO, this new feature should be similarly simple: when an unprivileged user references a table, treat that as a reference to a leakproof view over the table, with the RLS qual injected into the view. And for insert/update/delete, we do what exactly? The same mechanism will prevent UPDATE and DELETE from seeing any rows the user shouldn't be able to touch. No, it won't, because we don't support direct update/delete on views (and if you look, you'll notice the auto-updatable-view stuff doesn't think a security-barrier view is auto-updatable). AFAICT, to deal with update/delete the RLS patch needs to constrain order of qual application without the crutch of having a separate level of subquery; and it's that behavior that I have zero confidence in, either as to whether it works as submitted or as to our odds of not breaking it in the future. Are you suggesting that we rewrite UPDATE / DELETE statements to filter out unprivileged rows from manipulation? Yes, I also thought that would be a simple solution, one that does not need an additional enhancement to allow UPDATE / DELETE to take a sub-query on top of the reader-side plan. For example, if the security policy is (t1.owner = current_user) and the given query was:

    UPDATE t1 SET value = value || '_updated' WHERE value LIKE '%abc%';

this query could be rewritten as follows:

    UPDATE t1 SET value = value || '_updated'
    WHERE tid IN (SELECT tid FROM t1 WHERE t1.owner = current_user)
      AND value LIKE '%abc%';

This approach makes the implementation simple, but it has to scan the relation twice, so its performance is not ideal, according to the past discussion. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Wed, 2013-09-04 at 17:00 +0200, Andres Freund wrote: On 2013-09-04 07:57:15 -0700, Jeff Davis wrote: XLogSaveBufferForHint() calls XLogCheckBuffer() but doesn't also look at the full page writes setting (like XLogInsert() does). That means, if checksums are enabled and full_page_writes is off, we'll still write some full page images for checksums. I'd like to remedy that. I don't think that's really as easy as it sounds without removing the ability to do base backups with full_page_writes = off. The interlocking that would require makes things complex... I didn't dig into that part yet. I was mostly distracted by the code to support changing full_page_writes with SIGHUP. One option would be to have XLogInsert return early if full_page_writes is off, it's an XLOG_FPI record, and forcePageWrites is off. Personally I'd rather forbid enabling checksums in combination with full_page_writes = off. That doesn't seem like a good idea to me and I am far from convinced it's actually going to work in all corner cases. Hmm. It's good to be cautious when deploying on a less-common configuration. However, I don't think it's a good idea to reject seemingly valid combinations that are supposed to work due to a lack of confidence in the review/testing process. Might be an area warranting some further review and testing; I'll take a look, but feel free to tell me if you can think of specific problem areas. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
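The early return Jeff describes might look roughly like the following. This is purely an illustration of the idea, not the committed code; the variable names are approximations, and XLOG_FPI is the hint-bit full-page-image record he mentions:

    /*
     * Hypothetical sketch of an early exit in XLogInsert(): a record that
     * exists only to carry a full-page image is useless when neither
     * full_page_writes nor forcePageWrites (base backup running) demands one.
     */
    if (rmid == RM_XLOG_ID && info == XLOG_FPI &&
        !fullPageWrites && !Insert->forcePageWrites)
        return InvalidXLogRecPtr;   /* caller treats this as "no WAL written" */

This would keep base backups working, since forcePageWrites stays set for the duration of a backup.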
Re: [HACKERS] Eliminating pg_catalog.pg_rewrite.ev_attr
Kevin Grittner kgri...@ymail.com writes: Unless someone has an objection or thinks this needs to go through the CF process, I will commit it tomorrow, with a catversion bump. Shouldn't attribute_used() be removed from rewriteManip.h? I was a bit surprised by your removal of the rangeTableEntry_used() test in the hunk at rewriteHandler.c:1273ff. That's probably all right, but it takes this out of the realm of a mechanical change. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On Wed, Sep 4, 2013 at 11:22 AM, Tom Lane t...@sss.pgh.pa.us wrote: The same mechanism will prevent UPDATE and DELETE from seeing any rows the user shouldn't be able to touch. No, it won't, because we don't support direct update/delete on views (and if you look, you'll notice the auto-updatable-view stuff doesn't think a security-barrier view is auto-updatable). AFAICT, to deal with update/delete the RLS patch needs to constrain order of qual application without the crutch of having a separate level of subquery; and it's that behavior that I have zero confidence in, either as to whether it works as submitted or as to our odds of not breaking it in the future. I don't really see why. AIUI, the ModifyTable node just needs to get the right TIDs. It's not like that has to be stacked directly on top of a scan; indeed, in cases like UPDATE .. FROM and DELETE .. USING it already isn't. Maybe there's some reason why the intervening level can be a Join but not a SubqueryScan, but if so I'd expect we could find some way of lifting that limitation without suffering too much pain. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME
On Tue, Sep 3, 2013 at 3:13 AM, wangs...@highgo.com.cn wrote: On 2013-09-03 08:15, David Johnston replied: Jeff Davis-8 wrote Is there any semantic difference between marking a constraint as DISABLED and simply dropping it? Or does it just make it easier to re-add it later? David Johnston wrote: I cannot answer the question but if there is none then the main concern I'd have is capturing meta-information about WHY such a constraint has been disabled instead of dropped. Dropping/re-creating and disabling/enabling a constraint have no fundamental difference, and can achieve the same purpose. What I did is also more convenient for the user: recording the disabled constraints is easier than re-creating all the constraints. What's more, a lot of people have asked about turning off constraints, and SQL:2008 supports this. So I think it's necessary in some ways. Please add your patch to the upcoming CommitFest so we don't forget about it. https://commitfest.postgresql.org/action/commitfest_view/open Please see also https://wiki.postgresql.org/wiki/Submitting_a_Patch -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
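For context, the feature being proposed reads roughly like this (my sketch of the syntax implied by the thread title; I have not checked the patch's exact grammar):

    -- proposed: temporarily disable, then re-enable, a named constraint
    ALTER TABLE orders DISABLE CONSTRAINT orders_qty_check;
    -- ... bulk load ...
    ALTER TABLE orders ENABLE CONSTRAINT orders_qty_check;

versus what users must do today:

    ALTER TABLE orders DROP CONSTRAINT orders_qty_check;
    -- ... bulk load ...
    ALTER TABLE orders ADD CONSTRAINT orders_qty_check CHECK (qty > 0);

As the thread notes, the two achieve the same end; the DISABLE form just spares the user from re-stating (and remembering) the constraint definition.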
Re: [HACKERS] Improving avg performance for numeric
On 7/8/13 10:05 AM, Pavel Stehule wrote: I am testing your code, and it increases the speed of sum by about 24% over the original implementation. This patch needs to be rebased (and/or the later version registered in the commit fest). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating pg_catalog.pg_rewrite.ev_attr
Tom Lane t...@sss.pgh.pa.us wrote: Shouldn't attribute_used() be removed from rewriteManip.h? Yeah, I don't know how I missed that. Thanks. I was a bit surprised by your removal of the rangeTableEntry_used() test in the hunk at rewriteHandler.c:1273ff. That's probably all right, but it takes this out of the realm of a mechanical change. [ also questioned by Álvaro ] I'll leave that as it was -- it can be discussed separately from the mechanical changes. New patch attached. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 5039,5052 ****
      </row>

      <row>
-      <entry><structfield>ev_attr</structfield></entry>
-      <entry><type>int2</type></entry>
-      <entry></entry>
-      <entry>The column this rule is for (currently, always -1 to
-       indicate the whole table)</entry>
-     </row>
-
-     <row>
       <entry><structfield>ev_type</structfield></entry>
       <entry><type>char</type></entry>
       <entry></entry>
--- 5039,5044 ----

*** a/src/backend/rewrite/rewriteDefine.c
--- b/src/backend/rewrite/rewriteDefine.c
***************
*** 58,64 ****
  static Oid InsertRule(char *rulname,
  		   int evtype,
  		   Oid eventrel_oid,
- 		   AttrNumber evslot_index,
  		   bool evinstead,
  		   Node *event_qual,
  		   List *action,
--- 58,63 ----
*************** InsertRule(char *rulname,
*** 86,92 ****
  	namestrcpy(&rname, rulname);
  	values[Anum_pg_rewrite_rulename - 1] = NameGetDatum(&rname);
  	values[Anum_pg_rewrite_ev_class - 1] = ObjectIdGetDatum(eventrel_oid);
- 	values[Anum_pg_rewrite_ev_attr - 1] = Int16GetDatum(evslot_index);
  	values[Anum_pg_rewrite_ev_type - 1] = CharGetDatum(evtype + '0');
  	values[Anum_pg_rewrite_ev_enabled - 1] = CharGetDatum(RULE_FIRES_ON_ORIGIN);
  	values[Anum_pg_rewrite_is_instead - 1] = BoolGetDatum(evinstead);
--- 85,90 ----
*************** InsertRule(char *rulname,
*** 117,123 ****
  		 * When replacing, we don't need to replace every attribute
  		 */
  		MemSet(replaces, false, sizeof(replaces));
- 		replaces[Anum_pg_rewrite_ev_attr - 1] = true;
  		replaces[Anum_pg_rewrite_ev_type - 1] = true;
  		replaces[Anum_pg_rewrite_is_instead - 1] = true;
  		replaces[Anum_pg_rewrite_ev_qual - 1] = true;
--- 115,120 ----
*************** DefineQueryRewrite(char *rulename,
*** 238,244 ****
  				   List *action)
  {
  	Relation	event_relation;
- 	int			event_attno;
  	ListCell   *l;
  	Query	   *query;
  	bool		RelisBecomingView = false;
--- 235,240 ----
*************** DefineQueryRewrite(char *rulename,
*** 495,501 ****
  	/*
  	 * This rule is allowed - prepare to install it.
  	 */
- 	event_attno = -1;

  	/* discard rule if it's null action and not INSTEAD; it's a no-op */
  	if (action != NIL || is_instead)
--- 491,496 ----
*************** DefineQueryRewrite(char *rulename,
*** 503,509 ****
  		ruleId = InsertRule(rulename,
  							event_type,
  							event_relid,
- 							event_attno,
  							is_instead,
  							event_qual,
  							action,
--- 498,503 ----

*** a/src/backend/rewrite/rewriteHandler.c
--- b/src/backend/rewrite/rewriteHandler.c
*************** matchLocks(CmdType event,
*** 1276,1285 ****
  		if (oneLock->event == event)
  		{
  			if (parsetree->commandType != CMD_SELECT ||
! 				(oneLock->attrno == -1 ?
! 				 rangeTableEntry_used((Node *) parsetree, varno, 0) :
! 				 attribute_used((Node *) parsetree,
! 								varno, oneLock->attrno, 0)))
  				matching_locks = lappend(matching_locks, oneLock);
  		}
  	}
--- 1276,1282 ----
  		if (oneLock->event == event)
  		{
  			if (parsetree->commandType != CMD_SELECT ||
! 				rangeTableEntry_used((Node *) parsetree, varno, 0))
  				matching_locks = lappend(matching_locks, oneLock);
  		}
  	}
*************** static Query *
*** 1295,1301 ****
  ApplyRetrieveRule(Query *parsetree,
  				  RewriteRule *rule,
  				  int rt_index,
- 				  bool relation_level,
  				  Relation relation,
  				  List *activeRIRs,
  				  bool forUpdatePushedDown)
--- 1292,1297 ----
*************** ApplyRetrieveRule(Query *parsetree,
*** 1309,1316 ****
  		elog(ERROR, "expected just one rule action");
  	if (rule->qual != NULL)
  		elog(ERROR, "cannot handle qualified ON SELECT rule");
- 	if (!relation_level)
- 		elog(ERROR, "cannot handle per-attribute ON SELECT rule");

  	if (rt_index == parsetree->resultRelation)
  	{
--- 1305,1310 ----
*************** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
*** 1632,1645 ****
  			if (rule->event != CMD_SELECT)
  				continue;

- 			if (rule->attrno > 0)
- 			{
- 				/* per-attr rule; do we need it? */
- 				if (!attribute_used((Node *) parsetree, rt_index,
- 									rule->attrno, 0))
- 					continue;
- 			}
-
  			locks = lappend(locks, rule);
  		}
--- 1626,1631 ----
*************** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
*** 1664,1670 ****
  			parsetree = ApplyRetrieveRule(parsetree,
  										  rule,
Re: [HACKERS] Improving avg performance for numeric
2013/9/4 Peter Eisentraut pete...@gmx.net On 7/8/13 10:05 AM, Pavel Stehule wrote: I am testing your code, and it increases the speed of sum by about 24% over the original implementation. This patch needs to be rebased (and/or the later version registered in the commit fest). I updated the commitfest info. Regards Pavel
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On 2013-09-04 09:23:20 -0700, Jeff Davis wrote: On Wed, 2013-09-04 at 11:32 -0400, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: I think code complexity matters quite a lot. If we can eliminate some complex code in a complex area, and all we give up is a feature with essentially no use case, that sounds like we're moving in the right direction to me. Isn't this whole discussion academic in view of Andres' point? Maybe "complex code" was an overstatement. We'd be able to eliminate the XLOG_FPW_CHANGE record, UpdateFullPageWrites(), and one of the members of XLogCtlInsert; and make xlog.c slightly shorter in the process. That path is also executed during a normal restart and during promotion. Check the invocation of UpdateFullPageWrites() in StartupXLOG(). Note that a standby needs to be able to follow the primary's full_page_writes setting during a promotion. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
On 4.9.2013 07:24, Satoshi Nagayasu wrote: Hi, (2013/09/04 12:52), Atri Sharma wrote: On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu sn...@uptime.jp wrote: Hi, I'm considering overhauling pgstat.stat, and would like to know how many people are interested in this topic. As you may know, this file can be hundreds of MB in size, because pgstat.stat holds all access statistics in each database, and it needs to read/write the entire pgstat.stat frequently. As a result, pgstat.stat often generates massive I/O operations, particularly when having a large number of tables in the database. To support multi-tenancy or just a large number of tables (up to 10k tables in a single database), I think pgstat.stat needs to be overhauled. I think using heap and btree in pgstat.stat would be preferred to reduce read/write and to allow updating access statistics for specific tables in the pgstat.stat file. Is this good for us? Hi, Nice thought. I/O reduction in pgstat can be really helpful. I am trying to think of our aim here. Would we be looking to split pgstat per table, so that the I/O write happens for only a portion of pgstat? Or reduce the I/O in general? I prefer the latter. Under the current implementation, DBAs need to split a single database into many smaller databases, taking the access locality of the tables into consideration. That's difficult, and it could change in the future. And splitting the statistics data into many files (per table, for example) would cause another performance issue when collecting/showing statistics at once. Just my guess though. So, I'm looking for a new way to reduce I/O for the statistics data in general. Hi, as one of the authors of the 9.3 patch (per-database stats), I planned to work on this a bit more in the 9.4 cycle. So a few comments / ideas. I'm not entirely sure splitting the stats per table would be that bad. After all, we already have per-relation data files, so either the users already have serious problems (so this won't make it noticeably worse) or it will work fine. But I'm not saying it's the right choice either. My idea was to keep the per-database stats, but allow some sort of random access - updating / deleting the records in place, adding records etc. The simplest way I could think of was adding a simple index - a mapping of OID to position in the stat file. I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or something like that. This would make it quite simple to access an existing record:

1: get position from the index
2: read sizeof(Entry) from the file
3: if it's an update, just overwrite the bytes; for a delete, set an isdeleted flag (needs to be added to all entries)

or to read all the records (just read the whole file as today). regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
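To make the format concrete, here is a minimal sketch of the proposed index and the in-place update (all names are hypothetical and error handling is omitted):

    /* One slot in the hypothetical oid.stat.index file. */
    typedef struct StatIndexEntry
    {
        Oid     oid;        /* table OID */
        off_t   offset;     /* byte position of its record in the stat file */
    } StatIndexEntry;

    /* Steps 1 and 3 from the list above: find the offset, overwrite in place. */
    static bool
    stat_update_in_place(FILE *fp, StatIndexEntry *index, int nentries,
                         Oid oid, const void *rec, size_t recsize)
    {
        int     i;

        for (i = 0; i < nentries; i++)      /* step 1: scan the simple array */
        {
            if (index[i].oid == oid)
            {
                fseeko(fp, index[i].offset, SEEK_SET);
                fwrite(rec, recsize, 1, fp);    /* step 3: overwrite the bytes */
                return true;
            }
        }
        return false;   /* unknown OID: append a new record and index entry */
    }

Each update then touches one small record instead of rewriting the whole file.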
Re: [HACKERS] [v9.4] row level security
Kohei KaiGai kai...@kaigai.gr.jp writes: 2013/9/4 Tom Lane t...@sss.pgh.pa.us: And for insert/update/delete, we do what exactly? Regarding to update/delete, this patch also enhanced to allow update or delete mechanism allows to take a sub-query on top of the table scan plan. So, its explain output shows as follows: postgres= EXPLAIN (costs off) UPDATE customer SET email = 'al...@example.com'; QUERY PLAN -- Update on customer - Subquery Scan on customer - Seq Scan on customer customer_1 Filter: (current_user() = uname) You can see this update has Subquery plan instead of regular relation scan. Really? That wasn't apparent from reading the patch. (Have I mentioned it's desperately underdocumented? Aside from needing a lot more in-code comments than it's got, it would benefit from having an overview section added to optimizer/README explaining stuff at the level of this discussion.) I'm a bit surprised that setrefs.c doesn't eliminate the Subquery Scan as being a no-op, given that no quals end up getting applied at that level. You might look into why not, since if that plan node were eliminated at the end, it'd fix any efficiency complaints about this approach. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
On 4.9.2013 14:43, Pavel Stehule wrote: 2013/9/4 Atri Sharma atri.j...@gmail.com mailto:atri.j...@gmail.com [...] Ah, now I get it. Thanks a ton for the detailed explanation. Yes, a BTree will sufficiently isolate per-table stats here and allow for random access. Another thing I can think of is having a write-back cache which could probably be used as a buffer before the actual stats write. I am just musing here though. we very successfully use a tmpfs volume for pgstat files (we use a backport of multiple statfiles from 9.3 to 9.1) It works quite well as long as you have the objects (tables, indexes, functions) spread across multiple databases. Once you have one database with a very large number of objects, tmpfs is not as effective. It's going to help with stats I/O, but it's not going to help with high CPU usage (you're reading and parsing the stat files over and over), and every rewrite creates a copy of the file. So if you have 400MB of stats, you will need 800MB of tmpfs + some slack (say, 200MB). That means you'll use ~1GB of tmpfs although 400MB would be just fine. And this 600MB won't be used for page cache etc. OTOH, it's true that if you have that many objects, 600MB of RAM is not going to help you anyway. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
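For anyone who wants to replicate the tmpfs setup being discussed, the usual recipe is a tmpfs mount plus the stats_temp_directory setting; the size below just follows the 400MB example above, and the path is illustrative:

    # /etc/fstab -- leave headroom for the rewrite copy (~2x stats size + slack)
    tmpfs  /var/lib/pgsql/pg_stat_tmp  tmpfs  size=1G,uid=postgres,gid=postgres,mode=0700  0 0

    # postgresql.conf
    stats_temp_directory = '/var/lib/pgsql/pg_stat_tmp'

At a clean shutdown the collector still writes a permanent copy under the data directory, so nothing is lost across a restart.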
Re: [HACKERS] [rfc] overhauling pgstat.stat
we very successfully use a tmpfs volume for pgstat files (we use a backport of multiple statfiles from 9.3 to 9.1) It works quite well as long as you have the objects (tables, indexes, functions) spread across multiple databases. Once you have one database with a very large number of objects, tmpfs is not as effective. It's going to help with stats I/O, but it's not going to help with high CPU usage (you're reading and parsing the stat files over and over), and every rewrite creates a copy of the file. So if you have 400MB of stats, you will need 800MB of tmpfs + some slack (say, 200MB). That means you'll use ~1GB of tmpfs although 400MB would be just fine. And this 600MB won't be used for page cache etc. OTOH, it's true that if you have that many objects, 600MB of RAM is not going to help you anyway. And just an idea - could we use a database for storing these files? It could use unlogged tables. Second idea - hold one bgworker as a persistent in-memory key-value database and keep the data in memory with some optimizations - using anti-caching and similar in-memory database features. Pavel Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
Tomas Vondra wrote: My idea was to keep the per-database stats, but allow some sort of random access - updating / deleting the records in place, adding records etc. The simplest way I could think of was adding a simple index - a mapping of OID to position in the stat file. I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or something like that. This would make it quite simple to access an existing record: 1: get position from the index 2: read sizeof(Entry) from the file 3: if it's an update, just overwrite the bytes; for a delete, set an isdeleted flag (needs to be added to all entries) or to read all the records (just read the whole file as today). Sounds reasonable. However, I think the index should be a real index, i.e. have a tree structure that can be walked down, not just a plain array. If you have a 400 MB stat file, then you must have about 4 million tables, and you will not want to scan such a large array every time you want to find an entry. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further XLogInsert scaling tweaking
On 03.09.2013 16:22, Merlin Moncure wrote: On Mon, Sep 2, 2013 at 10:32 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Sep 2, 2013 at 10:14:03AM +0300, Heikki Linnakangas wrote:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 39c58d0..28e62ea 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -428,8 +428,14 @@ typedef struct XLogCtlInsert
 	uint64		CurrBytePos;
 	uint64		PrevBytePos;
 
-	/* insertion slots, see above for details */
-	XLogInsertSlotPadded *insertSlots;
+	/*
+	 * Make sure the above heavily-contended spinlock and byte positions are
+	 * on their own cache line. In particular, the RedoRecPtr and full page
+	 * write variables below should be on a different cache line. They are
+	 * read on every WAL insertion, but updated rarely, and we don't want
+	 * those reads to steal the cache line containing Curr/PrevBytePos.
+	 */
+	char		pad[128];

Do we adjust for cache line lengths anywhere else? PGPROC? Should it be a global define? +1 -- that is, I think it should be. Ok, committed that way. No, we don't adjust for cache line lengths anywhere else. As Alvaro noted, LWLocks are padded, but that's just to keep them from crossing cache line boundaries, not to keep two lwlocks on separate cache lines. Thanks! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
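The global define Merlin is +1'ing might look like this; the name is hypothetical (nothing like it existed in the tree at the time of this thread), and 128 matches the pad size used in the committed hunk:

    /* Hypothetical tree-wide constant for cache-line padding. */
    #define PG_CACHE_LINE_SIZE  128

    typedef struct XLogCtlInsert
    {
        slock_t     insertpos_lck;  /* protects CurrBytePos and PrevBytePos */
        uint64      CurrBytePos;
        uint64      PrevBytePos;

        /* keep the rarely-updated fields below off the hot cache line */
        char        pad[PG_CACHE_LINE_SIZE];

        XLogRecPtr  RedoRecPtr;     /* read on every insertion, updated rarely */
        bool        forcePageWrites;
        bool        fullPageWrites;
        /* ... */
    } XLogCtlInsert;

A named constant would also let PGPROC and other contended structures share one documented assumption about cache line size instead of scattering magic 128s around.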
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Sat, Aug 31, 2013 at 2:34 PM, Andres Freund and...@2ndquadrant.com wrote: After some thinking I don't think any solution primarily based on holding page level locks across other index operations is going to scale ok. I'd like to chime in with a large +1 for this sentiment and pretty much everything else Andres said further downthread. The operations across which you're proposing to hold buffer locks seem at least an order of magnitude too complex to get away with something like that. Concurrent readers will block in a non-interruptible wait if they try to access a buffer, and that's a situation that will be intolerable if, for example, it can persist across a disk I/O. And I don't see any way to avoid that. One possible alternative to inserting promises into the index pages themselves might be to use some kind of heavyweight lock. The way that SIREAD locks work is not entirely dissimilar to what's needed here, I think. Of course, the performance implications of checking for lots of extra locks during inserts could be pretty bad, so you'd probably need some way of avoiding that in common cases, which I don't know exactly how to do, but maybe there's a way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving avg performance for numeric
On 9/4/13 2:26 PM, Pavel Stehule wrote: 2013/9/4 Peter Eisentraut pete...@gmx.net mailto:pete...@gmx.net On 7/8/13 10:05 AM, Pavel Stehule wrote: I am testing your code, and It increase speed of sum about 24% faster then original implementation. This patch needs to be rebased (and/or the later version registered in the commit fest). I updated a commit fest info The new patch also needs to be rebased. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lcr v5 - introduction of InvalidCommandId
On 2013-09-03 11:40:57 -0400, Robert Haas wrote: 0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement I'm still unconvinced we want this. Ok, so the reason for the existence of this patch is that currently there is no way to represent an unset CommandId. This is a problem for the following patches because we need to log the cmin, cmax of catalog rows, and obviously there can be rows where cmax is unset. The reason I chose to change the definition of CommandIds is that the other on-disk types we use, like TransactionIds, XLogRecPtrs and such, have an invalid value; CommandIds don't. Changing their definition to have 0 - analogous to the previous examples - as their invalid value is not a problem because CommandIds from pg_upgraded clusters may never be used for anything. Going from 2^32 to 2^32-1 possible CommandIds doesn't seem like a problem to me. Imo the CommandIds should have been defined that way from the start. Makes some sense? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] De-normalization optimizer research project
I am working on a research project that looks at an entire database access layer written against a normalized database and suggests de-normalizations to be done to the database structure. In effect looking at de-normalizations as a kind of cache. I am looking for any code I can leverage to do this work. As documentation for the plan optimizer I found the document below. Is this still relatively accurate? http://www.postgresql.org/files/developer/optimizer.pdf I have looked at the postgresql optimizer code years ago. And it was not too hard to follow. I wonder if it is fairly independent from the rest of the system. Or if anyone has suggestions of using a different optimizer codebase. Here is a discussion I found here from years ago. http://postgresql.1045698.n5.nabble.com/About-method-of-PostgreSQL-s-Optimizer-td1933859.html -- View this message in context: http://postgresql.1045698.n5.nabble.com/De-normalization-optimizer-research-project-tp5769655.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
On 4.9.2013 20:59, Alvaro Herrera wrote: Tomas Vondra wrote: My idea was to keep the per-database stats, but allow some sort of random access - updating / deleting the records in place, adding records etc. The simplest way I could think of was adding a simple index - a mapping of OID to position in the stat file. I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or something like that. This would make it quite simple to access existing record 1: get position from the index 2: read sizeof(Entry) from the file 3: if it's update, just overwrite the bytes, for delete set isdeleted flag (needs to be added to all entries) or reading all the records (just read the whole file as today). Sounds reasonable. However, I think the index should be a real index, i.e. have a tree structure that can be walked down, not just a plain array. If you have a 400 MB stat file, then you must have about 4 million tables, and you will not want to scan such a large array every time you want to find an entry. I was thinking about a sorted array, so a bisection would be a simple and fast way to search. New items could be added to another small unsorted array (say, 1000 elements) and this would be extended and resorted only when this small one gets full. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
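A sketch of that two-tier lookup (illustrative only): bisection over the big sorted array, then a linear pass over the small unsorted overflow array:

    typedef struct StatIndexEntry
    {
        Oid     oid;        /* key: table OID */
        off_t   offset;     /* position of the record in the stat file */
    } StatIndexEntry;

    static StatIndexEntry *
    stat_index_lookup(StatIndexEntry *sorted, int nsorted,
                      StatIndexEntry *recent, int nrecent, Oid oid)
    {
        int     lo = 0;
        int     hi = nsorted - 1;
        int     i;

        /* bisection over the sorted bulk of the index */
        while (lo <= hi)
        {
            int     mid = lo + (hi - lo) / 2;

            if (sorted[mid].oid == oid)
                return &sorted[mid];
            else if (sorted[mid].oid < oid)
                lo = mid + 1;
            else
                hi = mid - 1;
        }

        /* fall back to the small (say, 1000-entry) unsorted overflow array */
        for (i = 0; i < nrecent; i++)
        {
            if (recent[i].oid == oid)
                return &recent[i];
        }

        return NULL;    /* caller appends to 'recent', merging/resorting when full */
    }

With 4 million entries that is roughly 22 comparisons for the sorted part plus at most 1000 for the overflow, versus 4 million for a naive scan.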
Re: [HACKERS] [rfc] overhauling pgstat.stat
On 4.9.2013 14:13, Stephen Frost wrote: * Satoshi Nagayasu (sn...@uptime.jp) wrote: Yes, we have already split the single pgstat.stat file into several pieces. However, we still need to read/write a large amount of statistics data when we have a large number of tables in a single database or multiple databases being accessed. Right? Would simply also splitting per tablespace help? I don't think that's a good solution. Forcing the users to use tablespaces just to minimize the stats overhead is not that far from forcing them to use multiple databases. Tablespaces have disadvantages too, so I'd suggest looking for a solution meeting the "just works" criterion. I think the issue here is that it is necessary to write/read statistics data even when it hasn't actually changed. So, I'm wondering how we can minimize read/write operations on these statistics data files using heap and btree. It does sound like an interesting idea to use heap/btree instead but I wonder about the effort involved, particularly around coordinating access. We wouldn't want to end up introducing additional contention points by doing this.. Yes, this is definitely an important thing to keep in mind. Maintaining the index (no matter what kind of index will be used) will cause some contention, but I believe it will be manageable with a bit of effort. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] encouraging index-only scans
On Thu, Dec 13, 2012 at 03:31:06PM +0000, Peter Geoghegan wrote: On 13 December 2012 03:51, Tom Lane t...@sss.pgh.pa.us wrote: ANALYZE does not set that value, and is not going to start doing so, because it doesn't scan enough of the table to derive a trustworthy value. I'm slightly surprised by your remarks here, because the commit message where the relallvisible column was added (commit a2822fb9337a21f98ac4ce850bb4145acf47ca27) says: Add a column pg_class.relallvisible to remember the number of pages that were all-visible according to the visibility map as of the last VACUUM (or ANALYZE, or some other operations that update pg_class.relpages). Use relallvisible/relpages, instead of an arbitrary constant, to estimate how many heap page fetches can be avoided during an index-only scan. Have I missed some nuance? I am looking back at this issue now and I think you are correct. The commit you mention (Oct 7 2011) says ANALYZE updates the visibility map statistics, and the code matches that:

    if (!inh)
        vac_update_relstats(onerel,
                            RelationGetNumberOfBlocks(onerel),
                            totalrows,
                            visibilitymap_count(onerel),   /* <-- */
                            hasindex,
                            InvalidTransactionId);

so if an index-only scan was not being used after an ANALYZE, it isn't a bad relallvisible estimate but something else. This code was in PG 9.2. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
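For anyone diagnosing a similar index-only-scan complaint, the estimate in question can be inspected directly in the catalog (9.2+); substitute the table name of interest:

    SELECT relname, relpages, relallvisible,
           round(100.0 * relallvisible / GREATEST(relpages, 1), 1) AS pct_all_visible
    FROM pg_class
    WHERE relname = 'your_table';

A low ratio right after a bulk load (before VACUUM has run) makes the planner assume most index probes will still need heap fetches, which discourages index-only scans.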
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Wed, Sep 4, 2013 at 1:26 PM, Robert Haas robertmh...@gmail.com wrote: Concurrent readers will block in a non-interruptible wait if they try to access a buffer, and that's a situation that will be intolerable if, for example, it can persist across a disk I/O. And I don't see any way to avoid that. Then I have some bad news for you - that's already quite possible. _bt_insertonpg() is called with the very same buffer exclusive locked, and is where we do btree page splits. The first thing that _bt_split does is this: /* Acquire a new page to split into */ rbuf = _bt_getbuf(rel, P_NEW, BT_WRITE); (Obviously this may ultimately result in the storage manager extending the index relation). Plus the split is WAL-logged immediately afterwards, which could result in us blocking on someone else's I/O under contention (granted, the XLogInsert scaling patch has now considerably ameliorated that general problem). All the while holding an exclusive lock on the same buffer. Note also that _bt_insertonpg() is called again recursively after a page split. And of course we WAL-log btree index tuple insertion proper all the time. Let me be clear about something, though: I am not at all dismissive of Andres' concerns. I was concerned about many of the same things before I posted the patch. I think that Andres and I ought to re-frame this discussion a little bit. Right now, the presumption we seem to be making, perhaps without even realizing it, is this is about providing functionality equivalent to MySQL's INSERT IGNORE; insert tuples proposed for insertion where possible, otherwise do not. However, Andres and I, not to mention almost every Postgres user, are actually much more interested in something like INSERT...ON DUPLICATE KEY LOCK FOR UPDATE. That's what this mechanism has to support, even if it is *technically* possible to commit just INSERT...ON DUPLICATE KEY IGNORE and punt on these other questions. It was silly of me not to do that up-front. So I'm going to try and produce a patch that does this as well for my next revision. Maybe this will enable Andres to refute my position that the buffer locking approach to speculative insertion/value locking may actually be acceptable. If that gets us closer to having the feature committed in some form, then I welcome it. I fully expect to be held to this new standard - it would be insane to do anything less. I don't want to throw out an old IGNORE value locking mechanism and invent a whole new one for upserting a little bit down the line. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Analysis on backend-private memory usage (and a patch)
I received a complaint that each backend consumes a lot of backend-private memory, even if it's completely idle. "A lot" is of course very subjective, and how much memory is actually used depends heavily on the application. In this case, the database is fairly small, but they have 250 connections. 'top' output says that each backend is consuming roughly 3MB of memory (RES - SHR). That's 750 MB of backend-private memory, which is a significant chunk of total RAM. So I spent some time analyzing backend memory usage, looking for any low-hanging fruit. This isn't *that* big an issue, so I don't think we'd want to do any big rearchitecting for this. On my laptop, just starting psql, the backend uses 1632 KB of private memory. Running a simple query like "select * from foo where i = 1" makes no noticeable difference, but after \d (which I'm using to represent a somewhat more complicated query), it goes up to 1960 KB. The largest consumer of that memory is the relcache and syscaches. After starting psql, without running any queries, MemoryContextStats says: CacheMemoryContext: 817840 total in 20 blocks; 134824 free (4 chunks); 683016 used plus there is one sub-memorycontext for each index in the relcache, each using about 1KB. After \d: CacheMemoryContext: 1342128 total in 21 blocks; 517472 free (1 chunks); 824656 used Another thing that can consume a lot of memory is the PrivateRefCount lookup table. It's an array with one int32 for each shared buffer, ie. 512 KB for each GB of shared_buffers. See previous discussion here: http://www.postgresql.org/message-id/flat/1164624036.3778.107.ca...@silverbirch.site. That discussion didn't lead to anything, but I think there's some potential in turning PrivateRefCount into a tiny hash table or simply a linear array. Or even simpler, change it from int32 to int16, and accept that you will get an error if you try to hold more than 2^16 pins on a buffer in one backend. One fairly simple thing we could do is to teach catcache.c to resize the caches. Then we could make the initial size of all the syscaches much smaller. At the moment, we use fairly large caches for catalogs like pg_enum (256 entries) and pg_user_mapping (128), even though most databases don't use those features at all. If they could be resized on demand, we could easily allocate them initially with just, say, 4 entries. Attached is a patch for that. That saves about 300 KB, for a backend that does nothing. Resizing the caches on demand also has the benefit that if you have a lot more objects of some type than usual, lookups won't be bogged down by a too small cache. I haven't tried to measure that, though. - Heikki

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index cca0572..36fbc67 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -734,9 +734,8 @@ InitCatCache(int id,
 	int			i;
 
 	/*
-	 * nbuckets is the number of hash buckets to use in this catcache.
-	 * Currently we just use a hard-wired estimate of an appropriate size for
-	 * each cache; maybe later make them dynamically resizable?
+	 * nbuckets is the initial number of hash buckets to use in this catcache.
+	 * It will be enlarged later if it becomes too full.
 	 *
 	 * nbuckets must be a power of two.  We check this via Assert rather than
 	 * a full runtime check because the values will be coming from constant
@@ -775,7 +774,8 @@ InitCatCache(int id,
 	 *
 	 * Note: we rely on zeroing to initialize all the dlist headers correctly
 	 */
-	cp = (CatCache *) palloc0(sizeof(CatCache) + nbuckets * sizeof(dlist_head));
+	cp = (CatCache *) palloc0(sizeof(CatCache));
+	cp->cc_bucket = palloc0(nbuckets * sizeof(dlist_head));
 
 	/*
 	 * initialize the cache's relation information for the relation
@@ -814,6 +814,44 @@ InitCatCache(int id,
 }
 
 /*
+ * Enlarge a catcache, doubling the number of buckets.
+ */
+static void
+RehashCatCache(CatCache *cp)
+{
+	dlist_head *newbucket;
+	int			newnbuckets;
+	int			i;
+
+	elog(DEBUG1, "rehashing cache with id %d for %s; %d tups, %d buckets",
+		 cp->id, cp->cc_relname, cp->cc_ntup, cp->cc_nbuckets);
+
+	/* Allocate a new, larger, hash table. */
+	newnbuckets = cp->cc_nbuckets * 2;
+	newbucket = (dlist_head *) MemoryContextAllocZero(CacheMemoryContext, newnbuckets * sizeof(dlist_head));
+
+	/* Move all entries from old hash table to new. */
+	for (i = 0; i < cp->cc_nbuckets; i++)
+	{
+		while (!dlist_is_empty(&cp->cc_bucket[i]))
+		{
+			dlist_node *node = dlist_pop_head_node(&cp->cc_bucket[i]);
+			CatCTup    *ct = dlist_container(CatCTup, cache_elem, node);
+			int			hashIndex;
+
+			hashIndex = HASH_INDEX(ct->hash_value, newnbuckets);
+
+			dlist_push_head(&newbucket[hashIndex], &ct->cache_elem);
+		}
+	}
+
+	/* Switch to the new array */
+	pfree(cp->cc_bucket);
+	cp->cc_nbuckets = newnbuckets;
+	cp->cc_bucket = newbucket;
+}
+
+/*
  * CatalogCacheInitializeCache
  *
  * This function does final initialization of a
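To illustrate the PrivateRefCount direction Heikki mentions (a tiny per-backend structure instead of one slot per shared buffer), a hypothetical sketch - this is not part of the attached patch:

    /* Hypothetical: track only the few buffers this backend actually has pinned. */
    typedef struct PrivateRefCountEntry
    {
        Buffer  buffer;     /* which shared buffer */
        int32   refcount;   /* how many pins this backend holds on it */
    } PrivateRefCountEntry;

    #define NUM_PRIVATE_REFCOUNTS   64      /* illustrative; backends rarely pin many */

    static PrivateRefCountEntry PrivateRefCounts[NUM_PRIVATE_REFCOUNTS];
    static int  nPrivateRefCounts = 0;

    static PrivateRefCountEntry *
    GetPrivateRefCountEntry(Buffer buffer)
    {
        int     i;

        for (i = 0; i < nPrivateRefCounts; i++)
        {
            if (PrivateRefCounts[i].buffer == buffer)
                return &PrivateRefCounts[i];
        }
        return NULL;    /* overflow could spill to a small hash table */
    }

At 8 bytes per entry this is half a kilobyte per backend, versus the 512 KB per GB of shared_buffers that the current array costs.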
Re: [HACKERS] dynamic shared memory
On 8/31/13 7:17 AM, Robert Haas wrote: On Thu, Aug 29, 2013 at 8:12 PM, Jim Nasby j...@nasby.net wrote: On 8/13/13 8:09 PM, Robert Haas wrote: is removed, the segment automatically goes away (we could allow for server-lifespan segments as well with only trivial changes, but I'm not sure whether there are compelling use cases for that). To clarify... you're talking about something that would intentionally survive postmaster restart? I don't see use for that either... No, I meant something that would live as long as the postmaster and die when it dies. ISTM that at some point we'll want to look at putting top-level shared memory into this system (ie: allowing dynamic resizing of GUCs that affect shared memory size). But as you said, it'd be trivial to add that later. Other comments... + * If the state file is empty or the contents are garbled, it probably means + * that the operating system rebooted before the data written by the previous + * postmaster made it to disk. In that case, we can just ignore it; any shared + * memory from before the reboot should be gone anyway. I'm a bit concerned about this; I know it was possible in older versions for the global shared memory context to be left behind after a crash and need to be cleaned up by hand. Dynamic shared mem potentially multiplies that by 100 or more. I think it'd be worth changing dsm_write_state_file so it always writes a new file and then does an atomic mv (or something similar). I agree that the possibilities for leftover shared memory segments are multiplied with this new facility, and I've done my best to address that. However, I don't agree that writing the state file in a different way would improve anything. Wouldn't it protect against a crash while writing the file? I realize the odds of that are pretty remote, but AFAIK it wouldn't cost that much to write a new file and do an atomic mv... + * If some other backend exited uncleanly, it might have corrupted the + * control segment while it was dying. In that case, we warn and ignore + * the contents of the control segment. This may end up leaving behind + * stray shared memory segments, but there's not much we can do about + * that if the metadata is gone. Similar concern... in this case, would it be possible to always write updates to an un-used slot and then atomically update a pointer? This would be more work than what I suggested above, so maybe just a TODO for now... Though... is there anything a dying backend could do that would corrupt the control segment to the point that it would screw up segments allocated by other backends and not related to the dead backend? Like marking a slot as not used when it is still in use and isn't associated with the dead backend? Sure. A messed-up backend can clobber the control segment just as it can clobber anything else in shared memory. There's really no way around that problem. If the control segment has been overwritten by a memory stomp, we can't use it to clean up. There's no way around that problem except to not use the control segment, which wouldn't be better. Are we trying to protect against memory stomps when we restart after a backend dies? I thought we were just trying to ensure that all shared data structures were correct and consistent. If that's the case, then I was thinking that by using a pointer that can be updated in a CPU-atomic fashion we know we'd never end up with a corrupted entry that was in use; the partial write would be to a slot with nothing pointing at it, so it could be safely reused. 
Like I said before though, it may not be worth worrying about this case right now. Should dsm_impl_op sanity check the arguments after op? I didn't notice checks in the type-specific code but I also didn't read all of it... are we just depending on the OS to sanity-check? Sanity-check for what? Presumably there's limits to what the arguments can be rationally set to. IIRC there's nothing down-stream that's checking them in our code, so I'm guessing we're just depending on the kernel to sanity-check. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
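For what it's worth, the write-then-atomic-rename pattern suggested above is cheap and well understood; here is a rough standalone sketch (the file names and the helper are invented - this is not the actual dsm code):

#include <stdio.h>
#include <unistd.h>

static int
write_state_file_atomically(const char *contents, size_t len)
{
	const char *tmpname = "pg_dynshmem/state.tmp";
	const char *finalname = "pg_dynshmem/state";
	FILE	   *fp;

	fp = fopen(tmpname, "w");
	if (fp == NULL)
		return -1;
	if (fwrite(contents, 1, len, fp) != len ||
		fflush(fp) != 0 ||
		fsync(fileno(fp)) != 0)		/* data must be durable before rename */
	{
		fclose(fp);
		return -1;
	}
	if (fclose(fp) != 0)
		return -1;

	/* POSIX rename() atomically replaces any previous state file */
	return rename(tmpname, finalname);
}

A crash at any point leaves either the old file or the new file in place, never a torn one.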
Re: [HACKERS] Analysis on backend-private memory usage (and a patch)
Heikki Linnakangas hlinnakan...@vmware.com writes: One fairly simple thing we could do is to teach catcache.c to resize the caches. Then we could make the initial size of all the syscaches much smaller. I think this is attractive for the *other* reason you mention, namely preserving reasonable performance when a catcache grows larger than expected; but I'm pretty skeptical of nickel-and-diming caches that are already really small. Is it really worth cutting the TSPARSER caches from 4 pointers to 2 for instance? What concerns me about initially-undersized caches is that we'll waste space and time in the enlargement process. I'd suggest trying to get some numbers about the typical size of each cache in a backend that's done a few things (not merely started up --- we should not be optimizing for the case of connections that get abandoned without running any queries). Then set the initial size to the next larger power of 2. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
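For concreteness, the enlargement rule being discussed could be as simple as the check below. This is an assumption about what a final patch might do (the fill factor of 2 is arbitrary), not something taken from the posted patch; note that doubling keeps cc_nbuckets a power of two, which Tom's round-up-to-a-power-of-2 suggestion for the initial sizes would also preserve.

static void
CatCacheConsiderResize(CatCache *cp)
{
	/* grow once the average hash chain exceeds two entries */
	if (cp->cc_ntup > cp->cc_nbuckets * 2)
		RehashCatCache(cp);
}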
Re: [HACKERS] 9.4 regression
On Wed, Sep 4, 2013 at 11:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: * Andres Freund (and...@2ndquadrant.com) wrote: I'd vote for adding zeroing *after* the fallocate() first. That's what's suggested by kernel hackers and what several other large applications do. As it looks like it's what we would have to do if we ever get to use fallocate for relation extension where we would have actual benefits from it. Does that actually end up doing anything different from what we were doing pre-patch here? At best, it *might* end up using a larger extent, but unless we can actually be confident that it does, I'm not convinced the additional complexity is worth it and would rather see this simply reverted. One might ask why the kernel guys aren't doing this themselves or figuring out why it's necessary to make it worthwhile. The larger picture is that that isn't the committed behavior, but a different one, one which would need performance testing. At this point, I vote for reverting the patch and allowing it to be resubmitted for a fresh round of testing with the zeroing added. And this time we'll need to do the testing more carefully. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
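For reference, the "zeroing after the fallocate()" behavior under discussion would look something like this standalone sketch (the constants and the function are invented; this is not the committed, or to-be-reverted, patch):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define SEG_SIZE	(16 * 1024 * 1024)	/* WAL segment size */
#define BLK_SIZE	8192				/* XLOG block size */

static int
allocate_wal_segment(int fd)
{
	char		zbuffer[BLK_SIZE];
	off_t		written;

	/* reserve the space first, so ENOSPC surfaces up front */
	if (posix_fallocate(fd, 0, SEG_SIZE) != 0)
		return -1;

	/*
	 * Then overwrite with zeros anyway, as suggested, so that later WAL
	 * writes don't pay for unwritten-extent conversion in the filesystem.
	 */
	memset(zbuffer, 0, sizeof(zbuffer));
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	for (written = 0; written < SEG_SIZE; written += BLK_SIZE)
	{
		if (write(fd, zbuffer, BLK_SIZE) != BLK_SIZE)
			return -1;
	}
	return 0;
}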
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On 2013-09-04 15:01:57 -0700, Peter Geoghegan wrote: On Wed, Sep 4, 2013 at 1:26 PM, Robert Haas robertmh...@gmail.com wrote: Concurrent readers will block in a non-interruptible wait if they try to access a buffer, and that's a situation that will be intolerable if, for example, it can persist across a disk I/O. And I don't see any way to avoid that. Then I have some bad news for you - that's already quite possible. _bt_insertonpg() is called with the very same buffer exclusive locked, and is where we do btree page splits. The first thing that _bt_split does is this: /* Acquire a new page to split into */ rbuf = _bt_getbuf(rel, P_NEW, BT_WRITE); (Obviously this may ultimately result in the storage manager extending the index relation). I don't think that's an argument for much, TBH. Those operations are much less heavyweight than the ones you're proposing to hold the pages locked over, and there actually is a forward-progress guarantee. And it's very hard to avoid locking a page exclusively once you've decided that you need to split the page. You cannot just release the lock while you look for a victim buffer. I think that Andres and I ought to re-frame this discussion a little bit. Right now, the presumption we seem to be making, perhaps without even realizing it, is this is about providing functionality equivalent to MySQL's INSERT IGNORE; insert tuples proposed for insertion where possible, otherwise do not. However, Andres and I, not to mention almost every Postgres user, are actually much more interested in something like INSERT...ON DUPLICATE KEY LOCK FOR UPDATE. Yes, the promises approach gets more advantageous if you think about UPSERT, because most of the work will be paid off when the UPDATE occurs. Maybe this will enable Andres to refute my position that the buffer locking approach to speculative insertion/value locking may actually be acceptable. Sorry to be harsh here, but I don't think I need to do that. I've explained most of the reasons I see that that approach won't work out, and so far I don't see those refuted. And to me those issues seem to be fatal for the approach. If you find a solution to the problems noted above - great. So far it seems neither Robert nor I see how that is possible, but that obviously doesn't mean it's impossible that you find a way. But why should I argue further until you prove me wrong (a newer patch or an explanation of changed algorithms)? If you don't think my arguments are valid, well, I've brought up those I see as relevant and that's it. Can't do much further. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Wed, Sep 4, 2013 at 3:39 PM, Andres Freund and...@2ndquadrant.com wrote: Sorry to be harsh here, but I don't think I need to do that. I've explained most of the reasons I see that that approach won't work out and so far I don't see those refuted. And to me those issues seem to be fatal for the approach. If you find a solution to the problems noted uppon - great. So far it seems neither Robert nor me see how that is possible, but that obviously doesn't mean it's impossible that you find a way. It seems you've misunderstood. My position was that I don't think it's terribly useful to have a discussion about approaches to value locking without considering how that needs to fit in with row locking too. So it makes sense as an immediate goal to introduce that into the patch. Any scheme is going to be constrained by having to think about the interplay with value locking and row locking going forward. For example, in my scheme, I couldn't block on locking the row if that meant that buffer locks would be held indefinitely. There are also deadlock hazards there for either scheme that must be carefully considered. What possible objection could you have? Suppose it was the case that I was dead set on using buffer locking like this, because I'm stubborn or whatever. I've just made life harder for myself, while probably not also putting the same degree of burden on alternative proposals. Maybe I am stubborn, but I don't think I'm stubborn about the basic approach taken in this particular patch. I've merely been pointing out, as I feel is my role as a participant in the community's adversarial system of reaching agreement, the problems that exist with your proposal, and some inconsistencies in your objections to mine. Obviously the basic approach will remain the most difficult and probably controversial part of this. Even if I threw my hands up and immediately accepted everything you said, that would still be true. We need to get all of the constraints in place sooner rather than later. But why should I argue further until you proof me wrong (newer patch or explaining changed algorithms)? I didn't ask you to. You shouldn't. If you don't think my arguments are valid, well, I've brought those up I see as relevant and that's it. Can't do much further. Uh, I just said that I thought your arguments were totally valid. I couldn't have been clearer about that. Actually, I'm pretty surprised that you haven't been the one insisting that I add a row locking component from quite early on for exactly these reasons. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
Hi, We seem to be miscommunicating a bit. You've proposed a framework and algorithm for something I'd really, really like to see. I don't think that it can work exactly as you proposed, so I roughly sketched out a solution I can see. I don't want my idea to win, I want an idea to win. I haven't fleshed out my idea to the point where I would consider it ready to implement. You're the patch author here, whose plans are laid open to be scrutinized ;). If you think my idea has merit, use and adapt it to reality. If not, find another, better solution. Even if our path to that goal is confrontational at times, the goal is to find a good solution, not the argument itself. I haven't argued about INSERT ... DUPLICATE LOCK because the page locking scheme doesn't seem to work out for plain DUPLICATE. No need to think/argue about the fancier version in that case. Regards, Andres Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Wed, Sep 4, 2013 at 5:08 PM, Andres Freund and...@2ndquadrant.com wrote: I don't want my idea to win, I want an idea to win. I know. I want the same thing. You're the patch author here whose plans are laid open to be scrutinized ;). If you think my idea has merit, use and adapt it to reality. If not, find another, better, solution. Sure. Even if our path to that goal is confrontational at times, the goal is to find a good solution, not the argument itself. Agreed. I haven't argued about INSERT ... DUPLICATE LOCK because the page locking scheme doesn't seem to work out for plain DUPLICATE. No need to think/argue about the fancier version in that case. I see where you're coming from, but my point is precisely that adding a row locking component *isn't* fancier. I've come to realize that it's an integral part of the patch, and that my previous omission of row locking - and the subsequent defence of that decision I made in passing - was ridiculous. In a world where IGNORE/not locking is a feature we support, it can only exist as an adjunct to ON DUPLICATE KEY LOCK - certainly not the other way around. The tail cannot be allowed to wag the dog. In posting the patch with a row locking component, I'll only be asking you to consider that aspect separately. You may find that seeing the problems I encounter and how I handle them will make you (or others) re-assess your thoughts on value locking in a direction that nobody expects right now. Equally, I myself may reassess things. Now, I don't guarantee that that's the case, but it certainly seems very possible. And so even if I were to concede right now that the buffer locking approach is not workable, I feel it would be a little premature to seriously get down to talking about the alternatives in detail. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On 05/09/13 08:26, Robert Haas wrote: On Sat, Aug 31, 2013 at 2:34 PM, Andres Freund and...@2ndquadrant.com wrote: After some thinking I don't think any solution primarily based on holding page level locks across other index operations is going to scale ok. I'd like to chime in with a large +1 for this sentiment and pretty much everything else Andres said further downthread. The operations across which you're proposing to hold buffer locks seem at least an order of magnitude too complex to get away with something like that. Concurrent readers will block in a non-interruptible wait if they try to access a buffer, and that's a situation that will be intolerable if, for example, it can persist across a disk I/O. And I don't see any way to avoid that. One possible alternative to inserting promises into the index pages themselves might be to use some kind of heavyweight lock. The way that SIREAD locks work is not entirely dissimilar to what's needed here, I think. Of course, the performance implications of checking for lots of extra locks during inserts could be pretty bad, so you'd probably need some way of avoiding that in common cases, which I don't know exactly how to do, but maybe there's a way. How about an 'Expensive bit' (of course, renamed to sound more professional and to better indicate what it does!) - if the bit is set, then do the expensive processing. This should have minimal impact in the common case, so extensive checking would only be required when lots of locks need to be checked. I strongly suspect that the situation is way more complicated than I imply above - but possibly a more sophisticated version of the above might help?
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Tue, Jan 8, 2013 at 08:40:44PM -0500, Andrew Dunstan wrote: On 01/08/2013 08:08 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane t...@sss.pgh.pa.us wrote: ... And I don't especially like the idea of trying to make it depend directly on the box's physical RAM, for the same practical reasons Robert mentioned. For the record, I don't believe those problems would be particularly hard to solve. Well, the problem of finding out the box's physical RAM is doubtless solvable if we're willing to put enough sweat and tears into it, but I'm dubious that it's worth the trouble. The harder part is how to know if the box is supposed to be dedicated to the database. Bear in mind that the starting point of this debate was the idea that we're talking about an inexperienced DBA who doesn't know about any configuration knob we might provide for the purpose. I'd prefer to go with a default that's predictable and not totally foolish --- and some multiple of shared_buffers seems like it'd fit the bill. +1. That seems to be by far the biggest bang for the buck. Anything else will surely involve a lot more code for not much more benefit. I have developed the attached patch which implements an auto-tuned effective_cache_size which is 4x the size of shared buffers. I had to set effective_cache_size to its old 128MB default so the EXPLAIN regression tests would pass unchanged. I considered a new available_ram variable but that just gives us another variable, and in a way shared_buffers is a fixed amount, while effective_cache_size is an estimate, so I thought driving everything from shared_buffers made sense.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 23ebc11..de2374b
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** include 'filename'
*** 2758,2764 ****
     <para>
      Random access to mechanical disk storage is normally much more expensive
!     than four-times sequential access.  However, a lower default is used
      (4.0) because the majority of random accesses to disk, such as indexed
      reads, are assumed to be in cache.  The default value can be thought of
      as modeling random access as 40 times slower than sequential, while
--- 2758,2764 ----
     <para>
      Random access to mechanical disk storage is normally much more expensive
!     than four times sequential access.  However, a lower default is used
      (4.0) because the majority of random accesses to disk, such as indexed
      reads, are assumed to be in cache.  The default value can be thought of
      as modeling random access as 40 times slower than sequential, while
*************** include 'filename'
*** 2841,2849 ****
      <listitem>
       <para>
        Sets the planner's assumption about the effective size of the
!       disk cache that is available to a single query.  This is
!       factored into estimates of the cost of using an index; a
!       higher value makes it more likely index scans will be used, a
        lower value makes it more likely sequential scans will be
        used. When setting this parameter you should consider both
        <productname>PostgreSQL</productname>'s shared buffers and the
--- 2841,2857 ----
      <listitem>
       <para>
        Sets the planner's assumption about the effective size of the
!       disk cache that is available to a single query.  The default
!       setting of -1 selects a size equal to four times the size of
!       <xref linkend="guc-shared-buffers">, but not less than the size of one
!       shared buffer page, typically <literal>8kB</literal>.  This value
!       can be set manually if the automatic choice is too large or too
!       small.
!      </para>
!
!      <para>
!       This value is factored into estimates of the cost of using an index;
!       a higher value makes it more likely index scans will be used, a
        lower value makes it more likely sequential scans will be
        used. When setting this parameter you should consider both
        <productname>PostgreSQL</productname>'s shared buffers and the
*************** include 'filename'
*** 2855,2862 ****
        memory allocated by <productname>PostgreSQL</productname>, nor
        does it reserve kernel disk cache; it is used only for estimation
        purposes.  The system also does not assume data remains in
!       the disk cache between queries.  The default is 128 megabytes
!       (<literal>128MB</>).
       </para>
      </listitem>
     </varlistentry>
--- 2863,2872 ----
        memory allocated by <productname>PostgreSQL</productname>, nor
        does it reserve kernel disk cache; it is used only for
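The auto-tuning rule itself is tiny; the helper below restates it in code form (the function is invented for illustration and paraphrases the description above, with values measured in BLCKSZ-sized blocks as for the GUC):

static int
auto_effective_cache_size(int effective_cache_size, int NBuffers)
{
	if (effective_cache_size == -1)
	{
		int			derived = NBuffers * 4;	/* 4x shared_buffers */

		/* but never less than one shared buffer page */
		return (derived < 1) ? 1 : derived;
	}
	return effective_cache_size;
}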
Re: [HACKERS] Frontend/backend protocol improvements proposal (request).
On Fri, Jun 21, 2013 at 12:37:32PM +0400, Dmitriy Igrishin wrote: 2013/6/21 Albe Laurenz laurenz.a...@wien.gv.at Dmitriy Igrishin wrote: While developing a C++ client library for Postgres I felt the lack of extra information in command tags in the CommandComplete (B) message for the following commands: PREPARE; DEALLOCATE; DECLARE; CLOSE; LISTEN; UNLISTEN; SET; RESET. Namely, for example, users of my library can prepare statements by using the protocol directly or via the PREPARE command. Since the protocol does not support prepared statement deallocation, I wrote a wrapper over the DEALLOCATE command. The library knows about all prepared statements and invalidates them automatically when the user performs the deallocate() wrapper. But users can go with the DEALLOCATE command directly, and in these cases I need to query the database to get the list of currently prepared statements whenever a CommandComplete message with the DEALLOCATE command tag is consumed. Moreover, I need to do it *synchronously*, and this breaks the asynchronous API. I propose to include the name of the object in the CommandComplete (B) message for the above commands. That would be a change in the protocol, so it's not likely to happen soon. There is a page where proposed changes to the wire protocol are collected: http://wiki.postgresql.org/wiki/Todo#Wire_Protocol_Changes Well, even if this proposal moves to the TODO, it would be nice. That's worth at least considering when we start to revise the protocol, so I have added it to the TODO list. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
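To illustrate the proposal concretely, the backend side could be as small as the fragment below (an invented fragment, not a patch; completionTag and EndCommand() follow the existing backend conventions). Today the client just sees the bare tag "DEALLOCATE"; with the object name included, a driver could invalidate its cached statement without the extra synchronous query described above.

	char		completionTag[COMPLETION_TAG_BUFSIZE];

	/* proposed: "DEALLOCATE my_stmt" instead of the bare "DEALLOCATE" */
	snprintf(completionTag, sizeof(completionTag),
			 "DEALLOCATE %s", stmt->name);
	EndCommand(completionTag, dest);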
Re: [HACKERS] strange IS NULL behaviour
On Tue, Sep 3, 2013 at 09:32:44PM -0400, Bruce Momjian wrote: In this test, SELECT NULL (which internally would produce SELECT ROW(NULL)), returns TRUE, while SELECT ROW(NULL) and further nesting return false. This has made me adjust my goal and change it so SELECT ROW(NULL) IS NULL returns true, and any further nesting returns false. Attached is a patch which accomplishes this, and a documentation update. I have not heard any feedback on this patch, so I would like to apply it to give us a nested ROW/IS NULL API we can document. It would have to be marked in the release notes as a backward incompatibility. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
On 07/23/2013 09:42 PM, Craig Ringer wrote: (Replying on phone, please forgive bad quoting) Isn't this pretty much what adopting ICU is supposed to give us? OS-independent collations? I'd be interested in seeing the test data for this performance report, partly as I'd like to see how ICU collations would compare when ICU is crudely hacked into place for testing. I have a new idea: add a compare-method column to pg_collation, so that every collation has its own compare function, or NULL. When varstr_cmp() is called, if the specified collation has a compare function, call it instead of strcoll(). How about this? Regards. Quan Zongliang -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
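As a rough sketch of that dispatch (the lookup function and the function-pointer type are hypothetical - pg_collation has no such column today - while varstr_cmp() itself is the existing function):

typedef int (*collation_cmp_fn) (const char *arg1, int len1,
								 const char *arg2, int len2);

static int
varstr_cmp_with_method(const char *arg1, int len1,
					   const char *arg2, int len2, Oid collid)
{
	/* hypothetical lookup of the proposed pg_collation column */
	collation_cmp_fn cmp = lookup_collation_compare_method(collid);

	if (cmp != NULL)
		return cmp(arg1, len1, arg2, len2);	/* locale-specific fast path */

	/* otherwise fall back to the existing strcoll()-based path */
	return varstr_cmp((char *) arg1, len1, (char *) arg2, len2, collid);
}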
Re: [HACKERS] [tiny doc fix] statistics are not retained across immediate shutdown
On Wed, Sep 4, 2013 at 11:56 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-04 10:53:19 -0400, Tom Lane wrote: Tsunakawa, Takayuki tsunakawa.ta...@jp.fujitsu.com writes: I propose a tiny fix to clarify this. Please find the attached patch. That's not an accurate description of what happens, though. AFAIR, we do not throw away pg_stats files as a result of recovery. StartupXLOG() does a pgstat_reset_all() in the if (InRecovery) branch. Yes. And this causes a problem: the statistics data is reset at startup even after a clean shutdown of the standby server. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Is it necessary to rewrite table while increasing the scale of datatype numeric???
On 2013-09-04 19:30, Noah Misch wrote: On Wed, Sep 04, 2013 at 12:08:48PM +0800, wangs...@highgo.com.cn wrote: I find that it takes a long time when I increase the scale of a numeric datatype. By checking the code, I found that's because it needs to rewrite that table's file. After checking that table's data file, I found only parameter n_header changed. And, I found the data in that numeric field never changed. So I think it's not necessary to rewrite the table's file in this case. Noah Misch n...@leadboat.com wrote: n_header is part of the numeric field's data. That's not just pedantry: the display scale stored in n_header affects how numeric_out() formats the value. Thanks for your reply. Precisely because of what you said, I think increasing the scale only leads to a different display. There's no difference between 5.25 and 5.2500 in use. So there's no need to rewrite the table. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
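For what it's worth, the display scale really is user-visible: numeric_out() consults the scale stored in the header when formatting. A tiny illustrative fragment (based on my reading of utils/adt/numeric.c; treat it as a sketch, not authoritative):

	/* d1 holds 5.25 stored as numeric(10,2); d2 the same digits as numeric(10,4) */
	char	   *s1 = DatumGetCString(DirectFunctionCall1(numeric_out, d1));	/* "5.25" */
	char	   *s2 = DatumGetCString(DirectFunctionCall1(numeric_out, d2));	/* "5.2500" */

So whether "5.25" versus "5.2500" matters "in use" is exactly the point of disagreement here.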
Re: [HACKERS] [rfc] overhauling pgstat.stat
(2013/09/05 3:50), Pavel Stehule wrote: we very successfully use a tmpfs volume for pgstat files (using a backport of multiple statfiles from 9.3 to 9.1). It works quite well as long as you have the objects (tables, indexes, functions) spread across multiple databases. Once you have one database with a very large number of objects, tmpfs is not as effective. It's going to help with stats I/O, but it's not going to help with high CPU usage (you're reading and parsing the stat files over and over), and every rewrite creates a copy of the file. So if you have 400MB stats, you will need 800MB tmpfs + some slack (say, 200MB). That means you'll use ~1GB tmpfs although 400MB would be just fine. And this 600MB won't be used for page cache etc. OTOH, it's true that if you have that many objects, 600MB of RAM is not going to help you anyway. And just an idea - can we use a database for storing these files? The data could be kept in unlogged tables. Second idea - have one bgworker act as a persistent in-memory key-value database, holding the data in memory with some optimizations - using anti-caching and similar in-memory database features. Yeah, I'm interested in this idea too. If the stat collector has a dedicated connection to the backend in order to store statistics into dedicated tables, we can easily take advantage of index (btree, or hash?) and heap storage. Is this worth trying? Regards, -- Satoshi Nagayasu sn...@uptime.jp Uptime Technologies, LLC. http://www.uptime.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME
On 2013-09-05 01:56, Robert Haas wrote: On Tue, Sep 3, 2013 at 3:13 AM, wangs...@highgo.com.cn wrote: On 2013-09-03 08:15, David Johnston wrote: Jeff Davis-8 wrote Is there any semantic difference between marking a constraint as DISABLED and simply dropping it? Or does it just make it easier to re-add it later? David Johnston wrote: I cannot answer the question but if there is none then the main concern I'd have is capturing meta-information about WHY such a constraint has been disabled instead of dropped. Dropping/rebuilding and disabling/enabling a constraint have no fundamental difference, and achieve the same purpose. What I did is also more convenient for the user. Recording the disabled constraints is easier than recording all the constraints. What's more, a lot of people have asked about turning off constraints, and SQL:2008 supports this. So I think it's necessary in some ways. Please add your patch to the upcoming CommitFest so we don't forget about it. https://commitfest.postgresql.org/action/commitfest_view/open Please see also https://wiki.postgresql.org/wiki/Submitting_a_Patch -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company I have added the patch to the Server Features section (https://commitfest.postgresql.org/action/commitfest_view/open). Is this right? If not, please give me more advice, thanks! Wang Shuo HighGo Software Co.,Ltd. September 5, 2013 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
(2013/09/05 3:59), Alvaro Herrera wrote: Tomas Vondra wrote: My idea was to keep the per-database stats, but allow some sort of random access - updating / deleting the records in place, adding records etc. The simplest way I could think of was adding a simple index - a mapping of OID to position in the stat file. I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or something like that. This would make it quite simple to access an existing record:

1: get the position from the index
2: read sizeof(Entry) from the file
3: if it's an update, just overwrite the bytes; for a delete, set an isdeleted flag (needs to be added to all entries)

or to read all the records (just read the whole file as today). Sounds reasonable. However, I think the index should be a real index, i.e. have a tree structure that can be walked down, not just a plain array. If you have a 400 MB stat file, then you must have about 4 million tables, and you will not want to scan such a large array every time you want to find an entry. I thought of an array structure at first. But, for now, I think we should have a real index for the statistics data, because we already have several index implementations, and it will allow us to minimize read/write operations. BTW, what kind of index would be preferred for this purpose? btree or hash? If we use btree, do we need range scans on the statistics tables? I have no idea so far. Regards, -- Satoshi Nagayasu sn...@uptime.jp Uptime Technologies, LLC. http://www.uptime.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
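To make the simple variant concrete, here is a minimal sketch of what update-in-place through a flat (oid, offset) index could look like. All names except PgStat_StatTabEntry and Oid are invented for illustration, the append and delete paths are omitted, and whether the index should instead be a tree, as suggested above, is exactly the open question.

#include <stdio.h>

typedef struct StatIndexEntry
{
	Oid			oid;			/* table oid */
	off_t		offset;			/* byte position of its entry in oid.stat */
} StatIndexEntry;

static int
update_table_stats(FILE *statfile, const StatIndexEntry *index, int nentries,
				   Oid tableoid, const PgStat_StatTabEntry *newentry)
{
	int			i;

	/* linear scan; a sorted array could use a binary search instead */
	for (i = 0; i < nentries; i++)
	{
		if (index[i].oid == tableoid)
		{
			if (fseeko(statfile, index[i].offset, SEEK_SET) != 0)
				return -1;
			/* overwrite just this entry, typically well under 100 bytes */
			if (fwrite(newentry, sizeof(*newentry), 1, statfile) != 1)
				return -1;
			return 0;
		}
	}
	return -1;					/* not found; append path omitted */
}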
Re: [HACKERS] [rfc] overhauling pgstat.stat
Sent from my iPad On 05-Sep-2013, at 8:58, Satoshi Nagayasu sn...@uptime.jp wrote: (2013/09/05 3:59), Alvaro Herrera wrote: Tomas Vondra wrote: My idea was to keep the per-database stats, but allow some sort of random access - updating / deleting the records in place, adding records etc. The simplest way I could think of was adding a simple index - a mapping of OID to position in the stat file. I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or something like that. This would make it quite simple to access existing record 1: get position from the index 2: read sizeof(Entry) from the file 3: if it's update, just overwrite the bytes, for delete set isdeleted flag (needs to be added to all entries) or reading all the records (just read the whole file as today). Sounds reasonable. However, I think the index should be a real index, i.e. have a tree structure that can be walked down, not just a plain array. If you have a 400 MB stat file, then you must have about 4 million tables, and you will not want to scan such a large array every time you want to find an entry. I thought an array structure at first. But, for now, I think we should have a real index for the statistics data because we already have several index storages, and it will allow us to minimize read/write operations. BTW, what kind of index would be preferred for this purpose? btree or hash? If we use btree, do we need range scan thing on the statistics tables? I have no idea so far. The thing I am interested in is range scan. That is the reason I wish to explore range tree usage here, maybe as a secondary index. Regards, Atri -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Thu, Sep 5, 2013 at 1:55 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-04 09:23:20 -0700, Jeff Davis wrote: On Wed, 2013-09-04 at 11:32 -0400, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: I think code complexity matters quite a lot. If we can eliminate some complex code in a complex area, and all we give up is a feature with essentially no use case, that sounds like we're moving in the right direction to me. Isn't this whole discussion academic in view of Andres' point? Maybe complex code was an overstatement. We'd be able to eliminate the XLOG_FPW_CHANGE, UpdateFullPageWrites(), and one of the members of XLogCtlInsert; and make xlog.c slightly shorter in the process. That path is also executed during a normal restart and during promotion. Check the invocation of UpdateFullPageWrites() in StartupXLOG(). Note that a standby needs to be able to follow a primary's full_page_writes setting during a promotion. Yes, this is required for taking a backup from the standby. If we change the GUC context to PGC_POSTMASTER, I think that we can remove XLOG_FPW_CHANGE and treat full_page_writes the same way as wal_level and max_connections, i.e., as a parameter which CheckRequiredParameterValues() handles. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
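If full_page_writes did become PGC_POSTMASTER, the standby-side check could plausibly follow the existing pattern in CheckRequiredParameterValues(). The fragment below is purely a sketch of that idea: pg_control does not currently carry a fullPageWrites field, so both the field and the exact condition are assumptions for illustration.

	/* hypothetical addition to CheckRequiredParameterValues() in xlog.c */
	if (InArchiveRecovery && EnableHotStandby)
	{
		/* assumed new pg_control field recording the primary's setting */
		if (!ControlFile->fullPageWrites)
			ereport(ERROR,
					(errmsg("hot standby is not possible because full_page_writes was off on the primary")));
	}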
Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?
On 2013-09-04 23:41, Jeff Janes wrote: On Tue, Sep 3, 2013 at 9:08 PM, wangs...@highgo.com.cn wrote: Hi, Hackers! I find that it takes a long time when I increase the scale of a numeric datatype. By checking the code, I found that's because it needs to rewrite that table's file. After checking that table's data file, I found only parameter n_header changed. And, I found the data in that numeric field never changed. So I think it's not necessary to rewrite the table's file in this case. Anyone who has more ideas about this, please come talk about it! Jeff Janes jeff.ja...@gmail.com wrote: This was fixed in version 9.2. You must be using an older version. Cheers, Jeff

Thanks for your reply. To declare a column of type numeric, use the syntax: NUMERIC(precision, scale). What I am talking about is this scale, not the case you mean. I made a test on PG 9.2, as follows:

postgres=# select version();
                                                   version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.7.2 20121109 (Red Hat 4.7.2-8), 64-bit
(1 row)

postgres=# create table tt(t1 numeric(10,5));
CREATE TABLE
postgres=# insert into tt values (2.56);
INSERT 0 1
postgres=# select pg_relation_filepath('tt');
 pg_relation_filepath
----------------------
 base/12914/16384
(1 row)

postgres=# alter table tt alter COLUMN t1 type numeric(10,6);
ALTER TABLE
postgres=# select pg_relation_filepath('tt');
 pg_relation_filepath
----------------------
 base/12914/16387
(1 row)

The relfilenode changed, so the table was still rewritten on 9.2, even though, as I argued above, there's no need to rewrite the table.

Wang Shuo HighGo Software Co.,Ltd. September 5, 2013 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
Satoshi Nagayasu wrote: But, for now, I think we should have a real index for the statistics data because we already have several index storages, and it will allow us to minimize read/write operations. BTW, what kind of index would be preferred for this purpose? btree or hash? I find it hard to get excited about using the AM interface for this purpose. To me it makes a lot more sense to have separate, much simpler code. We don't need any transactionality, user defined types, user defined operators, or anything like that. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers