Re: [HACKERS] Wal sync odirect
On 07/21/2013 10:01 PM, Миша Тюрин wrote:
> Hi, list. Here is my proposal: I would like to suggest O_DIRECT for WAL sync when wal_level is higher than minimal. In my case, WAL traffic is up to 1 GB per 2-3 minutes, but the disk hardware has a 2 GB BBU cache (or maybe an SSD under the WAL), so it would be better if WAL traffic could not harm OS memory eviction. And I do not use streaming; my archive command may read WAL directly, without the OS cache. Just an opinion - I have not done any tests yet, but I am still seeing some memory eviction anomaly.

PostgreSQL already uses O_DIRECT for WAL writes if you use O_SYNC mode for WAL writes. See the comments in src/include/access/xlogdefs.h (search for O_DIRECT). You should also examine src/backend/access/transam/xlog.c, particularly the function get_sync_bit(...)

Try doing some tests with pg_test_fsync, see how performance looks. If your theory is right and WAL traffic is putting pressure on kernel write buffers, using fsync=open_datasync - which should be the default on Linux - may help.

I'd recommend doing some detailed tracing and performance measurements before trying to proceed further.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
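To make the dependency Craig mentions concrete, here is a hedged paraphrase of the decision get_sync_bit() makes: O_DIRECT is only added for the open-and-sync methods, and only when nothing (archiver, walsender, wal_level != minimal) will need to re-read the WAL through the OS cache. The constant names and the wal_read_back_later flag below are simplified stand-ins for the real GUC plumbing, so treat this as a sketch, not the actual xlog.c code:

    #define _GNU_SOURCE             /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdbool.h>

    enum { SYNC_METHOD_FSYNC, SYNC_METHOD_OPEN_SYNC, SYNC_METHOD_OPEN_DSYNC };

    /* Hedged paraphrase of xlog.c's get_sync_bit(); simplified names. */
    static int
    get_sync_bit_sketch(int sync_method, bool wal_read_back_later)
    {
        /* If the WAL will be re-read later, avoid bypassing the OS cache. */
        int o_direct_flag = wal_read_back_later ? 0 : O_DIRECT;

        switch (sync_method)
        {
            case SYNC_METHOD_FSYNC:      /* buffered write, fsync later */
                return 0;
            case SYNC_METHOD_OPEN_SYNC:  /* fsync=open_sync */
                return O_SYNC | o_direct_flag;
            case SYNC_METHOD_OPEN_DSYNC: /* fsync=open_datasync */
                return O_DSYNC | o_direct_flag;
            default:
                return 0;
        }
    }

Note how this matches Craig's point: with plain fsync/fdatasync methods no O_DIRECT bit is ever returned, which is why only the O_SYNC/O_DSYNC modes get direct I/O today.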
Re: [HACKERS] Proposal: template-ify (binary) extensions
On 07/22/2013 12:11 AM, Hannu Krosing wrote:
>> Dropping this barrier by installing an untrusted PL (or equally insecure extensions), an attacker with superuser rights can trivially gain root.
> Could you elaborate? This is equivalent to claiming that any linux user can trivially gain root.

Uh. Sorry, you're of course right: the attacker can only gain postgres rights in that case. Thanks for correcting. The point still holds. It's another layer that an attacker would have to overcome.

> You already mentioned untrusted PL languages, and I don't see any difference between offering PL/pythonu and PL/C on security grounds, really.

I agree.

>> However, this also means that any kind of solution it offers is not a good one for the security conscious sysadmin.
> This is usually the case with a security conscious sysadmin - they very seldom want to install anything.

Exactly. That's why I'm favoring solutions that don't require any extension and keep the guarantee of preventing arbitrary native code.

Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reduce maximum error in tuples estimation after vacuum.
Hello,

I'm very sorry to take your time on this mess.

On Jul 2013 16:06:11 +0530, Amit Kapila amit.kap...@huawei.com wrote in 014201ce7bc6$f71eb950$e55c2bf0$@kap...@huawei.com:
> I understood your patch's algorithm, but I still have a doubt in my mind: if the next analyze can correct the estimates, why would that not be sufficient? Please refer to my last mail for an analysis of the same: http://www.postgresql.org/message-id/000601ce77ad$7d3388e0$779a9aa0$@kapila@huawei.com

Hmm. I've reconfirmed what happened on my test set. As a result, the misestimation with dead_tup = 0 which I thought I had observed has turned out to be an illusion. Tuple number estimation is working as expected.

I withdraw this patch.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Wal sync odirect
I am talking about wal_level higher than MINIMAL, i.e. wal_level != minimal:

http://doxygen.postgresql.org/xlogdefs_8h_source.html

48  * Because O_DIRECT bypasses the kernel buffers, and because we never
49  * read those buffers except during crash recovery or if wal_level != minimal

> PostgreSQL already uses O_DIRECT for WAL writes if you use O_SYNC mode for WAL writes. See the comments in src/include/access/xlogdefs.h (search for O_DIRECT). You should also examine src/backend/access/transam/xlog.c, particularly the function get_sync_bit(...)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [HACKERS] Wal sync odirect
On 07/22/2013 03:30 PM, Миша Тюрин wrote:
> I am talking about wal_level higher than MINIMAL.

OK, so you want to be able to force O_DIRECT for wal_level = archive? I guess that makes sense if you expect the archive_command to read the file out of the RAID controller's write cache before it gets flushed, and your archive_command can also use direct I/O to avoid pulling it into cache.

You already know where to change to start experimenting with this. What exactly are you trying to ask?

I don't see any risk in forcing O_DIRECT for higher wal_level, but I'm not an expert in WAL and recovery. I'd recommend testing on a non-critical PostgreSQL instance.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
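On the archive side of the idea, a hedged sketch of how an archive_command helper might read a segment and then ask the kernel to drop it from the page cache. Everything here (function name, buffer size, error handling) is illustrative, not part of PostgreSQL:

    #include <fcntl.h>
    #include <unistd.h>

    /*
     * Copy a WAL segment to an already-open destination, then advise the
     * kernel that the source pages will not be needed again, so archiving
     * does not evict more useful data from the OS cache.
     */
    static int
    copy_wal_segment_uncached(const char *path, int dest_fd)
    {
        int     fd = open(path, O_RDONLY);
        char    buf[65536];
        ssize_t n = 0;

        if (fd < 0)
            return -1;
        while ((n = read(fd, buf, sizeof(buf))) > 0)
        {
            if (write(dest_fd, buf, (size_t) n) != n)
                break;
        }
        /* Drop the cached pages for this file; ignore advisory failure. */
        (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
        return (n < 0) ? -1 : 0;
    }

This only addresses the cache pollution caused by the archiver's reads; the O_DIRECT question for the WAL writer itself is separate.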
Re: [HACKERS] Adding new joining alghoritm to postgresql
On 07/19/2013 09:47 PM, tubadzin wrote:
> Hi. I'm a little confused. 1. I have the source code of version 9.2.4 from http://www.postgresql.org/ftp/source/ 2. I want to add a new algorithm to index nested loop join, merge join and hash join. I have the executor directory in src containing nodeHash.c, nodeHashjoin.c, nodeMergejoin.c and nodeNestloop.c. 3. After changes, I want to compile postgresql and use it. 4. Problem is: a) I do not know which library is responsible for this functionality. I understand that I have to compile src and replace the library (I don't know which library) in the path where PostgreSQL is installed: C:\Program Files (x86)\PostgreSQL\9.2 b) I don't know how to use the files/library (which library?) with Visual Studio 2010 and how to compile it.

Start here: http://www.postgresql.org/docs/current/static/install-windows-full.html

You don't need to install all the dependencies when you're just compiling a copy for testing. You might find this tool I wrote a while ago interesting; it tries to automate downloading and compiling dependencies, creation of config.pl, etc: https://github.com/2ndQuadrant/pg_build_win

Once you've successfully compiled PostgreSQL, start reading the planner/executor sources. You will find this documentation quite useful when trying to understand the code: http://www.postgresql.org/docs/current/static/internals.html as well as: http://www.postgresql.org/files/developer/tour.pdf http://momjian.us/main/presentations/internals.html

Working with the query planner and adding node types is NOT the easiest way to get started with the PostgreSQL source code, though! You will have a lot of learning ahead of you. Consider trying to explain in greater detail what your idea is. See if anybody here has already tried it, and make sure you're not exploring a dead end. Get ideas and suggestions on how to approach the problem before you start work on it.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
On 07/22/2013 12:17 PM, Quan Zongliang wrote:
> Hi hackers, I tried to improve performance when the database is Chinese. Under openSUSE, create index on a table with 54996 rows: locale=C, 140ms; locale=zh_CN, 985ms. I think the function strcoll() of Linux is too slow. So I made a new utf8-to-GB18030 map and store the Chinese order in it, and do not call strcoll(). With my modified code, the same operation with locale=zh_CN takes 203ms.

It might be worth looking at gcc's strcoll() implementation. See if it performs better when you use the latest gcc, and if not, try to improve gcc's strcoll().

I'd be interested in seeing a test case for this that shows that the results of your new collation are exactly the same as the original strcoll()-based approach.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses
(2013/07/19 22:48), Greg Smith wrote:
> On 7/19/13 3:53 AM, KONDO Mitsumasa wrote:
>> Recently, a user who thinks system availability is important uses a synchronous replication cluster.
> If your argument for why it's OK to ignore bounding crash recovery on the master is that it's possible to failover to a standby, I don't think that is acceptable. PostgreSQL users certainly won't like it.

OK. I will also test recovery time. However, I am considering better practice now, and I will test it with a new patch.

>> I want you to read especially the points at lines 631, 651, and 656. MAX_WRITEBACK_PAGES is 1024 (1024 * 4096 bytes).
> You should read http://www.westnet.com/~gsmith/content/linux-pdflush.htm to realize everything you're telling me about the writeback code and its congestion logic I knew back in 2007. The situation is even worse than you describe, because this section of Linux has gone through multiple, major revisions since then. You can't just say "here is the writeback source code"; you have to reference each of the commonly deployed versions of the writeback feature to tell how this is going to play out if released. There are four major ones I pay attention to: the old kernel style as seen in RHEL5/2.6.18 (that's what my 2007 paper discussed), the similar code but with very different defaults in 2.6.22, the writeback method/tuning in RHEL6/Debian Squeeze/2.6.32, and then there are newer kernels. (The newer ones separate out into a few branches too; I haven't mapped those as carefully yet.)

The part of the writeback source code which I indicated is almost the same as the community kernel (2.6.32.61). I also read Linux kernel 3.9.7, and it is almost the same in this part. I think that fs-writeback.c is easier than xlog.c; it is only 1309 lines. I think the Linux distributions differ only in tuning parameters, but share the same programming logic. Do you think I need to read the Debian kernel source code? I will read part of this code, because it is only scores of lines at most.

> There are some examples of what really bad checkpoints look like in http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/WriteStuff-PGCon2011.pdf if you want to see some of them. That's the talk I did around the same time I was trying out spreading the database fsync calls out over a longer period.

Does it occur on the ext3 or ext4 file systems? I think this is a bug in XFS. If an fsync call doesn't return, that indicates we cannot write WAL and transactions cannot return their commits. It is a serious problem. My fsync patch only sleeps after a successful return from fsync, and the maximum sleep time is set to 10 seconds. It does not make this problem worse.

> When I did that, checkpoints became even less predictable, and that was a major reason behind why I rejected the approach. I think your suggestion will have the same problem. You just aren't generating test cases with really large write workloads yet to see it. You also don't seem afraid of how exceeding the checkpoint timeout is a very bad thing yet.

I think it is important to find out why this problem was caused. We should try to find which program has the bug or problem.

>> In addition, if you set a large value of checkpoint_timeout or checkpoint_completion_target, you have said that performance is improved, but is it true in all cases?
> The timeout, yes. Throughput is always improved by increasing checkpoint_timeout. Fewer checkpoints per unit of time increase efficiency. Fewer writes of the most heavily accessed buffers happen per transaction. It is faster because you are doing less work, which on average is always faster than doing more work; doing less work usually beats doing more work, even doing it smarter. If you want to see how much work per transaction a test is doing, track the number of buffers written at the beginning/end of your test via pg_stat_bgwriter. Tests that delay checkpoints will show a lower total number of writes per transaction. That seems more efficient, but it's efficiency mainly gained by ignoring checkpoint_timeout.

OK. In the next test, I will try it.

>> When the checkpoint completion target is actually enlarged, performance may fall in some cases. I think this is because the last fsync has become heavy owing to writing too slowly.
> I think you're confusing throughput and latency here. Increasing the checkpoint timeout, or to a lesser extent the completion target, on average increases throughput. It results in less work, and the more/less work amount is much more important than worrying about scheduler details. No matter how efficient a given write is, whether you've sorted it across elevator horizon boundary A or boundary B, it's better not to do it at all.

I think the fsync which takes the longest time, or continues for a long while, blocks other transactions. And my patch not only improves throughput but also realizes stable response times during the fsync phase of a checkpoint.

> By the way: if you have a theory like "the last fsync having become heavy" for why something is
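Kondo's description above ("only sleep after a successful fsync, maximum sleep time 10 seconds") could look roughly like the sketch below. The rule for choosing the nap length, here the duration of the previous fsync, is an assumption made for illustration, not the submitted patch's exact logic:

    #include <time.h>
    #include <unistd.h>

    #define MAX_FSYNC_NAP_US (10L * 1000000L)   /* the stated 10 second cap */

    /*
     * Nap only after an fsync() that returned success, so one heavy fsync
     * does not monopolize the I/O queue during a checkpoint.
     */
    static void
    checkpoint_fsync_with_nap(int fd, long prev_fsync_duration_us)
    {
        if (fsync(fd) == 0 && prev_fsync_duration_us > 0)
        {
            long            nap_us = prev_fsync_duration_us;
            struct timespec ts;

            if (nap_us > MAX_FSYNC_NAP_US)
                nap_us = MAX_FSYNC_NAP_US;
            ts.tv_sec = nap_us / 1000000L;
            ts.tv_nsec = (nap_us % 1000000L) * 1000L;
            nanosleep(&ts, NULL);
        }
    }

Greg's objection applies exactly here: every nap pushes the checkpoint end time out, so under a heavy write load this risks exceeding checkpoint_timeout.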
Re: [HACKERS] improve Chinese locale performance
On 07/22/2013 03:54 PM, Craig Ringer wrote:
> On 07/22/2013 12:17 PM, Quan Zongliang wrote:
>> Hi hackers, I tried to improve performance when the database is Chinese. Under openSUSE, create index on a table with 54996 rows: locale=C, 140ms; locale=zh_CN, 985ms. I think the function strcoll() of Linux is too slow. So I made a new utf8-to-GB18030 map and store the Chinese order in it, and do not call strcoll(). With my modified code, the same operation with locale=zh_CN takes 203ms.
> It might be worth looking at gcc's strcoll() implementation. See if it performs better when you use the latest gcc, and if not, try to improve gcc's strcoll().
> I'd be interested in seeing a test case for this that shows that the results of your new collation are exactly the same as the original strcoll()-based approach.

They are not exactly the same. I found some errors in gcc's strcoll() when ordering Chinese characters, because there are lots of special characters in Chinese; gcc's strcoll() does not consider this, or misses some of them.

Yes, the best way is to improve gcc's strcoll(). But I don't know how.

Thanks, Quan Zongliang

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses
(2013/07/21 4:37), Heikki Linnakangas wrote:
> Mitsumasa-san, since you have the test rig ready, could you try the attached patch please? It scans the buffer cache several times, writing out all the dirty buffers for segment A first, then fsyncs it, then all dirty buffers for segment B, and so on. The patch is ugly, but if it proves to be helpful, we can spend the time to clean it up.

Thank you! It is interesting code; I will test it. By the way, my company's colleagues are helping us with testing. If you have other ideas, please send me a patch or methods.

Best regards,
--
Mitsumasa KONDO

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
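A hedged sketch of the strategy Heikki describes, with all types and helpers as hypothetical stand-ins for the real buffer manager:

    #include <unistd.h>

    typedef struct { int fd; } Segment;
    typedef struct { int dirty; int segno; } PoolBuffer;

    /* Stand-in for a real smgr write of one dirty buffer. */
    static void write_buffer(PoolBuffer *b) { b->dirty = 0; }

    /*
     * Several passes over the buffer cache: write out the dirty buffers
     * belonging to one file segment, fsync that segment, then move on.
     * Grouping the writes keeps each fsync's pending-data set small.
     */
    static void
    checkpoint_segment_at_a_time(Segment *segs, int nsegs,
                                 PoolBuffer *pool, int npool)
    {
        for (int s = 0; s < nsegs; s++)
        {
            for (int b = 0; b < npool; b++)
                if (pool[b].dirty && pool[b].segno == s)
                    write_buffer(&pool[b]);
            fsync(segs[s].fd);  /* sync segment s before starting s + 1 */
        }
    }

The trade-off being tested is the repeated scans of the buffer cache (one per segment) against fsyncs that each have far less dirty data queued behind them.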
[HACKERS] small typo in src/backend/access/transam/xlog.c
Hi,

In BootStrapXLOG(void):

	 * to seed it other than the system clock value...)  The upper half of the
	 * uint64 value is just the tv_sec part, while the lower half is the XOR
	 * of tv_sec and tv_usec.  This is to ensure that we don't lose uniqueness
	 * unnecessarily if "uint64" is really only 32 bits wide.  A person
	 * knowing this encoding can determine the initialization time of the
	 * installation, which could perhaps be useful sometimes.
	 */
	gettimeofday(&tv, NULL);
	sysidentifier = ((uint64) tv.tv_sec) << 32;
	sysidentifier |= (uint32) (tv.tv_sec | tv.tv_usec);

should be

	sysidentifier |= (uint32) (tv.tv_sec ^ tv.tv_usec);

Regards
Didier
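A quick standalone demonstration of why the comment's promised XOR differs from the OR actually written: OR saturates bits toward 1, so nearby timestamps can collide, while XOR flips the output whenever any input bit flips. The values are arbitrary examples:

    #include <stdio.h>

    int main(void)
    {
        unsigned sec   = 0x51ed2b7f;    /* example tv_sec */
        unsigned usec1 = 999999;        /* 0x000f423f */
        unsigned usec2 = 999998;        /* 0x000f423e: differs in one bit */

        /* With '|', both usec values produce the same mixed result... */
        printf("or : %08x vs %08x\n", sec | usec1, sec | usec2);
        /* ...with '^', the flipped input bit flips the output. */
        printf("xor: %08x vs %08x\n", sec ^ usec1, sec ^ usec2);
        return 0;
    }

So with '|' the lower half of the sysidentifier loses uniqueness exactly in the way the comment says it is trying to avoid.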
[HACKERS] enum-ify resource manager's xl_info values
Hi,

Would somebody object to making the rmgr's info value #defines, like:

/* XLOG info values for XLOG rmgr */
#define XLOG_CHECKPOINT_SHUTDOWN	0x00
#define XLOG_CHECKPOINT_ONLINE		0x10
#define XLOG_NOOP					0x20
#define XLOG_NEXTOID				0x30
#define XLOG_SWITCH					0x40
#define XLOG_BACKUP_END				0x50
#define XLOG_PARAMETER_CHANGE		0x60
#define XLOG_RESTORE_POINT			0x70
#define XLOG_FPW_CHANGE				0x80
#define XLOG_END_OF_RECOVERY		0x90
#define XLOG_FPI					0xA0

into enums? We already have a bunch of places looking at those values, and if/when changeset extraction makes it in, it's going to be one more. Having the compiler tell you where a new value should have been added also helps.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
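As a concrete sketch of the proposal, the same values could be wrapped in a named enum; the type name XLogRmgrInfo is invented here purely for illustration and is not taken from any patch:

    /* Values and mask semantics unchanged; only the declaration differs. */
    typedef enum XLogRmgrInfo
    {
        XLOG_CHECKPOINT_SHUTDOWN = 0x00,
        XLOG_CHECKPOINT_ONLINE   = 0x10,
        XLOG_NOOP                = 0x20,
        XLOG_NEXTOID             = 0x30,
        XLOG_SWITCH              = 0x40,
        XLOG_BACKUP_END          = 0x50,
        XLOG_PARAMETER_CHANGE    = 0x60,
        XLOG_RESTORE_POINT       = 0x70,
        XLOG_FPW_CHANGE          = 0x80,
        XLOG_END_OF_RECOVERY     = 0x90,
        XLOG_FPI                 = 0xA0
    } XLogRmgrInfo;

With this, a switch over an XLogRmgrInfo value that omits a case can trigger a compiler warning (e.g. gcc's -Wswitch), which is the benefit Andres is after.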
Re: [HACKERS] Using ini file to setup replication
On Fri, Jul 19, 2013 at 6:53 PM, Andres Freund and...@2ndquadrant.com wrote:
> So you can just do stuff like: server.foo.grand_unified_config = value.

It looks good to me too. When the server parses values written in postgresql.conf, it handles those parameters as an item-list value; after that, those parameters overwrite the corresponding values. So we can allocate the area for those values dynamically when the server first parses them (of course, we should count the number of servers). But I think that when server settings are not grouped by server name, parsing is a little difficult: we would have to check whether the server name has already been registered every time. For example:

postgresql.conf
---------------
server.AAA.wal_sender_timeout = 60
server.BBB.synchronous_transfer = data_flush
server.AAA.synchronous_transfer = all
---------------

What do you think? And if a parameter (e.g., wal_sender_timeout) is not set in the configuration file (postgresql.conf) as above, what value should we set the parameter to? And how do we handle the original wal_sender_timeout: will it remain, or be deleted?

Regards,

---
Sawada Masahiko

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
On 7/22/13 3:54 AM, Craig Ringer wrote: It might be worth looking at gcc's strcoll() implementation. See if it performs better when you use the latest gcc, and if not try to improve gcc's strcoll() . I think part of the problem is that we call strcoll for each comparison, instead of doing strxfrm once for each datum and then just strcmp for each comparison. That is effectively equivalent to what the proposal implements. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
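A minimal sketch of the strxfrm-then-strcmp pattern Peter describes, assuming the caller has already selected a collation with setlocale(LC_COLLATE, ...) and sorts an array of pre-computed keys. This is illustrative only; PostgreSQL's actual sort path is more involved:

    #include <stdlib.h>
    #include <string.h>

    /* Transform a string once into its collation key under LC_COLLATE. */
    static char *
    make_sort_key(const char *s)
    {
        size_t  n = strxfrm(NULL, s, 0);    /* ask for the required size */
        char   *key = malloc(n + 1);

        if (key)
            strxfrm(key, s, n + 1);
        return key;
    }

    /* qsort comparator over pre-computed keys: no strcoll() per compare. */
    static int
    cmp_keys(const void *a, const void *b)
    {
        return strcmp(*(char *const *) a, *(char *const *) b);
    }

A sort then pays one strxfrm() per datum up front instead of one strcoll() per comparison, which is O(n) locale work instead of O(n log n).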
Re: [HACKERS] REINDEX checking of index constraints
On Sun, Jul 21, 2013 at 11:30:54AM -0700, Josh Berkus wrote: Noah, Attached patch just restores the old behavior. Would it be worth preserving the ability to fix an index consistency problem with a REINDEX independent from related heap consistency problems such as duplicate keys? I would love to have two versions of REINDEX, one which validated and one which didn't. Maybe a ( validate off ) type check? +1 There are reasons to reindex that do not involve its validity and it would be great to not need to visit the heap for that. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enum-ify resource manager's xl_info values
Andres Freund and...@2ndquadrant.com writes:
> Would somebody object to making the rmgr's info value #defines into enums?

I think that will create more problems than it fixes. For one thing, the same field is used to store values that would need to be multiple independent enum types; and we also store additional bits in that field.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enum-ify resource manager's xl_info values
On 2013-07-22 08:53:53 -0400, Tom Lane wrote:
> Andres Freund and...@2ndquadrant.com writes:
>> Would somebody object to making the rmgr's info value #defines into enums?
> I think that will create more problems than it fixes. For one thing, the same field is used to store values that would need to be multiple independent enum types; and we also store additional bits in that field.

Oh, I don't want to change the definition of XLogRecord or such. I just want to make the series of #defines an enum, so you can write something like:

	info = record->xl_info & ~XLR_INFO_MASK;
	switch ((XLogXactRecordType) info)
	{
		case ...;
	}

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using ini file to setup replication
On Fri, Jul 19, 2013 at 2:20 PM, Samrat Revagade revagade.sam...@gmail.com wrote:
> for example: if I want to configure 2 servers then it will add 6 lines; for 3 servers, 9; for 4, 12. Settings for a particular server will be (considering the way of setting values for conf parameters):
> guc = value
> standby_name = 'AAA'
> synchronous_transfer = commit
> wal_sender_timeout = 60

I have a feeling Samrat and Sawada-san have some good use cases where this extra syntax could be a big step up in expressiveness. But I'm having a hard time figuring out exactly what they have in mind. If either of you could explain in more detail how the extra syntax would apply to your use case, and how it would let you do something that you can't already do, it might be helpful.

I'm assuming the idea is something like having a single config file which can work for the server regardless of whether it's acting as the primary or standby, and then being able to switch roles by switching a single flag which would control which parts of the config file were applied. But I'm having a hard time seeing how exactly that would work.

-- greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [HACKERS] Wal sync odirect
On Monday, July 22, 2013 09:39:50, Craig Ringer wrote:
> On 07/22/2013 03:30 PM, Миша Тюрин wrote:
>> I am talking about wal_level higher than MINIMAL.
> OK, so you want to be able to force O_DIRECT for wal_level = archive? I guess that makes sense if you expect the archive_command to read the file out of the RAID controller's write cache before it gets flushed, and your archive_command can also use direct I/O to avoid pulling it into cache. You already know where to change to start experimenting with this. What exactly are you trying to ask? I don't see any risk in forcing O_DIRECT for higher wal_level, but I'm not an expert in WAL and recovery. I'd recommend testing on a non-critical PostgreSQL instance.

IIRC there is also a fadvise() call to flush the buffer cache when using 'minimal', but not when using archiving of WAL. I'm unsure how this has been tuned with the streaming replication addition. See xlog.c|h.

--
Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
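The call Cédric is remembering lives in xlog.c's XLogFileClose(). A hedged paraphrase of it, with the boolean parameter standing in for the real XLogIsNeeded() test and the surrounding error handling omitted:

    #include <fcntl.h>
    #include <stdbool.h>
    #include <unistd.h>

    /*
     * When this server will not re-read the finished WAL segment
     * (wal_level = minimal: no archiving, no streaming), ask the kernel
     * to drop it from the page cache as the file is closed.
     */
    static void
    xlog_file_close_sketch(int fd, bool wal_needed_again)
    {
        if (!wal_needed_again)
            (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
    }

This is consistent with the thread's premise: at higher wal_level the segment stays cached precisely because the archiver or a walsender is expected to read it back.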
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Greg Stark st...@mit.edu writes: I do find the logic and variable names a bit confusing so I'm tempted to add some comments based on my initial confusion. And I'm tempted to add an ordinalityAttNum field to the executor node so we don't need to make these odd scanslot means this unless we have ordinality in which case it means that and funcslot means this logic. I haven't read this patch, but that comment scares the heck out of me. Even if the patch isn't broken today, it will be tomorrow, if it's making random changes like that in data structure semantics. Also, if you're confused, so will be everybody else who has to deal with the code later --- so I don't think fixing the comments and variable names is optional. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] changeset generation v5-01 - Patches git tree
On Tue, Jul 16, 2013 at 9:00 AM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jul 7, 2013 at 4:34 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-07-07 15:43:17 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: 3b) Add catcache 'filter' that ensures the cache stays unique and use that for the mapping I slightly prefer 3b) because it's smaller, what's your opinions? This is just another variation on the theme of kluging the catcache to do something it shouldn't. You're still building a catcache on a non-unique index, and that is going to lead to trouble. I don't think the lurking dangers really are present. The index essentially *is* unique since we filter away anything non-unique. The catcache code hardly can be confused by tuples it never sees. That would even work if we started preloading catcaches by doing scans of the entire underlying relation or by caching all of a page when reading one of its tuples. I can definitely see that there are aesthetical reasons against doing 3b), that's why I've also done 3a). So I'll chalk you up to voting for that... I also vote for (3a). I did a quick once over of 1, 2, and 3a and they look reasonable. Barring strenuous objections, I'd like to go ahead and commit these, or perhaps an updated version of them. Hearing no objections, I have done this. Per off-list discussion with Andres, I also included patch 4, which gives us regression test coverage for this code, and have fixed a few bugs and a bunch of stylistic things that bugged me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
So the more I look at this patch the fewer things I want to change in it. I've several times thought I should make an improvement and then realized I was really just making unnecessary tweaks that didn't really make much difference.

It seems a shame that the node has to call the function and get back a slot only to laboriously copy over all the attributes into a new slot. Worse, it's actually storing all the original tuples in a tuplestore without the ordinality and adding in the ordinality on output. This works because the FunctionScan only supports rescan, not mark/restore, so it can easily recalculate them consistently if the tuplestore is rescanned. I looked into what it would take to get the ordinality added on the original scan and it would have to go so deep that it doesn't really seem worthwhile.

I do find the logic and variable names a bit confusing, so I'm tempted to add some comments based on my initial confusion. And I'm tempted to add an ordinalityAttNum field to the executor node so we don't need this odd "scanslot means this unless we have ordinality, in which case it means that, and funcslot means this" logic. That has the side benefit that if the executor node ever wants to add more attributes it won't have the assumption that the last column is the ordinality baked in. I think it'll make the code a bit clearer.

Also, I really think the test cases are excessive. They're mostly repeatedly testing the same functionality over and over in cases that are superficially different but I'm fairly certain just invoke the same codepaths. This will just be an annoyance if we make later changes that require adjusting the output. Notably, the test that covers how the view definition appears in pg_views scares me. We bash around the formatting rules for view definition outputs pretty regularly. On the other hand, it is nice to have regression tests that make sure these cases are covered. There's been more than one bug in the past caused by omitting updates to these functions. I'm leaning towards leaving it in, but in the long run we probably need a better solution for this.

Oh, and I'm definitely leaning towards naming the column "ordinality". Given we name columns things like "generate_series" and "sum" etc., there doesn't seem to be a good reason to avoid doing that here.

All in all though, I feel like I'm looking for trouble. It's not a very complex patch and is definitely basically correct. Who should get credit for it? David? Andrew? And who reviewed it? Dean? Anyone else?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Attached is an update that I think sorts out all of the documentation concerns. I broke this section into paragraphs now that it's getting so long too. The only code change is that this now labels the controversial lag here "average rate limit schedule lag". That way, if someone wants to introduce other measures of rate limit lag, like a more transaction-oriented one, you might call that "average rate limit transaction lag" and tell the two apart.

The rewritten documentation here tries to communicate that there is a schedule that acts like it was pre-computed at the start of each client too. It's not ever adjusted based on what individual transactions do. I also noted the way this can cause schedule lag for some time after a slow transaction finishes, since that's the main issue observed so far.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
new file mode 100644
index 2ad8f0b..4e6b608
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
*** int unlogged_tables = 0;
*** 137,142 ****
--- 137,148 ----
  double	sample_rate = 0.0;

  /*
+  * When threads are throttled to a given rate limit, this is the target delay
+  * to reach that rate in usec. 0 is the default and means no throttling.
+  */
+ int64	throttle_delay = 0;
+
+ /*
   * tablespace selection
   */
  char	*tablespace = NULL;
*** typedef struct
*** 202,212 ****
--- 208,220 ----
  	int		listen;			/* 0 indicates that an async query has been
  							 * sent */
  	int		sleeping;		/* 1 indicates that the client is napping */
+ 	bool	throttling;		/* whether nap is for throttling */
  	int64	until;			/* napping until (usec) */
  	Variable   *variables;	/* array of variable definitions */
  	int		nvariables;
  	instr_time	txn_begin;	/* used for measuring transaction latencies */
  	instr_time	stmt_begin;	/* used for measuring statement latencies */
+ 	bool	is_throttled;	/* whether transaction throttling is done */
  	int		use_file;		/* index in sql_files for this client */
  	bool	prepared[MAX_FILES];
  } CState;
*** typedef struct
*** 224,229 ****
--- 232,240 ----
  	instr_time *exec_elapsed;	/* time spent executing cmds (per Command) */
  	int		   *exec_count;		/* number of cmd executions (per Command) */
  	unsigned short random_state[3];		/* separate randomness for each thread */
+ 	int64	throttle_trigger;	/* previous/next throttling (us) */
+ 	int64	throttle_lag;		/* total transaction lag behind throttling */
+ 	int64	throttle_lag_max;	/* max transaction lag */
  } TState;

  #define INVALID_THREAD		((pthread_t) 0)
*** typedef struct
*** 232,237 ****
--- 243,250 ----
  {
  	instr_time	conn_time;
  	int			xacts;
+ 	int64		throttle_lag;
+ 	int64		throttle_lag_max;
  } TResult;

  /*
*** usage(void)
*** 356,361 ****
--- 369,375 ----
  	 "  -N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
  	 "  -P, --progress=NUM       show thread progress report every NUM seconds\n"
  	 "  -r, --report-latencies   report average latency per command\n"
+ 	 "  -R, --rate SPEC          target rate in transactions per second\n"
  	 "  -s, --scale=NUM          report this scale factor in output\n"
  	 "  -S, --select-only        perform SELECT-only transactions\n"
  	 "  -t, --transactions       number of transactions each client runs\n"
*** doCustom(TState *thread, CState *st, ins
*** 898,914 ****
  {
  	PGresult   *res;
  	Command   **commands;

  top:
  	commands = sql_files[st->use_file];

  	if (st->sleeping)
  	{							/* are we sleeping? */
  		instr_time	now;

  		INSTR_TIME_SET_CURRENT(now);
! 		if (st->until <= INSTR_TIME_GET_MICROSEC(now))
  			st->sleeping = 0;	/* Done sleeping, go ahead with next command */
  		else
  			return true;		/* Still sleeping, nothing to do here */
  	}
--- 912,973 ----
  {
  	PGresult   *res;
  	Command   **commands;
+ 	bool		trans_needs_throttle = false;

  top:
  	commands =
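Since the attached diff is cut off just before the sampling logic, here is a hedged sketch of the inverse transform sampling the truncated comment refers to. pgbench itself uses its per-thread random state rather than rand(), so this is illustrative only:

    #include <math.h>
    #include <stdint.h>
    #include <stdlib.h>

    /*
     * Draw an exponentially distributed inter-arrival time (mean =
     * throttle_delay microseconds) via inverse transform sampling, so
     * transaction *starts* form a Poisson process.  The next start is
     * computed from the previous *scheduled* start, never from when the
     * previous transaction actually finished.
     */
    static int64_t
    next_scheduled_start(int64_t prev_scheduled_start,
                         int64_t throttle_delay_us)
    {
        /* uniform in (0, 1]; shifted to avoid log(0) */
        double  u = (rand() + 1.0) / ((double) RAND_MAX + 2.0);
        int64_t wait = (int64_t) (throttle_delay_us * -log(u));

        return prev_scheduled_start + wait;
    }

Because the schedule never consults actual finish times, a slow transaction leaves later transactions lagging behind their pre-computed start times, which is exactly the "schedule lag" the documentation paragraphs above describe.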
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
On Mon, Jul 22, 2013 at 4:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> I haven't read this patch, but that comment scares the heck out of me. Even if the patch isn't broken today, it will be tomorrow, if it's making random changes like that in data structure semantics.

It's not making random changes. It's just that it has two code paths: in one it has the function scan write directly to the scan slot, and in the other it has the function write to a different slot and copies the fields over to the scan slot. It's actually doing the right thing; it's just hard to tell that at first (imho) because it's using the scan slots to determine which case applies instead of having a flag or something to drive the decision.

> Also, if you're confused, so will be everybody else who has to deal with the code later --- so I don't think fixing the comments and variable names is optional.

Well, that's the main benefit of having someone review the code, in my opinion. I'm no smarter than David or Andrew who wrote the code, and there's no guarantee I'll spot any bugs. But at least I can observe myself and see where I have difficulty following the logic, which the original author is inherently not in a position to do. Honestly, this is pretty straightforward and well written, so I'm just being especially careful, not having committed anything recently.

-- greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
On Mon, Jul 22, 2013 at 12:50 PM, Peter Eisentraut pete...@gmx.net wrote:
> I think part of the problem is that we call strcoll for each comparison, instead of doing strxfrm once for each datum and then just strcmp for each comparison. That is effectively equivalent to what the proposal implements.

Fwiw, I used to be a big proponent of using strxfrm. But upon further analysis I realized it was a really difficult tradeoff. strxfrm saves potentially a lot of CPU cost, but at the expense of expanding the size of the sort key. If the sort spills to disk, or even if it's just memory-bandwidth limited, it might actually be slower than doing the additional CPU work of calling strcoll. It's hard to see how to decide in advance which way will be faster. I suspect strxfrm is still the better bet, especially for complex, large-character-set locales like Chinese. strcoll might still win by a large margin on simple mostly-ASCII character sets.

-- greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
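A tiny standalone probe of the key-size expansion Greg describes; the sample string and the environment-selected locale are arbitrary:

    #include <locale.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *s = "example sort datum";

        /* Use whatever collation the environment selects. */
        setlocale(LC_COLLATE, "");
        /* strxfrm(NULL, s, 0) reports the key size without writing it. */
        printf("raw: %zu bytes, sort key: %zu bytes\n",
               strlen(s), strxfrm(NULL, s, 0));
        return 0;
    }

Running this under zh_CN versus C makes the trade-off measurable: the larger the key relative to the datum, the more a strxfrm-based sort pays in memory bandwidth and potential disk spill.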
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Greg Smith wrote: [doc patch attached]

Thanks. I didn't look at the code, but while trying to read the docs:

> + <para>
> +  High rate limit schedule lag values, that is values not small with
> +  respect to the actual transaction latency, indicate that something is
> +  amiss in the throttling process.

I couldn't really parse the above. Of the first six words, which one is a verb? Is there a noun that needs to be plural? Is there a word that shouldn't be there? ... Oh, I think it makes sense if I assume that "rate limit schedule lag" is a single concept .. but if so, that phrase seems too many words for it. (So when the RLSL values are high, this indicates a problem. Is that what the above means?) Also, it took me a while to understand what "values not small" means. I think there must be a way to phrase this that's easier to understand.

> +  High lag can highlight a subtle problem there even if the target rate
> +  limit is met in the end. One possible cause of schedule lage is
> +  insufficient pgbench threads to handle all of the clients.

typo "lage" above.

> +  To improve that, consider reducing the number of clients, increasing
> +  the number of threads in pgbench, or running pgbench on a separate
> +  host. Another possibility is that the database is not keeping up with
> +  the load at some point. When that happens, you will have to reduce the
> +  expected transaction rate to lower schedule lag.
> + </para>

Thanks

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Hello Greg,

Thanks for the improvement! I have a small reservation about "finish/end time schedule" in the second paragraph, or maybe there is something that I do not understand. There is no schedule for finishing anything; only start times are scheduled, so I wish the text could avoid suggesting that finish times are scheduled.

> The rate is targeted by starting transactions along a Poisson-distributed schedule time line. The expected finish time schedule moves forward based on when the client first started, not when the previous transaction ended.

"finish time schedule" -> "start time schedule"

> That approach means that when transactions go past their original scheduled end time, it is possible for later ones to catch up again.

-> "That approach means that long transactions can result in later transactions being late with respect to the schedule, while short transactions make it possible for late ones to catch up again."

Would you be ok with that?

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Hello Alvaro,

> Thanks. I didn't look at the code, but while trying to read the docs:
>> + <para>
>> +  High rate limit schedule lag values, that is values not small with
>> +  respect to the actual transaction latency, indicate that something is
>> +  amiss in the throttling process.
> I couldn't really parse the above. Of the first six words, which one is a verb?

None. "High values of the time lag measured with respect to the rate limit schedule."

> Is there a noun that needs to be plural? Is there a word that shouldn't be there?

I do not think so.

> ... Oh, I think it makes sense if I assume that "rate limit schedule lag" is a single concept .. but if so, that phrase seems too many words for it. (So when the RLSL values are high, this indicates a problem. Is that what the above means?)

Yep!

> Also, it took me a while to understand what "values not small" means. I think there must be a way to phrase this that's easier to understand.

That's what we are trying to do, but we still need to be precise. With fewer words it seems more understandable, but previous versions suggested that the meaning was ambiguous; that is, people put their own intuitive definition of lag on it, which resulted in surprises at the measures and cumulative behavior. The alternative was either to change what is measured, but I insisted that this measure is the useful one, or to try to reduce the ambiguity of the documentation; the result is the effort by Greg and myself, with you helping to debug :-)

>> High lag can highlight a subtle problem there even if the target rate limit is met in the end.

I'm fine with that, if it is clear from the context that the lag we're talking about is the one defined in the preceding paragraph. Greg, what do you think?

>> One possible cause of schedule lage is insufficient pgbench threads to handle all of the clients.
> typo "lage" above.

Indeed.

>> To improve that, consider reducing the number of clients, increasing the number of threads in pgbench, or running pgbench on a separate host. Another possibility is that the database is not keeping up with the load at some point. When that happens, you will have to reduce the expected transaction rate to lower schedule lag.

Thanks for your help!

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Greg Stark said:
> So the more I look at this patch the fewer things I want to change in it. I've several times thought I should make an improvement and then realized I was really just making unnecessary tweaks that didn't really make much difference.

It's almost as though we actually thought about these things when writing the patch...

> I looked into what it would take to get the ordinality added on the original scan and it would have to go so deep that it doesn't really seem worthwhile.

A design goal was that no SRF implementation should be affected by the change, since there are dozens of C-language SRFs in contrib and third-party modules. The existence of materialize mode prevents us from changing the structure of the tuplestore, since we're not the ones allocating it. The only other approach that seemed possible was to have the tuplestore code itself add an ordinality, which it would have to do unconditionally, since for materialize-mode SRFs it would have no way to know if one was required. Doing it in FunctionScan when projecting out tuples seemed much, much cleaner.

> I do find the logic and variable names a bit confusing so I'm tempted to add some comments based on my initial confusion. And I'm tempted to add an ordinalityAttNum field to the executor node so we don't need this odd "scanslot means this unless we have ordinality, in which case it means that, and funcslot means this" logic. That has the side benefit that if the executor node ever wants to add more attributes it won't have the assumption that the last column is the ordinality baked in. I think it'll make the code a bit clearer.

I admit the (one single) use of checking func_slot for nullity rather than checking the funcordinality flag is a micro-optimization (choosing to fetch the value we know we will need rather than a value which has no other purpose). I thought the comments there were sufficient; perhaps you could indicate what isn't clear? Having the ordinality be the last column is required by spec; I'm sure we could think of pg-specific extensions that would change that, but why complicate the code now?

> Also, I really think the test cases are excessive. They're mostly repeatedly testing the same functionality over and over in cases that are superficially different but I'm fairly certain just invoke the same codepaths. This will just be an annoyance if we make later changes that require adjusting the output.

The majority of the tests are adding an ordinality version to existing test cases that test a number of combinations of language, SETOF, and base vs. composite type. These do exercise various different code paths in expandRTE and thereabouts. One thing I did do is dike out and replace the entire existing test sequence that was commented as testing ExecReScanFunctionScan, because many of the tests in it did not in fact invoke any rescans and probably hadn't done since 7.4.

> Notably the test that covers how the view definition appears in pg_views scares me. We bash around the formatting rules for view definition outputs pretty regularly. On the other hand it is nice to have regression tests that make sure these cases are covered. There's been more than one bug in the past caused by omitting updating these functions. I'm leaning towards leaving it in but in the long run we probably need a better solution for this.

There are a dozen of those kinds of tests scattered through the regression tests (though many use pg_get_viewdef directly rather than pg_views). While it might be worth centralizing them somewhere, that's really a separate issue from this patch, since it also affects aggregates.sql, window.sql, and with.sql.

> Oh, and I'm definitely leaning towards naming the column "ordinality". Given we name columns things like "generate_series" and "sum" etc there doesn't seem to be good reason to avoid doing that here.

I've already set out why I object to this.

> All in all though I feel like I'm looking for trouble. It's not a very complex patch and is definitely basically correct. Who should get credit for it? David? Andrew? And who reviewed it? Dean? Anyone else?

It was a joint project between David and myself. Credit to Dean for the thorough review.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
On 07/22/2013 12:49 PM, Greg Stark wrote: On Mon, Jul 22, 2013 at 12:50 PM, Peter Eisentraut pete...@gmx.net wrote: I think part of the problem is that we call strcoll for each comparison, instead of doing strxfrm once for each datum and then just strcmp for each comparison. That is effectively equivalent to what the proposal implements. Fwiw I used to be a big proponent of using strxfrm. But upon further analysis I realized it was a real difficult tradeoff. strxrfm saves potentially a lot of cpu cost but at the expense of expanding the size of the sort key. If the sort spills to disk or even if it's just memory bandwidth limited it might actually be slower than doing the additional cpu work of calling strcoll. It's hard to see how to decide in advance which way will be faster. I suspect strxfrm is still the better bet, especially for complex large character set based locales like Chinese. strcoll might still win by a large margin on simple mostly-ascii character sets. Perhaps we need a bit of performance testing to prove the point. Maybe the behaviour should be locale-dependent. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Very minor update with V19 here, to reflect Alvaro's comments. The tricky part now reads like this:

  High rate limit schedule lag values, that is lag values that are large
  compared to the actual transaction latency, indicate that something is
  amiss in the throttling process. High schedule lag can highlight a subtle
  problem there even if the target rate limit is met in the end.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
new file mode 100644
index 2ad8f0b..4e6b608
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
*** int unlogged_tables = 0;
*** 137,142 ****
--- 137,148 ----
  double	sample_rate = 0.0;

  /*
+  * When threads are throttled to a given rate limit, this is the target delay
+  * to reach that rate in usec. 0 is the default and means no throttling.
+  */
+ int64	throttle_delay = 0;
+
+ /*
   * tablespace selection
   */
  char	*tablespace = NULL;
*** typedef struct
*** 202,212 ****
--- 208,220 ----
  	int		listen;			/* 0 indicates that an async query has been
  							 * sent */
  	int		sleeping;		/* 1 indicates that the client is napping */
+ 	bool	throttling;		/* whether nap is for throttling */
  	int64	until;			/* napping until (usec) */
  	Variable   *variables;	/* array of variable definitions */
  	int		nvariables;
  	instr_time	txn_begin;	/* used for measuring transaction latencies */
  	instr_time	stmt_begin;	/* used for measuring statement latencies */
+ 	bool	is_throttled;	/* whether transaction throttling is done */
  	int		use_file;		/* index in sql_files for this client */
  	bool	prepared[MAX_FILES];
  } CState;
*** typedef struct
*** 224,229 ****
--- 232,240 ----
  	instr_time *exec_elapsed;	/* time spent executing cmds (per Command) */
  	int		   *exec_count;		/* number of cmd executions (per Command) */
  	unsigned short random_state[3];		/* separate randomness for each thread */
+ 	int64	throttle_trigger;	/* previous/next throttling (us) */
+ 	int64	throttle_lag;		/* total transaction lag behind throttling */
+ 	int64	throttle_lag_max;	/* max transaction lag */
  } TState;

  #define INVALID_THREAD		((pthread_t) 0)
*** typedef struct
*** 232,237 ****
--- 243,250 ----
  {
  	instr_time	conn_time;
  	int			xacts;
+ 	int64		throttle_lag;
+ 	int64		throttle_lag_max;
  } TResult;

  /*
*** usage(void)
*** 356,361 ****
--- 369,375 ----
  	 "  -N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
  	 "  -P, --progress=NUM       show thread progress report every NUM seconds\n"
  	 "  -r, --report-latencies   report average latency per command\n"
+ 	 "  -R, --rate SPEC          target rate in transactions per second\n"
  	 "  -s, --scale=NUM          report this scale factor in output\n"
  	 "  -S, --select-only        perform SELECT-only transactions\n"
  	 "  -t, --transactions       number of transactions each client runs\n"
*** doCustom(TState *thread, CState *st, ins
*** 898,914 ****
  {
  	PGresult   *res;
  	Command   **commands;

  top:
  	commands = sql_files[st->use_file];

  	if (st->sleeping)
  	{							/* are we sleeping? */
  		instr_time	now;

  		INSTR_TIME_SET_CURRENT(now);
! 		if (st->until <= INSTR_TIME_GET_MICROSEC(now))
  			st->sleeping = 0;	/* Done sleeping, go ahead with next command */
  		else
  			return true;		/* Still sleeping, nothing to do here */
  	}
--- 912,973 ----
  {
  	PGresult   *res;
  	Command   **commands;
+ 	bool		trans_needs_throttle = false;

  top:
  	commands = sql_files[st->use_file];

+ 	/*
+ 	 * Handle throttling once per transaction by sleeping. It is simpler
+ 	 * to do this here rather than at the end, because so much complicated
+ 	 * logic happens below when statements finish.
+ 	 */
+ 	if (throttle_delay && !st->is_throttled)
+ 	{
+ 		/*
+ 		 * Use inverse transform sampling to randomly generate a delay, such
+ 		 * that the
Re: [HACKERS] changeset generation v5-01 - Patches git tree
On Fri, Jun 14, 2013 at 6:51 PM, Andres Freund and...@2ndquadrant.com wrote:
> The git tree is at: git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4 http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4
> On 2013-06-15 00:48:17 +0200, Andres Freund wrote:
>> Overview of the attached patches:
>> 0001: indirect toast tuples; required but submitted independently
>> 0002: functions for testing; not required
>> 0003: (tablespace, filenode) syscache; required
>> 0004: RelationMapFilenodeToOid: required, simple
>> 0005: pg_relation_by_filenode() function; not required but useful
>> 0006: Introduce InvalidCommandId: required, simple
>> 0007: Adjust Satisfies* interface: required, mechanical
>> 0008: Allow walsender to attach to a database: required, needs review
>> 0009: New GetOldestXmin() parameter; required, pretty boring
>> 0010: Log xl_running_xact regularly in the bgwriter: required
>> 0011: make fsync_fname() public; required, needs to be in a different file
>> 0012: Relcache support for a Relation's primary key: required
>> 0013: Actual changeset extraction; required
>> 0014: Output plugin demo; not required (except for testing) but useful
>> 0015: Add pg_receivellog program: not required but useful
>> 0016: Add test_logical_decoding extension; not required, but contains the tests for the feature. Uses 0014
>> 0017: Snapshot building docs; not required

I've now also committed patch #7 from this series. My earlier commit fulfilled the needs of patches #3, #4, and #5; and somewhat longer ago I committed #1. I am not entirely convinced of the necessity or desirability of patch #6, but as of now I haven't studied the issues closely. Patch #2 does not seem useful in isolation; it adds new regression-testing stuff but doesn't use it anywhere.

I doubt that any of the remaining patches (#8-#17) can be applied separately without understanding the shape of the whole patch set, so I think I, or someone else, will need to set aside more time for detailed review before proceeding further with this patch set. I suggest that we close out the CommitFest entry for this patch set one way or another, as there is no way we're going to get the whole thing done under the auspices of CF1. I'll try to find some more time to spend on this relatively soon, but I think this is about as far as I can take this today.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
On Mon, Jul 22, 2013 at 01:49:39PM -0400, Greg Smith wrote: Very minor update with V19 here, to reflect Alvaro's comments. The tricky part now reads like this: High rate limit schedule lag values, High values of the rate limit schedule lag measurement? that is lag values that are large compared to the actual transaction latency, indicate that something is amiss in the throttling process. High schedule lag can highlight a subtle problem there even if the target rate limit is met in the end. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign Tables as Partitions
On Fri, Jul 19, 2013 at 1:46 PM, David Fetter da...@fetter.org wrote:
>> This functionality was actually present in the original submission for foreign tables. I ripped it out before commit because I didn't think all of the interactions with other commands had been adequately considered. But I think they did consider it to some degree, which this patch does not.
> A ref to the patch as submitted / patch as applied would help a lot :)

The patch as committed is 0d692a0dc9f0e532c67c577187fe5d7d323cb95b. I suspect you can find the patch as submitted by going and figuring out which CommitFest that was part of and working it out from there.

> Were there any particular things you managed to break with the patch as submitted? Places you thought would be soft but didn't poke at?

I think there was *some* discussion of this at the time. The basic issue was to make sure all the commands that could encounter foreign tables with this change, but not without it, behaved sanely. This would include all variants of ALTER TABLE as well as ANALYZE and any other utility commands that might view it as within their remit to recurse to child tables.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Expression indexes and dependecies
Hello,

While doing some tests, I observed that expression indexes can malfunction if the underlying expression changes. For example, say I define a function foo() as:

CREATE OR REPLACE FUNCTION foo(a integer) RETURNS integer AS $$
BEGIN
  RETURN $1 + 1;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

I then create a table, an expression index on the table, and insert a few rows:

CREATE TABLE test (a int, b char(20));
CREATE UNIQUE INDEX testindx ON test(foo(a));
INSERT INTO test VALUES (generate_series(1,1), 'bar');

A query such as the following would return its result using the expression index:

SET enable_seqscan TO off;
SELECT * FROM test WHERE foo(a) = 100;

It will return the row with a = 99, since foo() is defined to return (a + 1).

If I now REPLACE the function definition with something else, say to return (a + 2):

CREATE OR REPLACE FUNCTION foo(a integer) RETURNS integer AS $$
BEGIN
  RETURN $1 + 2;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

I get no errors/warnings, but the index and the new function definition are now out of sync. So the above query will still return the same result, though the row with (a = 99) no longer satisfies the current definition of function foo().

Perhaps this is a known behaviour/limitation, but I could not find it in the documentation. I wonder if it makes sense to check for dependencies during function alteration and complain. Or are there other reasons why we can't do that, making it a much larger problem than what I'm imagining?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
The v3 patch applies perfectly here now. Attached is a spreadsheet with test results from two platforms, a Mac laptop and a Linux server. I used systems with high disk speed because that seemed like a worst case for this improvement. The actual improvement for shrinking WAL should be even better on a system with slower disks. There are enough problems with the "hundred tiny fields" results that I think this is not quite ready for commit yet. More comments on that below. This seems close though, close enough that I am planning to follow up to see if the slow-disk results are better. Reviewing the wal-update-testsuite.sh test program, I think the only case missing that would be useful to add is "ten tiny fields, one changed". I think that one is interesting to highlight because it's what benchmark programs like pgbench do very often. The GUC added by the program looks like this:

postgres=# show wal_update_compression_ratio;
 wal_update_compression_ratio
------------------------------
 25

Is it possible to add a setting here that disables the feature altogether? That always makes it easier to consider a commit, knowing people can roll back the change if it makes performance worse. That would make performance testing easier too. wal-update-testsuite.sh takes as long as 13 minutes; that's long enough that I'd like the easier-to-automate comparison that GUC-based disabling adds. If that's not practical to do given the intrusiveness of the code, it's not really necessary. I haven't looked at the change enough to be sure how hard this is. On the Mac, the only case that seems to have a slowdown now is "hundred tiny fields, half nulled". It would be nice to understand just what is going on with that one. I got some ugly results in "two short fields, no change" too, along with a couple of other weird results, but I think those were testing procedure issues that can be ignored. The pgbench throttle work I did recently highlights that I can't really make a Mac quiet/consistent for benchmarking very well. Note that I ran all of the Mac tests with assertions on, to try and catch platform-specific bugs. The Linux ones used the default build parameters. On Linux, "hundred tiny fields, half nulled" was also by far the worst performing one, with a 30% increase in duration despite the 14% drop in WAL. Exactly what's going on there really needs to be investigated before this seems safe to commit. All of the "hundred tiny fields" cases seem pretty bad on Linux, with the rest of them running about an 11% duration increase. This doesn't seem ready to commit for this CF, but the number of problem cases is getting pretty small now. Now that I've gotten more familiar with the test programs and the feature, I can run more performance tests on this at any time really. If updates addressing the trouble cases are ready from Amit or Hari before the next CF, send them out and I can look at them without waiting until that one starts. This is a very promising looking performance feature. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com WAL-lz-v3.xls Description: MS-Excel spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
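[To be explicit about what is being asked for, something along these lines, where both the GUC name and its values are hypothetical -- the posted patch only exposes wal_update_compression_ratio:]

    SET wal_update_compression = off;  -- hypothetical: log full row images, as today
    SET wal_update_compression = on;   -- hypothetical: delta-encode updated rows

A boolean switch like this makes A/B comparisons scriptable: the same test run can be repeated with the feature on and off against an otherwise identical build.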
Re: [HACKERS] Proposal: template-ify (binary) extensions
Markus Wanner mar...@bluegap.ch writes: - per-installation (not even per-cluster) DSO availability Not sure what the issue is here, but I agree that should be possible. For any extension where the new package version ships the same .so file name, you can only have one module on the whole system for a given PostgreSQL major version. The extensions are per-database, the DSO modules are per major-version (not even per-cluster). That's the trade-off we currently need to make to be able to ship with the current security protections we're talking about. Requiring the sysadmin to install the extensions there, too, seems justified to me. Sounds like good advice, anyways. Yes, installing the same extensions on any standby as the ones installed on the primary should be *required*. My idea to enforce that would be to use Event Triggers on the standby, but how to run them when standby replay currently isn't opening a session is quite a problem. The other way to address that is of course PL/C. lo_import() won't write a file for LOAD to load. An untrusted PL (or any other extension allowing the superuser to do that) is currently required to do that. Ok, here's the fully worked out example:

~# select lo_import('/path/to/local/on/the/client/adminpack.so');
 lo_import
-----------
     82153
(1 row)

~# select lo_export(82153, '/tmp/foo.so');
 lo_export
-----------
         1
(1 row)

~# LOAD '/tmp/foo.so';
LOAD

~# CREATE FUNCTION pg_catalog.pg_file_write(text, text, bool)
     RETURNS bigint
     AS '/tmp/foo.so', 'pg_file_write'
     LANGUAGE C VOLATILE STRICT;
CREATE FUNCTION

Or to put it another way: Trusted PLs exist for a good reason. And some people just value security a lot and want that separation of roles. Yeah, security is important, but it needs to be addressing real threats to have any value whatsoever. We're not talking about adding new big holes, as extensions containing modules use LANGUAGE C in their scripts and thus are already restricted to superusers. That part would not change. As someone mentioned previously, $PGDATA may well be mounted noexec, so that seems to be a bad choice. I don't understand. You want a good default security policy in place, and you're saying that using PGDATA allows for a really easy policy to be implemented by people who don't want the feature. It seems to me to be exactly what you want; why would that be a bad choice then? - upgrading an extension at the OS level Once you've done that, any new backend will load the newer module (DSO file), so you have to be real quick if installing a hot fix in production, and the SQL definition must be changed to match the new module version… I agree, that's a problem. Alternatively, we could solve that problem the other way around: Rather than template-ify the DSO, we could instead turn the objects created by the SQL scripts into something that's more linked to the script. Something that would reload as soon as the file on disk changes. I think that's the wrong way to do it, because you generally want more control over those two steps (preparing the template then instantiating it), and you're proposing to completely lose that control and have them be a single step instead. Use case: shipping a new bunch of plproxy functions to 256 nodes. You want to push the templates everywhere, then just do the extension update as fast as possible so that it appears to be about 'in sync'. Pro-tip: you can use a plproxy function that runs the ALTER EXTENSION UPDATE step using the RUN ON ALL facility. 
(Note how this would make out-of-line extensions a lot closer to the in-line variant your recent patch adds? With the dependency between template and instantiation?) Those dependencies are almost gone now, except for being able to deal with DROP TEMPLATE FOR EXTENSION and guaranteeing we can actually pg_restore our pg_dump script. I've been changing the implementation to be what you proposed because I think it makes more sense (thanks!), and the regression tests reflect that. My github branch is up to date with the latest changes and I'm soon to send an updated patch that will be really as ready for commit as possible without a committer review. An attacker having access to a libpq connection with superuser rights cannot currently run arbitrary native code. He can try a DOS by I think I showed a simple way to do that above in this very email. Well, an extension can certainly perform something akin to pg_dlopen(). And thus load pretty much any code it wants. However, I don't think an extension can hook into CREATE EXTENSION, yet. When you call a function, its MODULE_PATHNAME is getting loaded by the backend. You would need to ship a dumb DSO module file so that this works and arrange for the real DSO module file to be loaded via a hook or an event trigger or something else (e.g. local_preload_libraries). I don't see a way to be
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2013-07-19 10:40:01 +0530, Hari Babu wrote: On Friday, July 19, 2013 4:11 AM Greg Smith wrote: On 7/9/13 12:09 AM, Amit Kapila wrote: I think the first thing to verify is whether the results posted can be validated in some other environment setup by another person. The test case used is posted at the link below: http://www.postgresql.org/message-id/51366323.8070...@vmware.com That seems easy enough to do here; Heikki's test script is excellent. The latest patch Hari posted on July 2 has one hunk that doesn't apply anymore now. The HEAD code change from Heikki is correct. During the patch rebase onto the latest PG LZ optimization code, the above code change was missed. Apart from the above change, some more changes were made in the patch. FWIW I don't like this approach very much: * I'd be very surprised if this doesn't make WAL replay of update-heavy workloads slower by at least a factor of 2. * It makes data recovery from WAL *noticeably* harder, since data corruption now is carried forward and you need the old data to decode new data. * It makes changeset extraction either more expensive or it would have to be disabled there. I think my primary issue is that philosophically/architecturally I am of the opinion that a WAL record should make sense on its own, without depending on heap data. And this patch loses that. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - psql - show longest tables
On Sun, Jul 21, 2013 at 12:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello all, I very often use slightly adjusted psql system queries to detect the TOP N sized tables. I think this can be useful for all users, so I propose a few new commands:

\dts [N|size] ... show the N largest tables | show tables larger than size, ordered by size
\dis [N|size] ... show the N largest indexes | show indexes larger than size, ordered by size
\dtst [N|size] ... show the N largest tables by total size | show tables whose total size is larger than size, ordered by total size
\dtr [N] ... show the N largest tables (ordered by rows)

example:

\dts 10 -- top 10 tables ordered by size
\dts 10MB -- tables larger than 10MB ordered by size

 Schema |    Name     | Type  |    Owner    | Size
--------+-------------+-------+-------------+-------
 public | eshop_users | table | eshop_owner | 16 kB
 public | zamestnanci | table | eshop_owner | 16 kB

What do you think about this proposal? Comments, notes? I think our \d commands are an inscrutable morass of indecipherable gobbledygook as it is, and this is only one more step down the road to complete insanity. :-( Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On 7/22/13 12:58 PM, Amit kapila wrote: As per discussion, the updated patch contains the following changes:

1. Regression tests for ALTER SYSTEM are removed
2. Parsed the auto file automatically after parsing postgresql.conf
3. Removed addition of the include directive in postgresql.conf
4. Removed error handling for parsing errors

These changes have shrunk the diff down to 1411 lines of code. I'd like to identify which committer might take this on at this point. In a few respects this is Ready For Committer now, because the committer who takes this on is going to get a strong vote on how to resolve most of the remaining fundamental issues:

- Is this the point to finally commit to the config directory approach?
- If this does set the config directory usage precedent, is the name used for that appropriate?

Whoever suggested the change from conf.d to config made an error IMHO. Every example I've found of other projects doing this style of config refactoring picked either conf.d or a unique, no-two-are-alike name. I'd rather not see Postgres add yet another unique one. (I think the 'postgresql' part of postgresql.auto.conf as the name of the file is redundant too--what else would be in the Postgres config directory but postgresql files?--but that's not a major issue) This could definitely use a round of committer-level review of how the GUC handling is being done now too. That part of the code seems to have settled, and things like using the new validate_conf_option could be committed even with other parts still being discussed. Exactly how to best break this out into useful commits is another decision that really needs some input from the potential committer though. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
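[For readers following the thread, the command being added looks like this in use -- a sketch of the intended behavior, with the value written to the auto file and picked up on reload:]

    ALTER SYSTEM SET checkpoint_completion_target = 0.9;
    SELECT pg_reload_conf();   -- apply the new value without a restart

The appeal of the design is that the setting survives in postgresql.auto.conf, parsed after postgresql.conf, so it wins over the hand-edited file without modifying it.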
Re: [HACKERS] proposal - psql - show longest tables
2013/7/22 Robert Haas robertmh...@gmail.com: On Sun, Jul 21, 2013 at 12:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello all, I very often use slightly adjusted psql system queries to detect the TOP N sized tables. I think this can be useful for all users, so I propose a few new commands:

\dts [N|size] ... show the N largest tables | show tables larger than size, ordered by size
\dis [N|size] ... show the N largest indexes | show indexes larger than size, ordered by size
\dtst [N|size] ... show the N largest tables by total size | show tables whose total size is larger than size, ordered by total size
\dtr [N] ... show the N largest tables (ordered by rows)

example:

\dts 10 -- top 10 tables ordered by size
\dts 10MB -- tables larger than 10MB ordered by size

 Schema |    Name     | Type  |    Owner    | Size
--------+-------------+-------+-------------+-------
 public | eshop_users | table | eshop_owner | 16 kB
 public | zamestnanci | table | eshop_owner | 16 kB

What do you think about this proposal? Comments, notes? I think our \d commands are an inscrutable morass of indecipherable gobbledygook as it is, and this is only one more step down the road to complete insanity. :-( These commands are targeted at advanced users, and four chars should not be a problem. It has the same advantages and disadvantages as the vim UI: it is fast for advanced users, and strange for beginners :(. Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. Do you have any tip? Pavel -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - psql - show longest tables
On Mon, Jul 22, 2013 at 2:03 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jul 21, 2013 at 12:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello all, I very often use slightly adjusted psql system queries to detect the TOP N sized tables. I think this can be useful for all users, so I propose a few new commands:

\dts [N|size] ... show the N largest tables | show tables larger than size, ordered by size
\dis [N|size] ... show the N largest indexes | show indexes larger than size, ordered by size
\dtst [N|size] ... show the N largest tables by total size | show tables whose total size is larger than size, ordered by total size
\dtr [N] ... show the N largest tables (ordered by rows)

example:

\dts 10 -- top 10 tables ordered by size
\dts 10MB -- tables larger than 10MB ordered by size

 Schema |    Name     | Type  |    Owner    | Size
--------+-------------+-------+-------------+-------
 public | eshop_users | table | eshop_owner | 16 kB
 public | zamestnanci | table | eshop_owner | 16 kB

What do you think about this proposal? Comments, notes? I think our \d commands are an inscrutable morass of indecipherable gobbledygook as it is, and this is only one more step down the road to complete insanity. :-( Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. These types of administrative tasks should be implemented as stored procedures or functions, not as enhancements to psql. That way non-psql clients can leverage them and you can integrate them into other queries. Another advantage is that they can be implemented as an extension. SELECT * from top5(); is a little more of a pain to type. But with my psql-fu I can cut that down with \i if I'm so inclined. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
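[A minimal sketch of the kind of helper Merlin is describing; the function name, default, and column set are mine, not from the thread:]

    CREATE FUNCTION top_tables(how_many int DEFAULT 5)
    RETURNS TABLE (schema_name name, table_name name, total_size text) AS $$
        SELECT ns.nspname, c.relname,
               pg_size_pretty(pg_total_relation_size(c.oid))
        FROM pg_class c
        JOIN pg_namespace ns ON ns.oid = c.relnamespace
        WHERE c.relkind = 'r'
          AND ns.nspname NOT IN ('pg_catalog', 'information_schema')
        ORDER BY pg_total_relation_size(c.oid) DESC
        LIMIT how_many;
    $$ LANGUAGE sql STABLE;

    SELECT * FROM top_tables(10);

Because it is plain SQL, the same helper works from any client, can be joined against other catalog queries, and could be packaged as an extension.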
Re: [HACKERS] [COMMITTERS] pgsql: Allow background workers to be started dynamically.
On Sat, Jul 20, 2013 at 12:39 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I don't have a problem with getting rid of those, it's easy enough to register them inside the worker and it's safe since we start with blocked signals. I seem to remember some discussion about why they were added but I can't find a reference anymore. Alvaro, do you remember? I left them there because it was easy; but they were absolutely necessary only until we decided that we would start the worker's main function with signals blocked. I don't think there's any serious reason not to remove them now. All right, done. FWIW, I think starting the worker's main with signals blocked was definitely the right decision. I think we have consensus to back-patch the other API changes as well. I'll work up a patch for that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - psql - show longest tables
On Mon, Jul 22, 2013 at 3:13 PM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Jul 22, 2013 at 2:03 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jul 21, 2013 at 12:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello all, I very often use slightly adjusted psql system queries to detect the TOP N sized tables. I think this can be useful for all users, so I propose a few new commands:

\dts [N|size] ... show the N largest tables | show tables larger than size, ordered by size
\dis [N|size] ... show the N largest indexes | show indexes larger than size, ordered by size
\dtst [N|size] ... show the N largest tables by total size | show tables whose total size is larger than size, ordered by total size
\dtr [N] ... show the N largest tables (ordered by rows)

example:

\dts 10 -- top 10 tables ordered by size
\dts 10MB -- tables larger than 10MB ordered by size

 Schema |    Name     | Type  |    Owner    | Size
--------+-------------+-------+-------------+-------
 public | eshop_users | table | eshop_owner | 16 kB
 public | zamestnanci | table | eshop_owner | 16 kB

What do you think about this proposal? Comments, notes? I think our \d commands are an inscrutable morass of indecipherable gobbledygook as it is, and this is only one more step down the road to complete insanity. :-( Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. These types of administrative tasks should be implemented as stored procedures or functions, not as enhancements to psql. That way non-psql clients can leverage them and you can integrate them into other queries. Another advantage is that they can be implemented as an extension. SELECT * from top5(); is a little more of a pain to type. But with my psql-fu I can cut that down with \i if I'm so inclined. Yeah, I think that's a very reasonable approach to this kind of problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
Greg Smith escribió: On 7/22/13 12:58 PM, Amit kapila wrote: As per discussion, the updated patch contains the following changes:

1. Regression tests for ALTER SYSTEM are removed
2. Parsed the auto file automatically after parsing postgresql.conf
3. Removed addition of the include directive in postgresql.conf
4. Removed error handling for parsing errors

These changes have shrunk the diff down to 1411 lines of code. Nice. I'd like to identify which committer might take this on at this point. I'm considering taking a stab at it soonish, FWIW, if no other committer steps up. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Tom Lane said: I haven't read this patch, but that comment scares the heck out of me. Even if the patch isn't broken today, it will be tomorrow, if it's making random changes like that in data structure semantics. Also, if you're confused, so will be everybody else who has to deal with the code later --- so I don't think fixing the comments and variable names is optional. I must admit to finding all of this criticism of unread code a bit bizarre. There are no random changes in data structure semantics. All that happens is that FunctionScan, in the ordinality case, has two tupdescs to deal with: the one for the function return, and the one for FunctionScan's own scan type. Likewise two slots, one of each type. Absolutely no liberties are taken with any of the semantics. However, since the scan structure already has a place for the scan result slot, the extra slot that we allocate for this case is the function result, func_slot, while in the non-ordinality case, we use the scan result slot for the function result too. [Greg: we just found a bug (actually two, one code + one docs); I think David just posted the fixed version. And ironically, the bug in the code has nothing to do with all of this discussion.] Here, to hopefully end the issue, is the new version of FunctionNext, which is the core of the whole patch (everything else is just setup for this). If anyone wants to point out exactly what is unclear, or which changes any semantics improperly, then please do indicate exactly what you are referring to.

/*
 * FunctionNext
 *
 *    This is a workhorse for ExecFunctionScan
 */
static TupleTableSlot *
FunctionNext(FunctionScanState *node)
{
    EState     *estate;
    ScanDirection direction;
    Tuplestorestate *tuplestorestate;
    TupleTableSlot *scanslot;
    TupleTableSlot *funcslot;

    if (node->func_slot)
    {
        /*
         * ORDINALITY case: FUNCSLOT is the function return,
         * SCANSLOT the scan result
         */
        funcslot = node->func_slot;
        scanslot = node->ss.ss_ScanTupleSlot;
    }
    else
    {
        funcslot = node->ss.ss_ScanTupleSlot;
        scanslot = NULL;
    }

    /*
     * get information from the estate and scan state
     */
    estate = node->ss.ps.state;
    direction = estate->es_direction;

    tuplestorestate = node->tuplestorestate;

    /*
     * If first time through, read all tuples from function and put them in a
     * tuplestore. Subsequent calls just fetch tuples from tuplestore.
     */
    if (tuplestorestate == NULL)
    {
        node->tuplestorestate = tuplestorestate =
            ExecMakeTableFunctionResult(node->funcexpr,
                                        node->ss.ps.ps_ExprContext,
                                        node->func_tupdesc,
                                        node->eflags & EXEC_FLAG_BACKWARD);
    }

    /*
     * Get the next tuple from tuplestore. Return NULL if no more tuples.
     */
    (void) tuplestore_gettupleslot(tuplestorestate,
                                   ScanDirectionIsForward(direction),
                                   false,
                                   funcslot);

    if (!scanslot)
        return funcslot;

    /*
     * we're doing ordinality, so we copy the values from the function return
     * slot to the (distinct) scan slot. We can do this because the lifetimes
     * of the values in each slot are the same; until we reset the scan or
     * fetch the next tuple, both will be valid.
     */
    ExecClearTuple(scanslot);

    if (ScanDirectionIsForward(direction))
        node->ordinal++;
    else
        node->ordinal--;

    if (!TupIsNull(funcslot))
    {
        int         natts = funcslot->tts_tupleDescriptor->natts;
        int         i;

        slot_getallattrs(funcslot);

        for (i = 0; i < natts; ++i)
        {
            scanslot->tts_values[i] = funcslot->tts_values[i];
            scanslot->tts_isnull[i] = funcslot->tts_isnull[i];
        }

        scanslot->tts_values[natts] = Int64GetDatumFast(node->ordinal);
        scanslot->tts_isnull[natts] = false;

        ExecStoreVirtualTuple(scanslot);
    }

    return scanslot;
}

-- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
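[For readers who haven't followed the thread, the SQL-level behavior this code implements looks like this -- a sketch using the patch's syntax; committed behavior may differ in detail:]

    SELECT * FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(elem, ord);
     elem | ord
    ------+-----
     a    |   1
     b    |   2
     c    |   3
    (3 rows)

The ordinality column is exactly the extra attribute that FunctionNext appends to the scan slot after copying the function-result attributes across.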
Re: [HACKERS] Proposal: template-ify (binary) extensions
Dimitri, On 07/22/2013 08:44 PM, Dimitri Fontaine wrote: That's the trade-off we currently need to make to be able to ship with the current security protections we're talking about. Anything wrong with shipping postgis-1.5.so and postgis-2.0.so, as we do for Debian? Ok, here's the fully worked out example:

~# select lo_import('/path/to/local/on/the/client/adminpack.so');
 lo_import
-----------
     82153
(1 row)

~# select lo_export(82153, '/tmp/foo.so');
 lo_export
-----------
         1
(1 row)

~# LOAD '/tmp/foo.so';
LOAD

~# CREATE FUNCTION pg_catalog.pg_file_write(text, text, bool)
     RETURNS bigint
     AS '/tmp/foo.so', 'pg_file_write'
     LANGUAGE C VOLATILE STRICT;
CREATE FUNCTION

Good point. Note that these lo_import() and lo_export() methods - unlike the client counterparts - read and write the server's file system. So, your example is incomplete in that SELECT lo_import() doesn't load data from the client into the database. But there are many other ways to do that. Nonetheless, your point holds: lo_export() is a way for a superuser to write arbitrary files on the server's file system via libpq. Or to put it another way: Trusted PLs exist for a good reason. And some people just value security a lot and want that separation of roles. Yeah, security is important, but it needs to be addressing real threats to have any value whatsoever. We're not talking about adding new big holes, as extensions containing modules use LANGUAGE C in their scripts and thus are already restricted to superusers. That part would not change. I agree, it's not a big hole. But with security, every additional layer counts. I think the next step is to clarify whether or not we want to provide that guarantee. As someone mentioned previously, $PGDATA may well be mounted noexec, so that seems to be a bad choice. I don't understand. You want a good default security policy in place, and you're saying that using PGDATA allows for a really easy policy to be implemented by people who don't want the feature. It seems to me to be exactly what you want; why would that be a bad choice then? I'm just saying that mounting $PGDATA with noexec makes $PGDATA unusable to load libraries from; i.e. pg_dlopen() just outright fails. And noexec is another protective layer. Building a solution that requires $PGDATA to be mounted with exec permissions means that very solution won't work with that protective layer. Which is a bad thing. Alternatively, we could solve that problem the other way around: Rather than template-ify the DSO, we could instead turn the objects created by the SQL scripts into something that's more linked to the script. Something that would reload as soon as the file on disk changes. I think that's the wrong way to do it, because you generally want more control over those two steps (preparing the template then instantiating it), and you're proposing to completely lose that control and have them be a single step instead. Doesn't versioning take care of that? I.e. if you've got version 1.0 created in your databases and then install 1.1 system-wide, that shouldn't immediately instantiate. Use case: shipping a new bunch of plproxy functions to 256 nodes. You want to push the templates everywhere, then just do the extension update as fast as possible so that it appears to be about 'in sync'. Pro-tip: you can use a plproxy function that runs the ALTER EXTENSION UPDATE step using the RUN ON ALL facility. Contrary use case: you actually *want* to *replace* the created version installed, as it's a plain security fix that's backwards compatible. 
A plain 'apt-get upgrade' would do the job. Anyways, I think this is bike-shedding: the question of which mental model fits best doesn't matter too much here. I'm concerned about being consistent with whatever mental model we choose, and about an implementation that works with whatever security standards we want to adhere to. (Note how this would make out-of-line extensions a lot closer to the in-line variant your recent patch adds? With the dependency between template and instantiation?) Those dependencies are almost gone now except for being able to deal with DROP TEMPLATE FOR EXTENSION and guaranteeing we can actually pg_restore our pg_dump script. I appreciate the bug fixes. However, there still is at least one dependency. And that might be a good thing, depending on which mental model you follow. I've been changing the implementation to be what you proposed because I think it's making more sense (thanks!), and regression tests are reflecting that. My github branch is up-to-date with the last changes and I'm soon to send an updated patch that will be really as ready for commit as possible without a committer review. Good, thanks. An attacker having access to a libpq connection with superuser rights cannot currently run arbitrary native code. He can try a DOS by I think
Re: [HACKERS] proposal - psql - show longest tables
2013/7/22 Merlin Moncure mmonc...@gmail.com: On Mon, Jul 22, 2013 at 2:03 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jul 21, 2013 at 12:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello all, I very often use slightly adjusted psql system queries to detect the TOP N sized tables. I think this can be useful for all users, so I propose a few new commands:

\dts [N|size] ... show the N largest tables | show tables larger than size, ordered by size
\dis [N|size] ... show the N largest indexes | show indexes larger than size, ordered by size
\dtst [N|size] ... show the N largest tables by total size | show tables whose total size is larger than size, ordered by total size
\dtr [N] ... show the N largest tables (ordered by rows)

example:

\dts 10 -- top 10 tables ordered by size
\dts 10MB -- tables larger than 10MB ordered by size

 Schema |    Name     | Type  |    Owner    | Size
--------+-------------+-------+-------------+-------
 public | eshop_users | table | eshop_owner | 16 kB
 public | zamestnanci | table | eshop_owner | 16 kB

What do you think about this proposal? Comments, notes? I think our \d commands are an inscrutable morass of indecipherable gobbledygook as it is, and this is only one more step down the road to complete insanity. :-( Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. These types of administrative tasks should be implemented as stored procedures or functions, not as enhancements to psql. That way non-psql clients can leverage them and you can integrate them into other queries. Another advantage is that they can be implemented as an extension. SELECT * from top5(); It is not a problem for any advanced user to write these queries. But it is hard to use a parametrized query inside psql. Is a little more of a pain to type. But with my psql-fu I can cut that down with \i if I'm so inclined. you cannot use parameters - then I would have to keep prepared files like top10, top20, ..., which is not too friendly. Regards Pavel merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] RangeTblEntry.joinaliasvars representation change
In http://www.postgresql.org/message-id/cak_s-g3-fwveer1c0idvtz0745-7ryifi8whbzcnmsn+hwc...@mail.gmail.com it's pointed out that commit 2ffa740b was a few bricks shy of a load, because it failed to cope with the possibility of a joinaliasvars item containing an implicit coercion. That's not too hard to fix by adding a strip_implicit_coercions() call, but while testing it I realized that there's a second bug in the same place: the code also fails to cope with a Const arising from a dropped input-relation column. (The comment claiming that this can't happen is flat wrong, because we *have* passed the view query through AcquireRewriteLocks().) I fixed that too, and improved the comment in parsenodes.h that led me to overlook implicit coercions in the first place, as per the attached WIP patch. I then went looking for other places that might be assuming too much about what is in joinaliasvars lists, and found several pre-existing bugs :-(. The nastiest one is in flatten_join_alias_vars_mutator's code to expand a whole-row reference, which supposes that if it finds a Const item then that must represent a dropped column. That would be true in the parser or rewriter, but at this point in planning we have already done subquery pullup, which means we could find anything including a Const there. The code would thus mistake a pulled-up constant subquery output for a dropped column. An example in the regression database is

explain verbose select j from
  (int4_tbl join (select q1 as f1, q2 as z, 42 as c from int8_tbl) ss using(f1)) j;
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=1.11..2.23 rows=5 width=16)
   Output: ROW(int8_tbl.q1, int8_tbl.q2)
   Hash Cond: (int8_tbl.q1 = int4_tbl.f1)
   ->  Seq Scan on public.int8_tbl  (cost=0.00..1.05 rows=5 width=16)
         Output: int8_tbl.q1, int8_tbl.q2
   ->  Hash  (cost=1.05..1.05 rows=5 width=4)
         Output: int4_tbl.f1
         ->  Seq Scan on public.int4_tbl  (cost=0.00..1.05 rows=5 width=4)
               Output: int4_tbl.f1
(9 rows)

The 42 has disappeared entirely from the ROW() construct :-(. This can be reproduced in all active branches. After some reflection I think that the best fix is to redefine AcquireRewriteLocks' processing of dropped columns so that it puts an actual null pointer, not a consed-up Const, into the joinaliasvars list item. This is reliably distinguishable from anything we might pull up from a subquery output, and it doesn't really lose information since the Const was completely phony anyway. (I think the original design envisioned having the Const carrying column datatype info, but we abandoned that idea upon realizing that the dropped column might be of a dropped type. The phony Const is currently always of type INT4.) This definitional change will not affect any rules-on-disk since the dropped-column substitution is only done on rule trees when they are loaded back into memory. A risk we'd be taking with this change is that any extension code that looks at post-rewriter joinaliasvars lists might be confused. However, I'm not sure why extension code would be looking at those. In any case, I can't see a different back-patchable change that would reduce such a risk; we have to change the representation *somehow* if we're going to distinguish these cases correctly. Thoughts? 
regards, tom lane diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 40b565a11d6ddf40b3cd8e692e625efe1f6dc3a7..3292278774f15851f3ad7ae000139278c3f94fd8 100644 *** a/src/backend/utils/adt/ruleutils.c --- b/src/backend/utils/adt/ruleutils.c *** typedef struct *** 235,240 --- 235,241 * child RTE's attno and rightattnos[i] is zero; and conversely for a * column of the right child. But for merged columns produced by JOIN * USING/NATURAL JOIN, both leftattnos[i] and rightattnos[i] are nonzero. + * Also, if the column has been dropped, both are zero. * * If it's a JOIN USING, usingNames holds the alias names selected for the * merged columns (these might be different from the original USING list, *** set_join_column_names(deparse_namespace *** 3053,3058 --- 3054,3066 char *colname = colinfo-colnames[i]; char *real_colname; + /* Ignore dropped column (only possible for non-merged column) */ + if (colinfo-leftattnos[i] == 0 colinfo-rightattnos[i] == 0) + { + Assert(colname == NULL); + continue; + } + /* Get the child column name */ if (colinfo-leftattnos[i] 0) real_colname = leftcolinfo-colnames[colinfo-leftattnos[i] - 1]; *** set_join_column_names(deparse_namespace *** 3061,3075 else { /* We're joining system columns --- use eref name */ ! real_colname = (char *) list_nth(rte-eref-colnames, i); ! } ! ! /* Ignore dropped
Re: [HACKERS] proposal - psql - show longest tables
On 07/22/2013 03:11 PM, Pavel Stehule wrote: 2013/7/22 Robert Haas robertmh...@gmail.com: Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. Do you have any tip? I agree with Robert. My tip is this: when you're in a hole, the first thing to do is to stop digging. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Allow background workers to be started dynamically.
On Mon, Jul 22, 2013 at 3:16 PM, Robert Haas robertmh...@gmail.com wrote: I think we have consensus to back-patch the other API changes as well. I'll work up a patch for that. Pushed that as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC13 proposal - extend RETURNING syntax
I've noticed problem with UPDATE ... FROM statement. Fix in new version. Regards, Karol diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml index 90b9208..eba35f0 100644 --- a/doc/src/sgml/ref/update.sgml +++ b/doc/src/sgml/ref/update.sgml @@ -194,12 +194,27 @@ UPDATE [ ONLY ] replaceable class=PARAMETERtable_name/replaceable [ * ] [ termreplaceable class=PARAMETERoutput_expression/replaceable/term listitem para - An expression to be computed and returned by the commandUPDATE/ - command after each row is updated. The expression can use any - column names of the table named by replaceable class=PARAMETERtable_name/replaceable - or table(s) listed in literalFROM/. - Write literal*/ to return all columns. + An expression to be computed and returned by the + commandUPDATE/ command either before or after (prefixed with + literalBEFORE./literal and literalAFTER./literal, + respectively) each row is updated. The expression can use any + column names of the table named by replaceable + class=PARAMETERtable_name/replaceable or table(s) listed in + literalFROM/. Write literalAFTER.*/literal to return all + columns after the update. Write literalBEFORE.*/literal for all + columns before the update. Write literal*/literal to return all + columns after update and all triggers fired (these values are in table + after command). You may combine BEFORE, AFTER and raw columns in the + expression. /para + warningpara + Mixing table names or aliases named before or after with the + above will result in confusion and suffering. If you happen to + have a table called literalbefore/literal or + literalafter/literal, alias it to something else when using + RETURNING. + /para/warning + /listitem /varlistentry @@ -287,15 +302,16 @@ UPDATE weather SET temp_lo = temp_lo+1, temp_hi = temp_lo+15, prcp = DEFAULT /para para - Perform the same operation and return the updated entries: + Perform the same operation and return information on the changed entries: programlisting UPDATE weather SET temp_lo = temp_lo+1, temp_hi = temp_lo+15, prcp = DEFAULT WHERE city = 'San Francisco' AND date = '2003-07-03' - RETURNING temp_lo, temp_hi, prcp; + RETURNING temp_lo AS new_low, temp_hi AS new_high, BEFORE.temp_hi/BEFORE.temp_low AS old_ratio, AFTER.temp_hi/AFTER.temp_low AS new_ratio prcp; /programlisting /para + para Use the alternative column-list syntax to do the same update: programlisting diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index d86e9ad..fafd311 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -2335,7 +2335,7 @@ ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo) TupleTableSlot * ExecBRUpdateTriggers(EState *estate, EPQState *epqstate, ResultRelInfo *relinfo, - ItemPointer tupleid, TupleTableSlot *slot) + ItemPointer tupleid, TupleTableSlot *slot, TupleTableSlot **planSlot) { TriggerDesc *trigdesc = relinfo-ri_TrigDesc; HeapTuple slottuple = ExecMaterializeSlot(slot); @@ -2381,6 +2381,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate, if (newSlot != NULL) { slot = ExecFilterJunk(relinfo-ri_junkFilter, newSlot); + *planSlot = newSlot; slottuple = ExecMaterializeSlot(slot); newtuple = slottuple; } diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 15f5dcc..06ebaf3 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -603,7 +603,7 @@ ExecUpdate(ItemPointer tupleid, resultRelInfo-ri_TrigDesc-trig_update_before_row) { slot 
= ExecBRUpdateTriggers(estate, epqstate, resultRelInfo, - tupleid, slot); + tupleid, slot, planSlot); if (slot == NULL) /* do nothing */ return NULL; @@ -737,6 +737,7 @@ lreplace:; hufd.xmax); if (!TupIsNull(epqslot)) { + planSlot = epqslot; *tupleid = hufd.ctid; slot = ExecFilterJunk(resultRelInfo-ri_junkFilter, epqslot); tuple = ExecMaterializeSlot(slot); diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index a896d76..409b4d1 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -1928,6 +1928,7 @@ range_table_walker(List *rtable, { case RTE_RELATION: case RTE_CTE: + case RTE_BEFORE: /* nothing to do */ break; case RTE_SUBQUERY: @@ -2642,6 +2643,7 @@ range_table_mutator(List *rtable, { case RTE_RELATION: case RTE_CTE: + case RTE_BEFORE: /* we don't bother to copy eref, aliases, etc; OK? */ break; case RTE_SUBQUERY: diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 48cd9dc..79b03af 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -2366,6 +2366,7 @@
Re: [HACKERS] small typo in src/backend/access/transam/xlog.c
On Mon, Jul 22, 2013 at 6:45 AM, didier did...@gmail.com wrote: Hi, in void BootStrapXLOG(void):

    /*
     * ... to seed it other than the system clock value...) The upper half of the
     * uint64 value is just the tv_sec part, while the lower half is the XOR
     * of tv_sec and tv_usec. This is to ensure that we don't lose uniqueness
     * unnecessarily if uint64 is really only 32 bits wide. A person
     * knowing this encoding can determine the initialization time of the
     * installation, which could perhaps be useful sometimes.
     */
    gettimeofday(&tv, NULL);
    sysidentifier = ((uint64) tv.tv_sec) << 32;
    sysidentifier |= (uint32) (tv.tv_sec | tv.tv_usec);

should be

    sysidentifier |= (uint32) (tv.tv_sec ^ tv.tv_usec);

And why is that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] small typo in src/backend/access/transam/xlog.c
On 2013-07-22 15:55:46 -0400, Robert Haas wrote: On Mon, Jul 22, 2013 at 6:45 AM, didier did...@gmail.com wrote: Hi, in void BootStrapXLOG(void):

    /*
     * ... to seed it other than the system clock value...) The upper half of the
     * uint64 value is just the tv_sec part, while the lower half is the XOR
     * of tv_sec and tv_usec. This is to ensure that we don't lose uniqueness
     * unnecessarily if uint64 is really only 32 bits wide. A person
     * knowing this encoding can determine the initialization time of the
     * installation, which could perhaps be useful sometimes.
     */
    gettimeofday(&tv, NULL);
    sysidentifier = ((uint64) tv.tv_sec) << 32;
    sysidentifier |= (uint32) (tv.tv_sec | tv.tv_usec);

should be

    sysidentifier |= (uint32) (tv.tv_sec ^ tv.tv_usec);

And why is that? The comment above says so: "while the lower half is the XOR of tv_sec and tv_usec". I don't think it really matters; the bitwise OR has the tendency to collect too many set bits, but ... who cares? On the other hand, changing it is easy and shouldn't cause any problems. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
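[Andres's point is easy to see from SQL, using PostgreSQL's bitwise operators (# is XOR); the sample values are illustrative stand-ins for tv_sec and tv_usec:]

    SELECT to_hex(1374500000 | 654321) AS with_or,
           to_hex(1374500000 # 654321) AS with_xor;
    -- OR can only set bits, so across many runs the low half drifts toward
    -- all-ones and distinct timestamps collapse together; XOR keeps the
    -- microsecond variation visible in the identifier.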
Re: [HACKERS] enum-ify resource manager's xl_info values
On 7/22/13 7:21 AM, Andres Freund wrote: Would somebody object to making the rmgr's info value #defines like: I'm suspicious of enums that are assigned specific values. Enums should stand by themselves; they shouldn't be a symbolic layer on top of some other numbering or bit-fiddling scheme. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new row-level lock error messages
Peter Eisentraut wrote: I would suggest that these changes be undone, except that the old SELECT FOR ... be replaced by a dynamic string that reverse-parses the LockingClause to provide the actual clause that was used. Here's a patch for this. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 839ed9d..8efb94b 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -26,6 +26,7 @@ #include optimizer/prep.h #include optimizer/restrictinfo.h #include optimizer/var.h +#include parser/analyze.h #include rewrite/rewriteManip.h #include utils/lsyscache.h @@ -883,7 +884,10 @@ make_outerjoininfo(PlannerInfo *root, (jointype == JOIN_FULL bms_is_member(rc-rti, left_rels))) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(row-level locks cannot be applied to the nullable side of an outer join))); + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(%s cannot be applied to the nullable side of an outer join, + LCS_asString(rc-strength; } sjinfo-syn_lefthand = left_rels; diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 01e2fa3..9ff8050 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1107,7 +1107,11 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) if (parse-rowMarks) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(row-level locks are not allowed with UNION/INTERSECT/EXCEPT))); + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(%s is not allowed with UNION/INTERSECT/EXCEPT, + LCS_asString(((RowMarkClause *) + linitial(parse-rowMarks))-strength; /* * Calculate pathkeys that represent result ordering requirements diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 16ff234..39036fb 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -1221,7 +1221,11 @@ transformValuesClause(ParseState *pstate, SelectStmt *stmt) if (stmt-lockingClause) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(SELECT FOR UPDATE/SHARE cannot be applied to VALUES))); + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(%s cannot be applied to VALUES, + LCS_asString(((LockingClause *) + linitial(stmt-lockingClause))-strength; qry-rtable = pstate-p_rtable; qry-jointree = makeFromExpr(pstate-p_joinlist, NULL); @@ -1312,7 +1316,11 @@ transformSetOperationStmt(ParseState *pstate, SelectStmt *stmt) if (lockingClause) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(SELECT FOR UPDATE/SHARE is not allowed with UNION/INTERSECT/EXCEPT))); + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(%s is not allowed with UNION/INTERSECT/EXCEPT, + LCS_asString(((LockingClause *) + linitial(lockingClause))-strength; /* Process the WITH clause independently of all else */ if (withClause) @@ -1506,7 +1514,11 @@ transformSetOperationTree(ParseState *pstate, SelectStmt *stmt, if (stmt-lockingClause) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(SELECT FOR UPDATE/SHARE is not allowed with UNION/INTERSECT/EXCEPT))); + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(%s is not allowed with UNION/INTERSECT/EXCEPT, + LCS_asString(((LockingClause *) + 
linitial(stmt-lockingClause))-strength; /* * If an internal node of a set-op tree has ORDER BY, LIMIT, FOR UPDATE, @@ -2063,21 +2075,33 @@ transformDeclareCursorStmt(ParseState *pstate, DeclareCursorStmt *stmt) if (result-rowMarks != NIL (stmt-options CURSOR_OPT_HOLD)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(DECLARE CURSOR WITH HOLD ... FOR UPDATE/SHARE is not supported), + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(DECLARE CURSOR WITH HOLD ... %s is not supported, + LCS_asString(((RowMarkClause *) + linitial(result-rowMarks))-strength)), errdetail(Holdable cursors must be READ ONLY.))); /* FOR UPDATE and SCROLL are not compatible */ if (result-rowMarks != NIL (stmt-options CURSOR_OPT_SCROLL)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(DECLARE SCROLL CURSOR ... FOR UPDATE/SHARE is not supported), + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(DECLARE SCROLL CURSOR ... %s is not supported, + LCS_asString(((RowMarkClause
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 7/22/13 2:57 PM, Andres Freund wrote: * I'd be very surprised if this doesn't make WAL replay of update-heavy workloads slower by at least a factor of 2. I was thinking about what a benchmark of WAL replay would look like last year. I don't think that data is captured very well yet, and it should be. My idea was to break the benchmark into two pieces. One would take a base backup, then run a series of tests and archive the resulting WAL. I doubt you can make a useful benchmark here without a usefully populated database; that's why the base backup step is needed. The first useful result then is to measure how long commit/archiving took and the WAL volume, which is what's done by the test harness for this program. Then the resulting backup would be set up for replay. Tarring up the backup and WAL archive could even give you a repeatable test set for runs where only the replay side changes. Then the main number that's useful, total replay time, would be measured. The main thing I wanted this for wasn't code changes; it was to benchmark configuration changes. I'd like to be able to answer questions like which I/O scheduler is best for a standby in a way that has real test data behind it. The same approach should be useful for answering your concerns about the replay performance impact of this change too. * It makes changeset extraction either more expensive or it would have to be disabled there. That argues that, if this is committed at all, the ability to turn it off that I was asking about would be necessary. It sounds like this *could* work like how minimal WAL archiving levels allow optimizations that are disabled at higher ones--like the COPY into a truncated/new table cheat. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
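[A sketch of the capture half of that idea, using the backup functions available in this era; it assumes archive_mode = on and a working archive_command:]

    SELECT pg_start_backup('replay-bench', true);
    -- ... run the update-heavy workload being measured ...
    SELECT pg_stop_backup();
    -- Tar up the base backup plus the WAL archived between the two calls;
    -- total replay time of that fixed WAL set is the headline number.

Freezing both the base backup and the WAL archive is what makes replay runs repeatable across configuration (or patch) variations.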
Re: [HACKERS] getting rid of SnapshotNow
On Fri, Jul 19, 2013 at 1:20 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: 4. If we use GetActiveSnapshot, all the comments about about a fresh MVCC snapshot still apply. However, the snapshot in question could be even more stale, especially in repeatable read or serializable mode. However, this might be thought a more consistent behavior than what we have now. And I'm guessing that this function is typically run as its own transaction, so in practice this doesn't seem much different from an MVCC snapshot, only cheaper. At the moment, I dislike #2 and slightly prefer #4 to #3. +1 for #4, and if we ever need more then we can provide a non-default way to get at #2. OK, done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - psql - show longest tables
On Mon, Jul 22, 2013 at 12:40 PM, Andrew Dunstan and...@dunslane.net wrote: On 07/22/2013 03:11 PM, Pavel Stehule wrote: 2013/7/22 Robert Haas robertmh...@gmail.com: Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. Do you have any tip? I agree with Robert. My tip is this: when you're in a hole, the first thing to do is to stop digging. I don't think that Pavel believes himself to be in a hole. After setting up my .psqlrc file as I normally do, I could do this: :rtsize limit 10; But it doesn't have the 'MB' feature, and if I want to help someone else I first have to explain to them how to set their .psqlrc the same as mine, which is rather a bummer. Is looking for the biggest tables a common enough thing that it should be available to everyone, without needing customization first? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
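[The sort of .psqlrc setup being described might look like this; the variable name rtsize is Jeff's, while the query body is a guess of mine:]

    \set rtsize 'SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC'
    -- then, interactively:
    :rtsize limit 10;

psql expands :rtsize to the stored query text, so whatever follows it (here a LIMIT clause) is appended to the statement before it is sent to the server.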
Re: [HACKERS] [v9.4] row level security
On 7/20/13 10:08 AM, Kohei KaiGai wrote: With that change to expand_targetlist(), the change to getrelid() may be unnecessary, and then also the new rowsec_relid field in RangeTblEntry may not be needed. Hmm. I didn't have this idea. It seems to me fair enough and kills necessity to enhance RangeTblEntry and getrelid() indeed. I try to fix up this implementation according to your suggestion. Great, there's one useful bit of feedback for you then, and that seems to address Tom's getrelid concern too. For the active CommitFest, I don't see any place we can go with this right now except for Returned with Feedback. We really need more reviewers willing to put a significant amount of time into going through this code. Anyone who would like to see RLS committed should consider how you can get resources to support a skilled PostgreSQL reviewer spending time on it. (This is a bit much to expect new reviewers to chew on usefully) I've been working on that here, but I don't have anything I can publicly commit to yet. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Expression indexes and dependencies
Pavan Deolasee escribió: Hello, While doing some tests, I observed that expression indexes can malfunction if the underlying expression changes. [...] Perhaps this is a known behaviour/limitation, but I could not find it in the documentation. But I wonder if it makes sense to check for dependencies during function alteration and complain. Or are there other reasons why we can't do that, and is it a much larger problem than what I'm imagining? This is a tough problem. The dependency mechanism has no way to keep track of this kind of dependency; all it does is prevent the function from being dropped altogether, but preventing it from acquiring a conflicting definition is outside its charter. One way to attack this would be registering dependencies of a new kind on functions used by index expressions. Then CREATE OR REPLACE FUNCTION could reject alteration of such functions. I don't know if we care enough about this case. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
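[In the meantime, a DBA can at least enumerate the indexes that depend on a function before redefining it, via pg_depend -- a sketch; substitute the signature of the function in question:]

    SELECT c.oid::regclass AS index_name
    FROM pg_depend d
    JOIN pg_class c ON c.oid = d.objid AND c.relkind = 'i'
    WHERE d.classid = 'pg_class'::regclass
      AND d.refclassid = 'pg_proc'::regclass
      AND d.refobjid = 'foo(integer)'::regprocedure;
    -- each index listed can then be rebuilt with REINDEX once the
    -- new definition is in place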
Re: [HACKERS] Expression indexes and dependencies
On Mon, Jul 22, 2013 at 6:04 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Pavan Deolasee escribió: Hello, While doing some tests, I observed that expression indexes can malfunction if the underlying expression changes. [...] Perhaps this is a known behaviour/limitation, but I could not find it in the documentation. But I wonder if it makes sense to check for dependencies during function alteration and complain. Or are there other reasons why we can't do that, and is it a much larger problem than what I'm imagining? This is a tough problem. The dependency mechanism has no way to keep track of this kind of dependency; all it does is prevent the function from being dropped altogether, but preventing it from acquiring a conflicting definition is outside its charter. One way to attack this would be registering dependencies of a new kind on functions used by index expressions. Then CREATE OR REPLACE FUNCTION could reject alteration of such functions. I don't know if we care enough about this case. What about issuing a warning and leaving it to the DBA to reindex? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Expression indexes and dependencies
On 2013-07-22 17:04:06 -0400, Alvaro Herrera wrote: Pavan Deolasee escribió: Hello, While doing some tests, I observed that expression indexes can malfunction if the underlying expression changes. [...] Perhaps this is a known behaviour/limitation, but I could not find that in the documentation. But I wonder if it makes sense to check for dependencies during function alteration and complain. Or are there other reasons why we can't do that, and it's a much larger problem than what I'm imagining? This is a tough problem. The dependency mechanism has no way to keep track of this kind of dependency; all it does is prevent the function from being dropped altogether, but preventing it from acquiring a conflicting definition is outside its charter. One way to attack this would be registering dependencies of a new kind on functions used by index expressions. Then CREATE OR REPLACE FUNCTION could reject alteration for such functions. I don't know if we care enough about this case. I think changing the results of an immutable function violates the contract enough to make this the user's fault. Also, the other solutions seem hard to achieve ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Comma Comma Comma 8601
Hackers, According to [Wikipedia](https://en.wikipedia.org/wiki/ISO_8601#Times): Decimal fractions may also be added to any of the three time elements. A decimal mark, either a comma or a dot (without any preference as stated in resolution 10 of the 22nd General Conference CGPM in 2003,[11] but with a preference for a comma according to ISO 8601:2004)[12] is used as a separator between the time element and its fraction. A fraction may only be added to the lowest order time element in the representation. To denote 14 hours, 30 and one half minutes, do not include a seconds figure. Represent it as 14:30,5, 1430,5, 14:30.5, or 1430.5. There is no limit on the number of decimal places for the decimal fraction. However, the number of decimal places needs to be agreed to by the communicating parties. I assume that the Postgres project has no interest in supporting the input of whack times like “14:30,5”, “1430,5”, “14:30.5”, or “1430.5”, right? I mean, that’s just bizarre, amirite? But I do wonder if the comma should be allowed for fractional seconds, since the spec says it is preferred (and often used in Javaland, I’m told). As in “14:30:50,232”. Thoughts? Best, David PS: Apologies if you somehow ended up with a bad 80s pop song in your head. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
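A quick psql sketch of the two separator forms under discussion (illustrative only; exact behavior depends on the server version, and the comma form is precisely the open question here):

-- Dot as the decimal mark: already accepted today
SELECT '14:30:50.232'::time;   -- 14:30:50.232

-- Comma as the decimal mark, ISO 8601:2004's stated preference:
-- today's parser treats comma as a separator/noise symbol, so do not
-- expect this to parse as a fractional second without a code change
SELECT '14:30:50,232'::time;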
Re: [HACKERS] proposal - psql - show longest tables
Pavel Stehule pavel.steh...@gmail.com writes: SELECT * from top5(); $ TABLE top5; -- add a view on top of the SRF you cannot use parameters - then I have to have prepared files like top10, top20, ... what is not too friendly The SRF could be using custom GUCs so that you can parametrize it, or just even classic parameters… $ TABLE top(5); -- needs a patch to accept SRF here… $ TABLE top LIMIT 5; $ SET top.limit = 5; $ TABLE top; Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
On 2013-07-17 10:11:28 -0400, Tom Lane wrote: Kevin Grittner kgri...@postgresql.org writes: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch is broken. Jagarundi still tells that story. At least something like the following patch seems to be required. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 46149ee..09ea344 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -141,6 +141,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	List	   *actions;
 	Query	   *dataQuery;
 	Oid			tableSpace;
+	Oid			owner;
 	Oid			OIDNewHeap;
 	DestReceiver *dest;
 	bool		concurrent;
@@ -238,6 +239,8 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	else
 		tableSpace = matviewRel->rd_rel->reltablespace;

+	owner = matviewRel->rd_rel->relowner;
+
 	heap_close(matviewRel, NoLock);

 	/* Create the transient table that will receive the regenerated data. */
@@ -247,8 +250,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,

 	/* Generate the data, if wanted. */
 	if (!stmt->skipData)
-		refresh_matview_datafill(dest, dataQuery, queryString,
-								 matviewRel->rd_rel->relowner);
+		refresh_matview_datafill(dest, dataQuery, queryString, owner);

 	/* Make the matview match the newly generated data. */
 	if (concurrent)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] InvokeObjectPostAlterHook() vs. CommandCounterIncrement()
On Sun, Jul 21, 2013 at 4:44 AM, Ants Aasma ants.aa...@eesti.ee wrote: On Jul 21, 2013 4:06 AM, Noah Misch n...@leadboat.com wrote: If these hooks will need to apply to a larger operation, I think that mandates a different means to reliably expose the before/after object states. I haven't checked the code to see how it would fit the API, but what about taking a snapshot before altering and passing this to the hook. Would there be other issues besides performance? If the snapshot is taken only when there is a hook present then the performance can be fixed later. I had the idea of finding a way to pass either the old tuple, or perhaps just the TID of the old tuple. Not sure if passing a snapshot is better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
On Mon, Jul 22, 2013 at 6:01 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-07-17 10:11:28 -0400, Tom Lane wrote: Kevin Grittner kgri...@postgresql.org writes: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch is broken. Jagarundi still tells that story. At least something like the following patch seems to be required. Thanks, I was noticing that breakage today, too. I committed this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - psql - show longest tables
On Mon, Jul 22, 2013 at 02:44:59PM -0700, Dimitri Fontaine wrote: Pavel Stehule pavel.steh...@gmail.com writes: SELECT * from top5(); $ TABLE top5; -- add a view on top of the SRF you cannot use parameters - then I have to have prepared files like top10, top20, ... what is not too friendly The SRF could be using custom GUCs so that you can parametrize it, or just even classic parameters… $ TABLE top(5); -- needs a patch to accept SRF here… Andrew Gierth will probably be posting a design patch for something similar soon :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - psql - show longest tables
On Mon, Jul 22, 2013 at 03:55:33PM -0700, David Fetter wrote: On Mon, Jul 22, 2013 at 02:44:59PM -0700, Dimitri Fontaine wrote: Pavel Stehule pavel.steh...@gmail.com writes: SELECT * from top5(); $ TABLE top5; -- add a view on top of the SRF you cannot use parameters - then I have to have prepared files like top10, top20, ... what is not too friendly The SRF could be using custom GUCs so that you can parametrize it, or just even classic parameters… $ TABLE top(5); -- needs a patch to accept SRF here… Andrew Gierth will probably be posting a design patch for something similar soon :) Probably not :P Andrew will be posting a design and patch if and when he decides it's appropriate to do so. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
Andres Freund and...@2ndquadrant.com writes: On 2013-07-17 10:11:28 -0400, Tom Lane wrote: The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch is broken. Jagarundi still tells that story. Uh, no. Jagarundi was perfectly happy for several build cycles after I committed 405a468. The buildfarm history shows that it started complaining again after this change: f01d1ae Add infrastructure for mapping relfilenodes to relation OIDs At least something like the following patch seems to be required. This does look like a necessary change, but I suspect there is also something rotten in f01d1ae. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] The Commitfest Slacker List
On 6/24/13 12:57 PM, Josh Berkus wrote: Maciej is correct that this policy also belongs on the how to submit a patch wiki page. I will remedy that. I just reviewed and heavily updated the new section you added to https://wiki.postgresql.org/wiki/Submitting_a_Patch That included the idea that the reviewed patch should be similar in size/scope to the submitted one, as well as a slightly deeper discussion of how to fit this work into a feature review quote. I find myself needing to explain this whole subject to potential feature sponsors enough that I've tried a few ways of describing it. The closest analog I've found so far is the way carbon offset work is accounted for. I sometimes refer to the mutual review as an offsetting review in conversation, and I threw that term into the guidelines as well. As far as motivating new reviewers goes, let's talk about positive feedback. Anything that complicates the release notes is a non-starter because that resource is tightly controlled by a small number of people, and it's trying to satisfy a lot of purposes. What I would like to see is an official but simple Review Leaderboard for each release instead. Each time someone writes a review and adds it to the CF app with a Review entry, the account that entered it gets a point. Sum the points at the end and there's your weighted list for T-shirt prizes. It should be possible to get that count with a trivial SELECT query out of the CF database, and then produce a simple HTML table at the end of each CF or release. Anything that takes more work than that, and anything that takes *any* time that must come from a committer, is too much accounting. This idea would be a bit unfair to people who review big patches instead of little ones. But an approach that disproportionately rewards new contributors working on small things isn't that terrible. As long as the review tests whether the code compiles and passes the regression tests, that's good enough to deserve a point. I'd be happy if we suddenly had a problem where people were doing only that to try to game their leaderboard ranking. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
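A sketch of the kind of trivial query meant here, against a hypothetical CF app schema (the table and column names are invented for illustration; the real schema will differ):

-- Hypothetical: one row per review entered in the CF app
SELECT reviewer, count(*) AS points
FROM commitfest_review          -- invented table name
GROUP BY reviewer
ORDER BY points DESC;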
Re: [HACKERS] Comma Comma Comma 8601
David E. Wheeler da...@justatheory.com writes: But I do wonder if the comma should be allowed for fractional seconds, since the spec says it is preferred (and often used in Javaland, I'm told). As in 14:30:50,232. Thoughts? Does that create any ambiguities against formats we already support? I'm worried about examples like this one: select 'monday, july 22, 22:30 2013'::timestamptz; timestamptz 2013-07-22 22:30:00-04 (1 row) Right now I'm pretty sure that the datetime parser treats comma as a noise symbol. If that stops being true, we're likely to break some applications out there (admittedly, possibly rather strange ones, but still ...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
On 2013-07-22 19:09:13 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-07-17 10:11:28 -0400, Tom Lane wrote: The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch is broken. Jagarundi still tells that story. Uh, no. Jagarundi was perfectly happy for several build cycles after I committed 405a468. The buildfarm history shows that it started complaining again after this change: f01d1ae Add infrastructure for mapping relfilenodes to relation OIDs Huh? That's rather strange. No code from that patch is even executed until the alter_table regression tests way later. So I can't see how that patch could cause the matview error, which is pretty clearly using a freed relcache entry. I think this may be a timing issue. At least something like the following patch seems to be required. This does look like a necessary change, but I suspect there is also something rotten in f01d1ae. The alter table failure later on seems to be ERROR: relation tmp already exists, which just seems to be a consequence of incomplete cleanup due to the earlier crashes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Expression indexes and dependencies
Andres Freund and...@2ndquadrant.com writes: On 2013-07-22 17:04:06 -0400, Alvaro Herrera wrote: One way to attack this would be registering dependencies of a new kind on functions used by index expressions. Then CREATE OR REPLACE FUNCTION could reject alteration for such functions. I don't know if we care enough about this case. I think changing the results of an immutable function violates the contract enough to make this the user's fault. Also, the other solutions seem hard to achieve ;) Yeah. Prohibiting any change at all would likely be a cure worse than the disease, but we don't have the tools to analyze more finely than that. And what if the index uses function A which calls function B, and you change function B? I'd be in favor of adding a note to the CREATE INDEX man page pointing out that if you change the behavior of an immutable function, any bad consequences for indexes are on your own head, and a REINDEX would be advisable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
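A minimal sketch of the hazard being discussed (object names made up for illustration):

-- A function declared IMMUTABLE, used in an expression index:
CREATE FUNCTION f(x int) RETURNS int IMMUTABLE LANGUAGE sql
    AS 'SELECT x + 1';
CREATE TABLE t (a int);
INSERT INTO t SELECT generate_series(1, 1000);
CREATE INDEX t_f_idx ON t (f(a));

-- Changing the function's results breaks the contract; the index still
-- contains values computed with the old definition:
CREATE OR REPLACE FUNCTION f(x int) RETURNS int IMMUTABLE LANGUAGE sql
    AS 'SELECT x + 2';
SET enable_seqscan = off;
SELECT * FROM t WHERE f(a) = 10;   -- may return a = 9 (old f), not a = 8

-- Per the advice above, the fix is on the user's head:
REINDEX INDEX t_f_idx;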
Re: [HACKERS] small typo in src/backend/access/transam/xlog.c
Andres Freund and...@2ndquadrant.com writes: On 2013-07-22 15:55:46 -0400, Robert Haas wrote: And why is that? The comment above tells: while the lower half is the XOR of tv_sec and tv_usec. Yeah, the code doesn't match the comment; this mistake seems to be aboriginal. I don't think it really matters. The bitwise OR has the tendency to collect too many set bits, but ... who cares? This is making the value less unique than intended, so I think it's worth doing something about. However, it's also worth noting that the intended behavior (as described by the comment) was designed to allow for the possibility that uint64 is really only 32 bits --- a possibility we stopped supporting several versions ago. So rather than just quickly s/|/^/, maybe we should step back and think about whether we want to change the sysid generation algorithm altogether. We could for instance keep the high half as tv_sec, while making the low half be something like (tv_usec << 12) | (getpid() & 0xfff). This would restore the intended ability to reverse-engineer the exact creation time from the sysidentifier, and also add a little more uniqueness by way of the creating process's PID. (Note tv_usec must fit in 20 bits.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
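A sketch of what that proposed layout could look like as standalone C (hypothetical; the real logic lives in xlog.c and the function name here is illustrative):

#include <stdint.h>
#include <sys/time.h>
#include <unistd.h>

/* Proposed layout: high 32 bits = tv_sec; next 20 bits = tv_usec
 * (always < 1000000, so it fits in 20 bits); low 12 bits = PID. */
static uint64_t
generate_system_identifier(void)
{
    struct timeval tv;
    uint64_t sysid;

    gettimeofday(&tv, NULL);
    sysid = ((uint64_t) tv.tv_sec) << 32;
    sysid |= ((uint64_t) tv.tv_usec) << 12;
    sysid |= (uint64_t) (getpid() & 0xfff);
    return sysid;
}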
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
All, Christophe just discovered something with include files which is going to cause issues with ALTER SYSTEM SET. So, take as a hypothetical that you use the default postgresql.conf file, which sets shared_buffers = 32MB. Instead of editing this file, you do ALTER SYSTEM SET shared_buffers = '1GB', which updates config/postgresql.auto.conf. Then you restart PostgreSQL. Everything is hunky-dory, until a later occasion where you *reload* PostgreSQL. Then postgres hits the first shared_buffers = 32MB (in postgresql.conf), says "I can't change shared_buffers on a reload", and aborts before ever getting to postgresql.auto.conf. This is a problem that exists now with includes in postgresql.conf, but having ALTER SYSTEM SET will cause more users to hit it. Seems like we'll need to fix it before releasing 9.4. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
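A sketch of the failure sequence (paths and values illustrative, mirroring the scenario above):

# postgresql.conf -- read first, both at startup and on reload
shared_buffers = 32MB

# config/postgresql.auto.conf -- written by ALTER SYSTEM SET, read last
shared_buffers = '1GB'

-- The steps that set up the trap:
ALTER SYSTEM SET shared_buffers = '1GB';
-- restart: fine, postgresql.auto.conf wins and you run with 1GB
-- reload (SIGHUP): the config reader trips over 32MB in postgresql.conf
-- first, complains that shared_buffers cannot change on reload, and
-- never reaches postgresql.auto.conf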
Re: [HACKERS] proposal - psql - show longest tables
Jeff Janes jeff.ja...@gmail.com writes: Is looking for the biggest tables a common enough thing that it should be available to everyone, without needing custom customization? I don't really think so. It's surely not much harder than select relname, pg_relation_size(oid) from pg_class order by 2 desc; Moreover, the people who need this likely don't need it as a psql command, but rather as something available to monitoring tools. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
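For anyone who does want it prettier, the same one-liner dressed up a bit (standard catalog functions only):

SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_relation_size(oid) DESC
LIMIT 5;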
Re: [HACKERS] proposal - psql - show longest tables
On 07/22/2013 04:26 PM, Jeff Janes wrote: On Mon, Jul 22, 2013 at 12:40 PM, Andrew Dunstan and...@dunslane.net wrote: On 07/22/2013 03:11 PM, Pavel Stehule wrote: 2013/7/22 Robert Haas robertmh...@gmail.com: Rather than just continuing to add more imposible-to-remember syntax, we really need a better design here. do you have any tip? I agree with Robert. My tip is this: when you're in a hole, the first thing to do is to stop digging. I don't think that Pavel believes himself to be in a hole. I'm not suggesting he's in a hole - I'm suggesting we are. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Very minor update with V19 here, to reflect Alvaro's comments. The tricky part now reads like this: High rate limit schedule lag values, that is lag values that are large compared to the actual transaction latency, indicate that something is amiss in the throttling process. High schedule lag can highlight a subtle problem there even if the target rate limit is met in the end. I have committed this along with a slight modification: I changed --rate rate to --rate=rate to follow the option style of pgbench. Also, I have removed a space in --progress= sec in the doc, which was probably mistakenly added by a previous commit. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
Josh Berkus j...@agliodbs.com writes: Christophe just discovered something with include files which is going to cause issues with ALTER SYSTEM SET. So, take as a hypothetical that you use the default postgresql.conf file, which sets shared_buffers = 32MB. Instead of editing this file, you do ALTER SYSTEM SET shared_buffers = '1GB', which updates config/postgresql.auto.conf. Then you restart PostgreSQL. Everything is hunky-dory, until a later occasion where you *reload* postgresql. Everything was already *not* hunky-dory as soon as you did that, since a SIGHUP would have had the same problem. I'd be inclined to think that ALTER SYSTEM SET should not be allowed to modify any PGC_POSTMASTER parameters. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
On 2013-07-23 00:01:50 +0200, Andres Freund wrote: On 2013-07-17 10:11:28 -0400, Tom Lane wrote: Kevin Grittner kgri...@postgresql.org writes: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch is broken. Jagarundi still tells that story. At least something like the following patch seems to be required. Hm. From a quick look, there seem to be more things that need improvement. First, I have my doubts about the general modus operandi of CONCURRENTLY here. I think in many, many cases it would be faster to simply swap the newly built heap + indexes in concurrently. I think I have seen that mentioned in the review while mostly skipping the discussion, but still. That would be easy enough as of Robert's mvcc patch.
* Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance be done a good bit earlier? We're executing queries before that.
* The loop over indexes in refresh_by_match_merge should index_open(ExclusiveLock) the indexes initially instead of searching the syscache manually. They are opened inside the loop in many cases anyway. There probably aren't any hazards currently, but I am not even sure about that. The index_open() in the loop at the very least processes the invalidation messages other backends send... I'd even suggest using BuildIndexInfo() or such on the indexes; then you could use ->ii_Expressions et al instead of peeking into indkeys by hand.
* All not explicitly looked up operators should be qualified using OPERATOR(). It's easy to define your own = operator for tid and set the search_path to public,pg_catalog. Now, the whole thing is restricted to table owners afaics, making this decidedly less dicey, but still. If anybody actually uses a search_path like the above you can easily hurt them.
* The same seems to hold true for the y = x and y.* IS DISTINCT FROM x.* operations.
* (y.*) = (x.*) also needs to do proper operator lookups.
* OpenMatViewIncrementalMaintenance() et al seem to be underdocumented.
* Why is it even allowed that matview_maintenance_depth is > 1? Unless I miss something, there doesn't seem to be support for a recursive refresh, whatever that would mean?
* INSERT and DELETE should also alias the table names, to make sure there cannot be issues with the nested queries.
* I'd strongly suggest more descriptive table aliases than x, y, d. Those make the statements rather hard to parse.
* SPI_exec() is deprecated in favor of SPI_execute()
* ISTM deferred uniqueness checks and similar things should be enforced to run in refresh_by_match_merge()
I think this patch/feature needs a good bit of work... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
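To spell out the OPERATOR() point with a self-contained example (a generic illustration, not the actual matview code):

-- A user-defined "=" for tid that can capture an unqualified lookup:
CREATE FUNCTION public.always_true(tid, tid) RETURNS bool
    LANGUAGE sql AS 'SELECT true';
CREATE OPERATOR public.= (
    leftarg = tid, rightarg = tid, procedure = public.always_true);

SET search_path = public, pg_catalog;

-- Unqualified: resolves to public.= because it comes first in search_path
SELECT '(0,1)'::tid = '(0,2)'::tid;                       -- true (!)
-- Qualified: always the built-in operator, immune to search_path games
SELECT '(0,1)'::tid OPERATOR(pg_catalog.=) '(0,2)'::tid;  -- false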
Re: [HACKERS] [9.4 CF 1] The Commitfest Slacker List
On 7/3/13 7:25 PM, Bruce Momjian wrote: The extrapolation of Josh's approach is that committers have to do work that the community wants to maintain their commit rights, but their commit rights are helping the community, so why would people care if you take them away --- you only hurt the community further by doing so. The main problem with having inactive committers (which I don't intend to include the important subject matter committers, who I'll get into at the end here) is that they skew the public information about who commits in a counterproductive way. People visit https://wiki.postgresql.org/wiki/Committers, see that list of names, and then make conclusions based on its content. And some of those conclusions are wrong because the data is inconsistent. The standards listed are applied when new committers are promoted, but they are not applied symmetrically to remove ones who no longer qualify. The #1 obstacle to my getting more time to work on core PostgreSQL is that companies presume regular submitters who are also non-committers don't do a very good job. If you are competent and have a consistent track record of contributions to an open source project, the project would make you a committer, right? Conversely, if you've been contributing for a while but aren't a committer, the most likely explanation is that your work quality is poor. That is a completely reasonable viewpoint based on how most open source projects work. The really terrible part is that it means the longer you've been submitting things, the *less* competent you're assumed to be. When I tell people I've been submitting things since 2007 but am not a committer, the only logical explanation is that my submissions must suck very hard, right? From that perspective, people who are listed as committers but don't actively do work for the project are causing me a serious problem. When someone who rarely commits can obviously qualify, that *proves* to casual observers that the bar for PostgreSQL committers is actually very low. That's the message the project is inadvertently sending by leaving committers on there if they stop working actively. The main thing I'd like to see is having the committer list, and its associated guidelines, updated to reflect that there are subject matter experts committing too. That would pull them out of any "what have you done for me lately?" computations, and possibly open up a way to get more of them usefully. Here are the first two obvious labels like that: Michael Meskes (meskes) - embedded SQL Teodor Sigaev (teodor) - inverted indexes When even Josh Berkus doesn't know all of this information, it's clearly way too obscure to expect the rest of the world to figure it out. It also boggles my mind that there isn't already an entry like this on there too: Thom Brown - documentation Each time Thom passes through yet another correction patch that is committed with no change, I find it downright bizarre that a community with such limited committer resources wastes their time with that gatekeeping. The standards for nominating committers seem based on whether they can commit just about anything. I think it's more important to consider whether people are trusted to keep commits within their known area(s) of expertise. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX checking of index constraints
On Sun, Jul 21, 2013 at 01:47:00PM -0700, Josh Berkus wrote: On 07/21/2013 11:30 AM, Josh Berkus wrote: Attached patch just restores the old behavior. Would it be worth preserving the ability to fix an index consistency problem with a REINDEX independent from related heap consistency problems such as duplicate keys? I would love to have two versions of REINDEX, one which validated and one which didn't. Maybe a ( validate off ) type check? Cancel this. I just did some tests, and the amount of time required for the validation (at least, in a simple two-column table test) is 10% of the time required to reindex in general. At that difference, we don't need two options. Unless you're asking if we want a command to check the index validity without rebuilding it? That might be more valuable ... I meant to ask whether, instead of reverting the accidental behavior change, we should do something like leave the behavior and change the documentation instead. I personally vote no, but that alternative seemed credible enough to justify mentioning it. Something more radical, like a new UI, would be a separate patch. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
Ok, since Atri posted our work-so-far and there's not been much comment, I'll outline here my proposed plan of attack. Rather than, as in the WIP patch, using the agg finalfn to validate the split between normal args and ORDER BY args, I propose this: Firstly, as in the WIP patch, func(a) within group (order by b) is looked up as though it were func(a,b). The result must be marked as an ordered set function. A new pg_aggregate integer column, aggordnargs (?), must be equal to the number of normal args (i.e. (a) in this case). Note that this may be 0; one can see a legitimate use case for mode() within group (order by anyelement) for example. The finalfn must be defined to have the same signature, so its args are processed as if it were func_final(a,b) - but only a dummy arg is passed for b. (Similar to the case for window functions.) Resolution of polymorphic parameters and result types therefore works as normal. For hypothetical set functions we add a special case, aggordnargs=-1, for which both the aggregate and the finalfn must be defined as (variadic any) and parse analysis detects this case and unifies the types of the normal args with those of the ORDER BY args. I propose this new syntax:

create aggregate func(argtypes...) within group (argtypes...)
(
    [ STYPE = ... , ]
    [ SORTOP = ... , ]
    [ INITCOND = ... , ]
    FINALFUNC = func_final
);

Ordered set functions will typically not need STYPE etc., but hypothetical set functions will be declared as, e.g.:

create aggregate rank(variadic any) within group (variadic any)
(
    STYPE = boolean,
    INITCOND = 'f',
    SORTOP = ,
    FINALFUNC = rank_hypothetical_final
);

(I'm open to comment as to whether to simply overload the aggsortop column in pg_aggregate or add a new one. I'm inclined to do the latter.) The idea here is that a column of type STYPE will be appended to the list of columns to be sorted, using SORTOP as sort operator, and all input rows will have this column initialized to the INITCOND value. This is to make it easy to implement rank() and friends by simply inserting the hypothetical row into the sort, with a true value to flag it, and finding its position in the sort result. (Better that than comparing the hypothetical row against the whole group.) (Security caveat: it will be necessary for the finalfn in such cases to validate that the additional column exists with the right type. Producing the wrong result is acceptable if the values in it are unexpected; crashing is not.) Any comment before we get back to coding? -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
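For readers unfamiliar with the feature, a usage sketch of what this design is meant to enable (syntax per the SQL standard; which functions actually ship depends on the final patch):

-- Ordered-set aggregate: the most common value of x
SELECT mode() WITHIN GROUP (ORDER BY x) FROM t;

-- Hypothetical-set aggregate: where would 42 rank among the x values?
SELECT rank(42) WITHIN GROUP (ORDER BY x) FROM t;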
Re: [HACKERS] [PATCH] Revive line type
On 6/26/13 9:34 PM, Peter Eisentraut wrote: Still wondering whether to use a A,B,C-based output format per Tom's comment. Wouldn't it also be helpful to remove "The points used in the output are not necessarily the points used on input" by making that not true? There are three obvious ways you might output a line:
-Math class expectations of slope-intercept form: y = mx + b. Intercept forms don't handle both horizontal and vertical lines though, so that's easily out.
-Pair of points. A casual observer might get lucky and observe putting a pair of points in and getting the same pair back again, then incorrectly assume that's normally the case. Seems better to never make that possible in the infinite line case. I'm reminded of how row output usually being in order gives a bad impression about ordering guarantees in SQL.
-Ax + By + C = 0.
Outputting that third one, when it's also the internal form, handles any type of line; will show that any assumptions about individual points being preserved are wrong; and avoids rounding errors too. The only downside seems to be that bounded lines are easier to show with a pair of points. I think that suggests an alternate, secondary output format would be useful though, rather than that a different default one is a good idea. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
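A sketch of what the third option could look like at the psql level (assuming the revived type settles on {A,B,C} output for Ax + By + C = 0; exact syntax depends on what finally gets committed):

-- Two-point input form for the infinite line through (0,0) and (1,1):
SELECT '[(0,0),(1,1)]'::line;
-- Coefficient input form; both statements would print the same
-- normalized coefficients, e.g. {1,-1,0} for the line y = x:
SELECT '{1,-1,0}'::line;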
Re: [HACKERS] Preventing tuple-table leakage in plpgsql
On Sun, Jul 21, 2013 at 12:40:38PM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: Reasonable enough. Code that does use subtransactions will need to be more careful than before to manually free tuple tables in the non-error case. Failure to do so has been creating a leak that lasts until SPI_finish(), but it will now be able to cause a transaction-lifespan leak. Hmm ... good point. The other plan I'd been considering was to add explicit tracking inside spi.c of all tuple tables created within the current procedure, and then have AtEOSubXact_SPI flush any that were created inside a failed subxact. The tables would still be children of the procCxt and thus could not be leaked past SPI_finish. When you suggested attaching to subtransaction contexts I thought that would let us get away without this additional bookkeeping logic, but maybe we should bite the bullet and add the extra logic. Is there reason to believe we wouldn't eventually find a half dozen other allocations calling for similar bespoke treatment? Does something make tuple tables special among memory allocations, or are they just the garden-variety allocation that happens to bother the test case at hand? A change that's meant to remove leak risks really shouldn't be introducing other, new leak risks. Yes; that gives one pause. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Preventing tuple-table leakage in plpgsql
Noah Misch n...@leadboat.com writes: On Sun, Jul 21, 2013 at 12:40:38PM -0400, Tom Lane wrote: Hmm ... good point. The other plan I'd been considering was to add explicit tracking inside spi.c of all tuple tables created within the current procedure, and then have AtEOSubXact_SPI flush any that were created inside a failed subxact. Is there reason to believe we wouldn't eventually find a half dozen other allocations calling for similar bespoke treatment? Does something make tuple tables special among memory allocations, or are they just the garden-variety allocation that happens to bother the test case at hand? It's hard to speculate about the memory management habits of third-party SPI-using code. But in plpgsql, the convention is that random bits of memory should be allocated in a short-term context separate from the SPI procCxt --- typically, the estate->eval_econtext expression context, which exec_stmt_block already takes care to clean up when catching an exception. So the problem is that that doesn't work for tuple tables, which have procCxt lifespan. The fact that they tend to be big (at least 8K apiece) compounds the issue. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] And then there were 5
After pushes from a few people, the remaining submissions are now waiting for commit. I updated each of those to have the latest info in the CF app, and tried to identify which committers have already looked at them.
Access to calls stack from GET DIAGNOSTICS statement: ?
Add more regression tests mostly for ALTER OPERATOR FAMILY .. ADD/DROP: ? Robert Haas gave a quick review last week that's generated an update since. But that looks like it was just a quick scan for issues.
Remove unused targets from plan: Alvaro? (He reviewed it already)
SQL Command to edit postgresql.conf (ALTER SYSTEM patch): Alvaro volunteered to look at this if no one else gets to it first.
UNNEST() (and other functions) WITH ORDINALITY: Greg Stark
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RangeTblEntry.joinaliasvars representation change
I wrote: After some reflection I think that the best fix is to redefine AcquireRewriteLocks' processing of dropped columns so that it puts an actual null pointer, not a consed-up Const, into the joinaliasvars list item. Here's a complete patch along that line. Possibly worthy of note is that I chose to keep expandRTE's API the same as before, ie, it returns a NULL Const for a dropped column (when include_dropped is true). I had intended to change it to return a null pointer to match the change in the underlying data structure, but found that most of the callers passing include_dropped = true need the Consts, because they're going to construct a RowExpr that has to have null constants for the dropped columns. In HEAD, I'm a bit tempted to move strip_implicit_coercions into nodes/nodeFuncs.c, so that we don't have the ugliness of the parser calling an optimizer subroutine. But I propose applying this to back branches as-is. regards, tom lane

diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index 7eaf8d27bf08bf5dd1776d203876adb8396c73b3..5f736ad6c4060ff4d7dabb3844a04185e01fa3ef 100644
*** a/src/backend/optimizer/util/var.c
--- b/src/backend/optimizer/util/var.c
*************** flatten_join_alias_vars_mutator(Node *no
*** 654,660 ****
  			newvar = (Node *) lfirst(lv);
  			attnum++;
  			/* Ignore dropped columns */
! 			if (IsA(newvar, Const))
  				continue;
  			newvar = copyObject(newvar);
--- 654,660 ----
  			newvar = (Node *) lfirst(lv);
  			attnum++;
  			/* Ignore dropped columns */
! 			if (newvar == NULL)
  				continue;
  			newvar = copyObject(newvar);
*************** flatten_join_alias_vars_mutator(Node *no
*** 687,692 ****
--- 687,693 ----
  		/* Expand join alias reference */
  		Assert(var->varattno > 0);
  		newvar = (Node *) list_nth(rte->joinaliasvars, var->varattno - 1);
+ 		Assert(newvar != NULL);
  		newvar = copyObject(newvar);

  		/*
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index a9254c8c3a2e33b7c293ef51c53c78a797b1d4f1..42de89f510190877b1f6fa357efb08c81eb7acc9 100644
*** a/src/backend/parser/parse_relation.c
--- b/src/backend/parser/parse_relation.c
***************
*** 24,29 ****
--- 24,30 ----
  #include "funcapi.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
+ #include "optimizer/clauses.h"
  #include "parser/parsetree.h"
  #include "parser/parse_relation.h"
  #include "parser/parse_type.h"
*************** markRTEForSelectPriv(ParseState *pstate,
*** 749,762 ****
  			 * The aliasvar could be either a Var or a COALESCE expression,
  			 * but in the latter case we should already have marked the two
  			 * referent variables as being selected, due to their use in the
! 			 * JOIN clause.  So we need only be concerned with the simple Var
! 			 * case.
  			 */
  			Var		   *aliasvar;

  			Assert(col > 0 && col <= list_length(rte->joinaliasvars));
  			aliasvar = (Var *) list_nth(rte->joinaliasvars, col - 1);
! 			if (IsA(aliasvar, Var))
  				markVarForSelectPriv(pstate, aliasvar, NULL);
  		}
  	}
--- 750,764 ----
  			 * The aliasvar could be either a Var or a COALESCE expression,
  			 * but in the latter case we should already have marked the two
  			 * referent variables as being selected, due to their use in the
! 			 * JOIN clause.  So we need only be concerned with the Var case.
! 			 * But we do need to drill down through implicit coercions.
  			 */
  			Var		   *aliasvar;

  			Assert(col > 0 && col <= list_length(rte->joinaliasvars));
  			aliasvar = (Var *) list_nth(rte->joinaliasvars, col - 1);
! 			aliasvar = (Var *) strip_implicit_coercions((Node *) aliasvar);
! 			if (aliasvar && IsA(aliasvar, Var))
  				markVarForSelectPriv(pstate, aliasvar, NULL);
  		}
  	}
*************** expandRTE(RangeTblEntry *rte, int rtinde
*** 1841,1850 ****
  				 * deleted columns in the join; but we have to check since
  				 * this routine is also used by the rewriter, and joins
  				 * found in stored rules might have join columns for
! 				 * since-deleted columns.  This will be signaled by a NULL
! 				 * Const in the alias-vars list.
  				 */
! 				if (IsA(avar, Const))
  				{
  					if (include_dropped)
  					{
--- 1843,1852 ----
  				 * deleted columns in the join; but we have to check since
  				 * this routine is also used by the rewriter, and joins
  				 * found in stored rules might have join columns for
! 				 * since-deleted columns.  This will be signaled by a null
! 				 * pointer in the alias-vars list.
  				 */
! 				if (avar == NULL)
  				{
  					if (include_dropped)
  					{
*************** expandRTE(RangeTblEntry *rte, int rtinde
*** 1852,1859 ****
  						*colnames = lappend(*colnames,
  											makeString(pstrdup("")));
  					if (colvars)
  						*colvars = lappend(*colvars,
! 										   copyObject(avar));
  				}
  				continue;
  			}
--- 1854,1869 ----
  					*colnames =
Re: [HACKERS] [9.4 CF 1] And then there were 5
Greg Smith g...@2ndquadrant.com writes: Remove unused targets from plan: Alvaro? (He reviewed it already) Really I should do that one, but it seems like all my available cycles have been going into bug fixing lately :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses
On 7/22/13 4:52 AM, KONDO Mitsumasa wrote: The writeback source code I indicated is almost the same as the community kernel (2.6.32.61). I also read Linux kernel 3.9.7, but this part is almost the same. The main source code difference comes from going back to the RedHat 5 kernel, which means 2.6.18. For many of these versions, you are right that it is only the tuning parameters that were changed in newer versions. Optimizing performance for the old RHEL5 kernel isn't the most important thing, but it's helpful to know the things it does very badly. My fsync patch only sleeps when fsync returns success, and the maximum sleep time is set to 10 seconds. It does not make this problem worse. It's easy to have hundreds of relations that are getting fsync calls during a checkpoint. If you have 100 relations getting a 10 second sleep each, you could potentially delay checkpoints by 17 minutes this way. I regularly see systems where shared_buffers=8GB and there are 200 to 400 relation segments that need a sync during a checkpoint. This is the biggest problem with your submission. Once you give up following the checkpoint schedule carefully, it is very easy to end up with large checkpoint deadline misses on production servers. If someone thinks they are doing a checkpoint every 5 minutes, but your patch makes them take 20 minutes instead, that is bad. They will not expect that a crash might have to replay that much activity before the server is useful again. You also don't yet seem to appreciate how bad a thing exceeding the checkpoint timeout is. I think it is important to understand why this problem was caused. We should try to find which program has the bug or problem. The checkpointer process is the problem. There's no filesystem bug or complicated issues involved in many of the bad cases. Here is a simple example that shows how the toughest problem cases happen:
-64GB of RAM
-10% dirty_background_ratio = 6GB of dirty writes = 6144MB
-2MB/s random I/O when concurrent reads are heavy
-3072 seconds to clear the cache = 51 minutes
That's how you get to an example like the one in my slides: LOG: checkpoint complete: wrote 33282 buffers (3.2%); 0 transaction log file(s) added, 60 removed, 129 recycled; write=228.848 s, sync=4628.879 s, total=4858.859 s It's very hard to do better on these, and I don't expect any change to help this a lot. But I don't want to see a change committed that makes this sort of checkpoint 17 minutes longer if there's 100 relations involved either. My patch not only improvement of throughput but also realize stable response time at fsync phase in checkpoint. The main reason your patch improves latency and throughput is that it makes checkpoints farther apart. That's why I drew you a graph showing how the time between checkpoints lined up perfectly with TPS. If it was only a small problem it would be worth considering, but I think it's likely to end up with these 15+ minute delays I've outlined here instead. And I surveyed the ext3 file system. I wouldn't worry too much about the problems ext3 has. Like the old RHEL5 kernel I was commenting about above, there are a lot of ext3 systems out there. But we can't do a lot about getting good performance from them. It's only important to test that you're not making them a lot worse with a change. My system block size is 4096, but 8192 or more seems to be better. It will decrease the number of inodes and give larger sequential disk fields.
I normally increase read-ahead on Linux systems to get faster speed on sequential disk throughput. Changing the block size might work better in some cases, but not many people are willing to do that. Read-ahead is very easy to change at any time. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] And then there were 5
Tom Lane wrote: Greg Smith g...@2ndquadrant.com writes: Remove unused targets from plan: Alvaro? (He reviewed it already) Really I should do that one, but it seems like all my available cycles have been going into bug fixing lately :-( Yeah, I only did a first pass over that patch and was expecting you would pick it up. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto explain target tables
On 07/21/2013 10:42 PM, Миша Тюрин wrote: hi, list, again. the next proposal for auto explain. one would be happy if one could set a list of target tables and indexes. sometimes it is very hard to detect who is using your indexes. but turning total logging on under thousands of transactions per second does not seem like a nice idea because of the size of the resulting log files (cpu utilization might not be so critical) That sounds like a good idea - and since auto_explain is a contrib module, it could potentially be a way to get into PostgreSQL development. The code is in contrib/auto_explain/auto_explain.c . The challenge here is that planned queries don't just touch one table or index. You'd need to walk the query plan (a graph of Node structures) to determine which table(s) and index(es) are touched by the query; there's something in the sources for it already (I haven't checked). You'll also need a way to supply the list of tables/indexes you are interested in to the extension. The simplest way to start with is likely to be reading a separate file from the data dir that contains one relation name per line. Integrating it into the postgresql.conf GUC machinery is harder; you'll need a way to parse a list of tables from a GUC (maybe you can re-use the search_path code for this?) or some other way to handle your need for a multi-valued setting. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX checking of index constraints
Noah Misch wrote: I meant to ask whether, instead of reverting the accidental behavior change, we should do something like leave the behavior and change the documentation instead. I personally vote no, but that alternative seemed credible enough to justify mentioning it. Something more radical, like a new UI, would be a separate patch. separate patch++. I agree some use cases probably justify new UI. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto explain target tables
On 07/23/2013 11:24 AM, Craig Ringer wrote: On 07/21/2013 10:42 PM, Миша Тюрин wrote: hi, list, again. the next proposal into auto explain. one would be happy if could set list of target tables and indexes. sometimes it is very hard to detect who is using your indexes. but turn total logging on under thousands transactions per seconds is not seems like nice idea couse size of resulting log files (cpu utilization might not be so critical) That sounds like a good idea - and since auto_explain is a contrib module, it could potentially be a way to get into PostgreSQL development. The code is in contrib/auto_explain/auto_explain.c . The challenge here is that planned queries don't just touch one table or index. You'd need to walk the query plan (a graph of Node structures) to determine which table(s) and index(es) are touched by the query there's something in the sources for it already (I haven't checked). Showing that I'm still very much learning this area myself, a bit more looking around found: http://www.postgresql.org/docs/current/static/querytree.html which makes it clear that the range table for the query will contain what you want. I suspect you'll need to find CTEs and subqueries and extract the relations they touch; I haven't yet checked to see whether they're aggregated into the top level range table, but suspect not. You'll find the code in src/backend/nodes/print.c informative and useful, though not directly reusable. Reading src/include/nodes/primnodes.h would be a good idea; it shows how subqueries, etc, are represented. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Design proposal: fsync absorb linear slider
Recently I've been dismissing a lot of suggested changes to checkpoint fsync timing without suggesting an alternative. I have a simple one in mind that captures the biggest problem I see: that the number of backend and checkpoint writes to a file are not connected at all. We know that a 1GB relation segment can take a really long time to write out. That could include up to 128 changed 8K pages, and we allow all of them to get dirty before any are forced to disk with fsync. Rather than second-guess the I/O scheduling, I'd like to take this on directly by recognizing that the size of the problem is proportional to the number of writes to a segment. If you turned off fsync absorption altogether, you'd be at an extreme that allows only 1 write before fsync. That's low latency for each write, but terrible throughput. The maximum throughput case of 128 writes has the terrible latency we get reports about. But what if that trade-off was just a straight, linear slider going from 1 to 128? Just move it to the latency vs. throughput position you want, and see how that works out. The implementation I had in mind was this one:
-Add an absorption_count to the fsync queue.
-Add a new latency vs. throughput GUC, which I'll call max_segment_absorb here. Its default value is -1 (or 0), which corresponds to ignoring this new behavior.
-Whenever the background writer absorbs a fsync call for a relation that's already in the queue, increment the absorption count.
-When max_segment_absorb > 0, have the background writer scan for relations where absorption_count > max_segment_absorb. When it finds one, call fsync on that segment.
Note that it's possible for this simple scheme to be fooled when writes are actually touching a small number of pages. A process that constantly overwrites the same page is the worst case here. Overwrite it 128 times, and this method would assume you've dirtied every page, while only 1 will actually go to disk when you call fsync. It's possible to track this better. The count mechanism could be replaced with a bitmap of the 128 blocks, so that absorbs set a bit instead of incrementing a count. My gut feel is that this is more complexity than is really necessary here. If in fact the fsync is slimmer than expected, paying for it too much isn't the worst problem to have here. I'd like to build this myself, but if someone else wants to take a shot at it I won't mind. Just be aware the review is the big part here. I should be honest about one thing: I have zero incentive to actually work on this. The moderate amount of sponsorship money I've raised for 9.4 so far isn't getting anywhere near this work. The checkpoint patch review I have been doing recently is coming out of my weekend volunteer time. And I can't get too excited about making this my volunteer effort when I consider what the resulting credit will look like. Coding is by far the smallest part of work like this, first behind coming up with the design in the first place. And both of those are way, way behind how long review benchmarking takes on something like this. The way credit is distributed for this sort of feature puts coding first, design not credited at all, and maybe you'll see some small review credit for benchmarks. That's completely backwards from the actual work ratio. If all I'm getting out of something is credit, I'd at least like it to be an appropriate amount of it.
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
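A rough C sketch of the queue-side bookkeeping described above (the names are illustrative, not the actual md.c structures, and FlushOneSegment is a made-up helper):

#include "postgres.h"
#include "storage/relfilenode.h"

int max_segment_absorb = -1;    /* proposed GUC; <= 0 disables the slider */

typedef struct PendingFsyncEntry
{
    RelFileNode rnode;              /* which relation */
    int         segno;              /* which 1GB segment */
    int         absorption_count;   /* writes absorbed since last fsync */
} PendingFsyncEntry;

static void FlushOneSegment(PendingFsyncEntry *entry);  /* made-up helper */

/* Called when a backend fsync request is absorbed for a segment that is
 * already sitting in the queue. */
static void
absorb_fsync_request(PendingFsyncEntry *entry)
{
    entry->absorption_count++;

    /* Slider behavior: once enough writes pile up on one segment,
     * fsync it early instead of letting the whole segment go dirty. */
    if (max_segment_absorb > 0 &&
        entry->absorption_count > max_segment_absorb)
    {
        FlushOneSegment(entry);
        entry->absorption_count = 0;
    }
}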
Re: [HACKERS] Auto explain target tables
On 07/23/2013 11:41 AM, Craig Ringer wrote: On 07/23/2013 11:24 AM, Craig Ringer wrote: On 07/21/2013 10:42 PM, Миша Тюрин wrote: hi, list, again. the next proposal into auto explain. one would be happy if could set list of target tables and indexes. sometimes it is very hard to detect who is using your indexes. but turn total logging on under thousands transactions per seconds is not seems like nice idea couse size of resulting log files (cpu utilization might not be so critical) That sounds like a good idea - and since auto_explain is a contrib module, it could potentially be a way to get into PostgreSQL development. The code is in contrib/auto_explain/auto_explain.c . The challenge here is that planned queries don't just touch one table or index. You'd need to walk the query plan (a graph of Node structures) to determine which table(s) and index(es) are touched by the query there's something in the sources for it already (I haven't checked). Showing that I'm still very much learning this area myself, a bit more looking around found: http://www.postgresql.org/docs/current/static/querytree.html which makes it clear that the range table for the query will contain what you want. I suspect you'll need to find CTEs and subqueries and extract the relations they touch; I haven't yet checked to see whether they're aggregated into the top level range table, but suspect not. Of course, having said that I was then curious enough to go digging. It's trivial to dump the parse tree or query plan tree, as documented in the above linked documentation, and doing so clearly shows that subqueries and CTEs have their own range tables. So you'll have to walk the node tree looking for range tables and check to see whether any of the range table entries match one of the tables you're looking for. It's possible that this is a completely backwards approach; if so, hopefully one of the more experienced people here will correct me with the simple and obvious way to do it. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
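A rough C sketch of the executor-hook side of this, auto_explain style (target_relids and the helper name are invented; note also that by executor time the planner has, if memory serves, already flattened subquery range tables into PlannedStmt->rtable, which simplifies the walk):

#include "postgres.h"
#include "executor/executor.h"
#include "nodes/parsenodes.h"
#include "nodes/pg_list.h"
#include "nodes/plannodes.h"

/* Invented helper: does this query touch any relation on our list? */
static bool
query_touches_target(QueryDesc *queryDesc, List *target_relids)
{
    ListCell   *lc;

    foreach(lc, queryDesc->plannedstmt->rtable)
    {
        RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

        if (rte->rtekind == RTE_RELATION &&
            list_member_oid(target_relids, rte->relid))
            return true;
    }
    return false;
}

/* An ExecutorEnd hook could then log the plan only for matching queries. */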
Re: [HACKERS] proposal - psql - show longest tables
I agree with Robert. My tip is this: when you're in a hole, the first thing to do is to stop digging. I don't think that Pavel believes himself to be in a hole. After setting up my .psqlrc file as I normally do, I could do this: :rtsize limit 10; But it doesn't have the 'MB' feature, and if I want to help someone else I first have to explain to them how to set their .psqlrc file the same as mine, which is rather a bummer. Is looking for the biggest tables a common enough thing that it should be available to everyone, without needing custom customization? I think our psql interface is not complete without this feature in this area. But, sure, it is my opinion only. Everybody knows the \dt command, and then they would only need to learn one more char, s or r. At this time, we use (in GoodData) a patch that changes the order for \dt+ from alphabetical to size-ordered. But it is too limited. Regards Pavel Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] And then there were 5
On 7/22/13 10:48 PM, Tom Lane wrote: Greg Smith g...@2ndquadrant.com writes: Remove unused targets from plan: Alvaro? (He reviewed it already) Really I should do that one, but it seems like all my available cycles have been going into bug fixing lately :-( I just put you down as the committer on it, given Alvaro waved that way too. A further look at recent ALTER OPERATOR FAMILY changes shows that Robert has been working on those quite a bit. I'm putting him down as the committer on this last one. If the CF drags on a bit due to committer time being spent on bugs, given the date here I think that's OK. I would put next week as a loose target for finishing these off. I worry a lot less about committers keeping to their schedules than when there are 50 regular contributors still moving around. That leaves only 1 patch where I have no idea who will commit it: access to calls stack from GET DIAGNOSTICS statement -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers