Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Sat, Mar 23, 2013 at 10:20 PM, Andres Freund wrote:
> On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
> > Is someone planning to provide additional feedback about this patch
> > at some point?
>
> Yes, now that I have returned from my holidays - or well, am returning
> from them, I do plan to. But it should probably get some implementation
> level review from somebody but Fujii and me...

Yeah, it would be good to have an extra pair of fresh eyes looking at
those patches.

Thanks,
--
Michael
[HACKERS] Re: [GENERAL] timeofday() and clock_timestamp() produce different results when casting to timestamptz
On 03/23/2013 08:16 AM, Tom Lane wrote:
> Adrian Klaver writes:
> > Seems the time zone info still thinks Moscow time is being setting
> > forward an hour for DST when in fact the time remains constant
> > through the year.
>
> I think the discrepancy is between this in timezone/data/europe:
>
> Zone Europe/Moscow  2:30:20 -       LMT     1880
>                     2:30    -       MMT     1916 Jul  3 # Moscow Mean Time
>                     2:30:48 Russia  %s      1919 Jul  1  2:00
>                     3:00    Russia  MSK/MSD 1922 Oct
>                     2:00    -       EET     1930 Jun 21
>                     3:00    Russia  MSK/MSD 1991 Mar 31  2:00s
>                     2:00    Russia  EE%sT   1992 Jan 19  2:00s
>                     3:00    Russia  MSK/MSD 2011 Mar 27  2:00s
>                     4:00    -       MSK
>
> and this in timezone/tznames/Default:
>
> MSD     14400 D    # Moscow Daylight Time (Europe/Moscow)
> MSK     10800      # Moscow Time (Europe/Moscow)
>
> We really need to figure out a way to update the tznames data
> automatically, or at least notice when it's become inconsistent with
> the underlying Olson database.
>
> regards, tom lane

So a temporary fix would be to go into share/timezonesets/Default and change:

MSK     10800      # Moscow Time

to

MSK     14400      # Moscow Time

and then you get:

test=> set time zone 'Europe/Moscow';
SET
test=> select timeofday(), clock_timestamp();
              timeofday              |        clock_timestamp
-------------------------------------+-------------------------------
 Sun Mar 24 03:50:45.066537 2013 MSK | 2013-03-24 03:50:45.066582+04
(1 row)

test=> select timeofday()::timestamptz, clock_timestamp()::timestamptz;
           timeofday           |        clock_timestamp
-------------------------------+-------------------------------
 2013-03-24 03:50:52.485092+04 | 2013-03-24 03:50:52.485188+04
(1 row)

--
Adrian Klaver
adrian.kla...@gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] avoid buffer underflow in errfinish()
A side question: at src/backend/storage/lmgr/proc.c:1150, is there a null
pointer dereference for `autovac'? There is a null pointer check
`autovac != NULL', but the pointer is already dereferenced earlier when
initializing `autovac_pgxact'. Is this null pointer check redundant, or
should we move the dereference `autovac->pgprocno' after the check?

Thanks.

On Sat, Mar 23, 2013 at 6:38 PM, Xi Wang wrote:
> CHECK_STACK_DEPTH checks if errordata_stack_depth is negative.
> Move the dereference of &errordata[errordata_stack_depth] after
> the check to avoid out-of-bounds read.
> ---
>  src/backend/utils/error/elog.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
> index 3a211bf..47a0a8b 100644
> --- a/src/backend/utils/error/elog.c
> +++ b/src/backend/utils/error/elog.c
> @@ -393,13 +393,15 @@ void
>  errfinish(int dummy,...)
>  {
>      ErrorData  *edata = &errordata[errordata_stack_depth];
> -    int         elevel = edata->elevel;
> +    int         elevel;
>      MemoryContext oldcontext;
>      ErrorContextCallback *econtext;
>
>      recursion_depth++;
>      CHECK_STACK_DEPTH();
>
> +    elevel = edata->elevel;
> +
>      /*
>       * Do processing in ErrorContext, which we hope has enough reserved
>       * space to report an error.
> --
> 1.7.10.4
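The proc.c question above reduces to a check-after-use ordering problem. A
minimal standalone sketch of the safe shape (the struct, function name, and
the -1 "no process" return value are illustrative, not the actual
PostgreSQL code):

```c
#include <stddef.h>

/* Illustrative reduction of the pattern questioned above: if a pointer
 * may be NULL, the null check must come before any dereference,
 * otherwise the check is dead code (or the crash happens first). */
typedef struct
{
    int pgprocno;
} PGPROC;

int
autovac_pgprocno(PGPROC *autovac)
{
    if (autovac != NULL)            /* check first ... */
        return autovac->pgprocno;   /* ... then dereference */
    return -1;                      /* hypothetical "no autovacuum" value */
}
```

If the pointer genuinely cannot be NULL at that point, the right fix is
instead to drop the redundant check so the invariant is stated clearly.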
[HACKERS] [PATCH] avoid buffer underflow in errfinish()
CHECK_STACK_DEPTH checks if errordata_stack_depth is negative.
Move the dereference of &errordata[errordata_stack_depth] after
the check to avoid an out-of-bounds read.
---
 src/backend/utils/error/elog.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 3a211bf..47a0a8b 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -393,13 +393,15 @@ void
 errfinish(int dummy,...)
 {
     ErrorData  *edata = &errordata[errordata_stack_depth];
-    int         elevel = edata->elevel;
+    int         elevel;
     MemoryContext oldcontext;
     ErrorContextCallback *econtext;

     recursion_depth++;
     CHECK_STACK_DEPTH();

+    elevel = edata->elevel;
+
     /*
      * Do processing in ErrorContext, which we hope has enough reserved space
      * to report an error.
--
1.7.10.4
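The patch's point can be shown with a reduced standalone version of the
stack-depth pattern. Names mirror elog.c, but the array size and the
"nothing pending" return value are made up for the sketch:

```c
#include <stddef.h>

/* With an empty error stack, errordata_stack_depth is -1, so taking
 * errordata[errordata_stack_depth].elevel before validating the index
 * reads one element before the array. Safe shape: validate first,
 * dereference second. */
#define ERRORDATA_STACK_SIZE 5

typedef struct
{
    int elevel;
} ErrorData;

ErrorData errordata[ERRORDATA_STACK_SIZE];
int errordata_stack_depth = -1;     /* -1 means "no error pending" */

int
current_elevel(void)
{
    if (errordata_stack_depth < 0 ||
        errordata_stack_depth >= ERRORDATA_STACK_SIZE)
        return 0;                   /* hypothetical "nothing pending" value */
    return errordata[errordata_stack_depth].elevel;
}
```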
Re: [HACKERS] Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
Thanks - I've added it here:
https://commitfest.postgresql.org/action/patch_view?id=1096

I've also attached a revised version that makes IGNORE and RESPECT
UNRESERVED keywords (following the pattern of NULLS_FIRST and NULLS_LAST).

Nick

On 23 March 2013 14:34, Tom Lane wrote:
> Nicholas White writes:
> > The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for
> > lead, lag, [...]. This is not implemented in PostgreSQL
> > (http://www.postgresql.org/docs/devel/static/functions-window.html).
> > I've had a go at implementing this, and I've attached the resulting
> > patch. It's not finished yet, but I was hoping to find out if my
> > solution is along the right lines.
>
> Since we're trying to get 9.3 to closure, this patch probably isn't
> going to get much attention until the 9.4 development cycle starts
> (in a couple of months, likely). In the meantime, please add it to
> the next commitfest list so we remember to come back to it:
> https://commitfest.postgresql.org/action/commitfest_view?id=18
>
> One comment just from a quick eyeball look is that we really hate
> adding new keywords that aren't UNRESERVED, because that risks
> breaking existing applications. Please see if you can refactor the
> grammar to make those new entries unreserved.
>
> regards, tom lane

Attachment: lead-lag-ignore-nulls.patch
Re: [HACKERS] Page replacement algorithm in buffer cache
On Fri, Mar 22, 2013 at 4:06 AM, Atri Sharma wrote:
> Not yet, I figured this might be a problem and am designing test cases
> for the same. I would be glad for some help there please.

Perhaps this isn't the help you were looking for, but I spent a long time
looking into this a few years ago. Then I stopped and decided to work on
other things. I would recommend you do so too.

If I have to struggle to come up with an artificial test case that shows
that there is a problem, then why should I believe that there actually is
a problem? If you take a well-known problem (like, say, bad performance
at shared_buffers > 8GB, or even lower on Windows) and create an
artificial test case to exercise and investigate it, that is one thing.
But why invent pathological test cases with no known correspondence to
reality? There are plenty of real problems to work on, and some of them
are just as intellectually interesting as the artificial ones.

My conclusions were:

1) If everything fits in shared_buffers, then the replacement policy
doesn't matter.

2) If shared_buffers is much smaller than RAM (the most common case, I
believe), then what mostly matters is your OS's replacement policy, not
pgsql's. Not much a pgsql hacker can do about this, other than turn into
a kernel hacker.

3) If little of the highly-used data fits in RAM, then any non-absurd
replacement policy is about as good as any other non-absurd one.

4) If most, but not quite all, of the highly-used data fits in
shared_buffers, and shared_buffers takes most of RAM (or at least most of
the RAM not already needed for other things like work_mem and
executables), then the replacement policy matters a lot. But different
policies suit different workloads, and there is little reason to think we
can come up with a way to choose between them. (Also, under these
conditions, performance is very chaotic. You can run the same algorithm
for a long time, and it can suddenly switch from good to bad or the other
way around, and then stay in that new mode for a long time.) Also, even
if you come up with a good algorithm, if you make the data set 20%
smaller or 20% larger, it is no longer a good algorithm.

5) Having buffers enter with usage_count=0 rather than 1 would probably
be slightly better most of the time under the conditions described in 4,
but there is no way to get enough evidence of this over enough conditions
to justify making a change. And besides, how often do people run with
shared_buffers being most of RAM, and the hot data just barely not
fitting in it?

If you want some known problems that are in this general area, we have:

1) If all data fits in RAM but not shared_buffers, and you have a very
large number of CPUs and a read-only or read-mostly workload, then
BufFreelistLock can be a major bottleneck. (But on an Amazon high-CPU
instance, I did not see this very much. I suspect the degree of the
problem depends a lot on whether you have many sockets with a few CPUs
each, versus one chip with many CPUs.) This is very easy to come up with
model cases for; pgbench -S -c8 -j8, for example, can often show it.

2) A major reason that people run with shared_buffers much lower than RAM
is that performance seems to suffer with shared_buffers > 8GB under
write-heavy workloads, even with spread-out checkpoints. This is
frequently reported as a real-world problem, but as far as I know it has
never been reduced to a simple reproducible test case. (Although there
was a recent thread, maybe "High CPU usage / load average after upgrading
to Ubuntu 12.04", that I thought might be relevant to this. I haven't had
the time to seriously study the thread, or the hardware to investigate it
myself.)

Cheers,

Jeff
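For reference, the clocksweep mechanics debated in this thread boil down
to a loop like the following toy sketch (not PostgreSQL's freelist.c; the
buffer count is arbitrary, and pinning is ignored):

```c
#include <stddef.h>

/* Toy clock sweep: as the hand passes each buffer, evict it if its
 * usage_count is already zero, otherwise age it by one and move on.
 * Whether new buffers should enter with usage_count 0 or 1 - the point
 * under discussion - just changes how soon the hand can reach zero. */
#define NBUFFERS 8

typedef struct
{
    int usage_count;
} Buffer;

Buffer buffers[NBUFFERS];
int clock_hand = 0;

int
clocksweep_victim(void)
{
    for (;;)
    {
        Buffer *buf = &buffers[clock_hand];
        int     victim = clock_hand;

        clock_hand = (clock_hand + 1) % NBUFFERS;
        if (buf->usage_count == 0)
            return victim;          /* evict: count already zero */
        buf->usage_count--;         /* otherwise age it and continue */
    }
}
```

Note the loop never terminates if every buffer keeps a positive count;
the real code bounds the sweep and handles pinned buffers.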
Re: [HACKERS] Page replacement algorithm in buffer cache
Jeff Janes writes:
> I'm more convinced in the other direction, new pages should enter with 0
> rather than with 1. I think that the argument that a new buffer needs to
> be given more of an opportunity to get used again is mostly bogus.

IIRC, the argument for starting at 1 not 0 is that otherwise a new page
might have an infinitesimally small lifespan, if the clock sweep should
reach it just after it gets entered into the buffers. By starting at 1,
the uncertainty in a new page's lifespan runs from 1 to 2 sweep times,
not 0 to 1 sweep time.

I think though that this argument only holds water if the buffer didn't
get found via the clock sweep to start with --- otherwise, it ought to
have just about one clock sweep of time before the sweep comes back to
it. It does apply to buffers coming off the freelist, though. Thus, if
we were to get rid of the freelist then maybe we could change the
starting usage_count ... but whether that's a good idea in itself is
pretty uncertain.

regards, tom lane
Re: [HACKERS] Page replacement algorithm in buffer cache
On Fri, Mar 22, 2013 at 7:27 PM, Ants Aasma wrote:
> On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure wrote:
>> well if you do a non-locking test first you could at least avoid some
>> cases (and, if you get the answer wrong, so what?) by jumping to the
>> next buffer immediately. if the non locking test comes good, only
>> then do you do a hardware TAS.
>>
>> you could in fact go further and dispense with all locking in front of
>> usage_count, on the premise that it's only advisory and not a real
>> refcount. so you only then lock if/when it's time to select a
>> candidate buffer, and only then when you did a non locking test first.
>> this would of course require some amusing adjustments to various
>> logical checks (usage_count <= 0, heh).
>
> Moreover, if the buffer happens to miss a decrement due to a data
> race, there's a good chance that the buffer is heavily used and
> wouldn't need to be evicted soon anyway. (if you arrange it to be a
> read-test-inc/dec-store operation then you will never go out of
> bounds)

yeah. There's something to be said for having an upper bound on the
length of time to get a page out (except in the special case when most of
them are pinned). Right now, any contention on a buffer header for any
reason can shut down buffer allocation, and that's just not good. It's
obviously not very likely to happen, but I think it can and does happen.
The more I think about it, the more I think it's a bad idea to spin
during buffer allocation for any reason, ever.

> However, clocksweep and usage_count maintenance is not what is
> causing contention because that workload is distributed. The issue is
> pinning and unpinning. There we need an accurate count and there are
> some pages like index roots that get hit very heavily. Things to do
> there would be in my opinion convert to a futex based spinlock so when
> there is contention it doesn't completely kill performance and then
> try to get rid of the contention. Converting to lock-free pinning
> won't help much here as what is killing us here is the cacheline
> bouncing.

Yup -- futexes are another way to go. They are linux specific though.

> One way to get rid of contention is the buffer nailing idea that
> Robert came up with. If some buffer gets so hot that maintaining
> refcount on the buffer header leads to contention, promote that buffer
> to a nailed status, let everyone keep their pin counts locally and
> sometime later revisit the nailing decision and if necessary convert
> pins back to the buffer header.

Yeah, this is a more general (albeit more complicated) solution and would
likely be fantastic. Is it safe to assume that refcounting is the only
likely cause of contention?

> One other interesting idea I have seen is closeable scalable nonzero
> indication (C-SNZI) from scalable rw-locks [1]. The idea there is to
> use a tree structure to dynamically stripe access to the shared lock
> counter when contention is detected. Downside is that considerable
> amount of shared memory is needed so there needs to be some way to
> limit the resource usage. This is actually somewhat isomorphic to the
> nailing idea.
>
> The issue with the current buffer management algorithm is that it
> seems to scale badly with increasing shared_buffers. I think the
> improvements should concentrate on finding out what is the problem
> there and figuring out how to fix it. A simple idea to test would be
> to just partition shared buffers along with the whole clock sweep
> machinery into smaller ones, like the buffer mapping hash tables
> already are. This should at the very least reduce contention for the
> clock sweep even if it doesn't reduce work done per page miss.
>
> [1] http://people.csail.mit.edu/mareko/spaa09-scalablerwlocks.pdf

I'll have to take a look. Removing *all spinning* from page allocation
though feels like it might be worthwhile to test (got to give some bonus
points for it being a very local change and simple to implement). I
wonder if with more shared buffers you tend to sweep more buffers per
allocation. (IIRC Jeff J was skeptical of that.)

merlin
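Merlin's non-locking-test-first idea could be sketched with C11 atomics
roughly as follows. This is illustrative only: PostgreSQL's buffer
headers don't use stdatomic, and the struct and function names are
invented for the sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct
{
    atomic_int usage_count;     /* advisory, not a real refcount */
} BufferDesc;

/* Peek at usage_count with no lock; only attempt an atomic
 * read-modify-write when the cheap test says the buffer might matter.
 * A lost or spurious update is harmless because the count is advisory,
 * and a single CAS (rather than fetch_sub) keeps it in bounds. */
bool
maybe_evictable(BufferDesc *buf)
{
    int count = atomic_load_explicit(&buf->usage_count,
                                     memory_order_relaxed);

    if (count > 0)
    {
        /* try to age it once; if someone raced us, just move on */
        atomic_compare_exchange_strong_explicit(&buf->usage_count,
                                                &count, count - 1,
                                                memory_order_relaxed,
                                                memory_order_relaxed);
        return false;           /* skip to the next buffer */
    }
    return true;                /* candidate: now take the real lock */
}
```

The key property is that the hot path (heavily used buffer) never takes a
lock at all; only potential victims pay for synchronization.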
Re: [HACKERS] Page replacement algorithm in buffer cache
On Thu, Mar 21, 2013 at 9:51 PM, Atri Sharma wrote:
> Hello all,
>
> Sorry if this is a naive question.
>
> I was going through Greg Smith's slides on the buffer cache
> (http://www.westnet.com/~gsmith/content/postgresql/InsideBufferCache.pdf).
> When going through the page replacement algorithm that we use, i.e. the
> clocksweep algorithm, I felt a potential problem in our current system.
>
> Specifically, when a new entry is allocated in the buffer, its
> USAGE_COUNT is set to 1. On each sweep of the algorithm, the
> USAGE_COUNT is decremented and an entry whose USAGE_COUNT becomes
> zero is replaced.

It is replaced when the usage_count is already found to be zero, not when
it is made zero.

> I feel that this could lead to a bias towards replacement of
> relatively younger pages in the cache over older pages. An entry
> which has just entered the cache with USAGE_COUNT=1 could be replaced
> soon, but it may be needed frequently in the near future,

Well, it may be needed. But then again, it may not be needed. And that
old page also may be needed frequently in the future (which is far more
likely than for a new page--after all, an old page likely got old for a
reason). The best evidence that it will be needed again is that it
actually has been needed again.

I'm more convinced in the other direction: new pages should enter with 0
rather than with 1. I think that the argument that a new buffer needs to
be given more of an opportunity to get used again is mostly bogus. You
cannot bully shared_buffers into being larger than it is. If all the
incoming buffers get "more opportunity", that just means the buffer-clock
ticks twice as fast, and really none of them has more opportunity when
you measure that opportunity against an outside standard (wall time, or
workload accomplished). All of our children cannot be above average.

Cheers,

Jeff
Re: [HACKERS] Page replacement algorithm in buffer cache
> Partitioned clock sweep strikes me as a bad idea... you could certainly
> get unlucky and end up with a lot of hot stuff in one partition.
>
> Another idea that's been brought up in the past is to have something in
> the background keep a minimum number of buffers on the free list. That's
> how the OS VM systems I'm familiar with work, so there's precedent for
> it.
>
> I recall there were at least some theoretical concerns about this, but I
> don't remember if anyone actually tested the idea.

One way to handle this could be to have dynamic membership of pages in
the partitions. Based on activity for a page, it could be moved to
another partition. In this manner, we *could* distribute the hot and not
so hot buffer pages, and hence it could help.

Regards,

Atri
l'apprenant
Re: [HACKERS] Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
Nicholas White writes:
> The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for
> lead, lag, [...]. This is not implemented in PostgreSQL
> (http://www.postgresql.org/docs/devel/static/functions-window.html).
> I've had a go at implementing this, and I've attached the resulting
> patch. It's not finished yet, but I was hoping to find out if my
> solution is along the right lines.

Since we're trying to get 9.3 to closure, this patch probably isn't
going to get much attention until the 9.4 development cycle starts
(in a couple of months, likely). In the meantime, please add it to
the next commitfest list so we remember to come back to it:
https://commitfest.postgresql.org/action/commitfest_view?id=18

One comment just from a quick eyeball look is that we really hate
adding new keywords that aren't UNRESERVED, because that risks
breaking existing applications. Please see if you can refactor the
grammar to make those new entries unreserved.

regards, tom lane
Re: [HACKERS] SDP query optimizer
On 23-03-2013 10:15, Andres Freund wrote:
> I just want to mention that unless you skew the statistics for the
> individual tables from their empty/default state this mostly measures a
> pretty degenerate case where optima are very rare and not very
> differentiated. That's a useful thing to test, but not to have as the
> target to optimize for.
>
> So it might be interesting to run that thing with some table
> stats/contents stats set up.

Yes, the search space obtained from this experiment may be much simpler
than a real case. Beyond this experiment, I can construct the classical
queries used to evaluate this kind of algorithm, such as stars, cliques,
chains and cycles. Beyond these queries I have no idea how to test it
further.

Regards,
Adriano Lange
[HACKERS] Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
> The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for
> lead, lag, [...]. This is not implemented in PostgreSQL
> (http://www.postgresql.org/docs/devel/static/functions-window.html)

I've had a go at implementing this, and I've attached the resulting
patch. It's not finished yet, but I was hoping to find out if my solution
is along the right lines. In particular, I'm storing the ignore-nulls
flag in the frameOptions of a window function definition, and am adding a
function to windowapi.h to get at these options. I'm keeping the last
non-null value in WinGetPartitionLocalMemory (which I hope is the right
place), but I'm not using any of the *GetDatum macros to access it.

An example of my change's behaviour:

nwhite=# select *, lag(num,0) ignore nulls over (order by generate_series) from
nwhite-# (select generate_series from generate_series(0,10)) s
nwhite-# left outer join
nwhite-# numbers n
nwhite-# on (s.generate_series = n.num);
 generate_series | num | lag
-----------------+-----+-----
               0 |     |
               1 |   1 |   1
               2 |     |   1
               3 |     |   1
               4 |   4 |   4
               5 |   5 |   5
               6 |     |   5
               7 |     |   5
               8 |     |   5
               9 |   9 |   9
              10 |     |   9
(11 rows)

I'd find this feature really useful, so I hope you can help me get my
patch to a contributable state. Thanks -

Nick

Attachment: lead-lag-ignore-nulls.patch
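For the lag(num, 0) IGNORE NULLS case shown above, the semantics amount
to carrying the most recent non-null value forward through the partition.
A standalone C sketch of that behaviour (illustrative only -- not the
patch's code; nulls are modelled with a separate flag array):

```c
#include <stddef.h>

/* Each output row gets the most recent non-null input value at or
 * before it; out_null marks rows before the first non-null value.
 * This mirrors the "keep the last non-null value in partition-local
 * memory" approach described in the mail above. */
void
lag0_ignore_nulls(const int *vals, const int *is_null,
                  int *out, int *out_null, size_t n)
{
    int have_last = 0;
    int last = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (!is_null[i])
        {
            last = vals[i];
            have_last = 1;
        }
        out[i] = last;
        out_null[i] = !have_last;   /* null until first non-null seen */
    }
}
```

Running it over the num column from the example reproduces the lag column
of the psql output: null, 1, 1, 1, 4, 5, 5, 5, 5, 9, 9.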
Re: [HACKERS] [GENERAL] timeofday() and clock_timestamp() produce different results when casting to timestamptz
Adrian Klaver writes:
> Seems the time zone info still thinks Moscow time is being setting
> forward an hour for DST when in fact the time remains constant through
> the year.

I think the discrepancy is between this in timezone/data/europe:

Zone Europe/Moscow  2:30:20 -       LMT     1880
                    2:30    -       MMT     1916 Jul  3 # Moscow Mean Time
                    2:30:48 Russia  %s      1919 Jul  1  2:00
                    3:00    Russia  MSK/MSD 1922 Oct
                    2:00    -       EET     1930 Jun 21
                    3:00    Russia  MSK/MSD 1991 Mar 31  2:00s
                    2:00    Russia  EE%sT   1992 Jan 19  2:00s
                    3:00    Russia  MSK/MSD 2011 Mar 27  2:00s
                    4:00    -       MSK

and this in timezone/tznames/Default:

MSD     14400 D    # Moscow Daylight Time (Europe/Moscow)
MSK     10800      # Moscow Time (Europe/Moscow)

We really need to figure out a way to update the tznames data
automatically, or at least notice when it's become inconsistent with the
underlying Olson database.

regards, tom lane
Re: [HACKERS] Enabling Checksums
On 2013-03-23 15:36:03 +0200, Ants Aasma wrote:
> On Sat, Mar 23, 2013 at 3:10 PM, Andres Freund wrote:
> >> Andres showed that switching out the existing CRC for zlib's would
> >> result in 8-30% increase in INSERT-SELECT speed
> >> (http://www.postgresql.org/message-id/201005202227.49990.and...@anarazel.de)
> >> with the speeded up CRC still showing up as 10% of the profile. So I
> >> guess another 5% speedup by doing the CRC 8 bytes at a time instead of
> >> the used 4. And another couple % by using Fletcher or SIMD.
> >
> > I am not sure the considerations for WAL are the same as for page
> > checksums - the current WAL code only computes the CRCs in rather small
> > chunks, so very pipelineable algorithms/implementations don't
> > necessarily show the same benefit for WAL as they do for page
> > checksums...
>
> Sure, but I think that WAL checksums are not a big overhead in that case
> anyway.

I have seen profiles that indicate rather the contrary... Even in the
optimal case of no FPWs, a single heap_insert() results in the CRC being
computed in 5 steps or so, 4 of them over potentially noncontiguous
pointer-addressed memory. If you add an index or two, where the situation
is the same, the slowdown is not all that surprising.

Greetings,

Andres Freund
Re: [HACKERS] pg_dump/restore syntax checking bug?
On 03/22/2013 10:13 PM, Josh Kupershmidt wrote:
> On Fri, Mar 22, 2013 at 9:35 PM, Joshua D. Drake wrote:
>> postgres@jd-laptop:~$ pg_restore -d test -P 'by(),hello()' foo.sqlc
>
> Note, the pg_restore doc makes no mention of trying to squeeze multiple
> function prototypes into a single argument as you've done here, or of
> using multiple -P flags.

It appears we need better syntax checking.

> Can't really argue with this. But if you think these pg_restore examples
> are bad, try this gem:
>
> reindexdb --table='foo; ALTER ROLE limited WITH superuser'

That is HORRIBLE! Looks like our base utilities need some attention.

jD
Re: [HACKERS] Enabling Checksums
On Sat, Mar 23, 2013 at 3:10 PM, Andres Freund wrote:
>> Andres showed that switching out the existing CRC for zlib's would
>> result in 8-30% increase in INSERT-SELECT speed
>> (http://www.postgresql.org/message-id/201005202227.49990.and...@anarazel.de)
>> with the speeded up CRC still showing up as 10% of the profile. So I
>> guess another 5% speedup by doing the CRC 8 bytes at a time instead of
>> the used 4. And another couple % by using Fletcher or SIMD.
>
> I am not sure the considerations for WAL are the same as for page
> checksums - the current WAL code only computes the CRCs in rather small
> chunks, so very pipelineable algorithms/implementations don't
> necessarily show the same benefit for WAL as they do for page
> checksums...

Sure, but I think that WAL checksums are not a big overhead in that case
anyway. I should point out that getting the SIMD algorithm to not be a
loss for small variable-sized workloads will take a considerable amount
of effort and code, whereas it's quite easy for pipelined CRC32 and
Fletcher (or should I say Adler, as we want to use the mod 65521
variant).

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
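For reference, the "Fletcher with mod 65521" variant Ants mentions is
essentially Adler-32 (65521 being the largest prime below 2^16). A plain
byte-at-a-time sketch -- a real implementation would defer the expensive
modulo across many bytes and vectorize, and this is not the algorithm
PostgreSQL ultimately adopted:

```c
#include <stdint.h>
#include <stddef.h>

/* Adler-style Fletcher checksum: two running sums reduced mod 65521,
 * folded into one 32-bit value (s2 in the high half, s1 in the low). */
uint32_t
adler_fletcher(const uint8_t *data, size_t len)
{
    uint32_t s1 = 1;
    uint32_t s2 = 0;

    for (size_t i = 0; i < len; i++)
    {
        s1 = (s1 + data[i]) % 65521;
        s2 = (s2 + s1) % 65521;
    }
    return (s2 << 16) | s1;
}
```

Because s2 accumulates s1 at every position, the checksum is sensitive to
byte order as well as byte values, unlike a simple sum.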
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
> Is someone planning to provide additional feedback about this patch at
> some point?

Yes, now that I have returned from my holidays - or well, am returning
from them, I do plan to. But it should probably get some implementation
level review from somebody but Fujii and me...

Greetings,

Andres Freund
Re: [HACKERS] SDP query optimizer
On 2013-03-22 20:35:43 -0300, Adriano Lange wrote:
> Hi all,
>
> I have developed a new query optimizer for PostgreSQL and I would like
> to share it with the community. The optimizer's name is Sampling and
> Dynamic Programming (SDP). I put it into a plugin developed some years
> ago, named LJQO:
>
> https://github.com/alange0001/ljqo.git
>
> This plugin was configured to compile only against PostgreSQL 9.2.
> However, I guess it may be easily adjusted for other versions of
> PostgreSQL.
>
> I would be glad for any feedback about SDP or even about LJQO.
>
> I have some numbers about SDP in comparison with GEQO. If interested,
> see a diff between the two ".out2" files attached. The schema and query
> are from a previous email posted by Andres Freund to this list.

I just want to mention that unless you skew the statistics for the
individual tables from their empty/default state this mostly measures a
pretty degenerate case where optima are very rare and not very
differentiated. That's a useful thing to test, but not to have as the
target to optimize for.

So it might be interesting to run that thing with some table
stats/contents stats set up.

Greetings,

Andres Freund
Re: [HACKERS] Enabling Checksums
>>> Results for pgbench scale 100:
>>> No checksums: tps = 56623.819783
>>> Fletcher checksums: tps = 55282.222687 (1.024x slowdown)
>>> CRC checksums: tps = 50571.324795 (1.120x slowdown)
>>> SIMD checksums: tps = 56608.888985 (1.000x slowdown)
>>>
>>> So to conclude, the 3 approaches:
>>
>> Great analysis. Still a tough choice. +1
>>
>> One thing that might be interesting is to look at doing SIMD for both
>> data and WAL. I wonder if that would be a noticeable speedup for WAL
>> full-page writes? That would give greater justification for the extra
>> work it will take (intrinsics/ASM), and it would be a nice win for
>> non-checksum users.
>
> Andres showed that switching out the existing CRC for zlib's would
> result in 8-30% increase in INSERT-SELECT speed
> (http://www.postgresql.org/message-id/201005202227.49990.and...@anarazel.de)
> with the speeded up CRC still showing up as 10% of the profile. So I
> guess another 5% speedup by doing the CRC 8 bytes at a time instead of
> the used 4. And another couple % by using Fletcher or SIMD.

I am not sure the considerations for WAL are the same as for page
checksums - the current WAL code only computes the CRCs in rather small
chunks, so very pipelineable algorithms/implementations don't necessarily
show the same benefit for WAL as they do for page checksums...

And even if the checksumming were changed to compute the CRC in larger
chunks - a very sensible thing imo - it would still be over relatively
small sizes in many workloads.

Greetings,

Andres Freund
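For context, computing the WAL CRC in chunks relies on the CRC being
incrementally updatable. A minimal bytewise CRC-32 (zlib polynomial,
reflected) showing that chunked and one-shot computation agree -- purely
illustrative, not PostgreSQL's CRC implementation, and far slower than a
table-driven or slice-by-8 version:

```c
#include <stdint.h>
#include <stddef.h>

/* Update a running CRC-32 with another chunk of bytes. Chaining works
 * because the pre/post inversion cancels between calls, exactly as in
 * zlib's crc32(): crc32_update(crc32_update(0, a), b) equals
 * crc32_update(0, a || b). */
uint32_t
crc32_update(uint32_t crc, const void *data, size_t len)
{
    const uint8_t *p = data;

    crc = ~crc;
    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1));
    }
    return ~crc;
}
```

The pipelining point upthread is that the inner loop here carries a tight
data dependency; slice-by-4/8 variants break that dependency over larger
inputs, which is why they help page checksums more than small WAL chunks.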
Re: [HACKERS] Enabling Checksums
On Sat, Mar 23, 2013 at 5:14 AM, Craig Ringer wrote:
> Making zero a "not checksummed" magic value would significantly detract
> from the utility of checksums IMO.

FWIW, using a 65521 modulus to compress larger checksums into 16 bits
will leave 14 non-zero values unused.

Regards,
Ants Aasma
Re: [HACKERS] Page replacement algorithm in buffer cache
On Sat, Mar 23, 2013 at 6:04 AM, Jim Nasby wrote:
> Partitioned clock sweep strikes me as a bad idea... you could certainly
> get unlucky and end up with a lot of hot stuff in one partition.

Surely that is not worse than having everything in a single partition.
Given a decent partitioning function, it's very unlikely that more than a
few of the hottest buffers would end up in a single partition.

> Another idea that's been brought up in the past is to have something in
> the background keep a minimum number of buffers on the free list. That's
> how the OS VM systems I'm familiar with work, so there's precedent for
> it.
>
> I recall there were at least some theoretical concerns about this, but I
> don't remember if anyone actually tested the idea.

Yes, having bgwriter do the actual cleaning up seems like a good idea.
The whole bgwriter infrastructure will need some serious tuning. There
are many things that could be shifted to the background if we knew it
could keep up, like hint bit setting on dirty buffers being flushed out.
But again, we have the issue of needing good tests to see where the
changes hurt.

Regards,
Ants Aasma
Re: [HACKERS] Page replacement algorithm in buffer cache
On Sat, Mar 23, 2013 at 6:29 AM, Atri Sharma wrote: > One way to distribute memory contention in case of spinlocks could be > to utilize the fundamentals of NUMA architecture. Specifically, we can > let the contending backends spin on local flags instead of on the buffer > header flags directly. As access to local cache lines is much cheaper > and faster than memory locations which are far away in NUMA, we could > potentially reduce the memory overhead for a specific line and reduce > the overall overheads as well. This is not specific to NUMA architectures (which by now means practically all multiprocessor machines); even multicore machines pay a cost for bouncing cache lines between cores. The locks don't even have to be local; it's good enough to raise the probability that each contending backend hits a different lock, provided we take care that the locks don't share cache lines. IMO that's the whole point of striping locks: the critical section is usually cheaper than the cost of getting the cache line into an exclusive state. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
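The layout point being made — that striped locks must not share cache lines — can be sketched like this. The struct, stripe count, and hash are illustrative assumptions, not anything from the PostgreSQL source: each stripe's lock is padded out to its own cache line, so backends spinning on different stripes never bounce the same line between cores.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64
#define N_STRIPES  16

/* Each lock occupies a full cache line; the spinlock itself is just an
 * int stand-in here — the point is the padding and the stripe selection. */
typedef struct {
    volatile int lock;
    char pad[CACHE_LINE - sizeof(int)];
} PaddedLock;

static PaddedLock stripes[N_STRIPES];

/* Map a buffer id to a stripe.  A cheap multiplicative hash spreads
 * consecutive ids across stripes better than a plain modulo would. */
static PaddedLock *stripe_for(uint32_t buf_id)
{
    return &stripes[(buf_id * 2654435761u >> 16) % N_STRIPES];
}
```

Note that padding alone only guarantees one lock per 64-byte slot; aligning the array base to a cache-line boundary (e.g. with C11 `_Alignas`) would complete the picture, which is omitted here for brevity.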
Re: [HACKERS] SDP query optimizer
Hi, On 22-03-2013 21:22, Josh Berkus wrote: Woah! Way cool. As a warning, we're in the closing throes of version 9.3 right now, so if your code/ideas don't get the attention they deserve, that's why. Ok. No problem. :-) There is an incomplete project from a few years back to make the non-exhaustive query planner pluggable so that we could use different algorithms. Unfortunately, it was never finished and merged with the core code. Your planner is yet another reason it would be great to complete this. Yes. I looked at the project by Julius and Tomas in pgFoundry [1] some years ago, but it was inactive. Therefore, I decided to start a new one. [1] - http://pgfoundry.org/projects/optimizer/ Anyway, good work for all of you. -- Adriano Lange
Re: [HACKERS] Single-argument variant for array_length and friends?
On 22 March 2013 09:12, Merlin Moncure wrote: > On Thu, Mar 21, 2013 at 2:00 AM, Pavel Stehule > wrote: >> lot of postgresql functions calculate with all items in array without >> respect to dimensions - like unnest. >> >> so concept "use outermost dim" is not in pg now, and should not be >> introduced if it is possible. More it goes against a verbosity concept >> introduced by ADA and reused in PL/SQL and PL/pgSQL. > > and pl/psm* Yeah, okay. That argument works for me. Let's go for option (a), only allow the user to omit the dimension argument if the array is 1-D. We still have the issue that Tom isn't convinced that the feature is worth pursuing -- Tom, would you please elaborate a little on what you dislike about it? I don't see much of a downside (just 3 extra pg_procs). Cheers, BJ
Re: [HACKERS] Page replacement algorithm in buffer cache
On Saturday, March 23, 2013 9:34 AM Jim Nasby wrote: > On 3/22/13 7:27 PM, Ants Aasma wrote: > > On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure > wrote: > > > One other interesting idea I have seen is closeable scalable nonzero > > indication (C-SNZI) from scalable rw-locks [1]. The idea there is to > > use a tree structure to dynamically stripe access to the shared lock > > counter when contention is detected. Downside is that a considerable > > amount of shared memory is needed so there needs to be some way to > > limit the resource usage. This is actually somewhat isomorphic to the > > nailing idea. > > > > The issue with the current buffer management algorithm is that it > > seems to scale badly with increasing shared_buffers. I think the > > improvements should concentrate on finding out what the problem is > > there and figuring out how to fix it. A simple idea to test would be > > to just partition shared buffers along with the whole clock sweep > > machinery into smaller ones, like the buffer mapping hash tables > > already are. This should at the very least reduce contention for the > > clock sweep even if it doesn't reduce work done per page miss. > > > > [1] http://people.csail.mit.edu/mareko/spaa09-scalablerwlocks.pdf > > Partitioned clock sweep strikes me as a bad idea... you could certainly > get unlucky and end up with a lot of hot stuff in one partition. > > Another idea that's been brought up in the past is to have something in > the background keep a minimum number of buffers on the free list. > That's how OS VM systems I'm familiar with work, so there's precedent > for it. > > I recall there were at least some theoretical concerns about this, but > I don't remember if anyone actually tested the idea. I have tried one of the ideas: adding the buffers the background writer finds reusable to the freelist. 
http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852FF97@szxeml509-mbs This can reduce the clock sweep work, as backends can find buffers on the freelist. It shows a performance improvement for read loads when the data fits in shared buffers, but when the data grows larger and I/O is involved, it shows some dip as well. With Regards, Amit Kapila.