Re: [PERFORM] GIST optimization to limit calls to operator on sub nodes

2014-07-01 Thread Pujol Mathieu


On 30/06/2014 16:04, Tom Lane wrote:

Pujol Mathieu mathieu.pu...@realfusio.com writes:

On 29/06/2014 22:30, Tom Lane wrote:

I don't actually understand what's being requested here that the
NotConsistent case doesn't already cover.

The NotConsistent case is correctly covered: the sub nodes are not
tested because I know that no child could pass the consistent test.
The MaybeConsistent case is also correctly covered: all sub nodes are
tested because I don't know which of them will pass the consistent test.
My problem is with FullyConsistent, because when I test a node I can
know that all its child nodes and leaves will also pass the test, so I
want to notify the GiST framework that it can skip the consistent test
on those nodes, just as we can notify it when testing a leaf that it can
skip rechecking the row. Maybe I am missing something in the API to do
that. In my tests, the recheck flag works only for leaves.

Hm ... that doesn't seem like a case that'd come up often enough to be
worth complicating the APIs for, unless maybe you are expecting a lot
of exact-duplicate index entries.  If you are, you might find that GIN
is a better fit for your problem than GIST --- it's designed to be
efficient for lots-of-duplicates.

Another view of this is that if you can make exact satisfaction checks
at upper-page entries, you're probably storing too much information in
the index entries (and thereby bloating the index).  The typical tradeoff
in GIST indexes is something like storing bounding boxes for geometric
objects --- which is necessarily lossy, but it results in small indexes
that are fast to search.  It's particularly important for upper-page
entries to be small, so that fanout is high and you have a better chance
of keeping all the upper pages in cache.

If you've got a compelling example where this actually makes sense,
I'd be curious to hear the details.

regards, tom lane



Hi,
I have a table containing several million rows, and each row
contains a unique integer as identifier. My goal is to select all rows
whose identifier is contained in at least one range from a list.

CREATE TABLE t (id int4 UNIQUE, ...)
SELECT * FROM t WHERE id @@ ARRAY[...]::range[]
I use a custom operator to check if an integer is contained in an array 
of ranges (a custom structure defined by my plugin).
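Just to illustrate the semantics of the operator (this is not my plugin's
implementation, only an equivalent formulation using the built-in int4range
type, with made-up range literals):

SELECT *
FROM   t
WHERE  EXISTS (SELECT 1
               FROM   unnest(ARRAY[int4range(10, 20),
                                   int4range(100, 200)]) AS r(rng)
               WHERE  r.rng @> t.id);  -- id falls inside at least one range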
I built my own GiST operator class to speed up those queries. Each node of
my GiST index is represented by a range (much like a BVH or octree of 3D
boxes). I have no duplicated keys in my index.
When I run my tests I am able to quickly discard entire portions of the
index, which leads to great performance improvements.
During GiST traversal, when I test the consistency of a node
(isRangeOverlap(range, range[])), the test returns Fully, Partially or No.
When the test returns Fully, I know for sure that all subnodes will also
return Fully.
In practice my operator is not free in execution time, so if I could
propagate that information to the subnodes it would save a lot of
computation time.

This optimization could also be beneficial for the cube extension.
I don't think it would complicate the API; we could reuse the existing
recheck flag. Today this value is used only for leaf nodes. Maybe it could
be propagated by the GiST API from a node to its subnodes: if a node sets
it to false, it would be false for its subnodes, and the client code could
use it or not. The API could remain the same, with no changes needed for
existing plugins, and only a small amount of memory would be needed to
propagate this boolean.
I already achieve great performance improvements with my GiST index; my
request is about optimizing the use cases that select many rows, to limit
the overhead of my consistent operator.


regards,
mathieu pujol




Re: [PERFORM] 60 core performance with 9.3

2014-07-01 Thread Mark Kirkwood

On 27/06/14 21:19, Andres Freund wrote:

On 2014-06-27 14:28:20 +1200, Mark Kirkwood wrote:

My feeling is spinlock or similar, 'perf top' shows

kernel find_busiest_group
kernel _raw_spin_lock

as the top time users.


Those don't tell that much by themselves, could you do a hierarchical
profile? I.e. perf record -ga? That'll at least give the callers for
kernel level stuff. For more information compile postgres with
-fno-omit-frame-pointer.



Unfortunately this did not help - the profile had lots of unknown symbols
from postgres - I'm guessing the Ubuntu postgresql-9.3 package either needs
the -dev package or needs to be rebuilt with the enable-profiling option
(debug and no-omit-frame-pointer seem to be there already).


However, further investigation did uncover *very* interesting things.
Firstly, I had previously said that read-only performance looked ok... this
was wrong; that impression was based purely on comparison with Robert's
blog post. Rebooting the 60-core box with only 32 cores enabled showed that
we got *better* scaling in the read-only case, which made it clear we were
hitting a serious regression with more cores. At this point some data is
needed:


Test: pgbench
Options: scale 500, read only
OS: Ubuntu 14.04
Pg: 9.3.4
Pg Options:
max_connections = 200
shared_buffers = 10GB
maintenance_work_mem = 1GB
effective_io_concurrency = 10
wal_buffers = 32MB
checkpoint_segments = 192
checkpoint_completion_target = 0.8


Results

Clients | 9.3 tps 32 cores | 9.3 tps 60 cores
--------+------------------+-----------------
6   |  70400   |  71028
12  |  98918   | 129140
24  | 230345   | 240631
48  | 324042   | 409510
96  | 346929   | 120464
192 | 312621   |  92663

So we have anti-scaling with 60 cores as we increase the client
connections. Ouch! A level of urgency led to trying out Andres's 'rwlock'
9.4 branch [1] - cherry-picking the last 5 commits into the 9.4 branch,
building a package from that, and retesting:


Clients | 9.4 tps 60 cores (rwlock)
--------+--------------------------
6   |  70189
12  | 128894
24  | 233542
48  | 422754
96  | 590796
192 | 630672

Wow - that is more like it! Andres, that is some nice work; we definitely
owe you some beers for that :-) I am aware that I need to retest with
unpatched 9.4 sources, as it is not clear from this data how much is due to
Andres's patches and how much to the steady stream of 9.4 development.
I'll post an update on that later, but figured this was interesting
enough to note for now.



Regards

Mark

[1] from git://git.postgresql.org/git/users/andresfreund/postgres.git, 
commits:

4b82477dcaf81ad7b0c102f4b66e479a5eb9504a
10d72b97f108b6002210ea97a414076a62302d4e
67ffebe50111743975d54782a3a94b15ac4e755f
fe686ed18fe132021ee5e557c67cc4d7c50a1ada
f2378dc2fa5b73c688f696704976980bab90c611





Re: [PERFORM] 60 core performance with 9.3

2014-07-01 Thread Mark Kirkwood

On 01/07/14 21:48, Mark Kirkwood wrote:


[1] from git://git.postgresql.org/git/users/andresfreund/postgres.git,
commits:
4b82477dcaf81ad7b0c102f4b66e479a5eb9504a
10d72b97f108b6002210ea97a414076a62302d4e
67ffebe50111743975d54782a3a94b15ac4e755f
fe686ed18fe132021ee5e557c67cc4d7c50a1ada
f2378dc2fa5b73c688f696704976980bab90c611




Hmmm, that should read the last 5 commits of the 'rwlock-contention'
branch, and I had pasted the commit numbers from my tree, not Andres's -
sorry. Here are the right ones:

472c87400377a7dc418d8b77e47ba08f5c89b1bb
e1e549a8e42b753cc7ac60e914a3939584cb1c56
65c2174469d2e0e7c2894202dc63b8fa6f8d2a7f
959aa6e0084d1264e5b228e5a055d66e5173db7d
a5c3ddaef0ee679cf5e8e10d59e0a1fe9f0f1893






Re: [PERFORM] 60 core performance with 9.3

2014-07-01 Thread Andres Freund
On 2014-07-01 21:48:35 +1200, Mark Kirkwood wrote:
 On 27/06/14 21:19, Andres Freund wrote:
 On 2014-06-27 14:28:20 +1200, Mark Kirkwood wrote:
 My feeling is spinlock or similar, 'perf top' shows
 
 kernel find_busiest_group
 kernel _raw_spin_lock
 
 as the top time users.
 
 Those don't tell that much by themselves, could you do a hierarchical
 profile? I.e. perf record -ga? That'll at least give the callers for
 kernel level stuff. For more information compile postgres with
 -fno-omit-frame-pointer.
 
 
 Unfortunately this did not help - had lots of unknown symbols from postgres
 in the profile - I'm guessing the Ubuntu postgresql-9.3 package needs either
 the -dev package or to be rebuilt with the enable profile option (debug and
 no-omit-frame-pointer seem to be there already).

You need to install the -dbg package. My bet is you'll see s_lock high
in the profile, called mainly from the procarray and buffer mapping
lwlocks.

 Test: pgbench
 Options: scale 500
  read only
 Os: Ubuntu 14.04
 Pg: 9.3.4
 Pg Options:
 max_connections = 200

Just as an experiment I'd suggest increasing max_connections by one and
two and quickly retesting - there are some cacheline alignment issues that
aren't fixed yet and that happen to vanish with some max_connections
settings.

 shared_buffers = 10GB
 maintenance_work_mem = 1GB
 effective_io_concurrency = 10
 wal_buffers = 32MB
 checkpoint_segments = 192
 checkpoint_completion_target = 0.8
 
 
 Results
 
 Clients | 9.3 tps 32 cores | 9.3 tps 60 cores
 +--+-
 6   |  70400   |  71028
 12  |  98918   | 129140
 24  | 230345   | 240631
 48  | 324042   | 409510
 96  | 346929   | 120464
 192 | 312621   |  92663
 
 So we have anti scaling with 60 cores as we increase the client connections.
 Ouch! A level of urgency led to trying out Andres's 'rwlock' 9.4 branch [1]
 - cherry picking the last 5 commits into 9.4 branch and building a package
 from that and retesting:
 
 Clients | 9.4 tps 60 cores (rwlock)
 +--
 6   |  70189
 12  | 128894
 24  | 233542
 48  | 422754
 96  | 590796
 192 | 630672
 
 Wow - that is more like it! Andres that is some nice work, we definitely owe
 you some beers for that :-) I am aware that I need to retest with an
 unpatched 9.4 src - as it is not clear from this data how much is due to
 Andres's patches and how much to the steady stream of 9.4 development. I'll
 post an update on that later, but figured this was interesting enough to
 note for now.

Cool. That's what I like (and expect) to see :). I don't think unpatched
9.4 will show significantly different results than 9.3, but it'd be good
to validate that. If you do so, could you post the results in the
-hackers thread I just CCed you on? That'll help the work to get into
9.5.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Volatility - docs vs behaviour?

2014-07-01 Thread Merlin Moncure
On Mon, Jun 30, 2014 at 9:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 I was unaware that the planner made any attempt to catch users' errors
 in marking the strictness of functions. I thought it pretty much trusted
 the user not to lie about the mutability of functions invoked
 indirectly. I'm not really sure where in the inlining code to look to
 figure that out.

 It's in optimizer/util/clauses.c:

 /*
  * Additional validity checks on the expression.  It mustn't return a set,
  * and it mustn't be more volatile than the surrounding function (this is
  * to avoid breaking hacks that involve pretending a function is immutable
  * when it really ain't).  If the surrounding function is declared strict,
  * then the expression must contain only strict constructs and must use
  * all of the function parameters (this is overkill, but an exact analysis
  * is hard).
  */
 if (expression_returns_set(newexpr))
 goto fail;

 if (funcform->provolatile == PROVOLATILE_IMMUTABLE &&
 contain_mutable_functions(newexpr))
 goto fail;
 else if (funcform->provolatile == PROVOLATILE_STABLE &&
  contain_volatile_functions(newexpr))
 goto fail;

 As the comment says, this wasn't really coded with an eye towards
 catching user error.  Rather, there are known use-cases where people
 intentionally use SQL wrapper functions to lie about the mutability
 of some underlying function; inlining would expose the truth of the
 matter and thus defeat such hacks.  Now I'd be the first to agree
 that this isn't a terribly high-performance way of doing that, but
 the point here was to not change the behavior that existed before
 SQL inlining did.

some points:
*) there are several cases that look superficially immutable to the
user but are really stable.  Mostly this comes up with date/time
functions because of the database dependency.  The issue I have with
the status quo is that the server punishes you by mis-decorating the
function (it gets treated as volatile).

*) some formulations of functions like to_char() are immutable
depending on arguments (julian day, for example).  So if the user wraps
this for purposes of indexing and then uses that same function for
querying, the operation is non-inlineable (see the sketch after this
list).

*) unless you really know your stuff, server-side inlining is completely
abstracted away from you, except in terms of performance
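
For the second point, a minimal sketch of that pattern (the table, column
and function names here are made up for illustration):

-- to_char(timestamp, text) is only STABLE, so wrap it in an SQL function
-- declared IMMUTABLE in order to use it in an index expression:
CREATE TABLE events (created_at timestamp);

CREATE FUNCTION to_char_julian(ts timestamp) RETURNS text AS
$$ SELECT to_char($1, 'J') $$ LANGUAGE sql IMMUTABLE;

CREATE INDEX events_julian_idx ON events (to_char_julian(created_at));

-- Because the wrapper claims to be more immutable than its body, the
-- checks quoted above prevent it from being inlined at query time:
SELECT * FROM events WHERE to_char_julian(created_at) = '2456840';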

Adding up the above, the way things work today is kind of a pain -- in
many ways I feel that if I mark a function IMMUTABLE, the server should
not overrule me.  If that argument doesn't hold water, then the server
should at least tell you when a function is inlineable -- perhaps via
\df+.

merlin




Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-01 Thread Jeff Frost
On Jun 30, 2014, at 4:57 PM, Jeff Frost j...@pgexperts.com wrote:

 
 On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
 Ah ... that's more like a number I can believe something would have
 trouble coping with.  Did you see a noticeable slowdown with this?
 Now that we've seen that number, of course it's possible there was an
 even higher peak occurring when you saw the trouble.
 
 Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
 it just takes awhile to handle that many locks.
 
 Did you check whether the locks were all on temp tables of the
 ON COMMIT DROP persuasion?
 
 
 Unfortunately not, because I went for a poor man's: SELECT count(*) FROM 
 pg_locks WHERE mode = 'AccessExclusiveLock' 
 run in cron every minute.
 
 That said, I'd bet it was mostly ON COMMIT DROP temp tables.
 
 The unfortunate thing is I wouldn't know how to correlate that spike with the 
 corresponding slowdown because the replica is about 5.5hrs lagged at the 
 moment.
 
 Hopefully it will get caught up tonight and we can see if there's a 
 correlation tomorrow.

And indeed it did catch up overnight, and the lag increased shortly after a
correlated spike in AccessExclusiveLocks that were generated by temp table
creation with ON COMMIT DROP.
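
For reference, the workload pattern behind those locks is essentially this
(a minimal sketch; the table name is made up):

BEGIN;
-- each temp table created this way takes an AccessExclusiveLock that is
-- currently WAL-logged so the standby can track it, then released at commit:
CREATE TEMP TABLE scratch (id int) ON COMMIT DROP;
INSERT INTO scratch VALUES (1);
COMMIT;

-- and the poor man's monitoring query mentioned above:
SELECT count(*) FROM pg_locks WHERE mode = 'AccessExclusiveLock';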




Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-01 Thread Tom Lane
Jeff Frost j...@pgexperts.com writes:
 On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Did you check whether the locks were all on temp tables of the
 ON COMMIT DROP persuasion?

 And indeed it did catch up overnight and the lag increased shortly after a 
 correlating spike in AccessExclusiveLocks that were generated by temp table 
 creation with on commit drop.

OK, so we have a pretty clear idea of where the problem is now.

It seems like there are three, not mutually exclusive, ways we might
address this:

1. Local revisions inside StandbyReleaseLocks to make it perform better in
the presence of many locks.  This would only be likely to improve matters
much if there's a fixable O(N^2) algorithmic issue; but there might well
be one.

2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
the grounds that no standby should be touching them.  I'm not entirely
sure that that argument is bulletproof though; in particular, even though
a standby couldn't access the table's data, it's possible that it would be
interested in seeing consistent catalog entries.

3. Avoid WAL-logging AccessExclusiveLocks associated with
new-in-transaction tables, temp or not, on the grounds that no standby
could even see such tables until they're committed.  We could go a bit
further and not take out any locks on a new-in-transaction table in the
first place, on the grounds that other transactions on the master can't
see 'em either.

It sounded like Andres had taken a preliminary look at #1 and found a
possible avenue for improvement, which I'd encourage him to pursue.

For both #2 and the conservative version of #3, the main implementation
problem would be whether the lock WAL-logging code has cheap access to
the necessary information.  I suspect it doesn't.

The radical version of #3 might be pretty easy to do, at least to the
extent of removing locks taken out during CREATE TABLE.  I suspect there
are some assertions or other consistency checks that would get unhappy if
we manipulate relations without locks, though, so those would have to be
taught about the exception.  Also, we sometimes forget new-in-transaction
status during relcache flush events; it's not clear if that would be a
problem for this.

I don't plan to work on this myself, but perhaps someone with more
motivation will want to run with these ideas.

regards, tom lane




Re: [PERFORM] Guidelines on best indexing strategy for varying searches on 20+ columns

2014-07-01 Thread Niels Kristian Schjødt
Thanks for the answers.

 Den 30/06/2014 kl. 20.04 skrev Jeff Janes jeff.ja...@gmail.com:
 
 On Wed, Jun 25, 2014 at 1:48 AM, Niels Kristian Schjødt
 nielskrist...@autouncle.com wrote:
 Hi,
 I’m running a search engine for cars. It’s backed by a postgresql 9.3 
 installation.
 
 Now I’m unsure about the best approach/strategy on doing index optimization 
 for the fronted search.
 
 The problem:
 
 The table containing the cars holds around 1.5 million rows. People who
 search for cars need different criteria to search by. Some search by
 brand/model, some by year, some by mileage, some by price and some by
 special equipment etc. etc. - and often they combine a whole bunch of
 criteria together. Of course some, like brand/model and price, are used more
 frequently than others. In total we offer: 9 category criteria like
 brand/model or body type, plus 5 numeric criteria like price or mileage,
 plus 12 boolean criteria like equipment. Lastly, people can order the results
 by different columns (year, price, mileage and a score we compute for the
 cars). By default we order by our own generated score.
 
 What I’ve done so far:
 
 I have analyzed the usage of the criteria “lightly”, and created a few
 indexes (10). Among those are, e.g., indexes on price and mileage and a
 combined index on brand/model. Since we are only interested in showing
 results for cars which are actually for sale, the indexes are made as
 partial indexes on a sales-state column.
 
 I'd probably partition the data on whether it is for sale, and then
 search only the for-sale partition.

Hmm okay, I already made all the indexes partial based on the for-sale
state. If the queries always filter on for-sale, and all the indexes are
partial on that condition, will it still help performance / make sense to
partition the table?
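
Just to make sure we are talking about the same thing, here is a simplified
sketch of the two approaches (column names invented; on 9.3 "partitioning"
means inheritance plus CHECK constraints):

CREATE TABLE cars (id serial, brand text, model text, year int,
                   price int, mileage int, score int, sales_state text);

-- (a) what I have today: partial indexes on the one big table
CREATE INDEX cars_price_for_sale_idx ON cars (price)
    WHERE sales_state = 'for_sale';

-- (b) the suggestion: split the table so queries only ever touch the much
--     smaller for-sale partition
CREATE TABLE cars_for_sale (CHECK (sales_state = 'for_sale')) INHERITS (cars);
CREATE TABLE cars_archived (CHECK (sales_state <> 'for_sale')) INHERITS (cars);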
 
 
 Questions:
 
 1. How would you go about analyzing and determining what columns should be 
 indexed, and how?
 
 I'd start out with intuition about which columns are likely to be used
 most often, and in a selective way.  And follow up by logging slow
 queries so they can be dissected at leisure.
 
 2. What is the best strategy when optimizing indexes for searches happening 
 on 20 + columns, where the use and the combinations varies a lot? (To just 
 index everything, to index some of the columns, to do combined indexes, to 
 only do single column indexes etc. etc.)
 
 There is no magic index.  Based on your description, you are going to
 be seq scanning your table a lot.  Focus on making it as small as
 possible: partition it horizontally so that the not-for-sale
 entries are hived off to a historical table, and vertically so that
 large columns rarely used in the WHERE clause are in a separate table.
 (Ideally you would tell postgresql to aggressively toast those columns,
 but there is no knob with which to do that.)
 
 
 3. I expect that it does not make sense to index all columns?
 
 You mean individually, or jointly?  Either way, probably not.
 
 4. I expect it does not make sense to index boolean columns?
 
 In some cases it can, for example if the data distribution is very
 lopsided and the value on the smaller side is frequently specified.
 
 5. Is it better to do a combined index on 5 frequently used columns rather 
 than having individual indexes on each of them?
 
 How often are the columns specified together?  If they are completely
 independent it probably makes little sense to index them together.
 
 6. Would it be a good idea to have all indexes sorted by my default sorting?
 
 You don't get to choose.  A btree index is sorted by the columns
 specified in the index, according to the operators specified (or
 defaulted).  Unless you mean that you want to add the default sort
 column as the lead column in each index - that actually might make
 sense.
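
If I understand that suggestion correctly, with the illustrative cars table
sketched earlier it would be something like:

CREATE INDEX cars_score_brand_model_idx
    ON cars (score DESC, brand, model)
    WHERE sales_state = 'for_sale';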
 
 7. Do you have any experience with other approaches that could greatly
 improve performance (e.g. forcing indexes to stay in memory etc.)?
 
 If your queries are as unstructured as you imply, I'd forget about
 indexes for the most part, as you will have a hard time finding ones
 that work.  Concentrate on making seq scans as fast as possible.  If
 most of your queries end in something like ORDER BY price LIMIT 10,
 then concentrate on index scans over price.  You will probably want to
 include heuristics in your UI such that if people configure queries to
 download half your database, you disallow that.  You will probably
 find that 90% of the workload comes from people who are just playing
 around with your website and don't actually intend to do business with
 you.
 
 Cheers,
 
 Jeff

