Re: [PERFORM] OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices

2017-11-03 Thread Laurenz Albe
Gunther wrote:
> > Bad choices are almost always caused by bad estimates.
> > Granted, there is no way that estimates can ever be perfect.
> > ...
> > Looking deeper, I would say that wrongly chosen nested loop joins
> > often come from an underestimate that is close to zero.
> > PostgreSQL already clamps row count estimates to 1, that is, it will
> > choose an estimate of 1 whenever it thinks fewer rows will be returned.
> > 
> > Perhaps using a higher clamp like 2 would get rid of many of your
> > problems, but it is a difficult gamble as it will also prevent some
> > nested loop joins that would have been the best solution.
> 
> Wow, that is very interesting! Are you saying that if PgSQL can't know 
> what the cardinality is, it assumes a default of 1? That would be a very 
> slanted guess. I would think a couple of hundred would be more 
> appropriate, or 10% of the average of the base tables for which it does 
> have statistics. I would wonder if changing 1 to 2 would make much 
> difference, as a Seq Scan over 1 to 10 tuples should generally be better 
> than any other approach, as long as the 1-10 tuples are already readily 
> available.

No, it is not like that.
When PostgreSQL cannot come up with a "real" estimate, it uses
default selectivity estimates.

See src/include/utils/selfuncs.h:

/*
 * Note: the default selectivity estimates are not chosen entirely at random.
 * We want them to be small enough to ensure that indexscans will be used if
 * available, for typical table densities of ~100 tuples/page.  Thus, for
 * example, 0.01 is not quite small enough, since that makes it appear that
 * nearly all pages will be hit anyway.  Also, since we sometimes estimate
 * eqsel as 1/num_distinct, we probably want DEFAULT_NUM_DISTINCT to equal
 * 1/DEFAULT_EQ_SEL.
 */

/* default selectivity estimate for equalities such as "A = b" */
#define DEFAULT_EQ_SEL  0.005

/* default selectivity estimate for inequalities such as "A < b" */
#define DEFAULT_INEQ_SEL  0.3333333333333333

/* default selectivity estimate for range inequalities "A > b AND A < c" */
#define DEFAULT_RANGE_INEQ_SEL  0.005

/* default selectivity estimate for pattern-match operators such as LIKE */
#define DEFAULT_MATCH_SEL   0.005

/* default number of distinct values in a table */
#define DEFAULT_NUM_DISTINCT  200

/* default selectivity estimate for boolean and null test nodes */
#define DEFAULT_UNK_SEL 0.005
#define DEFAULT_NOT_UNK_SEL (1.0 - DEFAULT_UNK_SEL)

Those selectivity estimates are factors, not absolute numbers.

The clamp to 1 happens when, after applying all selectivity factors, the
result is less than 1, precisely to keep the optimizer from choosing a plan
that would become very expensive if a branch is executed *at all*.
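
To illustrate the arithmetic (hypothetical table and predicates, and a
simplification of what the planner actually does):

EXPLAIN
SELECT *
FROM big_table            -- assume 1,000,000 rows
WHERE some_func(col1) = 'a'
  AND some_func(col2) = 'b'
  AND some_func(col3) = 'c';

With no usable statistics for the three expressions, each predicate gets
DEFAULT_EQ_SEL = 0.005, so the raw estimate is
1,000,000 * 0.005 * 0.005 * 0.005 = 0.125 rows, which is clamped up to 1,
and the plan shows "rows=1" for that scan.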

> > Finally, even though the official line of PostgreSQL is to *not* have
> > query hints, and for a number of good reasons, this is far from being
> > an unanimous decision.  The scales may tip at some point, though I
> > personally hope that this point is not too close.
> 
> I am glad to hear that hints are not completely ruled out by the 
> development team. Definitely Oracle hints are painful and should not be 
> replicated as is.  But maybe I can nudge your (and others') personal 
> tastes with the following.

Didn't work for me.
Your hints look just like what Oracle does.

There have been better proposals that aim at fixing the selectivity
estimates, e.g. "multiply your estimate for this join by three".

> In my Aqua 
> Data Studio, if I put the SET statement before the SELECT statement, the 
> combined statement doesn't return any results. Maybe I am doing 
> something wrong. If there is a way, then I would have what I need.

Check the SQL statements that are generated by your Aqua Data Studio!
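
If you have superuser access, one way to see exactly what the tool sends
to the server is to log all statements and watch the server log while you
run the query from Aqua Data Studio (a sketch; revert when done, since
this can produce a lot of log output):

ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();
-- run the SET and SELECT from Aqua Data Studio, inspect the log, then:
ALTER SYSTEM RESET log_statement;
SELECT pg_reload_conf();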

Yours,
Laurenz Albe




Re: [PERFORM] OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices

2017-11-02 Thread Laurenz Albe
Gunther wrote:
> But there 
> is one thing that keeps bothering me both with Oracle and PgSQL. And 
> that is the preference for Nested Loops.

[...]

> But the issue is bulk searches, reports, and any analytic query 
> scenarios. In those queries Nested Loops are almost always a bad choice, 
> even if there is an index. In over 20 years of working with RDBMSs this 
> has been my unfailing heuristic. A report runs slow? Look at the plan: is 
> there a Nested Loop? Yes? Squash it! And the report instantly runs 10x 
> faster.

[...]

> If you can set enable_nestloop off and the Hash Join is chosen and the 
> performance goes from 1 hour of 100% CPU to 10 seconds completion time, 
> then something is deadly wrong.

[...]

> The point is that Nested Loops should never be chosen except in index 
> lookup situations or maybe memory constraints.
> 
> How can I prevent it on a query by query scope? I cannot set 
> enable_nestloop = off because one query will be for a full report, while 
> another one might have indexed constraints running in the same session, 
> and I don't want to manage side effects and remember to set 
> the enable_nestloop parameter on and off.
> 
> There must be a way to tell the optimizer to penalize nested loops to 
> make them the last resort. In Oracle there are those infamous hints, but 
> they don't always work either (or it is easy to make mistakes that you 
> get no feedback about).
> 
> Is there any chance PgSQL can get something like a hint feature?

PostgreSQL doesn't have a way to tell if a query is an OLAP query
running against a star schema or a regular OLTP query; it treats
both in the same fashion.

I also have had to deal with wrongly chosen nested loop joins, and
testing a query with "enable_nestloop=off" is one of the first things
to try in my experience.

However, it is not true that PostgreSQL "prefers nested loops".
Sometimes a nested loop join is the only sane and efficient way to
process a query, and removing that capability would be just as
bad a disaster as you are experiencing with your OLAP queries.

Bad choices are almost always caused by bad estimates.
Granted, there is no way that estimates can ever be perfect.

So what could be done?

One pragmatic solution would be to wrap every query that you know
to be an OLAP query with

BEGIN;
SET LOCAL enable_nestloop=off;
SELECT ...
COMMIT;
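
SET LOCAL only lasts until the end of the transaction, so the setting
cannot leak into other queries run later in the same session.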

Looking deeper, I would say that wrongly chosen nested loop joins
often come from an underestimate that is close to zero.
PostgreSQL already clamps row count estimates to 1, that is, it will
choose an estimate of 1 whenever it thinks fewer rows will be returned.

Perhaps using a higher clamp like 2 would get rid of many of your
problems, but it is a difficult gamble as it will also prevent some
nested loop joins that would have been the best solution.

Finally, even though the official line of PostgreSQL is to *not* have
query hints, and for a number of good reasons, this is far from being
a unanimous decision.  The scales may tip at some point, though I
personally hope that this point is not too close.

Yours,
Laurenz Albe




Re: [PERFORM] memory allocation

2017-10-19 Thread Laurenz Albe
nijam J wrote:
> our server is getting too slow again and again

Use "vmstat 1" and "iostat -mNx 1" to see if you are
running out of memory, CPU capacity or I/O bandwith.

Figure out if the slowness is due to slow queries or
an overloaded system.
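
For example (exact columns vary with the procps and sysstat versions):

vmstat 1       # watch "si"/"so" for swapping, "us"/"sy"/"wa" for CPU vs. I/O wait
iostat -mNx 1  # high "%util" and "await" on a device point to an I/O bottleneck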

Yours,
Laurenz Albe




Re: [PERFORM] synchronization between PostgreSQL and Oracle

2017-10-12 Thread Laurenz Albe
ROS Didier wrote:
>    I would like your advice and recommendation about the 
> following infrastructure problem:
> What is the best way to optimize synchronization between a PostgreSQL 
> instance on a Windows 7 workstation and an Oracle 11gR2 database on Linux 
> RHEL?
> Here are more detailed explanations.
> In our company we have people who collect data in a PostgreSQL 9.6 instance 
> on their workstation that is disconnected from the internet.
> In the evening, they connect to the Internet and synchronize the collected 
> data to a remote Oracle 11gR2 database.
> What is the most performant way to do this (Oracle_FDW?, flat files?, …)

If the synchronization is triggered from the workstation with
PostgreSQL on it, you can either use oracle_fdw or pg_dump/SQL*Loader
to transfer the data.

Using oracle_fdw is probably simpler, but it is not very performant
for bulk update operations.

If performance is the main objective, use export/import.
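
A minimal oracle_fdw sketch (server name, connect string and columns are
made up; the foreign table has to match the Oracle table's definition):

CREATE EXTENSION oracle_fdw;

CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw
   OPTIONS (dbserver '//oracle-host:1521/ORCL');

CREATE USER MAPPING FOR CURRENT_USER SERVER oradb
   OPTIONS (user 'scott', password 'tiger');

CREATE FOREIGN TABLE ora_measurements (
   id          numeric,
   measured_at timestamp,
   value       numeric
) SERVER oradb OPTIONS (schema 'SCOTT', table 'MEASUREMENTS');

-- push the locally collected rows to Oracle; fine for moderate volumes,
-- but slower than a flat-file export/import for large batches:
INSERT INTO ora_measurements
SELECT id, measured_at, value FROM local_measurements;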

Yours,
Laurenz Albe





Re: [PERFORM] blocking index creation

2017-10-11 Thread Laurenz Albe
Neto pr wrote:
> When creating an index on a table of approximately 10GB of data, the DBMS 
> hangs (I think), because even after waiting 10 hours the command had not 
> returned.
> It happened when creating Hash indexes and B+tree indexes.
> However, for some columns it was successful (L_RETURNFLAG, L_PARTKEY).

> Does someone have a hint how to speed up index creation so that it completes 
> successfully?

Check whether CREATE INDEX is running or waiting for a lock (look at the
"pg_locks" view and see if the backend consumes CPU time).

Maybe there is a long-running transaction that blocks the SHARE lock
that CREATE INDEX requires.  It could also be a prepared
transaction.
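
A few queries that help with the diagnosis (standard catalog views):

-- locks the CREATE INDEX backend is still waiting for
SELECT pid, locktype, relation::regclass, mode, granted
FROM pg_locks
WHERE NOT granted;

-- long-running transactions that might hold conflicting locks
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

-- prepared transactions keep their locks until committed or rolled back
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;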

Yours,
Laurenz Albe




Re: [PERFORM] Stored Procedure Performance

2017-10-03 Thread Laurenz Albe
Purav Chovatia wrote:
> I come from the Oracle world and we are porting all our applications to 
> PostgreSQL.
> 
> The application calls 2 stored procs:
> - first one does a few selects and then an insert
> - second one does an update
> 
> The main table on which the insert and the update happen is truncated before 
> every performance test.
> 
> We are doing about 100 executions of both of these stored proc per second.
> 
> In Oracle each exec takes about 1 millisecond, whereas in PostgreSQL it takes 
> about 10 milliseconds, and that eventually leads to a queue build-up in our 
> application.
> 
> All indices are in place. The select, insert & update are all single row 
> operations and use the PK.
> 
> It does not look like any single query is taking longer; it seems to be 
> something else. How can I check where the time is being spent? There are no 
> I/O waits, so it's all on the CPU.

You could profile the PostgreSQL server while it is executing the
workload; see for example
https://wiki.postgresql.org/wiki/Profiling_with_perf

That way you could see where the time is spent.
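
Roughly along the lines of that wiki page (the PID is that of the backend
running your test workload, e.g. taken from pg_stat_activity; adjust the
duration as needed):

perf record -p <backend_pid> -g -- sleep 30
perf report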

PL/pgSQL is not optimized for performance like PL/SQL.

Yours,
Laurenz Albe

