> -----Original Message-----
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] On Behalf Of Gunther
> Sent: Wednesday, November 01, 2017 20:29
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] OLAP/reporting queries fall into nested loops over seq
> scans or other horrible planner choices
> Hi, this is Gunther, have been with PgSQL for decades, on an off this list.
> Haven't been on for a long time making my way just fine. But there is one 
> thing
> that keeps bothering me both with Oracle and PgSQL. And that is the
> preference for Nested Loops.
> Over the years the archives have questions about Nested Loops being chosen
> over Hash Joins. But the responses seem too specific to the people's queries,
> ask many questions, make them post the query plans, and often end up
> frustrating with suggestions to change the data model or to add an index and
> stuff like that.
> One should not have to go into that personal detail.
> There are some clear boundaries that a smart database should just never cross.
> Especially with OLAP queries. Think a database that is fine for OLTP, has
> indexes and the index based accesses for a few records joined with a dozen
> other tables all with indexes is no problem. If you fall into a Seq Scan 
> scenario
> or unwanted Hash Join, you usually forgot to add an index or forgot to put 
> index
> columns into your join or other constraints. Such are novice questions and we
> should be beyond that.
> But the issue is bulk searches, reports, and any analytic queries scenarios. 
> In
> those queries Nested Loops are almost always a bad choice, even if there is an
> index. In over 20 years of working with RDBMs this has been my unfailing
> heuristics. A report runs slow? Look at plan, is there a Nested Loop? Yes?
> Squash it! And the report runs 10x faster instantaneously.
> So, all the more troublesome is if any database system (here PgSQL) would
> ever fall into a Nested Loop trap with CPU spinning at 100% for several
> minutes, with a Nested Loop body of anything from a Seq Scan or worse with a
> cardinality of anything over 10 or 100. Nested Loops of Nested Loops or Nested
> Loops of other complex query plan fragments should be a no-no and chosen
> only as an absolute last resort when the system cannot find enough memory,
> even then disk based merge sort should be better, i.e., Nested Loops should
> never be chosen. Period.
> If you can set enable_nestloop off and the Hash Join is chosen and the
> performance goes from 1 hour of 100% CPU to 10 seconds completion time,
> then something is deadly wrong. And it doesn't matter to me if I should have
> re-written my query in some funny ways or tweaked my data model, these are
> all unacceptable options when you have a complex system with hybrid
> OLTP/OLAP uses. Don't tell me to de-normalize. I know I can materialize joins
> in tables which I can then use again in joins to save time. But that is not 
> the
> point here.
> And I don't think tweaking optimizer statistics is the solution either.
> Because optimizer statistics quickly become worthless when your criteria get
> more complex.
> The point is that Nested Loops should never be chosen except in index lookup
> situations or may be memory constraints.
> How can I prevent it on a query by query scope? I cannot set enable_nestloop =
> off because one query will be for a full report, wile another one might have
> indexed constraints running in the same session, and I don't want to manage
> side effects and remember to set enable_nestloop parameter on and off.
> There must be a way to tell the optimizer to penalize nested loops to make
> them the last resort. In Oracle there are those infamous hints, but they don't
> always work either (or it is easy to make mistakes that you get no feedback
> about).
> Is there any chance PgSQL can get something like a hint feature? Or is there a
> way to use postgresql.conf to penalize nested loops so that they would only 
> ever
> be chosen in the most straight-forward situations as with query parameters
> that are indexed? I know I need to have sufficient work_mem, but if you can 
> set
> enable_nestloop = off and you get the desired Hash Join, there is obviously
> sufficient work_mem, so that isn't the answer either.
> Thanks for listening to my rant.
> regards,
> -Gunther
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

 [Laurent Hasson] 
Hello Gunther,

Just adding to your voice. I recently experienced the same issue with a complex 
multi-table view, including pivots, and was surprised to see all the nested 
loops everywhere in spite of indices being available. I spent a lot of time 
optimizing the query and went from about 1h to about 3mn, but penalizing nested 
loops in favor of other "joining" techniques seem to make sense as a strategy. 
Either that, or there is something I really don't understand here either and 
would love to be educated :)


Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:

Reply via email to