Re: [PERFORM] Forcing index scan on query produces 16x faster
On Wed, Mar 17, 2010 at 5:25 PM, Christian Brink wrote:
>
> ->  Index Scan using sales_tranzdate_index on sales s
> (cost=0.01..1825.27 rows=22530 width=12) (actual time=0.072..3.464
> rows=225 loops=1)

Have you tried increasing the statistics on that table (and then
analyzing)? The estimates for that index scan are off by a factor of
100, which may indicate why the planner is trying so hard to avoid a
nestloop there.

--
- David T. Wilson
david.t.wil...@gmail.com
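For reference, the suggestion above amounts to something like the
following sketch. The target value of 100 is illustrative, and the
column name tranzdate is an assumption based on the index name in the
quoted plan:

  -- Raise the per-column statistics target, then refresh the stats:
  ALTER TABLE sales ALTER COLUMN tranzdate SET STATISTICS 100;
  ANALYZE sales;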
Re: [PERFORM] query has huge variance in execution times
On Wed, Mar 31, 2010 at 12:11 AM, Brian Cox wrote:
>
> 2010-03-30 18:41:11.685261-07 | select b.ts_id from
> ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where
> b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and
> b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id =
> c.ts_user_incarnation_id and c.ts_interval_start_time >= $1 and
> c.ts_interval_start_time < $2 and b.ts_interval_start_time >= $3 and
> b.ts_interval_start_time < $4
> (1 row)
>
> about 5 mins later, I, suspecting problems, do (the values are the same
> as for $1 et al above; EXPLAIN was done on purpose to keep stats
> [hopefully] the same as when pid 10022 started; there are 80,000 rows
> in each of the 2 tables at the time of this EXPLAIN and when 10022
> started):
>
> cemdb=> explain select b.ts_id from ts_stats_tranunit_user_daily b,
> ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id =
> c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and
> b.ts_user_incarnation_id = c.ts_user_incarnation_id and
> c.ts_interval_start_time >= '2010-3-29 01:00' and
> c.ts_interval_start_time < '2010-3-29 02:00' and
> b.ts_interval_start_time >= '2010-3-29' and
> b.ts_interval_start_time < '2010-3-30';

These won't necessarily get the same plan. If you want to see what plan
the prepared query is getting, you'll need to prepare it ("prepare foo
as <query>") and then explain *that* via "explain execute foo". The
prepared version likely has a much more generic plan, whereas the
regular query gets optimized for the actual values provided.

--
- David T. Wilson
david.t.wil...@gmail.com
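A sketch of that sequence, abbreviated to two of the four parameters
for readability (the full version would declare and use all four):

  PREPARE foo (timestamp, timestamp) AS
  SELECT b.ts_id
  FROM ts_stats_tranunit_user_daily b,
       ts_stats_tranunit_user_interval c
  WHERE b.ts_tranunit_id = c.ts_tranunit_id
    AND c.ts_interval_start_time >= $1
    AND c.ts_interval_start_time < $2;

  EXPLAIN EXECUTE foo('2010-3-29 01:00', '2010-3-29 02:00');

The plan shown for the prepared statement is built without knowledge of
the actual parameter values, which is exactly why it can differ from
the plan for the literal-value query.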
Re: [PERFORM] query has huge variance in execution times
On Wed, Mar 31, 2010 at 2:10 PM, Brian Cox wrote:
>
> so the question still remains: why did it take > 20 mins? To see if it
> was due to autovacuum running ANALYZE, I turned off autovacuum, created
> a table using SELECT * INTO temp FROM ts_stats_tranunit_user_daily,
> added the index on ts_interval_start_time and ran the prepared query
> with temp, but the query completed in a few secs.

It's possible that statistics were updated between the >20 minute run
and your most recent prepared query test. In fact, comparing the plans
between your two emails, it's quite likely, as even the non-prepared
versions are not producing the same plan or the same estimates. It's
therefore possible that your problem has already corrected itself, if
you're unable to duplicate the 20 minute behaviour at this point.
Taking a look at the statistics accuracy with an explain analyze might
still be informative, however.

--
- David T. Wilson
david.t.wil...@gmail.com
Re: [PERFORM] Postgres 9.0 has a bias against indexes
On Thu, Jan 27, 2011 at 10:56 AM, Mladen Gogala wrote:
> I even tried with an almost equivalent outer join:
>
> explain analyze select e1.empno,e1.ename,e2.empno,e2.ename
> from emp e1 left outer join emp e2 on (e1.mgr=e2.empno);
>                                  QUERY PLAN
> ----------------------------------------------------------------------
>  Nested Loop Left Join  (cost=0.00..7.25 rows=14 width=16) (actual
>  time=0.028..0.105 rows=14 loops=1)
>    Join Filter: (e1.mgr = e2.empno)
>    ->  Seq Scan on emp e1  (cost=0.00..2.14 rows=14 width=10) (actual
>        time=0.006..0.010 rows=14 loops=1)
>    ->  Materialize  (cost=0.00..2.21 rows=14 width=8) (actual
>        time=0.001..0.003 rows=14 loops=14)
>          ->  Seq Scan on emp e2  (cost=0.00..2.14 rows=14 width=8)
>              (actual time=0.001..0.005 rows=14 loops=1)
>  Total runtime: 0.142 ms
> (6 rows)
>
> This gives me the same result as the recursive version, minus the level
> column. I am porting an application from Oracle; there is a fairly
> large table that is accessed by "connect by". Rewriting it as a
> recursive join is not a problem, but the optimizer doesn't really use
> the indexes.

You're still using a 14 row table, though. Postgres isn't going to be
stupid enough to use an index in this case when the seq scan is clearly
faster, unless you go out of your way to absolutely force it to do so.
If the table is going to be "fairly large", that's the size you need to
be testing and tuning with.

--
- David T. Wilson
david.t.wil...@gmail.com
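For reference, a quick way to test at realistic scale; all names here
are hypothetical, not the poster's actual schema:

  -- Build a large synthetic table, index it, and refresh statistics:
  CREATE TABLE emp_big AS
  SELECT i AS empno,
         'emp' || i::text AS ename,
         (random() * 100000)::int AS mgr
  FROM generate_series(1, 1000000) AS s(i);

  CREATE INDEX emp_big_empno_idx ON emp_big (empno);
  ANALYZE emp_big;

  -- At this size the planner has a real decision to make:
  EXPLAIN ANALYZE
  SELECT e1.empno, e2.empno
  FROM emp_big e1
  LEFT JOIN emp_big e2 ON e1.mgr = e2.empno;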
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 6:05 PM, Robert Haas wrote:
>
> If you want to randomly pick 10,000 rows out of all the rows that are
> going to be inserted in the table without knowing in advance how many
> there will be, how do you do that?

Reservoir sampling, as the most well-known option:
http://en.wikipedia.org/wiki/Reservoir_sampling

--
- David T. Wilson
david.t.wil...@gmail.com
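For illustration, a minimal sketch of the algorithm (Algorithm R) as a
PL/pgSQL function sampling k ids from an existing table. The table and
column names are hypothetical, the same logic applies to a stream of
incoming rows, and unnest assumes 8.4 or later:

  CREATE OR REPLACE FUNCTION reservoir_sample(k integer)
  RETURNS SETOF bigint AS $$
  DECLARE
      reservoir bigint[] := '{}';
      n bigint := 0;
      r record;
      j integer;
  BEGIN
      FOR r IN SELECT id FROM big_table LOOP
          n := n + 1;
          IF n <= k THEN
              -- fill the reservoir with the first k rows
              reservoir := array_append(reservoir, r.id);
          ELSE
              -- keep the nth row with probability k/n
              j := 1 + floor(random() * n)::integer;
              IF j <= k THEN
                  reservoir[j] := r.id;
              END IF;
          END IF;
      END LOOP;
      RETURN QUERY SELECT unnest(reservoir);
  END;
  $$ LANGUAGE plpgsql;

Each row ends up in the final sample with equal probability k/n, even
though n is unknown until the loop finishes.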
Re: [PERFORM] Postgres not using array
On Wed, Aug 20, 2008 at 2:30 PM, André Volpato <[EMAIL PROTECTED]> wrote:
> The CPU is 100% used since a few hours ago. Can anyone tell why?

Sounds like you've just got a CPU-bound query. The data may even all be
in cache. Some information on database size, along with EXPLAIN results
for your queries, would help here.

--
- David T. Wilson
[EMAIL PROTECTED]
Re: [PERFORM] inaccurate stats on large tables
On Thu, Sep 4, 2008 at 2:21 PM, Kiran Mukhyala <[EMAIL PROTECTED]> wrote:
> Can someone please tell me how to improve the query planner
> estimate? I did try vacuum analyze. Here are some details:

Have you tried increasing the statistics target for that table (or in
general)?

--
- David T. Wilson
[EMAIL PROTECTED]
Re: [PERFORM] PostgreSQL OR performance
On Fri, Nov 7, 2008 at 4:14 AM, Віталій Тимчишин <[EMAIL PROTECTED]> wrote:
> "Merge Join (cost=518771.07..62884559.80 rows=1386158171 width=32)
> (actual time=30292.802..755751.242 rows=34749 loops=1)"

Have you tried increasing the default_statistics_target? The planner is
expecting 1.3 billion rows to be produced from a query that's only
actually producing 35k, which probably indicates some very bad
statistics. At the same time, the materialize step produces 242 million
rows when the planner only expects to produce 2.3 million, indicating a
similar problem in the opposite direction. This probably means that the
planner is choosing plans that would be optimal if it was making good
guesses but are decidedly sub-optimal for your actual data.

--
- David T. Wilson
[EMAIL PROTECTED]
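For reference, a sketch of that knob; 100 is an illustrative value (the
default in the 8.x series was 10):

  -- Session-level, so the following ANALYZE gathers larger samples:
  SET default_statistics_target = 100;
  ANALYZE;

  -- To make it permanent, set default_statistics_target in
  -- postgresql.conf and reload, then re-run ANALYZE.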
Re: [PERFORM] query plan with index having a btrim is different for strings of different length
On Tue, Dec 9, 2008 at 2:56 PM, Richard Yen <[EMAIL PROTECTED]> wrote:
> In practice, the difference is 300+ seconds when $LASTNAME == 5 chars
> and <1 second when $LASTNAME != 5 chars.
>
> Would anyone know what's going on here? Is there something about the
> way btrim works, or perhaps with the way indexes are created? It's
> strange that the query plan would change for just one case ("Jones,"
> "Smith," "Brown," etc., all cause the query plan to use that extra
> heap scan).

Those are likely common names, and may be showing up in the table stats
as common values, causing the planner to change things around. Does
this hold even for non-existent 5-character lastname strings?

Speaking of table statistics, it might be worth upping the statistics
target on that table/column, analyzing, and seeing if you get different
results.

--
- David T. Wilson
[EMAIL PROTECTED]
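For reference, you can check whether those surnames show up as
most-common values with a query along these lines (the table and column
names here are hypothetical):

  SELECT most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE tablename = 'people'
    AND attname = 'lastname';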
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
On Wed, Dec 10, 2008 at 2:13 PM, Greg Smith <[EMAIL PROTECTED]> wrote:
> 3ware 9650SE-4LPML is what I'd buy today if I wanted hardware SATA
> RAID.

FWIW, I just put together a system with exactly that (4 320GB drives in
RAID 10) and have been pleased with the results. I won't have any
downtime to be able to get performance benchmarks until the current
compute/write pass finishes in a week and a half or so, but I don't
have any complaints with how it's performing for our app. DB size is
~120GB now and grows by ~7GB/day during the current phase; after that,
it'll move to a light-write, high-read data warehouse style usage
pattern.

--
- David T. Wilson
[EMAIL PROTECTED]
Re: [PERFORM] slow query
On Mon, Jan 12, 2009 at 5:59 PM, Scott Marlowe wrote:
> I can't figure out where my time's going to.

Looks like it's going to:

  ->  Index Scan using users_orgid_idx on users u  (cost=0.00..129.52
      rows=5 width=271) (actual time=843.825..860.638 rows=0 loops=35)

I'd guess the index/pages for users aren't in memory the first time
around. Next time is:

  ->  Index Scan using users_orgid_idx on users u  (cost=0.00..129.52
      rows=5 width=271) (actual time=3.126..3.305 rows=0 loops=35)

--
- David T. Wilson
david.t.wil...@gmail.com
Re: [PERFORM] index
On Wed, Jan 14, 2009 at 12:53 AM, Maksim Sosnovskiy wrote:
> Will it be more efficient to also add index on IP
> Address to speedup lookups by IP?

Most likely, especially if the IP address is not the first column in
your primary key index.

Have you done an explain analyze of your IP lookup query? If not, do
so; that can help. Then try creating the index and explain analyze the
query again to see what happens. Knowing your existing schema/indices
and such would let us do more than guess, and not knowing the plan your
current query is using makes it difficult to know if there's a better
one using a to-be-created index.

--
- David T. Wilson
david.t.wil...@gmail.com
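For reference, the suggested experiment looks roughly like this; the
table, column, and value are all hypothetical:

  EXPLAIN ANALYZE
  SELECT * FROM access_log WHERE ip_address = '10.1.2.3';

  CREATE INDEX access_log_ip_idx ON access_log (ip_address);
  ANALYZE access_log;

  -- Re-run and compare the plan and timings:
  EXPLAIN ANALYZE
  SELECT * FROM access_log WHERE ip_address = '10.1.2.3';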
Re: [PERFORM] Interesting query plan change linked to the LIMIT parameter
On Tue, Jan 20, 2009 at 10:45 AM, Yannick Le Guédart wrote:
>
> The second query scans the whole comment table which is very dangerous
> for production servers.

That's not quite true. The second does an index scan; the planner seems
to be guessing that it'll fulfill the required limit early in the index
scan, and only with a pathologically bad case would it actually have to
scan the entire thing. Basically, the second query is optimized to spit
out the first few rows quickly, since that's all you asked for with the
limit.

Note that your first query has a final cost estimate of "Limit
(cost=10261.19..10263.69 rows=1000 width=8)", indicating an estimated
cost of 10261.19 to emit the first row; the second has "Limit
(cost=0.00..3588.42 rows=1 width=8)", estimating a cost of 0.00
(basically, instant) to emit the first - and only desired - row.

That all said, an explain analyze would give us a better idea of what's
going on; we can't tell whether the planner is making bad estimates
without knowing the real timing and row-count results of the plan
stages.

--
- David T. Wilson
david.t.wil...@gmail.com
Re: [PERFORM] Deleting millions of rows
On Mon, Feb 2, 2009 at 1:17 PM, Brian Cox wrote:
> I'm using 8.3.5. Table ts_defects has 48M rows. Through psql: delete
> from ts_defects;
> Result: out of memory/Can't allocate size: 32

Is this table the target of any foreign keys?

--
- David T. Wilson
david.t.wil...@gmail.com
Re: [PERFORM] Deleting millions of rows
On Mon, Feb 2, 2009 at 3:37 PM, Brian Cox wrote:
> David Wilson [david.t.wil...@gmail.com] wrote:
>
>> Is this table the target of any foreign keys?
>
> There are 2 "on delete cascade" FKs that reference this table.

I believe that's the source of your memory issues. I think TRUNCATE may
handle this more effectively; alternately, you can handle the cascading
yourself in these cases. (And, as Dave Dutcher mentioned, TRUNCATE is
definitely the way to go for full-table wipes.)

--
- David T. Wilson
david.t.wil...@gmail.com
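For reference, a minimal sketch of the TRUNCATE approach. TRUNCATE
refuses to touch a table referenced by foreign keys unless the
referencing tables are truncated as well; CASCADE truncates those
tables too, so be certain that's acceptable before running it:

  TRUNCATE TABLE ts_defects CASCADE;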
Re: [PERFORM] Bad plan for nested loop + limit
On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo wrote:
>
> Output from "explain analyze":
>
> Limit  (cost=0.00..973.63 rows=4 width=48) (actual
> time=61.554..4039.704 rows=1 loops=1)
>   ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual
>       time=61.552..4039.700 rows=1 loops=1)
>         ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52)
>             (actual time=61.535..4039.682 rows=1 loops=1)

Those estimates are pretty far off. Did you try increasing the
statistics target? Also, is the first query repeatable? (That is, is it
already in cache when you do the test, or alternately, are all queries
*out* of cache when you test?)

--
- David T. Wilson
david.t.wil...@gmail.com
Re: [PERFORM] Problems with ordering (can't force query planner to use an index)
On Tue, Mar 3, 2009 at 12:05 PM, Sebastjan Trepca wrote:
> ->  Nested Loop  (cost=0.00..31883.13 rows=2861 width=860)
>     (actual time=0.089..543.497 rows=68505 loops=1)
>       ->  Index Scan using core_accessor_fresh_idx on
>           core_accessor  (cost=0.00..5460.07 rows=2970 width=92)
>           (actual time=0.068..54.921 rows=69312 loops=1)
>             Index Cond: ((slot_id = 472) AND (slot_type_id = 119) AND
>             (label = ''::text) AND (user_id = 0) AND (role = 0) AND
>             (publish_state >= 60))

That index scan on core_accessor_fresh_idx has a pretty big disparity
between what the planner expects to get (2970 rows) and what it
actually gets (69312 rows). You should try increasing the statistics
target if you haven't, then re-analyze and try the query again to see
if the planner picks something better. The default of 10 is pretty
small; try 100 or higher.

--
- David T. Wilson
david.t.wil...@gmail.com
Re: [PERFORM] multiple threads inserting into the same table
On Mon, Mar 23, 2009 at 3:34 PM, Brian Cox wrote:
> The application log shows that 99652 rows are being inserted into
> relation ts_stats_transet_user_daily. 5 threads are doing the inserts.
> The schema is lengthy, but it has a synthetic primary key (ts_id int8
> not null) and the following constraints:

How many indexes are there on ts_stats_transet_user_daily? Are these
rows being inserted as individual insert statements, or are they
batched in some fashion? What does the disk/CPU activity on your system
look like?

--
- David T. Wilson
david.t.wil...@gmail.com
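For reference, two common ways to batch inserts, sketched with
hypothetical column names; both usually cut per-row overhead
substantially compared to one INSERT per row:

  -- Multi-row VALUES: many rows per statement and round trip:
  INSERT INTO ts_stats_transet_user_daily (ts_id, user_id, value)
  VALUES (1, 10, 0.5),
         (2, 11, 0.7),
         (3, 12, 0.9);

  -- COPY, usually the fastest bulk-load path:
  COPY ts_stats_transet_user_daily (ts_id, user_id, value)
  FROM '/tmp/daily_stats.csv' WITH CSV;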
Re: [PERFORM] difficulties with time based queries
On Sun, Apr 5, 2009 at 7:26 PM, Rainer Mager wrote:
> Bitmap Heap Scan on ad_log  (cost=73372.57..3699152.24 rows=2488252
> width=32) (actual time=49792.862..64611.255 rows=2268490 loops=1)
>   Recheck Cond: ((date(start_time) < '2009-03-31'::date) AND
>   (date(start_time) >= '2009-03-30'::date))
>   ->  Bitmap Index Scan on ad_log_date_all  (cost=0.00..72750.51
>       rows=2488252 width=0) (actual time=49776.332..49776.332
>       rows=2268490 loops=1)
>         Index Cond: ((date(start_time) < '2009-03-31'::date) AND
>         (date(start_time) >= '2009-03-30'::date))
> Total runtime: 65279.352 ms

The stats look good, and it's using a viable index for your query. What
kind of hardware is this on, and what are the relevant postgresql.conf
lines? (Or, for that matter, what does iostat say while this query's
running?)

--
- David T. Wilson
david.t.wil...@gmail.com
Re: [PERFORM] superlative missuse
On Tue, May 12, 2009 at 5:53 PM, Angel Alvarez wrote:
> we suffer a 'more optimal' superlative missuse
>
> there is not so 'more optimal' thing but a simple 'better' thing.
>
> im not native english speaker but i think it still applies.
>
> Well this a superlative list so all of you deserve a better "optimal"
> use.

As a native English speaker: you are technically correct. However,
"more optimal" has a well-understood meaning of "closer to optimal",
and as such is appropriate and generally acceptable despite being
technically incorrect. This is a Postgres mailing list, not an English
grammar mailing list...

--
- David T. Wilson
david.t.wil...@gmail.com
Re: [PERFORM] superlative missuse
On Thu, May 14, 2009 at 9:08 PM, Craig James wrote:
> I disagree -- it's a glaring error. "More optimized" or "better
> optimized" are perfectly good, and correct, phrases. Why not use them?
> Every time I read "more optimal," I am embarrassed for the person who
> is showing his/her ignorance of the basics of English grammar. If I
> wrote, "It's more best," would you find that acceptable?

Oh, I agree it's an error, and it's one I personally avoid. But
unfortunately, it's remarkably common and has been for some time, as
Tom pointed out with the quote from the US Constitution. On the other
hand, "more best" is more clearly a mistake because "better" exists as
an alternative that is both correct and commonly used; "more optimal"
slips by a little more easily because its correct alternative, "more
optimized", is used far less often.

> Since you replied on the list, it's only appropriate to get at least
> one rebuttal.

As is, of course, your certain right. I think that's enough on the
list, though I'd be happy to continue off-list if there's any
interest. :)

--
- David T. Wilson
david.t.wil...@gmail.com
Re: [PERFORM] Poor overall performance unless regular VACUUM FULL
On Mon, Jul 13, 2009 at 3:31 PM, Wayne Conrad wrote:
> Howdy. Some months back, when advised on one of these lists that it
> should not be necessary to issue VACUUM FULL/REINDEX DATABASE, we quit
> this nightly "maintenance" practice. We've been very happy to not
> have to do that, since it locked the database all night. Since then,
> however, our database performance has decreased. The decrease took a
> few weeks to become noticeable; perhaps six weeks to become awful.
>
> I did a VACUUM VERBOSE and looked at the statistics at the end; they
> seem to indicate that my max_fsm_pages is large enough to keep track
> of all of the dead rows that are being created (we do a fair amount of
> deleting as well as inserting). Postgres prints no complaint saying
> we need more slots, and we have more than the number of slots needed
> (if I recall, about twice as many).
>
> What options do I have for restoring performance other than VACUUM
> FULL/REINDEX DATABASE?

Do you have autovacuum on, or have you otherwise replaced your VACUUM
FULL with regular VACUUM? The symptoms are pretty classically those of
table bloat. Since it's gotten so out of hand now, a VACUUM
FULL/REINDEX is probably what you'll need to fix it.

Going forward, you need *some* vacuuming strategy. Autovacuum is
probably best, especially if you're on 8.3. If not autovacuum for some
reason, you *must* at least do regular vacuums. VACUUM FULL/REINDEX is
for fixing the situation you're in now, but a regular vacuum strategy
should prevent you from getting back into it.

--
- David T. Wilson
david.t.wil...@gmail.com
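For reference, a sketch of the one-time repair plus an ongoing
strategy; 'mydb' is a placeholder, and both repair commands take
exclusive locks, so schedule a maintenance window:

  VACUUM FULL VERBOSE;
  REINDEX DATABASE mydb;

  -- Going forward: keep autovacuum = on in postgresql.conf, or at
  -- minimum run a plain (non-FULL) vacuum regularly, e.g. nightly:
  VACUUM ANALYZE;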
Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance
On Wed, Jul 15, 2009 at 11:10 AM, Marko Kreen wrote:
> From security standpoint, wasting more cycles on bad passwords is
> good, as it decreases the rate bruteforce password scanning can happen.
>
> And I cannot imagine a scenario where performance on invalid logins
> can be relevant..

DoS attacks. The longer it takes to reject an invalid login, the fewer
invalid login attempts it takes to DoS the server.

--
- David T. Wilson
david.t.wil...@gmail.com
Re: [PERFORM] More speed counting rows
On Mon, Jul 27, 2009 at 5:06 AM, Developer wrote:
> Hello,
>
> I am trying to optimize the count of files when I am using filters
> (select by some row/s parameter/s)
> My question is:
> Any method for indirect count like ordered indexes + quadratic count?
> Any module?
> Any suggestion?

If all you need is a good-enough estimate, you can try reporting the
tuple count out of the stats tables. It'll only be as up to date as
autovacuum makes it, though. I do this in one app to give me ballpark
figures for some constantly-growing tables.

--
- David T. Wilson
david.t.wil...@gmail.com
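For reference, two places to read an approximate row count; 'foo' is a
placeholder, both values are estimates maintained by VACUUM/ANALYZE and
the stats collector rather than exact counts, and n_live_tup assumes
8.3 or later:

  SELECT reltuples::bigint AS estimated_rows
  FROM pg_class
  WHERE relname = 'foo';

  SELECT n_live_tup
  FROM pg_stat_user_tables
  WHERE relname = 'foo';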
Re: [PERFORM] optimizing query with multiple aggregates
On Wed, Oct 21, 2009 at 6:51 PM, Doug Cole wrote:
>
> repeated across many different x,y values and fields to build out
> several histograms of the data. The main culprit appears to be the
> CASE statement, but I'm not sure what to use instead. I'm sure other
> people have had similar queries and I was wondering what methods they
> used to build out data like this?

Use GROUP BY with an appropriate division/rounding to create the
appropriate buckets, if they're all the same size:

  select round(field/100) as bucket, count(*) as cnt
  from foo
  group by round(field/100);

--
- David T. Wilson
david.t.wil...@gmail.com
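For the same equal-width bucketing, width_bucket() can replace the
manual arithmetic and makes the bounds explicit; a sketch reusing the
hypothetical names above, with illustrative bounds of 0 to 1000 over 10
buckets:

  SELECT width_bucket(field, 0, 1000, 10) AS bucket,
         count(*) AS cnt
  FROM foo
  GROUP BY 1
  ORDER BY 1;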