Re: [PERFORM] Calculating 95th percentiles

2011-03-05 Thread marcin mank
On Fri, Mar 4, 2011 at 4:18 PM, Landreville
landrevi...@deadtreepages.com wrote:

    create temporary table deltas on commit drop as
        select * from get_delta_table(p_switchport_id, p_start_date,
 p_end_date);

    select round(count(volume_id) * 0.95) into v_95th_row from deltas;
    select in_rate into v_record.in_95th from deltas where
 in_rate_order = v_95th_row;
    select out_rate into v_record.out_95th from deltas where
 out_rate_order = v_95th_row;
    select sum(in_delta), sum(out_delta) into v_record.in_total,
 v_record.out_total from deltas;

 Unfortunately using a temporary table means that I cannot run this
 query on the read-only slave, but I can't see a way around using one.

Is this fast enough on a slave:


with deltas as (select * from get_delta_table(...)),
     p95 as (select round(count(volume_id) * 0.95) as p95v from deltas)
select
    (select in_rate  from deltas, p95 where in_rate_order  = p95v) as in_95th,
    (select out_rate from deltas, p95 where out_rate_order = p95v) as out_95th,
    (select sum(in_delta)  from deltas) as in_total,
    (select sum(out_delta) from deltas) as out_total;

?

Greetings
Marcin



Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem

2011-02-15 Thread marcin mank
On Tue, Feb 15, 2011 at 6:19 PM, Thomas Pöhler
t...@turtle-entertainment.de wrote:
 Hi list,

 See ganglia: http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg


What is the bottom graph, queries per minute? It looks like your database is
just getting hammered.
Maybe there is a really badly coded page somewhere (a query for each
user, or something similar)?

Greetings
Marcin Mańk



Re: [PERFORM] New server to improve performance on our large and busy DB - advice?

2010-01-15 Thread marcin mank
On Thu, Jan 14, 2010 at 8:17 PM, Carlo Stonebanks
stonec.regis...@sympatico.ca wrote:
 . 48 GB RAM
 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

There is no 64-bit Windows build at the moment, so you would be limited to
a shared_buffers setting of about a gigabyte. Choose Linux.

Greetings
Marcin Mańk



Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread marcin mank
 every update is a UPDATE ... WHERE id >= x AND id < x+10, and a commit is
 performed after every 1000 update statements, i.e. every 10,000 rows.

What is the rationale behind this? How about doing 10k rows in 1
update, and committing every time?

You could try making the condition on the ctid column, to avoid using the
index on id and to process the rows in physical order. First make sure that
newly inserted production data has the correct value in the new column, and
add 'where new_column is null' to the conditions. But I have never tried
this; use at your own risk.
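
A rough sketch of one such batch, using the simpler id-range form (untested;
big_table, new_column and compute_value() are placeholder names, not from
the thread):

    -- one 10k-row batch, committed on its own; the id range and batch size
    -- are illustrative only
    UPDATE big_table
       SET new_column = compute_value(old_column)
     WHERE id >= 500001 AND id < 510001
       AND new_column IS NULL;  -- skip rows that already have the value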

Greetings
Marcin Mank



Re: [PERFORM] DELETE performance problem

2009-11-25 Thread marcin mank
On Tue, Nov 24, 2009 at 2:37 PM, Luca Tettamanti kronos...@gmail.com wrote:
 ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8) (actual time=571807.575..610178.552 rows=26185953 loops=1)


This is your problem: the system's estimate for the number of distinct
annotation_ids in t2 is wildly off.

The disk activity is almost certainly swapping (you can check with iostat
on the Linux machine).

Can you try ANALYZE t2 just before the delete query? Maybe also try
raising the statistics target for the annotation_id column.

If all else fails, you may try SET enable_hashagg TO false just
before the query.
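
A compact sketch of those suggestions (the statistics target of 1000 is an
arbitrary illustrative value):

    ALTER TABLE t2 ALTER COLUMN annotation_id SET STATISTICS 1000;
    ANALYZE t2;                  -- refresh the stats right before the DELETE
    -- last resort, only for the session running the DELETE:
    SET enable_hashagg = off;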

Greetings
Marcin Mańk





Re: [PERFORM] query optimization

2009-11-23 Thread marcin mank
On Tue, Nov 24, 2009 at 12:49 AM, Faheem Mitha fah...@email.unc.edu wrote:

 Yes, sorry. I'm using Postgresql 8.4. I guess I should go through diag.pdf
 and make sure all the information is current. Thanks for pointing out my
 error.


Excellent report!

About the COPY problem: you seem to have created the primary key
before doing the COPY (at least that's what the dump before the copy
says). This is bad; create it after the COPY.
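
Roughly like this (the table name, file path and key column are made up,
not taken from your report):

    -- load the data first, then build the constraint in one pass
    COPY geno FROM '/tmp/geno.csv' WITH CSV;
    ALTER TABLE geno ADD PRIMARY KEY (id);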

Greetings
Marcin



Re: [PERFORM] Postgres query completion status?

2009-11-20 Thread marcin mank
 max_connections = 500                   # (change requires restart)
 work_mem = 256MB                                # min 64kB

 Not that it has to do with your current problem, but this combination could
 bog down your server if enough clients run sorted queries simultaneously.
 You should probably back off work_mem by at least an order of magnitude.


 What's the correct way to configure this?

 * We have one client which needs to run really big transactions (therefore
 needs the work memory).


You can set work_mem for that specific user (like SET work_mem TO x)
at the beginning of the session.
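
For example (a sketch; the role name is made up, and the 512MB figure just
matches the suggestion further down):

    -- per session, issued by the client that runs the big transactions:
    SET work_mem = '512MB';
    -- or persistently, for that one role only:
    ALTER ROLE bulk_report_user SET work_mem = '512MB';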

Here are some things I noticed (it is more like shooting in the dark,
but still...)

the expensive part is this:
  ->  Sort  (cost=280201.66..281923.16 rows=688602 width=300) (actual time=177511.806..183486.593 rows=41317448 loops=1)
        Sort Key: du_report_sku.wid, du_report_sku.storeorderid, du_report_sku.genreorderid
        Sort Method:  external sort  Disk: 380768kB
        ->  HashAggregate  (cost=197936.75..206544.27 rows=688602 width=36) (actual time=7396.426..11224.839 rows=6282564 loops=1)
              ->  Seq Scan on du_report_sku  (cost=0.00..111861.61 rows=6886011 width=36) (actual time=0.006..573.419 rows=6897682 loops=1)


(It is pretty confusing that the HashAggregate reports ~6M rows but
the Sort emits 41M rows; maybe I am misreading this.)
Anyway, I think that if you up the work_mem for this query to 512MB,
the sort will happen in memory and thus be plenty faster.

Also, you say you are experiencing unstable query plans, and this may
mean that geqo is kicking in (though your query seems too simple for
that, even considering the views involved). A quick way to check would
be to run EXPLAIN on the query a couple of dozen times and see whether
the plans change. If they do, try upping geqo_threshold.
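
For instance (the value 18 is only illustrative; the default geqo_threshold
is 12):

    SET geqo_threshold = 18;   -- keep the exhaustive planner for more joins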

You have seq_page_cost four times larger than random_page_cost. You say
you are on SSD, so there is no random-access penalty; try setting them
equal.
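
Something along these lines (1.0 is just one reasonable choice for SSD):

    SET seq_page_cost = 1.0;
    SET random_page_cost = 1.0;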

Your plan is full of merge joins; some indexes may be in order. A merge
join is a kind of last-chance plan.

The query is:
SELECT ( core.demand.qty - viwcs.wave_end_demand.qty_remaining ) FROM
core.demand, viwcs.previous_wave LEFT OUTER JOIN viwcs.wave_end_demand
USING ( wid ) WHERE core.demand.id = viwcs.wave_end_demand.demand_id;

Isn't the left join equivalent to an inner join, since in the WHERE clause
you are comparing values from the nullable side of the join? If they come
out NULL, they will get discarded anyway...
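
A sketch of the equivalent inner-join form (untested, just to illustrate
the point; aliases are mine):

    SELECT d.qty - wed.qty_remaining
      FROM viwcs.previous_wave pw
      JOIN viwcs.wave_end_demand wed USING (wid)
      JOIN core.demand d ON d.id = wed.demand_id;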

I hope you find some of this useful.

Greetings
Marcin



Re: [PERFORM] Help Me Understand Why I'm Getting a Bad Query Plan

2009-03-24 Thread marcin mank
There is one thing I don't understand:

  ->  Nested Loop  (cost=0.00..180564.28 rows=1806 width=37) (actual time=0.192..60.214 rows=3174 loops=1)
        ->  Index Scan using visitors_userid_index2 on visitors v  (cost=0.00..2580.97 rows=1300 width=33) (actual time=0.052..2.342 rows=897 loops=1)
              Index Cond: (userid = 'fbe2537f21d94f519605612c0bf7c2c5'::bpchar)
        ->  Index Scan using itemexperiencelog__index__visitorid on itemexperiencelog l  (cost=0.00..134.04 rows=230 width=70) (actual time=0.013..0.040 rows=4 loops=897)
              Index Cond: (l.visitorid = v.id)

If it expects 1300 visitors with that userid, and each of them to have
230 entries in itemexperiencelog (naively 1300 * 230 = 299,000), how can
it come up with 1806 returned rows (and be about right!)?

Greetings
Marcin



Re: [PERFORM] LIMIT confuses the planner

2009-03-22 Thread marcin mank
 So the bottom line here is just that the estimated n_distinct is too
 low.  We've seen before that the equation we use tends to do that more
 often than not.  I doubt that consistently erring on the high side would
 be better though :-(.  Estimating n_distinct from a limited sample of
 the population is known to be a statistically hard problem, so we'll
 probably not ever have perfect answers, but doing better is on the
 to-do list.


I hit an interesting paper on n_distinct calculation:

http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf

The PCSA algorithm described there requires O(1) computation per
value. Page 22 describes what to do with update streams.

This, I think (disclaimer: I know little about PG internals), means that
the n_distinct estimation could be done at vacuum time (it would
play well with the visibility map addition).

What do you think?

Greetings
Marcin



Re: [PERFORM] LIMIT confuses the planner

2009-03-22 Thread marcin mank
 I hit an interesting paper on n_distinct calculation:

 http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf

 the PCSA algorithm described there requires O(1) computation per
 value. Page 22 describes what to do with update streams.

 This I think (disclaimer: I know little about PG internals) means that
 the n_distinct estimation can be done during vacuum time (it would
 play well with the visibility map addon).

 What do You think?

OK, if you think that calculating a hash function of every data field
for each insert or delete is prohibitive, just say so and don't bother
reading the paper :]

Greetings
Marcin



Re: [PERFORM] not in(subselect) in 8.4

2009-02-20 Thread marcin mank
 Just as a question to Tom and team,

Maybe it's time for asktom.postgresql.org? Oracle has it :)



Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Marcin Mank
 Is there anything I'm missing that is preventing it from using the index?
 It just seems weird to me that other joins like this work fine and fast
 with indexes, but this one won't.


Did you consider clustering both tables on the dsiacctno index?

I just checked that for a 4M-row table, even with enable_seqscan=on and
the default *_page_cost settings on PG 8.1.4, an index scan is chosen for
select * from table order by serial_pkey_field


This is essentially the question in your case: sort it, or get it already
sorted via the index at the expense of more random I/O.

I think clustering should work for you, but I am no expert; check with
others.
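
Roughly like this (the index and table names are placeholders, and the
pre-8.3 CLUSTER syntax is used since the thread mentions 8.1.4):

    CLUSTER det_dsiacctno_idx ON det;
    CLUSTER acct_dsiacctno_idx ON acct;
    ANALYZE det;
    ANALYZE acct;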

Greetings
Marcin




Re: [PERFORM] Confirmation of bad query plan generated by 7.4 tree

2006-06-13 Thread Marcin Mank
 warehouse-# WHERE e.event_date > now() - interval '2 days'

Try explicitly querying:
WHERE e.event_date > '2006-06-11 20:15:00'

In my understanding 7.4 does not precalculate this timestamp value for the
purpose of choosing a plan.

Greetings
Marcin

