[PERFORM] Fwd: Stalled post to pgsql-performance

2017-06-27 Thread Chris Wilson
Hi Karl and Jeff,

On 26 June 2017 at 22:22, Jeff Janes <jeff.ja...@gmail.com> wrote:

> Be warned that "explain (analyze)" can substantially slow down and distort
> this type of query, especially when sorting.  You should run "explain
> (analyze, timing off)" first, and then only trust "explain (analyze)" if
> the overall execution times between them are similar.
>

Thanks, I didn't realise that. I will use TIMING OFF from now on.
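For my own notes, the two variants side by side, using the query from my earlier mail (only trust the fully timed run if its total is close to the TIMING OFF run):

explain (analyze, timing off, buffers)
SELECT id_metric, id_asset, date, value
FROM metric_value
WHERE date >= '2016-01-01' and date < '2016-06-01'
AND timerange_transaction @> current_timestamp
ORDER BY date, metric_value.id_metric;

explain (analyze, buffers)
SELECT id_metric, id_asset, date, value
FROM metric_value
WHERE date >= '2016-01-01' and date < '2016-06-01'
AND timerange_transaction @> current_timestamp
ORDER BY date, metric_value.id_metric;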

On 26 June 2017 at 21:32, Karl Czajkowski <kar...@isi.edu> wrote:

> > I created the index starting with date and it did make a big
> > difference: down to 10.3 seconds using a bitmap index scan and bitmap
> > heap scan (and then two hash joins as before).
>
> By the way, what kind of machine are you using?  CPU, RAM, backing
> storage?
>
> I tried running your original test code and the query completed in
> about 8 seconds, and adding the index changes and analyze statement
> brought it down to around 2.3 seconds on my workstation with Postgres
> 9.5.7.  On an unrelated development VM with Postgres 9.6.3, the final
> form took around 4 seconds.
>

This is very interesting. I'm using a powerful box:

   - HP ProLiant DL580 G7, Xeon(R) CPU E7- 4850  @ 2.00GHz * 80 cores, 128
   GB RAM, hardware RAID, 3.6 TB SAS array.

              total    used    free   shared  buff/cache  available
Mem:           125G    2.2G    834M      30G        122G        91G
Swap:          9.3G     98M    9.2G


And disk I/O is fast:

$ dd if=/dev/zero of=/local/tmp/bigfile bs=1M count=100k
107374182400 bytes (107 GB) copied, 234.751 s, 457 MB/s


But your question led me to investigate and discover that we were compiling
Postgres with no optimisations! I've built a new binary with -O2 and got the
time down to 3.6 seconds (EXPLAIN with either TIMING OFF or BUFFERS;
there's no material difference).
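
(For anyone else checking their build: roughly what we did, though the exact configure invocation here is from memory rather than our build script:)

$ pg_config --cflags                      # shows the flags the running binaries were built with
$ ./configure CFLAGS="-O2" --prefix=/usr/local/pgsql
$ make -j8 && make install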

> And again, vacuum your tables.  Heap fetches aren't cheap.
>

Sorry, I don't understand, why does VACUUM help on a table with no deleted
rows? Do you mean ANALYZE?
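
I'll run it anyway and watch whether the "Heap Fetches" counts reported in the
plans go down (a sketch, using the table names from this thread):

VACUUM (ANALYZE, VERBOSE) metric_value;
VACUUM (ANALYZE) metric_pos;
VACUUM (ANALYZE) asset_pos;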


> > work_mem = 100MB
>
> Can you give it more than that?  How many simultaneous connections do you
> expect?
>

Yes, I can and it does help! By increasing work_mem to 200 MB, I managed to
convert the external merge sort (on disk) to a quicksort in memory, and
reached 3.3 seconds.
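
(Something along these lines is enough to test it per session; whether to raise
it globally depends on how many connections need it at once, as Karl points out:)

SET work_mem = '200MB';
-- the sort node then reports "Sort Method: quicksort  Memory: ..." instead of
-- "Sort Method: external merge  Disk: ..."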

The cartesian join is slightly faster at 3.0 seconds, but not enough to be
worth the overhead of creating the join table. I still wish I understood
why it helps.

Jeff, thanks for the explanation about hash joins and sorting. I wish I
understood why a hash join wouldn't preserve order in the first table even
if it has to be done incrementally, since I expect that we'd still be
reading records from the first table in order, but just in batches.

> Other possible rewrites to try instead of joins:
>
>   -- replace the case statement with a scalar subquery
>
>   -- replace the case statement with a stored procedure wrapping that
> scalar subquery
>  and declare the procedure as STABLE or even IMMUTABLE
>
> These are shots in the dark, but seem easy enough to experiment with and
> might
> behave differently if the query planner realizes it can cache results for
> repeated use of the same ~100 input values.


I hit a jackpot with jsonb_object_agg, getting down to 2.1 seconds (2.8
with BUFFERS and TIMING <https://explain.depesz.com/s/uWyM>):

explain (analyze, timing off)
with metric as (select jsonb_object_agg(id, pos) AS metric_lookup from
metric_pos),
 asset  as (select jsonb_object_agg(id, pos) AS asset_lookup  from
asset_pos)
SELECT metric_lookup->id_metric AS pos_metric, asset_lookup->id_asset AS
pos_asset, date, value
FROM metric_value, metric, asset
WHERE date >= '2016-01-01' and date < '2016-06-01'
AND timerange_transaction @> current_timestamp
ORDER BY metric_value.id_metric, metric_value.id_asset, date;


Which is awesome! Thank you so much for your help, both of you!

Now if only we could make hash joins as fast as JSONB hash lookups :)

Cheers, Chris.


Re: [PERFORM] Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each

2017-06-26 Thread Chris Wilson
Still very happy for any ideas on
how to get back the 2 seconds longer <https://explain.depesz.com/s/NgfZ>
than it takes without any joins to the dimension tables (3.7 seconds), or
explain why the cartesian join helps and/or how we can get the same speedup
without materialising it.

SELECT id_metric, id_asset, date, value
FROM metric_value
WHERE
date >= '2016-01-01' and date < '2016-06-01'
AND timerange_transaction @> current_timestamp
ORDER BY date, metric_value.id_metric;


Cheers, Chris.


[PERFORM] Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each

2017-06-23 Thread Chris Wilson
ly small integers. They would probably be temporary
tables in our real use case. But removing them (and just selecting the IDs
from metric_value) cuts 4 seconds off the query time (to 3.3 seconds). Why
are they slow?

If I remove one of the joins (asset_pos) then I get a merge join between
two indexes, as expected, but it has a materialize just before it which
makes no sense to me. Why do we need to materialize here? And why
materialise 100 rows into 1.5 million rows? (<https://explain.depesz.com/s/7mkM>)

SELECT metric_pos.pos AS pos_metric, id_asset AS pos_asset, date, value
FROM metric_value
INNER JOIN metric_pos ON metric_pos.id = metric_value.id_metric
WHERE
date >= '2016-01-01' and date < '2016-06-01'
AND timerange_transaction @> current_timestamp
ORDER BY metric_value.id_metric, metric_value.id_asset, date

 Merge Join  (cost=0.70..209302.76 rows=1520071 width=28) (actual time=0.097..4899.972 rows=152 loops=1)
   Merge Cond: (metric_value.id_metric = metric_pos.id)
   Buffers: shared hit=76403
   ->  Index Only Scan using idx_metric_value_id_metric_id_asset_date on metric_value  (cost=0.56..182696.87 rows=1520071 width=20) (actual time=0.074..3259.870 rows=152 loops=1)
         Index Cond: ((date >= '2016-01-01'::date) AND (date < '2016-06-01'::date))
         Filter: (timerange_transaction @> now())
         Heap Fetches: 152
         Buffers: shared hit=76401
   ->  Materialize  (cost=0.14..4.89 rows=100 width=8) (actual time=0.018..228.265 rows=1504801 loops=1)
         Buffers: shared hit=2
         ->  Index Only Scan using idx_metric_pos_id_pos on metric_pos  (cost=0.14..4.64 rows=100 width=8) (actual time=0.013..0.133 rows=100 loops=1)
               Heap Fetches: 100
               Buffers: shared hit=2
 Planning time: 0.761 ms
 Execution time: 5253.260 ms


The size of the result set is approximately 91 MB (measured with psql -c |
wc -c). Why does it take 4 seconds to transfer this much data over a UNIX
socket on the same box? Can it be made faster? The data is quite redundant
(it's sorted for a start) so compression makes a big difference, and simple
prefix elimination could probably reduce the volume of redundant data sent
back to the client.

Standard background info:

   - PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
   20150623 (Red Hat 4.8.5-4), 64-bit, compiled from source.
   - shared_buffers = 15GB, work_mem = 100MB, seq_page_cost =
   0.5, random_page_cost = 1.0, cpu_tuple_cost = 0.01.
   - HP ProLiant DL580 G7, Xeon(R) CPU E7- 4850  @ 2.00GHz * 80 cores,
   hardware RAID, 3.6 TB SAS array.

Thanks again in advance for any suggestions, hints or questions.

Cheers, Chris.


Re: [PERFORM] Big number of connections

2016-03-31 Thread Chris Cogdon
Although somewhat academic, since pgbouncer doesn’t support it (and might not 
ever), have a look at this ticket which, if it was ever supported, would give 
you what you needed:

https://github.com/pgbouncer/pgbouncer/issues/75


> On Mar 31, 2016, at 15:47, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> 
>> If they are going to keep the client connections open, they would need to
>> run pgbouncer in statement or transaction mode.
> 
> As I understand, in pgbouncer you cannot have connections that serve
> different users.  If each individual requires its own database-level
> user, pgbouncer would not help at all.
> 
> I would look seriously into getting rid of the always-open requirement
> for connections.

— Chris Cogdon

[PERFORM] Adding a ROLLUP switches to GroupAggregate unexpectedly

2016-03-31 Thread Chris Cogdon
Hi folks! I’ve a query where adding a rollup to the group by switches to 
GroupAggregate unexpectedly, where the standard GROUP BY uses HashAggregate. 
Since the rollup should only add one additional bucket, the switch to having to 
sort (and thus a to-disk temporary file) is very puzzling. This reads like a 
query optimiser bug to me. This is the first I’ve posted to the list, please 
forgive me if I’ve omitted any “before bugging the list” homework.


Description: Adding a summary row by changing “GROUP BY x” into “GROUP BY 
ROLLUP (x)” should not cause a switch from HashAggregate to GroupAggregate


Here’s the “explain” from the simple GROUP BY:

projectdb=> explain analyze verbose SELECT error_code, count ( * ) FROM 
api_activities GROUP BY error_code;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 HashAggregate  (cost=3456930.11..3456930.16 rows=5 width=2) (actual 
time=26016.222..26016.223 rows=5 loops=1)
   Output: error_code, count(*)
   Group Key: api_activities.error_code
   ->  Seq Scan on public.api_activities  (cost=0.00..3317425.74 rows=27900874 
width=2) (actual time=0.018..16232.608 rows=36224844 loops=1)
 Output: id, client_id, date_added, kind, activity, error_code
 Planning time: 0.098 ms
 Execution time: 26016.337 ms
(7 rows)

Changing this to a GROUP BY ROLLUP switches to GroupAggregate (with the 
corresponding to-disk temporary table being created):

projectdb=> explain analyze verbose SELECT error_code, count ( * ) FROM 
api_activities GROUP BY rollup (error_code);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 GroupAggregate  (cost=7149357.90..7358614.52 rows=6 width=2) (actual 
time=54271.725..82354.144 rows=6 loops=1)
   Output: error_code, count(*)
   Group Key: api_activities.error_code
   Group Key: ()
   ->  Sort  (cost=7149357.90..7219110.09 rows=27900874 width=2) (actual 
time=54270.636..76651.121 rows=36222428 loops=1)
 Output: error_code
 Sort Key: api_activities.error_code
 Sort Method: external merge  Disk: 424864kB
 ->  Seq Scan on public.api_activities  (cost=0.00..3317425.74 
rows=27900874 width=2) (actual time=0.053..34282.239 rows=36222428 loops=1)
   Output: error_code
 Planning time: 2.611 ms
 Execution time: 82437.416 ms
(12 rows)


I’ve given the output of “EXPLAIN ANALYZE VERBOSE” rather than non-analyze, but 
there was no difference in the plan.

Running VACUUM FULL ANALYZE on this table makes no difference. Switching to 
Count(error_code) makes no difference. Using GROUP BY GROUPING SETS ((), 
error_code) makes no difference.

I understand that a HashAggregate is possible only if it can fit all the 
aggregates into work_mem. There are 5 different error codes, and the statistics 
(from pg_stats) are showing that PG knows this. Adding just one more bucket for 
the “()” case should not cause a fallback to GroupAggregate.


PostgreSQL version: 9.5.2 (just upgraded today, Thank you! <3 )

(Was exhibiting same problem under 9.5.0)


How installed: apt-get package from apt.postgresql.org


Settings differences:

 application_name: psql
 client_encoding: UTF8
 DateStyle: ISO, MDY
 default_text_search_config: pg_catalog.english
 dynamic_shared_memory_type: posix
 lc_messages: en_US.UTF-8
 lc_monetary: en_US.UTF-8
 lc_numeric: en_US.UTF-8
 lc_time: en_US.UTF-8
 listen_addresses: *
 log_line_prefix: %t [%p-%c-%l][%a][%i][%e][%s][%x-%v] %q%u@%d 
 log_timezone: UTC
 logging_collector: on
 max_connections: 100
 max_stack_depth: 2MB
 port: 5432
 shared_buffers: 1GB
 ssl: on
 ssl_cert_file: /etc/ssl/certs/ssl-cert-snakeoil.pem
 ssl_key_file: /etc/ssl/private/ssl-cert-snakeoil.key
 TimeZone: UTC
 work_mem: 128MB


OS and Version: Ubuntu Trusty: Linux 3.13.0-66-generic #108-Ubuntu SMP Wed Oct 
7 15:20:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux


Program used to connect: psql


Nothing unusual in the logs, apart from the query indicating that it took a 
while to run.


I know that there’s several workarounds I can use for this simple case, such as 
using a CTE, then doing a rollup on that, but I’m simply reporting what I think 
is a bug in the query optimizer.
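
For completeness, the CTE workaround I have in mind is roughly this (untested
sketch, same table and column as above):

WITH per_code AS (
    SELECT error_code, count(*) AS n
    FROM api_activities
    GROUP BY error_code            -- stays a HashAggregate: only ~5 groups
)
SELECT error_code, sum(n)
FROM per_code
GROUP BY ROLLUP (error_code);      -- the rollup now only has ~5 input rows to sort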


Thank you for your attention! Please let me know if there’s any additional 
information you need, or additional tests you’d like to run.


— Chris Cogdon <ch...@cogdon.org>
— Using PostgreSQL since 6.2! 






[PERFORM] Building multiple indexes on one table.

2014-07-17 Thread Chris Ruprecht
Is there any way that I can build multiple indexes on one table without having 
to scan the table multiple times? For small tables, that's probably not an 
issue, but if I have a 500 GB table that I need to create 6 indexes on, I don't 
want to read that table 6 times.
Nothing I could find in the manual other than reindex, but that's not helping, 
since it only rebuilds indexes that are already there and I don't know if that 
reads the table once or multiple times. If I could create indexes inactive and 
then run reindex, which then reads the table once, I would have a solution. But 
that doesn't seem to exist either.

best regards,
chris
-- 
chris ruprecht
database grunt and bit pusher extraordinaíre





Re: [PERFORM] [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Chris Curvey
On Mon, Apr 14, 2014 at 12:27 PM, Robert DiFalco
robert.difa...@gmail.com wrote:

 I have several related tables that represent a call state. Let's think of
 these as phone calls to simplify things. Sometimes I need to determine the
 last time a user was called, the last time a user answered a call, or the
 last time a user completed a call.

 The basic schema is something like this:

 CREATE TABLE calls (
   id  BIGINT NOT NULL, // sequence generator
   user_id BIGINT NOT NULL,
   called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

   PRIMARY KEY (id),
   FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
 );

 CREATE TABLE calls_answered (
   id        BIGINT NOT NULL,
   answered  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

   PRIMARY KEY (id),
   FOREIGN KEY (id) REFERENCES calls(id) ON DELETE CASCADE
 );


 And so on for calls_connected, calls_completed, call_errors, etc.

 Occasionally I will want to know things like "When was the last time a
 user answered a call" or "How many times has a user been called".

 I can do these queries using a combination of MAX or COUNT. But I'm
 concerned about the performance.

 SELECT MAX(a.id)
 FROM calls_answered a JOIN calls c ON c.id = a.id
 WHERE c.user_id = ?;


 Or the number of answered calls:

 SELECT COUNT(a.id)
 FROM calls_answered a JOIN calls c ON c.id = a.id
 WHERE c.user_id = ?;


 Sometimes I might want to get this data for a whole bunch of users. For
 example, give me all users who have not answered a call in the last 5
 days. Or even what percentage of users called actually answered a call.
 This approach could become a performance issue. So the other option is to
 create a call_summary table that is updated with triggers.

 The summary table would need fields like user_id, last_call_id,
 call_count, last_answered_id, answered_count, last_completed_id,
 last_completed_count, etc.

 My only issue with a summary table is that I don't want a bunch of null
 fields. For example, if the user was *called* but they have never
 *answered* a call then the last_call_id and call_count fields on the
 summary table would be non-NULL but the last_answer_id and answer_count
 fields WOULD be NULL. But over time all fields would eventually become
 non-NULL.

 So that leads me to a summary table for EACH call state. Each summary
 table would have a user id, a ref_id, and a count -- one summary table for
 each state e.g. call_summary, call_answered_summary, etc.

 This approach has the down side that it creates a lot of tables and
 triggers. It has the upside of being pretty efficient without having to
 deal with NULL values.  It's also pretty easy to reason about.

 So for my question -- is the choice between these a personal preference
 sort of thing or is there a right or wrong approach? Am I missing another
 approach that would be better?  I'm okay with SQL but I'm no expert so I'm
 not sure if there is an accepted DESIGN PATTERN for this that I am missing.

 Thanks!



My initial thought is:  that design is over-normalized. The thing you are
trying to model is the call, and it has several attributes, some of which
may be unknown or not applicable (which is what NULL is for).  So my
thought would be to do something like this:

CREATE TABLE calls (
  id  BIGINT NOT NULL, // sequence generator
  user_id BIGINT NOT NULL,
  called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
  answered  TIMESTAMPTZ,

  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
);


-- 
I asked the Internet how to train my cat, and the Internet told me to get a
dog.


Re: [PERFORM] [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Chris Curvey
On Tue, Apr 15, 2014 at 10:56 AM, Chris Curvey ch...@chriscurvey.com wrote:

 On Mon, Apr 14, 2014 at 12:27 PM, Robert DiFalco robert.difa...@gmail.com
  wrote:

 I have several related tables that represent a call state. Let's think of
 these as phone calls to simplify things. Sometimes I need to determine the
 last time a user was called, the last time a user answered a call, or the
 last time a user completed a call.

 The basic schema is something like this:

 CREATE TABLE calls (
   id  BIGINT NOT NULL, // sequence generator
   user_id BIGINT NOT NULL,
   called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

   PRIMARY KEY (id),
   FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
 );

 CREATE TABLE calls_answered (
   id        BIGINT NOT NULL,
   answered  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

   PRIMARY KEY (id),
   FOREIGN KEY (id) REFERENCES calls(id) ON DELETE CASCADE
 );


 And so on for calls_connected, calls_completed, call_errors, etc.

 Occasionally I will want to know things like "When was the last time a
 user answered a call" or "How many times has a user been called".

 I can do these queries using a combination of MAX or COUNT. But I'm
 concerned about the performance.

 SELECT MAX(a.id)
 FROM calls_answered a JOIN calls c ON c.id = a.id
 WHERE c.user_id = ?;


 Or the number of answered calls:

 SELECT COUNT(a.id)
 FROM calls_answered a JOIN calls c ON c.id = a.id
 WHERE c.user_id = ?;


 Sometimes I might want to get this data for a whole bunch of users. For
 example, give me all users who have not answered a call in the last 5
 days. Or even what percentage of users called actually answered a call.
 This approach could become a performance issue. So the other option is to
 create a call_summary table that is updated with triggers.

 The summary table would need fields like user_id, last_call_id,
 call_count, last_answered_id, answered_count, last_completed_id,
 last_completed_count, etc.

 My only issue with a summary table is that I don't want a bunch of null
 fields. For example, if the user was *called* but they have never
 *answered* a call then the last_call_id and call_count fields on the
 summary table would be non-NULL but the last_answer_id and answer_count
 fields WOULD be NULL. But over time all fields would eventually become
 non-NULL.

 So that leads me to a summary table for EACH call state. Each summary
 table would have a user id, a ref_id, and a count -- one summary table for
 each state e.g. call_summary, call_answered_summary, etc.

 This approach has the down side that it creates a lot of tables and
 triggers. It has the upside of being pretty efficient without having to
 deal with NULL values.  It's also pretty easy to reason about.

 So for my question -- is the choice between these a personal preference
 sort of thing or is there a right or wrong approach? Am I missing another
 approach that would be better?  I'm okay with SQL but I'm no expert so I'm
 not sure if there is an accepted DESIGN PATTERN for this that I am missing.

 Thanks!



 (Sorry, fat-fingered and hit send too early...)

CREATE TABLE calls (
  id  BIGINT NOT NULL, // sequence generator
  user_id BIGINT NOT NULL,
  called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
  answered  TIMESTAMPTZ NULL,
  connected  TIMESTAMPTZ NULL,
  completed  TIMESTAMPTZ NULL,


  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
);

Then  your queries end up looking like this:

--last time john answered
SELECT MAX(answered)
FROM calls
where answered is not null
and user_id = ?

-- users that have not answered a call in the last five days (I can think
of a few ways to interpret that phrase)
select myusers.*
from myusers
where not exists
(  select *
  from calls
  where calls.user_id = myusers.user_id
  and answered >= now() - interval '5 days')

-- average ring time
select avg(extract (epoch from answered - called))
from calls
where answered is not null



-- 
I asked the Internet how to train my cat, and the Internet told me to get a
dog.


[PERFORM] Using a window function in a view

2013-02-28 Thread Chris Hanks
I'm trying to create a view that uses a window function, but it seems that
Postgres is apparently unable to optimize it. Here's a reproduction of my
situation with 9.2.2:

---

drop table if exists values cascade;

create table values (
  fkey1 integer not null,
  fkey2 integer not null,
  fkey3 integer not null,
  value float not null,
  constraint values_pkey primary key (fkey1, fkey2, fkey3)
);

-- Kind of hacky, but it roughly resembles my dataset.
insert into values
select distinct on (fkey1, fkey2, fkey3)
  i / 12 + 1 as fkey1, i % 4 + 1 as fkey2,
  ceil(random() * 10) as fkey3, random() * 2 - 1 as value
from generate_series(0, 19) i;

create or replace view values_view as
select fkey1, fkey3,
  (derived1 / max(derived1) over (partition by fkey1)) as derived1,
  (derived2 / sum(derived1) over (partition by fkey1)) as derived2
from (
  select fkey1, fkey3,
    cast(sum((case when (value > 0.0) then 4 else 1 end)) as double precision) as derived1,
    sum((case when (value > 0.0) then (value * 4) else (value + 1) end)) as derived2
  from values
  group by fkey1, fkey3
) as t1;

-- This query requires a sequential scan on values, though all the data it
-- needs could be found much more efficiently with an index scan.
explain analyze select * from values_view where fkey1 = 1263;

---

Can anyone suggest a way to rewrite this query, or maybe a workaround of
some kind?

Thanks, Chris


[PERFORM] have: seq scan - want: index scan

2012-10-17 Thread Chris Ruprecht
Hi guys,

PG = 9.1.5
OS = winDOS 2008R8

I have a table that currently has 207 million rows.
there is a timestamp field that contains data.
more data gets copied from another database into this database.
How do I make this do an index scan instead?
I did an analyze audittrailclinical to no avail.
I tested different indexes - same behavior.

The query does this:

SELECT   
audittrailclinical.pgid,   
audittrailclinical.timestamp,   
mmuser.logon,  
audittrailclinical.entityname,   
audittrailclinical.clinicalactivity,   
audittrailclinical.audittraileventcode,   
account.accountnumber,   
patient.dnsortpersonnumber  
FROM   
public.account,   
public.audittrailclinical,   
public.encounter,   
public.entity,   
public.mmuser,   
public.patient,   
public.patientaccount 
WHERE   
  audittrailclinical.encountersid = encounter.encountersid 
and   audittrailclinical.timestamp >= '2008-01-01'::timestamp without time zone 
and   audittrailclinical.timestamp <= '2012-10-05'::timestamp without time zone
AND  encounter.practiceid = patient.practiceid 
AND  encounter.patientid = patient.patientid 
AND  encounter.staffid = patient.staffid 
AND  entity.entitysid = audittrailclinical.entitysid 
AND  mmuser.mmusersid = audittrailclinical.mmusersid 
AND  patient.practiceid = patientaccount.practiceid 
AND  patient.patientid = patientaccount.patientid 
AND  patientaccount.accountsid = account.accountsid 
AND  patientaccount.defaultaccount = 'Y' 
AND  patient.dnsortpersonnumber = '347450' ;

The query plan says:

  ->  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 rows=203856829 width=62)
        Filter: ((timestamp >= '2008-01-01 00:00:00'::timestamp without time zone) AND (timestamp <= '2012-10-05 00:00:00'::timestamp without time zone))

which takes forever.

How do I make this do an index scan instead?
I did an analyze audittrailclinical to no avail.

the table definitions are (the createstamp field is empty - I know, bad data):

CREATE TABLE audittrailclinical
(
  audittrailid text,
  audittraileventcode text,
  clinicalactivity text,
  eventsuccessful text,
  externalunique text,
  recordstamp timestamp without time zone,
  recorddescription text,
  encountersid integer,
  eventuserlogon text,
  computername text,
  applicationcode text,
  practiceid integer,
  mmusersid integer,
  entitysid integer,
  entityname text,
  timestamp timestamp without time zone,
  lastuser integer,
  createstamp timestamp without time zone,
  pgid bigint DEFAULT nextval(('bravepoint_seq'::text)::regclass)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE audittrailclinical
  OWNER TO intergy;
GRANT ALL ON TABLE audittrailclinical TO intergy;
GRANT SELECT ON TABLE audittrailclinical TO rb;

-- Index: atc_en_time

CREATE INDEX atc_en_time
  ON audittrailclinical
  USING btree
  (entitysid , timestamp );

-- Index: atc_id

-- DROP INDEX atc_id;

CREATE INDEX atc_id
  ON audittrailclinical
  USING btree
  (audittrailid COLLATE pg_catalog.default );








[PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Chris Ruprecht
Hi guys,

PG = 9.1.5
OS = winDOS 2008R8

I have a table that currently has 207 million rows.
there is a timestamp field that contains data.
more data gets copied from another database into this database.
How do I make this do an index scan instead?
I did an analyze audittrailclinical to no avail.
I tested different indexes - same behavior.

The query does this:

SELECT   
audittrailclinical.pgid,   
audittrailclinical.timestamp,   
mmuser.logon,  
audittrailclinical.entityname,   
audittrailclinical.clinicalactivity,   
audittrailclinical.audittraileventcode,   
account.accountnumber,   
patient.dnsortpersonnumber  
FROM   
public.account,   
public.audittrailclinical,   
public.encounter,   
public.entity,   
public.mmuser,   
public.patient,   
public.patientaccount 
WHERE   
 audittrailclinical.encountersid = encounter.encountersid 
and   audittrailclinical.timestamp >= '2008-01-01'::timestamp without time zone 
and   audittrailclinical.timestamp <= '2012-10-05'::timestamp without time zone
AND  encounter.practiceid = patient.practiceid 
AND  encounter.patientid = patient.patientid 
AND  encounter.staffid = patient.staffid 
AND  entity.entitysid = audittrailclinical.entitysid 
AND  mmuser.mmusersid = audittrailclinical.mmusersid 
AND  patient.practiceid = patientaccount.practiceid 
AND  patient.patientid = patientaccount.patientid 
AND  patientaccount.accountsid = account.accountsid 
AND  patientaccount.defaultaccount = 'Y' 
AND  patient.dnsortpersonnumber = '347450' ;

The query plan says:

  ->  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 rows=203856829 width=62)
        Filter: ((timestamp >= '2008-01-01 00:00:00'::timestamp without time zone) AND (timestamp <= '2012-10-05 00:00:00'::timestamp without time zone))

which takes forever.

How do I make this do an index scan instead?
I did an analyze audittrailclinical to no avail.

the table definitions are (the createstamp field is empty - I know, bad data):

CREATE TABLE audittrailclinical
(
 audittrailid text,
 audittraileventcode text,
 clinicalactivity text,
 eventsuccessful text,
 externalunique text,
 recordstamp timestamp without time zone,
 recorddescription text,
 encountersid integer,
 eventuserlogon text,
 computername text,
 applicationcode text,
 practiceid integer,
 mmusersid integer,
 entitysid integer,
 entityname text,
 timestamp timestamp without time zone,
 lastuser integer,
 createstamp timestamp without time zone,
 pgid bigint DEFAULT nextval(('bravepoint_seq'::text)::regclass)
)
WITH (
 OIDS=FALSE
);
ALTER TABLE audittrailclinical
 OWNER TO intergy;
GRANT ALL ON TABLE audittrailclinical TO intergy;
GRANT SELECT ON TABLE audittrailclinical TO rb;

-- Index: atc_en_time

CREATE INDEX atc_en_time
 ON audittrailclinical
 USING btree
 (entitysid , timestamp );

-- Index: atc_id

-- DROP INDEX atc_id;

CREATE INDEX atc_id
 ON audittrailclinical
 USING btree
 (audittrailid COLLATE pg_catalog.default );




Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Chris Ruprecht

On Oct 16, 2012, at 20:01 , Evgeny Shishkin itparan...@gmail.com wrote:

 Selecting 5 years of data is not selective at all, so postgres decides it is 
 cheaper to do seqscan. 
 
 Do you have an index on patient.dnsortpersonnumber? Can you post a result 
 from 
 select count(*) from patient where dnsortpersonnumber = '347450'; ?
 

Yes, there is an index:

Aggregate  (cost=6427.06..6427.07 rows=1 width=0)
  ->  Index Scan using patient_pracsortpatientnumber on patient  (cost=0.00..6427.06 rows=1 width=0)
        Index Cond: (dnsortpersonnumber = '347450'::text)


In fact, all the other criteria are picked using an index. I fear that the >= 
and <= on the timestamp is causing the issue. If I do a >= of just one of 
them, I get an index scan. But I need to scan the entire range. I get queries 
like "give me everything that was entered into the system for this patient 
between these two dates". A single date wouldn't work.



Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Chris Ruprecht
Thanks Bruce, 

I have, and I even thought I understood it :). 

I just ran an explain analyze on another table - and since then the query plan 
has changed. It's now using the index as expected. I guess I have some more 
reading to do.

On Oct 16, 2012, at 20:31 , Bruce Momjian br...@momjian.us wrote:

 
 Have you read our FAQ on this matter?
 





[PERFORM] Re: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Chris Hanks

Daniel Farina-4 wrote
 
 On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer <ringerc@.id> wrote:
 1) Truncate each table. It is too slow, I think, especially for empty
 tables.

 Really?!? TRUNCATE should be extremely fast, especially on empty tables.

 You're aware that you can TRUNCATE many tables in one run, right?

 TRUNCATE TABLE a, b, c, d, e, f, g;
 
 I have seen in trivial cases -- in terms of data size -- where
 TRUNCATE is much slower than a full-table DELETE.  The most common use
 case for that is rapid setup/teardown of tests, where it can add up
 quite quickly and in a very big way. This is probably an artifact the
 speed of one's file system to truncate and/or unlink everything.
 
 I haven't tried a multi-truncate though.  Still, I don't know a
 mechanism besides slow file system truncation time that would explain
 why DELETE would be significantly faster.
 
 -- 
 fdr
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 

That's my experience - I have a set of regression tests that clean the
database (deletes everything from a single parent table and lets the
referential integrity checks cascade to delete five other tables) at the end
of each test run, and it can complete 90 tests (including 90 mass deletes)
in a little over five seconds. If I replace that simple delete with a
truncation of all six tables at once, my test run balloons to 42 seconds.

I run my development database with synchronous_commit = off, though, so I
guess TRUNCATE has to hit the disk while the mass delete doesn't.



[PERFORM] Select from sequence in slow query log

2012-06-01 Thread Chris Rimmer
While investigating some performance issues I have been looking at slow
queries logged to the postgresql.log file.  A strange thing that I have
seen is a series of apparently very slow queries that just select from a
sequence. It is as if access to a sequence is blocked for many sessions and
then released as I get log entries like this appearing:

LOG:  duration: 23702.553 ms  execute <unnamed>: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval
LOG:  duration: 23673.068 ms  execute <unnamed>: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval
LOG:  duration: 23632.729 ms  execute <unnamed>: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval
(Many similar lines)
LOG:  duration: 3055.057 ms  execute <unnamed>: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval
LOG:  duration: 2377.621 ms  execute <unnamed>: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval
LOG:  duration: 743.732 ms  execute <unnamed>: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval

The code is being executed via Hibernate, but using
Session.createSQLQuery(), so the SQL above appears in the source as above
(minus the comment) and not as part of any ORM magic. We are using
Postgresql 9.0.

This seems very strange to me. What could cause a sequence to be locked for
such a long time?
The sequence in question has cache set at 1. Would setting this higher make
any difference?
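
If it does turn out to be contention on the sequence itself, bumping the cache
would be a one-line experiment (the value below is just a guess, not something
we have tried):

ALTER SEQUENCE my_sequence CACHE 100;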

Thanks

Chris


Re: [PERFORM] Select from sequence in slow query log

2012-06-01 Thread Chris Rimmer
It looks like this effect only occurs in the middle of the night when there
is some kind of automated dump process going on and the system is under
higher than normal load. I haven't managed to reproduce them outside of
production, but since these oddities don't seem to show up during normal
operations, I'm not worrying too much about them now.

Thanks

Chris

On 1 June 2012 14:47, Tom Lane t...@sss.pgh.pa.us wrote:

 Chris Rimmer chr...@we7.com writes:
  While investigating some performance issues I have been looking at slow
  queries logged to the postgresql.log file.  A strange thing that I have
  seen is a series of apparently very slow queries that just select from a
  sequence. It is as if access to a sequence is blocked for many sessions
 and
  then released as I get log entries like this appearing:

  LOG:  duration: 23702.553 ms  execute <unnamed>: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval
  LOG:  duration: 23673.068 ms  execute <unnamed>: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval
  LOG:  duration: 23632.729 ms  execute <unnamed>: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval
  (Many similar lines)

 That's pretty weird.  What else is being done to that sequence?  Is it
 only the sequence ops that are slow, or does this happen at times when
 everything else is slow too?  Can you create a reproducible test case?

regards, tom lane




[PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread chris
Hi list,

My employer will be donated a NetApp FAS 3040 SAN [1] and we want to run
our warehouse DB on it. The pg9.0 DB currently comprises ~1.5TB of
tables, 200GB of indexes, and grows ~5%/month. The DB is not update
critical, but undergoes larger read and insert operations frequently.

My employer is a university with little funds and we have to find a
cheap way to scale for the next 3 years, so the SAN seems a good chance
to us. We are now looking for the remaining server parts to maximize DB
performance with costs <= $4000. I dug out the following
configuration with the discount we receive from Dell:

  1 x Intel Xeon X5670, 6C, 2.93GHz, 12M Cache
  16 GB (4x4GB) Low Volt DDR3 1066Mhz
  PERC H700 SAS RAID controller
  4 x 300 GB 10k SAS 6Gbps 2.5 in RAID 10

I was thinking to put the WAL and the indexes on the local disks, and
the rest on the SAN. If funds allow, we might downgrade the disks to
SATA and add a 50 GB SATA SSD for the WAL (SAS/SATA mixup not possible).

Any comments on the configuration? Any experiences with iSCSI vs. Fibre
Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a
cheap alternative how to connect as many as 16 x 2TB disks as DAS?

Thanks so much!

Best,
Chris

[1]: http://www.b2net.co.uk/netapp/fas3000.pdf




Re: [PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread chris r.
Hi list,

Thanks a lot for your very helpful feedback!

 I've tested MD1000, MD1200, and MD1220 arrays before, and always gotten
 seriously good performance relative to the dollars spent
Great hint, but I'm afraid that's too expensive for us. But it's a great
way to scale over the years, I'll keep that in mind.

I had a look at other server vendors who offer 4U servers with slots for
16 disks for 4k in total (w/o disks), maybe that's an even
cheaper/better solution for us. If you had the choice between 16 x 2TB
SATA vs. a server with some SSDs for WAL/indexes and a SAN (with SATA
disk) for data, what would you choose performance-wise?

Again, thanks so much for your help.

Best,
Chris



[PERFORM] Benchmarking a large server

2011-05-09 Thread Chris Hoover
I've got a fun problem.

My employer just purchased some new db servers that are very large.  The
specs on them are:

4 Intel X7550 CPU's (32 physical cores, HT turned off)
1 TB Ram
1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a raid 10)
3TB Sas Array (48 15K 146GB spindles)

The issue we are running into is how do we benchmark this server,
specifically, how do we get valid benchmarks for the Fusion IO card?
 Normally to eliminate the cache effect, you run iozone and other benchmark
suites at 2x the ram.  However, we can't do that due to 2TB > 1.3TB.

So, does anyone have any suggestions/experiences in benchmarking storage
when the storage is smaller then 2x memory?
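
One idea we are considering is forcing direct I/O so the OS page cache stops
helping, roughly along these lines (paths are placeholders, and I'm not sure
this fully answers the caching question):

$ dd if=/dev/zero of=/fusion/testfile bs=1M count=200k oflag=direct
$ dd if=/fusion/testfile of=/dev/null bs=1M iflag=direct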

Thanks,

Chris


[PERFORM] Multiple index builds on same table - in one sweep?

2011-04-09 Thread Chris Ruprecht
I have a table that I need to rebuild indexes on from time to time (records get 
loaded before indexes get build).

To build the indexes, I use 'create index ...', which reads the entire table 
and builds the index, one at a time.
I'm wondering if there is a way to build these indexes in parallel while 
reading the table only once for all indexes and building them all at the same 
time. Is there an index build tool that I missed somehow, that can do this?

Thanks,
Chris. 



best regards,
chris
-- 
chris ruprecht
database grunt and bit pusher extraordinaíre




Re: [PERFORM] Multiple index builds on same table - in one sweep?

2011-04-09 Thread Chris Ruprecht
I'm running 2 tests now, one, where I'm doing the traditional indexing, in 
sequence. The server isn't doing anything else, so I should get pretty accurate 
results.
Test 2 will run all the create index statements in separate sessions in parallel 
(echo "create index ..." | psql ... &) once the 'serial build' test is done.
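
(Concretely, the parallel run is just a shell sketch along these lines; index,
table and database names are placeholders:)

$ echo "create index i1 on big_table (col1);" | psql mydb &
$ echo "create index i2 on big_table (col2);" | psql mydb &
$ echo "create index i3 on big_table (col3);" | psql mydb &
$ wait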

Maybe, in a future release, somebody will develop something that can create 
indexes as inactive and have a build tool build and activate them at the same 
time. Food for thought?
 
On Apr 9, 2011, at 13:10 , Tom Lane wrote:

 Chris Ruprecht ch...@ruprecht.org writes:
 I have a table that I need to rebuild indexes on from time to time (records 
 get loaded before indexes get build).
 To build the indexes, I use 'create index ...', which reads the entire table 
 and builds the index, one at a time.
 I'm wondering if there is a way to build these indexes in parallel while 
 reading the table only once for all indexes and building them all at the 
 same time. Is there an index build tool that I missed somehow, that can do 
 this?
 
 I don't know of any automated tool, but if you launch several CREATE
 INDEX operations on the same table at approximately the same time (in
 separate sessions), they should share the I/O required to read the
 table.  (The synchronized scans feature guarantees this in recent
 PG releases, even if you're not very careful about starting them at
 the same time.)
 
 The downside of that is that you need N times the working memory and
 you will have N times the subsidiary I/O for sort temp files and writes
 to the finished indexes.  Depending on the characteristics of your I/O
 system it's not hard to imagine this being a net loss ... but it'd be
 interesting to experiment.
 
   regards, tom lane




Re: [PERFORM] good old VACUUM FULL

2011-03-22 Thread Chris

On 23/03/11 11:52, felix wrote:

I posted many weeks ago about a severe problem with a table that was
obviously bloated and was stunningly slow. Up to 70 seconds just to get
a row count on 300k rows.

I removed the text column, so it really was just a few columns of fixed
data.
Still very bloated.  Table size was 450M

The advice I was given was to do CLUSTER, but this did not reduce the
table size in the least.
Nor performance.

Also to resize my free space map (which still does need to be done).
Since that involves tweaking the kernel settings, taking the site down
and rebooting postgres and exposing the system to all kinds of risks and
unknowns and expensive experimentations I was unable to do it and have
had to hobble along with a slow table in my backend holding up jobs.

Much swearing that nobody should ever do VACUUM FULL.  Manual advises
against it.  Only crazy people do that.


snip


moral of the story:  if your table is really bloated, just do VACUUM FULL


You'll need to reindex that table now - vacuum full can bloat your 
indexes which will affect your other queries.


reindex table fastadder_fastadderstatus;

--
Postgresql & php tutorials
http://www.designmagick.com/




Re: [PERFORM] Why we don't want hints

2011-02-10 Thread Chris Browne
robertmh...@gmail.com (Robert Haas) writes:
 On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Well, I'm comfortable digging in my heels against doing *lame* hints
 just because it's what all the other kids are doing, which I think
 is the only thing which would have satisfied the OP on this thread.
 From both on-list posts and ones exchanged off-list with me, it
 seems he was stubbornly resistant to properly tuning the server to
 see if any problems remained, or posting particular problems to see
 how they would be most effectively handled in PostgreSQL.  We
 obviously can't be drawn into dumb approaches because of
 ill-informed demands like that.

 Nor was I proposing any such thing.  But that doesn't make we don't
 want hints an accurate statement.  Despite the impression that OP
 went away with, the real situation is a lot more nuanced than that,
 and the statement on the Todo list gives the wrong impression, IMHO.

I have added the following comment to the ToDo:

   We are not interested to implement hints in ways they are commonly
   implemented on other databases, and proposals based on because
   they've got them will not be welcomed.  If you have an idea that
   avoids the problems that have been observed with other hint systems,
   that could lead to valuable discussion.

That seems to me to characterize the nuance.
-- 
let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];;
http://www3.sympatico.ca/cbbrowne/languages.html
If only women came with pull-down menus and online help.



Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-04 Thread Chris Browne
gnuo...@rcn.com writes:
 Time for my pet meme to wiggle out of its hole (next to Phil's, and a
 day later).  For PG to prosper in the future, it has to embrace the
 multi-core/processor/SSD machine at the query level.  It has to.  And
 it has to because the Big Boys already do so, to some extent, and
 they've realized that the BCNF schema on such machines is supremely
 efficient.  PG/MySql/OSEngineOfChoice will get left behind simply
 because the efficiency offered will be worth the price.

 I know this is far from trivial, and my C skills are such that I can
 offer no help.  These machines have been the obvious current machine
 in waiting for at least 5 years, and those applications which benefit
 from parallelism (servers of all kinds, in particular) will filter out
 the winners and losers based on exploiting this parallelism.

 Much as it pains me to say it, but the MicroSoft approach to software:
 write to the next generation processor and force users to upgrade,
 will be the winning strategy for database engines.  There's just way
 too much to gain.

I'm not sure how true that is, really.  (e.g. - too much to gain.)

I know that Jan Wieck and I have been bouncing thoughts on valid use of
threading off each other for *years*, now, and it tends to be
interesting but difficult to the point of impracticality.

But how things play out are quite fundamentally different for different
usage models.

It's useful to cross items off the list, so we're left with the tough
ones that are actually a problem.

1.  For instance, OLTP applications, that generate a lot of concurrent
connections, already do perfectly well in scaling on multi-core systems.
Each connection is a separate process, and that already harnesses
multi-core systems perfectly well.  Things have improved a lot over the
last 10 years, and there may yet be further improvements to be found,
but it seems pretty reasonable to me to say that the OLTP scenario can
be treated as solved in this context.

The scenario where I can squint and see value in trying to multithread
is the contrast to that, of OLAP.  The case where we only use a single
core, today, is where there's only a single connection, and a single
query, running.

But that can reasonably be further constrained; not every
single-connection query could be improved by trying to spread work
across cores.  We need to add some further assumptions:

2.  The query needs to NOT be I/O-bound.  If it's I/O bound, then your
system is waiting for the data to come off disk, rather than to do
processing of that data.

That condition can be somewhat further strengthened...  It further needs
to be a query where multi-processing would not increase the I/O burden.

Between those two assumptions, that cuts the scope of usefulness to a
very considerable degree.

And if we *are* multiprocessing, we introduce several new problems, each
of which is quite troublesome:

 - How do we decompose the query so that the pieces are processed in
   ways that improve processing time?

   In effect, how to generate a parallel query plan?

   It would be more than stupid to consider this to be obvious.  We've
   got 15-ish years worth of query optimization efforts that have gone
   into Postgres, and many of those changes were not obvious until
   after they got thought through carefully.  This multiplies the
   complexity, and opportunity for error.

 - Coordinating processing

   Becomes quite a bit more complex.  Multiple threads/processes are
   accessing parts of the same data concurrently, so a parallelized
   query that harnesses 8 CPUs might generate 8x as many locks and
   analogous coordination points.

 - Platform specificity

   Threading is a problem in that each OS platform has its own
   implementation, and even when they claim to conform to common
   standards, they still have somewhat different interpretations.  This
   tends to go in one of the following directions:

a) You have to pick one platform to do threading on.

   Oops.  There's now PostgreSQL-Linux, that is the only platform
   where our multiprocessing thing works.  It could be worse than
   that; it might work on a particular version of a particular OS...

b) You follow some apparently portable threading standard

   And find that things are hugely buggy because the platforms
   follow the standard a bit differently.  And perhaps this means
   that, analogous to a), you've got a set of platforms where this
   works (for some value of works), and others where it can't.
   That's almost as evil as a).

c) You follow some apparently portable threading standard

   And need to wrap things in a pretty thick safety blanket to make
   sure it is compatible with all the bugs in interpretation and
   implementation.  Complexity++, and performance probably suffers.

   None of these are particularly palatable, which is why threading
   proposals get a lot of pushback.

At the end of the day, if this is 

Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Chris Browne
mladen.gog...@vmsinfo.com (Mladen Gogala) writes:
 Hints are not even that complicated to program. The SQL parser should
 compile the list of hints into a table and optimizer should check
 whether any of the applicable access methods exist in the table. If it
 does - use it. If not, ignore it. This looks to me like a
 philosophical issue, not a programming issue.

It's worth looking back to what has already been elaborated on in the
ToDo.

http://wiki.postgresql.org/wiki/Todo
---
Optimizer hints (not wanted)

Optimizer hints are used to work around problems in the optimizer and
introduce upgrade and maintenance issues. We would rather have the
problems reported and fixed. We have discussed a more sophisticated
system of per-class cost adjustment instead, but a specification remains
to be developed.
---

The complaint is that kludging hints into a particular query attacks the
problem from the wrong direction.

The alternative recommended is to collect some declarative information,
that *won't* be part of the query, that *won't* be processed by the
parser, and that *won't* kludge up the query with information that is
liable to turn into crud over time.

Tom Lane was pretty specific about some kinds of declarative information
that seemed useful:
   http://archives.postgresql.org/pgsql-hackers/2006-10/msg00663.php

On Jeopardy, participants are expected to phrase their answers in the
form of a question, and doing so is rewarded.

Based on the presence of query hints on the Not Wanted portion of the
ToDo list, it's pretty clear that participants here are expected to
propose optimizer hints in ways that do NOT involve decorating queries
with crud.  You'll get a vastly friendlier response if you at least make
an attempt to attack the problem in the declarative information
fashion.

Perhaps we're all wrong in believing that pushing query optimization
information into application queries by decorating the application with
hints, is the right idea but it's a belief that certainly seems to be
regularly agreed upon by gentle readers.
-- 
cbbrowne,@,linuxdatabases.info
The people's revolutionary committee has  decided that the name e is
retrogressive, unmulticious   and reactionary, and  has  been flushed.
Please update your abbrevs.



Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Chris Browne
mladen.gog...@vmsinfo.com (Mladen Gogala) writes:
 I must say that this purist attitude is extremely surprising to
 me. All the major DB vendors support optimizer hints, yet in the
 Postgres community, they are considered bad with almost religious
 fervor.
 Postgres community is quite unique with the fatwa against hints.

Well, the community declines to add hints until there is actual
consensus on a good way to add hints.

Nobody has ever proposed a way to add hints where consensus was arrived
at that the way was good, so...
-- 
http://linuxfinances.info/info/nonrdbms.html
Rules of the Evil Overlord #192.  If I appoint someone as my consort,
I will  not subsequently inform  her that she  is being replaced  by a
younger, more attractive woman.  http://www.eviloverlord.com/



Re: [PERFORM] the XID question

2011-01-19 Thread Chris Browne
kevin.gritt...@wicourts.gov (Kevin Grittner) writes:
 Filip Rembiałkowski plk.zu...@gmail.com wrote: 
 2011/1/19 Charles.Hou giveme...@gmail.com:
  
  select * from mybook SQL command also increase the XID ?
 
 Yes. Single SELECT is a transaction. Hence, it needs a transaction
 ID.
  
 No, not in recent versions of PostgreSQL.  There's virtual
 transaction ID, too; which is all that's needed unless the
 transaction writes something.
  
 Also, as a fine point, if you use explicit database transactions
 (with BEGIN or START TRANSACTION) then you normally get one XID for
 the entire transaction, unless you use SAVEPOINTs.

Erm, not *necessarily* in recent versions of PostgreSQL.

A read-only transaction won't consume XIDs, but if you don't expressly
declare it read-only, they're still liable to get eaten...
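
For example, something along these lines marks the transaction read-only up
front (a sketch, reusing the table name from the quoted question):

BEGIN TRANSACTION READ ONLY;
SELECT * FROM mybook;
COMMIT;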
-- 
(format nil ~S@~S cbbrowne gmail.com)
http://www3.sympatico.ca/cbbrowne/lisp.html
Parenthesize to avoid ambiguity.



Re: [PERFORM] COPY TO stdout statements occurrence in log files

2011-01-14 Thread Chris Browne
msakre...@truviso.com (Maciek Sakrejda) writes:
 Is this normal? I'm afraid because my application doesn't run this kind of
 statement, so how can I know what is doing these commands? Maybe pg_dump?

 I think pg_dump is likely, yes, if you have that scheduled. I don't
 think anything in the log file will identify it as pg_dump explicitly
 (I believe as far as the server is concerned, pg_dump is just another
 client), but if you're concerned about this, you can add the client
 pid (%p) to log_line_prefix in postgresql.conf, log the pg_dump pid
 through whatever mechanism manages that, and compare.

That's an option...  More are possible...

1.  Our DBAs have been known to create users specifically for doing
backups (dumpy).  It doesn't seem like a *huge* proliferation of users
to have some 'utility' user names for common processes.

2.  In 9.1, there will be a new answer, as there's a GUC to indicate the
application_name.
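
(The log_line_prefix piece is a one-line change; adding %a as well picks up
application_name where clients set it. A sketch of a postgresql.conf line,
not a recommended production value:)

log_line_prefix = '%t [%p] %a %u@%d '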
-- 
Programming today  is a race  between software engineers  striving to
build bigger and better  idiot-proof programs, and the Universe trying
to  produce  bigger  and  better  idiots.  So  far,  the  Universe  is
winning.  -- Rich Cook



Re: [PERFORM] best db schema for time series data?

2010-11-19 Thread Chris Browne
vindex+lists-pgsql-performa...@apartia.org (Louis-David Mitterrand)
writes:
 On Tue, Nov 16, 2010 at 11:35:24AM -0500, Chris Browne wrote:
 vindex+lists-pgsql-performa...@apartia.org (Louis-David Mitterrand)
 writes:
  I have to collect lots of prices from web sites and keep track of their
  changes. What is the best option?
 
  1) one 'price' row per price change:
 
 create table price (
 id_price primary key,
 id_product integer references product,
 price integer
 );
 
  2) a single 'price' row containing all the changes:
 
 create table price (
 id_price primary key,
 id_product integer references product,
 price integer[] -- prices are 'pushed' on this array as they 
  change
 );
 
  Which is bound to give the best performance, knowing I will often need
  to access the latest and next-to-latest prices?
 
 I'd definitely bias towards #1, but with a bit of a change...
 
 create table product (
   id_product serial primary key
 );
 
 create table price (
id_product integer references product,
as_at timestamptz default now(),
primary key (id_product, as_at),
price integer
 );

 Hi Chris,

 So an id_price serial on the price table is not necessary in your
 opinion? I am using order by id_price limit X or max(id_price) to
 get at the most recent prices.

It (id_price) is an extra piece of information that doesn't reveal an
important fact, namely when the price was added.

I'm uncomfortable with adding data that doesn't provide much more
information, and it troubles me when people put a lot of interpretation
into the meanings of SERIAL columns.

I'd like to set up some schemas (for experiment, if not necessarily to
get deployed to production) where I'd use DCE UUID values rather than
sequences, so that people wouldn't make the error of imagining meanings
in the values that aren't really there.  

And I suppose that there lies a way to think about it...  If you used
UUIDs rather than SERIAL, how would your application break?  

And of the ways in which it would break, which of those are errors that
fall from:

 a) Ignorant usage, assuming order that isn't really there?  (e.g. - a
SERIAL might capture some order information, but UUID won't!)

 b) Inadequate data capture, where you're using the implicit data
collection from SERIAL to capture, poorly, information that should
be expressly captured?

When I added the timestamp to the price table, that's intended to
address b), capturing the time that the price was added.
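
As a concrete sketch of that experiment (illustrative only, and assuming
the uuid-ossp contrib module is available for uuid_generate_v4()):

   create table price (
      id_price   uuid not null default uuid_generate_v4(),
      id_product integer references product,
      as_at      timestamptz default now(),
      primary key (id_product, as_at),
      price      integer
   );

The UUID carries no ordering or timing information at all, so anything that
was quietly leaning on id_price order breaks immediately, which is exactly
the failure the exercise is meant to expose.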

 The query to get the last 5 prices for a product should be
 splendidly efficient:
 
select price, as_at from price
 where id_product = 17
 order by as_at desc limit 5;
 
 (That'll use the PK index perfectly nicely.)
 
 If you needed higher performance, for latest price, then I'd add a
 secondary table, and use triggers to copy latest price into place:
 
   create table latest_prices (
  id_product integer primary key references product,
  price integer
   );

 I did the same thing with a 'price_dispatch' trigger and partitioned
 tables (inheritance). It's definitely needed when the price database
 grow into the millions.

 Thanks,

The conversations are always interesting!  Cheers!
-- 
output = (cbbrowne @ gmail.com)
http://www3.sympatico.ca/cbbrowne/x.html
FLORIDA: If you think we can't vote, wait till you see us drive.



Re: [PERFORM] best db schema for time series data?

2010-11-16 Thread Chris Browne
vindex+lists-pgsql-performa...@apartia.org (Louis-David Mitterrand)
writes:
 I have to collect lots of prices from web sites and keep track of their
 changes. What is the best option?

 1) one 'price' row per price change:

   create table price (
   id_price primary key,
   id_product integer references product,
   price integer
   );

 2) a single 'price' row containing all the changes:

   create table price (
   id_price primary key,
   id_product integer references product,
   price integer[] -- prices are 'pushed' on this array as they 
 change
   );

 Which is bound to give the best performance, knowing I will often need
 to access the latest and next-to-latest prices?

I'd definitely bias towards #1, but with a bit of a change...

create table product (
  id_product serial primary key
);

create table price (
   id_product integer references product,
   as_at timestamptz default now(),
   primary key (id_product, as_at),
   price integer
);

The query to get the last 5 prices for a product should be
splendidly efficient:

   select price, as_at from price
where id_product = 17
order by as_at desc limit 5;

(That'll use the PK index perfectly nicely.)

If you needed higher performance, for latest price, then I'd add a
secondary table, and use triggers to copy latest price into place:

  create table latest_prices (
 id_product integer primary key references product,
 price integer
  );

create or replace function capture_latest_price () returns trigger as $$
declare
begin
delete from latest_prices where id_product = NEW.id_product;
insert into latest_prices (id_product,price) values
   (NEW.id_product, NEW.price);
return NEW;
end
$$ language plpgsql;

create trigger price_capture after insert on price execute procedure 
capture_latest_price();

This captures *just* the latest price for each product.  (There's a bit
of race condition - if there are two concurrent price updates, one will
fail, which wouldn't happen without this trigger in place.)
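
One way to narrow that race, sketched here without having been tested
against the original schema, is to update first and only insert when no row
was updated; concurrent inserts of a brand-new product can still collide,
but ordinary price changes no longer will:

   create or replace function capture_latest_price () returns trigger as $$
   begin
      update latest_prices set price = NEW.price
       where id_product = NEW.id_product;
      if not found then
         insert into latest_prices (id_product, price)
            values (NEW.id_product, NEW.price);
      end if;
      return NEW;
   end
   $$ language plpgsql;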
--
... Turns   out that JPG  was in  fact using his  brain... and   I am
inclined to encourage him  to continue the practice  even if  it isn't
exactly what I  would have done myself.   -- Alan Bawden  (way out of
context)



Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences

2010-11-05 Thread Chris Browne
gentosa...@gmail.com (A B) writes:
 If you just wanted PostgreSQL to go as fast as possible WITHOUT any
 care for your data (you accept 100% dataloss and datacorruption if any
 error should occur), what settings should you use then?

Use /dev/null.  It is web scale, and there are good tutorials.

But seriously, there *are* cases where blind speed is of use.  Loading
data into a fresh database is a good time for this; if things fall over,
it may be pretty acceptable to start from scratch with mkfs/initdb.

I'd:
- turn off fsync
- turn off synchronous commit
- put as much as possible onto Ramdisk/tmpfs/similar as possible
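
In postgresql.conf terms, a rough sketch of that throwaway-speed setup
(with data loss fully accepted) might be:

   fsync = off
   synchronous_commit = off
   # plus the data directory (or at least pg_xlog) on tmpfs,
   # re-running initdb after any crash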
-- 
output = reverse(moc.liamg @ enworbbc)
http://linuxfinances.info/info/lsf.html
43% of all statistics are worthless.



Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Chris Browne
sgend...@ideasculptor.com (Samuel Gendler) writes:
 Geez.  I wish someone would have written something quite so bold as
 'xfs is always faster than ext3' in the standard tuning docs.  I
 couldn't find anything that made a strong filesystem
 recommendation.  How does xfs compare to ext4?  I wound up on ext4 on
 a dell perc6 raid card when an unexpected hardware failure on a
 production system caused my test system to get thrown into production
 before I could do any serious testing of xfs.  If there is a strong
 consensus that xfs is simply better, I could afford the downtime to
 switch.

It's news to me (in this thread!) that XFS is actually getting some
developer love, which is a pretty crucial factor to considering it
relevant.

XFS was an SGI creation, and, with:

 a) the not-scintillating performance of the company,

 b) the lack of a lot of visible work going into the filesystem,

 c) the paucity of support by Linux vendors (for a long time, if you 
told RHAT you were having problems, and were using XFS, the next
step would be to park the ticket awaiting your installing a
supported filesystem)

it didn't look like XFS was a terribly good bet.  Those issues were
certainly causing concern a couple of years ago.

Faster raw performance isn't much good if it comes with a risk of:
 - Losing data
 - Losing support from vendors

If XFS now *is* getting support from both the development and support
perspectives, then the above concerns may have been invalidated.  It
would be very encouraging, if so.
-- 
output = (cbbrowne @ gmail.com)
Rules of  the Evil Overlord  #228.  If the  hero claims he  wishes to
confess  in public  or to  me  personally, I  will remind  him that  a
notarized deposition will serve just as well.



Re: [PERFORM] large dataset with write vs read clients

2010-10-12 Thread Chris Browne
cr...@postnewspapers.com.au (Craig Ringer) writes:
 Hey, maybe I should try posting YouTube video answers to a few
 questions for kicks, see how people react ;-)

And make sure it uses the same voice as is used in the MongoDB is web
scale video, to ensure that people interpret it correctly :-).
-- 
output = (cbbrowne @ gmail.com)
http://linuxdatabases.info/info/nonrdbms.html
The *Worst* Things  to Say to a  Police Officer: Hey, is that  a 9 mm?
That's nothing compared to this .44 magnum.



Re: [PERFORM] large dataset with write vs read clients

2010-10-12 Thread Chris Browne
mladen.gog...@vmsinfo.com (Mladen Gogala) writes:
 I have a logical problem with asynchronous commit. The commit
 command should instruct the database to make the outcome of the
 transaction permanent. The application should wait to see whether the
 commit was successful or not. Asynchronous behavior in the commit
 statement breaks the ACID rules and should not be used in a RDBMS
 system. If you don't need ACID, you may not need RDBMS at all. You may
 try with MongoDB. MongoDB is web scale:
 http://www.youtube.com/watch?v=b2F-DItXtZs

The client always has the option of connecting to a set of databases,
and stowing parts of the data hither and thither.  That often leads to
the relaxation called BASE.  (And IBM has been selling that relaxation
as MQ-Series since the early '90s!)

There often *ARE* cases where it is acceptable for some of the data to
not be as durable, because that data is readily reconstructed.  This is
particularly common for calculated/cached/aggregated data.

Many things can get relaxed for a data warehouse data store, where the
database is not authoritative, but rather aggregates data drawn from
other authoritative sources.  In such applications, neither the A, C, I,
nor the D are pointedly crucial, in the DW data store.

- We don't put the original foreign key constraints into the DW
  database; they don't need to be enforced a second time.  Ditto for
  constraints of all sorts.

- Batching of the loading of updates is likely to break several of the
  letters.  And I find it *quite* acceptable to lose D if the data may
  be safely reloaded into the DW database.

I don't think this is either cavalier nor that it points to MongoDB is
web scale.
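
As a small illustration of selectively relaxing the D, asynchronous commit
can be enabled for just the transactions that load reconstructible data,
leaving everything else fully durable (the table name here is purely
illustrative):

   create table dw_cache (k int primary key, v bigint);  -- reconstructible data

   begin;
   set local synchronous_commit to off;   -- applies to this transaction only
   insert into dw_cache (k, v) values (1, 42);
   commit;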
-- 
cbbrowne,@,gmail.com
Rules  of the  Evil Overlord  #181.  I  will decree  that all  hay be
shipped in tightly-packed bales. Any wagonload of loose hay attempting
to pass through a checkpoint will be set on fire.



Re: [PERFORM] read only transactions

2010-10-12 Thread Chris Browne
jnelson+pg...@jamponi.net (Jon Nelson) writes:
 Are there any performance implications (benefits) to executing queries
 in a transaction where
 SET TRANSACTION READ ONLY;
 has been executed?

Directly?  No.

Indirectly, well, a *leetle* bit...

Transactions done READ ONLY do not generate actual XIDs, which reduces
the amount of XID generation (pretty tautological!), which reduces the
need to do VACUUM to protect against XID wraparound.

  
http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html#VACUUM-BASICS

If you process 50 million transactions, that chews thru 50 million XIDs.

If 45 million of those were processed via READ ONLY transactions, then
the same processing only chews thru 5 million XIDs, meaning that the
XID-relevant vacuums can be done rather less frequently.

This only terribly much matters if:
  a) your database is so large that there are tables on which VACUUM
 would run for a very long time, and

  b) you are chewing through XIDs mighty quickly.

If either condition isn't true, then the indirect effect isn't important
either.  
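
For a rough sense of how quickly XIDs are being chewed through, the age of
each database's frozen-XID horizon can be watched; once it approaches
autovacuum_freeze_max_age, the wraparound-driven vacuums start:

   select datname, age(datfrozenxid) as xid_age
     from pg_database
    order by xid_age desc;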
-- 
let name=cbbrowne and tld=gmail.com in name ^ @ ^ tld;;
I'm not switching from slrn.   I'm quite confident that anything that
*needs* to be posted in HTML is fatuous garbage not worth my time.
-- David M. Cook davec...@home.com



Re: [PERFORM] GPU Accelerated Sorting

2010-08-30 Thread Chris Browne
david_l...@boreham.org (David Boreham) writes:
  Feels like I fell through a worm hole in space/time, back to inmos in
 1987, and a guy from marketing has just
 walked in the office going on about there's a customer who wants to
 use our massively parallel hardware to speed up databases...

... As long as you're willing to rewrite PostgreSQL in Occam 2...
-- 
http://projects.cs.kent.ac.uk/projects/tock/trac/
The statistics on  sanity are that one out of  every four Americans is
suffering from some  form of mental illness. Think  of your three best
friends. If they're okay, then it's you. -- Rita Mae Brown



Re: [PERFORM] Testing Sandforce SSD

2010-08-04 Thread Chris Browne
g...@2ndquadrant.com (Greg Smith) writes:
 Yeb Havinga wrote:
 * What filesystem to use on the SSD? To minimize writes and maximize
 chance for seeing errors I'd choose ext2 here. 

 I don't consider there to be any reason to deploy any part of a
 PostgreSQL database on ext2.  The potential for downtime if the fsck
 doesn't happen automatically far outweighs the minimal performance
 advantage you'll actually see in real applications.  

Ah, but if the goal is to try to torture the SSD as cruelly as possible,
these aren't necessarily downsides (important or otherwise).

I don't think ext2 helps much in maximizing chances of seeing errors
in notably useful ways, as the extra torture that takes place as part
of the post-remount fsck isn't notably PG-relevant.  (It's not obvious
that errors encountered would be readily mapped to issues relating to
PostgreSQL.)

I think the WAL-oriented test would be *way* more useful; inducing work
whose brokenness can be measured in one series of files in one
directory should be way easier than trying to find changes across a
whole PG cluster.  I don't expect the filesystem choice to be terribly
significant to that.
-- 
cbbrowne,@,gmail.com
Heuristics (from the  French heure, hour) limit the  amount of time
spent executing something.  [When using heuristics] it shouldn't take
longer than an hour to do something.



Re: [PERFORM] Testing Sandforce SSD

2010-08-04 Thread Chris Browne
j...@commandprompt.com (Joshua D. Drake) writes:
 On Sat, 2010-07-24 at 16:21 -0400, Greg Smith wrote:
 Greg Smith wrote:
  Note that not all of the Sandforce drives include a capacitor; I hope 
  you got one that does!  I wasn't aware any of the SF drives with a 
  capacitor on them were even shipping yet, all of the ones I'd seen 
  were the chipset that doesn't include one still.  Haven't checked in a 
  few weeks though.
 
 Answer my own question here:  the drive Yeb got was the brand spanking 
 new OCZ Vertex 2 Pro, selling for $649 at Newegg for example:  
 http://www.newegg.com/Product/Product.aspx?Item=N82E16820227535 and with 
 the supercapacitor listed right in the main production specifications 
 there.  This is officially the first inexpensive (relatively) SSD with a 
 battery-backed write cache built into it.  If Yeb's test results prove 
 it works as it's supposed to under PostgreSQL, I'll be happy to finally 
 have a moderately priced SSD I can recommend to people for database 
 use.  And I fear I'll be out of excuses to avoid buying one as a toy for 
 my home system.

 That is quite the toy. I can get 4 SATA-II with RAID Controller, with
 battery backed cache, for the same price or less :P

Sure, but it:
- Fits into a single slot
- Is quiet
- Consumes little power
- Generates little heat
- Is likely to be about as quick as the 4-drive array

It doesn't have the extra 4TB of storage, but if you're building big-ish
databases, metrics have to change anyways.

This is a pretty slick answer for the small OLTP server.
-- 
output = reverse(moc.liamg @ enworbbc)
http://linuxfinances.info/info/postgresql.html
Chaotic Evil means never having to say you're sorry.



[PERFORM] planner index choice

2010-07-28 Thread Chris

Hi there,

I have a simple query where I don't understand the planner's choice to 
use a particular index.


The main table looks like this:

# \d sq_ast_attr_val
           Table "public.sq_ast_attr_val"
   Column    |         Type          |          Modifiers
-------------+-----------------------+------------------------------
 assetid     | character varying(15) | not null
 attrid      | integer               | not null
 contextid   | integer               | not null default 0
 custom_val  | text                  |
 use_default | character(1)          | not null default '1'::bpchar
Indexes:
    "ast_attr_val_pk" PRIMARY KEY, btree (assetid, attrid, contextid)
    "sq_ast_attr_val_assetid" btree (assetid)
    "sq_ast_attr_val_attrid" btree (attrid)
    "sq_ast_attr_val_concat" btree (((assetid::text || '~'::text) || attrid))
    "sq_ast_attr_val_contextid" btree (contextid)


The query:

SELECT
  assetid, custom_val
FROM
  sq_ast_attr_val
WHERE
  attrid IN (SELECT attrid FROM sq_ast_attr WHERE name = 
'is_contextable' AND (type_code = 'metadata_field_select' OR 
owning_type_code = 'metadata_field'))

  AND contextid = 0
INTERSECT
SELECT
  assetid, custom_val
FROM
  sq_ast_attr_val
WHERE
  assetid = '62321'
  AND contextid = 0;


The explain analyze plan:

http://explain.depesz.com/s/nWs

I'm not sure why it's picking the sq_ast_attr_val_contextid index to do 
the contextid = 0 check, the other parts (attrid/assetid) are much more 
selective.


If I drop that particular index:

http://explain.depesz.com/s/zp


All (I hope) relevant postgres info:

Centos 5.5 x86_64 running pg8.4.4.

Server has 8gig memory.

# select name, setting, source from pg_settings where name in
('shared_buffers', 'effective_cache_size', 'work_mem');

         name         | setting
----------------------+---------
 shared_buffers       | 262144
 effective_cache_size | 655360
 work_mem             | 32768

All planner options are enabled:

# select name, setting, source from pg_settings where name like 'enable_%';
       name        | setting | source
-------------------+---------+---------
 enable_bitmapscan | on      | default
 enable_hashagg    | on      | default
 enable_hashjoin   | on      | default
 enable_indexscan  | on      | default
 enable_mergejoin  | on      | default
 enable_nestloop   | on      | default
 enable_seqscan    | on      | default
 enable_sort       | on      | default
 enable_tidscan    | on      | default

Any insights welcome - thanks!

--
Postgresql  php tutorials
http://www.designmagick.com/




Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-15 Thread Chris Browne
swamp...@noao.edu (Steve Wampler) writes:
 Or does losing WAL files mandate a new initdb?

Losing WAL would mandate initdb, so I'd think this all fits into the
set of stuff worth putting onto ramfs/tmpfs.  Certainly it'll all be
significant to the performance focus.
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
MS  apparently now  has a  team dedicated  to tracking  problems with
Linux  and publicizing them.   I guess  eventually they'll  figure out
this back fires... ;) -- William Burrow aa...@delete.fan.nb.ca



[PERFORM] stats collector suddenly causing lots of IO

2010-04-20 Thread Chris
I have a lot of centos servers which are running postgres.  Postgres isn't used
that heavily on any of them, but lately, the stats collector process keeps
causing tons of IO load.  It seems to happen only on servers with centos 5.
The versions of postgres that are running are:

8.1.18
8.2.6
8.3.1
8.3.5
8.3.6
8.3.7
8.3.8
8.3.9
8.4.2
8.4.3

I've tried turning off everything under RUNTIME STATISTICS in postgresql.conf
except track_counts (since autovacuum says it needs it), but it seems to have
little effect on the IO caused by the stats collector.
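
For reference, the block of postgresql.conf being described looks roughly
like this after those changes (parameter names as in the 8.3/8.4 sample
file):

   track_activities = off
   track_counts = on          # autovacuum requires this
   log_parser_stats = off
   log_planner_stats = off
   log_executor_stats = off
   log_statement_stats = off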

Has anyone else noticed this?  Have there been recent kernel changes
that could cause this that anyone knows about?  Since we haven't touched
postgres on these boxes since they were set up initially, I'm a bit baffled as
to what might be causing the problem, and why I can't make it go away short of
kill -STOP.

Any suggestions would be much appreciated!



Re: [PERFORM] significant slow down with various LIMIT

2010-04-13 Thread Chris Bowlby
I'm also wondering if a re-clustering of the table would work based on
the index that's used.

such that:

CLUSTER core_object USING plugins_plugin_addr_oid_id;

and see if that makes any change in the differences that you're seeing.

On 04/13/2010 02:24 PM, Kevin Grittner wrote:
 norn andrey.perl...@gmail.com wrote:
  
   
 I am wondering why there are so big gap between two limits and how
 to avoid this...
 
  
 I think we've already established that it is because of the
 percentage of the table which must be scanned to get to the desired
 number of rows.  The problem is exacerbated by the fact that it's a
 backward scan on the index, which is slower than a forward scan --
 mainly because disks spin in one direction, and the spacing of the
 sectors is optimized for forward scans.
  
 There are a couple things to try which will give a more complete
 picture of what might work to make the run time more predictable. 
 Please try these, and run EXPLAIN ANALYZE of your problem query each
 way.
  
 (1) Try it without the ORDER BY clause and the LIMIT.
  
 (2) Temporarily take that top index out of consideration.  (Don't
 worry, it'll come back when you issue the ROLLBACK -- just don't
 forget the BEGIN statement.)
   
 BEGIN;
 DROP INDEX plugins_plugin_addr_oid_id;
 explain analyze your query
 ROLLBACK;
  
 (3) Try it like this (untested, so you may need to fix it up):
  
 explain analyze
 SELECT core_object.id
   from (SELECT id, city_id FROM plugins_guide_address)
plugins_guide_address
   JOIN plugins_plugin_addr
 ON (plugins_plugin_addr.address_id
= plugins_guide_address.id)
   JOIN core_object
 ON (core_object.id = plugins_plugin_addr.oid_id)
   WHERE plugins_guide_address.city_id = 4535
   ORDER BY core_object.id DESC
   LIMIT 4 -- or whatever it normally takes to cause the problem
 ;
  
 -Kevin

   




Re: [PERFORM] mysql to postgresql, performance questions

2010-03-24 Thread Chris Browne
reeds...@rice.edu (Ross J. Reedstrom) writes:
  http://www.mythtv.org/wiki/PostgreSQL_Support 

That's a pretty hostile presentation...

The page has had two states:

 a) In 2008, someone wrote up...

After some bad experiences with MySQL (data loss by commercial power
failure, very bad performance deleting old records and more) I would
prefer to have a MythTV Application option to use PostgreSQL. I
never saw such bad database behaviour at any other RDBMS than MySQL.

I'm ready to contribute at any activity going that direction (I'm
developer for commercial database applications).

 b) Deleted by GBee in 2009, indicating (Outdated, messy and
unsupported)
-- 
let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];;
http://linuxfinances.info/info/spreadsheets.html
A language that doesn't affect the way you think about programming,
is not worth knowing.  -- Alan J. Perlis



Re: [PERFORM] mysql to postgresql, performance questions

2010-03-24 Thread Chris Browne
t...@sss.pgh.pa.us (Tom Lane) writes:
 Ross J. Reedstrom reeds...@rice.edu writes:
 On Sat, Mar 20, 2010 at 10:47:30PM -0500, Andy Colson wrote:
 (I added the and trust as an after thought, because I do have one very 
 important 100% uptime required mysql database that is running.  Its my 
 MythTV box at home, and I have to ask permission from my GF before I take 
 the box down to upgrade anything.  And heaven forbid if it crashes or 
 anything.  So I do have experience with care and feeding of mysql.  And no, 
 I'm not kidding.)

 Andy, you are so me! I have the exact same one-and-only-one mission
 critical mysql DB, but the gatekeeper is my wife. And experience with
 that instance has made me love and trust PostgreSQL even more.

 So has anyone looked at porting MythTV to PG?

It has come up several times on the MythTV list.

http://david.hardeman.nu/files/patches/mythtv/mythletter.txt
http://www.mythtv.org/pipermail/mythtv-dev/2004-August/025385.html
http://www.mythtv.org/pipermail/mythtv-users/2006-July/141191.html

Probably worth asking David Härdeman and Danny Brow who have proposed
such to the MythTV community what happened.  (It's possible that they
will get cc'ed on this.)

If there's a meaningful way to help, that would be cool.  If not, then
we might as well not run slipshod across the same landmines that blew
the idea up before.
-- 
Transported  to a surreal  landscape,  a young  girl kills the  first
woman she  meets and  then teams  up with  three complete strangers to
kill again.  -- Unknown, Marin County newspaper's TV listing for _The
Wizard of Oz_



Re: [PERFORM] Is DBLINK transactional

2010-03-16 Thread Chris Browne
cr...@postnewspapers.com.au (Craig Ringer) writes:

 On 13/03/2010 5:54 AM, Jeff Davis wrote:
 On Fri, 2010-03-12 at 12:07 -0500, Merlin Moncure wrote:
 of course.  You can always explicitly open a transaction on the remote
 side over dblink, do work, and commit it at the last possible moment.
 Your transactions aren't perfectly synchronized...if you crash in the
 precise moment between committing the remote and the local you can get
 in trouble.  The chances of this are extremely remote though.

 If you want a better guarantee than that, consider using 2PC.

 Translation in case you don't know: 2PC = two phase commit.

 Note that you have to monitor lost transactions that were prepared
 for commit then abandoned by the controlling app and periodically get
 rid of them or you'll start having issues.

There can be issues even if they're not abandoned...

Note that prepared transactions establish, and maintain, until removed,
all the appropriate locks on the underlying tables and tuples.

As a consequence, maintenance-related activities may be somewhat
surprisingly affected.

foo=# begin; set transaction isolation level serializable;
BEGIN
SET
foo=# insert into my_table (date_time, hostname, duration, diag) values (now(), 
'foo', 1, 2);
INSERT 0 1
foo=# prepare transaction 'foo';
PREPARE TRANSACTION

[then, I quit the psql session...]

foo=# select * from pg_locks where relation = (select oid from pg_class where 
relname = 'my_table');
-[ RECORD 1 ]--+-
locktype   | relation
database   | 308021
relation   | 308380
page   |
tuple  |
virtualxid |
transactionid  |
classid|
objid  |
objsubid   |
virtualtransaction | -1/433653
pid|
mode   | RowExclusiveLock
granted| t

If I try to truncate the table...

foo=# truncate my_table;
[hangs, waiting on the lock...]

[looking at another session...]

foo=#  select * from pg_locks where relation = (select oid from pg_class where 
relname = 'my_table');
-[ RECORD 1 ]--+
locktype   | relation
database   | 308021
relation   | 308380
page   |
tuple  |
virtualxid |
transactionid  |
classid|
objid  |
objsubid   |
virtualtransaction | -1/433653
pid|
mode   | RowExclusiveLock
granted| t
-[ RECORD 2 ]--+
locktype   | relation
database   | 308021
relation   | 308380
page   |
tuple  |
virtualxid |
transactionid  |
classid|
objid  |
objsubid   |
virtualtransaction | 2/13
pid| 3749
mode   | AccessExclusiveLock
granted| f

Immediately upon submitting commit prepared 'foo';, both locks are
resolved quite quickly.
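
Stray prepared transactions like this can be spotted with a quick look at
pg_prepared_xacts, and then resolved by hand with COMMIT PREPARED or
ROLLBACK PREPARED; a simple sketch:

   select gid, prepared, owner, database
     from pg_prepared_xacts
    order by prepared;

Anything that has been sitting there for a long time is still holding its
locks, exactly as in the example above.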

 The problem with things that are extremely remote possibilities are
 that they tend to be less remote than we expect ;)

 ... and they know just when they can happen despite all the odds to
 maximise the pain and chaos caused.

A lot of these kinds of things only come up as race conditions.  The
trouble is that a lot of races do wind up synchronizing themselves.

In sporting events, this is intended and desired; an official fires the
starter pistol or activates the horn, or what have you, with the
intended result that athletes begin very nearly simultaneously.  And at
the end of Olympic races, their times frequently differ only by
miniscule intervals.

In my example up above, there's a possibly unexpected synchronization
point; the interweaving of the PREPARE TRANSACTION and TRUNCATE requests
leads to a complete lock against the table.  Supposing 15 processes then
try accessing that table, they'll be blocked until the existing locks
get closed out.  Which takes place the very instant after the COMMIT
PREPARED request comes in.  At that moment, 15 racers are released
very nearly simultaneously.

If there is any further mischief to be had in the race, well, they're
set up to tickle it...
-- 
let name=cbbrowne and tld=gmail.com in name ^ @ ^ tld;;
http://linuxdatabases.info/info/nonrdbms.html
Barf, what is all   this  prissy pedantry?  Groups,  modules,  rings,
ufds, patent-office algebra.  Barf!  -- R. William Gosper



Re: [PERFORM] [HACKERS] full text search index scan query plan changed in 8.4.2?

2010-02-28 Thread Chris

Josh Berkus wrote:

Xufei,

List changed to psql-performance, which is where this discussion belongs.


I am testing the index used by full text search recently.

I have install 8.3.9 and 8.4.2 separately. 


In 8.3.9, the query plan is like:

postgres=# explain SELECT s.name as source , t.name as target FROM element as s, element as t WHERE to_tsvector('testcfg',s.name)  @@ to_tsquery('testcfg',replace(t.name,':','|'));
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.01..259.92 rows=491 width=18)
   ->  Seq Scan on element t  (cost=0.00..13.01 rows=701 width=9)
   ->  Index Scan using element_ftsidx_test on element s  (cost=0.01..0.33 rows=1 width=9)
         Index Cond: (to_tsvector('testcfg'::regconfig, (s.name)::text) @@ to_tsquery('testcfg'::regconfig, replace((t.name)::text, ':'::text, '|'::text)))
(4 rows)

I have index: element_ftsidx_test gin (to_tsvector('testcfg'::regconfig, name::text))

The same index and query in 8.4.2: 

postgres=# explain SELECT s.name as source , t.name as target FROM element as s, element as t WHERE to_tsvector('testcfg',s.name)  @@ to_tsquery('testcfg',replace(t.name,':','|'));
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.32..3123.51 rows=2457 width=18)
   ->  Seq Scan on element t  (cost=0.00..13.01 rows=701 width=9)
   ->  Bitmap Heap Scan on element s  (cost=0.32..4.36 rows=4 width=9)
         Recheck Cond: (to_tsvector('testcfg'::regconfig, (s.name)::text) @@ to_tsquery('testcfg'::regconfig, replace((t.name)::text, ':'::text, '|'::text)))
         ->  Bitmap Index Scan on element_ftsidx_test  (cost=0.00..0.32 rows=4 width=0)
               Index Cond: (to_tsvector('testcfg'::regconfig, (s.name)::text) @@ to_tsquery('testcfg'::regconfig, replace((t.name)::text, ':'::text, '|'::text)))
(6 rows)

Why the query plans are different and why? Thanks!


Because the row estimates changed, since 8.4 improved row estimation for
TSearch.  The 2nd query is probably actually faster, no?  If not, you
may need to increase your stats collection.  Or at least show us a
VACUUM ANALYZE.


I'm sure you mean explain analyze :)

--
Postgresql  php tutorials
http://www.designmagick.com/




[PERFORM] How to troubleshoot high mem usage by postgres?

2010-02-27 Thread Chris
Hi, I'm having an issue where a postgres process is taking too much
memory when performing many consecutive inserts and updates from a PHP
script (running on the command line). I would like to know what sort
of logging I can turn on to help me determine what is causing memory
to be consumed and not released.

Most PHP scripts are not long-running and properly releasing the
resources using the provided functions in the pgsql PHP extension is
not necessary. However since I do have a long-running script, I have
taken steps to ensure everything is being properly released when it is
no longer needed (I am calling the functions provided, but I don't
know if the pgsql extension is doing the right thing). In spite of
this, the longer the script runs and processes records, the more
memory increases. It increases to the point that system memory is
exhausted and it starts swapping. I killed the process at this point.

I monitored the memory with top. here are the results.. the first is
10 seconds after my script started running. The second is about 26
seconds.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+   DATA COMMAND
17461 postgres  16   0  572m 405m  14m S 20.0 10.7   0:10.65  422m postmaster
17460 root      15   0  136m  14m 4632 S 10.6  0.4   0:06.16   10m php
17462 postgres  15   0  193m  46m 3936 D  3.3  1.2   0:01.77   43m postmaster


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+   DATA COMMAND
17461 postgres  16   0 1196m 980m  17m S 19.0 26.0   0:25.72  1.0g postmaster
17460 root      15   0  136m  14m 4632 R 10.3  0.4   0:14.31   10m php
17462 postgres  16   0  255m 107m 3984 R  3.0  2.9   0:04.19  105m postmaster


If I am indeed doing everything I can to release the resources (and
I'm 95% sure I am) then it looks like the pgsql extension is at fault
here.
Regardless of who/what is at fault, I need to fix it. And to do that I
need to find out what isn't getting released properly. How would I go
about that?

Thanks,
Chris



Re: [PERFORM] How to troubleshoot high mem usage by postgres?

2010-02-27 Thread Chris
On Sat, Feb 27, 2010 at 3:38 PM, Ben Chobot be...@silentmedia.com wrote:
 In your postgresql.conf file, what are the settings for work_mem and
 shared_buffers?

I have not done any tuning on this db yet (it is a dev box). It is
using defaults.
shared_buffers = 32MB
#work_mem = 1MB


I do appreciate the several quick responses and I will work on
responding to the them.

@Craig Ringer:
select version() reports:
PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit
The system has 4GB of RAM.
The postgres log currently does not show any useful information. Only
thing in there for today is an Unexpected EOF on client connection
because I killed the process after it started swapping.

The test input for my PHP script is a csv file with about 450,000
records in it. The php script processes the each csv record in a
transaction, and on average it executes 2 insert or update statements
per record. I don't think the specific statements executed are
relevant (they are just basic INSERT and UPDATE statements).

I will try to come up with a short script that reproduces the problem.

@Tom Lane:
As I mentioned above I am not doing everything in a single
transaction. However I do want to try your suggestion regarding
getting a memory context map. But I'm afraid I don't know how to do
what you are describing. How can I set the ulimit of postmaster? And
does the postmaster stderr output go to the postgres log file? If not,
where can I find it?

Thanks again,
Chris



Re: [PERFORM] How to troubleshoot high mem usage by postgres?

2010-02-27 Thread Chris
On Sat, Feb 27, 2010 at 3:38 PM, Craig Ringer
cr...@postnewspapers.com.au wrote:
 Can you boil this down to a simple PHP test-case that connects to a dummy
 database and repeats something that causes the backend to grow in memory
 usage? Trying to do this - by progressively cutting things out of your test
 until it stops growing - will help you track down what, exactly, is causing
 the growth.

Thank you for your suggestion. I have done this, and in doing so I
have also discovered why this problem is occurring.

My application uses a class that abstracts away the db interaction, so
I do not normally use the pg_* functions directly. Any time any
statement was executed, it created a new named prepared statement. I
wrongly assumed that calling pg_free_result() on the statement
resource would free this prepared statement inside of postgres.

I will simply modify the class to use an empty statement name if there
is no need for it to be named (which I actually need very infrequently
anyway).
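
For anyone hitting the same thing, the server side of it is easy to see:
each named PREPARE stays cached in the session until it is deallocated or
the connection closes.  A small sketch against the system catalogs:

   prepare stmt_1 (oid) as select relname from pg_class where oid = $1;
   select name, statement from pg_prepared_statements;  -- the cached statements pile up here
   deallocate stmt_1;                                    -- or: deallocate all;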

I have attached the script I created to test with, for those who are
interested. The first line of the script has the connection string. I
used a db called testdb. run from the command line with:
php -f test3.php

Note my comment in the php file
 UNCOMMENT THIS LINE AND MEMORY ISSUE IS FIXED

Thanks for the help everyone.
Chris
attachment: test3.php



Re: [PERFORM] database size growing continously

2009-10-29 Thread Chris Ernst
Hi Peter,

Sounds like you're experiencing index bloat and vacuums do nothing to
help that.  You can do one of 2 things to remedy this:

1) The fastest and simplest (but most disruptive) way is to use REINDEX.
 But this will exclusively lock the table while rebuilding the indexes:

  REINDEX TABLE phaseangle;

2) The slower but less disruptive way is to do a concurrent build of
each index and then drop the old ones. For example, to rebuild the i
index:

  CREATE INDEX CONCURRENTLY i_new ON phaseangle (indexed columns);
  DROP INDEX i;
  ALTER INDEX i_new RENAME TO i;
  ANALYZE phaseangle (indexed columns);

Do this regularly to keep the index sizes in check.

- Chris

Peter Meszaros wrote:
 Hi All,
 
 I use postgresql 8.3.7 as a huge queue. There is a very simple table
 with six columns and two indices, and about 6 million records are
 written into it in every day continously commited every 10 seconds from
 8 clients. The table stores approximately 120 million records, because a
 cron job daily deletes those ones are older than 20 day. Autovacuum is
 on and every settings is the factory default except some unrelated ones
 (listen address, authorization). But my database is growing,
 characteristically ~600MByte/day, but sometimes much slower (eg. 10MB,
 or even 0!!!).
 
 I've also tried a test on another server running the same postgresql,
 where 300 million record was loaded into a freshly created database,
 and 25 million was deleted with single DELETE command.  The 'vacuum
 verbose phaseangle;' command seems to be running forever for hours:
 
 phasor=# vacuum VERBOSE phaseangle;
 INFO:  vacuuming public.phaseangle
 INFO:  scanned index i to remove 2796006 row versions
 DETAIL:  CPU 9.49s/120.30u sec elapsed 224.20 sec.
 INFO:  scanned index t to remove 2796006 row versions
 DETAIL:  CPU 13.57s/105.70u sec elapsed 192.71 sec.
 INFO:  phaseangle: removed 2796006 row versions in 24748 pages
 DETAIL:  CPU 0.65s/0.30u sec elapsed 39.97 sec.
 INFO:  scanned index i to remove 2795924 row versions
 DETAIL:  CPU 9.58s/121.63u sec elapsed 239.06 sec.
 INFO:  scanned index t to remove 2795924 row versions
 DETAIL:  CPU 13.10s/103.59u sec elapsed 190.84 sec.
 INFO:  phaseangle: removed 2795924 row versions in 24743 pages
 DETAIL:  CPU 0.68s/0.28u sec elapsed 40.21 sec.
 INFO:  scanned index i to remove 2796014 row versions
 DETAIL:  CPU 9.65s/117.28u sec elapsed 231.92 sec.
 INFO:  scanned index t to remove 2796014 row versions
 DETAIL:  CPU 13.48s/103.59u sec elapsed 194.49 sec.
 INFO:  phaseangle: removed 2796014 row versions in 24774 pages
 DETAIL:  CPU 0.69s/0.28u sec elapsed 40.26 sec.
 INFO:  scanned index i to remove 2795935 row versions
 DETAIL:  CPU 9.55s/119.02u sec elapsed 226.85 sec.
 INFO:  scanned index t to remove 2795935 row versions
 DETAIL:  CPU 13.09s/102.84u sec elapsed 194.74 sec.
 INFO:  phaseangle: removed 2795935 row versions in 25097 pages
 DETAIL:  CPU 0.67s/0.28u sec elapsed 41.21 sec.
 
 still running...
 
 These are the very same problems?
 Should I delete mor frequently in smaller chunks? It seems to have a
 limit...
 
 Thanks 
 
 Peter
 




Re: [PERFORM] Databases vs Schemas

2009-10-10 Thread Chris Kratz
On Fri, Oct 9, 2009 at 11:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Scott Carey sc...@richrelevance.com writes:
  I've got 200,000 tables in one db (8.4), and some tools barely work.  The
  system catalogs get inefficient when large and psql especially has
 trouble.
  Tab completion takes forever, even if I make a schema s with one table
 in
  it and type s. and try and tab complete -- its as if its scanning all
  without a schema qualifier or using an index.

 The tab-completion queries have never been vetted for performance
 particularly :-(

 Just out of curiosity, how much does this help?

 alter function pg_table_is_visible(oid) cost 10;

 (You'll need to do it as superuser --- if it makes things worse, just
 set the cost back to 1.)

  Sometimes it does not match
  valid tables at all, and sometimes regex matching fails too ('\dt
  schema.*_*_*' intermittently flakes out if it returns a lot of matches).

 There are some arbitrary LIMIT 1000 clauses in those queries, which
 probably explains this ... but taking them out would likely cause
 libreadline to get indigestion ...

regards, tom lane


We ran into this exact situation with a pg 8.3 database and a very large
number of tables.  psql would wait for 20 to 30 seconds if the user was
unlucky enough to hit the tab key.  After doing some research with query
logging, explain analyze and some trial and error, we came to the same
conclusion.  Altering the cost for the pg_table_is_visible function to 10
fixed our performance problem immediately.  It appears that when the cost
was set to 1, the query optimizer first ran the function over the
entire pg_class table.  By increasing the cost, it now only runs the
function over the rows returned by the other items in the where clause.

-chris


Re: [PERFORM] improving my query plan

2009-08-20 Thread Chris

Kevin Kempter wrote:

Hi all;


I have a simple query against two very large tables (  800million rows 
in theurl_hits_category_jt table and 9.2 million in the url_hits_klk1 
table )



I have indexes on the join columns and I've run an explain.
also I've set the default statistics to 250 for both join columns. I get 
a very high overall query cost:


If you had an extra where condition it might be different, but you're 
just returning results from both tables that match up so doing a 
sequential scan is going to be the fastest way anyway.


--
Postgresql  php tutorials
http://www.designmagick.com/




FW: [PERFORM] Performance 8.4.0

2009-08-02 Thread Chris Dunn
The database is 8 GB currently. It used to be a lot bigger, but we moved all the
large objects out into file-server storage we developed for them. We are using
the default page costs for 8.4; I did have them changed in 8.1.4.

-Original Message-
From: Robert Haas [mailto:robertmh...@gmail.com]
Sent: Sunday, 2 August 2009 11:26 PM
To: Chris Dunn
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance 8.4.0

On Fri, Jul 31, 2009 at 12:22 AM, Chris Dunnchris.d...@bigredsky.com wrote:
 constraint_exclusion = on

This is critical if you need it, but a waste of CPU time if you don't.
 Other than that your paramaters look good.  Are you using the default
page cost settings?  I see you have 12 GB RAM; how big is your
database?

...Robert



Re: [PERFORM] Greenplum MapReduce

2009-08-02 Thread Chris

Suvankar Roy wrote:


Hi all,

Has anybody worked on Greenplum MapReduce programming ?


It's a commercial product, you need to contact greenplum.

--
Postgresql  php tutorials
http://www.designmagick.com/




[PERFORM] load / stress testing

2009-07-31 Thread Chris

Hi,

Everyone says "load test using your app" - out of interest, how does 
everyone do that at the database level?


I've tried playr (https://area51.myyearbook.com/trac.cgi/wiki/Playr) but 
haven't been able to get it working properly. I'm not sure what other 
tools are available.


TIA.
--
Postgresql  php tutorials
http://www.designmagick.com/




[PERFORM] Performance 8.4.0

2009-07-30 Thread Chris Dunn
Hi,

I would like to know if my configuration is ok, We run a web application with 
high transaction rate and the database machine on Mondays / Tuesdays is always 
at 100% CPU with no IO/Wait . the machine is a Dual Xeon Quad core, 12gb RAM, 
4gb/s Fibre Channel on Netapp SAN, with pg_xlog on separate Lun,
Could you please provide some feedback on the configuration

maintenance_work_mem = 704MB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 8GB
work_mem = 72MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 2816MB
max_connections = 32

I have limited connections down to 32 because if I set it higher, the machine 
load average goes through the roof and performance decreases even more.
In the process of looking at a 4 x AMD 6 core Opteron  machine with 32GB Ram to 
replace if I cannot get any more performance out of this machine

Kind Regards
Christopher Dunn




Re: [PERFORM] Will Postgres ever lock with read only queries?

2009-07-28 Thread Chris

Robert James wrote:
Thanks for the replies.  I'm running Postgres 8.2 on Windows XP, Intel 
Core Duo (though Postgres seems to use only one 1 core).


A single query can only use one core, but it will use both if multiple 
queries come in.



The queries are self joins on very large tables, with lots of nested loops.


If you want help optimizing them, you'll need to send through
- explain analyze
- table definitions
and of course
- the query itself

--
Postgresql  php tutorials
http://www.designmagick.com/



Re: [PERFORM] More speed counting rows

2009-07-27 Thread Chris Ernst

Developer wrote:

Hello,

I am trying to optimize the count of files when I am using filters
(select by some row/s parameter/s)

In this case I think that postgresql really count all files.
Resulting in unacceptable times of 4 seconds in http server response.
Triggers+store in this case do not see very acceptable, because I need
store 1.5 millions of counting possibilities.

My question is:
Any method for indirect count like ordered indexes + quadratic count?
Any module?
Any suggestion?



I had a similar problem where HTTP requests triggered a count(*) over a 
table that was growing rapidly.  The bigger the table got, the longer 
the count took.  In my case, however, the counts only have to be a 
reasonable estimate of the current state, so I solved this problem with 
a count_sums table that gets updated every 30 minutes using a simple 
perl script in a cron job.  The HTTP requests now trigger a very fast 
select from a tiny, 9 row, 2 column table.


How up to date do the counts need to be?  If the count takes 4 
seconds, can you run it every minute and store the counts in a table for 
retrieval by the HTTP requests?  Or does it absolutely have to be the 
exact count at the moment of the request?


If it needs to be more real-time, you could expand on this by adding 
post insert/delete triggers that automatically update the counts table 
to keep it current.  In my case it just wasn't necessary.
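
For completeness, a minimal sketch of that trigger-maintained approach; all
of the names here are illustrative rather than taken from the original post:

   create table files (id serial primary key, name text);

   create table row_counts (table_name text primary key, n bigint not null);
   insert into row_counts values ('files', (select count(*) from files));

   create or replace function bump_files_count() returns trigger as $$
   begin
      if TG_OP = 'INSERT' then
         update row_counts set n = n + 1 where table_name = 'files';
      else  -- DELETE
         update row_counts set n = n - 1 where table_name = 'files';
      end if;
      return null;   -- AFTER row trigger; the return value is ignored
   end
   $$ language plpgsql;

   create trigger files_count_trg
      after insert or delete on files
      for each row execute procedure bump_files_count();

Note that every insert or delete then serializes on the single counts row,
so this trades cheap reads for some write contention.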


- Chris



Re: [PERFORM] Will Postgres ever lock with read only queries?

2009-07-27 Thread Chris

Robert James wrote:
Hi.  I'm seeing some weird behavior in Postgres.  I'm running read only 
queries (SELECT that is - no UPDATE or DELETE or INSERT is happening at 
all).  I can run one rather complicated query and the results come 
back... eventually.  Likewise with another.  But, when I run both 
queries at the same time, Postgres seems to ground to a halt.  Neither 
one completes.  In fact, pgAdmin locks up - I need to cancel them using 
psql.
I'd expect this from MySQL but not Postgres.  Am I doing something 
wrong? Or missing something?


They're probably not blocking each other but more likely you're 
exhausting your servers resources. If they return eventually 
individually, then running both at the same time will take at least 
eventually x2.


As Mark said, what are the queries? What postgres version? What o/s? 
What are your hardware specs (how much memory, disk speeds/types etc)?


--
Postgresql  php tutorials
http://www.designmagick.com/




Re: [PERFORM] Master/Slave, DB separation or just spend $$$?

2009-07-22 Thread Chris Browne
kelv...@gmail.com (Kelvin Quee) writes:
 I will go look at Slony now.

It's worth looking at, but it is not always to be assumed that
replication will necessarily improve scalability of applications; it's
not a magic wand to wave such that presto, it's all faster!

Replication is helpful from a performance standpoint if there is a lot
of query load where it is permissible to look at *somewhat* out of
date information.

For instance, replication can be quite helpful for pushing load off
for processing accounting data where you tend to be doing analysis on
data from {yesterday, last week, last month, last year}, and where the
data tends to be inherently temporal (e.g. - you're looking at
transactions with dates on them).

On the other hand, any process that anticipates *writing* to the
master database will be more or less risky to try to shift over to a
possibly-somewhat-behind 'slave' system, as will be anything that
needs to be consistent with the master state.
-- 
(reverse (concatenate 'string ofni.secnanifxunil @ enworbbc))
http://www3.sympatico.ca/cbbrowne/spiritual.html
Nondeterminism means never having to say you're wrong.  -- Unknown



Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-20 Thread Chris

Віталій Тимчишин wrote:



2009/7/20 Robert James srobertja...@gmail.com



Hi. I notice that when I do a WHERE x, Postgres uses an index, and
when I do WHERE y, it does so as well, but when I do WHERE x OR y,
it doesn't. Why is this so? 



It's not clever enough.


Of course it is.

I'm running 8.3.7.

create table t1(id int primary key);
insert into t1(id) select a from generate_series(1, 50) as s(a);
analyze t1;

explain analyze select * from t1 where id=5000 or id=25937;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=8.60..16.44 rows=2 width=4) (actual time=0.077..0.083 rows=2 loops=1)
   Recheck Cond: ((id = 5000) OR (id = 25937))
   ->  BitmapOr  (cost=8.60..8.60 rows=2 width=0) (actual time=0.063..0.063 rows=0 loops=1)
         ->  Bitmap Index Scan on t1_pkey  (cost=0.00..4.30 rows=1 width=0) (actual time=0.034..0.034 rows=1 loops=1)
               Index Cond: (id = 5000)
         ->  Bitmap Index Scan on t1_pkey  (cost=0.00..4.30 rows=1 width=0) (actual time=0.021..0.021 rows=1 loops=1)
               Index Cond: (id = 25937)
 Total runtime: 0.153 ms
(8 rows)

What Robert didn't post was his query, see

http://archives.postgresql.org/pgsql-general/2009-07/msg00767.php

which makes it a lot harder to 'optimize' since they aren't straight 
forward conditions.


--
Postgresql  php tutorials
http://www.designmagick.com/




Re: [PERFORM] How would you store read/unread topic status?

2009-06-24 Thread Chris St Denis

Mathieu Nebra wrote:

Alexander Staubo wrote:
  

On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebramate...@siteduzero.com wrote:


This flags table has more or less the following fields:

UserID - TopicID - LastReadAnswerID
  

We are doing pretty much same thing.



My problem is that every time a user READS a topic, it UPDATES this flags
table to remember he has read it. This leads to multiple updates at the
same time on the same table, and an update can take a few seconds. This
is not acceptable for my users.
  

First of all, and I'm sure you thought of this, an update isn't needed
every time a user reads a topic; only when there are new answers that
need to be marked as read. So an update ... where last_read_answer_id
< ? should avoid the need for an update.
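
As a concrete sketch of that suggestion (the column names come from the
flags table described above; the table name "topic_flags" is made up):

  UPDATE topic_flags
     SET lastreadanswerid = $1      -- id of the newest answer just displayed
   WHERE userid = $2
     AND topicid = $3
     AND lastreadanswerid < $1;     -- no-op when nothing new has been read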



We don't work that way. We just remember he has read these answers and
then we can tell him there are no new messages for you to read.
So we just need to write what he has read when he reads it.

  

(That said, I believe PostgreSQL diffs tuple updates, so in practice
PostgreSQL might not be writing anything if you run an update with
the same value. I will let someone more intimate with the internal
details of updates to comment on this.)

Secondly, an update should not take a few seconds. You might want to
investigate this part before you turn to further optimizations.



Yes, I know there is a problem but I don't know if I am competent enough
to tune PostgreSQL for that. It can take a while to understand the
problem, and I'm not sure I'll have the time for that.

I am, however, opened to suggestions. Maybe I'm doing something wrong
somewhere.

  

In our application we defer the updates to a separate asynchronous
process using a simple queue mechanism, but in our case, we found that
the updates are fast enough (in the order of a few milliseconds) not
to warrant batching them into single transactions.



A few milliseconds would be cool.
In fact, defering to another process is a good idea, but I'm not sure if
it is easy to implement. It would be great to have some sort of UPDATE
... LOW PRIORITY to make the request non blocking.

Thanks.

  
I use pg_send_query() 
http://ca2.php.net/manual/en/function.pg-send-query.php in php to 
achieve this for a views counter. Script execution is not blocked while 
the queries are executing.


It looks like this may just be a direct translation of PQsendQuery() 
from libpq. Your preferred language may have a function like this.




[PERFORM] tsvector_update_trigger performance?

2009-06-24 Thread Chris St Denis
Is tsvector_update_trigger() smart enough to not bother updating a 
tsvector if the text in that column has not changed?


If not, can I make my own update trigger with something like

   if new.description != old.description
   return tsvector_update_trigger('fti_all', 'pg_catalog.english',
   'title', 'keywords', 'description');
   else
   return new;

or do I need to do it from scratch?


I'm seeing very high cpu load on my database server and my current 
theory is that some of the triggers may be causing it.


Re: [PERFORM] tsvector_update_trigger performance?

2009-06-24 Thread Chris St Denis

Dimitri Fontaine wrote:

Hi,

Le 24 juin 09 à 18:29, Alvaro Herrera a écrit :

Oleg Bartunov wrote:

On Wed, 24 Jun 2009, Chris St Denis wrote:


Is tsvector_update_trigger() smart enough to not bother updating a
tsvector if the text in that column has not changed?


no, you should do check yourself. There are several examples in 
mailing lists.
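
One way to do that check is a small wrapper function that only recomputes
the tsvector when the indexed text actually changed. This is only a sketch:
the table name "docs" is hypothetical, and the column names are taken from
the trigger arguments above.

  CREATE OR REPLACE FUNCTION docs_fti_update() RETURNS trigger AS $$
  BEGIN
      IF TG_OP = 'UPDATE' THEN
          IF NEW.title IS NOT DISTINCT FROM OLD.title
             AND NEW.keywords IS NOT DISTINCT FROM OLD.keywords
             AND NEW.description IS NOT DISTINCT FROM OLD.description THEN
              RETURN NEW;   -- indexed text unchanged: keep the existing tsvector
          END IF;
      END IF;
      NEW.fti_all := to_tsvector('pg_catalog.english',
                                 coalesce(NEW.title, '')       || ' ' ||
                                 coalesce(NEW.keywords, '')    || ' ' ||
                                 coalesce(NEW.description, ''));
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER docs_fti_update
      BEFORE INSERT OR UPDATE ON docs
      FOR EACH ROW EXECUTE PROCEDURE docs_fti_update();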


Or you could try using the suppress_redundant_updates_trigger() function
that has been included in 8.4 (should be easy to backport)


  http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/backports/min_update/
  http://blog.tapoueh.org/projects.html#sec9
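
For reference, the backported/8.4 trigger is attached per table like this
(the name "z_min_update" is just a convention so it sorts, and therefore
fires, after other row triggers; "mytable" is a placeholder):

  CREATE TRIGGER z_min_update
      BEFORE UPDATE ON mytable
      FOR EACH ROW
      EXECUTE PROCEDURE suppress_redundant_updates_trigger();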

But it won't handle the case where some other random column has
changed while the UPDATE does not affect the indexed text...
Though this looks useful for some things, it doesn't solve my specific
problem. But thanks for the suggestion anyway.


This sounds like something that should just be on by default, not a
trigger. Is there some reason it would waste the I/O of writing a new row
to disk if nothing has changed? Or is it just considered too much
unnecessary overhead to compare them?




Re: [PERFORM] superlative missuse

2009-05-13 Thread Chris Browne
cl...@uah.es (Angel Alvarez) writes:
 more optimal plan... 
 morreoptimal configuration...

 we suffer a 'more optimal' superlative missuse

 there is  not so 'more optimal' thing but a simple 'better' thing.

 im not native english speaker but i think it still applies.

If I wanted to be pedantic about it, I'd say that the word nearly is
missing.

That is, it would be strictly correct if one instead said more
nearly optimal.

I don't imagine people get too terribly confused by the lack of the
word nearly, so I nearly don't care :-).
-- 
select 'cbbrowne' || '@' || 'acm.org';
http://linuxfinances.info/info/languages.html
Bureaucracies interpret communication as damage and route around it
-- Jamie Zawinski



Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Chris

Dimitri wrote:

Hi Craig,

yes, you detailed very well the problem! :-)
all those CHAR columns are there just due to historical issues :-) as well,
they may contain anything else and not only numbers, that's why..
Also, all data inside are fixed-length, so VARCHAR will not save space, or
what kind of performance issue may we expect with CHAR vs VARCHAR if
all data have a fixed length?..


None in postgres, but the char/varchar thing may or may not bite you at 
some point later - sounds like you have it covered though.



It's 2 times faster on InnoDB, and as it's just a SELECT query no need
to go in transaction details :-)


 Total runtime: 1.442 ms
(10 rows)

You posted a query that's taking 2/1000's of a second. I don't really 
see a performance problem here :)


--
Postgresql & php tutorials
http://www.designmagick.com/




Re: [PERFORM] I have a fusion IO drive available for testing

2009-03-31 Thread Chris Browne
craig_ja...@emolecules.com (Craig James) writes:
 Dave Cramer wrote:
 So I tried writing directly to the device, gets around 250MB/s,
 reads at around 500MB/s

 The client is using redhat so xfs is not an option.

 I'm using Red Hat and XFS, and have been for years. Why is XFS not an option 
 with Red Hat?

If you report practically any kind of problem, and you're using XFS,
or JFS, or such, their support offering is to tell you to use a
supported filesystem.
-- 
let name=cbbrowne and tld=cbbrowne.com in name ^ @ ^ tld;;
http://linuxdatabases.info/info/linuxxian.html
The only thing  better than TV with the  sound off is  Radio with the
sound off. -- Dave Moon



Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-14 Thread Chris Browne
mallah.raj...@gmail.com (Rajesh Kumar Mallah) writes:
 why is it not a good idea to give end users control over when they
 want to run it ?

It's not a particularly good idea to give end users things that they
are likely then to *immediately* use to shoot themselves in the foot.

Turning off vacuuming all day is the sort of thing that is indeed
pretty certain to hurt you when you imagined it was going to help you.

In particular, if you shut off autovac all day, heavily updated tables
with certain sorts of (pretty common!) update patterns are certain to
bloat up, to the point that you'll need to do CLUSTER/VACUUM FULL on
them.

In practice, "autovacuum at lean hours only" is more reasonably
described as "cancel autovacuum and revert to the elder policy of
requiring users to do manual vacuuming."

It's worth looking at how autovacuum has been evolving over time...

- When it was introduced, 8.0-ish (maybe it was 8.1 when it became
  official), it was pretty primitive.

  Autovac was a single process, where you had three controls over
  behaviour:

   - You could run autovac, or not; 

   - You could exclude specific tables from being processed by autovac

   - There is a capability to vacuum less aggressively by using
 delays to reduce autovac I/O usage

- In 8.3, it was enhanced to add the notion of having multiple vacuum
  workers

  There was discussion about having one of those workers restrict
  itself to small tables, so that you'd never have the scenario where
  the workers were all busy and a small table that needed vacuuming
  was left unvacuumed for a long time.  It appears that didn't happen,
  which seems unfortunate, but that's life...

You should look at all the knobs that *are* offered before deciding
a policy that may be harmful to performance.  As things stand now,
there are a couple of ways I could see tuning this:

 - You might check on the GUC variables autovacuum_vacuum_cost_delay
   and autovacuum_vacuum_cost_limit, which would allow you to restrict
   the I/O cost.

   This might allow you to run autovacuum all the time without
   adversely affecting performance (a configuration sketch follows
   after this list).

 - You might come up with a list of the *LARGE* tables that you don't
   want vacuumed during the day, and set up a cron job that adds/drops
   them from the pg_autovacuum table at the appropriate times.

   This is definitely going to be more finicky, and requires a great
   deal more awareness of the tables being updated by your
   applications.  It makes autovacuum a whole lot less automatic.
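
As a sketch of the first option above, in postgresql.conf (the parameter
names are real; the values are purely illustrative and need testing against
your own workload):

  autovacuum_vacuum_cost_delay = 20ms   # pause between bursts of vacuum I/O
  autovacuum_vacuum_cost_limit = 200    # I/O "cost" budget per burst; lower is gentler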

There are known enhancements coming up:

 - In 8.4, there is a capability for VACUUM to only process the
   portions of the table known to have been altered.

   That ought to be a better answer than *any* of the fiddling
   suggested, to date.  Right now, a VACUUM on public.my_huge_table,
   a table 18GB in size, will walk through the entire table, even
   though there were only a handful of pages where tuples were
   invalidated.

   This is almost certainly the single best improvement possible to
   resolve your issue; it seems likely to *directly* address the
   problem, and has the considerable merit of not requiring much if
   any configuration/reconfiguration/scheduling.
-- 
(reverse (concatenate 'string gro.mca @ enworbbc))
http://linuxdatabases.info/info/
what would  we do without C?  we   would have PASAL,  BASI, OBOL, and
Ommon Lisp. -- #Erik



Re: [PERFORM] PG performance in high volume environment (many INSERTs and lots of aggregation reporting)

2009-01-29 Thread Chris Browne
phoenix.ki...@gmail.com (Phoenix Kiula) writes:
 [Posted a similar note to PG General but I suppose it's more appropriate
 in this list. Apologies for cross-posting.]

 Hi. Further to my bafflement with the count(*) queries as described
 in this thread:

 http://archives.postgresql.org/pgsql-general/2009-01/msg00804.php

 It seems that whenever this question has come up, Postgresql comes up
 very short in terms of count(*) functions.

 The performance is always slow, because of the planner's need to guess
 and such. I don't fully understand how the statistics work (and the
 explanation on the PG website is way too geeky) but the columns I work
 with already have a stat level of 100. Not helping at all.

That's definitely *NOT* due to planner's need to guess; it's due to
there being some *specific* work that PostgreSQL needs to do that some
other databases can avoid due to different storage strategies.

The matter is quite succinctly described here:
http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007#Counting_rows_in_a_table

I'll just take one excerpt:
---
It is worth observing that it is only this precise form of aggregate
that must be so pessimistic; if augmented with a WHERE clause like

SELECT COUNT(*) FROM table WHERE status = 'something'

PostgreSQL, MySQL, and most other database implementations will take
advantage of available indexes against the restricted field(s) to
limit how many records must be counted, which can greatly accelerate
such queries.
---

It is common, in systems where aggregation reporting needs to be fast,
to pre-compute the aggregates, and that is in no way specific to
PostgreSQL.

If you need *really* fast aggregates, then it will be worthwhile to
put together triggers or procedures or something of the sort to help
pre-compute the aggregates.
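
A minimal sketch of that approach, assuming a hypothetical table "mytable"
and counting only inserts and deletes:

  CREATE TABLE mytable_rowcount (n bigint NOT NULL);
  INSERT INTO mytable_rowcount SELECT count(*) FROM mytable;

  CREATE OR REPLACE FUNCTION mytable_rowcount_trig() RETURNS trigger AS $$
  BEGIN
      IF TG_OP = 'INSERT' THEN
          UPDATE mytable_rowcount SET n = n + 1;
      ELSE   -- DELETE
          UPDATE mytable_rowcount SET n = n - 1;
      END IF;
      RETURN NULL;   -- return value is ignored for AFTER triggers
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER mytable_rowcount_trig
      AFTER INSERT OR DELETE ON mytable
      FOR EACH ROW EXECUTE PROCEDURE mytable_rowcount_trig();

  -- SELECT n FROM mytable_rowcount;   -- now O(1) instead of a full scan

Note that the single counter row serializes concurrent writers; busy tables
usually get a variant that inserts per-transaction deltas and sums them at
read time.
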
-- 
(reverse (concatenate 'string ofni.sesabatadxunil @ enworbbc))
http://linuxfinances.info/info/wp.html
When you have eliminated the impossible, whatever remains, however
improbable, must be the truth. -- Sir Arthur Conan Doyle (1859-1930),
English author. Sherlock Holmes, in The Sign of Four, ch. 6 (1889).
[...but see the Holmesian Fallacy, due to Bob Frankston...
http://www.frankston.com/public/Essays/Holmesian%20Fallacy.asp]



Re: [PERFORM] left join + case - how is it processed?

2009-01-19 Thread Chris

Tom Lane wrote:

Chris dmag...@gmail.com writes:
I can see it's doing the extra filter step at the start (4th line) which 
is not present without the coalesce/case statement. I just don't 
understand why it's being done at that stage.


It's not that hard to understand.  With the original view formulation
(or the COALESCE version), the fully expanded form of the query looks
like

select ... from p left join r ...
  where expression_involving_both_p_and_r = constant

If you make the view output be just p.assetid then you have

select ... from p left join r ...
  where p.assetid = constant

In the first case the planner cannot apply the WHERE restriction until
it's formed the p+r join; so you see the condition applied as a filter
on the join node's output.  In the second case, the planner can push the
WHERE restriction down into the scan of p, since the left join doesn't
affect it.  (If a p row doesn't pass the restriction, then no join row
formed from it can either; ergo there is no need to form those join rows
at all.)


So because the CASE is on (some of) the fields I'm joining on, in effect 
it's made part of the join condition. If the fields are outside that 
(r.userid/p.userid), then it's evaluated after.


Thanks!

--
Postgresql & php tutorials
http://www.designmagick.com/




[PERFORM] left join + case - how is it processed?

2009-01-18 Thread Chris

Hi all,

I have a view that looks like this:

 SELECT
CASE
WHEN r.assetid IS NULL THEN p.assetid
ELSE r.assetid
END AS assetid,
CASE
WHEN r.userid IS NULL THEN p.userid
ELSE r.userid
END AS userid, p.permission, p.granted, p.cascades
   FROM sq_ast_perm p
   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND 
r.assetid::text = p.assetid::text;



It was pointed out to me that the first CASE is useless (since r.assetid 
will always be the same as p.assetid because of the left join condition) 
so I'm looking at that to see if it'll make much of a difference and it 
does.


I won't post the whole lot but the first line is the most interesting.

# explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';


 Merge Left Join  (cost=9529.34..13823.76 rows=75721 width=102) (actual 
time=284.371..341.536 rows=1 loops=1)


(The row count is right - it's the total # of rows from sq_ast_perm).


When I change the view to be:

 SELECT p.assetid,
CASE
WHEN r.userid IS NULL THEN p.userid
ELSE r.userid
END AS userid, p.permission, p.granted, p.cascades
   FROM sq_ast_perm p
   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND 
r.assetid::text = p.assetid::text;



The Merge left join only returns 3 rows:

# explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';

   Merge Left Join  (cost=9507.18..9508.23 rows=3 width=70) 
(actual time=11.544..11.549 rows=1 loops=1)


I thought the where condition would cut down on the rows returned, then 
the case statement would take effect to do the null check. It seems to 
be doing it in reverse ??


Recently analyzed, only just imported so free of bloat. Running 8.1.11.

Thanks!
--
Postgresql & php tutorials
http://www.designmagick.com/




Re: [PERFORM] left join + case - how is it processed?

2009-01-18 Thread Chris



The reason why the CASE is affecting your query planning is because
you are using a query that compares assetid to a constant:

SELECT * from sq_vw_ast_perm where assetid='30748';

When PostgreSQL evaluates this statement, assetid gets expanded either
into a case statement (with your first view definition) or into
sq_ast_perm.assetid (with your second view definition).  The latter
definition allows PostgreSQL to make use of the column statistics
(which are pretty accurate) whereas the former is probably leading to
a SWAG, because PostgreSQL isn't very good at estimating the
selectivity of CASE.  The bad selectivity estimate, in turn, is
leading to a poor plan choice...


If I take it out of the view, it's fine:

#  SELECT
# CASE
# WHEN r.assetid IS NULL THEN p.assetid
# ELSE r.assetid
# END AS assetid,
# CASE
# WHEN r.userid IS NULL THEN p.userid
# ELSE r.userid
# END AS userid, p.permission, p.granted, p.cascades
#FROM sq_ast_perm p
#LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND 
r.assetid::text = p.assetid::text

# where p.assetid='30748';

  QUERY PLAN 


-
 Merge Left Join  (cost=9459.89..9463.13 rows=3 width=102) (actual 
time=0.096..0.098 rows=1 loops=1)


In this case I assume the planner is doing the 'WHERE' first to cut down 
the rows, then applying the CASE at the end.


With the view it seems to be the opposite - I still don't understand why
that's the case.



Though I do get the same behaviour as the view when I do it as a subselect.

--
Postgresql & php tutorials
http://www.designmagick.com/




Re: [PERFORM] left join + case - how is it processed?

2009-01-18 Thread Chris





I thought the where condition would cut down on the rows returned, then the
case statement would take effect to do the null check. It seems to be doing
it in reverse ??

# explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';

It appears to me that both of your statements have WHERE clauses, but I
believe WHERE isn't that explicit here.  I'm not sure the nature of your
problem permits the query optimizer to eliminate rows at all, even
with the WHERE statement. assetid is probably not known when the
query optimizer runs, because it is computed based on the nullness of
the columns.  I'd assume that the optimizer *could* more easily
optimize this if you had used COALESCE rather than an ad-hoc method
with CASE. My guess is you can't exclude rows with WHERE if the
column used is a run-time computation involving an ad-hoc CASE.


No difference.

Full explain plan here:

http://explain-analyze.info/query_plans/2725-query-plan-1447

I can see it's doing the extra filter step at the start (4th line) which 
is not present without the coalesce/case statement. I just don't 
understand why it's being done at that stage.


--
Postgresql & php tutorials
http://www.designmagick.com/




Re: [PERFORM] Intel's X25-M SSD

2008-09-10 Thread Chris Browne
[EMAIL PROTECTED] (Merlin Moncure) writes:
 I think the SSD manufacturers made a tactical error chasing the
 notebook market when they should have been chasing the server
 market...

That's a very good point; I agree totally!
-- 
output = reverse(moc.enworbbc @ enworbbc)
http://www3.sympatico.ca/cbbrowne/nonrdbms.html
We are all somehow dreadfully cracked about the head, and sadly need
mending. --/Moby-Dick/, Ch 17 



Re: [PERFORM] Incorrect estimates on correlated filters

2008-08-13 Thread Chris Kratz
On Wed, Aug 13, 2008 at 10:59 AM, Decibel! [EMAIL PROTECTED] wrote:

 On Aug 12, 2008, at 4:59 PM, Chris Kratz wrote:

 Ran into a recurring performance problem with some report queries again
 today.  In a nutshell, we have filters on either multiple joined tables, or
 multiple columns on a single table that are highly correlated.  So, the
 estimates come out grossly incorrect (the planner has no way to know they
 are correlated).  2000:1 for one I'm looking at right now.  Generally this
 doesn't matter, except in complex reporting queries like these when this is
 the first join of 40 other joins.  Because the estimate is wrong at the
 lowest level, it snowballs up through the rest of the joins causing the
 query to run very, very slowly.   In many of these cases, forcing nested
 loops off for the duration of the query fixes the problem.  But I have a
 couple that still are painfully slow and shouldn't be.

 I've been reading through the archives with others having similar problems
 (including myself a year ago).  Am I right in assuming that at this point
 there is still little we can do in postgres to speed up this kind of query?
  Right now the planner has no way to know the correlation between different
 columns in the same table, let alone columns in different tables.  So, it
 just assumes no correlation and returns incorrectly low estimates in cases
 like these.

 The only solution I've come up with so far is to materialize portions of
 the larger query into subqueries with these correlated filters which are
 indexed and analyzed before joining into the larger query.  This would keep
 the incorrect estimates from snowballing up through the chain of joins.

 Are there any other solutions to this problem?



 Well... you could try and convince certain members of the community that we
 actually do need some kind of a query hint mechanism... ;)

 I did make a suggestion a few months ago that involved sorting a table on
 different columns and recording the correlation of other columns. The scheme
 isn't perfect, but it would help detect cases like a field populated by a
 sequence and another field that's insert timestamp; those two fields would
 correlate highly, and you should even be able to correlate the two
 histograms; that would allow you to infer that most of the insert times for
 _id's between 100 and 200 will be between 2008-01-01 00:10 and 2008-01-01
 00:20, for example.
 --
 Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
 Give your computer some brain candy! www.distributed.net Team #1828


 Thanks for the reply,

Yes, I know hints are frowned upon around here.  Though, I'd love to have
them or something equivalent on this particular query just so the customer
can run their important reports.  As it is, it's unrunnable.

Unfortunately, I don't think the sorting idea would help in the one case
I'm looking at, which involves filters on two tables that are joined
together.  The filters happen to be correlated such that about 95% of the
rows from each filtered table are actually returned after the join.
Unfortunately, the planner thinks we will get 1 row back.

I do have to find a way to make these queries runnable.  I'll keep looking.

Thanks,

-Chris


[PERFORM] Incorrect estimates on correlated filters

2008-08-12 Thread Chris Kratz
Hello All,

Ran into a recurring performance problem with some report queries again
today.  In a nutshell, we have filters on either multiple joined tables, or
multiple columns on a single table that are highly correlated.  So, the
estimates come out grossly incorrect (the planner has no way to know they
are correlated).  2000:1 for one I'm looking at right now.  Generally this
doesn't matter, except in complex reporting queries like these when this is
the first join of 40 other joins.  Because the estimate is wrong at the
lowest level, it snowballs up through the rest of the joins causing the
query to run very, very slowly.   In many of these cases, forcing nested
loops off for the duration of the query fixes the problem.  But I have a
couple that still are painfully slow and shouldn't be.

I've been reading through the archives with others having similar problems
(including myself a year ago).  Am I right in assuming that at this point
there is still little we can do in postgres to speed up this kind of query?
Right now the planner has no way to know the correlation between different
columns in the same table, let alone columns in different tables.  So, it
just assumes no correlation and returns incorrectly low estimates in cases
like these.

The only solution I've come up with so far is to materialize portions of the
larger query into subqueries with these correlated filters which are indexed
and analyzed before joining into the larger query.  This would keep the
incorrect estimates from snowballing up through the chain of joins.
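
A sketch of that workaround, with made-up table and column names:

  CREATE TEMP TABLE filtered_people AS
  SELECT p.*
  FROM   people p
  JOIN   addresses a ON a.person_id = p.id
  WHERE  p.status = 'active'
    AND  a.state  = 'PA';

  CREATE INDEX filtered_people_id_idx ON filtered_people (id);
  ANALYZE filtered_people;

  -- the large report query then joins against filtered_people, whose row
  -- count the planner has just sampled, instead of repeating the two
  -- correlated filters deep inside the 40-way join.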

Are there any other solutions to this problem?

Thanks,

-Chris


Re: [PERFORM] Trigger is not firing immediately

2008-07-14 Thread Chris
Praveen wrote:
  
 Hi All,
 I am having a trigger in table, If I update the the table manually
 trigger is firing immediately(say 200ms per row), But if I update the
 table through procedure the trigger is taking time to fire(say 7 to 10
 seconds per row).
  
 Please tell me what kind of changes can I make so that  trigger  fire
 immediately while updating the table through procedure ?

Sending the same email over and over again isn't going to get you a
response any quicker.

If you send the details of the trigger and the tables/fields it affects
then you might get a more helpful response.

-- 
Postgresql & php tutorials
http://www.designmagick.com/



Re: [PERFORM] [pgsql-performance] function difference(geometry,geometry) is SLOW!

2008-06-16 Thread Chris Mair


 Date: Mon, 16 Jun 2008 11:06:44 +0200 (CEST)
 From: [EMAIL PROTECTED] [EMAIL PROTECTED]
 To: pgsql-performance@postgresql.org
 Subject: function difference(geometry,geometry) is SLOW!
 Message-ID:
 [EMAIL PROTECTED]

 Hi,
 In my pgsql procedure, i use the function

 geometryDiff := difference
 (geometry1,geometry2);

 but this function is very slow!!!
 What can I do to
 speed this function?
 Exists a special index for it?

 Thanks in advance!
 Luke

Hi,

this is a postgis function. Postgis is an independent project
and you might want to ask there:

http://www.postgis.org/mailman/listinfo/postgis-users

or

http://www.faunalia.com/cgi-bin/mailman/listinfo/gfoss
(italian).

Anyway, as long as you just compute the difference between
2 given shapes, no index can help you. Indices speed up
searches...

Bye,
Chris.





[PERFORM] Adding LIMIT 1 kills performance.

2008-05-29 Thread Chris Shoemaker
[Attn list-queue maintainers: Please drop the earlier version
of this email that I accidentally sent from an unsubscribed address. ]

Hi, 

I'm having a strange problem with a slow-running select query.  The
query I use in production ends in LIMIT 1, and it runs very slowly.
But when I remove the LIMIT 1, the query runs quite quickly.  This
behavior has stumped a couple smart DBAs.

The full queries and EXPLAIN ANALYZE plans are included below, but by
way of explanation/observation:

1) The LIMIT 1 case will sometimes be quicker (but still much slower
than the non-LIMIT 1 case) for different values of
calendar_group_id.

2) The query below is a slightly simplified version of the one I
actually use.  The real one includes more conditions which explain why
each table is joined.  For reference, the original query is quoted at
the end [1].  The original query exhibits the same behavior as the
simplified versions w.r.t. the LIMIT 1 case taking _much_ longer
(even longer than the simplified version) than the non-LIMIT 1 case,
and uses the same plans.


Can anyone explain why such a slow plan is chosen when the LIMIT 1
is present?  Is there anything I can do to speed this query up?
Thanks.

-chris


production= select version();
  version   

--
 PostgreSQL 8.2.6 on x86_64-pc-linux-gnu, compiled by GCC 
x86_64-pc-linux-gnu-gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.2)
(1 row)

production= analyze calendar_groups;
ANALYZE
production= analyze calendar_links;
ANALYZE
production= analyze calendars;
ANALYZE
production= analyze event_updates;
ANALYZE
production= EXPLAIN ANALYZE SELECT event_updates.*
 FROM event_updates
 INNER JOIN calendars ON event_updates.feed_id = calendars.id
 INNER JOIN calendar_links ON calendars.id = 
calendar_links.source_tracker_id
 WHERE (calendar_links.calendar_group_id = 3640)
 ORDER BY event_updates.id DESC
 LIMIT 1;

                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=16.55..91.73 rows=1 width=2752) (actual time=27810.058..27810.059 rows=1 loops=1)
   ->  Nested Loop  (cost=16.55..695694.18 rows=9254 width=2752) (actual time=27810.054..27810.054 rows=1 loops=1)
         Join Filter: (event_updates.feed_id = calendars.id)
         ->  Index Scan Backward using event_updates_pkey on event_updates  (cost=0.00..494429.30 rows=8944370 width=2752) (actual time=0.030..7452.142 rows=5135706 loops=1)
         ->  Materialize  (cost=16.55..16.56 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=5135706)
               ->  Nested Loop  (cost=0.00..16.55 rows=1 width=8) (actual time=0.029..0.034 rows=1 loops=1)
                     ->  Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links  (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1)
                           Index Cond: (calendar_group_id = 3640)
                     ->  Index Scan using harvest_trackers_pkey on calendars  (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1)
                           Index Cond: (calendars.id = calendar_links.source_tracker_id)
 Total runtime: 27810.161 ms
(11 rows)

production= EXPLAIN ANALYZE SELECT event_updates.* FROM event_updates
 INNER JOIN calendars ON event_updates.feed_id = calendars.id
 INNER JOIN calendar_links ON calendars.id = 
calendar_links.source_tracker_id
 WHERE (calendar_links.calendar_group_id = 3640)
 ORDER BY event_updates.id DESC;

                                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=43376.36..43399.50 rows=9256 width=2752) (actual time=10.178..10.205 rows=36 loops=1)
   Sort Key: event_updates.id
   ->  Nested Loop  (cost=249.86..31755.56 rows=9256 width=2752) (actual time=9.957..10.098 rows=36 loops=1)
         ->  Nested Loop  (cost=0.00..16.55 rows=1 width=8) (actual time=9.868..9.873 rows=1 loops=1)
               ->  Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links  (cost=0.00..8.27 rows=1 width=4) (actual time=9.824..9.825 rows=1 loops=1)
                     Index Cond: (calendar_group_id = 3640)
               ->  Index Scan using harvest_trackers_pkey on calendars  (cost=0.00..8.27 rows=1

[PERFORM] OVERLAPS is slow

2008-05-29 Thread Chris Browne
I'm doing some analysis on temporal usages, and was hoping to make use
of OVERLAPS, but it does not appear that it makes use of indices.

Couching this in an example...  I created a table, t1, thus:

metadata=# \d t1
                                     Table "public.t1"
 Column |           Type           |                       Modifiers
--------+--------------------------+--------------------------------------------------------
 id     | integer                  | not null default nextval('t1_id_seq'::regclass)
 t1     | timestamp with time zone | not null default now()
 t2     | timestamp with time zone | not null default 'infinity'::timestamp with time zone
 data   | text                     | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (id)
    "f2" btree (id) WHERE t2 = 'infinity'::timestamp with time zone
    "t1t1" btree (t1)
    "t1t2" btree (t2)

When entries go in, they default to having an effective date range
from now() until 'infinity'.

I then went off and seeded a bunch of data into the table, inserting
values:

for i in `cat /etc/dictionaries-common/words | head -2000`; do
  psql -d metadata -c "insert into t1 (data) values ('$i');"
done

Then, I started doing temporal updates, thus:

for i in `cat /etc/dictionaries-common/words`; do
  psql -d metadata -c "insert into t1 (data) values ('$i'); update t1 set t2 = now() where t2 = 'infinity' and id in (select id from t1 where t2 = 'infinity' order by random() limit 1);"
done

This terminates many of those entries, and creates a new one that is
effective to infinity.

After running this for a while, I have a reasonably meaningful amount
of data in the table:

metadata=# select count(*) from t1; select count(*) from t1 where t2 = 
'infinity';
 count  

 125310
(1 row)

 count 
---
  2177
(1 row)

Searching for the active items in the table, via a constructed 'overlap':

metadata=# explain analyze select count(*) from t1 where t1 <= now() and t2 >= now();
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=98.13..98.14 rows=1 width=0) (actual time=8.104..8.105 rows=1 loops=1)
   ->  Index Scan using t1t2 on t1  (cost=0.00..93.95 rows=1671 width=0) (actual time=0.116..6.374 rows=2177 loops=1)
         Index Cond: (t2 >= now())
         Filter: (t1 <= now())
 Total runtime: 8.193 ms
(5 rows)

Note, that makes use of the index on column t2, and runs nice and
quick.  (And notice that the rows found, 2177, agrees with the earlier
count.)

Unfortunately, when I try using OVERLAPS, it reverts to a Seq Scan.

metadata=# explain analyze  select *  from t1 where (t1,t2) overlaps (now(), 
now());
QUERY PLAN  
   
---
 Seq Scan on t1  (cost=0.00..3156.59 rows=43135 width=24) (actual 
time=171.248..205.941 rows=2177 loops=1)
   Filter: overlaps(t1, t2, now(), now())
 Total runtime: 207.508 ms
(3 rows)

I would surely think that I have enough data in the table for the
stats to be good, and the first query certainly does harness the index
on t2 to determine if records are overlapping (now(),now()).

Is it possible that we need to have some improvement to the optimizer
so that OVERLAPS could make use of the indices?
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://linuxfinances.info/info/lsf.html
Very little is known about the War of 1812 because the Americans lost
it.  -- Eric Nicol



Re: [PERFORM] Adding LIMIT 1 kills performance.

2008-05-29 Thread Chris Shoemaker
On Fri, May 30, 2008 at 02:23:46AM +0930, Shane Ambler wrote:
 Chris Shoemaker wrote:
 [Attn list-queue maintainers: Please drop the earlier version
 of this email that I accidentally sent from an unsubscribed address. ]

 Hi, 
 I'm having a strange problem with a slow-running select query.  The
 query I use in production ends in LIMIT 1, and it runs very slowly.
 But when I remove the LIMIT 1, the query runs quite quickly.  This
 behavior has stumped a couple smart DBAs.


 Can anyone explain why such a slow plan is chosen when the LIMIT 1
 is present?  Is there anything I can do to speed this query up?
 Thanks.


 From what I know using an ORDER BY and a LIMIT can often prevent 
 *shortening* the query as it still needs to find all rows to perform the 
 order by before it limits.

That makes complete sense, of course.

 The difference in plans eludes me.

 production= EXPLAIN ANALYZE SELECT event_updates.*
  FROM event_updates
  INNER JOIN calendars ON event_updates.feed_id = calendars.id
  INNER JOIN calendar_links ON calendars.id = 
 calendar_links.source_tracker_id
  WHERE (calendar_links.calendar_group_id = 3640)
  ORDER BY event_updates.id DESC
  LIMIT 1;

 Does removing the DESC from the order by give the same variation in plans? 
 Or is this only when using ORDER BY ... DESC LIMIT 1?

Except for using Index Scan instead of Index Scan Backward, the plan
is the same with ORDER BY ... or ORDER BY ... ASC as with ORDER BY
... DESC.  In case you're wondering what would happen without the
ORDER BY at all:

production= EXPLAIN SELECT event_updates.*
FROM event_updates
INNER JOIN calendars ON event_updates.feed_id = calendars.id
INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id
WHERE (calendar_links.calendar_group_id = 3640)
LIMIT 1; 
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.95 rows=1 width=2752)
   ->  Nested Loop  (cost=0.00..36992.38 rows=9362 width=2752)
         ->  Nested Loop  (cost=0.00..16.55 rows=1 width=8)
               ->  Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links  (cost=0.00..8.27 rows=1 width=4)
                     Index Cond: (calendar_group_id = 3640)
               ->  Index Scan using harvest_trackers_pkey on calendars  (cost=0.00..8.27 rows=1 width=4)
                     Index Cond: (calendars.id = calendar_links.source_tracker_id)
         ->  Index Scan using index_event_updates_on_feed_id_and_feed_type on event_updates  (cost=0.00..36858.50 rows=9386 width=2752)
               Index Cond: (event_updates.feed_id = calendars.id)
(9 rows)




 One thing that interests me is try -

 EXPLAIN ANALYZE SELECT * FROM (

 SELECT event_updates.*
 FROM event_updates
 INNER JOIN calendars ON event_updates.feed_id = calendars.id
 INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id
 WHERE (calendar_links.calendar_group_id = 3640)
 ORDER BY event_updates.id DESC
 ) AS foo

 LIMIT 1;

That's an interesting experiment.  Here are the results:

                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=16.55..91.74 rows=1 width=6027) (actual time=490709.355..490709.357 rows=1 loops=1)
   ->  Nested Loop  (cost=16.55..703794.95 rows=9361 width=2752) (actual time=490709.352..490709.352 rows=1 loops=1)
         Join Filter: (event_updates.feed_id = calendars.id)
         ->  Index Scan Backward using event_updates_pkey on event_updates  (cost=0.00..500211.53 rows=9047416 width=2752) (actual time=0.222..469082.071 rows=5251179 loops=1)
         ->  Materialize  (cost=16.55..16.56 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=5251179)
               ->  Nested Loop  (cost=0.00..16.55 rows=1 width=8) (actual time=0.240..0.246 rows=1 loops=1)
                     ->  Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links  (cost=0.00..8.27 rows=1 width=4) (actual time=0.108..0.109 rows=1 loops=1)
                           Index Cond: (calendar_group_id = 3640)
                     ->  Index Scan using harvest_trackers_pkey on calendars  (cost=0.00..8.27 rows=1 width=4) (actual time=0.127..0.129 rows=1 loops=1)
                           Index Cond: (calendars.id = calendar_links.source_tracker_id)
 Total runtime: 490709.576 ms
(11 rows)


That is, no real change in the performance.

Still stumped

Re: [PERFORM] RAID controllers for Postgresql on large setups

2008-05-12 Thread Chris Ruprecht

Joshua,

did you try to run the 345 on an IBM ServeRAID 6i?
I have one in mine, but I never actually ran any speed test.
Do you have any benchmarks that I could run and compare?

best regards,
chris
--  
chris ruprecht

database grunt and bit pusher extraordinaíre


On May 12, 2008, at 22:11, Joshua D. Drake wrote:


On Mon, 12 May 2008 22:04:03 -0400
Francisco Reyes [EMAIL PROTECTED] wrote:


Inheritted a number of servers and I am starting to look into the



[snip]

Comparing software raid on an LSI onboard for an IBM 345 versus a  
2120s

using hardware raid 10, the software raid completely blew the adaptec
away.


[more snip]


Re: [PERFORM] Replication Syatem

2008-04-29 Thread Chris Browne
[EMAIL PROTECTED] (Gauri Kanekar) writes:
 Basically we have some background process which updates table1 and
 we don't want the application to make any changes to table1 while
 vacuum.  Vacuum requires exclusive lock on table1 and if any of
 the background or application is ON vacuum don't kick off. Thats the
 reason we need to get the site down.

VACUUM has not required an exclusive lock on tables since version 7.1.

What version of PostgreSQL are you running?
-- 
output = (cbbrowne @ acm.org)
http://linuxdatabases.info/info/sap.html
Rules of the Evil Overlord #192.  If I appoint someone as my consort,
I will  not subsequently inform  her that she  is being replaced  by a
younger, more attractive woman.  http://www.eviloverlord.com/



Re: [PERFORM] Where do a novice do to make it run faster?

2008-04-28 Thread Chris Browne
[EMAIL PROTECTED] (A B) writes:
 So, it is time to improve performance, it is running too slow.
 AFAIK (as a novice) there are a few general areas:

 1) hardware
 2) rewriting my queries and table structures
 3) using more predefined queries
 4) tweek parameters in the db conf files

 Of these points:
 1) is nothing I can do about right now, but in the future perhaps.
 2) will be quite hard right now since there is more code than time.
 3) almost like 2 but perhaps more do-able with the current constraints.
 4) This seems to be the easiest one to start with...

 So what should I do/read concerning point 4?
 If you have other good suggestions  I'd be very interested in that.

 Thank you :-)

In the order of ease of implementation, it tends to be...

1.  Tweak postgresql.conf
2.  Make sure you ran VACUUM + ANALYZE
3.  Find some expensive queries and try to improve them, which might
involve changing the queries and/or adding relevant indices
4.  Add RAM to your server
5.  Add disk to your server
6.  Redesign your application's DB schema so that it is more performant
by design

URL below may have some material of value...
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://linuxfinances.info/info/postgresqlperformance.html
It is usually a   good idea to  put  a capacitor of a  few microfarads
across the output, as shown.



Re: [PERFORM] Replication Syatem

2008-04-28 Thread Chris Browne
[EMAIL PROTECTED] (Gauri Kanekar) writes:
 We have a table table1 which get insert and updates daily in high
 numbers, bcoz of which its size is increasing and we have to vacuum
 it every alternate day. Vacuuming table1 take almost 30min and
 during that time the site is down.  We need to cut down on this
 downtime.So thought of having a replication system, for which the
 replicated DB will be up during the master is getting vacuumed.  Can
 anybody guide which will be the best suited replication solution for
 this.

The only reason that it would be necessary for VACUUM to take the
site down would be if you are running version 7.1, which was
obsoleted in 2002, which, it should be noted, was SIX YEARS AGO.

As has been noted, you seem to be presupposing a remarkably complex
solution to resolve a problem which is likely to be better handled via
running VACUUM rather more frequently.
-- 
output = reverse(ofni.sesabatadxunil @ enworbbc)
http://www3.sympatico.ca/cbbrowne/postgresql.html
Rules  of the  Evil Overlord  #181.  I  will decree  that all  hay be
shipped in tightly-packed bales. Any wagonload of loose hay attempting
to pass through a checkpoint will be set on fire.
http://www.eviloverlord.com/



Re: [PERFORM] Message queue table..

2008-04-18 Thread Chris Browne
[EMAIL PROTECTED] (Jesper Krogh) writes:
 I have this message queue table.. currently with 8m+
 records. Picking the top priority messages seem to take quite
 long.. it is just a matter of searching the index.. (just as explain
 analyze tells me it does).

 Can anyone digest further optimizations out of this output? (All
 records have funcid=4)

 # explain analyze SELECT job.jobid, job.funcid, job.arg, job.uniqkey,
 job.insert_time, job.run_after, job.grabbed_until, job.priority,
 job.coalesce FROM workqueue.job  WHERE (job.funcid = 4) AND
 (job.run_after <= 1208442668) AND (job.grabbed_until <= 1208442668)
 AND (job.coalesce = 'Efam') ORDER BY funcid, priority ASC LIMIT 1
 ;

There might be value in having one or more extra indices...

Here are *plausible* candidates:

1.  If funcid = 4 is highly significant (e.g. - you are always
running this query, and funcid is often <> 4), then you might add a
partial index such as:

  create index job_funcid_run_after on workqueue.job (run_after) where funcid = 4;
  create index job_funcid_grabbeduntil on workqueue.job (grabbed_until) where funcid = 4;

2.  Straight indices like the following:

   create index job_run_after on workqueue.job(run_after);
   create index job_grabbed_until on workqueue.job(grabbed_until);
   create index job_funcid on workqueue.job(funcid);
   create index job_coalesce on workqueue.job(coalesce);

Note that it is _possible_ (though by no means guaranteed) that all
three might prove useful, if you're running 8.1+ where PostgreSQL
supports bitmap index scans.

Another possibility...

3.  You might change your process to process multiple records in a
run so that you might instead run the query (perhaps via a cursor?)

with LIMIT [Something Bigger than 1].

It does seem mighty expensive to run a 245ms query to find just one
record.  It seems quite likely that you could return the top 100 rows
(LIMIT 100) without necessarily finding it runs in any more time.

Returning 100 tuples in 245ms seems rather more acceptable, no?  :-)
-- 
(format nil [EMAIL PROTECTED] cbbrowne linuxfinances.info)
http://linuxdatabases.info/info/linuxdistributions.html
Rules of the Evil Overlord #32. I will not fly into a rage and kill a
messenger who brings me bad news  just to illustrate how evil I really
am. Good messengers are hard to come by.
http://www.eviloverlord.com/



Re: [PERFORM] Oddly slow queries

2008-04-16 Thread Chris Browne
[EMAIL PROTECTED] (Thomas Spreng) writes:
 On 16.04.2008, at 01:24, PFC wrote:

 The queries in question (select's) occasionally take up to 5 mins
 even if they take ~2-3 sec under normal conditions, there are no
 sequential scans done in those queries. There are not many users
 connected (around 3, maybe) to this database usually since it's
 still in a testing phase. I tried to hunt down the problem by
 playing around with resource usage cfg options but it didn't really
 make a difference.

  Could that be caused by a CHECKPOINT ?

 actually there are a few log (around 12 per day) entries concerning
 checkpoints:

 LOG:  checkpoints are occurring too frequently (10 seconds apart)
 HINT:  Consider increasing the configuration parameter
 checkpoint_segments.

 But wouldn't that only affect write performance? The main problems I'm
 concerned about affect SELECT queries.

No, that will certainly NOT just affect write performance; if the
postmaster is busy writing out checkpoints, that will block SELECT
queries that are accessing whatever is being checkpointed.

When we were on 7.4, we would *frequently* see SELECT queries that
should be running Very Quick that would get blocked by the checkpoint
flush.

We'd periodically see hordes of queries of the form:

  select id from some_table where unique_field = 'somevalue.something';

which would normally run in less than 1ms running for (say) 2s.

And the logs would show something looking rather like the following:

2008-04-03 09:01:52 LOG select id from some_table where unique_field = 
'somevalue.something'; - 952ms
2008-04-03 09:01:52 LOG select id from some_table where unique_field = 
'somevalue.something'; - 742ms
2008-04-03 09:01:52 LOG select id from some_table where unique_field = 
'another.something'; - 1341ms
2008-04-03 09:01:52 LOG select id from some_table where unique_field = 
'somevalue.something'; - 911ms
2008-04-03 09:01:52 LOG select id from some_table where unique_field = 
'another.something'; - 1244ms
2008-04-03 09:01:52 LOG select id from some_table where unique_field = 
'another.something'; - 2311ms
2008-04-03 09:01:52 LOG select id from some_table where unique_field = 
'another.something'; - 1799ms
2008-04-03 09:01:52 LOG select id from some_table where unique_field = 
'somevalue.something'; - 1992ms

This was happening because the checkpoint was flushing those two
tuples, and hence blocking 8 SELECTs that came in during the flush.

There are two things worth considering:

1.  If the checkpoints are taking place too frequently, then that is
clear evidence that something is taking place that is injecting REALLY
heavy update load on your database at those times.

If the postmaster is checkpointing every 10s, that implies Rather
Heavy Load, so it is pretty well guaranteed that performance of other
activity will suck at least somewhat because this load is sucking up
all the I/O bandwidth that it can.

So, to a degree, there may be little to be done to improve on this.

2.  On the other hand, if you're on 8.1 or so, you may be able to
configure the Background Writer to incrementally flush checkpoint data
earlier, and avoid the condition of 1.

Mind you, you'd have to set BgWr to be pretty aggressive, based on the
10s periodicity that you describe; that may not be a nice
configuration to have all the time :-(.
-- 
output = reverse(ofni.sesabatadxunil @ enworbbc)
http://cbbrowne.com/info/multiplexor.html
Nagging  is the  repetition  of unpalatable  truths. --Baroness  Edith
Summerskill



Re: [PERFORM] Background writer underemphasized ...

2008-04-16 Thread Chris Browne
[EMAIL PROTECTED] (Marinos Yannikos) writes:
 This helped with our configuration:
bgwriter_delay = 10000ms # 10-10000ms between rounds
 bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round

FYI, I'd be inclined to reduce both of those numbers, as it should
reduce the variability of behaviour.

Rather than cleaning 1K pages every 10s, I would rather clean 100
pages every 1s, as that will have much the same effect, but spread the
work more evenly.  Or perhaps 10 pages every 100ms...

Cut the delay *too* low and this might make the background writer, in
effect, poll *too* often, and start chewing resources, but there's
doubtless some sweet spot in between...
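
In postgresql.conf terms, that suggestion looks something like this
(illustrative values only):

  bgwriter_delay = 1000ms        # one round per second instead of every 10 seconds
  bgwriter_lru_maxpages = 100    # ~100 pages per round instead of 1000
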
-- 
cbbrowne,@,cbbrowne.com
http://linuxdatabases.info/info/oses.html
For systems, the analogue of a face-lift is to add to the control
graph an edge that creates a cycle, not just an additional node.
-- Alan J. Perlis



Re: [PERFORM] Creating large database of MD5 hash values

2008-04-11 Thread Chris




1. Which datatype should I use to represent the hash value? UUIDs are
also 16 bytes...


md5's are always 32 characters long so probably varchar(32).
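
A sketch along those lines (table and column names are made up):

  CREATE TABLE hash_values (
      hash_set_id integer     NOT NULL,
      md5_hex     varchar(32) NOT NULL,
      PRIMARY KEY (hash_set_id, md5_hex)
  );

The primary key doubles as the index for "is this hash in this set?" lookups;
storing the raw 16 bytes in a bytea column instead would roughly halve the key
size, at the cost of hex conversion in the application.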


2. Does it make sense to denormalize the hash set relationships?


The general rule is normalize as much as possible then only denormalize 
when absolutely necessary.



3. Should I index?


What sort of queries are you going to be running?


4. What other data structure options would it make sense for me to choose?


What sort of other data will you be needing to store?

--
Postgresql & php tutorials
http://www.designmagick.com/



Re: [PERFORM] bulk insert performance problem

2008-04-07 Thread Chris

Craig Ringer wrote:

Christian Bourque wrote:

Hi,

I have a performance problem with a script that does massive bulk
insert in 6 tables. When the script starts the performance is really
good but will degrade minute after minute and take almost a day to
finish!
  
Would I be correct in guessing that there are foreign key relationships 
between those tables, and that there are significant numbers of indexes 
in use?


The foreign key checking costs will go up as the tables grow, and AFAIK 
the indexes get a bit more expensive to maintain too.


If possible you should probably drop your foreign key relationships and 
drop your indexes, insert your data, then re-create the indexes and 
foreign keys. The foreign keys will be rechecked when you recreate them, 
and it's *vastly* faster to do it that way. Similarly, building an index 
from scratch is quite a bit faster than progressively adding to it. Of 
course, dropping the indices is only useful if you aren't querying the 
tables as you build them.


If you are, add analyze commands through the import, eg every 10,000 
rows. Then your checks should be a bit faster.


The other suggestion would be to do block commits:

begin;
do stuff for 5000 rows;
commit;

repeat until finished.
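
Something like the following, with hypothetical file and table names (COPY is
also much faster than individual INSERTs if the load script can use it):

  BEGIN;
  COPY big_table FROM '/tmp/chunk_0001.csv' WITH CSV;
  COMMIT;

  ANALYZE big_table;   -- refresh stats every few chunks so FK checks stay cheap

  BEGIN;
  COPY big_table FROM '/tmp/chunk_0002.csv' WITH CSV;
  COMMIT;
  -- ...and so on for each chunk.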

--
Postgresql & php tutorials
http://www.designmagick.com/



Re: [PERFORM] slow pg_connect()

2008-03-24 Thread Chris



* Read about configuring and using persistent database connections
  (http://www.php.net/manual/en/function.pg-pconnect.php) with PHP


Though make sure you understand the ramifications of using persistent 
connections. You can quickly exhaust your connections by using this and 
also cause other issues for your server.


If you do this you'll probably have to adjust postgres to allow more 
connections, which usually means lowering the amount of shared memory 
each connection can use which can also cause performance issues.


I'd probably use pgpool-II and have it handle the connection stuff for 
you rather than doing it through php.


--
Postgresql & php tutorials
http://www.designmagick.com/



[PERFORM] Planner mis-estimation using nested loops followup

2008-03-18 Thread Chris Kratz
A number of weeks ago, I had posted a request for help regarding join
estimates in pg 8.2.6.  In moderately complex to very complex ad hoc queries
in our system, we were consistently having the system massively
underestimate the number of rows coming out of join at a low level making
these queries very slow and inefficient.  At times the mis-estimation was
1000:1.  I.e., when it should have been 2000 returned rows from a join, the
planner assumed 1 or 2 rows.  Modifying stats on the join columns up to the
max made little difference (yes, we analyzed the tables in question after each
change).  Since the planner sees only one row coming out of the low-level
join, it uses nested loops all the way up the chain when it would be more
efficient to use another join type.  In our informal testing, we found that
by disabling nested loops and forcing other join types, we could get
fantastic speedups.  Those queries that seem to benefit most from this have
a lot of sub-queries being built up into a final query set as well as a fair
number of joins in the sub-queries.  Since these are user created and are
then generated via our tools, they can be quite messy at times.
After doing this testing, we have since added some functionality in our ad hoc
reporting tool to allow us to tune individual queries by turning on and off
individual join types at runtime.  As we hear of slow reports, we've been
individually turning off the nested loops on those reports.  Almost always,
this has increased the performance of the reports, sometimes in a completely
amazing fashion (many, many minutes to seconds at times).  It of course
doesn't help everything and turning off nested loops in general causes
overall slowdown in other parts of the system.
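
For reference, the statistics change mentioned above looks like this (the table and column names are hypothetical):

alter table fact_table alter column dim_id set statistics 1000;  -- 1000 is the maximum in 8.2
analyze fact_table;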

As this has gone on over the last couple of weeks, it feels like we either
have a misconfiguration on the server, or we are tickling a mis-estimation
bug in the planner.  I'm hoping it's the former.  The db server has 8G of
memory, with RAID1 for WAL and RAID10 for data; the OS is Linux 2.6.9 and
the database is 8.2.6.  The database is UTF-8, if that has any bearing, and
autovacuum and the bgwriter are on.

Nondefault settings of interest from postgresql.conf


shared_buffers = 1024MB # min 128kB or max_connections*16kB
work_mem = 256MB # min 64kB
maintenance_work_mem = 256MB # min 1MB
random_page_cost = 1.75 # same scale as above
effective_cache_size = 4096MB
default_statistics_target = 100 # range 1-1000


If nothing else, perhaps this will help somebody else who has run into the
same problem.  If EXPLAIN ANALYZE of a query shows a large mis-estimation of
the rows returned by a join (estimate=1, actual=2k), causing the planner to
choose nested loops instead of another join type, you might try running the
query with nested loops set to off and see if that helps with performance.
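
In a session that looks something like this (the query itself is just a placeholder):

set enable_nestloop = off;
explain analyze
select r.report_id, count(*)
from report_rows r
join report_filters f on f.report_id = r.report_id
group by r.report_id;
reset enable_nestloop;  -- restore the default for the rest of the session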

Thanks,

-Chris


Re: [PERFORM] Planner mis-estimation using nested loops followup

2008-03-18 Thread Chris Kratz
Yes, turning nested loops off in specific cases has increased performance
greatly.  It didn't fix the planner mis-estimation, just the plan it chose.
It's certainly not a panacea, but it's something we now try early on when
trying to speed up a query that matches these characteristics.
-Chris

On 3/18/08, Joshua D. Drake [EMAIL PROTECTED] wrote:



 On Tue, 18 Mar 2008 11:35:08 -0400
 Chris Kratz [EMAIL PROTECTED] wrote:

  Nondefault settings of interest from postgresql.conf
 
 
  shared_buffers = 1024MB # min 128kB or max_connections*16kB
  work_mem = 256MB # min 64kB
  maintenance_work_mem = 256MB # min 1MB
  random_page_cost = 1.75 # same scale as above
  effective_cache_size = 4096MB
  default_statistics_target = 100 # range 1-1000
 
 
  If nothing else, perhaps this will help somebody else who has run
  into the same problem.  If explain analyze of a query shows a large
  mis-estimation of rows returned on a join (estimate=1, actual=2k)
  causing the planner to choose nested loops instead of another join
  type, you might try running the query with nested loops set to off
  and see if that helps w/ performance.


 Did you try that? Did it work?

 Joshua D. Drake


 --
 The PostgreSQL Company since 1997: http://www.commandprompt.com/
 PostgreSQL Community Conference: http://www.postgresqlconference.org/
 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
   PostgreSQL political pundit | Mocker of Dolphins




Re: [PERFORM] question on TRUNCATE vs VACUUM FULL

2008-03-18 Thread Chris




So my question is this:  Shouldn’t VACUUM FULL clean Table C and reclaim 
all its space?


You've got concepts mixed up.

TRUNCATE removes all of the data from a particular table (and exists in 
most DBMSs).


http://www.postgresql.org/docs/8.3/interactive/sql-truncate.html



VACUUM FULL is a Postgres-specific command that works behind the 
scenes to clean up MVCC left-overs (dead row versions). It does not remove 
any live data or records from the table; it only reclaims the space those 
dead versions occupy.


http://www.postgresql.org/docs/current/interactive/sql-vacuum.html


The two have completely different uses and nothing to do with each other 
whatsoever.
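
A minimal illustration, using a hypothetical table c:

-- remove every row and immediately reclaim the space:
truncate table c;

-- keep the live rows, but reclaim the space held by dead (deleted/updated) row versions:
vacuum full c;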


--
Postgresql & php tutorials
http://www.designmagick.com/



Re: [PERFORM] migration of 7.4 to 8.1

2008-03-12 Thread Chris

[EMAIL PROTECTED] wrote:

On Wed, 12 Mar 2008, sathiya psql wrote:


On the home page itself they were saying "testing ... unstable"


you are talking about the debian home page right?


then we should not use that for live.

so i prefer 8.1 .


Debian selected the version of Postgres for Etch about a year and a half 
ago. At that point, selecting 8.1 was a reasonable choice.


Debian has a policy that they will never change the version number of a 
package in a stable release (they will backport many bugfixes, but not 
upgrade the version).


As a result 2 years from now when Postgres is on 8.5 stable (and looking 
at 8.6), Debian Etch will still be on 8.1


I like that with Debian I can install multiple Postgres versions and it 
handles everything for me :) Different default ports, config files in 
different folders, different startup scripts... it all works very nicely :)


Which means you can have 8.1 installed and 8.3 installed at the same 
time - both from packages, no compiling etc necessary - and can switch 
between them very easily.


--
Postgresql & php tutorials
http://www.designmagick.com/



Re: [PERFORM] Confirmação de envio / Sending confirmation (captchaid:13266b20536d)

2008-03-05 Thread Chris

petchimuthu lingam wrote:

C5BK4513


Ahh - you are sending this to the wrong address, these are not being 
sent by the postgres mailing list.


Check which address you are replying to next time...

--
Postgresql & php tutorials
http://www.designmagick.com/



Re: [PERFORM] count * performance issue

2008-03-05 Thread Chris

sathiya psql wrote:

count(*) takes too much time...

but with a where clause we can make it use an index... what where clause 
can we use??


I am using Postgres 7.4 on Debian with 1 GB RAM,

and have a table with nearly 50 lakh (5 million) records,


Looks suspiciously like a question asked yesterday:

http://archives.postgresql.org/pgsql-performance/2008-03/msg00068.php

--
Postgresql & php tutorials
http://www.designmagick.com/



[PERFORM] Ramifications of turning off Nested Loops for slow queries

2008-03-04 Thread Chris Kratz

Hello Everyone,

I had posted an issue previously that we've been unable to resolve.  An 
early mis-estimation in one or more subqueries causes the remainder of the 
query to be planned with nested loops instead of a more efficient method, 
and it runs very slowly (CPU-bound).  I don't think there is any way to 
suggest to the planner that it not do what it's doing, so we are starting 
to think about turning off nested loops entirely.


Here is the history so far:

http://archives.postgresql.org/pgsql-performance/2008-02/msg00205.php

At the suggestion of the list, we upgraded to 8.2.6 and are still  
experiencing the same problem.  I'm now installing 8.3 on my  
workstation  to see if it chooses a better plan, but it will take some  
time to get it compiled, a db loaded, etc.


We have a number of very long running reports that will run in seconds  
if nested loops are turned off.  The other alternative we are  
exploring is programmatically turning off nested loops just for the  
problematic reports.  But with the speedups we are seeing, others are  
getting gun shy about having them on at all.


So, I've now been asked to ping the list as to whether turning off  
nested loops system wide is a bad idea, and why or why not.
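
For what it's worth, the per-report approach we are exploring can be scoped to a single transaction, so the rest of the system keeps the default (a sketch; the report query is just a placeholder):

begin;
set local enable_nestloop = off;  -- applies only inside this transaction
select * from slow_report_view;   -- the problematic report query
commit;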


Any other thoughts or suggestions?

Thanks,

-Chris



Re: [PERFORM] Ramifications of turning off Nested Loops for slow queries

2008-03-04 Thread Chris Kratz
On 3/4/08, Kevin Grittner [EMAIL PROTECTED] wrote:

  On Tue, Mar 4, 2008 at  8:42 AM, in message
  Any other thoughts or suggestions?


 Make sure your effective_cache_size is properly configured.

 Increase random_page_cost and/or decrease seq_page_cost.
 You can play with the cost settings on a connection, using EXPLAIN
 on the query, to see what plan you get with each configuration
 before putting it into the postgresql.conf file.


 -Kevin


That was a good idea.  I hadn't tried playing with those settings in a
session.  This is an 8G box, and we've dedicated half of that (4G) to the
filesystem cache, so effective_cache_size is set to 4G.  Our seq_page_cost
is set to 1 and our random_page_cost is set to 1.75 in postgresql.conf.

In testing this one particular slow query in a session, I changed these
settings in increments of 0.25, taking random_page_cost up to 4 and
seq_page_cost down to 0.25.  This made perhaps a second of difference, but
in the end we were back to the 37s.  Doing a set enable_nestloop=off in the
session reduced the runtime to 1.2s with the other settings back at our
normal day-to-day values.
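
The session experiments looked roughly like this (a sketch; the report query is just a placeholder):

-- varying the cost settings barely moved the runtime
set random_page_cost = 4.0;
set seq_page_cost = 0.25;
explain analyze select * from slow_report_view;

-- turning off nested loops (with the cost settings back at our defaults) did
reset random_page_cost;
reset seq_page_cost;
set enable_nestloop = off;
explain analyze select * from slow_report_view;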

So, for now I think we are going to have to modify the code to prepend the
problematic queries with this setting and hope the estimator is able to
better estimate this particular query in 8.3.

Thanks for the suggestions,

-Chris

