[PERFORM] Fwd: Stalled post to pgsql-performance
Hi Karl and Jeff, On 26 June 2017 at 22:22, Jeff Janes <jeff.ja...@gmail.com> wrote: > Be warned that "explain (analyze)" can substantially slow down and distort > this type of query, especially when sorting. You should run "explain > (analyze, timing off)" first, and then only trust "explain (analyze)" if > the overall execution times between them are similar. > Thanks, I didn't realise that. I will use TIMING OFF from now on. On 26 June 2017 at 21:32, Karl Czajkowski <kar...@isi.edu> wrote: > > I created the index starting with date and it did make a big > > difference: down to 10.3 seconds using a bitmap index scan and bitmap > > heap scan (and then two hash joins as before). > > By the way, what kind of machine are you using? CPU, RAM, backing > storage? > > I tried running your original test code and the query completed in > about 8 seconds, and adding the index changes and analyze statement > brought it down to around 2.3 seconds on my workstation with Postgres > 9.5.7. On an unrelated development VM with Postgres 9.6.3, the final > form took around 4 seconds. > This is very interesting. I'm using a powerful box: - HP ProLiant DL580 G7, Xeon(R) CPU E7- 4850 @ 2.00GHz * 80 cores, 128 GB RAM, hardware RAID, 3.6 TB SAS array. totalusedfree shared buff/cache available Mem: 125G2.2G834M 30G122G 91G Swap: 9.3G 98M9.2G And disk I/O is fast: $ dd if=/dev/zero of=/local/tmp/bigfile bs=1M count=100k 107374182400 bytes (107 GB) copied, 234.751 s, 457 MB/s But your question let me to investigate and discover that we were compiling Postgres with no optimisations! I've built a new one with -O2 and got the time down to 3.6 seconds (EXPLAIN with either TIMING OFF or BUFFERS, there's no material difference). And again, vacuum your tables. Heap fetches aren't cheap. > Sorry, I don't understand, why does VACUUM help on a table with no deleted rows? Do you mean ANALYZE? > > work_mem = 100MB > > Can you give it more than that? How many simultaneous connections do you > expect? > Yes, I can and it does help! By increasing work_mem to 200 MB, I managed to convert the external merge sort (on disk) to a quicksort in memory, and reached 3.3 seconds. The cartestian join is slightly faster at 3.0 seconds, but not enough to be worth the overhead of creating the join table. I still wish I understood why it helps. Jeff, thanks for the explanation about hash joins and sorting. I wish I understood why a hash join wouldn't preserve order in the first table even if it has to be done incrementally, since I expect that we'd still be reading records from the first table in order, but just in batches. Other possible rewrites to try instead of joins: > > -- replace the case statement with a scalar subquery > > -- replace the case statement with a stored procedure wrapping that > scalar subquery > and declare the procedure as STABLE or even IMMUTABLE > > These are shots in the dark, but seem easy enough to experiment with and > might > behave differently if the query planner realizes it can cache results for > repeated use of the same ~100 input values. I hit a jackpot with jsonb_object_agg, getting down to 2.1 seconds (2.8 with BUFFERS and TIMING <https://explain.depesz.com/s/uWyM>): explain (analyze, timing off) with metric as (select jsonb_object_agg(id, pos) AS metric_lookup from metric_pos), asset as (select jsonb_object_agg(id, pos) AS asset_lookup from asset_pos) SELECT metric_lookup->id_metric AS pos_metric, asset_lookup->id_asset AS pos_asset, date, value FROM metric_value, metric, asset WHERE date >= '2016-01-01' and date < '2016-06-01' AND timerange_transaction @> current_timestamp ORDER BY metric_value.id_metric, metric_value.id_asset, date; Which is awesome! Thank you so much for your help, both of you! Now if only we could make hash joins as fast as JSONB hash lookups :) Cheers, Chris.
Re: [PERFORM] Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each
till very happy for any ideas on how to get back the 2 seconds longer <https://explain.depesz.com/s/NgfZ> than it takes without any joins to the dimension tables (3.7 seconds), or explain why the cartesian join helps and/or how we can get the same speedup without materialising it. SELECT id_metric, id_asset, date, value FROM metric_value WHERE date >= '2016-01-01' and date < '2016-06-01' AND timerange_transaction @> current_timestamp ORDER BY date, metric_value.id_metric; Cheers, Chris.
[PERFORM] Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each
ly small integers. They would probably be temporary tables in our real use case. But removing them (and just selecting the IDs from metric_value) cuts 4 seconds off the query time (to 3.3 seconds). Why are they slow? If I remove one of the joins (asset_pos) then I get a merge join between two indexes, as expected, but it has a materialize just before it which makes no sense to me. Why do we need to materialize here? And why materialise 100 rows into 1.5 million rows? (explain.depesz.com <https://explain.depesz.com/s/7mkM>) SELECT metric_pos.pos AS pos_metric, id_asset AS pos_asset, date, value FROM metric_value INNER JOIN metric_pos ON metric_pos.id = metric_value.id_metric WHERE date >= '2016-01-01' and date < '2016-06-01' AND timerange_transaction @> current_timestamp ORDER BY metric_value.id_metric, metric_value.id_asset, date Merge Join (cost=0.70..209302.76 <070%202093%200276> rows=1520071 width=28) (actual time=0.097..4899.972 rows=152 loops=1) Merge Cond: (metric_value.id_metric = metric_pos.id) Buffers: shared hit=76403 -> Index Only Scan using idx_metric_value_id_metric_id_asset_date on metric_value (cost=0.56..182696.87 <056%201826%209687> rows=1520071 width=20) (actual time=0.074..3259.870 rows=152 lo ops=1) Index Cond: ((date >= '2016-01-01'::date) AND (date < '2016-06-01'::date)) Filter: (timerange_transaction @> now()) Heap Fetches: 152 Buffers: shared hit=76401 -> Materialize (cost=0.14..4.89 rows=100 width=8) (actual time=0.018..228.265 rows=1504801 loops=1) Buffers: shared hit=2 -> Index Only Scan using idx_metric_pos_id_pos on metric_pos (cost=0.14..4.64 rows=100 width=8) (actual time=0.013..0.133 rows=100 loops=1) Heap Fetches: 100 Buffers: shared hit=2 Planning time: 0.761 ms Execution time: 5253.260 ms The size of the result set is approximately 91 MB (measured with psql -c | wc -c). Why does it take 4 seconds to transfer this much data over a UNIX socket on the same box? Can it be made faster? The data is quite redundant (it's sorted for a start) so compression makes a big difference, and simple prefix elimination could probably reduce the volume of redundant data sent back to the client. Standard background info: - PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit, compiled from source. - shared_buffers = 15GB, work_mem = 100MB, seq_page_cost = 0.5, random_page_cost = 1.0, cpu_tuple_cost = 0.01. - HP ProLiant DL580 G7, Xeon(R) CPU E7- 4850 @ 2.00GHz * 80 cores, hardware RAID, 3.6 TB SAS array. Thanks again in advance for any suggestions, hints or questions. Cheers, Chris.
Re: [PERFORM] Big number of connections
Although somewhat academic, since pgboucer doesn’t support it (and might not ever), have a look at this ticket which, if it was ever supported, would give you what you needed: https://github.com/pgbouncer/pgbouncer/issues/75 <https://github.com/pgbouncer/pgbouncer/issues/75> > On Mar 31, 2016, at 15:47, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote: > >> If they are going to keep the client connections open, they would need to >> run pgbouncer in statement or transaction mode. > > As I understand, in pgbouncer you cannot have connections that serve > different users. If each individual requires its own database-level > user, pgbouncer would not help at all. > > I would look seriously into getting rid of the always-open requirement > for connections. — Chris Cogdon
[PERFORM] Adding a ROLLUP switches to GroupAggregate unexpectedly
Hi folks! I’ve a query where adding a rollup to the group by switches to GroupAggregate unexpectedly, where the standard GROUP BY uses HashAggregate. Since the rollup should only add one additional bucket, the switch to having to sort (and thus a to-disk temporary file) is very puzzling. This reads like a query optimiser bug to me. This is the first I’ve posted to the list, please forgive me if I’ve omitted any “before bugging the list” homework. Description: Adding a summary row by changing “GROUP BY x” into “GROUP BY ROLLUP (x)” should not cause a switch from HashAggregate to GroupAggregate Here’s the “explain” from the simple GROUP BY: projectdb=> explain analyze verbose SELECT error_code, count ( * ) FROM api_activities GROUP BY error_code; QUERY PLAN - HashAggregate (cost=3456930.11..3456930.16 rows=5 width=2) (actual time=26016.222..26016.223 rows=5 loops=1) Output: error_code, count(*) Group Key: api_activities.error_code -> Seq Scan on public.api_activities (cost=0.00..3317425.74 rows=27900874 width=2) (actual time=0.018..16232.608 rows=36224844 loops=1) Output: id, client_id, date_added, kind, activity, error_code Planning time: 0.098 ms Execution time: 26016.337 ms (7 rows) Changing this to a GROUP BY ROLLUP switches to GroupAggregate (with the corresponding to-disk temporary table being created): projectdb=> explain analyze verbose SELECT error_code, count ( * ) FROM api_activities GROUP BY rollup (error_code); QUERY PLAN --- GroupAggregate (cost=7149357.90..7358614.52 rows=6 width=2) (actual time=54271.725..82354.144 rows=6 loops=1) Output: error_code, count(*) Group Key: api_activities.error_code Group Key: () -> Sort (cost=7149357.90..7219110.09 rows=27900874 width=2) (actual time=54270.636..76651.121 rows=36222428 loops=1) Output: error_code Sort Key: api_activities.error_code Sort Method: external merge Disk: 424864kB -> Seq Scan on public.api_activities (cost=0.00..3317425.74 rows=27900874 width=2) (actual time=0.053..34282.239 rows=36222428 loops=1) Output: error_code Planning time: 2.611 ms Execution time: 82437.416 ms (12 rows) I’ve given the output of “EXPLAIN ANAYLZE VERBOSE” rather than non-analyze, but there was no difference in the plan. Running VACUUM FULL ANALYZE on this table makes no difference. Switching to Count(error_code) makes no difference. Using GROUP BY GROUPING SETS ((), error_code) makes no difference. I understand that a HashAggregate is possible only if it can fit all the aggregates into work_mem. There are 5 different error codes, and the statistics (from pg_stats) are showing that PG knows this. Adding just one more bucket for the “()” case should not cause a fallback to GroupAggregate. PostgreSQL version: 9.5.2 (just upgraded today, Thank you! <3 ) (Was exhibiting same problem under 9.5.0) How installed: apt-get package from apt.postgresql.org <http://apt.postgresql.org/> Settings differences: application_name: psql client_encoding: UTF8 DateStyle: ISO, MDY default_text_search_config: pg_catalog.english dynamic_shared_memory_type: posix lc_messages: en_US.UTF-8 lc_monetary: en_US.UTF-8 lc_numeric: en_US.UTF-8 lc_time: en_US.UTF-8 listen_addresses: * log_line_prefix: %t [%p-%c-%l][%a][%i][%e][%s][%x-%v] %q%u@%d log_timezone: UTC logging_collector: on max_connections: 100 max_stack_depth: 2MB port: 5432 shared_buffers: 1GB ssl: on ssl_cert_file: /etc/ssl/certs/ssl-cert-snakeoil.pem ssl_key_file: /etc/ssl/private/ssl-cert-snakeoil.key TimeZone: UTC work_mem: 128MB OS and Version: Ubuntu Trusty: Linux 3.13.0-66-generic #108-Ubuntu SMP Wed Oct 7 15:20:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Program used to connect: psql Nothing unusual in the logs, apart from the query indicating that it took a while to run. I know that there’s several workarounds I can use for this simple case, such as using a CTE, then doing a rollup on that, but I’m simply reporting what I think is a bug in the query optimizer. Thank you for your attention! Please let me know if there’s any additional information you need, or additional tests you’d like to run. — Chris Cogdon <ch...@cogdon.org <mailto:ch...@cogdon.org>> — Using PostgreSQL since 6.2!
[PERFORM] Building multiple indexes on one table.
Is there any way that I can build multiple indexes on one table without having to scan the table multiple times? For small tables, that's probably not an issue, but if I have a 500 GB table that I need to create 6 indexes on, I don't want to read that table 6 times. Nothing I could find in the manual other than reindex, but that's not helping, since it only rebuilds indexes that are already there and I don't know if that reads the table once or multiple times. If I could create indexes inactive and then run reindex, which then reads the table once, I would have a solution. But that doesn't seem to exist either. best regards, chris -- chris ruprecht database grunt and bit pusher extraordinaíre -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [GENERAL] Approach to Data Summary and Analysis
On Mon, Apr 14, 2014 at 12:27 PM, Robert DiFalco robert.difa...@gmail.comwrote: I have several related tables that represent a call state. Let's think of these as phone calls to simplify things. Sometimes I need to determine the last time a user was called, the last time a user answered a call, or the last time a user completed a call. The basic schema is something like this: CREATE TABLE calls ( id BIGINT NOT NULL, // sequence generator user_id BIGINT NOT NULL, called TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (id), FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE ); CREATE TABLE calls_answered ( idBIGINT NOT NULL, answered TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (id), FOREIGN KEY (id) REFERENCES calls(id) ON DELETE CASCADE ); And so on for calls_connected, calls_completed, call_errors, etc. Occasionally I will want to know things like When was the last time a user answered a call or How many times has a user been called. I can do these queries using a combination of MAX or COUNT. But I'm concerned about the performance. SELECT MAX(a.id) FROM calls_answered a JOIN calls c ON c.id = a.id WHERE c.user_id = ?; Or the number of answered calls: SELECT MAX(a.id) FROM calls_answered a JOIN calls c ON c.id = a.id WHERE c.user_id = ?; Sometimes I might want to get this data for a whole bunch of users. For example, give me all users whose have not answered a call in the last 5 days. Or even what percentage of users called actually answered a call. This approach could become a performance issue. So the other option is to create a call_summary table that is updated with triggers. The summary table would need fields like user_id, last_call_id, call_count, last_answered_id, answered_count, last_completed_id, last_completed_count, etc. My only issue with a summary table is that I don't want a bunch of null fields. For example, if the user was *called* but they have never *answered* at call then the last_call_id and call_count fields on the summary table would be non-NULL but the last_answer_id and answer_count fields WOULD be NULL. But over time all fields would eventually become non-NULL. So that leads me to a summary table for EACH call state. Each summary table would have a user id, a ref_id, and a count -- one summary table for each state e.g. call_summary, call_answered_summary, etc. This approach has the down side that it creates a lot of tables and triggers. It has the upside of being pretty efficient without having to deal with NULL values. It's also pretty easy to reason about. So for my question -- is the choice between these a personal preference sort of thing or is there a right or wrong approach? Am I missing another approach that would be better? I'm okay with SQL but I'm not expert so I'm not sure if there is an accepted DESIGN PATTERN for this that I am missing. Thanks! My initial thought is: that design is over-normalized. The thing you are trying to model is the call, and it has severl attributes, some of which may be unknown or not applicable (which is what NULL is for). So my thought would be to do something like this: CREATE TABLE calls ( id BIGINT NOT NULL, // sequence generator user_id BIGINT NOT NULL, called TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP, answered TIMESTAMPTZ PRIMARY KEY (id), FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE ); -- I asked the Internet how to train my cat, and the Internet told me to get a dog.
Re: [PERFORM] [GENERAL] Approach to Data Summary and Analysis
On Tue, Apr 15, 2014 at 10:56 AM, Chris Curvey ch...@chriscurvey.comwrote: On Mon, Apr 14, 2014 at 12:27 PM, Robert DiFalco robert.difa...@gmail.com wrote: I have several related tables that represent a call state. Let's think of these as phone calls to simplify things. Sometimes I need to determine the last time a user was called, the last time a user answered a call, or the last time a user completed a call. The basic schema is something like this: CREATE TABLE calls ( id BIGINT NOT NULL, // sequence generator user_id BIGINT NOT NULL, called TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (id), FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE ); CREATE TABLE calls_answered ( idBIGINT NOT NULL, answered TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (id), FOREIGN KEY (id) REFERENCES calls(id) ON DELETE CASCADE ); And so on for calls_connected, calls_completed, call_errors, etc. Occasionally I will want to know things like When was the last time a user answered a call or How many times has a user been called. I can do these queries using a combination of MAX or COUNT. But I'm concerned about the performance. SELECT MAX(a.id) FROM calls_answered a JOIN calls c ON c.id = a.id WHERE c.user_id = ?; Or the number of answered calls: SELECT MAX(a.id) FROM calls_answered a JOIN calls c ON c.id = a.id WHERE c.user_id = ?; Sometimes I might want to get this data for a whole bunch of users. For example, give me all users whose have not answered a call in the last 5 days. Or even what percentage of users called actually answered a call. This approach could become a performance issue. So the other option is to create a call_summary table that is updated with triggers. The summary table would need fields like user_id, last_call_id, call_count, last_answered_id, answered_count, last_completed_id, last_completed_count, etc. My only issue with a summary table is that I don't want a bunch of null fields. For example, if the user was *called* but they have never *answered* at call then the last_call_id and call_count fields on the summary table would be non-NULL but the last_answer_id and answer_count fields WOULD be NULL. But over time all fields would eventually become non-NULL. So that leads me to a summary table for EACH call state. Each summary table would have a user id, a ref_id, and a count -- one summary table for each state e.g. call_summary, call_answered_summary, etc. This approach has the down side that it creates a lot of tables and triggers. It has the upside of being pretty efficient without having to deal with NULL values. It's also pretty easy to reason about. So for my question -- is the choice between these a personal preference sort of thing or is there a right or wrong approach? Am I missing another approach that would be better? I'm okay with SQL but I'm not expert so I'm not sure if there is an accepted DESIGN PATTERN for this that I am missing. Thanks! (Sorry, fat-fingered and hit send too early...) CREATE TABLE calls ( id BIGINT NOT NULL, // sequence generator user_id BIGINT NOT NULL, called TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP, answered TIMESTAMPTZ NULL, connected TIMESTAMPTZ NULL, completed TIMESTAMPTZ NULL, PRIMARY KEY (id), FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE ); Then your queries end up looking like this: --last time john answered SELECT MAX(a.id) FROM calls where answered is not null and user_id = ? -- users that have not answered a call in the last five days (I can think of a few ways to interpret that phrase) select myusers.* from myusers where not exists ( select * from calls where calls.user_id = myusers.user_id and answered = five days ago) -- average ring time select avg(extract ('seconds' from called - answered)) where answered is not null -- I asked the Internet how to train my cat, and the Internet told me to get a dog.
[PERFORM] Using a window function in a view
I'm trying to create a view that uses a window function, but it seems that Postgres is apparently unable to optimize it. Here's a reproduction of my situation with 9.2.2: --- drop table if exists values cascade; create table values ( fkey1 integer not null, fkey2 integer not null, fkey3 integer not null, value float not null, constraint values_pkey primary key (fkey1, fkey2, fkey3) ); -- Kind of hacky, but it roughly resembles my dataset. insert into values select distinct on (fkey1, fkey2, fkey3) i / 12 + 1 as fkey1, i % 4 + 1 as fkey2, ceil(random() * 10) as fkey3, random() * 2 - 1 as value from generate_series(0, 19) i; create or replace view values_view as select fkey1, fkey3, (derived1 / max(derived1) over (partition by fkey1)) as derived1, (derived2 / sum(derived1) over (partition by fkey1)) as derived2 from ( select fkey1, fkey3, cast(sum((case when (value 0.0) then 4 else 1 end)) as double precision) as derived1, sum((case when (value 0.0) then (value * 4) else (value + 1) end)) as derived2 from values group by fkey1, fkey3 ) as t1; -- This query requires a sequential scan on values, though all the data it needs could be found much more efficiently with an index scan. explain analyze select * from values_view where fkey1 = 1263; --- Can anyone suggest a way to rewrite this query, or maybe a workaround of some kind? Thanks, Chris
[PERFORM] have: seq scan - want: index scan
Hi guys, PG = 9.1.5 OS = winDOS 2008R8 I have a table that currently has 207 million rows. there is a timestamp field that contains data. more data gets copied from another database into this database. How do I make this do an index scan instead? I did an analyze audittrailclinical to no avail. I tested different indexes - no same behavior. The query does this: SELECT audittrailclinical.pgid, audittrailclinical.timestamp, mmuser.logon, audittrailclinical.entityname, audittrailclinical.clinicalactivity, audittrailclinical.audittraileventcode, account.accountnumber, patient.dnsortpersonnumber FROM public.account, public.audittrailclinical, public.encounter, public.entity, public.mmuser, public.patient, public.patientaccount WHERE audittrailclinical.encountersid = encounter.encountersid and audittrailclinical.timestamp = '2008-01-01'::timestamp without time zone and audittrailclinical.timestamp = '2012-10-05'::timestamp without time zone AND encounter.practiceid = patient.practiceid AND encounter.patientid = patient.patientid AND encounter.staffid = patient.staffid AND entity.entitysid = audittrailclinical.entitysid AND mmuser.mmusersid = audittrailclinical.mmusersid AND patient.practiceid = patientaccount.practiceid AND patient.patientid = patientaccount.patientid AND patientaccount.accountsid = account.accountsid AND patientaccount.defaultaccount = 'Y' AND patient.dnsortpersonnumber = '347450' ; The query plan says: - Seq Scan on audittrailclinical (cost=0.00..8637598.76 rows=203856829 width=62) Filter: ((timestamp = '2008-01-01 00:00:00'::timestamp without time zone) AND (timestamp = '2012-10-05 00:00:00'::timestamp without time zone)) which takes forever. How do I make this do an index scan instead? I did an analyze audittrailclinical to no avail. the table definitions are (the createstamp field is empty - I know, bad data): CREATE TABLE audittrailclinical ( audittrailid text, audittraileventcode text, clinicalactivity text, eventsuccessful text, externalunique text, recordstamp timestamp without time zone, recorddescription text, encountersid integer, eventuserlogon text, computername text, applicationcode text, practiceid integer, mmusersid integer, entitysid integer, entityname text, timestamp timestamp without time zone, lastuser integer, createstamp timestamp without time zone, pgid bigint DEFAULT nextval(('bravepoint_seq'::text)::regclass) ) WITH ( OIDS=FALSE ); ALTER TABLE audittrailclinical OWNER TO intergy; GRANT ALL ON TABLE audittrailclinical TO intergy; GRANT SELECT ON TABLE audittrailclinical TO rb; -- Index: atc_en_time CREATE INDEX atc_en_time ON audittrailclinical USING btree (entitysid , timestamp ); -- Index: atc_id -- DROP INDEX atc_id; CREATE INDEX atc_id ON audittrailclinical USING btree (audittrailid COLLATE pg_catalog.default ); -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?
Hi guys, PG = 9.1.5 OS = winDOS 2008R8 I have a table that currently has 207 million rows. there is a timestamp field that contains data. more data gets copied from another database into this database. How do I make this do an index scan instead? I did an analyze audittrailclinical to no avail. I tested different indexes - no same behavior. The query does this: SELECT audittrailclinical.pgid, audittrailclinical.timestamp, mmuser.logon, audittrailclinical.entityname, audittrailclinical.clinicalactivity, audittrailclinical.audittraileventcode, account.accountnumber, patient.dnsortpersonnumber FROM public.account, public.audittrailclinical, public.encounter, public.entity, public.mmuser, public.patient, public.patientaccount WHERE audittrailclinical.encountersid = encounter.encountersid and audittrailclinical.timestamp = '2008-01-01'::timestamp without time zone and audittrailclinical.timestamp = '2012-10-05'::timestamp without time zone AND encounter.practiceid = patient.practiceid AND encounter.patientid = patient.patientid AND encounter.staffid = patient.staffid AND entity.entitysid = audittrailclinical.entitysid AND mmuser.mmusersid = audittrailclinical.mmusersid AND patient.practiceid = patientaccount.practiceid AND patient.patientid = patientaccount.patientid AND patientaccount.accountsid = account.accountsid AND patientaccount.defaultaccount = 'Y' AND patient.dnsortpersonnumber = '347450' ; The query plan says: - Seq Scan on audittrailclinical (cost=0.00..8637598.76 rows=203856829 width=62) Filter: ((timestamp = '2008-01-01 00:00:00'::timestamp without time zone) AND (timestamp = '2012-10-05 00:00:00'::timestamp without time zone)) which takes forever. How do I make this do an index scan instead? I did an analyze audittrailclinical to no avail. the table definitions are (the createstamp field is empty - I know, bad data): CREATE TABLE audittrailclinical ( audittrailid text, audittraileventcode text, clinicalactivity text, eventsuccessful text, externalunique text, recordstamp timestamp without time zone, recorddescription text, encountersid integer, eventuserlogon text, computername text, applicationcode text, practiceid integer, mmusersid integer, entitysid integer, entityname text, timestamp timestamp without time zone, lastuser integer, createstamp timestamp without time zone, pgid bigint DEFAULT nextval(('bravepoint_seq'::text)::regclass) ) WITH ( OIDS=FALSE ); ALTER TABLE audittrailclinical OWNER TO intergy; GRANT ALL ON TABLE audittrailclinical TO intergy; GRANT SELECT ON TABLE audittrailclinical TO rb; -- Index: atc_en_time CREATE INDEX atc_en_time ON audittrailclinical USING btree (entitysid , timestamp ); -- Index: atc_id -- DROP INDEX atc_id; CREATE INDEX atc_id ON audittrailclinical USING btree (audittrailid COLLATE pg_catalog.default ); -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?
On Oct 16, 2012, at 20:01 , Evgeny Shishkin itparan...@gmail.com wrote: Selecting 5 yours of data is not selective at all, so postgres decides it is cheaper to do seqscan. Do you have an index on patient.dnsortpersonnumber? Can you post a result from select count(*) from patient where dnsortpersonnumber = '347450'; ? Yes, there is an index: Aggregate (cost=6427.06..6427.07 rows=1 width=0) - Index Scan using patient_pracsortpatientnumber on patient (cost=0.00..6427.06 rows=1 width=0) Index Cond: (dnsortpersonnumber = '347450'::text) In fact, all the other criteria is picked using an index. I fear that the = and = on the timestamp is causing the issue. If I do a = of just one of them, I get an index scan. But I need to scan the entire range. I get queries like give me everything that was entered into the system for this patient between these two dates. A single date wouldn't work. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?
Thanks Bruce, I have, and I even thought, I understood it :). I just ran an explain analyze on another table - and ever since the query plan changed. It's now using the index as expected. I guess, I have some more reading to do. On Oct 16, 2012, at 20:31 , Bruce Momjian br...@momjian.us wrote: Have you read our FAQ on this matter? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Re: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Daniel Farina-4 wrote On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer lt;ringerc@.idgt; wrote: 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; I have seen in trivial cases -- in terms of data size -- where TRUNCATE is much slower than a full-table DELETE. The most common use case for that is rapid setup/teardown of tests, where it can add up quite quickly and in a very big way. This is probably an artifact the speed of one's file system to truncate and/or unlink everything. I haven't tried a multi-truncate though. Still, I don't know a mechanism besides slow file system truncation time that would explain why DELETE would be significantly faster. -- fdr -- Sent via pgsql-performance mailing list (pgsql-performance@) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance That's my experience - I have a set of regression tests that clean the database (deletes everything from a single parent table and lets the referential integrity checks cascade to delete five other tables) at the end of each test run, and it can complete 90 tests (including 90 mass deletes) in a little over five seconds. If I replace that simple delete with a truncation of all six tables at once, my test run balloons to 42 seconds. I run my development database with synchronous_commit = off, though, so I guess TRUNCATE has to hit the disk while the mass delete doesn't. -- View this message in context: http://postgresql.1045698.n5.nabble.com/PostgreSQL-db-30-tables-with-number-of-rows-100-not-huge-the-fastest-way-to-clean-each-non-empty-tab-tp5715643p5715734.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Select from sequence in slow query log
While investigating some performance issues I have been looking at slow queries logged to the postgresql.log file. A strange thing that I have seen is a series of apparently very slow queries that just select from a sequence. It is as if access to a sequence is blocked for many sessions and then released as I get log entries like this appearing: LOG: duration: 23702.553 ms execute unnamed: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval LOG: duration: 23673.068 ms execute unnamed: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval LOG: duration: 23632.729 ms execute unnamed: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval (Many similar lines) LOG: duration: 3055.057 ms execute unnamed: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval LOG: duration: 2377.621 ms execute unnamed: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval LOG: duration: 743.732 ms execute unnamed: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval The code is being executed via Hibernate, but using Session.createSQLQuery(), so the SQL above appears in the source as above (minus the comment) and not as part of any ORM magic. We are using Postgresql 9.0. This seems very strange to me. What could cause a sequence to be locked for such a long time? The sequence in question has cache set at 1. Would setting this higher make any difference? Thanks Chris
Re: [PERFORM] Select from sequence in slow query log
It looks like this effect only occurs in the middle of the night when there is some kind of automated dump process going on and the system is under higher than normal load. I haven't managed to reproduce them outside of production, but since these oddities don't seem to show up during normal operations, I'm not worrying too much about them now. Thanks Chris On 1 June 2012 14:47, Tom Lane t...@sss.pgh.pa.us wrote: Chris Rimmer chr...@we7.com writes: While investigating some performance issues I have been looking at slow queries logged to the postgresql.log file. A strange thing that I have seen is a series of apparently very slow queries that just select from a sequence. It is as if access to a sequence is blocked for many sessions and then released as I get log entries like this appearing: LOG: duration: 23702.553 ms execute unnamed: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval LOG: duration: 23673.068 ms execute unnamed: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval LOG: duration: 23632.729 ms execute unnamed: /* dynamic native SQL query */ select nextval ('my_sequence') as nextval (Many similar lines) That's pretty weird. What else is being done to that sequence? Is it only the sequence ops that are slow, or does this happen at times when everything else is slow too? Can you create a reproducible test case? regards, tom lane
[PERFORM] Hardware advice for scalable warehouse db
Hi list, My employer will be donated a NetApp FAS 3040 SAN [1] and we want to run our warehouse DB on it. The pg9.0 DB currently comprises ~1.5TB of tables, 200GB of indexes, and grows ~5%/month. The DB is not update critical, but undergoes larger read and insert operations frequently. My employer is a university with little funds and we have to find a cheap way to scale for the next 3 years, so the SAN seems a good chance to us. We are now looking for the remaining server parts to maximize DB performance with costs = $4000. I digged out the following configuration with the discount we receive from Dell: 1 x Intel Xeon X5670, 6C, 2.93GHz, 12M Cache 16 GB (4x4GB) Low Volt DDR3 1066Mhz PERC H700 SAS RAID controller 4 x 300 GB 10k SAS 6Gbps 2.5 in RAID 10 I was thinking to put the WAL and the indexes on the local disks, and the rest on the SAN. If funds allow, we might downgrade the disks to SATA and add a 50 GB SATA SSD for the WAL (SAS/SATA mixup not possible). Any comments on the configuration? Any experiences with iSCSI vs. Fibre Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a cheap alternative how to connect as many as 16 x 2TB disks as DAS? Thanks so much! Best, Chris [1]: http://www.b2net.co.uk/netapp/fas3000.pdf -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hardware advice for scalable warehouse db
Hi list, Thanks a lot for your very helpful feedback! I've tested MD1000, MD1200, and MD1220 arrays before, and always gotten seriously good performance relative to the dollars spent Great hint, but I'm afraid that's too expensive for us. But it's a great way to scale over the years, I'll keep that in mind. I had a look at other server vendors who offer 4U servers with slots for 16 disks for 4k in total (w/o disks), maybe that's an even cheaper/better solution for us. If you had the choice between 16 x 2TB SATA vs. a server with some SSDs for WAL/indexes and a SAN (with SATA disk) for data, what would you choose performance-wise? Again, thanks so much for your help. Best, Chris -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Benchmarking a large server
I've got a fun problem. My employer just purchased some new db servers that are very large. The specs on them are: 4 Intel X7550 CPU's (32 physical cores, HT turned off) 1 TB Ram 1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a raid 10) 3TB Sas Array (48 15K 146GB spindles) The issue we are running into is how do we benchmark this server, specifically, how do we get valid benchmarks for the Fusion IO card? Normally to eliminate the cache effect, you run iozone and other benchmark suites at 2x the ram. However, we can't do that due to 2TB 1.3TB. So, does anyone have any suggestions/experiences in benchmarking storage when the storage is smaller then 2x memory? Thanks, Chris
[PERFORM] Multiple index builds on same table - in one sweep?
I have a table that I need to rebuild indexes on from time to time (records get loaded before indexes get build). To build the indexes, I use 'create index ...', which reads the entire table and builds the index, one at a time. I'm wondering if there is a way to build these indexes in parallel while reading the table only once for all indexes and building them all at the same time. Is there an index build tool that I missed somehow, that can do this? Thanks, Chris. best regards, chris -- chris ruprecht database grunt and bit pusher extraordinaíre -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Multiple index builds on same table - in one sweep?
I'm running 2 tests now, one, where I'm doing the traditional indexing, in sequence. The server isn't doing anything else, so I should get pretty accurate results. Test 2 will win all the create index sessions in separate sessions in parallel (echo create index ...|psql ... ) once the 'serial build' test is done. Maybe, in a future release, somebody will develop something that can create indexes as inactive and have a build tool build and activate them at the same time. Food for thought? On Apr 9, 2011, at 13:10 , Tom Lane wrote: Chris Ruprecht ch...@ruprecht.org writes: I have a table that I need to rebuild indexes on from time to time (records get loaded before indexes get build). To build the indexes, I use 'create index ...', which reads the entire table and builds the index, one at a time. I'm wondering if there is a way to build these indexes in parallel while reading the table only once for all indexes and building them all at the same time. Is there an index build tool that I missed somehow, that can do this? I don't know of any automated tool, but if you launch several CREATE INDEX operations on the same table at approximately the same time (in separate sessions), they should share the I/O required to read the table. (The synchronized scans feature guarantees this in recent PG releases, even if you're not very careful about starting them at the same time.) The downside of that is that you need N times the working memory and you will have N times the subsidiary I/O for sort temp files and writes to the finished indexes. Depending on the characteristics of your I/O system it's not hard to imagine this being a net loss ... but it'd be interesting to experiment. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] good old VACUUM FULL
On 23/03/11 11:52, felix wrote: I posted many weeks ago about a severe problem with a table that was obviously bloated and was stunningly slow. Up to 70 seconds just to get a row count on 300k rows. I removed the text column, so it really was just a few columns of fixed data. Still very bloated. Table size was 450M The advice I was given was to do CLUSTER, but this did not reduce the table size in the least. Nor performance. Also to resize my free space map (which still does need to be done). Since that involves tweaking the kernel settings, taking the site down and rebooting postgres and exposing the system to all kinds of risks and unknowns and expensive experimentations I was unable to do it and have had to hobble along with a slow table in my backend holding up jobs. Much swearing that nobody should ever do VACUUM FULL. Manual advises against it. Only crazy people do that. snip moral of the story: if your table is really bloated, just do VACUUM FULL You'll need to reindex that table now - vacuum full can bloat your indexes which will affect your other queries. reindex table fastadder_fastadderstatus; -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints
robertmh...@gmail.com (Robert Haas) writes: On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Well, I'm comfortable digging in my heels against doing *lame* hints just because it's what all the other kids are doing, which I think is the only thing which would have satisfied the OP on this thread. From both on-list posts and ones exchanged off-list with me, it seems he was stubbornly resistant to properly tuning the server to see if any problems remained, or posting particular problems to see how they would be most effectively handled in PostgreSQL. We obviously can't be drawn into dumb approaches because of ill-informed demands like that. Nor was I proposing any such thing. But that doesn't make we don't want hints an accurate statement. Despite the impression that OP went away with, the real situation is a lot more nuanced than that, and the statement on the Todo list gives the wrong impression, IMHO. I have added the following comment to the ToDo: We are not interested to implement hints in ways they are commonly implemented on other databases, and proposals based on because they've got them will not be welcomed. If you have an idea that avoids the problems that have been observed with other hint systems, that could lead to valuable discussion. That seems to me to characterize the nuance. -- let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];; http://www3.sympatico.ca/cbbrowne/languages.html If only women came with pull-down menus and online help. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
gnuo...@rcn.com writes: Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And it has to because the Big Boys already do so, to some extent, and they've realized that the BCNF schema on such machines is supremely efficient. PG/MySql/OSEngineOfChoice will get left behind simply because the efficiency offered will be worth the price. I know this is far from trivial, and my C skills are such that I can offer no help. These machines have been the obvious current machine in waiting for at least 5 years, and those applications which benefit from parallelism (servers of all kinds, in particular) will filter out the winners and losers based on exploiting this parallelism. Much as it pains me to say it, but the MicroSoft approach to software: write to the next generation processor and force users to upgrade, will be the winning strategy for database engines. There's just way too much to gain. I'm not sure how true that is, really. (e.g. - too much to gain.) I know that Jan Wieck and I have been bouncing thoughts on valid use of threading off each other for *years*, now, and it tends to be interesting but difficult to the point of impracticality. But how things play out are quite fundamentally different for different usage models. It's useful to cross items off the list, so we're left with the tough ones that are actually a problem. 1. For instance, OLTP applications, that generate a lot of concurrent connections, already do perfectly well in scaling on multi-core systems. Each connection is a separate process, and that already harnesses multi-core systems perfectly well. Things have improved a lot over the last 10 years, and there may yet be further improvements to be found, but it seems pretty reasonable to me to say that the OLTP scenario can be treated as solved in this context. The scenario where I can squint and see value in trying to multithread is the contrast to that, of OLAP. The case where we only use a single core, today, is where there's only a single connection, and a single query, running. But that can reasonably be further constrained; not every single-connection query could be improved by trying to spread work across cores. We need to add some further assumptions: 2. The query needs to NOT be I/O-bound. If it's I/O bound, then your system is waiting for the data to come off disk, rather than to do processing of that data. That condition can be somewhat further strengthened... It further needs to be a query where multi-processing would not increase the I/O burden. Between those two assumptions, that cuts the scope of usefulness to a very considerable degree. And if we *are* multiprocessing, we introduce several new problems, each of which is quite troublesome: - How do we decompose the query so that the pieces are processed in ways that improve processing time? In effect, how to generate a parallel query plan? It would be more than stupid to consider this to be obvious. We've got 15-ish years worth of query optimization efforts that have gone into Postgres, and many of those changes were not obvious until after they got thought through carefully. This multiplies the complexity, and opportunity for error. - Coordinating processing Becomes quite a bit more complex. Multiple threads/processes are accessing parts of the same data concurrently, so a parallelized query that harnesses 8 CPUs might generate 8x as many locks and analogous coordination points. - Platform specificity Threading is a problem in that each OS platform has its own implementation, and even when they claim to conform to common standards, they still have somewhat different interpretations. This tends to go in one of the following directions: a) You have to pick one platform to do threading on. Oops. There's now PostgreSQL-Linux, that is the only platform where our multiprocessing thing works. It could be worse than that; it might work on a particular version of a particular OS... b) You follow some apparently portable threading standard And find that things are hugely buggy because the platforms follow the standard a bit differently. And perhaps this means that, analogous to a), you've got a set of platforms where this works (for some value of works), and others where it can't. That's almost as evil as a). c) You follow some apparently portable threading standard And need to wrap things in a pretty thick safety blanket to make sure it is compatible with all the bugs in interpretation and implementation. Complexity++, and performance probably suffers. None of these are particularly palatable, which is why threading proposals get a lot of pushback. At the end of the day, if this is
Re: [HACKERS] [PERFORM] Slow count(*) again...
mladen.gog...@vmsinfo.com (Mladen Gogala) writes: Hints are not even that complicated to program. The SQL parser should compile the list of hints into a table and optimizer should check whether any of the applicable access methods exist in the table. If it does - use it. If not, ignore it. This looks to me like a philosophical issue, not a programming issue. It's worth looking back to what has already been elaborated on in the ToDo. http://wiki.postgresql.org/wiki/Todo --- Optimizer hints (not wanted) Optimizer hints are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have the problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. --- The complaint is that kludging hints into a particular query attacks the problem from the wrong direction. The alternative recommended is to collect some declarative information, that *won't* be part of the query, that *won't* be processed by the parser, and that *won't* kludge up the query with information that is liable to turn into crud over time. Tom Lane was pretty specific about some kinds of declarative information that seemed useful: http://archives.postgresql.org/pgsql-hackers/2006-10/msg00663.php On Jeapordy, participants are expected to phrase one's answers in the form of a question, and doing so is rewarded. Based on the presence of query hints on the Not Wanted portion of the ToDo list, it's pretty clear that participants here are expected to propose optimizer hints in ways that do NOT involve decorating queries with crud. You'll get a vastly friendlier response if you at least make an attempt to attack the problem in the declarative information fashion. Perhaps we're all wrong in believing that pushing query optimization information into application queries by decorating the application with hints, is the right idea but it's a belief that certainly seems to be regularly agreed upon by gentle readers. -- cbbrowne,@,linuxdatabases.info The people's revolutionary committee has decided that the name e is retrogressive, unmulticious and reactionary, and has been flushed. Please update your abbrevs. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [HACKERS] [PERFORM] Slow count(*) again...
mladen.gog...@vmsinfo.com (Mladen Gogala) writes: I must say that this purist attitude is extremely surprising to me. All the major DB vendors support optimizer hints, yet in the Postgres community, they are considered bad with almost religious fervor. Postgres community is quite unique with the fatwa against hints. Well, the community declines to add hints until there is actual consensus on a good way to add hints. Nobody has ever proposed a way to add hints where consensus was arrived at that the way was good, so... -- http://linuxfinances.info/info/nonrdbms.html Rules of the Evil Overlord #192. If I appoint someone as my consort, I will not subsequently inform her that she is being replaced by a younger, more attractive woman. http://www.eviloverlord.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] the XID question
kevin.gritt...@wicourts.gov (Kevin Grittner) writes: Filip Rembia*kowskiplk.zu...@gmail.com wrote: 2011/1/19 Charles.Hou giveme...@gmail.com: select * from mybook SQL command also increase the XID ? Yes. Single SELECT is a transaction. Hence, it needs a transaction ID. No, not in recent versions of PostgreSQL. There's virtual transaction ID, too; which is all that's needed unless the transaction writes something. Also, as a fine point, if you use explicit database transactions (with BEGIN or START TRANSACTION) then you normally get one XID for the entire transaction, unless you use SAVEPOINTs. Erm, not *necessarily* in recent versions of PostgreSQL. A read-only transaction won't consume XIDs, but if you don't expressly declare it read-only, they're still liable to get eaten... -- (format nil ~S@~S cbbrowne gmail.com) http://www3.sympatico.ca/cbbrowne/lisp.html Parenthesize to avoid ambiguity. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] COPY TO stdout statements occurrence in log files
msakre...@truviso.com (Maciek Sakrejda) writes: Is this normal? I'm afraid because my application doesn't run this kind of statement, so how can I know what is doing these commands? Maybe pg_dump? I think pg_dump is likely, yes, if you have that scheduled. I don't think anything in the log file will identify it as pg_dump explicitly (I believe as far as the server is concerned, pg_dump is just another client), but if you're concerned about this, you can add the client pid (%p) to log_line_prefix in postgresql.conf, log the pg_dump pid through whatever mechanism manages that, and compare. That's an option... More are possible... 1. Our DBAs have been known to create users specifically for doing backups (dumpy). It doesn't seem like a *huge* proliferation of users to have some 'utility' user names for common processes. 2. In 9.1, there will be a new answer, as there's a GUC to indicate the application_name. -- Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning. -- Rich Cook -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] best db schema for time series data?
vindex+lists-pgsql-performa...@apartia.org (Louis-David Mitterrand) writes: On Tue, Nov 16, 2010 at 11:35:24AM -0500, Chris Browne wrote: vindex+lists-pgsql-performa...@apartia.org (Louis-David Mitterrand) writes: I have to collect lots of prices from web sites and keep track of their changes. What is the best option? 1) one 'price' row per price change: create table price ( id_price primary key, id_product integer references product, price integer ); 2) a single 'price' row containing all the changes: create table price ( id_price primary key, id_product integer references product, price integer[] -- prices are 'pushed' on this array as they change ); Which is bound to give the best performance, knowing I will often need to access the latest and next-to-latest prices? I'd definitely bias towards #1, but with a bit of a change... create table product ( id_product serial primary key ); create table price ( id_product integer references product, as_at timestamptz default now(), primary key (id_product, as_at), price integer ); Hi Chris, So an id_price serial on the price table is not necessary in your opinion? I am using order by id_price limit X or max(id_price) to get at the most recent prices. It (id_price) is an extra piece of information that doesn't reveal an important fact, namely when the price was added. I'm uncomfortable with adding data that doesn't provide much more information, and it troubles me when people put a lot of interpretation into the meanings of SERIAL columns. I'd like to set up some schemas (for experiment, if not necessarily to get deployed to production) where I'd use DCE UUID values rather than sequences, so that people wouldn't make the error of imagining meanings in the values that aren't really there. And I suppose that there lies a way to think about it... If you used UUIDs rather than SERIAL, how would your application break? And of the ways in which it would break, which of those are errors that fall from: a) Ignorant usage, assuming order that isn't really there? (e.g. - a SERIAL might capture some order information, but UUID won't!) b) Inadequate data capture, where you're using the implicit data collection from SERIAL to capture, poorly, information that should be expressly captured? When I added the timestamp to the price table, that's intended to address b), capturing the time that the price was added. The query to get the last 5 prices for a product should be splendidly efficient: select price, as_at from price where id_product = 17 order by as_at desc limit 5; (That'll use the PK index perfectly nicely.) If you needed higher performance, for latest price, then I'd add a secondary table, and use triggers to copy latest price into place: create table latest_prices ( id_product integer primary key references product, price integer ); I did the same thing with a 'price_dispatch' trigger and partitioned tables (inheritance). It's definitely needed when the price database grow into the millions. Thanks, The conversations are always interesting! Cheers! -- output = (cbbrowne @ gmail.com) http://www3.sympatico.ca/cbbrowne/x.html FLORIDA: If you think we can't vote, wait till you see us drive. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] best db schema for time series data?
vindex+lists-pgsql-performa...@apartia.org (Louis-David Mitterrand) writes: I have to collect lots of prices from web sites and keep track of their changes. What is the best option? 1) one 'price' row per price change: create table price ( id_price primary key, id_product integer references product, price integer ); 2) a single 'price' row containing all the changes: create table price ( id_price primary key, id_product integer references product, price integer[] -- prices are 'pushed' on this array as they change ); Which is bound to give the best performance, knowing I will often need to access the latest and next-to-latest prices? I'd definitely bias towards #1, but with a bit of a change... create table product ( id_product serial primary key ); create table price ( id_product integer references product, as_at timestamptz default now(), primary key (id_product, as_at), price integer ); The query to get the last 5 prices for a product should be splendidly efficient: select price, as_at from price where id_product = 17 order by as_at desc limit 5; (That'll use the PK index perfectly nicely.) If you needed higher performance, for latest price, then I'd add a secondary table, and use triggers to copy latest price into place: create table latest_prices ( id_product integer primary key references product, price integer ); create or replace function capture_latest_price () returns trigger as $$ declare begin delete from latest_prices where id_product = NEW.id_product; insert into latest_prices (id_product,price) values (NEW.id_product, NEW.price); return NEW; end $$ language plpgsql; create trigger price_capture after insert on price execute procedure capture_latest_price(); This captures *just* the latest price for each product. (There's a bit of race condition - if there are two concurrent price updates, one will fail, which wouldn't happen without this trigger in place.) -- ... Turns out that JPG was in fact using his brain... and I am inclined to encourage him to continue the practice even if it isn't exactly what I would have done myself. -- Alan Bawden (way out of context) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences
gentosa...@gmail.com (A B) writes: If you just wanted PostgreSQL to go as fast as possible WITHOUT any care for your data (you accept 100% dataloss and datacorruption if any error should occur), what settings should you use then? Use /dev/null. It is web scale, and there are good tutorials. But seriously, there *are* cases where blind speed is of use. When loading data into a fresh database is a good time for this; if things fall over, it may be pretty acceptable to start from scratch with mkfs/initdb. I'd: - turn off fsync - turn off synchronous commit - put as much as possible onto Ramdisk/tmpfs/similar as possible -- output = reverse(moc.liamg @ enworbbc) http://linuxfinances.info/info/lsf.html 43% of all statistics are worthless. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
sgend...@ideasculptor.com (Samuel Gendler) writes: Geez. I wish someone would have written something quite so bold as 'xfs is always faster than ext3' in the standard tuning docs. I couldn't find anything that made a strong filesystem recommendation. How does xfs compare to ext4? I wound up on ext4 on a dell perc6 raid card when an unexpected hardware failure on a production system caused my test system to get thrown into production before I could do any serious testing of xfs. If there is a strong consensus that xfs is simply better, I could afford the downtime to switch. It's news to me (in this thread!) that XFS is actually getting some developer love, which is a pretty crucial factor to considering it relevant. XFS was an SGI creation, and, with: a) the not-scintillating performance of the company, b) the lack of a lot of visible work going into the filesystem, c) the paucity of support by Linux vendors (for a long time, if you told RHAT you were having problems, and were using XFS, the next step would be to park the ticket awaiting your installing a supported filesystem) it didn't look like XFS was a terribly good bet. Those issues were certainly causing concern a couple of years ago. Faster raw performance isn't much good if it comes with a risk of: - Losing data - Losing support from vendors If XFS now *is* getting support from both the development and support perspectives, then the above concerns may have been invalidated. It would be very encouraging, if so. -- output = (cbbrowne @ gmail.com) Rules of the Evil Overlord #228. If the hero claims he wishes to confess in public or to me personally, I will remind him that a notarized deposition will serve just as well. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] large dataset with write vs read clients
cr...@postnewspapers.com.au (Craig Ringer) writes: Hey, maybe I should try posting YouTube video answers to a few questions for kicks, see how people react ;-) And make sure it uses the same voice as is used in the MongoDB is web scale video, to ensure that people interpret it correctly :-). -- output = (cbbrowne @ gmail.com) http://linuxdatabases.info/info/nonrdbms.html The *Worst* Things to Say to a Police Officer: Hey, is that a 9 mm? That's nothing compared to this .44 magnum. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] large dataset with write vs read clients
mladen.gog...@vmsinfo.com (Mladen Gogala) writes: I have a logical problem with asynchronous commit. The commit command should instruct the database to make the outcome of the transaction permanent. The application should wait to see whether the commit was successful or not. Asynchronous behavior in the commit statement breaks the ACID rules and should not be used in a RDBMS system. If you don't need ACID, you may not need RDBMS at all. You may try with MongoDB. MongoDB is web scale: http://www.youtube.com/watch?v=b2F-DItXtZs The client always has the option of connecting to a set of databases, and stowing parts of the data hither and thither. That often leads to the relaxation called BASE. (And IBM has been selling that relaxation as MQ-Series since the early '90s!) There often *ARE* cases where it is acceptable for some of the data to not be as durable, because that data is readily reconstructed. This is particularly common for calculated/cached/aggregated data. Many things can get relaxed for a data warehouse data store, where the database is not authoritative, but rather aggregates data drawn from other authoritative sources. In such applications, neither the A, C, I, nor the D are pointedly crucial, in the DW data store. - We don't put the original foreign key constraints into the DW database; they don't need to be enforced a second time. Ditto for constraints of all sorts. - Batching of the loading of updates is likely to break several of the letters. And I find it *quite* acceptable to lose D if the data may be safely reloaded into the DW database. I don't think this is either cavalier nor that it points to MongoDB is web scale. -- cbbrowne,@,gmail.com Rules of the Evil Overlord #181. I will decree that all hay be shipped in tightly-packed bales. Any wagonload of loose hay attempting to pass through a checkpoint will be set on fire. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] read only transactions
jnelson+pg...@jamponi.net (Jon Nelson) writes: Are there any performance implications (benefits) to executing queries in a transaction where SET TRANSACTION READ ONLY; has been executed? Directly? No. Indirectly, well, a *leetle* bit... Transactions done READ ONLY do not generate actual XIDs, which reduces the amount of XID generation (pretty tautological!), which reduces the need to do VACUUM to protect against XID wraparound. http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html#VACUUM-BASICS If you process 50 million transactions, that chews thru 50 million XIDs. If 45 million of those were processed via READ ONLY transactions, then the same processing only chews thru 5 million XIDs, meaning that the XID-relevant vacuums can be done rather less frequently. This only terribly much matters if: a) your database is so large that there are tables on which VACUUM would run for a very long time, and b) you are chewing through XIDs mighty quickly. If either condition isn't true, then the indirect effect isn't important either. -- let name=cbbrowne and tld=gmail.com in name ^ @ ^ tld;; I'm not switching from slrn. I'm quite confident that anything that *needs* to be posted in HTML is fatuous garbage not worth my time. -- David M. Cook davec...@home.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] GPU Accelerated Sorting
david_l...@boreham.org (David Boreham) writes: Feels like I fell through a worm hole in space/time, back to inmos in 1987, and a guy from marketing has just walked in the office going on about there's a customer who wants to use our massively parallel hardware to speed up databases... ... As long as you're willing to rewrite PostgreSQL in Occam 2... -- http://projects.cs.kent.ac.uk/projects/tock/trac/ The statistics on sanity are that one out of every four Americans is suffering from some form of mental illness. Think of your three best friends. If they're okay, then it's you. -- Rita Mae Brown -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Testing Sandforce SSD
g...@2ndquadrant.com (Greg Smith) writes: Yeb Havinga wrote: * What filesystem to use on the SSD? To minimize writes and maximize chance for seeing errors I'd choose ext2 here. I don't consider there to be any reason to deploy any part of a PostgreSQL database on ext2. The potential for downtime if the fsck doesn't happen automatically far outweighs the minimal performance advantage you'll actually see in real applications. Ah, but if the goal is to try to torture the SSD as cruelly as possible, these aren't necessarily downsides (important or otherwise). I don't think ext2 helps much in maximizing chances of seeing errors in notably useful ways, as the extra torture that takes place as part of the post-remount fsck isn't notably PG-relevant. (It's not obvious that errors encountered would be readily mapped to issues relating to PostgreSQL.) I think the WAL-oriented test would be *way* more useful; inducing work whose brokenness can be measured in one series of files in one directory should be way easier than trying to find changes across a whole PG cluster. I don't expect the filesystem choice to be terribly significant to that. -- cbbrowne,@,gmail.com Heuristics (from the French heure, hour) limit the amount of time spent executing something. [When using heuristics] it shouldn't take longer than an hour to do something. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Testing Sandforce SSD
j...@commandprompt.com (Joshua D. Drake) writes: On Sat, 2010-07-24 at 16:21 -0400, Greg Smith wrote: Greg Smith wrote: Note that not all of the Sandforce drives include a capacitor; I hope you got one that does! I wasn't aware any of the SF drives with a capacitor on them were even shipping yet, all of the ones I'd seen were the chipset that doesn't include one still. Haven't checked in a few weeks though. Answer my own question here: the drive Yeb got was the brand spanking new OCZ Vertex 2 Pro, selling for $649 at Newegg for example: http://www.newegg.com/Product/Product.aspx?Item=N82E16820227535 and with the supercacitor listed right in the main production specifications there. This is officially the first inexpensive (relatively) SSD with a battery-backed write cache built into it. If Yeb's test results prove it works as it's supposed to under PostgreSQL, I'll be happy to finally have a moderately priced SSD I can recommend to people for database use. And I fear I'll be out of excuses to avoid buying one as a toy for my home system. That is quite the toy. I can get 4 SATA-II with RAID Controller, with battery backed cache, for the same price or less :P Sure, but it: - Fits into a single slot - Is quiet - Consumes little power - Generates little heat - Is likely to be about as quick as the 4-drive array It doesn't have the extra 4TB of storage, but if you're building big-ish databases, metrics have to change anyways. This is a pretty slick answer for the small OLTP server. -- output = reverse(moc.liamg @ enworbbc) http://linuxfinances.info/info/postgresql.html Chaotic Evil means never having to say you're sorry. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] planner index choice
Hi there, I have a simple query where I don't understand the planner's choice to use a particular index. The main table looks like this: # \d sq_ast_attr_val Table public.sq_ast_attr_val Column| Type | Modifiers -+---+-- assetid | character varying(15) | not null attrid | integer | not null contextid | integer | not null default 0 custom_val | text | use_default | character(1) | not null default '1'::bpchar Indexes: ast_attr_val_pk PRIMARY KEY, btree (assetid, attrid, contextid) sq_ast_attr_val_assetid btree (assetid) sq_ast_attr_val_attrid btree (attrid) sq_ast_attr_val_concat btree (((assetid::text || '~'::text) || attrid)) sq_ast_attr_val_contextid btree (contextid) The query: SELECT assetid, custom_val FROM sq_ast_attr_val WHERE attrid IN (SELECT attrid FROM sq_ast_attr WHERE name = 'is_contextable' AND (type_code = 'metadata_field_select' OR owning_type_code = 'metadata_field')) AND contextid = 0 INTERSECT SELECT assetid, custom_val FROM sq_ast_attr_val WHERE assetid = '62321' AND contextid = 0; The explain analyze plan: http://explain.depesz.com/s/nWs I'm not sure why it's picking the sq_ast_attr_val_contextid index to do the contextid = 0 check, the other parts (attrid/assetid) are much more selective. If I drop that particular index: http://explain.depesz.com/s/zp All (I hope) relevant postgres info: Centos 5.5 x86_64 running pg8.4.4. Server has 8gig memory. # select name, setting, source from pg_settings where name in ('shared_buffers', 'effective_cache_size', 'work_mem'); name | setting --+ shared_buffers| 262144 effective_cache_size | 655360 work_mem | 32768 All planner options are enabled: # select name, setting, source from pg_settings where name like 'enable_%'; name| setting | source ---+-+- enable_bitmapscan | on | default enable_hashagg| on | default enable_hashjoin | on | default enable_indexscan | on | default enable_mergejoin | on | default enable_nestloop | on | default enable_seqscan| on | default enable_sort | on | default enable_tidscan| on | default Any insights welcome - thanks! -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL as a local in-memory cache
swamp...@noao.edu (Steve Wampler) writes: Or does losing WAL files mandate a new initdb? Losing WAL would mandate initdb, so I'd think this all fits into the set of stuff worth putting onto ramfs/tmpfs. Certainly it'll all be significant to the performance focus. -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://cbbrowne.com/info/internet.html MS apparently now has a team dedicated to tracking problems with Linux and publicizing them. I guess eventually they'll figure out this back fires... ;) -- William Burrow aa...@delete.fan.nb.ca -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] stats collector suddenly causing lots of IO
I have a lot of centos servers which are running postgres. Postgres isn't used that heavily on any of them, but lately, the stats collector process keeps causing tons of IO load. It seems to happen only on servers with centos 5. The versions of postgres that are running are: 8.1.18 8.2.6 8.3.1 8.3.5 8.3.6 8.3.7 8.3.8 8.3.9 8.4.2 8.4.3 I've tried turning off everything under RUNTIME STATISTICS in postgresql.conf except track_counts (since auto vacuum says it needs it), but it seems to have little affect on the IO caused by the stats collector. Has anyone else noticed this? Have there been recent kernel changes that could cause this that anyone knows about? Since we haven't touched postgres on these boxes since they were setup initially, I'm a bit baffled as to what might be causing the problem, and why I can't make it go away short of kill -STOP. Any suggestions would be much appreciated! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] stats collector suddenly causing lots of IO
I have a lot of centos servers which are running postgres. Postgres isn't used that heavily on any of them, but lately, the stats collector process keeps causing tons of IO load. It seems to happen only on servers with centos 5. The versions of postgres that are running are: 8.1.18 8.2.6 8.3.1 8.3.5 8.3.6 8.3.7 8.3.8 8.3.9 8.4.2 8.4.3 I've tried turning off everything under RUNTIME STATISTICS in postgresql.conf except track_counts (since auto vacuum says it needs it), but it seems to have little affect on the IO caused by the stats collector. Has anyone else noticed this? Have there been recent kernel changes that could cause this that anyone knows about? Since we haven't touched postgres on these boxes since they were setup initially, I'm a bit baffled as to what might be causing the problem, and why I can't make it go away short of kill -STOP. Any suggestions would be much appreciated! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] significant slow down with various LIMIT
I'm also wondering if a re-clustering of the table would work based on the index that's used. such that: CLUSTER core_object USING plugins_plugin_addr_oid_id; and see if that makes any change in the differences that your seeing. On 04/13/2010 02:24 PM, Kevin Grittner wrote: norn andrey.perl...@gmail.com wrote: I am wondering why there are so big gap between two limits and how to avoid this... I think we've already established that it is because of the percentage of the table which must be scanned to get to the desired number of rows. The problem is exacerbated by the fact that it's a backward scan on the index, which is slower than a forward scan -- mainly because disks spin in one direction, and the spacing of the sectors is optimized for forward scans. There are a couple things to try which will give a more complete picture of what might work to make the run time more predictable. Please try these, and run EXPLAIN ANALYZE of your problem query each way. (1) Try it without the ORDER BY clause and the LIMIT. (2) Temporarily take that top index out of consideration. (Don't worry, it'll come back when you issue the ROLLBACK -- just don't forget the BEGIN statement.) BEGIN; DROP INDEX plugins_plugin_addr_oid_id; explain analyze your query ROLLBACK; (3) Try it like this (untested, so you may need to fix it up): explain analyze SELECT core_object.id from (SELECT id, city_id FROM plugins_guide_address) plugins_guide_address JOIN plugins_plugin_addr ON (plugins_plugin_addr.address_id = plugins_guide_address.id) JOIN core_object ON (core_object.id = plugins_plugin_addr.oid_id) WHERE plugins_guide_address.city_id = 4535 ORDER BY core_object.id DESC LIMIT 4 -- or whatever it normally takes to cause the problem ; -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] mysql to postgresql, performance questions
reeds...@rice.edu (Ross J. Reedstrom) writes: http://www.mythtv.org/wiki/PostgreSQL_Support That's a pretty hostile presentation... The page has had two states: a) In 2008, someone wrote up... After some bad experiences with MySQL (data loss by commercial power failure, very bad performance deleting old records and more) I would prefer to have a MythTV Application option to use PostgreSQL. I never saw such bad database behaviour at any other RDBMS than MySQL. I'm ready to contribute at any activity going that direction (I'm developer for commercial database applications). b) Deleted by GBee in 2009, indicating (Outdated, messy and unsupported) -- let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];; http://linuxfinances.info/info/spreadsheets.html A language that doesn't affect the way you think about programming, is not worth knowing. -- Alan J. Perlis -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] mysql to postgresql, performance questions
t...@sss.pgh.pa.us (Tom Lane) writes: Ross J. Reedstrom reeds...@rice.edu writes: On Sat, Mar 20, 2010 at 10:47:30PM -0500, Andy Colson wrote: (I added the and trust as an after thought, because I do have one very important 100% uptime required mysql database that is running. Its my MythTV box at home, and I have to ask permission from my GF before I take the box down to upgrade anything. And heaven forbid if it crashes or anything. So I do have experience with care and feeding of mysql. And no, I'm not kidding.) Andy, you are so me! I have the exact same one-and-only-one mission critical mysql DB, but the gatekeeper is my wife. And experience with that instance has made me love and trust PostgreSQL even more. So has anyone looked at porting MythTV to PG? It has come up several times on the MythTV list. http://david.hardeman.nu/files/patches/mythtv/mythletter.txt http://www.mythtv.org/pipermail/mythtv-dev/2004-August/025385.html http://www.mythtv.org/pipermail/mythtv-users/2006-July/141191.html Probably worth asking David Härdeman and Danny Brow who have proposed such to the MythTV community what happened. (It's possible that they will get cc'ed on this.) If there's a meaningful way to help, that would be cool. If not, then we might as well not run slipshot across the same landmines that blew the idea up before. -- Transported to a surreal landscape, a young girl kills the first woman she meets and then teams up with three complete strangers to kill again. -- Unknown, Marin County newspaper's TV listing for _The Wizard of Oz_ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Is DBLINK transactional
cr...@postnewspapers.com.au (Craig Ringer) writes: On 13/03/2010 5:54 AM, Jeff Davis wrote: On Fri, 2010-03-12 at 12:07 -0500, Merlin Moncure wrote: of course. You can always explicitly open a transaction on the remote side over dblink, do work, and commit it at the last possible moment. Your transactions aren't perfectly synchronized...if you crash in the precise moment between committing the remote and the local you can get in trouble. The chances of this are extremely remote though. If you want a better guarantee than that, consider using 2PC. Translation in case you don't know: 2PC = two phase commit. Note that you have to monitor lost transactions that were prepared for commit then abandoned by the controlling app and periodically get rid of them or you'll start having issues. There can be issues even if they're not abandoned... Note that prepared transactions establish, and maintain, until removed, all the appropriate locks on the underlying tables and tuples. As a consequence, maintenance-related activities may be somewhat surprisingly affected. foo=# begin; set transaction isolation level serializable; BEGIN SET foo=# insert into my_table (date_time, hostname, duration, diag) values (now(), 'foo', 1, 2); INSERT 0 1 foo=# prepare transaction 'foo'; PREPARE TRANSACTION [then, I quit the psql session...] foo=# select * from pg_locks where relation = (select oid from pg_class where relname = 'my_table'); -[ RECORD 1 ]--+- locktype | relation database | 308021 relation | 308380 page | tuple | virtualxid | transactionid | classid| objid | objsubid | virtualtransaction | -1/433653 pid| mode | RowExclusiveLock granted| t If I try to truncate the table... foo=# truncate my_table; [hangs, waiting on the lock...] [looking at another session...] foo=# select * from pg_locks where relation = (select oid from pg_class where relname = 'my_table'); -[ RECORD 1 ]--+ locktype | relation database | 308021 relation | 308380 page | tuple | virtualxid | transactionid | classid| objid | objsubid | virtualtransaction | -1/433653 pid| mode | RowExclusiveLock granted| t -[ RECORD 2 ]--+ locktype | relation database | 308021 relation | 308380 page | tuple | virtualxid | transactionid | classid| objid | objsubid | virtualtransaction | 2/13 pid| 3749 mode | AccessExclusiveLock granted| f Immediately upon submitting commit prepared 'foo';, both locks are resolved quite quickly. The problem with things that are extremely remote possibilities are that they tend to be less remote than we expect ;) ... and they know just when they can happen despite all the odds to maximise the pain and chaos caused. A lot of these kinds of things only come up as race conditions. The trouble is that a lot of races do wind up synchronizing themselves. In sporting events, this is intended and desired; an official fires the starter pistol or activates the horn, or what have you, with the intended result that athletes begin very nearly simultaneously. And at the end of Olympic races, their times frequently differ only by miniscule intervals. In my example up above, there's a possibly unexpected synchronization point; the interweaving of the PREPARE TRANSACTION and TRUNCATE requests lead to a complete lock against the table. Supposing 15 processes then try accessing that table, they'll be blocked until the existing locks get closed out. Which takes place the very instant after the COMMIT PREPARED request comes in. At that moment, 15 racers are released very nearly simultaneously. If there is any further mischief to be had in the race, well, they're set up to tickle it... -- let name=cbbrowne and tld=gmail.com in name ^ @ ^ tld;; http://linuxdatabases.info/info/nonrdbms.html Barf, what is all this prissy pedantry? Groups, modules, rings, ufds, patent-office algebra. Barf! -- R. William Gosper -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [HACKERS] full text search index scan query plan changed in 8.4.2?
Josh Berkus wrote: Xufei, List changed to psql-performance, which is where this discussion belongs. I am testing the index used by full text search recently. I have install 8.3.9 and 8.4.2 separately. In 8.3.9, the query plan is like: postgres=# explain SELECT s.name as source , t.name as target FROM element as s, element as t WHERE to_tsvector('testcfg',s.name) @@ to_tsquery('testcfg',replace(t.name,':','|')); QUERY PLAN Nested Loop (cost=0.01..259.92 rows=491 width=18) - Seq Scan on element t (cost=0.00..13.01 rows=701 width=9) - Index Scan using element_ftsidx_test on element s (cost=0.01..0.33 rows=1 width=9) Index Cond: (to_tsvector('testcfg'::regconfig, (s.name)::text) @@ to_tsquery('testcfg'::regconfig, replace((t.name)::text, ':'::text, '|'::text))) (4 rows) I have index: element_ftsidx_test gin (to_tsvector('testcfg'::regconfig, name::text)) The same index and query in 8.4.2: postgres=# explain SELECT s.name as source , t.name as target FROM element as s, element as t WHERE to_tsvector('testcfg',s.name) @@ to_tsquery('testcfg',replace(t.name,':','|')) ;QUERY PLAN -- Nested Loop (cost=0.32..3123.51 rows=2457 width=18) - Seq Scan on element t (cost=0.00..13.01 rows=701 width=9) - Bitmap Heap Scan on element s (cost=0.32..4.36 rows=4 width=9) Recheck Cond: (to_tsvector('testcfg'::regconfig, (s.name)::text) @@ to_tsquery('testcfg'::regconfig, replace((t.name)::text, ':'::text, '|'::text))) - Bitmap Index Scan on element_ftsidx_test (cost=0.00..0.32 rows=4 width=0) Index Cond: (to_tsvector('testcfg'::regconfig, (s.name)::text) @@ to_tsquery('testcfg'::regconfig, replace((t.name)::text, ':'::text, '|'::text))) (6 rows) Why the query plans are different and why? Thanks! Because the row estimates changed, since 8.4 improved row estimation for TSearch. The 2nd query is probably actually faster, no? If not, you may need to increase your stats collection. Or at least show us a VACUUM ANALYZE. I'm sure you mean explain analyze :) -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] How to troubleshoot high mem usage by postgres?
Hi, I'm having an issue where a postgres process is taking too much memory when performing many consecutive inserts and updates from a PHP script (running on the command line). I would like to know what sort of logging I can turn on to help me determine what is causing memory to be consumed and not released. Most PHP scripts are not long-running and properly releasing the resources using the provided functions in the pgsql PHP extension is not necessary. However since I do have a long-running script, I have taken steps to ensure everything is being properly released when it is no longer needed (I am calling the functions provided, but I don't know if the pgsql extension is doing the right thing). In spite of this, the longer the script runs and processes records, the more memory increases. It increases to the point that system memory is exhausted and it starts swapping. I killed the process at this point. I monitored the memory with top. here are the results.. the first is 10 seconds after my script started running. The second is about 26 seconds. PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ DATA COMMAND 17461 postgres 16 0 572m 405m 14m S 20.0 10.7 0:10.65 422m postmaster 17460 root 15 0 136m 14m 4632 S 10.6 0.4 0:06.16 10m php 17462 postgres 15 0 193m 46m 3936 D 3.3 1.2 0:01.77 43m postmaster PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ DATA COMMAND 17461 postgres 16 0 1196m 980m 17m S 19.0 26.0 0:25.72 1.0g postmaster 17460 root 15 0 136m 14m 4632 R 10.3 0.4 0:14.31 10m php 17462 postgres 16 0 255m 107m 3984 R 3.0 2.9 0:04.19 105m postmaster If I am indeed doing everything I can to release the resources (and I'm 95% sure I am) then it looks like the pgsql extension is at fault here. Regardless of who/what is at fault, I need to fix it. And to do that I need to find out what isn't getting released properly. How would I go about that? Thanks, Chris -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to troubleshoot high mem usage by postgres?
On Sat, Feb 27, 2010 at 3:38 PM, Ben Chobot be...@silentmedia.com wrote: In your postgresql.conf file, what are the settings for work_mem and shared_buffers? I have not done any tuning on this db yet (it is a dev box). It is using defaults. shared_buffers = 32MB #work_mem = 1MB I do appreciate the several quick responses and I will work on responding to the them. @Craig Ringer: select version() reports: PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit The system has 4GB of RAM. The postgres log currently does not show any useful information. Only thing in there for today is an Unexpected EOF on client connection because I killed the process after it started swapping. The test input for my PHP script is a csv file with about 450,000 records in it. The php script processes the each csv record in a transaction, and on average it executes 2 insert or update statements per record. I don't think the specific statements executed are relevant (they are just basic INSERT and UPDATE statements). I will try to come up with a short script that reproduces the problem. @Tom Lane: As I mentioned above I am not doing everything in a single transaction. However I do want to try your suggestion regarding getting a memory context map. But I'm afraid I don't know how to do what you are describing. How can I set the ulimit of postmaster? And does the postmaster stderr output go to the postgres log file? If not, where can I find it? Thanks again, Chris -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to troubleshoot high mem usage by postgres?
On Sat, Feb 27, 2010 at 3:38 PM, Craig Ringer cr...@postnewspapers.com.au wrote: Can you boil this down to a simple PHP test-case that connects to a dummy database and repeats something that causes the backend to grow in memory usage? Trying to do this - by progressively cutting things out of your test until it stops growing - will help you track down what, exactly, is causing the growth. Thank you for your suggestion. I have done this, and in doing so I have also discovered why this problem is occurring. My application uses a class that abstracts away the db interaction, so I do not normally use the pg_* functions directly. Any time any statement was executed, it created a new named prepared statement. I wrongly assumed that calling pg_free_result() on the statement resource would free this prepared statement inside of postgres. I will simply modify the class to use an empty statement name if there is no need for it to be named (which I actually need very infrequently anyway). I have attached the script I created to test with, for those who are interested. The first line of the script has the connection string. I used a db called testdb. run from the command line with: php -f test3.php Note my comment in the php file UNCOMMENT THIS LINE AND MEMORY ISSUE IS FIXED Thanks for the help everyone. Chris attachment: test3.php -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] database size growing continously
Hi Peter, Sounds like you're experiencing index bloat and vacuums do nothing to help that. You can do one of 2 thing to remedy this: 1) The fastest and simplest (but most disruptive) way is to use REINDEX. But this will exclusively lock the table while rebuilding the indexes: REINDEX TABLE phaseangle; 2) The slower but less disruptive way is to do a concurrent build of each index and then drop the old ones. For example, to rebuild the i index: CREATE INDEX CONCURRENTLY i_new ON phaseangle (indexed columns); DROP INDEX i; ALTER INDEX i_new RENAME TO i; ANALYZE phaseangle (indexed columns); Do this regularly to keep the index sizes in check. - Chris Peter Meszaros wrote: Hi All, I use postgresql 8.3.7 as a huge queue. There is a very simple table with six columns and two indices, and about 6 million records are written into it in every day continously commited every 10 seconds from 8 clients. The table stores approximately 120 million records, because a cron job daily deletes those ones are older than 20 day. Autovacuum is on and every settings is the factory default except some unrelated ones (listen address, authorization). But my database is growing, characteristically ~600MByte/day, but sometimes much slower (eg. 10MB, or even 0!!!). I've also tried a test on another server running the same postgresql, where 300 million record was loaded into a freshly created database, and 25 million was deleted with single DELETE command. The 'vacuum verbose phaseangle;' command seems to be running forever for hours: phasor=# vacuum VERBOSE phaseangle; INFO: vacuuming public.phaseangle INFO: scanned index i to remove 2796006 row versions DETAIL: CPU 9.49s/120.30u sec elapsed 224.20 sec. INFO: scanned index t to remove 2796006 row versions DETAIL: CPU 13.57s/105.70u sec elapsed 192.71 sec. INFO: phaseangle: removed 2796006 row versions in 24748 pages DETAIL: CPU 0.65s/0.30u sec elapsed 39.97 sec. INFO: scanned index i to remove 2795924 row versions DETAIL: CPU 9.58s/121.63u sec elapsed 239.06 sec. INFO: scanned index t to remove 2795924 row versions DETAIL: CPU 13.10s/103.59u sec elapsed 190.84 sec. INFO: phaseangle: removed 2795924 row versions in 24743 pages DETAIL: CPU 0.68s/0.28u sec elapsed 40.21 sec. INFO: scanned index i to remove 2796014 row versions DETAIL: CPU 9.65s/117.28u sec elapsed 231.92 sec. INFO: scanned index t to remove 2796014 row versions DETAIL: CPU 13.48s/103.59u sec elapsed 194.49 sec. INFO: phaseangle: removed 2796014 row versions in 24774 pages DETAIL: CPU 0.69s/0.28u sec elapsed 40.26 sec. INFO: scanned index i to remove 2795935 row versions DETAIL: CPU 9.55s/119.02u sec elapsed 226.85 sec. INFO: scanned index t to remove 2795935 row versions DETAIL: CPU 13.09s/102.84u sec elapsed 194.74 sec. INFO: phaseangle: removed 2795935 row versions in 25097 pages DETAIL: CPU 0.67s/0.28u sec elapsed 41.21 sec. still running... These are the very same problems? Should I delete mor frequently in smaller chunks? It seems to have a limit... Thanks Peter -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Databases vs Schemas
On Fri, Oct 9, 2009 at 11:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Carey sc...@richrelevance.com writes: I've got 200,000 tables in one db (8.4), and some tools barely work. The system catalogs get inefficient when large and psql especially has trouble. Tab completion takes forever, even if I make a schema s with one table in it and type s. and try and tab complete -- its as if its scanning all without a schema qualifier or using an index. The tab-completion queries have never been vetted for performance particularly :-( Just out of curiosity, how much does this help? alter function pg_table_is_visible(oid) cost 10; (You'll need to do it as superuser --- if it makes things worse, just set the cost back to 1.) Sometimes it does not match valid tables at all, and sometimes regex matching fails too ('\dt schema.*_*_*' intermittently flakes out if it returns a lot of matches). There are some arbitrary LIMIT 1000 clauses in those queries, which probably explains this ... but taking them out would likely cause libreadline to get indigestion ... regards, tom lane We ran into this exact situation with a pg 8.3 database and a very large number of tables. psql would wait for 20 to 30 seconds if the user was unlucky enough to hit the tab key. After doing some research with query logging, explain analyze and some trial and error, we came to the same conclusion. Altering the cost for the pg_table_is_visible function to 10 fixed our performance problem immediately. It appears that when the cost was set to 1, that the query optimizer first ran the function over the entire pg_class table. By increasing the cost, it now only runs the function over the rows returned by the other items in the where clause. -chris
Re: [PERFORM] improving my query plan
Kevin Kempter wrote: Hi all; I have a simple query against two very large tables ( 800million rows in theurl_hits_category_jt table and 9.2 million in the url_hits_klk1 table ) I have indexes on the join columns and I've run an explain. also I've set the default statistics to 250 for both join columns. I get a very high overall query cost: If you had an extra where condition it might be different, but you're just returning results from both tables that match up so doing a sequential scan is going to be the fastest way anyway. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
FW: [PERFORM] Performance 8.4.0
The database is 8gb currently. Use to be a lot bigger but we removed all large objects out and developed a file server storage for it, and using default page costs for 8.4, I did have it changed in 8.1.4 -Original Message- From: Robert Haas [mailto:robertmh...@gmail.com] Sent: Sunday, 2 August 2009 11:26 PM To: Chris Dunn Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Performance 8.4.0 On Fri, Jul 31, 2009 at 12:22 AM, Chris Dunnchris.d...@bigredsky.com wrote: constraint_exclusion = on This is critical if you need it, but a waste of CPU time if you don't. Other than that your paramaters look good. Are you using the default page cost settings? I see you have 12 GB RAM; how big is your database? ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Greenplum MapReduce
Suvankar Roy wrote: Hi all, Has anybody worked on Greenplum MapReduce programming ? It's a commercial product, you need to contact greenplum. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] load / stress testing
Hi, Everyone says load test using your app - out of interest how does everyone do that at the database level? I've tried playr (https://area51.myyearbook.com/trac.cgi/wiki/Playr) but haven't been able to get it working properly. I'm not sure what other tools are available. TIA. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Performance 8.4.0
Hi, I would like to know if my configuration is ok, We run a web application with high transaction rate and the database machine on Mondays / Tuesdays is always at 100% CPU with no IO/Wait . the machine is a Dual Xeon Quad core, 12gb RAM, 4gb/s Fibre Channel on Netapp SAN, with pg_xlog on separate Lun, Could you please provide some feedback on the configuration maintenance_work_mem = 704MB constraint_exclusion = on checkpoint_completion_target = 0.9 effective_cache_size = 8GB work_mem = 72MB wal_buffers = 8MB checkpoint_segments = 16 shared_buffers = 2816MB max_connections = 32 I have limited connections down to 32 as if I put up higher the machine load average goes through the roof and will decrease performance even more. In the process of looking at a 4 x AMD 6 core Opteron machine with 32GB Ram to replace if I cannot get any more performance out of this machine Kind Regards Christopher Dunn
Re: [PERFORM] Will Postgres ever lock with read only queries?
Robert James wrote: Thanks for the replies. I'm running Postgres 8.2 on Windows XP, Intel Core Duo (though Postgres seems to use only one 1 core). A single query can only use one core, but it will use both if multiple queries come in. The queries are self joins on very large tables, with lots of nested loops. If you want help optimizing them, you'll need to send through - explain analyze - table definitions and of course - the query itself -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] More speed counting rows
Developer wrote: Hello, I am trying to optimize the count of files when I am using filters (select by some row/s parameter/s) In this case I think that postgresql really count all files. Resulting in unacceptable times of 4 seconds in http server response. Triggers+store in this case do not see very acceptable, because I need store 1.5 millions of counting possibilities. My question is: Any method for indirect count like ordered indexes + quadratic count? Any module? Any suggestion? I had a similar problem where HTTP requests triggered a count(*) over a table that was growing rapidly. The bigger the table got, the longer the count took. In my case, however, the counts only have to be a reasonable estimate of the current state, so I solved this problem with a count_sums table that gets updated every 30 minutes using a simple perl script in a cron job. The HTTP requests now trigger a very fast select from a tiny, 9 row, 2 column table. How up to date do the counts need to be? If the count takes 4 seconds, can you run it every minute and store the counts in a table for retrieval by the HTTP requests? Or does it absolutely have to be the exact count at the moment of the request? If it needs to be more real-time, you could expand on this by adding post insert/delete triggers that automatically update the counts table to keep it current. In my case it just wasn't necessary. - Chris -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Will Postgres ever lock with read only queries?
Robert James wrote: Hi. I'm seeing some weird behavior in Postgres. I'm running read only queries (SELECT that is - no UPDATE or DELETE or INSERT is happening at all). I can run one rather complicated query and the results come back... eventually. Likewise with another. But, when I run both queries at the same time, Postgres seems to ground to a halt. Neither one completes. In fact, pgAdmin locks up - I need to cancel them using psql. I'd expect this from MySQL but not Postgres. Am I doing something wrong? Or missing something? They're probably not blocking each other but more likely you're exhausting your servers resources. If they return eventually individually, then running both at the same time will take at least eventually x2. As Mark said, what are the queries? What postgres version? What o/s? What are your hardware specs (how much memory, disk speeds/types etc)? -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Master/Slave, DB separation or just spend $$$?
kelv...@gmail.com (Kelvin Quee) writes: I will go look at Slony now. It's worth looking at, but it is not always to be assumed that replication will necessarily improve scalability of applications; it's not a magic wand to wave such that presto, it's all faster! Replication is helpful from a performance standpoint if there is a lot of query load where it is permissible to look at *somewhat* out of date information. For instance, replication can be quite helpful for pushing load off for processing accounting data where you tend to be doing analysis on data from {yesterday, last week, last month, last year}, and where the data tends to be inherently temporal (e.g. - you're looking at transactions with dates on them). On the other hand, any process that anticipates *writing* to the master database will be more or less risky to try to shift over to a possibly-somewhat-behind 'slave' system, as will be anything that needs to be consistent with the master state. -- (reverse (concatenate 'string ofni.secnanifxunil @ enworbbc)) http://www3.sympatico.ca/cbbrowne/spiritual.html Nondeterminism means never having to say you're wrong. -- Unknown -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Can Postgres use an INDEX over an OR?
Віталій Тимчишин wrote: 2009/7/20 Robert James srobertja...@gmail.com mailto:srobertja...@gmail.com Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I do WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't. Why is this so? It's not clever enough. Of course it is. I'm running 8.3.7. create table t1(id int primary key); insert into t1(id) select a from generate_series(1, 50) as s(a); analyze t1; explain analyze select * from t1 where id=5000 or id=25937; QUERY PLAN -- Bitmap Heap Scan on t1 (cost=8.60..16.44 rows=2 width=4) (actual time=0.077..0.083 rows=2 loops=1) Recheck Cond: ((id = 5000) OR (id = 25937)) - BitmapOr (cost=8.60..8.60 rows=2 width=0) (actual time=0.063..0.063 rows=0 loops=1) - Bitmap Index Scan on t1_pkey (cost=0.00..4.30 rows=1 width=0) (actual time=0.034..0.034 rows=1 loops=1) Index Cond: (id = 5000) - Bitmap Index Scan on t1_pkey (cost=0.00..4.30 rows=1 width=0) (actual time=0.021..0.021 rows=1 loops=1) Index Cond: (id = 25937) Total runtime: 0.153 ms (8 rows) What Robert didn't post was his query, see http://archives.postgresql.org/pgsql-general/2009-07/msg00767.php which makes it a lot harder to 'optimize' since they aren't straight forward conditions. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How would you store read/unread topic status?
Mathieu Nebra wrote: Alexander Staubo a écrit : On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebramate...@siteduzero.com wrote: This flags table has more or less the following fields: UserID - TopicID - LastReadAnswerID We are doing pretty much same thing. My problem is that everytime a user READS a topic, it UPDATES this flags table to remember he has read it. This leads to multiple updates at the same time on the same table, and an update can take a few seconds. This is not acceptable for my users. First of all, and I'm sure you thought of this, an update isn't needed every time a user reads a topic; only when there are new answers that need to be marked as read. So an update ... where last_read_answer_id ? should avoid the need for an update. We don't work that way. We just remember he has read these answers and then we can tell him there are no new messages for you to read. So we just need to write what he has read when he reads it. (That said, I believe PostgreSQL diffs tuple updates, so in practice PostgreSQL might not be writing anything if you run an update with the same value. I will let someone more intimate with the internal details of updates to comment on this.) Secondly, an update should not take a few seconds. You might want to investigate this part before you turn to further optimizations. Yes, I know there is a problem but I don't know if I am competent enough to tune PostgreSQL for that. It can take a while to understand the problem, and I'm not sure I'll have the time for that. I am, however, opened to suggestions. Maybe I'm doing something wrong somewhere. In our application we defer the updates to a separate asynchronous process using a simple queue mechanism, but in our case, we found that the updates are fast enough (in the order of a few milliseconds) not to warrant batching them into single transactions. A few milliseconds would be cool. In fact, defering to another process is a good idea, but I'm not sure if it is easy to implement. It would be great to have some sort of UPDATE ... LOW PRIORITY to make the request non blocking. Thanks. I use pg_send_query() http://ca2.php.net/manual/en/function.pg-send-query.php in php to achieve this for a views counter. Script execution is not blocked while the queries are executing. It looks like this may just be a direct translation of PQsendQuery() from libpq. Your preferred language may have a function like this.
[PERFORM] tsvector_update_trigger performance?
Is tsvector_update_trigger() smart enough to not bother updating a tsvector if the text in that column has not changed? If not, can I make my own update trigger with something like if new.description != old.description return tsvector_update_trigger('fti_all', 'pg_catalog.english', 'title', 'keywords', 'description'); else return new; or do I need to do it from scratch? I'm seeing very high cpu load on my database server and my current theory is that some of the triggers may be causing it.
Re: [PERFORM] tsvector_update_trigger performance?
Dimitri Fontaine wrote: Hi, Le 24 juin 09 à 18:29, Alvaro Herrera a écrit : Oleg Bartunov wrote: On Wed, 24 Jun 2009, Chris St Denis wrote: Is tsvector_update_trigger() smart enough to not bother updating a tsvector if the text in that column has not changed? no, you should do check yourself. There are several examples in mailing lists. Or you could try using the supress_redundant_updates_trigger() function that has been included in 8.4 (should be easy to backport) http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/backports/min_update/ http://blog.tapoueh.org/projects.html#sec9 But it won't handle the case where some other random column has changed, but the UPDATE is not affecting the text indexed... Tho this looks useful for some things, it doesn't solve my specific problem any. But thanks for the suggestion anyway. This sounds like something that should just be on by default, not a trigger. Is there some reason it would waste the io of writing a new row to disk if nothing has changed? or is it just considered too much unnecessary overhead to compare them? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] superlative missuse
cl...@uah.es (Angel Alvarez) writes: more optimal plan... morreoptimal configuration... we suffer a 'more optimal' superlative missuse there is not so 'more optimal' thing but a simple 'better' thing. im not native english speaker but i think it still applies. If I wanted to be pedantic about it, I'd say that the word nearly is missing. That is, it would be strictly correct if one instead said more nearly optimal. I don't imagine people get too terribly confused by the lack of the word nearly, so I nearly don't care :-). -- select 'cbbrowne' || '@' || 'acm.org'; http://linuxfinances.info/info/languages.html Bureaucracies interpret communication as damage and route around it -- Jamie Zawinski -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: Hi Craig, yes, you detailed very well the problem! :-) all those CHAR columns are so just due historical issues :-) as well they may contains anything else and not only numbers, that's why.. Also, all data inside are fixed, so VARCHAR will not save place, or what kind of performance issue may we expect with CHAR vs VARCHAR if all data have a fixed length?.. None in postgres, but the char/varchar thing may or may not bite you at some point later - sounds like you have it covered though. It's 2 times faster on InnoDB, and as it's just a SELECT query no need to go in transaction details :-) Total runtime: 1.442 ms (10 rows) You posted a query that's taking 2/1000's of a second. I don't really see a performance problem here :) -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] I have a fusion IO drive available for testing
craig_ja...@emolecules.com (Craig James) writes: Dave Cramer wrote: So I tried writing directly to the device, gets around 250MB/s, reads at around 500MB/s The client is using redhat so xfs is not an option. I'm using Red Hat and XFS, and have been for years. Why is XFS not an option with Red Hat? If you report practically any kind of problem, and you're using XFS, or JFS, or such, their support offering is to tell you to use a supported filesystem. -- let name=cbbrowne and tld=cbbrowne.com in name ^ @ ^ tld;; http://linuxdatabases.info/info/linuxxian.html The only thing better than TV with the sound off is Radio with the sound off. -- Dave Moon -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] scheduling autovacuum at lean hours only.
mallah.raj...@gmail.com (Rajesh Kumar Mallah) writes: why is it not a good idea to give end users control over when they want to run it ? It's not a particularly good idea to give end users things that they are likely then to *immediately* use to shoot themselves in the foot. Turning off vacuuming all day is the sort of thing that is indeed pretty certain to hurt you when you imagined it was going to help you. In particular, if you shut off autovac all day, heavily updated tables with certain sorts of (pretty common!) update patterns are certain to bloat up, to the point that you'll need to do CLUSTER/VACUUM FULL on them. In effect, the practical effect of autovacuum at lean hours only is more reasonably described as cancel autovacuum and revert to the elder policy of requiring users to do manual vacuuming. It's worth looking at how autovacuum has been evolving over time... - When it was introduced, 8.0-ish (maybe it was 8.1 when it became official), it was pretty primitive. Autovac was a single process, where you had three controls over behaviour: - You could run autovac, or not; - You could exclude specific tables from being processed by autovac - There is a capability to vacuum less aggressively by using delays to reduce autovac I/O usage - In 8.3, it was enhanced to add the notion of having multiple vacuum workers There was discussion about having one of those workers restrict itself to small tables, so that you'd never have the scenario where the workers were all busy and a small table that needed vacuuming was left unvacuumed for a long time. It appears that didn't happen, which seems unfortunate, but that's life... You should look at all the knobs that *are* offered before deciding a policy that may be harmful to performance. As things stand now, there are a couple of ways I could see tuning this: - You might check on the GUC variables autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit, which would allow you to restrict the I/O cost. This might allow you to run autovacuum all the time without adversely affecting performance. - You might come up with a list of the *LARGE* tables that you don't want vacuumed during the day, and set up a cron job that adds/drops them from the pg_autovacuum table at the appropriate times. This is definitely going to be more finicky, and requires a great deal more awareness of the tables being updated by your applications. It makes autovacuum a whole lot less automatic. There are known enhancements coming up: - In 8.4, there is a capability for VACUUM to only process the portions of the table known to have been altered. That ought to be a better answer than *any* of the fiddling suggested, to date. Right now, a VACUUM on public.my_huge_table, a table 18GB in size, will walk through the entire table, even though there were only a handful of pages where tuples were invalidated. This is almost certainly the single best improvement possible to resolve your issue; it seems likely to *directly* address the problem, and has the considerable merit of not requiring much if any configuration/reconfiguration/scheduling. -- (reverse (concatenate 'string gro.mca @ enworbbc)) http://linuxdatabases.info/info/ what would we do without C? we would have PASAL, BASI, OBOL, and Ommon Lisp. -- #Erik -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PG performance in high volume environment (many INSERTs and lots of aggregation reporting)
phoenix.ki...@gmail.com (Phoenix Kiula) writes: [Ppsted similar note to PG General but I suppose it's more appropriate in this list. Apologies for cross-posting.] Hi. Further to my bafflement with the count(*) queries as described in this thread: http://archives.postgresql.org/pgsql-general/2009-01/msg00804.php It seems that whenever this question has come up, Postgresql comes up very short in terms of count(*) functions. The performance is always slow, because of the planner's need to guess and such. I don't fully understand how the statistics work (and the explanation on the PG website is way too geeky) but he columns I work with already have a stat level of 100. Not helping at all. That's definitely *NOT* due to planner's need to guess; it's due to there being some *specific* work that PostgreSQL needs to do that some other databases can avoid due to different storage strategies. The matter is quite succinctly described here: http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007#Counting_rows_in_a_table I'll just take one excerpt: --- It is worth observing that it is only this precise form of aggregate that must be so pessimistic; if augmented with a WHERE clause like SELECT COUNT(*) FROM table WHERE status = 'something' PostgreSQL, MySQL, and most other database implementations will take advantage of available indexes against the restricted field(s) to limit how many records must be counted, which can greatly accelerate such queries. --- It is common for systems where it is necessary for aggregation reporting to be fast to do pre-computation of the aggregates, and that is in no way specific to PostgreSQL. If you need *really* fast aggregates, then it will be worthwhile to put together triggers or procedures or something of the sort to help pre-compute the aggregates. -- (reverse (concatenate 'string ofni.sesabatadxunil @ enworbbc)) http://linuxfinances.info/info/wp.html When you have eliminated the impossible, whatever remains, however improbable, must be the truth. -- Sir Arthur Conan Doyle (1859-1930), English author. Sherlock Holmes, in The Sign of Four, ch. 6 (1889). [...but see the Holmesian Fallacy, due to Bob Frankston... http://www.frankston.com/public/Essays/Holmesian%20Fallacy.asp] -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] left join + case - how is it processed?
Tom Lane wrote: Chris dmag...@gmail.com writes: I can see it's doing the extra filter step at the start (4th line) which is not present without the coalesce/case statement. I just don't understand why it's being done at that stage. It's not that hard to understand. With the original view formulation (or the COALESCE version), the fully expanded form of the query looks like select ... from p left join r ... where expression_involving_both_p_and_r = constant If you make the view output be just p.assetid then you have select ... from p left join r ... where p.assetid = constant In the first case the planner cannot apply the WHERE restriction until it's formed the p+r join; so you see the condition applied as a filter on the join node's output. In the second case, the planner can push the WHERE restriction down into the scan of p, since the left join doesn't affect it. (If a p row doesn't pass the restriction, then no join row formed from it can either; ergo there is no need to form those join rows at all.) So because the CASE is on (some of) the fields I'm joining on, in effect it's made part of the join condition. If the fields are outside that (r.userid/p.userid), then it's evaluated after. Thanks! -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] left join + case - how is it processed?
Hi all, I have a view that looks like this: SELECT CASE WHEN r.assetid IS NULL THEN p.assetid ELSE r.assetid END AS assetid, CASE WHEN r.userid IS NULL THEN p.userid ELSE r.userid END AS userid, p.permission, p.granted, p.cascades FROM sq_ast_perm p LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND r.assetid::text = p.assetid::text; It was pointed out to me that the first CASE is useless (since r.assetid will always be the same as p.assetid because of the left join condition) so I'm looking at that to see if it'll make much of a difference and it does. I won't post the whole lot but the first line is the most interesting. # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; Merge Left Join (cost=9529.34..13823.76 rows=75721 width=102) (actual time=284.371..341.536 rows=1 loops=1) (The row count is right - it's the total # of rows from sq_ast_perm). When I change the view to be: SELECT p.assetid, CASE WHEN r.userid IS NULL THEN p.userid ELSE r.userid END AS userid, p.permission, p.granted, p.cascades FROM sq_ast_perm p LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND r.assetid::text = p.assetid::text; The Merge left join only returns 3 rows: # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; Merge Left Join (cost=9507.18..9508.23 rows=3 width=70) (actual time=11.544..11.549 rows=1 loops=1) I thought the where condition would cut down on the rows returned, then the case statement would take effect to do the null check. It seems to be doing it in reverse ?? Recently analyzed, only just imported so free of bloat. Running 8.1.11. Thanks! -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] left join + case - how is it processed?
The reason why the CASE is affecting your query planning is because you are using a query that compares assetid to a constant: SELECT * from sq_vw_ast_perm where assetid='30748'; When PostgreSQL evaluates this statement, assetid gets expanded either into a case statement (with your first view definition) or into sq_ast_perm.assetid (with your second view definition). The latter definition allows PostgreSQL to make use of the column statistics (which are pretty accurate) whereas the former is probably leading to a SWAG, because PostgreSQL isn't very good at estimating the selectivity of CASE. The bad selectivity estimate, in turn, is leading to a poor plan choice... If I take it out of the view, it's fine: # SELECT # CASE # WHEN r.assetid IS NULL THEN p.assetid # ELSE r.assetid # END AS assetid, # CASE # WHEN r.userid IS NULL THEN p.userid # ELSE r.userid # END AS userid, p.permission, p.granted, p.cascades #FROM sq_ast_perm p #LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND r.assetid::text = p.assetid::text # where p.assetid='30748'; QUERY PLAN - Merge Left Join (cost=9459.89..9463.13 rows=3 width=102) (actual time=0.096..0.098 rows=1 loops=1) In this case I assume the planner is doing the 'WHERE' first to cut down the rows, then applying the CASE at the end. The view it seems to be the opposite - I still don't understand why that's the case. Though I do get the same behaviour as the view when I do it as a subselect. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] left join + case - how is it processed?
I thought the where condition would cut down on the rows returned, then the case statement would take effect to do the null check. It seems to be doing it in reverse ?? # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; It aperas to me that both of your statements have where clauses, but I believe where isn't that explicit. I'm not sure the nature of your problem permits the query optimizer to eliminate rows at all, even with the where statement. assetid is probably not known when the query optimizer hits, because it is computed based on the nullness of the columns. I'd assume that the optimizer *could* more easily optimize this if you had used coalesce rather than an ad-hoc method with CASE. My guess is you can exclude rows with WHERE if the the column used is an run-time computation involving an ad-hoc CASE. No difference. Full explain plan here: http://explain-analyze.info/query_plans/2725-query-plan-1447 I can see it's doing the extra filter step at the start (4th line) which is not present without the coalesce/case statement. I just don't understand why it's being done at that stage. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Intel's X25-M SSD
[EMAIL PROTECTED] (Merlin Moncure) writes: I think the SSD manufacturers made a tactical error chasing the notebook market when they should have been chasing the server market... That's a very good point; I agree totally! -- output = reverse(moc.enworbbc @ enworbbc) http://www3.sympatico.ca/cbbrowne/nonrdbms.html We are all somehow dreadfully cracked about the head, and sadly need mending. --/Moby-Dick/, Ch 17 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Incorrect estimates on correlated filters
On Wed, Aug 13, 2008 at 10:59 AM, Decibel! [EMAIL PROTECTED] wrote: On Aug 12, 2008, at 4:59 PM, Chris Kratz wrote: Ran into a re-occuring performance problem with some report queries again today. In a nutshell, we have filters on either multiple joined tables, or multiple columns on a single table that are highly correlated. So, the estimates come out grossly incorrect (the planner has no way to know they are correlated). 2000:1 for one I'm looking at right now. Generally this doesn't matter, except in complex reporting queries like these when this is the first join of 40 other joins. Because the estimate is wrong at the lowest level, it snowballs up through the rest of the joins causing the query to run very, very slowly. In many of these cases, forcing nested loops off for the duration of the query fixes the problem. But I have a couple that still are painfully slow and shouldn't be. I've been reading through the archives with others having similar problems (including myself a year ago). Am I right in assuming that at this point there is still little we can do in postgres to speed up this kind of query? Right now the planner has no way to know the correlation between different columns in the same table, let alone columns in different tables. So, it just assumes no correlation and returns incorrectly low estimates in cases like these. The only solution I've come up with so far is to materialize portions of the larger query into subqueries with these correlated filters which are indexed and analyzed before joining into the larger query. This would keep the incorrect estimates from snowballing up through the chain of joins. Are there any other solutions to this problem? Well... you could try and convince certain members of the community that we actually do need some kind of a query hint mechanism... ;) I did make a suggestion a few months ago that involved sorting a table on different columns and recording the correlation of other columns. The scheme isn't perfect, but it would help detect cases like a field populated by a sequence and another field that's insert timestamp; those two fields would correlate highly, and you should even be able to correlate the two histograms; that would allow you to infer that most of the insert times for _id's between 100 and 200 will be between 2008-01-01 00:10 and 2008-01-01 00:20, for example. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Thanks for the reply, Yes, I know hints are frowned upon around here. Though, I'd love to have them or something equivalent on this particular query just so the customer can run their important reports. As it is, it's unrunnable. Unfortunately, if I don't think the sorting idea would help in the one case I'm looking at which involves filters on two tables that are joined together. The filters happen to be correlated such that about 95% of the rows from each filtered table are actually returned after the join. Unfortunately, the planner thinks we will get 1 row back. I do have to find a way to make these queries runnable. I'll keep looking. Thanks, -Chris
[PERFORM] Incorrect estimates on correlated filters
Hello All, Ran into a re-occuring performance problem with some report queries again today. In a nutshell, we have filters on either multiple joined tables, or multiple columns on a single table that are highly correlated. So, the estimates come out grossly incorrect (the planner has no way to know they are correlated). 2000:1 for one I'm looking at right now. Generally this doesn't matter, except in complex reporting queries like these when this is the first join of 40 other joins. Because the estimate is wrong at the lowest level, it snowballs up through the rest of the joins causing the query to run very, very slowly. In many of these cases, forcing nested loops off for the duration of the query fixes the problem. But I have a couple that still are painfully slow and shouldn't be. I've been reading through the archives with others having similar problems (including myself a year ago). Am I right in assuming that at this point there is still little we can do in postgres to speed up this kind of query? Right now the planner has no way to know the correlation between different columns in the same table, let alone columns in different tables. So, it just assumes no correlation and returns incorrectly low estimates in cases like these. The only solution I've come up with so far is to materialize portions of the larger query into subqueries with these correlated filters which are indexed and analyzed before joining into the larger query. This would keep the incorrect estimates from snowballing up through the chain of joins. Are there any other solutions to this problem? Thanks, -Chris
Re: [PERFORM] Trigger is not firing immediately
Praveen wrote: Hi All, I am having a trigger in table, If I update the the table manually trigger is firing immediately(say 200ms per row), But if I update the table through procedure the trigger is taking time to fire(say 7 to 10 seconds per row). Please tell me what kind of changes can I make so that trigger fire immediately while updating the table through procedure ? Sending the same email over and over again isn't going to get you a response any quicker. If you send the details of the trigger and the tables/fields it affects then you might get a more helpful response. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [pgsql-performance] function difference(geometry,geometry) is SLOW!
Date: Mon, 16 Jun 2008 11:06:44 +0200 (CEST) From: [EMAIL PROTECTED] [EMAIL PROTECTED] To: pgsql-performance@postgresql.org Subject: function difference(geometry,geometry) is SLOW! Message-ID: [EMAIL PROTECTED] Hi, In my pgsql procedure, i use the function geometryDiff := difference (geometry1,geometry2); but this function is very slow!!! What can I do to speed this function? Exists a special index for it? Thanks in advance! Luke Hi, this is a postgis function. Postgis is an independent project and you might want to ask there: http://www.postgis.org/mailman/listinfo/postgis-users or http://www.faunalia.com/cgi-bin/mailman/listinfo/gfoss (italian). Anyway, as long as you just compute the difference between 2 given shapes, no index can help you. Indices speed up searches... Bye, Chris. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Adding LIMIT 1 kills performance.
[Attn list-queue maintainers: Please drop the earlier version of this email that I accidentally sent from an unsubscribed address. ] Hi, I'm having a strange problem with a slow-running select query. The query I use in production ends in LIMIT 1, and it runs very slowly. But when I remove the LIMIT 1, the query runs quite quickly. This behavior has stumped a couple smart DBAs. The full queries and EXPLAIN ANALYZE plans are included below, but by way of explanation/observation: 1) The LIMIT 1 case will sometimes be quicker (but still much slower than the non-LIMIT 1 case) for different values of calendar_group_id. 2) The query below is a slightly simplified version of the one I actually use. The real one includes more conditions which explain why each table is joined. For reference, the original query is quoted at the end [1]. The original query exhibits the same behavior as the simplified versions w.r.t. the LIMIT 1 case taking _much_ longer (even longer than the simplified version) than the non-LIMIT 1 case, and uses the same plans. Can anyone explain why such a slow plan is chosen when the LIMIT 1 is present? Is there anything I can do to speed this query up? Thanks. -chris production= select version(); version -- PostgreSQL 8.2.6 on x86_64-pc-linux-gnu, compiled by GCC x86_64-pc-linux-gnu-gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.2) (1 row) production= analyze calendar_groups; ANALYZE production= analyze calendar_links; ANALYZE production= analyze calendars; ANALYZE production= analyze event_updates; ANALYZE production= EXPLAIN ANALYZE SELECT event_updates.* FROM event_updates INNER JOIN calendars ON event_updates.feed_id = calendars.id INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id WHERE (calendar_links.calendar_group_id = 3640) ORDER BY event_updates.id DESC LIMIT 1; QUERY PLAN --- Limit (cost=16.55..91.73 rows=1 width=2752) (actual time=27810.058..27810.059 rows=1 loops=1) - Nested Loop (cost=16.55..695694.18 rows=9254 width=2752) (actual time=27810.054..27810.054 rows=1 loops=1) Join Filter: (event_updates.feed_id = calendars.id) - Index Scan Backward using event_updates_pkey on event_updates (cost=0.00..494429.30 rows=8944370 width=2752) (actual time=0.030..7452.142 rows=5135706 loops=1) - Materialize (cost=16.55..16.56 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=5135706) - Nested Loop (cost=0.00..16.55 rows=1 width=8) (actual time=0.029..0.034 rows=1 loops=1) - Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1) Index Cond: (calendar_group_id = 3640) - Index Scan using harvest_trackers_pkey on calendars (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1) Index Cond: (calendars.id = calendar_links.source_tracker_id) Total runtime: 27810.161 ms (11 rows) production= EXPLAIN ANALYZE SELECT event_updates.* FROM event_updates INNER JOIN calendars ON event_updates.feed_id = calendars.id INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id WHERE (calendar_links.calendar_group_id = 3640) ORDER BY event_updates.id DESC; QUERY PLAN - Sort (cost=43376.36..43399.50 rows=9256 width=2752) (actual time=10.178..10.205 rows=36 loops=1) Sort Key: event_updates.id - Nested Loop (cost=249.86..31755.56 rows=9256 width=2752) (actual time=9.957..10.098 rows=36 loops=1) - Nested Loop (cost=0.00..16.55 rows=1 width=8) (actual time=9.868..9.873 rows=1 loops=1) - Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links (cost=0.00..8.27 rows=1 width=4) (actual time=9.824..9.825 rows=1 loops=1) Index Cond: (calendar_group_id = 3640) - Index Scan using harvest_trackers_pkey on calendars (cost=0.00..8.27 rows=1
[PERFORM] OVERLAPS is slow
I'm doing some analysis on temporal usages, and was hoping to make use of OVERLAPS, but it does not appear that it makes use of indices. Couching this in an example... I created a table, t1, thus: metadata=# \d t1 Table public.t1 Column | Type | Modifiers +--+--- id | integer | not null default nextval('t1_id_seq'::regclass) t1 | timestamp with time zone | not null default now() t2 | timestamp with time zone | not null default 'infinity'::timestamp with time zone data | text | not null Indexes: t1_pkey PRIMARY KEY, btree (id) f2 btree (id) WHERE t2 = 'infinity'::timestamp with time zone t1t1 btree (t1) t1t2 btree (t2) When entries go in, they default to having an effective date range from now() until 'infinity'. I then went off and seeded a bunch of data into the table, inserting values: for i in `cat /etc/dictionaries-common/words | head 2000`; do psql -d metadata -c insert into t1 (data) values ('$i'); done Then, I started doing temporal updates, thus: for i in `cat /etc/dictionaries-common/words`; do psql -d metadata -c insert into t1 (data) values ('$i');update t1 set t2 = now() where t2 = 'infinity' and id in (select id from t1 where t2 = 'infinity' order by random() limit 1); done This terminates many of those entries, and creates a new one that is effective to infinity. After running this for a while, I have a reasonably meaningful amount of data in the table: metadata=# select count(*) from t1; select count(*) from t1 where t2 = 'infinity'; count 125310 (1 row) count --- 2177 (1 row) Searching for the active items in the table, via a constructed 'overlap': metadata=# explain analyze select count(*) from t1 where t1 = now() and t2 = now(); QUERY PLAN -- Aggregate (cost=98.13..98.14 rows=1 width=0) (actual time=8.104..8.105 rows=1 loops=1) - Index Scan using t1t2 on t1 (cost=0.00..93.95 rows=1671 width=0) (actual time=0.116..6.374 rows=2177 loops=1) Index Cond: (t2 = now()) Filter: (t1 = now()) Total runtime: 8.193 ms (5 rows) Note, that makes use of the index on column t2, and runs nice and quick. (And notice that the rows found, 2177, agrees with the earlier count.) Unfortunately, when I try using OVERLAPS, it reverts to a Seq Scan. metadata=# explain analyze select * from t1 where (t1,t2) overlaps (now(), now()); QUERY PLAN --- Seq Scan on t1 (cost=0.00..3156.59 rows=43135 width=24) (actual time=171.248..205.941 rows=2177 loops=1) Filter: overlaps(t1, t2, now(), now()) Total runtime: 207.508 ms (3 rows) I would surely think that I have enough data in the table for the stats to be good, and the first query certainly does harness the index on t2 to determine if records are overlapping (now(),now()). Is it possible that we need to have some improvement to the optimizer so that OVERLAPS could make use of the indices? -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://linuxfinances.info/info/lsf.html Very little is known about the War of 1812 because the Americans lost it. -- Eric Nicol -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Adding LIMIT 1 kills performance.
On Fri, May 30, 2008 at 02:23:46AM +0930, Shane Ambler wrote: Chris Shoemaker wrote: [Attn list-queue maintainers: Please drop the earlier version of this email that I accidentally sent from an unsubscribed address. ] Hi, I'm having a strange problem with a slow-running select query. The query I use in production ends in LIMIT 1, and it runs very slowly. But when I remove the LIMIT 1, the query runs quite quickly. This behavior has stumped a couple smart DBAs. Can anyone explain why such a slow plan is chosen when the LIMIT 1 is present? Is there anything I can do to speed this query up? Thanks. From what I know using an ORDER BY and a LIMIT can often prevent *shortening* the query as it still needs to find all rows to perform the order by before it limits. That makes complete sense, of course. The difference in plans eludes me. production= EXPLAIN ANALYZE SELECT event_updates.* FROM event_updates INNER JOIN calendars ON event_updates.feed_id = calendars.id INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id WHERE (calendar_links.calendar_group_id = 3640) ORDER BY event_updates.id DESC LIMIT 1; Does removing the DESC from the order by give the same variation in plans? Or is this only when using ORDER BY ... DESC LIMIT 1? Except for using Index Scan instead of Index Scan Backward, the plan is the same with ORDER BY ... or ORDER BY ... ASC as with ORDER BY ... DESC. In case you're wondering what would happen without the ORDER BY at all: production= EXPLAIN SELECT event_updates.* FROM event_updates INNER JOIN calendars ON event_updates.feed_id = calendars.id INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id WHERE (calendar_links.calendar_group_id = 3640) LIMIT 1; QUERY PLAN Limit (cost=0.00..3.95 rows=1 width=2752) - Nested Loop (cost=0.00..36992.38 rows=9362 width=2752) - Nested Loop (cost=0.00..16.55 rows=1 width=8) - Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links (cost=0.00..8.27 rows=1 width=4) Index Cond: (calendar_group_id = 3640) - Index Scan using harvest_trackers_pkey on calendars (cost=0.00..8.27 rows=1 width=4) Index Cond: (calendars.id = calendar_links.source_tracker_id) - Index Scan using index_event_updates_on_feed_id_and_feed_type on event_updates (cost=0.00..36858.50 rows=9386 width=2752) Index Cond: (event_updates.feed_id = calendars.id) (9 rows) One thing that interests me is try - EXPLAIN ANALYZE SELECT * FROM ( SELECT event_updates.* FROM event_updates INNER JOIN calendars ON event_updates.feed_id = calendars.id INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id WHERE (calendar_links.calendar_group_id = 3640) ORDER BY event_updates.id DESC ) AS foo LIMIT 1; That's an interesting experiment. Here are the results: QUERY PLAN Limit (cost=16.55..91.74 rows=1 width=6027) (actual time=490709.355..490709.357 rows=1 loops=1) - Nested Loop (cost=16.55..703794.95 rows=9361 width=2752) (actual time=490709.352..490709.352 rows=1 loops=1) Join Filter: (event_updates.feed_id = calendars.id) - Index Scan Backward using event_updates_pkey on event_updates (cost=0.00..500211.53 rows=9047416 width=2752) (actual time=0.222..469082.071 rows=5251179 loops=1) - Materialize (cost=16.55..16.56 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=5251179) - Nested Loop (cost=0.00..16.55 rows=1 width=8) (actual time=0.240..0.246 rows=1 loops=1) - Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links (cost=0.00..8.27 rows=1 width=4) (actual time=0.108..0.109 rows=1 loops=1) Index Cond: (calendar_group_id = 3640) - Index Scan using harvest_trackers_pkey on calendars (cost=0.00..8.27 rows=1 width=4) (actual time=0.127..0.129 rows=1 loops=1) Index Cond: (calendars.id = calendar_links.source_tracker_id) Total runtime: 490709.576 ms (11 rows) That is, no real change in the performance. Still stumped
Re: [PERFORM] RAID controllers for Postgresql on large setups
Joshua, did you try to run the 345 on an IBM ServeRAID 6i? I have one in mine, but I never actually ran any speed test. Do you have any benchmarks that I could run and compare? best regards, chris -- chris ruprecht database grunt and bit pusher extraordinaíre On May 12, 2008, at 22:11, Joshua D. Drake wrote: On Mon, 12 May 2008 22:04:03 -0400 Francisco Reyes [EMAIL PROTECTED] wrote: Inheritted a number of servers and I am starting to look into the [snip] Comparing software raid on an LSI onboard for an IBM 345 versus a 2120s using hardware raid 10, the software raid completely blew the adaptec away. [more snip] -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
[EMAIL PROTECTED] (Gauri Kanekar) writes: Basically we have some background process which updates table1 and we don't want the application to make any changes to table1 while vacuum. Vacuum requires exclusive lock on table1 and if any of the background or application is ON vacuum don't kick off. Thats the reason we need to get the site down. VACUUM has not required an exclusive lock on tables since version 7.1. What version of PostgreSQL are you running? -- output = (cbbrowne @ acm.org) http://linuxdatabases.info/info/sap.html Rules of the Evil Overlord #192. If I appoint someone as my consort, I will not subsequently inform her that she is being replaced by a younger, more attractive woman. http://www.eviloverlord.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Where do a novice do to make it run faster?
[EMAIL PROTECTED] (A B) writes: So, it is time to improve performance, it is running to slow. AFAIK (as a novice) there are a few general areas: 1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweek parameters in the db conf files Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in that. Thank you :-) In the order of ease of implementation, it tends to be... 1. Tweak postgresql.conf 2. Make sure you ran VACUUM + ANALYZE 3. Find some expensive queries and try to improve them, which might involve changing the queries and/or adding relevant indices 4. Add RAM to your server 5. Add disk to your server 6. Redesign your application's DB schema so that it is more performant by design URL below may have some material of value... -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://linuxfinances.info/info/postgresqlperformance.html It is usually a good idea to put a capacitor of a few microfarads across the output, as shown. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
[EMAIL PROTECTED] (Gauri Kanekar) writes: We have a table table1 which get insert and updates daily in high numbers, bcoz of which its size is increasing and we have to vacuum it every alternate day. Vacuuming table1 take almost 30min and during that time the site is down. We need to cut down on this downtime.So thought of having a replication system, for which the replicated DB will be up during the master is getting vacuumed. Can anybody guide which will be the best suited replication solution for this. The only reason that it would be necessary for VACUUM to take the site down would be if you are running version 7.1, which was obsoleted in 2002, which, it should be noted, was SIX YEARS AGO. As has been noted, you seem to be presupposing a remarkably complex solution to resolve a problem which is likely to be better handled via running VACUUM rather more frequently. -- output = reverse(ofni.sesabatadxunil @ enworbbc) http://www3.sympatico.ca/cbbrowne/postgresql.html Rules of the Evil Overlord #181. I will decree that all hay be shipped in tightly-packed bales. Any wagonload of loose hay attempting to pass through a checkpoint will be set on fire. http://www.eviloverlord.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Message queue table..
[EMAIL PROTECTED] (Jesper Krogh) writes: I have this message queue table.. currently with 8m+ records. Picking the top priority messages seem to take quite long.. it is just a matter of searching the index.. (just as explain analyze tells me it does). Can anyone digest further optimizations out of this output? (All records have funcid=4) # explain analyze SELECT job.jobid, job.funcid, job.arg, job.uniqkey, job.insert_time, job.run_after, job.grabbed_until, job.priority, job.coalesce FROM workqueue.job WHERE (job.funcid = 4) AND (job.run_after = 1208442668) AND (job.grabbed_until = 1208442668) AND (job.coalesce = 'Efam') ORDER BY funcid, priority ASC LIMIT 1 ; There might be value in having one or more extra indices... Here are *plausible* candidates: 1. If funcid = 4 is highly significant (e.g. - you are always running this query, and funcid often 4), then you might add a functional index such as: create index job_funcid_run_after on workqueue.job (run_after) where funcid = 4; create index job_funcid_grabbeduntil on workqueue.job (grabbed_until) where funcid = 4; 2. Straight indices like the following: create index job_run_after on workqueue.job(run_after); create index job_grabbed_until on workqueue.job(grabbed_until); create index job_funcid on workqueue.job(funcid); create index job_coalesce on workqueue.job(coalesce); Note that it is _possible_ (though by no means guaranteed) that all three might prove useful, if you're running 8.1+ where PostgreSQL supports bitmap index scans. Another possibility... 3. You might change your process to process multiple records in a run so that you might instead run the query (perhaps via a cursor?) with LIMIT [Something Bigger than 1]. It does seem mighty expensive to run a 245ms query to find just one record. It seems quite likely that you could return the top 100 rows (LIMIT 100) without necessarily finding it runs in any more time. Returning 100 tuples in 245ms seems rather more acceptable, no? :-) -- (format nil [EMAIL PROTECTED] cbbrowne linuxfinances.info) http://linuxdatabases.info/info/linuxdistributions.html Rules of the Evil Overlord #32. I will not fly into a rage and kill a messenger who brings me bad news just to illustrate how evil I really am. Good messengers are hard to come by. http://www.eviloverlord.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Oddly slow queries
[EMAIL PROTECTED] (Thomas Spreng) writes: On 16.04.2008, at 01:24, PFC wrote: The queries in question (select's) occasionally take up to 5 mins even if they take ~2-3 sec under normal conditions, there are no sequencial scans done in those queries. There are not many users connected (around 3, maybe) to this database usually since it's still in a testing phase. I tried to hunt down the problem by playing around with resource usage cfg options but it didn't really made a difference. Could that be caused by a CHECKPOINT ? actually there are a few log (around 12 per day) entries concerning checkpoints: LOG: checkpoints are occurring too frequently (10 seconds apart) HINT: Consider increasing the configuration parameter checkpoint_segments. But wouldn't that only affect write performance? The main problems I'm concerned about affect SELECT queries. No, that will certainly NOT just affect write performance; if the postmaster is busy writing out checkpoints, that will block SELECT queries that are accessing whatever is being checkpointed. When we were on 7.4, we would *frequently* see SELECT queries that should be running Very Quick that would get blocked by the checkpoint flush. We'd periodically see hordes of queries of the form: select id from some_table where unique_field = 'somevalue.something'; which would normally run in less than 1ms running for (say) 2s. And the logs would show something looking rather like the following: 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'somevalue.something'; - 952ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'somevalue.something'; - 742ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'another.something'; - 1341ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'somevalue.something'; - 911ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'another.something'; - 1244ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'another.something'; - 2311ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'another.something'; - 1799ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'somevalue.something'; - 1992ms This was happening because the checkpoint was flushing those two tuples, and hence blocking 8 SELECTs that came in during the flush. There are two things worth considering: 1. If the checkpoints are taking place too frequently, then that is clear evidence that something is taking place that is injecting REALLY heavy update load on your database at those times. If the postmaster is checkpointing every 10s, that implies Rather Heavy Load, so it is pretty well guaranteed that performance of other activity will suck at least somewhat because this load is sucking up all the I/O bandwidth that it can. So, to a degree, there may be little to be done to improve on this. 2. On the other hand, if you're on 8.1 or so, you may be able to configure the Background Writer to incrementally flush checkpoint data earlier, and avoid the condition of 1. Mind you, you'd have to set BgWr to be pretty aggressive, based on the 10s periodicity that you describe; that may not be a nice configuration to have all the time :-(. -- output = reverse(ofni.sesabatadxunil @ enworbbc) http://cbbrowne.com/info/multiplexor.html Nagging is the repetition of unpalatable truths. --Baroness Edith Summerskill -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Background writer underemphasized ...
[EMAIL PROTECTED] (Marinos Yannikos) writes: This helped with our configuration: bgwriter_delay = 1ms # 10-1ms between rounds bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round FYI, I'd be inclined to reduce both of those numbers, as it should reduce the variability of behaviour. Rather than cleaning 1K pages every 10s, I would rather clean 100 pages every 1s, as that will have much the same effect, but spread the work more evenly. Or perhaps 10 pages every 100ms... Cut the delay *too* low and this might make the background writer, in effect, poll *too* often, and start chewing resources, but there's doubtless some sweet spot in between... -- cbbrowne,@,cbbrowne.com http://linuxdatabases.info/info/oses.html For systems, the analogue of a face-lift is to add to the control graph an edge that creates a cycle, not just an additional node. -- Alan J. Perlis -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Creating large database of MD5 hash values
1. Which datatype should I use to represent the hash value? UUIDs are also 16 bytes... md5's are always 32 characters long so probably varchar(32). 2. Does it make sense to denormalize the hash set relationships? The general rule is normalize as much as possible then only denormalize when absolutely necessary. 3. Should I index? What sort of queries are you going to be running? 4. What other data structure options would it make sense for me to choose? What sort of other data will you be needing to store? -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] bulk insert performance problem
Craig Ringer wrote: Christian Bourque wrote: Hi, I have a performance problem with a script that does massive bulk insert in 6 tables. When the script starts the performance is really good but will degrade minute after minute and take almost a day to finish! Would I be correct in guessing that there are foreign key relationships between those tables, and that there are significant numbers of indexes in use? The foreign key checking costs will go up as the tables grow, and AFAIK the indexes get a bit more expensive to maintain too. If possible you should probably drop your foreign key relationships and drop your indexes, insert your data, then re-create the indexes and foreign keys. The foreign keys will be rechecked when you recreate them, and it's *vastly* faster to do it that way. Similarly, building an index from scratch is quite a bit faster than progressively adding to it. Of course, dropping the indices is only useful if you aren't querying the tables as you build them. If you are, add analyze commands through the import, eg every 10,000 rows. Then your checks should be a bit faster. The other suggestion would be to do block commits: begin; do stuff for 5000 rows; commit; repeat until finished. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] slow pg_connect()
* Read about configuring and using persistent database connections (http://www.php.net/manual/en/function.pg-pconnect.php) with PHP Though make sure you understand the ramifications of using persistent connections. You can quickly exhaust your connections by using this and also cause other issues for your server. If you do this you'll probably have to adjust postgres to allow more connections, which usually means lowering the amount of shared memory each connection can use which can also cause performance issues. I'd probably use pgpool-II and have it handle the connection stuff for you rather than doing it through php. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Planner mis-estimation using nested loops followup
A number of weeks ago, I had posted a request for help regarding join estimates in pg 8.2.6. In moderately complex to very complex ad hoc queries in our system, we were consistently having the system massively underestimate the number of rows coming out of join at a low level making these queries very slow and inefficient. At times the mis-estimation was 1000:1. Ie when it should have been 2000 returned rows from a join, the planner assumed 1 or 2 rows. Modifying stats on the join columns up to the max made little difference (y, we analyzed tables in question after each change). Since the planner sees only one row coming out of the low level join, it uses nested loops all the way up chain when it would be more efficient to use another join type. In our informal testing, we found that by disabling nested loops and forcing other join types, we could get fantastic speedups. Those queries that seem to benefit most from this have a lot of sub-queries being built up into a final query set as well as a fair number of joins in the sub-queries. Since these are user created and are then generated via our tools, they can be quite messy at times. After doing this testing, have since added some functionality in our ad hoc reporting tool to allow us to tune individual queries by turning on and off individual join types at runtime. As we hear of slow reports, we've been individually turning off the nested loops on those reports. Almost always, this has increased the performance of the reports, sometimes in a completely amazing fashion (many, many minutes to seconds at times). It of course doesn't help everything and turning off nested loops in general causes overall slowdown in other parts of the system. As this has gone on over the last couple of weeks, it feels like we either have a misconfiguration on the server, or we are tickling a mis-estimation bug in the planner. I'm hoping it's the former. The db server has 8G of memory and raid1 -wal, raid10- data configuration, os is linux 2.6.9, db is 8.2.6. The db is a utf-8 db if that is of any bearing and autovac and bgwriter are on. Nondefault settings of interest from postgresql.conf shared_buffers = 1024MB # min 128kB or max_connections*16kB work_mem = 256MB# min 64kB maintenance_work_mem = 256MB# min 1MB random_page_cost = 1.75 # same scale as above effective_cache_size = 4096MB default_statistics_target = 100 # range 1-1000 If nothing else, perhaps this will help somebody else who has run into the same problem. If explain analyze of a query shows a large mis-estimation of rows returned on a join (estimate=1, actual=2k) causing the planner to choose nested loops instead of another join type, you might try running the query with nested loops set to off and see if that helps w/ performance. Thanks, -Chris
Re: [PERFORM] Planner mis-estimation using nested loops followup
Y, turning nested loops off in specific cases has increased performance greatly. It didn't fix the planner mis-estimation, just the plan it chose. It's certainly not a panacea, but it's something we now try early on when trying to speed up a query that matches these characteristics. -Chris On 3/18/08, Joshua D. Drake [EMAIL PROTECTED] wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Tue, 18 Mar 2008 11:35:08 -0400 Chris Kratz [EMAIL PROTECTED] wrote: Nondefault settings of interest from postgresql.conf shared_buffers = 1024MB # min 128kB or max_connections*16kB work_mem = 256MB # min 64kB maintenance_work_mem = 256MB# min 1MB random_page_cost = 1.75 # same scale as above effective_cache_size = 4096MB default_statistics_target = 100 # range 1-1000 If nothing else, perhaps this will help somebody else who has run into the same problem. If explain analyze of a query shows a large mis-estimation of rows returned on a join (estimate=1, actual=2k) causing the planner to choose nested loops instead of another join type, you might try running the query with nested loops set to off and see if that helps w/ performance. Did you try that? Did it work? Joshua D. Drake - -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL political pundit | Mocker of Dolphins -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFH3+TlATb/zqfZUUQRAmXUAKCjwidfW0KXjzUM26I4yTx94/wSiQCfaqWU eI9i5yucBH718okW3w2UewQ= =BO3E -END PGP SIGNATURE-
Re: [PERFORM] question on TRUNCATE vs VACUUM FULL
So my question is this: Shouldn’t VACUUM FULL clean Table C and reclaim all its space? You've got concepts mixed up. TRUNCATE deletes all of the data from a particular table (and works in all dbms's). http://www.postgresql.org/docs/8.3/interactive/sql-truncate.html VACUUM FULL is a postgres-specific thing which does work behind the scenes to clean up MVCC left-overs. It does not touch any current data or records in the table, it's purely behind the scenes work. http://www.postgresql.org/docs/current/interactive/sql-vacuum.html The two have completely different uses and nothing to do with each other what-so-ever. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] migration of 7.4 to 8.1
[EMAIL PROTECTED] wrote: On Wed, 12 Mar 2008, sathiya psql wrote: In the home page itself they were saying testing ... unstable you are talking about the debian home page right? then we should not use that for live. so i prefer 8.1 . Debian selected the version of Postgres for Etch about a year and a half ago. At that point selecting 8.1 was a resonable choice. Debian has a policy that they will never change the version number of a package in a stable release (they will backport many bugfixes, but not upgrade the version) As a result 2 years from now when Postgres is on 8.5 stable (and looking at 8.6), Debian Etch will still be on 8.1 I like that with debian I can install multiple postgres versions and it handles everything for me :) Changing the default port, config files are in different folders, different start up scripts.. all works very nicely :) Which means you can have 8.1 installed and 8.3 installed at the same time - both from packages, no compiling etc necessary - and can switch between them very easily. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Confirmação de envio / Sending confirmation (captchaid:13266b20536d)
petchimuthu lingam wrote: C5BK4513 Ahh - you are sending this to the wrong address, these are not being sent by the postgres mailing list. Check which address you are replying to next time... -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-performance
Re: [PERFORM] count * performance issue
sathiya psql wrote: count(*) tooks much time... but with the where clause we can make this to use indexing,... what where clause we can use?? Am using postgres 7.4 in Debian OS with 1 GB RAM, am having a table with nearly 50 lakh records, Looks suspiciously like a question asked yesterday: http://archives.postgresql.org/pgsql-performance/2008-03/msg00068.php -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-performance
[PERFORM] Ramifications of turning off Nested Loops for slow queries
Hello Everyone, I had posted an issue previously that we've been unable to resolve. An early mis-estimation in one or more subqueries causes the remainder of the query to choose nested loops instead of a more efficient method and runs very slowly (CPU Bound). I don't think there is any way to suggest to the planner it not do what it's doing, so we are starting to think about turning off nested loops entirely. Here is the history so far: http://archives.postgresql.org/pgsql-performance/2008-02/msg00205.php At the suggestion of the list, we upgraded to 8.2.6 and are still experiencing the same problem. I'm now installing 8.3 on my workstation to see if it chooses a better plan, but it will take some time to get it compiled, a db loaded, etc. We have a number of very long running reports that will run in seconds if nested loops are turned off. The other alternative we are exploring is programmatically turning off nested loops just for the problematic reports. But with the speedups we are seeing, others are getting gun shy about having them on at all. So, I've now been asked to ping the list as to whether turning off nested loops system wide is a bad idea, and why or why not. Any other thoughts or suggestions? Thanks, -Chris -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your Subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-performance
Re: [PERFORM] Ramifications of turning off Nested Loops for slow queries
On 3/4/08, Kevin Grittner [EMAIL PROTECTED] wrote: On Tue, Mar 4, 2008 at 8:42 AM, in message Any other thoughts or suggestions? Make sure your effective_cache_size is properly configured. Increase random_page_cost and/or decrease seq_page_cost. You can play with the cost settings on a connection, using EXPLAIN on the query, to see what plan you get with each configuration before putting it into the postgresql.conf file. -Kevin That was a good idea. I hadn't tried playing with those settings in a session. This is a 8G box, and we've dedicated half of that (4G) to the file system cache. So, 4G is what effective_cache_size is set to. Our seq_page_cost is set to 1 and our random_page_cost is set to 1.75 in the postgresql.conf. In testing this one particular slow query in a session, I changed these settings alternating in increments of 0.25. The random_page_cost up to 4 and the seq_page_cost down to 0.25. This made perhaps a second difference, but at the end, we were back to to the 37s. Doing a set enable_nestloop=off in the session reduced the runtime to 1.2s with the other settings back to our normal day to day settings. So, for now I think we are going to have to modify the code to prepend the problematic queries with this setting and hope the estimator is able to better estimate this particular query in 8.3. Thanks for the suggestions, -Chris