[PERFORM] Which Join is better
Dear all,

Just want to know which join is better for querying data faster. I have 2 tables:

A ( 70 GB ), 10 columns
B ( 7 MB ), 3 columns

Indexes exist on both tables' ids.

select p.* from A p, B q where p.id = q.id

or

select p.* from B q, A p where q.id = p.id

Thanks

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Which Join is better
On 2 August 2011 08:42, Adarsh Sharma adarsh.sha...@orkash.com wrote:
> Just want to know which join is better for querying data faster. I have 2
> tables: A ( 70 GB ), B ( 7 MB ). A has 10 columns, B has 3 columns.
> Indexes exist on both tables' ids.
>
> select p.* from A p, B q where p.id = q.id
> or
> select p.* from B q, A p where q.id = p.id

Hi,

it really doesn't matter. PostgreSQL can reorder the joins as it likes. And you can always check, but I think the plans will be the same.

regards
Szymon
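A quick way to confirm this is to compare the plans directly (a sketch; the table definitions are hypothetical, mirroring the question's indexed ids):

```sql
-- Hypothetical setup matching the question: indexed ids on both tables.
CREATE TABLE a (id bigint PRIMARY KEY, payload text);
CREATE TABLE b (id bigint PRIMARY KEY, payload text);

-- Both orderings should produce the same plan: with implicit-join syntax
-- the planner reorders joins freely (up to from_collapse_limit, default 8).
EXPLAIN SELECT p.* FROM a p, b q WHERE p.id = q.id;
EXPLAIN SELECT p.* FROM b q, a p WHERE q.id = p.id;
```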
Re: [PERFORM] Which Join is better
On Tuesday 02 August 2011, Adarsh Sharma wrote:
> Just want to know which join is better for querying data faster. I have 2
> tables: A ( 70 GB ), B ( 7 MB ). A has 10 columns, B has 3 columns.
> Indexes exist on both tables' ids.
>
> select p.* from A p, B q where p.id = q.id
> or
> select p.* from B q, A p where q.id = p.id
>
> Thanks

Hi Adarsh,

What does an EXPLAIN ANALYZE say after a VACUUM?

--
María Arias de Reyna Domínguez
Área de Operaciones
Emergya Consultoría
Tfno: +34 954 51 75 77 / +34 607 43 74 27
Fax: +34 954 51 64 73
www.emergya.es
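For reference, the suggested check looks like this (a sketch; table names follow the question):

```sql
-- Refresh planner statistics, then run the query with instrumentation.
VACUUM ANALYZE a;
VACUUM ANALYZE b;

-- EXPLAIN ANALYZE executes the query and reports actual row counts and
-- timings per plan node, showing whether the estimates are sane.
EXPLAIN ANALYZE SELECT p.* FROM a p, b q WHERE p.id = q.id;
```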
Re: [PERFORM] Performance die when COPYing to table with bigint PK
02.08.11 11:26, Robert Ayrapetyan wrote:
> Seems this assumption is not right. Just created a simple index on a
> bigint column - the situation with huge performance degradation repeated.
> Dropping this index solved the COPY issues on the fly. So I'm still
> convinced - this bug relates to FreeBSD 64-bit + UFS + bigint column index
> (some of these may be superfluous, but I have no resources to check on
> different platforms with different filesystems).

Interesting. We also have FreeBSD x64 on UFS and are using bigint (bigserial) keys. It seems I will need to perform more tests here because I do see similar problems. I for sure can do a copy of the data with int4 keys and test the performance.

BTW: The thing we are going to try on the next upgrade is to change the UFS block size from 16K to 8K. The problem I saw is that with the default setting, UFS needs to read an additional 8K when PostgreSQL writes its page (and for index random writes that can be vital). Unfortunately, such a change requires a partition reformat and I can't afford it for now.

Best regards, Vitalii Tymchyshyn
[PERFORM] Tsearch2 - bad performance with concatenated ts-vectors
Hello everyone,

This is my first post on this list. I tried to look for possible solutions in the archive, as well as in Google, but I could not find an explanation for such a specific situation. I am facing a performance problem connected with the Postgres Tsearch2 FTS mechanism. Here is my query:

select participant.participant_id
from participant participant
join person person
  on person.person_participant_id = participant.participant_id
left join registration registration
  on registration.registration_registered_participant_id = participant.participant_id
left join enrollment enrollment
  on registration.registration_enrollment_id = enrollment.enrollment_id
join registration_configuration registration_configuration
  on enrollment.enrollment_configuration_id = registration_configuration.configuration_id
left join event_context context
  on context.context_id = registration_configuration.configuration_context_id
where participant.participant_type = 'PERSON'
  and participant_status = 'ACTIVE'
  and context.context_code in ('GB2TST2010A')
  and registration_configuration.configuration_type in ('VISITOR')
  and registration_configuration.configuration_id is not null
  and participant.participant_tsv || person.person_tsv @@ to_tsquery('simple', to_tsquerystring('Abigail'))
limit 100

As you see, I am using two vectors which I concatenate and check against a tsquery.
Both vectors are indexed with GIN and updated with respective triggers in the following way:

ALTER TABLE person ALTER COLUMN person_tsv SET STORAGE EXTENDED;

CREATE INDEX person_ft_index ON person USING gin(person_tsv);

CREATE OR REPLACE FUNCTION update_person_tsv() RETURNS trigger AS $$
BEGIN
  NEW.person_tsv := to_tsvector('simple', create_tsv(
    ARRAY[NEW.person_first_name, NEW.person_last_name, NEW.person_middle_name]
  ));
  RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';

CREATE TRIGGER person_tsv_update BEFORE INSERT OR UPDATE ON person
  FOR EACH ROW EXECUTE PROCEDURE update_person_tsv();

ALTER TABLE participant ALTER COLUMN participant_tsv SET STORAGE EXTENDED;

CREATE INDEX participant_ft_index ON participant USING gin(participant_tsv);

CREATE OR REPLACE FUNCTION update_participant_tsv() RETURNS trigger AS $$
BEGIN
  NEW.participant_tsv := to_tsvector('simple', create_tsv(
    ARRAY[NEW.participant_login, NEW.participant_email]
  ));
  RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';

CREATE TRIGGER participant_tsv_update BEFORE INSERT OR UPDATE ON participant
  FOR EACH ROW EXECUTE PROCEDURE update_participant_tsv();

The database is quite big - it has almost one million participant records. The above query took almost 67 seconds to execute and fetch 100 rows, which is unacceptable for us. As I assume, the problem is that when the vectors are concatenated, the individual indexes for each vector are not used.
The execution plan done after 1st execution of the query:

Limit  (cost=46063.13..93586.79 rows=100 width=4) (actual time=4963.620..39703.645 rows=100 loops=1)
  ->  Nested Loop  (cost=46063.13..493736.04 rows=942 width=4) (actual time=4963.617..39703.349 rows=100 loops=1)
        Join Filter: (registration_configuration.configuration_id = enrollment.enrollment_configuration_id)
        ->  Nested Loop  (cost=46063.13..493662.96 rows=3769 width=8) (actual time=4963.517..39701.557 rows=159 loops=1)
              ->  Nested Loop  (cost=46063.13..466987.33 rows=3769 width=8) (actual time=4963.498..39698.542 rows=159 loops=1)
                    ->  Hash Join  (cost=46063.13..430280.76 rows=4984 width=8) (actual time=4963.464..39692.676 rows=216 loops=1)
                          Hash Cond: (participant.participant_id = person.person_participant_id)
                          Join Filter: ((participant.participant_tsv || person.person_tsv) @@ to_tsquery('simple'::regconfig, to_tsquerystring('Abigail'::text)))
                          ->  Seq Scan on participant  (cost=0.00..84680.85 rows=996741 width=42) (actual time=0.012..3132.944 rows=1007151 loops=1)
                                Filter: (((participant_type)::text = 'PERSON'::text) AND ((participant_status)::text = 'ACTIVE'::text))
                          ->  Hash  (cost=25495.39..25495.39 rows=1012539 width=38) (actual time=3145.628..3145.628 rows=1007151 loops=1)
                                Buckets: 2048  Batches: 128  Memory Usage: 556kB
                                ->  Seq Scan on person  (cost=0.00..25495.39 rows=1012539 width=38) (actual time=0.062..1582.990 rows=1007151 loops=1)
                    ->  Index Scan using idx_registration_registered_participant_id on registration  (cost=0.00..7.35 rows=1 width=8) (actual time=0.018..0.019 rows=1 loops=216)
                          Index Cond: (registration.registration_registered_participant_id = person.person_participant_id)
              ->  Index Scan using enrollment_pkey on enrollment  (cost=0.00..7.07 rows=1 width=8) (actual time=0.011..0.013 rows=1 loops=159)
                    Index Cond:
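One way around the concatenation problem (an untested sketch, not a fix from the original thread) is to test each vector separately. For a query consisting of a single term, `(a || b) @@ q` is equivalent to `(a @@ q OR b @@ q)`, and the OR form lets the planner consider both GIN indexes; for multi-term queries whose terms may be split across the two vectors this rewrite is not equivalent:

```sql
-- Hypothetical rewrite of the predicate from the query above, so that
-- person_ft_index and participant_ft_index can each be used:
...
where participant.participant_type = 'PERSON'
  and participant_status = 'ACTIVE'
  and (participant.participant_tsv @@ to_tsquery('simple', to_tsquerystring('Abigail'))
       or person.person_tsv @@ to_tsquery('simple', to_tsquerystring('Abigail')))
...
```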
[PERFORM] Array access performance
Hi,

I'm looking for a hint on how array access performs in PostgreSQL with respect to performance. Normally I would expect access of a 1-dimensional array at slot i (array[i]) to perform in constant time (random access). Is this also true for postgres' arrays?

My concrete example is a 1-dimensional array d of length = 600 (which will grow at a rate of 1 entry/day) stored in a table's column. I need to access this array two times per tuple, i.e. d[a], d[b]. Therefore I hope access is not linear. Is this correct?

Also I'm having some performance issues building this array. I'm doing this with a user-defined aggregate function, starting with an empty array and using array_append and some calculation for each new entry. I assume this involves some copying/memory allocation on each call, but I could not find the implementation of array_append in the postgres source/git. Is there an efficient way to append to an array? I could also start with a pre-initialized array of the required length, but this involves some complexity.

Thank you

Regards,
Andreas
Re: [PERFORM] Array access performance
> Is this also true for postgres' arrays?

Sorry, I'm using latest postgres 9.0.4 on debian squeeze/amd64.

Greetings
Andreas
Re: [PERFORM] Performance die when COPYing to table with bigint PK
Quite possible. But anyway - I don't think the performance degradation should be so huge in the case of UNIQUE indexes.

On Mon, Aug 1, 2011 at 12:06 PM, Vitalii Tymchyshyn tiv...@gmail.com wrote:
> 31.07.11 16:51, Robert Ayrapetyan wrote:
>> Hello. I've found strange behavior of my pg installation (tested both 8.4
>> and 9.0 - they behave the same) on the FreeBSD platform. In short - when
>> a table has a PK on a bigint field - COPY to that table from a file
>> becomes slower and slower as the table grows. When the table reaches
>> ~5GB - a COPY of 100k records may take up to 20 mins. I've experimented
>> with all params in the configs, moved indexes to a separate hdd etc -
>> nothing made any improvement. However, once I drop the 64-bit PK - a COPY
>> of 100k records passes in seconds. Interesting thing - the same table has
>> other indexes, including composite ones, but none of them include bigint
>> fields, which is why I reached the conclusion that the bug is connected
>> with indexes on bigint fields only.
>
> I did see this behavior, but for me it occurs for UNIQUE indexes only
> (including PK), not dependent on field type. You can check this by
> dropping the PK and creating it as a regular non-unique index.
>
> Best regards, Vitalii Tymchyshyn

--
Ayrapetyan Robert,
Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php
Re: [PERFORM] Performance die when COPYing to table with bigint PK
Seems this assumption is not right. Just created a simple index on a bigint column - the situation with huge performance degradation repeated. Dropping this index solved the COPY issues on the fly. So I'm still convinced - this bug relates to FreeBSD 64-bit + UFS + bigint column index (some of these may be superfluous, but I have no resources to check on different platforms with different filesystems).

On Mon, Aug 1, 2011 at 12:15 PM, Robert Ayrapetyan robert.ayrapet...@comodo.com wrote:
> Quite possible. But anyway - I don't think the performance degradation
> should be so huge in the case of UNIQUE indexes.
>
> On Mon, Aug 1, 2011 at 12:06 PM, Vitalii Tymchyshyn tiv...@gmail.com wrote:
>> 31.07.11 16:51, Robert Ayrapetyan wrote:
>>> Hello. I've found strange behavior of my pg installation (tested both
>>> 8.4 and 9.0 - they behave the same) on the FreeBSD platform. In short -
>>> when a table has a PK on a bigint field - COPY to that table from a file
>>> becomes slower and slower as the table grows. When the table reaches
>>> ~5GB - a COPY of 100k records may take up to 20 mins. I've experimented
>>> with all params in the configs, moved indexes to a separate hdd etc -
>>> nothing made any improvement. However, once I drop the 64-bit PK - a
>>> COPY of 100k records passes in seconds. Interesting thing - the same
>>> table has other indexes, including composite ones, but none of them
>>> include bigint fields, which is why I reached the conclusion that the
>>> bug is connected with indexes on bigint fields only.
>>
>> I did see this behavior, but for me it occurs for UNIQUE indexes only
>> (including PK), not dependent on field type. You can check this by
>> dropping the PK and creating it as a regular non-unique index.

--
Ayrapetyan Robert,
Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php
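A common workaround when COPY slows down because of per-row index maintenance (a generic sketch, not a fix proposed in this thread; table and index names are hypothetical) is to drop the index for the bulk load and rebuild it afterwards:

```sql
-- Rebuilding the index once after the load is a single bulk sort,
-- usually far cheaper than maintaining the index row-by-row during
-- a large COPY.
BEGIN;
DROP INDEX IF EXISTS my_table_big_id_idx;
COPY my_table FROM '/path/to/data.csv' WITH CSV;
CREATE UNIQUE INDEX my_table_big_id_idx ON my_table (big_id);
COMMIT;
```

Note that dropping a unique index also suspends duplicate detection for the duration of the load, so the CREATE UNIQUE INDEX at the end will fail if the file introduces duplicates.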
Re: [PERFORM] Trigger or Function
On 01/08/11 19:18, Robert Klemme wrote:
> On Sat, Jul 30, 2011 at 3:01 AM, Gavin Flower
> gavinflo...@archidevsys.co.nz wrote:
>> On 24/07/11 03:58, alan wrote:
>>>> My first approach would be to remove WeekAvg and MonthAvg from the
>>>> table and create a view which calculates appropriate values.
>>> Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing
>>> features. Here is how I set it up. If anyone sees an issue, please let
>>> me know. I'm new to postgres. Basically, my daily_vals table contains
>>> HOST, DATE, VALUE columns. What I wanted was a way to automatically
>>> populate a 4th column called rolling_average, which would be the sum of
>>> n preceding columns.
>> There seems to be a contradiction in the naming here. Did you mean avg of
>> n preceding columns?
>>> I created a view called weekly_average using this VIEW statement.
>>>
>>> CREATE OR REPLACE VIEW weekly_average AS
>>>   SELECT *, sum(value) OVER (PARTITION BY host
>>>     ORDER BY rid
>>>     ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
>>>   ) as rolling_average
>>>   FROM daily_vals;
>> The above gives just the rolling sum; you need to divide by the number of
>> rows in the sum to get the average (I assume you want the arithmetic
>> mean, as there are many types of average!).
>>
>> CREATE OR REPLACE VIEW weekly_average AS
>>   SELECT *, round((sum(value) OVER mywindow /
>>     LEAST(6, (row_number() OVER mywindow))), 4) AS rolling_average
>>   FROM daily_vals
>>   WINDOW mywindow AS (
>>     PARTITION BY host
>>     ORDER BY rid
>>     ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
>>   );
> Why not
>
> CREATE OR REPLACE VIEW weekly_average AS
>   SELECT *, avg(value) OVER (PARTITION BY host
>     ORDER BY rid
>     ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
>   ) as rolling_average
>   FROM daily_vals;
>
> What did I miss?
>
> Kind regards
> robert

Chuckle! Your fix is much more elegant and efficient, though both approaches work!
Re: [PERFORM] Array access performance
Andreas Brandl m...@3.141592654.de writes:
> I'm looking for a hint on how array access performs in PostgreSQL with
> respect to performance. Normally I would expect access of a 1-dimensional
> array at slot i (array[i]) to perform in constant time (random access).
> Is this also true for postgres' arrays?

Only if the element type is fixed-length (no strings for instance) and the array does not contain, and never has contained, any nulls. Otherwise a scan through all the previous elements is required to find a particular element.

By and large, if you're thinking of using arrays large enough to make this an interesting question, I would say stop right there and redesign your database schema. You're not thinking relationally, and it's gonna cost ya.

regards, tom lane
Re: [PERFORM] Array access performance
Hi Tom,

>> I'm looking for a hint on how array access performs in PostgreSQL with
>> respect to performance. Normally I would expect access of a
>> 1-dimensional array at slot i (array[i]) to perform in constant time
>> (random access). Is this also true for postgres' arrays?
>
> Only if the element type is fixed-length (no strings for instance) and
> the array does not contain, and never has contained, any nulls. Otherwise
> a scan through all the previous elements is required to find a particular
> element.

We're using bigint elements here and don't have nulls, so this should be fine.

> By and large, if you're thinking of using arrays large enough to make
> this an interesting question, I would say stop right there and redesign
> your database schema. You're not thinking relationally, and it's gonna
> cost ya.

In general, I agree. We have a nice relational database but are facing some performance issues. My approach is to build a materialized view which exploits the array feature and heavily relies on constant-time access on arrays.

Thank you!

Regards,
Andreas
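As an aside, the append-in-an-aggregate pattern from the original question can usually be replaced by the built-in array_agg aggregate, which builds the array in one pass (a sketch with a hypothetical table, not taken from the thread):

```sql
-- Hypothetical table of one measurement per series per day.
CREATE TABLE daily_measure (series_id bigint, day date, val bigint);

-- array_agg assembles the whole array server-side in a single pass,
-- avoiding the per-row array copies that a user-defined aggregate
-- built on array_append may incur (ORDER BY inside an aggregate
-- requires PostgreSQL 9.0+, which the poster is running).
SELECT series_id, array_agg(val ORDER BY day) AS d
FROM daily_measure
GROUP BY series_id;
```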
Re: [PERFORM] Performance die when COPYing to table with bigint PK
Robert Ayrapetyan robert.ayrapet...@comodo.com wrote:
> So I'm still convinced - this bug relates to FreeBSD 64-bit + UFS +
> bigint column index (some of these may be superfluous, but I have no
> resources to check on different platforms with different filesystems).

Linux 64 bit XFS bigint column index only shows a slightly longer run time for bigint versus int here. What timings do you get for the insert statements if you run the following in your environment?

create table bi (big bigint not null, medium int not null);
insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
\timing on
truncate table bi;
insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
truncate table bi;
insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
truncate table bi;
insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
create unique index bi_medium on bi (medium);
truncate table bi;
insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
truncate table bi;
insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
truncate table bi;
insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
drop index bi_medium;
create unique index bi_big on bi (big);
truncate table bi;
insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
truncate table bi;
insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
truncate table bi;
insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
\timing off
drop table bi;

Here's what I get (three runs each with no index, with the unique int index, and with the unique bigint index):

Time: 1629.141 ms
Time: 1638.060 ms
Time: 1711.833 ms
Time: 4151.953 ms
Time: 4602.679 ms
Time: 5107.259 ms
Time: 4654.060 ms
Time: 5158.157 ms
Time: 5101.110 ms

-Kevin
Re: [PERFORM] Performance penalty when using WITH
On Sat, Jul 30, 2011 at 8:10 AM, Robert Klemme shortcut...@googlemail.com wrote:
> On Thu, Jul 28, 2011 at 11:00 PM, Li Jin l...@tripadvisor.com wrote:
>> I met with the problem that when I was using a WITH clause to reuse a
>> subquery, I got a huge performance penalty because of the query planner.
>> Here are the details, the original query is
>>
>> EXPLAIN ANALYZE WITH latest_identities AS (
>>   SELECT DISTINCT ON (memberid) memberid, username, changedate
>>   FROM t_username_history
>>   WHERE memberid IN (SELECT memberid FROM t_member
>>     WHERE firstname || ' ' || substring(lastname,1,1) = 'Eddie T')
>>   ORDER BY memberid, changedate DESC
>> )
>
> Another observation: that criterion looks suspicious to me. I would
> expect any RDBMS to be better able to optimize this:
>
> WHERE firstname = 'Eddie' AND lastname LIKE 'T%'
>
> I know it's semantically not the same but I would assume this is good
> enough for the common use case. Plus, if there is an index on
> (firstname, lastname) then that could be used.

disagree. just one of the ways that could be stymied would be to change the function behind the '||' operator.

merlin
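A third option (a sketch, not proposed in the thread) is an expression index on exactly the concatenation used in the predicate, which keeps the original semantics while still permitting an index scan:

```sql
-- Indexes the exact expression from the WHERE clause. The planner can use
-- it only when a query repeats the expression verbatim, and expression
-- indexes require the functions involved to be immutable (text || and
-- substring both are).
CREATE INDEX t_member_name_idx
  ON t_member ((firstname || ' ' || substring(lastname, 1, 1)));

-- This predicate can now be satisfied by t_member_name_idx:
SELECT memberid FROM t_member
WHERE firstname || ' ' || substring(lastname, 1, 1) = 'Eddie T';
```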
Re: [PERFORM] synchronous_commit off
No: the commit has the same guarantees as a synchronous commit w.r.t. data consistency. The commit can only fail (as a whole) due to hardware problems or postgres backend crashes. And yes: the client commit returns, but the server can fail later and not persist the transaction, and it will be lost (again as a whole). Your application should be able to tolerate losing the latest committed transactions if you use this. The difference to fsync=off is that a server crash will leave the database in a consistent state with just the latest transactions lost.

From: Anibal David Acosta a...@devshock.com
To: pgsql-performance@postgresql.org
Sent: Monday, August 1, 2011 6:29 AM
Subject: [PERFORM] synchronous_commit off

Can a transaction committed asynchronously report an error, duplicate key or something like that, causing a client with an OK transaction but the server with a FAILED transaction?

Thanks
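For context, synchronous_commit can be toggled per session or even per transaction (a sketch; the events table is hypothetical). Constraint checking is unaffected: a duplicate key is still reported as an error to the client, so the client never sees OK for a transaction the server rejected:

```sql
BEGIN;
SET LOCAL synchronous_commit = off;  -- applies to this transaction only

-- A duplicate-key violation would still raise an error right here,
-- synchronously; only the WAL flush to disk is deferred, not the
-- constraint checks.
INSERT INTO events (id, payload) VALUES (1, 'hello');
COMMIT;
```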
Re: [PERFORM] Which Join is better
Unless you use the explicit join syntax:

select p.* from A p join B q on (p.id = q.id)

and also set join_collapse_limit = 1, the order of the joins is determined by the planner. Also, explain is your friend :)

From: Adarsh Sharma adarsh.sha...@orkash.com
To: pgsql-performance@postgresql.org
Sent: Monday, August 1, 2011 11:42 PM
Subject: [PERFORM] Which Join is better

Dear all,

Just want to know which join is better for querying data faster. I have 2 tables: A ( 70 GB ), B ( 7 MB ). A has 10 columns, B has 3 columns. Indexes exist on both tables' ids.

select p.* from A p, B q where p.id = q.id
or
select p.* from B q, A p where q.id = p.id

Thanks
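The join_collapse_limit behavior described above can be checked directly (a sketch using the question's table names):

```sql
-- With join_collapse_limit = 1 the planner preserves the textual order
-- of explicit JOIN clauses instead of reordering them; compare the two
-- plans with and without the setting.
SET join_collapse_limit = 1;
EXPLAIN SELECT p.* FROM a p JOIN b q ON (p.id = q.id);

RESET join_collapse_limit;  -- back to the default (8)
EXPLAIN SELECT p.* FROM a p JOIN b q ON (p.id = q.id);
```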