[PERFORM] Which Join is better

2011-08-02 Thread Adarsh Sharma

Dear all,

Just want to know which join is better for querying data faster.

I have 2 tables: A ( 70 GB ) & B ( 7 MB ).

A has 10 columns & B has 3 columns. Indexes exist on both tables' ids.

select p.* from table A p, B q where p.id=q.id

or

select p.* from table B q , A p where q.id=p.id


Thanks



Re: [PERFORM] Which Join is better

2011-08-02 Thread Szymon Guz
On 2 August 2011 08:42, Adarsh Sharma adarsh.sha...@orkash.com wrote:

 Dear all,

 Just want to know which join is better for querying data faster.

 I have 2 tables: A ( 70 GB ) & B ( 7 MB ).

 A has 10 columns & B has 3 columns. Indexes exist on both tables' ids.

 select p.* from table A p, B q where p.id=q.id

 or

 select p.* from table B q , A p where q.id=p.id



Hi,
it really doesn't matter. PostgreSQL can reorder the joins as it likes.
And you can always check, but I think the plans will be the same.
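
For example, a quick way to confirm it (a sketch; A, B and id follow the original post, with the stray "table" keyword dropped):

EXPLAIN select p.* from A p, B q where p.id = q.id;
EXPLAIN select p.* from B q, A p where q.id = p.id;

Both should show the same plan - typically a hash or merge join with the small table B on the build side.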

regards
Szymon


Re: [PERFORM] Which Join is better

2011-08-02 Thread Maria Arias de Reyna
On Tuesday, 02 August 2011, Adarsh Sharma wrote:
 Dear all,
 
 Just want to know which join is better for querying data faster.
 
 I have 2 tables: A ( 70 GB ) & B ( 7 MB ).
 
 A has 10 columns & B has 3 columns. Indexes exist on both tables' ids.
 
 select p.* from table A p, B q where p.id=q.id
 
 or
 
 select p.* from table B q , A p where q.id=p.id
 
 
 Thanks


Hi Adarsh,

What does an EXPLAIN ANALYZE say after a VACUUM?
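
For instance (a sketch; substitute the real table names):

VACUUM ANALYZE A;
VACUUM ANALYZE B;
EXPLAIN ANALYZE select p.* from A p, B q where p.id = q.id;

The actual row counts and times in the output show whether the planner's estimates - and therefore the chosen join strategy - are reasonable.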

-- 
María Arias de Reyna Domínguez
Área de Operaciones

Emergya Consultoría 
Tfno: +34 954 51 75 77 / +34 607 43 74 27
Fax: +34 954 51 64 73 
www.emergya.es 



Re: [PERFORM] Performance die when COPYing to table with bigint PK

2011-08-02 Thread Vitalii Tymchyshyn

02.08.11 11:26, Robert Ayrapetyan wrote:

It seems this assumption is not right. I just created a simple index on a
bigint column - the situation with huge performance
degradation repeated. Dropping this index solved the COPY issues on the fly.
So I'm still convinced - this bug relates to FreeBSD 64-bit + UFS +
bigint column index
(some of these may be superfluous, but I have no resources to check on
different platforms with different filesystems).
Interesting. We also have FreeBSD x64 on UFS and are using bigint
(bigserial) keys. It seems I will need to perform more tests here
because I do see similar problems. I can certainly do a copy of the data with
int4 keys and test the performance.
BTW: the thing we are going to try on the next upgrade is to change the UFS
block size from 16K to 8K. The problem I saw is that, with the default
setting, UFS needs to read an additional 8K when PostgreSQL writes its
page (and for index random writes that can be vital). Unfortunately, such a
change requires a partition reformat and I can't afford it for now.
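
A minimal version of that int4-vs-bigint COPY comparison (a sketch only - the table, column and file names are made up):

create table copy_test_big (id bigint primary key, payload text);
create table copy_test_int (id integer primary key, payload text);
\timing on
\copy copy_test_big from 'sample_big.csv' with csv
\copy copy_test_int from 'sample_int.csv' with csv
\timing off

If the bigint variant keeps slowing down as the table grows while the int4 one does not, that points at the index rather than the filesystem.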


Best regards, Vitalii Tymchyshyn



[PERFORM] Tsearch2 - bad performance with concatenated ts-vectors

2011-08-02 Thread Jan Wielgus
Hello everyone,

This is my first post on this list. I tried to look for possible solutions in
the archive, as well as on Google, but I could not find an explanation for such
a specific situation.

I am facing a performance problem connected with the Postgres Tsearch2 FTS
mechanism.

Here is my query:

select participant.participant_id
from participant participant
join person person
  on person.person_participant_id = participant.participant_id
left join registration registration
  on registration.registration_registered_participant_id = participant.participant_id
left join enrollment enrollment
  on registration.registration_enrollment_id = enrollment.enrollment_id
join registration_configuration registration_configuration
  on enrollment.enrollment_configuration_id = registration_configuration.configuration_id
left join event_context context
  on context.context_id = registration_configuration.configuration_context_id
where participant.participant_type = 'PERSON'
  and participant_status = 'ACTIVE'
  and context.context_code in ('GB2TST2010A')
  and registration_configuration.configuration_type in ('VISITOR')
  and registration_configuration.configuration_id is not null
  and participant.participant_tsv || person.person_tsv @@ to_tsquery('simple', to_tsquerystring('Abigail'))
limit 100

As you see, I am using two vectors which I concatenate and check against a 
tsquery. 

Both vectors are indexed with GIN and updated with respective triggers in the 
following way:

ALTER TABLE person ALTER COLUMN person_tsv SET STORAGE EXTENDED;
CREATE INDEX person_ft_index ON person USING gin(person_tsv);
CREATE OR REPLACE FUNCTION update_person_tsv() RETURNS trigger AS $$
BEGIN
  NEW.person_tsv := to_tsvector('simple', create_tsv(
    ARRAY[NEW.person_first_name, NEW.person_last_name, NEW.person_middle_name]));
  RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';
CREATE TRIGGER person_tsv_update BEFORE INSERT OR UPDATE ON person
  FOR EACH ROW EXECUTE PROCEDURE update_person_tsv();

ALTER TABLE participant ALTER COLUMN participant_tsv SET STORAGE EXTENDED;
CREATE INDEX participant_ft_index ON participant USING gin(participant_tsv);
CREATE OR REPLACE FUNCTION update_participant_tsv() RETURNS trigger AS $$
BEGIN
  NEW.participant_tsv := to_tsvector('simple', create_tsv(
    ARRAY[NEW.participant_login, NEW.participant_email]));
  RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';
CREATE TRIGGER participant_tsv_update BEFORE INSERT OR UPDATE ON participant
  FOR EACH ROW EXECUTE PROCEDURE update_participant_tsv();

The database is quite big - it has almost one million participant records. The
above query took almost 67 seconds to execute and fetch 100 rows, which is
unacceptable for us.

As I assume, the problem is that when the vectors are concatenated, the
individual indexes for each vector are not used. The execution plan, taken after
the 1st execution of the query:

Limit  (cost=46063.13..93586.79 rows=100 width=4) (actual time=4963.620..39703.645 rows=100 loops=1)
  ->  Nested Loop  (cost=46063.13..493736.04 rows=942 width=4) (actual time=4963.617..39703.349 rows=100 loops=1)
        Join Filter: (registration_configuration.configuration_id = enrollment.enrollment_configuration_id)
        ->  Nested Loop  (cost=46063.13..493662.96 rows=3769 width=8) (actual time=4963.517..39701.557 rows=159 loops=1)
              ->  Nested Loop  (cost=46063.13..466987.33 rows=3769 width=8) (actual time=4963.498..39698.542 rows=159 loops=1)
                    ->  Hash Join  (cost=46063.13..430280.76 rows=4984 width=8) (actual time=4963.464..39692.676 rows=216 loops=1)
                          Hash Cond: (participant.participant_id = person.person_participant_id)
                          Join Filter: ((participant.participant_tsv || person.person_tsv) @@ to_tsquery('simple'::regconfig, to_tsquerystring('Abigail'::text)))
                          ->  Seq Scan on participant  (cost=0.00..84680.85 rows=996741 width=42) (actual time=0.012..3132.944 rows=1007151 loops=1)
                                Filter: (((participant_type)::text = 'PERSON'::text) AND ((participant_status)::text = 'ACTIVE'::text))
                          ->  Hash  (cost=25495.39..25495.39 rows=1012539 width=38) (actual time=3145.628..3145.628 rows=1007151 loops=1)
                                Buckets: 2048  Batches: 128  Memory Usage: 556kB
                                ->  Seq Scan on person  (cost=0.00..25495.39 rows=1012539 width=38) (actual time=0.062..1582.990 rows=1007151 loops=1)
                    ->  Index Scan using idx_registration_registered_participant_id on registration  (cost=0.00..7.35 rows=1 width=8) (actual time=0.018..0.019 rows=1 loops=216)
                          Index Cond: (registration.registration_registered_participant_id = person.person_participant_id)
              ->  Index Scan using enrollment_pkey on enrollment  (cost=0.00..7.07 rows=1 width=8) (actual time=0.011..0.013 rows=1 loops=159)
                    Index Cond:
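
If that assumption holds, one thing worth testing is to let each tsvector be filtered against its own GIN index and combine the results (a sketch only - equivalent to the concatenated form for a single search term but not for more complex tsqueries; the remaining joins and filters from the original query would go into both branches):

select participant.participant_id
from participant participant
join person person on person.person_participant_id = participant.participant_id
-- ... remaining joins and filters from the original query ...
where participant.participant_tsv @@ to_tsquery('simple', to_tsquerystring('Abigail'))
union
select participant.participant_id
from participant participant
join person person on person.person_participant_id = participant.participant_id
-- ... remaining joins and filters from the original query ...
where person.person_tsv @@ to_tsquery('simple', to_tsquerystring('Abigail'))
limit 100;
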

[PERFORM] Array access performance

2011-08-02 Thread Andreas Brandl
Hi,

I'm looking for a hint how array access performs in PostgreSQL in respect to 
performance. Normally I would expect access of a 1-dimensional Array at slot i 
(array[i]) to perform in constant time (random access).

Is this also true for postgres' arrays?

My concrete example is a 1-dimensional array d of length = 600 (which will 
grow at a rate of 1 entry/day) stored in a table's column. I need to access 
this array two times per tuple, i.e. d[a], d[b]. Therefore I hope access is not 
linear. Is this correct?

Also I'm having some performance issues building this array. I'm doing this 
with a user-defined aggregate function, starting with an empty array and using 
array_append and some calculation for each new entry. I assume this involves 
some copying/memory allocation on each call, but I could not find the 
implementation of array_append in postgres-source/git. 

Is there an efficient way to append to an array? I could also start with a 
pre-initialized array of the required length, but this involves some complexity.
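
For what it's worth, if each element can be computed independently of the partially built array, the built-in array_agg aggregate avoids repeated array_append calls - a sketch with made-up table and column names:

SELECT array_agg(computed_value ORDER BY entry_day) AS d
FROM daily_values
GROUP BY series_id;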

Thank you

Regards,
Andreas



Re: [PERFORM] Array access performance

2011-08-02 Thread Andreas Brandl

 Is this also true for postgres' arrays?

Sorry, I'm using the latest Postgres, 9.0.4, on Debian squeeze/amd64.

Greetings
Andreas



Re: [PERFORM] Performance die when COPYing to table with bigint PK

2011-08-02 Thread Robert Ayrapetyan
Quite possible.
But anyway - I don't think the performance degradation should be so huge in
the case of UNIQUE indexes.

On Mon, Aug 1, 2011 at 12:06 PM, Vitalii Tymchyshyn tiv...@gmail.com wrote:
 31.07.11 16:51, Robert Ayrapetyan wrote:

 Hello.

 I've found strange behavior in my pg installation (tested both 8.4 and
 9.0 - they behave the same) on the FreeBSD platform.
 In short - when a table has a PK on a bigint field, COPY to that
 table from a file becomes slower and slower as the table grows. When the table
 reaches ~5GB, a COPY of 100k records may take up to 20 mins. I've
 experimented with all params in the configs, moved indexes to a separate hdd,
 etc - nothing made any improvement. However, once I drop the 64-bit
 PK, a COPY of 100k records passes in seconds. Interesting thing - the same
 table has other indexes, including composite ones, but none of them
 include bigint fields, which is why I concluded that the bug is
 connected with indexes on bigint fields only.

 I did see this behavior, but for me it occurs for UNIQUE indexes only
 (including the PK), regardless of the field type.
 You can check this by dropping the PK and creating it as a regular non-unique
 index.

 Best regards, Vitalii Tymchyshyn




-- 
Ayrapetyan Robert,
Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php



Re: [PERFORM] Performance die when COPYing to table with bigint PK

2011-08-02 Thread Robert Ayrapetyan
It seems this assumption is not right. I just created a simple index on a
bigint column - the situation with huge performance
degradation repeated. Dropping this index solved the COPY issues on the fly.
So I'm still convinced - this bug relates to FreeBSD 64-bit + UFS +
bigint column index
(some of these may be superfluous, but I have no resources to check on
different platforms with different filesystems).

On Mon, Aug 1, 2011 at 12:15 PM, Robert Ayrapetyan
robert.ayrapet...@comodo.com wrote:
 Quite possible.
 But anyway - I don't think the performance degradation should be so huge in
 the case of UNIQUE indexes.

 On Mon, Aug 1, 2011 at 12:06 PM, Vitalii Tymchyshyn tiv...@gmail.com wrote:
 31.07.11 16:51, Robert Ayrapetyan wrote:

 Hello.

 I've found strange behavior in my pg installation (tested both 8.4 and
 9.0 - they behave the same) on the FreeBSD platform.
 In short - when a table has a PK on a bigint field, COPY to that
 table from a file becomes slower and slower as the table grows. When the table
 reaches ~5GB, a COPY of 100k records may take up to 20 mins. I've
 experimented with all params in the configs, moved indexes to a separate hdd,
 etc - nothing made any improvement. However, once I drop the 64-bit
 PK, a COPY of 100k records passes in seconds. Interesting thing - the same
 table has other indexes, including composite ones, but none of them
 include bigint fields, which is why I concluded that the bug is
 connected with indexes on bigint fields only.

 I did see this behavior, but for me it occurs for UNIQUE indexes only
 (including the PK), regardless of the field type.
 You can check this by dropping the PK and creating it as a regular non-unique
 index.

 Best regards, Vitalii Tymchyshyn




 --
 Ayrapetyan Robert,
 Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
 http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php




-- 
Ayrapetyan Robert,
Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php



Re: [PERFORM] Trigger or Function

2011-08-02 Thread Gavin Flower

On 01/08/11 19:18, Robert Klemme wrote:

On Sat, Jul 30, 2011 at 3:01 AM, Gavin Flower
gavinflo...@archidevsys.co.nz  wrote:

On 24/07/11 03:58, alan wrote:

My first approach would be to remove WeekAvg and MonthAvg from the
table and create a view which calculates appropriate values.

Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing
features.
Here is how I set it up. If anyone sees an issue, please let me know.
I'm new to postgres.

Basically, my daily_vals table contains HOST, DATE, VALUE columns.
What I wanted was a way to automatically populate a 4th column
called rolling_average, which would be the sum of n preceding
columns.

There seems to be a contradiction in the naming here.  Did you mean avg
of n preceding columns?


I created a view called weekly_average using this VIEW statement.

CREATE OR REPLACE
   VIEW weekly_average
 AS SELECT *, sum(value) OVER (PARTITION BY host
 ORDER BY rid
 ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
   ) as rolling_average FROM daily_vals;

The above gives just the rolling sum; you need to divide by the number of
rows in the sum to get the average (I assume you want the arithmetic mean,
as there are many types of average!).

CREATE OR REPLACE
  VIEW weekly_average
AS SELECT
*,
round((sum(value) OVER mywindow / LEAST(6, (row_number() OVER
mywindow))), 4) AS rolling_average
FROM daily_vals
WINDOW mywindow AS
(
PARTITION BY host
ORDER BY rid
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
);

Why not

CREATE OR REPLACE
   VIEW weekly_average
 AS SELECT *, avg(value) OVER (PARTITION BY host
 ORDER BY rid
 ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
   ) as rolling_average FROM daily_vals;

What did I miss?

Kind regards

robert

*Chuckle* Your fix is much more elegant and efficient, though both 
approaches work!




Re: [PERFORM] Array access performance

2011-08-02 Thread Tom Lane
Andreas Brandl m...@3.141592654.de writes:
 I'm looking for a hint how array access performs in PostgreSQL in respect to 
 performance. Normally I would expect access of a 1-dimensional Array at slot 
 i (array[i]) to perform in constant time (random access).

 Is this also true for postgres' arrays?

Only if the element type is fixed-length (no strings for instance) and
the array does not contain, and never has contained, any nulls.
Otherwise a scan through all the previous elements is required to find
a particular element.

By and large, if you're thinking of using arrays large enough to make
this an interesting question, I would say stop right there and redesign
your database schema.  You're not thinking relationally, and it's gonna
cost ya.

regards, tom lane



Re: [PERFORM] Array access performance

2011-08-02 Thread Andreas Brandl
Hi Tom,

  I'm looking for a hint how array access performs in PostgreSQL in
  respect to performance. Normally I would expect access of a
  1-dimensional Array at slot i (array[i]) to perform in constant time
  (random access).
 
  Is this also true for postgres' arrays?
 
 Only if the element type is fixed-length (no strings for instance) and
 the array does not contain, and never has contained, any nulls.
 Otherwise a scan through all the previous elements is required to find
 a particular element.

We're using bigint elements here and don't have nulls, so this should be fine.

 By and large, if you're thinking of using arrays large enough to make
 this an interesting question, I would say stop right there and
 redesign
 your database schema. You're not thinking relationally, and it's gonna
 cost ya.

In general, I agree. We have a nice relational database but are facing 
some performance issues. My approach is to build a materialized view which 
exploits the array feature and relies heavily on constant-time access to arrays.

Thank you!

Regards,
Andreas



Re: [PERFORM] Performance die when COPYing to table with bigint PK

2011-08-02 Thread Kevin Grittner
Robert Ayrapetyan robert.ayrapet...@comodo.com wrote:
 
 So I'm still convinced - this bug relates to FreeBSD 64-bit + UFS
 + bigint column index
 (some of these may be superfluous, but I have no resources to
 check on different platforms with different filesystems).
 
On Linux 64-bit with XFS, a bigint column index only shows a slightly longer
run time than an int index here.  What timings do you get for
the insert statements if you run the following in your environment?
 
create table bi (big bigint not null, medium int not null);
insert into bi with x(n) as (select generate_series(1, 100))
  select n + 50, n from x;
\timing on
truncate table bi; insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
truncate table bi; insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
truncate table bi; insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
create unique index bi_medium on bi (medium);
truncate table bi; insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
truncate table bi; insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
truncate table bi; insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
drop index bi_medium;
create unique index bi_big on bi (big);
truncate table bi; insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
truncate table bi; insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
truncate table bi; insert into bi with x(n) as (select generate_series(1, 100)) select n + 50, n from x;
\timing off
drop table bi;
 
Here's what I get (first with no index, then with the unique index on the
int column, then with the unique index on the bigint column):

Time: 1629.141 ms
Time: 1638.060 ms
Time: 1711.833 ms

Time: 4151.953 ms
Time: 4602.679 ms
Time: 5107.259 ms

Time: 4654.060 ms
Time: 5158.157 ms
Time: 5101.110 ms
 
-Kevin



Re: [PERFORM] Performance penalty when using WITH

2011-08-02 Thread Merlin Moncure
On Sat, Jul 30, 2011 at 8:10 AM, Robert Klemme
shortcut...@googlemail.com wrote:
 On Thu, Jul 28, 2011 at 11:00 PM, Li Jin l...@tripadvisor.com wrote:
 I ran into a problem where, when I was using a WITH clause to reuse a
 subquery, I got a huge performance penalty because of the query planner.
 Here are the details, the original query is
 EXPLAIN ANALYZE WITH latest_identities AS
 (
     SELECT DISTINCT ON (memberid) memberid, username, changedate
     FROM t_username_history
     WHERE memberid IN (SELECT memberid FROM t_member WHERE firstname || ' '
 || substring(lastname,1,1) = 'Eddie T')
     ORDER BY memberid, changedate DESC
 )

 Another observation: That criterion looks suspicious to me. I would
 expect any RDBMS to be better able to optimize this:

 WHERE firstname = 'Eddie' AND lastname like 'T%'

 I know it's semantically not the same but I would assume this is good
 enough for the common use case.  Plus, if there is an index on
 (firstname, lastname) then that could be used.

Disagree. Just one of the ways that could be stymied would be to change
the function behind the '||' operator.

merlin
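
For reference, a sketch of the two approaches discussed (index names are made up; neither statement appears in the thread):

-- the two-column index Robert refers to, used with the rewritten predicate
CREATE INDEX t_member_name_idx ON t_member (firstname, lastname);
SELECT memberid FROM t_member
WHERE firstname = 'Eddie' AND lastname LIKE 'T%';

-- an expression index that keeps the original predicate unchanged
-- (assumes only immutable functions are involved, which holds for || and substring on text)
CREATE INDEX t_member_name_expr_idx
ON t_member ((firstname || ' ' || substring(lastname, 1, 1)));

Whether the LIKE prefix match can actually use the btree index also depends on the locale / operator class.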



Re: [PERFORM] synchronous_commit off

2011-08-02 Thread lars hofhansl
No: The commit has the same guarantees as a synchronous commit w.r.t. data 
consistency. The commit can only fail (as a whole) due to hardware problems or 
postgres backend crashes. 


And yes: The client commit returns, but the server can fail later and not 
persist the transaction and it will be lost (again as a whole).

Your application should be able to tolerate losing the latest committed 
transactions if you use this.

The difference from fsync=off is that a server crash will leave the database in a 
consistent state with just the latest transactions lost.
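
A sketch of how this is typically scoped (synchronous_commit can be set per session or per transaction):

BEGIN;
SET LOCAL synchronous_commit = off;  -- only this transaction risks being lost on a crash
INSERT INTO audit_log (msg) VALUES ('example');  -- made-up table
COMMIT;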




From: Anibal David Acosta a...@devshock.com
To: pgsql-performance@postgresql.org
Sent: Monday, August 1, 2011 6:29 AM
Subject: [PERFORM] synchronous_commit off


Can a transaction committed asynchronously report an error, a duplicate key or 
something like that, causing the client to see an OK transaction but the server a 
FAILED transaction?
 
Thanks

Re: [PERFORM] Which Join is better

2011-08-02 Thread lars hofhansl
Unless you use the explicit join syntax:

select p.* from A p join B q on (p.id = q.id)

and also set join_collapse_limit = 1, the order of the joins is determined by the planner.


Also, EXPLAIN is your friend :)
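
For example (a sketch; A and B as in the original post):

SET join_collapse_limit = 1;   -- keep explicit JOINs in the order they are written
EXPLAIN select p.* from A p join B q on (p.id = q.id);
RESET join_collapse_limit;     -- back to the default, where the planner reorders freely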


From: Adarsh Sharma adarsh.sha...@orkash.com
To: pgsql-performance@postgresql.org
Sent: Monday, August 1, 2011 11:42 PM
Subject: [PERFORM] Which Join is better

Dear all,

Just want to know which join is better for querying data faster.

I have 2 tables: A ( 70 GB ) & B ( 7 MB ).

A has 10 columns & B has 3 columns. Indexes exist on both tables' ids.

select p.* from table A p, B q where p.id=q.id

or

select p.* from table B q , A p where q.id=p.id


Thanks

