Re: [HACKERS] I feel the need for speed. What am I doing wrong?

2003-01-06 Thread Tom Lane
"Dann Corbit" <[EMAIL PROTECTED]> writes:
> Yikes!  Five times slower!  But then I took Tom's incredibly helpful
> suggestion to disable the sequential scan:

Ideally, you shouldn't have to do that.  Now that you have the correct
indexes in place, could you show us the EXPLAIN ANALYZE output for both
cases (enable_seqscan = on and off)?

Also, you might try leaving enable_seqscan = on, and seeing how far you
have to decrease random_page_cost to get the planner to choose
indexscan.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] I feel the need for speed. What am I doing wrong?

2003-01-06 Thread Dann Corbit
> -Original Message-
> From: Tom Lane [mailto:[EMAIL PROTECTED]] 
> Sent: Monday, January 06, 2003 5:26 PM
> To: Nigel J. Andrews
> Cc: Dann Corbit; [EMAIL PROTECTED]; 
> [EMAIL PROTECTED]
> Subject: Re: [HACKERS] I feel the need for speed. What am I 
> doing wrong? 
> 
> 
> "Nigel J. Andrews" <[EMAIL PROTECTED]> writes:
> >> select a."RT_REC_KEY", a."cnxarraycolumn", a."CRC" from 
> >> "CNX_DS_53_SIS_STU_OPT_FEE_TB" a left outer join 
> >> "CNX_DS2_53_SIS_STU_OPT_FEE_TB" b on ( a."RT_REC_KEY" = 
> >> b."RT_REC_KEY" and a."cnxarraycolumn" = b."cnxarraycolumn") where 
> >> b.oid is null ;
> 
> > I suspect you get no results because it's unlikely b.oid 
> will be null.
> 
> Try "it's impossible for b.oid to be null --- unless a dummy 
> b row is being provided by the LEFT JOIN".  I interpret the 
> purpose of the query to be to look for "a" rows that have no 
> matching "b" row.
> 
> Using OID for this is kind of cute, I guess, since it is 
> guaranteed not-null in a real row; he doesn't have to think 
> about whether any of his regular columns are not-null.

I am very happy to report that PostgreSQL now easily beats MS Access for
speed!
:-)

Here are the MS Access results, where I create on index and then a
second:
Access Outer Join   2 column index on 1st table
02:29.9
2 column index on 2nd table + 2 column index on 1st
table   02:18.3

For PostgreSQL, I originally created a 3 column index (because I have
also 3 column joins in other places) and got this result:
Postgres Outer Join
enable_seqscan  = 1
3 column Index  1   12:43.9

Showing the plan showed that the indexes were being ignored.

Yikes!  Five times slower!  But then I took Tom's incredibly helpful
suggestion to disable the sequential scan:

Postgres Outer Join
enable_seqscan  = 0
3 column Index  0   05:17.5

Changed to a 2 column index:
2 column index  0   04:58.3

Added an index to the second table:
2 column index on 2nd table + 2 column index on 1st
table   0   01:53.6

PostgreSQL is now 22% faster than Access (HAPPY DAYS)!

For my application, I happen to know that the data will be approximately
physically clustered, and that the indexes will always be very near
matches for data sets big enough to matter.  Benchmarking also showed
that adding the 3rd column to the index was counter productive, even
when the join criteria was a.f1 = b.f1, a.f2=b.f2, a.f3 = b.f3.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] I feel the need for speed. What am I doing wrong?

2003-01-06 Thread Tom Lane
"Nigel J. Andrews" <[EMAIL PROTECTED]> writes:
>> select a."RT_REC_KEY", a."cnxarraycolumn", a."CRC" from
>> "CNX_DS_53_SIS_STU_OPT_FEE_TB" a left outer join
>> "CNX_DS2_53_SIS_STU_OPT_FEE_TB" b on ( a."RT_REC_KEY" = b."RT_REC_KEY"
>> and a."cnxarraycolumn" = b."cnxarraycolumn") where b.oid is null ;

> I suspect you get no results because it's unlikely b.oid will be null.

Try "it's impossible for b.oid to be null --- unless a dummy b row is
being provided by the LEFT JOIN".  I interpret the purpose of the query
to be to look for "a" rows that have no matching "b" row.

Using OID for this is kind of cute, I guess, since it is guaranteed
not-null in a real row; he doesn't have to think about whether any of
his regular columns are not-null.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] I feel the need for speed. What am I doing wrong?

2003-01-06 Thread Tom Lane
"Dann Corbit" <[EMAIL PROTECTED]> writes:
> Creating the following index had no effect on performance!
> create unique index i1 on "CNX_DS2_53_SIS_STU_OPT_FEE_TB" ("RT_REC_KEY",
> "cnxarraycolumn", "CRC");

What does EXPLAIN ANALYZE have to say about the query?  If you set
enable_seqscan = 0, does the explain result change?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] I feel the need for speed. What am I doing wrong?

2003-01-06 Thread Dann Corbit
> -Original Message-
> From: Nigel J. Andrews [mailto:[EMAIL PROTECTED]] 
> Sent: Monday, January 06, 2003 4:58 PM
> To: Dann Corbit
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [HACKERS] I feel the need for speed. What am I 
> doing wrong?
> 
> Added -general list so that the next followup can remove 
> -hackers and everyone there will have had notice.
> 
> 
> On Mon, 6 Jan 2003, Dann Corbit wrote:
> >
> > I have a query using two postgres tables.
> > One is called "CNX_DS_53_SIS_STU_OPT_FEE_TB" and the other is called
> > "CNX_DS2_53_SIS_STU_OPT_FEE_TB".
> >  
> > I am getting 3 times slower performance than Microsoft Access when 
> > performing a left outer join.
> >
> > ...
> > 
> > Here is the query:
> > select a."RT_REC_KEY", a."cnxarraycolumn", a."CRC" from
> > "CNX_DS_53_SIS_STU_OPT_FEE_TB" a left outer join
> > "CNX_DS2_53_SIS_STU_OPT_FEE_TB" b on ( a."RT_REC_KEY" = 
> b."RT_REC_KEY"
> > and a."cnxarraycolumn" = b."cnxarraycolumn") where b.oid is null ;
> >  
> >  
> > Creating the following index had no effect on performance! create 
> > unique index i1 on "CNX_DS2_53_SIS_STU_OPT_FEE_TB" ("RT_REC_KEY", 
> > "cnxarraycolumn", "CRC");
> >  
> > Both tables had 6139062 rows of data.
> >  
> > In this query ... all rows of data match perfectly, so no 
> results are 
> > returned.
> 
> I suspect you get no results because it's unlikely b.oid will 
> be null. Are you sure the query is how it should be since you 
> seem to be expecting no rows to be returned and yet your 
> reason for that doesn't match the query as shown. Without the 
> oid test I'd bet you get a result set of 6139062 rows.

There are supposed to be no results for this particular query.
The data is unique with only RT_REC_KEY and cnxarraycolumn (I tried
using that as an index also).

The basic gist of it is like this:

I want to know where the keys match (e.g.: RT_REC_KEY + cnxarraycolumn)
but the CRC has changed (which will tell me what data has changed).
This is for a data synchronization system that uses PostgreSQL as a join
engine.  I store primary key data together with a 64 bit CRC in data
tables stored in PostgreSQL.  In this particular case, there were no
changes but there will be changes at other times.
 
> > Is there a way to reformulate this query so that it will use the 
> > index?
> 
> Given the above comment I'd say no since the entirety of both 
> tables will be tested to make the result set.
> 
> Alternatively, if the query is right try something along the lines of:
> 
> SELECT a.blah, a.foo,
>   FROM a, b
>   WHERE a.blah = b.blah AND a.foo = b.foo AND b.oid IS NULL

OID is never null.   I don't think that this query is equivalent.  This
Oid is just the standard PostgreSQL Oid, and not some arbitrary field.
 
> if that doesn't use a query try pushing the null test into a 
> subselect like:
> 
> SELECT a.blah, a.foo,
>   FROM a, (SELECT * FROM b WHERE oid IS NULL) b
>   WHERE a.blah = b.blah AND a.foo = b.foo
> 

OID is never null.   I don't think that this query is equivalent.
 
> After that let's hope I haven't embarrassed myself.

Probably, I did not explain myself clearly enough.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] I feel the need for speed. What am I doing wrong?

2003-01-06 Thread Nigel J. Andrews

Added -general list so that the next followup can remove -hackers and everyone
there will have had notice.


On Mon, 6 Jan 2003, Dann Corbit wrote:
>
> I have a query using two postgres tables.  
> One is called "CNX_DS_53_SIS_STU_OPT_FEE_TB" and the other is called
> "CNX_DS2_53_SIS_STU_OPT_FEE_TB".
>  
> I am getting 3 times slower performance than Microsoft Access when
> performing a left outer join.
>
> ...
> 
> Here is the query: 
> select a."RT_REC_KEY", a."cnxarraycolumn", a."CRC" from
> "CNX_DS_53_SIS_STU_OPT_FEE_TB" a left outer join
> "CNX_DS2_53_SIS_STU_OPT_FEE_TB" b on ( a."RT_REC_KEY" = b."RT_REC_KEY"
> and a."cnxarraycolumn" = b."cnxarraycolumn") where b.oid is null ;
>  
>  
> Creating the following index had no effect on performance!
> create unique index i1 on "CNX_DS2_53_SIS_STU_OPT_FEE_TB" ("RT_REC_KEY",
> "cnxarraycolumn", "CRC");
>  
> Both tables had 6139062 rows of data.
>  
> In this query ... all rows of data match perfectly, so no results are
> returned.

I suspect you get no results because it's unlikely b.oid will be null. Are you
sure the query is how it should be since you seem to be expecting no rows to be
returned and yet your reason for that doesn't match the query as shown. Without
the oid test I'd bet you get a result set of 6139062 rows.

> Is there a way to reformulate this query so that it will use the index?

Given the above comment I'd say no since the entirety of both tables will be
tested to make the result set.

Alternatively, if the query is right try something along the lines of:

SELECT a.blah, a.foo,
  FROM a, b
  WHERE a.blah = b.blah AND a.foo = b.foo AND b.oid IS NULL

if that doesn't use a query try pushing the null test into a subselect like:

SELECT a.blah, a.foo,
  FROM a, (SELECT * FROM b WHERE oid IS NULL) b
  WHERE a.blah = b.blah AND a.foo = b.foo


After that let's hope I haven't embarrassed myself.


-- 
Nigel J. Andrews


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster